ML Ops Engineer
Location: San Francisco
Department: Engineering
Location Type: In office
Employment Type: Full-time
Who We Are
The Role
What You’ll Do
- Own and evolve the deployment lifecycle for our perception systems across edge and cloud environments.
- Design and manage highly available ML serving infrastructure that delivers fast, low-latency, and reliable inference in production.
- Build resilient CI/CD pipelines to test and roll out system updates with confidence, backed by comprehensive fleet observability.
- Implement and manage remote system monitoring, alerting (e.g., Prometheus, Grafana, Sentry), and debugging systems to ensure operational excellence, focusing on fleet health metrics (e.g., uptime, resource utilization, inference latency).
- Work closely with perception and backend teams to design deployable systems that are robust in the real world.
- Integrate and maintain experiment tracking and model management platforms (e.g., Weights & Biases, MLflow) to streamline model lineage, performance comparison, and versioning from research to production.
- Contribute to security policy design and device authentication/attestation infrastructure for fleet safety.
- Build and maintain internal tooling and CLI utilities to streamline the end-to-end development-to-deployment workflow, empowering the broader engineering team to ship perception systems with high velocity and minimal friction.
What You Bring
- 3-5+ years of experience in DevOps, deployment engineering, or site reliability, ideally with production ML systems or robotics.
- Deep operational experience with Linux system administration, system packaging (e.g., deb/RPM), and configuration management tools (e.g., Ansible, SaltStack, Chef).
- Strong experience with ML deployment/serving frameworks and infrastructure (e.g., TorchServe, custom C++ inference services).
- Comfortable working in Linux-heavy environments with advanced shell scripting and strong knowledge of operating system internals.
- Hands-on experience with networking fundamentals, including TCP/IP, firewalls, NAT traversal, and VPNs.
- Prior experience managing large-scale edge fleets, including over-the-air (OTA) updates and blue-green deployment strategies.
- A proven track record of developing internal developer tools or CLI applications that automate complex infrastructure tasks and improve overall team productivity.
Nice to Have
- Experience deploying AI/ML inference pipelines on bare-metal or virtualized edge hardware (e.g., using GStreamer/DeepStream pipelines, custom executables).
- Expertise in machine learning inference engineering, including quantization and compilation (e.g., using ONNX Runtime, TensorRT), for efficient deployment to various edge hardware targets (e.g., NVIDIA Jetson, custom ARM SoCs).
- Familiarity with writing or debugging high-performance, low-latency ML inference services in C++.
- Exposure to remote logging, log ingestion, and distributed telemetry aggregation.
- Previous experience in early-stage startups or fast-paced hardware/software integration environments.
Why Sauron
- We celebrate as a team and troubleshoot as a team.
- The goal is the mission, not the credit.
- Be ruthless with problems, but kind to people.
- Raise the bar, lower the shield.
- Your perspective is a requirement, not a suggestion.
- Speak the hard truths early so we can fix them fast.
- Do what you say you’ll do.
- If it breaks, fix it. If it works, make it better.
- Earn trust through empathy and consistency.
- Anticipate needs before they become requests.