Why do you charge job seekers to use EchoJobs?

Unlike bigger job sites, we treat job seekers as our customers: we charge a small fee to provide curated access to the best companies and up-to-date jobs. This focus lets us deliver a more personalized and effective job search experience.

How many software engineering jobs are on EchoJobs?

We've got over 200,000 jobs from 15,000+ vetted companies. No fake or sleazy jobs here!

So, where do the jobs come from?

We aggregate jobs from 15,000+ companies' career pages, so you can be sure that you're getting the most up-to-date and relevant jobs.

What makes EchoJobs different?

We're the only job board *for* software engineers, *by* software engineers… in case you needed a reminder! We add thousands of new jobs daily and offer powerful search filters just for you. 🛠️

How often are new jobs added?

Every single hour! We add 2,000-3,000 new jobs daily, so you'll always have fresh opportunities. 🚀

How fast can I find a job?

Typically, job searches take 3-6 months. EchoJobs helps you spend more time applying and less time hunting. 🎯

How often should I check EchoJobs?

Check daily! We're always updating with new jobs. Set up job alerts for even quicker access. 📅

Description

Senior Machine Learning Engineer - I (MLOps/LLMOps)

Location: United States (HQ)

Department: Software Engineering

As a Senior Machine Learning Engineer - MLOps/LLMOps, you will design, build, and scale production-grade infrastructure and platforms that enable the full lifecycle of ML and LLM systems. You'll architect robust pipelines for model training, evaluation, deployment, and monitoring while ensuring reliability, observability, and efficiency at scale. This role collaborates closely with ML Engineers, Data Scientists, and Product teams to take AI/ML solutions from prototype to production. Remote candidates will be considered; the ability to work in person with fellow ML staff at the company HQ in Redwood City, CA, when needed is preferred.

Responsibilities

Platform & Infrastructure

Design and implement scalable MLOps/LLMOps platforms supporting the full ML lifecycle: data versioning, model training, evaluation, deployment, and monitoring
Build and maintain CI/CD pipelines for ML models and LLM applications with automated testing, validation, and rollback capabilities
Develop infrastructure-as-code (IaC) for reproducible, version-controlled ML environments
Architect model serving infrastructure with auto-scaling, A/B testing, and canary deployment capabilities

LLM Operations

Build platforms for LLM fine-tuning, prompt management, and experimentation at scale
Implement evaluation frameworks for LLM performance, quality, safety, and cost optimization
Design and deploy enterprise-grade AI agents and copilots with robust monitoring and guardrails
Establish LLM observability: token usage tracking, latency monitoring, prompt/response logging, and cost attribution

Operational Excellence

Own the uptime, reliability, and performance of ML/LLM services against defined SLIs/SLOs
Implement comprehensive monitoring, alerting, and incident response for ML systems
Participate in on-call rotations and drive post-incident reviews to improve system resilience
Build automation and tooling to reduce toil and accelerate ML development velocity

Collaboration & Leadership

Partner with ML Engineers and Data Scientists to translate research into production-ready systems
Collaborate with platform and infrastructure teams on cloud architecture and resource optimization
Mentor team members on MLOps best practices, production ML patterns, and operational excellence
Drive technical decisions on tooling, frameworks, and architectural patterns

Required Qualifications and Skills

Education: B.S./M.S./Ph.D. in Computer Science, Engineering, or related technical field
Experience: 4+ years of software engineering experience with 2+ years focused on MLOps/LLMOps
MLOps Expertise:

Production experience with ML model serving frameworks (e.g., TensorFlow Serving, TorchServe, Triton)
Hands-on experience with ML experiment tracking and model registry tools (MLflow, Weights & Biases, Kubeflow)
Proficiency in workflow orchestration (Airflow, Prefect, Kubeflow Pipelines, Metaflow)

LLMOps Expertise:

Experience with LLM deployment, fine-tuning, and evaluation frameworks (e.g., vLLM, LangChain, LlamaIndex)
Knowledge of prompt engineering, RAG architectures, and LLM application patterns
Familiarity with LLM observability tools (e.g., LangSmith, Arize, WhyLabs)

Cloud & Infrastructure:

Strong experience with major cloud providers (AWS, GCP, or Azure) and ML-specific services (SageMaker, Vertex AI, Azure ML, Bedrock)
Proficiency in containerization (Docker, Kubernetes) and infrastructure-as-code (Terraform, CloudFormation, Pulumi)
Experience with microservices architecture and API development (REST, gRPC)

Software Engineering:

Strong programming skills in Python and proficiency with Terraform and Helm; familiarity with Go, Java, or Rust is a plus
Deep understanding of CI/CD practices and tools (GitHub Actions, GitLab CI, Jenkins, ArgoCD)
Experience with monitoring and observability stacks (Prometheus, Grafana, Datadog, ELK)

Operational Excellence:

Track record of managing production systems with defined SLIs/SLOs
Experience with on-call rotations, incident management, and reliability engineering practices

Desired Qualifications and Skills

Experience building internal ML platforms or developer tooling used by multiple teams
Hands-on experience with distributed training frameworks (Ray, Horovod, DeepSpeed)
Knowledge of model optimization techniques (quantization, distillation, pruning)
Familiarity with feature stores (Feast, Tecton) and data versioning tools (DVC, lakeFS)
Understanding of ML security best practices, model governance, and compliance requirements
Experience with cost optimization and resource management for large-scale ML workloads
Contributions to open-source MLOps/LLMOps projects
Background in applied ML or data science with practical model development experience

About Us

Sumo Logic, Inc. helps make the digital world secure, fast, and reliable by unifying critical security and operational data through its Intelligent Operations Platform. Built to address the increasing complexity of modern cybersecurity and cloud operations challenges, we empower digital teams to move from reaction to readiness, combining agentic AI-powered SIEM and log analytics into a single platform to detect, investigate, and resolve modern challenges. Customers around the world rely on Sumo Logic for trusted insights to protect against security threats, ensure reliability, and gain visibility into their digital environments. For more information, visit www.sumologic.com.

See the Sumo Logic Privacy Policy. Employees will be responsible for complying with applicable federal privacy laws and regulations, as well as organizational policies related to data protection.

Compensation varies based on a variety of factors, including (but not limited to) role level, skills and competencies, qualifications, knowledge, location, and experience. In addition to base pay, certain roles are eligible to participate in our bonus or commission plans, as well as in our benefits offerings and equity awards.

Must be authorized to work in the United States at the time of hire and for the duration of employment. At this time, we are not able to offer nonimmigrant visa sponsorship for this position.
