MLOps Platform Engineer
Job Summary
ZestyAI is on a mission to revolutionize how the world understands and manages climate and property risk. By combining cutting-edge artificial intelligence with property-specific data, we empower our customers to make smarter, data-driven risk decisions that protect families, communities, and their financial well-being. Our innovative predictive models provide highly accurate, property-level insights, helping clients anticipate and mitigate risks related to climate events like wildfires and severe storms. Join us, and be part of a team that’s redefining the future of property risk assessment.
We’re seeking a skilled and adaptable MLOps Engineer to join our Platform team, building the scalable, AI-focused infrastructure that powers Zesty’s cloud-based machine learning solutions. This role involves close collaboration with our Machine Learning teams to develop and deploy training pipelines, automate workflows, and ensure high-performance, reliable model deployment. As a key contributor, you’ll foster a culture of automation, efficiency, and innovation within the Data Science and Machine Learning teams, driving advancements in backend systems, cloud infrastructure, and ML frameworks like PyTorch. If you’re passionate about creating impactful AI-driven solutions and thrive in a dynamic, collaborative environment, this role is a great fit.
Responsibilities
- Workflow Optimization: Collaborate with ML engineers and data scientists to deploy models, optimize inference latency, and efficiently manage cloud resources.
- Platform Engineering: Design, build, maintain, and support the core infrastructure underlying our application platform, enabling a seamless ML workflow from training to production.
- CI/CD and Deployment Automation: Implement and maintain CI/CD pipelines and automated deployment processes, promoting consistency and scalability across environments.
- Real-Time Monitoring: Develop monitoring and logging solutions for tracking model performance, system health, and data quality. Proactively detect issues such as model drift or degradation.
- System Optimization and Cost Efficiency: Fine-tune infrastructure settings to support large-scale ML workloads, prioritizing resource and cost efficiency through GCP’s autoscaling and optimization capabilities.
- Documentation and Best Practices: Maintain detailed documentation of processes and best practices for ML operations, enabling efficient collaboration and knowledge sharing.
- Cross-Functional Collaboration: Work closely with software engineering, platform engineering, and product teams to design, enhance, and streamline tools, infrastructure, and ML workflows.
Required Skills
- Technical Stack Proficiency: Strong experience in Python, Docker, and building CI/CD pipelines (e.g., GitHub Actions) to automate and streamline development workflows.
- Machine Learning Tools: Proficiency with PyTorch (including torch.compile), familiarity with MLOps tools, and experience in ML training and deployment workflows.
- ML Lifecycle Management Knowledge: Understanding of best practices for model retraining, governance, and production ML lifecycle management.
- Adaptability to New Architectures: Able to adapt to and support new and evolving ML model architectures, ensuring seamless integration into existing workflows.
- Cloud Infrastructure Familiarity: Experienced in using GCP to support ML operations, including application logging, debugging, setting up and maintaining VMs, and optimizing resources and costs within the cloud environment.
- Database Skills: Familiar with building data pipelines that interact with BigQuery or PostgreSQL, capable of writing SQL queries, and experienced in handling data workflows within these databases.
- Container Orchestration Knowledge: Strong understanding of container orchestration using Kubernetes to manage and scale ML workflows effectively.
- Observability for ML Workflows: Skilled in setting up monitoring, logging, and alerting specific to ML models in production, including metrics for drift detection and accuracy monitoring.
- Experiment Tracking and Data Versioning: Experience with tools like MLflow or DVC for tracking experiments and managing data and model versions over time.
- Problem-Solving Skills: Ability to troubleshoot complex technical issues and implement robust solutions with ML teams.
- Collaboration and Communication: Strong communication skills to work effectively with cross-functional teams, especially the Data Science and Machine Learning (DSML) teams.
Nice to Have
- Technical Skills: Familiarity with Go (Golang) and experience with Triton Inference Server or other model serving frameworks (e.g., TensorFlow Serving, ONNX Runtime).
- Computer Vision and Risk Modeling: Experience with computer vision techniques and risk modeling, particularly in the context of property or climate risk assessment.
- Relevant Experience: Prior experience in a similar MLOps-focused platform role or in supporting ML model deployment and lifecycle management.
Why Join Zesty?
At Zesty, we’re committed to fostering a supportive and innovative environment where team members can grow and succeed. We offer competitive compensation, a flexible, fully remote work schedule, and a collaborative culture that values individual perspectives and talents. Join us in advancing AI-driven solutions that make a real impact.