Quantiphi

Senior MLOps Engineer

Bengaluru, India; Mumbai, India
Deep Learning, Shell, Python, PyTorch, TensorFlow, Kubernetes
Description

While technology is the heart of our business, a global and diverse culture is the heart of our success. We love our people and take pride in offering them a culture built on transparency, diversity, integrity, learning, and growth.


If working in an environment that encourages you to innovate and excel, not just in your professional life but also in your personal life, interests you, you would enjoy your career with Quantiphi!


Required Experience: 3 to 6 Years
 

Roles and Responsibilities:

  • Design, deploy, and maintain distributed systems using Kubernetes and Slurm for optimal resource utilization and workload management.

  • Lead the configuration and optimization of Multi-GPU, Multi-Node Deep Learning job scheduling, ensuring efficient computation and data processing.

  • Collaborate with cross-functional teams to understand project requirements and translate them into technical solutions.

  • Work with on-prem NVIDIA GPU servers.

  • Develop and maintain complex shell scripts for various system automation tasks, enhancing efficiency and reducing manual intervention.

  • Monitor system performance, identify bottlenecks, and implement necessary adjustments to ensure high availability and reliability.

  • Troubleshoot and resolve technical issues related to the distributed system, job scheduling, and deep learning processes.

  • Stay updated with industry trends and emerging technologies in distributed systems, deep learning, and automation.

Skill Set Needed:

  • Strong communication and collaboration skills to work effectively within a cross-functional team.

  • Proficiency in Python.

  • Hands-on experience with MLOps tools such as MLflow, Kubeflow, and AutoML.

  • Understanding of at least one ML framework (PyTorch or TensorFlow) is good to have.

  • Experience in shell scripting and Linux.

  • Good understanding of logical networks. 

  • Understanding of NLP (preferred) or Computer Vision.

  • Familiarity with the cloud-native stack.

  • Proven experience in designing, deploying, and managing distributed systems, with a focus on Kubernetes and Slurm.

  • Sufficient understanding of AI model training and deployment, and a strong background in multi-GPU, multi-node deep learning job scheduling and resource management.

  • Proficiency in Linux systems, particularly Ubuntu, and the ability to navigate and troubleshoot related issues.

  • Extensive experience creating complex shell scripts for automation and system orchestration.

  • Familiarity with continuous integration and deployment (CI/CD) processes.

  • Excellent problem-solving skills and the ability to diagnose and resolve technical issues promptly.

Good to Have:

  • Previous experience with, or good awareness of, the NVIDIA ecosystem, such as Triton Inference Server and CUDA.

  • Proficiency with Slurm, Kubernetes, Linux, and AI deployment tools.
     

If you like wild growth and working with happy, enthusiastic over-achievers, you'll enjoy your career with us!



