AI/ML Lead - 41764
Location: Pune, India; Hyderabad, India
Department: ENGINEERING
Experience: 7-11 years
- Design, implement, and optimize end-to-end ML training workflows, including infrastructure setup, orchestration, fine-tuning, deployment, and monitoring.
- Evaluate and integrate multi-cloud and single-cloud training options across AWS and other major platforms.
- Lead cluster configuration, orchestration design, environment customization, and scaling strategies.
- Compare and recommend hardware options (GPUs, TPUs, accelerators) based on performance, cost, and availability.
- At least 4-5 years of experience in AI/ML infrastructure and large-scale training environments.
- Expertise in AWS cloud services (EC2, S3, EKS, SageMaker, Batch, FSx, etc.) and familiarity with Azure, GCP, and hybrid/multi-cloud setups.
- Strong knowledge of AI/ML training frameworks (PyTorch, TensorFlow, Hugging Face, DeepSpeed, Megatron, Ray, etc.).
- Proven experience with cluster orchestration tools (Kubernetes, Slurm, Ray, SageMaker, Kubeflow).
- Deep understanding of hardware architectures for AI workloads (NVIDIA, AMD, Intel Habana, TPU).
- Expert knowledge of inference optimization techniques including speculative decoding, KV cache optimization (MQA/GQA/PagedAttention), and dynamic batching.
- Deep understanding of the prefill vs. decode phases of LLM inference and of memory-bound vs. compute-bound operations.
- Experience with quantization methods (INT4/INT8, GPTQ, AWQ) and model parallelism strategies.
- Hands-on experience with production inference engines: vLLM, TensorRT-LLM, DeepSpeed-Inference, or TGI.
- Proficiency with serving frameworks: Triton Inference Server, KServe, or Ray Serve.
- Familiarity with kernel optimization libraries (FlashAttention, xFormers).
- Proven ability to optimize inference metrics: TTFT (time to first token), ITL (inter-token latency), and throughput.
- Experience profiling and resolving GPU memory bottlenecks and OOM issues.
- Knowledge of hardware-specific optimizations for modern GPU architectures (A100/H100).
- Drive end-to-end fine-tuning of LLMs, including model selection, dataset preparation/cleaning, tokenization, and evaluation with baseline metrics.
- Configure and execute fine-tuning experiments (LoRA, QLoRA, etc.) on large-scale compute setups, ensuring optimal hyperparameter tuning, logging, and checkpointing.
- Document fine-tuning outcomes by capturing performance metrics (losses, BERTScore/ROUGE scores, training time, resource utilization) and benchmarking against baseline models.