Responsibilities
- Scaling Expertise: Design and implement strategies to efficiently scale machine learning models across diverse hardware platforms (GPU/TPU).
- Performance Optimisation: Analyse and profile ML systems under heavy load, pinpoint bottlenecks, and implement targeted optimisations.
- Distributed Systems Architecture: Create robust distributed training and inference solutions for maximum computational efficiency.
- Algorithmic Optimisation: Research and understand the latest deep learning literature to implement and optimise state-of-the-art algorithms and architectures, ensuring compute efficiency and performance.
- Low-Level Mastery: Write high-quality Python, C/C++, XLA, Pallas, Triton, and/or CUDA code to achieve performance breakthroughs.
Required Skills
- Understanding of Linux systems, performance analysis tools, and hardware optimisation techniques.
- Experience with distributed training frameworks (Ray, Dask, PyTorch Lightning, etc.).
- Expertise with Python and/or C/C++.
- Development experience with machine learning frameworks (JAX, TensorFlow, PyTorch, etc.).
- Passion for profiling, identifying bottlenecks, and delivering efficient solutions.
Highly Desirable
- Track record of successfully scaling ML models.
- Experience writing custom CUDA kernels or XLA operations.
- Understanding of GPU/TPU architectures and their implications for efficient ML systems.
- Solid grasp of the fundamentals of modern deep learning.
- Habit of actively following ML trends and a desire to push boundaries.
Example Projects
- Profile algorithm traces, identifying opportunities for custom XLA operations and CUDA kernel development.
- Implement and apply state-of-the-art architectures (Mamba, Griffin, Hyena) in research and applied projects.
- Adapt algorithms for large-scale distributed architectures across HPC clusters.
- Employ memory-efficient techniques within models for increased parameter counts and longer context lengths.
What We Offer
- Real-World Impact: Directly contribute to the performance and reach of our AI solutions.
- Cutting-Edge Challenges: Tackle complex problems at the forefront of machine learning and large-scale system design.
- Growth-Oriented Environment: Expand your expertise in a team of talented engineers dedicated to advancing ML scalability.
Other Jobs from InstaDeep
Senior Machine Learning Engineer - Team Lead
Lead of Machine Learning Systems, Scale and Performance