Responsibilities
- Algorithmic Optimisation: Research and understand the latest deep learning literature to implement and optimise state-of-the-art algorithms and architectures, ensuring compute efficiency and performance.
- Scaling Expertise: Design and implement strategies to efficiently scale machine learning models across different accelerator platforms (GPU/TPU).
- Performance Optimisation: Analyse and profile ML systems under heavy load, pinpointing bottlenecks and implementing targeted optimisations.
- Distributed Systems Architecture: Create robust distributed training and inference solutions for maximum computational efficiency.
- Low-Level Mastery: Understand how to take advantage of the underlying hardware. Although it is not a prerequisite, you should ideally be comfortable working with technologies like C/C++, XLA, Pallas, Triton, and/or CUDA to achieve performance breakthroughs.
Required Skills
- 2+ years of work experience.
- Expertise in Python.
- Experience working with Linux systems.
- Experience with Docker and container orchestration.
- Experience with at least one modern machine learning framework (JAX, TensorFlow, PyTorch, etc.).
- Experience with software profiling, identifying bottlenecks, and delivering efficient solutions.
Highly desirable
- Experience working with JAX and packages within the JAX ecosystem.
- Track record of successfully building and scaling ML models.
- Experience with distributed training frameworks (Ray, Dask, PyTorch Lightning, etc.).
- Experience working with HPC clusters and distributing programs over multiple hosts.
- Understanding of GPU/TPU architectures and their implications for efficient ML systems.
- Experience using statically typed languages such as C or C++.
- Solid grasp of the fundamentals of modern deep learning and experience with reinforcement learning.
- Actively following ML trends.
What we offer
- Real-World Impact: Directly contribute to the performance and reach of our AI solutions.
- Cutting-Edge Challenges: Tackle complex problems at the forefront of machine learning and large-scale system design.
- Growth-Oriented Environment: Expand your expertise with a team of talented engineers dedicated to advancing ML scalability.