High Performance Computing Software Engineer - Supercomputing
Team: Engineering
Location: Abu Dhabi
Commitment: Full-time
Workplace Type: On-site
Job Responsibilities
- Design and implement high-performance, distributed software solutions for large-scale AI/ML training.
- Optimize low-level system components, including the Linux kernel, GPU/accelerator kernels, and interconnects.
- Develop and tune communication libraries such as NCCL, MPI, UCX, RCCL, and RDMA-based systems.
- Partner with ML researchers and engineers to support frameworks like PyTorch, Megatron-LM, and DeepSpeed in large-scale production environments.
- Contribute to our scheduling, orchestration, and job management systems, including Slurm and Kubernetes.
- Debug and resolve complex issues across the stack, from kernel to container to model.
- Work closely with hardware vendors, upstream open-source communities, and internal teams to drive performance and reliability improvements.
Skills & Experience
- Proven experience developing and optimizing software for large-scale ML workloads (1,000+ GPUs preferred).
- Deep understanding of Linux kernel internals and accelerator (GPU) kernel development.
- Proficiency with distributed communication libraries (e.g., NCCL, RCCL, MPI, UCX, SHARP, Libfabric).
- Experience with ML frameworks like PyTorch, TensorFlow, JAX, or Megatron-LM.
- Strong knowledge of HPC job scheduling and orchestration tools (e.g., Slurm, Kubernetes, Pyxis).
- Excellent debugging and systems performance tuning skills.
- A collaborative mindset with a focus on shared success and technical excellence.
