- Lead multi-disciplinary teams to develop solutions for large-scale training systems. Assess trade-offs of various solutions and make pragmatic decisions.
- Ensure timely milestone delivery through teamwork and close collaboration.
- Own the overall performance of the communication system, including performance benchmarking, monitoring, and troubleshooting production issues.
- Define the technical vision and drive a multi-year roadmap toward the related objectives.
- Work with cross-functional teams and provide guidance on the AI network architecture, including topologies, transport, and congestion control techniques.
- Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience.
- Experience with developing, evaluating and debugging host networking protocols such as RDMA.
- 10+ years of experience in designing, deploying and operating networks.
- Experience with triaging performance issues in complex scale-out distributed applications.
- Experience with developing communication libraries, such as MPI, NCCL, and UCX.
- Understanding of AI training workloads and the demands they place on networks.
- Understanding of RDMA congestion control mechanisms on InfiniBand (IB) and RoCE networks.
- Understanding of the latest artificial intelligence (AI) technologies.
- Experience with machine learning frameworks such as PyTorch and TensorFlow.
- Experience in developing systems software in languages like C++.