Machine Learning - Infrastructure
Department: Engineering
Location: San Francisco
Employment Type: Full-time
Our mission is general causal intelligence: AI capable of (1) predicting the future and (2) identifying the optimal actions to change that future.
To achieve this breakthrough, we are building a Large Physics foundation Model (LPM) because domains governed by physics have inherent cause-and-effect relationships, unlike visual or textual data.
Weather is the ideal training ground for an LPM. It is the most well-observed physical system, offering rapid, objective ground truth feedback from sensory observations and data at a scale that dwarfs what is used to train today’s LLMs.
Causal Labs is a team of researchers and engineers from self-driving, drug discovery, and robotics - including Google DeepMind, Cruise, Waymo, Meta, Nabla Bio, and Apple - who believe general causal intelligence will be the most important technical breakthrough for civilization.
We look for infrastructure engineers who are excited to tackle unsolved problems.
Our training and inference challenges demand deep expertise in setting up distributed training clusters and optimizing performance for large models. If you have experience building large-scale ML infrastructure in related fields such as language and vision models, robotics, or biology, join us on this mission.
Responsibilities
Design, deploy, and maintain large distributed ML training and inference clusters
Develop efficient, scalable end-to-end pipelines to manage petabyte-scale datasets and model training throughout the entire ML lifecycle
Research and test various training approaches including parallelization techniques and numerical precision trade-offs across different model scales
Analyze, profile and debug low-level GPU operations to optimize performance
Stay up-to-date on research to bring new ideas to work
What we’re looking for
We value a relentless approach to problem-solving, rapid execution, and the ability to quickly learn in unfamiliar domains.
Strong grasp of state-of-the-art techniques for optimizing training and inference workloads
Demonstrated proficiency with distributed training frameworks (e.g. FSDP, DeepSpeed) to train large foundation models
Knowledge of cloud platforms (GCP, AWS, or Azure) and their ML/AI service offerings
Familiarity with containerization and orchestration frameworks (e.g., Kubernetes, Docker)
Background working on distributed task management systems and scalable model serving & deployment architectures
Understanding of monitoring, logging, observability, and version control best practices for ML systems
You don’t have to meet every single requirement above.
