Inception Labs

Member of Technical Staff, Kernels

Bay Area
Description

Location: Bay Area

Department: AI Systems

Location Type: In office

Employment Type: Full-time

The Role
We're looking for engineers and scientists to design, optimize, and maintain the compute foundations that power large-scale language model training and inference. You will develop high-performance ML kernels, enable efficient low-precision arithmetic, and improve the distributed compute stack that makes training and serving large models possible.

Key Responsibilities
  • Design and implement custom ML kernels (CUDA, CuTe, Triton) for core dLLM operations such as attention, matrix multiplication, gating, and normalization, optimized for modern GPU architectures.
  • Design compute primitives to reduce memory bandwidth bottlenecks and improve kernel efficiency.
  • Contribute to infrastructure stability and scalability, ensuring reproducibility, consistency across precision formats, and high utilization of compute resources.

Qualifications
  • BS/MS/PhD in Computer Science, Engineering, or a related field (or equivalent experience).
  • Proficiency in CUDA, CuTe, Triton, or other GPU programming frameworks.
  • Understanding of ML frameworks (PyTorch, TensorFlow) from a systems perspective.
  • Background in performance optimization and profiling of ML systems.
  • Experience implementing low-precision formats (FP8, INT8, block floating point) or contributing to related compiler stacks (XLA, TVM).
  • Familiarity with distributed training techniques (data parallel, model parallel, pipeline parallel).
  • Proficiency in Python and at least one systems programming language (C++/Rust/Go).
  • Experience with containerization (Docker), orchestration (Kubernetes), and CI/CD pipelines.

Preferred Skills
  • Experience building and maintaining large-scale language models with tens of billions of parameters or more.
  • Experience with distributed systems and cloud computing platforms (AWS/GCP/Azure).
  • Familiarity with distributed frameworks such as PyTorch/XLA, DeepSpeed, and Megatron-LM.
  • Prior contributions to open-source deep learning infrastructure such as PyTorch, DeepSpeed, or XLA.