Senior / Staff AI Research Engineer, Real-Time Inference
Location: Milpitas, CA
Department: AI
Why RoboForce
-
Develop and optimize inference pipelines for embodied AI models (VLA, perception, world models) targeting real-time execution on edge hardware such as NVIDIA Jetson platforms.
-
Implement CUDA-level optimizations including custom kernels, memory layout tuning, and hardware-aware graph compilation to minimize model latency.
-
Apply and advance model compression techniques — quantization (INT8/FP16/INT4), pruning, distillation, and structured sparsity — to achieve production-grade throughput on constrained devices.
-
Profile and debug end-to-end inference stacks using tools such as NSight, TensorRT, and Triton to identify and eliminate performance bottlenecks.
-
Collaborate with ML research and robotics teams to co-design model architectures that meet real-time control-loop latency requirements.
-
Establish benchmarking frameworks to evaluate model performance across latency, throughput, power consumption, and accuracy tradeoffs on target hardware.
-
Master's degree in Computer Science, Electrical Engineering, or related field with 4+ years of experience, or a PhD degree.
-
Deep expertise in CUDA programming, GPU architecture, and low-level kernel optimization, including custom kernel authoring with tools such as Triton.
-
Hands-on experience with model quantization, pruning, distillation, and deployment using frameworks such as TensorRT, ONNX Runtime, TVM, or Triton.
-
Proficiency in C++ and Python; strong systems programming and performance profiling skills.
-
Experience deploying ML models on edge or embedded hardware (e.g., NVIDIA Jetson, Orin, or equivalent ARM/GPU SoCs).
-
Requires 5 days/week in-office collaboration with the teams.
-
Familiarity with embodied AI models — VLA, multimodal transformers, or diffusion-based policies — and their inference characteristics.
-
Familiarity with compiler-based optimization pipelines such as XLA, torch.compile, or MLIR for graph-level model acceleration.
-
Understanding of robotics system constraints such as control-loop timing, sensor fusion latency, and memory bandwidth limits on edge SoCs.
-
Publication or production work in efficient deep learning or on-device ML systems.
-
Competitive stock options/equity programs.
-
Health, dental, and vision insurance, 401(k) plan.
-
Visa sponsorship and green card support for qualified candidates.
-
Lunches and dinners, a fully stocked kitchen, and regular team-building events.
There are more than 50,000 engineering jobs:
Subscribe to membership and unlock all jobs
Engineering Jobs
60,000+ jobs from 4,500+ well-funded companies
Updated Daily
New jobs are added every day as companies post them
Refined Search
Use filters like skill, location, etc to narrow results
Become a member
🥳🥳🥳 452 happy customers and counting...
Overall, over 80% of customers chose to renew their subscriptions after the initial sign-up.
To try it out
For active job seekers
For those who are passive looking
Cancel anytime
Frequently Asked Questions
- We prioritize job seekers as our customers, unlike bigger job sites, by charging a small fee to provide them with curated access to the best companies and up-to-date jobs. This focus allows us to deliver a more personalized and effective job search experience.
- We've got over 200,000 jobs from 15,000+ vetted companies. No fake or sleazy jobs here!
- We aggregate jobs from 15,000+ companies' career pages, so you can be sure that you're getting the most up-to-date and relevant jobs.
- We're the only job board *for* software engineers, *by* software engineers… in case you needed a reminder! We add thousands of new jobs daily and offer powerful search filters just for you. 🛠️
- Every single hour! We add 2,000-3,000 new jobs daily, so you'll always have fresh opportunities. 🚀
- Typically, job searches take 3-6 months. EchoJobs helps you spend more time applying and less time hunting. 🎯
- Check daily! We're always updating with new jobs. Set up job alerts for even quicker access. 📅
What Fellow Engineers Say
