AI Systems & Inference Frameworks Engineer
Department: Platform
Location: San Francisco, New York, United States, Canada
Employment Type: FullTime
About Us
Most AI is frozen in place - it doesn't adapt to the world. We think that's backwards. Our mandate is to build efficient intelligence that evolves in real-time. Our vision is AI systems that are flexible, personalized, and accessible to everyone. We believe efficiency is what makes this possible - it's how we expand access and ensure innovation benefits the many, not the few. We believe in talent density: bringing together the best and most driven individuals to push the boundaries of continual adaptation. We're looking for builders and creative thinkers ready to shape the next era of intelligence.
The Role
You’ll work directly with our founders to design and build the inference and optimization systems that power our core product. This role bridges research and production, combining deep exploration of inference techniques with hands-on ownership of scalable, high-performance serving infrastructure. You’ll own the full lifecycle of LLM inference—from experimentation and performance analysis to deployment and iteration in production—thriving in a zero-to-one environment and helping define the technical foundations of our inference stack.
Responsibilities
Inference Research & Systems: design and build our LLM inference stack from zero to one, exploring and implementing advanced techniques for low-latency, high-throughput serving of language and multimodal models.
Frameworks & Optimization: develop and optimize inference using modern frameworks (e.g., vLLM, SGLang, TensorRT-LLM), experimenting with batching strategies, KV-cache management, parallelism, and GPU utilization to push performance and cost efficiency.
Software–Hardware Co-Design: collaborate closely with founders and model developers to analyze bottlenecks across the stack, co-optimizing model execution, infrastructure, and deployment pipelines.
Qualifications
Strong experience building and optimizing LLM inference systems in production or research environments
Hands-on expertise with inference frameworks such as vLLM, SGLang, TensorRT-LLM, or similar
Deep performance mindset with experience in GPU-backed systems, latency/throughput optimization, and resource efficiency
Solid understanding of transformer inference, serving architectures, and KV-cache–based execution
Strong programming skills in Python; experience with CUDA, Triton, or C++ a plus
Comfort working in ambiguous, zero-to-one environments and driving research ideas into production systems
Nice to have: experience with model quantization or pruning, speculative decoding, multimodal inference, open-source contributions, or prior work in systems or ML research labs
Above all, we're looking for great teammates who make work feel lighter and aren't afraid to go out on a limb with bold ideas. You don't need to be perfect, but you do need to be adaptable. We encourage you to apply, even if you don't check every box.
Benefits
Flexible work: In-person collaboration in the Bay Area, a distributed global-first team, and quarterly offsites.
Adaption Passport: Annual travel stipend to explore a country you've never visited. We're building intelligence that evolves alongside you, so we encourage you to keep expanding your horizons.
Lunch Stipend: Weekly meal allowance for take-out or grocery delivery.
Well-Being: Comprehensive medical benefits and generous paid time off.
There are more than 50,000 engineering jobs:
Subscribe to membership and unlock all jobs
Engineering Jobs
60,000+ jobs from 4,500+ well-funded companies
Updated Daily
New jobs are added every day as companies post them
Refined Search
Use filters like skill, location, etc to narrow results
Become a member
🥳🥳🥳 452 happy customers and counting...
Overall, over 80% of customers chose to renew their subscriptions after the initial sign-up.
To try it out
For active job seekers
For those who are passive looking
Cancel anytime
Frequently Asked Questions
- We prioritize job seekers as our customers, unlike bigger job sites, by charging a small fee to provide them with curated access to the best companies and up-to-date jobs. This focus allows us to deliver a more personalized and effective job search experience.
- We've got over 200,000 jobs from 15,000+ vetted companies. No fake or sleazy jobs here!
- We aggregate jobs from 15,000+ companies' career pages, so you can be sure that you're getting the most up-to-date and relevant jobs.
- We're the only job board *for* software engineers, *by* software engineers… in case you needed a reminder! We add thousands of new jobs daily and offer powerful search filters just for you. 🛠️
- Every single hour! We add 2,000-3,000 new jobs daily, so you'll always have fresh opportunities. 🚀
- Typically, job searches take 3-6 months. EchoJobs helps you spend more time applying and less time hunting. 🎯
- Check daily! We're always updating with new jobs. Set up job alerts for even quicker access. 📅
What Fellow Engineers Say
