Cartesia

Inference Engineer

San Francisco, CA
Python, Machine Learning, CUDA, Triton, vLLM, SGLang
Description

Department: Research

Location: HQ - San Francisco, CA

Compensation: $180K – $250K

Employment Type: Full-time

About Cartesia

Our mission is to build the next generation of AI: ubiquitous, interactive intelligence that runs wherever you are. Today, not even the best models can continuously process and reason over a year-long stream of audio, video and text—1B text tokens, 10B audio tokens and 1T video tokens—let alone do this on-device.

We're pioneering the model architectures that will make this possible. Our founding team met as PhDs at the Stanford AI Lab, where we invented State Space Models (SSMs), a new primitive for training efficient, large-scale foundation models. Our team combines deep expertise in model innovation and systems engineering with a design-minded product engineering team to build and ship cutting-edge models and experiences.

We're funded by leading investors at Index Ventures and Lightspeed Venture Partners, along with Factory, Conviction, A Star, General Catalyst, SV Angel, Databricks and others. We're fortunate to have the support of many amazing advisors and 90+ angels across many industries, including the world's foremost experts in AI.

About the Role

We're hiring an Inference Engineer to advance our mission of building real-time multimodal intelligence.

Your Impact

  • Design and build a low-latency, scalable, and reliable model inference and serving stack for our cutting-edge foundation models using Transformers, SSMs, and hybrid models.

  • Work closely with our research team and product engineers to serve our suite of products in a fast, cost-effective, and reliable manner. 

  • Design and build robust inference infrastructure and monitoring for our products. 

  • Have significant autonomy to shape our products and directly impact how cutting-edge AI is applied across various devices and applications.

What You Bring

Given the scale and difficulty of the problems we work on, we value strong engineering skills at Cartesia.

  • Strong engineering skills, comfort navigating complex codebases, and an eye for writing clean, maintainable code.

  • Experience building large-scale distributed systems with high demands on performance, reliability, and observability.

  • Technical leadership with the ability to execute and deliver zero-to-one results amidst ambiguity. 

  • Background in or experience working on inference pipelines for machine learning and generative models.

  • Experience implementing state-of-the-art machine learning models and applying research to practical problems.

  • Preferred: experience with inference frameworks such as vLLM or SGLang, and with techniques such as continuous batching.

  • Preferred: experience working in CUDA, Triton, or similar.

What We Offer

🍽 Lunch, dinner and snacks at the office.

🏥 Fully covered medical, dental, and vision insurance for employees.

🏦 401(k).

✈️ Relocation and immigration support.

🦖 Your own personal Yoshi.

Our Culture

🏢 We’re an in-person team based out of San Francisco. We love being in the office, hanging out together, and learning from each other every day.

🚢 We ship fast. All of our work is novel and cutting edge, and execution speed is paramount. We have a high bar, and we don’t sacrifice quality or design along the way.

🤝 We support each other. We have an open & inclusive culture that’s focused on giving everyone the resources they need to succeed.
