Perplexity AI

Engineering Manager, Inference

San Francisco, CA
Python PyTorch Rust C++ Kubernetes TensorFlow ONNX TensorRT vLLM CUDA Triton
Description

Engineering Manager (AI Inference)

Department: AI

Location: San Francisco

Compensation: $300K – $405K • Offers Equity

Employment Type: Full-time

About the Role

We are looking for an Inference Engineering Manager to lead our AI Inference team. This is a unique opportunity to build and scale the infrastructure that powers Perplexity's products and APIs, serving millions of users with state-of-the-art AI capabilities.

You will own the technical direction and execution of our inference systems while building and leading a world-class team of inference engineers. Our current stack includes Python, PyTorch, Rust, C++, and Kubernetes. You will help architect and scale the deployment of the machine learning models behind Perplexity's Comet, Sonar, Search, and Deep Research products.

Why Perplexity?

  • Build SOTA systems that are the fastest in the industry with cutting-edge technology

  • High-impact work on a smaller team with significant ownership and autonomy

  • Opportunity to build 0-to-1 infrastructure from scratch rather than maintaining legacy systems

  • Work on the full spectrum: reducing cost, scaling traffic, and pushing the boundaries of inference

  • Direct influence on technical roadmap and team culture at a rapidly growing company

Responsibilities

  • Lead and grow a high-performing team of AI inference engineers

  • Develop APIs for AI inference used by both internal and external customers

  • Architect and scale our inference infrastructure for reliability and efficiency

  • Benchmark and eliminate bottlenecks throughout our inference stack

  • Drive large sparse/MoE model inference at rack scale, including sharding strategies for massive models

  • Push the frontier by building inference systems that support sparse attention, disaggregated prefill/decode serving, and similar techniques

  • Improve the reliability and observability of our systems and lead incident response

  • Own technical decisions around batching, throughput, latency, and GPU utilization

  • Partner with ML research teams on model optimization and deployment

  • Recruit, mentor, and develop engineering talent

  • Establish team processes, engineering standards, and operational excellence

Qualifications

  • 5+ years of engineering experience with 2+ years in a technical leadership or management role

  • Deep experience with ML systems and inference frameworks (PyTorch, TensorFlow, ONNX, TensorRT, vLLM)

  • Strong understanding of LLM architecture: Multi-Head Attention, Multi-Query/Grouped-Query Attention, and common layers

  • Experience with inference optimizations: batching, quantization, kernel fusion, FlashAttention

  • Familiarity with GPU characteristics, roofline models, and performance analysis

  • Experience deploying reliable, distributed, real-time systems at scale

  • Track record of building and leading high-performing engineering teams

  • Experience with parallelism strategies: tensor parallelism, pipeline parallelism, expert parallelism

  • Strong technical communication and cross-functional collaboration skills
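As an illustration of two of the concepts named in these qualifications (Grouped-Query Attention and roofline analysis), here is a minimal back-of-the-envelope sketch. It is not Perplexity's tooling; every model shape and hardware figure in it (32 layers, head_dim 128, a 4096x4096 weight, 1000 TFLOP/s peak, 3 TB/s bandwidth) is a hypothetical placeholder, and the GEMM byte count assumes each operand moves through memory exactly once.

```python
"""Back-of-the-envelope inference arithmetic (illustrative sketch only).

All model shapes and hardware numbers are hypothetical placeholders,
not figures for any particular model or GPU.
"""


def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                             bytes_per_elem: int = 2) -> int:
    """Bytes of K and V cached per token across all layers.

    With Multi-Head Attention, n_kv_heads equals the number of query heads;
    Grouped-Query Attention shares each KV head across a group of query heads,
    and Multi-Query Attention uses a single KV head.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem  # 2 = K and V


def gemm_arithmetic_intensity(m: int, n: int, k: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte for C[m,n] = A[m,k] @ B[k,n], assuming each operand moves once."""
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def roofline_tflops(intensity: float, peak_tflops: float, mem_bw_tb_s: float) -> float:
    """Attainable TFLOP/s under the roofline model: min(compute roof, bandwidth * intensity)."""
    return min(peak_tflops, mem_bw_tb_s * intensity)


if __name__ == "__main__":
    # Hypothetical model: 32 layers, 32 query heads, head_dim 128, fp16/bf16 cache.
    mha = kv_cache_bytes_per_token(n_layers=32, n_kv_heads=32, head_dim=128)
    gqa = kv_cache_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128)
    print(f"KV cache per token: MHA {mha} B, GQA {gqa} B")  # 524288 B vs 131072 B

    # Decode step: a batch of tokens times a hypothetical 4096x4096 weight matrix.
    PEAK_TFLOPS, MEM_BW_TB_S = 1000.0, 3.0  # hypothetical accelerator
    for batch in (1, 64, 512):
        ai = gemm_arithmetic_intensity(batch, 4096, 4096)
        attainable = roofline_tflops(ai, PEAK_TFLOPS, MEM_BW_TB_S)
        print(f"batch={batch:4d}  intensity={ai:7.1f} FLOP/B  attainable={attainable:7.1f} TFLOP/s")
```

The loop shows why batching matters during decode: at batch size 1 the weight matrix dominates memory traffic, intensity is about 1 FLOP/byte, and the step is bandwidth-bound, while larger batches amortize the same weight reads over more tokens and move the kernel toward the compute roof.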

Nice to Have

  • Experience with CUDA, Triton, or custom kernel development

  • Background in training infrastructure and RL workloads

  • Experience with Kubernetes and container orchestration at scale

  • Published work or contributions to inference optimization research
