Fundamental

Model Serving Engineer

Remote
Python, Triton Inference Server, Machine Learning, TensorFlow, TorchServe, ONNX Runtime, vLLM, XGBoost, CatBoost, Prometheus, Grafana, Datadog, Kubernetes, CUDA
Description

Department: Engineering

Location: Europe

Employment Type: Full-time

About Fundamental

Fundamental is an AI company pioneering the future of enterprise decision-making. Founded by DeepMind alumni, Fundamental has developed NEXUS, the world's most powerful Large Tabular Model (LTM), purpose-built for the structured records that actually drive enterprise decisions. Backed by world-class investors and trusted by Fortune 100 companies, Fundamental unlocks trillions of dollars of value by giving businesses the Power to Predict.

At Fundamental, you'll work on unprecedented technical challenges in foundation model development and build technology that transforms how the world's largest companies make decisions. This is your opportunity to be part of a category-defining company from the ground up. Join the team defining the future of enterprise AI.

About the role

We are looking for a Model Serving Engineer to own the production inference layer for NEXUS, our Large Tabular Model. You will be responsible for serving models reliably and efficiently at scale, working primarily with Triton Inference Server and building the infrastructure that brings our research directly to customers. This is a deeply technical, Python-heavy role that sits at the intersection of systems engineering and applied ML.

You will work closely with our research and engineering teams to translate model outputs into production-grade inference pipelines that meet strict latency and throughput requirements.


Key responsibilities

  • Design, build, and maintain production model serving infrastructure using Triton Inference Server as the primary framework

  • Implement and optimize inference pipelines including custom backends, dynamic batching strategies, and model ensemble configurations in Triton

  • Optimize Python inference code for performance, with a strong focus on GIL contention, multi-threading, and concurrency patterns

  • Tune throughput and latency across the full serving stack: batching policies, thread pool sizing, model instance groups, and memory layout

  • Work closely with the research team to understand new model architectures at a computational level: batching behavior, dynamic shapes, memory access patterns, etc.

  • Own the full resource observability and control loop for production inference: instrument GPU memory, CPU, batch queue depth, and latency metrics, and actively tune model instance groups, concurrency limits, memory budgets, and batching configuration in response to observed behavior

  • Evaluate and integrate alternative inference frameworks and runtimes as the model ecosystem evolves

  • Contribute to GPU utilization improvements and resource efficiency across the serving fleet

Must have

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent practical experience)

  • 5+ years of experience in model serving, ML infrastructure, or a closely related backend engineering role

  • Deep, production-level experience with Triton Inference Server, including custom Python backends, batching configuration, and model repository management

  • Expert-level Python skills with a thorough understanding of the GIL, multi-threading, multiprocessing, and async concurrency patterns

  • Strong understanding of neural network inference mechanics: forward passes, batching strategies, memory management, and numerical precision tradeoffs

  • Hands-on experience with other inference frameworks (TorchServe, TensorFlow Serving, ONNX Runtime, vLLM, etc.) and the ability to evaluate tradeoffs between them

  • Experience profiling and optimizing inference code for latency and throughput at production scale

Nice to have

  • Experience with GPU kernel-level optimizations or CUDA profiling tools

  • Familiarity with model quantization, pruning, or compilation toolchains (TensorRT, torch.compile, ONNX)

  • Experience with KServe or other Kubernetes-native serving platforms

  • Experience serving tabular or structured data models, including classical ML models such as XGBoost and CatBoost

  • Experience with observability tooling such as Prometheus, Grafana, or Datadog in the context of inference monitoring

Benefits

  • Competitive compensation with salary and equity

  • Comprehensive health coverage, including medical, dental, and vision, plus a 401(k)

  • Paid parental leave for all new parents, inclusive of adoptive and surrogate journeys

  • Relocation support for employees moving to join the team in one of our office locations

  • A mission-driven, low-ego culture that values diversity of thought, ownership, and bias toward action
