Lead Software Engineer, Model Serving Platform
Department: Engineering
Location: San Francisco
Compensation: $230K – $300K • Offers Equity
Employment Type: Full-time
Sciforium is an AI infrastructure company developing next-generation multimodal AI models and a proprietary, high-efficiency serving platform. Backed by multi-million-dollar funding and direct sponsorship from AMD, with hands-on support from AMD engineers, the team is scaling rapidly to build the full stack powering frontier AI models and real-time applications.
About the role
This is a rare chance to help architect and lead the development of Sciforium’s next-generation model serving platform, the high-performance engine that will bring a multimodal, highly efficient foundation model to market. As a senior technical leader, you’ll not only build core components yourself but also guide and mentor other engineers, influencing engineering direction, standards, and execution quality.
You will learn and shape the full AI stack: from GPU kernels and quantized execution paths to distributed serving, scheduling, and the APIs that power real-time AI applications. If you enjoy deep systems work, thrive on ownership, and want to lead engineers in building foundational AI infrastructure, this role puts you at the center of Sciforium’s mission and growth.
What you'll do
Lead the technical direction of the model serving platform, owning architecture decisions and guiding engineering execution.
Build core serving components including execution runtimes, batching, scheduling, and distributed inference systems.
Develop high-performance C++ and CUDA/HIP modules, including custom GPU kernels and memory-optimized runtimes.
Collaborate with ML researchers to productionize new multimodal models and ensure low-latency, scalable inference.
Build Python APIs and services that expose model capabilities to downstream applications.
Mentor and support other engineers through code reviews, design discussions, and hands-on technical guidance.
Drive performance profiling, benchmarking, and observability across the inference stack.
Ensure high reliability and maintainability through testing, monitoring, and engineering best practices.
Troubleshoot and resolve complex issues across GPU, runtime, and service layers.
Ideal candidate profile
Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience.
5+ years of experience designing and building scalable, reliable backend systems or distributed infrastructure.
Strong understanding of LLM inference mechanics (prefill vs. decode, batching, KV cache).
Experience with Kubernetes or Ray, and with containerization.
Strong proficiency in C++ and Python.
Strong debugging, profiling, and performance optimization skills at the system level.
Ability to collaborate closely with ML researchers and translate model or runtime requirements into production-grade systems.
Effective communication skills and the ability to lead technical discussions, mentor engineers, and drive engineering quality.
Comfortable working from the office and contributing to a fast-moving, high-ownership team culture.
Nice-to-have
Experience with ML systems engineering, distributed GPU scheduling, or open-source inference engines such as vLLM, SGLang, or TRT-LLM.
Experience building large-scale ML/MLOps infrastructure.
Proficiency in CUDA or ROCm and experience with GPU profiling tools.
Experience at an AI/ML startup, research lab, or Big Tech infrastructure/ML team.
Familiarity with multimodal model architectures, raw-byte models, or efficient inference techniques.
Contributions to open-source ML or HPC infrastructure.
Benefits include
Medical, dental, and vision insurance
401k plan
Daily lunch, snacks, and beverages
Flexible time off
Competitive salary and equity
Equal opportunity
Sciforium is an equal opportunity employer. All applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran status, or disability status.