Software Engineer, Inference
Department: Engineering
Location: San Francisco
Compensation: $150K – $230K • Offers Equity
Employment Type: Full-time
Overview
Pulse is tackling one of the most persistent challenges in data infrastructure: extracting accurate, structured information from complex documents at scale. Our approach to document understanding combines intelligent schema mapping with fine-tuned extraction models, and it works where legacy OCR and other parsing tools consistently fail.
We are a small, fast-growing team of engineers in San Francisco powering Fortune 100 enterprises, YC startups, public investment firms, and growth-stage companies. We are backed by tier-1 investors.
What makes our tech special is our multi-stage architecture:
Layout understanding with specialized component detection models
Low-latency OCR models for targeted extraction
Advanced reading-order algorithms for complex structures
Proprietary table structure recognition and parsing
Fine-tuned vision-language models for charts, tables, and figures
If you are passionate about the intersection of computer vision, NLP, and data infrastructure, your work at Pulse will directly impact customers and shape the future of document intelligence.
What we are looking for
Able to work 5 days a week from our San Francisco office
Eager to learn and adapt quickly
Prior startup or founding experience is a plus
About the Role
Specialize in low-latency, high-throughput inference for OCR and multimodal models. Own profiling, batching, and autoscaling across single-tenant and multi-tenant environments.
Responsibilities
Build inference services with smart batching and caching
Optimize kernels, tokenization, and model graphs
Evaluate vLLM, TensorRT-LLM, and Triton tradeoffs
Implement autoscaling and admission control with clear SLOs
Own performance dashboards and capacity planning
Requirements
3+ years in performance engineering or ML systems
Strong Python, plus C++ or CUDA exposure
Experience with GPU profiling and model serving
Nice to have
Experience reducing p95 latency and cost in production ML systems
Sponsorship
Sponsorship available.
Compensation and benefits
Competitive base salary plus equity, performance-based bonus, relocation assistance for Bay Area moves, daily meal stipend, medical, vision, and dental coverage.
