Tech Lead — ASR / TTS / Speech LLM (IC + Mentor)
Team: Gen AI
Location: Boston
Commitment: Full-time
Workplace Type: Hybrid
What You’ll Do
- Own the technical roadmap for STT/TTS/Speech LLM model training: from model selection → fine-tuning → deployment.
- Evaluate and benchmark open-source models (Parakeet, Whisper, etc.) using internal test sets for WER, latency, and entity accuracy.
- Design and review data pipelines for synthetic and real data generation (text selection, speaker selection, voice synthesis, noise/distortion augmentation).
- Architect and optimize training recipes (LoRA/adapters, RNN-T, multi-objective CTC + MWER).
- Lead integration with Triton Inference Server (TensorRT/FP16) and ensure K8s autoscaling for 1000+ concurrent streams.
- Implement language model biasing APIs, WFST grammars, and context biasing for domain accuracy.
- Guide evaluation cycles, drift monitoring, and model switcher/failover strategies.
- Mentor engineers on data curation, fine-tuning, and model serving best practices.
- Collaborate with backend/ML-ops for production readiness, observability, and health metrics.
Desired Skills
- Deep expertise in speech models (ASR, TTS, Speech LLM) and training frameworks (PyTorch, NeMo, ESPnet, Fairseq).
- Proven experience with streaming RNN-T / CTC architectures, LoRA/adapters, and TensorRT optimization.
- Telephony robustness: codec augmentation (G.711 μ-law, Opus, packet loss/jitter), AGC/loudness normalization, band-limiting (300–3400 Hz), far-field/noise simulation.
- Strong understanding of telephony noise, codecs, and real-world audio variability.
- Experience in speaker diarization, turn-detection models, and smart voice activity detection.
- Evaluation: WER/latency curves, Entity-F1 (names/DOB/meds), confidence metrics.
- TTS: VITS/FastPitch/Glow-TTS/Grad-TTS/StyleTTS2, CosyVoice/NaturalSpeech-3 style transfer, BigVGAN/UnivNet vocoders, zero-shot cloning.
- Speech LLM: model development and integration with the voice agent pipeline.
- Experience deploying models with Triton Inference Server, Kubernetes, and GPU scaling.
- Hands-on with evaluation metrics (WER, F1 on entities, latency p50/p95).
- Familiarity with LM biasing, WFST grammars, and context injection.
- Strong mentorship and code-review discipline.
Qualifications
- M.S. / Ph.D. in Computer Science, Speech Processing, or related field.
- 7–10 years of experience in applied ML, at least 3 in speech or multimodal AI.
- Track record of shipping production ASR/TTS models or inference systems at scale.