Senior AI Platform Engineer
Location: Remote
Department: Engineering
Location Type: REMOTE
Employment Type: FULL_TIME
- Build and operate core backend services that power AI-enabled workflows (APIs, orchestration, storage, and internal integrations)
- Design scalable data models and registries for versioned artifacts and metadata, with strong traceability and auditability
- Implement secure-by-default service patterns: authN/authZ, audit logs, secrets handling, and least-privilege access
- Build reliability foundations: observability, metrics, tracing, alerting, SLOs, and incident response playbooks
- Implement idempotent APIs and state-handling patterns for resilient workflows (retries, partial failure, reconciliation)
- Create integration adapters and event-driven plumbing to safely connect workflows to internal systems
- Establish release and deployment practices: CI/CD pipelines, environment promotion, rollback strategies, and safe migrations
- Partner closely with AI and application engineers to define interfaces, validation layers, and operational constraints
- Identify performance, scalability, and security risks early and ship pragmatic solutions quickly
- Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent practical experience)
- U.S. Citizenship and the ability to obtain and maintain a U.S. Government security clearance
- 6+ years of experience building and operating backend/platform systems in production
- 3+ years building platforms that support AI/ML systems in production (e.g., evaluation pipelines, model/app runtime infrastructure, artifact/metadata registries, AI workflow orchestration, MLOps)
- Experience operating LLM-enabled systems with production constraints (latency, cost, reliability), including monitoring quality/regressions and enforcing safe tool/data access
- Strong proficiency in one or more backend languages (e.g., Go, Java, Python, C++) and modern infrastructure practices
- Experience designing APIs, data models, and distributed systems with reliability and security best practices
- Experience with workflow/event systems (queues, pub/sub, orchestration, idempotency, state machines) used to run multi-step AI-driven pipelines
- Experience implementing observability for AI systems (metrics, tracing, logs) including quality/reliability signals beyond uptime (e.g., eval scores, rejection rates, cost/latency budgets)
- Experience with production security controls: RBAC/ABAC, audit logs, secrets management, data access boundaries
- Strong communication and documentation skills
- Experience with AI/ML infrastructure (model serving, inference gateways, feature/data pipelines, experiment tracking, artifact registries)
- Experience with GPU-aware infrastructure and/or high-throughput inference (capacity planning, batching, caching, rate limiting)
- Experience building evaluation platforms (offline/online evals, canaries, A/B testing, regression automation, dataset/version management)
- Familiarity with AI safety/security patterns (prompt injection mitigation, tool sandboxing, policy enforcement, data-loss prevention)
- Experience building internal platforms used by multiple teams with clear contracts, SLAs/SLOs, and well-managed migrations
- Experience with multi-tenant or role-based access control systems
- Remote (U.S.) with the option to be based at our Headquarters in San Diego, CA. We welcome candidates who are local or open to relocating; relocation assistance is available and may be included in the offer package where appropriate
- Ability to travel up to 10%; may be required for team collaboration, field testing, or customer support
- We offer comprehensive medical, dental, and vision plans
- 401(k) Retirement Savings Plan to invest in your long-term retirement goals
- Equity grants for new hires
- Unlimited PTO
- Extremely generous company holiday calendar, including a holiday hiatus in November and December
- Generous Parental Leave
- Lifestyle Spending Account
- FSA
- DCFSA
- HSA
- Hospital Indemnity insurance
- Critical Illness insurance
- Accident insurance
- Basic Life/AD&D, short-term disability, and long-term disability insurance, 100% covered by Firestorm, plus the option to purchase additional life insurance for you and your family
- Mental Health Resources: We provide free mental health resources 24/7 including therapy and more. Additional work-life services, such as free legal and financial support, are available to you as well
Export Control Compliance
Equal Opportunity Statement
