Senior AI Engineer, Agentic Systems
Location: Americas - Remote, US - Remote, Canada - Remote
Department: Engineering
Location Type: Remote
Employment Type: Full-time
Company Overview
Role Overview
Impact You'll Drive
- Ship agentic features that move core product KPIs while meeting measurable quality and latency targets.
- Establish evaluation gates and on-call reliability for AI systems that handle real users and revenue.
- Reduce cost-to-serve via model routing, KV cache reuse, and retrieval quality improvements.
Key Responsibilities
- Architect and build stateful, graph-based agent workflows with tool use, planning, and memory.
- Integrate LLMs and multimodal models via structured I/O (JSON Schema, Pydantic validators) and function/tool calling.
- Build high-reliability APIs and streaming services for real-time inference, speech, and vision.
- Own production readiness: tracing, logging, metrics, rate limiting, circuit breakers, and SLOs.
- Stand up eval pipelines: offline golden sets, LLM-as-judge with human rubrics, online A/B, and regression tests in CI.
- Implement retrieval and memory: hybrid search, vector and graph retrieval, semantic caches, and long-horizon context.
- Optimize cost/latency: model routing, prompt and tool selection, quantization, and KV cache/prefill strategies.
- Lead cloud-native deployments on Kubernetes with GPU autoscaling, canary/shadow releases, and feature flags.
- Partner cross-functionally to translate research into robust production systems and iterate quickly behind evaluation gates.
- Mentor engineers through code reviews, design docs, and architecture decisions.
Must-Have Qualifications
- 2+ years building agentic AI systems; 4+ years building production backends or ML systems in Python, Go, or similar.
- Fluency with agentic orchestration (e.g., LangGraph, PydanticAI, DSPy, LlamaIndex) and tool/function calling.
- Experience integrating frontier LLMs and multimodal models via managed APIs or self-hosted serving.
- Deep understanding of model serving and inference optimization (vLLM/Triton/TGI/SGLang, batching, KV cache reuse).
- Strong API design skills with backend frameworks (FastAPI, Flask) and event-driven architectures.
- Data systems expertise with PostgreSQL and Redis, including caching, token streaming, and throughput tuning.
- Retrieval and memory: vector databases (pgvector, Pinecone, Weaviate, Milvus), hybrid search, and graph/knowledge storage.
- Production evals: LLM-as-judge, human-in-the-loop, rubric design, and CI-integrated regression tests.
- Observability and SRE: OpenTelemetry traces, metrics, structured logs, SLOs, dashboards, and on-call triage.
- Cloud-native delivery: Kubernetes, Terraform, Docker, GPU scheduling/autoscaling on AWS or GCP.
- CI/CD proficiency with GitHub Actions and test automation for prompts, tools, and agents.
- Clear, concise communication and high ownership in fast-paced environments.
Nice-to-Have Qualifications
- Real-time multimodal systems: streaming ASR, low-latency TTS, WebRTC, and vision pipelines.
- Post-training/fine-tuning: DPO/ORPO, RLHF, preference data generation, and safety alignment.
- RAG expertise beyond basics: Graph RAG, multi-hop retrieval, rerankers, query planning, and freshness policies.
- Safety and governance: policy-as-code, red-teaming, PII handling, audit logs, and role-based tool authorization.
- Regulated data experience (HIPAA, SOC 2, GDPR) and data residency controls.
- Personalization at inference time, long-term memory agents, session state, and episodic memory stores.
- Experience with consumer-scale AI apps, high-traffic systems, or on-device/edge acceleration (WebGPU).
Example Tech You'll Touch
- Orchestration: LangGraph, PydanticAI, DSPy, LlamaIndex
- Serving: vLLM, Triton, TGI, SGLang; OpenAI/Anthropic-compatible APIs
- Backend: Python, Go, FastAPI, gRPC, Kafka/PubSub
- Data: PostgreSQL, Redis, pgvector, Pinecone/Milvus/Weaviate
- Observability: OpenTelemetry, Prometheus, Grafana, Sentry
- Infra: Kubernetes, Terraform, Docker, GPU operators, Karpenter/Cluster Autoscaler
- Evals & QA: RAGAS/DeepEval-style frameworks, golden sets, canary/shadow testing
How We Build
- Evaluation-driven development: every change to prompts, tools, routing, or retrieval passes automated eval gates.
- Structured outputs by default: JSON Schema/Pydantic validation, strict tool contracts, and idempotent handlers.
- Safety-first tooling: guardrails, content and data policies, tool sandboxing with timeouts and scopes.
- Pragmatic iteration: short cycles, feature flags, shadow traffic, and fast rollback.
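
For candidates who want a concrete picture of the evaluation-gate practice listed above, here is a minimal sketch of how such a gate might look in CI. It is illustrative only and not the team's actual harness: `GoldenCase`, `pass_rate`, `eval_gate`, `toy_system`, and the `max_regression` budget are hypothetical names and values, and exact-match scoring stands in for whatever rubric or LLM-as-judge scoring a real pipeline would use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    """One golden-set example: an input and the output we expect."""
    prompt: str
    expected: str


def pass_rate(system: Callable[[str], str], cases: list[GoldenCase]) -> float:
    """Fraction of golden cases where the system output matches exactly."""
    if not cases:
        raise ValueError("golden set must not be empty")
    passed = sum(1 for c in cases if system(c.prompt).strip() == c.expected)
    return passed / len(cases)


def eval_gate(candidate_rate: float, baseline_rate: float,
              max_regression: float = 0.02) -> bool:
    """Allow a change only if quality dropped by no more than the budget."""
    return candidate_rate >= baseline_rate - max_regression


def toy_system(prompt: str) -> str:
    """Hypothetical stand-in for the agent, prompt, or router under test."""
    return prompt.upper()


# Tiny hypothetical golden set; the third case is deliberately failing.
GOLDEN = [
    GoldenCase("ok", "OK"),
    GoldenCase("yes", "YES"),
    GoldenCase("no", "nope"),
    GoldenCase("go", "GO"),
]

rate = pass_rate(toy_system, GOLDEN)
# A CI job could exit non-zero when the gate returns False.
gate_passed = eval_gate(rate, baseline_rate=0.75)
```

In this sketch the gate compares the candidate against a stored baseline rather than a fixed threshold, so a change is blocked only when it regresses quality beyond the allowed budget.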
Success in 90 Days
- Launch a production agentic workflow with clear SLOs, tracing, and dashboards.
- Stand up an eval harness with golden sets and CI gates for the top use case.
- Improve latency and cost with routing and KV cache strategies while maintaining quality.
