Technical Lead (AI)
Location: Bengaluru, India
Department: Product Management
Experience: 8-12 years
- Implement and evolve features across the GenAI reasoning and agentic AI platform in alignment with the reference architecture defined by the AI Architect.
- Own the technical design and delivery of your squad’s workstreams — including data flow, API contracts, component boundaries, and integration points with upstream/downstream systems.
- Write and review production-quality Python code; enforce code standards, conduct PR reviews, and maintain a clean, well-tested codebase.
- Participate in build-vs-buy-vs-open-source evaluations for your domain; provide structured trade-off analysis to the AI Architect for final decision.
- Contribute to Architecture Decision Records (ADRs) for decisions within your scope; review ADRs from peer squads.
- Build and ship agent workflows (ReAct, planner–executor, reflection, human-in-the-loop checkpoints) using the chosen orchestration framework (LangGraph / LlamaIndex Workflows / Semantic Kernel or equivalent).
- Implement tool and MCP (Model Context Protocol) integrations for connecting agents to enterprise systems (CRM, ITSM, REST/GraphQL APIs) with scoped auth (OAuth2/OBO), sandboxing, and rate limiting.
- Own agent evaluation coverage for your squad: task-completion benchmarks, tool-call success rates, trajectory evals, and cost-per-task regression tests.
- Implement agent memory patterns (short-term conversation context, long-term episodic/semantic memory) with TTL and retrieval policies defined in collaboration with the AI Architect.
- Implement and iterate on RAG pipelines: hybrid retrieval (BM25 + dense + rerankers), query decomposition, GraphRAG for knowledge-graph workloads, and citation grounding.
- Execute on the fine-tune vs. prompt vs. in-context-learning decision set by the AI Architect; own fine-tuning runs (LoRA/QLoRA/PEFT) end-to-end when required.
- Apply context engineering patterns: prompt contracts, structured outputs (JSON Schema, Pydantic, function-calling), constrained decoding, output validators, and fallback chains.
- Implement grounding and hallucination-mitigation mechanisms: retrieval confidence scoring, “I don’t know” paths, claim-level validators.
- Where the product roadmap requires it, build VLM-first pipelines (GPT-4V class, Gemini, Qwen-VL) and integrate CV outputs (OCR, document understanding, layout parsing) as tools inside the agentic platform.
- Run benchmarks on open-weight VLMs (Qwen-VL, InternVL) vs. closed VLM APIs for your squad’s use cases; report findings with cost-latency-accuracy data.
- Instrument telemetry for your squad’s features: token cost, latency p50/p95, tool-call success rates, retrieval hit rates, cache hit rates — per-tenant where applicable.
- Maintain continuous evaluation coverage: hallucination, faithfulness, jailbreak resistance, and drift detection for all features in your scope; treat prompts as code (versioned, reviewed, rolled back).
- Operate and extend the LLMOps toolchain (LangSmith / Langfuse / Arize / Ragas / DeepEval / Promptfoo) as established by the platform team.
- Implement AI security controls for your squad’s features: prompt-injection defense, PII detection and redaction, RBAC on agent tools, and secrets handling for tool auth.
- Ensure features in your scope meet compliance requirements (SOC 2 Type II, GDPR, DPDP) as defined by the platform governance team.
- Apply responsible AI guardrails: topic filters, PII egress controls, usage-policy enforcement at gateway level.
- Profile and optimise inference for your squad’s workloads: batching, KV-cache utilisation, speculative decoding, and quantisation (AWQ, GPTQ, INT4) under guidance from the AI Architect.
- Make SLM-vs-LLM routing recommendations for individual capabilities (e.g., Phi-3.5 / Qwen-2.5-7B vs. frontier API) with supporting cost-latency-quality data.
- Implement and instrument caching layers (prompt cache, semantic cache, retrieval cache) with invalidation policies.
- Mentor 3–5 engineers (senior and mid-level) through design reviews, PR feedback, pair programming, and career conversations.
- Run squad-level design sessions; surface cross-cutting concerns to the AI Architect and Engineering Manager proactively.
- Partner with Product, SRE, and Security on sprint-level execution, incident triage, and feature flag / release decisions.
- Participate as a technical interviewer for senior-engineer and specialist roles on the AI team.
- 8+ years of software engineering experience, with 3+ years shipping production AI/ML systems and 1+ year delivering LLM-powered applications (RAG, agents, fine-tuning) at meaningful scale.
- Hands-on proficiency in at least one agent orchestration framework (LangGraph, Semantic Kernel, CrewAI, AutoGen, DSPy) — with clear opinions on their limitations.
- Production experience with vector stores (pgvector, Pinecone, Weaviate, Milvus, Qdrant), embedding models (open + proprietary), and hybrid retrieval with rerankers.
- Working knowledge of fine-tuning (LoRA/QLoRA/PEFT) and inference optimisation (vLLM, quantisation, batching) — implementation-level, not just conceptual.
- Solid MLOps / LLMOps foundations: experiment tracking, CI/CD for ML, prompt versioning, and evaluation pipelines (LLM-as-judge, RAG triad metrics, adversarial suites).
- Cloud experience on at least one of AWS (Bedrock, SageMaker), Azure (AI Foundry, Azure OpenAI), or GCP (Vertex AI) — including cost management, not just API usage.
- Python proficiency at senior-engineer level; working knowledge of PyTorch and the HuggingFace stack.
- Comfortable with LLM evaluation methodologies: faithfulness / answer-relevance / context-relevance, agent trajectory evals, and human-in-the-loop eval workflows.
- Strong communication — can write a clear design doc that both a product manager and a senior engineer find useful, and can defend technical trade-offs in sprint planning.
- Experience building features for a multi-tenant GenAI SaaS product (not a pilot or internal tool).
- Hands-on with MCP (Model Context Protocol), A2A, or equivalent agent-tool interoperability standards.
- Exposure to VLMs / multimodal systems (GPT-4V, Gemini, Qwen-VL, InternVL, LLaVA) and VLM fine-tuning.
- Experience with small language models (SLMs) and on-prem / edge deployment (Phi, Gemma, Qwen, Llama 3.1-8B class).
- Familiarity with graph databases (Neo4j, Neptune) for GraphRAG and knowledge-graph workloads.
- Public contributions to AI open-source, technical blog posts, or conference talks are a plus.
- Graduate degree (MS) in CS, ML, or a related field; shipped systems outweigh credentials.
- Not a people-manager role: you lead through technical mentorship and delivery, not through headcount. An Engineering Manager owns squad delivery and people processes.
- Not a research scientist role: research-to-production translation is the job; pure research is not.
- Not a prompt engineer role: prompting is a tool you use daily, not your primary deliverable.
- Not a data science role: statistical modelling matters, but systems engineering is the centre of gravity.
- An opportunity to help build some of the best enterprise SaaS products to come out of India.
- Opportunities to quench your thirst for problem-solving, experimenting, learning, and implementing innovative solutions
- A flat, collegial work environment with a work-hard, play-hard attitude.
- A platform for rapid growth if you are willing to try new things without fear of failure.
- Remuneration in line with best-in-class industry standards, with generous health insurance cover.
- Ranked #72 in America’s Most Innovative Companies list in 2023, alongside companies like Microsoft, Tesla, Apple, and IBM.
- Ranked as one of America's fastest-growing companies by Financial Times for four consecutive years: 2020-2023.
- Ranked as one of America's fastest-growing private companies by Inc. 5000 for six consecutive years: 2018-2023.
- Recognized in multiple Gartner reports, including Market Guides and Hype Cycle, spanning assortment, merchandising, forecasting, algorithmic retailing, and Unified Price, Promotion, and Markdown Optimization Applications.
- Featured as one of the top 25 ML startups to watch by Forbes in 2019.
- Ranked as one of North America's fastest-growing technology companies by Deloitte for two consecutive years: 2019 and 2020.