We are looking for a DevOps Tech Lead to take ownership of our Cloud Infrastructure and Platform Engineering strategy, enabling high-scale, cutting-edge GenAI products running across 40+ Kubernetes clusters on GCP and AWS.
This role is a blend of hands-on engineering and technical leadership, requiring deep expertise in cloud-native technologies, Kubernetes at scale, and modern DevOps principles. You will work closely with engineering teams to design and implement scalable infrastructure solutions, optimize developer workflows, and ensure reliability and efficiency across our platform.
- Cloud & Kubernetes Expertise: Design and implement highly scalable multi-cluster Kubernetes environments across GCP & AWS.
- Developer Experience & Enablement: Lead the development of self-service tools and automation that improve efficiency for R&D teams.
- Incident & Reliability Engineering: Work with engineering teams to optimize cost, performance, and reliability of production infrastructure through monitoring, capacity planning, and scaling strategies.
- Security & Governance: Contribute to best practices for RBAC, IAM, cloud security, and compliance while ensuring infrastructure reliability.
- Automation & Infrastructure as Code: Drive adoption of GitOps workflows and Infrastructure as Code (Terraform, Helm, Crossplane) to enhance automation and consistency.
- Mentorship & Team Growth: Provide technical mentorship within the platform engineering team and contribute to knowledge-sharing across R&D.
- Cross-Team Collaboration: Work closely with engineering teams to align cloud infrastructure goals with business needs and reliability requirements.
- Technology Assessment: Assess and advocate for new technologies that improve reliability, efficiency, and scalability within the platform.
Technical Expertise:
- 8+ years of DevOps, SRE, or Platform Engineering experience.
- 6+ years working with public cloud platforms (AWS/GCP) at scale.
- Deep Kubernetes expertise, including managing large-scale, multi-cluster enterprise-grade Kubernetes environments.
- Experience designing and managing Custom Resource Definitions (CRDs) and custom controllers.
- Strong background in Infrastructure as Code (Terraform, Helm) and GitOps principles (ArgoCD, Crossplane, FluxCD, etc.).
- Hands-on experience in observability & monitoring (Prometheus, Grafana, Datadog, OpenTelemetry, etc.).
- Proficiency in scripting & automation (Python, Go, Bash) for infrastructure automation.
- Expertise in cloud networking (VPC, load balancers, service meshes) and security best practices (RBAC, IAM, security groups, network policies, etc.).
- Experience with CI/CD pipelines, optimizing for performance, security, and developer velocity.
Leadership & Execution:
- Ability to design and implement platform solutions, working closely with engineering teams.
- Experience mentoring engineers through code reviews, technical talks, documentation, and hands-on collaboration, while sharing knowledge across teams.
- Strong incident management skills, including on-call experience, root cause analysis, and postmortems.
- Passion for automation, self-service, and building internal tools to streamline workflows.
- Ability to influence engineering teams by driving adoption of DevOps best practices and fostering a culture of automation, collaboration, and continuous improvement.
Nice-to-Have:
- Experience with self-hosted on-prem deployments and managed private VPC deployments (Bring Your Own Cloud models).
- Advanced expertise in Helm and Crossplane for Kubernetes resource management.
- Experience in GenAI or large-scale SaaS platforms.
- Familiarity with SQL/NoSQL databases and distributed systems.
- DevSecOps experience, with a strong understanding of security automation and compliance frameworks.
AI21 Labs is pioneering the development of Foundation Models and AI Systems for enterprises, accelerating the adoption of Generative AI in production.
Established in 2017 by AI visionaries Prof. Amnon Shashua, Prof. Yoav Shoham, and Ori Goshen, our mission is to equip businesses with cutting-edge LLMs and AI capabilities. We are backed by leading investors including Pitango, Google, Nvidia, Intel Capital, and Comcast Ventures.
Join us on this exciting journey and advance your career with AI21 Labs!