Gruve

AI Infrastructure & Deployment Lead

Pune, Maharashtra
Kubernetes OpenShift Ansible Terraform Python AWS Azure GCP Istio NVIDIA GPU Operator Jenkins GitLab CI GitHub Actions Prometheus Grafana ELK stack Cisco ACI VMware ESXi RHEL Docker Bash SQL
Description

Location: Pune, Maharashtra, India

Department: Professional Services, Cybersecurity

About Gruve

Gruve is an innovative software services startup dedicated to transforming enterprises into AI powerhouses. We specialize in cybersecurity, customer experience, cloud infrastructure, and advanced technologies such as Large Language Models (LLMs). Our mission is to help customers shape their business strategies by using their data to make more intelligent decisions. As a well-funded early-stage startup, Gruve offers a dynamic environment with strong customer and partner networks.

Position Summary:

We are seeking a Solution Architect (AI Infrastructure & Deployment Lead) to lead the strategic design, architecture, and deployment of large-scale, enterprise-grade Red Hat OpenShift and Kubernetes environments. As a technical authority at the L4 level, you will be responsible for defining the blueprint of our cloud-native infrastructure, ensuring it is secure, scalable, and highly automated.
The ideal candidate acts as the bridge between traditional infrastructure and modern DevOps, serving as the lead design authority for global clients. You will collaborate with Network and Firewall Architects to build a unified fabric where containerized workloads, legacy data centers, and hybrid cloud environments coexist seamlessly through advanced automation and Infrastructure-as-Code (IaC).

Key Responsibilities:

  • Architect and Design Enterprise OpenShift Solutions: Lead the high-level design (HLD) and low-level design (LLD) for multi-tenant Red Hat OpenShift and Kubernetes clusters across on-prem and hybrid cloud environments.
  • Define the technology stack, standards, and blueprints for deploying AI solutions across global, multi-region public clouds (AWS/Azure/GCP) and diverse on-premise hardware.
  • Oversee the successful end-to-end rollout of critical services including AI SOC, OpenShift AI, and AI-based Cybersecurity Log Optimization.
  • Drive Network DevOps Strategy: Define and standardize the automation roadmap using Ansible, Terraform, and Python to achieve "Zero-Touch" infrastructure provisioning and configuration.
  • Lead Customer & Stakeholder Engagement: Act as the primary technical consultant for global clients, leading design workshops, architecture validation, and executive-level technical reviews.
  • Integrate Advanced AI Applications, Networking & Security: Collaborate with Pre-sales, AI application developers and engineers, and Firewall Architects to design secure AI agents and use cases, container networking (CNI) models, Zero-Trust security, service mesh (Istio), and micro-segmentation within the OpenShift environment.
  • Optimize Hybrid Infrastructure: Oversee the seamless integration of OpenShift with physical networking (Cisco ACI, VXLAN) and virtualized platforms (RHEL-V, VMware ESXi).
  • GPU & Hardware Orchestration: Design and manage hardware acceleration using the NVIDIA GPU Operator and Node Feature Discovery (NFD). Implement Multi-Instance GPU (MIG) and time-slicing to optimize resource utilization across multi-tenant clusters.
  • Establish CI/CD Governance: Architect robust CI/CD pipelines (Jenkins, GitLab CI, GitHub Actions) for infrastructure and application delivery, ensuring compliance and security are baked into the workflow.
  • Lead Observability & Reliability: Design comprehensive monitoring and logging architectures using Prometheus, Grafana, and ELK stack to ensure 99.99% availability of cluster services.
  • Mentorship & Technical Leadership: Guide and mentor L2/L3 engineers, providing expert-level escalation support and establishing best practices for the DevOps and Network teams.
  • Innovation & R&D: Evaluate and introduce emerging technologies such as Advanced Cluster Management (ACM), Advanced Cluster Security (ACS), and Cloud-Native Networking (OVN-Kubernetes).

Preferred Qualifications:

  • Red Hat Certified OpenShift Administrator, AWS Certified AI Practitioner, Certified Information Systems Security Professional (CISSP), Certified Cloud Security Professional (CCSP).
  • Security Focus: Exposure to DevSecOps tools (e.g., Quay, StackRox) and zero-trust framework implementation.
  • Legacy Integration: Familiarity with Cisco ACI, Arista CloudVision, or Juniper Apstra for end-to-end automation integration.
  • Expertise in Red Hat OpenShift AI (RHOAI).
  • Familiarity with LLM deployment requirements and vector database infrastructure.
  • Background in cybersecurity infrastructure (SIEM, SOAR, SOC, and VAPT platforms).
  • Experience with MLOps infrastructure (Kubeflow, MLflow) and high-speed telemetry pipelines.

Why Gruve

At Gruve, we foster a culture of innovation, collaboration, and continuous learning. We are committed to building a diverse and inclusive workplace where everyone can thrive and contribute their best work. If you’re passionate about technology and eager to make an impact, we’d love to hear from you.

Gruve is an equal opportunity employer. We welcome applicants from all backgrounds and thank all who apply; however, only those selected for an interview will be contacted.