Description

Overview

As the Associate Director, Azure SRE, you will lead and mentor a team of Site Reliability Engineers, drive operational excellence, and ensure the reliability and scalability of our Azure-based infrastructure. You will collaborate with cross-functional teams to design, implement, and maintain highly available systems while advocating for best practices in automation, monitoring, and incident management. The ideal candidate has strong leadership skills, deep technical expertise in Azure cloud services, and a passion for improving system reliability and performance.

Responsibilities

Leadership & Mentorship: Lead, mentor, and grow a team of Site Reliability Engineers, fostering a culture of collaboration, innovation, and continuous improvement. Drive operational excellence by setting clear goals, priorities, and performance metrics for the team. Encourage professional development and knowledge sharing within the team.

Reliability & Scalability: Own the availability, performance, and scalability of critical services in Azure. Define and implement Site Reliability Engineering (SRE) best practices, including SLAs, SLOs, and error budgets. Proactively identify potential risks and performance bottlenecks, and implement strategies to mitigate them.

Automation & Infrastructure Management: Oversee the automation of operational tasks, including provisioning, deployment, monitoring, and incident response. Lead efforts to implement Infrastructure as Code (IaC) using tools such as Ansible, Terraform, or Azure DevOps Pipelines.

Incident Management & Resolution: Manage and lead incident response for Azure-based infrastructure, ensuring quick resolution and root cause analysis. Define and continuously improve incident management processes to minimize downtime and impact on users.

Collaboration & Stakeholder Communication: Work closely with engineering, DevOps, and security teams to design and deploy solutions that meet both reliability and security requirements. Communicate effectively with stakeholders across the organization, providing visibility into SRE efforts and service health metrics.

Monitoring & Observability: Ensure robust monitoring, logging, and alerting are in place to identify issues before they impact customers. Lead the adoption and continuous improvement of observability tooling (e.g., Prometheus, Grafana, Azure Monitor).

Continuous Improvement: Drive continuous improvement initiatives, including post-incident reviews, blameless retrospectives, and process optimizations. Stay up to date with the latest Azure technologies and industry best practices, integrating new solutions to improve reliability.

Qualifications

Education: Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).

Experience: 15+ years of overall experience, including 10+ years in IT/infrastructure operations and 5+ years in a Site Reliability Engineering (SRE) or DevOps role with significant exposure to Azure cloud environments. 5+ years of experience in a leadership or management role, ideally leading an SRE or infrastructure team. Proven experience in building and maintaining high-availability, distributed systems on Azure. Hands-on experience with Azure services such as Application Gateway, Azure networking, Network Security Groups (NSGs), Azure Kubernetes Service (AKS), App Service, and Azure Functions.

Technical Skills: Deep knowledge of Azure architecture, services, and infrastructure. Expertise in automation tools such as Terraform, Ansible, Azure DevOps, ARM templates, or similar. Proficiency in scripting languages (e.g., Python, Bash, PowerShell) for automation and orchestration. Strong experience with containerization and orchestration tools, particularly Azure Kubernetes Service (AKS). Familiarity with monitoring tools such as Azure Monitor, Prometheus, Grafana, or the ELK stack. In-depth knowledge of CI/CD pipelines and deployment strategies.

Soft Skills: Strong leadership and mentoring skills with the ability to inspire and guide a team. Excellent communication skills, with the ability to collaborate across multiple teams and explain complex technical concepts to both technical and non-technical stakeholders. Problem-solving mindset with a focus on improving operational efficiency, reliability, and scalability.

Certifications (Preferred): Microsoft Certified: Azure Solutions Architect Expert or Azure Administrator Associate.

PepsiCo

Associate Director - Azure SRE
