Description

Overview

We are seeking a highly motivated and experienced Manager of Site Reliability Engineering (SRE) to lead our Azure-focused SRE team. The ideal candidate will combine technical expertise in Azure cloud services with strong leadership skills to ensure the reliability, scalability, and performance of our applications and infrastructure. As a manager, you will oversee a team of SREs, driving automation, incident management, and operational excellence while collaborating with cross-functional teams to achieve business goals.

Responsibilities

Team Leadership and Development: Lead, mentor, and grow a team of SREs, fostering a culture of collaboration, continuous learning, and operational excellence. Define team goals, metrics, and performance objectives aligned with organizational priorities.

Operational Reliability: Ensure the reliability, availability, and performance of Azure-hosted services through proactive monitoring and alerting. Develop and enforce best practices for incident response, root cause analysis, and postmortem reporting. Establish SLAs, SLOs, and error budgets in collaboration with product and engineering teams.

Automation and Tooling: Drive the adoption of automation tools to reduce manual operational tasks and improve system reliability. Implement Infrastructure as Code (IaC) principles for Azure resources using tools such as Terraform, ARM templates, or Bicep.

Performance and Scalability: Optimize system performance and scalability, and drive capacity planning, to support growth and evolving business needs. Use Azure services such as Azure Monitor, Application Insights, and Log Analytics to gain insight into system health.

Collaboration and Stakeholder Management: Partner with development, product, and infrastructure teams to align on technical strategies and priorities. Communicate operational health, risks, and opportunities to executive stakeholders.

Risk and Security Management: Ensure compliance with security best practices, standards, and policies within Azure environments. Identify and mitigate risks related to cloud infrastructure and applications.

Qualifications

Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
8+ years of experience in cloud-based infrastructure and operations, including at least 3 years in a leadership role.
Strong expertise in Microsoft Azure services, including compute, storage, networking, security, and monitoring tools.
Proven experience managing and scaling infrastructure using SRE principles.
Proficiency in automation and scripting (e.g., Python, PowerShell) and Infrastructure as Code (e.g., Terraform, ARM templates).
Hands-on experience with CI/CD pipelines and DevOps practices.
Strong understanding of incident management, change management, and ITIL practices.

Preferred Qualifications

Azure certifications, such as Azure Solutions Architect, Azure DevOps Engineer, or Azure Administrator.
Experience with container orchestration platforms such as Kubernetes (AKS) and containerization tools such as Docker.
Familiarity with FinOps principles for cost optimization in Azure environments.
Excellent communication, decision-making, and problem-solving skills.

PepsiCo

Deputy Director- Azure SRE