Description

Overview

We are looking for a seasoned Senior Manager of Site Reliability Engineering (SRE) to lead our AWS-focused SRE initiatives. In this role, you will be responsible for overseeing the reliability, scalability, and performance of critical applications and infrastructure hosted on AWS. You will lead a team of experienced SREs, drive strategic operational improvements, and ensure the seamless functioning of our cloud ecosystem to meet business and customer needs.

Responsibilities

Leadership and Team Management: Lead and mentor a team of SRE professionals, fostering a culture of innovation, collaboration, and accountability. Develop and implement career development plans, provide coaching, and facilitate knowledge-sharing within the team.

Operational Excellence: Drive the adoption of SRE principles, including SLAs, SLOs, and error budgets, to enhance system reliability and performance. Oversee incident management processes, ensuring timely resolution and comprehensive root cause analysis. Establish and monitor operational KPIs to measure and improve system availability and performance.

Automation and Tooling: Champion the use of automation to reduce manual processes, improve efficiency, and enhance system reliability. Implement and optimize Infrastructure as Code (IaC) using tools like Terraform, CloudFormation, or CDK.

AWS Infrastructure Management: Design, build, and maintain scalable and secure AWS-based infrastructure to support current and future workloads. Leverage AWS services such as EC2, RDS, Lambda, S3, CloudWatch, and others to enhance operational capabilities.

Collaboration and Stakeholder Engagement: Partner with engineering, product, and DevOps teams to align SRE initiatives with business objectives. Act as a key liaison between the SRE team and executive stakeholders, communicating updates on reliability and risks.

Risk and Security Management: Ensure compliance with security standards and best practices within AWS environments. Identify risks related to cloud infrastructure and implement strategies for mitigation.

Qualifications

Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
10+ years of experience in cloud-based infrastructure and operations, with at least 4 years in a leadership role.
Deep expertise in AWS services, architecture, and tools, including hands-on experience with core AWS services (e.g., EC2, ECS, Lambda, S3, VPC, IAM).
Proficiency in automation scripting (e.g., Python, Bash) and Infrastructure as Code (e.g., Terraform, CloudFormation).
Strong knowledge of monitoring and observability tools like CloudWatch, Prometheus, Grafana, or Datadog.
Proven experience managing large-scale production environments, incident response, and operational scaling.
Hands-on experience with CI/CD pipelines and DevOps methodologies.

Preferred Qualifications

AWS certifications, such as AWS Certified Solutions Architect (Professional) or AWS Certified DevOps Engineer.
Experience with Kubernetes (EKS) and containerization technologies like Docker.
Familiarity with FinOps principles for cost optimization in AWS environments.
Strong analytical skills and a data-driven approach to decision-making.
Exceptional communication, leadership, and stakeholder management abilities.


PepsiCo




Associate Director- AWS SRE
