SRE Design & Support Engineer
Location: Hyderabad, Telangana, India
Overview We are looking for a self-driven, software engineering mindset SRE engineer to • Drive new shift left activities critical to apply Site Reliability Engineering (SRE) and quality assurance principles within the application design / Project roadmap that enablees resilient outcomes • Apply pre-emptive approach into production minimizing business impact, via SRE-driven orchestration of connecting all components of the ecosystem diagnosing anomalies prior to user & remediating through automation, The SRE design & support engineer is integral part of the global team with its main purpose to provide a delightful customer experience for the user of the global consumer, commercial, supply chain and enablement functions in the PepsiCo digital products application portfolio of 260+ applications, enabling a full SRE Practice incident prevention / proactive resolution model. The scope of this role is focussed on the cloud architecture application full stack devlopment, B2B pepsiconnect and Direct to Customer and other S&T roadmap applications. Ensures that PepsiCo DPA applications service performance, reliability and availability expected by our customers and internal groups It requires a blend of technical expertise on SRE tools, modern applications cloud architecture i.e. full stack, IT operations experience, and analytics & influence skills. Responsibilities Engage & influence product and engineering teams during the design and development phases to embed reliability and operability into new services defining & enforce events, logging, monitoring, and observability standards across applications. Ensuring non-functional requirements (NFRs) are embedded early including SLA/SLO/SLI and error budgets into the product’s offerings as part of the engineering solution Execute as Pro-active SRE Support engineer, preventing P1, P2, potential P3s, diagnosing any anomalies prior to any user and driving the necessary remediations across the teams involved in end-to-end ecosystem availability, performance and consumption of the cloud architected application ecosystem leveraging SRE Orchestration solutions Collaborates with Engineering & support teams, including participation in escalations, , and blameless postmortems, Work closely with customer-facing support teams to empower them with SRE insights and tooling. Observe, diagnose & improve the end-2-end ecosystem performance of the Modern architected application portfolio i.e. technical “understanding of interactions" of a full stack application alongside with peer SRE team member. Continuously optimize the L2/support operations work via SRE workflow automation Shape the SRE orchestration platform design with inputs from Production Operations, Business usage & Product and engineering teams. Actively engage and drive AI Ops adoption across teams Qualifications 8-11 years of work experience evolving to a SRE engineer with 3-5 years of experience in continuously improving and transforming IT operations ways of working Bachelor’s degree in Computer Science, Information Technology or a related field Proven experience as an SRE in designing the events diagnostics, performance measures and alert solutions to meet the SLA/SLO/SLIs. The ideal Engineer will be highly quantitative, have great judgment, able to connect dots across ecosytems, and efficiently work cross-functionally across teams to ensure SRE orchestrating solutions are meeting customer/end-user expectations The candidate will take a pragmatic approach resolving incidents, including the ability to systemically triangulate root causes and work effectively with external and internal teams to meet objectives. A strong expertise of SRE (Software Reliability Engineering) and IT Service Management (ITSM) processes with a track record for improving service offerings – pro-actively resolving incidents, providing a seamless customer/end-user experience and proactively identifying and mitigating areas of risk. Hands on experience in Python, SQL /No-SQl( MySQL, Mongo DB, Cassandra, Postgress), AppDynamics, ELK Stack Grafana, Splunk, Dynatrace, Kafka and any SRE Ops toolsets. A firm understanding of cloud archticture for distributed environments. Front-end technologies: HTML, CSS, JavaScript, and frameworks like React, Angular, or Vue.js. Back-end technologies: Server-side languages (Java, Spring Boot, and related technologies that build the server-side logic, APIs, and database interaction with MySQL, MongoDB, Cassandra, Couchbase) Infrastructure: Azure/AWS cloud platforms and/or Client / server environments. Prior experience involving in shaping transformation developing SRE solutions would be a plus.Responsibilities
Engage & influence product and engineering teams during the design and development phases to embed reliability and operability into new services defining & enforce events, logging, monitoring, and observability standards across applications. Ensuring non-functional requirements (NFRs) are embedded early including SLA/SLO/SLI and error budgets into the product’s offerings as part of the engineering solution Execute as Pro-active SRE Support engineer, preventing P1, P2, potential P3s, diagnosing any anomalies prior to any user and driving the necessary remediations across the teams involved in end-to-end ecosystem availability, performance and consumption of the cloud architected application ecosystem leveraging SRE Orchestration solutions Collaborates with Engineering & support teams, including participation in escalations, , and blameless postmortems, Work closely with customer-facing support teams to empower them with SRE insights and tooling. Observe, diagnose & improve the end-2-end ecosystem performance of the Modern architected application portfolio i.e. technical “understanding of interactions" of a full stack application alongside with peer SRE team member. Continuously optimize the L2/support operations work via SRE workflow automation Shape the SRE orchestration platform design with inputs from Production Operations, Business usage & Product and engineering teams. Actively engage and drive AI Ops adoption across teamsQualifications
8-11 years of work experience evolving to a SRE engineer with 3-5 years of experience in continuously improving and transforming IT operations ways of working Bachelor’s degree in Computer Science, Information Technology or a related field Proven experience as an SRE in designing the events diagnostics, performance measures and alert solutions to meet the SLA/SLO/SLIs. The ideal Engineer will be highly quantitative, have great judgment, able to connect dots across ecosytems, and efficiently work cross-functionally across teams to ensure SRE orchestrating solutions are meeting customer/end-user expectations The candidate will take a pragmatic approach resolving incidents, including the ability to systemically triangulate root causes and work effectively with external and internal teams to meet objectives. A strong expertise of SRE (Software Reliability Engineering) and IT Service Management (ITSM) processes with a track record for improving service offerings – pro-actively resolving incidents, providing a seamless customer/end-user experience and proactively identifying and mitigating areas of risk. Hands on experience in Python, SQL /No-SQl( MySQL, Mongo DB, Cassandra, Postgress), AppDynamics, ELK Stack Grafana, Splunk, Dynatrace, Kafka and any SRE Ops toolsets. A firm understanding of cloud archticture for distributed environments. Front-end technologies: HTML, CSS, JavaScript, and frameworks like React, Angular, or Vue.js. Back-end technologies: Server-side languages (Java, Spring Boot, and related technologies that build the server-side logic, APIs, and database interaction with MySQL, MongoDB, Cassandra, Couchbase) Infrastructure: Azure/AWS cloud platforms and/or Client / server environments. Prior experience involving in shaping transformation developing SRE solutions would be a plus.There are more than 50,000 engineering jobs:
Subscribe to membership and unlock all jobs
Engineering Jobs
60,000+ jobs from 4,500+ well-funded companies
Updated Daily
New jobs are added every day as companies post them
Refined Search
Use filters like skill, location, etc to narrow results
Become a member
🥳🥳🥳 452 happy customers and counting...
Overall, over 80% of customers chose to renew their subscriptions after the initial sign-up.
To try it out
For active job seekers
For those who are passive looking
Cancel anytime
Frequently Asked Questions
- We prioritize job seekers as our customers, unlike bigger job sites, by charging a small fee to provide them with curated access to the best companies and up-to-date jobs. This focus allows us to deliver a more personalized and effective job search experience.
- We've got over 200,000 jobs from 15,000+ vetted companies. No fake or sleazy jobs here!
- We aggregate jobs from 15,000+ companies' career pages, so you can be sure that you're getting the most up-to-date and relevant jobs.
- We're the only job board *for* software engineers, *by* software engineers… in case you needed a reminder! We add thousands of new jobs daily and offer powerful search filters just for you. 🛠️
- Every single hour! We add 2,000-3,000 new jobs daily, so you'll always have fresh opportunities. 🚀
- Typically, job searches take 3-6 months. EchoJobs helps you spend more time applying and less time hunting. 🎯
- Check daily! We're always updating with new jobs. Set up job alerts for even quicker access. 📅
What Fellow Engineers Say
