Site Reliability Engineer - SaaSOps
Team: Engineering
Location: Hyderabad
Commitment: Full-Time
Workplace Type: Onsite
Responsibilities:
- Define and embed SRE best practices across the SaaS platform, ensuring reliability is built into the system from the ground up.
- Establish and maintain meaningful SLAs, SLIs, SLOs, and error budgets to protect customer experience and guide engineering priorities.
- Design and continuously improve high-availability and disaster recovery strategies.
- Automate manual processes, manage incident response, and optimize performance against defined SLIs and SLOs.
- Bridge the gap between development and IT operations.
- Ensure strong tenant isolation and consistent performance within a DB-per-tenant architecture.
- Strengthen system resiliency across both Azure and on-prem deployments in our hybrid environment.
- Lead incident response efforts with structured troubleshooting and clear communication.
- Drive thorough root cause analysis (RCA) and conduct blameless postmortems focused on long-term improvements.
- Translate incidents into systemic fixes rather than temporary patches.
- Develop and maintain operational runbooks to standardize responses.
- Design and maintain a comprehensive observability framework for both cloud and on-prem environments.
Requirements:
- Must have at least 3 years of hands-on experience in Site Reliability Engineering (SRE), supporting production-grade, cloud-native enterprise software platforms and applications.
- Prior experience as a DevOps engineer, cloud system administrator or software developer.
- Strong proficiency in scripting languages such as Python and PowerShell.
- Deep hands-on experience working with Microsoft Azure in production environments.
- Possess a solid understanding of Terraform and Ansible, as well as Kubernetes internals, including networking, scheduling, scaling, and resource management.
- Have proven experience in PostgreSQL performance tuning and optimization in production systems.
- Demonstrate hands-on experience with Azure Monitor, Application Insights, and Log Analytics for cloud-based observability.
- Implement and manage Prometheus and Grafana for Kubernetes and on-prem monitoring.
- Understand how to turn metrics, logs, and traces into actionable insights that improve reliability and performance.
- Troubleshoot and improve CI/CD pipelines to ensure stable and predictable releases.
- Apply GitOps principles to manage deployments and infrastructure changes in a controlled and auditable manner.