Quantiphi

Senior Site Reliability Engineer

Bengaluru, KA; Mumbai
Go AWS GCP Terraform Ansible Chef Puppet Kubernetes Docker Prometheus Grafana ClickHouse Redis MySQL
Description

Sr SRE

Location: IN KA Bengaluru

Time Type: Full time

Job Description

While technology is the heart of our business, a global and diverse culture is the heart of our success. We love our people and take pride in offering them a culture built on transparency, diversity, integrity, learning, and growth.


If you want to work in an environment that encourages you to innovate and excel, in both your professional and personal life, you will enjoy your career with Quantiphi!

Role: Site Reliability Engineer
Experience Level: 5-8 Years
Work location: Bangalore, Mumbai (Hybrid)

As a Site Reliability Engineer, you'll be responsible for ensuring the reliability, performance, and scalability of a serverless platform. You'll work on improving system observability, automating operational tasks, optimizing resource utilization, and maintaining our stringent SLOs while balancing cost efficiency. This role requires deep technical expertise in distributed systems and cloud infrastructure, and a passion for operational excellence.

What You'll Do:

  • Ensure Platform Reliability: Own the availability, latency, performance, and efficiency of NG-SIEM platform services
  • Build Automation & Tooling: Design and implement automation solutions for deployment, monitoring, incident response, and capacity planning to reduce toil and improve operational efficiency
  • Monitor & Optimize: Develop comprehensive observability solutions using metrics, logs, and traces; proactively identify and resolve performance bottlenecks and reliability issues
  • Incident Management: Lead incident response efforts, conduct blameless post-mortems, and drive continuous improvement initiatives to prevent recurrence
  • Capacity Planning: Analyze system performance data and growth trends to forecast infrastructure needs and ensure the platform scales efficiently with customer demand
  • SLO/SLA Management: Define, measure, and maintain Service Level Objectives and error budgets; balance feature velocity with reliability requirements
  • Cost Optimization: Implement strategies to optimize cloud resource utilization and reduce operational costs while maintaining performance and reliability standards
  • Collaborate Cross-Functionally: Partner with engineering teams to improve system design for reliability, influence architectural decisions, and embed SRE best practices
  • On-Call Participation: Participate in the on-call rotation to provide 24/7 support for critical production systems
  • Documentation: Create and maintain runbooks, operational procedures, and technical documentation to enable team scalability

What You'll Need:

  • Experience in Site Reliability Engineering, DevOps, or similar roles supporting large-scale distributed systems in production environments
  • Strong programming skills in at least one language (Go) for automation and tooling development
  • Deep cloud expertise with hands-on experience in at least one major cloud platform (AWS or GCP) including compute, storage, networking, and managed services
  • Distributed systems knowledge: Understanding of distributed system design patterns, consistency models, fault tolerance, and scalability principles
  • Infrastructure as Code: Proficiency with IaC tools (Terraform) and configuration management (Ansible, Chef, Puppet)
  • Container orchestration: Experience with Kubernetes, Docker, Podman, and container-based deployment patterns
  • Observability expertise: Hands-on experience with monitoring and observability tools (Prometheus, Grafana)
  • CI/CD pipelines: Experience building and maintaining continuous integration and deployment pipelines
  • Incident management: Proven track record of managing high-severity incidents and implementing preventive measures
  • Data-driven approach: Ability to analyze system metrics and logs to identify trends, anomalies, and optimization opportunities
  • Communication skills: Excellent verbal and written communication abilities for remote collaboration across global teams


Bonus Points:

  • Massive scale experience: 3+ years owning systems handling over 1 trillion requests per day or more than 10 PB of data per day
  • Multi-cloud experience: Hands-on work with hybrid or multi-cloud environments
  • Database expertise: Deep knowledge of distributed databases, data lakes, or SIEM platforms (ClickHouse, Redis, MySQL)
  • Security background: Exposure to cybersecurity, threat intelligence, or security operations
  • Networking expertise: Advanced understanding of network protocols, load balancing, and CDN technologies
     

If you like wild growth and working with happy, enthusiastic over-achievers, you'll enjoy your career with us!
