Qualys

Director, Site Reliability Engineer

Pune, India
Kubernetes Terraform
Description

Come work at a place where innovation and teamwork come together to support the most exciting missions in the world!

Qualys’ site reliability engineering (SRE) team supports all Qualys products across all our production environments, including our 11 global multi-tenant platforms and over 90 on-premise setups. Effective incident management is a big part of our SRE efforts to minimize the disruption of an incident and restore normal business operations as quickly as possible.

 

We are seeking a highly motivated and talented Director, Site Reliability Engineering to lead our SRE team, which works on a 24/7 rotation. In this role, you will lead a group that responds proactively to alerts and is accountable for the efficiency and effectiveness of service delivery over the life cycle of an incident. The group is also responsible for deploying applications in production, automating those deployments, and keeping the production environments stable.

 

We are looking for an individual who believes in SRE principles, has a software engineering mindset, and wants to be part of an organization that is transforming itself to be more agile and nimble operationally.

 

Responsibilities

 

Ensure effective performance and 24x7 availability of all production systems.

Apply a strong understanding of industry best practices for Site Reliability Engineering and ops automation.

Proactively work to implement and improve automation of application tasks.

Apply knowledge of system performance, testing, and programming to monitor, measure, and optimize system and application performance.

Work with other SRE leaders to set the enterprise strategy for designing and developing resiliency in the application code.

Work closely with Product Management, and partner with Sales and architecture teams.

Track record of success in delivering quality products from concept to launch

Monitor alerts coming out of all Qualys platforms, and coordinate with Operations/SRE/DBRE/Engineering teams as necessary to take preventive or corrective action to resolve any incidents, with a goal to minimize MTTR.

Put in place and manage an effective on-call rotation within the team.

Work with engineering teams to set up proper monitoring and alerting thresholds across all Qualys services and applications, so that the SRE team focuses on the key areas needed to stabilize the platforms.

Accountability for platform uptime SLAs.

 

Desired Skills

 

15 or more years of experience working in application support or Site Reliability Engineering.

Experience in a leadership role on a development or engineering team.

Strong prior production operations experience leading a first responder incident management team for a high-traffic platform.

Experience with CI/CD pipelines to automate the software delivery process.

Knowledge of cloud platform products and services; strong skills in developing cloud solutions and deploying applications on cloud platforms.

Solid exposure to monitoring tools such as Prometheus, ELK, Kibana, AppDynamics, Splunk, Grafana, etc.

Strong hands-on experience with Kubernetes, Jenkins, and Terraform templates.

Strong experience with application capacity sizing.

Good experience configuring and managing on-call and alerting platforms such as PagerDuty.

Comfortable working in a dynamic environment, with the ability to coordinate multiple tasks simultaneously.

Strong verbal and written communication skills are essential, as is the ability to work in a disciplined manner and remain composed under pressure.

Obtain and exhibit expert knowledge of Qualys’ infrastructure, monitoring, products, and services.

Coordinate with the incident management team to produce weekly reports and dashboards for various products that clearly show, backed by data, which areas need improvement.

Must have a strong passion for continuous improvement.

