NetApp

Site Reliability Engineer - Incident Manager

Remote US
USD 137k - 193k
AWS Azure GCP
This job is closed!
Description

About NetApp

We’re forward-thinking technology people with heart. We make our own rules, drive our own opportunities, and try to approach every challenge with fresh eyes. Of course, we can’t do it alone. We know when to ask for help, collaborate with others, and partner with smart people. We embrace diversity and openness because it’s in our DNA. We push limits and reward great ideas. What is your great idea?

"At NetApp, we fully embrace and advance a diverse, inclusive global workforce with a culture of belonging that leverages the backgrounds and perspectives of all employees, customers, partners, and communities to foster a higher performing organization." -George Kurian, CEO

Job Summary

At NetApp, we have an amazing opportunity to transform industries with cutting-edge data services. We are developing a broad portfolio of solutions that harness the power of data. NetApp’s public and private cloud offerings and high-performance, highly scalable storage products are stretching what’s possible for our customers and partners. These solutions change how data is stored, consumed, and interpreted, unleashing innovation. Join our team and push the boundaries of what’s possible.

The SRE team within the NetApp Public Cloud Service group is responsible for the scaling and support of our multi-region, multi-cloud application. This team is made up of software engineers, site reliability engineers, and security experts who own the deployment architecture and strive to improve our infrastructure through deep partnership with other teams across the organization.

We are a customer-focused team, continuously improving our services to meet our customers’ needs and reduce toil on our team. We meet regularly to mentor and challenge each other as we collaborate across projects. We're only able to accomplish our mission through diverse teams innovating together.

As an SRE Incident Manager, you will work in a command-and-control role focusing on uptime and Time to Recovery (TTR). To fulfill this role, you will collaborate with multiple NetApp teams to bring about safe and rapid mitigation of incidents that impact customers on a global scale. You will partner with Site Reliability Engineers (SREs) and lead by example by being more of a contributor than a delegator.

Job Requirements

  • Define and refine incident management, change management and problem management-related workflows.
  • Reduce operational inefficiencies in the incident management process to ensure the fastest path to SREs through automation and continuous process improvement. Identify when escalation is required and trigger such escalation accordingly.
  • Manage proactive notification for planned events and broad communication associated with critical incidents. This requires composure under pressure, broad analytical and problem-solving expertise, and the ability to confidently collaborate with varied partners. You will apply these skills in both written and verbal communication to update customers, partners, and senior leadership.
  • Create and maintain recovery playbooks for commonly occurring customer patterns and issues.
  • Drive down resolution times by improving alert coverage and accuracy. Deflect customer incident submission by promoting supportability tools (e.g. documentation, self-service workflows).
  • Lead after-action reviews and root cause analysis. Complete postmortems promptly, identifying repair items that prevent future customer impact. Ensure resolution of product/service defects, process improvements, and documentation enhancements to address live-site or customer-reported incidents.
  • Present monthly incident availability and operability metrics to cross-functional leadership teams. Build dashboards to provide insights and visibility into critical business metrics for a variety of audiences.
  • You must be able to work outside of normal business hours (weekend shifts, holidays, and evenings) as needed.

Education & Experience

  • Typically requires a minimum of 3 years of related experience with a bachelor’s degree; or 2 years and a master’s degree; or equivalent work experience.
  • Experience managing incidents and running incident management programs, preferably in large-scale environments.
  • Experience working with service owners running a DevOps team in public cloud platforms such as AWS, Azure, or Google Cloud is a big plus.
  • A basic understanding of public cloud vendors such as AWS, Azure, Google Cloud, or others.
  • Outstanding communication and presentation skills, written and verbal. Excellent listening skills and a high degree of empathy.
  • You are great at solving problems, sorting meaningful information from noise, and taking action.

Equal Opportunity Employer:

NetApp is firmly committed to Equal Employment Opportunity (EEO) and to compliance with all federal, state and local laws that prohibit employment discrimination based on age, race, color, gender, sexual orientation, gender identity, national origin, religion, disability or genetic information, pregnancy, protected veteran status, and any other protected classification. 

Did you know…
Statistics show women apply to jobs only when they’re 100% qualified. But no one is 100% qualified. We encourage you to shift the trend and apply anyway! We look forward to hearing from you.

Why NetApp?

In a world full of generalists, NetApp is a specialist. No one knows how to elevate the world’s biggest clouds like NetApp. We are data-driven and empowered to innovate. Trust, integrity, and teamwork all combine to make a difference for our customers, partners, and communities. 

We expect a healthy work-life balance. Our volunteer time off program is best in class, offering employees 40 hours of paid time off per year to volunteer with their favorite organizations.  We provide comprehensive medical, dental, wellness, and vision plans for you and your family.  We offer educational assistance, legal services, and access to discounts. We also offer financial savings programs to help you plan for your future.  

If you run toward knowledge and problem-solving, join us. 

USA and Canada Residents Only:

The base salary hiring wage range for this position, which the Company reasonably and in good faith expects to pay for the position in the specified geographic areas or locations, is $137,300 - $193,100. Final compensation will be dependent on various factors relevant to the position and candidate such as geographical location, candidate qualifications, certifications, relevant job-related work experience, education, skillset and other relevant business and organizational factors, consistent with applicable law. In addition, the position may include some of the following comprehensive benefits, such as Medical, Dental, Vision, Life, 401(K), Paid Time Off (PTO), sick time, leave of absence as per the FMLA and other relevant leave laws, Company bonus/commission, employee stock purchase plan, and/or restricted stock units (RSUs).
