About HashiCorp
HashiCorp solves development, operations, and security challenges in infrastructure so organizations can focus on business-critical tasks. We build products to give organizations a consistent way to manage their move to cloud-based IT infrastructures for running their applications. Our products enable companies large and small to mix and match AWS, Microsoft Azure, Google Cloud, and other clouds as well as on-premises environments, making it easier for them to deliver new applications.
We use the Tao of HashiCorp as our guiding principles for product development and operate according to a strong set of company principles for how we interact with each other. We value top-notch collaboration and communication skills, both among internal teams and in how we interact with our users.
Our Team
The HashiCorp Incident Excellence team is responsible for improving HashiCorp’s incident response while maximizing learning from incidents. Our focus is on helping all engineers feel confident when they are on-call and improving communication to efficiently resolve incidents and build trust in our brand. We partner closely with teams to drive a holistic incident management strategy and share learnings to help our business continuously improve.
About this Role
This engineering role is on a nascent engineering team. The team is responsible for products that touch many areas of engineering organizations at HashiCorp, so applicants will need to excel at collaboration, have product-focused mindsets, and be comfortable iterating in an agile manner towards solutions.
You will provide expert execution of the incident command process, including running and managing high-severity incident bridges and driving transparent communication that promotes maximum levels of internal and external customer satisfaction.
Collaborate with an array of technical stakeholders and executives to drive resolution during incidents and improve overall response for future incidents and technical escalations.
Utilize top-notch troubleshooting techniques to identify, organize, and advocate for novel solutions to remediate customer impact on complex interconnected systems.
Participate in a closed-loop post-incident learning process that drives insights and meaningful action.
Drive iterative improvements in response through consistent drills, tabletops, and game-day exercises.
Push the boundaries of innovation in incident management to deliver best-in-class incident response.
In this role, you can expect to:
- Be responsible for and drive incident management capabilities and culture.
- Contribute to incident command on-call.
- Build technical skills and relationships within a team of engineers and SREs.
- Lead and refine our incident response strategy, ensuring rapid and effective response to operational disruptions.
- Analyze incident trends and root causes to drive continuous improvements in system reliability and response processes.
- Develop and maintain tools for incident detection, analysis, and resolution, automating responses where possible to minimize human intervention.
- Create comprehensive incident response documentation and conduct training sessions to prepare all relevant teams for effective incident handling.
- Work closely with development, operations, and security teams to coordinate incident response efforts and post-incident analyses.
You may be a good fit for our team if:
- Minimum 5 years of experience in site reliability engineering, systems administration, or software engineering, with a significant focus on incident response and operational reliability.
- Experience in managing, coordinating, and ensuring resolution of major incidents.
- Professional experience with incident management in cloud environments.
- Enjoy working on a variety of scopes spanning software engineering, cloud infrastructure, and SRE.
- Proven track record of managing and resolving incidents in cloud-based environments, with expertise in major public cloud platforms (AWS, GCP, Azure).
- Understanding of fundamental network technologies such as DNS, load balancing, SSL, TCP/IP, and HTTP.
- Strong understanding of monitoring and alerting systems, with the ability to develop metrics and alarms that accurately reflect system health and operational risks.
- Experience with incident management tools and practices, including post-mortem analysis and root cause investigation.
- Passion for consistently responding to and leading complex incidents in a 24x7x365 environment utilizing a globalized follow-the-sun model.
- Customer-centric attitude with a focus on providing best-in-class incident response for customers and stakeholders.
- Familiarity with HashiCorp’s product suite and infrastructure automation tools is a plus.
- Demonstrate strong leadership skills during periods of significant business impact, remaining calm and professional in high-pressure situations.
- A strong desire to drive customer success with partner teams and management on high-profile issues critical to the long-term success of the business.
- Outstanding verbal and written communication skills, with the ability to convey information in a meaningful way to both engineers and executive-level management, during and outside of incidents.
- Adaptable to a wide variety of technologies and capable of incident response and troubleshooting activities in complex interconnected environments.
“HashiCorp is an IBM subsidiary which has been acquired by IBM and will be integrated into the IBM organization. HashiCorp will be the hiring entity. By proceeding with this application you understand that HashiCorp will share your personal information with other IBM subsidiaries involved in your recruitment process, wherever these are located. More information on how IBM protects your personal information, including the safeguards in case of cross-border data transfer, is available here: link to IBM privacy statement.”
