Siemens Digital Industries Software is a leading provider of solutions for the design, simulation, and manufacture of products across many different industries. Formula 1 cars, skyscrapers, ships, space exploration vehicles, and many of the objects we see in our daily lives are being conceived and manufactured using our Product Lifecycle Management (PLM) software.
The DISW SRE organization is dedicated to enhancing service and application availability, optimizing processes by automating manual and repetitive tasks, and addressing complex technical challenges in a dynamic, collaborative, inclusive, and iterative environment. This position plays a crucial role in developing automated solutions and processes that support and sustain best-in-class cloud-based applications.
Position Overview:
The candidate will support the Siemens Xcelerator platform and will be responsible for coordinating major incident response, maintaining stakeholder communication during service-impacting events, and facilitating resolution in compliance with service level agreements (SLAs). A strong relationship with the various product teams of the Xcelerator platform is necessary to support core objectives. This role's success will be defined by product teams within DISW business units meeting their SLAs.
Responsibilities:
- Incident Management: Act as the primary point of contact and leader during major incidents, coordinating the response, communication, and resolution efforts across all involved teams.
- Incident Response: Quickly assess the severity of incidents, determine the impact, and drive the appropriate response to restore services as quickly as possible.
- Communication: Ensure clear, concise, and timely communication with stakeholders, including technical teams, management, and customers, throughout the incident lifecycle.
- Post-Incident Analysis: Lead post-incident reviews to identify root causes, drive improvements, and implement preventive measures to reduce the likelihood of recurrence.
- Collaboration: Work closely with SRE, DevOps, Development, and other relevant teams to ensure that incident management processes are well-defined and continuously improved.
- Training & Preparedness: Conduct regular incident response drills, train teams on incident management processes, and ensure readiness for handling high-severity incidents.
- Documentation: Maintain and update incident management documentation, ensuring that all procedures are up-to-date and accessible to all relevant teams.
- Monitoring & Alerts: Collaborate with SRE and monitoring teams to define and refine alerting criteria, ensuring that incidents are detected and escalated promptly.
- Continuous Improvement: Identify opportunities to improve system reliability, scalability, and performance based on lessons learned from incidents.
- 24x7 On-Call Rotation: Participate in a 24x7 on-call rotation.
Minimum Requirements
Required Knowledge/Skills, Education, and Experience:
- Driven Learner: Highly motivated and driven to learn new technologies, skillsets, and methodologies, continuously seeking to expand your knowledge and adapt to evolving industry trends.
- Leadership: Demonstrated experience in leading incident response efforts and managing cross-functional teams during critical situations.
- Technical Skills: Familiarity with cloud infrastructure (AWS, GCP, Azure), containerization (Docker, Kubernetes), and monitoring tools (Prometheus, Datadog, etc.).
- Problem-Solving: Excellent troubleshooting and problem-solving skills, with the ability to quickly analyze complex systems and identify the root cause of issues.
- Communication: Outstanding communication skills, both verbal and written, with the ability to convey complex technical information to non-technical stakeholders.
- Calm Under Pressure: Ability to remain calm, focused, and effective in high-pressure situations.
Preferred Knowledge/Skills, Education, and Experience:
- Certifications: Relevant certifications (e.g., AWS Certified Solutions Architect, Certified Kubernetes Administrator) are a plus.
- Experience with Incident Command Systems (ICS): Familiarity with structured incident response frameworks, such as Incident Command Systems, is highly desirable.
- Automation: Experience with automation tools and scripting languages (e.g., Python, Bash) to streamline incident response and remediation.
- Culture of Learning: Passion for fostering a culture of learning and continuous improvement within the organization.
- Experience: Background in an enterprise IT environment with distributed systems.
We are an equal opportunity employer and value diversity at our company. We do not discriminate based on race, religion, color, national origin, sex, gender, gender expression, sexual orientation, age, marital status, veteran status, or disability status.
Working at Siemens Software
Why us?
Working at Siemens Software means flexibility: choosing between working from home and working at the office is the norm here. We offer great benefits and rewards, as you'd expect from a world leader in industrial software.