Description

Job Posting Title:

Senior Systems Reliability Operations Engineer

Req ID:

10058991

Job Description:

The Disney Technology Operations Command Center (DTOC) is a 24x7x365 mission-critical services operation center responsible for service availability, with a primary focus on rapidly responding to, correlating, and reducing the impact of outages. We are accountable for identifying and facilitating the resolution of service-impacting events, and for collaborating with other technology teams to prevent future impact through proactive event management and incident and problem analysis. The DTOC drives the execution of the major incident process, including communication to executives and key stakeholders. The DTOC owns and executes the IT Emergency Operations Center Crisis Management plan and process, with responsibility for maturing the plan and its integration into the overall Corporate Crisis Management and TWDC programs. The DTOC also provides ongoing first- and second-level technical support of requests, performs validation procedures for routine system/service checks, and provides proactive monitoring and communication during HyperCare for significant business events.

The SRO Engineer will provide operational oversight and technical leadership, and is responsible for monitoring systems, identifying issues, and coordinating with technologists across segments to fine-tune system operations and rally teams to resolve service interruptions. This role is responsible for the end-to-end reliability and operations of IT services and for providing consultation and training to clients and segments within TWDC.

The SRO Engineer will examine IT systems for defects and communicate maintenance schedules and critical events across the company.

Working with engineers and analysts at all levels, the SRO Engineer will interact with computer and software engineers, quality control specialists, infrastructure service leads, segment technologists, and others to ensure service availability, increase efficiency, and establish best practices for the execution and continuous improvement of the Event, Incident, Major Incident, Crisis Management, HyperCare, and Problem Management processes within the DTOC.

Additionally, this position will drive service improvement initiatives through proactive monitoring and through enhancement actions that address gaps identified by analytics and problem management. The SRO Engineer is an active member of the DTOC service team focused on operations, and also ensures operational sustainability by contributing to the development, testing, and evaluation of supported services.

Leverages partnerships with the business, customers, and suppliers to deliver services that meet agreed-upon expectations. Provides a 24x7x365 first point of contact for centralized incident response and recovery that consistently and reliably triages reported or automated incidents, applies recovery procedures, and engages domain experts to restore steady-state operations; provides all core services on a priority basis, with dedicated support to ensure the success of critical events.

Technology Focus

Carries and maintains a relevant, up-to-date skill set in x86 hardware technology, Windows, Linux, RISC operating systems, P-Series hardware, SAN, NAS, and data protection technologies.
Must have a working knowledge of relevant WAN/LAN technologies, wireless infrastructure, DNS/DHCP, load balancers, WAN accelerators, and other network technologies.
Implement and maintain technology observability and alerting solutions to provide real-time insights into system health, performance, and compliance.
Establish and maintain service level objectives (SLOs) and service level indicators (SLIs) for critical enterprise services.
Monitor and manage the performance and availability of enterprise applications, systems, and infrastructure, ensuring they meet or exceed established service level objectives (SLOs).
Proactively identify, diagnose, troubleshoot, and resolve infrastructure, application, and IT operations issues in collaboration with other IT support teams.
Develop, implement, and maintain automation tools and scripts to improve the efficiency and reliability of IT operations and infrastructure.
Seasoned technologist who will identify technology and operational challenges in solutions and products offered by Architecture and Engineering teams as well as outside vendors and OEMs.
In partnership with the architecture and engineering teams, ensures that products currently in ideation and development are engineered with long-term operational sustainment goals in mind.
Must have a solid understanding of Internet technologies and availability strategies for digital platforms.
Must be familiar with complex networking topics and availability approaches in order to drive performance across all network operations center functions.

Responsibilities

Drive the efficiency and effectiveness of the Event, Incident, Major Incident, Request Fulfillment and Problem Management processes
Experience in enterprise IT operations, including system administration, application platforms, infrastructure, networking fundamentals, and IT service management.
Strong understanding of Windows and Linux/Unix operating systems, and of networking platforms and concepts.
Proficiency in one or more scripting languages (e.g., Python, Bash, Ruby, PowerShell) and in automation tooling.
Solid understanding of observability, monitoring and alerting tools (e.g., Splunk, New Relic, Grafana, ELK Stack, Datadog).
Familiarity with modern operations support methodologies and practices, such as Site Reliability Engineering (SRE).
Strong technology problem-solving and analytical skills, with the ability to quickly diagnose and resolve complex technical issues.
Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams.
Identify service improvement opportunities through trend analysis, proactive techniques, and after-action reviews
Analyze and publish operational utilization and service performance metrics regularly
Identify and drive service availability improvement opportunities by executing leading practices
Ensure that all DTOC services are designed to deliver the levels of availability required by the business, and validate that the final design meets the minimum levels of availability agreed with the business for IT services
Elevate any service gaps proactively with leadership.
Participate in creating, maintaining, and regularly reviewing department procedures and operational readiness plans and posture, with the aim of improving the overall availability of IT services and infrastructure components so that existing and future business availability requirements can be met. This includes compiling daily operational reports and facilitating operational readiness calls.
Ensure the DTOC is effectively monitoring available tools and systems for high availability and swift response to potential and actual outage situations.
Serve as the incident commander on service outage calls, providing command and control of the call and orchestrating the recovery activities of the DTOC and other technology teams to drive fast restoration of service without added risk to the organization
Effectively apply Incident Analysis and Problem Analysis techniques during and after an incident, and ensure staff apply the same
During outages, consistently provide timely Situation Reports, and ensure that work streams toward resolution are clearly articulated per department procedures and that business impacts are gathered and communicated
Manage and provide technical direction to the team to ensure the 100% on-site coverage required to effectively support incidents, service requests, proactive health checks, and HyperCare services
Perform DR/BCP activities for critical events and emergency onsite response.

Strategy

Responsible for influencing and socializing DTOC solutions, practices, roles, responsibilities, and processes
Responsible for influencing and socializing operational service gaps with Engineering for capability enhancements.
Participate in creating, maintaining, and regularly reviewing assessments of the overall readiness of services for existing and future business needs, including Operational Readiness Reviews (ORR)
Contribute to the development and sustainment of an enterprise level incident, event, and availability management strategy
Participate in the development and governance of service level agreements.

Qualifications:

BA/BS in Computer Science, Engineering or related field. Equivalent work experience within large IT Operations organizations would be considered in lieu of degree.

Master’s degree in IT Systems or Business Administration (MBA), or an MS in a technical discipline.

Work Experience:

5+ years of experience supporting converged infrastructure stacks, including application, compute, storage, and networking
5+ years leading incident recovery with multidisciplinary, geographically dispersed teams in a Fortune 500 organization
3+ years of experience in either a large IT shared services organization or outsourced environment
Experience leading technical recovery of major incidents for a Fortune 500 organization
Experience with hands-on support of cloud operations on one or more of AWS, Google Cloud, or Azure
Experience supporting diverse portfolios, multiple business applications and IT services
Experience working in a 24x7 IT operations environment.
Demonstrated experience with Service and Event Management tools.
Demonstrated experience in systems integration, application infrastructure support and middleware operations.
Demonstrated management skills, both in managing resources and in overall control of a process
Proven experience and understanding of root cause analysis techniques
Proven ability to be detail-, deadline-, and results-oriented
Strong leadership skills with the ability to motivate and encourage others
Ability to manage competing priorities and workflow
Solid interpersonal skills for written, oral, and face-to-face communication
Practical experience with influence and negotiation methods and techniques
Ability to serve as mentor and coach
Strong customer service orientation, seeking opportunities to serve clients.

Skills / Specialized Knowledge/ Competencies

IT Automation and scripting in languages such as Python and/or PowerShell
Experience with ITIL frameworks and processes
Experience working within large, complex production teams
Experience working within an outsourced environment
Vendor relationship management experience
Comfortable working within a highly matrixed organization
Strong technology driven process experience
Ability to work under pressure, meet internal and external work schedules and/or deadlines, and show effective time and crisis management skills
Expertise in supporting large-scale environments in a diverse culture
Demonstrated attentiveness to detail
Demonstrated strong partnering skills
Demonstrated proactive problem-solving and decision-making skills
Demonstrated ability to deliver work on time
Proven team player with the ability to mentor, guide, and influence cross-functional teams
ITIL v3 Certification Preferred

Job Posting Segment:

Corporate

Job Posting Primary Business:

Corp All

Primary Job Posting Category:

Site/System Reliability Engineer

Employment Type:

Full time

Primary City, State, Region, Postal Code:

Mumbai, India

Alternate City, State, Region, Postal Code:

Date Posted:

2023-09-28
