DayOne

Reliability Director

Singapore
Electrical Engineering, Mechanical Engineering

Location: Singapore

Time Type: Full time

Job Description

Join DayOne – Shaping the Future of Data Infrastructure

DayOne is a global leader in the development and operation of high-performance data centers. As one of the fastest-growing companies in the industry, we’ve built a robust presence across Asia and Europe — and we’re just getting started.

As we expand into new international markets, we’re looking for talented, driven individuals to join us on this exciting journey. This is more than a job — it’s an opportunity to be a key contributor to our dynamic team and help shape the future of global data infrastructure.

If you're passionate about innovation, technology, and growth, we invite you to be part of DayOne’s next chapter.

Position Overview

The Reliability Director is responsible for governing infrastructure reliability and systemic technical risk across the global data centre portfolio. The role leads equipment performance monitoring, failure analysis, and reliability improvement initiatives across electrical, mechanical, and control systems.

The Reliability Director establishes portfolio-wide reliability frameworks, monitors critical equipment performance, investigates failure mechanisms, and ensures reliability risks are proactively identified and mitigated. The role works closely with GERA engineering authorities, DCO teams, design teams, and vendors to ensure mission‑critical infrastructure reliability is maintained across all campuses.

Key Responsibilities

Portfolio Reliability Governance

  • Establish and maintain the global reliability governance framework across the data centre portfolio.
  • Maintain and manage the Global Systemic Risk Register.
  • Identify systemic infrastructure risks across campuses and define mitigation strategies.
  • Ensure reliability practices are consistently applied across all sites.

Equipment Performance Monitoring

  • Establish monitoring frameworks for critical infrastructure equipment performance.
  • Analyse operating data from electrical and mechanical systems to identify degradation trends.
  • Monitor redundancy utilisation, abnormal operating conditions, and reliability indicators.
  • Identify early warning signals for potential equipment failures.

Failure Analysis & Root Cause Investigation

  • Lead structured Root Cause Analysis (RCA) for major infrastructure incidents.
  • Perform failure mode analysis using fault tree and event chain methodologies.
  • Identify recurring failure mechanisms across sites.
  • Ensure lessons learned from failures are captured and shared across the organisation.

Reliability Risk Assessment

  • Assess reliability risks associated with infrastructure design, operations, and vendor equipment.
  • Evaluate cross-campus failure exposure and correlated infrastructure vulnerabilities.
  • Provide technical recommendations to mitigate systemic reliability risks.

Reliability Data Analytics & Reporting

  • Develop reliability performance dashboards and trend analysis.
  • Monitor incident frequency, failure severity, and infrastructure performance trends.
  • Provide reliability reports and insights to engineering and executive leadership.

Reliability Improvement Programs

  • Lead initiatives to improve infrastructure resilience and reliability.
  • Identify reliability improvement opportunities across systems and sites.
  • Validate the effectiveness of remediation and reliability improvement actions.

Lifecycle & Obsolescence Management

  • Define strategies for managing aging infrastructure and equipment lifecycle risks.
  • Assess replacement versus life‑extension strategies for critical infrastructure systems.
  • Support long-term infrastructure planning from a reliability perspective.

Vendor & OEM Reliability Oversight

  • Evaluate vendor reliability performance and technical claims.
  • Assess OEM failure rates and design-related reliability exposure.
  • Collaborate with vendors to address systemic equipment reliability issues.

Cross-Team Collaboration

  • Work closely with DCO teams, Engineering Authorities, Design Teams, and external vendors.
  • Support reliability engineering input into ROCC and MCR operational frameworks.
  • Promote knowledge sharing and reliability best practices across the organisation.

Candidate Requirements

  • Bachelor’s degree in Engineering (Electrical, Mechanical, or related discipline).
  • 10+ years’ experience in mission‑critical infrastructure.
  • Strong experience in equipment reliability analysis, failure investigation, and RCA.
  • Deep understanding of electrical and mechanical systems in data centre environments.
  • Proven ability to identify systemic reliability risks and implement mitigation strategies.
  • Strong analytical and problem‑solving capabilities.
  • Experience working across multiple sites or regional infrastructure portfolios preferred.
  • Excellent communication and stakeholder management skills.
  • Willingness to travel across regional sites when required.

DayOne is proud to be an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

If you're ready to grow with one of the fastest-moving companies in the data center industry, apply now and be part of our global journey.
