OpenAI

Software Engineer, Fleet Hardware Health

San Francisco, CA
USD 325k - 490k
SQL Pandas Python Go
Description

About the team

The Fleet team at OpenAI supports the computing environment that powers our cutting-edge research and product development. We oversee large-scale systems that span data centers, GPUs, networking, and more, ensuring high availability, performance, and efficiency. Our work enables OpenAI’s models to operate seamlessly at scale, supporting both internal research and external products like ChatGPT. We prioritize safety, reliability, and responsible AI deployment over unchecked growth.

About the role

As a software engineer on the Fleet Hardware team, you will be responsible for the reliability and uptime of all of OpenAI’s compute fleet.  Minimizing hardware failure is key to research training progress and stable services, as even a single hardware hiccup can cause significant disruptions. With increasingly large supercomputers, the stakes continue to rise.

Being at the forefront of technology means that we are often the pioneers in troubleshooting these state-of-the-art systems at scale. This is a unique opportunity to work with cutting-edge technologies and devise innovative solutions to maintain the health and efficiency of our supercomputing infrastructure.

Our team empowers strong engineers with a high degree of autonomy and ownership, as well as ability to effect change. This role will require a keen focus on system-level comprehensive investigations and the development of automated solutions. We want people who go deep on problems, investigate as thoroughly as possible, and build automation for detection and remediation at scale.

In this role, you will:

  • Build and maintain automation systems for provisioning and managing server fleets.

  • Develop tools to monitor server health, performance, and lifecycle events.

  • Collaborate with clusters, networking, and infrastructure teams.

  • Partner with external operators to ensure a high level of quality.

  • Identify and fix performance bottlenecks and inefficiencies.

  • Continuously improve automation to reduce manual work.

You might thrive in this role if you have:

  • Experience managing large-scale server environments.

  • A balance of strengths in building and operationalizing.

  • Proficiency in Python, Go, or similar languages.

  • Strong Linux, networking, and server hardware knowledge.

  • Comfort digging into noisy data with SQL, PromQL, and Pandas or any other tool.

Prior hardware expertise is not required for this role.

Bonus Skills:

  • Experience with low level details of hardware components, protocols, and associated Linux tooling (e.g., PCIe, Infiniband, networking, power management, kernel perf tuning)

  • Knowledge of hardware management protocols (e.g., IPMI, Redfish).

  • High-performance computing (HPC) or distributed systems experience.

  • Prior experience developing, managing, or designing hardware.

  • Familiarity with monitoring tools (e.g., Prometheus, Grafana).

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity. 

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status. 

OpenAI Affirmative Action and Equal Employment Opportunity Policy Statement

For US Based Candidates: Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.

OpenAI
OpenAI
Artificial Intelligence (AI) Generative AI Machine Learning Natural Language Processing Software

0 applies

7 views

There are more than 50,000 engineering jobs:

Subscribe to membership and unlock all jobs

Engineering Jobs

60,000+ jobs from 4,500+ well-funded companies

Updated Daily

New jobs are added every day as companies post them

Refined Search

Use filters like skill, location, etc to narrow results

Become a member

🥳🥳🥳 452 happy customers and counting...

Overall, over 80% of customers chose to renew their subscriptions after the initial sign-up.

To try it out

For active job seekers

For those who are passive looking

Cancel anytime

Frequently Asked Questions

  • We prioritize job seekers as our customers, unlike bigger job sites, by charging a small fee to provide them with curated access to the best companies and up-to-date jobs. This focus allows us to deliver a more personalized and effective job search experience.
  • We've got about 70,000 jobs from 5,000 vetted companies. No fake or sleazy jobs here!
  • We aggregate jobs from 5,000+ companies' career pages, so you can be sure that you're getting the most up-to-date and relevant jobs.
  • We're the only job board *for* software engineers, *by* software engineers… in case you needed a reminder! We add thousands of new jobs daily and offer powerful search filters just for you. 🛠️
  • Every single hour! We add 2,000-3,000 new jobs daily, so you'll always have fresh opportunities. 🚀
  • Typically, job searches take 3-6 months. EchoJobs helps you spend more time applying and less time hunting. 🎯
  • Check daily! We're always updating with new jobs. Set up job alerts for even quicker access. 📅

What Fellow Engineers Say

Sid avatar
Sid
Very nice portal for searching jobs in this rough market.
Mar 6, 2025
Michael Duran avatar
Michael Duran
Software Engineer
I've been using this job search site for a while now, and it’s honestly one of the best out there! The clean and easy-to-navigate UI makes the whole job-hunting process so much smoother. Plus, the job postings are always up-to-date, so I never feel like I’m wasting time. The cherry on top is the owner—super kind and always quick to respond. Definitely recommend checking it out if you're on the job hunt!
Aug 21, 2024
Sai avatar
Sai
It’s really great website for finding jobs based on skills it’s really helpful give a go
Aug 21, 2024
Adinadh avatar
Adinadh
What I like most about Echo Jobs is how easy it is to use. The platform helps me quickly find jobs that match my skills and interests, thanks to its great recommendations and filters. Yes, I would definitely recommend Echo Jobs to a friend. It makes job searching simple and efficient, making it a great tool for anyone looking for a new job.
Jul 23, 2024
As a student navigating the job market, I've found LinkedIn increasingly frustrating due to numerous fake postings by consultancies. In contrast, this job posting website has been a game-changer for me. It offers genuine opportunities and a straightforward application process, making it much easier to find and apply for real jobs. Highly recommend it to fellow students seeking reliable job listings!
Jul 16, 2024
Cliff Gor avatar
Echo Jobs has been exceptional in my job hunt where it provides one platform to job hunt and I don't have to open 10 websites just to look for a job. It has also helped me focus much on the job skill and the location filtering out the onsite jobs and remote ones. The only feature that I would request is to display fully remote jobs that are not restricted to a country since the one available shows ie, Remote, US yet. But if it could show remote only, that would be helpful not only to me but to other people applying for full remote and not tied to only US candidates
Apr 22, 2024
I found EchoJobs in 2022, and I love it. It has a lot of remote jobs. It's exclusive to software and technology jobs (helpful for devs like me). What I like the most are its filters and its API. If you're a tech professional seeking remote work, I highly recommend giving it a try to EchoJobs.
Mar 4, 2024
Would definitely recommend it! Excellent product, dedicated founder, Jobs are easier to find. Congrats 🎉 to the entire team!
Mar 3, 2024
Brandon Banks avatar
Brandon Banks
Echo Jobs is really impressive. It provides a great user experience with an ability to quickly search through the many job postings. There is an impressive amount of jobs here and it is quickly updated. The details in the each job posting is helpful when determining if it is worth pursuing. I would highly recommend using Echo Jobs to find the next step in your career.
Mar 2, 2024
Tyler Young avatar
Tyler Young
tylerayoung.com
Best wishes with EchoJobs—it's become my favorite job board overnight!
Dec 16, 2023
Simply put, it's the most up to date tech jobs aggregator I’ve found. I'm like... "I don't have to check 10+ jobs boards daily just to see if there's a new job listing? sign me up!" The filters are also quite helpful! The UI is very clean and straightforward. Love it!
Oct 5, 2023