NVIDIA

Senior SRE Software Engineer, Storage and Data

Taipei, Taiwan
Ansible Chef Kubernetes Git Bash Terraform Swift API Java Puppet AWS Go Docker Python
Description

SRE at NVIDIA ensures that our DGX Cloud platform continues to be reliable and performant to meet the needs of our users. You will play a critical role in ensuring the reliability, availability, and performance of storage infrastructures for NVIDIA DGX GPU cloud platforms. To collaborate with cross-functional teams to design, build, and maintain scalable and fault-tolerant storage solutions that support our mission-critical applications and services. Your expertise in storage systems and reliability engineering will be instrumental in minimizing downtime, improving system efficiency, and enhancing the overall user experience.

SRE is also a mindset and a set of engineering approaches to running efficient production systems, with a focus on eliminating manual work through modern automation practices and performance tuning. We promote self-direction to work on meaningful projects while striving to build an environment that provides the support and mentorship needed to learn and grow.

What You Will Be Doing:

  • Develop strategies to ensure the reliability and availability of storage systems, including redundancy, failover, and disaster recovery plans.

  • Continuously analyze and fine-tune storage systems for optimal performance, including throughput optimization, caching, and latency reduction. Identify and resolve performance bottlenecks to enhance overall system efficiency.

  • Develop and maintain automation scripts and tools to streamline storage provisioning, configuration, and maintenance tasks.

  • Implement monitoring and alerting systems to proactively identify and address issues.

  • Participate in on-call rotation to respond to storage-related incidents promptly conduct root cause analysis of outages and implement preventive measures.

  • Collaborate with cross-functional teams, including Compute SRE, development, and networking, to ensure seamless integration of large-scale storage solutions.

  • Work with AI/ML workloads to capture and correlate behavior in large clusters and workflows, which are otherwise hard to understand.

What We Need To See:

  • BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), with 5+ years equivalent practical experience.

  • Proven experience in storage system administration and site reliability engineering.

  • Experience with Git, RESTFul API, Linux service operation, networking, complexity analysis, AWS S3, software design, and maintaining large-scale Linux based systems.

  • Experience in one or more of the following languages: Ansible, Bash, Python, Go, YAML, Java

  • Good knowledge of infrastructure configuration management tools like Ansible, Chef, Puppet, and Terraform.

  • Experience in using observability and tracing-related tools like InfluxDB, Prometheus, and Elastic(OpenSearch) stack, Grafana. 

Ways to stand out from the crowd:

  • Experience with storage solutions like: OpenStack Swift(object), AWS S3(object), DDN, Lustre.

  • Strong Linux and network troubleshooting skills by running various commands and tools.

  • Demonstrated experience in having an SRE mindset, customer-first approach, and focus on customer satisfaction and passion for ensuring customer success..

  • Interest in crafting, analyzing, and fixing large-scale distributed systems. Strong debugging skills with a systematic problem-solving approach to identify complex problems.

  • Experience in using or running large private and public cloud systems based on Kubernetes, OpenStack, and Docker.

With competitive salaries and a generous benefits package, NVIDIA is widely considered to be one of the most desirable employers in the world. We have some of the most brilliant and talented people in the world working for us. If you are creative, autonomous and love a challenge, we want to hear from you. We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

NVIDIA
NVIDIA
Artificial Intelligence (AI) GPU Hardware Software Virtual Reality

0 applies

21 views

There are more than 50,000 engineering jobs:

Subscribe to membership and unlock all jobs

Engineering Jobs

60,000+ jobs from 4,500+ well-funded companies

Updated Daily

New jobs are added every day as companies post them

Refined Search

Use filters like skill, location, etc to narrow results

Become a member

🥳🥳🥳 452 happy customers and counting...

Overall, over 80% of customers chose to renew their subscriptions after the initial sign-up.

To try it out

For active job seekers

For those who are passive looking

Cancel anytime

Frequently Asked Questions

  • We prioritize job seekers as our customers, unlike bigger job sites, by charging a small fee to provide them with curated access to the best companies and up-to-date jobs. This focus allows us to deliver a more personalized and effective job search experience.
  • We've got about 70,000 jobs from 5,000 vetted companies. No fake or sleazy jobs here!
  • We aggregate jobs from 5,000+ companies' career pages, so you can be sure that you're getting the most up-to-date and relevant jobs.
  • We're the only job board *for* software engineers, *by* software engineers… in case you needed a reminder! We add thousands of new jobs daily and offer powerful search filters just for you. 🛠️
  • Every single hour! We add 2,000-3,000 new jobs daily, so you'll always have fresh opportunities. 🚀
  • Typically, job searches take 3-6 months. EchoJobs helps you spend more time applying and less time hunting. 🎯
  • Check daily! We're always updating with new jobs. Set up job alerts for even quicker access. 📅

What Fellow Engineers Say

Sid avatar
Sid
Very nice portal for searching jobs in this rough market.
Mar 6, 2025
Michael Duran avatar
Michael Duran
Software Engineer
I've been using this job search site for a while now, and it’s honestly one of the best out there! The clean and easy-to-navigate UI makes the whole job-hunting process so much smoother. Plus, the job postings are always up-to-date, so I never feel like I’m wasting time. The cherry on top is the owner—super kind and always quick to respond. Definitely recommend checking it out if you're on the job hunt!
Aug 21, 2024
Sai avatar
Sai
It’s really great website for finding jobs based on skills it’s really helpful give a go
Aug 21, 2024
Adinadh avatar
Adinadh
What I like most about Echo Jobs is how easy it is to use. The platform helps me quickly find jobs that match my skills and interests, thanks to its great recommendations and filters. Yes, I would definitely recommend Echo Jobs to a friend. It makes job searching simple and efficient, making it a great tool for anyone looking for a new job.
Jul 23, 2024
As a student navigating the job market, I've found LinkedIn increasingly frustrating due to numerous fake postings by consultancies. In contrast, this job posting website has been a game-changer for me. It offers genuine opportunities and a straightforward application process, making it much easier to find and apply for real jobs. Highly recommend it to fellow students seeking reliable job listings!
Jul 16, 2024
Cliff Gor avatar
Echo Jobs has been exceptional in my job hunt where it provides one platform to job hunt and I don't have to open 10 websites just to look for a job. It has also helped me focus much on the job skill and the location filtering out the onsite jobs and remote ones. The only feature that I would request is to display fully remote jobs that are not restricted to a country since the one available shows ie, Remote, US yet. But if it could show remote only, that would be helpful not only to me but to other people applying for full remote and not tied to only US candidates
Apr 22, 2024
I found EchoJobs in 2022, and I love it. It has a lot of remote jobs. It's exclusive to software and technology jobs (helpful for devs like me). What I like the most are its filters and its API. If you're a tech professional seeking remote work, I highly recommend giving it a try to EchoJobs.
Mar 4, 2024
Would definitely recommend it! Excellent product, dedicated founder, Jobs are easier to find. Congrats 🎉 to the entire team!
Mar 3, 2024
Brandon Banks avatar
Brandon Banks
Echo Jobs is really impressive. It provides a great user experience with an ability to quickly search through the many job postings. There is an impressive amount of jobs here and it is quickly updated. The details in the each job posting is helpful when determining if it is worth pursuing. I would highly recommend using Echo Jobs to find the next step in your career.
Mar 2, 2024
Tyler Young avatar
Tyler Young
tylerayoung.com
Best wishes with EchoJobs—it's become my favorite job board overnight!
Dec 16, 2023
Simply put, it's the most up to date tech jobs aggregator I’ve found. I'm like... "I don't have to check 10+ jobs boards daily just to see if there's a new job listing? sign me up!" The filters are also quite helpful! The UI is very clean and straightforward. Love it!
Oct 5, 2023