NVIDIA

Senior DevOps Engineer - AI Infrastructure

Shanghai, China; Beijing, China
Deep Learning, Kubernetes, Docker, Microservices, Go, Git, Ansible, Terraform, AWS, Python, Bash
Description

We are now looking for a Senior DevOps Engineer - AI Infrastructure!

NVIDIA is hiring engineers to scale up its AI infrastructure. You will need to have strong programming skills, a deep understanding of cloud technologies, orchestration & automation systems, data centers and cloud architectures, as well as excellent communication and planning skills. You and other specialists in this team will help advance NVIDIA's capacity to build and deploy leading solutions for a broad range of AI-based applications such as autonomous vehicles, healthcare, virtual reality, graphics engines and visual computing.

This is an ambitious and exciting role in the AI Infrastructure Software team that gives you a chance to create and scale out a new product category. We are a dynamic, startup-like environment with a strong focus on execution, flexibility and teamwork. We are looking for highly motivated software engineers who share our passion for building phenomenal software.

NVIDIA is at the forefront of the DL and AI revolutions. Come join us as we craft the future of Artificial Intelligence on NVIDIA GPUs.

What you’ll be doing:

  • Collaborate with multiple AI product teams to understand their data and compute requirements (currently focused on Autonomous Vehicles)

  • Build infrastructure and tools that increase the productivity of teams developing AI-based systems (closed-loop data pipelines, labeling and training for deep learning, debugging and replay of Autonomous Vehicle issues, etc.)

  • Enable development teams by providing automated build and test solutions in simulation environments using cloud computing, Kubernetes, Docker, and physical deep learning machines

  • Maintain version control schemas to track development, staging, and production code using git

  • Orchestrate create/delete/upgrade of live systems using maintenance windows, HA failover, and immutable infrastructure patterns

  • Work with multiple teams and domain experts to integrate multiple NVIDIA products into the CI workflow

  • Automate sophisticated tasks and improve the efficiency of functional automated tests

  • Be part of an on-call rotation to support production systems, respond to incidents promptly, conduct root cause analysis of outages, and implement preventive measures

What we need to see:

  • BS/MS with 4+ years of experience

  • Solid technical foundation in automation, cloud infrastructure and orchestration, including experience with at least one orchestration system (Kubernetes, Swarm, Mesos, Marathon, Aurora, etc.)

  • Experience with microservices and ETL jobs

  • Experience with cloud automation tools (Ansible, Terraform, etc.)

  • Excellent understanding of AWS: EC2, S3, RDS, ECS, CloudFront, VPC, or equivalents in Aliyun, Tencent Cloud, etc.

  • CI/CD: Jenkins, GitHub, GitLab, etc.

  • Programming: Go, Python, Bash

  • Linux: Debian package management, Docker, systemd

  • Networking: Linux firewall, PXE, NFS, ZFS, CIFS

  • Understanding of observability instrumentation techniques and standard methodologies, including Prometheus, Grafana, OpenTelemetry, and logging systems

Ways to stand out from the crowd:

  • Phenomenal teammate who loves to work in a team environment

  • Experience at tier 1 Autonomous Vehicle companies, automating and accelerating the data-driven closed development loop for AV

  • Fluent English

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and talented people on the planet working for us. If you're creative and autonomous, we would like to hear from you!
