NVIDIA is searching for a highly motivated DevOps engineer for the NVIDIA NMX team, which is building a next-generation network management and telemetry system, in the cloud and on-premises, using modern design principles at internet scale. NVIDIA NMX is a highly scalable, modern network operations toolset that provides real-time visibility, troubleshooting, validation, and telemetry for NVLink/NVSwitch, InfiniBand, and Ethernet fabrics. NMX uses telemetry to deliver actionable insights about the health of a data center network, integrating the fabric into the DevOps ecosystem.
What you'll be doing:
Being part of the NVIDIA NMX team that is building the SaaS platform and the on-premises solution for network management and telemetry.
Taking specific responsibility for the DevOps, infrastructure, and Site Reliability Engineering (SRE) requirements for NMX.
Focus on efficiency by automating repetitive workflows.
Working on a microservices-based architecture.
Deploying and troubleshooting non-disruptive cloud operations with an emphasis on secure production infrastructure.
Continuously evaluating existing systems and driving improvements.
Managing deployments and upgrades for operating systems, Kubernetes (k8s) clusters, and/or other orchestration tools.
Day-to-day support for engineering activities with CI/CD tools like Git and Jenkins.
Multitasking efficiently across different tracks to address evolving priorities.
What we need to see:
5+ years of experience in complex microservices-based architectures
Highly skilled in Kubernetes and Docker
A good programming background in a high-level language such as Golang or Python, or equivalent experience
Strong knowledge of NoSQL DB (e.g. MongoDB), Kafka/Kafka Streams.
Experienced with modern deployment architectures for non-disruptive cloud operations, including blue-green and canary rollouts
Infrastructure as code (IaC) skills in frameworks like Ansible & Terraform
Expert in AWS
Knowledge of the best practices and discipline of managing and monitoring a highly available and secure production infrastructure
Ways to stand out from the crowd:
Skills in Linux/Unix Administration
Experience with Prometheus/Grafana.
Experience with APM tools like Dynatrace, Datadog, AppDynamics, New Relic, etc.
Implemented highly scalable log aggregation systems in the past using the ELK stack or similar
Implemented robust metrics collection and alerting infrastructure
NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us. If you're creative, passionate and self-motivated, we want to hear from you! NVIDIA is leading the way in ground-breaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services.