NVIDIA is the world leader in GPU Computing. We are passionate about markets include gaming, automotive, vision, HPC, datacenters and networking in addition to our traditional OEM business. NVIDIA is also well positioned as the ‘AI Computing Company’, and NVIDIA GPUs are the brains powering Deep Learning software frameworks, analytics, data centers, and driving autonomous vehicles. We have some of the most experienced and dedicated people in the world working for us. If you are dedicated, forward-thinking, and hard-working technical people across countries sounds exciting, this job is for you. NVIDIA is looking for an outstanding individual who thrives in a diverse work environment, has outstanding interpersonal skills and possesses a strong sense of engagement and continuous process improvement. This candidate must have enterprise server integration, strong Linux experience, reliability testing with various telemetries, scale out cluster, test plan development, track record in developing AI tools and NLP, DevOps, CI/CD experience to join our platform SWQA team.
What you’ll be doing:
Responsible for the development and execution of NVIDIA HGX/DGX/MGX platform test plan on servers, OS, FW and CUDA SW stack from design doc.
Installing and testing various systems OS, server firmware and SW stack.
Drive support for root cause analysis on reliability and validation test failures to identify root cause(s) and achieve mitigation.
Build, develop/debug server and OS level automation front-end and back-end framework and tests
Review partner and supplier test results and prescribe additional reliability testing on components, servers, and packaging as needed.
Work in an agile software development team with very high production quality standards.
Manage bug lifecycle and collaborate with inter-groups to drive for solutions.
What we need to see:
Bachelor’s Degree (or equivalent experience) in a STEM (Science, Technology, Engineering, Math or Physics) field
5+ years proven experience; or master’s degree.
Proven years of OS and server level automation, CI/CD process and DevOps experience using Python, SHELL, Ansible, Jenkins, C/C++, Java, JavaScript
Strong server and Linux(Ubuntu, RedHat, CentOS, SuSE, Fedora and etc…) troubleshooting and debugging experience in a bare-metal and KVM/VMWare/Hyper-V environment.
Good knowledge and hands-on experience in model testing, AI tools/frameworks (TensorFlow, Pytorch, Cursor and etc…), NLP and LLM benchmarking
Experience in using AI development tools for test plans creation, test cases development and test cases automation
Strong experience in FW, BMC/OpenBMC, Network protocol, internal/external enterprise storage devices, PCIe buses and devices, IO sub-devices, CPU and memory, ACPI, UEFI spec, Redfish - huge plus
Proven years of experience in GitHub/Gitlab/Gerrit, PXE, SLURM, Stack/Kubernetes/Docker) – huge plus
Ways to stand out from the crowd:
AI related tools, LLM and NLP.
Experience working with NVIDIA GPU hardware is a strong plus.
Good to have solid understanding of virtualization in Linux (KVM, Docker orchestrated with Kubernetes)
Background in parallel programming ideally CUDA/OpenCL is a plus
With competitive salaries and a generous benefits package, we are widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us and, due to unprecedented growth, our exclusive engineering teams are rapidly growing. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you.
You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.
Other Jobs from NVIDIA
Senior Data Engineer, Cloud Operations Engineering
Senior Firmware Engineer - Memory Subsystem
Senior Signal and Power Integrity Engineer - Hardware
Senior Mechanical Product Design Engineer
Senior Mixed Signal Design Validation Engineer
Senior ASIC Verification Engineer, Coherent High Speed Interconnect
Similar Jobs
Software Engineer - Sr. Consultant level
MTS Software Engineer
Senior Data Scientist
Senior Software Engineer ( Generative AI Developer, AI-Powered Code Developer)
AI Data Scientist
There are more than 50,000 engineering jobs:
Subscribe to membership and unlock all jobs
Engineering Jobs
60,000+ jobs from 4,500+ well-funded companies
Updated Daily
New jobs are added every day as companies post them
Refined Search
Use filters like skill, location, etc to narrow results
Become a member
🥳🥳🥳 452 happy customers and counting...
Overall, over 80% of customers chose to renew their subscriptions after the initial sign-up.
To try it out
For active job seekers
For those who are passive looking
Cancel anytime
Frequently Asked Questions
- We prioritize job seekers as our customers, unlike bigger job sites, by charging a small fee to provide them with curated access to the best companies and up-to-date jobs. This focus allows us to deliver a more personalized and effective job search experience.
- We've got about 70,000 jobs from 5,000 vetted companies. No fake or sleazy jobs here!
- We aggregate jobs from 5,000+ companies' career pages, so you can be sure that you're getting the most up-to-date and relevant jobs.
- We're the only job board *for* software engineers, *by* software engineers… in case you needed a reminder! We add thousands of new jobs daily and offer powerful search filters just for you. 🛠️
- Every single hour! We add 2,000-3,000 new jobs daily, so you'll always have fresh opportunities. 🚀
- Typically, job searches take 3-6 months. EchoJobs helps you spend more time applying and less time hunting. 🎯
- Check daily! We're always updating with new jobs. Set up job alerts for even quicker access. 📅
What Fellow Engineers Say