Description

NVIDIA's Deep Learning Libraries Team is looking for excellent interns to enable the next wave of NVIDIA’s highest-performing deep learning libraries, such as cuDNN, cuBLAS, and TensorRT. The mission is to design and develop scalable, modular software products that enable breakthroughs in problems from image classification to speech recognition to natural language processing and artificial intelligence. Join the team that is building the underlying software used across the world to power the revolution in artificial intelligence! We’re always striving for peak efficiency on current and future-generation GPUs. To get a sense of the code we write, check out our CUTLASS open-source project, which showcases performant matrix multiply on NVIDIA’s Tensor Cores with CUDA. This specific position primarily deals with code lower in the deep learning software stack, right down to the GPU hardware.

What you'll be doing:

In this role, you will be responsible for developing and delivering highly optimized deep learning products. The scope of these efforts ranges from defining public APIs to performance tuning and analysis, from building developer infrastructure to test automation, and from joining architecture discussions to learning the latest technologies from the research community. During your internship, you will take on one or more of the activities below:

Writing highly tuned compute kernels, mostly in CUDA C++, to perform core deep learning operations (e.g. matrix multiplies, convolutions, normalizations)
Collaborating with teams across NVIDIA:
The CUDA compiler team, on generating optimal assembly code
The deep learning training and inference performance teams, on which layers require optimization
The hardware and architecture teams, on the programming model for new deep learning hardware features
Developing robust and scalable GPU-accelerated deep learning libraries, using C++ and object-oriented design
Building scalable automation for build, test, integration, and release processes for publicly distributed deep learning libraries
Maintaining and testing environments for new hardware, new OSes, and platforms using industry-standard tools (e.g. Kubernetes, Jenkins, Docker, CMake, GitLab, Jira)
Participating in a high-energy and dynamic company culture to develop state-of-the-art software and hardware products and practice hardware-software co-design

What we need to see:

Pursuing a BS, MS, or PhD in Computer Science, Computer Engineering, or a similar field
Demonstrated strong C++ programming and software design skills, including debugging, problem solving, performance analysis, and test design
Experience with performance-oriented parallel programming, even if it’s not on GPUs (e.g. with OpenMP or pthreads)
Alternatively, experience with source control management (e.g. Git, Perforce) and build systems (e.g. Make, CMake, Bazel)
Passion for “it just works” automation and enabling team members

Ways to stand out from the crowd:

Experience in optimizing/tuning BLAS or deep learning library kernel code
Knowledge of CUDA/OpenCL GPU programming
Knowledge of numerical methods and linear algebra
Familiarity with LLVM, TVM tensor expressions, or TensorFlow MLIR
Experience with code coverage and static code analysis tools

This is an opportunity to have a wide impact at NVIDIA by improving development velocity across our many compute software projects. Are you creative, driven, and autonomous? Do you love a challenge? If so, we want to hear from you.

NVIDIA



Deep Learning Software Engineering Intern, AI - 2025
