As a Senior Manager, Storage Production Engineering, you will lead a team responsible for designing, building, and maintaining large-scale, high-performance storage infrastructure to support NVIDIA’s GPU cloud services, AI/ML workloads, and high-throughput computing environments. This role requires a deep understanding of storage architectures, scalability challenges, and performance optimization techniques, along with strong leadership and strategic planning abilities.
You will drive the evolution of distributed storage systems, object storage, and parallel file systems to meet the growing demands of NVIDIA’s compute and AI workloads. In this role, you will collaborate closely with engineering, infrastructure, and operations teams to ensure the reliability, scalability, and efficiency of our storage solutions. You will also be responsible for building and mentoring a world-class team of storage production engineers, driving automation and operational excellence, and defining long-term strategies for storage infrastructure.
What You Will Be Doing:
Lead and mentor a team of highly skilled Storage Production Engineers, fostering a culture of innovation, collaboration, and technical excellence.
Oversee the design, deployment, and optimization of large-scale storage systems, including distributed storage, parallel file systems, and object storage platforms.
Partner with cross-functional teams to drive storage automation, monitoring, and predictive analytics to enhance reliability and efficiency.
Establish best practices for capacity planning, data lifecycle management, and cost optimization for storage infrastructure.
Implement high-availability and disaster recovery strategies, ensuring minimal downtime and data loss across mission-critical storage environments.
Drive the adoption of modern storage architectures, including NVMe over Fabrics (NVMe-oF), RDMA, high-speed interconnects, and cloud-based storage solutions.
Lead incident response and root cause analysis efforts, implementing proactive measures to enhance system stability and resilience.
Work closely with engineering, DevOps, and AI/ML teams to optimize data pipelines, storage access patterns, and workflow performance.
Advocate for continuous improvements in automation, operational efficiency, and performance tuning within the storage infrastructure.
What We Need To See:
BS/MS in Computer Science, Storage Systems, or a related technical field (or equivalent experience).
10+ years of overall experience in large-scale storage architecture, production engineering, or infrastructure roles.
5+ years of management experience, leading high-performing storage, infrastructure, or site reliability engineering teams.
Proven expertise in scalable storage architectures, including parallel file systems (Lustre, GPFS), distributed storage (Ceph, MinIO), and enterprise-scale object storage (S3, NetApp, Pure Storage, etc.).
Strong background in block, file, and object storage technologies, including their performance tuning, high-availability strategies, and data protection mechanisms.
Experience with storage networking protocols, such as NFS, SMB, iSCSI, Fibre Channel, RDMA, and NVMe-oF.
Hands-on experience with automation and infrastructure as code using Terraform, Ansible, Puppet, or similar tools.
Deep understanding of capacity planning, performance tuning, and troubleshooting large-scale storage systems.
Expertise in monitoring and observability tools like Prometheus, InfluxDB, and Elastic stack for storage infrastructure.
Ways to Stand Out From the Crowd:
Experience in designing and scaling storage infrastructure for AI/ML workloads and high-performance computing (HPC).
Familiarity with hybrid cloud and multi-cloud storage solutions, including AWS S3, Azure Blob, and Google Cloud Storage.
Proven ability to drive cross-functional initiatives, aligning storage strategies with broader business and engineering objectives.
Experience with software-defined storage (SDS), cloud-native storage, and Kubernetes-based storage orchestration.
Passion for mentoring engineers, fostering career growth, and creating a high-performance team culture.
At NVIDIA, you’ll be at the forefront of innovative storage technologies, working on high-performance storage solutions that power the next generation of AI, HPC, and cloud computing. NVIDIA leads groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. We have some of the most forward-thinking and hardworking people on the planet working for us. If you're creative, passionate, and self-motivated, we want to hear from you!
The base salary range is 272,000 USD - 425,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.