Description

NVIDIA is looking for a hardworking Senior Compute Cluster Deployment Engineer to join our Professional Services team.

You'll join a small team working around the globe to build some of the most cutting-edge data centers in the world. This role focuses on deploying server and compute clusters built on brand-new GPU platforms for AI and machine learning. You'll be working with some of the world's largest and most sophisticated customers and supercomputers. You'll work alongside our InfiniBand and Ethernet network engineers to deploy a complete solution for customers looking to adopt NVIDIA solutions into their business.

Opportunities for global travel and learning about the newest GPU-related technologies are plentiful as we seek to build, shape and expand this new aspect of our business.

What you will be doing:

Manage and maintain AI/HPC infrastructure in Linux-based environments for new and existing customers.
Support operational and reliability aspects of large-scale AI clusters, with a focus on performance at scale, real-time monitoring, logging and alerting.
Engage in and improve the whole lifecycle of services—from inception and design through deployment, operation and refinement.
Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
Provide feedback to internal teams by opening bugs, documenting workarounds, and suggesting improvements.
Be part of an on-call rotation to support production systems.

What we need to see:

5+ years providing in-depth support and deployment services, solving problems for hardware and software products.
Knowledge of and experience with Linux system administration, including process management, package management, task scheduling, kernel management, boot procedures and troubleshooting, performance reporting, optimization and logging, and network routing and advanced networking (tuning and monitoring).
Experience with cluster management technologies, e.g. Bright Cluster Manager.
Scripting proficiency.
Good social skills, with the ability to manage and deliver resolutions for customer-blocking issues as they arise.
Superb communication and presentation/oral skills.
Excellent verbal and written English skills.
Strong organizational skills and ability to prioritize/multi-task easily with limited supervision.
Candidates should have a minimum of a four-year degree from an accredited university or college in Computer Science, or Electrical or Computer Engineering.
Industry-standard Linux certifications.

Ways to stand out from the crowd:

InfiniBand experience.
Experience with GPU focused hardware/software.
Experience with MPI.
Automation tooling background (Ansible, Salt, Puppet, etc.).
Ethernet and Storage technologies.

Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family at www.nvidiabenefits.com/.

