Sr Engineer - Compute
Team: MS Infrastructure
Location: Gurugram, Haryana
Commitment: Full Time
Workplace Type: Remote
Principal Duties and Responsibilities
- Provide enterprise-level operational support to Managed Services customers for incident, problem, and change management activities
- Plan and perform software and firmware maintenance activities
- Assess customer environments for performance and design issues and propose resolutions
- Work across technical teams to troubleshoot complex infrastructure issues
- Create and maintain detailed documentation
- Serve as a subject matter expert and escalation point for compute technologies
- Work with vendors to resolve compute issues
- Communicate transparently with customers and internal teams
- Participate in on-call rotation
- Complete assigned training and certifications to further skills and knowledge
Education and Experience
- Bachelor’s degree in Information Systems or a related field, or equivalent. Unique education, specialized experience, skills, knowledge, training, or certification may be substituted for education
- 5+ years of advanced Linux administration and troubleshooting
- 5+ years managing Red Hat OpenShift Kubernetes and Virtualization clusters
- 5+ years of expert-level experience managing infrastructure in high-performance computing environments, including configuration, troubleshooting, and best practices
- 2+ years of experience with Nvidia DGX preferred
- Experience with HPC schedulers (e.g., SLURM, Kubernetes, PBS, Run:ai) required
- Proficiency with physical server environments
- Experience configuring, maintaining and troubleshooting containers
- Experience with storage technology (e.g., Ceph or Vast Data Platform) and distributed file systems (e.g., Lustre, GPFS, NFS, GlusterFS)
- Experience with machine learning or data science workflows in HPC/AI environments
- 1+ years working with monitoring platforms (e.g., Prometheus, Grafana); Elastic Observability experience is a bonus
- 1+ years working with an enterprise ITSM system; ServiceNow experience is a bonus
- Previous experience with automation tools such as Ansible, Puppet, or Chef is a plus
- Managed Services or consulting experience is required
- Strong background in customer service
- High-level problem-solving skills
- Strong oral and written communication skills
- Related Linux, Nvidia, Scheduler, Containerization, Virtualization, and Clustering certifications are a bonus
