We are looking for an AIOps Principal Engineer who can design, develop, and deploy AI-powered solutions for IT operations. You will work with a team of engineers, data scientists, and domain experts to create and implement innovative applications that leverage NVIDIA's Observability, Infrastructure and Gen AI platforms. You will also collaborate with internal and external customers to understand their needs, define requirements, and deliver high-quality products.
What you'll be doing:
Lead the design, development, testing, and deployment of the AIOps platform.
Apply machine learning, deep learning, natural language processing, and other AI techniques to solve IT operations challenges such as anomaly detection, root cause analysis, incident management, and automation.
Improve IT Infrastructure and Operations Management by defining and measuring AIOps metrics such as accuracy, reliability, scalability, performance, and efficiency.
Implement observability principles and practices such as monitoring, logging, tracing, and alerting.
Apply deep knowledge of data science engineering, including data collection, data cleaning, data analysis, data modeling, and data visualization.
Integrate AIOps tools with IT operations management (ITOM) and IT service management (ITSM) systems, including service desk, change management, and configuration management.
Demonstrate solid leadership skills and ability to lead and empower engineers and data scientists.
Design and communicate the AIOps roadmap, vision, and strategy to the team and the partners.
Collaborate effectively with customers, such as IT managers, business users, vendors, and partners, to ensure alignment and satisfaction.
Play a pivotal role in harnessing AI, generative AI, and machine learning for NVIDIA IT teams.
What we need to see:
Bachelor's degree or higher in computer science, engineering, or related field (or equivalent experience).
15+ years of industry experience in extensive engineering projects, with a particular emphasis on infrastructure automation, distributed systems, and tool development for managing large-scale private or public cloud systems.
5+ years of experience working with AIOps technologies and platforms, and a solid understanding of them.
Proficiency in Python, TensorFlow, PyTorch, or other AI frameworks and libraries.
Proficiency in Python and Go programming; your coding and debugging expertise are pivotal to your success in this role.
Demonstrated commitment to sound software engineering principles and a strong willingness to acquire new skills.
Experience in working with IT systems, tools, and processes such as ITSM, ITOM, monitoring, logging, and alerting.
Ability to work independently and collaboratively in a fast-paced and dynamic environment.
Hands-on experience in designing and implementing end-to-end architecture and large-scale rollout of an AIOps product.
Experience developing Gen AI applications using LLMs and RAG for incident diagnosis, root cause identification, and incident resolution.
Ways to stand out from the crowd:
Proficiency in developing and deploying generative AI solutions such as language models, chatbots, and conversational assistants.
Hands-on experience integrating workflow automation tools with AIOps for incident resolution and self-healing.
Deep background and understanding of Machine Learning: developing, training, and applying machine learning models across large operational datasets.
Experience with pre-training and fine-tuning LLMs and working with ML frameworks such as scikit-learn, XGBoost, PyTorch, and TensorFlow.
Hands-on experience with various AIOps platforms such as BigPanda, Datadog, Moogsoft, ITOM Health, Splunk, Elastic Stack, Dynatrace, New Relic, etc.
NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're a creative individual who thrives on achieving goals and enjoys a dynamic learning environment, then why not seize this opportunity? Apply today!
The base salary range is 248,000 USD - 385,250 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.