Microsoft's vision for Azure Machine Learning (ML) centers on democratizing ML and making it accessible to all enterprises, developers, and data scientists. We are seeking individuals to join the team responsible for serving all internal and external ML workloads. We already serve billions of requests daily, spanning the most cutting-edge scenarios and models across the company.
As a member of the Inference team, you will help build the next generation of model serving. This includes hosting OpenAI models such as ChatGPT, as well as scaling model hosting for Bing and Office, tackling many fascinating challenges at the intersection of AI and Cloud. We are seeking a highly skilled Software Engineer with a deep passion for designing and building exceptionally reliable, highly available platforms capable of supporting model inferencing at massive scale.
In addition to platform development, you will tackle high-throughput, low-latency scenarios and lead performance optimization efforts. This position offers a unique opportunity to thrive in an environment that fosters innovation and collaborative teamwork and upholds the pursuit of excellence, all in alignment with Microsoft's mission.
Required and Preferred Qualifications
- B.Tech or M.Tech in Computer Science, Engineering, Mathematics, or a related field, or equivalent industry experience
- 1+ year(s) of software development experience focused on C/C++ and/or Python
- Knowledge of and experience with OSS, Docker, and Kubernetes, and with the Python and Go programming languages
- Good communication and collaboration skills; a great team player
- Experience working in a geo-distributed team
- Practical experience hosting and running large-scale machine learning models in enterprise-grade applications
- Experience building enterprise-grade applications in C++ and Python
- Experience developing and operating low-latency, high-scale, reliable online services
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Responsibilities:
- Engage directly with key partners to understand state-of-the-art LLMs and diffusion models, and run them at scale in a performant and cost-effective manner
- Leverage the latest hardware improvements in CUDA and InfiniBand, along with a fast-moving software stack, to deliver best-in-class inference
- Anticipate, identify, assess, track, and mitigate project risks and issues in a fast-paced, startup-like environment
- Motivated to build constructive and effective relationships and solve problems collaboratively
- Support production inference for core AI scenarios on one of the largest GPU fleets in the world