This role is a member of the AI/ML Infrastructure Engineering team and is dedicated to implementing and supporting AI/ML infrastructure solutions in cloud and on-premises environments. The role works directly with infrastructure teams and may also interface with data scientists, machine learning engineers, application developers, and quantitative analysts. It functions both as a solutions architect, helping these users implement their own AI/ML solutions, and as a professional services engineer, implementing solutions for them in cloud environments such as AWS, GCP, and Kubernetes.
This is a hands-on developer role. Candidates ideally have experience deploying and supporting their own production-ready AI/ML models in cloud environments, as well as automating the build and management of a broad range of cloud infrastructure using tools like Terraform. Candidates should be familiar with developing unit and functional tests, have experience designing and implementing CI/CD tooling with infrastructure-as-code pipelines, and have knowledge of Linux systems administration, containerization, networking, security, automated configuration and state management, cross-system orchestration, logging, metrics, monitoring, and alerting.
Principal Responsibilities:
• Architect, develop, and maintain internal AI/ML infrastructure components, frameworks, and offerings
• Architect, develop, and maintain AI/ML solutions for customers in cloud environments
• Help customers architect, develop, and maintain their own AI/ML solutions in cloud environments
• Implement CI/CD pipelines that include application tests, security tests, and gates
• Implement availability, security, and performance monitoring and alerting for AI/ML solutions
• Automate data resiliency and replication for AI/ML models
• Manage multiple environments and promote code between them
• Automate systems configuration and orchestration using tools such as Terraform, Chef, Ansible, or Salt
• Automate creation of machine images and containers
Required Qualifications/Skills:
• 6+ years of experience designing and supporting production cloud environments
• Experience consulting with customers to develop AI/ML solutions
• Experience developing collaboratively, including infrastructure as code, preferably in Python
• Systems engineering knowledge, including understanding of Linux, security, and networking
• Experience with cloud templating tools such as Terraform
• Experience with AI/ML frameworks (e.g., TensorFlow, PyTorch)
• Experience with distributed computing tools (e.g., Ray, Dask)
• Experience with model serving tools (e.g., vLLM, KFServing)
• Experience with building, monitoring, and alerting on logs and metrics
• Knowledge of cloud networking, including connectivity, routing, DNS, VPCs, proxies, and load balancers
• Knowledge of cloud security, including IAM, certificate management, and key management
• Excellent written and verbal communication skills
• Excellent troubleshooting and analytical skills
• Self-starter able to execute independently, on a deadline, and under pressure
Other Jobs from Millennium Management
Full Stack Developer – Data Warehouse
ML Infrastructure Engineer
Rapid Application Developer