CloudWalk

MLOps Engineer (LLM Serving and Infrastructure)

Remote Sao Paulo, Brazil
Terraform, PyTorch, R, Git, Bash, Machine Learning, Kubernetes
Description
Join the CloudWalk Wolfpack as an MLOps Engineer.

Your Mission:
At CloudWalk, we're at the cutting edge of AI, pioneering the use of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) to drive innovation. As an MLOps Engineer, you will play a critical role in operationalizing the work of our LLM Data Scientists. Your expertise will ensure the smooth deployment, efficient management, and scalable performance of LLMs across our extensive infrastructure. You will turn advanced AI research into scalable, high-performance solutions, with a particular focus on optimizing network communication and parallel processing capabilities.

What You’ll Do:

  • Deploy and Manage LLMs: Employ Kubernetes, Terraform, and cloud services to deploy and scale LLMs efficiently, ensuring their adaptability to high-demand scenarios.
  • Optimize Computing Infrastructure: Focus on enhancing GPU utilization, distributed training, bandwidth efficiency between machines, and VPC connections to maximize system performance.
  • Leverage Cutting-Edge Technologies: Utilize libraries such as Hugging Face's Accelerate and PyTorch's torchrun to facilitate parallel training across multiple machines in a cluster, optimizing our AI models' training and inference processes.
  • Collaborate on Innovation: Partner with our R&D team to transition LLM and RAG technologies from conceptual stages to scalable, production-ready systems.
  • Monitor and Improve System Performance: Implement advanced monitoring and logging practices to ensure system reliability and performance, continuously seeking improvements.
  • Stay Updated on Industry Advances: Actively pursue the latest developments in MLOps, cloud computing, and AI technologies to implement innovative solutions and maintain our infrastructure's leading edge.

Technologies You Will Work With:

  • Kubernetes, Terraform, and cloud computing platforms for scalable AI model deployment.
  • CI/CD pipelines, Git for version control, and Bash scripting for operational efficiency.
  • Hugging Face's Accelerate and PyTorch's torchrun for parallel training and optimization across multiple machines.
  • Network infrastructure, including bandwidth optimization and secure VPC connections; a comprehensive understanding of it is essential.

What We Expect From You:

  • Technical Mastery: Solid experience with DevOps, cloud infrastructure, and deploying machine learning models. Expertise in network optimization and parallel computing is crucial.
  • Problem-Solving Mindset: The ability to navigate complex challenges, strategically manage resources, and improve system efficiency.
  • Collaborative Approach: Strong communication skills and the ability to contribute effectively within a dynamic, interdisciplinary team.
  • Lifelong Learner: A commitment to continuous learning, staying abreast of the latest technological advancements, and applying innovative solutions.

Why CloudWalk?
By joining CloudWalk, you become part of a team that's reshaping the future with technological innovations. We cherish creativity, teamwork, and a dedication to excellence. Here, your work contributes to a mission of driving forward technological advancements.

Dare to innovate, dare to impact, dare to join the Wolfpack. Apply now!
