Rakuten

AI/ML Infrastructure Engineer - Machine Learning & Deep Learning Engineering Department (MDE)

Tokyo, Japan
TensorFlow, PyTorch, Docker, Machine Learning, Deep Learning, Kubernetes, Python, Go

Job Description:

Business Overview

Rakuten is one of the world's leading e-commerce site operators, with a mission to empower people and society through the internet. We are striving to become a global innovation company while expanding various businesses.

Department Overview

The Machine Learning & Deep Learning Engineering Department (MDE) is a group of engineers and scientists who specialize in natural language processing (NLP), search, and recommendation systems. We conduct state-of-the-art research and apply cutting-edge technologies, such as transformer models, dense retrieval, distributed GPU training, and large-scale machine learning, to a variety of Rakuten products and services. We are looking for passionate experts in machine learning research and engineering to join us in our journey to define the next-generation e-commerce experience.

The GPU Engineering team is at the forefront of delivering a robust GPU infrastructure and cutting-edge ML platforms that power the development and deployment of ML models across various teams of ML engineers and researchers within Rakuten. Use cases include semantic search, visual search, recommendation, LLMs, and more.

Position:

Why We Hire

As an MLOps Engineer in the GPU Engineering team, you will be at the heart of Rakuten's ML operations, focusing on the deployment, monitoring, and management of ML models. You'll work closely with ML Engineers across the department to provide a reliable infrastructure that supports rapid model development, training, and deployment. Your expertise will contribute to the efficiency and scalability of our ML projects, directly impacting Rakuten's product innovation and service excellence. 

Position Details

Key Responsibilities: 

- Design, implement, and maintain ML pipelines for automated training, testing, and deployment of machine learning models, ensuring scalability and efficiency. 

- Work collaboratively with ML engineers to troubleshoot and optimize model performance, ensuring models are production-ready and meet defined SLAs. 

- Manage and monitor Kubernetes clusters and related infrastructure to support high-volume ML workloads, implementing best practices for security and resilience. 

- Develop and maintain documentation on ML infrastructure, tools, and best practices, providing guidance and support to ML teams. 

- Continuously evaluate and incorporate new technologies and tools to enhance the ML platform's capabilities and performance. 

Mandatory Qualifications:

- Experience: 3 or more years of MLOps experience, with a proven track record of managing ML infrastructure

- Kubernetes Proficiency: Deep understanding of Kubernetes (K8s) infrastructure and its application in managing ML workloads

- Programming Skills: Proficiency in Python or Golang

- Proven experience with Linux OS, with the ability to maintain system performance, ensure proper configuration, and leverage tools to troubleshoot software, hardware, and network-related issues

- Education: Bachelor’s or higher degree in Computer Science, Engineering, or a related technical discipline

- Strong communication and teamwork skills

- Passion for technology and solving challenging problems

Desired Qualifications:

- Familiarity with ML frameworks (e.g., TensorFlow, PyTorch) and CUDA

- CI/CD Tools: Experience with CI/CD tools (e.g., GitHub Actions, Jenkins, GitLab CI) and container technologies (e.g., Docker)

- Experience training large models, including LLMs

Languages:

English (Overall - 4 - Fluent)
