Cerebras Systems

Machine Learning Research Engineer

Toronto, Ontario; Sunnyvale, CA
USD 140k - 200k
TensorFlow, PyTorch, Python, Deep Learning, Machine Learning
Description

Cerebras is developing a radically new chip and system to dramatically accelerate deep learning applications. Our system runs training and inference workloads orders of magnitude faster than contemporary machines, fundamentally changing the way ML researchers work and pursue AI innovation.

We are innovating at every level of the stack – from chip, to microcode, to power delivery and cooling, to new algorithms and network architectures at the cutting edge of ML research. Our fully-integrated system delivers unprecedented performance because it is built from the ground up for the deep learning workload.

Cerebras is building a team of exceptional people to work together on big problems. Join us!

About The Role

Cerebras' fully-integrated system is built from the ground up with a singular focus on ML, where the hardware, software, and ML algorithms are co-designed in tight collaboration. The foundation is the Wafer Scale Engine (WSE), a single chip that is 56x larger than a GPU and has orders of magnitude higher memory bandwidth and fully unstructured sparsity acceleration. On top of the WSE, there is a cluster architecture that scales to train the largest neural networks in the world.  

This is an applied research engineer role, working in tight collaboration with senior researchers to co-design state-of-the-art ML algorithms on this unique specialized architecture. The role focuses on designing novel software architecture, workflows, analysis tools, and infrastructure for those algorithms. Areas of research beyond what’s possible on GPUs include new sparsity algorithms, unique approaches to model scaling and parallelism, and novel efficient training techniques.

Research Directions

Our research is focused on improving state-of-the-art large language models (e.g. BERT, GPT) and computer vision models (e.g. ResNet, Vision Transformer) in many dimensions unique to the Cerebras architecture, such as: 

  • Sparse and low-precision training algorithms for reduced training time and increased accuracy 
  • Compute- and memory-efficient training techniques such as reversibility and low-rank methods
  • Scaling laws for increasing model size: accuracy/loss, architecture scaling, hyperparameter transfer 
  • Optimizers, initializers, and normalizers to improve distributed training on large-scale clusters

Responsibilities

  • Design and develop ML workflows and user interfaces for novel algorithms 
  • Design and develop software for scaling models and large-scale experimentation 
  • Design and develop analysis tools to drive efficient research insight, including dataset cleaning and analysis of training dynamics and gradient quality
  • Publish and present research at leading machine learning conferences 

Requirements

  • Strong grasp of machine learning fundamentals and computer science 
  • Experience with scaling state-of-the-art models on large distributed clusters 
  • Deep knowledge of machine learning frameworks, such as TensorFlow and PyTorch 
  • Deep knowledge of distributed training concepts and frameworks, such as Megatron and DeepSpeed
  • Fluency in a programming language, such as Python or C++

Preferred

  • Experience in research environments in academic or industry labs 
  • Track record of relevant publications/patents 

Salary

  • $140,000 - $200,000 (US only, based on experience, location and other determining factors)

Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.

