Why do you charge job seekers to use EchoJobs?

Unlike bigger job sites, we treat job seekers as our customers. A small fee gives you curated access to the best companies and up-to-date jobs, and this focus lets us deliver a more personalized and effective job search experience.

How many software engineering jobs are on EchoJobs?

We've got over 200,000 jobs from 15,000+ vetted companies. No fake or sleazy jobs here!

So, where do the jobs come from?

We aggregate jobs from 15,000+ companies' career pages, so you can be sure that you're getting the most up-to-date and relevant jobs.

What makes EchoJobs different?

We're the only job board *for* software engineers, *by* software engineers… in case you needed a reminder! We add thousands of new jobs daily and offer powerful search filters just for you. 🛠️

How often are new jobs added?

Every single hour! We add 2,000-3,000 new jobs daily, so you'll always have fresh opportunities. 🚀

How fast can I find a job?

Typically, job searches take 3-6 months. EchoJobs helps you spend more time applying and less time hunting. 🎯

How often should I check EchoJobs?

Check daily! We're always updating with new jobs. Set up job alerts for even quicker access. 📅

AI Model Training/Inference Engineer at AMD

Description

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

Responsibilities

Design, develop, and optimize core training operators on AMD GPUs, including GEMM, Grouped GEMM, Attention, DeepEP, and related kernels, with a strong focus on maximizing performance.
Analyze performance bottlenecks in large-scale model training workloads and drive end-to-end system-level optimizations.
Work closely with hardware, compiler, runtime, and framework teams to continuously improve the performance, stability, and usability of the ROCm ecosystem.
Contribute to advanced research and development initiatives, including next-generation GPU architectures, compute–communication fusion, and AGI-driven automatic generation of high-performance operators.

Qualifications

Solid foundation in computer architecture and high-performance computing.
Strong proficiency in C/C++, with hands-on experience in GPU programming and parallel development using HIP, CUDA, and Triton, and strong engineering implementation capabilities.
Deep understanding of parallel computing principles and GPU execution models, with proven skills in performance profiling, analysis, and optimization.
Practical experience with large-scale model training pipelines and operator-level performance optimization.
Strong collaboration skills and the ability to work effectively across teams and technical domains.

Preferred Qualifications

Familiarity with modern GPU architectures and performance tuning techniques (e.g., AMD CDNA4, NVIDIA Blackwell).
Demonstrated experience optimizing high-performance kernels such as GEMM, Attention, Grouped GEMM, and DeepEP.
Experience with collective communication primitives (e.g., AllReduce, All-to-All, ReduceScatter) and performance optimization.
Experience in one or more of the following areas: low-precision computing (FP8 / FP4), compute–communication overlap, compiler optimizations, automatic generation of high-performance operators.
Experience developing or optimizing large-scale training systems such as Megatron-LM, TorchTitan, or similar frameworks.

ACADEMIC CREDENTIALS: Bachelor’s or Master’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent

Benefits offered are described: AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here. This posting is for an existing vacancy.

Tags: No, CNY ¥563,430.00/Yr., CNY ¥804,900.00/Yr., Global Careers (do not use for US or Canada)

There are more than 50,000 engineering jobs:

Subscribe to a membership to unlock all jobs

Engineering Jobs

60,000+ jobs from 4,500+ well-funded companies

Updated Daily

New jobs are added every day as companies post them

Refined Search

Use filters such as skill and location to narrow results

Become a member

🥳🥳🥳 452 happy customers and counting...

Overall, over 80% of customers chose to renew their subscriptions after the initial sign-up.

To try it out

For active job seekers

For passive job seekers

Cancel anytime

What Fellow Engineers Say