AMD

Platform Engineer

Remote Austin, TX
USD 134k - 191k
Ansible Terraform Python Go Kafka Kubernetes
Description
WHAT YOU DO AT AMD CHANGES EVERYTHING

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

AMD together we advance_

THE TEAM:

AMD's Data Center GPU organization is transforming the industry with our AI-based graphics processors. Our primary objective is to design exceptional products that drive the evolution of computing experiences, serving as the cornerstone for enterprise data centers, Artificial Intelligence (AI), HPC, and embedded systems. If this resonates with you, come and join our Data Center GPU organization, where we are building amazing AI-powered products with amazing people.

THE ROLE:

The Software Platform Architecture (SPA) team has an open position for a Platform Engineer. SPA is the hardware-accelerated, software-focused wing of the newly formed Cluster Platform Engineering (CPE) team at AMD and rolls up through the Data Center GPU (DCGPU) business unit. This role will be responsible for helping to select, curate, design, automate, and document all software underpinning an entire full-stack AI-focused platform. This work is not net-new code development; instead, it focuses on choosing the right software properties and on how data and operations flow through that software, to ease the adoption and operation of large-scale GPU-accelerated AI and HPC (High Performance Computing) cluster systems within AMD.

SPA works closely with the Site Reliability Engineering (SRE) and Data Center Operations (DCOps) teams, who tackle day-to-day commissioning and operations of the clusters under CPE’s control. SPA’s work is measured by how much we reduce operational toil while increasing the rigor and repeatability of processes for the SRE and DCOps teams. SPA has design responsibility for the full Day 0 – Day 2 software platform. This position is an exciting opportunity to help build a platform leveraging AMD’s industry-leading infrastructure and to choose a world-class software stack in support of this critical growth area for AMD, its engineering teams, its customers, and the industry.

The Platform Engineer role in SPA cuts across all hardware and software infrastructure, up through platform software, consumption portals, and ultimately the real goal: having the AI application software experience be optimized for AMD. The AI applications in focus are those that best leverage the AMD Instinct GPU and AMD EPYC CPU in cluster systems.

THE PERSON:

  • Excellent communication and interpersonal skills
  • The ability to interact with various teams in order to account for their needs in platform design
  • Technology Orientation: an affinity for spotting application and platform trends, and for testing and validating those trends so that AMD can take the best and earliest advantage of them
  • Outstanding Integrity: a thoroughly honest and forthright individual who is upfront and direct with subordinates, peers, and the management executives to whom he/she reports
  • Effective at working in a culturally diverse organization

KEY RESPONSIBILITIES:

  • Work with all CPE teams to validate that SPA’s platform designs are Day 0 – Day 2 ready and able to integrate with other teams’ workflows
  • Work with the Release Engineering team to automate the application of updates and system configuration management tools
  • Maintain tight interaction with the SRE team to continually improve how SPA’s designs are integrated into an operational change process and cadence
  • Ensure that all applications and infrastructure elements expose/export telemetry that is centrally managed and used to guide the management of the entire platform
  • Write the glue code necessary to connect systems to each other where no native mechanisms exist
  • Ensure all platform designs reflect Security as a core principle, with input to Policy and Guidelines
  • Participate in platform and project retrospectives/blameless post-mortems

PREFERRED EXPERIENCE:

  • Experience in full-stack (infra, platform, application) multi-site, multi-region solutions at scale
  • Strong multi-distro Linux knowledge across deployment, configuration, and management
  • Cloud Native platform implementation, with Kubernetes as application dial-tone all the way up through Service Mesh and multi-tenant application deployment and management
  • Strong knowledge of multiple virtualization and containerization technologies like KVM, Xen, and Kubernetes; OpenShift a bonus
  • Experience with automation platforms at scale using Ansible, Terraform / OpenTofu
  • Some experience with application and platform telemetry frameworks, such as OpenTelemetry
  • Strong networking knowledge with a primary focus on L3 and path-vector routing protocols
  • Experience with RDMA/RoCE and InfiniBand a plus
  • Demonstrated record of successfully building and delivering complex operational solutions at scale, with the ability to learn new systems quickly in a rapidly changing environment
  • Python and Golang experience a plus
  • Platform message-bus (such as Kafka) experience
  • Remote position with ability to travel when required (up to 10%)

ACADEMIC CREDENTIALS:

BSEE or relevant technical degree; MSEE or MBA is desirable and preferred

At AMD, your base pay is one part of your total rewards package. Your base pay will depend on where your skills, qualifications, experience, and location fit into the hiring range for the position. You may be eligible for incentives based upon your role, such as either an annual bonus or sales incentive. Many AMD employees have the opportunity to own shares of AMD stock, as well as a discount when purchasing AMD stock if voluntarily participating in AMD’s Employee Stock Purchase Plan. You’ll also be eligible for competitive benefits described in more detail here.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.


Tags: No, USD $134,120.00/Yr., USD $191,600.00/Yr., US Careers (External)
AMD
Cloud Computing Computer Embedded Systems GPU Hardware Semiconductor

