
Description
WHAT YOU DO AT AMD CHANGES EVERYTHING
We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world's most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.
AMD together we advance_
THE ROLE:
As an MTS Software Engineer in AMD's AI Models Group, you'll be at the forefront of enabling and accelerating large-scale AI model training on AMD GPUs. Our team works across the AI stack, from model reproduction and innovation to performance tuning and ecosystem enablement, to ensure AMD's ROCm platform fully supports cutting-edge AI workloads.
You'll contribute to reproducing and optimizing state-of-the-art models like LLMs, diffusion models, and reinforcement learning agents. Your work will directly impact ROCm's competitiveness and ease of use in the machine learning community through performance analysis, software tooling, and open-source engagement.
THE PERSON:
We're looking for a hands-on engineer with a strong background in deep learning, large-scale model training, and system-level performance tuning. You are curious, self-driven, and excited about working on both research and production code. You care deeply about the usability and performance of AI infrastructure, and you're motivated to help build a best-in-class developer experience on AMD GPUs.
KEY RESPONSIBILITIES:
- Enable and optimize large-scale AI model training on ROCm to validate AMD GPUs' training capability.
- Analyze training performance and collaborate with compiler/runtime teams to improve throughput and efficiency (a brief profiling sketch follows this list).
- Explore model-level or algorithmic innovations to improve convergence speed or resource efficiency.
- Build and publish user-friendly Docker images and example projects to improve out-of-the-box usability.
- Engage with the open-source community through technical blogs, tutorials, and upstream contributions.
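By way of illustration only, the sketch below shows the kind of performance analysis this work involves: profiling a few training steps with torch.profiler on a ROCm build of PyTorch. The model, tensor shapes, and hyperparameters are placeholders invented for this example and are not part of the role.

```python
# A minimal, hypothetical sketch of profiling a few training steps with
# torch.profiler on a ROCm build of PyTorch. Model, shapes, and
# hyperparameters are placeholders for illustration only.
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

use_gpu = torch.cuda.is_available()                          # ROCm GPUs appear under the "cuda" device
print("HIP runtime:", getattr(torch.version, "hip", None))   # non-None only on ROCm builds

device = torch.device("cuda" if use_gpu else "cpu")
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()
x = torch.randn(64, 1024, device=device)
y = torch.randn(64, 1024, device=device)

activities = [ProfilerActivity.CPU] + ([ProfilerActivity.CUDA] if use_gpu else [])
with profile(activities=activities) as prof:                 # capture CPU ops and GPU kernel time
    for _ in range(5):
        optimizer.zero_grad(set_to_none=True)
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

sort_key = "cuda_time_total" if use_gpu else "self_cpu_time_total"
print(prof.key_averages().table(sort_by=sort_key, row_limit=10))
```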
PREFERRED EXPERIENCE:
- Solid understanding of AI model training pipelines, from data loading to optimization and evaluation.
- Familiarity with mainstream deep learning frameworks such as PyTorch, Hugging Face Transformers, or similar.
- Hands-on experience with distributed training frameworks like DeepSpeed, FSDP, or Megatron-LM (see the FSDP sketch after this list).
- Exposure to large language models (LLMs) — including pretraining, fine-tuning, or inference optimization.
- Experience or interest in reinforcement learning or its applications in training large reasoning models (e.g., PPO or DPO).
- General comfort working across the ML system stack — from models and algorithms down to runtime behavior and GPU performance.
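As a rough illustration of the distributed-training experience listed above, here is a minimal FSDP sketch. It assumes a launch via torchrun (e.g. `torchrun --nproc_per_node=8 train_fsdp.py`); the script name, model, and shapes are hypothetical and not taken from the posting.

```python
# Illustrative sketch of sharding a model with PyTorch FSDP; assumes a
# torchrun launch. The model and data are placeholders for this example.
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")              # on ROCm builds this dispatches to RCCL
    local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
    torch.cuda.set_device(local_rank)

    model = nn.Sequential(nn.Linear(2048, 8192), nn.GELU(), nn.Linear(8192, 2048)).cuda()
    model = FSDP(model)                          # shard parameters, gradients, and optimizer state
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):
        x = torch.randn(32, 2048, device="cuda")
        loss = model(x).pow(2).mean()            # dummy objective for illustration
        loss.backward()
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

On ROCm builds of PyTorch, the same "nccl" backend string is backed by RCCL, so the training code is unchanged relative to a CUDA setup.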
ACADEMIC CREDENTIALS:
- Bachelor's or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent.
Benefits offered are described: AMD benefits at a glance.
AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.