Company: AMD
Location: Markham, ON, Canada
Career Level: Mid-Senior Level
Industries: Technology, Software, IT, Electronics

Description

WHAT YOU DO AT AMD CHANGES EVERYTHING 

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE:

The Quality Returns Debug Team is looking for an experienced GPU PCBA Debug and Failure Analysis Engineer to perform board-level (PCBA) failure analysis on customer and factory failures of GPU accelerators. You will reproduce reported failures, isolate the cause of failure, and work closely with cross-functional teams, including design, validation, firmware and manufacturing, to drive root cause analysis and corrective actions. Your contributions will directly impact product quality, reliability, and customer satisfaction.

 

THE PERSON:

The ideal candidate is a skilled engineer with a strong analytical mindset and hands-on approach to technical problem-solving. They excel in both collaborative and independent environments, demonstrating initiative, adaptability, and a drive to tackle new challenges in fast-paced settings. With experience in system integration and High Performance Computing, they bring a proactive attitude and the ability to manage multiple tasks with limited supervision. Their excellent communication skills support effective teamwork and documentation, while their curiosity and persistence enable them to deliver high-quality solutions through thorough failure analysis and repair.

 

KEY RESPONSIBILITIES:

  • Support internal and external requests to troubleshoot PCBA-level AMD GPU product failures for customer quality support within expected timelines.
  • Perform thorough incoming visual inspection and document the condition of all units submitted for analysis.
  • Triage and communicate with the contract manufacturer and internal AMD teams (such as Design, BIOS, firmware, memory, I/O, display, diagnostics, Test Engineering, Board operations, etc.) as needed to isolate the cause of failure.
  • Document all findings into the FA database and create a complete failure analysis report for customer consumption.
  • Present findings to key stakeholders, including senior management.
  • Continuously improve failure analysis processes and techniques, and create procedures documenting the steps to follow.
  • Create new scripts to reduce debug time.
  • Document SOPs.
  • Plan for lab expansion and power needs.

PREFERRED EXPERIENCE:

  • Proven expertise in PCBA diagnostics, failure analysis, and debug techniques for computing products, from NPI through production.
  • Skilled in FA lab operations, including maintenance, SOP documentation, and planning for lab expansion (power and liquid cooling requirements).
  • Ability to perform incoming visual inspections and maintain detailed failure analysis reports for customer consumption.
  • Hands-on experience with system integration, including assembling, installing, and configuring computer systems and servers, as well as updating BIOS and firmware.
  • Experience using lab equipment such as oscilloscopes, logic analyzers, and custom/in-house test tools for hardware validation.
  • Familiarity with PCBA manufacturing processes and ability to relate failures to specific steps; IPC-A-610 quality standards training preferred.
  • Proficiency in Linux/Ubuntu and Windows environments, with strong skills in Python and shell scripting for automation and debug time reduction.
  • Skilled in MS Excel for data analysis and reporting.
  • Ability to read schematics, interpret datasheets, identify components, and perform soldering/rework for debug.
  • Knowledge of high-speed digital design, memory interfaces (HBM, GDDR), PCIe bus, and display outputs (DP, HDMI); deep understanding of GPU architecture and system integration.
  • Solid understanding of firmware, drivers, and hardware interactions, with capability to tune firmware as needed.
  • Experience with GPU data center infrastructure and AI/ML technologies.
  • Experience in server installation, configuration, and maintenance.
  • Familiarity with test equipment installation and planning for advanced cooling/power needs.

ACADEMIC CREDENTIALS:

Bachelor's or master's degree in electrical or computer engineering preferred.

 

LOCATION:

Markham, ON, Canada

 

 

 

Benefits offered are described in "AMD benefits at a glance."

 

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.

