Senior AI Performance Engineer

Advanced Micro Devices

Helsinki
Permanent
Full-time

1 month ago

Job Description:

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

SENIOR AI PERFORMANCE ENGINEER

THE ROLE:

AMD is looking for a performance-obsessed engineer to push AI inference performance to the limit on AMD GPUs. You will work end-to-end across the stack: profiling, diagnosing, and optimizing leading models on customer-relevant serving configurations (e.g. agentic coding, long-context, high-throughput serving). You tackle hard performance problems across strategic customer engagements and leave behind measurable uplifts. This is not a sustaining role: every engagement is different, every optimization leaves a lasting impact.

THE PERSON:

You can take an AI workload, understand it top to bottom, and make it faster. You are comfortable profiling a serving deployment, diagnosing a kernel-level bottleneck, and presenting results to a customer's engineering team. You understand GPU kernel performance: not just how to use profiling tools, but how to reason about occupancy, cache behavior, memory coalescing, and instruction-level bottlenecks. You are AI-fluent: you leverage AI agents and tools daily to accelerate your work. You move fast and measure everything.

KEY RESPONSIBILITIES:

Drive performance optimization across the stack on leading models and customer-relevant serving configurations, closing competitive gaps through kernel and systems-level optimizations
Profile, diagnose, and resolve cross-stack performance bottlenecks, from GPU kernels and operator dispatch to framework-level scheduling and multi-node communication
Diagnose kernel-level performance issues using profiling tools: identify occupancy limitations, L2 cache thrashing, register pressure, memory coalescing issues, and similar problems, then translate findings into actionable optimizations
Participate in customer-facing technical engagements: present findings, recommend optimizations, and deliver measurable performance uplifts
Integrate and optimize custom kernels (Triton, Gluon, CK, PyDSL, ASM, AITER) within serving frameworks, understanding dispatch paths, shape extraction, and backend selection
Optimize multi-node distributed inference: communication-compute overlap, parallelism strategies, and scale-out performance
Contribute to shared performance optimization methodology across the broader team
Leverage AI agents to accelerate daily work
Upstream optimizations into open-source frameworks such as vLLM, SGLang, and PyTorch

PREFERRED EXPERIENCE:

5+ years of software development experience in GPU computing, AI systems, or high-performance computing
Hands-on experience with AI serving frameworks (vLLM, SGLang, TensorRT-LLM, or similar) and their internals
Strong background in end-to-end workload profiling and bottleneck diagnosis
Understanding of GPU kernel performance characteristics: occupancy, register and LDS pressure, memory coalescing, cache utilization, and instruction-level bottlenecks
Ability to read and reason about kernel-level profiling data and translate it into concrete optimization actions
Understanding of model architectures (transformers, MoE, diffusion), inference paradigms (speculative decoding, prefill-decode disaggregation, continuous batching), and how they map to hardware
Experience with custom kernel development or integration (HIP, CUDA, Triton, CK, or similar)
Understanding of multi-GPU and multi-node distributed systems
Strong proficiency in Python and C++
Customer-facing technical experience
Daily user of AI agents and development tools
Strong Linux systems knowledge
Excellent written and verbal English communication skills

ACADEMIC CREDENTIALS:

Bachelor's or Master's in Computer Science, Computer Engineering, Electrical Engineering, or equivalent.

LOCATION:

Helsinki (Finland), Stockholm (Sweden), or Amsterdam (the Netherlands) preferred

#LI-MH3 #LI-HYBRID

Benefits offered are described: .

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available

This posting is for an existing vacancy.
