Senior Framework Engineer — Diffusion Inference
Advanced Micro Devices
- Helsinki
- Permanent
- Full-time
- Develop and maintain a diffusion inference framework for image/video generation with clean APIs and strong compatibility with widely used diffusion ecosystems (e.g., HuggingFace diffusers-style pipelines and model wrappers).
- Own scalable parallel inference features for DiT workloads, single-node and multi-node, including:
  - Sequence-parallel approaches (e.g., Unified Sequence Parallelism)
  - Pipeline-style methods
  - CFG-parallel execution strategies
- Integrate optimized operator backends (attention, GEMM, quantized paths) by bridging Python/C++ layers and ensuring correctness and high performance.
- Ship production-grade packaging and releases, including:
  - Containers (Docker)
  - Versioned artifacts
  - Dependency hygiene
  - Pip-installable distributions
- Build continuous testing & benchmarking infrastructure, including correctness tests, performance regression gating, and "known-good configurations" across GPU SKUs and cluster topologies.
- Collaborate across the GPU software stack (runtime, collectives, libraries, compilers) and translate framework needs into actionable upstream improvements.
- Support strategic customers by mapping real-world inference constraints (latency, throughput, memory) into framework features, reference configurations, and reproducible deployment recipes.
- Communicate clearly around technical tradeoffs, performance bottlenecks, and roadmap decisions with teams and stakeholders across Europe, the US, and Asia.
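To illustrate the CFG-parallel strategy mentioned above: the conditional and unconditional denoising passes of classifier-free guidance are independent, so they can run concurrently on separate GPUs, with one small exchange to combine the results. Below is a toy single-process sketch of that combination step; all function and variable names are illustrative, not part of any actual framework API.

```python
def denoise(latent, prompt_embedding):
    # Stand-in for a DiT forward pass; returns a fake "noise prediction".
    return [x + 0.1 * e for x, e in zip(latent, prompt_embedding)]

def cfg_combine(uncond, cond, guidance_scale):
    # Standard classifier-free guidance formula: uncond + s * (cond - uncond).
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]

latent = [0.5, -0.2, 0.1]
cond_emb = [1.0, 1.0, 1.0]    # prompt embedding
uncond_emb = [0.0, 0.0, 0.0]  # empty-prompt embedding

# In a CFG-parallel deployment these two calls would run concurrently
# on different ranks instead of sequentially in one process:
cond_pred = denoise(latent, cond_emb)
uncond_pred = denoise(latent, uncond_emb)

guided = cfg_combine(uncond_pred, cond_pred, guidance_scale=7.5)
```

The point of the strategy is that the two forward passes dominate the cost, so distributing them roughly halves per-step latency at the price of one small collective to form `guided`.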
- Implement “day-1” enablement for a new image/video DiT model, including wrapper creation, pipeline integration, correctness verification, and performance baseline setup.
- Develop or improve distributed inference recipes (e.g., USP + communication-aware tuning) and validate across multi-node clusters.
- Integrate a new high-performance attention backend (e.g., FlashAttention v3-class) and expose it cleanly via the framework API.
- Build a regression suite targeting attention-heavy and GEMM-heavy workloads and add CI gates to prevent performance regressions.
- Define and drive a framework roadmap covering features, milestones, and prioritized upstream kernel/runtime requests.
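The CI performance gate described above can be sketched minimally: compare a fresh benchmark run against a stored baseline and fail the job when latency regresses past a tolerance. Benchmark names, numbers, and the 5% threshold below are illustrative assumptions, not a prescribed policy.

```python
def gate(baseline_ms, current_ms, tolerance=0.05):
    """Return (passed, regression_ratio) for one benchmark."""
    ratio = (current_ms - baseline_ms) / baseline_ms
    return ratio <= tolerance, ratio

def run_gates(baseline, current, tolerance=0.05):
    # baseline/current map benchmark name -> median latency in ms.
    failures = {}
    for name, base_ms in baseline.items():
        passed, ratio = gate(base_ms, current[name], tolerance)
        if not passed:
            failures[name] = ratio
    return failures

baseline = {"dit_attention_fwd": 12.0, "vae_decode": 30.0}
current = {"dit_attention_fwd": 12.4, "vae_decode": 33.0}

# Only vae_decode regresses beyond the 5% tolerance here.
failures = run_gates(baseline, current)
```

A production version would pull baselines per GPU SKU and cluster topology (the "known-good configurations" above) and use medians over repeated runs to avoid flagging noise.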
- Diffusion inference internals (DiTs, attention scaling, VAE integration points, scheduler loops, memory behavior).
- Distributed inference systems (multi-GPU & multi-node), including communication patterns and latency-sensitive execution.
- Framework plumbing: integrating optimized kernels/operators into user-facing APIs with correctness and reliability guarantees.
- Release engineering: containers, dependency management, CI pipelines, performance dashboards, reproducible benchmarks.
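The "scheduler loop" mentioned under diffusion inference internals has a simple shape worth sketching: at each timestep the model predicts noise and the scheduler steps the latent. The toy below uses a trivial Euler-style update with made-up numbers; a real pipeline would call the DiT transformer and a proper scheduler (e.g., DDIM or Euler) in place of these stand-ins.

```python
def fake_model(latent, t):
    # Stand-in for the DiT forward pass: pretend predicted noise is
    # proportional to the current latent (purely illustrative).
    return [0.5 * x for x in latent]

def scheduler_step(latent, noise_pred, dt):
    # Euler-style update: move the latent against the predicted noise.
    return [x - dt * n for x, n in zip(latent, noise_pred)]

latent = [1.0, -2.0]
timesteps = [1.0, 0.75, 0.5, 0.25]
for t in timesteps:
    noise_pred = fake_model(latent, t)
    latent = scheduler_step(latent, noise_pred, dt=0.25)
```

This loop structure is where most of the listed concerns meet: attention scaling inside the model call, memory behavior across steps, and the points where a VAE decode and parallelism strategies hook in.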
- Strong Python and/or C++ engineering skills (debugging, profiling, testing, navigating complex codebases, clean abstractions).
- Experience with ML frameworks (PyTorch strongly preferred; JAX/TF welcome) and familiarity with diffusion model execution.
- Proven ability to work in GPU-accelerated environments with intuition for performance, memory/compute tradeoffs, and profiling.
- Comfort with containers (Docker) and modern dev workflows (git, CI, build systems).
- Strong cross-functional collaboration and clear technical communication skills.
- Experience with diffusion inference engines or parallel inference frameworks for DiTs (sequence, pipeline, CFG-parallel concepts).
- Exposure to operator libraries such as AITER-style kernel collections (attention/GEMM/quant/comm).
- GPU kernel development experience (HIP/CUDA/Triton) or familiarity with compiler/codegen backends.
- Knowledge of high-performance networking (RDMA, RoCE, InfiniBand, UCX) for multi-node inference.
- Experience building benchmarking and performance regression systems at scale.
- BSc, MSc, PhD, or equivalent experience in Computer Science, Electrical Engineering, or a related field.