Senior Agentic System & Application Engineer
Advanced Micro Devices
- Helsinki
- Permanent
- Full-time
- Build agentic frameworks for planning, tool-use, memory, and long-running workflow execution.
- Automate env setup, containerization, and reproducible dependency resolution for target workloads.
- Compose scheduler jobs and launch runs with resource-aware configuration and policy constraints.
- Diagnose failures from logs/telemetry, propose fixes, and automate safe remediation loops.
- Profile bottlenecks, apply optimizations, and benchmark candidates with variance control.
- Ship an onboarding MVP: env + container + job compose/launch + failure triage, with reproducible run bundles.
- Deploy to 1 production workload; cut time-to-first-successful-run and deliver measurable speedups.
- Add optimization loop for one hotspot kernel with correctness tests, rollback, and perf regression gates.
- MS/PhD (or equivalent) in CS/CE/EE or related field.
- Shipped production agentic AI systems: orchestration, tool-use, memory, and long-horizon reliability.
- Built LLM inference/serving on-premises or using cloud APIs, with fallbacks, rate-limit handling, and observability.
- Strong Python plus one systems language: C/C++ or Rust/Go.
- Delivered reliable systems with evals, regression gates, CI/CD, and incident-grade debugging.
- Integrated automation with real tooling: code execution, build/test, schedulers, and telemetry.
- HPC workflows: Slurm/Kubernetes, multi-node GPU runs, and containerization.
- GPU profiling and performance analysis with ROCm/CUDA tools and trace workflows.
- LLM pretraining or tuning (SFT, RL) and evaluation workflows.
- Kernel authoring/tuning/codegen (Triton, CUDA/HIP) and production integration.
- GEMM/Matmul optimization, fusion, mixed precision, and numerical validation.
- Compiler stacks (MLIR/LLVM, XLA-like flows) and performance portability.