Principal AI Infrastructure Engineer - Hosting
Hamilton Barnes
- Helsinki
- Permanent
- Full-time
- Cluster Architecture: Architecting and scaling high-density GPU training environments utilising advanced InfiniBand interconnects (1000+ node scale).
- System Optimisation: Owning end-to-end performance engineering, translating high-level uptime and speed requirements into concrete IOPS and throughput targets.
- Development & Automation: Crafting high-level Python automation to handle everything from bare-metal provisioning and OS deployment to complex firmware orchestration.
- Strategic Leadership: Defining the technical roadmap for the engineering organisation, spearheading peer reviews, and cultivating top-tier internal talent.
- Monitoring & Reliability: Developing robust observability frameworks to track fabric health and compute baselines while deploying proactive anomaly detection.
- Cross-Team Integration: Acting as the technical bridge between ML research teams, datacenter operations, and hardware procurement to turn theoretical needs into physical capacity.
- Track Record: Deep systems engineering or infrastructure architecture experience in massive-scale environments.
- Specialised Knowledge: Extensive experience managing large-scale HPC or AI clusters with specific expertise in InfiniBand fabrics and NVIDIA Hopper-class hardware.
- Automation: Expert-level Python skills paired with a deep background in Infrastructure-as-Code (Terraform, Ansible, or SaltStack).
- Technical Authority: Proven ability to lead complex engineering workstreams, establish technical standards, and mentor senior-level peers.
- Equity in the form of stock options
- Comprehensive healthcare, lunch, and wellbeing benefits
- €170,000 base per year