Solutions Architect - AI / ML - Training & GPU infra

Up to EUR 300,000, Full time, Remote


AI/ML Solutions Architect – Distributed Training & GPU Infrastructure

Company

Join a fast-moving AI infrastructure team working on the cutting edge of large-scale ML workloads. This role is ideal for engineers who enjoy solving deep technical challenges in distributed training, multi-GPU systems, and scalable AI inference infrastructure. You will work directly with AI-focused clients, helping them get the most out of modern GPUs (H100, B200, etc.) and ML frameworks such as PyTorch (and JAX in some environments).

Team & Responsibilities

Work alongside senior AI and infrastructure engineers building large-scale GPU platforms. As part of the customer solutions team, you will:

Design and validate production-grade distributed training (primary) and large-scale inference architectures on large GPU clusters, typically tens to thousands of GPUs
Work hands-on with customers to debug, optimize, and scale ML workloads across multi-node GPU environments
Act as a technical authority on GPU performance, networking, and schedulers, making trade-offs at scale and translating customer needs into concrete platform requirements
Collaborate closely with engineering, product, and R&D to influence roadmap decisions based on real-world ML workloads
This is a hands-on, technical role; you are expected to work directly in customer environments, not only advise at a high level

Required skills and experience

Hands-on experience designing and operating production-grade, multi-node GPU workloads for training or inference
Strong background in distributed deep learning (PyTorch Distributed, DeepSpeed) on GPU clusters
Deep understanding of GPU architecture and interconnects (H100/A100 class, NVLink, InfiniBand)
Experience with Kubernetes or Slurm and performance tuning using GPU profiling and monitoring tools

This role is not a fit if your experience is limited to single-node training, high-level AI strategy, or non-production research environments. We are looking for engineers and architects who thrive at the intersection of AI workloads and large-scale infrastructure.

What's offered

Location: Remote from anywhere in Europe

Total compensation up to EUR 300k (base + variable), depending on level and experience

Originally posted on Himalayas


Company: The Next Chapter W&S
