genesis Ashby · Posted 27d ago

Member of Technical Staff, Training (Paris, London)

Paris, France · Full-time

Engineering & Research · Full-time · Ashby
Description

What You’ll Do

- Drive down wall-clock time to convergence by profiling and eliminating bottlenecks across the foundation model training stack, from data pipelines to GPU kernels
- Design, build, and optimize distributed training systems (PyTorch) for multi-node GPU clusters, ensuring scalability, robustness, and high utilization
- Implement efficient low-level code (CUDA, cuDNN, Triton, custom kernels) and integrate it seamlessly into high-level training frameworks
- Optimize workloads for hardware efficiency: CPU/GPU compute balance, memory management, data throughput, and networking
- Develop monitoring and debugging tools for large-scale runs, enabling rapid diagnosis of performance regressions and failures

What You’ll Bring

- Deep experience in distributed systems, ML infrastructure, or high-performance computing (8+ years)
- Production-grade expertise in Python
- Low-level performance mastery: CUDA/cuDNN/Triton, CPU–GPU interactions, data movement, and kernel optimization
- Scaling at the frontier: experience with PyTorch and training jobs using data, context, pipeline, and model parallelism
- System-level mindset with a track record of tuning hardware–software interactions for maximum utilization