Senior ML Infrastructure Engineer
- Health, Medical Science & Generative Biology
- Food Security & Sustainable Agriculture
- Climate Change & Managing CO₂
- Artificial Intelligence & Robotics
Our MLOps team
Join our MLOps team to build the cloud and compute foundation that enables scientific breakthroughs. Deliver reliable, secure platforms and self-service guardrails that accelerate experimentation and turn ideas into results: faster, at scale, and with confidence.
Day-to-day, you might:
- Build, operate, and continuously optimise our high-performance GPU training and inference clusters, focusing on robust, high-availability scheduling, isolation, and automated lifecycle management.
- Drive systems design and implementation for high-throughput data paths, optimising I/O, caching, and data locality across compute and storage (including our current Lustre implementation).
- Proactively benchmark, profile, and resolve performance bottlenecks across the compute, network, and orchestration layers to maximise efficiency for distributed training and inference.
- Establish comprehensive observability, resilience, and automated security controls to ensure compliance and robust operation of sensitive research environments.
- Partner with Research, Data, and Applied teams to forecast capacity and cost for GPU and storage needs, setting quotas and streamlining ML experimentation pipelines.
Requirements
- Proven experience leading the design, build, and operation of high-performance ML compute clusters at scale
- A proactive, autonomous approach to systems design and the proven ability and desire to ideate, co-create and implement optimal solutions
- Exposure to migrating or transforming ML infrastructure from traditional schedulers to modern, containerised systems
- Expertise with high-throughput storage systems for ML/HPC workloads
- Expert-level understanding of GPU architecture, high-speed networking for distributed training, and performance profiling to resolve bottlenecks
- A solid grasp of IaC and CI/CD practices (e.g., Terraform, Argo CD)
Benefits
- Pension
- Life Assurance
- Income Protection
- Private Medical Insurance
- Hospital Cash Plan
- Therapy Services
- Perkbox
- Electric Car Scheme
--
Why work for EIT:
At the Ellison Institute, we believe a collaborative, inclusive team is key to our success. We are building a supportive environment where creative risks are encouraged, and everyone feels heard. Valuing emotional intelligence, empathy, respect, and resilience, we encourage people to be curious and to have a shared commitment to excellence. Join us and make an impact!