Mirantis Himalayas · Posted today

Senior AI Infrastructure & Platform Operations Engineer

Full-time · Remote

Indexed description

Role Overview

As a Senior AI Infrastructure & Platform Operations Engineer, you will serve as a technical leader within the operations organization, providing deep expertise across infrastructure, networking, platform operations, and service reliability. You will be responsible for driving operational excellence across complex production environments while acting as a key escalation point for critical incidents and challenging technical issues.

What You Will Do

Lead the investigation and resolution of complex infrastructure, networking, and platform-related incidents.
Support large-scale NVIDIA GPU infrastructure and high-performance networking environments.
Troubleshoot complex Linux, Kubernetes, networking, storage, and hardware-related issues.

Requirements

7+ years of experience in infrastructure operations, platform operations, site reliability engineering, network operations, cloud operations, datacenter operations, or related technical roles.
Expert-level Linux administration and troubleshooting skills.
Strong networking expertise, including experience diagnosing complex performance, connectivity, and reliability issues.
Strong experience operating Kubernetes in production environments.
Experience supporting large-scale production infrastructure and distributed systems.
Proven experience leading technical investigations and managing complex incidents.
Experience performing root cause analysis and driving long-term operational improvements.
Strong understanding of observability, monitoring, and service reliability practices.
Excellent troubleshooting and analytical skills across multiple infrastructure domains.
Strong communication, collaboration, and stakeholder management skills.

Benefits

Operate some of the most advanced AI infrastructure environments in production today.
Work with the latest NVIDIA GPU technologies, Kubernetes platforms, and high-performance networking environments.
Help define operational standards and reliability practices for next-generation AI infrastructure services.
Influence the adoption of AI-powered operational capabilities through k0rdent AI.
Work alongside highly skilled engineers solving complex infrastructure and platform challenges at scale.
Join a growing organisation investing heavily in AI infrastructure, platform services, and operational innovation.

Originally posted on Himalayas

Mirantis Company profile preview

Source: Himalayas
Location
Compensation: Not listed
Open on Caio: 64 roles

Company stats

Current index details for Mirantis, based on roles Caio has indexed from public sources.

64 open roles · 5 sources · 6 markets · Latest role posted today