AI Inference Infrastructure Software Engineer (Kubernetes / Cloud)

Seattle, Washington, United States

Seattle, WA (Hybrid – 3 days/week onsite)

Compensation: Targeting $170K–$210K, plus meaningful startup equity

A fast‑moving AI engineering team is looking for an Inference Infrastructure Software Engineer to build and operate the Kubernetes and cloud backbone behind large‑scale accelerated inference workloads. If you thrive at the intersection of distributed systems, cloud infrastructure, and high‑performance AI, this role puts you right at the core of next‑generation inference platforms. If you want to help push AI inference to its performance limits and build the infrastructure that makes it possible, we’d love to connect.

What You’ll Do

Build and operate Kubernetes infrastructure powering large‑scale inference services
Run accelerated workloads with strict latency, throughput, and reliability requirements
Manage AWS, GCP, and on‑prem environments across networking, storage, IAM, and observability
Develop automation and tooling in Python, Bash, and Go to streamline deployments and scaling
Partner with ML, runtime, and hardware teams to productionize new inference capabilities
Contribute to capacity planning, cost optimization, and reliability engineering
Participate in on‑call rotation for critical services

What You Bring

3–5 years of hands‑on Kubernetes experience (EKS, GKE, or self‑hosted)
2–3 years operating production workloads on AWS or GCP
Experience running ML or accelerated inference services at scale
Strong skills in Python, Bash, and Go
Deep understanding of GPU/accelerator scheduling, device plugins, and cluster performance
Experience with IaC (Terraform/Pulumi), config management (Ansible/Puppet/Salt), and GitOps (Argo/Flux)
Comfortable operating in fast‑moving, early‑stage environments

Bonus Points

Experience with inference servers (Triton, vLLM, TGI)
Exposure to non‑GPU accelerators (FPGAs, ASICs)
Background in SRE, observability, or performance engineering
Experience building customer‑facing API platforms

Prime Team Partners is an equal opportunity employer. Prime Team Partners does not discriminate on the basis of race, color, religion, national origin, pregnancy status, gender, age, marital status, disability, medical condition, sexual orientation, or any other characteristics protected by applicable state or federal civil rights laws. For contract positions, hired candidates will be employed by Prime Team for the duration of the contract period and will be eligible for our company benefits. Benefits include medical, dental, and vision. Employees are covered at 75%. We offer a 401(k) after 6 months. We do not provide paid holidays or PTO; sick time is offered in accordance with local laws. This position is open until filled.

