Machine Learning Engineer

San Francisco, California, United States

Member of Technical Staff - ML Systems & Inference

San Francisco, CA

Onsite

$200K-$300K + equity (dependent on experience and skills)

They are hiring a Member of Technical Staff focused on ML systems and inference to design and build the systems that execute models end-to-end under real production constraints.

You’ll work across model architecture, inference runtimes, scheduling, memory management, and system performance to make inference faster, more predictable, and more cost-efficient at scale.

Responsibilities

Build and optimize end-to-end inference pipelines
Design inference runtimes that balance latency, throughput, and concurrency
Work on batching, queuing, scheduling, and tail-latency trade-offs
Manage KV cache allocation, reuse, placement, and eviction
Optimize prefill and decode paths, including attention and memory usage
Profile and debug performance across model, runtime, and system layers
Collaborate with compiler, kernel, networking, and distributed systems engineers

Requirements

Strong software engineering fundamentals
Experience with ML inference, model serving, or production ML systems
Strong understanding of latency, throughput, memory, and concurrency
Experience with Python and C++
Interest in working close to models, runtimes, and systems performance

Nice to Have

Experience with TensorRT-LLM, vLLM, Triton, or custom serving systems
Knowledge of modern model architectures and attention mechanisms
Experience with KV cache management, batching, scheduling, or GPU-backed systems
Background in distributed systems, compilers, kernels, or high-performance infrastructure

Why Join?

Backed by significant Series A funding
Already generating eight-figure revenue
Deployed with major enterprise and AI-native customers
Building core infrastructure for next-generation AI inference
Opportunity to work across the full AI stack: models, runtimes, compilers, kernels, orchestration, and hardware

Acceler8 Talent

Source: LinkedIn
Location: San Francisco, California, United States