Machine Learning Engineer
Member of Technical Staff - ML Systems & Inference
San Francisco, CA
Onsite
$200K-$300K + equity (depending on experience and skills)
They are hiring a Member of Technical Staff focused on ML systems and inference to design and build the systems that execute models end-to-end under real production constraints.
You’ll work across model architecture, inference runtimes, scheduling, memory management, and system performance to make inference faster, more predictable, and more cost-efficient at scale.
Responsibilities
- Build and optimize end-to-end inference pipelines
- Design inference runtimes that balance latency, throughput, and concurrency
- Work on batching, queuing, scheduling, and tail-latency trade-offs
- Manage KV cache allocation, reuse, placement, and eviction
- Optimize prefill and decode paths, including attention and memory usage
- Profile and debug performance across model, runtime, and system layers
- Collaborate with compiler, kernel, networking, and distributed systems engineers
Requirements
- Strong software engineering fundamentals
- Experience with ML inference, model serving, or production ML systems
- Strong understanding of latency, throughput, memory, and concurrency
- Experience with Python and C++
- Interest in working close to models, runtimes, and systems performance
Nice to Have
- Experience with TensorRT-LLM, vLLM, Triton, or custom serving systems
- Knowledge of modern model architectures and attention mechanisms
- Experience with KV cache management, batching, scheduling, or GPU-backed systems
- Background in distributed systems, compilers, kernels, or high-performance infrastructure
Why Join?
- Backed by significant Series A funding
- Already generating eight-figure revenue
- Deployed with major enterprise and AI-native customers
- Building core infrastructure for next-generation AI inference
- Opportunity to work across the full AI stack: models, runtimes, compilers, kernels, orchestration, and hardware