Acceler8 Talent · LinkedIn · Posted 17d ago

Machine Learning Engineer

San Francisco, California, United States

Indexed description

Member of Technical Staff - ML Systems & Inference

San Francisco, CA

Onsite

$200K-$300K + equity (dependent on experience and skills)


They are hiring a Member of Technical Staff focused on ML systems and inference to design and build the systems that execute models end-to-end under real production constraints.

You’ll work across model architecture, inference runtimes, scheduling, memory management, and system performance to make inference faster, more predictable, and more cost-efficient at scale.

Responsibilities

  • Build and optimize end-to-end inference pipelines
  • Design inference runtimes that balance latency, throughput, and concurrency
  • Work on batching, queuing, scheduling, and tail-latency trade-offs
  • Manage KV cache allocation, reuse, placement, and eviction
  • Optimize prefill and decode paths, including attention and memory usage
  • Profile and debug performance across model, runtime, and system layers
  • Collaborate with compiler, kernel, networking, and distributed systems engineers

Requirements

  • Strong software engineering fundamentals
  • Experience with ML inference, model serving, or production ML systems
  • Strong understanding of latency, throughput, memory, and concurrency
  • Experience with Python and C++
  • Interest in working close to models, runtimes, and systems performance

Nice to Have

  • Experience with TensorRT-LLM, vLLM, Triton, or custom serving systems
  • Knowledge of modern model architectures and attention mechanisms
  • Experience with KV cache management, batching, scheduling, or GPU-backed systems
  • Background in distributed systems, compilers, kernels, or high-performance infrastructure

Why Join?

  • Backed by significant Series A funding
  • Already generating eight-figure revenue
  • Deployed with major enterprise and AI-native customers
  • Building core infrastructure for next-generation AI inference
  • Opportunity to work across the full AI stack: models, runtimes, compilers, kernels, orchestration, and hardware
