Featherless AI · Himalayas · Posted 1mo ago

AI Researcher — Inference Optimization

USD · Full time · Remote


Role Overview

We are seeking an AI Researcher with deep experience in inference optimization to design, evaluate, and deploy high-performance inference systems for large-scale machine learning models. You will work at the intersection of model architecture, systems engineering, and hardware-aware optimization, improving latency, throughput, and cost efficiency across real-world production environments.

Key Responsibilities

Research and develop techniques to optimize inference performance for large neural networks.
Improve latency, throughput, memory efficiency, and cost per inference.
Design and evaluate model-level optimizations (quantization, pruning, KV-cache optimization, architecture-aware simplifications).
Implement systems-level optimizations (dynamic batching, kernel fusion, multi-GPU inference, prefill vs decode optimization).
Benchmark inference workloads across hardware accelerators.
Collaborate with engineering teams to deploy optimized inference pipelines.
Translate research insights into production-ready improvements.
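For readers less familiar with the model-level techniques listed above, here is a minimal sketch of one of them: per-tensor symmetric int8 weight quantization. It is an illustration under simplifying assumptions (plain Python lists, a single scale for the whole tensor, no calibration), not a description of Featherless AI's stack, and the function names are hypothetical.

```python
"""Per-tensor symmetric int8 quantization of a weight vector (illustrative sketch)."""


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    # One scale for the whole tensor: the largest |weight| maps onto 127.
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp into [-127, 127].
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    # Recover approximate float weights from the int8 codes.
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.5, -1.27, 0.0, 0.333]
    q, s = quantize_int8(w)
    w_hat = dequantize_int8(q, s)
    # Rounding error per weight is at most half a quantization step (s / 2).
    print(q, s, max(abs(a - b) for a, b in zip(w, w_hat)))
```

Storing 8-bit codes plus one float scale cuts weight memory roughly 4x versus float32; production schemes usually refine this with per-channel or per-group scales to reduce the error from outlier weights.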

Required Qualifications

Strong background in machine learning, deep learning, or AI systems.
Hands-on experience optimizing inference for large-scale models.
Proficiency in Python and modern ML frameworks (e.g., PyTorch).
Experience with inference tooling (e.g., Triton, TensorRT, vLLM, ONNX Runtime).
Ability to design experiments and communicate results clearly.

Preferred / Nice-to-Have Qualifications

Experience deploying production inference systems at scale.
Familiarity with distributed and multi-GPU inference.
Experience contributing to open-source ML or inference frameworks.
Authorship or co-authorship of peer-reviewed research papers in machine learning, systems, or related fields.
Experience working close to hardware (CUDA, ROCm, profiling tools).

What Success Looks Like

Measurable gains in latency, throughput, and cost efficiency.
Optimized inference systems running reliably in production.
Research ideas successfully translated into deployable systems.
Clear benchmarks and documentation that inform product decisions.
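To make "measurable gains" concrete, a benchmark report often summarizes per-request latency percentiles, aggregate throughput, and cost per token. The sketch below computes these from a list of runs. The `Run` record, the nearest-rank percentile method, and the assumption that the GPU is billed for the whole wall-clock window are illustrative choices, and the hourly price in the usage example is made up.

```python
"""Summarize inference benchmark runs into latency, throughput, and cost metrics."""
import math
from dataclasses import dataclass


@dataclass
class Run:
    latency_s: float    # end-to-end latency of one request, in seconds
    output_tokens: int  # tokens generated for that request


def percentile(values: list[float], pct: float) -> float:
    # Nearest-rank percentile: the smallest sample with at least pct% of samples <= it.
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(runs: list[Run], wall_clock_s: float, gpu_usd_per_hour: float) -> dict[str, float]:
    latencies = [r.latency_s for r in runs]
    total_tokens = sum(r.output_tokens for r in runs)
    # Assumes the GPU is billed for the entire wall-clock window of the benchmark.
    usd_spent = gpu_usd_per_hour / 3600 * wall_clock_s
    return {
        "p50_latency_s": percentile(latencies, 50),
        "p95_latency_s": percentile(latencies, 95),
        "throughput_tok_s": total_tokens / wall_clock_s,
        "usd_per_1m_tokens": usd_spent / total_tokens * 1_000_000,
    }


if __name__ == "__main__":
    # Hypothetical numbers: four concurrent requests finishing within 2.5 s on a $2/h GPU.
    runs = [Run(0.8, 100), Run(1.2, 120), Run(1.0, 80), Run(2.5, 200)]
    print(summarize(runs, wall_clock_s=2.5, gpu_usd_per_hour=2.0))
```

Reporting tail latency (p95) alongside throughput matters because batching optimizations often raise throughput while pushing individual requests further into the tail.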

Relevant Research Areas (Bonus)

Long-context inference optimization
Speculative decoding
KV-cache compression and paging
Efficient decoding strategies
Hardware-aware inference design
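Of the research areas above, speculative decoding is easy to illustrate in a few lines. The sketch shows the greedy variant: a cheap draft model proposes up to `k` tokens, the target model keeps the longest prefix it agrees with, then adds one token of its own, so the output matches plain greedy decoding with the target. Sampling-based variants instead use a rejection-sampling acceptance rule so the output follows the target's distribution. The toy "models" are hypothetical integer functions standing in for real networks.

```python
"""Greedy speculative decoding with toy draft and target models (illustrative sketch)."""
from typing import Callable

# A "model" here is any function mapping a token prefix to its greedy next token.
NextToken = Callable[[list[int]], int]


def greedy_decode(model: NextToken, prompt: list[int], n_new: int) -> list[int]:
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: list[int], n_new: int, k: int = 4) -> list[int]:
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        # 1. The draft model proposes up to k tokens autoregressively.
        proposal: list[int] = []
        ctx = list(seq)
        for _ in range(min(k, goal - len(seq))):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target verifies each position. A real system scores all
        #    positions in one batched forward pass; here we call it per position.
        accepted = 0
        for i, tok in enumerate(proposal):
            if target(seq + proposal[:i]) != tok:
                break
            accepted += 1
        seq.extend(proposal[:accepted])
        # 3. Append one target token: the correction at the first mismatch,
        #    or a bonus token when every draft token was accepted.
        if len(seq) < goal:
            seq.append(target(seq))
    return seq


if __name__ == "__main__":
    def target(s: list[int]) -> int:
        return (s[-1] * 3 + 1) % 10

    def draft(s: list[int]) -> int:
        # Agrees with the target except right after multiples of 4.
        return (s[-1] * 3 + 1) % 10 if s[-1] % 4 else 0

    print(speculative_decode(target, draft, [1], 12))
    print(greedy_decode(target, [1], 12))
```

Every kept token is one the target would have chosen anyway, so the speedup comes only from verifying several positions per target pass. With the per-position calls used here for readability, the sketch shows the control flow but not the speedup.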

Originally posted on Himalayas
