Machine Learning Engineer

Canada

ML Model Serving Engineer

Want to build the layer that actually makes AI usable in real time?

You’ll join a team focused on inference, where performance is the product. This is about delivering low-latency, high-throughput systems across LLMs, speech, and vision models running in production, not offline experiments.

The team is building real-time AI systems that need to respond instantly, reliably, and at scale. That means solving hard problems around batching, GPU efficiency, memory constraints, and system-level bottlenecks that most teams never fully crack.

You’ll sit at the core of the platform, working across model serving, infrastructure, and performance optimisation. A big part of the role is pushing current tooling beyond its limits: extending frameworks, profiling bottlenecks, and designing systems that hold up under real-world load.

This is not about training models. It’s about making them fast, efficient, and production-ready.

What you’ll work on:

Building high-performance serving systems for LLM, speech, and vision models
Scaling inference to production workloads with strict latency requirements
Optimising GPU utilisation and execution efficiency
Implementing techniques like continuous batching, KV cache optimisation, speculative decoding, and prefill/decode separation
Improving frameworks such as vLLM, TensorRT-LLM, Triton, and SGLang
Profiling and debugging performance across GPU, memory, and system layers

What you’ll bring:

Strong experience with ML inference or model serving systems
Deep understanding of latency and throughput optimisation in production
Solid Python and PyTorch skills, plus a systems or performance engineering mindset
Familiarity with distributed systems and production infrastructure

Exposure to CUDA, GPU profiling tools, or systems like Kubernetes and Ray is useful, but the key is knowing how to make models run efficiently at scale.

You’ll join a highly technical team with experience across major AI labs and big tech. The environment is pragmatic, focused on solving real performance problems rather than abstract research.

There’s real ownership here. You’ll help define how next-generation AI systems are served.

Package:

$220,000 – $320,000 base + equity

San Francisco, onsite 3 days per week

If you’re interested in working on the part of AI that actually determines whether it works in the real world, this is worth exploring.

All applicants will receive a response.

techire.® Company profile preview

Source: LinkedIn
Location: Canada
Compensation: Not listed
Open on Caio: 1 role

Company stats

Current index details for techire.®, based on roles Caio has indexed from public sources.

1 open role · 1 source · 1 market · Latest role posted 2mo ago