Machine Learning Engineer, Model Evaluations (Speech LLM) - San Francisco

Job Description - Plaud Inc.

Speech Evaluation Engineer (Speech LLM)

Company: Plaud Inc.

Location: San Francisco, CA

Type: Hybrid (minimum 3 days per week in office)

Machine Learning & Artificial Intelligence

Job Overview

Plaud is seeking an engineer to turn ambiguous qualities such as voice naturalness and cadence into clear, automated metrics. You will partner with ML researchers to define benchmarks for Speech LLMs, build scalable data pipelines, and own the dashboards that track model health and performance.

Key Responsibilities

Define and automate metrics for subjective concepts such as naturalness, expressiveness, and conversational cadence.
Build reliable distributed systems and data pipelines that run at scale against live model checkpoints.
Partner with ML researchers to translate Speech LLM capabilities (e.g., ASR robustness, TTS emotional steerability) into measurable benchmarks.
Develop and own dashboards that track model health during training, and improve the signal-to-noise ratio and latency of evaluations.
Debug anomalous mid-training results to identify root causes (architecture, data, or infrastructure).
Communicate complex statistical results and model behaviors to technical and non-technical stakeholders.

Required Qualifications

Engineering Skills: Strong software engineering skills, particularly in Python, with experience in distributed systems and evaluation harnesses.
ML Collaboration: Ability to work closely with researchers to define what "good" performance means for AI models.
Observability: Experience building reliable, trusted experiment-tracking dashboards (e.g., Weights & Biases, MLflow).
Communication: Ability to clearly articulate complex statistical results.

Preferred Qualifications

Speech Metrics: Familiarity with WER, CER, PESQ, and automated MOS scoring frameworks.
LLM-as-a-Judge: Experience using frontier or fine-tuned multimodal LLMs to evaluate conversational logic, transcription accuracy, and audio quality.
Human Evaluation: Background managing large-scale crowdsourced human evaluation for RLHF/DPO efforts.
Adversarial Datasets: Experience curating datasets to test edge cases (heavy accents, overlapping speech, noisy environments).

Compensation & Benefits

Salary: $180,000 - $270,000 base salary, plus performance bonus and equity.
Healthcare: Top-tier healthcare (employee + dependents) including dental and vision.
Retirement: 401(k) with company matching.
Time Off: Unlimited PTO plus 13 paid holidays.
Parental Leave: 12 weeks of paid leave for all new parents.
Equipment: Choice of top-of-the-line laptops/workstations.
Perks: Annual offsites and fully stocked office.