Founding Machine Learning Engineer
We're hiring a Founding Machine Learning Engineer to build the pipelines that turn raw footage into clean, structured training data. This is a hands-on, execution-heavy role at the intersection of ML engineering and research. You'll own the full lifecycle — from writing and testing processing scripts to deploying them at scale across thousands of hours of video.
What You'll Do
- Build and enhance post-processing pipelines that clean, validate, and package large volumes of video and audio data for multimodal model training. These pipelines must handle wide variation in speech, visual quality, and format — making robustness a huge engineering challenge.
- Deploy and fine-tune open-source models for speech recognition, speaker diarization, video segmentation, and related tasks.
- Design infrastructure for large-scale distributed processing — parallelizing thousands of compute jobs across cloud platforms and optimizing for throughput and cost.
What We're Looking For
- Strong experience in ML infrastructure, speech/audio processing, or large-scale data pipelines.
- Proficiency in PyTorch. Familiarity with distributed job orchestration.
- Claude Code pilled.
- A bias toward shipping. You default to building, not theorizing.
- Ability to work effectively in an early-stage environment where scope is broad and priorities shift fast.
- Prior work at a data company or frontier AI lab.
- Track record building pipelines that process tens of thousands of hours of audio or video.
- Experience with infrastructure cost optimization or model fine-tuning for production use.
- You're energized by operational work with immediate, visible impact.
- You treat broken processes as engineering problems worth solving properly.
- You hold yourself to a high bar for data quality — because you understand it directly determines model performance.