hyphenconnect Greenhouse · Posted 1mo ago

Multimodal AI Systems Architect (AI Engineering)

United States

Engineering Greenhouse

Indexed description

We are seeking a Multimodal AI Systems Architect to develop and optimize AI systems that integrate vision and audio models. The role focuses on improving our voice-to-voice interactions and multimodal retrieval capabilities while keeping those systems efficient.

Responsibilities:

  • Integrate vision encoders and audio-native models into core agent reasoning loops.
  • Optimize streaming latency for voice-to-voice AI interactions.
  • Architect multimodal RAG systems capable of retrieving insights from videos and PDFs.

Qualifications:

  • Experience with Whisper, CLIP, and multimodal LLM integration.
  • Knowledge of streaming architectures and WebRTC.
  • Expertise in cross-modal alignment.