Jobless Developer
Hyphen Connect Limited

Posted 2 months ago

Open

Multimodal AI Systems Architect (AI Engineering)

San Francisco · Remote · Full-time

AI Summary

The Multimodal AI Systems Architect integrates vision and audio models into core AI systems, optimizes streaming latency for voice interactions, and architects multimodal retrieval over videos and PDFs.

About this role

We are seeking a talented Multimodal AI Systems Architect to develop and optimize AI systems that seamlessly integrate vision and audio models. This role focuses on enhancing our voice-to-voice interactions and multimodal retrieval capabilities, ensuring our systems are efficient and innovative.

Responsibilities:

  • Integrate vision encoders and audio-native models into core agent reasoning loops.
  • Optimize streaming latency for voice-to-voice AI interactions.
  • Architect multimodal RAG systems capable of retrieving insights from videos and PDFs.

Qualifications:

  • Experience with Whisper, CLIP, and multimodal LLM integration.
  • Knowledge of streaming architectures and WebRTC.
  • Expertise in cross-modal alignment.

Skills

Audio-native Models, CLIP, Cross-modal Alignment, Multimodal LLM Integration, Streaming Architectures, Vision Encoders, WebRTC, Whisper
