Jobless Developer
Hyphen Connect Limited

Posted 2 months ago

Open

Multimodal AI Systems Architect (AI Engineering)

Boston · Remote · Full-time

AI Summary

The Multimodal AI Systems Architect designs and optimizes AI systems that integrate vision and audio models, improving voice-to-voice interactions and multimodal retrieval.

About this role

We are seeking a Multimodal AI Systems Architect to develop and optimize AI systems that integrate vision and audio models. The role focuses on improving our voice-to-voice interactions and multimodal retrieval capabilities while keeping those systems efficient.

Responsibilities:

  • Integrate vision encoders and audio-native models into core agent reasoning loops.
  • Optimize streaming latency for voice-to-voice AI interactions.
  • Architect multimodal RAG systems capable of retrieving insights from videos and PDFs.

Qualifications:

  • Experience with Whisper, CLIP, and multimodal LLM integration.
  • Knowledge of streaming architectures and WebRTC.
  • Expertise in cross-modal alignment.

Skills

  • Audio-native Models
  • CLIP
  • Core Agent Reasoning Loops
  • Cross-modal Alignment
  • Multimodal LLM Integration
  • Multimodal RAG Systems
  • Retrieval From Videos And PDFs
  • Streaming Architectures
  • Vision Encoders
  • WebRTC
  • Whisper
