Posted 2 months ago
Multimodal AI Systems Architect (AI Engineering)
China | On-site | Full-time
AI Summary
The Multimodal AI Systems Architect focuses on integrating vision and audio models into core AI reasoning, optimizing voice interactions, and building multimodal retrieval systems.
About this role
We are seeking a talented Multimodal AI Systems Architect to develop and optimize AI systems that seamlessly integrate vision and audio models. This role focuses on enhancing our voice-to-voice interactions and multimodal retrieval capabilities, ensuring our systems are efficient and innovative.
Responsibilities:
- Integrate vision encoders and audio-native models into core agent reasoning loops.
- Optimize streaming latency for voice-to-voice AI interactions.
- Architect multimodal RAG systems capable of retrieving insights from videos and PDFs.
Qualifications:
- Experience with Whisper, CLIP, and multimodal LLM integration.
- Knowledge of streaming architectures and WebRTC.
- Expertise in cross-modal alignment.
Skills
Audio-native Models, CLIP, Cross-modal Alignment, Multimodal LLM Integration, Multimodal RAG Systems, PDFs, Streaming Architectures, Videos, Vision Encoders, WebRTC, Whisper
More jobs at Hyphen Connect Limited
- Content Operations Specialist (US stock contract) - Crypto Exchange | Global
- VIP BD Manager/Director - Global Remote | Global
- Web3 Frontend Developer (UX) | APAC
- Sales Operations Manager | Remote - Hangzhou, China
- Compliance Officer/Money Laundering Reporting Officer (CO/MLRO) | Hong Kong
- DeFi Product Owner (Bilingual: English & Mandarin) | Remote - Global