Multimodal AI Systems Architect (AI Engineering)
Hyphenconnect
Posted: April 24, 2026
Quick Summary
We are looking for a Multimodal AI Systems Architect with expertise in integrating vision and audio models and in optimizing voice-to-voice interactions. The ideal candidate has experience with Whisper and can architect multimodal RAG systems that are both efficient and innovative.
Job Description
We are seeking a talented Multimodal AI Systems Architect to develop and optimize AI systems that seamlessly integrate vision and audio models. This role focuses on enhancing our voice-to-voice interactions and multimodal retrieval capabilities, ensuring our systems are efficient and innovative.
Responsibilities:
• Integrate vision encoders and audio-native models into core agent reasoning loops.
• Optimize streaming latency for voice-to-voice AI interactions.
• Architect multimodal RAG systems capable of retrieving insights from videos and PDFs.
Qualifications:
• Experience with Whisper, CLIP, and multimodal LLM integration.
• Knowledge of streaming architectures and WebRTC.
• Expertise in cross-modal alignment.