Multimodal AI Systems Architect (AI Engineering)
Hyphenconnect
Posted: April 24, 2026
Quick Summary
This role calls for a Multimodal AI Systems Architect to develop and optimize AI systems that integrate vision and audio models for voice-to-voice interactions.
Job Description
We are seeking a Multimodal AI Systems Architect to develop and optimize AI systems that integrate vision and audio models. The role focuses on improving our voice-to-voice interactions and multimodal retrieval capabilities, with an emphasis on system efficiency.
Responsibilities:
• Integrate vision encoders and audio-native models into core agent reasoning loops.
• Optimize streaming latency for voice-to-voice AI interactions.
• Architect multimodal RAG systems capable of retrieving insights from videos and PDFs.
Qualifications:
• Experience with Whisper, CLIP, and multimodal LLM integration.
• Knowledge of streaming architectures and WebRTC.
• Expertise in cross-modal alignment.