Member of Technical Staff - ML Research Engineer; Multi-Modal - Vision
Liquid AI
Posted: October 23, 2025
Quick Summary
ML research engineer role on the VLM team, building vision-language models that run efficiently on-device, at the edge, and under real-time constraints, with full ownership of training, data, and architecture workstreams.
Job Description
About Liquid AI
Spun out of MIT CSAIL, we build general-purpose AI systems that run efficiently across deployment targets, from data center accelerators to on-device hardware, ensuring low latency, minimal memory usage, privacy, and reliability. We partner with enterprises across consumer electronics, automotive, life sciences, and financial services. We are scaling rapidly and need exceptional people to help us get there.
The Opportunity
Our VLM team builds vision-language models that run on-device, at the edge, and under real-time constraints without sacrificing quality. This role offers full technical ownership: you'll make decisions, own outcomes, and shape the direction of vision AI at a company where your work is the product.
What We're Looking For
We need someone who:
• Has expertise in VLMs: You'll hit the ground running, tackling real problems from day one.
• Takes ownership: We give people problems, not tasks. We need someone who will own an end-to-end workstream and deliver outcomes.
• Writes production code: Our models ship to customers. We need code that's maintained, not one-off research prototypes.
• Stays resilient: Training runs fail. Experiments don't work. We need someone who iterates through setbacks.
The Work
• Design and run large-scale VLM training experiments on distributed GPU clusters
• Own pre-training or SFT pipelines for multimodal models
• Build data pipelines for image-text datasets at scale
• Collaborate on vision encoder architecture and image compression tradeoffs
• Help grow the team through interviewing and network referrals
Desired Experience
Must-have:
• Direct VLM experience (training, architecture, or significant research)
• Distributed training at scale (PyTorch Distributed, DeepSpeed, FSDP, or Megatron-LM)
• Production-quality coding ability
• Ability to work independently
Nice-to-have:
• Video understanding experience
• Data quality or dataset design expertise
• Vision encoder or image compression research
What Success Looks Like (Year One)
• Our VLMs achieve state-of-the-art results across major benchmarks
• This hire owns a major workstream (video understanding, data quality, or encoder architecture) end-to-end
• At least one model has shipped to production with this hire's direct contribution
What We Offer
• Full ownership: You own your work from architecture to deployment
• Compensation: Competitive base salary with equity in a unicorn-stage company
• Health: We pay 100% of medical, dental, and vision premiums for employees and dependents
• Financial: 401(k) matching up to 4% of base pay
• Time Off: Unlimited PTO plus company-wide Refill Days throughout the year