Multimodal Generative AI Researcher

Stability AI

Remote, permanent

Posted: January 29, 2026

Job Description

Location: Remote

About the Role

We’re looking for a Research Scientist with deep expertise in training and fine-tuning large Vision-Language and Language Models (VLMs / LLMs) for downstream multimodal tasks. You’ll help push the next frontier of models that reason across vision, language, and 3D, bridging research breakthroughs with scalable engineering.

What You’ll Do

• Design and fine-tune large-scale VLMs / LLMs — and hybrid architectures — for tasks such as visual reasoning, retrieval, 3D understanding, and embodied interaction.

• Build robust, efficient training and evaluation pipelines (data curation, distributed training, mixed precision, scalable fine-tuning).

• Conduct in-depth analysis of model performance: ablations, bias / robustness checks, and generalisation studies.

• Collaborate across research, engineering, and 3D / graphics teams to bring models from prototype to production.

• Publish impactful research and help establish best practices for multimodal model adaptation.

What You Bring

• PhD (or equivalent experience) in Machine Learning, Computer Vision, NLP, Robotics, or Computer Graphics.

• Proven track record in fine-tuning or training large-scale VLMs / LLMs for real-world downstream tasks.

• Strong engineering mindset — you can design, debug, and scale training systems end-to-end.

• Deep understanding of multimodal alignment and representation learning (vision–language fusion, CLIP-style pre-training, retrieval-augmented generation).

• Familiarity with recent trends, including video-language and long-context VLMs, spatio-temporal grounding, agentic multimodal reasoning, and Mixture-of-Experts (MoE) fine-tuning.

• Awareness of 3D-aware multimodal models that use NeRFs, Gaussian splatting, or differentiable renderers for grounded reasoning and 3D scene understanding.

• Hands-on experience with PyTorch / DeepSpeed / Ray and distributed or mixed-precision training.

• Excellent communication skills and a collaborative mindset.

Bonus / Preferred

• Experience integrating 3D and graphics pipelines into training workflows (e.g., mesh or point-cloud encoding, differentiable rendering, 3D VLMs).

• Research or implementation experience with vision-language-action models, world-model-style architectures, or multimodal agents that perceive and act.

• Familiarity with efficient adaptation methods, such as parameter-efficient fine-tuning (LoRA, QLoRA, adapters) and distillation for edge deployment.

• Knowledge of video and 4D generation trends, latent diffusion / rectified flow methods, or multimodal retrieval and reasoning pipelines.

• Background in GPU optimisation, quantisation, or model compression for real-time inference.

• Track record of open-source contributions or publications at top-tier ML / CV / NLP venues.

Equal Employment Opportunity:

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or other legally protected statuses.