ARCHIVED
This job listing has been archived and is no longer accepting applications.

Member of Technical Staff - Multimodal VLM/LLM

Black Forest Labs

Freiburg, Germany · Remote · Permanent

Posted: February 16, 2026


Quick Summary

We're looking for a skilled engineer to join our team in Freiburg, Germany, to work on developing production applications for our FLUX models.

Job Description

What if the future of generative AI isn't just better images or better text, but models that understand both—and use that understanding to create in ways neither modality could alone?

Our founding team pioneered Latent Diffusion and Stable Diffusion - breakthroughs that made generative AI accessible to millions. Today, our FLUX models power creative tools, design workflows, and products across industries worldwide.

Our FLUX models are best-in-class not only in capability but in ease of use when building production applications. We top public benchmarks and compete at the frontier - and in most instances we're winning.

If you're relentlessly curious and driven by high agency, we want to talk.

With a team of ~50, we move fast and punch above our weight. From our labs in Freiburg - a university town in the Black Forest - and San Francisco, we're building what comes next.

But here's the frontier we're exploring: vision-language models that don't just caption images or generate from prompts, but truly understand the relationship between visual and linguistic information. Models that can enhance prompts intelligently, moderate content contextually, and unlock generative capabilities we haven't imagined yet. That's the research you'll lead.

What You'll Pioneer

You'll run cutting-edge projects in multimodal vision-language and large language models, integrating them into our media generation pipeline in ways that push beyond what either modality could achieve alone. This isn't about implementing existing VLMs—it's about developing novel approaches that make FLUX more powerful, more controllable, and more aligned with what creators actually need.

You'll be the person who:

• Leads the development and training of state-of-the-art multimodal vision-language models within the FLUX technology stack—not just applying existing architectures, but innovating on them

• Designs and implements specialized fine-tuning strategies for VLMs to address specific use cases and performance requirements that general-purpose models can't handle

• Develops and optimizes LLM implementations for prompt enhancement, content moderation, and novel applications that improve how people interact with generative models

• Drives innovation by integrating VLM/LLM capabilities into our media generation pipeline in creative ways that enhance generative capabilities

• Conducts research to creatively combine vision and language models—exploring questions about how these modalities can inform and improve each other

• Maintains cutting-edge knowledge of the latest developments in multimodal AI and LLM research, evaluating emerging models and architectures for potential integration

• Collaborates with cross-functional teams to implement and deploy models at scale, contributing to architectural decisions and technical roadmap planning

• Documents and shares research findings with the broader team, translating breakthroughs into practical improvements
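To make "specialized fine-tuning strategies" concrete: one common VLM pattern is to freeze the pretrained vision encoder and language model and train only a lightweight projector between them. The sketch below uses toy stand-in modules and sizes; none of the names or dimensions reflect the actual FLUX stack.

```python
import torch
import torch.nn as nn

# Toy stand-ins for real components; names and sizes are illustrative only,
# not the actual FLUX architecture.
vision_encoder = nn.Sequential(nn.Linear(224, 512), nn.GELU())  # pretend image-feature encoder
projector = nn.Linear(512, 768)                                 # maps vision features into LM space
language_model = nn.Sequential(nn.Linear(768, 768), nn.GELU(), nn.Linear(768, 100))

# Freeze the pretrained pieces; train only the lightweight projector.
for module in (vision_encoder, language_model):
    for p in module.parameters():
        p.requires_grad = False

optimizer = torch.optim.AdamW(projector.parameters(), lr=1e-4)

images = torch.randn(8, 224)           # fake batch of image features
targets = torch.randint(0, 100, (8,))  # fake next-token targets

for step in range(3):
    with torch.no_grad():
        feats = vision_encoder(images)
    logits = language_model(projector(feats))
    loss = nn.functional.cross_entropy(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

trainable = sum(p.numel() for p in projector.parameters() if p.requires_grad)
frozen = sum(p.numel()
             for m in (vision_encoder, language_model)
             for p in m.parameters())
print(f"trainable params: {trainable}, frozen params: {frozen}")
```

The point of the pattern is the parameter-count asymmetry: only the small projector receives gradient updates, which keeps fine-tuning cheap while adapting the pretrained pieces to a new task.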

Questions We're Wrestling With

• How can vision-language models improve prompt understanding in ways that make generation more controllable and aligned with user intent?

• What's the right architecture for integrating VLMs into diffusion model workflows without creating computational bottlenecks?

• How do you fine-tune vision-language models for specialized creative tasks that weren't in the training data?

• Where can LLMs enhance the generative pipeline—prompt rewriting, content moderation, parameter suggestion—and where would they add more friction than value?

• What novel capabilities emerge when you deeply integrate vision and language understanding into generative workflows?

• How do you evaluate whether multimodal models are actually improving generation quality versus just adding complexity?

These aren't solved problems—they're research directions we're actively exploring.

Who Thrives Here

You've trained and fine-tuned large-scale vision-language models and understand the nuances of multimodal learning. You have strong intuitions about what makes VLMs work well, backed by either publications or practical projects that pushed the field forward. You're comfortable operating at the intersection of research and production, where models need to be both innovative and deployable.

You likely have:

• Demonstrated expertise in training and fine-tuning large-scale vision-language models—not just using pre-trained ones, but developing them

• Strong publication record or practical experience with relevant projects in multimodal AI research that shows you can push the frontier

• Proficiency in PyTorch or similar deep learning frameworks with deep understanding of their capabilities and limitations

• Experience with distributed training systems and large-scale model optimization—because VLMs don't fit on one GPU

• Track record of implementing and scaling AI models in production environments where research meets real-world constraints
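On "large-scale model optimization": two standard memory-saving techniques when models outgrow a single GPU are gradient accumulation (simulate a large batch with several micro-batches) and activation checkpointing (recompute activations during backward instead of storing them). A minimal, toy-sized PyTorch sketch; real training at this scale would also shard the model itself via FSDP or tensor parallelism:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Toy model; sizes are placeholders, not a real VLM.
model = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

accum_steps = 4
micro_batches = [torch.randn(2, 64) for _ in range(accum_steps)]

optimizer.zero_grad()
for x in micro_batches:
    # Checkpointing drops intermediate activations and recomputes them on backward.
    y = checkpoint(model, x, use_reentrant=False)
    # Scale each micro-loss so accumulated gradients average over the full batch.
    loss = (y - x).pow(2).mean() / accum_steps
    loss.backward()
optimizer.step()
print("accumulated step done, last micro-loss:", loss.item())
```

Gradients from all micro-batches sum before the single `optimizer.step()`, so the update matches one large-batch step at a fraction of the peak memory.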

We'd be especially excited if you:

• Have experience with diffusion models and generative AI architectures alongside autoregressive modeling—understanding how different paradigms can complement each other

• Bring a background in computer vision that informs your approach to multimodal models

• Contribute to open-source AI projects and understand the community

• Have worked in fast-paced startup environments where iteration speed matters

• Bring strong software engineering practices and system design skills

• Have experience with open-source VLM inference frameworks like vLLM

What We're Building Toward

We're not just adding VLMs to our stack—we're exploring fundamental questions about how vision and language understanding can make generative models more powerful and more aligned with human intent. Every model you train teaches us something about multimodal learning. Every integration reveals new capabilities. Every research finding shapes where the field goes next. If that sounds more compelling than applying existing techniques, we should talk.

We're based in Europe and value depth over noise, collaboration over hero culture, and honest technical conversations over hype. Our models have been downloaded hundreds of millions of times, but we're still a ~50-person team learning what's possible at the edge of generative AI.
