ARCHIVED
This job listing has been archived and is no longer accepting applications.
Applied ML Engineer, Real‑Time Video Generation

Cantina

Europe · Permanent

Posted: December 12, 2025

Quick Summary

We're looking for an Applied ML Engineer with hands-on experience building large-scale video generation models.

Job Description

About Cantina:

Cantina Labs is a social AI company, developing a suite of advanced real-time models that push the boundaries of expression, personality, and realism. We bring characters to life, transforming how people tell stories, connect, and create. We build and power ecosystems. Cantina, our flagship social AI platform, is just the beginning.

If you're excited about the potential AI has to shape human creativity and social interactions, join us in building the future!

About the Role:
We’re looking for an Applied ML Engineer with hands‑on experience building large‑scale video generation models, from data and training through distillation and acceleration into a fast, production‑ready model. Our models are human‑centric and product‑oriented: think interactive characters that respond to text, audio, and image inputs and generate video with very low latency.

This is an applied research + engineering role: you’ll work on training runs, data, model optimization, and the “make it fast” path that turns a capable research model into a real‑time experience.

Typical time split (roughly):

• 60–75% training / fine‑tuning / distillation of large video models

• 15–25% inference optimization (latency/memory/cost), model runtime work

• 10–15% prototyping + product integration (demos → shipped features)

What You’ll Do:

• Train and scale video generation models: run large‑scale training/fine‑tuning on multi‑GPU (and, when needed, multi‑node) setups; own the training loop, stability, checkpoints, and iteration speed.

• Own data for video modeling: build and improve video datasets/pipelines (decode/sampling, filtering/quality, conditioning alignment, storage formats), and keep the pipeline fast and reliable at scale.

• Distill and compress big models into fast ones: teacher→student distillation, step reduction, architectural simplifications, and quality/speed trade‑offs to hit real‑time constraints.

• Make models run in real time: profiling, memory optimizations, quantization-aware tactics where appropriate, kernel/runtime improvements, and practical throughput/latency wins.

• Build the bridge to product: package models into simple inference APIs and prototypes; collaborate with product to turn research progress into user-facing experiences (interactive characters, conversational video).

• Evaluate what matters: set up evaluation harnesses that track perceptual quality + temporal consistency + identity/character fidelity + latency/cost.

What You’ll Bring:

• 2+ years building and shipping ML systems (or equivalent), with clear ownership and delivery.

• Strong PyTorch + Python, comfortable touching both training and inference code.

• Hands‑on experience training or scaling generative models, ideally video generation (diffusion/transformers/VAEs or similar), not just using pre‑trained checkpoints.

• Experience with distributed training and large runs (e.g., DDP/FSDP/DeepSpeed‑style workflows), and the practical debugging that comes with them.

• Proven ability to improve performance in practice: latency/memory/cost optimizations, profiling, and shipping measurable wins.

• Product mindset: can move from research ideas → robust implementation → iterating against real constraints.

Bonus Points For:

• Experience with multimodal conditioning: audio‑to‑video, text+audio+image control, lip‑sync / gesture / character animation constraints.

• End‑to‑end distillation experience (teacher/student design, eval strategy, failure analysis).

• Familiarity with acceleration toolchains (torch.compile, Triton, TensorRT, ONNX, custom kernels) or model compression (quantization, pruning) where applicable.

• Experience with real‑time streaming / WebRTC prototypes or low‑latency media delivery (helpful, but not the core of the role).

Technical Stack You’ll Work With:

• ML: PyTorch (training + inference)

• Models: large video generation (diffusion/transformers/VAEs), multimodal conditioning

• Optimization: distillation, inference acceleration, multi‑GPU strategies

• Product: rapid prototyping, lightweight inference APIs

• Infra (supporting, not primary): Docker; cloud basics (AWS‑like services)

Location:

This role can be performed remotely in Europe, in time zones within ±2 hours of GMT.

Compensation:

The anticipated annual base salary range for this role is €190,000 to €225,000, plus bonus. Compensation is determined by a number of factors, including skills, experience, job scope, location, and competitive market data.
