Member of Technical Staff - Large Scale Data Infrastructure
Black Forest Labs
Posted: February 2, 2026
Quick Summary
Build scalable data loaders and storage systems for petabyte-scale image and video datasets, supporting training runs across thousands of GPUs at a research lab focused on rigor, open science, and systems that enable real breakthroughs.
Job Description
About Black Forest Labs
Black Forest Labs builds generative models for image and video used by millions of creators, developers, and businesses worldwide. Our FLUX models operate at the frontier of visual AI and are trained at scales where data movement becomes a first-order constraint.
We’re headquartered in Freiburg, Germany, with a growing presence in San Francisco, and we focus on research rigor, open science, and building systems that enable real breakthroughs.
What You’ll Work On
• Build scalable data loaders for training runs across thousands of GPUs
• Design storage and retrieval systems for petabyte-scale image and video datasets
• Develop abstractions over multi-cloud object storage to support flexible training workflows
• Execute and validate large-scale data migrations across storage systems and providers
• Identify and resolve performance bottlenecks in distributed data pipelines
• Work closely with research and infrastructure teams as training requirements evolve
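As a flavor of the problems above: when a training job spans thousands of GPUs, each data-loader worker must read a disjoint slice of the dataset deterministically. A minimal sketch (hypothetical function names, not part of any BFL codebase) of round-robin file sharding:

```python
def shard_files(files, rank, world_size):
    """Assign a disjoint subset of dataset files to one worker.

    Round-robin by rank, so every file is read by exactly one of the
    `world_size` workers and the assignment is deterministic across restarts.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    return files[rank::world_size]

# e.g. with 10 files and 4 workers, rank 0 reads files 0, 4, 8
shards = [shard_files(list(range(10)), r, 4) for r in range(4)]
```

Real loaders layer shuffling, checkpointable resume, and prefetch on top of this, but disjoint deterministic assignment is the foundation.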
Technical Focus
• Languages & Frameworks: Python, PyTorch (DataLoader internals)
• Storage: Object storage systems (e.g. S3, Azure Blob, GCS)
• Data & Metadata: Parquet, large-scale file layouts
• Video: ffmpeg, PyAV, codec fundamentals and efficient access patterns
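Efficient access patterns against object storage often come down to planning HTTP Range requests so large objects can be fetched in parallel. A small illustrative sketch (assumed helper, not from any specific codebase) of splitting an object into inclusive byte ranges:

```python
def byte_ranges(total_size, chunk_size):
    """Split an object of `total_size` bytes into (start, end) inclusive
    byte ranges, each at most `chunk_size` bytes, suitable for issuing
    parallel ranged GETs against S3/GCS/Azure Blob."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ranges = []
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges
```

Tuning `chunk_size` against per-request latency and aggregate throughput is exactly the kind of bottleneck hunting this role involves.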
What We’re Looking For
• Experience building or operating data pipelines at meaningful scale
• Strong intuition for optimizing data loading and I/O in distributed systems
• Hands-on work with large image or video datasets, often spanning millions of files
• Experience debugging performance issues across large fleets of machines
• Comfort working in research-adjacent environments where requirements evolve alongside the models
• Familiarity with distributed job orchestration (e.g. Slurm, Kubernetes) and object storage performance tuning is a plus
If this sounds like work you’d enjoy, we’d love to hear from you.
Annual Salary (SF): $180,000–$300,000 USD + equity, depending on profile and experience