Master Thesis (all genders) - Semantic 4D Occupancy Forecasting
Xitaso
Posted: May 5, 2026
Job Description
Abstract:
Semantic 4D occupancy forecasting is vital for safe autonomous driving, allowing vehicles to anticipate future scene dynamics and geometry. However, state-of-the-art models are mostly trained with full supervision, which requires large volumes of dense 3D voxel annotations that are prohibitively expensive to produce.
To overcome this data bottleneck, current research is shifting towards self-supervised and weakly supervised paradigms that leverage pre-trained 2D foundation models (e.g., DINOv2, CLIP, or SAM). By aligning these rich, open-vocabulary 2D semantic features with 3D/4D spatial representations using Transformer architectures, it is possible to achieve robust spatio-temporal understanding without dense 3D ground truth.
Building upon these breakthroughs, this Master's thesis focuses on developing a foundation-model-aligned framework for vision-based 4D occupancy forecasting. You will design an architecture that distills rich multi-view semantics into a 4D forecasting pipeline, bridging the gap between scalable camera-only inputs and high-fidelity environment prediction.
For outstanding results, we actively encourage and support submissions to top-tier conferences.
These tasks interest you:
• Develop a Transformer-based network for predicting future semantic 4D occupancy from sequential multi-view camera inputs using weak or self-supervision.
• Build and train the PyTorch pipeline, designing alignment mechanisms to distill semantic features from 2D foundation models into your 4D spatio-temporal representation.
• Benchmark against fully supervised baselines on large-scale datasets (e.g., nuScenes, OpenOccupancy), focusing on forecasting accuracy (IoU), semantic precision, and label efficiency.
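To give a feel for the IoU metric named in the benchmarking task, here is a minimal sketch in plain Python. It assumes that a predicted and a ground-truth occupancy grid for one future time step have been flattened into sequences of integer class labels. The function names `per_class_iou` and `mean_iou` and the label convention in the example are hypothetical and chosen for illustration only; they are not taken from any dataset toolkit.

```python
from typing import Dict, Sequence


def per_class_iou(
    pred: Sequence[int], target: Sequence[int], num_classes: int
) -> Dict[int, float]:
    """Per-class intersection-over-union on flattened voxel labels.

    Classes that occur in neither the prediction nor the ground truth
    are left out of the result instead of being counted as 0 or 1.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must contain the same number of voxels")

    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # Voxel is correctly labelled: counts once for both sets.
            intersection[p] += 1
            union[p] += 1
        else:
            # Voxel is in the predicted set of p and the true set of t.
            union[p] += 1
            union[t] += 1

    return {
        c: intersection[c] / union[c]
        for c in range(num_classes)
        if union[c] > 0
    }


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over all classes that appear in pred or target."""
    ious = per_class_iou(pred, target, num_classes)
    return sum(ious.values()) / len(ious) if ious else 0.0


# Toy example with four voxels and three classes
# (assumed convention for this sketch: 0 = free, 1 = vehicle, 2 = road).
predicted = [0, 1, 1, 2]
ground_truth = [0, 1, 2, 2]
print(per_class_iou(predicted, ground_truth, num_classes=3))
print(mean_iou(predicted, ground_truth, num_classes=3))
```

For forecasting, a metric like this would typically be evaluated separately for each predicted future time step, so that the drop in accuracy over the forecast horizon becomes visible.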
What makes you stand out:
• You are registered in a master's program in computer science, artificial intelligence, robotics, or a related field.
• You have excellent programming skills in Python as well as solid experience with deep learning frameworks (especially PyTorch).
• You have a solid background in 3D computer vision. Practical experience with semantic segmentation, occupancy networks, or 3D Gaussian splatting is a major plus.
• You have knowledge of Vision Transformers (ViT), foundation models (DINO, CLIP), and self-supervised and weakly supervised learning paradigms.
• You work independently, are solution-oriented and highly motivated, and have very good German and English skills (at least C1 level) for clear and confident communication within the team and with our partners.
Your contact person:
Daniela
+49 821 885882-0
[email protected]