Senior AI Researcher (f/m/d)
Aleph Alpha
Posted: February 26, 2026
Quick Summary
We're looking for a Senior AI Researcher to join our team in Heidelberg, Germany, to work on reinforcement learning-based models that excel at addressing customer needs. As a Senior AI Researcher, you'll combine theoretical knowledge of reinforcement learning methods with a desire to improve model capabilities and advance the state of the art in large-scale training. The ideal candidate will have a strong background in AI research and a passion for pushing the boundaries of what's possible with machine learning.
Job Description
Our Mission
Aleph Alpha is one of the few companies in Europe with end-to-end in-house model development including pre- and post-training. We’re building models that have general-purpose capabilities, but also specifically excel at addressing the needs of our customers.
We're growing our post-training team in Heidelberg (or hybrid in Germany) and are looking for an AI Researcher who combines a deep theoretical understanding of reinforcement learning methods with a desire to advance the state of the art and improve model capabilities in large-scale training.
The Role
As a (Senior) AI Researcher for reinforcement learning, you will shape and improve the underlying RL methodology, maintain a high-quality training codebase, and conduct large-scale experiments to hill-climb our performance benchmarks. This role is for you if you have both a strong theoretical background in RL and the engineering drive to bring these methods into production and improve on them as part of the reinforcement learning team.
Day to day, you will conduct large-scale reinforcement learning experiments, derive hypotheses from the results, and iterate on both the implementation and the methodology based on your observations. Together with a collaborative team, you will have a direct impact on the models that we ship to our customers.
This role is with Aleph Alpha Research.
Your Responsibilities
• Hill-climb in large-scale training: Conduct large-scale LLM training runs, analyze evaluation scores in depth, propose hypotheses for improvement, and implement them directly to maximize performance on our benchmarks.
• Theoretical innovation: Stay at the bleeding edge of RL research. You will identify, implement, and iterate on novel approaches to multi-turn reinforcement learning.
• Scale our training infrastructure: Identify bottlenecks in our training setup and optimize our RL training loops for large-scale training.
• Cross-functional collaboration: Partner with our other post-training teams to turn raw feedback into actionable training signals, ensuring that our RL iterations lead to measurable improvements in downstream performance.
Your Profile
Basic Qualifications
• A deep understanding of reinforcement learning theory and how it relates to modern RL methods.
• Experience with multi-node LLM training (ideally using RL). You understand how to scale multi-node RL training runs and can reason about and implement distributed algorithms.
• Familiarity with statistical methods for evaluation and experiment design.
• Ability to reason about what an evaluation or environment measures and whether it matters: you don't just run benchmarks, you understand them.
• Strong Python skills and comfort with ML tooling (especially torch.distributed).
• Willingness to relocate to Heidelberg or travel regularly (potentially weekly).
Preferred Qualifications
• PhD in reinforcement learning or equivalent research experience.
• A history of contributions to top-tier venues (NeurIPS, ICML, ICLR, etc.) specifically regarding RL.
• Experience evaluating LLMs and crafting environments for training.
Why This Role
What sets us apart is our team culture: we’re a highly collaborative, interactive, and non-hierarchical team. We all work in the same time zone, and you can directly impact our core methodologies and the quality of the models that we build.