Snapshot

At Google DeepMind, we foster an environment where ambitious, long-term research flourishes. Our team is tackling one of the hardest problems in modern AI: post-training frontier models. Unlike smaller models that can rely on distillation, our frontier models require novel training signals to advance the state of the art. We are defining the horizontal recipes, from revamping RL prompts to advancing Reward Models (RMs), that allow these models to "think" better, reason more deeply, and align more closely with human intent. We believe that mastering the feedback loop between user signals and model behavior is the key to breaking through current performance plateaus.

About Us

Artificial Intelligence could be one of humanity’s most useful inventions. At Google DeepMind, we’re a team of scientists, engineers, machine learning experts and more, working together to advance the state of the art in artificial intelligence. We use our technologies for widespread public benefit and scientific discovery, and collaborate with others on critical challenges, ensuring safety and ethics are the highest priority.

The Role

We are seeking a Research Scientist or Engineer to lead the development of next-generation post-training recipes for Gemini. In this role, you will move beyond standard tuning; you will architect the Reward Modeling and Reinforcement Learning strategies that define how our most capable models learn. You will focus specifically on "hard" capabilities—such as improving chain-of-thought reasoning and complex instruction following—where synthetic data and distillation fall short. You will work horizontally to ensure these recipes scale across text, audio, and multimodal domains, establishing the gold standard for how Gemini evolves.

Key responsibilities:

• Frontier Recipe Development: Design and validate novel post-training pipelines (SFT, RLHF, RLAIF) specifically for frontier-class models where no "teacher" model exists.

• Advance Reward Modeling: Lead research into next-gen Reward Models, including investigating new architectures, reducing reward hacking, and improving signal-to-noise ratios in preference data.

• Unlock "Thinking" Capabilities: Develop innovative methods to improve the model's internal reasoning (chain-of-thought), focusing on correctness, logic, and self-correction in multi-step tasks.

• Revamp RL Paradigms: Critically re-evaluate and optimize RL prompts and feedback mechanisms to extract maximum performance from the underlying base models.

• Solve the "Flywheel" Challenge: Create robust mechanisms to turn user signals and interactions into training data that continuously improves the model without introducing regression or bias.

• Horizontal Impact: Collaborate across teams to apply these advanced recipes to various model sizes and modalities (e.g., Audio), ensuring consistent high-quality behavior.

About You

In order to set you up for success as a Research Scientist at Google DeepMind, we look for the following skills and experience:

• PhD in machine learning, artificial intelligence, or computer science (or equivalent practical experience).

• Strong background in Large Language Models (LLMs), Reinforcement Learning (RL), or preference learning.

• Research interest in aligning AI systems with human feedback and utility.

• Familiarity with experiment design and analyzing large-scale user data.

• Strong coding and communication skills.

Preferred requirements

• Experience with RLHF (Reinforcement Learning from Human Feedback) or DPO (Direct Preference Optimization).

• Experience building or improving reward models and conducting human evaluation studies.

• A proven track record of publications in top-tier conferences (e.g., NeurIPS, ICML, ICLR).

• Experience with Chain-of-Thought (CoT) reasoning research or process-based supervision.

• Deep understanding of, and experience with, training models from scratch or using self-play/self-improvement techniques.

At Google DeepMind, we value diversity of experience, knowledge, backgrounds and perspectives and harness these qualities to create extraordinary impact. We are committed to equal employment opportunity regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, pregnancy, or related condition (including breastfeeding) or any other basis as protected by applicable law. If you have a disability or additional need that requires accommodation, please do not hesitate to let us know.

Research Scientist, Frontier, Zurich
