(Senior) Data Scientist – AI Booster Team (f|m|d)
Confidential
Posted: January 30, 2026
Quick Summary
Design experiments and evaluation frameworks that measure the quality, cost and business impact of GenAI features.
Job Description
At idealo, Generative AI (GenAI) is becoming a multiplier across every team. The AI Booster Team is our internal technical competence center: we pair with product teams, build reusable GenAI building blocks and share best practices company-wide. We validate AI business cases through data and ship evaluation frameworks that turn pilots into production. As a Data Scientist you will translate ideas into evidence: designing experiments, measuring LLM quality, and unlocking the full value of idealo’s data assets to guide today’s and tomorrow’s GenAI initiatives.
This position is available full-time or part-time.
About your new role
• Quantify opportunities & run experiments - perform causal analyses using experiments and observational methods to evaluate the business impact of GenAI features.
• Own model evaluation pipelines - create metrics dashboards and human / AI-assisted reviews that benchmark LLM quality, cost and safety.
• Guide model selection - compare foundation models, fine-tunes and RAG setups, recommending the right balance of performance vs. cost.
• Champion data strategy - surface high-value datasets (product, pricing, behaviour) and advocate their use in current and future AI products.
• Pair & coach - work embedded with engineers and analysts, sharing best practices in experimentation, metrics, and GenAI evaluation.
• Harvest patterns - document reusable evaluation playbooks so every team can measure GenAI success consistently.
Skills & Requirements
• 3+ years in data science / analytics, including A/B testing or causal inference at scale.
• Expert SQL and Python (pandas, statsmodels / SciPy, scikit-learn); comfortable with notebooks and BI tools for storytelling.
• Hands-on with LLM assessment - prompt / temperature sweeps, embedding similarity metrics, human-in-the-loop studies, and LLM-as-a-judge tools (e.g. Bedrock model evaluation, OpenAI Evals).
• Familiar with Generative AI stacks (Hugging Face, LangChain/LlamaIndex, vector DBs like Pinecone/Qdrant) and retrieval-augmented generation concepts.
• Proficiency in AWS analytics & MLOps: SageMaker Experiments / Pipelines, Bedrock, Athena, Lambda, Step Functions; able to automate evaluation workflows and cost dashboards.
• Strong communication: can turn complex findings into clear, actionable insights and coach cross-functional teams.
• We’re keen to see evidence of exceptional achievement - perhaps you’ve scaled a personal project to thousands of users, published influential research, ranked highly in competitive arenas (e.g. sports, Kaggle, hackathons) or maintained widely used open-source libraries. Tell us what makes you stand out!
You don’t tick every single box? No worries! We hire people, not checklists, and value motivation to grow.