ARCHIVED
This job listing has been archived and is no longer accepting applications.

AI Research Engineer (all genders)

ellamind

Location not specified · Remote · Permanent

Posted: December 8, 2025


Job Description

At ellamind, we build evaluation-first AI infrastructure. Our platform elluminate turns AI evaluation from ad-hoc “vibe checks” into rigorous, repeatable engineering to enable teams to test, measure, and improve LLM applications with confidence.

What you'll do

Advance LLM evaluation research: Design, implement, and validate new benchmarks, metrics, and workflows that measure correctness, robustness, safety, and reliability across languages and modalities.

Build LLM-as-a-judge setups and reward models: Develop rubric-based graders, preference-data pipelines, and reward models, and run DPO/RLHF/RLAIF/RLVF training.
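The rubric-based grading mentioned above can be sketched in a few lines. This is an illustrative outline only, not ellamind's implementation: the rubric criteria, weights, and the `call_judge` stub (which stands in for a real judge-model call) are all assumptions made for the example.

```python
# Minimal rubric-based LLM-as-a-judge sketch. `call_judge` is a stub standing
# in for a real judge-model call; criteria and weights are illustrative.

RUBRIC = {
    "correctness": 0.5,   # does the answer match the reference?
    "groundedness": 0.3,  # is every claim supported by the context?
    "style": 0.2,         # is the answer concise and well-formatted?
}

def call_judge(criterion: str, question: str, answer: str) -> int:
    """Stub for a judge-model call returning a 1-5 score for one criterion.

    A real implementation would prompt a judge LLM with the rubric text
    for this criterion and parse its structured output.
    """
    return 4  # placeholder score

def grade(question: str, answer: str) -> float:
    """Weighted average of per-criterion judge scores, normalized to [0, 1]."""
    total = sum(
        weight * call_judge(criterion, question, answer)
        for criterion, weight in RUBRIC.items()
    )
    return total / 5  # judge scores are on a 1-5 scale

score = grade("What is the capital of France?", "Paris.")
```

In practice the per-criterion scores would feed preference-data pipelines or serve as verifiable rewards during training.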

Generate and curate synthetic data: Create high-quality synthetic datasets for pre-training, post-training and evaluation of LLMs with filtering, deduplication and decontamination to reliably improve model capabilities.
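The filtering, deduplication, and decontamination steps above can be illustrated with a toy stdlib-only pipeline. Real pipelines typically use MinHash/LSH for near-duplicate detection; this sketch shows only the basic shape of exact dedup plus n-gram decontamination against an eval set, with all names chosen for the example.

```python
# Toy cleaning pass: drop exact duplicates and any document sharing an
# n-gram with the evaluation set (decontamination). Stdlib only.

import hashlib

def ngrams(text: str, n: int = 8) -> set:
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def clean(corpus: list, eval_set: list, n: int = 8) -> list:
    contaminated = set()          # n-grams that occur in the eval set
    for example in eval_set:
        contaminated |= ngrams(example, n)
    seen, kept = set(), []
    for doc in corpus:
        h = hashlib.sha256(doc.encode()).hexdigest()
        if h in seen:
            continue              # exact duplicate: drop
        if ngrams(doc, n) & contaminated:
            continue              # overlaps the eval set: drop
        seen.add(h)
        kept.append(doc)
    return kept
```

Production systems add fuzzy matching, language identification, and quality filters on top, but the dedup-then-decontaminate ordering stays the same.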

Train and adapt open models: Pre-train and fine-tune open-source LLMs. Use LLM training frameworks to run rigorous ablations.

Scale experiments on GPU clusters: Orchestrate large-scale training, inference, and evaluation jobs. Optimize efficiency and ensure reproducibility end-to-end. We are working with thousands of GPUs.

Multilingual data and evaluation: Extend training datasets and eval pipelines to European languages.

Open science & collaboration: Release datasets/tools, publish technical reports, blog posts, and papers, and collaborate with partners (e.g., OpenEuroLLM) to push evaluation standards forward.

Productize research: Turn prototypes into elluminate features—automated eval suites, graders, and data pipelines. Work with platform engineers and product to ship reliable workflows.

You’ll mostly work with a Python-based LLM research stack (Huggingface ecosystem, PyTorch, Megatron-LM/torchtitan, vLLM/SGLang, lm-eval-harness/LightEval, dataframe libraries, SLURM, Ray).

What we're looking for

Must-haves

Strong Python engineering skills: Experience building LLM-centric systems with clean, maintainable code, comprehensive testing, and performance optimization at scale.

LLM operations expertise: You’re comfortable with tokenizers/vocabs, data specs (e.g., Parquet), sampling/decoding configs, and evaluation.
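As a taste of the sampling/decoding configs this bullet refers to, here is a stdlib-only sketch of temperature scaling plus top-p (nucleus) filtering over a toy logit vector. The function and its defaults are illustrative, not tied to any particular inference server.

```python
# Temperature + top-p (nucleus) sampling over a toy token->logit mapping.
# Stdlib only; illustrative defaults.

import math
import random

def sample(logits: dict, temperature: float = 0.8,
           top_p: float = 0.9, seed: int = 0) -> str:
    rng = random.Random(seed)
    # Temperature-scaled softmax (subtract the max for stability).
    scaled = {t: l / temperature for t, l in logits.items()}
    m = max(scaled.values())
    exps = {t: math.exp(v - m) for t, v in scaled.items()}
    z = sum(exps.values())
    probs = sorted(((e / z, t) for t, e in exps.items()), reverse=True)
    # Keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cum = [], 0.0
    for p, t in probs:
        kept.append((p, t))
        cum += p
        if cum >= top_p:
            break
    # Sample from the renormalized nucleus.
    r = rng.random() * sum(p for p, _ in kept)
    for p, t in kept:
        r -= p
        if r <= 0:
            return t
    return kept[-1][1]
```

With a sharply peaked distribution and a small `top_p`, the nucleus collapses to the single most likely token, which is why greedy-looking behavior can emerge from nominally stochastic configs.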

Distributed training & inference literacy: Solid grasp of multi-GPU/multi-node fundamentals (e.g., FSDP/DeepSpeed), scheduling, and monitoring—plus practical debugging of throughput/memory issues.

Experiment design & statistics: You plan ablations, track experiments, and use sound statistical methods (significance testing, uncertainty estimates) to draw reliable conclusions.
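The uncertainty estimates this bullet asks for often come down to something like a percentile bootstrap over per-example scores. A minimal stdlib-only sketch, with illustrative data (72 correct out of 100):

```python
# Percentile-bootstrap confidence interval for a benchmark accuracy,
# computed from per-example 0/1 scores. Stdlib only; data is illustrative.

import random

def bootstrap_ci(scores: list, iters: int = 2000, alpha: float = 0.05,
                 seed: int = 0):
    """Return a (1 - alpha) confidence interval for the mean score."""
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(
        sum(rng.choices(scores, k=n)) / n for _ in range(iters)
    )
    lo = means[int((alpha / 2) * iters)]
    hi = means[int((1 - alpha / 2) * iters) - 1]
    return lo, hi

# e.g. 72 correct out of 100 evaluated examples
scores = [1] * 72 + [0] * 28
lo, hi = bootstrap_ci(scores)
```

Reporting the interval rather than the point estimate is what makes "model A beats model B by 1 point" a checkable claim instead of noise.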

Data hygiene mindset: You care about dataset quality—deduplication, contamination checks, multilingual coverage, and traceable versioning.

Linux comfort: You’re productive on Linux servers—shell workflows, virtual environments, containers, GPU tooling, logs/metrics, and remote development/debugging.

On-site collaboration: 3 days/week in Berlin or Bremen. Travel to our Bremen HQ during onboarding.

Fluency in English: At least B2 level for team collaboration and technical discussions.

Valid EU work authorization.

Nice-to-haves

Experience with LLM evaluation frameworks (lm-eval-harness, LightEval) or a track record of rigorous custom benchmarks and metrics.

Background in preference learning and reward modeling (DPO/RLHF/RLAIF), including rubric design and high-quality preference data pipelines.

Multilingual expertise: building or evaluating models across European languages; data collection, alignment, and cross-lingual transfer.

Comfort with high-throughput inference systems (vLLM, SGLang), latency/memory optimization, and model quantization.

Experience with systems and orchestration (Slurm/Ray/Kubernetes) and containers (Docker/Apptainer) – including GPU observability, scheduling, and performance tuning.

Familiarity with MLOps and reproducibility: experiment tracking (e.g., W&B), dataset/model/prompt versioning, CI for research workflows, and dependable artifact management.

Experience building open-source tools or publishing research artifacts (datasets, models, papers) or strong technical writing.

Experience working directly with partners or customers to validate results and translate research into product impact.

Advanced degree in Computer Science, Machine Learning, Data Science, or a related field (PhD preferred, or equivalent achievements).

What matters most

We prioritize demonstrated excellence in your projects and career. If you’re motivated to build and optimize AI solutions, we want to hear from you—even if you don’t meet every single criterion.

Diversity & inclusion

Different perspectives make us stronger. We welcome applicants from all backgrounds and encourage you to apply.

Why us?

Shape the future of AI research: Influence our research agenda and Europe’s LLM ecosystem—help set evaluation standards and training practices that serious AI teams and institutions rely on.

Technical excellence meets cutting-edge research: Push the frontier of LLM training and evaluation—design multilingual benchmarks, build LLM-as-a-judge and reward models, generate high-quality synthetic data, and run rigorous ablations at scale on large GPU clusters.

Career-defining opportunity: Systematic evaluation is becoming as fundamental to AI as version control is to software. Work at the center of this shift and contribute methods, datasets, and tools that others adopt and build upon.

Ownership and impact: Lead research end-to-end—formulate hypotheses, build datasets and benchmarks, run large-scale experiments, and publish results (papers, technical reports, OSS). Collaborate with top-tier partner labs and see your work shape model behavior and evaluation practices across the industry.

Compute that matches your ambition: Access serious GPU resources.

Open science by default: Freedom to release datasets, models, and tools; backing for conference submissions and travel.

Competitive package with upside: In addition to a competitive salary, we offer a VSOP (Virtual Stock Option Program) to give you a real stake in the company’s success as we grow.

Best-in-class development experience: Fast and streamlined access to all AI technologies that make your life (and development work) easier, plus the latest tools and platforms to maximize your productivity.

Work environment: Our Bremen office features stunning waterfront views, complimentary beverages, smoothies, and a boat. We’re opening our Berlin office at the end of 2025, giving you flexibility as we expand.

Grow with transformative technology: Build deep expertise in LLM evaluation and infrastructure, contribute to open standards, and advance the state of the art alongside a team that values rigor and impact.

About us

We are a cash-flow-positive, Germany-based AI startup building elluminate, the enterprise platform that turns AI evaluation from ad-hoc experiments into rigorous, repeatable workflows. Teams use elluminate to design test suites, benchmark models, track regressions, and ship reliable AI with clear, measurable quality gates. We pair elluminate with custom large-language-model solutions and full on-prem deployment options. Our products have already earned the trust of renowned clients such as Deutsche Telekom, the German Federal Government, and leading health insurers like hkk.

Rooted in Bremen and collaborating with leading organizations, our team has a track record in advanced model and dataset development. We like owning problems end-to-end and shipping pragmatically. We contribute to the open-source community through initiatives like OpenEuroLLM and regularly publish models and tools to accelerate the broader ecosystem.

Compensation Range: €70,000.00 - €110,000.00
