Research Engineer (LLM Training and Performance)

JetBrains

Amsterdam, Netherlands; Berlin, Germany; Limassol, Cyprus; London, United Kingdom; Munich, Germany; Paphos, Cyprus; Prague, Czech Republic; Warsaw, Poland; Yerevan, Armenia

Remote, permanent

Posted: January 15, 2026

Job Description

At JetBrains, code is our passion. Ever since we started back in 2000, we have been striving to make the strongest, most effective developer tools on earth. By automating routine checks and corrections, our tools speed up production, freeing developers to grow, discover, and create.

We’re looking for a Research Engineer who will own the training stack and model architecture for our Mellum LLM family. Your job is easier said than done: make training faster, cheaper, and more stable at large scale. You’ll profile, design, and implement changes across the training pipeline, from model architecture down to custom GPU kernels, as needed.

As part of our team, you will:

• Be responsible for improving end-to-end performance for multi-node LLM pre-training and post-training pipelines.

• Profile hotspots (Nsight Systems/Compute, NVTX) and fix them using compute/comm overlap, kernel fusion, scheduling, etc.

• Design and evaluate architecture choices (depth/width, attention variants including GQA/MQA/MLA/Flash-style, RoPE scaling/NTK, and MoE routing and load-balancing).

• Implement custom ops (Triton and/or CUDA C++), integrate via PyTorch extensions, and upstream when possible.

• Push memory/perf levers: FSDP/ZeRO, activation checkpointing, FP8/TE, tensor/pipeline/sequence/expert parallelism, NCCL tuning.

• Harden large runs by building elastic and fault-tolerant training setups, ensuring robust checkpointing, strengthening reproducibility, and improving resilience to preemption.

• Keep the data path fast with streaming and sharded data loaders and tokenizer pipelines, and improve overall throughput and cache efficiency.

• Define the right metrics, build dashboards, and deliver steady improvements.

• Run both pre-training and post-training (including SFT, RLHF, and GRPO-style methods) efficiently across sizable clusters.

We’ll be happy to bring you on board if you have:

• Strong PyTorch and PyTorch Distributed experience, having run multi-node jobs with tens to hundreds of GPUs.

• Hands-on experience with Megatron-LM/Megatron-Core/NeMo, DeepSpeed, or serious FSDP/ZeRO expertise.

• Real profiling expertise (Nsight Systems/Compute, nvprof) and experience with NVTX-instrumented workflows.

• GPU programming skills with Triton and/or CUDA, and the ability to write, test, and debug kernels.

• A solid understanding of NCCL collectives, as well as topology and fabric effects (IB/RoCE), and how they show up in traces.

Our ideal candidate would have experience with:

• FlashAttention-2 and 3, CUTLASS and CuTe, TransformerEngine and FP8, Inductor, AOTAutograd, and torch.compile.

• MoE at scale (expert parallel, router losses, capacity management) and long-context tricks (ALiBi/YaRN/NTK scaling).

• Kubernetes or SLURM at scale, placement and affinity tuning, as well as AWS, GCP, and Azure GPU fleets.

• Web-scale data plumbing (streaming datasets, Parquet and TFRecord, tokenizer perf), eval harnesses, and benchmarking.

• Safety and post-training methods, such as DPO, ORPO, GRPO, and reward models.

• Inference ecosystems and techniques, such as vLLM and paged KV caching.

We process the data provided in your job application in accordance with the Recruitment Privacy Policy.
