ARCHIVED
This job listing has been archived and is no longer accepting applications.

Senior Research Engineer, LLM Evaluation and Behavioral Analysis

Together AI

San Francisco, Remote, Permanent

Posted: December 10, 2025

Job Description

About the Role

Together AI is building the fastest, most capable open-source-aligned LLMs and inference stack in the world. As part of the Turbo organization, you will be a critical bridge between cutting-edge model research and real-world behavioral reliability. This role focuses on deeply understanding model behavior — probing reasoning, tool use, function calling, multi-step interactions, and subtle failure modes — and building the evaluation systems that ensure models behave intelligently and consistently in production.

You will develop robust evaluation pipelines, design high-quality behavioral test suites, and work closely with training, post-training, inference, and product teams to identify regressions, shape datasets, and influence model improvements. Your work will directly define how Together measures model quality and reliability across releases.

Responsibilities

• Build and iterate on evaluation frameworks that measure model performance across instruction following, function calling, long-context reasoning, multi-turn dialog, safety, and agentic behaviors.

• Develop specialized evaluation suites for:

    • Function calling: argument correctness, schema adherence, tool selection, multi-function planning, and error recovery.

    • Agentic workflows: task decomposition, multi-step planning, self-correction, and autonomous tool-use sequences.

    • Tool-augmented interactions: search, retrieval, code execution, and API-driven actions.

• Create automated CI/CD pipelines for A/B comparisons, regression detection, behavioral drift monitoring, and adversarial probing.

• Design and curate high-quality evaluation datasets, especially nuanced or challenging cases across domains.

• Collaborate with researchers and engineers to diagnose failures, triage regressions, and guide data selection, shaping strategies, objective design, and system improvements.

• Work with engineering teams to build dashboards, reports, and internal tools that help visualize behavior changes across releases.

• Operate in a fast-paced, high-impact environment with deep technical ownership and close partnership with world-class model researchers and infra engineers.

Requirements

• Strong engineering skills with Python, evaluation tooling, and distributed workflows.

• Experience working with LLMs or transformer-based models, particularly in model evaluation, testing, or red-teaming.

• Ability to reason clearly about qualitative behavior, edge cases, and model failure patterns.

• Experience designing experiments, building datasets, and interpreting noisy behavioral signals.

• Understanding of function calling and structured output formats.

• Familiarity with GPU or distributed compute environments.

• Hands-on experience evaluating function-calling models, agentic systems, or tool-augmented LLM pipelines.

• Experience with multi-turn or multi-step reasoning tasks.

• Familiarity with inference systems, distributed infrastructure, or post-training workflows.

• Passion for discovering subtle behaviors, surprising model gaps, or edge-case failures.

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society. Our mission is to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets including FlashAttention, Hyena, FlexGen, ATLAS, and RedPajama. We invite you to join a passionate group of researchers and engineers in building the next generation of AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance, and other benefits. The US base salary range for this full-time position is $220,000 to $270,000, plus equity and benefits. Compensation varies by location, level, and experience.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal opportunity to all individuals regardless of race, color, ancestry, religion, sex, sexual orientation, national origin, age, citizenship, marital status, disability, gender identity, veteran status, or other protected characteristics.

Please see our privacy policy at https://www.together.ai/privacy
