ARCHIVED
This job listing has been archived and is no longer accepting applications.

AI Evaluation Engineer

Distyl

San Francisco, California, USA | Remote | Permanent

Posted: February 20, 2026

Quick Summary

As an AI Evaluation Engineer, you will design and implement the evaluation systems that drive Evaluation-Driven Development at Distyl. You will write production Python code, build evaluation pipelines and regression suites, and calibrate LLM-based graders against expert human judgment. You will work closely with cross-functional teams and domain experts so that evaluation results guide prompt design, model selection, and release decisions.

Job Description

About Distyl AI

Distyl AI develops production-grade AI systems to power core operational workflows for Fortune 500 companies. Powered by a strategic partnership with OpenAI, in-house software accelerators, and deep enterprise AI expertise, we deliver working AI systems with rapid time to value – within a quarter.

Our products have helped Fortune 500 customers across diverse industries, from insurance and CPG to non-profits. As part of our team, you will help companies identify, build, and realize value from their GenAI investments, often for the first time. We are customer-centric, working backward from the customer’s problem and holding ourselves accountable for both creating financial impact and improving the lives of end-users.

Distyl is led by proven leaders from top companies like Palantir and Apple and is backed by Lightspeed, Khosla, Coatue, Dell Technologies Capital, Nat Friedman (Former CEO of GitHub), Brad Gerstner (Founder and CEO of Altimeter), and board members of over a dozen Fortune 500 companies.

What We Are Looking For

At Distyl, we build AI systems using Evaluation-Driven Development—an approach where evaluation is not an afterthought, but the primary mechanism for iterating, improving, and trusting AI behavior in production.

AI Evaluation Engineers focus on designing and implementing the evaluation systems that drive this process. They are hands-on engineers who write production Python code, build evaluation pipelines, and use structured signals to guide system design, prompt iteration, and deployment decisions for real customer-facing AI systems.

This role is for engineers who believe that AI systems only improve when measurement is tightly coupled to development—and who want to apply that philosophy directly to systems that matter.

Key Responsibilities

• Design and implement evaluation frameworks that enable Evaluation-Driven Development for AI systems deployed in customer environments

• Define how system quality is measured in each domain, ensuring that evaluation signals reflect real user needs, domain constraints, and business objectives

• Build and maintain golden test cases and regression suites in Python, using both human-authored and AI-assisted test generation to capture critical behaviors and edge cases. These test suites are treated as first-class system components that evolve alongside the AI system itself

• Develop and maintain evaluation pipelines—offline and online—that integrate directly into system iteration loops. Evaluation results inform prompt design, agent logic, model selection, and release readiness, ensuring that system changes are driven by measurable improvements rather than intuition alone

• Define, calibrate, and operate LLM-based graders, aligning automated judgments with expert human assessments. Investigate where evaluation signals diverge from real-world outcomes and refine grading approaches to maintain signal quality as systems and domains evolve

• Work closely with Forward Deployed AI Engineers, Architects, Product Engineers, AI Strategists, and domain experts to ensure evaluation frameworks meaningfully guide system development and deployment in production

What We Require

• 2+ years of software engineering experience

• Strong Python Engineering Skills: You write clean, maintainable Python and are comfortable building evaluation and experimentation pipelines that run in production environments. You treat evaluation code with the same rigor as application code

• Experience with Evaluation-Driven or Experiment-Driven Development: You have used structured evaluation or experimentation frameworks to drive system iteration, and you understand the pitfalls of overfitting to metrics that don’t reflect real outcomes

• Ability to Translate Human Judgment into Code: Work with subject matter experts to elicit high-quality judgments and encode them into test cases, scoring functions, and graders that scale

• Systems-Oriented Mindset: Understand how evaluation interacts with prompts, agents, data, and deployment. You design evaluation systems that support fast iteration while maintaining trust and safety in production

• AI-Native Working Style: Use AI tools to generate tests, analyze failures, explore edge cases, and accelerate debugging and iteration

• Travel: Ability to travel 25-50%

What We Offer

• The base salary range for this role is $130K – $250K, depending on experience, location, and level. In addition to base compensation, this role is eligible for meaningful equity, along with a comprehensive benefits package

• 100% covered medical, dental, and vision for employees and dependents

• 401(k) with additional perks (e.g., commuter benefits, in‑office lunch)

• Access to state‑of‑the‑art models, generous usage of modern AI tools, and real‑world business problems

• Ownership of high‑impact projects across top enterprises

• A mission‑driven, fast‑moving culture that prizes curiosity, pragmatism, and excellence

Distyl has offices in San Francisco and New York. This role follows a hybrid collaboration model with 3+ days per week (Tuesday–Thursday) in‑office.

We’re grateful for the strong interest in this role. The best way to get your profile in front of our team is to apply directly through our careers page, where all applications are reviewed. Due to the high volume of interest, we’re not able to review or respond to all direct emails or LinkedIn messages. We will be in touch with every applicant once we’ve completed our review, regardless of the decision.
