Frontend Developer — LLM Evaluation & Experiment Visualization
JetBridge
Posted: November 25, 2025
Quick Summary
Our client is a well-funded nonprofit research organization focused on measuring frontier AI capabilities—especially agentic / autonomous capabilities and the ability of models to conduct AI R&D, because those capabilities can create outsized societal and security risk if they scale faster than our ability to evaluate and govern them.
Job Description
Their work is unusually "real-world" compared to typical benchmarks: they build evaluations with high realism, measure performance against skilled-human baselines (often on multi-hour tasks), and publish research on how quickly models are improving at completing long tasks.
You’d be building the UI that turns messy LLM evaluation outputs into clear, explorable artifacts that researchers can trust.
What you’ll do
- Build React + TypeScript interfaces for exploring LLM evaluation results and experiment outputs.
- Design and implement data visualizations that make model behavior, metrics, and results easy to inspect.
- Build workflows that support end-to-end traceability of LLM runs (prompts → intermediate steps → decisions → outputs).
- Partner closely with researchers; iterate quickly while balancing clarity, accuracy, and performance.
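To make the traceability work above concrete, here is a minimal TypeScript sketch of how an LLM run might be modeled for an inspection UI. The `TraceStep` and `LlmRunTrace` types and the `finalOutputs` helper are illustrative assumptions for this posting, not part of the client's actual codebase.

```typescript
// Hypothetical data model for one LLM evaluation run, capturing the
// prompt → intermediate steps → decisions → outputs chain end to end.
interface TraceStep {
  kind: "prompt" | "intermediate" | "decision" | "output";
  label: string;       // short human-readable name shown in the UI
  content: string;     // raw text of the step
  timestampMs: number; // when the step occurred, ms since epoch
}

interface LlmRunTrace {
  runId: string;
  model: string;
  steps: TraceStep[];
}

// A reviewer usually wants the final outputs first; the full step
// list stays available for drill-down.
function finalOutputs(trace: LlmRunTrace): string[] {
  return trace.steps
    .filter((s) => s.kind === "output")
    .map((s) => s.content);
}
```

A typed model like this is what lets a React frontend render the same run as a timeline, a table, or a metrics chart without re-parsing raw logs.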
Tech stack / must-haves
- React + TypeScript
- Hands-on experience with at least one major visualization library: D3, Plotly, Vega/Vega-Lite, Visx, Three.js, Highcharts, or ECharts
Why this matters
- Their mission is to give society and AI labs grounded answers to: “What can frontier models actually do?” and “When do capabilities become dangerous?”
- The team includes researchers and engineers with backgrounds across top AI orgs and programs (e.g., OpenAI, DeepMind, and alumni of Oxford, Caltech, MIRI, and ML interpretability programs).
Location
- On-site in the San Francisco Bay Area (relocation sponsored).