Scientific AI Evaluation & Computational Problem Designer
Weekday AI
Posted: May 5, 2026
Quick Summary
We are seeking a skilled AI evaluation and computational problem designer to create rigorous, research-grade problems for a large-scale evaluation benchmark. The ideal candidate has expertise in one or more scientific domains and hands-on experience with real-world scientific software tools. You will create original, graduate-level problems that assess how well AI systems solve complex challenges.
Job Description
This role is for one of our clients.
Compensation: $45-$100 per hour
We are building a large-scale evaluation benchmark to test advanced AI reasoning across scientific and engineering domains. This role focuses on designing rigorous, research-grade computational problems that assess how effectively AI systems can leverage real scientific software tools to solve complex challenges.
Unlike traditional annotation roles, this position requires creating original, graduate-level problems rooted in real-world scientific workflows. You will iteratively refine these problems through calibration against state-of-the-art AI models, ensuring the right balance of difficulty, depth, and reasoning complexity.
What You’ll Do
• Design advanced computational problems requiring the use of domain-specific scientific software
• Create tasks that test both precise execution (multi-step workflows, simulations) and strategic reasoning (experiment design, inference from partial data)
• Develop problem setups, solution pathways, and validation mechanisms
• Calibrate and refine tasks based on model performance to achieve target difficulty levels
• Ensure problems emphasize reasoning strategy over brute-force computation
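As a rough illustration of what a "validation mechanism" for such a problem might look like, here is a minimal sketch of a numeric answer checker. The function name validate_answer and the tolerance values are hypothetical and chosen only for illustration; the actual benchmark may use a different validation scheme.

```python
import math


def validate_answer(submitted, reference, rel_tol=1e-3, abs_tol=0.0):
    """Return True if `submitted` matches `reference` within tolerance.

    Hypothetical helper: the name and default tolerances are assumptions,
    not part of any benchmark specification.
    """
    # Accept numbers or numeric strings; reject anything unparseable.
    try:
        value = float(submitted)
    except (TypeError, ValueError):
        return False
    # Reject NaN and infinities explicitly.
    if not math.isfinite(value):
        return False
    return math.isclose(value, reference, rel_tol=rel_tol, abs_tol=abs_tol)


if __name__ == "__main__":
    # Example: a reference value with a 0.1% relative tolerance.
    print(validate_answer("1.0005", 1.0))
    print(validate_answer(1.01, 1.0))
```

A tolerance-based check like this is one common way to accept answers that differ only by numerical noise while still rejecting results from an incorrect workflow.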
Domains & Tools of Interest
We are particularly seeking candidates with hands-on experience in:
• Bioinformatics & Single-Cell Genomics: scanpy, scvelo, squidpy, gudhi (RNA-seq, trajectory inference, spatial transcriptomics)
• Computational Chemistry: PySCF (HF, DFT, TDDFT, CASSCF, post-HF methods)
• Particle & Nuclear Physics: scikit-hep, Monte Carlo simulations, collider data analysis
• Electrical Engineering: scikit-rf, ngspice (RF systems, circuit simulation)
• Astrophysics & Cosmology: astropy (cosmological modeling, survey analysis)
• Structural & Mechanical Engineering: scikit-fem (finite element analysis, elasticity, beam theory)
• Seismology & Geophysics: ObsPy, SPECFEM (waveform analysis, inversion, tomography)
• Pharmacokinetics & Systems Biology: libRoadRunner, Tellurium, SBML-based tools
Experience with other specialized tools in related domains is also welcome.
What Makes You a Strong Fit
• Graduate-level expertise (MS or PhD preferred) in a relevant STEM field
• Hands-on experience using scientific software libraries for real research problems
• Strong Python programming skills, including building computational workflows and validators
• Ability to design challenging problems that require deep reasoning rather than surface-level solutions
• Familiarity with edge cases, limitations, and practical challenges of scientific tools
Requirements
• Demonstrated proficiency with at least one relevant scientific library (via research, open-source work, or industry experience)
• Ability to work independently and iterate based on feedback
• Comfort working in Linux/terminal environments and remote compute setups
• Availability of at least 15–20 hours per week
Nice to Have
• Experience across multiple domains or tools
• Background in evaluation frameworks or benchmarking
• Experience in teaching, pedagogy, or problem-set design
• Familiarity with reproducible research practices and containerized environments
Engagement Details
• Independent contractor role
• Fully remote with flexible scheduling
• Project scope may evolve based on performance and research needs
Compensation & Payments
• Competitive compensation based on expertise and domain specialization
• Weekly payments via supported global payment platforms
Additional Information
• Work must not involve sharing confidential or proprietary information from any current or past employer or institution
• Projects may be extended, modified, or concluded based on performance and business requirements
• This opportunity does not currently support certain work authorization categories