Jobs

Freelance AI Evaluation Engineer (Python/Full-Stack)

Mindrift

Saudi Arabia Remote part-time

Python, Full-Stack, Testing, Functional Tests, Edge Cases, Fair Challenges, Debugging, Docker, English Proficiency

Please submit your CV in English and indicate your level of English proficiency. Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, eval...

March 13, 2026

Freelance AI Evaluation Engineer (Python/Full-Stack)

Mindrift

South Africa Remote part-time

Python, Full-Stack, Testing, Functional Tests, Edge Cases, Fair Challenges, AI Failures, Docker, React, English Proficiency

Please submit your CV in English and indicate your level of English proficiency. Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, eval...

March 13, 2026

Freelance AI Evaluation Engineer (Python/Full-Stack)

Mindrift

Poland Remote part-time

Python, Full-Stack, Testing, Functional Tests, Edge Cases, AI Evaluation, Docker, GitHub Actions, English Proficiency

Please submit your CV in English and indicate your level of English proficiency. Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, eval...

March 13, 2026

Freelance AI Evaluation Engineer (Python/Full-Stack)

Mindrift

Denmark Remote part-time

Python, Full-Stack, Testing, Functional Tests, Edge Cases, Reasoning, Docker, GitHub Actions, English Proficiency

Please submit your CV in English and indicate your level of English proficiency. Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, eval...

March 13, 2026

Freelance AI Evaluation Engineer (Python/Full-Stack)

Mindrift

Milan, Metropolitan City of Milan, Italy Remote part-time

Python, Full-Stack, Testing, Functional Tests, Edge-Cases, Reasoning, Docker, GitHub Actions, English Proficiency

Please submit your CV in English and indicate your level of English proficiency. Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, eval...

March 13, 2026

Freelance AI Evaluation Engineer (Python/Full-Stack)

Mindrift

Portugal Remote part-time

Python, Full-Stack, Testing, Functional Tests, Edge Cases, Reasoning, Docker, English Proficiency, Quality Criteria

Please submit your CV in English and indicate your level of English proficiency. Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, eval...

March 13, 2026

Freelance AI Evaluation Engineer (Python/Full-Stack)

Mindrift

Paris, Île-de-France, France Remote part-time

Python, Full-Stack, Testing, Functional Tests, Edge Cases, Fair Challenges, AI Failures, Docker, English Proficiency

Please submit your CV in English and indicate your level of English proficiency. Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, eval...

March 13, 2026

Freelance AI Evaluation Engineer (Python/Full-Stack)

Mindrift

Spain Remote part-time

Python, Full-Stack, Testing, Functional Tests, Edge Cases, Reasoning, Docker, English Proficiency, Full-Stack Development

Please submit your CV in English and indicate your level of English proficiency. Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, eval...

March 13, 2026

Freelance AI Evaluation Engineer (Python/Full-Stack)

Mindrift

Germany Remote part-time

Python, Full-Stack, Testing, Functional Tests, Edge Cases, Full-Stack Development, React, Back-end Systems, Docker, English Proficiency

Please submit your CV in English and indicate your level of English proficiency. Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, eval...

March 13, 2026

Freelance AI Evaluation Engineer (Python/Full-Stack)

Mindrift

Australia Remote part-time

Python, Full-Stack, Testing, Functional Tests, Edge-Cases, AI Failure Analysis, Docker, English Proficiency, Fair Challenges

Please submit your CV in English and indicate your level of English proficiency. Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, eval...

March 13, 2026

Freelance AI Evaluation Engineer (Python/Full-Stack)

Mindrift

Canada Remote part-time

Python, Full-Stack, Testing, Functional Tests, Edge Cases, Docker, GitHub Actions, React, English Proficiency

Please submit your CV in English and indicate your level of English proficiency. Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, eval...

March 13, 2026

Freelance AI Evaluation Engineer (Python/Full-Stack)

Mindrift

United Kingdom Remote part-time

Python, Full-Stack, Full-Stack development, React, Docker, Testing, Functional Tests, Edge Cases, AI Evaluation, Software Development, Computer Science

Please submit your CV in English and indicate your level of English proficiency. Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, eval...

March 13, 2026

AI Evaluation Scientist

BMO Financial Group

Toronto, ON, CAN permanent

Data Science, Machine Learning, LLMs, Deep Learning, Evaluation Metrics, Robustness, Reliability, Fairness, Explainability, Calibration, Safety, Performance

Application Deadline: 04/29/2026 Address: 100 King Street West Job Family Group: Data Analytics & Reporting About the Team BMO’s Applied AI team is responsible for building high‑performing, saf...

March 12, 2026

AI Evaluations Engineer

BMO Financial Group

New York, NY, USA permanent

Data Science, Machine Learning, LLMs, Deep Learning, Evaluation Metrics, Robustness, Reliability, Fairness, Explainability, Calibration, Safety, Performance

Application Deadline: 03/30/2026 Address: 151 W 42nd Street Job Family Group: Data Analytics & Reporting About the Team BMO’s Applied AI team is responsible for building high‑performing, safe, ...

March 12, 2026

Business Analyst with AI evaluation skills

Gramian Consulting Group

Turkey Remote permanent

Computer Software, Command-Line, Data Quality, Evaluation Frameworks, Structured Analysis, Fact-Checking, Justification, Browser-based Tools, Content Evaluation

Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help compan...

March 9, 2026

Business Analyst with AI evaluation skills

Gramian Consulting Group

Pakistan Remote permanent

Computer Software, Browser Extensions, Command-Line, Fact-Checking, Evaluation Frameworks, Structured Analysis, Data Quality

Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help compan...

March 5, 2026

Business Analyst with AI evaluation skills

Gramian Consulting Group

Argentina Remote permanent

Computer Skills, Browser Extensions, Command Line, Fact-Checking, Evaluation, Structured Analysis, Communication Skills

Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help compan...

March 5, 2026

Applied AI Evaluation Engineer

BMO Financial Group

Toronto, ON, CAN permanent

Programming languages, Machine Learning, Data Science, Evaluation Pipelines, MLOps, AI Systems, Cloud Infrastructure, Regulatory Compliance, Python, Git

Application Deadline: 03/30/2026 Address: 100 King Street West Job Family Group: Data Analytics & Reporting Hybrid role About the Team BMO’s Applied AI team sets the standard for safe, high-pe...

March 4, 2026

Psychologist - AI Evaluation & Research Consultant

Weekday AI

India Remote part-time

Psychology, Clinical, Cognitive, Developmental, Social, AI Evaluation, Reasoning, Ethics, Benchmarking, Feedback

This role is for one of our clients. Compensation: $30-$80 per hour. Experienced psychologists are invited to contribute to a high-impact AI research collaboration with a leading artificial intelligen...

February 25, 2026

Applied AI Evaluation Scientist

Jump App

US Remote permanent

AI Software, Data Science, Information Retrieval, Machine Learning, Product Thinking, Evaluation Frameworks, Embedding, Retrieval, Ranking, Generation, Chunking

Applied AI Evaluation Scientist Location: Remote (U.S.) Team: AIML Quality — reporting into Engineering leadership Level: Senior (IC) About Jump Jump's mission is to empower financial advisors, f...

February 23, 2026

AI Evaluation Engineer

Distyl

San Francisco, California, USA Remote permanent

Python, Evaluation Operations, Evaluation Pipelines, LLVM Proficiency, Regulatory Constraints

About Distyl AI Distyl AI develops production-grade AI systems to power core operational workflows for Fortune 500 companies. Powered by a strategic partnership with OpenAI, in-house software acceler...

February 20, 2026

Staff Developer, AI Evaluation & Reliability

Caseware

Bogotá, Colombia Remote permanent

AI Strategy, Agentic AI, LLM Evaluation, Retrieval-Augmented Generation, Data Science, Reliability Engineering, Regulatory Controls, Feature Flags, Code Deployments, Audit Trails

Caseware is one of Canada's original Fintech companies, having led the global audit and accounting software industry for over 30 years, with more than 500,000 users across 130 countries and available ...

February 18, 2026

Machine Learning Engineer, AI Evaluation

Wayve

London (London, United Kingdom) Hybrid permanent

Machine Learning, AI, Attribution, Big Data Visualization, Productionization, Full-Stack Collaboration, Rapid Prototyping, Model Introspection, Saliency, Latent Diagnostics

At Wayve we're committed to creating a diverse, fair and respectful culture that is inclusive of everyone based on their unique skills and perspectives, and regardless of sex, race, religion or belief...

February 18, 2026

Sr AI Research Scientist, AI Evaluation and Reliability

Upwork

Toronto, Ontario, Canada (Toronto, Canada) permanent

AI, Research, Evaluation, Reliability, Leadership, Methodology, Mitigation, Cross-functional

Upwork Inc.’s (Nasdaq: UPWK) family of companies connects businesses with global, AI-enabled talent across every contingent work type including freelance, fractional, and payrolled. This portfolio inc...

February 5, 2026

Senior Product Manager, AI Evaluations

ServiceNow

Santa Clara, California, United States Remote permanent

Customer Service, Innovation, Agentic AI, Roadmap, Product Vision, Requirements, User Research, Stakeholder Communication, Generative AI, Collaboration

It all started in sunny San Diego, California in 2004 when a visionary engineer, Fred Luddy, saw the potential to transform how we work. Fast forward to today — ServiceNow stands as a global market le...

January 21, 2026

Director of Engineering - AI Evaluations & Experimentation

Salesforce

New York - New York Hybrid permanent

AI Systems, Leadership, Engineering, Evaluation, Experimentation, CI/CD, LLM, Observability, Team Leadership

To get the best candidate experience, please consider applying for a maximum of 3 roles within 12 months to ensure you are not duplicating efforts. Job Category Software Engineering Job Details Ab...

February 3, 2026

Senior Engineer, AI Evaluation & Reliability (Agentic AI)

Anomali

Redwood City, CA Hybrid permanent

AI Evaluation, Reliability, Quality Metrics, Continuous Evaluations, Dataset Management, Safety & Reliability, Adversarial Testing, Explainability, Infrastructure Design, Production Observability

Company: Anomali is headquartered in Silicon Valley and is the Leading AI-Powered Security Operations Platform that is modernizing security operations. At the center of it is an omnipresent, intellige...

November 5, 2025

AI Evaluation Engineer

Weekday AI

Pune, Maharashtra, India permanent

Python, Machine Learning, Artificial Intelligence, Data Analytics, LLM Evaluation, RAG Pipelines, CI/CD, Testing Pipelines, API Validation, System Performance

This role is for one of Weekday's clients. We are seeking an AI Evaluation Engineer to evaluate, validate, and ensure the quality of AI/ML systems working with complex, real-world data. This role ...

January 23, 2026

AI Evaluation Analyst

Ema

Bengaluru, Karnataka, India permanent

AI Model Evaluation, Test Set Development, Model Assessment, Report Generation, UX Flow Testing, Automated Testing, Analytical Skills, Communication Skills, Analytical Background, Attention to Detail

About Ema: Ema is at the forefront of artificial intelligence research and application, dedicated to creating cutting-edge AI solutions that empower users globally. Join our dynamic team to help shape...

November 14, 2025

AI Evaluation Product Manager

Plaud

Singapore, Singapore permanent

Product Management, AI Evaluation, Memory Quality, Evaluation Frameworks, Test Case Development, Scalable Systems, Data Security, Compliance, Cross-Functional Collaboration, Benchmarking

Plaud AI is hiring AI Evaluation Product Manager Location: Singapore About Plaud Inc. Plaud is building the world's most trusted AI work companion for professionals to elevate productivity and perf...

November 17, 2025

Alpha Pictoris | Arabic (Gulf) AI Evaluation Specialist

Welocalize

Cairo, Egypt Remote freelance

Arabic, Gulf Arabic, AI Evaluation, Large Language Models, Scenario Design, Edge Case Prompts, Output Analysis, Instruction Adherence, Factual Accuracy, Tone, Safety, Usefulness

Overview We are seeking Arabic (Gulf) AI Evaluation Specialists to help assess and improve the performance of advanced AI systems. In this role, you’ll contribute directly to the evaluation and enhanc...

January 23, 2026

Alpha Pictoris | Arabic (Levantine) AI Evaluation Specialist

Welocalize

Cairo, Egypt Remote freelance

Arabic Fluency, Levantine Dialect Proficiency, AI Evaluation, Model Assessment, Scenario Design, Prompt Engineering, Output Analysis, Rubric Development, Evaluation Criteria, Reference Material Creation, Hallucination Detection, Cultural Contextual Understanding

Overview We are seeking Arabic (Levantine) AI Evaluation Specialists to help assess and improve the performance of advanced AI systems. In this role, you’ll contribute directly to the evaluation and e...

January 23, 2026

Alpha Pictoris | Arabic (Egyptian) AI Evaluation Specialists

Welocalize

Cairo, Egypt Remote freelance

Arabic Language, AI Evaluation, Scenario Design, Prompt Engineering, Data Annotation, Quality Assurance, Evaluation Rubrics, Reference Material Creation, Critical Thinking, Cultural Awareness

Overview We are seeking Arabic (Egyptian) AI Evaluation Specialists to help assess and improve the performance of advanced AI systems. In this role, you’ll contribute directly to the evaluation and en...

January 23, 2026

Pictor | Arabic (Levantine) AI Evaluation Specialist

Welocalize

Cairo, Egypt Remote freelance

Arabic Language Proficiency, Levantine Arabic Fluency, AI Evaluation, Prompt Design, Evaluation Rubrics, Data Annotation, Content Quality Review, Search Quality Rating, Prompt Engineering, Linguistic QA

Overview We are looking for Arabic (Levantine) AI Evaluation Specialists to support the testing and evaluation of an Arabic language model. In this role, you will be instrumental in refining and evalu...

January 15, 2026

Pictor | Arabic (Gulf) AI Evaluation Specialist

Welocalize

Cairo, Egypt Remote freelance

Arabic Language Proficiency, AI Evaluation, Prompt Design, Evaluation Rubrics, Data Annotation, Critical Thinking, Cultural Sensitivity, Remote Work Experience, Factuality Assessment, Safety Evaluation

Overview We are looking for Arabic (Gulf) AI Evaluation Specialists to support the testing and evaluation of an Arabic language model. In this role, you will be instrumental in refining and evaluating...

January 15, 2026

Research Engineer Model training, new architectures, performance enhancements, generative AI evaluation (RE2) - AI4S

Barcelona Supercomputing Center (BSC)

Location not specified

Machine Learning; Model Training; New Architectures; Performance Enhancements; Generative AI Evaluation; NLP; Spanish Language; Catalan Language; EU-Funded Projects; Equity, Diversity and Inclusion

Job Reference 28_26_LS_LT_RE2 Position Research Engineer Model training, new architectures, performance enhancements, generative AI evaluation (RE2) - AI4S Closing Date Monday, 02 February, 2026 Refer...

January 22, 2026

Technical Program Manager, AI Evaluation Specialist

Chime

Remote - US Remote permanent

QA, Evaluation, Operational Analytics, Human-in-the-Loop, Model Monitoring, Text Review, Rubrics, Scorecards, SQL, Looker, Snowflake, Attention to Detail

About the Role We’re hiring an AI Evaluation Specialist to strengthen how Chime governs, evaluates, and improves AI systems across Operations. As part of Speech Analytics, you will own the human-in-t...

January 8, 2026

AI Evaluations Program Manager

Sofi

United States (NY - New York City) Remote permanent

Program Management, AI Technologies, Financial Services, Regulatory Compliance, Process Improvement, Stakeholder Management, Data Analysis, Financial Products, Risk Management

Employee Applicant Privacy Notice Who we are: Shape a brighter financial future with us. Together with our members, we’re changing the way people think about and interact with personal finance. We...

January 7, 2026

Principal AI Evaluation Engineer

Backbase

Hyderabad permanent

Python, AI, Evaluations, Windows scripting, Observability, Data Analysis, AI Agent Building, Mentoring

About Backbase As a Principal AI Evaluation Engineer, you will be leading the evaluation efforts in our AI-powered SDLC team. You will own the evaluation strategy for AI assistants and agentic workfl...

December 12, 2025

Senior Platform Engineer I, AI Evaluation (24 months fixed-term)

Khan Academy

Mountain View, CA / Remote (Continental US + Hawaii + Canada Only) (Mountain View, CA) Remote permanent

Go, GraphQL, JavaScript, React, Redux, AI evaluations, offline benchmark tests, Online Experiments, stratified datasets, ground truth labeling

ABOUT KHAN ACADEMY Khan Academy is a nonprofit with the mission to deliver a free, world-class education to anyone, anywhere. Our proven learning platform offers free, high-quality supplemental learn...

December 11, 2025
