Quick Summary

Innodata is a leading data engineering company that provides AI technology solutions to 4 out of 5 of the world's biggest technology companies, as well as leading companies across finance, insurance, technology, law, and medicine. We're seeking a skilled Language Data Scientist with a strong background in machine learning and artificial intelligence to join our team.

Required Skills

Data Analysis, Machine Learning, Artificial Intelligence, Natural Language Processing, GenAI, Query Understanding, Ranking Systems, Search Relevance, User Experience, Semantic Matching

Job Description

Who We Are:

Innodata (NASDAQ: INOD) is a leading data engineering company. With more than 2,000 customers and operations in 13 cities around the world, we are the AI technology solutions provider-of-choice to 4 out of 5 of the world’s biggest technology companies, as well as leading companies across financial services, insurance, technology, law, and medicine.

By combining advanced machine learning and artificial intelligence (ML/AI) technologies, a global workforce of subject matter experts, and a high-security infrastructure, we’re helping bring the promise of clean and optimized digital data to all industries. Innodata offers a powerful combination of digital data solutions and easy-to-use, high-quality platforms.

Our global workforce includes over 3,000 employees in the United States, Canada, the United Kingdom, the Philippines, India, Sri Lanka, Israel, and Germany. We’re poised for a period of explosive growth over the next few years.

Position Summary:

Innodata is building a team of Language Data Scientists and GenAI experts to help our customers advance search and information retrieval applications powered by GenAI. You will work hands-on with search-specific datasets (queries, documents, relevance judgments) in multimodal and multilingual environments, collaborating with cross-functional partners including search engineers and product teams. You will apply your expertise in query understanding, semantic matching, and ranking systems, alongside human and synthetic data workflows, to drive innovation in search relevance and user experience.

Who We’re Looking For:

You have at least 5 years of relevant experience with data creation, curation, and analysis for search and information retrieval systems, including work with GenAI applications (e.g. neural ranking, semantic search, query understanding, RAG-enhanced search, multi-stage ranking pipelines). Your experience spans creating and annotating search datasets, from query-document pairs to relevance judgments and query intent classifications. You have demonstrated success working on search product challenges such as relevance optimization, query intent understanding, or improving search result diversity and freshness. You understand the unique data annotation challenges in search (inter-rater disagreement on relevance, context-dependent query understanding, geographic and temporal relevance).

You are experienced in driving long-term projects, setting the strategic plan for success and drawing on your knowledge of AI, data science, and process design excellence. You are an expert at working cross-functionally with both technical and non-technical stakeholders. Despite ambiguity, you use your technical knowledge and your experience working with multiple stakeholders to drive solutions.

You bring a research-oriented mindset towards developing long-term excellence in search systems. You are an expert in designing collection, evaluation and quality assurance processes for search data, using human-in-the-loop and synthetic techniques. You understand search-specific evaluation metrics and quality frameworks, and you can design human relevance judging workflows that account for query ambiguity and subtlety.

Your understanding of machine learning, Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), neural ranking architectures, and dense retrieval methods helps you tackle search and information retrieval challenges with a critical, innovative mindset. You can assess how GenAI techniques improve search relevance, ranking, and user experience.

Tell Me More:

As a Senior Language Data Scientist, you lead projects and own processes for optimizing search and retrieval systems by creating, validating and annotating search-specific data for LLM/ML applications. This includes query-document pairs, relevance judgments, query intent labels, search result quality assessments, and multimodal search scenarios (image search, product search, news search). You work across different search domains, from web search to e-commerce to vertical search. You consult and engage with customers to understand their business goals and design processes to meet them. You generate insights about the client’s processes and products to drive improvement and innovation. You advise and support business unit heads on engaging with customers to understand the upstream activities that would be performed using Innodata Inc. services.

Responsibilities:

• Lead long-term projects with high complexity and ambiguity, from the first discussion with the client to completion

• Design and improve workflows to create data for AI/ML training and evaluation, including human annotation and data-collection workflows as well as synthetic ones

• Design and refine search data annotation frameworks, including relevance judging guidelines that handle nuanced query-document relationships, query ambiguity, and domain-specific search challenges (e.g., freshness for news search, user intent for product search)

• Dive deep into existing workflows and processes to gather data and insights, make recommendations, and drive improvement through innovation and cross-functional collaboration with customers

• Assess and optimize search-specific evaluation approaches, including A/B testing frameworks, ranking metrics, and human evaluation studies for search result quality

• Critically assess annotation tooling and workflows

• Quantitatively analyze large datasets, perform statistical analysis, calculate metrics, and make recommendations to improve accuracy and performance

• Work closely with client stakeholders on understanding goals, gathering requirements, proposing solutions, and executing them

• Set an ambitious research agenda for improving our products and services

• Contribute to establishing best practices and standards for generative AI development with customers and within the organization

Innodata Sr Language Data Scientist – Search Specialization
