Lead Data Scientist
Confidential
Posted: April 10, 2026
Quick Summary
The Lead Data Scientist at PAI is responsible for developing and deploying advanced security solutions using data science techniques, working closely with cross-functional teams to identify and mitigate security threats.
Job Description
Company Profile:
Prevalent AI (PAI) is a Security Data Science company founded in the UK by experts globally recognized for solving the world’s toughest security problems. We apply the world’s best Security Data Science knowledge and expertise to help companies understand, deploy, and support the most advanced security solutions by developing a security architecture based on a deep understanding of Data Science, Security Tradecraft, and Big Data Technologies.
PAI’s Security Data Science (SDS) platform is a big data security analytics platform that can ingest a wide range of security telemetry data and apply advanced analytical approaches to identify and detect control weaknesses and security risks within enterprises.
The PAI team consists of Cyber Security Domain Specialists, Information Security Analysts, Data Scientists, Data Engineers, and Data Analysts focused on developing advanced security analytics solutions (Solution Development) and delivering security insights to our clients.
Prevalent AI India Pvt Ltd., a subsidiary of Prevalent AI, has offices in Infopark, Cochin, Kerala. For more information, please visit https://www.prevalent.ai
ROLE PURPOSE
As a Lead Data Scientist at Prevalent, you will lead a team in developing AI-driven solutions that power our core Security Data Science Products. You will work with diverse, large-scale data to uncover insights, build predictive and generative AI models, and solve complex business problems in the cybersecurity and third-party risk management domain.
Beyond hands-on technical work, you will help shape product strategy, drive innovation across the AI/ML stack, and mentor your team. This role offers the opportunity to experiment with cutting-edge technologies (including large language models and agentic AI), lead impactful projects, and make a real difference in Prevalent’s data-driven future.
KEY ACCOUNTABILITIES
Data Science & Machine Learning
Collaborate with business SMEs to understand requirements and translate them into data science solutions using data preparation, visualization, statistical modeling, and machine learning techniques (supervised, unsupervised, and optimization)
Design, build, and deploy predictive and classification models, including deep learning architectures (CNNs, Transformers, GNNs) suited to security data problems
Analyze and validate data for consistency; develop prototypes to demonstrate key elements of models, visualizations, and data transformations
Communicate insights and predictions through clear reports and visualizations tailored for both technical and non-technical audiences
Work closely with engineering teams to ensure accurate, production-grade implementation of data science designs through documentation, prototype code, testing, and code reviews
Generative AI & LLM Integration
Design and implement LLM-powered features such as intelligent document processing, automated risk assessment, threat summarization, and conversational interfaces
Build and optimize Retrieval-Augmented Generation (RAG) pipelines using vector databases (e.g., Pinecone, Weaviate, pgvector) and embedding models for domain-specific knowledge retrieval
Evaluate, fine-tune, and deploy foundation models (e.g., models from OpenAI and Anthropic, or open-source LLMs such as Llama/Mistral) using techniques like LoRA, RLHF, and DPO
Design agentic AI workflows and multi-step reasoning systems using frameworks such as LangChain, LangGraph, or CrewAI for complex security automation tasks
Implement prompt engineering best practices, evaluation frameworks, and guardrails to ensure reliable, safe, and auditable LLM outputs in production.
MLOps & Productionization
Own the end-to-end ML lifecycle: experiment tracking (MLflow/W&B), model registry, CI/CD for ML, automated retraining, and model versioning
Deploy and monitor models in production using cloud-native services (AWS SageMaker, GCP Vertex AI, or Azure ML) with containerized workflows (Docker, Kubernetes)
Build model monitoring and observability pipelines to track data drift, performance degradation, and model health in real time
Design and manage feature stores and data pipelines to ensure reproducibility and efficiency at scale.
LLMOps
Build and manage LLM serving infrastructure using tools like vLLM, TGI (Text Generation Inference), or Triton Inference Server for efficient, low-latency model deployment
Implement prompt versioning, management, and regression testing pipelines to ensure consistency and traceability across prompt iterations
Set up LLM observability and tracing using platforms such as LangSmith, Arize Phoenix, or Helicone to monitor latency, token usage, cost, and output quality
Optimize inference costs through strategies like semantic caching, request batching, model routing (large vs. small model tiering), and quantization
Design and maintain automated evaluation pipelines for LLM outputs, combining programmatic evals, LLM-as-judge patterns, and human-in-the-loop review workflows
Orchestrate production guardrails including content filtering, output validation, PII detection, and toxicity screening as part of the serving pipeline
Manage the LLM gateway and API layer for centralized rate limiting, usage tracking, key management, fallback routing, and multi-provider abstraction.
Responsible AI & Security
Champion responsible AI practices: bias and fairness auditing, model explainability (SHAP, LIME), and compliance with AI governance frameworks
Ensure robustness against adversarial attacks, prompt injection, data leakage, and other LLM-specific security risks
Maintain documentation and audit trails for model decisions in alignment with regulatory and enterprise requirements.
TEAM LEADERSHIP
Lead, mentor, and grow a team of data scientists by setting clear goals, assigning responsibilities, conducting regular 1:1s, and tracking performance
Promote best practices in data science, solution architecture, code quality, and experimentation methodology across the team
Communicate complex data-driven insights to non-technical stakeholders and executive leadership with clarity and impact
Drive a culture of continuous learning, knowledge sharing, and innovation within the data science team
Partner with Product Management and Engineering leadership to influence product roadmap, prioritize AI/ML initiatives, and conduct build-vs-buy analysis for AI capabilities.
SKILLS & EXPERIENCE
Core Data Science & ML (Required)
8+ years of experience in Data Science or Machine Learning, with at least 2 years in a lead or senior IC role
Strong proficiency in Python and SQL; working knowledge of R, Spark, or Scala is a plus
Deep understanding of ML algorithms: logistic regression, tree-based models (XGBoost, LightGBM), SVMs, KNN, ensemble methods, and neural networks
Hands-on experience with deep learning frameworks (PyTorch, TensorFlow) and architectures (CNNs, RNNs, Transformers, attention mechanisms)
Strong foundation in NLP techniques: text classification, NER, sentiment analysis, topic modeling, and semantic search
Experience with statistical analysis, A/B testing, causal inference, and experimental design.
Generative AI & LLMs (Required)
Practical experience building applications with LLMs (GPT-4, Claude, Llama, Mistral, or equivalent)
Hands-on experience designing RAG architectures, working with vector databases, and implementing embedding-based retrieval systems
Familiarity with fine-tuning techniques (LoRA, QLoRA, PEFT), RLHF/DPO, and prompt engineering methodologies
Experience with agentic AI frameworks and multi-step LLM orchestration patterns
Understanding of LLM evaluation, red-teaming, hallucination mitigation, and production guardrails
Infrastructure & Tools (Required)
Experience with cloud platforms (AWS, GCP, or Azure) and managed ML services (SageMaker, Vertex AI, Azure ML)
Proficiency with MLOps tooling: experiment tracking (MLflow, W&B), model registries, and CI/CD for ML pipelines
Familiarity with containerization (Docker, Kubernetes) and infrastructure-as-code practices
Experience with the modern data stack: data lakehouse architectures (Databricks, Snowflake), streaming (Kafka), and feature stores
Proficiency with data visualization tools and frameworks (Tableau, Streamlit, Gradio, or D3.js) for prototyping and stakeholder communication
Nice to Have
Experience in cybersecurity, third-party risk management, or GRC (Governance, Risk, and Compliance) domains.
Contributions to open-source ML/AI projects.
Published research in ML, NLP, or AI safety.
Experience with graph neural networks or knowledge graphs for security applications.
EDUCATION
Master’s or Ph.D. in Computer Science, Data Science, Statistics, Mathematics, Engineering, or a related quantitative field. Equivalent practical experience with a strong portfolio of ML/AI work will also be considered.