Lead AI Data Engineer
Josys
Posted: October 15, 2025
Quick Summary
We're looking for an experienced AI Data Engineer to join our data team in Bengaluru, India, to design and build scalable data pipelines and transformations using Spark/PySpark/Scala.
Job Description
Location: Bengaluru, Karnataka, India
About the Role:
We’re looking for an experienced AI Data Engineer (4-8 years) to join our data team. In this role, you’ll build and maintain our data infrastructure on AWS, enabling analytics and AI teams to extract actionable insights. You’ll design and manage end-to-end data pipelines, ensuring high-quality, reliable, and real-time data, while also contributing to ML/GenAI workflows and model deployment pipelines.
What You'll Do:
• Design and build scalable data pipelines/transformations using Spark / PySpark / Scala.
• Manage and optimize Airflow DAGs for complex data workflows.
• Clean, transform, and prepare data for analytics, AI, and ML use cases.
• Use Python for automation, data processing, and internal tooling.
• Work with AWS services (S3, Redshift, EMR, Glue, Athena) to maintain robust data infrastructure.
• Collaborate with Analytics and AI teams to design pipelines for ML/GenAI projects.
• Contribute to Node.js (TypeScript) backend development for data services.
• Automate deployments using CI/CD pipelines (GitHub Actions).
• Monitor, troubleshoot, and ensure data quality, consistency, and reliability across systems.
• Build and maintain data warehouses/lakes and handle real-time streaming data using Kafka or similar technologies.
What You'll Need:
• Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
• 4-8 years of hands-on experience in data engineering.
• Strong expertise in Spark / Scala for large-scale data processing.
• Proficient in Airflow for managing and optimizing complex DAGs.
• Advanced Python skills for data manipulation, automation, and tool development.
• Proven experience with AWS cloud services (S3, Redshift, EMR, Glue, Athena, IAM, EC2).
• Solid understanding of ETL/ELT, data preparation, and analytics workflows.
• Familiarity with Node.js and TypeScript for backend data services.
• Experience with automated CI/CD (GitHub Actions).
• Familiarity with CDC tools such as Debezium.
• Strong SQL skills and knowledge of data warehousing and streaming technologies (Kafka, Flink, Kinesis).
• Excellent communication skills.
Bonus Points:
• Experience with data lake technologies (Delta Lake, Apache Iceberg).
• Knowledge of ML/GenAI model deployment pipelines.
• Familiarity with data governance, quality frameworks, and statistics.
• Experience with infrastructure as code (Terraform).
• Familiarity with containers (Docker, Kubernetes).
• Experience with monitoring and logging tools (Prometheus, Grafana).