Data Engineer - Generative AI Pipelines
Palup
Posted: September 8, 2025
Quick Summary
Design and implement scalable data pipelines for generative AI applications, including ETL/ELT, data enrichment, and data lake management.
Job Description
We’re looking for a Data Engineer to build scalable data pipelines that power Generative AI applications such as RAG, summarization, NER, and FAQ systems. You’ll design systems that scrape, ingest, transform, and enrich data at scale, ensuring it’s clean and optimized for AI/ML workflows.
What You’ll Do:
• Build and optimize ETL/ELT pipelines for large-scale structured & unstructured data
• Develop data enrichment workflows (entity extraction, embeddings, metadata tagging)
• Manage data lakes, warehouses, and vector databases to support AI retrieval
• Collaborate with ML engineers on AI-ready data infrastructure
• Ensure pipeline reliability, scalability, and observability
Who You Are:
• 5+ years in data engineering (Python, SQL, Spark/Dask/Ray)
• Experience with web scraping frameworks (Scrapy, Playwright, Selenium)
• Strong knowledge of cloud platforms (AWS/GCP/Azure) & orchestration tools (Airflow/Prefect/Dagster)
• Familiarity with GCP storage solutions such as BigQuery and Cloud Storage
• Familiarity with vector search databases (Pinecone, Weaviate, Elasticsearch)
• Understanding of NLP concepts relevant to generative AI