Data Scientist (Raleigh, NC - Hybrid)
Confidential
Posted: May 14, 2026
Quick Summary
We are seeking an experienced Data Scientist to design and deploy scalable ML solutions in production, with a focus on enterprise-grade deployments.
Job Description
We are seeking a candidate with the skills and experience outlined below for one of our clients in the automotive industry.
Role Summary
We are seeking an experienced Data Scientist with strong expertise in data science and machine learning engineering, including hands-on experience designing and deploying ML solutions in production. The role focuses on building scalable ML solutions, productionizing models, and enabling robust ML platforms for enterprise-grade deployments.
Location: This hybrid position requires four days per week on-site at our corporate headquarters in Raleigh, North Carolina (North Hills), with one remote day per week.
Key Responsibilities
Build ML Models: Design and implement predictive and prescriptive models for regression, classification, and optimization problems. Apply advanced techniques such as structural time series modeling and boosting algorithms (e.g., XGBoost, LightGBM).
Train and Tune Models: Develop, train, evaluate, and optimize machine learning models using Python, PySpark, TensorFlow, and PyTorch.
Collaboration & Communication: Work closely with stakeholders to understand business challenges and translate them into scalable data science solutions. Participate in end-to-end solution design and collaborate with cross-functional teams to ensure successful integration of models into business processes.
Monitoring & Visualization: Rapidly prototype and test hypotheses to validate model approaches. Build automated workflows for model monitoring and performance evaluation. Create dashboards using tools like Databricks and Palantir to visualize key model metrics such as model drift, feature importance, and Shapley values.
Productionize ML: Build repeatable and scalable paths from experimentation to deployment (batch, streaming, and low-latency endpoints), including feature engineering, training, validation, and evaluation.
Own ML Platform: Develop and maintain core ML platform components including model registry, feature store, experiment tracking, artifact repositories, and standardized CI/CD pipelines for ML workflows.
Pipeline Engineering: Design and implement robust data and ML pipelines orchestrated using Step Functions, Airflow, or Argo to train, validate, and deploy models based on schedules or event-driven triggers.
Observability & Quality: Implement end-to-end monitoring, data validation, model drift detection, quality checks, and alerting mechanisms aligned with SLA/SLO requirements.
Governance & Risk: Ensure model/version lineage, reproducibility, approvals, rollback strategies, auditability, and cost optimization aligned with enterprise governance policies.
Partner & Mentor: Collaborate with onshore and offshore teams, mentor data scientists on packaging, testing, and optimization best practices, and contribute to engineering standards and code reviews.
Hands-on Delivery: Prototype innovative ML solutions and troubleshoot production issues across data, model, application, and infrastructure layers.
Required Qualifications
Bachelor’s degree in Computer Science, Information Technology, Data Science, Engineering, or a related field.
5+ years of hands-on experience with Python (pandas, PySpark, scikit-learn), Bash scripting, and Docker; familiarity with TensorFlow and PyTorch preferred.
Strong experience designing and implementing predictive and prescriptive models for regression, classification, and optimization problems.
Expertise with advanced modeling techniques such as structural time series modeling and boosting algorithms (e.g., XGBoost, LightGBM).
5+ years of experience with SageMaker (training, processing, pipelines, model registry, endpoints) or equivalent platforms such as Kubeflow, MLflow/Feast, Vertex AI, or Databricks ML.
5+ years of experience with Databricks Asset Bundles (DABs), Airflow, Step Functions, and event-driven architectures using EventBridge, SQS, and Kinesis.
3+ years of experience working with AWS, Azure, or GCP services including ECR/ECS, Lambda, API Gateway, S3, Glue, Athena, EMR, RDS/Aurora (PostgreSQL/MySQL), DynamoDB, CloudWatch, IAM, VPC, and WAF.
Strong understanding of Snowflake warehouses, databases, schemas, stages, Snowflake SQL, RBAC, UDFs, and Snowpark.
3+ years of hands-on experience with CodeBuild/CodePipeline, GitHub Actions, or GitLab CI/CD; experience with blue/green, canary, and shadow deployments for ML services and applications.
Proven experience building and optimizing batch and streaming pipelines, schema management, partitioning strategies, performance tuning, and Parquet/Iceberg best practices.
Experience with unit and integration testing for data and ML models, contract testing for feature pipelines, reproducible training workflows, and model/data drift monitoring.
Strong troubleshooting and incident response experience for ML services with exposure to SLOs, dashboards, runbooks, and debugging across data, model, and infrastructure layers.
Strong communication skills, collaborative mindset, problem-solving ability, and a proactive approach toward automation and documentation.
Experience in retail and/or manufacturing domains is preferred.
Other Job Details
Job Type: C2C or W2
Pay Rate: $58/hr on C2C / $53/hr on W2
Start Date: 6/15/2026
Work Location: Raleigh, NC - Hybrid
Docs Required: ID proof