Quick Summary

Design and implement a scalable and secure Identity & Access Management (IAM) data modernization solution for a large-scale data warehouse using PySpark-based processing and cloud-native DevOps pipelines.

Required Skills

Cloud, AWS, Azure, GCP, Data Lake, SQL, PySpark, DevOps, OpenShift, Containers, Kubernetes

Job Description

Role: Google Cloud Data Architect – IAM Data Modernization

Location: Dallas, TX / Charlotte, NC (Hybrid – 4 days in office)

OpenShift Container Platform (OCP) experience highly preferred

Project/Program

Identity & Access Management (IAM) Data Modernization: migration of an on‑premises SQL data warehouse to a target‑state Data Lake on Google Cloud (GCP). The Data Lake will enable metrics and reporting, advanced analytics, and GenAI use cases (natural language querying, accelerated summarization, cross‑domain trend analysis). The solution uses PySpark‑based processing, cloud‑native DevOps CI/CD pipelines, and containerized deployments on OpenShift (OCP) to deliver scalable, secure, high‑performance data solutions.

About Program/Project

The IAM Data Modernization project involves migrating an on-premises SQL data warehouse to a target-state Data Lake in the GCP cloud environment. Key highlights include:

Integration Scope: 30+ source system data ingestions and multiple downstream integrations

Capabilities: Metrics, reporting, and GenAI use cases with natural language querying, advanced pattern/trend analysis, faster summarization, and cross-domain metric monitoring

Benefits:

Scalability and access to advanced cloud functionality

Highly available and performant semantic layer with historical data support

Unified data strategy for executive reporting, analytics, and GenAI across cyber domains

This modernization establishes a single source of truth for enterprise-wide data-driven decision-making.

Required Skills

DevOps / CI‑CD

Experience implementing CI/CD pipelines for data and analytics workloads

Familiarity with Git‑based source control, build automation, and deployment strategies

Containers & Platform

Experience with OpenShift Container Platform (OCP) for deploying data workloads and services

Understanding of containerized architecture, scaling, and environment management

Proven ability to build CI/CD pipelines for data and infrastructure workloads

Experience managing secrets securely using GCP Secret Manager

Ownership of observability, SLOs, dashboards, alerts, and runbooks

Proficiency in logging, monitoring, and alerting for data pipelines and platform reliability

Big Data & Processing

Hands‑on experience with PySpark for ETL/ELT, data transformation, and performance optimization

Solid understanding of distributed data processing concepts

Data & Cloud Architecture

Strong experience designing data platforms on Google Cloud Platform (GCP)

Experience with Data Lakes, data warehousing, and large‑scale migration programs

Data Lake Architecture & Storage

Proven experience designing and implementing data lake architectures (e.g., Bronze/Silver/Gold medallion or other layered models)

Strong knowledge of Cloud Storage (GCS) design, including bucket layout, naming conventions, lifecycle policies, and access controls

Experience with Hadoop/HDFS architecture, distributed file systems, and data locality principles

Hands-on experience with columnar data formats (Parquet, Avro, ORC) and compression techniques

Expertise in partitioning strategies, backfills, and large-scale data organization

Ability to design data models optimized for analytics and BI consumption

Data Ingestion & Orchestration

Experience building batch and streaming ingestion pipelines using GCP-native services

Knowledge of Pub/Sub-based streaming architectures, event schema design, and versioning

Strong understanding of incremental ingestion and CDC patterns, including idempotency and deduplication

Hands-on experience with workflow orchestration tools (Cloud Composer / Airflow)

Ability to design robust error handling, replay, and backfill mechanisms

Data Processing & Transformation

Experience developing scalable batch and streaming pipelines using Dataflow (Apache Beam) and/or Spark (Dataproc)

Strong proficiency in BigQuery SQL, including query optimization, partitioning, clustering, and cost control.

Hands-on experience with Hadoop MapReduce and ecosystem tools (Hive, Pig, Sqoop)

Advanced Python programming skills for data engineering, including testing and maintainable code design

Experience managing schema evolution while minimizing downstream impact

Analytics & Data Serving

Expertise in BigQuery performance optimization and data serving patterns

Experience building semantic layers and governed metrics for consistent analytics

Familiarity with BI integration, access controls, and dashboard standards

Understanding of data exposure patterns via views, APIs, or curated datasets

Data Governance, Quality & Metadata

Experience implementing data catalogs, metadata management, and ownership models

Understanding of data lineage for auditability and troubleshooting

Strong focus on data quality frameworks, including validation, freshness checks, and alerting

Experience defining and enforcing data contracts, schemas, and SLAs

Good to have

Security, Privacy & Compliance

Hands-on experience implementing fine-grained access controls for BigQuery and GCS

Experience with sprint planning and providing technical guidance to the team

Strong stakeholder communication and solution‑architecture skills

Qualifications

Experience: 10–14+ years in DevOps and data architecture, including 5+ years designing at scale on PySpark/GCP/OCP; prior on‑premises-to-cloud migration experience is a must.

Education: Bachelor’s/Master’s in Computer Science, Information Systems, or equivalent experience.

Certifications: Google Cloud Professional Cloud Architect, Google Cloud DevOps, or OCP certification (required, or to be obtained within 3 months). Nice to have: Professional Data Engineer, Security Engineer.

Other Job Details:

Job Type: C2C or W2

Pay Rate: $60–65/hr on C2C / $55/hr on W2

Duration: 12 months (high possibility of extension)

Location: Dallas, TX / Charlotte, NC (Hybrid – 4 days in office)

Documents Required: Proof of identity

Please review the job description and let me know if it aligns with your experience. Looking forward to your response.
