Sr. Director - Backend Engineering
Coupang
Posted: December 9, 2025
Job Description
Key Skills and Role Responsibilities:
This role is for a strategic and technical leader to define, build, and operate the infrastructure orchestration systems that power our organization's cutting-edge Artificial Intelligence (AI) initiatives. The Senior Director will lead a team responsible for ensuring a robust, scalable, cost-efficient, and high-performance platform for all stages of the AI lifecycle, from experimentation and training to deployment and inference.
Strategy and Leadership
• Define and execute the long-term vision and roadmap for the company’s AI infrastructure Network Services, aligning it with overall business and AI Services goals.
• Lead, mentor, and grow a high-performing engineering and operations team focused on AI infrastructure and platform engineering.
• Manage budget and resource allocation for AI infrastructure Network Services deliverables.
• Act as a key liaison between AI infrastructure and other service owners and consumers, core engineering, Cloud infrastructure, and executive leadership.
AI Infra Development and Operations
• Oversee the design, implementation, and maintenance of the core network orchestration platforms for large-scale AI model training (e.g., distributed training, hyperparameter tuning) and deployment (e.g., containerization, serverless functions, edge deployment).
• Ensure reliability, security, and compliance of the AI infrastructure, meeting strict standards for data governance and model integrity.
• Establish Service Level Objectives (SLOs) and Key Performance Indicators (KPIs) for the AI platform services and lead efforts for continuous optimization and performance tuning.
Technology and Architecture
• Evaluate, select, and integrate the core technologies required for the AI stack (e.g., cloud overlay/underlay networking, InfiniBand, load balancers, DNS, core networking, Kubernetes, Ray, GPU/accelerator management, distributed file systems).
• Champion infrastructure-as-code (IaC) principles to manage and provision AI resources consistently and at scale.
Qualifications
Required
• Education: Bachelor's or Master’s degree in Computer Science, Engineering, or a related technical field.
• Experience:
  • 15+ years of progressive experience in software engineering, infrastructure, or platform operations.
  • 5+ years of experience leading and managing technical teams, ideally at the Director or Sr. Director level or in an equivalent capacity.
  • Deep, hands-on experience designing and operating large-scale distributed systems and cloud-native network architectures.
  • Proven experience specifically with AI infrastructure orchestration (e.g., using Kubernetes) and managing accelerated compute resources (GPUs, TPUs, etc.).
  • 15+ years of experience in cloud backend engineering, cloud design, deployment, and DevOps.
  • 15+ years of experience leading system design and architecture leveraging private clouds and AWS, Azure, and/or GCP.
  • 10+ years of demonstrable experience building and operating infrastructure as code and infrastructure automation, and comfort with various flavors of Linux.
  • 15+ years of experience in building high-performance, highly available, and scalable distributed systems in the cloud.
  • 15+ years of experience in building and managing high-performance, highly available, and scalable hybrid cloud environments.
  • Excellent cross-group collaboration and outstanding verbal and written communication skills.
• Skills:
  • Expert-level knowledge of containerization and orchestration (Docker, Kubernetes).
  • Software-defined cloud networking.
  • Strong background in DevOps and MLOps principles and tooling.
  • Proficiency in at least one modern programming language (e.g., Python, Go).
  • Exceptional strategic planning, organizational, and written/verbal communication skills.
Preferred
• Prior experience managing infrastructure for training and inference of large language models (LLMs) or foundation models.
• Experience in a regulated industry with strict compliance requirements.
• Experience building and operating an AI private cloud.
Success Metrics
A successful Senior Director - AI Infrastructure Orchestration will be measured by:
• The time to market for building, scaling, and operating AI infrastructure.
• The resource utilization rate and cost efficiency of the AI compute infrastructure.
• The reliability and uptime of the core AI platform services.
• The talent retention and development within the AI Infrastructure team.