Infrastructure Lead (DevOps & Cloud)
Weekday AI
Posted: April 6, 2026
Quick Summary
Design, develop, and maintain scalable cloud infrastructure on AWS and Azure platforms, leading DevOps and SRE initiatives and ensuring reliability and performance of mission-critical systems.
Job Description
This role is for one of Weekday's clients.
Min Experience: 8 years
Location: Mumbai
Job Type: Full-time
We are looking for an experienced Infrastructure Lead to drive the design, implementation, and optimization of scalable, secure, and highly available cloud infrastructure. This role will lead DevOps/SRE initiatives, establish best practices, and ensure reliability and performance of mission-critical systems.
Requirements:
Key Responsibilities
1. Cloud Infrastructure & Architecture
• Design, develop, and maintain scalable cloud infrastructure on AWS and Azure platforms.
• Lead architectural decisions to ensure high availability, fault tolerance, and optimal performance.
• Promote infrastructure automation through Infrastructure as Code (Terraform).
2. DevOps & CI/CD Enablement
• Develop and enhance CI/CD pipelines using tools such as Jenkins, GitLab CI, CircleCI, and ArgoCD.
• Adopt GitOps methodologies for consistent and dependable deployments.
• Increase deployment frequency, shorten lead times, and reduce failure rates.
3. Kubernetes & Containerization
• Oversee and scale Kubernetes clusters across EKS, AKS, and on-premises environments.
• Implement container orchestration, service mesh solutions, and cluster optimization techniques.
• Ensure platform reliability and conduct performance tuning.
4. Monitoring, Reliability & Incident Management
• Establish and uphold SLOs, SLAs, and reliability benchmarks.
• Deploy observability tools such as Prometheus, Grafana, Datadog, and ELK stack.
• Lead incident management processes including root cause analysis and reducing mean time to recovery (MTTR).
5. Automation & Operational Excellence
• Promote automation across infrastructure provisioning, monitoring, and recovery workflows.
• Create reusable infrastructure modules and accelerators.
• Minimize manual tasks through scripting using Python and Bash, along with supporting tools.
6. Security & Compliance
• Apply cloud security best practices involving IAM, network security, and policy enforcement.
• Maintain compliance via Kubernetes policies and governance frameworks.
• Champion secure-by-design principles in infrastructure development.
7. Cost Optimization
• Monitor cloud resource consumption and implement cost-saving strategies.
• Apply right-sizing, auto-scaling, and other methods for efficient resource utilization.
8. Leadership & Stakeholder Management
• Lead and mentor DevOps and SRE teams.
• Collaborate effectively with engineering, product, and architecture teams.
• Promote infrastructure best practices across various projects and teams.
9. Innovation & AI-driven Operations (Preferred)
• Explore AI and machine learning-driven infrastructure enhancements and AIOps capabilities.
• Implement intelligent monitoring, anomaly detection, and automated root cause analysis.
Required Skills & Experience
• At least 8 years of experience in Infrastructure, DevOps, or SRE roles.
• Strong expertise in AWS (preferred).
• Hands-on experience with Terraform (Infrastructure as Code).
• Comprehensive knowledge of Kubernetes and containerization (Docker).
• Experience working with CI/CD tools such as Jenkins, GitLab CI, CircleCI, and ArgoCD.
• Strong understanding of monitoring and observability tools.
• Proficient in scripting languages including Python and Bash.
• Experience managing high-availability, large-scale systems.
Skills
Infrastructure as Code
Lead Infrastructure
DevOps
SRE
Terraform
Kubernetes
Docker
CI/CD