Senior Cloud Operations Engineer
Confidential
Posted: April 28, 2026
Quick Summary
Manage cloud operations on public clouds such as AWS and Alibaba Cloud, with a focus on monitoring, DevOps implementation, and troubleshooting of application systems.
Job Description
Responsibilities
Manage installation, configuration, optimization, maintenance, troubleshooting, data backup, and log analysis for application systems running on public clouds such as AWS and Alibaba Cloud.
Own cloud operations monitoring, DevOps implementation, cloud-native deployment and maintenance, and the customization and extension of operations platforms.
Respond to and resolve incidents and failures to keep the platform stable.
Configure and manage big data platforms such as ADB, DataWorks, Flink CDC, and Quick BI.
Plan and manage multi-cloud networking.
Requirements
Familiar with AWS and Alibaba Cloud products, including but not limited to VPC, NAT, GW, EC2, ECS, RDS, Aurora, ElastiCache, MSK, WAF, CloudFront, PrivateLink, L2Connection, EKS, ACK, AnalyticDB, DataWorks, MaxCompute, and Glue.
Proficient in container orchestration and microservices architecture; experienced in configuring AWS EKS and Alibaba Cloud ACK; familiar with Kubernetes fundamentals and management tools such as Kuboard and Rancher.
Familiar with project management tools (Alibaba Cloud Yunxiao, Tencent TAPD, PingCode, etc.); experienced in configuring them for full-lifecycle project management, agile development, hybrid cloud DevOps, and multi-branch test environments, as well as for ticketing, defect tracking, time tracking, efficiency analysis, and reporting.
Able to independently build and maintain monitoring systems (Nightingale, Prometheus, CloudWatch, Alertmanager, PrometheusAlert, WatchAlert, etc.) and integrate Grafana for real-time analysis of system metrics; familiar with Prometheus rules and Grafana dashboards, and capable of building a unified monitoring, alerting, and visualization platform.
Knowledge of OpenTelemetry and its core signals (metrics, logs, and traces); experienced in integrating intelligent observability platforms for multi-source data correlation and automated incident handling.
Skilled in configuring and managing AWS CloudWatch and Alibaba Cloud monitoring, with experience in hybrid monitoring integration.
Strong Linux background; proficient in deploying, configuring, and optimizing Nginx, Redis, Kafka, MongoDB, and related applications.
Solid understanding of public cloud networking and TCP/IP fundamentals; experienced with VPCs, subnetting, Cloud Enterprise Network (CEN), transit gateways, and multi-cloud interconnection.
Proficient in at least one programming language (Shell, Python, or Go); familiar with Infrastructure as Code (IaC) concepts and experienced with Terraform and CloudFormation; software project development experience is a plus.