Quick Summary

Design and maintain the infrastructure and core platforms behind a global consumer internet service, ensuring high availability and resilience across regions.

Required Skills

- Reliability Engineering
- Incident Response
- Disaster Recovery
- High Availability
- Observability
- Monitoring
- Configuration Management
- High-Traffic Apps
- Global Architecture
- Multi-region Deployment

Job Description

Site Reliability Engineer (SRE) – Globalization

Location: Singapore
Function: Infrastructure / SRE / Platform Engineering

About the Role

Our client is a rapidly scaling global consumer internet platform with hundreds of millions of users worldwide and strong momentum in international markets. As the business accelerates its global expansion, this role is central to building next-generation international infrastructure, ensuring that critical systems are highly available, globally distributed, and resilient across regions. This is a high-impact opportunity to work on multi-region architecture, global traffic routing, and large-scale distributed systems, directly shaping the reliability and scalability of a fast-growing global platform.

Key Responsibilities

- Global Architecture & Disaster Recovery
  - Participate in the design and implementation of global infrastructure architecture.
  - Own cross-region architecture, disaster recovery (DR), and high availability (HA) capabilities.
  - Enable critical systems to support multi-region deployment, disaster recovery failover, and fault isolation.
  - Improve overall stability and resilience of international business systems.

- Overseas Infrastructure Platform Deployment & Operations
  - Build, deploy, operate, and continuously optimise core infrastructure platforms in overseas regions, including:
    - Release/deployment systems
    - Monitoring & alerting systems
    - Configuration management systems
    - Service governance frameworks
    - Traffic scheduling systems
  - Ensure consistency, reliability, and parity between overseas and domestic infrastructure environments.

- Reliability Engineering & Incident Response
  - Design and implement a comprehensive reliability engineering framework for international systems, including:
    - Observability systems (metrics, logs, tracing)
    - Incident response mechanisms
    - Root cause analysis and postmortem processes
  - Lead cross-functional coordination during major incidents.
  - Drive rapid service recovery and long-term systemic improvements.

- Internationalisation Infrastructure Enablement
  - Deeply understand overseas business requirements and architectural constraints.
  - Drive the implementation of infrastructure capabilities in global environments, including:
    - Multi-region architecture design
    - Network architecture optimisation
    - Data architecture optimisation
    - Adaptation of core infrastructure services for global deployment

- Cross-Team Collaboration & System Alignment
  - Collaborate closely with domestic infrastructure teams, product engineering teams, and platform teams.
  - Ensure overseas systems align with internal architecture standards and best practices.
  - Establish and promote reliability best practices across the organisation.

Requirements

- SRE & Reliability Engineering Experience
  - Strong experience in reliability engineering for large-scale internet systems.
  - Proven expertise in high availability architecture design, fault tolerance and incident management, capacity planning and scaling, and emergency response handling.
  - Background in SRE, Platform Engineering, or Infrastructure teams preferred.

- Global / Multi-Region Architecture Experience
  - Hands-on experience designing and operating cross-region systems, including:
    - Multi-region deployment strategies
    - Traffic routing and scheduling
    - Data replication and synchronisation
    - Disaster recovery and failover mechanisms
  - Experience supporting global or international infrastructure is highly preferred.

- Core Infrastructure & Systems Knowledge
  - Strong understanding of Linux systems and networking fundamentals.
  - Familiarity with common middleware systems such as MySQL, Redis, and Kafka.
  - Experience with cloud-native infrastructure (e.g., Kubernetes, Service Mesh) and observability systems (monitoring, logging, distributed tracing).

- Engineering & Automation Capability
  - Proficient in at least one programming language: Python, Go, or Java.
  - Experience building automation tools, reliability platforms, and infrastructure systems.

- Problem Solving & Collaboration
  - Strong analytical and troubleshooting skills in complex distributed systems.
  - Ability to quickly identify and resolve issues in high-pressure environments.
  - Strong communication skills and a team collaboration mindset.

- Language Requirements
  - Fluency in both English and Chinese.
  - Ability to collaborate effectively in a global, bilingual engineering environment.

- Location Requirement
  - Singapore Citizens or Permanent Residents are preferred.

Nice to Have

- Experience with multi-cloud or cross-cloud environments, such as AWS, GCP, Azure, and Alibaba Cloud.
- Experience with global traffic routing technologies such as DNS, GSLB, Anycast, and Global Load Balancing.
- Experience with reliability engineering practices such as Chaos Engineering, failure drills/game days, and automated recovery systems.

Why This Role

- Direct ownership of global-scale infrastructure for a platform experiencing massive user growth.
- Opportunity to design multi-region, highly available systems from the ground up.
- Exposure to real-world distributed systems challenges at scale (hundreds of millions of users).
- Work at the intersection of China tech excellence and global expansion.

* Due to the volume of applicants, only shortlisted candidates will be contacted.

EA Licence No.: 25S3232
EA Personnel No.: R1874604
EA Personnel Name: Kenneth Ho
