AWS TECHNICAL SERVICE MANAGEMENT
Santander
Posted: March 25, 2026
Quick Summary
The successful candidate will be responsible for managing ITIL practices for AWS-based services, ensuring service stability and adherence to SLAs/OLAs through operational controls and continuous improvement initiatives.
Job Description
Country: Mexico
To succeed in this role, you will be responsible for:
• Own and continuously improve ITIL practices for Incident Management, Change Management, and Problem Management for AWS-based services.
• Ensure service stability and adherence to SLAs/OLAs through operational controls, service reviews, and continuous improvement initiatives.
• Establish and track service health KPIs (availability, incident volume, MTTR/MTTA, change success rate, problem recurrence).
Incident Management (incl. Major Incidents)
• Lead incident triage and coordination across cloud infrastructure, platform, security, and application teams.
• Use Dynatrace/CloudWatch insights (alerts, traces, service flow, SLOs) to accelerate identification of impact scope and probable root cause domains (app vs. infra vs. dependencies).
• Coordinate communications and status updates during incidents, ensuring timely escalation and stakeholder alignment and that restoration targets are met.
Change Management & Governance (CAM / CAB / Committees)
• Create, validate, and control change requirements in ServiceNow, ensuring quality of change records (scope, impact, risk, test evidence, implementation plan, backout plan, approvals).
• Drive the end-to-end change lifecycle: intake, risk/impact analysis, scheduling, approvals, implementation tracking, post-change validation, and closure.
• Prepare and present changes to CAM, CAB, and other change forums, ensuring compliance with governance and regulatory expectations.
• Monitor change calendars/pipelines to prevent conflicts and reduce change-related incidents.
Problem Management & Continuous Improvement
• Lead or coordinate problem investigations for recurring incidents; ensure strong root cause analysis (RCA) and corrective/preventive action plans (CAPA).
• Track action items to closure and measure effectiveness (e.g., recurrence reduction, improved SLO attainment).
Monitoring, Metrics & Reporting (ServiceNow + Dynatrace)
• Analyze and interpret data from ServiceNow (tickets, categories, backlog, SLA breaches) and Dynatrace (availability/performance indicators) to detect deviations, risks, and trends.
• Produce weekly/monthly operational reports and dashboards: SLA compliance, incident trends, change success rate/failure modes, top recurring issues, operational risk indicators.
• Propose mitigation plans and service improvements based on evidence and measurable outcomes.
Process, Documentation, and Automation Enablement
• Define and maintain operational processes and standards for cloud service operations.
• Identify opportunities for systems automation (auto-remediation, workflow automation, alert tuning) and partner with engineering teams to implement them.
Stakeholder Management & Cross-Team Coordination
• Act as the operational focal point between cloud teams, application owners, security/risk, and governance stakeholders.
• Support decision-making by providing clear risk assessments, impact narratives, and recommended actions.
• Negotiate priorities, timelines, maintenance windows, and resource needs across teams.