MisuJob - AI Job Search Platform

Senior Site Reliability Engineer

Qode

Texas, United States | Hybrid | Permanent

Posted: April 15, 2026

Quick Summary

A Senior Site Reliability Engineer is responsible for ensuring the reliability of our customers' digital products and services.

Job Description

Job Title: Senior Site Reliability Engineer (Observability & Transaction Reliability)
Location: Austin, TX
Type: Full-Time

About Incedo
Incedo is a global AI and data transformation firm helping organizations drive measurable business impact from digital investments. We operate at the intersection of business and technology, combining AI, data, and digital engineering to deliver scalable, high-impact solutions.
With over 4,000 professionals across the U.S., Canada, Latin America, and India, Incedo partners with Fortune 500 and high-growth organizations across banking, payments, wealth management, telecom, and life sciences.

Role Overview
We are seeking a Senior Site Reliability Engineer (SRE) to drive reliability, observability, and performance across business-critical distributed systems.
This is a hands-on engineering role with strong ownership, focused on building and scaling observability platforms, improving transaction visibility, and enhancing system resilience. You will work closely with engineering, platform, and infrastructure teams to ensure high availability, performance, and operational excellence across microservices, APIs, and cloud-native systems.
The ideal candidate combines deep technical expertise in SRE practices with a passion for automation, monitoring, and continuous improvement.

Key Responsibilities
Observability & Monitoring
• Design, implement, and maintain observability solutions across distributed systems
• Build and optimize logging, metrics, and tracing pipelines using tools like Dynatrace, Datadog, Splunk, ELK, Grafana, and OpenTelemetry
• Enable end-to-end transaction tracing across microservices and APIs
• Develop dashboards and alerting strategies for proactive issue detection

Reliability & Incident Management
• Own service reliability, uptime, and operational performance for critical systems
• Lead incident response, root cause analysis (RCA), and postmortems
• Reduce MTTD and MTTR through automation and improved observability
• Create and maintain runbooks and incident response playbooks

Performance Engineering
• Monitor and optimize system performance (latency, throughput, error rates)
• Partner with application and database teams to troubleshoot bottlenecks
• Use distributed tracing and telemetry data to identify and resolve issues
• Implement performance testing and tuning strategies

Resiliency & Automation
• Build and maintain fault-tolerant, highly available systems
• Implement resiliency patterns (failover, retries, circuit breakers, self-healing)
• Drive chaos engineering practices to validate system reliability
• Automate operational tasks using scripting (Python, Go, etc.)

SRE Best Practices & Governance
• Define and enforce SLOs, SLIs, and error budgets aligned to business goals
• Promote SRE principles across engineering teams
• Partner with DevOps and platform teams to improve CI/CD reliability
• Contribute to building a culture of operational excellence and accountability

Required Qualifications
• 7–10+ years of experience in Site Reliability Engineering or Production Support Engineering
• Strong hands-on experience with observability tools (Dynatrace, Datadog, Splunk, ELK, Grafana, OpenTelemetry, Jaeger)
• Experience supporting cloud-native environments (AWS, Azure, or GCP)
• Deep understanding of microservices architecture and distributed systems
• Proficiency in scripting/programming (Python, Go, Java, or similar)
• Experience with monitoring, alerting, and incident management in production environments

Preferred Qualifications
• Experience implementing OpenTelemetry at scale
• Background in chaos engineering and resiliency testing
• Familiarity with AIOps or intelligent monitoring platforms
• Experience in financial services, banking, or wealth management environments
• Dynatrace certification (Associate or Professional)

What Success Looks Like
• Measurable reduction in MTTD and MTTR
• Increased proactive detection of issues through monitoring
• Improved system uptime, performance, and reliability
• Strong adoption of SRE best practices across engineering teams

Why Join Us
• Work on high-impact, mission-critical systems
• Drive modern SRE and observability practices at scale
• Collaborate with top-tier engineering and architecture teams
• Opportunity to influence reliability strategy across the organization
