Infrastructure Resilience QA Engineer
Oxylabs
Posted: August 22, 2025
Quick Summary
We're looking for a Chaos Engineer (Resilience QA Engineer) to join our team in Vilnius, Lithuania. The role involves maintaining our infrastructure with 60PB+ monthly data traffic, 300k+ service requests/sec processed, and 500k+ Kafka messages/sec streamed. The ideal candidate will have expertise in scaling and maintaining large-scale data systems.
Job Description
We’re a team of 500+ professionals who develop cutting-edge proxy and web data scraping solutions for thousands of the world’s best-known businesses, including Fortune 500 companies.
What’s in store for you:
You’ll be solving complex challenges and maintaining our own infrastructure with 60PB+ monthly data traffic. Here are its scale and maturity in numbers:
- 6PB+ Ceph storage
- 60PB+ monthly data traffic through our systems
- 300k+ service requests/sec processed
- 500k+ Kafka messages/sec streamed
A word from the team:
Join us as a Chaos Engineer (Resilience QA Engineer) and become the guardian of reliability in our distributed system. You’ll design chaos experiments, uncover hidden weaknesses, and make our platform stronger against real-world failures. This is a hands-on role where your work directly impacts system uptime, customer trust, and engineering velocity. If you’re passionate about resilience, systems thinking, and pushing software beyond its limits, this is your chance to make a real difference.
Your day-to-day:
• Design and execute fault injection experiments (service crashes, latency, network partitions, resource exhaustion).
• Conduct load, stress, and soak testing of microservices and system components.
• Validate recovery strategies (circuit breakers, retries, failovers).
• Verify observability and monitoring coverage, highlighting blind spots.
• Automate resilience test suites and integrate them into CI/CD.
• Maintain resilience benchmarks (latency/error budgets).
• Collaborate with engineers, SREs, and QA to prioritize improvements.
• Provide clear reports with reproduction steps and impact assessments.
Check out our tech stack here: stackshare.io/oxylabs/oxylabs
Your skills & experience:
• Programming/scripting skills for automation (Python, Go, Bash).
• Knowledge of automated testing frameworks.
Nice to have:
• Experience with chaos testing tools (Chaos Mesh, Gremlin, Litmus, etc.).
• Strong understanding of Kubernetes/Docker and microservice architectures.
• Familiarity with observability stacks (Prometheus, Grafana, ELK, OpenTelemetry).
• Knowledge of resilience design patterns (circuit breakers, retries, failover).
• Exposure to private cloud environments.
• Previous experience in high-availability environments.
Salary:
• Gross salary: 3300 - 6500 EUR/month. Keep in mind that we are open to discussing a different salary based on your skills and experience.
Up for the challenge? Let’s talk!