Senior Site Reliability Engineer
Veterinary Emergency Group
Posted: March 23, 2026
Job Description
ABOUT VEG
In 2014, VEG was born with a mission to help people and their pets when they need it most by challenging norms and fixing the ER experience. Since then, we’ve expanded rapidly, with hospitals nationwide open 24/7/365, and created an ER experience that focuses on what our pets and pet parents really need. We’ve done the same for our people (VEGgies), finding a way to say YES so they are empowered to achieve great things, grow in unexpected ways, and find a place where they truly belong.
We’re rethinking emergency care from every angle—from how we run our hospitals to how we support the people working inside them. That’s where our headquarters team comes in. Whether building technology to make our hospitals more efficient, recruiting and growing incredible VEGgies, or bringing our brand to life through marketing, our VQ (VEG Headquarters) team makes it all possible—ensuring our hospitals and people have everything they need to help pets and their families.
VEG is a 2025 and 2026 certified Great Place to Work®.
THE JOB
We are looking for a Senior Site Reliability Engineer who understands that at VEG, "reliability" is a medical necessity – if our proprietary platform, DogByte, goes down, a pet's life could be at risk. You will be the primary lead for our platform's resilience, transforming our infrastructure into a self-healing system that empowers our medical teams to provide 24/7/365 life-saving care. You will spend your time bridging the gap between high-level architectural strategy and hands-on technical "surgery," ensuring our engineering teams can build at pace while the foundation remains rock-solid.
You will evolve and strengthen an existing system that must meet the demands of VEG’s hospital expansion – ensuring our infrastructure never limits our ability to open new hospitals or provide medical care. You will own the ongoing stability of DogByte, scaling it from its current state into a robust enterprise platform where one hospital's traffic is isolated and does not impact another's experience.
This role can be based at our VQ in White Plains or may be open to remote work.
WHAT YOU’LL DO
• Formulate short- and long-term strategies to ensure DogByte withstands year-over-year volume increases, specifically solving for hospital-to-hospital traffic isolation
• Work with engineers to ensure data flows (from client to API to database) are configured for high concurrency and maximum reliability
• Build automated processes to handle high-traffic spikes and automatically remediate common system errors
• Set up monitoring and alerting to identify latency throughout the stack and resolve issues before they impact hospital operations
• Establish and meet SLOs for high availability, ensuring our engineers can build products without worrying if the system can support them
WHAT YOU NEED
• Bachelor’s Degree preferred or equivalent experience
• 5+ years in SRE/DevOps roles, expertly handling high-concurrency environments
• Deep understanding of the AWS ecosystem managed entirely through Infrastructure as Code
• Expertise in traffic management, including load balancing techniques, Nginx configuration, and autoscaling to handle volatile patterns
• Technical leadership in observability, establishing the tracing frameworks and monitoring required to diagnose latency issues and ensure high availability across the entire request lifecycle
• You have direct experience with technologies relevant to our technical stack, which currently includes: AWS ECS, Terraform, Nginx, PostgreSQL (RDS), Python
WHO YOU ARE
• Empathetic, instinctively taking a people-centric approach, whether supporting your colleagues or making an effort to understand different perspectives
• Have a sense of humility, acknowledging mistakes, sharing credit with others, and lifting up your team’s accomplishments
• Feel a strong sense of ownership over your work, taking responsibility for outcomes and staying committed to achieving long-term, impactful results
• Curious by nature; you ask insightful questions and continuously seek out opportunities to learn and grow your skills and knowledge
HOW WE INVEST IN YOU
• Competitive compensation: $170,000 - $200,000 + bonus + benefits
• Comprehensive health and wellness benefits that start on day one, and access to free therapy or counseling
• Up to 10 weeks of paid parental leave at 100% of regular salary, plus inclusive fertility and family-building care for all types of families
• Unlimited PTO to use for vacation or sick days—however you need it!
• Generous employee referral program, so our awesome people can bring in more awesome people.
• And the little (big) things, like casual office attire, the ability to bring your fur baby to work, cool VEG swag, food in the fridge for when you’re hungry, and free lunches twice a week!
• Company laptop and a monthly cell phone reimbursement
DEI
At VEG, diversity is not just a word—it's a strength that fuels innovation and kindness. Our mission is “Helping people and their pets when they need it most.” And we do that better when our VEGgies (employees) feel valued, respected, and empowered to bring their authentic selves to work. That's why we're devoted to creating an environment that reflects the diverse communities we serve—where different perspectives are not only welcomed but celebrated.
We are focused on providing equitable opportunities for growth, promoting inclusive decision-making, and ensuring that everyone's perspective is considered. Saying yes to VEG means helping us build a culture where your unique experiences and background contribute to a shared vision: being the world’s veterinary emergency company.