ARCHIVED
This job listing has been archived and is no longer accepting applications.

DevOps Engineer - AIOps

Endava

Rosario, Santa Fe, Argentina | Hybrid | Permanent

Posted: February 26, 2026

Quick Summary

Maintain the reliability and performance of engineering infrastructure and tools to ensure a seamless user experience.

Job Description

Technology is our how. And people are our why. For over two decades, we have been harnessing technology to drive meaningful change.

 

By combining world-class engineering, industry expertise and a people-centric mindset, we consult and partner with leading brands from various industries to create dynamic platforms and intelligent digital experiences that drive innovation and transform businesses.

 

From prototype to real-world impact - be part of a global shift by doing work that matters.

We are seeking a hands-on Site Reliability Engineer (SRE) / AI Platform DevOps Engineer to own infrastructure provisioning, CI/CD automation, telemetry pipelines, and production deployment for AI-powered services, agents, and orchestration systems.

This is an SRE-heavy, infrastructure-first role, focused on ensuring AI systems operating in production are:

• Reliable

• Observable

• Scalable

• Secure

• Cost-efficient

• Safe to deploy and operate

You will play a critical role in building and maintaining the platform foundation that enables AI services to run safely and efficiently at scale.

Key Responsibilities

1. Infrastructure Provisioning & Automation

• Design and manage cloud infrastructure using Infrastructure as Code (Terraform or similar)

• Provision and maintain Kubernetes clusters and supporting services

• Automate environment setup across development, staging, and production

• Manage networking, IAM, secrets, storage, and compute scaling

• Ensure high availability, resilience, and disaster recovery readiness

2. CI/CD & Deployment Engineering

• Build and maintain CI/CD pipelines for AI services, agent frameworks, orchestrators, and model artifacts

• Implement automated testing and reliability validation gates

• Enable blue/green and canary deployments

• Build safe rollback mechanisms for services and models

• Integrate reliability and health checks into deployment workflows

3. Model & Agent Deployment Governance

• Package, version, and deploy models into containerized environments

• Manage model artifact storage and promotion across environments

• Monitor model performance and detect degradation

• Support retraining cycle integration and model refresh workflows

• Ensure safe rollout and rollback of model versions

• Implement monitoring for inference latency, throughput, and cost

4. Data Pipelines for Telemetry & Observability

• Design and maintain data pipelines to ingest, clean, and process high-volume telemetry (logs, metrics, traces, events)

• Enable structured telemetry for AI and orchestration workflows

• Ensure reliability for real-time and batch processing

• Optimize pipeline scalability and performance

5. AIOps Platform Integration

• Evaluate, deploy, and integrate AIOps platforms

• Improve anomaly detection, correlation, and alert intelligence

• Reduce alert noise and improve signal quality

• Integrate AIOps outputs into operational workflows and incident management

6. Intelligent Incident Automation

• Automate incident detection and remediation workflows

• Build self-healing scripts and intelligent runbooks

• Reduce MTTD and MTTR through automation

• Integrate AI-driven root cause analysis insights into operational tooling

• Improve prevention of recurring incidents

7. Production Reliability & SRE Excellence

• Define and manage SLIs, SLOs, and error budgets

• Implement monitoring, dashboards, and alerting systems

• Participate in on-call rotation

• Lead incident triage and root cause analysis

• Improve resilience, scaling, and failure handling

• Implement circuit breakers, rate limits, and failover mechanisms

8. Security & Governance

• Implement least-privilege access controls

• Manage secrets and credential rotation

• Enforce environment isolation

• Ensure auditability and compliance for AI systems

Required Experience

• 5+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles

• Strong hands-on experience with cloud platforms (AWS, Azure, or GCP)

• Proven expertise with Kubernetes and containerized workloads

• Experience with Infrastructure as Code (Terraform, CloudFormation, etc.)

• Strong CI/CD implementation experience (GitHub Actions, GitLab CI, Jenkins, etc.)

• Experience building observability stacks (Prometheus, Grafana, OpenTelemetry, ELK, Datadog, etc.)

• Experience defining and managing SLIs/SLOs and error budgets

• Hands-on experience with incident response and production support

• Strong scripting skills (Python, Bash, or similar)

AI/ML Platform Experience (Strongly Preferred)

• Experience deploying and managing AI/ML services in production

• Familiarity with model packaging, versioning, and artifact management

• Understanding of model lifecycle management and retraining workflows

• Experience monitoring inference performance, latency, and cost

• Exposure to AIOps tools and intelligent alerting systems

Additional Skills

• Strong understanding of distributed systems reliability patterns

• Knowledge of security best practices in cloud-native environments

• Experience implementing high-availability and disaster recovery strategies

• Excellent problem-solving and root cause analysis skills

• Strong communication skills and ability to collaborate across engineering and AI teams

Discover some of the global benefits that empower our people to become the best version of themselves:

• Finance: Competitive salary package, share plan, company performance bonuses, value-based recognition awards, referral bonus;
• Career Development: Career coaching, global career opportunities, non-linear career paths, internal development programmes for management and technical leadership;
• Learning Opportunities: Complex projects, rotations, internal tech communities, training, certifications, coaching, online learning platform subscriptions, pass-it-on sessions, workshops, conferences;
• Work-Life Balance: Hybrid work and flexible working hours, employee assistance programme;
• Health: Global internal wellbeing programme, access to wellbeing apps;
• Community: Global internal tech communities, hobby clubs and interest groups, inclusion and diversity programmes, events and celebrations.

At Endava, we’re committed to creating an open, inclusive, and respectful environment where everyone feels safe, valued, and empowered to be their best. We welcome applications from people of all backgrounds, experiences, and perspectives—because we know that inclusive teams help us deliver smarter, more innovative solutions for our customers. Hiring decisions are based on merit, skills, qualifications, and potential. If you need adjustments or support during the recruitment process, please let us know.
