This job listing has been archived and is no longer accepting applications.

MLOps and Platform Engineer (AI Platform Reliability)

Apna

Bengaluru, Karnataka, India (permanent)

Posted: November 20, 2025

Job Description

Software Engineer (SDE-2) – DevOps, SRE & MLOps Platform Engineering
Location: Bengaluru
Employment Type: Full-time
Team: Platform Engineering / Reliability

About Blue Machines

Blue Machines powers large-scale, real-time Voice AI platforms and Agentic Workflows for global enterprises across BFSI, Healthcare, HRTech and customer experience domains.
Built and scaled from India, our platform has processed 14.5M+ minutes of production-grade AI agent conversations, operating latency-sensitive, always-on voice systems across geographies.

About the Role

We are hiring a hands-on DevOps / SRE engineer who owns platform reliability, observability and automation, and who will grow into MLOps and AI platform engineering.
This role focuses on designing, operating and evolving the infrastructure behind real-time Voice AI systems. You work directly on production systems at global scale, driving uptime, performance and resilience.

Key Responsibilities

Platform Reliability & SRE

• Own 99.9%+ platform uptime for real-time Voice AI workloads.
• Participate in on-call rotations, incident response and post-incident reviews.
• Lead root cause analysis (RCA) and drive permanent reliability improvements.
• Design and implement self-healing systems using automation, retries, circuit breakers and failover strategies.
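For candidates unfamiliar with the pattern, the circuit-breaker and failover bullet above can be illustrated with a minimal sketch. It is not Blue Machines' implementation: `CircuitBreaker`, `call_with_failover`, the thresholds and the provider names are all hypothetical, chosen only to show the idea of skipping a failing dependency and falling back to the next one.

```python
import time
from typing import Callable, Dict, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures, retries after a cool-down."""

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        # Closed: always allow. Open: allow a trial call only after the cool-down.
        if self._opened_at is None:
            return True
        return self._clock() - self._opened_at >= self.reset_after_s

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()  # (re)open the breaker


def call_with_failover(providers: Sequence[Tuple[str, Callable[[], T]]],
                       breakers: Dict[str, CircuitBreaker]) -> Tuple[str, T]:
    """Try providers in priority order, skipping any whose breaker is open."""
    for name, call in providers:
        breaker = breakers[name]
        if not breaker.allow():
            continue
        try:
            result = call()
        except Exception:
            breaker.record_failure()
            continue
        breaker.record_success()
        return name, result
    raise RuntimeError("all providers failed or are circuit-broken")
```

Injecting the clock keeps the cool-down logic testable without sleeping; a production version would also need per-call timeouts and thread safety, which this sketch omits.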

Kubernetes & Cloud Infrastructure

• Design, operate and scale Kubernetes clusters in public cloud environments.
• Work with managed Kubernetes platforms such as GKE, and apply cloud-native best practices.
• Implement auto-scaling strategies (HPA, VPA, node pools, GPU workloads).
• Manage infrastructure using Infrastructure as Code (Terraform).
• Optimize infrastructure for performance, reliability and cost efficiency.

Observability & Incident Intelligence

• Build and maintain monitoring, logging and alerting systems using Prometheus, Grafana, Loki and OpenTelemetry.
• Define SLIs, SLOs and error budgets for platform and AI workloads.
• Drive signal-based alerting to reduce noise and improve response quality.
• Implement anomaly detection and predictive alerting for infrastructure and AI pipelines.
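As a rough illustration of what an error budget means in practice, the arithmetic behind a 99.9% availability target can be sketched as follows. The function names and the 30-day window are assumptions made for the example, not definitions taken from this listing.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo)


def budget_remaining(slo: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    if total_events == 0:
        return 1.0  # nothing served yet, so nothing spent
    allowed_bad = total_events * (1.0 - slo)
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


# A 99.9% SLO over 30 days leaves roughly 43.2 minutes of downtime.
print(round(error_budget_minutes(0.999), 1))
# 500 failed requests out of 1,000,000 spends about half of a 99.9% budget.
print(round(budget_remaining(0.999, 999_500, 1_000_000), 2))
```

Under these assumptions, an alert that fires on budget burn rate rather than on individual errors is one common way to achieve the signal-based alerting described above.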

CI/CD & Platform Automation

• Design and maintain CI/CD pipelines for services and infrastructure.
• Build internal automation tooling for:
  • Progressive and canary deployments
  • Auto-scaling and capacity planning
  • Faster incident diagnosis and recovery
• Enable self-service DevOps workflows for engineering teams.

MLOps & AI Platform Reliability

• Own reliability and performance of STT, TTS and LLM inference pipelines.
• Design provider routing, failover and SLA enforcement mechanisms.
• Deploy, version and roll back AI models and inference services.
• Monitor inference latency, quality and drift in production systems.
• Operate GPU-backed inference workloads where applicable.

Security, Compliance & Resilience

• Enforce DevSecOps practices across build and deploy pipelines.
• Implement network policies, encryption, secrets management and access controls.
• Drive disaster recovery, backup strategies and resilience testing.
• Contribute to SOC2 / ISO compliance and audits.

Collaboration & Engineering Excellence

• Partner with backend, AI and platform teams on architecture and reliability.
• Influence system design through a reliability-first mindset.
• Mentor junior engineers and raise the overall bar for operational excellence.

Qualifications

Must-Have

• 3–6 years of experience in DevOps, SRE or Platform Engineering roles.
• Strong hands-on experience with Kubernetes and Docker in production environments.
• Familiarity with public cloud platforms and managed Kubernetes services (such as GKE).
• Strong understanding of distributed systems and production debugging.
• Hands-on experience with observability systems.
• Proficiency with Infrastructure as Code (Terraform).
• Strong incident ownership and communication skills.

Good-to-Have

• Experience with MLOps or AI inference platforms.
• Familiarity with LLM pipelines, real-time streaming or telephony systems.
• Experience operating GPU workloads.
• Knowledge of AIOps, anomaly detection or intelligent alerting.
• Cloud cost optimization experience.

Why Blue Machines

• Build global-scale AI infrastructure from India.
• Operate real-time Voice AI systems with 14.5M+ minutes in production.
• Work on low-latency, high-reliability platforms.
• Grow from DevOps/SRE into MLOps and AI platform engineering.
• High ownership, deep technical impact and real production scale.
