Senior MLOps Engineer
Accellor
Posted: April 6, 2026
Quick Summary
Design, build, and maintain infrastructure and pipelines for AI and Machine Learning systems.
Job Description
We are seeking a Senior MLOps Engineer to design, build, and maintain the infrastructure and pipelines that operationalize AI and Machine Learning systems at scale. This role bridges the gap between model development and production deployment—ensuring ML and GenAI workloads are reliable, observable, cost-efficient, and continuously improving across enterprise environments.
Key Responsibilities
• Design and implement end-to-end ML pipelines covering data ingestion, feature engineering, model training, evaluation, and deployment.
• Build and manage CI/CD pipelines for ML models, including automated testing, validation, and rollback mechanisms.
• Architect and maintain model serving infrastructure for real-time and batch inference workloads, including LLM and agentic AI deployments.
• Implement model monitoring, drift detection, and alerting systems to ensure production model health and reliability.
• Manage experiment tracking, model versioning, and artifact registries to enable reproducibility and governance.
• Optimize compute costs and inference latency across GPU/CPU workloads on cloud platforms (AWS, Azure, or GCP).
• Containerize and orchestrate ML workloads using Docker and Kubernetes.
• Automate data pipeline workflows and feature store management for training and inference.
• Collaborate with AI Engineers, Data Scientists, and Platform teams to streamline the path from prototype to production.
• Establish and enforce MLOps best practices, standards, and documentation across the engineering organization.
Requirements
• Bachelor’s degree in Computer Science, Engineering, or a related field.
• 5+ years of experience in DevOps, Platform Engineering, or MLOps roles, including 1–2+ years focused on ML/AI infrastructure.
• Strong programming skills in Python; experience with Bash, Go, or Java is a plus.
• Hands-on experience with ML pipeline orchestration tools such as Kubeflow, MLflow, Airflow, or Vertex AI Pipelines.
• Proficiency with containerization (Docker) and orchestration (Kubernetes, Helm).
• Experience with cloud-native ML services on AWS (SageMaker), Azure (Azure ML), or GCP (Vertex AI).
• Familiarity with model serving frameworks such as TorchServe, Triton Inference Server, vLLM, or TGI.
• Knowledge of Infrastructure as Code (Terraform, Pulumi, or CloudFormation).
• Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, or equivalent).
• Strong understanding of software engineering fundamentals, version control (Git), and CI/CD practices.
Nice to Have
• Experience deploying and serving Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems in production.
• Familiarity with vector databases (Pinecone, Weaviate, Qdrant, or pgvector).
• Exposure to AI observability platforms (LangSmith, Weights & Biases, Arize, or WhyLabs).
• Experience with feature stores (Feast, Tecton, or equivalent).
• Familiarity with GPU cluster management and distributed training infrastructure.
• Experience with enterprise SaaS platforms and multi-tenant ML infrastructure.