ARCHIVED
This job listing has been archived and is no longer accepting applications.

Senior Software Engineer, Inference Platform

AION

Bengaluru, Karnataka, India · Hybrid · Permanent

Posted: December 17, 2025


Job Description

About AION

AION is building an interoperable AI cloud platform by transforming the future of high-performance computing (HPC) through its decentralized AI cloud. Purpose-built for bare-metal performance, AION democratizes access to compute and provides managed services, aiming to be an end-to-end AI lifecycle platform—taking organizations from data to deployed models using its forward-deployed engineering approach.

AI is transforming every business around the world, and the demand for compute is surging like never before. AION strives to be the gateway for dynamic compute workloads by building integration bridges with diverse data centers around the world and reinventing the compute stack via its state-of-the-art serverless technology. We stand at the crossroads where enterprises are finding it hard to balance AI adoption with security. At AION, we take enterprise security and compliance very seriously and are rethinking every piece of infrastructure, from hardware and network packets to API interfaces.

Led by high-pedigree founders with previous exits, AION is well-funded by major VCs with strategic global partnerships. Headquartered in the US with global presence, the company is building its initial core team in India.

Who You Are

You're a seasoned engineer who has built and scaled high-performance inference systems for AI/ML workloads. You understand the complexities of serving models at scale—latency optimization, resource orchestration, autoscaling dynamics, and production reliability. You've designed distributed systems that handle thousands of requests per second while maintaining sub-second response times and cost efficiency.

Experience with Golang is strongly preferred, and exposure to inference engines (vLLM, TGI, TensorRT), containerization, and distributed systems is an added bonus. You take ownership of platform-level decisions, think strategically about performance vs. cost trade-offs, and want your work to power AI inference for thousands of developers globally.

You're product-minded—you understand how your technical decisions impact developers using AION's platform and think about the end-to-end user experience. You're a team player comfortable wearing multiple hats—one day you're optimizing inference latency, the next you're joining customer calls to understand their deployment challenges, and the day after you're helping with UI/UX, customer success, documentation and product ops.


What You'll Do

Inference Platform Architecture & Core Services

• Design and build AION's inference service platform—the backbone for serving AI models at scale across diverse workloads
• Own and architect core platform components: AI Gateway, Resource Orchestrator, Runtime Engines, and Autoscaler
• Design highly modular, scalable, and extensible low-level designs (LLDs) for inference infrastructure components
• Lead high-level design discussions, establish architectural patterns, and drive technical decision-making for the inference stack

Model Deployment & Lifecycle Management

• Understand and optimize the dynamics of model deployment, version upgrades, and rollback strategies
• Build robust deployment pipelines for seamless model updates with zero-downtime deployments
• Design intelligent routing systems for multi-model serving, A/B testing, and canary deployments
• Implement strategies for efficient GPU utilization and model cold-start optimization
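The routing responsibilities above (multi-model serving, A/B tests, canaries) are often implemented with weighted, sticky routing: a request is hashed to a bucket so the same caller always lands on the same model version. A minimal sketch in Python, one of the role's listed languages—the model names and weight are illustrative, not AION's actual API:

```python
import hashlib

def pick_model_version(request_id: str, canary_weight: float,
                       stable: str = "model-v1", canary: str = "model-v2") -> str:
    """Deterministically route a request to the stable or canary model.

    Hashing the request ID (rather than sampling randomly per call) keeps
    routing sticky: the same request/session always hits the same version.
    """
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return canary if bucket < canary_weight else stable

# Roughly 10% of traffic should reach the canary across many request IDs.
versions = [pick_model_version(f"req-{i}", 0.10) for i in range(10_000)]
canary_share = versions.count("model-v2") / len(versions)
```

Promoting the canary is then just raising `canary_weight` toward 1.0; rolling back is dropping it to 0, with no per-request state to migrate.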

Performance & Distributed Systems

• Implement highly performant and optimized software for low-latency, high-throughput inference serving
• Build and debug production-grade code in distributed systems handling real-time AI workloads
• Optimize inference pipelines for latency, throughput, batching efficiency, and resource utilization
• Design fault-tolerant systems with graceful degradation and automatic recovery mechanisms
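One core throughput lever mentioned above is batching: hold requests briefly so the GPU processes several at once, flushing either when the batch is full or when the oldest request has waited too long. A simplified sketch (the caller supplies timestamps, which keeps it deterministic; a production batcher would run against a real clock and an async queue):

```python
from dataclasses import dataclass, field

@dataclass
class Batcher:
    """Greedy dynamic batcher: flush when the batch is full or the oldest
    pending request has waited longer than `max_wait` seconds."""
    max_batch_size: int
    max_wait: float
    pending: list = field(default_factory=list)  # (arrival_time, request)

    def add(self, t: float, request) -> None:
        self.pending.append((t, request))

    def flush(self, now: float):
        """Return a batch if a flush condition holds, else None."""
        if not self.pending:
            return None
        full = len(self.pending) >= self.max_batch_size
        expired = now - self.pending[0][0] >= self.max_wait
        if full or expired:
            batch = [req for _, req in self.pending[:self.max_batch_size]]
            del self.pending[:self.max_batch_size]
            return batch
        return None

b = Batcher(max_batch_size=4, max_wait=0.01)
for i in range(5):
    b.add(t=0.0, request=f"req-{i}")
first = b.flush(now=0.0)    # full batch of 4 flushes immediately
second = b.flush(now=0.0)   # one request left, deadline not yet reached
third = b.flush(now=0.02)   # deadline passed -> partial batch flushes
```

The `max_wait` knob is exactly the latency-vs-throughput trade-off the role calls out: a longer wait fills bigger batches but adds tail latency.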

Observability & Engineering Excellence

• Build high-performance telemetry and observability stack for inference metrics, performance tracking, and debugging
• Implement comprehensive monitoring for model latency, throughput, error rates, GPU utilization, and cost per inference
• Conduct thorough code reviews to maintain code quality, performance standards, and architectural consistency
• Establish engineering best practices for testing, documentation, and production readiness
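The metrics listed above (latency percentiles, error rate, cost per inference) reduce to a few aggregations over request samples. A minimal illustration of the arithmetic, using nearest-rank percentiles—real systems would feed these from a telemetry pipeline such as Prometheus rather than an in-memory list:

```python
import math

def percentile(sorted_vals, q: float) -> float:
    """Nearest-rank percentile over a pre-sorted sample."""
    idx = math.ceil(q / 100 * len(sorted_vals)) - 1
    return sorted_vals[max(0, min(idx, len(sorted_vals) - 1))]

def summarize(latencies_ms, error_count, gpu_hours, gpu_cost_per_hour):
    """Roll successful-request latencies and GPU spend into key metrics."""
    s = sorted(latencies_ms)
    n = len(s)
    return {
        "p50_ms": percentile(s, 50),
        "p99_ms": percentile(s, 99),
        "error_rate": error_count / (n + error_count),
        "cost_per_inference": gpu_hours * gpu_cost_per_hour / n,
    }

stats = summarize(latencies_ms=list(range(1, 101)),  # 1..100 ms
                  error_count=2, gpu_hours=0.5, gpu_cost_per_hour=2.0)
```

Cost per inference here is just GPU-hours consumed times hourly price divided by served requests, which makes under-utilized GPUs show up directly as a cost regression.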

Technical Skills & Experience

If you meet some of these requirements and feel comfortable catching up on the others, we encourage you to apply:

• 4+ years of experience building and scaling backend systems, distributed platforms, or inference infrastructure
• Strong understanding of AI/ML inference systems and experience with inference engines (vLLM, TGI, TensorRT-LLM, or similar)
• Deep knowledge of distributed systems design, microservices architecture, and API gateway patterns
• Proficiency in Golang strongly preferred; Python, Rust, C++ for performance-critical components a plus
• Experience with container orchestration (Kubernetes, Docker) and infrastructure-as-code
• Solid understanding of autoscaling strategies, load balancing, and resource scheduling algorithms
• Experience building high-throughput, low-latency systems with sub-100ms response time requirements
• Familiarity with message queues (Kafka, RabbitMQ), databases (PostgreSQL, Redis), and event-driven architectures
• Knowledge of GPU computing, model serving optimizations (batching, quantization, multi-tenancy), and resource allocation
• Experience with observability tools (Prometheus, Grafana, OpenTelemetry) and distributed tracing
• Understanding of API design, rate limiting, authentication/authorization, and security best practices
• Exposure to AI model deployment workflows and model lifecycle management is highly desirable
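Rate limiting, one of the API-design topics above, is most commonly done with a token bucket: tokens refill at a steady rate up to a burst capacity, and each request spends one. A deterministic sketch (timestamps are passed in by the caller; a gateway would use a monotonic clock and per-tenant buckets):

```python
from dataclasses import dataclass

@dataclass
class TokenBucket:
    """Token-bucket limiter: refills `rate` tokens/sec, bursts up to `capacity`."""
    rate: float
    capacity: float
    tokens: float = -1.0   # sentinel; set to capacity on init
    last: float = 0.0

    def __post_init__(self):
        if self.tokens < 0:
            self.tokens = self.capacity

    def allow(self, now: float) -> bool:
        """Refill based on elapsed time, then try to spend one token."""
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

tb = TokenBucket(rate=1.0, capacity=2.0)
burst = [tb.allow(now=0.0) for _ in range(3)]  # burst of 2 allowed, 3rd rejected
later = tb.allow(now=1.0)                       # one second of refill
```

Lazily refilling on each call (instead of a background timer) keeps the limiter O(1) per request and trivially shardable per API key.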

Bonus / Good to Have

Having expertise in one or more of these specializations is highly desired:

• HPC & Cluster Management: Experience handling large-scale HPC clusters using Kubernetes and Slurm for job scheduling, resource allocation, and workload orchestration
• Data Engineering: Expertise with data pipelines, ETL systems, and large-scale data processing frameworks
• Systems-Level Programming: Experience with low-level systems programming such as storage systems, Kubernetes operators, OS-level software development, or daemon services (llm-d, system agents)
• ML Platform Engineering: Experience productionizing ML pipelines, batch job orchestration, model fine-tuning workflows, and Jupyter notebook orchestration systems

• Enterprise Deployment: Experience platformizing and packaging software for on-premises deployments or customer VPC installations with emphasis on security, compliance, and operational simplicity


Preferred Attributes:

• High ownership, self-driven, and a bias for action.
• Strong strategic thinking and ability to connect technical decisions to business impact.
• Excellent communication and mentoring skills.
• Thrives in ambiguity, fast-paced environments, and early-stage startup culture.

Why Join AION?

• Work directly with high-pedigree founders shaping technical and product strategy.
• Build infrastructure powering the future of AI compute globally.
• Significant ownership and impact with equity reflective of your contributions.
• Competitive compensation, flexible work options, and wellness benefits.

Apply Now:
If you’re a strong engineer ready to lead architecture and scale next-generation AI infrastructure, we want to hear from you. Please share:

• Your resume, highlighting relevant projects and leadership experience.
• Links to products, code, or demos you’ve built.
• A brief note on why AION’s mission excites you.
