This job listing has been archived and is no longer accepting applications.

SLURM HPC Architect / Administrator

Confidential

Ottawa, Ontario | Hybrid | Permanent

Posted: February 19, 2026



Job Description


Location: Remote (Canada, U.S., or Europe Preferred)
Company: Cylix Applied Intelligence
Employment Type: Full-Time or Contract

About the Role

Cylix Applied Intelligence is seeking an experienced SLURM HPC Architect / Administrator to design, deploy, and operate high-performance computing (HPC) clusters supporting AI training, large-scale inference, scientific computing, and enterprise workloads.

This role will focus on building and managing enterprise-grade HPC environments powered by GPU and CPU compute clusters, leveraging SLURM as the core workload orchestration and resource scheduling platform.

You will work closely with AI engineers, infrastructure teams, and enterprise clients to deliver scalable, reliable, and high-performance compute environments across on-premises, hybrid, and cloud platforms.

Key Responsibilities

HPC Cluster Architecture and Design

Design and implement SLURM-based HPC cluster architectures

Architect scalable CPU and GPU compute environments

Define cluster topology including compute, storage, login, and management nodes

Design high-availability SLURM controller configurations

Implement cluster segmentation, partitioning, and resource allocation strategies

SLURM Deployment and Administration

Install, configure, and manage SLURM workload manager environments

Configure SLURM partitions (queues), QoS policies, and scheduling policies

Optimize job scheduling and manage fair-share policies

Implement accounting, usage tracking, and reporting systems

Maintain SLURM cluster health, stability, and performance

GPU Cluster and AI Infrastructure Management

Configure GPU scheduling and allocation policies

Support GPU resource management including:

NVIDIA A100, H100, L40, and similar accelerator platforms

MIG (Multi-Instance GPU) partitioning and GPU isolation

Multi-tenant GPU resource allocation

Optimize cluster performance for AI training and inference workloads

Infrastructure Automation and Operations

Automate cluster deployment and configuration using:

Ansible, Terraform, or similar tools

Shell scripting and Python

Implement monitoring, alerting, and performance tracking systems

Support cluster lifecycle management, upgrades, and expansion

Storage and Filesystem Integration

Integrate HPC clusters with high-performance storage systems including:

NFS

Lustre

BeeGFS

GPFS / Spectrum Scale

Optimize I/O performance and storage architecture

User and Workload Support

Support enterprise and research users with job scheduling and optimization

Troubleshoot job failures and performance issues

Assist engineering teams in optimizing workloads for HPC environments

Required Qualifications

3+ years of experience administering HPC clusters

Strong experience with SLURM workload manager

Strong Linux system administration experience (Ubuntu, Rocky Linux, RHEL, or similar)

Experience with HPC cluster architecture and deployment

Experience with shell scripting and automation

Experience with:

Cluster resource management

Multi-node distributed computing environments

SSH, networking, and Linux system internals

Preferred Qualifications

Experience managing GPU-based HPC clusters

Experience supporting AI / ML workloads

Experience with NVIDIA GPU platforms and drivers

Experience with:

CUDA environments

NVIDIA MIG configuration

GPU scheduling optimization

Experience with configuration management tools:

Ansible

Terraform

Puppet or Chef

Experience with monitoring tools such as:

Prometheus

Grafana

Node Exporter

SLURM accounting tools

Nice to Have

Experience with large-scale enterprise or cloud HPC environments

Experience deploying HPC environments in cloud platforms such as:

AWS

Azure

Private cloud environments

Experience with containerized HPC workloads:

Docker

Singularity / Apptainer

Experience integrating SLURM with Kubernetes or hybrid orchestration systems

Example Projects You Will Work On

Deployment of enterprise AI GPU clusters

Multi-tenant SLURM cluster architecture design

GPU scheduling optimization for AI workloads

HPC infrastructure for large-scale inference and model training

Hybrid HPC environments spanning data center and cloud

HPC cluster performance optimization and scaling

Technology Environment

You will work with:

SLURM Workload Manager

Linux (Ubuntu, Rocky Linux, RHEL)

NVIDIA GPU platforms (A100, H100, L40)

High-performance storage systems

HPC networking (InfiniBand, high-speed Ethernet)

Automation tools (Ansible, Terraform)

Monitoring tools (Prometheus, Grafana)

Container environments (Docker, Apptainer)

What We Offer

Competitive compensation

Remote-first environment

Opportunity to work with cutting-edge HPC and AI infrastructure

Exposure to enterprise-scale AI and compute environments

Flexible employment structure (Full-Time or Contract)

Opportunity to architect next-generation HPC environments

About Cylix Applied Intelligence

Cylix Applied Intelligence builds enterprise AI infrastructure and high-performance computing environments supporting advanced AI workloads, intelligent automation, and enterprise-scale compute platforms.
