IT Systems & HPC Infrastructure Specialist (60%, Contractor)
Confidential
Posted: May 13, 2026
Quick Summary
We are seeking a part-time IT Systems & HPC Infrastructure Specialist to manage our technical environment, with a focus on the tactical management of our local and cloud HPC resources alongside standard office IT support.
Job Description
Are you a Linux-savvy IT professional who thrives at the intersection of systems administration and High-Performance Computing (HPC)?
We are looking for a part-time IT Systems & HPC Infrastructure Specialist to manage our technical environment. Beyond standard support for our 30 employees, the additional challenge is maintaining the high-performance compute farms and cloud-bursting capabilities that our engineering team relies on.
These tasks are currently handled by our senior engineers. We want a dedicated specialist to take over the tactical management of our local and cloud infrastructure, ensure that compute-heavy workloads run reliably and securely, and help maintain and improve our standard office IT infrastructure.
Responsibilities
• HPC (High-Performance Computing) - Ensuring engineers have the compute resources and tools they need for their work.
  • Compute Management: Manage and optimize job schedulers (SLURM or LSF-style systems) for heavy compute loads.
  • Hardware Acceleration: Ensure reliable, efficient GPU access, both local and cloud-based.
• DevOps & Infrastructure - Automation, security, and the "invisible" backbone that keeps systems running reliably.
  • Infrastructure Automation: Use Ansible to deploy, configure, and maintain on-premises servers (Proxmox/bare-metal) and cloud environments.
  • Containerization & Data Pipelines: Support data pipelines built on Kubernetes and Docker containers.
  • Security & IAM: Manage identity and authentication (LDAP, SSSD, SAML) and maintain firewall rules (Fortinet).
  • Data Integrity: Maintain a rigorous backup/snapshot regime and ensure all SaaS data (e.g., GitLab, Coda) is archived to the NAS.
  • Business Continuity: Establish disaster recovery procedures and conduct security audits and log monitoring.
• Helpdesk & IT Operations - Supporting the human element and managing the physical office technology.
  • User Support: Provide essential support for approximately 30 users (troubleshooting hardware issues, application freezes, and network latency).
  • Endpoint Management: Deploy and maintain operating systems (Windows/macOS) and manage the ESET and Google Admin consoles.
  • Lifecycle Management: Onboard and offboard employees, and provide training on security and IT tools.
  • Vendor Relations: Act as the technical point of contact for partners and third-party service providers.