MisuJob - AI Job Search Platform MisuJob

High Performance Computing Software Engineer - Supercomputing

Ifm Us

Sunnyvale, CA permanent

Posted: April 3, 2026

Interested in this position?

Create a free account to apply with AI-powered matching

Quick Summary

Work as a High Performance Computing Software Engineer to design, develop, and maintain high-performance computing systems, such as AI models, and contribute to the development of foundation models.

Job Description

About the Institute of Foundation Models

We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.

As part of our team, you’ll have the opportunity to work on the core of cutting-edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem-solving skills will be instrumental in establishing MBZUAI as a global hub for high-performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.

The Role

IFM is building the foundational compute infrastructure that will power tomorrow’s breakthroughs in AI and computational science. We’re looking for a High Performance Computing Software Engineer to help us design, develop, and operate the software systems that run our large-scale AI workloads.

In this role, you’ll work at the intersection of high-performance computing and machine learning. You’ll be part of a team responsible for crafting the software stack that enables training of cutting-edge ML models—spanning 1000+ GPUs—and ensuring our infrastructure is robust, performant, and developer-friendly.

Job Responsibilities

• Design and implement high-performance, distributed software solutions for large-scale AI/ML training.

• Optimize low-level system components including Linux kernel, GPU/accelerator kernels, and interconnects.

• Develop and tune communication libraries such as NCCL, MPI, UCX, RCCL, and RDMA-based systems.

• Partner with ML researchers and engineers to support frameworks like PyTorch, MegatronLM, and DeepSpeed in large-scale production environments.

• Contribute to our scheduling, orchestration, and job management systems, including Slurm and Kubernetes.

• Debug and resolve complex issues across the stack—from kernel to container to model.

• Work closely with hardware vendors, upstream open-source communities, and internal teams to drive performance and reliability improvements.

Skills & Experience

• Proven experience developing and optimizing software for large-scale ML workloads (1000+ GPUs preferred).

• Deep understanding of Linux kernel internals and accelerator (GPU) kernel development.

• Proficiency with distributed communication libraries (e.g., NCCL, RCCL, MPI, UCX, SHARP, Libfabric).

• Experience with ML frameworks like PyTorch, TensorFlow, JAX, or MegatronLM.

• Strong knowledge of HPC job scheduling and orchestration tools (e.g., Slurm, Kubernetes, Pyxis).

• Excellent debugging and systems performance tuning skills.

• A collaborative mindset with a focus on shared success and technical excellence.

Why Apply Through MisuJob?

AI-Powered Job Matching: MisuJob uses advanced artificial intelligence to analyze your skills, experience, and career goals. Our matching algorithm compares your profile against thousands of job requirements to find positions where you have the highest chance of success. This saves you hours of manual job searching and ensures you only see relevant opportunities.

One-Click Applications: Once you create your profile, applying to jobs is effortless. Your resume and cover letter are automatically tailored to highlight the most relevant experience for each position. You can apply to multiple jobs in minutes, not hours.

Career Intelligence: Beyond job matching, MisuJob provides valuable career insights. See how your skills compare to market demands, identify skill gaps to address, and understand salary benchmarks for your experience level. Make data-driven decisions about your career path.

Frequently Asked Questions

How do I apply for this position?

Click the "Register to Apply" button above to create a free MisuJob account. Once registered, you can apply with one click and track your application status in your dashboard.

Is MisuJob free for job seekers?

Yes, MisuJob is completely free for job seekers. Create your profile, get matched with jobs, and apply without any cost. We help you find your dream job without any hidden fees.

How does AI matching work?

Our AI analyzes your resume, skills, and experience to understand your professional profile. It then compares this against job requirements using natural language processing to calculate a match percentage. Higher matches mean better fit for the role.

Can I apply to jobs in other countries?

Absolutely. MisuJob features jobs from companies worldwide, including remote positions. Filter by location or look for remote opportunities to find jobs that match your preferences.

Ready to Apply?

Join thousands of job seekers using MisuJob's AI to find and apply to their dream jobs automatically.

Register to Apply