ARCHIVED
This job listing has been archived and is no longer accepting applications.

AI Performance Engineer

Confidential

Not specified | Permanent

Posted: January 30, 2026


Quick Summary

The AI Performance Engineer optimizes training and multi-node inference for distributed AI and HPC workloads across networking silicon, systems, and the supporting software stack.

Job Description

Cornelis Networks delivers the world’s highest performance scale-out networking solutions for AI and HPC datacenters. Our differentiated architecture seamlessly integrates hardware, software and system level technologies to maximize the efficiency of GPU, CPU and accelerator-based compute clusters at any scale. Our solutions drive breakthroughs in AI & HPC workloads, empowering our customers to push the boundaries of innovation. Backed by top-tier venture capital and strategic investors, we are committed to innovation, performance and scalability - solving the world’s most demanding computational challenges with our next-generation networking solutions.

  

We are a fast-growing, forward-thinking team of architects, engineers, and business professionals with a proven track record of building successful products and companies. As a global organization, our team spans multiple U.S. states and six countries, and we continue to expand with exceptional talent in onsite, hybrid, and fully remote roles. 

We’re seeking an AI Performance Engineer who will optimize training and multi-node inference across next-gen networking silicon and systems: adapters, switches, and the software stack that ties it all together. You’ll partner with architecture, firmware, software, and lighthouse customers to turn lab results into field-proven wins, with an emphasis on distributed serving architectures and P99-aware optimizations.

Key Responsibilities:

Own end-to-end performance for distributed AI workloads (training + multi-node inference) across multi-node clusters and diverse fabrics (Omni-Path, Ethernet, InfiniBand).

Benchmark, characterize, and tune open-source & industry workloads (e.g., Llama, Mixtral, diffusion, BERT/T5, MLPerf) on current and future compute, storage, and network hardware, including vLLM/TensorRT-LLM/Triton serving paths.

Design and optimize distributed serving topologies (sharded/replicated, tensor/pipeline parallel, MoE expert placement), continuous/adaptive batching, KV-cache sharding/offload (CPU/NVMe) and prefix caching, and token streaming with tight p99/p999 SLOs.

Optimize inference: validate RDMA/GPUDirect RDMA, congestion control, and collective/point-to-point tradeoffs during inference.

Design experiment plans to isolate scaling bottlenecks (collectives, kernel hot spots, I/O, memory, topology) and deliver clear, actionable deltas with latency-SLO dashboards and queuing analysis.

Build crisp proof points that compare Cornelis Omni-Path to competing interconnects; translate data into narratives for sales/marketing and lighthouse customers, including cost-per-token and tokens/sec-per-watt for serving.

Instrument and visualize performance (Nsight Systems, ROCm/Omnitrace, VTune, perf, eBPF, RCCL/NCCL tracing, app timers) plus serving telemetry (Prometheus/Grafana, OpenTelemetry traces, concurrency/queue depth).

Evangelize best practices through briefs, READMEs, and conference-level presentations on distributed inference patterns and anti-patterns.

 

Minimum Qualifications:

B.S. in CS/EE/CE/Math or a related field.

5–7+ years running AI/ML at cluster scale.

Proven ability to set up, run, and analyze AI benchmarks; deep intuition for message passing, collectives, scaling efficiency, and bottleneck hunting for both training and low-latency serving.

Hands-on with distributed training beyond single-GPU (DP/TP/PP, ZeRO, FSDP, sharded optimizers) and distributed inference architectures (replicated vs sharded, tensor/KV parallel, MoE).

Practical experience across AI stacks & comms: PyTorch, DeepSpeed, Megatron-LM, PyTorch Lightning; RCCL/NCCL, MPI/Horovod; Triton Inference Server, vLLM, TensorRT-LLM, Ray Serve, KServe.

Comfortable with compilers (GCC/LLVM/Intel/oneAPI) and MPI stacks; Python + shell power user.

Familiarity with network architectures (Omni-Path/OPA, InfiniBand, Ethernet/RDMA/RoCE) and Linux systems at the performance-tuning level, including NIC offloads, CQ moderation, pacing, ECN/RED.

Excellent written and verbal communication: turn measurements into persuasion with SLO-driven narratives for inference.

Preferred Qualifications: 

M.S. in CS/EE/CE/Math or a related field.

Scheduler expertise (SLURM, PBS) and multi-tenant cluster ops.

Hands-on profiling & tracing of GPU/comm paths (Nsight Systems, Nsight Compute, ROCm tools/rocprof/roctracer/omnitrace, VTune, perf, PCP, eBPF).

Experience with NeMo, DeepSpeed, Megatron-LM, FSDP, and collective ops analysis (AllReduce/AllGather/ReduceScatter/Broadcast).

Background in HPC performance engineering or storage (BeeGFS, Lustre, NVMe-oF) for data and checkpoint pipelines.

Location: This is a remote position for employees residing within the United States.

We offer a competitive compensation package that includes equity, cash, and incentives, along with health and retirement benefits. Our dynamic, flexible work environment provides the opportunity to collaborate with some of the most influential names in the semiconductor industry.

  

At Cornelis Networks, your base salary is only one component of your comprehensive total rewards package. Your base pay will be determined by factors such as your skills, qualifications, experience, and location relative to the hiring range for the position. Depending on your role, you may also be eligible for performance-based incentives, including an annual bonus or sales incentives.

In addition to your base pay, you’ll have access to a broad range of benefits, including medical, dental, and vision coverage, as well as disability and life insurance, a dependent care flexible spending account, accidental injury insurance, and pet insurance. We also offer generous paid holidays, 401(k) with company match, and Open Time Off (OTO) for regular full-time exempt employees. Other paid time off benefits include sick time, bonding leave, and pregnancy disability leave.

 

Cornelis Networks does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. Cornelis Networks is an equal opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity or expression, pregnancy, age, national origin, disability status, genetic information, protected veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.
