Deep Learning Kernel Software Performance Architect

NVIDIA

2 Locations permanent

Posted: February 6, 2026

Job Description

NVIDIA is seeking Software Performance Architects to optimize GPU kernel performance for state-of-the-art data-center platforms. We build automated, data-driven workflows to detect, explain, and prevent performance regressions across key deep learning workloads, partnering closely with kernel developers, compiler teams, infrastructure, and architecture/performance groups.

What you'll be doing:

• Performance analysis + debugging

• Validate and analyze performance of GPU-accelerated kernels and key deep learning building blocks.

• Debug performance issues end-to-end: reproduce, isolate root causes, propose fixes or mitigation paths, and drive closure with the owning teams.

• Build performance narratives using structured evidence: baselines, controlled comparisons, and regression attribution.

• Automation + regression infrastructure (Python-heavy)

• Develop and maintain Python-based automation for performance testing and analysis—using modern AI-assisted developer tools (e.g., Cursor/Claude Code/Copilot) to accelerate scripting while keeping code maintainable and reviewable.

• Design and operate performance test workflows: coverage definition, test/workload generation, automated large-scale execution (CI/nightly/on-demand), rerun rules, and reproducibility standards.

• Convert raw run outputs into actionable insight: statistics, noise control, post-processing, visualization, and large-scale result mining.

• Cross-team collaboration and operating model

• Work with kernel developers and compiler/rotation teams to ensure performance checks are practical, scalable, and aligned to release needs.

• Partner with SWQA and infrastructure teams for execution at scale and reliable pipelines/dashboards.

• Contribute to clear ownership, triage, and routing rules so regressions close quickly and consistently.

• Follow general software engineering best practices, including support for regression testing and CI/CD flows.

What we need to see:

• Master's or PhD degree, or equivalent experience, in Computer Science, Computer Engineering, Applied Math, or a related field

• Strong programming ability in Python plus C/C++ (performance-oriented code reading/debugging)

• Solid fundamentals in computer architecture and performance reasoning (latency/throughput, memory hierarchy, parallelism).

• Experience with performance analysis workflows: profiling, measurement methodology, reproducibility, and regression triage.

• Comfortable working across teams and driving issues to decision/closure with clear communication

• Demonstrated strong C++ programming and software design skills, including debugging, performance analysis, and test design

• Experience with performance-oriented parallel programming, even if it’s not on GPUs (e.g., with OpenMP or pthreads)

• Solid understanding of computer architecture and some experience with assembly programming

• Ability to identify bottlenecks, optimize resource utilization, and improve throughput

Ways to stand out from the crowd:

• Experience with high-performance kernels or math libraries (e.g., GEMM/attention, CUTLASS-like concepts)

• Experience building CI/nightly regression systems, dashboards, or large-scale performance analytics

• GPU programming/perf experience (CUDA or equivalent parallel programming)

• Strong ML/DL workload understanding (training/inference shapes, precision modes, perf bottlenecks)

• Familiarity with simulators/analytical modeling or performance characterization methodology
