Deep Learning Solution Architect
NVIDIA
Posted: April 3, 2026
Quick Summary
As a Deep Learning Solution Architect, you will design, optimize, and deliver high-performance generative AI solutions for NVIDIA's customers, leveraging our full software and hardware ecosystem. You will work closely with cross-functional teams to develop and implement large-scale AI models, and ensure that your solutions meet our high standards for quality and scalability.
Job Description
NVIDIA is seeking dynamic Solution Architects with specialized expertise in training Large Language Models (LLMs), implementing retrieval-augmented generation (RAG) workflows, and building agentic inference systems. You will leverage the full NVIDIA software and hardware ecosystem to design, optimize, and deliver production-grade generative AI solutions for enterprise customers. With competitive salaries and a generous benefits package, we are widely considered one of the world’s most desirable employers. Some of the most forward-thinking and hardworking people in the world work for us, and thanks to outstanding growth, our best-in-class engineering teams are expanding rapidly. If you’re a creative and autonomous person with a real passion for technology, we want to hear from you.
What You Will Be Doing:
• Architect end-to-end solutions focused on LLM pretraining, fine-tuning, high-performance inference, RAG workflows, and agentic inference orchestration using NVIDIA’s hardware and software platforms.
• Collaborate with customers to understand their LLM-related business challenges and design tailored solutions aligned with the NVIDIA ecosystem.
• Lead LLM training, distributed optimization, and performance tuning to achieve optimal throughput, latency, and memory efficiency.
• Design and integrate RAG workflows and agentic inference pipelines into customer systems; provide technical guidance on best practices.
• Collaborate with NVIDIA engineering teams to provide feedback and support pre-sales technical activities (workshops, demos).
What We Need to See:
• Master’s / Ph.D. in Computer Science, Artificial Intelligence, or equivalent experience.
• 4+ years hands-on experience in AI, focusing on open-source LLM training, fine-tuning, and production inference optimization.
• Deep understanding of mainstream LLM architectures and proficiency in LLM customization using PyTorch and Hugging Face Transformers.
• Solid knowledge of GPU computing, cluster architecture, and distributed parallel training/inference for LLMs.
• Competency in designing agentic inference systems and applying AI agents to solve business challenges.
• Strong communication skills, able to articulate complex technical concepts to technical and non-technical stakeholders.
Ways to Stand Out from the Crowd:
• Hands-on experience with NVIDIA’s generative AI ecosystem (TensorRT-LLM, Megatron-LM, NVIDIA NeMo).
• Advanced skills in LLM optimization (quantization, KV Cache tuning, memory footprint reduction).
• Experience with Docker and Kubernetes for on-prem deployment of containerized LLM and agent workflows.
• In-depth knowledge of multi-GPU parallelism and large-scale GPU cluster management.
#deeplearning