Jobs

Senior Product Manager – Edge AI, Computer Vision & Multimodal Inference

HP Inc

2 Locations permanent

Strategic Thinking, Hands-On Execution, Cross-Functional Alignment, Vision Development, Roadmap Definition, Integrated Marketing Solutions, Go-to-Market Strategy, Hardware-Software-Service Integration

Senior Product Manager – Edge AI, Computer Vision & Multimodal Inference Description - About Us Innovation is in HP’s DNA. From our origins in a Palo Alto garage in 1939, to our current position as...

April 3, 2026

Software Engineer, AI Inference Systems - New College Graduate 2026

NVIDIA

US, CA, Santa Clara permanent

Computer Science, Python, Go, Rust, CUDA, GPU Programming, Performance Engineering, LLM Inference, ML techniques, DSLs, Containerization

We are seeking highly skilled and motivated software engineers to join us and build AI inference systems that serve large-scale models with extreme efficiency. You’ll architect and implement high-perf...

April 2, 2026

Distinguished Engineer - Inference Serving Network and Storage

Graphcore

Austin, Texas, United States (US - Austin) Remote permanent

Artificial Intelligence, Data Center Hardware, Network Build, Storage Architecture, QoS, Temporary Traffic Control, Segmentation, Service Philosophy, Observability, Automation, Semiconductors, Transport Tuning

About us Graphcore is a globally recognized leader in Artificial Intelligence computing systems. The company designs advanced semiconductors and data center hardware that provide the specialized proc...

April 2, 2026

Sr. Manager, Engineering - AI Gateway (LLM Inference)

Databricks

New York (New York City, New York) permanent

Large Scale Systems, Machine Learning, GenAI Systems, Product Development, Scaling, Control-Plane Software, Standardization, Security, Observability, Customer Value

RDQ127R255 At Databricks, we are passionate about enabling data teams to solve the world’s toughest problems — from making the next mode of transportation a reality to accelerating the development of...

April 2, 2026

Technical Lead - AI Inferences

Wekatest

U.S. Remote Remote permanent

Technical Leadership, Team Management, Inference Optimization, Framework Mastery, Evaluating AI Output, vLLM, KV cache reuse, Speculative Decoding, Continuous Batching, LMCache, NIXL

WEKA is architecting a new approach to the enterprise data stack built for the age of reasoning. NeuralMesh by WEKA sets the standard for agentic AI data infrastructure with a cloud and AI-native softw...

March 30, 2026

Senior Director, NVIDIA AI Inference Sales

NVIDIA

US, CA, Santa Clara permanent

Go-To-Market Strategy, Ecosystem Engagement, Enablement, Roadmap Influence, Revenue Growth, Enterprise Sales, Partner Management, Technical Enablement, Pipeline Management, Written Feedback

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people. To...

March 30, 2026

Software Engineer, Cloud Inference Safeguards

Anthropic

San Francisco, CA | Seattle, WA (San Francisco, CA, Seattle, WA) Hybrid permanent

Cloud, Docker, Kubernetes, PostgreSQL, Telemetry, Logging, Teaching, Root Cause Analysis, Comp Management

About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickl...

March 27, 2026

Director of Engineering - AI Inferences

Wekatest

U.S. Remote Remote permanent

Technical Leadership, Team Management, Inference Optimization, Framework Mastery, Deep Domain Expertise, Stack Knowledge, Backend Engineering, Infrastructure

WEKA is architecting a new approach to the enterprise data stack built for the age of reasoning. NeuralMesh by WEKA sets the standard for agentic AI data infrastructure with a cloud and AI-native softw...

March 26, 2026

Senior System Software Engineer – Embedded AI Inference

NVIDIA

Germany, Munich permanent

C++, CUDA, AI Inference, PyTorch, ML, GPU-Centric Performance Engineering, Automotive Systems, Linux Development, GPU Programming, TensorRT

NVIDIA is synonymous with innovation, boasting trailblazers who are shaping the world with their forward-thinking approaches. This is your chance to be part of a vibrant community that's redefining th...

March 20, 2026

Senior Applied Scientist, Causal Inference

LinkedIn3

Mountain View, CA, United States permanent

Data Science, Machine Learning, Causal Inference, Statistical Analysis, Experimental Design, Large-Scale Data Processing, Research, Model Development, Product Enhancement, Cross-Functional Collaboration

LinkedIn is the world's largest professional network, built to create economic opportunity for every member of the global workforce. Our products help people make powerful connections, discover excitin...

March 19, 2026

TL, Research Inference

OpenAI

San Francisco, California, United States permanent

Distributed Systems, Inference Pipelines, GPU Optimization, Profiling, Benchmarking, Low-Level Debugging, Observability, Correctness, GPU-Centric Performance Engineering, Multi-GPU Systems, Memory Behavior

About the Team The Foundations team focuses on how model behavior changes as we scale models, data, and compute. The team studies the interactions between model architecture, optimization, and traini...

March 19, 2026

Senior Software Engineer, Deep Learning Inference - Automotive Safety

NVIDIA

US, CA, Santa Clara permanent

C++, Deep Learning, TensorRT, Software Engineering, Automotive Safety, Systems Programming, Performance Optimization, Testing, Documentation, Benchmarking

Are you passionate about driving innovation in deep learning and eager to work on cutting-edge AI technology for safety-critical applications? Join NVIDIA's TensorRT team as a Senior Software Engineer...

March 18, 2026

(Contract) Senior Data Scientist, Platform Inference - MarTech DS Measurement

Airbnb

United States Remote permanent

Marketing Mix Modeling, Python, Data pipelines, Generative AI Integration, Productionalization, Geo-Based Measurement, Bayesian frameworks, Insight Communication

Airbnb was born in 2007 when two hosts welcomed three guests to their San Francisco home, and has since grown to over 5 million hosts who have welcomed over 2 billion guest arrivals in almost every co...

March 18, 2026

Engineering Manager, Inference Routing and Performance

Anthropic

San Francisco, CA | New York City, NY (San Francisco, CA) Remote permanent

Systems Design, Algorithm Development, Performance Optimization, Distributed Systems, Load Balancing, Quantitative Modeling, Incident Management, Team Leadership, Architecture Evaluation, System-Level Performance

About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickl...

March 18, 2026

Senior Software Engineer, AI Inference Systems

NVIDIA

Canada, Toronto permanent

Computer Science, Python, GPU Programming, CUDA, Performance Engineering, LLM Inference, vLLM, SGLang, Distributed Systems

We are seeking highly skilled and motivated software engineers to join us and build AI inference systems that serve large-scale models with extreme efficiency. You’ll architect and implement high-perf...

March 17, 2026

Manager, Large Language Model Inference

NVIDIA

US, CA, Santa Clara permanent

Leadership, Team Management, Software Engineering, C++, Python, GPU Architecture, CUDA Programming, LLM Inference, TensorRT, Production Software Development

At NVIDIA, we aren't just powering the AI revolution—we're accelerating it. The TensorRT inference platform is the backbone of modern AI, delivering the industry's fastest and most efficient deploymen...

March 17, 2026

Backend Engineer - Inference Services

Deepgram

Remote, California, United States Remote permanent

Backend, Software, Engineer, Inference, Services, Cloud, Scalable, Compute, Optimization, Debugging, Customization, Distributed

Company Overview Deepgram is the leading platform underpinning the emerging trillion-dollar Voice AI economy, providing real-time APIs for speech-to-text (STT), text-to-speech (TTS), and building pro...

March 14, 2026

Inference Technical Lead, On-Device Transformers

OpenAI

San Francisco, California, United States Remote permanent

Inference Systems, GPU Architecture, Model Deployment, Performance Optimization, Workload Monitoring, Team Leadership, CUDA Kernels, Compiler Development, NPU Design

About the Team The Future of Computing Research team is an applied research team in the Consumer Devices group focused on developing new methods and models to support our vision as we advance forward...

March 13, 2026

Sr. Software Engineer, Inference

Anthropic

London, UK Remote permanent

Distributed Systems, Machine Learning Systems, Load Balancing, Traffic Management, LLM Inference Optimization, Kubernetes, Cloud Infrastructure, Python, Rust, Request Routing

About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickl...

March 12, 2026

Staff Software Engineer, Inference

Anthropic

Dublin, IE Remote permanent

Distributed Systems, Performance Optimization, Machine Learning Systems, LLM Inference Optimization, Kubernetes, AWS, GCP, Python, Rust

About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickl...

March 11, 2026

Lead AI Inference Engineer QVAC (100% remote Worldwide)

Confidential

Remote job permanent

C++, Problem Solving Techniques, Job Schedule Optimization, Memory management, System engineering, Predictable Results, Technical ownership, Infra trust

Join Tether and Shape the Future of Digital Finance At Tether, we’re not just building products, we’re pioneering a global financial revolution. Our cutting-edge solutions empower businesses—from exc...

March 11, 2026

AI Inference Engineer QVAC (100% remote Worldwide)

Confidential

Remote job permanent

C++, Problem Solving Techniques, System Architecture, Performance Optimization, Memory Management, Production Deployment, English Communication, AI Inference

Join Tether and Shape the Future of Digital Finance At Tether, we’re not just building products, we’re pioneering a global financial revolution. Our cutting-edge solutions empower businesses—from exc...

March 11, 2026

Customer Support Engineer (Inference), India

Togetherai

India (Remote) Remote permanent

Customer Support, Technical Problem Solving, AI Expertise, GPU Clusters, Inference Services, Infrastructure Services, Kubernetes, Python, TypeScript, Cross-functional Collaboration, Fine-tuning Services

About the Role As a Customer Support Engineer at a pioneering AI company, you'll be the first line of defense to support customers as they build out training, fine tuning, and inference solutions wit...

March 10, 2026

Senior Software Engineer, Inference

Anthropic

London, UK Remote permanent

Distributed Systems, Machine Learning Systems, Load Balancing, Trade Routing, Traffic Management, LLM Inference Optimization, Kubernetes, Cloud Infrastructure, Python, Rust

About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickl...

March 10, 2026

Deep Learning Architect, LLM Inference - New College Grad 2026

NVIDIA

US, CA, Santa Clara permanent

Deep Learning, Inference, Game Performance, PyTorch, LLM, Performance Optimization, Microarchitecture, Software Development, Collaboration, Communication

We are now looking for a Deep Learning Architect, LLM Inference! NVIDIA is at the forefront of the generative AI revolution. The Inference Benchmarking (IB) team specifically focuses on inference ser...

March 9, 2026

AI Inference Performance Engineer

NVIDIA

US, CA, Santa Clara permanent

GPU Performance Engineering, deepset Cloud, TensorRT-LLM, vLLM, SGLang, Software Development, Python Programming, C++ Programming, Performance Optimization, Distributed Inference

We optimize and benchmark GenAI inference on NVIDIA's latest accelerators, defining the industry’s performance standards across language models, video generation, and speech workloads. We work directl...

March 9, 2026

Inference Optimization Architect, Speech AI

NVIDIA

India, Pune permanent

Inference Optimization, Model Compression, Benchmarking, Hardware Acceleration, Infrastructure Design, Cross-Platform Optimization, Tooling, Resource Management, Model Serving, GPU Profiling

Widely considered to be one of the technology world’s most desirable employers, NVIDIA is an industry leader with groundbreaking developments in High-Performance Computing, Artificial Intelligence and...

March 7, 2026

Engineering Manager, Cloud Inference AWS

Anthropic

San Francisco, CA | New York City, NY (San Francisco, CA, Seattle, WA) Hybrid permanent

AWS, Cloud, Inference, Kubernetes, Docker, Capacity Management, Capacity Planning, API Development, Load Balancing, Operations, Capacity Optimization, LLM Serving

About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickl...

March 5, 2026

AI Inference Performance Engineer - New College Grad 2026

NVIDIA

US, CA, Santa Clara permanent

Performance Optimization, Benchmarking, TensorRT-LLM, SGLang, vLLM, Deep Learning, Quantization, Scheduling, Memory Management, GPU Performance Engineering, Distributed Inference

We optimize and benchmark GenAI inference on NVIDIA's latest accelerators, defining the industry’s performance standards across language models, video generation, and speech workloads. We work directl...

March 5, 2026

Software Engineer, Inference Platform

Fluidstack

San Francisco, California, US permanent

Python, Go, Distributed Systems, Kubernetes, LLM Serving, Performance Tuning, Throughput Optimization, Incident Response, On-Call Rotation, Cost-Per-Token

About Fluidstack At Fluidstack, we’re building the infrastructure for abundant intelligence. We partner with top AI labs, governments, and enterprises - including Mistral, Poolside, Black Forest Labs...

March 5, 2026

Software Engineer, AI Inference / HPC

Topaz Labs

Dallas, TX permanent

Performance Optimization, Concurrency, Multithreading, Memory Management, Speed, Benchmarking, Reliability, API Architecture, Image Processing, C++, OpenCV

54,000 new photos are taken every second, and 600 hours of video are uploaded every minute. At Topaz Labs, we help over 1 million paying customers (including teams at Google, Nvidia, and NASA) maximiz...

March 5, 2026

Software Engineer (Inference Platform)

Isomorphic Labs

London (London) Remote permanent

Kubernetes, Machine Learning Models, Software Development, Production Support, User-Centric Solutions, Maintenance, Observability, Distributed Systems Programming

Isomorphic Labs is applying frontier AI to help unlock deeper scientific insights, faster breakthroughs, and life-changing medicines with an ambition to solve all disease. The future is coming. A fut...

March 4, 2026

Member of Technical Staff, Inference & RL Systems

Magic.dev

Archived San Francisco, California, USA permanent

Software Engineering, Distributed Systems, Inference Systems, GPU Optimization, Memory Management, Latency Optimization, Throughput Optimization, Fault Detection, Production Infrastructure, RL Systems

Magic’s mission is to build safe AGI that accelerates humanity’s progress on the world’s most important problems. We believe the most promising path to safe AGI lies in automating research and code ge...

February 28, 2026

Solutions Architect, Inference Deployments

NVIDIA

Archived US, CA, Santa Clara permanent

Solutions Architecture, Distributed Systems, Local AI, Kubernetes, GPU Performance Optimization, TensorRT-LLM, NVIDIA Tools, Triton Inference Server, GPU Orchestration, Model Optimization

We’re forming a team of innovators to roll out and enhance AI inference solutions at scale, demonstrating NVIDIA’s GPU technology and Kubernetes. As a Solutions Architect focused on inference, you’ll ...

February 27, 2026

Senior Deep Learning Architect, LLM Inference

NVIDIA

Archived US, CA, Santa Clara permanent

Deep Learning, Inference, GPU, Software Development, PyTorch, Profiling, Compiler Optimizations, LLM, Microarchitecture, Agentic technologies, Communication, Collaboration

We are now looking for a Senior Deep Learning Architect, LLM Inference! NVIDIA is at the forefront of the generative AI revolution. The Inference Benchmarking (IB) team specifically focuses on infere...

February 27, 2026

Inference Platform Engineer (LLM & Kubernetes)

N-iX

Archived Romania (Europe) Remote permanent

Python, Kubernetes, LLM, API Integration, Operations, Performance Optimization, Monitoring, Kubernetes Deployments, Helm Charts, Security, Compliance

N-iX is a global software development service company that helps businesses across the globe create next-generation software products. Founded in 2002, we unite 2,400+ tech-savvy professionals across ...

February 26, 2026

Inference Platform Engineer (LLM & Kubernetes)

N-iX

Archived Bulgaria (Europe) Remote permanent

Python, Kubernetes, LLM, API Integration, Operations, Performance, Monitoring, Troubleshooting, Kubernetes Deployments, Security

N-iX is a global software development service company that helps businesses across the globe create next-generation software products. Founded in 2002, we unite 2,400+ tech-savvy professionals across ...

February 26, 2026

Inference Platform Engineer (LLM & Kubernetes)

N-iX

Archived Poland (Europe) Remote permanent

Python, Kubernetes, LLM, API Integration, Operations, Platform Reliability, Monitoring, Troubleshooting, Kubernetes Deployments, Security

N-iX is a global software development service company that helps businesses across the globe create next-generation software products. Founded in 2002, we unite 2,400+ tech-savvy professionals across ...

February 26, 2026

Senior Software Engineer, Quantized Inference

NVIDIA

Archived 2 Locations permanent

Python, C++, Software Engineering, PyTorch, Model Compression, Code Reviews, ML Accelerators, Inference Serving, Numerical Debugging

We are now looking for a Senior Software Engineer for Quantized Inference! NVIDIA is seeking software engineers to accelerate the discovery and deployment of efficient inference recipes for LLMs. A re...

February 26, 2026

Senior Deep Learning Software Engineer, Inference and Model Optimization

NVIDIA

Archived 2 Locations permanent

Deep Learning, Generative AI, PyTorch, CUDA, Model Optimization, LLM Inference, Software Engineering, Automated Deployment, Research, TRT Model Optimizer

NVIDIA is at the forefront of the generative AI revolution! The Algorithmic Model Optimization Team specifically focuses on optimizing generative AI models such as large language models (LLM) and diff...

February 25, 2026

Senior Machine Learning Engineer, Quantized Inference

NVIDIA

Archived 2 Locations permanent

Python, PyTorch, quantization, sparsity, model compression, experiment design, LLM evaluation, LLM frameworks, code reviews, numerics debugging

We are now looking for a Senior Machine Learning Engineer for Quantized Inference! NVIDIA is seeking machine learning engineers to accelerate the discovery and deployment of efficient inference recipe...

February 25, 2026

Senior AI Inference Compiler Engineer

NVIDIA

Archived 5 Locations permanent

Computer Science, Compiler Technologies, Python, Deep Learning Models, Performance Analysis, GPU Architecture, CUDA, LLM Inference

NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited moder...

February 25, 2026

Senior Software Engineer, AI Inference Systems

NVIDIA

Archived US, CA, Santa Clara permanent

Computer Science, Python, CUDA, GPU Programming, Performance Engineering, LLM Retrieval, vLLM, SGLang, Docker, Kubernetes, ML Compilers

We are seeking highly skilled and motivated software engineers to join us and build AI inference systems that serve large-scale models with extreme efficiency. You’ll architect and implement high-perf...

February 24, 2026

Senior Compiler Engineer, AI Inference Performance

NVIDIA

Archived 6 Locations permanent

Computer Science, Efficient Execution, Deep Learning, GPU Architecture, Python Programming, Performance Analysis, API Design, Parallel Computing, Collaboration

NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited moder...

February 24, 2026

Principal Software Engineer - AI Inference

NVIDIA

Archived 2 Locations permanent

Systems Engineering, LLM Benchmarks, vLLM, SGLang, GPU Workloads, CUDA, Rust, C++, Python, Distributed Systems, Concurrency, Profiling

NVIDIA is the platform for every new AI-powered application. We seek a Principal Software Engineer - AI Inference to advance open-source LLM serving. This role involves contributing to upstream infere...

February 23, 2026

Senior Compiler Engineer, AI Inference Platforms

NVIDIA

Archived 5 Locations permanent

Computer Science, Deep Learning, Rate Optimization, GPU Architecture, Performance Analysis, Python Programming, CUDA, Deep Learning Models

NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited moder...

February 23, 2026

Staff Software Engineer - Inference & Performance

Runware

Archived United Kingdom Remote permanent

Software Engineering, Systems Design, Performance Engineering, Distributed Systems, GPU Optimization, Latency Optimization, Throughput Optimization, Reliability Engineering, Architecture Design, Budgets and Forecasts

We’re looking for a Staff Engineer to take technical ownership of latency, throughput, and reliability across Runware’s AI inference platform. This is a senior technical leadership role for someone w...

January 28, 2026

Embedded Computer Vision Engineer (Edge Inference)

Rapsodo

Archived Singapore, South West, Singapore permanent

Embedded Software Development, Linux Systems, C++ Programming, Rust Programming, Computer Vision, Deep Learning Techniques, Edge Vision Systems, Quantization, Model Deployment, Linux Architecture, Performance Profiling, Debugging

Embedded Computer Vision Engineer (Edge Inference) Overview We are building computer-vision capabilities on Linux-based edge devices. This role owns the embedded software that takes models from “...

December 17, 2025

Inference Runtime, Engineering Manager

OpenAI

Archived San Francisco, California, United States permanent

Leadership, Team Leadership, Distributed Systems, Model Architecture, Co-development, Production Environment, Outcome-Oriented, Performance Optimization, GPU Utilization, Code Optimization

About the Team Our Inference team brings OpenAI’s most capable research and technology to the world through our products. We empower consumers, enterprise and developers alike to use and access our s...

February 19, 2026

Senior Software Engineer, Inference Platform

AION

Archived Bengaluru, Karnataka, India Hybrid permanent

Variational Inference, Core HR Processes, AI Gateway, Agent Orchestrator, Correlation Engines, Distributed Systems, Golang, Containerization, Performance Optimization, Cost Efficiency, Scalability, Autoscaler

About AION AION is building an interoperable AI cloud platform by transforming the future of high-performance computing (HPC) through its decentralized AI cloud. Purpose-built for bare-metal performa...

December 17, 2025
