Jobs

Software Engineer - Inference and Accelerators, Robot Software

Wayve

London (London, United Kingdom) Remote permanent

Software Development, C++, Fault Tolerance, Diagnostics, Observability, Debugging, Reliability

At Wayve we're committed to creating a diverse, fair and respectful culture that is inclusive of everyone based on their unique skills and perspectives, and regardless of sex, race, religion or belief...

January 27, 2026

Site Reliability Engineer (SRE) — AI Training & Inference Infrastructure

Confidential

Not specified permanent

Kubernetes, GPU Systems, Observability, Incident Response, Reliability Engineering, SLIs/SLOs, Monitoring, Performance Optimization, Developer Experience

About STACK STACK builds software that helps teams plan, build, and operate with clarity and speed. We’re investing in an in-house AI team to train and run models that meaningfully improve our produc...

January 30, 2026

Product Manager - BioNeMo Inference

NVIDIA

US, NY, New York permanent

Product Management, AI Strategy, Full-Stack Inference, Digital Biology, NVIDIA Inference Microservices (NIMs), Blueprints, Customer Communication, Roadmap Prioritization, Sales & Marketing Collaboration, Performance Monitoring, Industry Knowledge, Communication Skills

NVIDIA is transforming healthcare with AI to power the next generation of innovation in biology and life sciences. The BioNeMo platform is growing rapidly and is becoming the de facto platform for AI-dr...

January 30, 2026

Senior Software Engineer I, Inference

CoreWeave

Sunnyvale, CA / Bellevue, WA (Bellevue, WA, Sunnyvale, CA) Remote permanent

Kubernetes, Python, Go, Kubernetes-native inference platform, SLIs/SLOs, Metrics-driven improvements, Incident Management, CI/CD, Observability, Inference Internals, Latency Optimization

CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. T...

January 28, 2026

Inference Optimization Engineer

BentoML

San Mateo, California, United States Remote permanent

CUDA, Profiling, Optimization, Benchmarking, Resource Efficiency, Serving Features, Knowledge Sharing, Profiling Tools, Trusted Voice

About BentoML BentoML is a leading inference platform provider that helps AI teams run large language models and other generative AI workloads at scale. With support from investors such as DCM, enter...

July 15, 2025

Applied Scientist - AI Inference (Agentic AI startup)

NinjaTech AI

Sydney, NSW, Australia permanent

Applied Science, AI Inference, Distributed Systems, Model Serving, Inference Optimization, Quantization, Hardware Acceleration, Algorithm Optimization, Benchmarking, LLM Deployment

We invite you to join NinjaTech AI as an Applied Scientist specialized in AI inference and distributed systems to help optimize and scale our AI models for production environments. You will work at t...

September 14, 2025

Fullstack Engineer - Frontend Focus

Inference

San Francisco, California, United States permanent

5+ years building production React applications; AuthN/AuthZ design (OIDC, JWT); Experience with Tanstack / Next.js; Data-viz libraries (Recharts, Visx, D3); tRPC experience; Familiarity with GPU or ML tooling dashboards; Dev-ops chops: CI/CD, Docker, Terraform

Inference.net is hiring a Senior Full-Stack (Frontend-Focused) Engineer Help us build beautiful, performant web experiences that give users super-powers over our globally distributed LLM inference pl...

July 23, 2025

Machine Learning Researcher

Inference

San Francisco, California, United States permanent

Machine Learning, Research, Model Architectures, Inference Time Scaling, Learning Methods, Post-Training Techniques, Distillation Pipeline, Model Training, Experimentation, Benchmarks

Help us push the boundaries of what's possible in LLM post-training. If you love training models, exploring new architectures, running experiments, and turning research insights into products that shi...

January 5, 2026

Senior Software Engineer - Model Performance

Inference

San Francisco, California, United States permanent

ML Systems, Inference Optimization, GPU Programming, Python, C++, LLM Inference Frameworks, CUDA Kernels, GPU Architecture, LLM Optimization Techniques

Help us make inference blazingly fast. If you love squeezing every last drop of performance out of GPUs, diving deep into CUDA kernels, and turning optimization techniques into production systems, we'...

January 21, 2026

Applied Machine Learning Engineer

Inference

San Francisco, California, United States permanent

Applied Machine Learning, Model Training, Data Processing, Data Pipelines, Data Visualization, Model Evaluation, Model Optimization, Research Application, Production Engineering, Collaboration

Help us build the systems that train specialized AI models for the fastest-growing companies in the world. If you love taking cutting-edge ML techniques and turning them into products that ship, we'd ...

January 5, 2026

Filmmaker / Storyteller

Inference

San Francisco, California, United States permanent

Content Creation, Storytelling, Technical Production, Narrative Development, Platform Strategy, Creative Experimentation, Brand Storytelling, Team Building

Filmmaker / Storyteller Inference.net is seeking a Filmmaker / Storyteller to join our team and help define the narrative of building the world's largest distributed GPU cluster. This role combines c...

May 29, 2025

Staff Data Scientist, Platform (Inference/Payments)

Airbnb

United States Remote permanent

Data Science, Causal Inference, Statistical Analysis, Model Development, Experiment Design, Predictive Modeling, ML/AI, Optimization, Business Strategy, Communication

Airbnb was born in 2007 when two hosts welcomed three guests to their San Francisco home, and has since grown to over 5 million hosts who have welcomed over 2 billion guest arrivals in almost every co...

January 29, 2026

Senior Data Scientist - Inference, Global Markets

Airbnb

China Remote permanent

Causal Inference, Experimentation, SQL, Python, Data Analysis, Statistical Modeling, Cross-functional Collaboration, Communication Skills, Business Problem Solving

Airbnb was born in 2007 when two hosts welcomed three guests to their San Francisco home, and has since grown to over 5 million hosts who have welcomed over 2 billion guest arrivals in almost every co...

January 22, 2026

AI Systems & Inference Frameworks Engineer

Adaption

San Francisco, California, United States Remote permanent

Inference & Optimization, Inference Systems, Inference Frameworks, Inference Optimization, Hardware Software Design, Performance Optimization, GPU Systems, Latency Optimization, Resource Efficiency, Transformer Inference

About us Most AI is frozen in place - it doesn't adapt to the world. We think that's backwards. Our mandate is to build efficient intelligence that evolves in real-time. Our vision is AI systems that...

January 13, 2026

Software Engineering – Inference Engineer

Virtue AI

San Francisco, California, United States permanent

Inference APIs, Load Balancing, Routing Logic, SGLang, vLLM, GPU Behavior, Memory Limits, Docker, Prometheus Metrics, Structured Logging, Autoscaling, GPU Scheduling

Location: San Francisco, CA (Onsite | Remote) About Virtue AI Virtue AI sets the standard for advanced AI security platforms. Built on decades of foundational and award-winning research in AI securi...

January 13, 2026

Research Scientist/Engineer - Post-training, Inference, & Safety and Security

Virtue AI

San Francisco, California, United States permanent

Machine Learning, Programming, LLM, Inference, LLM Red-teaming, LLM Guardrails, Model Evaluation, Model Optimization, Inference & Optimization, LLM Agents, Docker, Kubernetes

About Virtue AI Virtue AI sets the standard for advanced AI security platforms. Built on decades of foundational and award-winning research in AI security, its AI-native architecture unifies automate...

September 9, 2025

Machine Learning Engineer — Inference Optimization

Featherlessai

Remote (world) permanent

Machine Learning Optimization, Inference Performance, GPU/CPU Profiling, Quantization, KV-cache Optimization, Speculative Decoding, Model Pruning, Inference Serving Systems, Benchmarking, Reliability

About the Role We’re looking for a Machine Learning Engineer to own and push the limits of model inference performance at scale. You’ll work at the intersection of research and production—turning cut...

January 22, 2026

AI Researcher — Inference Optimization

Featherlessai

Remote (world) permanent

Machine Learning, Deep Learning, Inference Optimization, Python, PyTorch, Triton, TensorRT, vLLM, ONNX Runtime, Hardware-Aware Optimization

Role Overview We are seeking an AI Researcher with deep experience in inference optimization to design, evaluate, and deploy high-performance inference systems for large-scale machine learning models...

January 23, 2026

Senior / Principal Inference Engineer - ML Platform

Roblox

San Mateo, CA, United States (San Mateo, CA) Remote permanent

System Design, Distributed Systems, Performance Optimization, Debugging, ML Model Inference, Triton Inference Server, TensorRT, KServe, Cross-functional Collaboration, Reliability

Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences– all created by our global community of developers an...

January 27, 2026

Lead ML Inference Engineer, Advertising

Roku

Austin, Texas Remote permanent

Machine Learning, Inference, System Architecture, Performance Optimization, High-Performance Computing, Distributed Systems, Monitoring & Observability, ML Frameworks, Hardware Acceleration, Team Leadership

Teamwork makes the stream work. Roku is changing how the world watches TV Roku is the #1 TV streaming platform in the U.S., Canada, and Mexico, and we've set our sights on powering every television ...

January 28, 2026

Member of Technical Staff - Edge Inference Engineer

Liquid AI

United States Remote permanent

Systems Programming, C++, Embedded Systems, ML Fundamentals, Hardware Architecture, Quantization, Optimization, Inference Kernels, Open Source Contribution, Profiling

About Liquid AI Spun out of MIT CSAIL, we build general-purpose AI systems that run efficiently across deployment targets, from data center accelerators to on-device hardware, ensuring low latency, m...

January 25, 2026

Member of technical staff (Inference)

Hcompany

Paris, France permanent

Strong communication and presentation skills, Eager to explore new challenges

About H: H exists to push the boundaries of superintelligence with agentic AI. By automating complex, multi-step tasks typically performed by humans, AI agents will help unlock full human potential. ...

November 13, 2025

Machine Learning Intern - Dynamic KV-Cache Modeling for Efficient LLM Inference

d-Matrix

Santa Clara, CA, United States Remote internship

PyTorch, Deep Learning, Model Optimization, Memory Management, CUDA Programming, Hardware-accelerated Computation, LLM Inference, KV-Cache, torch.compile, Analytical Problem Solving

At d-Matrix, we are focused on unleashing the potential of generative AI to power the transformation of technology. We are at the forefront of software and hardware innovation, pushing the boundaries ...

January 28, 2026

LLM Inference Engineer

Periodic Labs

Menlo Park, California, USA permanent

Optimization, Performance, TensorRT-LLM, vLLM, Distributed Inference, GPU Utilization, Latency, Reinforcement Learning

About Periodic Labs We are an AI + physical sciences lab building state of the art models to make novel scientific discoveries. We are well funded and growing rapidly. Team members are owners who ide...

September 24, 2025

Backend / ML-Ops Engineer — Speech Model Deployment & Inference Optimization

OutcomesAI

Bengaluru permanent

Triton, TensorRT, Docker, Kubernetes, GPU, CI/CD, Autoscaling, Observability, Model Deployment, GPU Optimization

OutcomesAI is a healthcare technology company building an AI-enabled nursing platform designed to augment clinical teams, automate routine workflows, and safely scale nursing capacity. Our solution co...

November 7, 2025

Inference Engineer

Cartesia

San Francisco, California, United States permanent

Strong engineering skills; Technical leadership; Experience implementing state-of-the-art ML models; Preferable experience working in CUDA, Triton

About Cartesia Our mission is to build the next generation of AI: ubiquitous, interactive intelligence that runs wherever you are. Today, not even the best models can continuously process and reason ...

December 12, 2024

Software Engineer (Inference Engine)

FuriosaAI

Seoul, Seoul, South Korea permanent

Computer Science, C++, Rust, Deep Learning, LLM, GPU, Performance Optimization, Problem Solving, Communication, Collaboration

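About the job: The Software Engineer (Inference Engine) develops and optimizes a high-performance inference engine for large language models and multimodal models running on FuriosaAI NPUs. The role proactively researches the latest inference optimization techniques and applies them to the engine, and works closely with the compiler and hardware teams to advance the engine's performance...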

October 20, 2025

Applied AI Inference Engineer

Baseten

San Francisco, California, United States permanent

Python, Software Development, Product Management, Technical Customer Success, Pre-sales Solution Engineering, Docker, Production Deployment

ABOUT BASETEN Baseten powers mission-critical inference for the world's most dynamic AI companies, like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma and Writer. By uniting applied AI research, ...

November 4, 2025

Sr. Software Engineer, ML Edge Inference

Serve Robotics

San Francisco, California, United States permanent

Software, Engineering, Machine Learning, Edge Inference, CUDA, Jetson, Optimization, Quantization, Pruning, Benchmarking, Collaboration, QA

At Serve Robotics, we’re reimagining how things move in cities. Our personable sidewalk robot is our vision for the future. It’s designed to take deliveries away from congested streets, make deliverie...

October 23, 2025

Architecture Intern - Inference

Etched

San Jose, CA, United States internship

Computer Science, C++, Rust, Linux internals, Accelerator Architectures, Compilers, High-speed Interconnects, PyTorch, JAX, Transformer Model Architectures, Inference Serving Stacks, Low-latency Applications

Architecture Intern - Inference Location: San Jose, CA Team: Architecture About Etched Etched is building the world’s first AI inference system purpose-built for transformers - delivering over 10x h...

December 8, 2025

Head of Inference Kernels

Etched

San Jose, CA, United States permanent

Inference Performance, Inference Kernels, Model Mapping, Hardware-Software Co-design, Team Leadership, Algorithmic Innovation, Scalable Team Management, Cross-Functional Alignment, State-of-the-Art Model Optimization, Production Ready Implementations

About Etched Etched is building the world’s first AI inference system purpose-built for transformers - delivering over 10x higher performance and dramatically lower cost and latency than a B200. With...

November 3, 2025

Inference Software Engineer

Etched

San Jose, California, United States permanent

C++, Rust, Performance Optimization, Distributed Systems, PyTorch, Transformer Architectures, SIMD Optimizations, Debugging Tools, Linux Internals, High-Speed Interconnects

About Etched Etched is building the world’s first AI inference system purpose-built for transformers - delivering over 10x higher performance and dramatically lower cost and latency than a B200. With...

June 17, 2025

Research Scientist, Latent State Inference for World Models

TRI

Los Altos, CA Hybrid permanent

Research, Latent State Inference, Sensor Data Processing, World Models, Policy Evaluation, Perception Systems, Reinforcement Learning, Multimodal Data Fusion, Temporal Reasoning, Interpretability

At Toyota Research Institute (TRI), we’re on a mission to improve the quality of human life. We’re developing new tools and capabilities to amplify the human experience. To lead this transformative sh...

August 4, 2025

Member of Technical Staff, Training and Inference

Boson AI

Toronto permanent

CUDA, Triton, PyTorch, distributed optimization, deep learning architectures, GPU performance optimization, floating point formats, sparsity, systems level optimization, distributed training

Boson AI is an early-stage startup building large audio models for everyone to enjoy and use. Our founders (Alex Smola, Mu Li), and a team of Deep Learning, Optimization, NLP, and Statistics scientists...

September 22, 2025

Member of Technical Staff, Training and Inference

Boson AI

Santa Clara HQ permanent

CUDA, Triton, PyTorch, distributed optimization, deep learning architectures, kernel implementation, performance optimization, distributed training

Boson AI is an early-stage startup building large audio models for everyone to enjoy and use. Our founders (Alex Smola, Mu Li), and a team of Deep Learning, Optimization, NLP, and Statistics scientists...

September 17, 2025

Software Engineer, ML Inference, Simulation Infrastructure

Waymo

Mountain View, CA, USA; San Francisco, CA, USA (Mountain View (US-MTV-EMF680), San Francisco (US-SFO-MKT555)) Remote permanent

C++, Golang, Distributed Systems, ML Inference, Simulation Infrastructure, Model Deployment, Performance Optimization, Fault Tolerance, Large-scale Systems, Software Engineering

Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. Since its start as the Google Self-Driving Car Project in 2009, Waymo has focused on building ...

January 14, 2026

Software Engineer, Bulk/Interactive Inference

Waymo

Mountain View, CA, USA (Mountain View (US-MTV-EMF680)) Remote permanent

Software Engineering, C++, Distributed Systems, Inference Platform, Model Hosting, Data Pipelines, Scalability, High Throughput, Low Latency, ML Operations

Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. Since its start as the Google Self-Driving Car Project in 2009, Waymo has focused on building ...

January 14, 2026

Staff Product Manager, Managed Inference (SF/Sunnyvale/New York)

Crusoe

San Francisco, California, USA permanent

Technical Product Management, Cloud Infrastructure, Machine Learning, Inference Services, Cloud Platforms, Product Roadmap, Technical Requirements, Communication Skills, Cloud Computing, AI/ML Cloud Solutions

Crusoe's mission is to accelerate the abundance of energy and intelligence. We’re crafting the engine that powers a world where people can create ambitiously with AI — without sacrificing scale, speed...

December 24, 2025

Sales Manager - AI Inference & Enterprise Cloud

Paytm

Bangalore, Karnataka permanent

Enterprise Sales, Cloud, AI, Inference, GPU Infrastructure, LLMs, Data Platforms, Customer Relationship Management, Contract Negotiation, Value/Benefit Communication

About us: Paytm is India's payment Super App, offering consumers and merchants the most comprehensive payment services. A pioneer of the mobile QR payments revolution in India, Paytm is today India's largest pa...

December 19, 2025

Senior AI Inference Engineer (llama.cpp specialist) - 100% Remote

Tether Operations Limited

Location not specified Remote

C++, AI Inference, Edge Devices, Optimization, Runtime, Stability, Collaboration, Research, English Communication

Join Tether and Shape the Future of Digital Finance At Tether, we’re not just building products, we’re pioneering a global financial revolution. Our cutting-edge solutions empower businesses—from exc...

January 22, 2026

Lead AI Inference Engineer

Tether Operations Limited

Location not specified Remote

AI Systems, Machine Learning, Edge Computing, C++, JavaScript, Collaboration, Team Leadership, Production Systems, Edge AI

Join Tether and Shape the Future of Digital Finance At Tether, we’re not just building products, we’re pioneering a global financial revolution. Our cutting-edge solutions empower businesses—from exc...

January 22, 2026

Senior Deep Learning Software Engineer, Inference and Model Optimization

NVIDIA

2 Locations permanent

Deep Learning, PyTorch, HuggingFace, CUDA, TRT, TRT-LLM, Triton, Automated Deployment, Model Optimization, Inference Efficiency, Software Architecture, Software Engineering

NVIDIA is at the forefront of the generative AI revolution! The Algorithmic Model Optimization Team specifically focuses on optimizing generative AI models such as large language models (LLM) and diff...

January 21, 2026

Software Engineer, ML (Training and Inference)

Isomorphic Labs

London (London) Remote permanent

Machine Learning, Python, JAX, PyTorch, TensorFlow, Distributed Computing, Model Optimization, Reproducibility, Scalability, Research Environment

Isomorphic Labs is applying frontier AI to help unlock deeper scientific insights, faster breakthroughs, and life-changing medicines with an ambition to solve all disease. The future is coming. A fut...

January 21, 2026

Product Manager MBA Intern, AI Platform Inference - Summer 2026

NVIDIA

US, CA, Santa Clara permanent

Product Management, AI Inference, GenAI, Machine Learning, Software Development, Performance Optimization, Developer Products, Product Strategy, Go-To-Market, Communication

Our work at NVIDIA is dedicated towards a computing model focused on visual and AI computing. For two decades, NVIDIA has pioneered visual computing, the art and science of computer graphics, with our...

January 21, 2026

Inference Engineering Manager

Perplexity

San Francisco, California, United States permanent

Python, PyTorch, Rust, C++, Kubernetes, APIs, Inference Infrastructure, Model Inference, Reliability, Observability, Incident Response, Batching

ABOUT THE ROLE We are looking for an Inference Engineering Manager to lead our AI Inference team. This is a unique opportunity to build and scale the infrastructure that powers Perplexity's products ...

January 18, 2026

Lead AI Inference Engineer

Confidential

Remote job permanent

AI Inference, Machine Learning, Edge Devices, C++, JavaScript, Cross-functional Collaboration, Production-ready Systems

Join Tether and Shape the Future of Digital Finance At Tether, we’re not just building products, we’re pioneering a global financial revolution. Our cutting-edge solutions empower businesses—from exc...

January 30, 2026

Lead AI Inference Engineer

Confidential

Remote job permanent

AI, Inference, Machine Learning, Edge Devices, Llama.cpp, GGML, ONNX, Collaboration, Research, Product Development

Join Tether and Shape the Future of Digital Finance At Tether, we’re not just building products, we’re pioneering a global financial revolution. Our cutting-edge solutions empower businesses—from exc...

January 30, 2026

AI Software Engineer, LLM Inference Performance Analysis - New College Grad 2026

NVIDIA

4 Locations permanent

Computer Science, C++, Python, Deep Learning, LLM Inference, Compiler Optimization, Kernel-Level Optimization, Performance Analysis, CUDA Programming

NVIDIA is at the forefront of the generative AI revolution. We are looking for a Software Engineer, Performance Analysis, and Optimization for LLM Inference, to join our performance engineering team. ...

January 14, 2026

Lead AI Inference Engineer

Confidential

Remote job permanent

AI Inference, Machine Learning, Edge Computing, C++, JavaScript, Collaboration, Project Management

Join Tether and Shape the Future of Digital Finance At Tether, we’re not just building products, we’re pioneering a global financial revolution. Our cutting-edge solutions empower businesses—from exc...

January 30, 2026

Lead AI Inference Engineer

Confidential

Remote job permanent

Machine Learning, Edge Devices, C++, JavaScript, Llama.cpp, ggml, ONNX, Collaboration, Research, Production

Join Tether and Shape the Future of Digital Finance At Tether, we’re not just building products, we’re pioneering a global financial revolution. Our cutting-edge solutions empower businesses—from exc...

January 30, 2026
