MisuJob - AI Job Search Platform

Jobs

Browse 250+ jobs updated daily

Latest Job Openings

London (London, United Kingdom) Remote permanent
Software Development | C++ | Fault Tolerance | Diagnostics | Observability | Debugging | Reliability

At Wayve we're committed to creating a diverse, fair and respectful culture that is inclusive of everyone based on their unique skills and perspectives, and regardless of sex, race, religion or belief...

January 27, 2026

Not specified permanent
Kubernetes | GPU Systems | Observability | Incident Response | Reliability Engineering | SLIs/SLOs | Monitoring | Performance Optimization | Developer Experience

About STACK STACK builds software that helps teams plan, build, and operate with clarity and speed. We’re investing in an in-house AI team to train and run models that meaningfully improve our produc...

January 30, 2026

US, NY, New York permanent
Product Management | AI Strategy | Full-Stack Inference | Digital Biology | NVIDIA Inference Microservices (NIMs) | Blueprints | Customer Communication | Roadmap Prioritization | Sales & Marketing Collaboration | Performance Monitoring | Industry Knowledge | Communication Skills

NVIDIA is transforming healthcare with AI to power the next generation of innovation in Biology and Life Sciences. BioNeMo platform is rapidly growing and it is becoming the de facto platform for AI-dr...

January 30, 2026

Sunnyvale, CA / Bellevue, WA (Bellevue, WA, Sunnyvale, CA) Remote permanent
Kubernetes | Python | Go | Kubernetes-native inference platform | SLIs/SLOs | Metrics-driven improvements | Incident Management | CI/CD | Observability | Inference Internals | Latency Optimization

CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. T...

January 28, 2026

San Mateo, California, United States Remote permanent
CUDA | Profiling | Optimization | Benchmarking | Resource Efficiency | Serving Features | Knowledge Sharing | Profiling Tools | Trusted Voice

About BentoML BentoML is a leading inference platform provider that helps AI teams run large language models and other generative AI workloads at scale. With support from investors such as DCM, enter...

July 15, 2025

Sydney, NSW, Australia permanent
Applied Science | AI Inference | Distributed Systems | Model Serving | Inference Optimization | Quantization | Hardware Acceleration | Algorithm Optimization | Benchmarking | LLM Deployment

We invite you to join NinjaTech AI as an Applied Scientist specialized in AI inference and distributed systems to help optimize and scale our AI models for production environments. You will work at t...

September 14, 2025

San Francisco, California, United States permanent
5+ years building production React applications | AuthN/AuthZ design (OIDC, JWT) | Experience with Tanstack / Next.js | Data-viz libraries (Recharts, Visx, D3) | tRPC experience | Familiarity with GPU or ML tooling dashboards | Dev-ops chops: CI/CD, Docker, Terraform

Inference.net is hiring a Senior Full-Stack (Frontend-Focused) Engineer Help us build beautiful, performant web experiences that give users super-powers over our globally distributed LLM inference pl...

July 23, 2025

San Francisco, California, United States permanent
Machine Learning | Research | Model Architectures | Inference Time Scaling | Learning Methods | Post-Training Techniques | Distillation Pipeline | Model Training | Experimentation | Benchmarks

Help us push the boundaries of what's possible in LLM post-training. If you love training models, exploring new architectures, running experiments, and turning research insights into products that shi...

January 5, 2026

San Francisco, California, United States permanent
ML Systems | Inference Optimization | GPU Programming | Python | C++ | LLM Inference Frameworks | CUDA Kernels | GPU Architecture | LLM Optimization Techniques

Help us make inference blazingly fast. If you love squeezing every last drop of performance out of GPUs, diving deep into CUDA kernels, and turning optimization techniques into production systems, we'...

January 21, 2026

San Francisco, California, United States permanent
Applied Machine Learning | Model Training | Data Processing | Data Pipelines | Data Visualization | Model Evaluation | Model Optimization | Research Application | Production Engineering | Collaboration

Help us build the systems that train specialized AI models for the fastest-growing companies in the world. If you love taking cutting-edge ML techniques and turning them into products that ship, we'd ...

January 5, 2026

San Francisco, California, United States permanent
Content Creation | Storytelling | Technical Production | Narrative Development | Platform Strategy | Creative Experimentation | Brand Storytelling | Team Building

Filmmaker / Storyteller Inference.net is seeking a Filmmaker / Storyteller to join our team and help define the narrative of building the world's largest distributed GPU cluster. This role combines c...

May 29, 2025

United States Remote permanent
Data Science | Causal Inference | Statistical Analysis | Model Development | Experiment Design | Predictive Modeling | ML/AI | Optimization | Business Strategy | Communication

Airbnb was born in 2007 when two hosts welcomed three guests to their San Francisco home, and has since grown to over 5 million hosts who have welcomed over 2 billion guest arrivals in almost every co...

January 29, 2026

China Remote permanent
Causal Inference | Experimentation | SQL | Python | Data Analysis | Statistical Modeling | Cross-functional Collaboration | Communication Skills | Business Problem Solving

Airbnb was born in 2007 when two hosts welcomed three guests to their San Francisco home, and has since grown to over 5 million hosts who have welcomed over 2 billion guest arrivals in almost every co...

January 22, 2026

San Francisco, California, United States Remote permanent
Inference & Optimization | Inference Systems | Inference Frameworks | Inference Optimization | Hardware Software Design | Performance Optimization | GPU Systems | Latency Optimization | Resource Efficiency | Transformer Inference

About us Most AI is frozen in place - it doesn't adapt to the world. We think that's backwards. Our mandate is to build efficient intelligence that evolves in real-time. Our vision is AI systems that...

January 13, 2026

San Francisco, California, United States permanent
Inference APIs | Load Balancing | Routing Logic | SGLang | vLLM | GPU Behavior | Memory Limits | Docker | Prometheus Metrics | Structured Logging | Autoscaling | GPU Scheduling

Location: San Francisco, CA (Onsite | Remote) About Virtue AI Virtue AI sets the standard for advanced AI security platforms. Built on decades of foundational and award-winning research in AI securi...

January 13, 2026

San Francisco, California, United States permanent
Machine Learning | Programming | LLM | Inference | LLM Red-teaming | LLM Guardrails | Model Evaluation | Model Optimization | Inference & Optimization | LLM Agents | Docker | Kubernetes

About Virtue AI Virtue AI sets the standard for advanced AI security platforms. Built on decades of foundational and award-winning research in AI security, its AI-native architecture unifies automate...

September 9, 2025

Remote (world) permanent
Machine Learning Optimization | Inference Performance | GPU/CPU Profiling | Quantization | KV-cache Optimization | Speculative Decoding | Model Pruning | Inference Serving Systems | Benchmarking | Reliability

About the Role We’re looking for a Machine Learning Engineer to own and push the limits of model inference performance at scale. You’ll work at the intersection of research and production—turning cut...

January 22, 2026

Remote (world) permanent
Machine Learning | Deep Learning | Inference Optimization | Python | PyTorch | Triton | TensorRT | vLLM | ONNX Runtime | Hardware-Aware Optimization

Role Overview We are seeking an AI Researcher with deep experience in inference optimization to design, evaluate, and deploy high-performance inference systems for large-scale machine learning models...

January 23, 2026

San Mateo, CA, United States (San Mateo, CA) Remote permanent
System Design | Distributed Systems | Performance Optimization | Debugging | ML Model Inference | Triton Inference Server | TensorRT | KServe | Cross-functional Collaboration | Reliability

Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences – all created by our global community of developers an...

January 27, 2026

Austin, Texas Remote permanent
Machine Learning | Inference | System Architecture | Performance Optimization | High-Performance Computing | Distributed Systems | Monitoring & Observability | ML Frameworks | Hardware Acceleration | Team Leadership

Teamwork makes the stream work. Roku is changing how the world watches TV Roku is the #1 TV streaming platform in the U.S., Canada, and Mexico, and we've set our sights on powering every television ...

January 28, 2026

United States Remote permanent
Systems Programming | C++ | Embedded Systems | ML Fundamentals | Hardware Architecture | Quantization | Optimization | Inference Kernels | Open Source Contribution | Profiling

About Liquid AI Spun out of MIT CSAIL, we build general-purpose AI systems that run efficiently across deployment targets, from data center accelerators to on-device hardware, ensuring low latency, m...

January 25, 2026

Paris, France permanent
Strong communication and presentation skills | Eager to explore new challenges

About H: H exists to push the boundaries of superintelligence with agentic AI. By automating complex, multi-step tasks typically performed by humans, AI agents will help unlock full human potential. ...

November 13, 2025

Santa Clara, CA, United States Remote internship
PyTorch | Deep Learning | Model Optimization | Memory Management | CUDA Programming | Hardware-accelerated Computation | LLM Inference | KV-Cache | torch.compile | Analytical Problem Solving

At d-Matrix, we are focused on unleashing the potential of generative AI to power the transformation of technology. We are at the forefront of software and hardware innovation, pushing the boundaries ...

January 28, 2026

LLM Inference Engineer

Periodic Labs

Menlo Park, California, USA permanent
Optimization | Performance | TensorRT-LLM | vLLM | Distributed Inference | GPU Utilization | Latency | Reinforcement Learning

About Periodic Labs We are an AI + physical sciences lab building state of the art models to make novel scientific discoveries. We are well funded and growing rapidly. Team members are owners who ide...

September 24, 2025

San Francisco, California, United States permanent
Strong engineering skills | Technical leadership | Experience implementing state-of-the-art ML models | Preferable experience working in CUDA, Triton

About Cartesia Our mission is to build the next generation of AI: ubiquitous, interactive intelligence that runs wherever you are. Today, not even the best models can continuously process and reason ...

December 12, 2024

Seoul, Seoul, South Korea permanent
Computer Science | C++ | Rust | Deep Learning | LLM | GPU | Performance Optimization | Problem Solving | Communication | Collaboration

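About the job The Software Engineer (Inference Engine) develops and optimizes a high-performance inference engine for large language models and multimodal models running on the FuriosaAI NPU. The role proactively researches the latest inference optimization techniques and applies them to the engine, and works closely with the compiler and hardware teams to advance the engine's performance...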

October 20, 2025

San Francisco, California, United States permanent
Python | Software Development | Product Management | Technical Customer Success | Pre-sales Solution Engineering | Docker | Production Deployment

ABOUT BASETEN Baseten powers mission-critical inference for the world's most dynamic AI companies, like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma and Writer. By uniting applied AI research, ...

November 4, 2025

San Francisco, California, United States permanent
Software | Engineering | Machine Learning | Edge Inference | CUDA | Jetson | Optimization | Quantization | Pruning | Benchmarking | Collaboration | QA

At Serve Robotics, we’re reimagining how things move in cities. Our personable sidewalk robot is our vision for the future. It’s designed to take deliveries away from congested streets, make deliverie...

October 23, 2025

San Jose, CA, United States internship
Computer Science | C++ | Rust | Linux internals | Accelerator Architectures | Compilers | High-speed Interconnects | PyTorch | JAX | Transformer Model Architectures | Inference Serving Stacks | Low-latency Applications

Architecture Intern - Inference Location: San Jose, CA Team: Architecture About Etched Etched is building the world’s first AI inference system purpose-built for transformers - delivering over 10x h...

December 8, 2025

San Jose, CA, United States permanent
Inference Performance | Inference Kernels | Model Mapping | Hardware-Software Co-design | Team Leadership | Algorithmic Innovation | Scalable Team Management | Cross-Functional Alignment | State-of-the-Art Model Optimization | Production Ready Implementations

About Etched Etched is building the world’s first AI inference system purpose-built for transformers - delivering over 10x higher performance and dramatically lower cost and latency than a B200. With...

November 3, 2025

San Jose, California, United States permanent
C++ | Rust | Performance Optimization | Distributed Systems | PyTorch | Transformer Architectures | SIMD Optimizations | Debugging Tools | Linux Internals | High-Speed Interconnects

About Etched Etched is building the world’s first AI inference system purpose-built for transformers - delivering over 10x higher performance and dramatically lower cost and latency than a B200. With...

June 17, 2025

Los Altos, CA Hybrid permanent
Research | Latent State Inference | Sensor Data Processing | World Models | Policy Evaluation | Perception Systems | Reinforcement Learning | Multimodal Data Fusion | Temporal Reasoning | Interpretability

At Toyota Research Institute (TRI), we’re on a mission to improve the quality of human life. We’re developing new tools and capabilities to amplify the human experience. To lead this transformative sh...

August 4, 2025

Toronto permanent
CUDA | Triton | PyTorch | distributed optimization | deep learning architectures | GPU performance optimization | floating point formats | sparsity | systems level optimization | distributed training

Boson AI is an early-stage startup building large audio models for everyone to enjoy and use. Our founders (Alex Smola, Mu Li), and a team of Deep Learning, Optimization, NLP, and Statistics scientists...

September 22, 2025

Santa Clara HQ permanent
CUDA | Triton | PyTorch | distributed optimization | deep learning architectures | kernel implementation | performance optimization | distributed training

Boson AI is an early-stage startup building large audio models for everyone to enjoy and use. Our founders (Alex Smola, Mu Li), and a team of Deep Learning, Optimization, NLP, and Statistics scientists...

September 17, 2025

Mountain View, CA, USA; San Francisco, CA, USA (Mountain View (US-MTV-EMF680), San Francisco (US-SFO-MKT555)) Remote permanent
C++ | Golang | Distributed Systems | ML Inference | Simulation Infrastructure | Model Deployment | Performance Optimization | Fault Tolerance | Large-scale Systems | Software Engineering

Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. Since its start as the Google Self-Driving Car Project in 2009, Waymo has focused on building ...

January 14, 2026

Mountain View, CA, USA (Mountain View (US-MTV-EMF680)) Remote permanent
Software Engineering | C++ | Distributed Systems | Inference Platform | Model Hosting | Data Pipelines | Scalability | High Throughput | Low Latency | ML Operations

Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. Since its start as the Google Self-Driving Car Project in 2009, Waymo has focused on building ...

January 14, 2026

San Francisco, California, USA permanent
Technical Product Management | Cloud Infrastructure | Machine Learning | Inference Services | Cloud Platforms | Product Roadmap | Technical Requirements | Communication Skills | Cloud Computing | AI/ML Cloud Solutions

Crusoe's mission is to accelerate the abundance of energy and intelligence. We’re crafting the engine that powers a world where people can create ambitiously with AI — without sacrificing scale, speed...

December 24, 2025

Bangalore, Karnataka permanent
Enterprise Sales | Cloud | AI | Inference | GPU Infrastructure | LLMs | Data Platforms | Customer Relationship Management | Contract Negotiation | Value/Benefit Communication

About us: Paytm is India's payment Super App offering consumers and merchants most comprehensive payment services. Pioneer of mobile QR payments revolution in India, today, Paytm is India’s largest pa...

December 19, 2025

Location not specified Remote
C++ | AI Inference | Edge Devices | Optimization | Runtime | Stability | Collaboration | Research | English Communication

Join Tether and Shape the Future of Digital Finance At Tether, we’re not just building products, we’re pioneering a global financial revolution. Our cutting-edge solutions empower businesses—from exc...

January 22, 2026

Lead AI Inference Engineer

Tether Operations Limited

Location not specified Remote
AI Systems | Machine Learning | Edge Computing | C++ | JavaScript | Collaboration | Team Leadership | Production Systems | Edge AI

Join Tether and Shape the Future of Digital Finance At Tether, we’re not just building products, we’re pioneering a global financial revolution. Our cutting-edge solutions empower businesses—from exc...

January 22, 2026

2 Locations permanent
Deep Learning | PyTorch | HuggingFace | CUDA | TRT | TRT-LLM | Triton | Automated Deployment | Model Optimization | Inference Efficiency | Software Architecture | Software Engineering

NVIDIA is at the forefront of the generative AI revolution! The Algorithmic Model Optimization Team specifically focuses on optimizing generative AI models such as large language models (LLM) and diff...

January 21, 2026

London (London) Remote permanent
Machine Learning | Python | JAX | PyTorch | TensorFlow | Distributed Computing | Model Optimization | Reproducibility | Scalability | Research Environment

Isomorphic Labs is applying frontier AI to help unlock deeper scientific insights, faster breakthroughs, and life-changing medicines with an ambition to solve all disease. The future is coming. A fut...

January 21, 2026

US, CA, Santa Clara permanent
Product Management | AI Inference | GenAI | Machine Learning | Software Development | Performance Optimization | Developer Products | Product Strategy | Go-To-Market | Communication

Our work at NVIDIA is dedicated towards a computing model focused on visual and AI computing. For two decades, NVIDIA has pioneered visual computing, the art and science of computer graphics, with our...

January 21, 2026

San Francisco, California, United States permanent
Python | PyTorch | Rust | C++ | Kubernetes | APIs | Inference Infrastructure | Model Inference | Reliability | Observability | Incident Response | Batching

ABOUT THE ROLE We are looking for an Inference Engineering Manager to lead our AI Inference team. This is a unique opportunity to build and scale the infrastructure that powers Perplexity's products ...

January 18, 2026

Remote job permanent
AI Inference | Machine Learning | Edge Devices | C++ | JavaScript | Cross-functional Collaboration | Production-ready Systems

Join Tether and Shape the Future of Digital Finance At Tether, we’re not just building products, we’re pioneering a global financial revolution. Our cutting-edge solutions empower businesses—from exc...

January 30, 2026

Remote job permanent
AI | Inference | Machine Learning | Edge Devices | Llama.cpp | GGML | ONNX | Collaboration | Research | Product Development

Join Tether and Shape the Future of Digital Finance At Tether, we’re not just building products, we’re pioneering a global financial revolution. Our cutting-edge solutions empower businesses—from exc...

January 30, 2026

4 Locations permanent
Computer Science | C++ | Python | Deep Learning | LLM Inference | Compiler Optimization | Kernel-Level Optimization | Performance Analysis | CUDA Programming

NVIDIA is at the forefront of the generative AI revolution. We are looking for a Software Engineer, Performance Analysis, and Optimization for LLM Inference, to join our performance engineering team. ...

January 14, 2026

Remote job permanent
AI Inference | Machine Learning | Edge Computing | C++ | JavaScript | Collaboration | Project Management

Join Tether and Shape the Future of Digital Finance At Tether, we’re not just building products, we’re pioneering a global financial revolution. Our cutting-edge solutions empower businesses—from exc...

January 30, 2026

Remote job permanent
Machine Learning | Edge Devices | C++ | JavaScript | Llama.cpp | ggml | ONNX | Collaboration | Research | Production

Join Tether and Shape the Future of Digital Finance At Tether, we’re not just building products, we’re pioneering a global financial revolution. Our cutting-edge solutions empower businesses—from exc...

January 30, 2026