MisuJob - AI Job Search Platform

Jobs

Browse 250+ jobs updated daily

Latest Job Openings

Remote, California, United States Remote permanent
Systems Architecture, Cross-Cloud Architecture, Cost Optimization, Scalability, Leadership, Compute Orchestration, Vendor Flexibility, Production Inference, Research Training

Company Overview Deepgram is the leading platform underpinning the emerging trillion-dollar Voice AI economy, providing real-time APIs for speech-to-text (STT), text-to-speech (TTS), and building pro...

April 6, 2026

ML Infra Engineer - Supercomputing

Physical Intelligence

San Francisco, California, United States permanent
Scheduling, Placement, Cluster optimization, Accelerator Physics, Fault Tolerance, Efficiency, Observability, Developer Experience, Training Lifecycle Management, Resource Allocation

Physical Intelligence builds general-purpose AI for the physical world. Training our models requires orchestrating thousands of accelerators across a heterogeneous fleet of GPU and TPU clusters — span...

March 7, 2026

ML Infra Engineer

Physical Intelligence

San Francisco, California, United States permanent
Software Engineering, Engaging training, JAX, Distributed Training, TPU, GPU, Cloud Platforms, Performance Optimization, Abstractions, Cross-functional Communication

In this role you will help scale and optimize our training systems and core model code. You’ll own critical infrastructure for large-scale training, from managing GPU/TPU compute and job orchestration...

August 24, 2024
Köln permanent
Python, Software Engineering, Design Patterns, Algorithms, Testing, Benchmarking, Validation, Compilers

Who we are: Roofline is building a deployment platform to run any model on disruptive hardware at the edge. We are looking for talented and ambitious engineers that are passionate about technology to ...

April 2, 2026
San Mateo, CA, United States (San Mateo, CA) Remote permanent
Machine Learning, AI, Software Engineering, Systems Design, Cloud Infrastructure, API Development, Data Management, Cross-functional Collaboration, Safety Protocols, Performance Optimization

Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences– all created by our global community of developers an...

April 1, 2026
Not Specified, Belgium Freelance
Senior Project Manager, Data Centre, ML Infrastructure, Data Centre Infrastructure, Liquid Cooling, Project Delivery, ML Project Lead, Live Environment, Fibre, Electrical Upgrades, Fibre Management

Lead the deployment of machine learning infrastructure across multiple locations, including liquid cooling, fibre, and electrical upgrades, in a live environment....

April 1, 2026
Mountain View, USA (Mountain View) permanent
Data Engineering, Machine Learning, Scalable Systems, Big Data, Data Quality, Real-Time Event Coverage, Recommendation Systems, Big Data Processing, Software Engineering, Problem Solving

Please complete the attached Internal Transfer Request Form and submit. Please make sure to apply with your Coupang e-mail address. We exist to wow our customers. We know we’re doing the right thing...

March 31, 2026
San Francisco, CA, United States permanent
ML Infrastructure, Distributed Training, LLM, Multimodal, CNN Architectures, Multi-provider AI Reliability, Latency Optimization, Cost Efficiency, Evaluation Infrastructure, Technical Direction

About Decagon Decagon is the leading conversational AI platform empowering every brand to deliver concierge customer experiences. Our technology enables industry-defining enterprises like Avis Budge...

March 26, 2026
Vancouver, British Columbia, Canada (Vancouver) Remote permanent
Machine Learning, Infrastructure, Docker, AWS, GCP, Monitoring, Model Deployment, Model Lifecycle, Grafana

Later is the world’s most intelligent influencer marketing company, built to give brands the confidence to create unforgettable campaigns. By combining real creator relationships, trusted intelligence...

March 9, 2026
Freiburg or Berlin permanent
Cloud, GPU, Distributed Training, Cost Optimization, Slurm, ML Infrastructure

Who we are Foundation models have transformed text and images, but structured data - the largest and most consequential data modality in the world - has remained untouched. Tables power every clinica...

March 22, 2026
Boston, Massachusetts, United States (Boston, Pittsburgh, Remote U.S. Only) Remote permanent
Kubernetes, Python, Go, Distributed Systems, Machine Learning, Data Processing, Model Training, High-Throughput Systems, Software Engineering, Cloud Platforms

Mission Summary: Our team builds the foundational infrastructure that empowers Machine Learning Engineers to develop the next generation of self-driving technology. We design and operate the high-per...

March 17, 2026
Pittsburgh, Pennsylvania, United States (Boston, Pittsburgh, Remote U.S. Only) Remote permanent
Software Engineering, Kubernetes, Python, Go, Cloud, Distributed Systems, Machine Learning, High-Throughput Systems, End-to-End Development, Ownership

Mission Summary: Our team builds the foundational infrastructure that empowers Machine Learning Engineers to develop the next generation of self-driving technology. We design and operate the high-per...

March 17, 2026
Remote, Ontario Remote permanent
Software Engineering, Go, Python, Data Structures, Algorithms, Software Design, Relational Databases, NoSQL Databases

Thumbtack helps millions of people confidently care for their homes. Thumbtack is the one app you need to take care of and improve your home — from personalized guidance to AI tools and a best-in-cla...

March 11, 2026
Remote, California, United States Remote permanent
Site Reliability Engineering, Kubernetes, AWS, Terraform, Infrastructure-as-Code, Slurm, GPU Workloads, Proprietary Platforms

Company Overview Deepgram is the leading platform underpinning the emerging trillion-dollar Voice AI economy, providing real-time APIs for speech-to-text (STT), text-to-speech (TTS), and building pro...

March 9, 2026
San Francisco, CA, United States permanent
Staff Software Engineer, ML Infrastructure, Distributed Training, MLPerf Inference, Training Matrix, Intelligent Routing, Quantization, Batch Process Management, Infrastructure Domain, Speculative Decoding

About Decagon Decagon is the leading conversational AI platform empowering every brand to deliver concierge customer experiences. Our technology enables industry-defining enterprises like Avis Budge...

February 24, 2026
Portugal Remote permanent
Research Focused System Engineer, Software Engineering, Data Processing, Machine Learning, Graphs, Cloud Native Services, Supervised Machine Learning, Fault Tolerance, Scientific Publications, Patents

Feedzai is the world’s first RiskOps platform for financial risk management, and the market leader in safeguarding global commerce with today’s most advanced cloud-based risk management platform, powe...

February 19, 2026
Paris, Île-de-France, France Remote permanent
Linux, Distributed Systems, GPU Clusters, Container Orchestration, Monitoring, Logging, Alerting, Terraform, Kubernetes

About Pathway Pathway is shaking the foundations of artificial intelligence by introducing the world’s first post-transformer model that adapts and thinks just like humans. Pathway’s breakthrough ar...

December 19, 2025
Palo Alto, CA Remote permanent
Computer Science Fundamentals, C++, Python, Go, Distributed Systems, ML Infrastructure, High Availability, Scalable Infrastructure, Scalable ML Pipelines, Team Collaboration

About AppLovin AppLovin makes technologies that help businesses of every size connect to their ideal customers. The company provides end-to-end software and AI solutions for businesses to reach, mone...

February 18, 2026
Boston, MA (Boston) Remote permanent
AWS, Kubernetes, AWS infrastructure management, Microservices, Production Solutions, Software Architecture, Engineering Best Practices, Technical Guidance, Containerized Deployments

About SimpliSafe SimpliSafe is a leading innovator in the home security industry, dedicated to making every home a safe home. With a mission to provide accessible and comprehensive security solutions...

February 17, 2026
Remote, California, United States Remote permanent
Site Reliability Engineering, Kubernetes, Terraform, AWS, Infrastructure-as-Code, Slurm

Company Overview Deepgram is the leading platform underpinning the emerging trillion-dollar Voice AI economy, providing real-time APIs for speech-to-text (STT), text-to-speech (TTS), and building pro...

February 17, 2026

Senior ML Infrastructure Engineer

Ellison Institute of Technology

Oxford, England, United Kingdom Hybrid permanent
Cloud, GPU, Docker, Kubernetes, Terraform, High-performance Computing, Storage Systems, Observability, Security

At the Ellison Institute of Technology (EIT), we’re on a mission to translate scientific discovery into real world impact. We bring together visionary scientists, technologists, policy makers, and ent...

December 12, 2025
Athens, Attica, Greece Hybrid permanent
Machine Learning, Data Engineering, Big Data, Spark, Python, PySpark, SQL, Linux, Docker, Jenkins, Airflow

Optasia is a fully enabled B2B2X financial technology platform covering scoring, financial decisioning, disbursement and collection. We are committed to enabling financial inclusion for all. We are ch...

September 1, 2025
Athens, Attica, Greece Hybrid permanent
Machine Learning, Spark, PySpark, Scala, Jenkins, Airflow, Microservices, Data Analysis, Feature Engineering

Optasia is a fully enabled B2B2X financial technology platform covering scoring, financial decisioning, disbursement and collection. We are committed to enabling financial inclusion for all. We are ch...

September 1, 2025
Mountain View, California, United States permanent
Software Engineering, Systems Fundamentals, Distributed Systems, Data Pipelines, Platform Training, GPU Optimization, Inference Optimization, Tooling Suite, Performance Analysis, Ownership Mindset

Join Us in Building the Future of Home Robotics At Sunday, we're developing personal robots to reclaim the hours lost to repetitive tasks. We're focused on an ambitious goal to make generalized robot...

February 11, 2026
San Francisco, California, United States permanent
Backend Engineering, Distributed Systems, Cloud Infrastructure, GPU Workloads, Containerization, Kubernetes, Observability, Monitoring, ML Platforms, Python

Rockstar is recruiting for a fast-growing startup that is building the AI backbone for the next generation of intelligent products. They help fast-growing AI startups design, fine-tune, evaluate, depl...

December 22, 2025
Munich, Germany (Munich (DE-MUC-ARP)) permanent
Python, C++, Go, Machine Learning, Kubernetes, Cloud services, Robotics, State-of-the-art models

Intrinsic is Alphabet’s bet aiming to reimagine the potential of industrial robotics. Our team believes that advances in AI, perception and simulation will redefine what’s possible for industrial robo...

February 6, 2026
Remote - United States Remote permanent
IT Background, Infra Works, GPU Architecture, Serving Systems, Data Pipelines, Service Design, Performance Tuning, Operational Excellence, Enterprise-Grade Reliability, Cross-Team Collaboration

Reddit is a community of communities. It’s built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet. Every day, Reddit users submit, vote...

January 29, 2026
Palo Alto, CA permanent
Network Development, ML Infrastructure, High-Speed Interconnects, Design, Validation, Productization, Vendor Due Diligence, Onboarding, Bringing Up, Characterization, Rigorous Testing, LPO

About xAI xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineeri...

February 5, 2026
Mountain View, USA (Mountain View) permanent
Big Data Engineering, Scalable Systems, Data Pipelines, Machine Learning, Real-Time Serving, Data Processing, Data Quality, Big Data Storage, Data Driven Solutions, Problem Solving

We exist to wow our customers. We know we're doing the right thing when we hear our customers say, "How did we ever live without Coupang?" Born out of an obsession to make shopping, eating, and living...

February 2, 2026
New York, NY Remote permanent
AWS, Python, Kafka, Flink, AWS Glue, dbt, Athena, DynamoDB, BigQuery, Machine Learning, Data Engineering, ML Infrastructure

The mission of The New York Times is to seek the truth and help people understand the world. That means independent journalism is at the heart of all we do as a company. It’s why we have a world-renow...

January 16, 2026
San Francisco, CA Remote permanent
Software Engineering, Machine Learning, Infrastructure, AWS, Distributed Systems, Scalable Systems, Python, Docker, Kubernetes, Git

About Us Twitch is the world’s biggest live streaming service, with global communities built around gaming, entertainment, music, sports, cooking, and more. It is where thousands of communities come ...

January 22, 2026
United States (HQ) permanent
Full-Stack Development, React, FastAPI, REST APIs, gRPC, RAG, LangChain, Milvus, PGVector, Ollama, UI/UX Design, WebGL/Three.js

At Zone 5 Technologies, we're redefining what's possible in unmanned aircraft systems. Our team of engineers and innovators is developing cutting-edge autonomous solutions that push the boundaries of ...

January 29, 2026
Santa Clara HQ permanent
Linux, Kubernetes, Ceph, Python, Bash, Infrastructure-as-Code, GitOps, RDMA, InfiniBand, GPU

About The Role We're looking for a Senior Site Reliability Engineer to help us run one of the most exciting GPU clusters around—our Toronto datacenter packed with NVIDIA H100 and A100 GPUs, over 20PB...

January 26, 2026
San Francisco, California, United States Remote permanent
Cloud, Kubernetes, Docker, Python, Ray, Spark, S3, GCS, CI/CD, DevOps

Who Are We Industrial labor is incredibly dangerous work - almost 3 million people in the US per year are injured in the workplace for entirely preventable and at times, fatal or debilitating causes....

August 18, 2025
Toronto permanent
Network Engineering, InfiniBand, Ethernet, High-speed Networking, RDMA, GPU Communication, Network Security, Firewalls, ACLs, VLANs, HPC Topologies, InfiniBand Fabrics

About The Role We're seeking an experienced Network Engineer to design, build, and optimize the high-performance networking infrastructure powering our AI/ML operations in Toronto. You'll work at the...

January 26, 2026

ML Infra Engineer - Platform

Physical Intelligence

San Francisco, California, United States permanent
Cloud Platforms, Distributed Systems, Kubernetes, Cloud Infrastructure, Observability, Cost Management, Developer Experience, Cloud Foundations, Developer Tools, Collaboration

Who We Are Physical Intelligence is bringing general-purpose AI into the physical world. We are a team of engineers, scientists, roboticists, and company builders developing foundation models and lea...

January 28, 2026
San Francisco, California, United States permanent
Software Engineering, Large-scale Training, JAX, TPU, GPU, Distributed Training, Performance Optimization, Cloud Platforms, Debugging, Cross-functional Communication

In this role you will help scale and optimize our training systems and core model code. You’ll own critical infrastructure for large-scale training, from managing GPU/TPU compute and job orchestration...

January 23, 2026

ML Infra Engineer (Data Systems)

Physical Intelligence

San Francisco, California, United States permanent
Software Engineering, Distributed Systems, Data Pipelines, Performance Optimization, Object Storage, Batch Processing, Streaming Processing, Metadata Management, Data Movement, Observability, Cross-Functional Collaboration

As an ML Infra Engineer (Data Systems), you’ll build and operate the data infrastructure that powers large-scale robot learning. Your systems will sit directly between raw data sources and training/ev...

January 23, 2026
Bay Area, California, United States Remote permanent
Software Engineering, Infrastructure, Distributed Systems, Stream Processing, Scalable Architecture, Low-Latency Systems, Fault Tolerance, Systems Design, Performance Tuning, Reliability

About LMArena LMArena is the open platform for evaluating how AI models perform in the real world. Created by researchers from UC Berkeley’s SkyLab, our mission is to measure and advance the frontier...

December 18, 2025
San Mateo, CA, United States (San Mateo, CA) Remote permanent
Software Engineering, AI Systems, Large Multimodal Models, Data Pipelines, Data Quality, Synthetic Data Generation, Cross-functional Collaboration, Safety AI Systems, Performance Optimization, ML Infrastructure

Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences– all created by our global community of developers an...

January 27, 2026
United States Remote permanent
Production ML Infrastructure, Model Serving, Orchestration, Cost Optimization, Observability, Data Quality, ML Operations, Cross-functional Collaboration, Innovation & Operational Excellence, Mentorship

About Playlab Playlab is a tech non-profit dedicated to helping educators and students become critical consumers and creators of AI. We believe that an open-source, community-driven approach is key ...

November 24, 2025
San Mateo, CA permanent
ML Infrastructure, Distributed Training, Inference, PyTorch, PyTorch Lightning, PyTorch Geometric, Ray, GPU Performance Engineering, MLOps, Cross-Cluster Deployment

About the Team We’re a tight-knit team of proven drug hunters, deep learning researchers, and software engineers united by a common mission — drive AI innovation in biochemistry, discovering and deve...

November 24, 2025
San Francisco, California, United States Remote permanent
Cloud, Docker, Kubernetes, Terraform, PostgreSQL, Distributed Systems

Company Overview Echo Neurotechnologies is an exciting new startup in the Brain-Computer Interface (BCI) space, driving innovation through advanced hardware engineering and AI solutions. Our mission ...

January 29, 2026
San Mateo, California, United States permanent
GPU Performance, Serving Stack, Parallelism, Quantization/PEFT, Systems, Observability, Autoscaling, A/B Testing, CUDA, TensorRT

Introducing Moonlake, AI for creating real-time interactive content Mission: Improve Throughput, Latency, & Cost - deploying our models 2–10× faster & cheaper without quality regressions. Scope of W...

December 12, 2025
San Mateo, CA internship
Distributed Computing, Data Systems, API Development, System Optimization, System Performance, Testing & Debugging, Programming Languages (Python, Java, Go), Distributed Systems, Communication Skills

About the Job We are looking for a few interns to join us either part-time through the year or Full-time for the summer. The ideal candidate should have an interest and some experience in Distributed ...

November 27, 2024
Remote, California, United States Remote permanent
Kubernetes, AWS, Terraform, Slurm, AI/ML Infrastructure, Job Scheduling, Infrastructure-as-Code, Scalability, Self-service Environment, GPU Orchestration

Company Overview Deepgram is the leading platform underpinning the emerging trillion-dollar Voice AI economy, providing real-time APIs for speech-to-text (STT), text-to-speech (TTS), and building pro...

December 23, 2025
San Francisco, CA, United States permanent
Python, Postgres, FastAPI, SQLAlchemy, Pydantic, Machine Learning, Kubernetes, Production Systems, Code Quality, Machine Learning Models

Who We Are At TwelveLabs, we are pioneering the development of frontier multimodal foundation models that can see, hear and understand the world as humans do. Our models have redefined the standards ...

August 26, 2025
Los Angeles, California, United States Remote permanent
DevOps, Machine Learning, ML Infrastructure, Cloud Platforms, Kubernetes, Docker, Terraform, CI/CD, Automation, Security

At Serve Robotics, we’re reimagining how things move in cities. Our personable sidewalk robot is our vision for the future. It’s designed to take deliveries away from congested streets, make deliverie...

December 19, 2025
Los Angeles, California, United States Remote permanent
Machine Learning, Data Engineering, Data Processing Pipelines, Data Curation, Data Annotation, Search Capabilities, Natural Language Querying, Orchestration and Scheduling, Data Schemas, Annotation Platforms

At Serve Robotics, we’re reimagining how things move in cities. Our personable sidewalk robot is our vision for the future. It’s designed to take deliveries away from congested streets, make deliverie...

October 29, 2025
Los Angeles, California, United States Remote permanent
Full Stack Development, NoSQL, SQL, Cloud Platforms, React, CI/CD, Python, UI/UX, Data Engineering

At Serve Robotics, we’re reimagining how things move in cities. Our personable sidewalk robot is our vision for the future. It’s designed to take deliveries away from congested streets, make deliverie...

January 13, 2026