ARCHIVED
This job listing has been archived and is no longer accepting applications.

Software Engineer, ML & Data Infra

xAI

Palo Alto, CA | Remote | Permanent

Posted: March 6, 2026

Quick Summary

Software Engineer, ML & Data Infra, USA (remote)

Job Description

About xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

About the Role

The ML and Data Infrastructure team is responsible for building the foundational infrastructure that powers frontier AI models and truth-seeking agents—from petabyte-scale data acquisition and multimodal crawling, to web-scale search/retrieval systems, reliable high-throughput inference serving, low-level GPU/kernel optimizations, compiler/runtime innovations, and high-speed interconnect fabrics for massive clusters. In this role, you will collaborate across pre-training, multimodal, reasoning, and product teams in a fast-paced, meritocratic environment where you will tackle ambiguous, high-stakes problems with first-principles thinking and rigorous execution.

Responsibilities

• Design, build, and operate petabyte-to-exabyte scale distributed systems for data acquisition, web crawling, preprocessing, filtering/classification, and multimodal pipelines (CPU/GPU workloads).

• Architect high-performance search/retrieval engines (vector/hybrid/semantic) at trillion-document scale, integrating with LLMs/agents for truth-seeking, low-hallucination reasoning, and real-time knowledge access.

• Develop reliable inference serving infrastructure: load balancing, autoscaling, KV cache, batching, fault-tolerance, monitoring (Prometheus/Grafana), CI/CD (Buildkite/ArgoCD), and benchmarking for 100% uptime and optimal tail latency.

• Optimize low-level performance: CUDA kernels (GEMM, attention), Triton/CUTLASS extensions, quantization/distillation/speculative decoding, GPU memory hierarchy, and model-hardware co-design for next-gen architectures.

• Innovate on compilers/runtimes (JAX/XLA/MLIR, custom features for Hopper/Blackwell), distributed profiling/debugging tools, and interconnect fabrics (copper/optical, 1.6T+, SerDes/photonics, topology simulation, vendor roadmaps).

• Manage complex workloads across clouds/clusters: orchestration (Kubernetes), data bookkeeping/verifiability, high-speed interconnect validation, failure analysis, and telemetry/automation for production reliability.

Required Qualifications

• Strong systems engineering skills with proven impact on large-scale distributed infrastructure (data processing, search, inference, or cluster networking).

• Proficiency in Python and at least one compiled language (Rust, C++, Go, Java); experience building bespoke libraries, optimizing performance, and debugging complex systems.

• Hands-on experience with at least one key area: petabyte-scale data pipelines/crawling (Spark/Ray/Kubernetes), web-scale search/retrieval (vector DBs, ranking, RAG), inference optimization (SGLang, kernels, batching), compiler features (JAX/XLA), or high-speed interconnects (optical/copper, SerDes, signal integrity).

• Deep understanding of distributed systems challenges: high-throughput operations, latency/throughput tradeoffs, fault-tolerance, monitoring, and scaling production systems to billions of users or 100k+ GPUs.

• Passion for AI infrastructure: keeping up with SOTA techniques, first-principles problem-solving, meticulous organization/bookkeeping, and delivering rigorous, high-quality results.

Preferred Qualifications

• Experience with multimodal data (images/video/audio), epistemics/truth-seeking in retrieval, or agentic systems (long-horizon reasoning, feedback loops).

• Low-level optimizations: CUDA kernel development (Tensor Cores, attention), GPU profiling (Nsight), low-precision numerics, or interconnect pathfinding (LPO/LRO/CPO, photonics).

• Production expertise in inference reliability (0% error target), CI/CD for ML, or cluster networking (topology, vendor collaboration, failure root-cause).

• Track record owning end-to-end projects in hyperscale environments, with strong debugging, vendor management, or open-source contributions (e.g., SGLang).

Annual Salary Range

$180,000 - $440,000 USD

Benefits

Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.

xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.
