This job listing has been archived and is no longer accepting applications.

Senior Software Engineer - Model Performance

Inference

San Francisco, California, United States (permanent)

Posted: January 21, 2026

Job Description

Help us make inference blazingly fast. If you love squeezing every last drop of performance out of GPUs, diving deep into CUDA kernels, and turning optimization techniques into production systems, we'd love to meet you.

About Inference.net

Inference.net trains and hosts specialized language models for companies that need frontier-quality AI at a fraction of the cost. The models we train match GPT-5 accuracy but are smaller, faster, and up to 90% cheaper. Our platform handles everything end-to-end: distillation, training, evaluation, and planet-scale hosting.

We are a well-funded ten-person team of engineers who work in-person in downtown San Francisco on difficult, high-impact engineering problems. Everyone on the team has been writing code for over 10 years and has founded and run their own software companies. We are high-agency, adaptable, and collaborative. We value creativity alongside technical prowess and humility. We work hard and deeply enjoy the work we do. Most of us are in the office 4 days a week in SF; hybrid works for Bay Area candidates.

About the Role

You will be responsible for making our inference stack as fast and efficient as possible. Your work spans from implementing known optimization techniques to experimenting with novel approaches, always with the goal of serving models faster and cheaper at scale.

Your north star is inference performance: latency, throughput, cost efficiency, and how quickly we can bring new model architectures into production. You'll work across the full inference stack, from CUDA kernels to serving frameworks, to find and eliminate bottlenecks. This role reports directly to the founding team. You'll have the autonomy, a large compute budget, and the technical support to push the limits of what's possible in model serving.

Key Responsibilities

• Implement and productionize optimization techniques including quantization, speculative decoding, KV cache optimization, continuous batching, and LoRA serving

• Dive deep into inference frameworks (vLLM, SGLang, TensorRT-LLM) and underlying libraries to debug and improve performance

• Profile and optimize CUDA kernels and GPU utilization across our serving infrastructure

• Add support for new model architectures, ensuring they meet our performance standards before going to production

• Experiment with novel inference techniques and bring successful approaches into production

• Build tooling and benchmarks to measure and track inference performance across our fleet

• Collaborate with applied ML engineers to ensure trained models can be served efficiently

Requirements

• 2+ years of experience in ML systems, inference optimization, or GPU programming

• Strong proficiency in Python and familiarity with C++

• Hands-on experience with LLM inference frameworks (vLLM, SGLang, TensorRT-LLM, or similar)

• Deep understanding of GPU architecture and experience profiling GPU workloads

• Familiarity with LLM optimization techniques (quantization, speculative decoding, continuous batching, KV cache management)

• Experience with PyTorch and understanding of how models execute on hardware

• Track record of measurably improving system performance

Nice-to-Have

• Experience with CUDA programming

• Familiarity with serving non-LLM models (TTS, vision, embeddings)

• Experience with distributed inference and multi-GPU serving

• Contributions to open-source inference frameworks

• Experience with Docker and Kubernetes

You don't need to tick every box. Curiosity and the ability to learn quickly matter more.

Compensation

We offer competitive compensation, equity in a high-growth startup, and comprehensive benefits. The base salary range for this role is $220,000 to $320,000, depending on experience.

Equal Opportunity

Inference.net is an equal opportunity employer. We welcome applicants from all backgrounds and don't discriminate based on race, color, religion, gender, sexual orientation, national origin, genetics, disability, age, or veteran status.

If you're excited about making AI inference faster for everyone, we'd love to hear from you. Please send your resume and GitHub to [email protected] and/or apply here on Ashby.
