Software Engineer, ML Inference
GW478
Posted: 08/05/2026
- $250,000–$320,000
- San Francisco, CA
- Permanent
About the job
San Francisco (On-Site)
$250,000–$320,000 base + equity
Why this role
An early-stage infrastructure company building a next-generation AI cloud, rethinking how frontier models run across heterogeneous compute environments.
This team is focused on the hardest part of the stack: making large-scale model inference fast, reliable, and production-ready.
You’ll own the systems that actually execute models in production — working across runtime, serving infrastructure, memory management, and hardware optimisation.
What you’ll do
- Build and scale end-to-end inference systems from request → runtime → response
- Optimise latency, throughput, concurrency, and reliability under real production workloads
- Design batching, scheduling, and queuing systems for high-performance serving
- Improve KV cache management and memory efficiency at scale
- Debug performance bottlenecks across model, runtime, and hardware layers
- Work closely with systems, infrastructure, and ML teams to push inference performance forward
What makes this interesting
- Deep work on LLM inference internals, including prefill, decode, and attention optimisation
- Navigating real-world trade-offs between tail latency and throughput
- Optimising workloads across GPUs and next-generation accelerators
- Hands-on work with vLLM, TensorRT-LLM, and custom inference runtimes
- Opportunity to shape core infrastructure at an early-stage company
What they’re looking for
- Experience building ML inference or model serving systems
- Strong systems engineering or backend infrastructure fundamentals
- Experience working on performance, scaling, memory, or distributed systems challenges
- Strong Python and/or C++ skills
- Familiarity with modern inference frameworks and runtimes is a plus
APPLY NOW!
Anna Heneghan
Senior ML Research & Engineering Recruiter