Inference Engineer
- $200,000–$300,000
- San Francisco, CA
- Permanent
About the job
📍 Onsite – San Francisco
💰 $200k–300k base + meaningful equity
Acceler8 Talent is partnering with an AI infrastructure startup building the platform that next-generation AI systems will run on.
Fresh out of stealth, the company has already reached eight-figure revenue, raised an $80M Series A, and is scaling a world-class engineering team across inference, distributed systems, compiler infrastructure, and high-performance AI compute.
Their platform automatically maps complex AI workloads across CPUs, GPUs, and emerging accelerators to maximize inference performance and hardware efficiency at scale.
As a Software Engineer focused on inference systems, you’ll own the runtime layer that executes modern models end-to-end under real production constraints.
Responsibilities ⚙️
• Design and optimize production inference pipelines
• Improve batching, scheduling, concurrency, and runtime behavior
• Optimize KV cache systems and memory efficiency
• Debug latency and throughput bottlenecks across model and systems layers
• Partner closely with compiler, kernel, and distributed systems engineers
• Contribute to large-scale distributed inference infrastructure
Requirements
• Hands-on experience building and scaling production ML inference systems
• Experience owning inference or model serving infrastructure end-to-end
• Strong understanding of distributed systems and runtime behavior under load
• Experience optimizing latency, throughput, batching, and memory efficiency
• Strong Python and/or C++ skills
• Comfortable operating in highly technical, high-ownership environments
Bonus Points
• Experience with TensorRT-LLM, vLLM, or custom inference runtimes
• CUDA, kernel optimization, or compiler-adjacent systems experience
• Experience optimizing GPU utilization at scale
• Background in AI infrastructure or high-performance compute systems
If you're interested in building inference infrastructure for next-generation AI systems at massive scale, please apply here or reach out directly to hear more!