Inference Engineer

GW475
  • $200,000-$300,000
  • San Francisco, CA
  • Permanent

About the job
📍 Onsite – San Francisco

💰 $200k–300k base + meaningful equity
Acceler8 Talent is partnering with an AI infrastructure startup building the platform that next-generation AI systems will run on.


Fresh out of stealth, the company has already reached eight-figure revenue, raised an $80M Series A, and is scaling a world-class engineering team across inference, distributed systems, compiler infrastructure, and high-performance AI compute.


Their platform automatically maps complex AI workloads across CPUs, GPUs, and emerging accelerators to maximize inference performance and hardware efficiency at scale.

As a Software Engineer focused on inference systems, you’ll own the runtime layer that executes modern models end-to-end under real production constraints.
Responsibilities ⚙️

• Design and optimize production inference pipelines

• Improve batching, scheduling, concurrency, and runtime behavior

• Optimize KV cache systems and memory efficiency

• Debug latency and throughput bottlenecks across model and systems layers

• Partner closely with compiler, kernel, and distributed systems engineers

• Contribute to large-scale distributed inference infrastructure


Requirements

• Hands-on experience building and scaling production ML inference systems

• Experience owning inference or model serving infrastructure end-to-end

• Strong understanding of distributed systems and runtime behavior under load

• Experience optimizing latency, throughput, batching, and memory efficiency

• Strong Python and/or C++ skills

• Comfortable operating in highly technical, high-ownership environments


Bonus Points

• Experience with TensorRT-LLM, vLLM, or custom inference runtimes

• CUDA, kernel optimization, or compiler-adjacent systems experience

• Experience optimizing GPU utilization at scale

• Background in AI infrastructure or high-performance compute systems



If you're interested in building inference infrastructure for next-generation AI systems at massive scale, please apply here or reach out directly to hear more!


Anna Button, Researcher
