Machine Learning Engineer

GW524
  • $150,000-$200,000
  • San Francisco, CA
  • Permanent

About the job


We’re partnered with an AI infrastructure company building next-generation systems for large-scale AI workloads.


Their platform is changing how inference runs in production, intelligently managing workloads across different hardware to improve performance, efficiency, and cost.

They’re looking for an ML Engineer who cares about how models perform in the real world: making them faster, more reliable, and easier to scale under production load.


What They’re Looking For

  • Strong machine learning engineering experience
  • Experience deploying, serving, or optimizing ML models in production
  • Good understanding of transformer models, attention, and LLM inference
  • Experience with inference frameworks such as vLLM, TensorRT-LLM, Triton, or similar tools
  • Strong Python skills, with C++ experience a plus
  • Understanding of latency, throughput, batching, concurrency, and memory usage
  • A practical, product-minded approach to building scalable ML systems


Nice to Have

  • Experience with large-scale LLM serving or distributed inference
  • Familiarity with GPU systems, CUDA, kernels, or compiler tooling
  • Experience optimizing KV cache, prefill/decode performance, or memory placement
  • Background working with high-performance infrastructure teams


Why Join?

  • Work on some of the most important problems in AI infrastructure
  • Help build systems for the next generation of AI workloads
  • Join a small, highly technical team with strong ownership
  • Have a major impact on product, architecture, and performance
  • Shape foundational infrastructure used to run AI at scale


Anna Heneghan, Senior ML Research & Engineering Recruiter