Machine Learning Engineer
GW524
Posted: 17/06/2026
- $150,000-$200,000
- San Francisco, CA
- Permanent
About the job
We’re partnered with an AI infrastructure company building next-generation systems for large-scale AI workloads.
Their platform is changing how inference runs in production, intelligently managing workloads across different hardware to improve performance, efficiency, and cost.
They’re looking for an ML Engineer who cares about how models perform in the real world: someone who makes them faster, more reliable, and easier to scale under production load.
What They’re Looking For
- Strong machine learning engineering experience
- Experience deploying, serving, or optimizing ML models in production
- Good understanding of transformer models, attention, and LLM inference
- Experience with inference frameworks such as vLLM, TensorRT-LLM, Triton, or similar tools
- Strong Python skills, with C++ experience a plus
- Understanding of latency, throughput, batching, concurrency, and memory usage
- A practical, product-minded approach to building scalable ML systems
Nice to Have
- Experience with large-scale LLM serving or distributed inference
- Familiarity with GPU systems, CUDA, kernels, or compiler tooling
- Experience optimizing KV cache, prefill/decode performance, or memory placement
- Background working with high-performance infrastructure teams
Why Join?
- Work on some of the most important problems in AI infrastructure
- Help build systems for the next generation of AI workloads
- Join a small, highly technical team with strong ownership
- Have a major impact on product, architecture, and performance
- Shape foundational infrastructure used to run AI at scale
Anna Heneghan
Senior ML Research & Engineering Recruiter