Inference Engineer

GW497 Posted: 27/05/2026

$300,000-$325,000
San Francisco, CA
Permanent

About the job

We’re partnered with an AI infrastructure company building next-generation systems for large-scale AI workloads.

Their platform is rethinking how inference runs at scale, intelligently orchestrating workloads across heterogeneous hardware to unlock major gains in performance, efficiency, and cost. The team is solving some of the hardest problems in modern AI infrastructure: inference scheduling, KV cache management, runtime optimization, memory efficiency, and low-latency serving across distributed systems.

They’re looking for engineers who care deeply about how models execute in production: not only training models, but making them fast, scalable, and reliable under real-world load.

What You’ll Work On

Designing and optimizing large-scale inference pipelines
Improving latency, throughput, and concurrency under production workloads
Building inference runtimes and serving infrastructure
Optimizing batching, scheduling, and request orchestration
Managing KV cache allocation, reuse, placement, and eviction strategies
Improving prefill/decode performance and memory efficiency
Profiling bottlenecks across model, runtime, and distributed system layers
Collaborating closely with compiler, kernel, and systems engineers

What They’re Looking For

Strong systems engineering fundamentals
Experience building or scaling ML inference / model serving systems
Deep understanding of performance optimization and memory behavior
Experience with runtimes such as vLLM, TensorRT-LLM, or custom serving infrastructure
Strong understanding of transformer architectures and attention mechanisms
Familiarity with batching, scheduling, concurrency, and cache management
Strong Python and/or C++ engineering skills

Why Join

Work on cutting-edge inference infrastructure and AI systems problems
Build systems designed for next-generation AI scale
Small, highly technical engineering team
Significant ownership and technical impact
Opportunity to shape foundational infrastructure for future AI workloads

Anna Heneghan Senior ML Research & Engineering Recruiter


