Inference Engineer
- $200,000–$350,000
- Santa Clara, CA
- Permanent
About the job
Senior / Principal Machine Learning Engineer – Inference Serving Frameworks
Full-time | On-site | Bay Area
About the Company
We are a VC-backed, stealth-mode startup building rack-level AI inference systems. Our differentiated system-on-chip architecture enables system-level innovations designed to maximize efficiency for data-center-scale inference serving.
The team is building hardware and extending open-source software to serve leading-edge models with extreme efficiency. We are looking for highly skilled engineers who can help architect and optimize large-scale inference systems across software, hardware, networking, and scheduling.
Leveling is determined by scope, ownership, and leadership, not by years of experience alone.
About the Role
As a Senior or Principal Machine Learning Engineer focused on inference serving frameworks, you will lead, or serve as a core member of, a team building state-of-the-art inference serving and cluster scheduling capabilities.
You will work alongside hardware and software experts to architect high-performance inference stacks and design resource scheduling strategies that push the frontier of efficiency for large-scale open-source models on custom AI infrastructure.
Key Responsibilities
- Design, develop, and tune multi-node inference techniques to optimize throughput and latency.
- Apply strategies such as tensor parallelism, pipeline parallelism, expert parallelism, continuous batching, and KV cache management.
- Optimize at the intersection of compute, networking, and storage for large-scale model serving.
- Drive performance improvements in inference serving frameworks such as vLLM and SGLang, and in underlying systems such as PyTorch.
- Develop advanced cluster scheduling algorithms to improve throughput, latency, and resource utilization.
- Engage with the open-source community to upstream optimizations, influence roadmaps, and support long-term maintainability.
- Apply best practices in benchmarking, testing, profiling, and debugging to maintain a robust, production-grade stack.
Experience and Qualifications
- Strong proficiency in Python, C++, and PyTorch.
- Demonstrated history of shipping high-quality software in a startup or fast-paced technical environment.
- Hands-on development experience with one or more LLM inference serving frameworks, such as vLLM, SGLang, or comparable systems.
- Deep understanding of LLM inference internals, including KV cache management, batching, attention mechanisms, and serving-time performance tradeoffs.
- Experience running and optimizing large-scale workloads on heterogeneous clusters.
- Familiarity with networking, storage management, distributed scheduling, or related systems.
- Proficiency in performance analysis and systems-level debugging.
- GPU kernel development experience using CUDA, Triton, ROCm, or similar technologies is a plus.
- Master’s or PhD in Computer Science, Computer Engineering, Electrical Engineering, or a related technical field, or equivalent practical experience.
Bonus Experience
- Experience contributing to or maintaining open-source inference-serving frameworks.
- Familiarity with advanced scheduling or memory systems for LLM serving.
- Experience optimizing inference workloads on custom or heterogeneous AI hardware.
- Understanding of cluster-scale bottlenecks across compute, memory, networking, and storage.
- Prior experience at stealth-mode, early-stage, or fast-moving infrastructure startups.
What We Offer
- Opportunity to work on next-generation AI inference infrastructure.
- Direct collaboration with hardware and software experts.
- High ownership over core serving and scheduling systems.
- Fast-paced startup environment with significant technical scope and impact.