Software Engineer
- $200,000-$300,000
- San Francisco, CA
- Permanent
About the job
AI Performance Engineer (Kernel Systems)
San Francisco, CA (Onsite)
$250,000-$300,000 Base + Equity
I'm working with a rapidly growing AI infrastructure company building the orchestration layer for the future of AI compute.
As AI workloads grow more complex and hardware more diverse, extracting performance from modern accelerators is becoming one of the industry's most important challenges. This team is building the systems that allow AI workloads to run efficiently across GPUs and emerging accelerator architectures at production scale.
They've recently emerged from stealth with an $80M Series A, eight-figure revenue, Fortune 500 deployments, and a growing roster of AI-native customers.
This is not a traditional GPU optimization role.
You'll work at the intersection of kernel engineering, AI infrastructure, runtime systems, and hardware performance, helping define how next-generation AI workloads execute across heterogeneous compute environments.
You'll work on problems such as:
• Kernel optimization for large-scale AI inference workloads
• Memory movement, cache utilization, and execution efficiency
• GPU and accelerator performance tuning
• Kernel orchestration and execution planning
• Throughput, latency, and hardware utilization optimization
• Performance profiling and bottleneck analysis
• Supporting execution across both established and emerging hardware architectures
We're looking for engineers who have:
• Experience building or optimizing performance-critical systems close to hardware
• Strong understanding of GPU architecture and execution behavior
• Experience reasoning about memory hierarchies, latency, throughput, and hardware efficiency
• Strong software engineering fundamentals
• Experience working on systems where performance and correctness are equally important
Nice to have:
• Experience with ROCm, Metal, or alternative accelerator backends
• Experience optimizing AI inference or training workloads
• Familiarity with occupancy tuning, latency hiding, and instruction-level parallelism
• Experience with distributed or multi-GPU execution
• Experience working alongside compiler, runtime, or systems teams
If you're excited about kernel engineering, hardware performance, and building the execution layer that powers the next generation of AI infrastructure, I'd love to chat.