Senior Kernel Engineer

GW400
  • $225,000-$300,000
  • Santa Clara, CA
  • Permanent

About the job


Acceler8 Talent is seeking a Senior Kernel Engineer to join an early-stage startup, backed by a Tier-1 VC, that is rethinking AI infrastructure from first principles.


Founded by veterans with a track record of shipping some of the industry's most successful products, the company is innovating at the chip and system level to deliver an order-of-magnitude improvement in performance-per-watt for inference. That would mean a major economic shift for anyone running large-scale models.


As a Senior Kernel Engineer, you will build and optimize high-performance GPU kernels for next-generation AI systems.


Responsibilities:

● Design, implement, and optimize CUDA kernels for performance and scalability

● Build and tune GPU-to-GPU communication paths (e.g., NIXL, NCCL-style collectives, P2P)

● Profile, debug, and optimize memory, latency, and throughput bottlenecks

● Collaborate with compiler, systems, and hardware teams


Experience:

● 3+ years of kernel development and performance optimization experience

● Deep understanding of GPU architecture, memory hierarchies, and execution models

● Experience with multi-GPU communication and synchronization

● Triton experience is a plus

● Familiarity with AMD GPUs & ROCm is a strong plus


If you're looking for massive ownership, huge impact, and the opportunity to build from the ground up, please apply here or reach out to me at ltomaszko@acceler8talent.com to hear more.


Luke Tomaszko, Senior Semiconductor & Chip Design Recruiter
