Senior Kernel Engineer
- $225,000-$300,000
- Santa Clara, CA
- Permanent
About the job
Acceler8 Talent is seeking a Senior Kernel Engineer to join an early-stage startup, backed by a Tier-1 VC, that is rethinking AI infrastructure from first principles.
Founded by veterans with a track record of shipping some of the industry's most successful products, the company is innovating at the chip and system level to deliver an order-of-magnitude improvement in inference performance-per-watt. That would mean a major economic shift for anyone running large-scale models.
As a Senior Kernel Engineer, you will build and optimize high-performance GPU kernels for next-generation AI systems.
Responsibilities:
● Design, implement, and optimize CUDA kernels for performance and scalability
● Build and tune GPU-to-GPU communication paths (e.g., NIXL, NCCL-style collectives, P2P)
● Profile, debug, and optimize memory, latency, and throughput bottlenecks
● Collaborate with compiler, systems, and hardware teams
Experience:
● 3+ years of GPU kernel development and performance optimization experience
● Deep understanding of GPU architecture, memory hierarchies, and execution models
● Experience with multi-GPU communication and synchronization
● Triton experience is a plus
● Familiarity with AMD GPUs and ROCm is a strong plus
If you're looking for massive ownership, huge impact, and the opportunity to build from the ground up, please apply here or reach out to me at ltomaszko@acceler8talent.com to hear more.