Senior CUDA Kernel Engineer
- $225,000-$275,000
- Santa Clara, CA
- Permanent
About the job
Acceler8 Talent is seeking a Senior CUDA Kernel Engineer to join an early-stage startup, backed by a Tier-1 VC, that is rethinking AI infrastructure from first principles.
Founded by highly respected industry veterans, the company is innovating at the chip and system level to deliver an order-of-magnitude improvement in performance-per-watt for inference. That would mean a major economic shift for anyone running large-scale models: larger context windows, longer generations, and new workloads that become economically viable.
As a Senior CUDA Kernel Engineer, you will build and optimize high-performance GPU kernels for next-generation AI systems.
Responsibilities:
● Design, implement, and optimize CUDA kernels for performance and scalability
● Build and tune GPU-to-GPU communication paths (e.g., NIXL, NCCL-style collectives, P2P)
● Profile, debug, and optimize memory, latency, and throughput bottlenecks
● Collaborate with compiler, systems, and hardware teams
Experience:
● 3+ years of CUDA development and performance optimization experience
● Deep understanding of GPU architecture, memory hierarchies, and execution models
● Experience with multi-GPU communication and synchronization
● Triton experience is a plus
● Familiarity with AMD GPUs & ROCm is a strong plus
If you're looking for significant ownership, real impact, and the opportunity to build from the ground up, please apply here or reach out to me at ltomaszko@acceler8talent.com to hear more.