Distributed Systems Engineer

GW196
  • $225,000-$550,000
  • San Francisco, CA
  • Permanent

About the job


Distributed Systems Engineer - San Francisco, CA 


A company building frontier-scale AI models that automate software engineering and AI research, combining ultra-long context, domain-specific RL, and massive compute infrastructure, is looking for a Distributed Systems Engineer to join its team.


What Will I Be Doing: 


  • Design and build distributed data and coordination systems that enable ultra-long-context model training and inference
  • Develop high-performance storage and caching systems to support large-scale GPU workloads
  • Work deep in the internals of modern deep learning frameworks in highly distributed environments
  • Build automation for fault detection, recovery and high availability across GPU clusters
  • Troubleshoot complex, cross-stack issues spanning GPUs, networking, storage, operating systems and cloud infrastructure


What We’re Looking For:


  • Deep expertise in distributed systems design and public cloud platforms
  • Proven experience designing and operating highly available, high-throughput data systems
  • Strong knowledge of distributed databases, batch or stream processing systems, and/or distributed file systems
  • Exceptional problem-solving ability across the full systems stack
  • A hands-on mindset with the curiosity and grit to learn fast in a frontier technical environment


What’s In It for Me:


  • Salary of $225K–$550K, depending on experience, plus significant equity
  • Great benefits, including 401(k) with 6% company match, comprehensive health coverage, and unlimited PTO
  • Visa sponsorship and SF relocation stipend available
  • Well-funded ($465M+) with backing from top investors


Apply now for immediate consideration!


Kirstie Moffat, ML Research & Engineering Recruiter
