ML Performance Wizard

  • Negotiable
  • Santa Clara, California
  • Permanent

ML Performance Engineer

Join Our Team as an ML Performance Engineer

Are you ready to pioneer the future of AI, making it private, convenient, and profitable for all? At our company, we're on a mission to empower developers and enterprises worldwide by migrating inference to user devices and supercharging existing on-device inference. We believe this advancement will propel us into a future where AI is seamlessly integrated into daily life.

About Us

At our company, we're leading the charge in AI innovation. Our team is currently developing a cross-platform Inference Engine that leverages Metal and CUDA, Swift packages for end-to-end inference pipelines, and a Python toolkit for model compression and inference efficiency. We are also fostering a vibrant developer community around these tools.

Why Join Us?

As an ML Performance Engineer at our company, you'll have the opportunity to:

  • Collaborate closely with industry leaders and partners.
  • Contribute to open-source projects and publish technical blog posts.
  • Work autonomously toward business objectives.
  • Advocate for increased R&D budgets based on data-driven insights.

What We Offer

  • Competitive equity compensation based on market data.
  • Flexible work locations in Los Angeles, San Francisco, or New York City.
  • Platinum-tier health insurance and 401(k) with a 100% match.
  • Travel opportunities for team meetings and conferences.

Key Responsibilities

As an ML Performance Engineer, you will:

  • Profile and improve the performance of ML workloads across platforms such as NVIDIA, Apple, and Qualcomm.
  • Develop highly optimized GPU kernels for our Inference Engine.
  • Translate complex technical results into accessible blog posts for our audience.
  • Mentor junior team members and interns.


To excel in this role, you'll need:

  • Proficiency in debugging, profiling, and optimizing GPU kernels.
  • Expertise in parallel programming.
  • Familiarity with Metal and/or CUDA/Triton.
  • Advanced understanding of modern Deep Learning workload characteristics.
  • Basic knowledge of fundamental Machine Learning concepts.
  • Experience with Android/Windows platforms is advantageous.

Tyler Long, Researcher
