ML Infrastructure Engineer

GW271
  • $200,000-$275,000
  • San Francisco, CA
  • Permanent

About the job


Senior ML Infrastructure / Backend Engineer


Series C Startup | AI-Powered 3D & Avatar Platform | Hybrid (LA or SF)


We’re hiring a Senior ML Infrastructure / Backend Engineer to join a well-funded AI company building the visual and interaction layer for the next generation of AI-powered digital identities.

This team is developing production systems that bring AI characters out of chat boxes and into real-time, interactive 3D experiences.


You’ll own backend and infrastructure systems that serve ML-powered functionality at scale — supporting high-concurrency user traffic, low-latency inference, and rapid iteration as the platform grows.


You’ll work closely with ML researchers, platform engineers, and product teams to take models from experimentation to reliable, scalable production services.


What You’ll Do:


  • Own backend services and APIs that expose ML-powered features to real users
  • Design and operate orchestration layers for ML workloads (routing, batching, retries, concurrency)
  • Deploy and scale ML-backed services in cloud environments
  • Take systems from architecture → implementation → production ownership
  • Scale infrastructure to support thousands to hundreds of thousands of daily requests
  • Implement observability, monitoring, and alerting to ensure system reliability
  • Partner closely with ML teams to productionize generative and ML models
  • Improve end-to-end efficiency across inference, post-processing, and data pipelines


What We’re Looking For:


  • Strong, production-level experience building and owning backend or distributed systems
  • Hands-on experience designing and operating APIs (Python preferred — FastAPI, Flask, or gRPC)
  • Experience deploying and running ML-backed systems in production environments
  • Proven ability to scale systems under real user traffic with attention to latency and reliability
  • Experience with cloud platforms (AWS, GCP, or similar) and containerized deployments
  • Strong debugging, performance tuning, and operational ownership skills


Nice to Have:


  • Experience with ML inference optimization (quantization, mixed precision, ONNX, TensorRT)
  • Familiarity with scalable inference frameworks (Ray Serve, Triton, TorchServe, SageMaker)
  • Exposure to generative models (diffusion or transformer-based systems)
  • Experience running GPU-backed or high-performance workloads in production


This Role Is:


  • Hybrid (Los Angeles or San Francisco office)
  • Salary range: up to $275k base, depending on experience


If you’re excited about building the infrastructure that powers real-time, embodied AI — and want ownership over systems that actually ship — we’d love to talk.


Anna Lynch, Principal Account Manager
