ML Infrastructure Engineer
- $200,000-$275,000
- San Francisco, CA
- Permanent
About the job
Senior ML Infrastructure / Backend Engineer
Series C Startup | AI-Powered 3D & Avatar Platform | Hybrid (LA or SF)
We’re hiring a Senior ML Infrastructure / Backend Engineer to join a well-funded AI company building the visual and interaction layer for the next generation of AI-powered digital identities.
This team is developing production systems that bring AI characters out of chat boxes and into real-time, interactive 3D experiences.
You’ll own backend and infrastructure systems that serve ML-powered functionality at scale — supporting high-concurrency user traffic, low-latency inference, and rapid iteration as the platform grows.
You’ll work closely with ML researchers, platform engineers, and product teams to take models from experimentation to reliable, scalable production services.
What You’ll Do:
- Own backend services and APIs that expose ML-powered features to real users
- Design and operate orchestration layers for ML workloads (routing, batching, retries, concurrency)
- Deploy and scale ML-backed services in cloud environments
- Take systems from architecture → implementation → production ownership
- Scale infrastructure to support thousands to hundreds of thousands of daily requests
- Implement observability, monitoring, and alerting to ensure system reliability
- Partner closely with ML teams to productionize generative and ML models
- Improve end-to-end efficiency across inference, post-processing, and data pipelines
What We’re Looking For:
- Strong, production-level experience building and owning backend or distributed systems
- Hands-on experience designing and operating APIs (Python preferred — FastAPI, Flask, or gRPC)
- Experience deploying and running ML-backed systems in production environments
- Proven ability to scale systems under real user traffic with attention to latency and reliability
- Experience with cloud platforms (AWS, GCP, or similar) and containerized deployments
- Strong debugging, performance tuning, and operational ownership skills
Nice to Have:
- Experience with ML inference optimization (quantization, mixed precision, ONNX, TensorRT)
- Familiarity with scalable inference frameworks (Ray Serve, Triton, TorchServe, SageMaker)
- Exposure to generative models (diffusion or transformer-based systems)
- Experience running GPU-backed or high-performance workloads in production
This Role Is:
- Hybrid (Los Angeles or San Francisco office)
- Salary: up to $275k base, depending on experience
If you’re excited about building the infrastructure that powers real-time, embodied AI — and want ownership over systems that actually ship — we’d love to talk.