Research Engineer - Interpretability Systems

GW488
  • $250,000-$350,000
  • San Francisco, CA
  • Permanent

About the job


🚨 Research Engineer – Interpretability Systems


📍 San Francisco, CA | Onsite


🧠 Early-stage AI research lab | Revenue-generating


An AI research lab working at the frontier of interpretability, alignment, and reinforcement learning is hiring Research Engineers focused on understanding what’s happening inside large language models.


This role is for engineers who want to build the experimental systems that make interpretability research possible — not production ML, MLOps, or large-scale training infrastructure.


You’ll work on:

🔍 Activation tracing & mechanistic analysis

🧪 Custom RL-style environments for alignment research

🧠 Probing internal representations

🎯 Detecting latent concepts like deception, goals, uncertainty, or hidden objectives

🛠️ Activation-level steering beyond prompting and fine-tuning

📊 New benchmarks for model consistency and robustness


The work is fast, experimental, and greenfield: build custom tooling, test research ideas, get results, move on.


Ideal background:


✅ Strong software engineering fundamentals

✅ Experience with experimental ML / research systems

✅ Comfort working close to model internals

✅ Interest in interpretability, alignment, RL, or mechanistic understanding

✅ PhD helpful, not required


This is not a role for scaling pipelines or maintaining production systems.


It’s for people who enjoy ambiguous problems, fast research cycles, and building new tools from first principles.


Interested? Apply and drop me a message!


Zee Uddin, Researcher
