
Model Interpretability Jobs

Explore Model Interpretability roles on Inference Jobs and apply today.

2w ago

Research Engineer, Interpretability

Anthropic

San Francisco, California, United States (Hybrid) · $315K – $560K Yearly
2w ago

Senior Research Scientist, Reward Models

Anthropic

San Francisco, California, United States (Hybrid) · $350K – $500K Yearly
3mo ago

Member of Technical Staff - Safety Lead

Reflection AI

San Francisco, California, United States (On-site)
2w ago

Research Scientist, Interpretability

Anthropic

San Francisco, California, United States (Hybrid) · $350K – $850K Yearly
3mo ago

Researcher, Interpretability

OpenAI

San Francisco, California, United States (On-site) · $310K – $460K Yearly
3mo ago

Research Scientist, Societal Impacts

Anthropic

San Francisco, California, United States (Hybrid) · $350K – $850K Yearly
2w ago

Machine Learning Engineer, Safeguards

Anthropic

San Francisco, California, United States (Hybrid) · $350K – $500K Yearly
5d ago

Anthropic Fellows Program — AI Safety

Anthropic

United States + 2 more (Remote) · $3.9K Weekly
3mo ago

Inference Engineer

Cartesia

San Francisco, California, United States (On-site) · $180K – $250K Yearly
4w ago

Research Scientist, AI Controls and Monitoring

Scale

San Francisco, California, United States (On-site) · $197.4K – $246.8K Yearly
3mo ago

Researcher, Training

OpenAI

San Francisco, California, United States (Hybrid) · $360K – $440K Yearly
2mo ago

Model Quality Software Engineer, Claude Code

Anthropic

San Francisco, California, United States (Hybrid) · $320K – $485K Yearly