Cluster Reliability Jobs
Browse 219 Cluster Reliability jobs on Inference Jobs.
101-120 of 219 jobs
4w
Site Reliability Engineering Intern, Summer 2026
Crusoe
San Francisco, California, United States (On-site)
2w
Senior Software Engineer, Site Reliability Engineer (SRE)
Harvey
San Francisco, California, United States (On-site)
$200k – $260k Yearly
3d
Systems Quality and Reliability Lead - LPU
NVIDIA
Santa Clara, California, United States (On-site)
$168k – $310.5k Yearly
1w
Product Manager, Compute Platform
Anthropic
San Francisco, California, United States (Hybrid)
$305k – $385k Yearly
1w
Principal Engineer - Observability
CoreWeave
New York, New York, United States (Hybrid)
$206k – $303k Yearly
3d
Senior Data Scientist – EDA Datacenter Observability and Reliability
NVIDIA
Santa Clara, California, United States (Hybrid)
$184k – $356.5k Yearly
6d
Infrastructure Operations Program Manager
CoreWeave
London, England, United Kingdom (Hybrid)
£60k – £80k Yearly
3w
Network Development Engineer, ML Infrastructure (High-Speed Interconnects)
xAI
Palo Alto, California, United States (On-site)
$180k – $440k Yearly
2w
Engineering Manager, Cloud Infrastructure Automation
OpenAI
San Francisco, California, United States (On-site)
$405k – $490k Yearly
6d
[Pipeline] Staff+ Software Engineer, Systems
Anthropic
San Francisco, California, United States (Hybrid)
$405k – $485k Yearly
1w
Reliability Test Engineer, Hardware (All Levels)
Figure
San Jose, California, United States (On-site)
$120k – $250k Yearly
2w
Software Engineer, GPU Infrastructure - HPC
OpenAI
San Francisco, California, United States (On-site)
$255k – $490k Yearly
1w
Technical Program Manager, Quality and Reliability
Harvey
San Francisco, California, United States (On-site)
$200k – $275k Yearly
2w
Staff+ Software Engineer - Cloud Availability Platform Engineering (CAPE)
Crusoe
San Francisco, California, United States (On-site)
$209k – $253k Yearly
6d
Senior Resiliency and Safety Architect, GPU Workloads and Failure Analysis
NVIDIA
Santa Clara, California, United States (On-site)
$184k – $356.5k Yearly
2w
Member of Technical Staff, Infrastructure & Scaling
Parallel Web Systems
San Francisco, California, United States (On-site)