Cluster Reliability Jobs
Browse 225 Cluster Reliability jobs on Inference Jobs.
1w
Reliability Engineer (All Levels)
Figure
San Jose, California, United States (On-site)
$120k – $250k Yearly
4w
Site Reliability Engineering Intern, Summer 2026
Crusoe
San Francisco, California, United States (On-site)
3w
Senior Site Reliability Engineer, Managed AI
Crusoe
San Francisco, California, United States (On-site)
$172k – $209k Yearly
1w
Research Engineer, Infrastructure, RL Systems
Thinking Machines Lab
San Francisco, California, United States (On-site)
$350k – $475k Yearly
2w
Senior Silicon Reliability Engineer
NVIDIA
Santa Clara, California, United States (On-site)
$168k – $264.5k Yearly
3d
Senior Data Scientist – EDA Datacenter Observability and Reliability
NVIDIA
Santa Clara, California, United States (Hybrid)
$184k – $356.5k Yearly
4w
Staff Site Reliability Engineer
Figure
San Jose, California, United States (On-site)
$175k – $250k Yearly
2w
Director, Global Network Reliability Engineering
NVIDIA
Santa Clara, California, United States (On-site)
$268k – $408.3k Yearly
1w
Site Reliability Engineer - xAI Technical Operations
xAI
Palo Alto, California, United States (On-site)
$180k – $400k Yearly
2w
Software Engineer, Infrastructure Reliability
OpenAI
San Francisco, California, United States (On-site)
$255k – $385k Yearly
1w
Senior Site Reliability Engineer — Token Factory (Inference Platform)
Nebius
Netherlands + 4 more (Remote)
2w
Reliability/DFX Engineer
OpenAI
San Francisco, California, United States (On-site)
$285k – $460k Yearly
3d
Senior Site Reliability Engineer - HPC
NVIDIA
Santa Clara, California, United States (On-site)
$152k – $287.5k Yearly
3d
Senior Software Engineer - Deep Learning Compiler Verification and Infrastructure
NVIDIA
Santa Clara, California, United States (On-site)
$140k – $224.3k Yearly
5d
Senior Reliability Engineer - LPU Packaging
NVIDIA
Santa Clara, California, United States (On-site)
$168k – $310.5k Yearly