Cluster Reliability Jobs
Browse 220 Cluster Reliability jobs on Inference Jobs.
201-220 of 220 jobs
1w
Senior Technical Program Manager – AI Infrastructure, Site Operations
Cerebras
Sunnyvale, California, United States (On-site)
4w
Senior Storage Production Engineer - DGX Cloud
NVIDIA
Santa Clara, California, United States (On-site)
$176k – $333.5k Yearly
1w
Engineering Manager (Managed Services, Production Engineering)
Crusoe
San Francisco, California, United States (On-site)
$209k – $253k Yearly
1w
Director, Hardware Quality & NPI Operations
CoreWeave
Livingston, New Jersey, United States (Hybrid)
$180k – $264k Yearly
4w
Technical Program Manager, Safeguards – Infrastructure & Evals
Anthropic
San Francisco, California, United States (Hybrid)
$290k – $365k Yearly
21h
Senior Software Engineer in Hardware Infrastructure Observability
Nebius
Amsterdam, North Holland, Netherlands (On-site)
4w
Technical Project Manager / IT Infrastructure Engineer
Nebius
Île de Ré, Charente-Maritime, France (On-site)
6d
Global Connectivity Distinguished Engineer
NVIDIA
Santa Clara, California, United States (On-site)
$320k – $488.8k Yearly
19h
Senior Engineer, Datacenter Server Lifecycle
Anthropic
London, England, United Kingdom (Hybrid)
£255k – £325k Yearly
6d
Distinguished Resiliency and Safety Architect, GPU Diagnostics
NVIDIA
Santa Clara, California, United States (On-site)
$320k – $488.8k Yearly
5d
Senior Manager, Data Center Operations
Crusoe
Houston, Texas, United States (On-site)
$160k – $195k Yearly
2w
Sr/Staff Software Engineer, Observability (Network Engineering)
Crusoe
San Francisco, California, United States (On-site)
$172k – $253k Yearly