Cluster Reliability Jobs

Browse 219 Cluster Reliability jobs on Inference Jobs.

101-120 of 219 jobs

4w ago

Site Reliability Engineering Intern, Summer 2026

Crusoe

San Francisco, California, United States (On-site)

4w ago

Production Engineer

CoreWeave

Livingston, New Jersey, United States (Hybrid)

$139k – $204k Yearly

2w ago

Senior Software Engineer, Site Reliability Engineer (SRE)

Harvey

San Francisco, California, United States (On-site)

$200k – $260k Yearly

3d ago

Systems Quality and Reliability Lead - LPU

NVIDIA

Santa Clara, California, United States (On-site)

$168k – $310.5k Yearly

1w ago

Product Manager, Compute Platform

Anthropic

San Francisco, California, United States (Hybrid)

$305k – $385k Yearly

3w ago

Incident Manager

Crusoe

San Francisco, California, United States (On-site)

$136.1k – $165k Yearly

1w ago

Principal Engineer - Observability

CoreWeave

New York, New York, United States (Hybrid)

$206k – $303k Yearly

3d ago

Senior Data Scientist – EDA Datacenter Observability and Reliability

NVIDIA

Santa Clara, California, United States (Hybrid)

$184k – $356.5k Yearly

6d ago

Infrastructure Operations Program Manager

CoreWeave

London, England, United Kingdom (Hybrid)

£60k – £80k Yearly

2w ago

Engineering Manager, Cloud Infrastructure Automation

OpenAI

San Francisco, California, United States (On-site)

$405k – $490k Yearly

6d ago

[Pipeline] Staff+ Software Engineer, Systems

Anthropic

San Francisco, California, United States (Hybrid)

$405k – $485k Yearly

1w ago

Reliability Test Engineer, Hardware (All Levels)

Figure

San Jose, California, United States (On-site)

$120k – $250k Yearly

2w ago

Software Engineer, GPU Infrastructure - HPC

OpenAI

San Francisco, California, United States (On-site)

$255k – $490k Yearly

1w ago

Technical Program Manager, Quality and Reliability

Harvey

San Francisco, California, United States (On-site)

$200k – $275k Yearly

2w ago

Staff+ Software Engineer - Cloud Availability Platform Engineering (CAPE)

Crusoe

San Francisco, California, United States (On-site)

$209k – $253k Yearly

6d ago

Senior Resiliency and Safety Architect, GPU Workloads and Failure Analysis

NVIDIA

Santa Clara, California, United States (On-site)

$184k – $356.5k Yearly

2w ago

Member of Technical Staff, Infrastructure & Scaling

Parallel Web Systems

San Francisco, California, United States (On-site)