Fault Management Jobs

Browse 21 Fault Management jobs on Inference Jobs.

21 jobs

3w

Staff Diagnostics Engineer

Figure

San Jose, California, United States (On-site) $150k – $250k Yearly
1w

Training: ML Framework Engineer

OpenAI

San Francisco, California, United States (Hybrid) $245k – $385k Yearly
2w
5d

Principal Hardware Functional Safety Expert

NVIDIA

Santa Clara, California, United States (Hybrid) $272k – $431.3k Yearly
5d

Senior DFT ATPG Engineer

NVIDIA

Yokneam Ilit, Northern District, Israel (On-site)
2w

Incident Manager

Nebius

Amsterdam, North Holland, Netherlands (On-site)
5d

Engineering Manager, Kernel Reliability

Cerebras

Sunnyvale, California, United States (On-site)
2w

Incident Manager

Crusoe

San Francisco, California, United States (On-site) $136.1k – $165k Yearly
2w
3w

Technical Program Manager, Reliability Engineering

Anthropic

San Francisco, California, United States (Hybrid) $290k – $365k Yearly
2w

Senior Software Engineer, AI Resiliency

NVIDIA

Redmond, Washington, United States (On-site) $184k – $287.5k Yearly
2w

Data Center Incident Program Manager

OpenAI

United States or Remote (United States) $125.6k – $228k Yearly
3w

DFT ATPG Engineer

NVIDIA

Yokne'am, Northern District, Israel (On-site)
1w

Reliability/DFX Engineer

OpenAI

San Francisco, California, United States (On-site) $285k – $460k Yearly
3d

Infrastructure Operations Program Manager

CoreWeave

London, England, United Kingdom (Hybrid) £60k – £80k Yearly
4w

Manager, Bare Metal Support Engineering

CoreWeave

Singapore, Singapore (Hybrid) S$170k – S$240k Yearly
3w

Technical Program Manager, Safeguards – Infrastructure & Evals

Anthropic

San Francisco, California, United States (Hybrid) $290k – $365k Yearly
4h

Systems Quality and Reliability Lead - LPU

NVIDIA

Santa Clara, California, United States (On-site) $168k – $310.5k Yearly