
Fault Tolerance Jobs

Browse 24 Fault Tolerance jobs on Inference Jobs.


2w

Software Engineer, Platform Systems

OpenAI

San Francisco, California, United States (On-site) $310k – $460k Yearly
2w

Senior Software Engineer, AI Resiliency

NVIDIA

Redmond, Washington, United States (On-site) $184k – $287.5k Yearly
6d

Engineer II, Kubernetes Core Interfaces

CoreWeave

Livingston, New Jersey, United States (Hybrid) $109k – $160k Yearly
2w

Senior Quantum Error Correction Research Scientist, Applied Research

NVIDIA

Redmond, Washington, United States (Hybrid) $192k – $304.8k Yearly
2w

Training: ML Framework Engineer

OpenAI

San Francisco, California, United States (Hybrid) $245k – $385k Yearly
2w

Senior Software Engineer, Observability

Together AI

San Francisco, California, United States (Hybrid) $160k – $260k Yearly
2w

Reliability/DFX Engineer

OpenAI

San Francisco, California, United States (On-site) $285k – $460k Yearly
6d

Senior Backend Engineer

Nebius

Praha, Prague, Czech Republic (Hybrid)
4d

Software Engineer, ChatGPT Infrastructure

OpenAI

San Francisco, California, United States (On-site) $255k – $405k Yearly
6d

Senior DFT ATPG Engineer

NVIDIA

Yokneam Ilit, Northern District, Israel (On-site)
2w

Distributed Systems Engineer

Magic

San Francisco, California, United States (On-site) $225k – $550k Yearly
5d

Principal Datacenter Resiliency Architect, RAS Features and Modeling

NVIDIA

Santa Clara, California, United States (On-site) $272k – $431.3k Yearly
1d

Systems Quality and Reliability Lead - LPU

NVIDIA

Santa Clara, California, United States (On-site) $168k – $310.5k Yearly
6d

Engineering Manager, Kernel Reliability

Cerebras

Sunnyvale, California, United States (On-site)
1w

Site Reliability Engineer - xAI Technical Operations

xAI

Palo Alto, California, United States (On-site) $180k – $400k Yearly