Fault Tolerance Jobs
Browse 24 Fault Tolerance jobs on Inference Jobs.
2w
Software Engineer, Platform Systems
OpenAI
San Francisco, California, United States (On-site) $310k – $460k Yearly
2w
Senior Software Engineer, AI Resiliency
NVIDIA
Redmond, Washington, United States (On-site) $184k – $287.5k Yearly
6d
Engineer II, Kubernetes Core Interfaces
CoreWeave
Livingston, New Jersey, United States (Hybrid) $109k – $160k Yearly
2w
Senior Quantum Error Correction Research Scientist, Applied Research
NVIDIA
Redmond, Washington, United States (Hybrid) $192k – $304.8k Yearly
2w
Training: ML Framework Engineer
OpenAI
San Francisco, California, United States (Hybrid) $245k – $385k Yearly
2w
Senior Software Engineer, Observability
Together AI
San Francisco, California, United States (Hybrid) $160k – $260k Yearly
2w
Reliability/DFX Engineer
OpenAI
San Francisco, California, United States (On-site) $285k – $460k Yearly
4d
Software Engineer, ChatGPT Infrastructure
OpenAI
San Francisco, California, United States (On-site) $255k – $405k Yearly
2w
Distributed Systems Engineer
Magic
San Francisco, California, United States (On-site) $225k – $550k Yearly
5d
Principal Datacenter Resiliency Architect, RAS Features and Modeling
NVIDIA
Santa Clara, California, United States (On-site) $272k – $431.3k Yearly
1d
Systems Quality and Reliability Lead - LPU
NVIDIA
Santa Clara, California, United States (On-site) $168k – $310.5k Yearly
1w
Site Reliability Engineer - xAI Technical Operations
xAI
Palo Alto, California, United States (On-site) $180k – $400k Yearly