About

NVIDIA, founded in 1993 by Jensen Huang, Chris Malachowsky, and Curtis Priem, is the world leader in accelerated computing. The company pioneered the GPU in 1999 - a specialized processor that executes thousands of calculations in parallel - enabling the gaming, high-performance computing, and AI workloads that define modern computing infrastructure. What began as a focused effort to bring interactive 3D graphics to gaming and multimedia markets has evolved into a platform underpinning production inference systems, autonomous vehicle perception pipelines, robotics control loops, and scientific computing clusters where throughput and latency constraints are paramount.

The company's core technical domains span GPU architecture, parallel computing primitives, and accelerated computing frameworks across gaming technology, high-performance computing, artificial intelligence, autonomous vehicles, robotics, healthcare technology, and scientific computing. NVIDIA's hardware and software stack addresses the fundamental bottleneck in data-intensive applications: transforming massive datasets into actionable insights and real-time outputs where traditional CPU-bound architectures fail to meet throughput or latency requirements. This positions the company at the architectural level of systems where inference workloads - whether serving LLMs at scale, running real-time computer vision for autonomous navigation, or processing scientific simulations - require specialized compute with predictable performance characteristics.
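The contrast between CPU-bound and data-parallel execution described above can be sketched with a toy example (illustrative Python, not NVIDIA code): the same element-wise computation written as a scalar loop and as a single whole-array operation - the pattern GPUs scale across thousands of cores.

```python
import numpy as np

# Toy illustration of data-parallel computation (hypothetical example,
# not NVIDIA code): the same element-wise transform written two ways.

def scalar_loop(a: np.ndarray) -> np.ndarray:
    """One element at a time - the serial, CPU-bound formulation."""
    out = np.empty_like(a)
    for i in range(a.size):
        out[i] = a[i] * 2.0 + 1.0
    return out

def data_parallel(a: np.ndarray) -> np.ndarray:
    """One whole-array operation - the formulation accelerators exploit,
    since every element can be computed independently and concurrently."""
    return a * 2.0 + 1.0

x = np.arange(4, dtype=np.float32)   # [0, 1, 2, 3]
print(data_parallel(x))              # [1. 3. 5. 7.]
assert np.allclose(scalar_loop(x), data_parallel(x))
```

The two functions compute identical results; the second merely exposes the independence between elements that parallel hardware needs to exploit.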

NVIDIA operates globally across industry verticals where accelerated computing creates measurable performance advantages: PC gaming, AI model training and inference, autonomous vehicle development, robotics deployment, healthcare imaging and analysis, and scientific research computing. The company's approach centers on solving computational problems where parallelism, memory bandwidth, and specialized instruction sets provide orders-of-magnitude improvements over general-purpose processors - the precise trade-offs that matter in production inference environments where cost per token, p99 latency, and GPU utilization directly impact system economics and user experience.
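The serving economics mentioned above reduce to simple arithmetic. As a rough sketch (every number below is a hypothetical assumption chosen for illustration, not an NVIDIA figure), cost per token falls directly out of GPU-hour price, sustained throughput, and utilization:

```python
# Back-of-envelope inference economics. All inputs are hypothetical
# assumptions for illustration, not NVIDIA figures.

def cost_per_million_tokens(gpu_dollars_per_hour: float,
                            tokens_per_second: float,
                            utilization: float) -> float:
    """Dollars per 1M generated tokens for one GPU, given its hourly
    price, sustained throughput, and useful-work utilization fraction."""
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_dollars_per_hour / tokens_per_hour * 1_000_000

# Assumed: $2.50/GPU-hour, 1,500 tokens/s sustained, 60% utilization.
print(round(cost_per_million_tokens(2.50, 1500, 0.60), 2))  # 0.77
```

Note that halving utilization doubles cost per token, which is why scheduler efficiency and batching matter as much as raw FLOPS in production serving.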

Open roles at NVIDIA

Explore 1,238 open positions at NVIDIA and find your next opportunity.

Supercomputing and Higher Education Account Manager

NVIDIA

France or Remote (France)

2mo ago
Senior Software Engineer - Storage

NVIDIA

Santa Clara, California, United States (On-site)

$152K – $287.5K Yearly

2mo ago
Manager, SOC Modelling

NVIDIA

Santa Clara, California, United States (On-site)

$224K – $431.3K Yearly

2mo ago
Senior Manager, Program and Operations - ADAS

NVIDIA

Tokyo Prefecture, Japan (On-site)

2mo ago
Senior GPU Memory Subsystem Architect

NVIDIA

Bengaluru, Karnataka, India (Hybrid)

2mo ago
Senior Technical Program Manager - VLSI

NVIDIA

Santa Clara, California, United States (On-site)

$168K – $322K Yearly

2mo ago
Principal Power and Performance Architect

NVIDIA

Santa Clara, California, United States (Hybrid)

$232K – $368K Yearly

2mo ago
Manager, Operations ADAS

NVIDIA

Seoul, Seoul, South Korea (On-site)

2mo ago
Senior Software R&D Engineer, VLSI Physical Design

NVIDIA

Santa Clara, California, United States (Hybrid)

$168K – $264.5K Yearly

2mo ago
Senior System Co-Design Engineer - Speed and Reliability

NVIDIA

Santa Clara, California, United States (Hybrid)

$136K – $264.5K Yearly

2mo ago
Senior QA Automation Engineer — Network AI Platform

NVIDIA

Santa Clara, California, United States (On-site)

$168K – $322K Yearly

2mo ago
PhD Research Intern, AI for Climate and Weather Simulation 2026

NVIDIA

United Kingdom or Remote (United Kingdom)

2mo ago
Manager, Internal Audit

NVIDIA

Hsinchu, Taiwan (Hybrid)

2mo ago
Senior Firmware Design Engineer, Optics

NVIDIA

Yokne'am, Northern District, Israel (Hybrid)

2mo ago
Senior SAP Reverse Logistics Solution Architect

NVIDIA

Santa Clara, California, United States (On-site)

$168K – $270.3K Yearly

2mo ago
Hardware Systems Application Engineer - CSP

NVIDIA

Santa Clara, California, United States (On-site)

$136K – $264.5K Yearly

2mo ago
CAD Engineer

NVIDIA

Santa Clara, California, United States (Hybrid)

$116K – $218.5K Yearly

2mo ago
Senior Hardware Time Synchronization Architect

NVIDIA

Yokne'am, Northern District, Israel (On-site)

2mo ago
Senior Software Architect, DriveOS

NVIDIA

Santa Clara, California, United States (On-site)

$224K – $431.3K Yearly

2mo ago
Senior AI Developer Technology Engineer, Financial Sector

NVIDIA

Santa Clara, California, United States (Hybrid)

$152K – $241.5K Yearly

2mo ago

Similar companies

Anthropic

Anthropic is an AI safety and research company founded in 2021 by seven former OpenAI employees, now operating as a Public Benefit Corporation with approximately 3,000 employees. The company develops the Claude family of large language models and associated AI assistant implementations, with a technical mandate centered on reliability, interpretability, and steerability. Under CEO Dario Amodei, Anthropic has reached a reported valuation of $183 billion while maintaining an explicit focus on AI systems aligned with human values and long-term societal benefit.

The core technical work spans AI safety research, interpretable AI systems, and steerable large language models. Claude, Anthropic's primary product line, is positioned as engineered for safety, accuracy, and security in production deployments. The company's research agenda prioritizes understanding failure modes and developing evaluation frameworks that account for reliability constraints in real-world inference scenarios, rather than pursuing capability benchmarks in isolation.

Anthropic's operational model combines frontier research with practical deployment considerations - balancing the latency-throughput-cost trade-offs inherent in large-scale language model serving while maintaining interpretability as a first-class constraint. The company approaches AI assistant development through the lens of alignment research, treating production systems as both products and testbeds for safety techniques. This dual mandate shapes technical priorities: understanding model behavior under distribution shift, quantifying uncertainty in high-stakes applications, and building systems where performance degradation is predictable and bounded.

698 jobs
CoreWeave

CoreWeave operates specialized cloud infrastructure purpose-built for AI workloads, with data centers across the US and Europe delivering GPU compute for large language model training and inference at scale. Founded in 2017 as Atlantic Crypto, a cryptocurrency mining operation, the company executed a complete strategic pivot to AI infrastructure - rebuilding from first principles rather than retrofitting existing cloud architectures. The platform runs on Kubernetes-based orchestration designed specifically for AI workloads, coupled with custom storage solutions engineered to handle the I/O patterns and throughput requirements of model training and deployment pipelines.

The technical stack centers on NVIDIA GPUs with orchestration built in Go, Python, and C++ on Linux, instrumented with Prometheus, Grafana, and OpenTelemetry for observability across distributed systems. Rather than adapting general-purpose cloud tooling, CoreWeave's infrastructure treats GPU compute density, inter-node bandwidth, and storage parallelism as primary design constraints. This systems-level focus reflects a team drawn from infrastructure engineering and quantitative trading backgrounds - disciplines where latency budgets and resource utilization directly determine feasibility.

CoreWeave serves AI labs, enterprises, and startups requiring production-scale inference and training capacity. The company's recognition on the TIME100 most influential companies list signals market adoption of specialized AI infrastructure as distinct from traditional cloud providers. For engineers, the environment offers direct exposure to the operational realities of running GPU clusters at scale: thermal management, network topology for distributed training, failure modes in multi-tenant GPU environments, and the cost-performance trade-offs inherent in serving latency-sensitive inference workloads alongside batch training jobs.

437 jobs
Cerebras

Cerebras Systems designs and manufactures wafer-scale AI chips that consolidate the compute capacity of dozens of GPUs into a single device. Founded in 2015, the company builds a wafer-scale chip roughly 56 times larger than a standard GPU, addressing the operational complexity of distributed training and inference by offering programmability equivalent to a single-device system while delivering multi-GPU performance. This approach collapses the network bottlenecks and synchronization overhead inherent in GPU clusters, enabling users to run large-scale ML workloads without orchestrating hundreds of accelerators.

The company's technical stack spans the full systems hierarchy: custom silicon (wafer-scale chip architecture), compiler infrastructure (MLIR, LLVM IR, and their proprietary CSL language), runtime orchestration (Kubernetes), and deployment tooling. Engineering work touches computer architecture, deep learning kernels, systems software for hardware programmability, and inference serving at scale. Recent partnerships include work with OpenAI on inference deployment, alongside engagements with national laboratories, global enterprises, and healthcare systems requiring high-throughput ML serving.

Cerebras positions its hardware for both training and inference workloads, with claimed industry-leading speeds stemming from on-chip interconnect bandwidth and elimination of multi-chip communication latency. The architecture trades traditional data center modularity for integrated performance - relevant for workloads bottlenecked by cross-device synchronization or where cost-per-inference and tail latency matter more than incremental horizontal scaling. Development infrastructure includes C++, Python, Go, and Zig across the stack, with CI/CD through GitHub Actions and Jenkins.

136 jobs
d-Matrix

d-Matrix builds purpose-built silicon for generative AI inference using digital in-memory compute architecture. Founded in 2019, the company approaches inference workloads from first principles rather than adapting GPU architectures, targeting the core bottleneck of data movement between memory and processors. Their Corsair platform addresses latency, throughput, and energy constraints specific to running LLMs and generative models at production scale.

The technical stack spans silicon design (SystemVerilog, UVM), systems engineering (PCIe, RISC-V, FPGA), and software infrastructure (MLIR, PyTorch, TensorFlow, ONNX Runtime, TensorRT). With over 200 engineers, the company operates at the intersection of hardware architecture, compiler development, and inference runtime optimization. The focus is making generative AI commercially viable beyond hyperscale deployments - reducing both operational cost and energy consumption per token through architectural changes rather than incremental improvements.

d-Matrix's approach centers on co-designing compute, memory hierarchy, and software to eliminate traditional bottlenecks in inference workloads. The team works on problems ranging from physical silicon verification through compiler transformations to inference serving infrastructure. Their claims around ultra-low latency and high throughput depend on in-memory compute reducing off-chip memory access patterns that dominate inference cost profiles in conventional architectures.

54 jobs
RunPod

RunPod operates an end-to-end AI infrastructure platform focused on GPU compute provisioning for model training, inference, and distributed agent orchestration. The platform serves over 500,000 developers, spanning solo practitioners to enterprise teams deploying at scale. Core infrastructure handles compute allocation, orchestration complexity, and operational overhead - positioning itself as accessible infrastructure rather than requiring deep systems expertise from users.

The technical stack centers on Go, Python, and TypeScript with containerization through Docker and Kubernetes orchestration on Linux. Engineering domains span distributed systems, GPU compute scheduling, and developer tooling designed to abstract provisioning and scaling mechanics. The company emphasizes reducing operational friction: developers interact with compute resources without managing underlying cluster complexity or infrastructure provisioning bottlenecks.

RunPod maintains a remote-first structure with team distribution across the U.S., Canada, Europe, and India. The platform's design reflects a systems-first approach to making GPU compute economically viable and operationally manageable - targeting workloads where cost, reliability, and time-to-deployment constrain AI development cycles.

25 jobs