About

NVIDIA, founded in 1993 by Jensen Huang, Chris Malachowsky, and Curtis Priem, is the world leader in accelerated computing. The company pioneered the GPU in 1999 - a specialized parallel processor that handles complex mathematical calculations concurrently, enabling the gaming, high-performance computing, and AI workloads that define modern computing infrastructure. What began as a focused effort to bring interactive 3D graphics to gaming and multimedia markets has evolved into a platform underpinning production inference systems, autonomous vehicle perception pipelines, robotics control loops, and scientific computing clusters where throughput and latency constraints are paramount.

The company's core technical domains span GPU architecture, parallel computing primitives, and accelerated computing frameworks across gaming technology, high-performance computing, artificial intelligence, autonomous vehicles, robotics, healthcare technology, and scientific computing. NVIDIA's hardware and software stack addresses the fundamental bottleneck in data-intensive applications: transforming massive datasets into actionable insights and real-time outputs where traditional CPU-bound architectures fail to meet throughput or latency requirements. This positions the company at the architectural level of systems where inference workloads - whether serving LLMs at scale, running real-time computer vision for autonomous navigation, or processing scientific simulations - require specialized compute with predictable performance characteristics.
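
As a minimal, self-contained illustration of the data-parallel execution model this stack is organized around (not NVIDIA code), the sketch below contrasts a scalar Python loop with a bulk vectorized operation; CUDA and GPU libraries such as CuPy apply the same single-instruction-multiple-data pattern across thousands of hardware threads. The array size and timing method are arbitrary choices for the example.

```python
# Illustration only: the data-parallel execution model that GPU stacks build on,
# shown here with NumPy on the CPU. CUDA and libraries such as CuPy apply the
# same bulk, element-wise pattern across thousands of hardware threads.
import time
import numpy as np

N = 1_000_000
x = np.random.rand(N).astype(np.float32)
y = np.random.rand(N).astype(np.float32)

# Scalar baseline: one multiply-add per interpreter step.
t0 = time.perf_counter()
acc = 0.0
for i in range(N):
    acc += float(x[i]) * float(y[i])
t_loop = time.perf_counter() - t0

# Data-parallel formulation: the whole dot product as one bulk operation.
t0 = time.perf_counter()
dot = float(np.dot(x, y))
t_vec = time.perf_counter() - t0

print(f"scalar loop: {t_loop:.3f}s  vectorized: {t_vec:.5f}s  speedup: ~{t_loop / t_vec:.0f}x")
```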

NVIDIA operates globally across industry verticals where accelerated computing creates measurable performance advantages: PC gaming, AI model training and inference, autonomous vehicle development, robotics deployment, healthcare imaging and analysis, and scientific research computing. The company's approach centers on solving computational problems where parallelism, memory bandwidth, and specialized instruction sets provide orders-of-magnitude improvements over general-purpose processors - the precise trade-offs that matter in production inference environments where cost per token, p99 latency, and GPU utilization directly impact system economics and user experience.
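
To ground the serving metrics mentioned above, here is a small worked example computing p99 latency from a synthetic latency sample and cost per million tokens from an assumed GPU hourly price, throughput, and utilization. None of the inputs are NVIDIA figures; they are placeholders for the kind of arithmetic these trade-offs involve.

```python
# Back-of-envelope sketch of the serving metrics named above: p99 latency,
# GPU utilization, and cost per token. All numbers are hypothetical inputs.
import numpy as np

# Hypothetical per-request latencies (seconds) collected from a serving endpoint.
latencies = np.random.lognormal(mean=-1.5, sigma=0.4, size=10_000)
p50, p99 = np.percentile(latencies, [50, 99])

# Hypothetical deployment: hourly GPU price, sustained throughput, utilization.
gpu_hourly_usd = 2.50          # $/GPU-hour (assumed)
tokens_per_second = 2_400      # generation throughput (assumed)
utilization = 0.60             # fraction of the hour doing useful work (assumed)

tokens_per_hour = tokens_per_second * 3600 * utilization
usd_per_million_tokens = gpu_hourly_usd / tokens_per_hour * 1_000_000

print(f"p50 latency: {p50 * 1000:.0f} ms, p99 latency: {p99 * 1000:.0f} ms")
print(f"cost: ${usd_per_million_tokens:.2f} per million tokens at {utilization:.0%} utilization")
```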

Open roles at NVIDIA

Explore 1,173 open positions at NVIDIA and find your next opportunity.

Senior Performance Architect - Heterogeneous Workload Optimization
NVIDIA · Santa Clara, California, United States (Hybrid)
$184K – $356.5K Yearly · 2mo ago

Mechanical and Thermal Program Manager
NVIDIA · Santa Clara, California, United States (On-site)
$168K – $258.8K Yearly · 2mo ago

Senior Software Engineer – TensorRT Edge-LLM
NVIDIA · Santa Clara, California, United States (Hybrid)
$152K – $287.5K Yearly · 2mo ago

Senior Software Engineer, Metropolis Vision AI
NVIDIA · Ho Chi Minh City, Ho Chi Minh City, Vietnam (On-site)
2mo ago

Senior Developer Relations Manager - COSMOS and Foundation Models
NVIDIA · Santa Clara, California, United States (On-site)
$184K – $356.5K Yearly · 2mo ago

Senior Signal Integrity Solutions Architect
NVIDIA · Austin, Texas, United States (On-site)
$184K – $287.5K Yearly · 2mo ago

Senior Software Engineer - GPU and SOC
NVIDIA · Santa Clara, California, United States (On-site)
$152K – $287.5K Yearly · 2mo ago

Senior Software Engineer - Robotics
NVIDIA · Santa Clara, California, United States (On-site)
$184K – $356.5K Yearly · 2mo ago

Senior ASIC Design and STA Engineer
NVIDIA · Bengaluru, Karnataka, India (Hybrid)
2mo ago

Senior Technical Program Manager - Automotive Vehicles
NVIDIA · Guangzhou Shi, Guangdong, China (On-site)
2mo ago

ASIC Clocks Verification Engineer - New College Grad 2026
NVIDIA · Santa Clara, California, United States (On-site)
$116K – $218.5K Yearly · 2mo ago

Senior Solutions Architect - Data Center Infrastructure
NVIDIA · Santa Clara, California, United States (On-site)
$184K – $356.5K Yearly · 2mo ago

Senior Research Scientist for Generative AI
NVIDIA · Santa Clara, California, United States (On-site)
$192K – $356.5K Yearly · 2mo ago

Senior QA Software Engineer - Networking
NVIDIA · Santa Clara, California, United States (On-site)
$140K – $270.3K Yearly · 2mo ago

Senior Datacenter Power Systems Modeling Engineer
NVIDIA · Santa Clara, California, United States (On-site)
$168K – $322K Yearly · 2mo ago

Senior System Software Engineer - SoC Power
NVIDIA · Santa Clara, California, United States (On-site)
$152K – $287.5K Yearly · 2mo ago

Senior Account Manager – RTX Raytheon
NVIDIA · United States or Remote (United States)
$224K – $356.5K Yearly · 2mo ago

Compiler Verification Engineer, Compute Performance – GPU
NVIDIA · Austin, Texas, United States (On-site)
$140K – $224.3K Yearly · 2mo ago

Software Engineer, Metropolis Vision AI
NVIDIA · Ho Chi Minh City, Ho Chi Minh City, Vietnam (On-site)
2mo ago

Senior Custom SOC IP Verification Engineer
NVIDIA · Shanghai, Shanghai, China (On-site)
2mo ago

Similar companies


Anthropic

Anthropic is an AI safety and research company founded in 2021 by seven former OpenAI employees, now operating as a Public Benefit Corporation with approximately 3,000 employees. The company develops the Claude family of large language models and associated AI assistant implementations, with a technical mandate centered on reliability, interpretability, and steerability. Under CEO Dario Amodei, Anthropic has reached a reported valuation of $183 billion while maintaining an explicit focus on AI systems aligned with human values and long-term societal benefit. The core technical work spans AI safety research, interpretable AI systems, and steerable large language models. Claude, Anthropic's primary product line, is positioned as engineered for safety, accuracy, and security in production deployments. The company's research agenda prioritizes understanding failure modes and developing evaluation frameworks that account for reliability constraints in real-world inference scenarios, rather than pursuing capability benchmarks in isolation. Anthropic's operational model combines frontier research with practical deployment considerations - balancing the latency-throughput-cost trade-offs inherent in large-scale language model serving while maintaining interpretability as a first-class constraint. The company approaches AI assistant development through the lens of alignment research, treating production systems as both products and testbeds for safety techniques. This dual mandate shapes technical priorities: understanding model behavior under distribution shift, quantifying uncertainty in high-stakes applications, and building systems where performance degradation is predictable and bounded.

683 jobs

CoreWeave

CoreWeave operates specialized cloud infrastructure purpose-built for AI workloads, with data centers across the US and Europe delivering GPU compute for large language model training and inference at scale. Founded in 2017 as Atlantic Crypto, a cryptocurrency mining operation, the company executed a complete strategic pivot to AI infrastructure - rebuilding from first principles rather than retrofitting existing cloud architectures. The platform runs on Kubernetes-based orchestration designed specifically for AI workloads, coupled with custom storage solutions engineered to handle the I/O patterns and throughput requirements of model training and deployment pipelines. The technical stack centers on NVIDIA GPUs with orchestration built in Go, Python, and C++ on Linux, instrumented with Prometheus, Grafana, and OpenTelemetry for observability across distributed systems. Rather than adapting general-purpose cloud tooling, CoreWeave's infrastructure treats GPU compute density, inter-node bandwidth, and storage parallelism as primary design constraints. This systems-level focus reflects a team drawn from infrastructure engineering and quantitative trading backgrounds - disciplines where latency budgets and resource utilization directly determine feasibility. CoreWeave serves AI labs, enterprises, and startups requiring production-scale inference and training capacity. The company's recognition on the TIME100 most influential companies list signals market adoption of specialized AI infrastructure as distinct from traditional cloud providers. For engineers, the environment offers direct exposure to the operational realities of running GPU clusters at scale: thermal management, network topology for distributed training, failure modes in multi-tenant GPU environments, and the cost-performance trade-offs inherent in serving latency-sensitive inference workloads alongside batch training jobs.
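
As a rough, hypothetical sizing exercise in the spirit of the constraints named above (inter-node bandwidth for distributed training), the sketch below estimates how gradient-synchronization traffic compares with compute time. The model size, node count, link speed, and step time are all assumed values, not CoreWeave figures or methodology.

```python
# Rough estimate of why inter-node bandwidth becomes a first-order design
# constraint for distributed training. All inputs are hypothetical.

params = 70e9                  # model parameters (assumed 70B-class model)
bytes_per_grad = 2             # fp16/bf16 gradients
nodes = 64                     # data-parallel nodes (assumed)
link_gbps = 400                # per-node network bandwidth in Gbit/s (assumed)
step_time_s = 5.0              # compute time per optimizer step (assumed)

grad_bytes = params * bytes_per_grad
# Ring all-reduce moves roughly 2*(N-1)/N of the gradient volume per node.
bytes_on_wire = 2 * (nodes - 1) / nodes * grad_bytes
comm_time_s = bytes_on_wire / (link_gbps / 8 * 1e9)

print(f"gradient all-reduce: ~{bytes_on_wire / 1e9:.0f} GB per node per step")
print(f"~{comm_time_s:.1f}s on the wire vs {step_time_s:.1f}s of compute "
      f"({comm_time_s / step_time_s:.0%} overhead if not overlapped)")
```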

436 jobs

Cerebras

Cerebras Systems designs and manufactures wafer-scale AI chips that consolidate the compute capacity of dozens of GPUs into a single device. Founded in 2015, the company builds a wafer-scale processor roughly 56 times larger than a standard GPU die, addressing the operational complexity of distributed training and inference by offering programmability equivalent to a single-device system while delivering multi-GPU performance. This approach collapses the network bottlenecks and synchronization overhead inherent in GPU clusters, enabling users to run large-scale ML workloads without orchestrating hundreds of accelerators. The company's technical stack spans the full systems hierarchy: custom silicon (wafer-scale chip architecture), compiler infrastructure (MLIR, LLVM IR, and their proprietary CSL language), runtime orchestration (Kubernetes), and deployment tooling. Engineering work touches computer architecture, deep learning kernels, systems software for hardware programmability, and inference serving at scale. Recent partnerships include work with OpenAI on inference deployment, alongside engagements with national laboratories, global enterprises, and healthcare systems requiring high-throughput ML serving. Cerebras positions its hardware for both training and inference workloads, with claimed industry-leading speeds stemming from on-chip interconnect bandwidth and elimination of multi-chip communication latency. The architecture trades traditional data center modularity for integrated performance - relevant for workloads bottlenecked by cross-device synchronization or where cost-per-inference and tail latency matter more than incremental horizontal scaling. Development infrastructure includes C++, Python, Go, and Zig across the stack, with CI/CD through GitHub Actions and Jenkins.
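
To make the cross-device synchronization argument concrete, here is a toy estimate of the per-pass cost of moving activations between discrete accelerators, the overhead a single wafer-scale device keeps on-chip. Every number (stage count, activation size, link bandwidth and latency) is an illustrative assumption, not a Cerebras specification or benchmark.

```python
# Toy estimate of the cross-device synchronization cost that a single
# wafer-scale device avoids. All numbers are illustrative assumptions.

devices = 16                      # pipeline stages spread over discrete accelerators (assumed)
activation_mb = 64                # activation volume passed between stages, MB (assumed)
link_gb_s = 50                    # effective inter-device bandwidth, GB/s (assumed)
link_latency_us = 5               # per-transfer launch/sync latency, microseconds (assumed)

# Cross-device case: every stage boundary pays a transfer plus sync latency.
boundaries = devices - 1
xfer_s = boundaries * (activation_mb / 1024 / link_gb_s + link_latency_us * 1e-6)

print(f"{boundaries} stage boundaries add ~{xfer_s * 1e3:.1f} ms per forward pass")
print("on a single wafer-scale device the same traffic rides on-die interconnect "
      "instead of external links")
```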

135 jobs

d-Matrix

d-Matrix builds purpose-built silicon for generative AI inference using digital in-memory compute architecture. Founded in 2019, the company approaches inference workloads from first principles rather than adapting GPU architectures, targeting the core bottleneck of data movement between memory and processors. Their Corsair platform addresses latency, throughput, and energy constraints specific to running LLMs and generative models at production scale. The technical stack spans silicon design (SystemVerilog, UVM), systems engineering (PCIe, RISC-V, FPGA), and software infrastructure (MLIR, PyTorch, TensorFlow, ONNX Runtime, TensorRT). With over 200 engineers, the company operates at the intersection of hardware architecture, compiler development, and inference runtime optimization. The focus is making generative AI commercially viable beyond hyperscale deployments - reducing both operational cost and energy consumption per token through architectural changes rather than incremental improvements. d-Matrix's approach centers on co-designing compute, memory hierarchy, and software to eliminate traditional bottlenecks in inference workloads. The team works on problems ranging from physical silicon verification through compiler transformations to inference serving infrastructure. Their claims around ultra-low latency and high throughput depend on in-memory compute reducing off-chip memory access patterns that dominate inference cost profiles in conventional architectures.
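
A short back-of-envelope sketch of the bottleneck referenced above: at small batch sizes, autoregressive decode re-reads the model weights for every generated token, so throughput on a conventional accelerator is bounded by off-chip memory bandwidth rather than arithmetic. The parameter count, weight precision, and bandwidth below are generic assumptions, not d-Matrix product numbers.

```python
# Sketch of why off-chip memory traffic dominates autoregressive decode cost,
# the bottleneck in-memory compute targets. Inputs are generic assumptions.

params = 7e9                   # parameters in a hypothetical served model
bytes_per_weight = 1           # int8 weights (assumed)
hbm_bandwidth_gb_s = 2000      # off-chip bandwidth of a conventional accelerator (assumed)

weight_bytes = params * bytes_per_weight
# At batch size 1, each generated token re-reads essentially all weights,
# so decode throughput is bounded by bandwidth, not arithmetic.
tokens_per_s_bound = hbm_bandwidth_gb_s * 1e9 / weight_bytes

print(f"bandwidth-bound ceiling: ~{tokens_per_s_bound:.0f} tokens/s per device")
print("keeping weights in or next to the compute raises this ceiling by cutting "
      "per-token off-chip traffic")
```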

54 jobs

Runpod

RunPod operates an end-to-end AI infrastructure platform focused on GPU compute provisioning for model training, inference, and distributed agent orchestration. The platform serves over 500,000 developers, spanning solo practitioners to enterprise teams deploying at scale. Core infrastructure handles compute allocation, orchestration complexity, and operational overhead - positioning itself as accessible infrastructure rather than requiring deep systems expertise from users. The technical stack centers on Go, Python, and TypeScript with containerization through Docker and Kubernetes orchestration on Linux. Engineering domains span distributed systems, GPU compute scheduling, and developer tooling designed to abstract provisioning and scaling mechanics. The company emphasizes reducing operational friction: developers interact with compute resources without managing underlying cluster complexity or infrastructure provisioning bottlenecks. RunPod maintains a remote-first structure with team distribution across the U.S., Canada, Europe, and India. The platform's design reflects a systems-first approach to making GPU compute economically viable and operationally manageable - targeting workloads where cost, reliability, and time-to-deployment constrain AI development cycles.

25 jobs