We don't just build the hardware and software that powers the AI revolution — we are building the AI that designs the next generation of both. Our team sits at the intersection of inference software and GPU architecture, creating autonomous LLM-driven systems that reason about hardware, write high-performance CUDA, and automate the complex loops of architectural simulation, analysis, and optimization.

We are looking for a senior LLM Agents Architect to work hands-on with hardware architects, verification engineers, GPU performance experts, and software developers to build end-to-end agent flows that drive significant improvements in kernel optimization, architectural exploration, and developer efficiency.

What you'll be doing:

Design and build agentic AI systems that generate, analyze, and optimize GPU compute kernels — targeting speed-of-light performance on NVIDIA hardware.
Collaborate with GPU architects and performance engineers to encode domain expertise — memory hierarchy trade-offs, occupancy tuning, instruction-level reasoning — into agent workflows that rival hand-tuned optimization.
Build automated performance forensics agents capable of ingesting large-scale simulation traces and Nsight profiler data to identify bottlenecks and propose architectural or software mitigations.
Partner with HW architects to develop agentic flows for GPU architectural studies — enabling rapid what-if analysis across micro-architecture configurations such as cache sizing, memory controller design, and compute unit scaling.
Explore agentic approaches to HW/SW co-design challenges, including replacing or augmenting graph-compiler functionality (e.g., TorchInductor) with LLM-driven optimization and code-generation pipelines.
Rapidly prototype and thoughtfully productize; integrate with internal services, utilize GPU capabilities, remove bottlenecks, and deliver fitting solutions.
Set up evaluation backbone using offline golden sets and online telemetry for confident iterations, cost control, and safe improvements.
Mentor and improve teams through insights in agent orchestration, prompting, RAG, observability, crafting documentation and playbooks for NVIDIA's teams.

What we need to see:

8+ years in applied ML/AI or large-scale systems, with 2+ years crafting agentic or LLM-powered applications in production environments.
B.Sc in Computer Science / Electrical Engineering.
Solid grounding in computer architecture: memory hierarchies, parallelism models, pipelining, and cache behavior. Specific familiarity with NVIDIA GPU architecture — streaming multiprocessors, warp scheduling, shared/global memory model, and occupancy reasoning — is essential.
Hands-on CUDA programming experience: writing, profiling, and optimizing GPU kernels — not just calling into CUDA-accelerated libraries. Comfortable with tools such as Nsight Compute, Nsight Systems, or equivalent profiling workflows.
Proven ownership of at least one end-to-end agentic system or LLM application: requirements, architecture, implementation, evaluation, and incremental hardening in production — not just experience with off-the-shelf frameworks.
Strong software engineering skills in Python and one systems language (C++ preferred).
Proficient in tool use, RAG pipelines, and model adaptation techniques for building agentic systems.
Demonstrated ability to collaborate with HW/SW domain experts and translate their heuristics into deterministic tools, constraints, and evaluation metrics.
Excellence in communication and facilitation: aligning diverse collaborators, documenting decisions/assumptions, and influencing without authority.
Track record of building observability for AI systems: dataset/version management, offline test suites, online telemetry, guardrails/safety checks, and rollback plans.

Ways to stand out from the crowd:

Familiarity with the PyTorch compilation and lowering stack (torch.compile, TorchDynamo, TorchInductor, Triton, down to PTX), and with GPU graph compilers, kernel fusion strategies, or auto-tuning frameworks.
Background in performance engineering for HPC or GPU-accelerated workloads, including experience with performance modeling or hardware simulators.
Familiarity with distributed processing, multi-GPU workloads, and networking (e.g., NVLink, InfiniBand).
Familiarity with frontier agentic coding tools (e.g., Claude Code, Codex, Cursor) — understanding their underlying architecture: tool orchestration, context management, and autonomous task execution patterns.
Hands-on experience building a domain-specific coding agent — whether on top of frontier agentic harnesses (e.g., Claude Code, Codex SDK) or lower-level agent frameworks (e.g., LangChain/LangGraph deep agents, CrewAI). Comfortable with the design choices that make a coding agent useful in practice: task scoping, tool and context curation, evaluation, and failure recovery

Widely considered to be one of the technology world's most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family www.nvidiabenefits.com/

Senior LLM Agents Architect

Categories

Skills