d-Matrix

About

d-Matrix designs purpose-built silicon for generative AI inference using a digital in-memory compute architecture. Founded in 2019, the company approaches inference workloads from first principles rather than adapting GPU architectures, targeting the core bottleneck of data movement between memory and processors. Its Corsair platform addresses the latency, throughput, and energy constraints of running LLMs and generative models at production scale.

The technical stack spans silicon design (SystemVerilog, UVM), systems engineering (PCIe, RISC-V, FPGA), and software infrastructure (MLIR, PyTorch, TensorFlow, ONNX Runtime, TensorRT). With over 200 engineers, the company operates at the intersection of hardware architecture, compiler development, and inference runtime optimization. The focus is making generative AI commercially viable beyond hyperscale deployments - reducing both operational cost and energy consumption per token through architectural changes rather than incremental improvements.

d-Matrix's approach centers on co-designing compute, memory hierarchy, and software to eliminate traditional bottlenecks in inference workloads. The team works on problems ranging from physical silicon verification through compiler transformations to inference serving infrastructure. Its claims of ultra-low latency and high throughput rest on in-memory compute reducing the off-chip memory traffic that dominates inference cost profiles in conventional architectures.
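To make the data-movement claim concrete, here is a back-of-the-envelope sketch in Python of why single-stream LLM decode tends to be memory-bound on conventional accelerators. All figures (model size, precision, bandwidth, FLOPS) are illustrative assumptions, not d-Matrix or competitor specifications.

# Why LLM decode tends to be memory-bound: generating one token requires
# streaming every weight from memory, so at batch size 1 the achievable
# tokens/sec is capped by bandwidth long before compute saturates.
def decode_ceilings(params_b, bytes_per_param, mem_bw_gbs, peak_tflops):
    """Return (bandwidth-bound, compute-bound) tokens/sec ceilings."""
    weight_bytes = params_b * 1e9 * bytes_per_param       # bytes read per token
    flops_per_token = 2 * params_b * 1e9                  # ~2 FLOPs per weight
    bw_bound = mem_bw_gbs * 1e9 / weight_bytes            # bandwidth ceiling
    compute_bound = peak_tflops * 1e12 / flops_per_token  # compute ceiling
    return bw_bound, compute_bound

# Hypothetical 70B-parameter model in 8-bit weights on an accelerator with
# 3 TB/s off-chip bandwidth and 500 TFLOPS peak compute (all assumed values).
bw, comp = decode_ceilings(params_b=70, bytes_per_param=1,
                           mem_bw_gbs=3000, peak_tflops=500)
print(f"bandwidth-bound: {bw:.1f} tok/s, compute-bound: {comp:.1f} tok/s")
# ~42.9 vs ~3571.4 tok/s: the memory system, not the ALUs, sets the ceiling.
# Batching raises arithmetic intensity, but interactive serving often cannot
# batch deeply, which is the regime in-memory compute architectures target.

Under these assumptions the bandwidth ceiling sits roughly two orders of magnitude below the compute ceiling; that gap is what architectures that reduce off-chip traffic aim to close.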

Open roles at d-Matrix

Explore 44 open positions at d-Matrix and find your next opportunity.


Senior Runtime Software Engineer

d-Matrix

Sydney, New South Wales, Australia (Hybrid)

3mo ago

ML Compiler Architect, Senior Principal

d-Matrix

Toronto, Ontario, Canada (Hybrid)

3mo ago

Senior Staff DFT MBIST Engineer

d-Matrix

Bengaluru, Karnataka, India (Hybrid)

3mo ago

Technical Program Manager - Software, Principal

d-Matrix

Santa Clara, California, United States (Hybrid)

$196K – $300K Yearly

3mo ago

Similar companies


NVIDIA

NVIDIA, founded in 1993 by Jensen Huang, Chris Malachowsky, and Curtis Priem, is the world leader in accelerated computing. The company pioneered the GPU in 1999 - a specialized parallel processor that handles complex mathematical calculations concurrently - enabling the gaming, high-performance computing, and AI workloads that define modern computing infrastructure. What began as a focused effort to bring interactive 3D graphics to gaming and multimedia markets has evolved into a platform underpinning production inference systems, autonomous vehicle perception pipelines, robotics control loops, and scientific computing clusters where throughput and latency constraints are paramount.

The company's core technical domains span GPU architecture, parallel computing primitives, and accelerated computing frameworks across gaming technology, high-performance computing, artificial intelligence, autonomous vehicles, robotics, healthcare technology, and scientific computing. NVIDIA's hardware and software stack addresses the fundamental bottleneck in data-intensive applications: transforming massive datasets into actionable insights and real-time outputs where traditional CPU-bound architectures fail to meet throughput or latency requirements. This positions the company at the architectural level of systems where inference workloads - whether serving LLMs at scale, running real-time computer vision for autonomous navigation, or processing scientific simulations - require specialized compute with predictable performance characteristics.

NVIDIA operates globally across industry verticals where accelerated computing creates measurable performance advantages: PC gaming, AI model training and inference, autonomous vehicle development, robotics deployment, healthcare imaging and analysis, and scientific research computing. The company's approach centers on solving computational problems where parallelism, memory bandwidth, and specialized instruction sets provide orders-of-magnitude improvements over general-purpose processors - the precise trade-offs that matter in production inference environments where cost per token, p99 latency, and GPU utilization directly impact system economics and user experience.

1882 jobs

Cerebras

Cerebras Systems designs and manufactures wafer-scale AI chips that consolidate the compute capacity of dozens of GPUs into a single device. Founded in 2015, the company builds a chip 56 times larger than a standard GPU, addressing the operational complexity of distributed training and inference by offering the programmability of a single-device system with multi-GPU performance. This approach collapses the network bottlenecks and synchronization overhead inherent in GPU clusters (a rough cost model follows this listing), enabling users to run large-scale ML workloads without orchestrating hundreds of accelerators.

The company's technical stack spans the full systems hierarchy: custom silicon (wafer-scale chip architecture), compiler infrastructure (MLIR, LLVM IR, and the proprietary CSL language), runtime orchestration (Kubernetes), and deployment tooling. Engineering work touches computer architecture, deep learning kernels, systems software for hardware programmability, and inference serving at scale. Recent partnerships include work with OpenAI on inference deployment, alongside engagements with national laboratories, global enterprises, and healthcare systems requiring high-throughput ML serving.

Cerebras positions its hardware for both training and inference workloads, with claimed industry-leading speeds stemming from on-chip interconnect bandwidth and the elimination of multi-chip communication latency. The architecture trades traditional data center modularity for integrated performance - relevant for workloads bottlenecked by cross-device synchronization, or where cost per inference and tail latency matter more than incremental horizontal scaling. Development infrastructure includes C++, Python, Go, and Zig across the stack, with CI/CD through GitHub Actions and Jenkins.

135 jobs
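As a rough illustration of the cross-device synchronization cost referenced in the Cerebras entry above, the sketch below models the communication time of a ring all-reduce across a GPU cluster. The payload, link bandwidth, and latency values are hypothetical, chosen only to show the shape of the overhead that on-wafer integration avoids.

# Rough cost model of a ring all-reduce: 2*(n-1) steps, each moving 1/n of
# the payload over a point-to-point link. All numbers are illustrative.
def ring_allreduce_seconds(payload_gb, n_devices, link_gbs, latency_us):
    steps = 2 * (n_devices - 1)                    # reduce-scatter + all-gather
    bytes_per_step = payload_gb * 1e9 / n_devices  # each step ships 1/n of data
    per_step = bytes_per_step / (link_gbs * 1e9) + latency_us * 1e-6
    return steps * per_step

# Synchronizing 10 GB of gradients across 64 devices on 400 GB/s links:
t = ring_allreduce_seconds(payload_gb=10, n_devices=64,
                           link_gbs=400, latency_us=10)
print(f"all-reduce time: {t * 1e3:.1f} ms")  # ~50.5 ms of pure communication
# per synchronization - overhead a single wafer-scale device sidesteps by
# keeping the equivalent traffic on-chip.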

Sesame

Sesame builds voice interfaces through tight integration of hardware, software, and machine learning, pursuing research in speech generation, personality modeling, and multimodal ML. The company operates large GPU clusters to support ambitious research programs aimed at making computers lifelike through natural voice interaction, with development cycles measured in days rather than quarters. Backed by a16z, Sequoia, Spark, and Matrix, the technical effort spans PyTorch-based model development alongside Android and iOS deployment, with infrastructure supporting rapid iteration from whiteboard concepts to production systems.

The engineering organization comprises an interdisciplinary team of long-tenured experts across machine learning, hardware, software, and entertainment backgrounds, operating from offices in San Francisco, Bellevue, and New York. Core technical domains include speech generation systems, personality modeling for voice companions, and multimodal ML architectures that coordinate audio and other sensory inputs. The product strategy emphasizes deliberate design choices to create voice interfaces that are nuanced and intimate rather than intrusive, with hardware engineering efforts targeting lightweight eyewear form factors for all-day wear.

Infrastructure and operational requirements center on GPU cluster management to support training and inference for speech models, alongside mobile platform engineering for real-time voice processing. The technical challenge involves crossing the uncanny valley in voice interaction - achieving latency, naturalness, and contextual appropriateness simultaneously across diverse usage scenarios. Team composition reflects this: specialists in human-computer interaction work alongside ML researchers and hardware engineers to optimize the full stack from acoustic modeling through industrial design.

26 jobs

Xaira Therapeutics

Xaira Therapeutics is an integrated biotechnology company founded in 2023 that combines AI model development, large-scale biological data generation, and therapeutic product development under one organization. Built on protein design research from the University of Washington's Institute for Protein Design and Dr. David Baker's work, the company raised over $1 billion before emerging from stealth in 2024. Co-founded and incubated by ARCH Venture Partners and Foresite Labs, Xaira operates across three locations: South San Francisco, Seattle, and London.

The technical infrastructure spans protein design, predictive patient stratification, and drug discovery systems. The stack includes Python, C++, PyTorch, and JAX for modeling, with distributed training via DDP and FSDP (a minimal sketch follows this listing). Experimental infrastructure includes Cytiva AKTA and Agilent HPLC systems for protein characterization and purification workflows. The company's approach integrates computational predictions with wet-lab data generation at scale, attempting to shift drug discovery from empirical methods toward engineered precision.

Technical domains span AI model development for biological systems, protein engineering, therapeutic design, and patient selection algorithms. The organization is led by David Baker and Marc Tessier-Lavigne, combining academic protein design expertise with pharmaceutical development experience. Model training, experimental validation loops, and the therapeutic pipeline sit in a vertically integrated structure where data generation, model iteration, and product development occur within the same organization.

24 jobs
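The Xaira entry above mentions DDP and FSDP for distributed training. Below is a minimal FSDP sketch in PyTorch, assuming one process per GPU launched via torchrun; the toy model and objective are illustrative placeholders, not Xaira's actual training code.

# Minimal PyTorch FSDP sketch: parameters, gradients, and optimizer state
# are sharded across ranks. Launch with: torchrun --nproc_per_node=N fsdp_demo.py
# The model and loss are illustrative stand-ins.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")              # one process per GPU
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = torch.nn.Sequential(                 # stand-in for a real network
        torch.nn.Linear(1024, 4096),
        torch.nn.GELU(),
        torch.nn.Linear(4096, 1024),
    ).cuda()
    model = FSDP(model)                          # shard params across ranks

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(8, 1024, device="cuda")
    loss = model(x).pow(2).mean()                # dummy objective
    loss.backward()                              # FSDP reduce-scatters grads
    opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()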

Zuma

Zuma builds agentic AI systems for multifamily property management, operating at scale across thousands of apartment communities serving millions of residents. The system handles lead engagement, tour scheduling, and rent collections - repetitive operations work that creates bottlenecks for onsite teams - while maintaining human oversight for relationship-critical interactions. The architecture is designed for human-AI collaboration rather than full automation: AI agents process high-volume, structured tasks while property managers handle hospitality and community engagement where judgment and relational context matter.

The technical approach emphasizes rapid iteration driven by field feedback from property managers. Engineers and designers work directly with operations teams to identify latency and reliability requirements in production environments - tour scheduling conflicts, communication failure modes during collections, lead response time sensitivity. This operational integration surfaces real constraints: property management workflows involve variable tenant needs, time-sensitive coordination, and edge cases where escalation to human judgment is the correct trade-off. The system is designed to amplify existing teams by removing operational overhead rather than replacing domain expertise.

The company is venture-backed by Andreessen Horowitz and Y Combinator and headquartered in Santa Monica. It ships product rapidly, prioritizing deployment feedback over extended development cycles. Technical domains span agentic AI implementation, human-AI collaboration interfaces, and operations integration - work that requires understanding both inference system design and the operational complexity of residential property management at scale.

9 jobs

Clarifai

Clarifai operates a full-stack AI platform spanning data preparation, model training, deployment, and monitoring across computer vision, NLP, and audio domains. The platform serves over 400,000 users across 170+ countries, delivering billions of predictions with access to more than 1 million models. Founded in 2013 by Matthew Zeiler after his models took the top five places at the ImageNet 2013 classification challenge, the company has raised $100 million in funding from Menlo Ventures, Union Square Ventures, NVIDIA, Google Ventures, and Qualcomm. Customers include Amazon, Siemens, NVIDIA, Canva, Vimeo, and OpenTable.

The inference architecture supports orchestrated compute across AWS, GCP, and Azure, with edge deployment through Local Runners for on-premises and edge scenarios (a generic local-inference sketch follows this listing). The platform integrates PyTorch, TensorFlow, JAX, NVIDIA Triton, and ONNX, with reported performance of 544 tokens per second on GPT-OSS-120B. Technical focus areas include image classification, video analysis, multimodal processing, and MLOps workflows. The stack runs on Python and Golang, with Kubeflow for pipeline orchestration.

The company positions itself as enterprise- and developer-focused, addressing the full AI lifecycle from unstructured data ingestion through production monitoring. Forrester recognized Clarifai as a leader in its Computer Vision report. The platform's scope spans model training, inference orchestration, and operational deployment across cloud and edge environments, serving use cases in e-commerce, manufacturing, semiconductors, creative software, media, and hospitality verticals.

2 jobs
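The Clarifai entry lists ONNX among its supported formats and Local Runners for edge deployment. As a generic sketch of local ONNX inference with onnxruntime (not Clarifai's Local Runner API; the model file and input shape are placeholders):

# Generic local ONNX inference with onnxruntime. "model.onnx" and the
# input shape are placeholders; a real model defines its own I/O contract.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx",
                               providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name              # discover the input name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # e.g. one RGB image
outputs = session.run(None, {input_name: x})           # None = fetch all outputs
print(outputs[0].shape)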