d-Matrix designs purpose-built silicon for generative AI inference using a digital in-memory compute architecture. Founded in 2019, the company approaches inference workloads from first principles rather than adapting GPU architectures, targeting the core bottleneck of data movement between memory and processors. Its Corsair platform addresses the latency, throughput, and energy constraints specific to running LLMs and generative models at production scale.
The technical stack spans silicon design (SystemVerilog, UVM), systems engineering (PCIe, RISC-V, FPGA), and software infrastructure (MLIR, PyTorch, TensorFlow, ONNX Runtime, TensorRT). With over 200 engineers, the company operates at the intersection of hardware architecture, compiler development, and inference runtime optimization. The focus is making generative AI commercially viable beyond hyperscale deployments by reducing both operational cost and energy consumption per token through architectural changes rather than incremental improvements.
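To make "cost and energy per token" concrete, the sketch below shows one way to convert throughput and power draw into per-token metrics. All figures and the function name are hypothetical placeholders for illustration, not d-Matrix measurements or claims.

```python
# Back-of-the-envelope per-token energy and electricity cost.
# All numbers are hypothetical placeholders, not measured figures.

def per_token_metrics(tokens_per_second: float,
                      board_power_watts: float,
                      electricity_cost_per_kwh: float) -> dict:
    """Convert sustained throughput and power draw into per-token metrics."""
    joules_per_token = board_power_watts / tokens_per_second        # watts = joules/second
    kwh_per_million_tokens = joules_per_token * 1e6 / 3.6e6         # 1 kWh = 3.6 MJ
    cost_per_million_tokens = kwh_per_million_tokens * electricity_cost_per_kwh
    return {
        "joules_per_token": joules_per_token,
        "kwh_per_million_tokens": kwh_per_million_tokens,
        "energy_cost_per_million_tokens_usd": cost_per_million_tokens,
    }

if __name__ == "__main__":
    # Hypothetical accelerator: 2,000 tokens/s at 600 W, $0.10/kWh electricity.
    print(per_token_metrics(tokens_per_second=2_000,
                            board_power_watts=600,
                            electricity_cost_per_kwh=0.10))
```

Under these assumed numbers the energy cost works out to roughly 0.3 J per token, or well under a cent of electricity per million tokens; the point is only that the metric scales directly with tokens-per-second-per-watt, which is what an architectural change targets.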
d-Matrix's approach centers on co-designing compute, memory hierarchy, and software to eliminate traditional bottlenecks in inference workloads. The team works on problems ranging from physical silicon verification through compiler transformations to inference serving infrastructure. The claims of ultra-low latency and high throughput rest on in-memory compute reducing the off-chip memory accesses that dominate inference cost in conventional architectures.
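A rough roofline-style estimate illustrates why off-chip memory access dominates in conventional architectures: in batch-1 autoregressive decode, every weight is read once per generated token, so weight movement rather than arithmetic sets the floor on latency. The hardware and model numbers below are illustrative assumptions, not d-Matrix or vendor specifications.

```python
# Roofline-style estimate of why single-stream LLM decode is typically
# memory-bandwidth-bound on a conventional accelerator.
# All hardware and model parameters are illustrative assumptions.

def decode_step_bound(params_billion: float,
                      bytes_per_param: float,
                      offchip_bandwidth_gb_s: float,
                      peak_tflops: float) -> dict:
    """Estimate per-token latency limits for batch-1 autoregressive decode."""
    params = params_billion * 1e9
    weight_bytes = params * bytes_per_param    # every weight read once per token
    flops = 2 * params                         # ~2 FLOPs per weight (multiply-accumulate)

    memory_time_ms = weight_bytes / (offchip_bandwidth_gb_s * 1e9) * 1e3
    compute_time_ms = flops / (peak_tflops * 1e12) * 1e3
    return {
        "memory_bound_ms_per_token": memory_time_ms,
        "compute_bound_ms_per_token": compute_time_ms,
        "memory_to_compute_ratio": memory_time_ms / compute_time_ms,
    }

if __name__ == "__main__":
    # Hypothetical 70B-parameter model with 8-bit weights on a device with
    # 3 TB/s of off-chip bandwidth and 400 TFLOPS of low-precision compute.
    print(decode_step_bound(params_billion=70,
                            bytes_per_param=1.0,
                            offchip_bandwidth_gb_s=3000,
                            peak_tflops=400))
```

With these assumed figures the memory-bound time per token is tens of milliseconds while the compute-bound time is a fraction of a millisecond, which is the gap that keeping weights in or near the compute fabric is meant to close.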