Etched, founded in 2022, designs transformer-specific ASICs with a hard architectural bet: transformers are the dominant and durable abstraction for AI workloads, so the right move is to burn that assumption into silicon rather than preserve generality. Their first chip, Sohu, is a single-architecture ASIC built exclusively for transformer inference - it can run different transformer models, but nothing else. The throughput numbers are significant - Etched claims over 500,000 tokens per second on Llama 70B and an order-of-magnitude improvement in both throughput and latency relative to NVIDIA's B200. The trade-off is explicit: Sohu cannot run non-transformer workloads, and the entire value proposition collapses if the architectural assumption does.
The performance claims, if they hold under production conditions, have direct implications for workloads where GPUs currently hit hard limits. Etched points to two in particular: real-time video generation models, where per-frame latency budgets are tight and sustained throughput requirements are high, and deep chain-of-thought reasoning agents, where long output sequences and large batch sizes stress both memory bandwidth and end-to-end latency. Whether the claimed gains survive real deployment - across varied sequence lengths, batch sizes, quantization schemes, and serving topologies - is the evaluation question that matters most for operators considering adoption.
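One rough way to frame that evaluation question is the memory-bandwidth-bound decode ceiling: in autoregressive decoding, throughput is often limited by how fast model weights can be streamed from memory, and large batches amortize each weight read across many sequences. The sketch below is a minimal back-of-envelope model under assumed illustrative numbers (8-bit weights, 8 TB/s aggregate bandwidth) - none of these are Etched's or NVIDIA's published specs, and it ignores KV-cache traffic, which grows with sequence length and batch size.

```python
# Back-of-envelope ceiling for bandwidth-bound transformer decoding.
# All constants are illustrative assumptions, not vendor specs.

PARAMS = 70e9          # parameter count for a Llama-70B-class model
BYTES_PER_PARAM = 1.0  # assume 8-bit (FP8/INT8) weights
MEM_BW = 8e12          # assumed aggregate memory bandwidth, bytes/s

# In the bandwidth-bound regime, each decode step streams the full weight
# set once, and every sequence in the batch gets one token from that pass.
weights_bytes = PARAMS * BYTES_PER_PARAM
tokens_per_sec_single = MEM_BW / weights_bytes  # one-sequence ceiling

def batched_ceiling(batch_size: int) -> float:
    """Aggregate tokens/s ceiling when `batch_size` sequences share each
    weight pass (ignores KV-cache and activation traffic)."""
    return tokens_per_sec_single * batch_size

for b in (1, 64, 512):
    print(f"batch {b:>3}: ~{batched_ceiling(b):,.0f} tokens/s ceiling")
```

Under these assumed numbers, a single sequence tops out near ~114 tokens/s, so headline figures in the hundreds of thousands of tokens per second imply very deep batching, multiple chips, or both - which is exactly why batch size and serving topology dominate any real-world comparison.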
On the infrastructure side, Etched is partnering with Rambus on memory and interface technologies, which speaks to where the bandwidth and signaling bottlenecks sit in a transformer-optimized design. The company has raised $120 million, with a reported valuation of $5 billion. Founders Gavin Uberti, Chris Zhu, and Robert Wachen lead the US-based company.