turbopuffer

turbopuffer is a serverless vector and full-text search database built on object storage. It separates compute from storage to address the trade-off among latency, cost, and throughput in production retrieval systems. The architecture uses tiered storage, with NVMe/SSD caching layered over object storage, to handle variable query patterns and burst load while avoiding the fixed cost overhead of traditional in-memory vector databases.
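To make the tiered-storage idea concrete, here is a minimal sketch of a read-through cache with least-recently-used eviction. It is an illustration only, not turbopuffer's implementation: the `TieredStore` class, its `capacity` parameter, and the dict standing in for object storage are all hypothetical.

```python
from collections import OrderedDict


class TieredStore:
    """Hypothetical read-through LRU cache (stand-in for NVMe/SSD)
    in front of a slow backing store (stand-in for object storage)."""

    def __init__(self, backing, capacity):
        self.backing = backing        # dict standing in for object storage
        self.capacity = capacity      # maximum number of cached entries
        self.cache = OrderedDict()    # key order tracks recency of use
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.backing[key]           # slow path: fetch from backing store
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return value


store = TieredStore({"a": 1, "b": 2, "c": 3}, capacity=2)
for key in ["a", "b", "a", "c", "b"]:
    store.get(key)
print(store.hits, store.misses)  # 1 4
```

With a cache of two entries, this five-read sequence hits only once: reading `c` evicts `b`, so the final read of `b` goes back to the slow tier. A working set larger than the cache pays the miss cost repeatedly.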

The system handles 3.5T+ documents, 10M+ writes/s, and 25k+ queries/s, with support for hybrid search (vector + full-text) and metadata filtering. Serverless scaling means you pay for what you use, and separating compute from storage removes the need to over-provision either one. This matters for workloads with bursty traffic or datasets that grow unpredictably, both common in AI retrieval pipelines feeding assistants and agents.

The design makes explicit trade-offs around tail latency and operational complexity. Tiered storage introduces variable access costs and potential cache-miss penalties, so it requires careful tuning for your query profile. Integrating full-text search alongside vectors reduces the need for multiple systems, but hybrid scoring and ranking add computational overhead that increases per-query latency. Metadata filtering restricts a search to matching documents instead of scanning the full corpus, which is critical for reducing query cost in gated retrieval scenarios.
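As an illustration of what hybrid scoring involves, the sketch below merges a vector ranking and a full-text ranking with reciprocal rank fusion (RRF), a widely used fusion method. The source does not say which fusion method turbopuffer uses, so treat this as an assumption; the function name, the document IDs, and the constant `k=60` are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document ID.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["d1", "d2", "d3"]   # hypothetical vector-search results
text_hits = ["d3", "d1", "d4"]     # hypothetical full-text results
fused = reciprocal_rank_fusion([vector_hits, text_hits])
print(fused)  # ['d1', 'd3', 'd2', 'd4']
```

Documents that appear in both lists (`d1`, `d3`) rise above those found by only one retriever. The extra pass over both result sets is a small example of the per-query overhead that hybrid ranking adds.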

Open roles at turbopuffer

database engineer

product engineer
