Perplexity, founded in 2022, operates an AI-powered answer engine that processes over 150 million questions weekly across web, mobile, and enterprise platforms. The system combines real-time web search with multiple LLMs to produce answers that cite their sources. The architecture serves both consumer and enterprise workloads. Enterprise deployments require security guarantees for knowledge-worker use cases, such as legal research through partnerships with organizations like Latham & Watkins.
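Source attribution can be illustrated with a small rendering step. Each answer sentence carries the URLs that support it, sources are numbered in order of first citation, and citations pointing at documents that were never retrieved are dropped. This is a minimal sketch that assumes a hypothetical `Source`/`Claim` structure and a hypothetical `render_with_citations` helper. It is not Perplexity's actual citation pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """A retrieved web document (hypothetical structure for illustration)."""
    url: str
    title: str


@dataclass(frozen=True)
class Claim:
    """One answer sentence plus the URLs of the sources that support it."""
    text: str
    support: tuple[str, ...]


def render_with_citations(claims: list[Claim], sources: list[Source]) -> str:
    """Append [n] markers to each sentence and list sources in first-cited order."""
    by_url = {s.url: s for s in sources}
    numbering: dict[str, int] = {}
    sentences = []
    for claim in claims:
        markers = []
        # dict.fromkeys removes duplicate URLs while preserving their order.
        for url in dict.fromkeys(claim.support):
            if url not in by_url:
                # Drop citations that point at documents we never retrieved.
                continue
            if url not in numbering:
                numbering[url] = len(numbering) + 1
            markers.append(f"[{numbering[url]}]")
        sentences.append(claim.text + "".join(markers))

    lines = [" ".join(sentences), "", "Sources:"]
    for url, n in sorted(numbering.items(), key=lambda kv: kv[1]):
        lines.append(f"[{n}] {by_url[url].title} - {url}")
    return "\n".join(lines)
```

Numbering by first citation keeps the source list in the order a reader first meets each reference, and filtering unknown URLs is one simple guard against citing something the model invented.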
The technical stack runs on AWS, with Terraform for provisioning, Python and Go for backend services, and PyTorch with DeepSpeed and FSDP for model training and inference. Data pipelines use dbt, SQL, Snowflake, and Databricks. The frontend is built with React and TypeScript; services are containerized with Docker, and Open Policy Agent handles access control. The architecture must meet tail-latency and throughput targets for real-time search retrieval paired with LLM inference at consumer scale, while keeping source-credibility verification on the critical path.
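One common way to bound tail latency when retrieval fans out to several backends is a per-request deadline: query all backends concurrently, keep whatever returns in time, and cancel the stragglers so that one slow index cannot stall LLM generation. The sketch below assumes hypothetical async backend functions that return lists of document snippets. The `fan_out_with_deadline` name and the "drop late results" policy are illustrative choices, not a documented Perplexity design.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical backend signature: takes a query, returns document snippets.
Backend = Callable[[str], Awaitable[list[str]]]


async def fan_out_with_deadline(
    backends: dict[str, Backend], query: str, deadline_s: float
) -> dict[str, list[str]]:
    """Query all backends concurrently and keep only results that finish before the deadline."""
    if not backends:
        return {}  # asyncio.wait rejects an empty set of tasks
    tasks = {asyncio.create_task(fn(query)): name for name, fn in backends.items()}
    done, pending = await asyncio.wait(tasks, timeout=deadline_s)

    # Cancel stragglers and wait for the cancellations so no task outlives the request.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    results: dict[str, list[str]] = {}
    for task in done:
        # Skip backends that raised; a partial answer beats no answer.
        if not task.cancelled() and task.exception() is None:
            results[tasks[task]] = task.result()
    return results
```

The trade-off is recall for latency: a backend that misses the deadline contributes nothing to that answer, so in practice the deadline would likely be tuned against observed p99 latencies for each backend.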
Engineering priorities center on information retrieval accuracy, response quality, and citation reliability rather than advertising optimization. Production systems must balance inference cost against answer quality across multiple models, keep retrieval latency low for real-time web indexing, and stay reliable for both free-tier consumer traffic and enterprise SLAs. The Pro tier suggests monetization through capacity limits or access to premium models rather than purely ad-based revenue.
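Balancing inference cost against answer quality can be framed as routing: estimate each candidate model's cost for a request, filter by the user's tier budget, and pick the highest-quality model that fits. The catalog, prices, quality scores, and tier budgets below are invented placeholders, and `route` is a hypothetical helper. The sketch only illustrates how tier-based model selection could interact with cost, not how Perplexity actually routes requests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    cost_per_1k_tokens: float  # placeholder units, not real pricing
    quality: float             # hypothetical offline eval score in [0, 1]


# Hypothetical catalog and per-request tier budgets; all values are illustrative.
CATALOG = [
    ModelOption("small", 0.2, 0.70),
    ModelOption("medium", 1.0, 0.82),
    ModelOption("large", 5.0, 0.90),
]
TIER_BUDGET = {"free": 0.5, "pro": 6.0, "enterprise": 6.0}


def route(tier: str, expected_tokens: int, min_quality: float = 0.0) -> ModelOption:
    """Pick the highest-quality model whose estimated cost fits the tier's budget."""
    budget = TIER_BUDGET[tier]
    affordable = [
        m
        for m in CATALOG
        if m.cost_per_1k_tokens * expected_tokens / 1000 <= budget
        and m.quality >= min_quality
    ]
    if not affordable:
        # Fall back to the cheapest model rather than failing the request.
        return min(CATALOG, key=lambda m: m.cost_per_1k_tokens)
    return max(affordable, key=lambda m: m.quality)
```

Falling back to the cheapest model keeps free-tier requests answerable when a long prompt exceeds the budget; an enterprise deployment might instead reject or queue such requests to honor quality commitments in its SLA.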