Gladia operates speech-to-text APIs across two distinct workloads: real-time streaming at sub-300 ms latency and asynchronous batch transcription, both supporting over 100 languages. The real-time path handles streaming audio with speaker diarization, word-level timestamps, and sentiment analysis integrated into the inference loop. The async path processes batch jobs with comparable feature coverage plus code-switching detection, which handles single utterances that span multiple languages. Over 150,000 users and 700 enterprise deployments (including VEED.IO, Circleback, and Attention) generate production traffic against these endpoints.
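Producing diarization and word-level timestamps in the same pass implies a step that attaches a speaker label to each timestamped word. A minimal sketch of one common heuristic, maximal temporal overlap, follows. It is not Gladia's documented method, and the `Word` and `SpeakerTurn` types and the `assign_speakers` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical types for illustration; not a Gladia API.
@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float


@dataclass
class SpeakerTurn:
    speaker: str
    start: float  # seconds
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: List[Word], turns: List[SpeakerTurn]) -> List[Optional[str]]:
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn (e.g. audio the diarizer skipped) get None.
    """
    labels: List[Optional[str]] = []
    for w in words:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for t in turns:
            ov = overlap(w.start, w.end, t.start, t.end)
            if ov > best_overlap:
                best_speaker, best_overlap = t.speaker, ov
        labels.append(best_speaker)
    return labels


words = [Word("hello", 0.0, 0.4), Word("there", 0.45, 0.8), Word("hi", 1.0, 1.2)]
turns = [SpeakerTurn("A", 0.0, 0.7), SpeakerTurn("B", 0.7, 1.5)]
print(assign_speakers(words, turns))  # ['A', 'A', 'B']
```

The word `there` straddles the turn boundary at 0.7 s but overlaps speaker A for longer, so it is labeled A. The nested loop is O(words × turns); a production aligner would more likely sweep both time-sorted lists in a single pass.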
The core technical challenge is keeping end-to-end latency under 300 ms on the streaming path while diarization and alignment models run alongside the primary ASR stack. Meeting this threshold at scale, across more than 100 languages with varying acoustic characteristics, requires careful management of model load times, batching strategies, and inference queue depth. The async API relaxes the latency constraint in exchange for higher throughput on longer-form audio, though specific cost-per-hour or throughput metrics are not disclosed. Code-switching adds further complexity: language detection, model routing, and boundary stitching must happen without degrading transcription accuracy or introducing alignment artifacts at switch points.
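One way to reason about the trade-off between batching and queue depth is to carve a fixed queueing budget out of the 300 ms target and flush a batch when it is either full or its oldest request has used up that budget. For example, under an assumed 200 ms of inference and 50 ms of network time, 50 ms remain for queue wait. The sketch below simulates that policy with explicit timestamps. `MicroBatcher`, `max_batch_size`, and `max_wait_ms` are hypothetical names, and the budget split is an illustrative assumption, not a figure Gladia has published.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical request type for illustration; not a Gladia API.
@dataclass
class Request:
    request_id: str
    arrival_ms: float


@dataclass
class MicroBatcher:
    """Groups requests into batches without exceeding a per-request wait budget.

    A batch is flushed when it reaches max_batch_size or when its oldest
    request has waited max_wait_ms. Both are hypothetical tuning knobs.
    """
    max_batch_size: int
    max_wait_ms: float
    pending: List[Request] = field(default_factory=list)

    def submit(self, req: Request) -> List[List[Request]]:
        """Enqueue a request and return any batches that are ready to run."""
        # Flush an overdue batch first, so the new request starts a fresh one.
        flushed = self.poll(req.arrival_ms)
        self.pending.append(req)
        if len(self.pending) >= self.max_batch_size:
            flushed.append(self._flush())
        return flushed

    def poll(self, now_ms: float) -> List[List[Request]]:
        """Flush the pending batch if its oldest request has hit the deadline."""
        if self.pending and now_ms - self.pending[0].arrival_ms >= self.max_wait_ms:
            return [self._flush()]
        return []

    def _flush(self) -> List[Request]:
        batch, self.pending = self.pending, []
        return batch


# Assumed budget: 300 ms total - 200 ms inference - 50 ms network = 50 ms of queueing.
batcher = MicroBatcher(max_batch_size=3, max_wait_ms=300 - 200 - 50)
for rid, t in [("a", 0), ("b", 10), ("c", 20)]:
    ready = batcher.submit(Request(rid, t))
print([[r.request_id for r in batch] for batch in ready])  # [['a', 'b', 'c']]
```

In a live server, `poll` would be driven by a timer so that a lightly loaded queue still flushes on time. A larger `max_batch_size` improves accelerator utilization, while a smaller `max_wait_ms` protects tail latency.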
Founded in 2022, the company raised a $16 million Series A from Sequoia Capital, XAnge, and New Wave. Founders Jean-Louis Quéguiner and Jonathan Soto positioned the service as audio infrastructure for voice-first platforms rather than a narrow transcription tool. The engineering focus is on reliability and operational predictability across multilingual inference workloads: handling acoustic variability, speaker overlap, background noise, and model version rollouts without service degradation. Production deployment at this user scale surfaces edge cases in language detection, diarization boundary errors, and latency tail behavior, and these determine the system's real-world robustness in ways that benchmark WER (word error rate) figures alone do not capture.