About

Toma operates a voice AI platform for automotive dealerships, processing over 1,000,000 calls since launching in 2024. The system handles inbound phone operations - service scheduling, call routing, and follow-up automation - with safeguards designed to manage transfer latency and revenue leakage. The core technical challenge is maintaining conversational quality and intent-detection accuracy across high-variance dealership scenarios (service appointments, parts inquiries, sales handoffs) while minimizing false transfers and dropped context. The platform implements transfer triggers, clawback mechanisms for mistimed handoffs, and follow-up alerts when human staff don't complete actions, addressing the operational complexity of human-AI transition points in production telephony.
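The safeguard flow described above can be sketched as two small predicates: a transfer trigger that fires when intent confidence drops below a threshold, and a follow-up alert that fires when staff haven't completed a handed-off action by a deadline. This is an illustrative sketch, not Toma's implementation; all type names, thresholds, and deadlines are assumptions.

```typescript
// Illustrative safeguard sketch (names and thresholds are assumptions).
type CallState = {
  intentConfidence: number; // 0..1 from the intent classifier
  transferred: boolean;     // call already handed to a human?
  handoffAt?: number;       // epoch ms of the handoff, if any
  actionCompleted: boolean; // did staff complete the follow-up action?
};

const TRANSFER_THRESHOLD = 0.6;            // assumed: escalate below this
const FOLLOW_UP_DEADLINE_MS = 15 * 60_000; // assumed: alert after 15 min

// Transfer trigger: escalate when the classifier is unsure.
function shouldTransfer(state: CallState): boolean {
  return !state.transferred && state.intentConfidence < TRANSFER_THRESHOLD;
}

// Follow-up alert: the handoff happened but the action stalled.
function needsFollowUpAlert(state: CallState, now: number): boolean {
  return (
    state.transferred &&
    !state.actionCompleted &&
    state.handoffAt !== undefined &&
    now - state.handoffAt > FOLLOW_UP_DEADLINE_MS
  );
}
```

In production these predicates would run against persisted call state; the clawback mechanism mentioned above would be the inverse path, returning a mistimed handoff to the AI.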

Infrastructure runs on AWS with a TypeScript/Next.js frontend, PostgreSQL via Prisma for state management, and tRPC for type-safe API boundaries. The voice AI layer must handle real-time constraints - low-latency speech recognition and synthesis, sub-second intent classification - while managing concurrent call volume and dealership-specific context (inventory, scheduling systems, staff availability). Trade-offs center on model selection for conversational understanding versus inference cost at scale, and the reliability surface area of integrating with legacy dealership management systems. Founded by engineers from Scale AI, Uber, Lyft, and Amazon; backed by Andreessen Horowitz and Y Combinator with $17 million Series A funding.
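The type-safe API boundary that tRPC provides can be illustrated with a dependency-free sketch: input and output types are shared between handler and caller, so schema drift becomes a compile-time error. This is a hand-rolled imitation of the pattern, not tRPC's actual API (real tRPC uses routers and resolvers from @trpc/server, and persistence would go through Prisma); the procedure name and fields are hypothetical.

```typescript
// Hand-rolled sketch of a type-safe procedure boundary (hypothetical names).
// Real tRPC derives a typed client from router definitions like this one.
type ScheduleInput = { vin: string; serviceCode: string; slotIso: string };
type ScheduleOutput = { appointmentId: string; confirmed: boolean };

// The handler and any caller share ScheduleInput/ScheduleOutput, so a field
// rename breaks the build rather than a production call.
const scheduleService = (input: ScheduleInput): ScheduleOutput => {
  // A real handler would persist via Prisma (e.g. prisma.appointment.create).
  return {
    appointmentId: `appt-${input.vin}-${input.slotIso}`,
    confirmed: true,
  };
};
```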

Deployment spans dealerships across the United States, including Pohanka Automotive Group, SCHOMP, Hudson Automotive Group, and Bergey's. Primary bottlenecks likely involve tuning voice models for domain-specific terminology (vehicle makes, service codes, dealership jargon), managing tail latency in transfer decisions where milliseconds impact customer experience, and evaluating conversational success beyond simple call completion - did the AI correctly capture appointment details, route urgency appropriately, preserve customer satisfaction? The system's value proposition hinges on converting missed calls and staff bottlenecks into captured revenue, which requires high precision on intent classification and low false-negative rates on transfer triggers to avoid revenue loss from mishandled interactions.
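The precision/false-negative trade-off on transfer triggers can be made concrete with a threshold sweep over labeled calls: require a minimum recall (few missed transfers), then pick the threshold that maximizes precision subject to that constraint. This is an illustrative sketch under assumed types and labels, not Toma's evaluation method.

```typescript
// Illustrative threshold selection (all names are assumptions):
// transfer fires when score >= threshold; "needsHuman" is the ground truth.
type LabeledCall = { score: number; needsHuman: boolean };

function pickThreshold(calls: LabeledCall[], minRecall: number): number {
  let best = 0;
  let bestPrecision = -1;
  // Candidate thresholds: the observed scores themselves.
  for (const t of calls.map((c) => c.score)) {
    let tp = 0, fp = 0, fn = 0;
    for (const c of calls) {
      const fired = c.score >= t;
      if (fired && c.needsHuman) tp++;
      else if (fired && !c.needsHuman) fp++;
      else if (!fired && c.needsHuman) fn++;
    }
    const recall = tp / (tp + fn || 1);
    const precision = tp / (tp + fp || 1);
    // Keep false negatives low first, then maximize precision.
    if (recall >= minRecall && precision > bestPrecision) {
      bestPrecision = precision;
      best = t;
    }
  }
  return best;
}
```

Sweeping thresholds this way encodes the asymmetry the paragraph describes: a missed transfer (false negative) loses revenue, so the recall floor is fixed before optimizing anything else.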

Open roles at Toma

Explore 1 open position at Toma and find your next opportunity.

Ex-Founder (Non-Technical)

Toma

San Francisco, California, United States (On-site)

$100K – $200K Yearly · 3mo ago

Similar companies

EliseAI

EliseAI builds a unified conversational AI platform for property management and healthcare operations, automating workflows that span leasing tours, maintenance requests, patient scheduling, and intake forms. Founded in 2017, the company serves over 600 property owners and healthcare operators managing 5 million+ units, having raised $360 million in funding. The engineering organization ships 175+ new features per year, reflecting a rapid iteration cycle informed by frontline user feedback. The platform consolidates functionality that would otherwise require multiple point solutions, addressing operational bottlenecks in high-volume, repetitive administrative tasks. In property management, this includes conversational AI for leasing tour coordination and maintenance request handling. In healthcare, the system automates patient scheduling and intake form collection. The technical approach centers on a single platform architecture rather than a collection of disconnected tools, with production deployment at scale across both industry verticals. The company's engineering culture emphasizes shipping velocity and product development driven by operational constraints observed in production environments. The 175+ annual feature releases suggest continuous deployment practices and tight feedback loops between product iteration and user-facing workflows. Development priorities appear structured around reducing latency in administrative operations and improving throughput for organizations managing thousands of concurrent interactions across property portfolios or patient populations.

113 jobs
Cartesia

Cartesia builds real-time multimodal AI models for voice applications, with production systems spanning text-to-speech and speech-to-text. The company emerged from Stanford's AI Lab, where the founding team - led by CEO Karan Goel - pioneered work on State Space Models (SSMs) before transitioning to commercial infrastructure. Their technical approach combines model innovation with systems engineering, focusing on the latency, throughput, and operational constraints that define production voice AI. The core product line includes Sonic, a text-to-speech model designed for emotive, human-like output, and Ink, a recently launched speech-to-text system purpose-built for real-time voice applications. Both systems address the fundamental trade-offs in voice AI: achieving low-latency inference while maintaining quality at scale. The company's technical domains span foundation model development, real-time multimodal intelligence, and developer tooling - infrastructure that runs where users are rather than requiring server-side processing. Cartesia's engineering stack runs on Python, Go, and TypeScript, supporting developers building voice interfaces that demand sub-second response times and reliable performance under production load. The team's research background in SSMs informs their approach to model efficiency and scalability, though the company now focuses on shipping production systems rather than pure research. Their stated mission centers on ubiquitous, interactive intelligence - systems that handle the operational complexity of real-time voice while remaining accessible to developers building conversational interfaces.

30 jobs
Mirelo AI

Mirelo AI builds foundation models for generating synchronized audio for video content, targeting the latency and quality bottleneck in audio-for-video workflows. Founded in 2023 in Berlin, the company raised $41 million in seed funding co-led by Index Ventures and Andreessen Horowitz. Their models generate synchronized sound effects in seconds rather than the hours typically required for manual sound design, addressing production throughput constraints across gaming, film, social media, and broader visual content verticals. The technical stack centers on PyTorch with transformer architectures, optimized for H100 and H200 GPUs using Nsight profiling and SLURM for cluster orchestration. The team sources from Google Brain, Amazon, Meta FAIR, Disney, ETH Zürich, and Max Planck Institutes, combining AI research depth with domain expertise from musicians and product specialists. Co-founder and CEO CJ Simon-Gabriel previously worked at AWS Labs, where the founding team originated. The core technical challenge is tight audio-visual synchronization at generation time - a constraint that spans model architecture design, latency optimization, and evaluation methodology. Production systems must handle variable-length video inputs while maintaining temporal coherence across generated audio, requiring careful trade-offs between generation speed, output quality, and computational cost. The company positions its models as infrastructure for visual content pipelines, treating audio generation as a systems problem rather than a standalone creative tool.

8 jobs
Reka

Reka builds unified multimodal foundation models that process text, images, video, and audio. The company's core technical focus is modeling the physical world through systems that handle perception, reasoning, and action across modalities. The team includes researchers and engineers from Google DeepMind and Facebook AI Research working on inference-critical domains including GPU performance engineering, computer vision, audio processing, and natural language understanding. The technical stack centers on Python, PyTorch, and JAX for model development, with CUDA and C++ for performance-critical components. Infrastructure runs on Kubernetes and Slurm for orchestration and job scheduling. Engineering roles emphasize end-to-end ownership - individuals work across the stack from model architecture through deployment, addressing bottlenecks in latency, throughput, and operational complexity at production scale. Reka operates remote-first, aggregating global talent into a distributed systems organization. The work targets enterprise and organizational deployments where multimodal capabilities must meet reliability and cost constraints. Team structure reflects early-stage dynamics: engineers wear multiple hats, and technical decisions directly shape product capabilities and production characteristics.

3 jobs