Sesame

About

Sesame builds voice interfaces through tight integration of hardware, software, and machine learning, pursuing research in speech generation, personality modeling, and multimodal ML. The company operates large GPU clusters to support ambitious research programs aimed at making computers lifelike through natural voice interaction, with development cycles measured in days rather than quarters. The company is backed by a16z, Sequoia, Spark, and Matrix; its technical effort spans PyTorch-based model development alongside Android and iOS deployment, with infrastructure supporting rapid iteration from whiteboard concept to production system.

The engineering organization is an interdisciplinary team of long-tenured experts with backgrounds in machine learning, hardware, software, and entertainment, operating from offices in San Francisco, Bellevue, and New York. Core technical domains include speech generation systems, personality modeling for voice companions, and multimodal ML architectures that coordinate audio and other sensory inputs. The product strategy emphasizes deliberate design choices to create voice interfaces that are nuanced and intimate rather than intrusive, with hardware engineering targeting lightweight eyewear form factors for all-day wear.

Infrastructure and operational requirements center on GPU cluster management to support training and inference for speech models, alongside mobile platform engineering for real-time voice processing. The technical challenge is crossing the uncanny valley in voice interaction: achieving low latency, naturalness, and contextual appropriateness simultaneously across diverse usage scenarios. Team composition reflects this: specialists in human-computer interaction work alongside ML researchers and hardware engineers to optimize the full stack from acoustic modeling through industrial design.
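
The latency half of that challenge is easiest to see as a budget. The sketch below is a back-of-the-envelope illustration only: every stage figure is an assumption, not a Sesame number, with the commonly cited 200-300 ms human conversational turn gap as the target.

```python
# Illustrative latency budget for a voice pipeline (ASR -> LLM -> TTS).
# All stage numbers are assumptions for illustration.
TARGET_MS = 300  # rough upper end of a natural conversational turn gap

budget_ms = {
    "audio capture + VAD endpointing": 60,   # assumed
    "streaming ASR final hypothesis": 80,    # assumed
    "LLM time-to-first-token": 100,          # assumed
    "TTS time-to-first-audio": 50,           # assumed
}

total = sum(budget_ms.values())
for stage, ms in budget_ms.items():
    print(f"{stage:34s} {ms:4d} ms")
status = "within" if total <= TARGET_MS else "over"
print(f"{'total (first audible response)':34s} {total:4d} ms ({status} the {TARGET_MS} ms target)")
```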

Open roles at Sesame

Explore 23 open positions at Sesame and find your next opportunity.

Test Engineer, Manufacturing

Sesame

San Francisco, California, United States (On-site)

$175K – $280K yearly · 3mo ago

Embedded OS Architect

Sesame

San Francisco, California, United States (On-site)

$190K – $320K yearly · 3mo ago

ML Model Serving Engineer

Sesame

San Francisco, California, United States (On-site)

$175K – $280K yearly · 3mo ago

Similar companies

OpenAI

OpenAI develops and deploys generative transformer models at scale, operating production systems that serve millions of users through ChatGPT and the OpenAI API. The technical challenge spans the full stack: research engineering for novel model architectures, safety engineering for alignment and robustness, and production infrastructure for API deployment at scale. Teams work across research, product engineering, and operations, organized around both advancing model capabilities and maintaining reliability for deployed systems serving substantial user traffic.

Core technical domains include model development for the GPT series, API infrastructure to support downstream applications, and safety research focused on making AGI beneficial. Engineering work involves trade-offs between model capability, inference cost, latency characteristics, and safety constraints. Research teams collaborate with product and engineering functions to move experimental systems to production deployment, requiring expertise in distributed systems, model optimization, and managing operational complexity at scale.

The company operates from San Francisco with international presence, positioning the work as a global effort toward artificial general intelligence. Cross-functional teams of researchers, engineers, and operations staff work on problems ranging from foundational research to production reliability. The technical culture emphasizes rigorous safety practices alongside capability advancement, with autonomy and ownership distributed across teams working on distinct components of the research-to-deployment pipeline.
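
As a concrete anchor for what "API infrastructure to support downstream applications" means in practice, here is a minimal request through OpenAI's official Python SDK (v1-style client); the model name is an illustrative choice, not a recommendation.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A single chat completion request, the basic downstream-application call.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": "Summarize RLHF in one sentence."}],
)
print(response.choices[0].message.content)
```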

741 jobs

Anthropic

Anthropic is an AI safety and research company founded in 2021 by seven former OpenAI employees, now operating as a Public Benefit Corporation with approximately 3,000 employees. The company develops the Claude family of large language models and associated AI assistant implementations, with a technical mandate centered on reliability, interpretability, and steerability. Under CEO Dario Amodei, Anthropic has reached a reported valuation of $183 billion while maintaining an explicit focus on AI systems aligned with human values and long-term societal benefit.

The core technical work spans AI safety research, interpretable AI systems, and steerable large language models. Claude, Anthropic's primary product line, is positioned as engineered for safety, accuracy, and security in production deployments. The company's research agenda prioritizes understanding failure modes and developing evaluation frameworks that account for reliability constraints in real-world inference scenarios, rather than pursuing capability benchmarks in isolation.

Anthropic's operational model combines frontier research with practical deployment considerations: balancing the latency-throughput-cost trade-offs inherent in large-scale language model serving while maintaining interpretability as a first-class constraint. The company approaches AI assistant development through the lens of alignment research, treating production systems as both products and testbeds for safety techniques. This dual mandate shapes technical priorities: understanding model behavior under distribution shift, quantifying uncertainty in high-stakes applications, and building systems where performance degradation is predictable and bounded.
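
As one concrete illustration of the steerability the description emphasizes, here is a minimal call with Anthropic's Python SDK using the system prompt to constrain behavior; the model id and prompts are illustrative placeholders.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model id
    max_tokens=256,
    # The system prompt is the basic steering mechanism exposed by the API.
    system="Answer in one sentence and flag any claim you are unsure of.",
    messages=[{"role": "user", "content": "What does interpretability mean for LLMs?"}],
)
print(message.content[0].text)
```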

683 jobs

Perplexity

Perplexity operates an AI-powered answer engine processing over 150 million questions weekly across web, mobile, and enterprise platforms. Founded in 2022, the system combines real-time web search with multiple LLMs to deliver source-attributed answers. The architecture serves both consumer and enterprise workloads, with enterprise deployments requiring security guarantees for knowledge worker use cases, including legal research partnerships with organizations like Latham & Watkins.

The technical stack runs on AWS infrastructure with Terraform for provisioning, Python and Go for backend services, and PyTorch with DeepSpeed and FSDP for model training and inference. Data pipelines use dbt, SQL, Snowflake, and Databricks. Frontend implementations use React and TypeScript, with Docker containerization and Open Policy Agent for access control. This architecture must handle tail latency and throughput requirements for real-time search retrieval paired with LLM inference at consumer scale, while maintaining source credibility verification in the critical path.

The engineering focus centers on information retrieval accuracy, model response quality, and citation reliability rather than advertising optimization. Production systems must balance inference cost against answer quality across multiple models, manage retrieval latency for real-time web indexing, and maintain reliability for both free-tier consumer traffic and enterprise SLA requirements. Pro tier monetization suggests capacity-based or model-selection tiering rather than pure ad-based revenue.
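
The retrieval-plus-inference pattern the description implies can be sketched independently of Perplexity's actual code. Everything below is a hypothetical illustration: the `Doc` type and the `web_search` and `llm_complete` stubs stand in for a real-time retrieval service and an LLM call, showing how source attribution stays in the critical path.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    title: str
    url: str
    snippet: str

def web_search(query: str, top_k: int) -> list[Doc]:
    """Hypothetical stand-in for a real-time web retrieval service."""
    return [Doc("Example", "https://example.com", "Stub snippet about " + query)][:top_k]

def llm_complete(prompt: str) -> str:
    """Hypothetical stand-in for an LLM inference call."""
    return "Stub answer citing [1]."

def answer(question: str, top_k: int = 5) -> str:
    # Retrieve live documents and keep their identities in the critical path.
    docs = web_search(question, top_k)
    context = "\n".join(
        f"[{i + 1}] {d.title} ({d.url}): {d.snippet}" for i, d in enumerate(docs)
    )
    # Constrain the model to answer only from numbered, attributable sources.
    prompt = (
        "Answer using only the numbered sources below, citing them inline "
        f"like [1].\n\nSources:\n{context}\n\nQuestion: {question}"
    )
    return llm_complete(prompt)

print(answer("Who founded Perplexity?"))
```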

76 jobs

RunPod

RunPod operates an end-to-end AI infrastructure platform focused on GPU compute provisioning for model training, inference, and distributed agent orchestration. The platform serves over 500,000 developers, from solo practitioners to enterprise teams deploying at scale. Core infrastructure handles compute allocation, orchestration complexity, and operational overhead; the platform positions itself as accessible infrastructure rather than one requiring deep systems expertise from users.

The technical stack centers on Go, Python, and TypeScript, with containerization through Docker and Kubernetes orchestration on Linux. Engineering domains span distributed systems, GPU compute scheduling, and developer tooling designed to abstract provisioning and scaling mechanics. The company emphasizes reducing operational friction: developers interact with compute resources without managing underlying cluster complexity or infrastructure provisioning bottlenecks.

RunPod maintains a remote-first structure with teams distributed across the U.S., Canada, Europe, and India. The platform's design reflects a systems-first approach to making GPU compute economically viable and operationally manageable, targeting workloads where cost, reliability, and time-to-deployment constrain AI development cycles.
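
A minimal worker in the shape RunPod's serverless Python SDK documents gives a sense of that abstraction. Treat this as a sketch assuming the `runpod` pip package and its handler contract (an event dict in, a JSON-serializable result out), not production code.

```python
import runpod

def handler(event):
    # event["input"] carries the caller's JSON payload; the platform
    # handles provisioning, scaling, and queueing around this function.
    prompt = event["input"].get("prompt", "")
    return {"output": f"echo: {prompt}"}

# Register the handler with the serverless runtime.
runpod.serverless.start({"handler": handler})
```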

26 jobs

Pinecone

Pinecone operates a fully managed vector database service designed for production AI applications requiring storage and retrieval of high-dimensional embeddings. The system handles vector search at scale across recommendation systems, semantic search, and related ML-backed services. Founded by Edo Liberty, formerly a research director at AWS with prior experience building custom vector search systems at large scale, the company is credited with establishing the vector database category as a distinct infrastructure layer.

The technical stack centers on systems languages (Rust, Go, C++, and Python), with RocksDB as the storage engine and Kubernetes orchestration across AWS, GCP, and Azure. This architecture targets the operational complexity of managing embedding indices, query latency, and throughput at production scale, abstracting infrastructure decisions from engineering teams deploying AI features.

The platform serves thousands of companies, positioning itself on ease of deployment and reduced time-to-production for vector-backed applications. The founding principle emphasizes accessibility for engineering teams of varying sizes, evolving the managed-service model to minimize operational overhead in running vector workloads. Core focus areas include retrieval performance, reliability under production load, and the cost-efficiency trade-offs inherent to high-dimensional search systems.
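
The basic embedding workflow the description refers to looks roughly like this with Pinecone's Python client (v3-style SDK); the index name, vector dimension, and values are placeholders, and the sketch assumes an index already exists.

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")  # placeholder key
index = pc.Index("example-index")      # assumes an existing index

# Store high-dimensional embeddings with metadata for later attribution.
index.upsert(vectors=[
    {"id": "doc-1", "values": [0.1] * 1536, "metadata": {"source": "docs"}},
])

# Approximate nearest-neighbor retrieval over the stored vectors.
results = index.query(vector=[0.1] * 1536, top_k=3, include_metadata=True)
for match in results.matches:
    print(match.id, match.score)
```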

9 jobs