
Similar companies


OpenAI

OpenAI develops and deploys generative transformer models at scale, operating production systems that serve millions through ChatGPT and the OpenAI API. The technical challenge spans the full stack: research engineering for novel model architectures, safety engineering for alignment and robustness, and production infrastructure for API deployment at scale. Teams work across research, product engineering, and operations, with work organized around both advancing model capabilities and maintaining reliability for deployed systems serving substantial user traffic. The core technical domains include model development for the GPT series, API infrastructure to support downstream applications, and safety research focused on making AGI beneficial. Engineering work involves trade-offs between model capability, inference cost, latency characteristics, and safety constraints. Research teams collaborate with product and engineering functions to move from experimental systems to production deployment, which requires expertise in distributed systems, model optimization, and large-scale operations. The company operates from San Francisco with international presence, positioning work as a global effort toward artificial general intelligence. Cross-functional teams include researchers, engineers, and operations staff working on problems ranging from foundational research to production reliability. The technical culture emphasizes rigorous safety practices alongside advancement of capabilities, with autonomy and ownership distributed across teams working on distinct components of the research-to-deployment pipeline.

741 jobs

Mistral AI

Mistral AI is a French AI company founded in April 2023 by Arthur Mensch, Guillaume Lample, and Timothée Lacroix - researchers with prior affiliations at Google DeepMind and Meta and academic roots at École Polytechnique. The company develops and releases open-weight, state-of-the-art generative AI models positioned as alternatives to proprietary solutions, with a focus on democratizing access to frontier AI technology. Their core approach centers on open, transparent model development that enables developers, enterprises, and institutions to build applications while maintaining control over their data and deployments. The company's primary product line consists of open-weight generative AI models released publicly, which Mistral claims rival proprietary solutions in capability. Their technical domains span generative AI model training, with particular emphasis on open-weight architectures, AI transparency, and bias mitigation. The founding mission explicitly opposes what the company characterizes as emerging opacity and centralization in AI systems, positioning their open-weight approach as a structural alternative to closed, proprietary models. Mistral AI's operational model emphasizes community-backed development and targets a broad user base spanning individual developers, enterprise deployments, and institutional applications across global markets. The company's cultural positioning centers on maintaining user control over inference infrastructure and data pipelines, combating censorship in model outputs, and providing an alternative to concentrated control of frontier AI capabilities. While specific scale metrics around model performance, deployment volumes, or operational characteristics are not publicly detailed, the company claims to have achieved state-of-the-art results in their released model family.

212 jobs

LangChain

LangChain operates an engineering platform and open source frameworks for building, testing, and deploying AI agents. The core offering comprises LangChain and LangGraph - open source frameworks providing pre-built architectures and access to 1,000+ integrations - alongside LangSmith, a commercial platform for observability, evaluation, and deployment of LLM systems. The frameworks see over 90 million combined downloads per month and are used by millions of developers worldwide, with named deployments at Replit, Clay, Cloudflare, Harvey, Rippling, Vanta, Workday, LinkedIn, and Coinbase. The technical stack addresses the production bottlenecks of agent engineering: reliability through comprehensive observability, evaluation tooling to surface failure modes before deployment, and deployment infrastructure to move from prototype to production. LangSmith's platform provides the operational layer for teams moving LLM systems into production environments, while the open source frameworks prioritize development velocity through pre-built components and extensive integration coverage. The architecture allows granular control over agent behavior while reducing the complexity of integrating external services and managing LLM system reliability at scale. LangChain serves both major enterprises and startups building AI agents, with technical domains spanning agent engineering, LLM systems, observability, evaluation, and developer tooling. The company is led by CEO Harrison Chase and maintains a US headquarters, with a worldwide developer base. The dual model of open source frameworks and commercial platform reflects a focus on production-readiness and operational support for teams deploying agents at scale.

114 jobs

Cohere

Cohere builds enterprise-focused foundational models designed for production deployment with emphasis on security, privacy, and operational trust. Founded in 2019 in Toronto, the company has raised nearly $1 billion and scaled to hundreds of employees worldwide. The technical focus spans semantic search, content generation, and customer experience applications - domains where model reliability and data governance are non-negotiable constraints for enterprise adoption. The company's architecture decisions reflect production realities over research novelty. Models are architected for deployment into regulated environments where data residency, access controls, and audit trails matter as much as accuracy metrics. This positioning addresses the gap between frontier model capabilities and enterprise operational requirements: latency SLAs, cost predictability, and compliance frameworks that prevent many organizations from operationalizing public AI APIs. Cohere Labs has published over 100 papers and built a research community of 4,500+ researchers, signaling ongoing investment in foundational work rather than pure application-layer focus. The team composition skews heavily toward researchers and engineers from academic backgrounds, which maps to the technical challenge space - building models that balance performance, safety constraints, and deployment flexibility across varied enterprise infrastructure.

106 jobs

Baseten

Baseten builds AI infrastructure for production deployment and scaling of models, with work ranging from kernel-level optimization for inference performance to developer tooling. The platform ships daily, measuring success by real-world impact of AI products running on it rather than vanity metrics. Engineers embed directly with customers to surface operational bottlenecks, then optimize obsessively - work ranges from TensorRT-LLM and CUDA kernel tuning to building developer tools that reduce deployment friction. The stack centers on inference at scale: TensorRT-LLM and PyTorch for model execution, NVIDIA Triton Inference Server for serving, Kubernetes (EKS) with Karpenter for autoscaling, and Knative for event-driven workloads on AWS EC2. Infrastructure decisions prioritize shipping velocity over process - small teams with real ownership iterate rapidly on production reliability, latency (including tail behavior), and cost efficiency. Docker containerization and PostgreSQL round out core operational dependencies. The team is internationally distributed, composed of engineers and designers who take craft seriously without performative posturing. Customer-embedded engineering informs both platform architecture and developer experience tradeoffs, creating tight feedback loops between deployment reality and infrastructure evolution. From founding, the approach has centered on hands-on problem solving and rapid iteration rather than abstraction layers that delay production learning.

69 jobs

Runpod

Runpod operates an end-to-end AI infrastructure platform focused on GPU compute provisioning for model training, inference, and distributed agent orchestration. The platform serves over 500,000 developers, spanning solo practitioners to enterprise teams deploying at scale. Core infrastructure handles compute allocation, orchestration complexity, and operational overhead, so users do not need deep systems expertise to use it. The technical stack centers on Go, Python, and TypeScript with containerization through Docker and Kubernetes orchestration on Linux. Engineering domains span distributed systems, GPU compute scheduling, and developer tooling designed to abstract provisioning and scaling mechanics. The company emphasizes reducing operational friction: developers interact with compute resources without managing underlying cluster complexity or infrastructure provisioning bottlenecks. Runpod maintains a remote-first structure with team distribution across the U.S., Canada, Europe, and India. The platform's design reflects a systems-first approach to making GPU compute economically viable and operationally manageable - targeting workloads where cost, reliability, and time-to-deployment constrain AI development cycles.

26 jobs