
About

Together AI operates a purpose-built GPU cloud platform for training, fine-tuning, and deploying generative AI models. The infrastructure is designed without vendor lock-in, serving developers and organizations that need to run open-source models at scale. The engineering work centers on distributed systems, model optimization, and AI infrastructure - areas where trade-offs between throughput, latency, and operational complexity define production viability.

The company maintains active contributions to open-source projects including FlashAttention, Mamba, and RedPajama. Engineers and researchers work in close proximity, with new hires taking ownership of substantial technical challenges from the start. The tech stack spans PyTorch, CUDA, TensorRT, TensorRT-LLM, vLLM, SGLang, and TGI, reflecting the requirement to support multiple inference backends and optimization paths. Work involves designing distributed inference engines and developing model architectures where performance characteristics - memory bandwidth utilization, kernel fusion opportunities, multi-GPU coordination overhead - directly impact what models can run economically in production.
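To make the memory-bandwidth point concrete: single-stream LLM decoding is typically bandwidth-bound, because each generated token must stream the full set of model weights from GPU memory, so peak bandwidth divided by model size gives a hard upper bound on tokens per second. The sketch below is a back-of-envelope estimate using hypothetical numbers (a 70B-parameter model in fp16 on a GPU with ~2000 GB/s of memory bandwidth), not figures from Together AI's platform:

```python
# Back-of-envelope decode throughput for a memory-bandwidth-bound LLM.
# At batch size 1, every generated token requires streaming all model
# weights from GPU memory, so throughput is roughly bounded by
# bandwidth / model size. All numbers below are illustrative assumptions.

def max_decode_tokens_per_sec(params_billions: float,
                              bytes_per_param: float,
                              hbm_bandwidth_gb_s: float) -> float:
    """Upper bound on single-stream decode throughput (tokens/s)."""
    model_bytes_gb = params_billions * bytes_per_param
    return hbm_bandwidth_gb_s / model_bytes_gb

# Hypothetical: 70B parameters, fp16 (2 bytes/param), ~2000 GB/s HBM.
bound = max_decode_tokens_per_sec(70, 2.0, 2000)
print(f"~{bound:.1f} tokens/s upper bound")  # ~14.3 tokens/s
```

Batching amortizes the weight reads across concurrent requests, which is exactly the throughput-versus-latency trade-off that makes serving economics an engineering problem rather than a fixed cost.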

Technical problems include optimizing inference for various model architectures across heterogeneous GPU clusters, managing the reliability and cost trade-offs in serving large language models, and building tooling that makes open-source AI accessible without sacrificing control over deployment parameters. The platform must handle the operational complexity of supporting diverse workloads: training runs with different parallelization strategies, fine-tuning jobs with varying dataset sizes, and inference deployments where tail latency matters.

Open roles at Together AI

Explore 48 open positions at Together AI and find your next opportunity.

Senior Data Engineer
San Francisco, California, United States (On-site)
$160K – $240K Yearly · 3w ago

Senior Backend Engineer, Inference Platform
San Francisco, California, United States (On-site)
$160K – $250K Yearly · 3w ago

Senior Backend Engineer - Together Cloud
Amsterdam, North Holland, Netherlands (Hybrid)
4w ago

Program Manager, Data Center Delivery
San Francisco, California, United States (Hybrid)
$170K – $210K Yearly · 1mo ago

Backend Software Engineer — Data Platform & AI Data Products
San Francisco, California, United States (On-site)
$160K – $240K Yearly · 1mo ago

Staff Engineer, Product UI Platform
San Francisco, California, United States (On-site)
$200K – $275K Yearly · 1mo ago

Staff Data Warehouse Engineer
San Francisco, California, United States (On-site)
$240K – $275K Yearly · 1mo ago

Customer Support Engineer (Inference), India
India (Remote)
1mo ago

Sales and Marketing Operations Manager
San Francisco, California, United States (Hybrid)
$150K – $170K Yearly · 1mo ago

Backend Engineer
Amsterdam, North Holland, Netherlands (Hybrid)
1mo ago

Engineering Manager, Model Serving
San Francisco, California, United States (On-site)
$250K – $300K Yearly · 1mo ago

Global Hardware Sourcing & Supply Manager
San Francisco, California, United States (Hybrid)
$150K – $170K Yearly · 1mo ago

Senior Program Manager, Infrastructure Strategy and Business Operations
San Francisco, California, United States (Hybrid)
$180K – $220K Yearly · 1mo ago

Engineering Manager / Tech Lead
Amsterdam, North Holland, Netherlands (On-site)
1mo ago

Lead Product Designer
San Francisco, California, United States (On-site)
$200K – $240K Yearly · 1mo ago

Staff Engineer, API Core Platform
San Francisco, California, United States (On-site)
$240K – $275K Yearly · 2mo ago

Machine Learning, Platform Engineer
San Francisco, California, United States (On-site)
$160K – $250K Yearly · 2mo ago

Backend Engineer - Commerce
San Francisco, California, United States (On-site)
$160K – $250K Yearly · 2mo ago

Research Engineer, Frontier Speculative Decoding
San Francisco, California, United States (On-site)
$190K – $270K Yearly · 2mo ago

Research Engineer, Core ML
San Francisco, California, United States (On-site)
$200K – $280K Yearly · 2mo ago

Similar companies

OpenAI

OpenAI develops and deploys generative transformer models at scale, operating production systems that serve millions of users through ChatGPT and the OpenAI API. The technical challenge spans the full stack: research engineering for novel model architectures, safety engineering for alignment and robustness, and production infrastructure for API deployment at scale. Teams work across research, product engineering, and operations, organized around both advancing model capabilities and maintaining reliability for deployed systems serving substantial user traffic.

Core technical domains include model development for the GPT series, API infrastructure to support downstream applications, and safety research focused on making AGI beneficial. Engineering work involves trade-offs between model capability, inference cost, latency characteristics, and safety constraints. Research teams collaborate with product and engineering functions to move experimental systems into production deployment, requiring expertise in distributed systems, model optimization, and operational complexity at scale.

The company operates from San Francisco with international presence, positioning its work as a global effort toward artificial general intelligence. Cross-functional teams of researchers, engineers, and operations staff work on problems ranging from foundational research to production reliability. The technical culture emphasizes rigorous safety practices alongside capability advancement, with autonomy and ownership distributed across teams working on distinct components of the research-to-deployment pipeline.

741 jobs

CoreWeave

CoreWeave operates specialized cloud infrastructure purpose-built for AI workloads, with data centers across the US and Europe delivering GPU compute for large language model training and inference at scale. Founded in 2017 as Atlantic Crypto, a cryptocurrency mining operation, the company executed a complete strategic pivot to AI infrastructure - rebuilding from first principles rather than retrofitting existing cloud architectures. The platform runs on Kubernetes-based orchestration designed specifically for AI workloads, coupled with custom storage solutions engineered to handle the I/O patterns and throughput requirements of model training and deployment pipelines.

The technical stack centers on NVIDIA GPUs with orchestration built in Go, Python, and C++ on Linux, instrumented with Prometheus, Grafana, and OpenTelemetry for observability across distributed systems. Rather than adapting general-purpose cloud tooling, CoreWeave's infrastructure treats GPU compute density, inter-node bandwidth, and storage parallelism as primary design constraints. This systems-level focus reflects a team drawn from infrastructure engineering and quantitative trading backgrounds - disciplines where latency budgets and resource utilization directly determine feasibility.

CoreWeave serves AI labs, enterprises, and startups requiring production-scale inference and training capacity. The company's recognition on the TIME100 most influential companies list signals market adoption of specialized AI infrastructure as distinct from traditional cloud providers. For engineers, the environment offers direct exposure to the operational realities of running GPU clusters at scale: thermal management, network topology for distributed training, failure modes in multi-tenant GPU environments, and the cost-performance trade-offs inherent in serving latency-sensitive inference workloads alongside batch training jobs.

436 jobs

Mistral AI

Mistral AI is a French AI company founded in April 2023 by Arthur Mensch, Guillaume Lample, and Timothée Lacroix - researchers with prior affiliations at Google DeepMind and Meta and academic roots at École Polytechnique. The company develops and releases open-weight, state-of-the-art generative AI models positioned as alternatives to proprietary solutions, with a focus on democratizing access to frontier AI technology. Their core approach centers on open, transparent model development that enables developers, enterprises, and institutions to build applications while maintaining control over their data and deployments.

The company's primary product line consists of open-weight generative AI models released publicly, which Mistral claims rival proprietary solutions in capability. Their technical domains span generative AI model training, with particular emphasis on open-weight architectures, AI transparency, and bias mitigation. The founding mission explicitly opposes what the company characterizes as emerging opacity and centralization in AI systems, positioning their open-weight approach as a structural alternative to closed, proprietary models.

Mistral AI's operational model emphasizes community-backed development and targets a broad user base spanning individual developers, enterprise deployments, and institutional applications across global markets. The company's cultural positioning centers on maintaining user control over inference infrastructure and data pipelines, combating censorship in model outputs, and providing an alternative to concentrated control of frontier AI capabilities. While specific scale metrics around model performance, deployment volumes, or operational characteristics are not publicly detailed, the company claims to have achieved state-of-the-art results in its released model family.

212 jobs

LangChain

LangChain operates an engineering platform and open source frameworks for building, testing, and deploying AI agents. The core offering comprises LangChain and LangGraph - open source frameworks providing pre-built architectures and access to 1,000+ integrations - alongside LangSmith, a commercial platform for observability, evaluation, and deployment of LLM systems. The frameworks see over 90 million combined downloads per month and are used by millions of developers worldwide, with named deployments at Replit, Clay, Cloudflare, Harvey, Rippling, Vanta, Workday, LinkedIn, and Coinbase.

The technical stack addresses the production bottlenecks of agent engineering: reliability through comprehensive observability, evaluation tooling to surface failure modes before deployment, and deployment infrastructure to move from prototype to production. LangSmith provides the operational layer for teams moving LLM systems into production environments, while the open source frameworks prioritize development velocity through pre-built components and extensive integration coverage. The architecture allows granular control over agent behavior while reducing the complexity of integrating external services and managing LLM system reliability at scale.

LangChain serves both major enterprises and startups building AI agents, with technical domains spanning agent engineering, LLM systems, observability, evaluation, and developer tooling. The company is led by CEO Harrison Chase and maintains a US headquarters, with a worldwide developer base. The dual model of open source frameworks and commercial platform reflects a focus on production-readiness and operational support for teams deploying agents at scale.

114 jobs

RunPod

RunPod operates an end-to-end AI infrastructure platform focused on GPU compute provisioning for model training, inference, and distributed agent orchestration. The platform serves over 500,000 developers, spanning solo practitioners to enterprise teams deploying at scale. Core infrastructure handles compute allocation, orchestration complexity, and operational overhead, positioning the platform as accessible infrastructure that does not demand deep systems expertise from users.

The technical stack centers on Go, Python, and TypeScript with containerization through Docker and Kubernetes orchestration on Linux. Engineering domains span distributed systems, GPU compute scheduling, and developer tooling designed to abstract provisioning and scaling mechanics. The company emphasizes reducing operational friction: developers interact with compute resources without managing underlying cluster complexity or infrastructure provisioning bottlenecks. RunPod maintains a remote-first structure with team distribution across the U.S., Canada, Europe, and India. The platform's design reflects a systems-first approach to making GPU compute economically viable and operationally manageable - targeting workloads where cost, reliability, and time-to-deployment constrain AI development cycles.

26 jobs

Clarifai

Clarifai operates a full-stack AI platform spanning data preparation, model training, deployment, and monitoring across computer vision, NLP, and audio domains. The platform serves over 400,000 users across 170+ countries, delivering billions of predictions with access to more than 1 million models. Founded in 2013 by Matthew Zeiler after winning top-five placements at ImageNet 2013, the company has raised $100 million in funding from Menlo Ventures, Union Square Ventures, NVIDIA, Google Ventures, and Qualcomm. Customers include Amazon, Siemens, NVIDIA, Canva, Vimeo, and OpenTable.

The inference architecture supports orchestrated compute across AWS, GCP, and Azure, with edge deployment through Local Runners for on-premises and edge scenarios. The platform integrates PyTorch, TensorFlow, JAX, NVIDIA Triton, and ONNX, with reported performance of 544 tokens per second on GPT-OSS-120B. Technical focus areas include image classification, video analysis, multimodal processing, and MLOps workflows. The stack runs on Python and Golang, with Kubeflow for pipeline orchestration.

The company positions itself as enterprise- and developer-focused, addressing the full AI lifecycle from unstructured data ingestion through production monitoring. Forrester recognized Clarifai as a leader in its Computer Vision report. The platform's scope spans model training, inference orchestration, and operational deployment across cloud and edge environments, serving use cases in e-commerce, manufacturing, semiconductors, creative software, media, and hospitality verticals.

2 jobs