Together AI

About

Together AI operates a purpose-built GPU cloud platform for training, fine-tuning, and deploying generative AI models. The infrastructure is designed to avoid vendor lock-in and serves developers and organizations that need to run open-source models at scale. The engineering work centers on distributed systems, model optimization, and AI infrastructure, areas where trade-offs between throughput, latency, and operational complexity determine whether a system is viable in production.

The company actively contributes to open-source projects including FlashAttention, Mamba, and RedPajama. Engineers and researchers work closely together, and new hires take ownership of substantial technical challenges from the start. The tech stack spans PyTorch, CUDA, TensorRT, TensorRT-LLM, vLLM, SGLang, and TGI, reflecting the need to support multiple inference backends and optimization paths. Work involves designing distributed inference engines and developing model architectures where performance characteristics (memory bandwidth utilization, kernel fusion opportunities, multi-GPU coordination overhead) directly affect which models can run economically in production.

Technical problems include optimizing inference for a range of model architectures across heterogeneous GPU clusters, managing the reliability and cost trade-offs of serving large language models, and building tooling that makes open-source AI accessible without sacrificing control over deployment parameters. The platform must handle the operational complexity of supporting diverse workloads: training runs with different parallelization strategies, fine-tuning jobs with varying dataset sizes, and inference deployments where tail latency matters.

Open roles at Together AI

Explore 51 open positions at Together AI and find your next opportunity.


Sr. Technical Program Manager (TPM)

Together AI

San Francisco, California, United States (Hybrid)

$225K – $265K Yearly

2w ago

Customer Support Engineer (GPU Cluster)

Together AI

San Francisco, California, United States (On-site)

$160K – $230K Yearly

2w ago

Director, Data Center Operations

Together AI

San Francisco, California, United States (On-site)

$250K – $300K Yearly

2w ago

Engineering Manager / Tech Lead

Together AI

Amsterdam, North Holland, Netherlands (On-site)

2w ago

AI Infrastructure Engineer

Together AI

San Francisco, California, United States (On-site)

$190K – $270K Yearly

2w ago

Infrastructure Accounting Manager

Together AI

San Francisco, California, United States (On-site)

$180K – $220K Yearly

2w ago

Sr. Recruiter, Physical Infrastructure

Together AI

San Francisco, California, United States (Hybrid)

$165K – $210K Yearly

3w ago

Forward Deployed Engineer (Inference & Post-Training)

Together AI

San Francisco, California, United States (On-site)

$270K – $300K Yearly

3w ago

Staff Engineer, Customer Insights

Together AI

San Francisco, California, United States (On-site)

$200K – $270K Yearly

4w ago

Forward Deployed Engineer (GPU Clusters)

Together AI

San Francisco, California, United States (On-site)

$270K – $300K Yearly

4w ago

Technical Account Manager (TAM), AI Factory

Together AI

San Francisco, California, United States (Hybrid)

$260K – $290K Yearly

4w ago

Similar companies


OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity.

623 jobs

Baseten

Baseten is an AI infrastructure platform providing the tooling, expertise, and hardware needed to deploy and scale AI models in production.

58 jobs

d-Matrix

d-Matrix builds purpose-built AI inference computing platforms to make generative AI commercially viable, efficient, and sustainable through digital in-memory compute technology.

43 jobs

Modal

Modal is a serverless compute platform for AI and data teams that enables running compute-intensive workloads like ML inference, fine-tuning, and batch jobs with instant GPU access and usage-based pricing.

28 jobs

Runpod

Runpod provides cloud infrastructure for AI developers, offering GPU computing services for training, deploying, and scaling AI models.

18 jobs

Inferact

Inferact commercializes vLLM, an open-source LLM inference engine built by its founders, to reduce inference latency, cost, and serving complexity at scale.