Together AI

Together AI operates a purpose-built GPU cloud platform for training, fine-tuning, and deploying generative AI models. The infrastructure is designed to avoid vendor lock-in and serves developers and organizations that need to run open-source models at scale. The engineering work centers on distributed systems, model optimization, and AI infrastructure, areas where trade-offs between throughput, latency, and operational complexity determine production viability.

The company actively contributes to open-source projects including FlashAttention, Mamba, and RedPajama. Engineers and researchers work closely together, and new hires take ownership of substantial technical challenges from the start. The tech stack spans PyTorch, CUDA, TensorRT, TensorRT-LLM, vLLM, SGLang, and TGI, reflecting the need to support multiple inference backends and optimization paths. Work involves designing distributed inference engines and developing model architectures where performance characteristics (memory bandwidth utilization, kernel fusion opportunities, multi-GPU coordination overhead) directly affect which models can run economically in production.

Technical problems include optimizing inference for various model architectures across heterogeneous GPU clusters, managing the reliability and cost trade-offs in serving large language models, and building tooling that makes open-source AI accessible without sacrificing control over deployment parameters. The platform must handle the operational complexity of supporting diverse workloads: training runs with different parallelization strategies, fine-tuning jobs with varying dataset sizes, and inference deployments where tail latency matters.

Open roles at Together AI

Machine Learning Engineer - Inference

Solutions Architect

Senior Software Engineer - Together Cloud Infrastructure

AI Infrastructure Engineer

Product Marketing Director

Research Engineer, Core ML

Research Engineer, Frontier Speculative Decoding

Lead Product Designer

Staff Data Warehouse Engineer

Frontier Agents Intern (Fall 2026)

Research Intern, Inference (Fall 2026)

Customer Support Engineer (Inference), India

Data Center Operations Coordinator

Systems Research Engineer Intern - GPU Programming (Fall 2026)

Solutions Architect

Similar companies