About

Vast.ai operates a peer-to-peer GPU marketplace connecting over 10,000 GPUs across 40 data centers with users who need compute for training, fine-tuning, and inference workloads. The platform aggregates capacity from data centers and individual providers running Vast's hosting software, offering on-demand, interruptible, and auction-based pricing models at rates typically 3-5x below traditional cloud providers. Instances deploy in seconds, and the marketplace enables direct comparison of price-performance across heterogeneous hardware.
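
The kind of price-performance comparison the marketplace enables can be sketched in a few lines of Python; the offer records and field names below are illustrative placeholders, not the actual Vast.ai API schema.

```python
# Illustrative sketch of ranking GPU offers by price-performance.
# The Offer fields and example values are hypothetical, not Vast.ai's API schema.
from dataclasses import dataclass

@dataclass
class Offer:
    gpu: str
    price_per_hour: float   # USD per hour
    tflops: float           # advertised FP16 throughput
    reliability: float      # provider uptime estimate, 0..1

offers = [
    Offer("RTX 4090", 0.40, 165.0, 0.97),
    Offer("A100 80GB", 1.10, 312.0, 0.99),
    Offer("RTX 3090", 0.25, 71.0, 0.95),
]

# Rank by dollars per unit of throughput, the simplest price-performance metric;
# a real query would also filter on VRAM, bandwidth, and reliability.
for o in sorted(offers, key=lambda o: o.price_per_hour / o.tflops):
    print(f"{o.gpu:10s}  ${o.price_per_hour / o.tflops * 1000:.2f} per 1000 TFLOP-hours")
```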

The architecture surfaces a pricing-availability trade-off inherent to peer-to-peer models: cost savings derive from utilizing underutilized capacity, but availability and reliability vary by provider. Interruptible instances present the sharpest cost-performance point but require fault-tolerant workloads and checkpointing discipline. The platform supports standard ML frameworks (PyTorch, TensorFlow) and containerized deployments via Docker. Enterprise offerings provide dedicated clusters with SLAs, SOC 2 Type I certification, and access to ISO 27001 certified facilities, trading marketplace economics for operational predictability.
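
The checkpointing discipline interruptible instances require boils down to persisting training state often enough that a preemption costs minutes rather than hours. A minimal sketch, assuming a PyTorch training loop; the checkpoint path and interval are illustrative.

```python
# Minimal checkpoint/resume loop for an interruptible instance.
# Path and checkpoint interval are illustrative, not Vast.ai-specific.
import os
import torch
import torch.nn as nn

CKPT = "/workspace/ckpt.pt"  # persist to storage that survives the instance

model = nn.Linear(128, 10)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
start_step = 0

# Resume if a previous run was interrupted mid-training.
if os.path.exists(CKPT):
    state = torch.load(CKPT, map_location="cpu")
    model.load_state_dict(state["model"])
    opt.load_state_dict(state["opt"])
    start_step = state["step"] + 1

for step in range(start_step, 10_000):
    x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

    # Checkpoint often enough that an interruption costs only minutes of work.
    if step % 500 == 0:
        torch.save({"model": model.state_dict(),
                    "opt": opt.state_dict(),
                    "step": step}, CKPT)
```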

The technical stack spans Python and C++ for core platform services, PostgreSQL for marketplace state, Redis for coordination, and Terraform for infrastructure provisioning. CUDA support is foundational for GPU workloads. The system must handle heterogeneous provider configurations, node churn, and pricing dynamics across thousands of GPUs while maintaining search and allocation latency suitable for rapid instance provisioning. Founded in 2018, the company positions itself as infrastructure for cost-sensitive training and inference at scale.
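
As a rough illustration of what an offer search over PostgreSQL-backed marketplace state might look like, the sketch below filters available offers and orders them by per-GPU price; the table, columns, and connection string are assumptions for illustration, not Vast.ai's actual data model.

```python
# Hypothetical offer-search query against marketplace state in PostgreSQL.
# Schema, table, and column names are illustrative only.
import psycopg2

SEARCH_SQL = """
SELECT offer_id, gpu_name, num_gpus, price_per_hour
FROM offers
WHERE available = TRUE
  AND gpu_ram_gb >= %(min_vram)s
  AND reliability >= %(min_reliability)s
ORDER BY price_per_hour / num_gpus ASC
LIMIT 20;
"""

def search_offers(conn, min_vram=24, min_reliability=0.95):
    # An index covering (available, gpu_ram_gb, price_per_hour) keeps this kind
    # of query fast enough for interactive provisioning despite node churn.
    with conn.cursor() as cur:
        cur.execute(SEARCH_SQL, {"min_vram": min_vram,
                                 "min_reliability": min_reliability})
        return cur.fetchall()

if __name__ == "__main__":
    conn = psycopg2.connect("dbname=marketplace")  # connection string is illustrative
    for row in search_offers(conn):
        print(row)
```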

Open roles at Vast.ai

Explore 10 open positions at Vast.ai and find your next opportunity.

QA Engineer
Los Angeles, California, United States (On-site) · $40 – $40 Hourly · 3w ago

QA Associate
Los Angeles, California, United States (On-site) · $40 – $40 Hourly · 3w ago

Security Engineer
Los Angeles, California, United States (On-site) · $145K – $185K Yearly · 3w ago

Systems/GPU Research Engineer
San Francisco, California, United States (On-site) · $160K – $320K Yearly · 3w ago

Systems/GPU Research Engineer
San Francisco, California, United States (On-site) · $160K – $320K Yearly · 3w ago

C++ Software Engineer — Systems
San Francisco, California, United States (On-site) · $120K – $180K Yearly · 3w ago

GPU Systems Engineer – HPC / Parallel Computing
San Francisco, California, United States (On-site) · $160K – $320K Yearly · 3w ago

AI Agent Researcher
San Francisco, California, United States (On-site) · $160K – $320K Yearly · 3w ago

Data Engineer — Analytics Infrastructure (Foundational Hire)
Los Angeles, California, United States (On-site) · $140K – $190K Yearly · 3mo ago

Senior Infrastructure Engineer
Los Angeles, California, United States (On-site) · $180K – $300K Yearly · 3mo ago

Similar companies


Anthropic

Anthropic is an AI safety and research company founded in 2021 by seven former OpenAI employees, now operating as a Public Benefit Corporation with approximately 3,000 employees. The company develops the Claude family of large language models and associated AI assistant implementations, with a technical mandate centered on reliability, interpretability, and steerability. Under CEO Dario Amodei, Anthropic has reached a reported valuation of $183 billion while maintaining an explicit focus on AI systems aligned with human values and long-term societal benefit.

The core technical work spans AI safety research, interpretable AI systems, and steerable large language models. Claude, Anthropic's primary product line, is positioned as engineered for safety, accuracy, and security in production deployments. The company's research agenda prioritizes understanding failure modes and developing evaluation frameworks that account for reliability constraints in real-world inference scenarios, rather than pursuing capability benchmarks in isolation.

Anthropic's operational model combines frontier research with practical deployment considerations, balancing the latency-throughput-cost trade-offs inherent in large-scale language model serving while maintaining interpretability as a first-class constraint. The company approaches AI assistant development through the lens of alignment research, treating production systems as both products and testbeds for safety techniques. This dual mandate shapes technical priorities: understanding model behavior under distribution shift, quantifying uncertainty in high-stakes applications, and building systems where performance degradation is predictable and bounded.

683 jobs

Mistral AI

Mistral AI is a French AI company founded in April 2023 by Arthur Mensch, Guillaume Lample, and Timothée Lacroix, researchers with prior affiliations at Google DeepMind and Meta and academic roots at École Polytechnique. The company develops and releases open-weight, state-of-the-art generative AI models positioned as alternatives to proprietary solutions, with a focus on democratizing access to frontier AI technology. Their core approach centers on open, transparent model development that enables developers, enterprises, and institutions to build applications while maintaining control over their data and deployments.

The company's primary product line consists of open-weight generative AI models released publicly, which Mistral claims rival proprietary solutions in capability. Their technical domains span generative AI model training, with particular emphasis on open-weight architectures, AI transparency, and bias mitigation. The founding mission explicitly opposes what the company characterizes as emerging opacity and centralization in AI systems, positioning their open-weight approach as a structural alternative to closed, proprietary models.

Mistral AI's operational model emphasizes community-backed development and targets a broad user base spanning individual developers, enterprise deployments, and institutional applications across global markets. The company's cultural positioning centers on maintaining user control over inference infrastructure and data pipelines, combating censorship in model outputs, and providing an alternative to concentrated control of frontier AI capabilities. While specific scale metrics around model performance, deployment volumes, or operational characteristics are not publicly detailed, the company claims to have achieved state-of-the-art results in their released model family.

212 jobs

Graphcore

Graphcore, a British semiconductor company and wholly owned subsidiary of SoftBank Group, develops specialized AI compute hardware centered on its Intelligence Processing Unit (IPU). The IPU represents a processor architecture specifically designed for machine intelligence workloads rather than general-purpose computing. The company built a complete AI compute stack spanning silicon design through datacenter infrastructure, including the Poplar software framework that sits atop the hardware. Graphcore brought the first Wafer-on-Wafer AI processor to market, a packaging approach that addresses the bandwidth and latency constraints inherent in traditional chip-to-chip interconnects for AI workloads.

The technical scope encompasses semiconductor engineering, processor design, and AI-specific optimizations across both hardware and software layers. The engineering team works on silicon design, wafer-scale integration technology, and the development of tools for AI model optimization. The software stack includes developer tools designed to extract performance from the IPU architecture, with ongoing work to optimize popular AI models for the platform. This systems-level approach attempts to address the throughput and efficiency bottlenecks that emerge when running large-scale machine learning workloads on conventional processor architectures.

Under CEO Nigel Toon's leadership, Graphcore operates with global presence and maintains teams of semiconductor, software, and AI specialists. The company's technology stack includes standard datacenter interfaces (PCIe, DDR, Ethernet) alongside proprietary elements like the IPU and Poplar software. The subsidiary structure under SoftBank provides backing for continued development of both the silicon and the software layers required to compete in AI compute infrastructure, where the trade-offs between custom silicon development costs and performance gains define commercial viability.

197 jobs

Cerebras

Cerebras Systems designs and manufactures wafer-scale AI chips that consolidate the compute capacity of dozens of GPUs into a single device. Founded in 2015, the company builds a wafer-scale processor 56 times larger than standard GPUs, addressing the operational complexity of distributed training and inference by offering programmability equivalent to a single-device system while delivering multi-GPU performance. This approach collapses the network bottlenecks and synchronization overhead inherent in GPU clusters, enabling users to run large-scale ML workloads without orchestrating hundreds of accelerators.

The company's technical stack spans the full systems hierarchy: custom silicon (wafer-scale chip architecture), compiler infrastructure (MLIR, LLVM IR, and their proprietary CSL language), runtime orchestration (Kubernetes), and deployment tooling. Engineering work touches computer architecture, deep learning kernels, systems software for hardware programmability, and inference serving at scale. Recent partnerships include work with OpenAI on inference deployment, alongside engagements with national laboratories, global enterprises, and healthcare systems requiring high-throughput ML serving.

Cerebras positions its hardware for both training and inference workloads, with claimed industry-leading speeds stemming from on-chip interconnect bandwidth and elimination of multi-chip communication latency. The architecture trades traditional data center modularity for integrated performance, relevant for workloads bottlenecked by cross-device synchronization or where cost-per-inference and tail latency matter more than incremental horizontal scaling. Development infrastructure includes C++, Python, Go, and Zig across the stack, with CI/CD through GitHub Actions and Jenkins.

135 jobs

Baseten

Baseten builds AI infrastructure for production deployment and scaling of models, with work spanning kernel-level optimization for inference performance through developer tooling. The platform ships daily, measuring success by real-world impact of AI products running on it rather than vanity metrics. Engineers embed directly with customers to surface operational bottlenecks, then optimize obsessively; work ranges from TensorRT-LLM and CUDA kernel tuning to building developer tools that reduce deployment friction.

The stack centers on inference at scale: TensorRT-LLM and PyTorch for model execution, NVIDIA Triton Inference Server for serving, Kubernetes (EKS) with Karpenter for autoscaling, and Knative for event-driven workloads on AWS EC2. Infrastructure decisions prioritize shipping velocity over process; small teams with real ownership iterate rapidly on production reliability, latency (including tail behavior), and cost efficiency. Docker containerization and PostgreSQL round out core operational dependencies.

The team is internationally distributed, composed of engineers and designers who take craft seriously without performative posturing. Customer-embedded engineering informs both platform architecture and developer experience tradeoffs, creating tight feedback loops between deployment reality and infrastructure evolution. From founding, the approach has centered on hands-on problem solving and rapid iteration rather than abstraction layers that delay production learning.

69 jobs