
About

CoreWeave operates specialized cloud infrastructure purpose-built for AI workloads, with data centers across the US and Europe delivering GPU compute for large language model training and inference at scale. Founded in 2017 as Atlantic Crypto, a cryptocurrency mining operation, the company executed a complete strategic pivot to AI infrastructure - rebuilding from first principles rather than retrofitting existing cloud architectures. The platform runs on Kubernetes-based orchestration designed specifically for AI workloads, coupled with custom storage solutions engineered to handle the I/O patterns and throughput requirements of model training and deployment pipelines.

The technical stack centers on NVIDIA GPUs with orchestration built in Go, Python, and C++ on Linux, instrumented with Prometheus, Grafana, and OpenTelemetry for observability across distributed systems. Rather than adapting general-purpose cloud tooling, CoreWeave's infrastructure treats GPU compute density, inter-node bandwidth, and storage parallelism as primary design constraints. This systems-level focus reflects a team drawn from infrastructure engineering and quantitative trading backgrounds - disciplines where latency budgets and resource utilization directly determine feasibility.

CoreWeave serves AI labs, enterprises, and startups requiring production-scale inference and training capacity. The company's recognition on the TIME100 most influential companies list signals market adoption of specialized AI infrastructure as distinct from traditional cloud providers. For engineers, the environment offers direct exposure to the operational realities of running GPU clusters at scale: thermal management, network topology for distributed training, failure modes in multi-tenant GPU environments, and the cost-performance trade-offs inherent in serving latency-sensitive inference workloads alongside batch training jobs.

Open roles at CoreWeave

Explore 279 open positions at CoreWeave and find your next opportunity.


Product Technical Program Management - Weights & Biases

CoreWeave

San Francisco, California, United States (Hybrid)

$188K – $275K Yearly · 22h ago

Manager, Energy & Power Transaction Management

CoreWeave

New York, United States (Hybrid)

$143K – $210K Yearly · 3d ago

Senior Business Systems Engineer - GTM Systems

CoreWeave

Livingston, New Jersey, United States (Hybrid)

$139K – $204K Yearly · 3d ago

Security Compliance - Technical Program Manager

CoreWeave

Livingston, New Jersey, United States (Hybrid)

$143K – $210K Yearly · 3d ago

Director, Field Enablement

CoreWeave

Livingston, New Jersey, United States (Hybrid)

$165K – $242K Yearly · 3d ago

Staff Product Manager, Networking

CoreWeave

Bellevue, Washington, United States (Hybrid)

$188K – $275K Yearly · 3d ago

Software Engineer, Observability

CoreWeave

Livingston, New Jersey, United States (Hybrid)

$109K – $145K Yearly · 3d ago

Sr. Manager - Direct Tax Compliance & Reporting

CoreWeave

Bellevue, Washington, United States (Hybrid)

$135K – $198K Yearly · 3d ago

Senior Production Engineer

CoreWeave

Livingston, New Jersey, United States (Hybrid)

$165K – $242K Yearly · 3d ago

Staff AI & Agents Growth Product Manager

CoreWeave

Livingston, New Jersey, United States (Hybrid)

$161K – $237K Yearly · 4d ago

Senior Systems Engineer

CoreWeave

Livingston, New Jersey, United States (Hybrid)

$139K – $242K Yearly · 4d ago

Senior Electrical Engineer

CoreWeave

New York, United States (Hybrid)

$165K – $242K Yearly · 4d ago

Software Engineer - Data Infrastructure Services

CoreWeave

Sunnyvale, California, United States (Hybrid)

$109K – $160K Yearly · 4d ago

Senior Software Engineer - Data Infrastructure Services

CoreWeave

Sunnyvale, California, United States (Hybrid)

$165K – $242K Yearly · 4d ago

Senior Software Engineer, Applied AI

CoreWeave

New York, United States (Hybrid)

$165K – $242K Yearly · 4d ago

Engineering Manager, Data Infrastructure

CoreWeave

New York, United States (Hybrid)

$165K – $242K Yearly · 4d ago

Technical Sourcing Manager - Storage and Components

CoreWeave

Sunnyvale, California, United States (Hybrid)

$161K – $237K Yearly · 4d ago

Senior Site Reliability Engineer, Data Infrastructure

CoreWeave

New York, United States (Hybrid)

$165K – $242K Yearly · 4d ago

Technical Sourcing Manager - Mech, Power, Thermal

CoreWeave

Sunnyvale, California, United States (Hybrid)

$143K – $210K Yearly · 4d ago

Manager, Data Center Operations Accounting

CoreWeave

Dallas, Texas, United States (Hybrid)

$115K – $153K Yearly · 6d ago

Similar companies


Anthropic

Anthropic is an AI safety and research company founded in 2021 by seven former OpenAI employees, now operating as a Public Benefit Corporation with approximately 3,000 employees. The company develops the Claude family of large language models and associated AI assistants, with a technical mandate centered on reliability, interpretability, and steerability. Under CEO Dario Amodei, Anthropic has reached a reported valuation of $183 billion while maintaining an explicit focus on AI systems aligned with human values and long-term societal benefit.

The core technical work spans AI safety research, interpretable AI systems, and steerable large language models. Claude, Anthropic's primary product line, is positioned as engineered for safety, accuracy, and security in production deployments. The company's research agenda prioritizes understanding failure modes and developing evaluation frameworks that account for reliability constraints in real-world inference scenarios, rather than pursuing capability benchmarks in isolation.

Anthropic's operational model combines frontier research with practical deployment considerations - balancing the latency-throughput-cost trade-offs inherent in large-scale language model serving while maintaining interpretability as a first-class constraint. The company approaches AI assistant development through the lens of alignment research, treating production systems as both products and testbeds for safety techniques. This dual mandate shapes technical priorities: understanding model behavior under distribution shift, quantifying uncertainty in high-stakes applications, and building systems where performance degradation is predictable and bounded.

683 jobs

Graphcore

Graphcore, a British semiconductor company and wholly owned subsidiary of SoftBank Group, develops specialized AI compute hardware centered on its Intelligence Processing Unit (IPU), a processor architecture designed specifically for machine intelligence workloads rather than general-purpose computing. The company built a complete AI compute stack spanning silicon design through datacenter infrastructure, including the Poplar software framework that sits atop the hardware. Graphcore brought the first Wafer-on-Wafer AI processor to market, a packaging approach that addresses the bandwidth and latency constraints inherent in traditional chip-to-chip interconnects for AI workloads.

The technical scope encompasses semiconductor engineering, processor design, and AI-specific optimizations across both hardware and software layers. The engineering team works on silicon design, wafer-scale integration technology, and tools for AI model optimization. The software stack includes developer tools designed to extract performance from the IPU architecture, with ongoing work to optimize popular AI models for the platform. This systems-level approach targets the throughput and efficiency bottlenecks that emerge when running large-scale machine learning workloads on conventional processor architectures.

Under CEO Nigel Toon's leadership, Graphcore operates with a global presence and maintains teams of semiconductor, software, and AI specialists. The company's technology stack includes standard datacenter interfaces (PCIe, DDR, Ethernet) alongside proprietary elements like the IPU and Poplar software. The subsidiary structure under SoftBank provides backing for continued development of both the silicon and the software layers required to compete in AI compute infrastructure, where the trade-offs between custom silicon development costs and performance gains define commercial viability.

197 jobs

Cerebras

Cerebras Systems designs and manufactures wafer-scale AI chips that consolidate the compute capacity of dozens of GPUs into a single device. Founded in 2015, the company builds its core architecture around a wafer-scale chip roughly 56 times larger than a standard GPU die, addressing the operational complexity of distributed training and inference by offering the programmability of a single-device system while delivering multi-GPU performance. This approach collapses the network bottlenecks and synchronization overhead inherent in GPU clusters, enabling users to run large-scale ML workloads without orchestrating hundreds of accelerators.

The company's technical stack spans the full systems hierarchy: custom silicon (wafer-scale chip architecture), compiler infrastructure (MLIR, LLVM IR, and the proprietary CSL language), runtime orchestration (Kubernetes), and deployment tooling. Engineering work touches computer architecture, deep learning kernels, systems software for hardware programmability, and inference serving at scale. Recent partnerships include work with OpenAI on inference deployment, alongside engagements with national laboratories, global enterprises, and healthcare systems requiring high-throughput ML serving.

Cerebras positions its hardware for both training and inference workloads, with claimed industry-leading speeds stemming from on-chip interconnect bandwidth and the elimination of multi-chip communication latency. The architecture trades traditional data center modularity for integrated performance - relevant for workloads bottlenecked by cross-device synchronization, or where cost-per-inference and tail latency matter more than incremental horizontal scaling. Development infrastructure includes C++, Python, Go, and Zig across the stack, with CI/CD through GitHub Actions and Jenkins.

135 jobs

Baseten

Baseten builds AI infrastructure for production deployment and scaling of models, with work spanning kernel-level optimization for inference performance through developer tooling. The platform ships daily, measuring success by the real-world impact of AI products running on it rather than vanity metrics. Engineers embed directly with customers to surface operational bottlenecks, then optimize obsessively - work ranges from TensorRT-LLM and CUDA kernel tuning to building developer tools that reduce deployment friction.

The stack centers on inference at scale: TensorRT-LLM and PyTorch for model execution, NVIDIA Triton Inference Server for serving, Kubernetes (EKS) with Karpenter for autoscaling, and Knative for event-driven workloads on AWS EC2. Infrastructure decisions prioritize shipping velocity over process - small teams with real ownership iterate rapidly on production reliability, latency (including tail behavior), and cost efficiency. Docker containerization and PostgreSQL round out core operational dependencies.

The team is internationally distributed, composed of engineers and designers who take craft seriously without performative posturing. Customer-embedded engineering informs both platform architecture and developer experience tradeoffs, creating tight feedback loops between deployment reality and infrastructure evolution. From founding, the approach has centered on hands-on problem solving and rapid iteration rather than abstraction layers that delay production learning.

69 jobs

Runpod

RunPod operates an end-to-end AI infrastructure platform focused on GPU compute provisioning for model training, inference, and distributed agent orchestration. The platform serves over 500,000 developers, spanning solo practitioners to enterprise teams deploying at scale. Core infrastructure handles compute allocation, orchestration complexity, and operational overhead - positioning itself as accessible infrastructure rather than requiring deep systems expertise from users.

The technical stack centers on Go, Python, and TypeScript, with containerization through Docker and Kubernetes orchestration on Linux. Engineering domains span distributed systems, GPU compute scheduling, and developer tooling designed to abstract provisioning and scaling mechanics. The company emphasizes reducing operational friction: developers interact with compute resources without managing underlying cluster complexity or infrastructure provisioning bottlenecks.

RunPod maintains a remote-first structure with team distribution across the U.S., Canada, Europe, and India. The platform's design reflects a systems-first approach to making GPU compute economically viable and operationally manageable - targeting workloads where cost, reliability, and time-to-deployment constrain AI development cycles.

26 jobs