About

Cerebras Systems, founded in 2015, designs and manufactures wafer-scale AI chips that consolidate the compute capacity of dozens of GPUs into a single device. The company's core architecture is 56 times larger than standard GPUs and addresses the operational complexity of distributed training and inference by offering the programmability of a single-device system while delivering multi-GPU performance. This approach removes the network bottlenecks and synchronization overhead inherent in GPU clusters, so users can run large-scale ML workloads without orchestrating hundreds of accelerators.

The company's technical stack spans the full systems hierarchy: custom silicon (wafer-scale chip architecture), compiler infrastructure (MLIR, LLVM IR, and their proprietary CSL language), runtime orchestration (Kubernetes), and deployment tooling. Engineering work touches computer architecture, deep learning kernels, systems software for hardware programmability, and inference serving at scale. Recent partnerships include work with OpenAI on inference deployment, alongside engagements with national laboratories, global enterprises, and healthcare systems requiring high-throughput ML serving.

Cerebras positions its hardware for both training and inference workloads, with claimed industry-leading speeds stemming from on-chip interconnect bandwidth and elimination of multi-chip communication latency. The architecture trades traditional data center modularity for integrated performance - relevant for workloads bottlenecked by cross-device synchronization or where cost-per-inference and tail latency matter more than incremental horizontal scaling. Development infrastructure includes C++, Python, Go, and Zig across the stack, with CI/CD through GitHub Actions and Jenkins.

Open roles at Cerebras

Explore 91 open positions at Cerebras and find your next opportunity.

Senior Runtime Engineer

Cerebras

Sunnyvale, California, United States (On-site)

3mo ago

Senior Manager, Revenue Accounting and Operations

Cerebras

Sunnyvale, California, United States (On-site)

3mo ago

Senior External Reporting & Technical Accountant

Cerebras

Sunnyvale, California, United States (Hybrid)

3mo ago

Performance Engineer - Inference

Cerebras

Toronto, Ontario, Canada (On-site)

3mo ago

Contracts & Legal Operations Manager

Cerebras

Sunnyvale, California, United States (On-site)

$110K – $170K Yearly

3mo ago

Senior Accounting Manager

Cerebras

Sunnyvale, California, United States (On-site)

3mo ago

Director of Costing Accounting

Cerebras

Sunnyvale, California, United States (On-site)

3mo ago

AI Models, Product Manager

Cerebras

Sunnyvale, California, United States or Remote (United States)

3mo ago

Software Development Engineer in Test

Cerebras

Sunnyvale, California, United States (On-site)

$170K – $225K Yearly

3mo ago

Applied AI/ML Scientist

Cerebras

United Arab Emirates (On-site)

3mo ago

Early Career Compiler Engineer - LLVM

Cerebras

Sunnyvale, California, United States (On-site)

3mo ago

Similar companies

OpenAI

OpenAI develops and deploys generative transformer models at scale, operating production systems that serve millions through ChatGPT, GPT model APIs, and the OpenAI API. The technical challenge spans the full stack: research engineering for novel model architectures, safety engineering for alignment and robustness, and production infrastructure for API deployment at scale. Teams work across research, product engineering, and operations, with work organized around both advancing model capabilities and maintaining reliability for deployed systems serving substantial user traffic. The core technical domains include model development for the GPT series, API infrastructure to support downstream applications, and safety research focused on making AGI beneficial. Engineering work involves trade-offs between model capability, inference cost, latency characteristics, and safety constraints. Research teams collaborate with product and engineering functions to move from experimental systems to production deployment, requiring expertise in distributed systems, model optimization, and operational complexity at scale. The company operates from San Francisco with international presence, positioning work as a global effort toward artificial general intelligence. Cross-functional teams include researchers, engineers, and operations staff working on problems ranging from foundational research to production reliability. The technical culture emphasizes rigorous safety practices alongside advancement of capabilities, with autonomy and ownership distributed across teams working on distinct components of the research-to-deployment pipeline.

741 jobs

Anthropic

Anthropic is an AI safety and research company founded in 2021 by seven former OpenAI employees, now operating as a Public Benefit Corporation with approximately 3,000 employees. The company develops the Claude family of large language models and associated AI assistant implementations, with a technical mandate centered on reliability, interpretability, and steerability. Under CEO Dario Amodei, Anthropic has reached a reported valuation of $183 billion while maintaining an explicit focus on AI systems aligned with human values and long-term societal benefit. The core technical work spans AI safety research, interpretable AI systems, and steerable large language models. Claude, Anthropic's primary product line, is positioned as engineered for safety, accuracy, and security in production deployments. The company's research agenda prioritizes understanding failure modes and developing evaluation frameworks that account for reliability constraints in real-world inference scenarios, rather than pursuing capability benchmarks in isolation. Anthropic's operational model combines frontier research with practical deployment considerations - balancing the latency-throughput-cost trade-offs inherent in large-scale language model serving while maintaining interpretability as a first-class constraint. The company approaches AI assistant development through the lens of alignment research, treating production systems as both products and testbeds for safety techniques. This dual mandate shapes technical priorities: understanding model behavior under distribution shift, quantifying uncertainty in high-stakes applications, and building systems where performance degradation is predictable and bounded.

683 jobs

CoreWeave

CoreWeave operates specialized cloud infrastructure purpose-built for AI workloads, with data centers across the US and Europe delivering GPU compute for large language model training and inference at scale. Founded in 2017 as Atlantic Crypto, a cryptocurrency mining operation, the company executed a complete strategic pivot to AI infrastructure - rebuilding from first principles rather than retrofitting existing cloud architectures. The platform runs on Kubernetes-based orchestration designed specifically for AI workloads, coupled with custom storage solutions engineered to handle the I/O patterns and throughput requirements of model training and deployment pipelines. The technical stack centers on NVIDIA GPUs with orchestration built in Go, Python, and C++ on Linux, instrumented with Prometheus, Grafana, and OpenTelemetry for observability across distributed systems. Rather than adapting general-purpose cloud tooling, CoreWeave's infrastructure treats GPU compute density, inter-node bandwidth, and storage parallelism as primary design constraints. This systems-level focus reflects a team drawn from infrastructure engineering and quantitative trading backgrounds - disciplines where latency budgets and resource utilization directly determine feasibility. CoreWeave serves AI labs, enterprises, and startups requiring production-scale inference and training capacity. The company's recognition on the TIME100 most influential companies list signals market adoption of specialized AI infrastructure as distinct from traditional cloud providers. For engineers, the environment offers direct exposure to the operational realities of running GPU clusters at scale: thermal management, network topology for distributed training, failure modes in multi-tenant GPU environments, and the cost-performance trade-offs inherent in serving latency-sensitive inference workloads alongside batch training jobs.

436 jobs

Graphcore

Graphcore, a British semiconductor company and wholly owned subsidiary of SoftBank Group, develops specialized AI compute hardware centered on its Intelligence Processing Unit (IPU). The IPU represents a processor architecture specifically designed for machine intelligence workloads rather than general-purpose computing. The company built a complete AI compute stack spanning silicon design through datacenter infrastructure, including the Poplar software framework that sits atop the hardware. Graphcore brought the first Wafer-on-Wafer AI processor to market, a packaging approach that addresses the bandwidth and latency constraints inherent in traditional chip-to-chip interconnects for AI workloads. The technical scope encompasses semiconductor engineering, processor design, and AI-specific optimizations across both hardware and software layers. The engineering team works on silicon design, wafer-scale integration technology, and the development of tools for AI model optimization. The software stack includes developer tools designed to extract performance from the IPU architecture, with ongoing work to optimize popular AI models for the platform. This systems-level approach attempts to address the throughput and efficiency bottlenecks that emerge when running large-scale machine learning workloads on conventional processor architectures. Under CEO Nigel Toon's leadership, Graphcore operates with global presence and maintains teams of semiconductor, software, and AI specialists. The company's technology stack includes standard datacenter interfaces (PCIe, DDR, Ethernet) alongside proprietary elements like the IPU and Poplar software. The subsidiary structure under SoftBank provides backing for continued development of both the silicon and the software layers required to compete in AI compute infrastructure, where the trade-offs between custom silicon development costs and performance gains define commercial viability.

197 jobs

Runpod

RunPod operates an end-to-end AI infrastructure platform focused on GPU compute provisioning for model training, inference, and distributed agent orchestration. The platform serves over 500,000 developers, spanning solo practitioners to enterprise teams deploying at scale. Core infrastructure handles compute allocation, orchestration complexity, and operational overhead - positioning itself as accessible infrastructure rather than requiring deep systems expertise from users. The technical stack centers on Go, Python, and TypeScript with containerization through Docker and Kubernetes orchestration on Linux. Engineering domains span distributed systems, GPU compute scheduling, and developer tooling designed to abstract provisioning and scaling mechanics. The company emphasizes reducing operational friction: developers interact with compute resources without managing underlying cluster complexity or infrastructure provisioning bottlenecks. RunPod maintains a remote-first structure with team distribution across the U.S., Canada, Europe, and India. The platform's design reflects a systems-first approach to making GPU compute economically viable and operationally manageable - targeting workloads where cost, reliability, and time-to-deployment constrain AI development cycles.

26 jobs