About

Reka builds unified multimodal foundation models that process text, images, video, and audio. The company's core technical focus is modeling the physical world through systems that handle perception, reasoning, and action across modalities. The team includes researchers and engineers from Google DeepMind and Facebook AI Research, working on inference-critical domains including GPU performance engineering, computer vision, audio processing, and natural language understanding.

The technical stack centers on Python, PyTorch, and JAX for model development, with CUDA and C++ for performance-critical components. Infrastructure runs on Kubernetes and Slurm for orchestration and job scheduling. Engineering roles emphasize end-to-end ownership: individuals work across the stack, from model architecture through deployment, and address bottlenecks in latency, throughput, and operational complexity at production scale.

Reka operates remote-first, hiring globally into a distributed organization. The work targets enterprise and organizational deployments where multimodal capabilities must meet reliability and cost constraints. Team structure reflects early-stage dynamics: engineers wear multiple hats, and technical decisions directly shape product capabilities and production characteristics.

Open roles at Reka

Explore 3 open positions at Reka and find your next opportunity.

Member of Technical Staff (GPU Performance Engineer)

Reka

United States + 2 more (Remote)

3mo ago

Similar companies

Nebius

Nebius is a Nasdaq-listed technology company (NBIS) building full-stack AI infrastructure from its Amsterdam headquarters, with GPU clusters deployed across Europe and the United States. Led by CEO Arkady Volozh, the company operates AI-optimized sustainable data centers, including a facility 60 kilometers from Helsinki and a new site in Vineland, New Jersey, and has raised $700 million from investors including Accel, NVIDIA, and Orbis. The engineering organization, numbering in the hundreds, maintains deep infrastructure expertise and runs an in-house AI R&D team that dogfoods the platform to validate it against the requirements of production ML practitioners.

The infrastructure stack pairs hyperscaler-scale features with supercomputer-grade performance. ISEG, Nebius's supercomputer, ranks among the world's most powerful systems. The platform integrates NVIDIA GPUs with NVIDIA InfiniBand networking and exposes workload orchestration through both Kubernetes and Slurm. The operational layer includes standard observability (Prometheus, Grafana), data infrastructure (PostgreSQL, Apache Spark), and ML tooling (MLflow, vLLM, Triton, Ray), with infrastructure-as-code managed via Terraform. This architecture targets the latency, throughput, and reliability requirements of AI training and inference workloads at scale.

The company has secured a multi-billion dollar agreement with Microsoft to deliver dedicated AI infrastructure from its Vineland data center. Nebius serves startups, research institutes, and enterprises across healthcare and life sciences, robotics, finance, and entertainment verticals. The technical approach emphasizes production-grade infrastructure that handles the operational complexity of large-scale AI deployments: managing GPU utilization, network bottlenecks, and the cost-performance trade-offs inherent in serving diverse AI workloads from model training through inference serving.

477 jobs

Heidi

Heidi builds an AI Care Partner that automates clinical documentation, form filling, and task management for clinicians worldwide. The system has returned over 18 million hours to clinicians in 18 months and currently supports more than 2 million patient visits weekly across 116 countries and 110+ languages. The company has raised nearly $100 million from Point72, Anthropic, and Blackbird, with a stated goal of halving the time required to deliver patient-first care.

The core technical challenge sits at the intersection of multilingual NLP, healthcare informatics, and production reliability at global scale. The system must handle clinical documentation workflows across diverse regulatory environments, languages, and medical specialties while meeting accuracy and latency requirements that directly affect clinician workflows. The stack spans TypeScript, React, Next.js, and Node.js on the frontend with Python, NestJS, and Express on the backend, using PostgreSQL and MongoDB for persistence and running on GCP and AWS infrastructure.

The team includes clinicians, engineers, and designers, and most employees have healthcare backgrounds or direct experience with clinician burnout. The operational philosophy emphasizes shipping small, iterating fast, and tolerating failure in pursuit of reducing administrative burden. The Australia-based company operates globally, with Docker-based deployments and CI/CD pipelines supporting continuous delivery across production environments.

112 jobs

Replit

Replit operates a web-based code editor and multiplayer computing environment used by millions for collaborative software development. The platform lowers traditional barriers to application creation through natural language interfaces, allowing users to build applications without conventional development workflows. One example of this approach is the decision to remove the save button from the editor. The multiplayer environment serves as infrastructure for experimentation, sharing, and collaborative growth at scale.

The company measures success by the number of people empowered to create software rather than by vanity metrics, reflecting a systems-level focus on removing bottlenecks in developer onboarding and productivity. Technical decisions prioritize shipping velocity and operational autonomy: the culture emphasizes extreme ownership, radical bets, and a bias toward action. Engineers have the latitude to pursue emergent ideas and to question established patterns when friction appears in the development loop.

The platform's architecture supports collaborative coding workflows at scale, handling millions of concurrent users across a shared computing environment. This requires managing trade-offs among multi-tenancy constraints, latency in collaborative editing, and the operational complexity of maintaining compute resources for distributed development sessions. The technical focus centers on developer tools, web-based editing infrastructure, and the reliability challenges of real-time collaborative computing.

76 jobs

Toma

Toma operates a voice AI platform for automotive dealerships, processing over 1,000,000 calls since launching in 2024. The system handles inbound phone operations (service scheduling, call routing, and follow-up automation) with safeguards designed to manage transfer latency and revenue leakage. The core technical challenge is maintaining conversational quality and intent detection accuracy across high-variance dealership scenarios (service appointments, parts inquiries, sales handoffs) while minimizing false transfers and dropped context. The platform implements transfer triggers, clawback mechanisms for mistimed handoffs, and follow-up alerts when human staff do not complete actions, addressing the operational complexity of human-AI transition points in production telephony.

Infrastructure runs on AWS with a TypeScript/Next.js frontend, PostgreSQL via Prisma for state management, and tRPC for type-safe API boundaries. The voice AI layer must handle real-time constraints, such as low-latency speech recognition and synthesis and sub-second intent classification, while managing concurrent call volume and dealership-specific context (inventory, scheduling systems, staff availability). Trade-offs center on model selection for conversational understanding versus inference cost at scale, and on the reliability surface area of integrating with legacy dealership management systems.

Toma was founded by engineers from Scale AI, Uber, Lyft, and Amazon and is backed by Andreessen Horowitz and Y Combinator, with $17 million in Series A funding. Deployment spans dealerships across the United States, including Pohanka Automotive Group, SCHOMP, Hudson Automotive Group, and Bergey's.

Primary bottlenecks likely involve tuning voice models for domain-specific terminology (vehicle makes, service codes, dealership jargon), managing tail latency in transfer decisions where milliseconds affect customer experience, and evaluating conversational success beyond simple call completion: did the AI correctly capture appointment details, route urgency appropriately, and preserve customer satisfaction? The system's value proposition hinges on converting missed calls and staff bottlenecks into captured revenue, which requires high precision on intent classification and low false-negative rates on transfer triggers to avoid revenue loss from mishandled interactions.

1 job