Clarifai

About

Clarifai operates a full-stack AI platform spanning data preparation, model training, deployment, and monitoring across computer vision, NLP, and audio domains. The platform serves over 400,000 users across 170+ countries, delivering billions of predictions with access to more than 1 million models. Founded in 2013 by Matthew Zeiler after his models took the top five places in the ImageNet 2013 image classification challenge, the company has raised $100 million in funding from Menlo Ventures, Union Square Ventures, NVIDIA, Google Ventures, and Qualcomm. Customers include Amazon, Siemens, NVIDIA, Canva, Vimeo, and OpenTable.

The inference architecture supports orchestrated compute across AWS, GCP, and Azure, with Local Runners extending deployment to on-premises and edge environments. The platform integrates PyTorch, TensorFlow, JAX, NVIDIA Triton, and ONNX, with reported performance of 544 tokens per second on GPT-OSS-120B. Technical focus areas include image classification, video analysis, multimodal processing, and MLOps workflows. The stack runs on Python and Golang, with Kubeflow for pipeline orchestration.
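
As a rough illustration of the prediction workflow described above, the sketch below sends a single image URL to a hosted vision model over the platform's v2 REST API. The model path, request fields, and response shape follow Clarifai's publicly documented pattern but should be treated as assumptions here; a production integration would typically use the official SDK and a valid personal access token.

    import requests

    # Illustrative identifiers - substitute your own user, app, model, and key.
    PAT = "YOUR_PERSONAL_ACCESS_TOKEN"  # assumption: auth via a personal access token
    URL = ("https://api.clarifai.com/v2/users/clarifai/apps/main"
           "/models/general-image-recognition/outputs")

    payload = {"inputs": [{"data": {"image": {"url": "https://example.com/photo.jpg"}}}]}

    resp = requests.post(URL, json=payload, headers={"Authorization": f"Key {PAT}"})
    resp.raise_for_status()

    # Assumed response shape: one output per input, with ranked concepts.
    for concept in resp.json()["outputs"][0]["data"]["concepts"]:
        print(concept["name"], concept["value"])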

The company positions itself as enterprise- and developer-focused, addressing the full AI lifecycle from unstructured data ingestion through production monitoring. Forrester recognized Clarifai as a leader in its Computer Vision report. The platform's scope spans model training, inference orchestration, and operational deployment across cloud and edge environments, serving use cases in e-commerce, manufacturing, semiconductors, creative software, media, and hospitality verticals.

Open roles at Clarifai

Explore 2 open positions at Clarifai and find your next opportunity.

Senior Site Reliability Engineer

Clarifai

United States (Remote)

3mo ago

Similar companies

OpenAI

OpenAI develops and deploys generative transformer models at scale, operating production systems that serve millions of users through ChatGPT and the OpenAI API. The technical challenge spans the full stack: research engineering for novel model architectures, safety engineering for alignment and robustness, and production infrastructure for API deployment at scale. Teams work across research, product engineering, and operations, organized around both advancing model capabilities and maintaining reliability for deployed systems under substantial user traffic. Core technical domains include model development for the GPT series, API infrastructure supporting downstream applications, and safety research focused on making AGI beneficial. Engineering work involves trade-offs among model capability, inference cost, latency characteristics, and safety constraints. Research teams collaborate with product and engineering functions to move experimental systems into production deployment, requiring expertise in distributed systems, model optimization, and operational complexity at scale. The company operates from San Francisco with international presence, positioning its work as a global effort toward artificial general intelligence. Cross-functional teams of researchers, engineers, and operations staff work on problems ranging from foundational research to production reliability. The technical culture emphasizes rigorous safety practices alongside capability advancement, with autonomy and ownership distributed across teams owning distinct components of the research-to-deployment pipeline.
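
For a concrete sense of the API surface referenced above, here is a minimal chat completion call using OpenAI's official Python SDK; the model name is illustrative, and the client is assumed to read OPENAI_API_KEY from the environment.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Model name is illustrative; substitute any chat-capable model.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Define inference latency in one sentence."}],
    )

    print(response.choices[0].message.content)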

741 jobs

Mistral AI

Mistral AI is a French AI company founded in April 2023 by Arthur Mensch, Guillaume Lample, and Timothée Lacroix - researchers with prior affiliations at Google DeepMind and Meta and academic roots at École Polytechnique. The company develops and releases open-weight, state-of-the-art generative AI models positioned as alternatives to proprietary solutions, with a focus on democratizing access to frontier AI technology. Their core approach centers on open, transparent model development that enables developers, enterprises, and institutions to build applications while maintaining control over their data and deployments. The company's primary product line consists of open-weight generative AI models released publicly, which Mistral claims rival proprietary solutions in capability. Their technical domains span generative AI model training, with particular emphasis on open-weight architectures, AI transparency, and bias mitigation. The founding mission explicitly opposes what the company characterizes as emerging opacity and centralization in AI systems, positioning their open-weight approach as a structural alternative to closed, proprietary models. Mistral AI's operational model emphasizes community-backed development and targets a broad user base spanning individual developers, enterprise deployments, and institutional applications across global markets. The company's cultural positioning centers on maintaining user control over inference infrastructure and data pipelines, combating censorship in model outputs, and providing an alternative to concentrated control of frontier AI capabilities. While specific scale metrics around model performance, deployment volumes, or operational characteristics are not publicly detailed, the company claims to have achieved state-of-the-art results in their released model family.
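
Because the weights are released openly, the models can run on self-managed infrastructure. Below is a minimal sketch using Hugging Face transformers, assuming the published mistralai/Mistral-7B-Instruct-v0.2 checkpoint and a GPU with sufficient memory.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Open-weight checkpoint published by Mistral AI on the Hugging Face Hub.
    model_id = "mistralai/Mistral-7B-Instruct-v0.2"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # The chat template formats prompts the way the instruct model expects.
    messages = [{"role": "user", "content": "Why do open weights matter for data control?"}]
    inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

    outputs = model.generate(inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))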

212 jobs

Cohere

Cohere builds enterprise-focused foundation models designed for production deployment with emphasis on security, privacy, and operational trust. Founded in 2019 in Toronto, the company has raised nearly $1 billion and scaled to hundreds of employees worldwide. The technical focus spans semantic search, content generation, and customer experience applications - domains where model reliability and data governance are non-negotiable constraints for enterprise adoption. The company's architecture decisions reflect production realities over research novelty. Models are architected for deployment into regulated environments where data residency, access controls, and audit trails matter as much as accuracy metrics. This positioning addresses the gap between frontier model capabilities and enterprise operational requirements: latency SLAs, cost predictability, and compliance frameworks that prevent many organizations from operationalizing public AI APIs. Cohere Labs has published over 100 papers and built a research community of 4,500+ researchers, signaling ongoing investment in foundational work rather than pure application-layer focus. The team composition skews heavily toward researchers and engineers from academic backgrounds, which maps to the technical challenge space - building models that balance performance, safety constraints, and deployment flexibility across varied enterprise infrastructure.
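
To make the semantic search use case concrete, here is a minimal sketch with Cohere's Python SDK: embed a query and a handful of documents, then rank by cosine similarity. The model name and input_type values mirror Cohere's documented embed API, but treat the specifics as assumptions.

    import cohere
    import numpy as np

    co = cohere.Client("YOUR_API_KEY")  # assumption: key-based auth

    docs = ["Invoice processing workflow", "GPU cluster scheduling", "Customer refund policy"]

    # Asymmetric search: documents and queries use different input types.
    doc_emb = np.array(
        co.embed(texts=docs, model="embed-english-v3.0",
                 input_type="search_document").embeddings
    )
    query_emb = np.array(
        co.embed(texts=["How do refunds work?"], model="embed-english-v3.0",
                 input_type="search_query").embeddings[0]
    )

    # Cosine similarity between the query and each document embedding.
    scores = doc_emb @ query_emb / (np.linalg.norm(doc_emb, axis=1) * np.linalg.norm(query_emb))
    print(docs[int(np.argmax(scores))])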

106 jobs

Together AI

Together AI operates a purpose-built GPU cloud platform for training, fine-tuning, and deploying generative AI models. The infrastructure is designed without vendor lock-in, serving developers and organizations that need to run open-source models at scale. The engineering work centers on distributed systems, model optimization, and AI infrastructure - areas where trade-offs between throughput, latency, and operational complexity define production viability. The company maintains active contributions to open-source projects including FlashAttention, Mamba, and RedPajama. Engineers and researchers work in close proximity, with new hires taking ownership of substantial technical challenges from the start. The tech stack spans PyTorch, CUDA, TensorRT, TensorRT-LLM, vLLM, SGLang, and TGI, reflecting the requirement to support multiple inference backends and optimization paths. Work involves designing distributed inference engines and developing model architectures where performance characteristics - memory bandwidth utilization, kernel fusion opportunities, multi-GPU coordination overhead - directly impact what models can run economically in production. Technical problems include optimizing inference for various model architectures across heterogeneous GPU clusters, managing the reliability and cost trade-offs in serving large language models, and building tooling that makes open-source AI accessible without sacrificing control over deployment parameters. The platform must handle the operational complexity of supporting diverse workloads: training runs with different parallelization strategies, fine-tuning jobs with varying dataset sizes, and inference deployments where tail latency matters.
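
As one concrete slice of the inference-backend work described above, the sketch below runs offline batched generation with vLLM, one of the serving engines named in the stack; the model choice and sampling parameters are illustrative.

    from vllm import LLM, SamplingParams

    # Illustrative open-weight model; vLLM handles continuous batching and paged KV cache.
    llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")

    params = SamplingParams(temperature=0.7, max_tokens=64)
    prompts = ["Explain kernel fusion in one sentence.", "What is tail latency?"]

    # Both prompts are scheduled into the same batched generation run.
    for output in llm.generate(prompts, params):
        print(output.outputs[0].text)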

83 jobs

RunPod

RunPod operates an end-to-end AI infrastructure platform focused on GPU compute provisioning for model training, inference, and distributed agent orchestration. The platform serves over 500,000 developers, spanning solo practitioners to enterprise teams deploying at scale. Core infrastructure handles compute allocation, orchestration complexity, and operational overhead - positioning itself as accessible infrastructure rather than requiring deep systems expertise from users. The technical stack centers on Go, Python, and TypeScript with containerization through Docker and Kubernetes orchestration on Linux. Engineering domains span distributed systems, GPU compute scheduling, and developer tooling designed to abstract provisioning and scaling mechanics. The company emphasizes reducing operational friction: developers interact with compute resources without managing underlying cluster complexity or infrastructure provisioning bottlenecks. RunPod maintains a remote-first structure with team distribution across the U.S., Canada, Europe, and India. The platform's design reflects a systems-first approach to making GPU compute economically viable and operationally manageable - targeting workloads where cost, reliability, and time-to-deployment constrain AI development cycles.
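
A minimal sketch of the developer-facing abstraction, following RunPod's documented serverless handler pattern in Python; the payload fields and the work done inside the handler are illustrative assumptions.

    import runpod

    def handler(job):
        # RunPod delivers the request payload under job["input"].
        prompt = job["input"].get("prompt", "")
        # Placeholder for real model inference on the attached GPU.
        return {"echo": prompt}

    # Registers the handler with the serverless worker loop;
    # provisioning and scaling happen on the platform side.
    runpod.serverless.start({"handler": handler})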

26 jobs

Gladia

Gladia operates speech-to-text APIs across two distinct workloads: real-time streaming at sub-300ms latency and asynchronous batch transcription, both supporting over 100 languages. The real-time path handles streaming audio with integrated speaker diarization, word-level timestamps, and sentiment analysis in the inference loop. The async path processes batch jobs with code-switching detection - single utterances spanning multiple languages - and comparable feature coverage. Over 150,000 users and 700 enterprise deployments (including VEED.IO, Circleback, Attention) generate production traffic against these endpoints. The core technical challenge is maintaining sub-300ms end-to-end latency on the streaming path while running diarization and alignment models alongside the primary ASR stack. Meeting this threshold at scale - across 100+ language models with varying acoustic characteristics - requires careful management of model load times, batching strategies, and inference queue depth. The async API trades latency tolerance for throughput optimization on longer-form audio, though specific cost-per-hour or throughput metrics are not disclosed. Code-switching introduces additional complexity: language detection, model routing, and boundary stitching must occur without degrading transcription accuracy or introducing alignment artifacts at switch points. Founded in 2022, the company raised a $16 million Series A from Sequoia Capital, XAnge, and New Wave. Founders Jean-Louis Quéguiner and Jonathan Soto positioned the service as audio infrastructure for voice-first platforms rather than a narrow transcription tool. The engineering focus centers on reliability and operational predictability across multilingual inference workloads - handling acoustic variability, speaker overlap, background noise, and model version rollouts without service degradation. Production deployment at this user scale surfaces edge cases in language detection, diarization boundary errors, and latency tail behavior that define the system's actual robustness beyond benchmarked WER numbers.
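
To illustrate the async batch path, here is a rough sketch of submitting a hosted audio file and polling for the result over the REST API. The endpoint path, header name, and response fields are assumptions modeled on Gladia's public documentation rather than verified calls.

    import time
    import requests

    headers = {"x-gladia-key": "YOUR_GLADIA_KEY"}  # assumption: key passed via this header

    # Submit an async transcription job for a hosted audio file (assumed endpoint).
    job = requests.post(
        "https://api.gladia.io/v2/transcription",
        json={"audio_url": "https://example.com/meeting.mp3", "diarization": True},
        headers=headers,
    ).json()

    # Poll the assumed result URL until the job completes.
    while True:
        result = requests.get(job["result_url"], headers=headers).json()
        if result.get("status") == "done":
            print(result["result"]["transcription"]["full_transcript"])
            break
        time.sleep(2)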

2 jobs