Inferact

About

Inferact commercializes and advances vLLM, the open-source LLM inference engine maintained by the company's founders, who are among its core creators. vLLM is deployed across research and production systems, with deep integration across model architectures, accelerator types, and large-scale deployment patterns.

The company positions inference as an increasingly constrained problem. As model architectures evolve and hardware fragmentation deepens, the gap widens between what models can express and what serving systems can efficiently execute. Inferact's technical approach builds at this intersection, leveraging vLLM's tight coupling with model-level and hardware-level concerns to reduce latency, throughput bottlenecks, and per-token cost at scale.

vLLM development remains open-source. Inferact plans to contribute performance optimizations, expanded model-architecture support, and broader hardware coverage back to the community while building a commercial offering around the project's stewardship and expertise.

Similar companies

Cerebras

Cerebras Systems builds the world's fastest AI infrastructure with industry-leading speed, scale, and quality through wafer-scale AI chips.

87 jobs

Perplexity

Perplexity is an AI-powered answer engine that provides accurate, real-time answers to questions backed by credible sources and citations.

54 jobs

Together AI

Together AI is a research-driven AI cloud infrastructure provider enabling developers and enterprises to train, fine-tune, and deploy open-source generative AI models at scale.

48 jobs

d-Matrix

d-Matrix builds purpose-built AI inference computing platforms to make generative AI commercially viable, efficient, and sustainable through digital in-memory compute technology.

43 jobs

Modal

Modal is a serverless compute platform for AI and data teams that enables running compute-intensive workloads like ML inference, fine-tuning, and batch jobs with instant GPU access and usage-based pricing.

28 jobs

Bento

Bento provides an open-source framework and enterprise platform for deploying and operating AI/ML model inference in production with control over performance, scaling, and operational complexity.