vLLM Jobs

Browse 73 vLLM jobs on Inference Jobs.

21-40 of 73 jobs

6d

Machine Learning Engineer

Together AI

San Francisco, California, United States (On-site)

$160k – $220k Yearly

2w

Software Engineer - Model Performance

Baseten

San Francisco, California, United States (On-site)

$150k – $250k Yearly

2w

Senior ML Solutions Architect - Token Factory

Nebius

United States (Remote)

$215k – $275k Yearly

3w

Member of Technical Staff, Model Evaluation

xAI

Palo Alto, California, United States (On-site)

$180k – $440k Yearly

3d

Solution Architect

Baseten

San Francisco, California, United States (On-site)

$165k – $275k Yearly

3w

Principal Engineer, AI Model LifeCycle

Crusoe

San Francisco, California, United States (On-site)

$256k – $320k Yearly

4d

Senior Machine Learning Engineer, Quantized Inference

NVIDIA

Redmond, Washington, United States (On-site)

$152k – $287.5k Yearly

3w

Principal GenAI Engagement Lead, Partner Platforms

NVIDIA

Santa Clara, California, United States (Hybrid)

$272k – $431.3k Yearly

2w

Software Engineer, Inference – AMD GPU Enablement

OpenAI

San Francisco, California, United States (On-site)

$325k – $490k Yearly

2w

Inference Engineering Manager

Perplexity

San Francisco, California, United States (On-site)

$300k – $385k Yearly

2w

Software Engineer - Model API's

Baseten

San Francisco, California, United States (On-site)

$150k – $230k Yearly

2w

Full Stack Software Engineer - Applied AI

Perplexity

San Francisco, California, United States (On-site)

$210k – $385k Yearly

2w

ML Model Serving Engineer

Sesame

San Francisco, California, United States (On-site)

$175k – $280k Yearly

4d

Principal Software Engineer - AI Inference

NVIDIA

Santa Clara, California, United States (On-site)

$272k – $431.3k Yearly

6d

HPC System Engineer

Nebius

Amsterdam, North Holland, Netherlands (On-site)

2w

Full Stack Software Engineer - Finance

Perplexity

San Francisco, California, United States (On-site)

$210k – $385k Yearly