LLM Serving jobs

Explore LLM Serving roles on Inference Jobs and apply today.

81-100 of 289 jobs

Posted 1w ago

Forward Deployed Engineer

CoreWeave

Livingston, New Jersey, United States (Hybrid)

$188k – $275k Yearly

Posted 3w ago

Engineering Manager, Inference Platform

Cerebras

Sunnyvale, California, United States (On-site)

Posted 2w ago

Senior AI Software Engineer, GenAI Framework

NVIDIA

Santa Clara, California, United States (On-site)

$152k – $287.5k Yearly

Posted 2w ago

Senior Research Scientist

EliseAI

New York, New York, United States (On-site)

$200k – $320k Yearly

Posted 1w ago

Senior Technical Product Manager Token Factory - Inference

Nebius

United States (Remote)

$204k – $255k Yearly

Posted 3w ago

Senior Forward Deployed Engineer

Harvey

New York, New York, United States (On-site)

$200k – $260k Yearly

Posted 1w ago

Senior Manager Forward Deployed Engineers

CoreWeave

Livingston, New Jersey, United States (Hybrid)

$188k – $275k Yearly

Posted 5d ago

Senior Deep Learning Research Engineer

NVIDIA

Tel Aviv-Yafo, Tel Aviv District, Israel (On-site)

Posted 2d ago

Senior Software Engineer, Quantized Inference

NVIDIA

Redmond, Washington, United States (On-site)

$152k – $287.5k Yearly

Posted 3w ago

Senior Site Reliability Engineer, Managed AI

Crusoe

San Francisco, California, United States (On-site)

$172k – $209k Yearly

Posted 2w ago

AI Safety Scientist, Deep Learning

NVIDIA

Ho Chi Minh City, Ho Chi Minh City, Vietnam (On-site)

Posted 1w ago

Applied AI/ML Scientist

Cerebras

United Arab Emirates (On-site)

Posted 4w ago

Staff Machine Learning Research Engineer, Agent Post-training - Enterprise GenAI

Scale

San Francisco, California, United States (On-site)

$252k – $315k Yearly

Posted 4d ago

Senior ML Engineer (Token Factory)

Nebius

Netherlands + 4 more (Remote)

Posted 4w ago

Machine Learning Systems Research Engineer, Agent Post-training - Enterprise GenAI

Scale

San Francisco, California, United States (On-site)

$252k – $315k Yearly

Posted 2w ago

Software Engineer, Applied Evals

OpenAI

San Francisco, California, United States (Hybrid)

$255k – $325k Yearly

Posted 2w ago

AI Research Lead

Perplexity

San Francisco, California, United States (On-site)

$300k – $470k Yearly

Posted 2w ago

Senior Full Stack Engineer, Observability & Evals Platform

LangChain

San Francisco, California, United States (On-site)

$175k – $225k Yearly

Posted 1w ago

Research Engineer, Pretraining Scaling

Anthropic

San Francisco, California, United States (On-site)

$315k – $560k Yearly

Posted 2w ago

ML Compiler Architect, Senior Principal

d-Matrix

Toronto, Ontario, Canada (Hybrid)