LLM Inference Optimization jobs

Explore LLM Inference Optimization roles on Inference Jobs and apply today.

6d

Senior Software Engineer, AI Inference Systems

NVIDIA

Toronto, Ontario, Canada (Hybrid)

C$170k – C$275k Yearly

2w

Member of Technical Staff - ML Performance

Modal

New York, New York, United States (On-site)

$150k – $270k Yearly

3w

Software Engineer, Inference Deployment

Anthropic

San Francisco, California, United States (Hybrid)

$320k – $485k Yearly

2w

Software Engineer, Load Balancing - Inference

OpenAI

San Francisco, California, United States (On-site)

$325k – $490k Yearly

2w

Senior Software Engineer - VLM Microservices for Neural Reconstruction

NVIDIA

Santa Clara, California, United States (On-site)

$152k – $287.5k Yearly

2w

UK Internship Program

Perplexity

London, England, United Kingdom (Hybrid)

5d

AI Engineer

Lovable

Stockholm, Stockholm, Sweden (On-site)

1w

Senior ML Framework Performance Engineer - AI for Science at Scale

NVIDIA

Santa Clara, California, United States (On-site)

$184k – $287.5k Yearly

2d

Senior Systems Software Engineer - Deep Learning Solutions

NVIDIA

Toronto, Ontario, Canada (On-site)

C$225k – C$275k Yearly

1w

AI Researcher, Core ML

Together AI

San Francisco, California, United States (On-site)

$160k – $230k Yearly

4w

Member of Technical Staff, Model Evaluation

xAI

Palo Alto, California, United States (On-site)

$180k – $440k Yearly

2w

Search Machine Learning Research Engineer (Berlin)

Perplexity

Berlin, Berlin, Germany (On-site)

2w

Senior ML Engineer (Token Factory)

Nebius

Europe + 6 more (Remote)

1w

Infrastructure Engineer, ML Systems

Applied Compute

San Francisco, California, United States (On-site)

2w

Software Engineer, Model Performance Tooling

Baseten

Canada or Remote (Canada + 1 more)

C$130k – C$200k Yearly

2w

Software Engineer - Model APIs

Baseten

San Francisco, California, United States (On-site)

$150k – $230k Yearly

3w

Research Engineering Manager - Model Training

Perplexity

San Francisco, California, United States (On-site)

$300k – $470k Yearly

2w

Member of Technical Staff, MLE (Korea)

Cohere

Seoul, Seoul, South Korea or Remote (South Korea)

2w

Deep Learning Performance Architect - Intern - 2026

NVIDIA

Shanghai, Shanghai, China (On-site)

2w

Internship - Machine Learning Research Engineer (Berlin)

Perplexity

Berlin, Berlin, Germany (On-site)