Low Latency Optimization Jobs
Explore Low Latency Optimization roles on Inference Jobs and apply today.
3mo ago
ML Model Serving Engineer
Sesame
San Francisco, California, United States (On-site) · $175K – $280K Yearly
3mo ago
Engineering Manager - Model Performance
Baseten
San Francisco, California, United States (On-site) · $230K – $300K Yearly
3mo ago
Machine Learning Intern - Dynamic KV-Cache Modeling for Efficient LLM Inference
d-Matrix
Santa Clara, California, United States or Remote (California, United States) · $30 – $59 Hourly
3mo ago
Inference Technical Lead, Sora
OpenAI
San Francisco, California, United States (Hybrid) · $380K Yearly
2mo ago
Senior Software Engineer, Quantized Inference
NVIDIA
Redmond, Washington, United States (On-site) · $152K – $287.5K Yearly
2w ago
Senior Machine Learning Engineer, Voice AI
Together AI
San Francisco, California, United States (On-site) · $200K – $260K Yearly
2mo ago
Senior Software Engineer – TensorRT Edge-LLM
NVIDIA
Santa Clara, California, United States (Hybrid) · $152K – $287.5K Yearly
3mo ago
Member of Technical Staff, Model Efficiency
Cohere
New York, United States or Remote (New York, United States + 3 more)
1mo ago
Senior Performance Engineer - Deep Learning
NVIDIA
Santa Clara, California, United States (On-site) · $152K – $241.5K Yearly
2mo ago
Senior Machine Learning Applications and Compiler Engineer
NVIDIA
Cambridge, England, United Kingdom (Hybrid)