Member of Technical Staff - ML Performance

Modal
Posted on: Feb 16, 2026
Location: New York, New York, United States | San Francisco, California, United States | Stockholm, Stockholm, Sweden (On-site)
Employment type: Full-time
Salary: $150k – $270k Yearly

About Us:

Modal provides the infrastructure foundation for AI teams. With instant GPU access, sub-second container startups, and native storage, Modal makes it simple to train models, run batch jobs, and serve low-latency inference. Companies like Suno, Lovable, and Substack rely on Modal to move from prototype to production without the burden of managing infrastructure.

We're a fast-growing team based out of NYC, SF, and Stockholm. We've hit 9-figure ARR and recently raised a Series B at a $1.1B valuation. We have thousands of customers who rely on us for production AI workloads, including Lovable, Scale AI, Substack, and Suno.

Working at Modal means joining one of the fastest-growing AI infrastructure organizations at an early stage, with many opportunities to grow within the company. Our team includes creators of popular open-source projects (e.g. Seaborn, Luigi), academic researchers, international olympiad medalists, and engineering and product leaders with decades of experience.

The Role:

We are looking for strong engineers with experience making ML systems performant at scale. If you are interested in contributing to open-source projects and Modal’s container runtime to push language and diffusion models towards higher throughput and lower latency, we’d love to hear from you!

Requirements:

  • 5+ years of experience writing high-quality, high-performance code.

  • Experience working with torch, high-level ML frameworks, and inference engines (vLLM or TensorRT).

  • Familiarity with NVIDIA GPU architecture and CUDA.

  • Experience with ML performance engineering (tell us a story about boosting GPU performance: debugging SM occupancy issues, rewriting an algorithm to be compute-bound, eliminating host overhead, etc.).

  • Nice-to-have: familiarity with low-level operating system foundations (Linux kernel, file systems, containers, etc.).

  • Ability to work in person in our NYC, San Francisco, or Stockholm office.

Salary: $150K – $270K • Offers Equity

Modal is a serverless compute platform for AI and data teams that enables running compute-intensive workloads like ML inference, fine-tuning, and batch jobs with instant GPU access and usage-based pricing.
