Senior Staff Engineer – Platform Engineering

Hippocratic AI
Posted on: Feb 25, 2026
Location: Palo Alto, California, United States (On-site)
Employment type: Full-time

About Us

Hippocratic AI is the leading generative AI company in healthcare. We have the only system that can have safe, autonomous, clinical conversations with patients. We have trained our own LLMs as part of our Polaris constellation, resulting in a system with over 99.9% accuracy.

Why Join Our Team

Reinvent healthcare with AI that puts safety first. We’re building the world’s first healthcare‑only, safety‑focused LLM — a breakthrough platform designed to transform patient outcomes at a global scale. This is category creation.

Work with the people shaping the future. Hippocratic AI was co‑founded by CEO Munjal Shah and a team of physicians, hospital leaders, AI pioneers, and researchers from institutions like El Camino Health, Johns Hopkins, Washington University in St. Louis, Stanford, Google, Meta, Microsoft, and NVIDIA.

Backed by the world’s leading healthcare and AI investors. We recently raised a $126M Series C at a $3.5B valuation, led by Avenir Growth, bringing total funding to $404M with participation from CapitalG, General Catalyst, a16z, Kleiner Perkins, Premji Invest, UHS, Cincinnati Children’s, WellSpan Health, John Doerr, Rick Klausner, and others.

Build alongside the best in healthcare and AI. Join experts who’ve spent their careers improving care, advancing science, and building world‑changing technologies — ensuring our platform is powerful, trusted, and truly transformative.

Location Requirement

We believe the best ideas happen together. To support fast collaboration and a strong team culture, this role is expected to be in our Palo Alto office five days a week, unless otherwise specified.

About the Role

We are seeking a Senior Staff Engineer – Platform Engineering to lead the design, implementation, and operation of Hippocratic AI’s cloud infrastructure, observability systems, and GPU control plane. This leader will be responsible for scaling our global compute fabric to support cutting-edge LLM workloads while maintaining exceptional reliability, security, and cost efficiency.

You will lead a multidisciplinary engineering team spanning cloud operations, SRE, and GPU orchestration, working closely with product development, AI research, and compliance to deliver world-class infrastructure for healthcare AI.

What You'll Do

Team Building & Leadership

  • Foster a culture of innovation, accountability, and technical excellence.

  • Mentor and coach engineers to achieve high performance and career growth.

Infrastructure Leadership

  • Build and scale a high-performing team responsible for all infrastructure operations and systems reliability.

  • Define and execute the long-term infrastructure roadmap for a multi-cloud, multi-region GPU and compute environment.

  • Drive excellence in cloud cost optimization, capacity planning, and service reliability.

Cloud Operations & Control Plane

  • Architect and manage Hippocratic AI’s global GPU control plane, enabling dynamic provisioning, scheduling, and monitoring of inference workloads across regions and providers.

  • Lead the design and automation of deployments (AWS, GCP, Azure, on-prem) using infrastructure-as-code and CI/CD best practices.

  • Ensure strong security posture and compliance across all environments, aligned with HIPAA, SOC 2, and other healthcare data standards.

Observability & Reliability

  • Develop and scale comprehensive observability systems—covering telemetry, tracing, logging, and alerting—to ensure full visibility into production systems and AI workloads.

  • Establish SLOs, SLIs, and SLAs for all mission-critical services and infrastructure.

  • Implement robust incident management, root cause analysis, and continuous improvement processes.

Technical Strategy & Collaboration

  • Partner with AI and product teams to anticipate infrastructure needs and design scalable architectures for rapid experimentation and deployment.

  • Contribute to the design of internal developer platforms that improve productivity and standardization.

  • Evaluate emerging technologies (e.g., new GPU hardware, orchestration frameworks, data center partnerships) to advance our capabilities.

What You Bring

Must-Have:

  • 10+ years of engineering experience, including 5+ years leading infrastructure, SRE, or platform teams at scale.

  • Proven success in managing large-scale distributed systems and global cloud infrastructure.

  • Deep experience with high-performance computing or large-scale AI workloads.

  • Strong background in cloud platforms (AWS, GCP, Azure) and infrastructure-as-code (Terraform, Pulumi, etc.).

  • Expertise in observability stacks (Prometheus, Grafana, OpenTelemetry, Datadog, etc.) and operational excellence.

  • Experience with security and compliance frameworks relevant to healthcare (HIPAA, SOC 2).

  • Exceptional communication skills and the ability to partner across product, AI research, and operations.

Nice-to-Have:

  • Experience designing or operating GPU control planes or schedulers (e.g., Kubernetes, Ray, Slurm, custom orchestration frameworks).

  • Prior work with ML infrastructure, data pipelines, or model-serving platforms.

  • Background in cost optimization and sustainability of GPU/compute operations.

  • Familiarity with edge or hybrid-cloud deployments for low-latency AI systems.


Please be aware of recruitment scams impersonating Hippocratic AI. All recruiting communication will come from @hippocraticai.com email addresses. We will never request payment or sensitive personal information during the hiring process.

Hippocratic AI

Hippocratic AI develops safety-focused LLMs for healthcare, having completed over 150 million clinical patient interactions and deploying 1000+ AI agents to address the global healthcare worker shortage.
