About Us
Hippocratic AI is the leading generative AI company in healthcare. We have the only system that can have safe, autonomous, clinical conversations with patients. We have trained our own LLMs as part of our Polaris constellation, resulting in a system with over 99.9% accuracy.
Why Join Our Team
Reinvent healthcare with AI that puts safety first. We’re building the world’s first healthcare‑only, safety‑focused LLM — a breakthrough platform designed to transform patient outcomes at a global scale. This is category creation.
Work with the people shaping the future. Hippocratic AI was co‑founded by CEO Munjal Shah and a team of physicians, hospital leaders, AI pioneers, and researchers from institutions like El Camino Health, Johns Hopkins, Washington University in St. Louis, Stanford, Google, Meta, Microsoft, and NVIDIA.
Backed by the world’s leading healthcare and AI investors. We recently raised a $126M Series C at a $3.5B valuation, led by Avenir Growth, bringing total funding to $404M with participation from CapitalG, General Catalyst, a16z, Kleiner Perkins, Premji Invest, UHS, Cincinnati Children’s, WellSpan Health, John Doerr, Rick Klausner, and others.
Build alongside the best in healthcare and AI. Join experts who’ve spent their careers improving care, advancing science, and building world‑changing technologies — ensuring our platform is powerful, trusted, and truly transformative.
Location Requirement
We believe the best ideas happen together. To support fast collaboration and a strong team culture, this role is expected to be in our Palo Alto office five days a week, unless otherwise specified.
About the Role
We are seeking a Senior Staff Engineer – Platform Engineering to lead the design, implementation, and operation of Hippocratic AI’s cloud infrastructure, observability systems, and GPU control plane. This leader will be responsible for scaling our global compute fabric to support cutting-edge LLM workloads while maintaining exceptional reliability, security, and cost efficiency.
You will lead a multidisciplinary engineering team spanning cloud operations, SRE, and GPU orchestration, working closely with product development, AI research, and compliance to deliver world-class infrastructure for healthcare AI.
What You'll Do
Team Building & Leadership
Foster a culture of innovation, accountability, and technical excellence.
Mentor and coach engineers to achieve high performance and career growth.
Infrastructure Leadership
Build and scale a high-performing team responsible for all infrastructure operations and systems reliability.
Define and execute the long-term infrastructure roadmap for a multi-cloud, multi-region GPU and compute environment.
Drive excellence in cloud cost optimization, capacity planning, and service reliability.
Cloud Operations & Control Plane
Architect and manage Hippocratic AI’s global GPU control plane, enabling dynamic provisioning, scheduling, and monitoring of inference workloads across regions and providers.
Lead the design and automation of deployments (AWS, GCP, Azure, on-prem) using infrastructure-as-code and CI/CD best practices.
Ensure strong security posture and compliance across all environments, aligned with HIPAA, SOC 2, and other healthcare data standards.
Observability & Reliability
Develop and scale comprehensive observability systems—covering telemetry, tracing, logging, and alerting—to ensure full visibility into production systems and AI workloads.
Establish SLOs, SLIs, and SLAs for all mission-critical services and infrastructure.
Implement robust incident management, root cause analysis, and continuous improvement processes.
Technical Strategy & Collaboration
Partner with AI and product teams to anticipate infrastructure needs and design scalable architectures for rapid experimentation and deployment.
Contribute to the design of internal developer platforms that improve productivity and standardization.
Evaluate emerging technologies (e.g., new GPU hardware, orchestration frameworks, data center partnerships) to advance our capabilities.
What You Bring
Must-Have:
10+ years of engineering experience, including 5+ years leading infrastructure, SRE, or platform teams at scale.
Proven success in managing large-scale distributed systems and global cloud infrastructure.
Deep experience with high-performance computing or large-scale AI workloads.
Strong background in cloud platforms (AWS, GCP, Azure) and infrastructure-as-code (Terraform, Pulumi, etc.).
Expertise in observability stacks (Prometheus, Grafana, OpenTelemetry, Datadog, etc.) and operational excellence.
Experience with security and compliance frameworks relevant to healthcare (HIPAA, SOC 2).
Exceptional communication skills and the ability to partner across product, AI research, and operations.
Nice-to-Have:
Experience designing or operating GPU control planes or schedulers (e.g., Kubernetes, Ray, Slurm, custom orchestration frameworks).
Prior work with ML infrastructure, data pipelines, or model-serving platforms.
Background in cost optimization and sustainability of GPU/compute operations.
Familiarity with edge or hybrid-cloud deployments for low-latency AI systems.
Please be aware of recruitment scams impersonating Hippocratic AI. All recruiting communication will come from @hippocraticai.com email addresses. We will never request payment or sensitive personal information during the hiring process.