
Posted 5 months ago
Product Reliability Engineer - Defense
New York, NY | On-site | Full-time
About this role
A World-Changing Company
Palantir builds the world’s leading software for data-driven decisions and operations. By bringing the right data to the people who need it, our platforms empower our partners to develop lifesaving drugs, forecast supply chain disruptions, locate missing children, and more.
The Role
Product Reliability Engineers (PREs) are responsible for the health, performance, and stability of the services that power Palantir's platforms. PREs take ownership of the entire end-to-end cycle of service reliability, from responding to outages to improving codebases and building lasting solutions.
You will tackle critical issues for key customers, introduce observability into complex systems, address tech debt in essential codebases, and inform strategic investments in core products. We are looking for engineers who enjoy deep-dive troubleshooting, feel strong ownership over the problems they encounter, and recognize the urgency of customer-facing outages.
PREs spend the majority of their time on forward-looking product work, including, but not limited to, infrastructure migrations, product contributions that improve stability and observability, and codebase enhancements that increase resilience. During periodic on-call shifts, we respond to automated alerts, investigate issues reported by customers, and share technical expertise with adjacent product teams.
Whatever the technical issue or question about your service, you'll play a central role in resolving it, seeking not just a one-time fix but a permanent solution. We provide new team members with an experienced mentor and a clear onboarding framework to set them up for success in the role.
Core Responsibilities
What We Value
What We Require
Skills
Cloud Infrastructure, Configuration Management, CSS, Data-driven Decision Making, Diagnostic Tooling, Distributed Systems, Django, Flask, Go, Health Checks, HTML, Incident Response, Java, JavaScript, Load Balancing, Monitoring, Observability, On-call Experience, Prometheus, Python, Ruby, Ruby on Rails, Stakeholder Communication, Storage and Data Processing Systems, Telemetry, Web Technologies