Posted 3 months ago

Forward Deployed Reliability Engineer

New York, NY · On-site · Full-time

AI Summary

An FDRE ensures the stability and reliability of mission-critical workflows built on Palantir software. The role involves diagnosing, resolving, and proactively preventing issues, automating where possible, and sharing best practices across teams.

About this role

A World-Changing Company

Palantir builds the world’s leading software for data-driven decisions and operations. By bringing the right data to the people who need it, our platforms empower our partners to develop lifesaving drugs, forecast supply chain disruptions, locate missing children, and more.

The Role

As a Forward Deployed Reliability Engineer (FDRE), you ensure the stability and reliability of mission-critical workflows built on Palantir software. You gather signal by going on call and resolving problems before customers are impacted. You then use those learnings to drive product changes, shape our internal tooling, and refine our operational processes, so that we deliver increasingly high-quality service to a growing number of customers.

Your approach is hands-on and pragmatic: you rapidly address issues as they arise with quick, effective solutions, then advocate for workflow or product improvements once the immediate issue is resolved. You are energized by engaging directly with problems, whether that means writing a script to automate a manual task, finding a creative workaround, or building the case for a product enhancement. You don't just fix issues; you look for opportunities to simplify, automate, and make the entire system more resilient.

An FDRE synthesizes learnings from support into best practices for others to follow. These practices are captured in documentation and shared with the team and the broader organization. In this way, you raise the bar for reliability and efficiency across Palantir.

Core Responsibilities

Develop a deep understanding of Palantir's products and operational processes

Go on-call, responding quickly and effectively to mission-critical incidents

Diagnose, resolve, and proactively prevent issues encountered in the field

Collaborate with internal stakeholders to increase the scalability and reliability of Foundry workflows for our customers

Identify recurring pain points and inefficiencies, and take initiative to automate or streamline workflows

Advocate for and implement product enhancements based on insights gleaned from the field

Create clear, actionable documentation and share best practices to elevate team and company-wide reliability

Note: While active work is not required on weekends or outside business hours, you must be available to respond to critical outages during assigned on-call weeks.

What We Value

Ability to work independently and collaboratively to solve ambiguous technical and operational challenges

Excellent written and verbal communication skills, with the ability to interact effectively with both technical and non-technical stakeholders

Proficiency in Python, Java, and SQL

Familiarity with parallel data processing and Spark job optimization

Strong organizational skills and attention to detail, with the ability to prioritize effectively

Resourcefulness and creativity in fast-paced, dynamic environments

Experience with root cause analysis and documenting solutions for broader impact

Enthusiasm for hands-on problem solving, continuous improvement, and knowledge sharing

What We Require

Background in Computer Science, Engineering, Information Systems, or other technical field

Must be a US citizen or green card holder

Skills

Automation, Data Processing, Documentation, Java, On-call Incident Response, Python, Root-cause Analysis, Spark, SQL, Workflow Optimization
