Palantir Technologies

Posted 3 months ago

Open

Forward Deployed Reliability Engineer

New York, NY | On-site | Full-time

AI Summary

An FDRE ensures the stability and reliability of mission-critical workflows built on Palantir software by diagnosing, resolving, and proactively preventing issues, automating where possible, and sharing best practices across teams.

About this role

A World-Changing Company
Palantir builds the world’s leading software for data-driven decisions and operations. By bringing the right data to the people who need it, our platforms empower our partners to develop lifesaving drugs, forecast supply chain disruptions, locate missing children, and more.

The Role
As a Forward Deployed Reliability Engineer (FDRE), you ensure the stability and reliability of mission-critical workflows built on Palantir software. You gather signal by going on call and resolving problems before customers are impacted. You then use those learnings to drive product change, shape our internal tooling, and refine our operational processes so that we can provide increasingly high-quality service to a growing number of customers.
Your approach is hands-on and pragmatic: you’ll address issues quickly and effectively as they arise, then advocate for workflow or product improvements once the immediate issue is resolved. You are energized by engaging directly with problems, whether that means writing a script to automate a manual task, finding a creative workaround, or building a case for a product enhancement. You don’t just fix issues; you look for opportunities to simplify, automate, and make the entire system more resilient.
An FDRE synthesizes learnings from support into best practices for others to follow. These are captured in documentation and shared with the team and the broader organization. In this way, you raise the bar for reliability and efficiency across Palantir.

Core Responsibilities

  • Develop a deep understanding of Palantir's products and operational processes
  • Go on-call, responding quickly and effectively to mission-critical incidents
  • Diagnose, resolve, and proactively prevent issues encountered in the field
  • Collaborate with internal stakeholders to increase the scalability and reliability of Foundry workflows for our customers
  • Identify recurring pain points and inefficiencies, and take initiative to automate or streamline workflows
  • Advocate for and implement product enhancements based on insights gleaned from the field
  • Create clear, actionable documentation and share best practices to elevate team and company-wide reliability
  • Note: While active work is not required on weekends or outside business hours, you must be available to respond to critical outages during assigned on-call weeks.

What We Value

  • Ability to work independently and collaboratively to solve ambiguous technical and operational challenges
  • Excellent written and verbal communication skills, capable of interacting effectively with both technical and non-technical stakeholders.
  • Proficiency in Python, Java, and SQL
  • Familiarity with parallel data processing and Spark job optimization
  • Strong organizational skills and attention to detail, with the ability to prioritize effectively
  • Resourcefulness and creativity in fast-paced, dynamic environments
  • Experience with root cause analysis and documenting solutions for broader impact
  • Enthusiasm for hands-on problem solving, continuous improvement, and knowledge sharing

What We Require

  • Background in Computer Science, Engineering, Information Systems, or other technical field
  • Must be a US citizen or green card holder

Skills

Automation, Data Processing, Documentation, Java, On-call Incident Response, Python, Root-cause Analysis, Spark, SQL, Workflow Optimization