
Posted 7 months ago

Open

Staff Site Reliability Engineer

San Jose · On-site · Full-time

AI Summary

Site Reliability Engineer responsible for owning internal systems infrastructure, cloud and on-prem, with automation, monitoring, incident response, and collaboration with security and product teams.

About this role

Figure is an AI robotics company developing autonomous, general-purpose humanoid robots. The company's goal is to ship humanoid robots with human-level intelligence. Its robots are engineered to perform a variety of tasks in the home and in commercial markets. Figure is headquartered in San Jose, CA.

We are looking for a Site Reliability Engineer to own our internal systems infrastructure. This role is responsible for setting up and managing cloud and on-prem infrastructure to deliver highly available, reliable, and automated systems.

Responsibilities:

  • Be the go-to person for mission-critical infrastructure supporting operations such as Source Configuration Management, CI/CD systems, software distribution, supplier portals, manufacturing, and more.
  • Migrate SaaS tools to self-hosted solutions to improve security and reliability.
  • Implement monitoring and alerting systems, and define incident response plans and runbooks.
  • Reduce manual workload by automating deployment and scaling.
  • Build strong relationships with stakeholders to identify infrastructure needs and define Service Level Objectives (SLOs).
  • Use a data-driven approach to demonstrate service robustness and track optimization work.
  • Partner with the security team to ensure that security remediations and updates are applied in a timely manner.

Requirements:

  • Strong experience with Linux/Unix systems administration
  • Proficiency in programming/scripting
  • Extensive experience with cloud platforms (Azure, AWS, GCP) and on-prem hardware architectures
  • Experience designing, deploying, and operating high-availability, fault-tolerant, and distributed systems
  • Mastery of infrastructure as code (Terraform, CloudFormation, Ansible…)
  • Familiarity with monitoring, logging, and alerting tools (Prometheus, Grafana, Datadog…)
  • Solid understanding of networking fundamentals (TCP/IP, DNS, HTTP, load balancers, firewalls)
  • Experience defining Service Level Objectives (SLOs), developing runbooks and incident response plans, facilitating post-mortems, and managing system assets
  • Ability to work in cross-functional teams with developers, infrastructure, and product teams
  • Excellent verbal and written communication skills

The US base salary range for this full-time position is between $175,000 and $250,000 annually.

The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.



Skills

  • Automation
  • CI/CD Systems
  • Cloud Platforms (Azure, AWS, GCP)
  • High-availability and Fault-tolerant Distributed Systems
  • Infrastructure as Code (Terraform, CloudFormation, Ansible)
  • Linux/Unix Administration
  • Monitoring/Logging/Alerting Tools (Prometheus, Grafana, Datadog)
  • Networking Fundamentals (TCP/IP, DNS, HTTP, Load Balancers, Firewalls)
  • On-prem Hardware Architectures
  • Post-mortems
  • Programming/Scripting
  • Runbooks
  • Security Remediations
  • SLOs and Incident Response Planning
  • Software Distribution
  • Source Configuration Management
