Fal

Posted 10 months ago

Open

Software Engineer, Distributed Systems

San Francisco, On-site, Full-time

AI Summary

Software engineer focused on building distributed systems platforms for high-traffic, data-intensive workloads. Responsible for the design, implementation, and tuning of core compute and orchestration components.

About this role

You are an experienced software engineer who thrives on building large-scale computing platforms. You have deep expertise in large-scale distributed systems that handle high complexity, heavy traffic, and large volumes of data. You know how to achieve reliability and scale with minimal operational load.

Key responsibilities

  • Build our core Python/Rust platform: request routing, AI workload orchestration, scheduling, GPU autoscaling, large-scale file storage, queueing, etc.
  • Produce forward-looking designs for platform evolution as we scale to 100x current traffic and need to provide low latency worldwide
  • Leverage AI to an extreme level to automate the mundane parts of building complex but reliable systems
  • Profile and tune low-level CPU and memory performance

Requirements

  • 3+ years of experience building distributed compute and orchestration platforms in Python or Rust
  • Strong understanding of distributed systems fundamentals: consensus, scheduling, fault tolerance, capacity planning
  • Deep understanding of computational complexity and memory allocation
  • Track record of designing systems that scale under real production load
  • Experience building and using observability to drive performance and reliability decisions
  • Excellent communication and ability to drive technical decisions across teams
  • Self-starter who executes quickly, takes ownership, and constantly seeks improvement

Nice to have

  • Experience with AI/ML inference or training infrastructure
  • Experience with high-performance systems programming (async runtimes, zero-copy, memory-safe concurrency)
  • Background in building multi-tenant compute platforms
  • Understanding of networking fundamentals and performance characteristics
  • Familiarity with GPU workload characteristics and scheduling constraints

Compensation

  • $180,000-$250,000 plus equity and benefits (this range spans all three levels: Mid, Senior, and Staff)

Location

  • San Francisco, CA (willing to consider remote for Senior and Staff levels)

What we offer at fal

  • Interesting and challenging work
  • A lot of learning and growth opportunities
  • We are currently hiring in downtown San Francisco.
  • We offer relocation assistance to San Francisco.
  • Health, dental, and vision insurance (US)
  • Regular team events and offsites

Skills

AI/ML Inference Infrastructure, Async Runtimes, Capacity Planning, Consensus, CPU/memory Performance Tuning, Distributed Systems, Fault Tolerance, GPU Scheduling, GPU Workload Characteristics, Memory-safe Concurrency, Multi-tenant Compute, Networking Fundamentals, Observability, Python, Rust, Scheduling, Zero-copy
