Jobless Developer
fal

Posted 3 months ago


Software Engineer, Distributed Systems

Remote · Full-time

AI Summary

Distributed systems software engineer focused on designing and implementing large-scale, reliable Python/Rust platforms for AI workloads, covering routing, scheduling, and autoscaling.

About this role

You are an experienced software engineer who thrives on building large-scale computing platforms. You have deep expertise in large-scale distributed systems that handle high complexity and large volumes of traffic and data. You know how to achieve reliability and scale with minimal operational load.

Key responsibilities

  • Build our core Python/Rust platform: request routing, AI workload orchestration, scheduling, GPU autoscaling, large-scale file storage, queueing, and more

  • Produce forward-looking designs for the platform's evolution as we scale to 100x current traffic while providing low latency worldwide

  • Leverage AI extensively to automate the mundane parts of building complex, reliable systems

  • Profile and tune low-level CPU and memory performance

Requirements

  • 5+ years of experience building distributed compute and orchestration platforms in Python or Rust

  • Strong understanding of distributed systems fundamentals: consensus, scheduling, fault tolerance, capacity planning

  • Deep understanding of computational complexity and memory allocation

  • Track record of designing systems that scale under real production load

  • Experience building and using observability to drive performance and reliability decisions

  • Excellent communication skills and the ability to drive technical decisions across teams

  • Self-starter who executes quickly, takes ownership, and constantly seeks improvement

Nice to have

  • Experience with AI/ML inference or training infrastructure

  • Experience with high-performance systems programming (async runtimes, zero-copy, memory-safe concurrency)

  • Background in building multi-tenant compute platforms

  • Understanding of networking fundamentals and performance characteristics

  • Familiarity with GPU workload characteristics and scheduling constraints

Location

  • Turkey

What we offer at fal

  • Interesting and challenging work

  • Ample learning and growth opportunities

  • Regular team events and offsites

Skills

AI/ML Inference Infra, AI Workload Orchestration, Async Runtimes, Capacity Planning, Consensus, CPU Optimization, Distributed Systems, Fault Tolerance, GPU Scheduling, GPU Workload Characteristics, Memory Optimization, Memory-safe Concurrency, Multi-tenant Compute, Networking Fundamentals, Observability, Python, Queueing, Rust, Scalable Storage, Scheduling, Zero-copy
