Software Engineer, Distributed Systems
AI Summary
Software engineer focused on building distributed systems platforms for high-traffic, data-intensive workloads. Responsible for the design, implementation, and tuning of core compute and orchestration components.
About this role
You are an experienced software engineer who thrives on building large-scale computing platforms. You have deep expertise in large-scale distributed systems that handle high complexity, heavy traffic, and large data volumes. You know how to achieve reliability and scale with minimal operational load.
Key responsibilities
- Build our core Python/Rust platform: request routing, AI workload orchestration, scheduling, GPU autoscaling, large-scale file storage, queueing, etc.
- Produce forward-looking designs for platform evolution as we scale to 100x current traffic and need to provide low latency across the world
- Use AI aggressively to automate the mundane parts of building complex but reliable systems
- Profile and tune low-level CPU and memory performance
Requirements
- 3+ years of experience building distributed compute and orchestration platforms in Python or Rust
- Strong understanding of distributed systems fundamentals: consensus, scheduling, fault tolerance, capacity planning
- Deep understanding of computational complexity and memory allocation
- Track record of designing systems that scale under real production load
- Experience building and using observability to drive performance and reliability decisions
- Excellent communication and ability to drive technical decisions across teams
- Self-starter who executes quickly, takes ownership, and constantly seeks improvement
Nice to have
- Experience with AI/ML inference or training infrastructure
- Experience with high-performance systems programming (async runtimes, zero-copy, memory-safe concurrency)
- Background in building multi-tenant compute platforms
- Understanding of networking fundamentals and performance characteristics
- Familiarity with GPU workload characteristics and scheduling constraints
Compensation
- $180,000-$250,000 plus equity and benefits (this range spans all three levels: Mid, Senior, and Staff)
Location
- San Francisco, CA (willing to consider remote for Senior and Staff levels)
What we offer at fal
- Interesting and challenging work
- A lot of learning and growth opportunities
- We are currently hiring in downtown San Francisco.
- We offer relocation assistance to San Francisco.
- Health, dental, and vision insurance (US)
- Regular team events and offsites