Software Engineer, Virtualization
San Francisco · On-site · Full-time
About this role
You build the custom compute environments we deliver to customers — bare metal or virtual machines with GPU passthrough, dedicated Kubernetes clusters, and the networking that ties them together. You work across the full stack from Linux image building to overlay network design to cluster bootstrapping.
Key responsibilities
- Build and deliver custom environments with excellent GPU performance for customer workloads
- Use AI extensively to automate provisioning, alerting, and recovery
- Provision and configure dedicated Kubernetes clusters tailored to customer requirements
- Design and implement network segmentation and overlays (VLAN, VXLAN), routing configurations (ECMP, BGP), and tunnels (IPsec with strongSwan) for tenant isolation and performance
- Build and maintain Linux images
- Set up network monitoring and diagnostics for customer environments
- Automate the end-to-end lifecycle of customer compute environments: creation, configuration, validation, and teardown
Requirements
- 5+ years of experience with Linux virtualization: KVM/QEMU, libvirt, VFIO device passthrough, hugepages, NUMA, CPU pinning
- Strong networking fundamentals: VXLAN, VLAN, ECMP, BGP, ARP, and the ability to debug packet-level issues (tcpdump, Wireshark)
- Production experience building and operating Kubernetes clusters on bare metal (MetalLB)
- Proficiency with Linux image building and OS provisioning (kickstart, cloud-init, PXE/iPXE)
- Proficiency in Python, Bash, Ansible and Terraform
- Deep experience with NVIDIA GPUs: drivers, MIG, container runtimes (nvidia-container-toolkit), InfiniBand, RDMA/RoCEv2 and GPUDirect for high-performance AI networking
- Excellent communication and ability to drive technical decisions across teams
- Self-starter who executes quickly, takes ownership, and constantly seeks improvement
Nice to have
- Experience with SR-IOV, DPDK, or other high-performance networking technologies
- Experience with shared network storage (Ceph, Lustre, Weka)
- Experience with network automation tools (Netbox, Nautobot, Nornir)
Compensation
- $180,000-$250,000 plus equity and benefits (this range covers two levels: Senior and Staff)
Location
- San Francisco, CA
What we offer at fal
- Interesting and challenging work
- A lot of learning and growth opportunities
- We are currently hiring in downtown San Francisco.
- We offer relocation assistance to San Francisco.
- Health, dental, and vision insurance (US)
- Regular team events and offsites
Skills
Ansible, ARP, Bare-metal Kubernetes, Bash, BGP, Ceph, Cloud-init, CPU Pinning, DPDK, ECMP, GPUDirect, Hugepages, Image Building, InfiniBand, iPXE, Kickstart, Kubernetes, KVM, Libvirt, Lustre, MetalLB, MIG, Nautobot, Netbox, Nornir, NUMA, Nvidia Container Toolkit, NVIDIA GPUs, OS Provisioning, PXE, Python, QEMU, RDMA, RoCEv2, SR-IOV, Tcpdump, Terraform, VFIO, VLAN, VXLAN, WEKA, Wireshark