Omnifold

Posted 3 months ago

Open

MTS - Infrastructure

San Francisco | On-site | Full-time

AI Summary

Leads the deployment, security, and reliability of AI model training and inference infrastructure for customer-specific ML workflows, with emphasis on productionization, GPU resource management, and robust monitoring.

About this role

Infrastructure Team

Omnifold trains custom AI models that help planners forecast the future. We are hiring for our Infrastructure Team, which owns the systems that make everything else possible.

What makes this job interesting:

  • We train a unique model for each customer, which means model training and inference work differently here than at any other company. You’ll never get more reps building model training infrastructure!

  • Our team iterates very quickly but needs robust monitoring to pick up signal on user patterns. This is especially important because our application interface for AI-driven forecasting is unique on the market.

What you’ll own

  • Deployment: Reliable processes for getting models and services into production

  • Security: Data isolation between customers, product security, infrastructure hardening (SOC2 compliance and beyond)

  • Cloud resource management: GPU allocation, instance sizing, cost optimization

  • Monitoring and logging: Visibility into what's running, what's failing, and why

  • Data and ML ops: ETL pipelines from varied customer data sources, model versioning and lifecycle management

  • Automated testing: Building the test infrastructure that lets us ship with confidence

What we’re looking for

  • Experience with cloud computing (especially GPU workloads), CI/CD, and infrastructure-as-code. We run on AWS.

  • Familiarity with or interest in ML workflows

  • Security fundamentals: encryption, access controls, compliance basics

  • Python proficiency

  • Must have a strong Computer Science background

Location: San Francisco (in-person, 5 days per week)

Omnifold’s Mission

Every bad forecast has a physical consequence. Unnecessary goods are manufactured, shipped, and stored. Emergency air freight is needed for misallocated products. Poor production planning means workers show up with nothing to do, or work frantic overtime. Inefficiency is everywhere.

Our mission is to eliminate waste and accelerate growth for every company with physical products.

Skills

Access Controls, AWS, CI/CD, Cloud Computing, Compliance Basics, Containerization, Cost Optimization, Data Isolation, Distributed Systems, Encryption, ETL Pipelines, GPU Instance Sizing, GPU Workloads, Infrastructure As Code, Logging And Monitoring, MLOps, Model Versioning, Production Deployment, Python, Security Fundamentals, SOC2 Compliance, Testing Infrastructure
