DEUNA

Posted 21 days ago

Open

AI Engineering Lead

San Francisco · On-site · Full-time

AI Summary

Engineering Lead who will own the full AI/ML stack for a payments platform, spanning model development, data pipelines, production orchestration, and hybrid cloud/on-prem deployments with real-time observability.

About the Role

Athia is DEUNA's AI-powered payment intelligence platform — moving from early ML experimentation to the critical infrastructure behind billions of dollars in annual transaction volume. We are looking for a hands-on Engineering Lead who can own the full technical stack: from model development and data pipelines to production payment orchestration, cloud/on-prem deployments, and real-time observability.

This is not a coordination role. You will build, ship, and own. You will be the technical authority that bridges AI/ML systems with our core payments stack, leading both the platform engineering and the modeling lifecycle end-to-end.

Core Responsibilities

1 · AI/ML Model Ownership

  • Design, train, and fine-tune ML models for payment optimization use cases — including authorization rate improvement, dynamic routing, cost minimization, and fraud signal detection.

  • Select and apply the right frameworks (PyTorch, TensorFlow, scikit-learn) per model type and latency budget.

  • Own the model lifecycle: experimentation → offline evaluation → shadow deployment → A/B testing → production promotion.

  • Monitor and remediate model drift, data distribution shifts, and performance degradation proactively.

  • Define evaluation metrics that map directly to business KPIs (approval rate lift, GMV impact, provider cost).

2 · Data Pipelines & Feature Engineering

  • Architect and build optimized data pipelines to collect, clean, and preprocess high-volume transaction data for model training and inference.

  • Design feature stores and real-time feature serving layers that keep inference latency within payments SLA requirements (<100 ms).

  • Establish data quality standards, schema validation, and lineage tracking across the ML data stack.

  • Partner with the Data Engineering team to ensure training data reflects the full distribution of providers, regions, and merchant types in our network.

3 · Production Deployment & Payments Stack Integration

  • Integrate ML model outputs into DEUNA's live payment routing and orchestration layer with zero tolerance for latency regressions or silent errors.

  • Develop and own the inference service layer in Go and Python, ensuring thread-safe, performant, and fault-tolerant operation under peak transaction load.

  • Lead the design of hybrid deployment architectures: cloud-native (AWS/GCP) and on-premise client environments, including secure bi-directional data synchronization.

  • Build and maintain RESTful and gRPC APIs that expose Athia capabilities to the broader DEUNA platform and external partners.

4 · Observability, Monitoring & Incident Response

  • Own the full observability stack for Athia: real-time dashboards, alerting thresholds, anomaly detection, and post-incident reviews.

  • Implement model-specific monitoring (prediction distributions, confidence scores, provider error rates) alongside standard infrastructure metrics.

  • Create a fast feedback loop with the Operations team to detect and remediate routing degradation or GMV impact within SLA.

  • Define on-call runbooks and escalation paths that are clear, tested, and kept up to date.

5 · Scalability, Resiliency & Engineering Leadership

  • Provide architectural guidance to scale Athia to handle 10M+ monthly transactions across concurrent global partner launches.

  • Lead and mentor engineers through architecture reviews, code reviews, technical planning, and day-to-day execution.

  • Drive engineering best practices: testing strategy (unit, integration, shadow), CI/CD pipelines, documentation standards, and security compliance.

  • Translate business and product goals into concrete technical roadmaps with realistic timelines and clear dependency mapping.

Requirements

Backend & Infrastructure

  • Go (Golang): production-grade services

  • Python: ML pipelines, model serving, tooling

  • RESTful APIs and gRPC

  • Distributed systems & event-driven architecture

  • CI/CD, Docker, Kubernetes

  • Cloud platforms (AWS or GCP)

  • Hybrid / on-prem deployment patterns

AI / ML Stack

  • PyTorch or TensorFlow: training & fine-tuning

  • scikit-learn, XGBoost, or tabular ML

  • MLflow, Weights & Biases, or equivalent

  • Feature engineering & feature stores

  • Model monitoring & drift detection

  • A/B testing and shadow deployment

  • Low-latency inference architectures

Frontend & Full-Stack

  • React and Next.js

  • TypeScript

  • Component design systems

  • API integration patterns

Observability & Data

  • Prometheus, Grafana, or Datadog

  • Structured logging & distributed tracing

  • SQL and analytical query patterns

  • Data pipeline tooling (Airflow, dbt, etc.)

Experience

  • 6+ years in software engineering with strong backend foundations.

  • 2+ years in a Tech Lead or Staff Engineer role owning a production platform end-to-end.

  • Demonstrated experience shipping ML/AI systems to production, not just research or notebooks.

  • Background in payments, fintech, or high-transaction environments strongly preferred.

  • Experience with on-premise deployment or hybrid infrastructure for enterprise clients is a plus.

  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.

Skills

A/B Testing and Shadow Deployment; Airflow, dbt; AWS or GCP Cloud Platforms; CI/CD, Docker, Kubernetes; Distributed Systems & Event-driven Architecture; Feature Stores; Go (Golang) Production-grade Services; Hybrid/On-prem Deployment Patterns; Low-latency Inference Architectures; MLflow or Weights & Biases; Model Monitoring & Drift Detection; Prometheus, Grafana, Datadog; Python ML Pipelines and Model Serving; PyTorch or TensorFlow; React and Next.js; RESTful APIs and gRPC; scikit-learn, XGBoost, Tabular ML; SQL and Analytical Queries; Structured Logging & Distributed Tracing; TypeScript
