Posted 1 month ago

AI Engineer

Warsaw | On-site | Contract

AI Summary

The AI Engineer builds and maintains a multi-agent AI system that automates performance media trading decisions, and owns end-to-end agent development, data integration, evaluation-first testing, and production reliability.

About this role

WHO WE ARE

Delve Deeper is a performance media agency focused on the charity and nonprofit sector, partnering with organizations that invest $5M–$20M annually in media. We help mission-driven teams maximize impact through advanced digital strategies that drive measurable, scalable results.

Our expertise includes advanced analytics, intent-based audience segmentation, full-service media management, and personalized creative—delivering a fully integrated, data-driven approach to growth.

More than a vendor, we serve as a strategic partner, helping organizations solve complex media challenges and turn them into clear outcomes. With decades of leadership experience, Delve Deeper is a trusted voice in the charity space.

We’ve also been named Built In Colorado’s “Best Places to Work” for five consecutive years, reflecting a culture that values performance, growth, and people. As a privately owned company, we move quickly, support our team holistically, and create meaningful opportunities for advancement.

ROLE OVERVIEW

Dedicated individual contributor focused entirely on building and maintaining a multi-agent AI system that automates performance media trading decisions. This is not a generalist developer role. You will work within a complex multi-agent codebase, operate under strict evaluation-first protocols, and develop a deep enough understanding of the business domain to make sound implementation decisions without constant oversight. Data errors have direct financial consequences for clients, so engineering quality is non-negotiable.

CORE RESPONSIBILITIES

Agent Development: Build, iterate, and maintain AI agents within the architecture and boundaries defined by the Lead AI Engineer. Own your agents end-to-end: prompt design, tool wiring, context routing, failure handling, and output validation.

Data Integration: Integrate data components across media platforms — ingesting, normalizing to schema, and routing to the correct agent context. Work within defined data contracts and surface schema drift before it becomes a runtime failure.

Evaluation-First Development: No feature enters development without defined success criteria and regression tests. Run prompt benchmarking, track output quality across model versions, and flag hallucination patterns or quality regressions proactively. Evaluation is not a post-build step.

Pipeline & ETL Work: Build and maintain ETL/ELT pipelines supporting daily automated callouts and weekly optimization reporting. Own data freshness and pipeline reliability for the agents you are responsible for.

MCP Connector Work: Operate within and extend the MCP connector library for external platform APIs. Handle rate limits, retries, and failure modes — connectors must be resilient in production, not just in testing.

Human-in-the-Loop Workflows: Build and maintain Slack-based approval flows — agent callouts, feedback capture, exception alerts, and operational notifications. These are the primary interface between the AI system and human decision-makers.

Production Reliability: Own the reliability of your agents in production. Monitor output quality, respond to incidents, and drive root-cause fixes rather than surface patches. Alert the Lead AI Engineer early to scope or complexity issues that affect delivery.

CANDIDATE PROFILE

4+ years across software, data engineering, ML, or AI platform work with direct ownership of production systems
Experience with media platform APIs (Google Ads, Meta, DV360, Semrush, SerpAPI)
Strong Python and SQL — production-grade, not just analytical scripts
MCP or equivalent integration layer experience
Hands-on experience building or operating LLM applications, agentic systems, or tool-calling workflows
Workflow orchestration tooling: Airflow, Dagster, Prefect, dbt
ETL/ELT pipeline design and data reliability in production — schema management, contract enforcement, freshness monitoring
Cloud infrastructure: AWS, GCP, or Azure; containerized deployments (Docker)
Experience defining evaluation frameworks and success criteria for model outputs
Slack API and webhook-based workflow automation
Familiarity with vector databases and RAG patterns for long-context data retrieval
Experience shipping systems that mix model logic, deterministic business rules, and human approval flows
LLM evaluation tooling — token cost tracking, hallucination detection, model benchmarking

THIS ROLE IS NOT

For generalist developers without LLM or agentic system experience
Prompt engineering work without responsibility for data quality and pipeline reliability
For low-code automation builders (n8n, Make, Zapier)
A role with loose evaluation standards or optional quality gates
For research-only AI scientists without production ownership
A role where shipping late with a perfect solution is acceptable
For candidates who need architecture decisions made for them before starting
A role without direct accountability for production incidents on your agents

WHAT WE OFFER

Hybrid working model: three days in the office (Tuesday to Thursday)
A competitive salary with opportunities for growth
Private medical care at Medicover
Multisport card
Annual education budget of $250
Generous employee referral program
Catered office lunch every Tuesday
Snacks and occasional breakfasts available in the office

Skills

Agentic Systems, Airflow, AWS, Azure, Dagster, dbt, Docker, GCP, LLM, MCP, Prefect, Python, RAG, Slack API, SQL, Tool-calling Workflows, Vector Databases
