AI Engineer
WHO WE ARE
Delve Deeper is a performance media agency focused on the charity and nonprofit sector, partnering with organizations that invest $5M–$20M annually in media. We help mission-driven teams maximize impact through advanced digital strategies that drive measurable, scalable results.
Our expertise includes advanced analytics, intent-based audience segmentation, full-service media management, and personalized creative—delivering a fully integrated, data-driven approach to growth.
More than a vendor, we serve as a strategic partner, helping organizations solve complex media challenges and turn them into clear outcomes. With decades of leadership experience, Delve Deeper is a trusted voice in the charity space.
We’ve also been named Built In Colorado’s “Best Places to Work” for five consecutive years, reflecting a culture that values performance, growth, and people. As a privately owned company, we move quickly, support our team holistically, and create meaningful opportunities for advancement.
ROLE OVERVIEW
This is a dedicated individual-contributor role focused entirely on building and maintaining a multi-agent AI system that automates performance media trading decisions. It is not a generalist developer role. You will work within a complex multi-agent codebase, operate under strict evaluation-first protocols, and develop a deep enough understanding of the business domain to make sound implementation decisions without constant oversight. Data errors have direct financial consequences for clients, so engineering quality is non-negotiable.
CORE RESPONSIBILITIES
- Agent Development: Build, iterate, and maintain AI agents within the architecture and boundaries defined by the Lead AI Engineer. Own your agents end-to-end: prompt design, tool wiring, context routing, failure handling, and output validation.
- Data Integration: Integrate data components across media platforms — ingesting, normalizing to schema, and routing to the correct agent context. Work within defined data contracts and surface schema drift before it becomes a runtime failure.
- Evaluation-First Development: No feature enters development without defined success criteria and regression tests. Run prompt benchmarking, track output quality across model versions, and flag hallucination patterns or quality regressions proactively. Evaluation is not a post-build step.
- Pipeline & ETL Work: Build and maintain ETL/ELT pipelines supporting daily automated callouts and weekly optimization reporting. Own data freshness and pipeline reliability for the agents you are responsible for.
- MCP Connector Work: Operate within and extend the MCP connector library for external platform APIs. Handle rate limits, retries, and failure modes — connectors must be resilient in production, not just in testing.
- Human-in-the-Loop Workflows: Build and maintain Slack-based approval flows — agent callouts, feedback capture, exception alerts, and operational notifications. These are the primary interface between the AI system and human decision-makers.
- Production Reliability: Own the reliability of your agents in production. Monitor output quality, respond to incidents, drive root-cause fixes rather than surface patches. Alert the Lead AI Engineer early on scope or complexity that affects delivery.
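To illustrate what "surface schema drift before it becomes a runtime failure" can mean in practice, here is a minimal Python sketch that checks one ingested record against a data contract and reports missing fields, unexpected fields, and type mismatches. The contract, its field names, the `DriftIssue` type, and the `check_schema_drift` helper are all hypothetical and are not taken from Delve Deeper's codebase or from any platform's real schema.

```python
from dataclasses import dataclass

# Hypothetical data contract for a normalized campaign-metrics row:
# field name -> expected Python type. Illustrative only.
CAMPAIGN_METRICS_CONTRACT: dict[str, type] = {
    "campaign_id": str,
    "date": str,
    "impressions": int,
    "clicks": int,
    "spend": float,
}


@dataclass(frozen=True)
class DriftIssue:
    kind: str  # "missing", "unexpected", or "type_mismatch"
    field: str
    detail: str = ""


def check_schema_drift(record: dict, contract: dict[str, type]) -> list[DriftIssue]:
    """Compare one record with the contract and list every deviation found."""
    issues: list[DriftIssue] = []
    # Fields the contract requires: flag absent ones and wrong types.
    for field, expected in contract.items():
        if field not in record:
            issues.append(DriftIssue("missing", field))
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            issues.append(
                DriftIssue("type_mismatch", field, f"expected {expected.__name__}, got {actual}")
            )
    # Fields the source sent that the contract does not know about (sorted for stable output).
    for field in sorted(record.keys() - contract.keys()):
        issues.append(DriftIssue("unexpected", field))
    return issues


if __name__ == "__main__":
    # A row where the upstream API renamed "spend" and started sending impressions as a string.
    row = {
        "campaign_id": "c-1",
        "date": "2024-05-01",
        "impressions": "1200",
        "clicks": 35,
        "cost_micros": 5_000_000,
    }
    for issue in check_schema_drift(row, CAMPAIGN_METRICS_CONTRACT):
        print(issue)
```

Running the check at ingestion time, before a record is routed to any agent context, is what lets drift show up as an explicit alert rather than as a downstream failure; a production pipeline would more likely enforce contracts with a schema-validation library or in the warehouse layer.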
CANDIDATE PROFILE
- 4+ years across software, data engineering, ML, or AI platform work with direct ownership of production systems
- Experience with media platform APIs (Google Ads, Meta, DV360, Semrush, SerpAPI)
- Strong Python and SQL — production-grade, not just analytical scripts
- Experience with MCP (Model Context Protocol) or an equivalent integration layer
- Hands-on experience building or operating LLM applications, agentic systems, or tool-calling workflows
- Workflow orchestration tooling: Airflow, Dagster, Prefect, dbt
- ETL/ELT pipeline design and data reliability in production — schema management, contract enforcement, freshness monitoring
- Cloud infrastructure: AWS, GCP, or Azure; containerized deployments (Docker)
- Experience defining evaluation frameworks and success criteria for model outputs
- Slack API and webhook-based workflow automation
- Familiarity with vector databases and RAG patterns for long-context data retrieval
- Experience shipping systems that mix model logic, deterministic business rules, and human approval flows
- LLM evaluation tooling — token cost tracking, hallucination detection, model benchmarking
THIS ROLE IS NOT
- A fit for generalist developers without LLM or agentic-system experience
- Prompt engineering without responsibility for data quality and pipeline reliability
- Low-code automation building (n8n, Make, Zapier)
- A role with loose evaluation standards or optional quality gates
- Research-only AI science without production ownership
- A role where shipping late with a perfect solution is acceptable
- A fit for candidates who need architecture decisions made for them before starting
- A role without direct accountability for production incidents on your agents
WHAT WE OFFER
- Hybrid working model: three days in the office (Tuesday to Thursday)
- A competitive salary with opportunities for growth
- Private medical care at Medicover
- Multisport card
- Annual education budget of $250
- Generous employee referral program
- Catered office lunch every Tuesday
- Snacks and occasional breakfasts available in the office
