Posted 6 months ago

Data Engineer (Python)

Berlin, Remote, Full-time

AI Summary

A data engineer rapidly prototypes and delivers end-to-end batch and streaming data pipelines, designs adoptable schemas and data models, and produces handoff artifacts for productization.

About this role

Company

Orcrist builds the Orcrist Intelligence Platform (OIP), a Kubernetes-based data intelligence system delivered as SaaS or self-hosted/on-prem (including air-gapped deployments). We run streaming and batch pipelines that power search, ML enrichment, and investigative workflows for mission-critical customers.

Role

Rapidly validate new data initiatives end-to-end without sacrificing adoptability. On the Innovation team, you’ll prototype representative connectors and pipelines (batch + streaming), generate credible performance/operability readouts, and ship a handoff package that Foundation or a delivery team can productize.

What you'll do

Prototype ingestion and connector patterns (batch + streaming) using NiFi, Kafka, Kafka Connect/Streams, and CDC approaches.
Design “prototype-grade but adoptable” schemas and data models with clear semantics and evolution discipline.
Build incremental lakehouse datasets (Hudi/Iceberg/Delta patterns) and produce queryable outputs for realistic latency/throughput evaluation.
Build in data quality and provenance from the start (checks, metadata hooks, operability basics).
Containerize and deploy prototypes on Kubernetes; deliver minimal runbooks/configs that make adoption straightforward.
Produce adoption artifacts: schemas, reference implementations, technical design notes, and an integration backlog.

About You

3+ years of data engineering experience (level dependent), with real pipeline delivery beyond ad-hoc scripts.
Strong Python + SQL; comfortable building transformations, validation tooling, and pipeline glue code.
Practical streaming/CDC fundamentals (ordering, duplication, replay, idempotency) and Kafka ecosystem experience.
Familiar with lakehouse/storage and query layers (e.g., Hudi/Iceberg/Delta, Trino/Hive/Postgres) and how to make datasets usable.
Comfortable working in Kubernetes/container environments and documenting decisions clearly.
Eligible to work in Germany; EU/NATO citizenship preferred and export-control screening applies.

Nice‑to‑haves

Great Expectations or similar data quality tooling; metadata/lineage platforms (OpenMetadata/DataHub/Atlas).
Experience shipping in on-prem or air-gapped environments; governance/policy awareness for regulated customers.
German language (B1+) and/or experience with OSINT/GEOINT/multi-INT data shapes.

What We Offer

Modern data stack with real constraints: Kafka + NiFi + lakehouse + distributed SQL + Kubernetes.
Remote-first in Germany with regular Berlin prototyping sprints; 30 days of vacation; equipment and learning budget.
High leverage: your prototypes become blueprints multiple teams reuse and productize.

Skills

Apache Hudi, Apache Iceberg, Apache Kafka, Apache NiFi, Change Data Capture, Delta Lake, Hive, Kafka Connect, Kafka Streams, Kubernetes, PostgreSQL, Python, SQL, Trino
