robusta

Posted 6 days ago

Senior Data Quality Engineer (4-Month Contract), On-site in UAE - Octopus by RTG

Abu Dhabi, On-site

About the Role

We are seeking an experienced Senior Databricks Data Quality Engineer to lead the design, implementation, and automation of enterprise-scale data quality frameworks within a Databricks environment. The successful candidate will play a key role in establishing data quality controls, profiling frameworks, remediation processes, and AI-assisted quality monitoring across a large-scale data platform consisting of 170+ datasets and over 1,300 Critical Data Elements (CDEs).

This role requires strong expertise in Databricks, PySpark, Delta Lake, MLflow, and modern data quality management practices.

Key Responsibilities

Data Platform & Databricks Configuration

  • Configure and manage Databricks workspaces, compute clusters, PySpark notebooks, Delta Lake architecture, and Unity Catalog integrations.
  • Design scalable data quality processing frameworks across 170+ datasets and 1,346 prioritized Critical Data Elements (CDEs).

Data Profiling & Quality Assessment

  • Develop AI-assisted profiling notebooks using PySpark to establish baseline data quality scores.
  • Assess data quality across six key dimensions:
    • Completeness
    • Uniqueness
    • Validity
    • Consistency
    • Accuracy
    • Timeliness
  • Analyze null rates, duplicate records, invalid values, format violations, outliers, and schema drift.
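
Most of the profiling work above comes down to per-column ratios. The following is a minimal sketch in plain Python over a list of dict records, so it runs without a Spark cluster. A PySpark notebook would compute the same ratios with DataFrame aggregations. The `profile_column` name, the treatment of empty strings as missing, and the metric definitions are illustrative assumptions, not a prescribed scoring method.

```python
import re
from collections import Counter
from typing import Any, Dict, List, Optional


def profile_column(
    rows: List[Dict[str, Any]], column: str, pattern: Optional[str] = None
) -> Dict[str, float]:
    """Baseline completeness, uniqueness and (optionally) validity for one column."""
    total = len(rows)
    values = [row.get(column) for row in rows]
    # Assumption: None and empty strings both count as missing values.
    present = [v for v in values if v is not None and v != ""]
    counts = Counter(present)

    metrics: Dict[str, float] = {
        "null_rate": 1 - len(present) / total if total else 0.0,
        # Share of present values that are distinct (1.0 means no repeats).
        "uniqueness": len(counts) / len(present) if present else 1.0,
        # Number of present values that repeat an earlier value.
        "duplicate_count": float(sum(c - 1 for c in counts.values())),
    }
    metrics["completeness"] = 1 - metrics["null_rate"]

    if pattern is not None:
        # Validity: share of present values that fully match the expected format.
        regex = re.compile(pattern)
        valid = sum(1 for v in present if regex.fullmatch(str(v)))
        metrics["validity"] = valid / len(present) if present else 1.0
    return metrics
```

For example, with `customer_id` values "C1", "C2", None, "C2" and `pattern=r"C\d+"`, the null rate is 0.25, one duplicate is counted, uniqueness is 2/3, and validity is 1.0.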

Data Quality Rule Framework

  • Design and build a scalable Data Quality Rule Factory using parameterized PySpark functions.
  • Enable automated deployment of over 6,700 data quality rules without manual rule-by-rule development.
  • Create reusable rule templates across datasets and data quality dimensions.
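
A rule factory of this kind generates many rule instances from a few parameterized templates plus a metadata table. The sketch below is a hypothetical illustration in plain Python. The template names `not_null`, `in_set` and `max_length`, the metadata layout, and the `build_rules`/`failed_rules` helpers are all assumptions. In a Databricks implementation, the templates would more likely return PySpark column expressions than Python predicates.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List

Predicate = Callable[[Any], bool]


# Hypothetical reusable templates: each takes parameters and returns a check.
def not_null() -> Predicate:
    return lambda v: v is not None and v != ""


def in_set(allowed: Iterable[Any]) -> Predicate:
    allowed_set = frozenset(allowed)
    return lambda v: v in allowed_set


def max_length(n: int) -> Predicate:
    # Nulls pass here; completeness is left to the not_null template.
    return lambda v: v is None or len(str(v)) <= n


TEMPLATES: Dict[str, Callable[..., Predicate]] = {
    "not_null": not_null,
    "in_set": in_set,
    "max_length": max_length,
}


@dataclass(frozen=True)
class Rule:
    rule_id: str
    column: str
    dimension: str
    check: Predicate


def build_rules(metadata: List[Dict[str, Any]]) -> List[Rule]:
    """Instantiate one Rule per metadata row instead of hand-coding each rule."""
    return [
        Rule(
            rule_id=spec["rule_id"],
            column=spec["column"],
            dimension=spec["dimension"],
            check=TEMPLATES[spec["template"]](**spec.get("params", {})),
        )
        for spec in metadata
    ]


def failed_rules(rules: List[Rule], row: Dict[str, Any]) -> List[str]:
    """Return the IDs of the rules this row fails."""
    return [r.rule_id for r in rules if not r.check(row.get(r.column))]
```

With this design, adding a rule means adding a metadata row, for example `{"rule_id": "R2", "column": "country", "dimension": "Validity", "template": "in_set", "params": {"allowed": ["AE", "SA"]}}`, rather than writing new code. This approach makes it practical to deploy thousands of rules from a small set of templates.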

Pipeline Quality Enforcement

  • Integrate data quality controls within Bronze, Silver, and Gold Delta Lake layers.
  • Implement quality gates that prevent data progression unless predefined thresholds are met.
  • Develop reusable Databricks Jobs for automated validation and monitoring.
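
A gate between layers can be reduced to comparing batch-level scores against per-dimension minimums. The following is a minimal sketch that assumes the scores (0 to 1) have already been computed for the batch. The `quality_gate` and `enforce_gate` names and the threshold values are hypothetical. Raising an exception is one simple way to block promotion. When the check runs as a Databricks Job task, the failed task typically prevents dependent tasks from running under the default run-if condition.

```python
from typing import Dict, List, Tuple


class QualityGateError(RuntimeError):
    """Raised when a batch does not meet the thresholds for the next layer."""


def quality_gate(
    scores: Dict[str, float], thresholds: Dict[str, float]
) -> Tuple[bool, List[str]]:
    """Return (passed, breached_dimensions). A missing score counts as a breach."""
    breaches = [
        dimension
        for dimension, minimum in thresholds.items()
        if scores.get(dimension) is None or scores[dimension] < minimum
    ]
    return (not breaches, breaches)


def enforce_gate(
    scores: Dict[str, float], thresholds: Dict[str, float], target_layer: str
) -> None:
    """Stop promotion (e.g. Silver -> Gold) by raising if any threshold is breached."""
    passed, breaches = quality_gate(scores, thresholds)
    if not passed:
        raise QualityGateError(
            f"Blocked promotion to {target_layer}: below threshold on {', '.join(breaches)}"
        )
```

Returning the list of breached dimensions, rather than a single boolean, lets the same result feed both the alert message and the quality metrics tables.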

Data Cleansing & AI-Driven Remediation

  • Build automated data cleansing pipelines for:
    • Standardization
    • Deduplication
    • Schema harmonization
  • Deploy MLflow-managed machine learning models for:
    • Anomaly detection
    • Fuzzy duplicate detection
    • Exact duplicate identification
  • Ensure explainability of model outputs and support human-in-the-loop validation processes.

Exception Management

  • Design failed-record handling frameworks and quarantine Delta tables.
  • Capture failure reasons, affected CDEs, rule references, and timestamps.
  • Develop automated reprocessing mechanisms for corrected records.
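
The failed-record handling above can be sketched as splitting a batch into clean rows and quarantine entries. Each entry carries the failure reason, affected CDE, rule reference, and timestamp. This is a plain-Python sketch that does not touch any Databricks API. The `Check` type, the `split_batch` helper, and the entry field names are assumptions. In practice, the quarantine entries would be appended to a quarantine Delta table.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Check:
    rule_id: str                      # rule reference
    cde: str                          # Critical Data Element (column) the rule covers
    reason: str                       # failure reason recorded on quarantine
    predicate: Callable[[Any], bool]  # returns True when the value is acceptable


def split_batch(
    rows: List[Dict[str, Any]],
    checks: List[Check],
    now: Optional[datetime] = None,
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Separate clean rows from failures; one quarantine entry per failed rule."""
    ts = (now if now is not None else datetime.now(timezone.utc)).isoformat()
    passed: List[Dict[str, Any]] = []
    quarantined: List[Dict[str, Any]] = []
    for row in rows:
        failures = [c for c in checks if not c.predicate(row.get(c.cde))]
        if not failures:
            passed.append(row)
            continue
        for c in failures:
            quarantined.append(
                {
                    "record": dict(row),  # copy of the offending record
                    "rule_id": c.rule_id,
                    "cde": c.cde,
                    "failure_reason": c.reason,
                    "quarantined_at": ts,
                }
            )
    return passed, quarantined
```

Corrected records can be passed back through `split_batch` with the same checks. This way, reprocessing applies exactly the rules used at ingestion, and only records that now pass are released to the target table.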

Data Quality Monitoring & Reporting

  • Build Delta Lake aggregation tables for data quality metrics.
  • Deliver data quality KPIs to Power BI dashboards, including:
    • Dimension-level scores
    • Rule pass/fail rates
    • SLA adherence metrics
  • Configure automated alerting using Databricks SQL Alerts and Azure Monitor.

Predictive Data Quality Analytics

  • Develop predictive models to identify datasets at risk of quality degradation.
  • Support AI-assisted Root Cause Analysis (RCA) using profiling outputs and machine learning techniques.
  • Export and prepare remediation datasets for prioritization and governance reporting.

Requirements

  • Bachelor's degree in Computer Science, Data Engineering, Information Systems, or a related field.
  • 5+ years of experience in Data Engineering or Data Quality Engineering.
  • 3+ years of hands-on experience with Databricks and PySpark.
  • Strong expertise in Delta Lake architecture and data pipeline development.
  • Experience with Unity Catalog implementation and governance.
  • Hands-on experience with MLflow and machine learning deployment.
  • Strong SQL skills and data modeling expertise.
  • Experience building enterprise-scale data quality frameworks.
  • Experience integrating Databricks with Power BI and Azure services.
  • Strong understanding of data governance, metadata management, and data quality dimensions.

Preferred Qualifications

  • Microsoft Azure certifications.
  • Databricks Certified Data Engineer Associate or Professional.
  • Experience with enterprise data governance programs.
  • Experience implementing AI-assisted data quality and remediation solutions.
  • Knowledge of Master Data Management (MDM) principles.