Senior Data Engineer
Role Summary
The Senior Data Engineer is responsible for designing, building, and optimizing scalable data pipelines and platform infrastructure within a medallion architecture (Bronze, Silver, Gold) on Microsoft Fabric and OneLake. This role delivers enterprise-grade ingestion, transformation, and enrichment solutions that convert raw healthcare, legal, and insurance data into structured intelligence signals used for identification scoring, analytics, and operational reporting.
The role requires deep experience with cloud data platforms, strong Python and SQL skills, and the ability to operate in a regulated healthcare environment with strict HIPAA compliance and multi-tenant data isolation requirements across a large portfolio of client contracts. This person will work closely with Data Science, ML Engineering, and Software Engineering teams to ensure reliable, governed, and performant data delivery across the organization.
Core Responsibilities
Data Ingestion Pipeline Development
- Design and build pipelines that ingest data from multiple structured and unstructured sources, including healthcare claims, P&C insurance data, and legal filings, into the Bronze layer of the medallion architecture.
- Optimize ingestion workflows for reliability, throughput, and compliance across regulated production environments.
- Implement error handling, retry logic, and dead-letter patterns to ensure pipeline resilience.
Medallion Architecture and Transformation
- Develop Silver layer transformation logic including normalization, deduplication, entity resolution, and schema enforcement within Microsoft Fabric and OneLake.
- Build Gold layer aggregations and enriched datasets that support ML scoring models and embedded analytics reporting.
- Maintain Feature Store pipelines that produce ML-ready feature sets for model training and inference.
Data Governance and Compliance
- Enforce contractual data-use constraints imposed by third-party data providers, including requirements for stateless processing and restrictions on data persistence and model training.
- Implement multi-tenant data isolation patterns including partitioning, access controls, and governed data handling across a large number of client contracts.
- Document data lineage, transformations, and data contracts to support governance, audit readiness, and operational clarity.
Data Quality and Monitoring
- Build and maintain data quality validation scripts to detect schema drift, completeness gaps, and business-rule violations across pipeline stages.
- Implement monitoring of pipeline health, data freshness, and operational exceptions to maintain high confidence in production data.
- Establish alerting and escalation processes for pipeline failures and data anomalies.
Cross-Functional Collaboration
- Partner with ML Engineering and Data Science to deliver features that support model retraining, scoring pipelines, and identification engine capabilities.
- Collaborate with Software Engineering, Analytics, and business stakeholders to translate operational needs into reliable, production-ready data solutions.
- Contribute to architectural decisions and technical documentation that support the broader data platform strategy.
Qualifications
Required
- B.S. or B.A. in Computer Science, Information Systems, Mathematics, or a related field.
- 7+ years of professional data engineering experience, preferably within Azure-based or Microsoft Fabric environments.
- Hands-on experience designing enterprise data pipelines, ETL/ELT workflows, and medallion or lakehouse architecture patterns.
- Strong programming skills in Python and advanced SQL, with experience writing data quality validation logic.
- Experience with Microsoft Fabric, OneLake, Azure Data Factory, or equivalent cloud data orchestration tools.
- Working knowledge of CI/CD practices for data pipelines and infrastructure-as-code concepts.
- Demonstrated experience using AI-assisted development tools (e.g., GitHub Copilot, Cursor, or similar) to accelerate pipeline development, code generation, and debugging workflows.
Preferred
- Familiarity with healthcare data formats (claims, eligibility, EDI 837/835) and HIPAA compliance requirements.
- Experience with multi-tenant data architectures and governed data handling in regulated environments.
- Exposure to ML feature engineering, Feature Store design, or data pipelines supporting model training workflows.
- Experience with dbt, PySpark, or similar transformation frameworks.
Professional Skills
- Highly organized with the ability to manage multiple concurrent technical workstreams.
- Critical thinker with strong problem-solving ability and attention to detail.
- Clear communicator, comfortable working asynchronously with distributed teams.
- Self-directed, with a track record of driving projects to completion without constant oversight.
