🚀 Data Engineer (Python, SQL, ETL, Airflow, Snowflake, BigQuery)

Full-Time | Remote | U.S. Business Hours

💡 About the Role

We’re hiring a highly technical Data Engineer to build and maintain scalable data pipelines, cloud data infrastructure, and analytics-ready datasets that power business decision-making.

This role is focused on:

✅ ETL/ELT pipeline development

✅ Data warehouse architecture

✅ SQL optimization

✅ Cloud-based data infrastructure

✅ Pipeline reliability & monitoring

✅ Scalable analytics systems

You’ll work closely with:

Data Analysts
Data Scientists
Engineering Teams
BI & Leadership Teams

Together, you’ll ensure the organization always has accurate, clean, and trustworthy data.

If you:

enjoy building robust data systems,
love optimizing pipelines and queries,
and care deeply about data quality and scalability,

this role is a strong fit.

🔥 What You’ll Own

ETL/ELT Pipeline Development

Build and maintain scalable ETL/ELT pipelines using:
- Python
- SQL
- Scala
Ingest data from:
- APIs
- SaaS platforms
- relational databases
- cloud applications
- streaming systems
Develop reliable workflows for:
- data extraction
- transformation
- loading
- validation

Workflow Orchestration & Automation

Manage orchestration platforms such as:
- Apache Airflow
- Prefect
- Dagster
- Luigi
Monitor:
- pipeline health
- failed jobs
- scheduling reliability
Build automated workflows with:
- retries
- alerting
- dependency management

Data Warehousing & Modeling

Design and optimize cloud data warehouses using:
- Snowflake
- BigQuery
- Redshift
Develop:
- star schemas
- snowflake schemas
- analytics-ready data models
Improve query performance and warehouse efficiency through:
- clustering
- partitioning

Data Quality & Governance

Implement:
- validation checks
- anomaly detection
- logging systems
- lineage tracking
Use tools such as:
- dbt
- Great Expectations
Ensure:
- consistent naming conventions
- clean transformations
- audit-ready datasets
Support compliance requirements such as:
- GDPR
- HIPAA
- industry-specific governance standards

Streaming & Real-Time Data

Build and maintain streaming pipelines using:
- Kafka
- Kinesis
- Pub/Sub
Support:
- real-time ingestion
- event-driven processing
- low-latency analytics workflows

Infrastructure & DevOps

Containerize and deploy services using:
- Docker
- Kubernetes
Build CI/CD workflows with:
- GitHub Actions
- Jenkins
- GitLab CI
Manage cloud infrastructure using:
- Terraform
- CloudFormation
Improve scalability, reliability, and deployment automation

Cross-Functional Collaboration

Partner with:
- analysts
- data scientists
- BI teams
- product teams
Deliver curated datasets for:
- dashboards
- analytics
- machine learning workflows
Support BI tools such as:
- Tableau
- Looker
- Power BI
Maintain documentation for:
- pipelines
- schemas
- workflows
- data definitions

✅ Required Experience & Skills

3+ years of data engineering or backend engineering experience
Strong proficiency with:
- Python
- SQL
Experience with:
- Snowflake
- BigQuery
- Redshift
Familiarity with workflow orchestration tools such as:
- Airflow
- Prefect
Strong understanding of:
- ETL pipelines
- data modeling
- cloud infrastructure
- warehouse optimization

⭐ Ideal Experience

Experience using:
- dbt
- Great Expectations
- data lineage tools
Streaming experience with:
- Kafka
- Kinesis
- Pub/Sub
Experience with:
- AWS Glue
- GCP Dataflow
- Azure Data Factory
Background in:
- healthcare
- fintech
- other regulated environments
Experience optimizing large-scale warehouse costs and performance

🧠 What Makes You a Great Fit

You care deeply about clean and reliable data
You enjoy debugging complex pipeline and infrastructure issues
You think about scalability and long-term maintainability
You combine engineering rigor with analytical thinking
You communicate effectively across technical and non-technical teams

📅 What a Typical Day Looks Like

Review Airflow/Prefect pipeline health and resolve failures
Build connectors for new APIs or SaaS platforms
Optimize SQL queries and warehouse performance
Collaborate with analysts and data scientists on datasets
Improve validation and monitoring systems
Document pipelines and warehouse structures
Reduce warehouse costs and improve pipeline reliability

In short:

You build the data infrastructure that powers analytics, reporting, automation, and business intelligence across the organization.

📊 Key Success Metrics (KPIs)

Pipeline uptime ≥ 99%
Data freshness within SLA
Zero critical data quality issues reaching production
Measurable improvements in query performance and warehouse cost
Reliable and scalable pipeline infrastructure
Positive feedback from analysts, BI teams, and leadership

🌟 Why This Role Stands Out

Work on modern cloud-native data infrastructure
Build scalable ETL and analytics systems
Exposure to:
- streaming pipelines
- cloud data platforms
- orchestration frameworks
- warehouse optimization
Opportunity to grow into:
- Senior Data Engineer
- Analytics Engineering
- Platform Engineering
- Data Architecture
Fully remote role working alongside collaborative engineering teams

🧪 Interview Process

Initial Phone Screen
Video Interview with Pavago Recruiter
Technical Task (build a small ETL pipeline or optimize a SQL query)
Client Interview with Engineering/Data Team
Offer & Background Verification

👉 Apply Now