Samba

Posted 2 months ago

Open

Senior Data Engineer

Warsaw · On-site · Full-time

AI Summary

Senior Data Engineer leads design and development of scalable data pipelines and infrastructure for Samba's analytics, mentors engineers, and collaborates with Data Science, Analytics, and Product teams to deliver production-ready data solutions.

About this role

Samba is an AI-powered media intelligence company on a mission to give marketers the complete picture of their audiences. Our AI indexes media consumption across millions of smart TVs and 2.5 billion web pages, combining that data with third-party signals through the Samba Knowledge Graph, a map of the real interests, behaviors, and purchase intent of 1.5 billion user profiles globally. Brands, agencies, publishers, and platforms use Samba to make smarter decisions across every stage of the marketing funnel.

As a Senior Data Engineer, you will be responsible for leading the development of scalable, high-performance data pipelines and infrastructure that power Samba analytics and insights. You will play a critical role in designing and implementing architectural improvements, ensuring best practices, and mentoring a team of engineers. You will collaborate closely with Data Science, Analytics, and Product teams to deliver robust, production-ready data solutions that drive business impact.

What You'll Do

  • Architectural Leadership: Lead the design and development of scalable, high-performance data pipelines and infrastructure that power Samba's analytics.
  • Complex Problem Solving: Resolve complex technical issues in creative and effective ways, understanding the interrelationships of different disciplines.
  • Production Dataset Management: Lead the design, build, and maintenance of high-scale production datasets. Ensure the delivery of versioned outputs and reliable customer-facing reports.
  • Schema Evolution & Lifecycle: Manage the end-to-end lifecycle of data features: adding new attributes, updating business logic, and executing safe rollouts (staging → performance check → production), including complex backfills and reprocessing.
  • Performance & Cost Engineering: Architect scalable and efficient solutions for data ingestion, transformation, and storage, ensuring performance, reliability, and security. Proactively drive down compute and storage costs.
  • Incident Response & Debugging: Serve as a technical lead for production incidents. Investigate root causes, implement permanent fixes, and validate data recovery across the ecosystem.
  • Cross-Functional Influence: Network with key contacts outside your own area of expertise and frequently advise others on complex matters.
  • Mentorship: Guide the development of new policies and ideas while mentoring other engineers to foster a culture of technical excellence.
  • Observability: Enhance monitoring and observability of data processes, improving debugging, error detection, and system reliability at scale.
Who You Are

  • Experience: Typically 8+ years of related experience with a Bachelor’s degree (or 6 years with a Master’s; 3 years with a PhD).
  • Technical Mastery: Advanced knowledge of Python and deep understanding of distributed data processing frameworks like Apache Spark or PySpark.
  • Orchestration Expertise: Must-have expertise in Apache Airflow and Databricks for orchestration and scalable data processing.
  • Infrastructure: Extensive experience with cloud-based data infrastructure (AWS, GCP) and modern data lake architectures.
  • Data Modeling: Strong knowledge of data modeling, database design, and query optimization for both relational and non-relational databases.
  • Software Excellence: Proven track record of driving best practices for code quality, testing, and software design in a data engineering context.
  • Communication: Ability to adapt your communication style and use persuasion when delivering messages that relate to the wider business.
Skills

Apache Airflow, Apache Spark, AWS, Backfills, Batch Processing, Cloud-based Data Infrastructure, Database Design, Databricks, Data Governance, Data Ingestion, Data Lake Architectures, Data Modeling, Data Transformation, Debugging, GCP, Monitoring, Observability, Orchestration, Performance & Cost Engineering, Production Data Pipelines, PySpark, Python, Query Optimization, Reprocessing, Security, Storage, Versioned Outputs
