Hyphen Connect Limited

Posted 1 month ago

Open

Synthetic Data Engineer (AI Data/Training)

Singapore | On-site | Full-time

AI Summary

Synthetic Data Engineer role focused on building domain-specific synthetic data generation pipelines for model training loops, covering data quality, de-duplication, and integration into SFT/DPO workflows.

About this role

We are seeking a Synthetic Data Engineer. In this role, you will design and implement domain-specific synthetic data generation pipelines and manage the quality of the data that feeds our training loops. Your work will directly shape the data processing and model training carried out across the organization.

Responsibilities:

  • Design domain-specific synthetic data generation (SDG) pipelines via self-instruct and constitutional prompting.
  • Implement automated quality scoring and de-duplication systems.
  • Manage data pipelines that feed directly into supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) training loops.

Qualifications:

  • Proven experience building large-scale data pipelines (Airflow, Spark, Ray).
  • Deep knowledge of prompt engineering for data generation.
  • Familiarity with dataset distillation and bias mitigation.

Skills

Airflow, Bias Mitigation, Data Pipelines, Dataset Distillation, De-duplication, DPO Training Loops, Prompt Engineering, Quality Scoring, Ray, SFT, Spark, Synthetic Data Generation
