Hyphen Connect Limited
Posted 1 month ago
Synthetic Data Engineer (AI Data/Training)
Australia, Remote, Full-time
AI Summary
Synthetic Data Engineer who designs domain-specific data generation pipelines and automated quality scoring for model training loops.
About this role
We are seeking a Synthetic Data Engineer to design and implement domain-specific synthetic data generation pipelines and to manage the quality of the data that feeds model training loops. Your work will directly shape data processing and model training across the organization.
Responsibilities:
- Design domain-specific synthetic data generation (SDG) pipelines via self-instruct and constitutional prompting.
- Implement automated quality scoring and de-duplication systems.
- Manage data pipelines that feed directly into supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) training loops.
Qualifications:
- Proven experience building large-scale data pipelines (Airflow, Spark, Ray).
- Deep knowledge of prompt engineering for data generation.
- Familiarity with dataset distillation and bias mitigation.
Skills
Airflow, Bias Mitigation, Data Management, Data Pipelines, Dataset Distillation, De-duplication, DPO Training, Prompt Engineering, Ray, SDG (Synthetic Data Generation), SFT Training, Spark