Data Manager — Multimodal Medical Foundation Models
About the Role
You will lead data operations for a cutting-edge research group developing 3D medical multimodal foundation models and agentic clinical AI systems. These models rely on extremely high-quality, well-structured, and compliant datasets, including 3D medical imaging volumes (MRI, CT, PET), clinical text corpora, annotations, and multimodal metadata.
Your job is to own the end-to-end data lifecycle: acquisition, ingestion, cleaning, versioning, labeling, quality control, governance, and delivery to researchers. You are the central node ensuring our foundation model teams and medical agent teams have clean, scalable, well-documented data pipelines.
This is a pivotal foundational role—without great data, large models cannot be great.
What You Will Work On
Multimodal Medical Data Ops
- Oversee ingestion and processing of 3D medical volumes (DICOM, NIfTI, MHA) and associated clinical texts.
- Build automated pipelines for metadata extraction, de-identification, slice/series validation, and cohort structuring.
- Manage large-scale internal datasets and external research datasets (BraTS, LiTS, MIMIC-CXR, CheXpert, MosMed, etc.).
Data Infrastructure & Versioning
- Implement scalable data storage, cataloging, and retrieval systems for multimodal training data.
- Own dataset version control, lineage tracking, reproducibility, and dataset documentation.
- Collaborate with ML systems engineers on high-throughput data loaders, sharding strategies, and caching mechanisms.
Annotation & Labeling Programs
- Lead medical annotation workflows with radiologists, medical students, and labeling vendors.
- Create guidelines for ROI labeling, segmentation, captioning, report alignment, and case-level curation.
- Build semi-automated labeling pipelines using model-assisted tools.
Data Quality, Compliance & Governance
- Enforce strict standards on data quality, completeness, consistency, and bias control.
- Ensure adherence to medical data privacy requirements, HIPAA-equivalent frameworks, and institutional data-sharing rules.
- Manage PHI de-identification, audit logs, access control, and compliance approvals.
Collaboration with Research & Engineering
- Work closely with foundation-model researchers to understand data needs for model training.
- Partner with agentic system designers to supply structured datasets for clinical reasoning tasks.
- Collaborate with foundational engineers on data access layers, performance bottlenecks, and dataset optimization.
Why This Role Is Critical
- The foundation model relies on high-quality 3D and textual data at scale.
- You shape the data pipelines enabling next-generation medical AI agents.
- You ensure clinical-grade governance, safety, reproducibility, and trust.
- Your systems become the backbone for research, experiments, and deployments.
For candidates motivated by the intersection of data, healthcare, and machine learning, this is a high-impact opportunity.
What We’re Looking For
- Strong experience managing large multimodal or imaging datasets, ideally medical imaging.
- Proficiency with DICOM/DICOMweb, NIfTI, PACS, and medical imaging toolkits (dicompyler, pydicom, MONAI, ITK).
- Experience with ETL pipelines, distributed data systems, and cloud/on-prem storage.
- Knowledge of metadata standards, ontologies, and text–image linking strategies.
- Comfort working with Python, SQL, and data tooling (Airflow, Prefect, Dagster, dbt, Delta Lake, etc.).
- Understanding of data privacy, de-identification, and compliance requirements in healthcare.
- Strong communication skills and the ability to coordinate between engineers, researchers, clinicians, and data partners.
Nice to Have
- Experience with vector databases, multimodal retrieval, or embedding store design.
- Familiarity with annotation tools (Labelbox, CVAT, iMerit, custom MONAI Label pipelines).
- Prior work with clinical NLP datasets or multilingual Indian medical corpora.
- Experience conducting bias audits, dataset characterization, or quality scoring at scale.
- Contributions to open datasets, benchmarks, or data documentation frameworks.
What We Offer
- Competitive compensation.
- Access to one of the most ambitious medical multimodal datasets in the region.
- Collaboration with scientists building India’s first 3D multimodal medical foundation model.
- Autonomy to design data systems from the ground up.
- A mission-driven team working to transform clinical care with agentic AI.