Biomedical Data Manifest: A lightweight data documentation mapping to increase transparency for AI/ML

  • The ARTNet Consortium
  • Oregon Health and Science University
  • Versiti
  • Albert Einstein College of Medicine
  • Roswell Park Cancer Institute
  • University of California at Los Angeles
  • University of Texas MD Anderson Cancer Center
  • University of Maryland, Baltimore
  • Houston Methodist
  • University of Michigan, Ann Arbor
  • University of Alabama at Birmingham
  • University of Nebraska Medical Center
  • University of Oklahoma
  • University of California at San Francisco
  • Baylor College of Medicine
  • University of Texas Health Science Center at Houston

Research output: Contribution to journal › Article › peer-review

Abstract

Biomedical machine learning (ML) models raise critical concerns about embedded assumptions that influence clinical decision-making, which makes robust documentation frameworks necessary for datasets shared via external repositories. The effectiveness of fairness-aware algorithms hinges on users' prior awareness of specific issues in the data, including information such as data collection methodology, provenance and quality. Current ML-focused documentation approaches impose impractical burdens on data generators and conflate data accountability with model accountability. This is problematic for resource datasets that were not explicitly created for ML applications. This study addresses these gaps through a two-step process. First, we derived consensus documentation fields by mapping elements across four key templates. Second, we surveyed biomedical stakeholders in four roles (clinicians, bench scientists, data managers and computationalists) to assess the importance and relevance of each field. The survey revealed important role-dependent differences in prioritization, motivating the development of the Biomedical Data Manifest: a modular template that uses persona-specific field presentation to reduce the burden on data generators while ensuring that end-users receive role-relevant information. The Biomedical Data Manifest improves transparency for datasets deposited in public or controlled-access repositories and supports bias mitigation across ML applications.
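The two steps summarized in the abstract can be illustrated with a short sketch. This is a hypothetical Python example, not the authors' implementation: the template names, field names, the `min_templates` threshold and the role-tagging scheme are all assumptions made for illustration. `consensus_fields` keeps the canonical field names that appear in at least a chosen number of mapped templates, and `view_for` presents each persona only the fields tagged as relevant to that role.

```python
from collections import Counter
from dataclasses import dataclass, field

# Role labels based on the four stakeholder groups named in the abstract.
# The exact identifiers are hypothetical.
ROLES = {"clinician", "bench_scientist", "data_manager", "computationalist"}


def consensus_fields(templates: dict[str, set[str]], min_templates: int) -> set[str]:
    """Keep canonical field names that appear in at least `min_templates` templates.

    `templates` maps a (hypothetical) template name to the set of canonical
    field names its elements were mapped onto.
    """
    counts = Counter(name for names in templates.values() for name in names)
    return {name for name, n in counts.items() if n >= min_templates}


@dataclass
class ManifestField:
    name: str
    value: str
    # Roles for which this field is shown (hypothetical tagging scheme).
    roles: set[str] = field(default_factory=set)


def view_for(manifest: list[ManifestField], role: str) -> dict[str, str]:
    """Return only the fields tagged as relevant to the given role."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role}")
    return {f.name: f.value for f in manifest if role in f.roles}
```

For example, with four hypothetical templates, `consensus_fields(templates, 3)` returns only the fields shared by at least three of them, and `view_for(manifest, "computationalist")` returns only the fields a computationalist would see. Tagging fields with roles lets a data generator fill in one manifest while each reader gets a role-specific view.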

Original language: English
Article number: 414
Journal: Scientific Data
Volume: 13
Issue number: 1
DOIs
State: Published - Dec 2026
