TY - JOUR
T1 - Biomedical Data Manifest
T2 - A lightweight data documentation mapping to increase transparency for AI/ML
AU - The ARTNet Consortium
AU - Bottomly, Daniel
AU - Suciu, Christopher G.
AU - Cordier, Benjamin
AU - Evans, Nathaniel
AU - Poire, Alfonso
AU - Zheng, Christina
AU - Deininger, Michael
AU - Zang, Xingxing
AU - Witkiewicz, Agnieszka
AU - Knudsen, Erik
AU - Boutros, Paul C.
AU - Soragni, Alice
AU - Vlashi, Erina
AU - Lin, Chunru
AU - Tran, Phuoc
AU - Chan, Keith Syson
AU - Alumkal, Joshi
AU - Willey, Christopher
AU - Liu, Tao
AU - Liu, Song
AU - Goodrich, David W.
AU - Druker, Brian
AU - Dong, Jixin
AU - Hollingsworth, Michael
AU - Singh, Pankaj
AU - Koong, Albert
AU - Gan, Boyi
AU - Roth, Jack
AU - Bivona, Trever
AU - Sandulache, Vlad
AU - Myers, Jeffrey
AU - Tyner, Jeffrey W.
AU - Hutson, Alan
AU - McWeeney, Shannon K.
N1 - Publisher Copyright: © The Author(s) 2026.
PY - 2026/12
Y1 - 2026/12
AB - Biomedical machine learning (ML) models raise critical concerns about embedded assumptions influencing clinical decision-making, necessitating robust documentation frameworks for datasets that are shared via external repositories. Fairness-aware algorithm effectiveness hinges on users’ prior awareness of specific issues in the data – information such as data collection methodology, provenance and quality. Current ML-focused documentation approaches impose impractical burdens on data generators and conflate data/model accountability. This is problematic for resource datasets not explicitly created for ML applications. This study addresses these gaps through a two-step process: First, we derived consensus documentation fields by mapping elements across four key templates. Second, we surveyed biomedical stakeholders across four roles (clinicians, bench scientists, data managers and computationalists) to assess field importance and relevance. This revealed important role-dependent prioritization differences, motivating the development of the Biomedical Data Manifest – a modular template whose persona-specific field presentation reduces generator burden while ensuring end-users receive role-relevant information. The Biomedical Data Manifest improves both transparency for datasets deposited in public or controlled-access repositories and bias mitigation across ML applications.
UR - https://www.scopus.com/pages/publications/105032029178
U2 - 10.1038/s41597-026-06670-0
DO - 10.1038/s41597-026-06670-0
M3 - Article
C2 - 41673000
AN - SCOPUS:105032029178
SN - 2052-4463
VL - 13
JO - Scientific Data
JF - Scientific Data
IS - 1
M1 - 414
ER -