Comparison of multivariate pooling strategies based on skewed data in light of the receiver operating characteristic curve analysis

  • SUNY Buffalo

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

To tackle the fast spread of an infectious disease (e.g., COVID-19), statistical instruments are required for diagnostic testing of large populations under a fixed budget. This motivates investigators to consider different sampling strategies. Pooling biospecimens and random sampling are well-accepted methods for collecting measurements associated with large biological datasets. Using equal-size pools of biological assays can reduce data collection costs. This strategy is very efficient when individual test scores are normally distributed. When individual measurements follow skewed distributions, however, the traditional pooling design can yield inaccurate estimates of unknown quantiles. We address the implementation of a simple one-pool design that combines measurements from a random sample with one pool of bioassays. For skewed underlying data, we compare the following strategies: (a) random sampling; (b) the traditional design with equal pool sizes; and (c) the one-pool design. We aim to (1) examine these strategies when biomarker values come from skewed distributions, identifying an optimal strategy with respect to the quality of estimators of unknown parameters; and (2) explain the best combinations of pooled biomarker values, maximizing their ability to discriminate between diseased and healthy populations from bivariate skewed distributions. The efficiency of these designs is evaluated in the context of estimating receiver operating characteristic curves. From both theoretical and experimental perspectives, we conclude that when only n biospecimens can be measured out of N individual tests, the suggested scheme is as follows: randomly select and measure n−1 biospecimens individually, then pool the remaining N−n+1 bioassays together and measure the pool. The proposed strategy is illustrated using real data from a study on coronary heart disease.
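The three designs compared in the abstract can be sketched in a short simulation. This is only an illustration under stated assumptions, not the chapter's estimators: it assumes a pooled assay returns the arithmetic mean of its members with no measurement error, draws marker values from a univariate gamma distribution with arbitrary parameters, and uses hypothetical function names (`one_pool_design`, `equal_pool_design`).

```python
import numpy as np

def one_pool_design(values, n, rng):
    """Measure n-1 randomly chosen specimens individually; pool the other N-n+1."""
    perm = rng.permutation(len(values))
    individual = values[perm[: n - 1]]
    pooled = values[perm[n - 1:]]
    # Assumption: the pooled assay equals the average of its members.
    return individual, pooled.mean(), len(pooled)

def equal_pool_design(values, n, rng):
    """Split all N specimens at random into n pools of (near) equal size."""
    perm = rng.permutation(len(values))
    groups = np.array_split(values[perm], n)
    return np.array([g.mean() for g in groups])

rng = np.random.default_rng(0)
N, n = 100, 10                                 # N specimens, budget of n assays
x = rng.gamma(shape=2.0, scale=1.5, size=N)    # skewed marker values (illustrative)

# (a) random sampling: n individual measurements
random_sample_mean = rng.choice(x, size=n, replace=False).mean()

# (b) traditional equal-size pooling: n pool measurements
equal_pool_mean = equal_pool_design(x, n, rng).mean()

# (c) one-pool design: n-1 individual measurements plus one pool
individual, pool_value, pool_size = one_pool_design(x, n, rng)
one_pool_mean = (individual.sum() + pool_size * pool_value) / N

print(random_sample_mean, equal_pool_mean, one_pool_mean, x.mean())
```

Under the averaging assumption, designs (b) and (c) both use all N specimens when estimating the mean, whereas (a) uses only n of them. Unlike (b), the one-pool design also retains n−1 unpooled values, which carry direct information about the shape of the skewed distribution.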

Original language: English
Title of host publication: Modern Inference Based on Health-Related Markers
Subtitle of host publication: Biomarkers and Statistical Decision Making
Publisher: Elsevier
Pages: 245-281
Number of pages: 37
ISBN (Electronic): 9780128152478
ISBN (Print): 9780128152485
State: Published - Jan 1 2024

Keywords

  • AUC
  • Best combination of biomarkers
  • Bivariate gamma distribution
  • Pooling
  • Random sampling
  • ROC curves analysis
