TY - GEN
T1 - Multi-source hierarchical prediction consolidation
AU - Zhang, Chenwei
AU - Xie, Sihong
AU - Li, Yaliang
AU - Gao, Jing
AU - Fan, Wei
AU - Yu, Philip S.
N1 - Publisher Copyright: © 2016 ACM.
PY - 2016/10/24
Y1 - 2016/10/24
N2 - In big data applications such as healthcare data mining, due to privacy concerns, it is necessary to collect predictions from multiple information sources for the same instance, with raw features being discarded or withheld when aggregating multiple predictions. Besides, crowd-sourced labels need to be aggregated to estimate the ground truth of the data. Due to the imperfection caused by predictive models or human crowdsourcing workers, noisy and conflicting information is ubiquitous and inevitable. Although state-of-the-art aggregation methods have been proposed to handle label spaces with flat structures, as the label space is becoming more and more complicated, aggregation under a label hierarchical structure becomes necessary but has been largely ignored. These label hierarchies can be quite informative as they are usually created by domain experts to make sense of highly complex label correlations such as protein functionality interactions or disease relationships. We propose a novel multi-source hierarchical prediction consolidation method that effectively exploits the complicated hierarchical label structures to resolve the noisy and conflicting information that inherently originates from multiple imperfect sources. We formulate the problem as an optimization problem with a closed-form solution. The consolidation result is inferred in a totally unsupervised, iterative fashion. Experimental results on both synthetic and real-world data sets show the effectiveness of the proposed method over existing alternatives.
AB - In big data applications such as healthcare data mining, due to privacy concerns, it is necessary to collect predictions from multiple information sources for the same instance, with raw features being discarded or withheld when aggregating multiple predictions. Besides, crowd-sourced labels need to be aggregated to estimate the ground truth of the data. Due to the imperfection caused by predictive models or human crowdsourcing workers, noisy and conflicting information is ubiquitous and inevitable. Although state-of-the-art aggregation methods have been proposed to handle label spaces with flat structures, as the label space is becoming more and more complicated, aggregation under a label hierarchical structure becomes necessary but has been largely ignored. These label hierarchies can be quite informative as they are usually created by domain experts to make sense of highly complex label correlations such as protein functionality interactions or disease relationships. We propose a novel multi-source hierarchical prediction consolidation method that effectively exploits the complicated hierarchical label structures to resolve the noisy and conflicting information that inherently originates from multiple imperfect sources. We formulate the problem as an optimization problem with a closed-form solution. The consolidation result is inferred in a totally unsupervised, iterative fashion. Experimental results on both synthetic and real-world data sets show the effectiveness of the proposed method over existing alternatives.
KW - Crowdsourcing
KW - Ensemble
KW - Hierarchy
KW - Unsupervised learning
UR - https://www.scopus.com/pages/publications/84996548972
U2 - 10.1145/2983323.2983676
DO - 10.1145/2983323.2983676
M3 - Conference contribution
AN - SCOPUS:84996548972
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 2251
EP - 2256
BT - CIKM 2016 - Proceedings of the 2016 ACM Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 25th ACM International Conference on Information and Knowledge Management, CIKM 2016
Y2 - 24 October 2016 through 28 October 2016
ER -