OnlineCM: Real-time consensus classification with missing values

  • Bowen Dong
  • Sihong Xie
  • Jing Gao
  • Wei Fan
  • Philip S. Yu
  • University of Illinois at Chicago
  • Baidu Inc
  • Tsinghua University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Scopus citations

Abstract

Combining predictions from multiple sources or models has been shown to be a useful technique in data mining. For example, in network anomaly detection, the outputs of multiple detectors have to be combined to obtain the diagnostic decisions. Unfortunately, as data are generated at an increasingly high speed, existing prediction aggregation methods are facing new challenges. First, the high velocity and huge volume of the data render existing batch-mode prediction aggregation algorithms infeasible. Second, due to heterogeneity, predictions from multiple models or data sources might not be perfectly synchronized, leading to abundant missing values in the prediction stream. We propose OnlineCM, short for Online Consensus Maximization, to address the above challenges. OnlineCM keeps only a minimal yet sufficient footprint for both consensus prediction and missing value imputation over the prediction stream. In particular, we show that the correlations among base models or data sources are sufficient for effective consensus prediction, require little storage, and can be updated in an online fashion. Further, we identify a reinforcing relationship between missing value imputation and the consensus predictions, leading to a novel consensus-based missing value imputation method, which in turn makes model correlation estimation more accurate. Experiments demonstrate that OnlineCM achieves aggregated predictions whose performance is close to that of the batch-mode consensus maximization algorithm, and that it significantly outperforms baseline methods on four large real-world datasets.
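The abstract's core idea (a small, online-updatable summary of how base models relate to one another, with the consensus used to fill in missing predictions) can be illustrated with a toy sketch. This is not the OnlineCM algorithm from the paper, whose details are not given in this record. It is a simplified stand-in that assumes hard class labels, uses pairwise agreement rates as a proxy for model correlation, and aggregates by weighted voting. The class name `OnlineAgreementVoter` and all of its methods are hypothetical.

```python
import numpy as np


class OnlineAgreementVoter:
    """Toy illustration only; NOT the OnlineCM algorithm from the paper.

    Assumption: model "correlation" is approximated by pairwise agreement rates,
    which need only O(n_models^2) storage and can be updated one sample at a time.
    """

    def __init__(self, n_models, n_classes):
        self.n_models = n_models
        self.n_classes = n_classes
        # Running counts: agree[i, j] = times models i and j gave the same label,
        # seen[i, j] = times the pair was compared.
        self.agree = np.zeros((n_models, n_models))
        self.seen = np.zeros((n_models, n_models))

    def weights(self):
        # Weight of a model = its mean agreement rate with the other models.
        # Pairs that were never compared get a neutral rate of 0.5.
        rate = np.divide(
            self.agree,
            self.seen,
            out=np.full_like(self.agree, 0.5),
            where=self.seen > 0,
        )
        np.fill_diagonal(rate, 0.0)
        return rate.sum(axis=1) / max(self.n_models - 1, 1)

    def update(self, preds):
        """preds: one integer label per model, or None where the prediction is missing.

        Returns (consensus_label, predictions_with_missing_values_imputed).
        """
        w = self.weights()
        votes = np.zeros(self.n_classes)
        for i, p in enumerate(preds):
            if p is not None:
                votes[p] += w[i]
        consensus = int(np.argmax(votes))

        # Consensus-based imputation: a missing prediction is replaced by the consensus.
        imputed = [consensus if p is None else p for p in preds]

        # Update the pairwise agreement statistics using the imputed predictions.
        labels = np.array(imputed)
        self.agree += labels[:, None] == labels[None, :]
        self.seen += 1
        return consensus, imputed


voter = OnlineAgreementVoter(n_models=3, n_classes=2)
print(voter.update([1, 1, None]))  # third model's prediction is missing
print(voter.update([0, 0, 1]))
```

Only the two count matrices are kept between samples, so memory does not grow with the length of the stream. Feeding the imputed labels back into the agreement statistics loosely mirrors the reinforcing relationship between imputation and consensus that the abstract describes.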

Original language: English
Title of host publication: SIAM International Conference on Data Mining 2015, SDM 2015
Editors: Suresh Venkatasubramanian, Jieping Ye
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 685-693
Number of pages: 9
ISBN (Electronic): 9781510811522
State: Published - 2015
Event: SIAM International Conference on Data Mining 2015, SDM 2015 - Vancouver, Canada
Duration: Apr 30 2015 to May 2 2015

Publication series

Name: SIAM International Conference on Data Mining 2015, SDM 2015

Conference

Conference: SIAM International Conference on Data Mining 2015, SDM 2015
Country/Territory: Canada
City: Vancouver
Period: 04/30/15 to 05/2/15
