
Classifying imbalanced data streams via dynamic feature group weighting with importance sampling

  • Ke Wu
  • Andrea Edwards
  • Wei Fan
  • Jing Gao
  • Kun Zhang
  • Xavier University of Louisiana
  • Huawei Noah's Ark Lab

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

30 Scopus citations

Abstract

Data stream classification and imbalanced data learning are two important areas of data mining research. Each has been well studied, and many interesting algorithms have been developed. However, only a few approaches reported in the literature address the intersection of these two fields, owing to their complex interplay. In this work, we propose an importance-sampling-driven, dynamic feature group weighting framework (DFGW-IS) for classifying data streams with imbalanced class distributions. Two components are tightly integrated into the proposed approach to address the intrinsic characteristics of concept-drifting, imbalanced streaming data. Specifically, the ever-evolving concepts are handled by a weighted ensemble trained on a set of feature groups, with each sub-classifier (i.e., a single classifier or an ensemble) weighted by its discriminative power and stability. The uneven class distribution, on the other hand, is handled by the sub-classifier built on a specific feature group, with the underlying distribution rebalanced by importance sampling. We derive a theoretical upper bound on the generalization error of the proposed algorithm. We also study the empirical performance of our method on a set of benchmark synthetic and real-world data sets, where it achieves significant improvement over competing algorithms in terms of standard evaluation metrics and parallel running time. Algorithm implementations and datasets are available upon request.
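
The two components described in the abstract can be illustrated with a minimal sketch for a single data chunk. This is not the authors' DFGW-IS implementation: the nearest-centroid sub-classifier, the uniform target class distribution, the use of balanced accuracy as a stand-in for "discriminative power", and all function and class names are assumptions made for illustration. The stability term and the concept-drift handling across chunks are omitted.

```python
import random
from collections import Counter

def importance_weights(labels):
    """Weight each example by target(y) / observed(y).
    Assumption: the target class distribution is uniform."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [(1.0 / k) / (counts[y] / n) for y in labels]

def rebalance(rows, labels, rng):
    """Resample the chunk with probability proportional to the importance weights."""
    w = importance_weights(labels)
    idx = rng.choices(range(len(rows)), weights=w, k=len(rows))
    return [rows[i] for i in idx], [labels[i] for i in idx]

class CentroidClassifier:
    """Hypothetical sub-classifier: nearest class centroid on one feature group."""
    def __init__(self, group):
        self.group = group          # indices of the features in this group
        self.centroids = {}

    def fit(self, rows, labels):
        counts, sums = Counter(labels), {}
        for r, y in zip(rows, labels):
            acc = sums.setdefault(y, [0.0] * len(self.group))
            for j, f in enumerate(self.group):
                acc[j] += r[f]
        self.centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
        return self

    def predict(self, row):
        def dist(c):
            return sum((row[f] - c[j]) ** 2 for j, f in enumerate(self.group))
        return min(self.centroids, key=lambda y: dist(self.centroids[y]))

def balanced_accuracy(clf, rows, labels):
    """Mean per-class recall; used here as a proxy for discriminative power."""
    hits, totals = Counter(), Counter(labels)
    for r, y in zip(rows, labels):
        if clf.predict(r) == y:
            hits[y] += 1
    return sum(hits[y] / totals[y] for y in totals) / len(totals)

def train_chunk(rows, labels, groups, rng):
    """One sub-classifier per feature group, each trained on a rebalanced
    resample and weighted by its balanced accuracy on the original chunk."""
    ensemble = []
    for g in groups:
        rb_rows, rb_labels = rebalance(rows, labels, rng)
        clf = CentroidClassifier(g).fit(rb_rows, rb_labels)
        ensemble.append((balanced_accuracy(clf, rows, labels), clf))
    return ensemble

def predict(ensemble, row):
    """Weighted vote over the feature-group sub-classifiers."""
    votes = Counter()
    for weight, clf in ensemble:
        votes[clf.predict(row)] += weight
    return votes.most_common(1)[0][0]
```

In a streaming setting one would presumably call `train_chunk` on each arriving chunk and re-estimate the weights as the concept drifts; how the paper combines weights across chunks is not stated in the abstract, so that step is left out here.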

Original language: English
Title of host publication: SIAM International Conference on Data Mining 2014, SDM 2014
Editors: Mohammed Zaki, Zoran Obradovic, Pang-Ning Tan, Arindam Banerjee, Chandrika Kamath, Srinivasan Parthasarathy
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 722-730
Number of pages: 9
ISBN (Electronic): 9781510811515
State: Published - 2014
Event: 14th SIAM International Conference on Data Mining, SDM 2014 - Philadelphia, United States
Duration: Apr 24 2014 → Apr 26 2014

Publication series

Name: SIAM International Conference on Data Mining 2014, SDM 2014
Volume: 2

Conference

Conference: 14th SIAM International Conference on Data Mining, SDM 2014
Country/Territory: United States
City: Philadelphia
Period: 04/24/14 → 04/26/14

Keywords

  • Class imbalance
  • Data stream classification
  • Ensemble weighting
  • Feature group ensemble
  • Importance sampling
