
Robust crowd bias correction via dual knowledge transfer from multiple overlapping sources

  • Sihong Xie
  • Qingbo Hu
  • Jingyuan Zhang
  • Jing Gao
  • Wei Fan
  • Philip S. Yu
  • University of Illinois at Chicago
  • Baidu Inc

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Scopus citations

Abstract

One of the largest constituents of big data is crowdsourced or user-generated data, which contains a wide range of valuable information. However, such data is inherently biased and possibly spammed, making trustworthy information extraction an imperative task. As a special case, we study reviewer-posted ratings for products. Biased ratings can lead to disappointed customers due to overrated products, and to reduced revenues for business owners caused by undeserved negative ratings. To distill objective product quality measurements, most existing methods try to infer unbiased ratings from the raw ratings alone, and may not overcome the inherent bias to recover the underlying true ratings. Though improved bias correction has been achieved with help from domain experts, the overhead of expert effort can be rather expensive in practice. We exploit the variety of big data and adopt a multiple-source mining approach, which finds trustworthy measurements without domain experts, but with knowledge crowdsourced and transferred from external domains. We address the challenges that the multiple data sources are 1) inherently heterogeneous, 2) at most only partially overlapping, and 3) themselves biased. We explore and analyze the strengths and weaknesses of various knowledge transfer strategies. We then propose Consensus Ranking Dual Transfer (CRDT) to handle the above challenges by identifying "anchor reviewers" as a bridge for robust "dual transfer", and removing bias in individual sources via consensus ranking aggregation. Experiments on real-world rating datasets demonstrate that the proposed approach can deliver more robust bias-correcting effects than the baselines and can identify abnormal reviewers.

Original language: English
Title of host publication: Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015
Editors: Feng Luo, Kemafor Ogan, Mohammed J. Zaki, Laura Haas, Beng Chin Ooi, Vipin Kumar, Sudarsan Rachuri, Saumyadipta Pyne, Howard Ho, Xiaohua Hu, Shipeng Yu, Morris Hui-I Hsiao, Jian Li
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 815-820
Number of pages: 6
ISBN (Electronic): 9781479999255
State: Published - Dec 22 2015
Event: 3rd IEEE International Conference on Big Data, IEEE Big Data 2015 - Santa Clara, United States
Duration: Oct 29 2015 → Nov 1 2015

Publication series

Name: Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015

Conference

Conference: 3rd IEEE International Conference on Big Data, IEEE Big Data 2015
Country/Territory: United States
City: Santa Clara
Period: 10/29/15 → 11/1/15
