TY - GEN
T1 - Active learning based news veracity detection with feature weighting and deep-shallow fusion
AU - Das Bhattacharjee, Sreyasee
AU - Talukder, Ashit
AU - Balantrapu, Bala Venkatram
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/7/1
Y1 - 2017/7/1
N2 - The objective of a news veracity detection system is to identify various types of potentially misleading or false information, typically in a digital platform. A critical challenge in this scenario is that there are large volumes of data available online. However, obtaining samples with annotations (i.e. ground-truth labels) is difficult and a known limiting factor for many data analytic tasks including the current problem of news veracity detection. In this paper, we propose a human-machine collaborative learning system to evaluate the veracity of a news content, with a limited amount of annotated data samples. In a semi-supervised scenario, an initial classifier is learnt on a small, limited amount of the annotated data followed by an interactive approach to gradually update the model by shortlisting only relevant samples from the large pool of unlabeled data that are most likely to improve the classifier performance. Our prioritized active learning solution achieves faster convergence in terms of the classification performance, while requiring about 1-2 orders of magnitude fewer annotated samples compared to fully supervised solutions to attain a reasonably acceptable accuracy of nearly 80%. Unlike traditional deep learning architecture, the proposed active learning based deep model designed with a smaller number of more localized filters per layer can efficiently learn from small relevant sample batches that can effectively improve performance in the weakly-supervised learning environment and thus is more suitable for several practical applications. An effective dynamic domain adaptive feature weighting scheme can adjust the relative importance of feature dimensions iteratively. Insightful initial feedback gathered from two independent learning modules (a NLP shallow feature based classifier and a deep classifier), modeled to capture complementary information about data characteristics are finally fused together to achieve an impressive 25% average gain in the detection performance.
AB - The objective of a news veracity detection system is to identify various types of potentially misleading or false information, typically in a digital platform. A critical challenge in this scenario is that there are large volumes of data available online. However, obtaining samples with annotations (i.e. ground-truth labels) is difficult and a known limiting factor for many data analytic tasks including the current problem of news veracity detection. In this paper, we propose a human-machine collaborative learning system to evaluate the veracity of a news content, with a limited amount of annotated data samples. In a semi-supervised scenario, an initial classifier is learnt on a small, limited amount of the annotated data followed by an interactive approach to gradually update the model by shortlisting only relevant samples from the large pool of unlabeled data that are most likely to improve the classifier performance. Our prioritized active learning solution achieves faster convergence in terms of the classification performance, while requiring about 1-2 orders of magnitude fewer annotated samples compared to fully supervised solutions to attain a reasonably acceptable accuracy of nearly 80%. Unlike traditional deep learning architecture, the proposed active learning based deep model designed with a smaller number of more localized filters per layer can efficiently learn from small relevant sample batches that can effectively improve performance in the weakly-supervised learning environment and thus is more suitable for several practical applications. An effective dynamic domain adaptive feature weighting scheme can adjust the relative importance of feature dimensions iteratively. Insightful initial feedback gathered from two independent learning modules (a NLP shallow feature based classifier and a deep classifier), modeled to capture complementary information about data characteristics are finally fused together to achieve an impressive 25% average gain in the detection performance.
KW - Active Learning
KW - Decision Fusion
KW - Deep Classification
KW - Fake News Detection
KW - Rumor
UR - https://www.scopus.com/pages/publications/85047809284
U2 - 10.1109/BigData.2017.8257971
DO - 10.1109/BigData.2017.8257971
M3 - Conference contribution
AN - SCOPUS:85047809284
T3 - Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
SP - 556
EP - 565
BT - Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
A2 - Nie, Jian-Yun
A2 - Obradovic, Zoran
A2 - Suzumura, Toyotaro
A2 - Ghosh, Rumi
A2 - Nambiar, Raghunath
A2 - Wang, Chonggang
A2 - Zang, Hui
A2 - Baeza-Yates, Ricardo
A2 - Baeza-Yates, Ricardo
A2 - Hu, Xiaohua
A2 - Kepner, Jeremy
A2 - Cuzzocrea, Alfredo
A2 - Tang, Jian
A2 - Toyoda, Masashi
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th IEEE International Conference on Big Data, Big Data 2017
Y2 - 11 December 2017 through 14 December 2017
ER -