TY - GEN
T1 - TagLearner
T2 - 2009 IEEE International Conference on Data Mining Workshops, ICDMW 2009
AU - Dutta, Haimonti
AU - Zhu, Xianshu
AU - Mahule, Tushar
AU - Kargupta, Hillol
AU - Borne, Kirk
AU - Lauth, Codrina
AU - Holz, Florian
AU - Heyer, Gerhard
PY - 2009
Y1 - 2009
N2 - The amount of text data on the Internet is growing at a very fast rate. Online text repositories for news agencies, digital libraries and other organizations currently store gigabytes and terabytes of data. Large amounts of unstructured text pose a serious challenge for data mining and knowledge extraction. End user participation coupled with distributed computation can play a crucial role in meeting these challenges. In many applications involving classification of text documents, web users often participate in the tagging process. This collaborative tagging results in the formation of large-scale Peer-to-Peer (P2P) systems which can function, scale and self-organize in the presence of a highly transient population of nodes and do not need a central server for coordination. In this paper, we describe TagLearner, a P2P classifier learning system for extracting patterns from text data where the end users can participate both in the task of labeling the data and in building a distributed classifier on it. We present a novel distributed linear-programming-based classification algorithm which is asynchronous in nature. The paper also provides extensive empirical results on text data obtained from an online repository - the NSF Abstracts Data.
AB - The amount of text data on the Internet is growing at a very fast rate. Online text repositories for news agencies, digital libraries and other organizations currently store gigabytes and terabytes of data. Large amounts of unstructured text pose a serious challenge for data mining and knowledge extraction. End user participation coupled with distributed computation can play a crucial role in meeting these challenges. In many applications involving classification of text documents, web users often participate in the tagging process. This collaborative tagging results in the formation of large-scale Peer-to-Peer (P2P) systems which can function, scale and self-organize in the presence of a highly transient population of nodes and do not need a central server for coordination. In this paper, we describe TagLearner, a P2P classifier learning system for extracting patterns from text data where the end users can participate both in the task of labeling the data and in building a distributed classifier on it. We present a novel distributed linear-programming-based classification algorithm which is asynchronous in nature. The paper also provides extensive empirical results on text data obtained from an online repository - the NSF Abstracts Data.
UR - https://www.scopus.com/pages/publications/77951149175
U2 - 10.1109/ICDMW.2009.90
DO - 10.1109/ICDMW.2009.90
M3 - Conference contribution
AN - SCOPUS:77951149175
SN - 9780769539027
T3 - ICDM Workshops 2009 - IEEE International Conference on Data Mining
SP - 495
EP - 500
BT - ICDM Workshops 2009 - IEEE International Conference on Data Mining
Y2 - 6 December 2009 through 6 December 2009
ER -