TY - GEN
T1 - Distributed optimization strategies for mining on peer-to-peer networks
AU - Dutta, Haimonti
AU - Matthur, Ananda
PY - 2008
Y1 - 2008
N2 - Peer-to-peer (P2P) networks are distributed systems in which nodes of equal roles and capabilities exchange information and services directly with each other. In recent years, they have become a popular way to share large amounts of data. However, such an architecture adds a new dimension to the process of knowledge discovery and data mining - the challenge of mining distributed (and often) dynamic sources of data and computing. Furthermore, effective utilization of the distributed resources needs to be carefully analyzed. In this paper, we study the problem of optimization of resources to enable efficient and scalable mining on a peer-to-peer (P2P) network. We develop a crawler based on the Gnutella protocol and use it to simulate a P2P network on which we run a classification task. Our results from the case study indicate that not only do we have an effective utilization of resources but also the accuracy of the distributed mining algorithm is likely to be close to the hypothetical scenario where all data in the network is stored in a central location.
AB - Peer-to-peer (P2P) networks are distributed systems in which nodes of equal roles and capabilities exchange information and services directly with each other. In recent years, they have become a popular way to share large amounts of data. However, such an architecture adds a new dimension to the process of knowledge discovery and data mining - the challenge of mining distributed (and often) dynamic sources of data and computing. Furthermore, effective utilization of the distributed resources needs to be carefully analyzed. In this paper, we study the problem of optimization of resources to enable efficient and scalable mining on a peer-to-peer (P2P) network. We develop a crawler based on the Gnutella protocol and use it to simulate a P2P network on which we run a classification task. Our results from the case study indicate that not only do we have an effective utilization of resources but also the accuracy of the distributed mining algorithm is likely to be close to the hypothetical scenario where all data in the network is stored in a central location.
UR - https://www.scopus.com/pages/publications/60749094309
U2 - 10.1109/ICMLA.2008.57
DO - 10.1109/ICMLA.2008.57
M3 - Conference contribution
AN - SCOPUS:60749094309
SN - 9780769534954
T3 - Proceedings - 7th International Conference on Machine Learning and Applications, ICMLA 2008
SP - 350
EP - 355
BT - Proceedings - 7th International Conference on Machine Learning and Applications, ICMLA 2008
T2 - 7th International Conference on Machine Learning and Applications, ICMLA 2008
Y2 - 11 December 2008 through 13 December 2008
ER -