
Privacy-Preserving Data Classification and Similarity Evaluation for Distributed Systems

  • State University of New York Binghamton University
  • University of Florida

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

15 Scopus citations

Abstract

Data classification is a widely used data mining technique for big data analysis. By training on massive amounts of data collected from the real world, data classification helps learners discover hidden data patterns. Beyond training itself, given a model trained on collected data, a user can determine whether a newly arriving data item belongs to an existing class, or multiple distributed entities may collaborate to test the similarity of their trained results. However, because of data locality and privacy concerns, it is infeasible for the entities in a large-scale distributed system to share their individual datasets with one another for a data similarity check. On the one hand, a trained model is an entity's private asset and may leak private information, so it should be well protected from all non-collaborating entities. On the other hand, newly arriving data may contain sensitive information that cannot be disclosed directly for classification. To address these privacy issues, we propose a privacy-preserving data classification and similarity evaluation scheme for distributed systems. With our scheme, neither newly arriving data nor trained models are directly revealed during the classification and similarity evaluation procedures. The proposed scheme can be applied to many fields that use data classification and evaluation. Based on extensive real-world experiments, we have also evaluated the privacy preservation, feasibility, and efficiency of the proposed scheme.
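The abstract does not spell out the protocol itself. As a rough illustration of the classification goal only (the new sample stays hidden from the model owner, and the model stays hidden from the data owner), the sketch below swaps in a standard technique that is not necessarily the authors' scheme: Paillier additively homomorphic encryption applied to a linear classifier, where the model owner multiplies the encrypted score by a random positive factor so the data owner learns only its sign. The toy key size, the fixed-point `SCALE`, and every function name (`keygen`, `encrypt`, `decrypt`, `score_encrypted`, `private_linear_classify`) are hypothetical choices for this example, and the parameters are not secure.

```python
import math
import secrets

# Toy Paillier primes (two Mersenne primes). Far too small and structured
# for real use; chosen only so the example runs instantly.
P = 2**31 - 1
Q = 2**61 - 1
SCALE = 1000  # hypothetical fixed-point scale for real-valued inputs


def to_fixed(v, power=1):
    """Encode a real number as an integer scaled by SCALE**power."""
    return round(v * SCALE**power)


def keygen(p=P, q=Q):
    """Return public key n and secret key (lambda, mu), using g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+)
    return n, (lam, mu)


def encrypt(n, m):
    """Paillier encryption of integer m (negative values wrap mod n)."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # With g = n + 1, g^m mod n^2 equals 1 + m*n mod n^2.
    return (1 + (m % n) * n) * pow(r, n, n2) % n2


def decrypt(n, sk, c):
    """Paillier decryption, mapped back to a signed integer."""
    lam, mu = sk
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m


def score_encrypted(n, enc_x, w_int, b_int):
    """Model owner: compute Enc(r * (w.x + b)) without seeing x."""
    n2 = n * n
    acc = encrypt(n, b_int)  # Enc(b)
    for c, w in zip(enc_x, w_int):
        acc = acc * pow(c, w % n, n2) % n2  # adds w_i * x_i under encryption
    r = secrets.randbelow(2**16) + 1  # random positive mask preserves the sign
    return pow(acc, r, n2)


def private_linear_classify(x, w, b):
    """Simulate both parties in one process; returns +1 or -1."""
    # Data owner: generate keys and encrypt the new sample.
    n, sk = keygen()
    enc_x = [encrypt(n, to_fixed(v)) for v in x]
    # Model owner: w and b are only ever used on this side.
    # The bias is scaled by SCALE**2 to match the products w_i * x_i.
    enc_s = score_encrypted(n, enc_x, [to_fixed(v) for v in w], to_fixed(b, 2))
    # Data owner: learns only the sign of a randomly scaled score.
    return 1 if decrypt(n, sk, enc_s) >= 0 else -1


if __name__ == "__main__":
    print(private_linear_classify([0.5, -1.2, 3.0], [0.8, 0.1, -0.4], 0.3))
```

Even the masked score r·(w·x + b) can leak information about the model over many queries, so a real deployment would need a more careful design. The point here is only to show how homomorphic operations let one party compute on another party's encrypted data.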

Original language: English
Title of host publication: Proceedings - 2016 IEEE 36th International Conference on Distributed Computing Systems, ICDCS 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 690-699
Number of pages: 10
ISBN (Electronic): 9781509014828
DOIs
State: Published - Aug 8 2016
Event: 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016 - Nara, Japan
Duration: Jun 27 2016 to Jun 30 2016

Publication series

Name: Proceedings - International Conference on Distributed Computing Systems
Volume: 2016-August

Conference

Conference: 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016
Country/Territory: Japan
City: Nara
Period: 06/27/16 to 06/30/16

Keywords

  • Data Classification
  • Machine Learning
  • Privacy Preservation
  • Similarity Evaluation

