
Phrasescope: An effective and unsupervised framework for mining high quality phrases

  • University of Illinois at Urbana-Champaign

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Phrase mining is one of the fundamental NLP tasks and can significantly affect the efficacy of many downstream applications. Many supervised and unsupervised phrase mining approaches have been proposed; some rely on linguistic analyzers, while others are language agnostic. A daunting challenge in this task is to distinguish quality phrases from noise phrases, which coexist closely with quality phrases across the entire frequency spectrum. Most existing approaches, however, rely on frequency-based statistics and hence suffer from quality loss. In this paper, we propose an unsupervised phrase mining framework, “PhraseScope”, which consists of a sequence of filters, namely cohesion, domain, and graph filters, to remove noise phrases. Each filter is responsible for removing noise phrases with particular characteristics. Collectively, the proposed filters detect and remove noise phrases effectively while preserving quality phrases. Our results show significant improvements in both recall and precision over state-of-the-art frameworks when tested on datasets from three different domains.
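The abstract describes PhraseScope as a sequence of filters, each removing one kind of noise phrase. The sketch below illustrates only that chaining structure, under stated assumptions: the cohesion score (a PMI-style ratio), the domain score (smoothed frequency ratio against a background corpus), all thresholds, and every function name are hypothetical stand-ins, not the paper's definitions. The graph filter is omitted because the abstract gives no detail about it.

```python
import math
from typing import Callable

Phrase = tuple[str, ...]
# Hypothetical filter type: receives surviving candidates, returns the subset it keeps.
PhraseFilter = Callable[[set[Phrase]], set[Phrase]]


def run_filters(candidates: set[Phrase], filters: list[PhraseFilter]) -> set[Phrase]:
    """Apply filters in order; each filter only sees what earlier filters kept."""
    kept = set(candidates)
    for phrase_filter in filters:
        kept = phrase_filter(kept)
    return kept


def make_cohesion_filter(unigram_counts: dict[str, int],
                         phrase_counts: dict[Phrase, int],
                         total_tokens: int,
                         min_pmi: float) -> PhraseFilter:
    """Assumed cohesion test: keep phrases whose words co-occur more often than
    independence predicts (PMI-style score, natural log). Not the paper's formula."""
    def cohesion(phrases: set[Phrase]) -> set[Phrase]:
        kept = set()
        for phrase in phrases:
            joint = phrase_counts.get(phrase, 0) / total_tokens
            if joint == 0 or any(unigram_counts.get(w, 0) == 0 for w in phrase):
                continue  # unseen phrase or word: no evidence of cohesion
            independent = math.prod(unigram_counts[w] / total_tokens for w in phrase)
            if math.log(joint / independent) >= min_pmi:
                kept.add(phrase)
        return kept
    return cohesion


def make_domain_filter(domain_counts: dict[Phrase, int], domain_size: int,
                       background_counts: dict[Phrase, int], background_size: int,
                       min_ratio: float) -> PhraseFilter:
    """Assumed domain test: keep phrases that are relatively more frequent in the
    domain corpus than in a general background corpus."""
    def domain(phrases: set[Phrase]) -> set[Phrase]:
        kept = set()
        for phrase in phrases:
            domain_rate = domain_counts.get(phrase, 0) / domain_size
            # Add-one smoothing avoids division by zero for phrases absent from the background.
            background_rate = (background_counts.get(phrase, 0) + 1) / background_size
            if domain_rate / background_rate >= min_ratio:
                kept.add(phrase)
        return kept
    return domain


# Toy counts (invented for illustration only).
unigrams = {"machine": 10, "learning": 10, "of": 100, "the": 100, "new": 5, "york": 5}
phrases = {("machine", "learning"): 8, ("of", "the"): 10, ("new", "york"): 5}
background = {("machine", "learning"): 1, ("of", "the"): 900, ("new", "york"): 60}

pipeline = [
    make_cohesion_filter(unigrams, phrases, total_tokens=1000, min_pmi=1.0),
    make_domain_filter(phrases, 1000, background, 10000, min_ratio=2.0),
]
result = run_filters(set(phrases), pipeline)
print(result)  # {('machine', 'learning')}
```

In this toy run, "of the" fails the cohesion test (its words co-occur about as often as chance predicts), and "new york" passes cohesion but fails the domain test because it is common in the background corpus. Ordering the filters this way lets each one target a different kind of noise, which mirrors the division of labor the abstract describes.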

Original language: English
Title of host publication: SIAM International Conference on Data Mining, SDM 2021
Publisher: Siam Society
Pages: 639-647
Number of pages: 9
ISBN (Electronic): 9781611976700
State: Published - 2021
Event: 2021 SIAM International Conference on Data Mining, SDM 2021 - Virtual, Online
Duration: Apr 29 2021 to May 1 2021

Publication series

Name: SIAM International Conference on Data Mining, SDM 2021

Conference

Conference: 2021 SIAM International Conference on Data Mining, SDM 2021
City: Virtual, Online
Period: 04/29/21 to 05/01/21
