
Matching titles with cross title web-search enrichment and community detection

  • SUNY Buffalo

Research output: Contribution to journal, Conference article, peer-review

15 Scopus citations

Abstract

Title matching is, roughly, the following problem. We are given two strings of text obtained from different data sources. Each string refers to some underlying physical entity, and the task is to report whether the two strings refer to the same physical entity. The problem arises in a variety of domains, such as product or bibliography matching and location or person disambiguation. We propose a new approach to this problem consisting of two main components. The first component uses Web searches to "enrich" the given pair of titles, making titles that refer to the same physical entity more similar and those that do not much less similar. The second component then measures similarity: the tokens from the two titles are modelled as vertices of a "social" network graph, and a "strength of ties" style clustering algorithm is applied to decide whether they form one cohesive "community" (matching titles) or separately clustered communities (mismatching titles). Experimental results across several input domains confirm the effectiveness of our approach over existing title matching methods.
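The two-component pipeline in the abstract can be illustrated with a small sketch. It is not the authors' algorithm; every concrete choice here is an assumption for illustration. Search results are passed in as plain snippet strings (no search API is called), and a hypothetical `enrich` adds tokens that recur in at least `min_support` snippets. Each enriched title then becomes a clique in a token graph. Tie strength is measured as neighbourhood overlap, a standard social-network proxy swapped in here. Edges below a hypothetical `threshold` of 0.5 are dropped, and the titles are reported as matching when a single connected "community" remains.

```python
from collections import Counter
from itertools import combinations

PUNCT = ".,;:()\"'"


def tokenize(text: str) -> set[str]:
    """Lower-cased whitespace tokens with surrounding punctuation stripped."""
    return {t.strip(PUNCT).lower() for t in text.split() if t.strip(PUNCT)}


def enrich(title: str, snippets: list[str], min_support: int = 2) -> set[str]:
    """Hypothetical enrichment: add tokens found in >= min_support snippets.

    `snippets` stands in for Web-search results; they are supplied by the caller.
    """
    counts = Counter(tok for s in snippets for tok in tokenize(s))
    return tokenize(title) | {tok for tok, c in counts.items() if c >= min_support}


def build_graph(token_sets: list[set[str]]) -> dict[str, set[str]]:
    """Each enriched title becomes a clique; shared tokens bridge the cliques."""
    adj: dict[str, set[str]] = {}
    for tokens in token_sets:
        for t in tokens:
            adj.setdefault(t, set())
        for u, v in combinations(tokens, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def neighborhood_overlap(adj: dict[str, set[str]], u: str, v: str) -> float:
    """Tie strength of edge (u, v): common neighbours / all neighbours, excluding u and v."""
    nu, nv = adj[u] - {v}, adj[v] - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 1.0


def count_communities(adj: dict[str, set[str]], threshold: float) -> int:
    """Drop weak ties (overlap < threshold), then count connected components."""
    strong = {u: {v for v in nbrs if neighborhood_overlap(adj, u, v) >= threshold}
              for u, nbrs in adj.items()}
    seen: set[str] = set()
    components = 0
    for start in strong:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:  # iterative depth-first search over strong ties only
            node = stack.pop()
            for nxt in strong[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
    return components


def titles_match(title_a: str, snippets_a: list[str],
                 title_b: str, snippets_b: list[str],
                 threshold: float = 0.5) -> bool:
    """Match iff the enriched token graph forms one cohesive community."""
    graph = build_graph([enrich(title_a, snippets_a), enrich(title_b, snippets_b)])
    return count_communities(graph, threshold) == 1


# Made-up example inputs (title, stand-in search snippets).
CAMERA = ("canon eos 5d mark iii",
          ["Canon EOS 5D Mark III DSLR camera", "EOS 5D Mark III full frame DSLR camera"])
CAMERA_ALT = ("canon 5d mk3 dslr",
              ["Canon EOS 5D Mark III DSLR camera review", "Canon 5D Mark III camera body"])
PRINTER = ("canon pixma mg3620 printer",
           ["Canon PIXMA MG3620 wireless inkjet printer",
            "PIXMA MG3620 inkjet printer ink cartridges"])

if __name__ == "__main__":
    print(titles_match(*CAMERA, *CAMERA_ALT))  # True
    print(titles_match(*CAMERA, *PRINTER))     # False
```

In the camera example, enrichment pulls "mark", "iii" and "camera" into the terse second title, so only "eos" and "mk3" remain unshared and their ties stay strong. In the printer example, the lone shared token "canon" is weakly tied to the printer tokens, so the graph splits into two communities.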

Original language: English
Pages (from-to): 1167-1178
Number of pages: 12
Journal: Proceedings of the VLDB Endowment
Volume: 7
Issue number: 12
State: Published, 2014
Event: Proceedings of the 40th International Conference on Very Large Data Bases, VLDB 2014, Hangzhou, China
Duration: Sep 1, 2014 to Sep 5, 2014

