Abstract
Title matching refers, roughly, to the following problem. We are given two strings of text obtained from different data sources. Each string refers to some underlying physical entity, and the problem is to report whether the two strings refer to the same entity. This problem arises in a variety of domains, such as product or bibliography matching and location or person disambiguation. We propose a new approach to solving this problem, consisting of two main components. The first component uses Web searches to "enrich" the given pair of titles, making titles that refer to the same physical entity more similar and those that do not much less similar. The second component then measures similarity by modelling the tokens from the two titles as vertices of a "social" network graph. A "strength of ties" style of clustering algorithm is applied to this graph to determine whether the tokens form one cohesive "community" (matching titles) or separately clustered communities (mismatching titles). Experimental results confirm the effectiveness of our approach over existing title matching methods across several input domains.
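
The two-component idea can be illustrated with a minimal sketch. It is not the paper's algorithm: the enrichment step is replaced by a caller-supplied list of text snippets (a stand-in for Web search results), and the "strength of ties" clustering is replaced by a simple stand-in that drops weak co-occurrence edges and counts connected components. The names `tokenize`, `build_token_graph`, `communities`, `titles_match`, and the `min_weight` threshold are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_token_graph(title_a, title_b, snippets):
    """Build a weighted graph over the tokens of both titles.

    An edge's weight is the number of snippets containing both tokens.
    `snippets` is an assumed stand-in for Web search results.
    """
    vocab = set(tokenize(title_a)) | set(tokenize(title_b))
    weights = defaultdict(int)
    for snippet in snippets:
        present = sorted(vocab & set(tokenize(snippet)))
        for u, v in combinations(present, 2):
            weights[(u, v)] += 1
    return vocab, weights


def communities(vocab, weights, min_weight):
    """Return connected components after dropping edges below min_weight.

    This is a simplified substitute for a tie-strength clustering step.
    """
    adj = {token: set() for token in vocab}
    for (u, v), w in weights.items():
        if w >= min_weight:
            adj[u].add(v)
            adj[v].add(u)
    seen, comps = set(), []
    for start in sorted(vocab):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def titles_match(title_a, title_b, snippets, min_weight=2):
    """Report a match when all title tokens fall into one community."""
    vocab, weights = build_token_graph(title_a, title_b, snippets)
    return len(communities(vocab, weights, min_weight)) == 1
```

In this sketch, two titles are reported as matching only when every token from both titles ends up in a single component. A real system would need a more careful tokenizer, a genuine enrichment source, and a proper community detection method.
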
| Original language | English |
|---|---|
| Pages (from-to) | 1167-1178 |
| Number of pages | 12 |
| Journal | Proceedings of the VLDB Endowment |
| Volume | 7 |
| Issue number | 12 |
| State | Published - 2014 |
| Event | Proceedings of the 40th International Conference on Very Large Data Bases, VLDB 2014 - Hangzhou, China. Duration: Sep 1 2014 → Sep 5 2014 |

Title: Matching titles with cross title web-search enrichment and community detection