CLEF-2005 CL-SR at Maryland: Document and query expansion using side collections and thesauri

  • University of Maryland, College Park

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

4 Scopus citations

Abstract

This paper reports results for the University of Maryland's participation in the CLEF-2005 Cross-Language Speech Retrieval track. Techniques that were tried include: (1) document expansion with manually created metadata (thesaurus keywords and segment summaries) from a large side collection, (2) query refinement with pseudo-relevance feedback, (3) keyword expansion with thesaurus synonyms, and (4) cross-language speech retrieval using translation knowledge obtained from the statistics of a large parallel corpus. The results show that document expansion and query expansion using blind relevance feedback were effective, although optimal parameter choices differed somewhat between the training and evaluation sets. Document expansion in which manually assigned keywords were augmented with thesaurus synonyms yielded marginal gains on the training set, but no improvement on the evaluation set. Cross-language retrieval with French queries yielded 79% of monolingual mean average precision when searching manually assigned metadata despite a substantial domain mismatch between the parallel corpus and the retrieval task. Detailed failure analysis indicates that speech recognition errors for named entities were an important factor that substantially degraded retrieval effectiveness.
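The abstract compares cross-language and monolingual runs by mean average precision (MAP). As a reading aid, the following is a minimal sketch of how such a comparison is commonly computed. It is not the authors' evaluation code; the function names, topic identifiers, and document identifiers are hypothetical, and the toy rankings are invented for illustration and are not results from the paper.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision for one topic (uninterpolated).

    Sums precision@k at every rank k where a relevant item appears,
    then divides by the total number of relevant items for the topic.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of per-topic average precision over topics with relevant items."""
    topics = [t for t in qrels if qrels[t]]
    if not topics:
        return 0.0
    return sum(average_precision(runs.get(t, []), qrels[t]) for t in topics) / len(topics)


# Hypothetical toy data: two topics, invented segment identifiers.
qrels = {"t1": {"a", "b"}, "t2": {"c"}}
monolingual = {"t1": ["a", "b", "x"], "t2": ["c", "y"]}
cross_language = {"t1": ["a", "x", "b"], "t2": ["y", "c"]}

map_mono = mean_average_precision(monolingual, qrels)
map_cross = mean_average_precision(cross_language, qrels)

# Relative effectiveness: cross-language MAP as a fraction of monolingual MAP.
relative = map_cross / map_mono
print(f"monolingual MAP={map_mono:.3f}, cross-language MAP={map_cross:.3f}, "
      f"relative={relative:.0%}")
```

A figure such as "79% of monolingual mean average precision" corresponds to the `relative` ratio here, computed over the track's actual topics and relevance judgments rather than toy data.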

Original language: English
Title of host publication: Accessing Multilingual Information Repositories - 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005
Publisher: Springer Verlag
Pages: 800-809
Number of pages: 10
ISBN (Print): 354045697X, 9783540456971
State: Published - 2006
Event: Accessing Multilingual Information Repositories - 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005 - Vienna, Austria
Duration: Sep 21 2005 to Sep 23 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4022 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Accessing Multilingual Information Repositories - 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005
Country/Territory: Austria
City: Vienna
Period: 09/21/05 to 09/23/05
