A case restoration approach to named entity tagging in degraded documents

  • Cymfony Inc

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

This paper describes a novel approach to named entity (NE) tagging on degraded documents. NE tagging is the process of identifying salient text strings in unstructured text, corresponding to names of people, places, organizations, times/dates, etc. Although NE tagging is typically part of a larger information extraction process, it has other applications, such as improving search in an information retrieval system and post-processing the results of an OCR system. We focus on degraded documents, i.e., case-insensitive documents that lack orthographic information. Examples include the output of speech recognition systems, as well as e-mail. The traditional approach involves retraining an NE tagger on degraded text, a cumbersome operation. This paper describes an approach whereby text is first "restored" to its implicit case-sensitive form, and subsequently processed by the original NE tagger. Results show that this new approach leads to far less precision loss in NE tagging of degraded documents.
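
The two-stage idea in the abstract (restore case first, then run an unmodified case-sensitive tagger) can be illustrated with a toy sketch. This is not the authors' system: the unigram frequency model, the function names `train_truecaser`, `restore_case`, and `tag_capitalized_spans`, and the capitalization-based tagger are all assumptions made up for illustration, and the sample sentences are invented.

```python
from collections import Counter, defaultdict


def train_truecaser(cased_sentences):
    """Learn the most frequent surface form of each word from cased text.

    Assumption: a simple unigram model. Sentence-initial tokens are skipped
    because their capitalization says nothing about the word itself.
    """
    forms = defaultdict(Counter)
    for sentence in cased_sentences:
        for token in sentence.split()[1:]:
            forms[token.lower()][token] += 1
    return {low: counts.most_common(1)[0][0] for low, counts in forms.items()}


def restore_case(text, model):
    """Map each lowercased token to its learned form; unknown tokens stay lowercase."""
    tokens = [model.get(tok, tok) for tok in text.lower().split()]
    if tokens:
        # Capitalize the first token of the sentence.
        tokens[0] = tokens[0][:1].upper() + tokens[0][1:]
    return " ".join(tokens)


def tag_capitalized_spans(text):
    """Toy stand-in for a case-sensitive NE tagger.

    Returns runs of capitalized tokens, ignoring the sentence-initial token.
    """
    spans, current = [], []
    for i, tok in enumerate(text.split()):
        if i > 0 and tok[:1].isupper():
            current.append(tok)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


# Invented training data and degraded input, for illustration only.
model = train_truecaser([
    "Yesterday John Smith flew to Edinburgh",
    "The conference in Edinburgh was organized by John Smith",
])
degraded = "we met john smith in edinburgh"
restored = restore_case(degraded, model)

print(tag_capitalized_spans(degraded))  # tagger applied directly to degraded text
print(restored)
print(tag_capitalized_spans(restored))  # same tagger applied after case restoration
```

The point of the sketch is the design choice the abstract describes: the tagger itself is left untouched, and only a separate restoration step is trained, so the case-sensitive cues the tagger relies on are available again on degraded input.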

Original language: English
Title of host publication: Proceedings - 7th International Conference on Document Analysis and Recognition, ICDAR 2003
Publisher: IEEE Computer Society
Pages: 720-724
Number of pages: 5
ISBN (Electronic): 0769519601
DOIs
State: Published - 2003
Event: 7th International Conference on Document Analysis and Recognition, ICDAR 2003 - Edinburgh, United Kingdom
Duration: Aug 3 2003 to Aug 6 2003

Publication series

Name: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Volume: 2003-January
ISSN (Print): 1520-5363

Conference

Conference: 7th International Conference on Document Analysis and Recognition, ICDAR 2003
Country/Territory: United Kingdom
City: Edinburgh
Period: 08/3/03 to 08/6/03
