Separating text and background in degraded document images - A comparison of global thresholding techniques for multi-stage thresholding

  • Nanyang Technological University
  • SUNY Buffalo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

85 Scopus citations

Abstract

Before any processing of the textual content of a document image can be performed, the text must be separated from the background of the image. Several thresholding algorithms have previously been proposed and are widely used in document processing. None has been shown to be effective on difficult documents in which the background and foreground are non-uniform. In this paper we investigate the use of three global thresholding algorithms (Otsu's, Kapur's entropy and Solihin's quadratic integral ratio (QIR)) as the first stage in a multi-stage thresholding algorithm for use on degraded document images. It is concluded that Otsu's and Kapur's algorithms do not work well for difficult documents, as they tend to over-threshold the image and thus lose much of the useful information. The QIR algorithm is more accurate in separating the foreground and background in these images, leaving a range of undecided (fuzzy) pixels for processing in a subsequent stage.
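For readers unfamiliar with the two baseline methods named in the abstract, the following is a minimal sketch of Otsu's and Kapur's entropy thresholding on a grey-level histogram. It is an illustration only, not the authors' implementation: the function names `histogram`, `otsu_threshold` and `kapur_threshold` are hypothetical, the image is assumed to be a flat sequence of 8-bit grey values, and QIR is omitted because the abstract does not give its definition.

```python
import math


def histogram(pixels, levels=256):
    """Count how many pixels fall on each grey level 0..levels-1."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    return hist


def otsu_threshold(hist):
    """Return the grey level t that maximises the between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the lower class
    sum0 = 0  # sum of grey values in the lower class
    for t, h in enumerate(hist):
        w0 += h
        sum0 += t * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the split is not meaningful
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance (constant factor dropped).
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def kapur_threshold(hist):
    """Return the grey level t that maximises the sum of the two class entropies."""
    total = sum(hist)
    p = [h / total for h in hist]
    best_t, best_h = 0, -math.inf
    for t in range(len(hist) - 1):
        p0 = sum(p[: t + 1])
        p1 = sum(p[t + 1:])
        if p0 <= 0 or p1 <= 0:
            continue  # one class is empty
        h0 = -sum((q / p0) * math.log(q / p0) for q in p[: t + 1] if q > 0)
        h1 = -sum((q / p1) * math.log(q / p1) for q in p[t + 1:] if q > 0)
        if h0 + h1 > best_h:
            best_h, best_t = h0 + h1, t
    return best_t
```

Both functions return a single global cut-off. In a multi-stage scheme of the kind the abstract describes, such a value would only be the first-stage estimate, with ambiguous pixels left for a later stage.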

Original language: English
Title of host publication: Proceedings - 8th International Workshop on Frontiers in Handwriting Recognition, IWFHR 2002
Pages: 244-249
Number of pages: 6
State: Published - 2002
Event: 8th International Workshop on Frontiers in Handwriting Recognition, IWFHR 2002 - Ontario, ON, Canada
Duration: Aug 6 2002 to Aug 8 2002

Publication series

Name: Proceedings - International Workshop on Frontiers in Handwriting Recognition, IWFHR
ISSN (Print): 1550-5235

Conference

Conference: 8th International Workshop on Frontiers in Handwriting Recognition, IWFHR 2002
Country/Territory: Canada
City: Ontario, ON
Period: 08/6/02 to 08/8/02
