TY - GEN
T1 - Separating text and background in degraded document images - A comparison of global thresholding techniques for multi-stage thresholding
AU - Leedham, Graham
AU - Varma, Saket
AU - Patankar, Anish
AU - Govindaraju, Venu
PY - 2002
Y1 - 2002
N2 - Before any processing of the textual content of a document image can be performed, the text must be separated from the background of the image. Several thresholding algorithms have previously been proposed and are widely used in document processing. None have been shown to be effective at thresholding difficult documents where the background and foreground are non-uniform. In this paper we investigate the use of three global thresholding algorithms (Otsu's, Kapur's entropy and Solihin's quadratic integral ratio (QIR)) as the first stage in a multi-stage thresholding algorithm for use in degraded document images. It is concluded that Otsu's and Kapur's algorithms do not work well for difficult documents as they tend to over-threshold the image, thus losing much of the useful information. The QIR algorithm is more accurate in separating the foreground and background in these images, leaving a range of undecided (fuzzy) pixels for processing in a subsequent stage.
UR - https://www.scopus.com/pages/publications/84893443077
U2 - 10.1109/IWFHR.2002.1030917
DO - 10.1109/IWFHR.2002.1030917
M3 - Conference contribution
AN - SCOPUS:84893443077
SN - 0769516920
SN - 9780769516929
T3 - Proceedings - International Workshop on Frontiers in Handwriting Recognition, IWFHR
SP - 244
EP - 249
BT - Proceedings - 8th International Workshop on Frontiers in Handwriting Recognition, IWFHR 2002
T2 - 8th International Workshop on Frontiers in Handwriting Recognition, IWFHR 2002
Y2 - 6 August 2002 through 8 August 2002
ER -