TY - GEN
T1 - Dynamic local connectivity and its application to page segmentation
AU - Shi, Zhixin
AU - Govindaraju, Venu
PY - 2004
Y1 - 2004
N2 - Page segmentation is one of the important stages in most document processing systems. Algorithms in the published literature often rely on predetermined parameters such as general font sizes, distances between text lines, and document scan resolutions. Variations of these parameters in real document images greatly affect the performance of the algorithms. In this paper we present a novel approach to document page segmentation using a dynamic local connectivity transform. An efficient implementation of a local connectivity algorithm transforms a document image into a parameter domain in which the parameter value at a pixel location represents a connectivity property of its neighboring foreground pixels in the original document image. Then a top-down approach with a linear search reveals the document regions at each resolution level as text blocks, text lines, and graphics. We consider our algorithm a transform-based multi-resolution method. Our ongoing research shows that the algorithm is robust to variations in document parameters.
AB - Page segmentation is one of the important stages in most document processing systems. Algorithms in the published literature often rely on predetermined parameters such as general font sizes, distances between text lines, and document scan resolutions. Variations of these parameters in real document images greatly affect the performance of the algorithms. In this paper we present a novel approach to document page segmentation using a dynamic local connectivity transform. An efficient implementation of a local connectivity algorithm transforms a document image into a parameter domain in which the parameter value at a pixel location represents a connectivity property of its neighboring foreground pixels in the original document image. Then a top-down approach with a linear search reveals the document regions at each resolution level as text blocks, text lines, and graphics. We consider our algorithm a transform-based multi-resolution method. Our ongoing research shows that the algorithm is robust to variations in document parameters.
KW - Character recognition
KW - Document image analysis
KW - Local connectivity
KW - Multi-resolution
KW - Page segmentation
KW - Region identification
UR - https://www.scopus.com/pages/publications/20444508034
U2 - 10.1145/1031442.1031450
DO - 10.1145/1031442.1031450
M3 - Conference contribution
AN - SCOPUS:20444508034
SN - 1581139764
SN - 9781581139761
T3 - HDP 2004: Proceedings of the First ACM Hardcopy Document Processing Workshop
SP - 47
EP - 51
BT - HDP 2004
PB - Association for Computing Machinery (ACM)
T2 - HDP 2004: Proceedings of the First ACM Hardcopy Document Processing Workshop
Y2 - 12 November 2004
ER -