TY - GEN
T1 - Handwritten Arabic text line segmentation using Affinity propagation
AU - Kumar, Jayant
AU - Abd-Almageed, Wael
AU - Kang, Le
AU - Doermann, David
PY - 2010
Y1 - 2010
N2 - In this paper, we present a novel graph-based method for extracting handwritten text lines in monochromatic Arabic document images. Our approach consists of two steps: coarse text line estimation using primary components, which define the line, and assignment of diacritic components, which are more difficult to associate with a given line. We first estimate local orientation at each primary component to build a sparse similarity graph. We then use a shortest path algorithm to compute similarities between non-neighboring components. From this graph, we obtain coarse text lines using two estimates obtained from affinity propagation and breadth-first search. In the second step, we assign secondary components to each text line. The proposed method is very fast and robust to non-uniform skew and character size variations, which are normally present in handwritten text lines. We evaluate our method using a pixel-matching criterion and report 96% accuracy on a dataset of 125 Arabic document images. We also present a proximity analysis on datasets generated by artificially decreasing the spacing between text lines to demonstrate the robustness of our approach.
AB - In this paper, we present a novel graph-based method for extracting handwritten text lines in monochromatic Arabic document images. Our approach consists of two steps: coarse text line estimation using primary components, which define the line, and assignment of diacritic components, which are more difficult to associate with a given line. We first estimate local orientation at each primary component to build a sparse similarity graph. We then use a shortest path algorithm to compute similarities between non-neighboring components. From this graph, we obtain coarse text lines using two estimates obtained from affinity propagation and breadth-first search. In the second step, we assign secondary components to each text line. The proposed method is very fast and robust to non-uniform skew and character size variations, which are normally present in handwritten text lines. We evaluate our method using a pixel-matching criterion and report 96% accuracy on a dataset of 125 Arabic document images. We also present a proximity analysis on datasets generated by artificially decreasing the spacing between text lines to demonstrate the robustness of our approach.
KW - Arabic documents
KW - Handwritten documents
KW - Text line segmentation
UR - https://www.scopus.com/pages/publications/77954993477
U2 - 10.1145/1815330.1815349
DO - 10.1145/1815330.1815349
M3 - Conference contribution
AN - SCOPUS:77954993477
SN - 9781605587738
T3 - ACM International Conference Proceeding Series
SP - 135
EP - 142
BT - Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, DAS '10
T2 - 2010 IAPR Workshop on Document Analysis Systems, DAS 2010
Y2 - 9 June 2010 through 11 June 2010
ER -