TY - GEN
T1 - Overlapped text segmentation using Markov random field and aggregation
AU - Peng, Xujun
AU - Setlur, Srirangaraj
AU - Govindaraju, Venu
AU - Sitaram, Ramachandrula
PY - 2010
Y1 - 2010
N2 - Separating machine printed text and handwriting from overlapping text is a challenging problem in the document analysis field and no reliable algorithms have been developed thus far. In this paper, we propose a novel approach for separating handwriting from a binary image of overlapped text. Instead of using fixed size training patches, we describe an aggregation method which uses shape context features to extract training samples automatically. We use a Markov Random Field (MRF) to model the overlapped text. The neighbor system is inherited from a coarsening procedure and the prior and likelihood of the MRF are learned based on a distance metric. Experimental results show that the proposed method can achieve 87.97% recall for handwriting and 91.44% recall for machine printed text.
AB - Separating machine printed text and handwriting from overlapping text is a challenging problem in the document analysis field and no reliable algorithms have been developed thus far. In this paper, we propose a novel approach for separating handwriting from a binary image of overlapped text. Instead of using fixed size training patches, we describe an aggregation method which uses shape context features to extract training samples automatically. We use a Markov Random Field (MRF) to model the overlapped text. The neighbor system is inherited from a coarsening procedure and the prior and likelihood of the MRF are learned based on a distance metric. Experimental results show that the proposed method can achieve 87.97% recall for handwriting and 91.44% recall for machine printed text.
UR - https://www.scopus.com/pages/publications/77954985065
U2 - 10.1145/1815330.1815348
DO - 10.1145/1815330.1815348
M3 - Conference contribution
AN - SCOPUS:77954985065
SN - 9781605587738
T3 - ACM International Conference Proceeding Series
SP - 129
EP - 134
BT - Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, DAS '10
T2 - 2010 IAPR Workshop on Document Analysis Systems, DAS 2010
Y2 - 9 June 2010 through 11 June 2010
ER -