TY - GEN
T1 - Word-level script identification for handwritten Indic scripts
AU - Singh, Pawan Kumar
AU - Sarkar, Ram
AU - Nasipuri, Mita
AU - Doermann, David
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/11/20
Y1 - 2015/11/20
AB - Automatic script identification from handwritten document images facilitates many important applications such as indexing, sorting and triage. A given Optical Character Recognition (OCR) system is typically trained on only a single script, so for documents or collections containing different scripts, the script must be identified automatically prior to OCR. For Indic scripts, some results have been reported in the literature, but the task is far from solved. In this paper, we propose a word-level script identification technique for six handwritten Indic scripts (Bangla, Devanagari, Gurumukhi, Malayalam, Oriya and Telugu) and the Roman script. A set of 82 features has been designed using a combination of elliptical and polygonal approximation techniques. Our approach has been evaluated on a dataset of 7000 handwritten words using multiple classifiers. A Multi-Layer Perceptron (MLP) classifier performed best, achieving 95.35% accuracy. The result is encouraging considering the complexities and shape variations of the Indic scripts.
KW - Elliptical based features
KW - Handwritten words
KW - Indic scripts
KW - Multiple classifiers
KW - Polygonal approximation based features
KW - Script identification
UR - https://www.scopus.com/pages/publications/84962574661
U2 - 10.1109/ICDAR.2015.7333932
DO - 10.1109/ICDAR.2015.7333932
M3 - Conference contribution
AN - SCOPUS:84962574661
T3 - Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
SP - 1106
EP - 1110
BT - 13th IAPR International Conference on Document Analysis and Recognition, ICDAR 2015 - Conference Proceedings
PB - IEEE Computer Society
T2 - 13th International Conference on Document Analysis and Recognition, ICDAR 2015
Y2 - 23 August 2015 through 26 August 2015
ER -