
Word-level script identification for handwritten Indic scripts

  • Jadavpur University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

21 Scopus citations

Abstract

Automatic script identification from handwritten document images facilitates many important applications such as indexing, sorting and triage. A given Optical Character Recognition (OCR) system is typically trained on only a single script, so for documents or collections containing different scripts, the script must be identified automatically prior to OCR. For Indic script research, some results have been reported in the literature, but the task is far from solved. In this paper, we propose a word-level script identification technique for six handwritten Indic scripts (Bangla, Devanagari, Gurumukhi, Malayalam, Oriya and Telugu) and the Roman script. A set of 82 features has been designed using a combination of elliptical and polygonal approximation techniques. Our approach has been evaluated on a dataset of 7000 handwritten text words, using multiple classifiers. A Multi-Layer Perceptron (MLP) was found to be the best classifier, achieving 95.35% accuracy. This result is encouraging considering the complexities and shape variations of the Indic scripts.
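The abstract does not specify how the polygonal approximation features are computed. As a rough illustration only, the sketch below simplifies a stroke contour with the Ramer-Douglas-Peucker algorithm, a common polygonal approximation technique that is swapped in here as an assumption, and derives three hypothetical descriptors from the result. The function names, the tolerance `epsilon`, and the chosen descriptors are illustrative and are not taken from the paper.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto a-b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Simplify an open polyline with the Ramer-Douglas-Peucker algorithm.

    A point is kept only if it lies farther than epsilon from the chord
    joining the endpoints of the span currently being examined.
    """
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]
    left = douglas_peucker(points[:index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    # The split point ends `left` and starts `right`; keep it once.
    return left[:-1] + right


def polygon_features(vertices):
    """Hypothetical descriptors: vertex count, total length, mean segment length."""
    lengths = [
        math.hypot(bx - ax, by - ay)
        for (ax, ay), (bx, by) in zip(vertices, vertices[1:])
    ]
    total = sum(lengths)
    mean = total / len(lengths) if lengths else 0.0
    return [len(vertices), total, mean]
```

In a pipeline of the kind the abstract outlines, descriptors like these would be computed per word image, combined with elliptical features, and passed to a classifier such as an MLP. The actual 82 features are defined in the paper itself.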

Original language: English
Title of host publication: 13th IAPR International Conference on Document Analysis and Recognition, ICDAR 2015 - Conference Proceedings
Publisher: IEEE Computer Society
Pages: 1106-1110
Number of pages: 5
ISBN (Electronic): 9781479918058
State: Published - Nov 20 2015
Event: 13th International Conference on Document Analysis and Recognition, ICDAR 2015 - Nancy, France
Duration: Aug 23 2015 to Aug 26 2015

Publication series

Name: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Volume: 2015-November
ISSN (Print): 1520-5363

Conference

Conference: 13th International Conference on Document Analysis and Recognition, ICDAR 2015
Country/Territory: France
City: Nancy
Period: 08/23/15 to 08/26/15

Keywords

  • Elliptical based features
  • Handwritten words
  • Indic scripts
  • Multiple Classifiers
  • Polygonal approximation based features
  • Script Identification
