Scene text recognition: an Indic perspective

  • Vasanthan P. Vijayan
  • Sukalpa Chanda
  • David Doermann
  • Narayanan C. Krishnan
  • Indian Institute of Technology Palakkad
  • Østfold University College

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Scene Text Recognition (STR) in Indian languages is an important research domain because of its wide range of applications. This paper proposes LaSA-Net, a spatial attention-based model that combines visual features and language knowledge to recognize words from scene image word segments. We augment the classical cross-entropy loss with a novel language-attunement loss that enables the model to learn valid and prevalent character sequences in a word, which enhances its ability to perform zero-shot word recognition. Further, to compensate for the lack of rotational invariance in the CNN-based feature extraction backbone, we propose a training data augmentation strategy that creates glyphs: images of individual characters in different orientations. This improves LaSA-Net’s ability to recognize words in images with curved or vertically aligned text, alleviating the need for computationally expensive preprocessing modules. In our experiments with Tamil, Malayalam, and Telugu scripts on the IIIT-ILST datasets, LaSA-Net achieves new benchmark results and outperforms other state-of-the-art STR models.
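
The glyph idea in the abstract (single-character images shown in several orientations) can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not state which angles are used or how glyphs are rendered, so the restriction to 90-degree steps, the binary-grid image format, and the names `rotate90`, `glyph_orientations`, and `augment` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: a glyph is a small binary image stored as rows of 0/1 pixels.
Glyph = List[List[int]]


def rotate90(glyph: Glyph) -> Glyph:
    """Rotate a glyph image 90 degrees clockwise."""
    # Reversing the row order and then transposing gives a clockwise turn.
    return [list(row) for row in zip(*glyph[::-1])]


def glyph_orientations(glyph: Glyph) -> List[Glyph]:
    """Return the glyph at 0, 90, 180, and 270 degrees (clockwise)."""
    oriented = [glyph]
    for _ in range(3):
        oriented.append(rotate90(oriented[-1]))
    return oriented


def augment(glyphs: Dict[str, Glyph]) -> List[Tuple[Glyph, str]]:
    """Build (image, label) pairs: every orientation keeps the character label."""
    return [
        (image, label)
        for label, glyph in glyphs.items()
        for image in glyph_orientations(glyph)
    ]


# Hypothetical placeholder glyph (an L-shaped stroke), not a real Indic character.
sample = {"L": [[1, 0], [1, 0], [1, 1]]}
pairs = augment(sample)
```

Each rotated image keeps its original character label, so a recognizer trained on these pairs sees the same character class in several orientations. A real setup would presumably render actual Tamil, Malayalam, or Telugu characters and might use arbitrary angles rather than quarter turns.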

Original language: English
Pages (from-to): 31-40
Number of pages: 10
Journal: International Journal on Document Analysis and Recognition
Volume: 28
Issue number: 1
State: Published - Mar 2025

Keywords

  • Glyph-based data enhancement
  • Language attunement loss
  • Scene-text recognition
  • Tamil, Telugu, and Malayalam languages
