Abstract
Scene Text Recognition (STR) in Indian languages is an important research area because of its wide range of applications. This paper proposes LaSA-Net, a spatial attention-based model that combines visual features with language knowledge to recognize words from word segments of scene images. We augment the classical cross-entropy loss with a novel language-attunement loss that enables the model to learn valid and prevalent character sequences within words, which enhances its ability to perform zero-shot word recognition. Further, to compensate for the lack of rotational invariance in the CNN-based feature extraction backbone, we propose a training data augmentation strategy based on glyphs: images of individual characters in different orientations. This improves LaSA-Net’s ability to recognize words in images with curved or vertically aligned text, alleviating the need for computationally expensive preprocessing modules. Our experiments with Tamil, Malayalam, and Telugu scripts on the IIIT-ILST datasets achieve new benchmark results and outperform other state-of-the-art STR models.
| Original language | English |
|---|---|
| Pages (from-to) | 31-40 |
| Number of pages | 10 |
| Journal | International Journal on Document Analysis and Recognition |
| Volume | 28 |
| Issue number | 1 |
| State | Published - Mar 2025 |
Keywords
- Glyph-based data enhancement
- Language attunement loss
- Scene-text recognition
- Tamil, Telugu, and Malayalam languages
Fingerprint
Dive into the research topics of 'Scene text recognition: an Indic perspective'. Together they form a unique fingerprint.