
Segmentation of highly unstructured handwritten documents using a neural network technique

  • SUNY Buffalo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

In recent years there has been growing interest in digitizing the large volume of books and documents that predate the widespread adoption of digital technologies. Many of these digitization initiatives deal with huge collections of handwritten documents, for which document image analysis techniques (page segmentation, keyword spotting, optical character recognition (OCR), etc.) are not yet as mature as they are for printed text. There is therefore a pressing need for techniques to understand, archive, index and search these manuscripts. The antiquated approach of manually transcribing handwritten collections and then applying standard text retrieval techniques can be very expensive for large collections. Moreover, unlike machine-printed texts, many of the manuscripts in these collections contain unstructured information: cluttered groups of text and graphics that do not necessarily follow a pre-specified format, which makes them challenging to process automatically. In this paper we present a convolutional neural network (CNN) based implementation that segments pages of handwritten documents into their constituent sections. We describe a multiscale sliding-window network that is trained to predict the sections of the pages in handwritten manuscripts. The network's output is post-processed with a novel region-growing technique to further improve the segmentation results. The implementation is applied to the Marianne Moore archival collection, a body of handwritten notes and memos by the renowned author Marianne Moore (1887-1972), one of the foremost modernist poets of the early twentieth century. We present our segmentation results both quantitatively and qualitatively.
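The abstract does not specify the network architecture, window sizes, stride, or region-growing criterion, so the following is only a minimal sketch of the general two-stage idea under stated assumptions, not the authors' method. A hypothetical `classify` callable stands in for the CNN: it receives square crops of the page taken at several scales around one window centre and returns a `(label, confidence)` pair. The helper names `crop`, `sliding_window_labels` and `region_grow`, and the confidence `threshold`, are likewise hypothetical. In this sketch, region growing propagates labels from high-confidence cells into neighbouring low-confidence cells in breadth-first order, which is one common way to implement it.

```python
from collections import deque
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]
# Hypothetical stand-in for the CNN: one crop per scale -> (label, confidence).
Classifier = Callable[[List[Image]], Tuple[int, float]]


def crop(image: Image, r: int, c: int, size: int) -> Image:
    """Return a size x size window centred at (r, c), zero-padded at the borders."""
    h, w = len(image), len(image[0])
    half = size // 2
    return [
        [image[i][j] if 0 <= i < h and 0 <= j < w else 0.0
         for j in range(c - half, c - half + size)]
        for i in range(r - half, r - half + size)
    ]


def sliding_window_labels(image: Image, stride: int, scales: Sequence[int],
                          classify: Classifier) -> Tuple[List[List[int]], List[List[float]]]:
    """Classify a grid of window centres; each centre is seen at every scale."""
    h, w = len(image), len(image[0])
    labels: List[List[int]] = []
    confs: List[List[float]] = []
    for r in range(stride // 2, h, stride):
        label_row, conf_row = [], []
        for c in range(stride // 2, w, stride):
            label, conf = classify([crop(image, r, c, s) for s in scales])
            label_row.append(label)
            conf_row.append(conf)
        labels.append(label_row)
        confs.append(conf_row)
    return labels, confs


def region_grow(labels: List[List[int]], confs: List[List[float]],
                threshold: float) -> List[List[int]]:
    """Grow regions from confident seed cells into adjacent uncertain cells (BFS)."""
    gh, gw = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    # Cells at or above the threshold are treated as seeds and never relabelled.
    settled = [[confs[i][j] >= threshold for j in range(gw)] for i in range(gh)]
    queue = deque((i, j) for i in range(gh) for j in range(gw) if settled[i][j])
    while queue:
        i, j = queue.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < gh and 0 <= nj < gw and not settled[ni][nj]:
                out[ni][nj] = out[i][j]  # inherit the label of the region reaching it first
                settled[ni][nj] = True
                queue.append((ni, nj))
    return out
```

For example, `region_grow([[1, 0, 0], [0, 0, 2]], [[0.9, 0.1, 0.2], [0.3, 0.4, 0.95]], 0.5)` keeps the two confident cells as seeds and fills the uncertain cells from them, giving `[[1, 1, 2], [1, 2, 2]]`. Using a multi-source breadth-first search means each uncertain cell takes the label of the nearest seed, with ties broken by scan order.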

Original language: English
Title of host publication: 2016 23rd International Conference on Pattern Recognition, ICPR 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1291-1296
Number of pages: 6
ISBN (Electronic): 9781509048472
State: Published - Jan 1 2016
Event: 23rd International Conference on Pattern Recognition, ICPR 2016 - Cancun, Mexico
Duration: Dec 4 2016 to Dec 8 2016

Publication series

Name: Proceedings - International Conference on Pattern Recognition
Volume: 0
ISSN (Print): 1051-4651

Conference

Conference: 23rd International Conference on Pattern Recognition, ICPR 2016
Country/Territory: Mexico
City: Cancun
Period: 12/4/16 to 12/8/16

