A dataset for quality assessment of camera captured document images

  • University of Maryland, College Park

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

33 Scopus citations

Abstract

With the proliferation of cameras on mobile devices, there is an increased desire to image document pages as an alternative to scanning. However, the quality of captured document images is often lower than that of their scanned equivalents due to hardware limitations and stability issues. In this context, automatic assessment of the quality of captured images is useful for many applications. Although there has been a lot of work on developing computational methods and creating standard datasets for natural scene image quality assessment, until recently quality estimation of camera captured document images has not been given much attention. One traditional quality indicator for document images is the Optical Character Recognition (OCR) accuracy. In this work, we present a dataset of camera captured document images containing varying levels of focal blur introduced manually during capture. For each image we obtained the character-level OCR accuracy. Our dataset can be used to evaluate methods for predicting OCR quality of captured documents as well as enhancements. In order to make the dataset publicly and freely available, originals from two existing datasets, the University of Washington dataset and the Tobacco Database, were selected. We present a case study with three recent methods for predicting the OCR quality of images on our dataset.

Original language: English
Title of host publication: Camera-Based Document Analysis and Recognition - 5th International Workshop, CBDAR 2013, Revised Selected Papers
Publisher: Springer Verlag
Pages: 113-125
Number of pages: 13
ISBN (Print): 9783319051666
State: Published - 2014
Event: 5th International Workshop on Camera-Based Document Analysis and Recognition, CBDAR 2013 - Washington, DC, United States
Duration: Aug 23 2013 - Aug 23 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8357 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 5th International Workshop on Camera-Based Document Analysis and Recognition, CBDAR 2013
Country/Territory: United States
City: Washington, DC
Period: 08/23/13 - 08/23/13

Keywords

  • Document image quality
  • Image quality dataset
  • Optical character recognition
  • Sharpness
