
Text separation from mixed documents using a tree-structured classifier

  • SUNY Buffalo
  • Hewlett-Packard

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

6 Scopus citations

Abstract

In this paper, we propose a tree-structured multiclass classifier to identify annotations and overlapping text in machine-printed documents. Each node of the tree-structured classifier is a binary weak learner. Unlike a conventional decision tree (DT), which considers only a subset of the training data at each node and is susceptible to over-fitting, the proposed method boosts the tree using all of the training data at each node, with different weights. The method is evaluated on a set of machine-printed documents that have been annotated by multiple writers in an office/collaborative environment.
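
The idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the chain-shaped tree, the decision-stump weak learner, the reweighting rule, the `off_path_weight` parameter, the single toy feature, and the class names are all assumptions made for illustration. The sketch shows only the contrast the abstract draws: every node is fit on all training samples, and samples that a plain decision tree would drop at a node are kept with a smaller weight instead.

```python
def fit_stump(X, y, w):
    """Fit a weighted decision stump on labels in {-1, +1}.

    Returns (feature_index, threshold, polarity) with minimal weighted error.
    """
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in [(a + b) / 2 for a, b in zip(values, values[1:])]:
            for polarity in (1, -1):
                err = sum(
                    wi for row, yi, wi in zip(X, y, w)
                    if (polarity if row[j] > t else -polarity) != yi
                )
                if err < best_err:
                    best, best_err = (j, t, polarity), err
    return best


def stump_predict(stump, row):
    j, t, polarity = stump
    return polarity if row[j] > t else -polarity


def fit_tree(X, labels, class_order, off_path_weight=0.1):
    """Fit one binary stump per internal node; each node sees ALL samples.

    Node k separates class_order[k] from everything else. The weighting
    rule below is a hypothetical choice, not taken from the paper.
    """
    w = [1.0 / len(X)] * len(X)
    nodes = []
    for cls in class_order[:-1]:
        y = [1 if lab == cls else -1 for lab in labels]
        stump = fit_stump(X, y, w)
        nodes.append((cls, stump))
        # A plain decision tree would discard the samples this node sends
        # to its leaf. Here they stay in the training set for later nodes,
        # but with a reduced weight.
        w = [wi * off_path_weight if stump_predict(stump, row) == 1 else wi
             for row, wi in zip(X, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return nodes, class_order[-1]


def predict(tree, row):
    nodes, default = tree
    for cls, stump in nodes:
        if stump_predict(stump, row) == 1:
            return cls
    return default


# Toy data: one made-up feature value per text component.
X = [[0.0], [1.0], [5.0], [6.0], [10.0], [11.0]]
labels = ["printed", "printed", "annotation", "annotation",
          "overlap", "overlap"]
tree = fit_tree(X, labels, ["printed", "annotation", "overlap"])
predictions = [predict(tree, row) for row in X]
```

In this toy set the "annotation" class lies between the other two, so a single threshold cannot isolate it from both. Because the "printed" samples are down-weighted after the first node rather than weighted equally, the second stump is free to place its threshold between "annotation" and "overlap".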

Original language: English
Title of host publication: Proceedings - 2010 20th International Conference on Pattern Recognition, ICPR 2010
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 241-244
Number of pages: 4
ISBN (Print): 9780769541099
State: Published - 2010

Publication series

Name: Proceedings - International Conference on Pattern Recognition
ISSN (Print): 1051-4651
