TY - GEN
T1 - Text separation from mixed documents using a tree-structured classifier
AU - Peng, Xujun
AU - Setlur, Srirangaraj
AU - Govindaraju, Venu
AU - Sitaram, Ramachandrula
PY - 2010
Y1 - 2010
N2 - In this paper, we propose a tree-structured multiclass classifier to identify annotations and overlapping text in machine-printed documents. Each node of the tree-structured classifier is a binary weak learner. Unlike a normal decision tree (DT), which considers only a subset of the training data at each node and is susceptible to over-fitting, we boost the tree using all training data at each node with different weights. The proposed method is evaluated on a set of machine-printed documents that have been annotated by multiple writers in an office/collaborative environment.
AB - In this paper, we propose a tree-structured multiclass classifier to identify annotations and overlapping text in machine-printed documents. Each node of the tree-structured classifier is a binary weak learner. Unlike a normal decision tree (DT), which considers only a subset of the training data at each node and is susceptible to over-fitting, we boost the tree using all training data at each node with different weights. The proposed method is evaluated on a set of machine-printed documents that have been annotated by multiple writers in an office/collaborative environment.
UR - https://www.scopus.com/pages/publications/78149487081
U2 - 10.1109/ICPR.2010.68
DO - 10.1109/ICPR.2010.68
M3 - Conference contribution
AN - SCOPUS:78149487081
SN - 9780769541099
T3 - Proceedings - International Conference on Pattern Recognition
SP - 241
EP - 244
BT - Proceedings - 2010 20th International Conference on Pattern Recognition, ICPR 2010
PB - Institute of Electrical and Electronics Engineers Inc.
ER -