Writer Demographic Classification Using Bagging and Boosting


Karthik R BANDI, Sargur N SRIHARI
CEDAR, Department of Computer Science and Engineering, State University of New York at Buffalo, Amherst, NY 14228, USA. {kkr2, srihari}@cedar.buffalo.edu

Abstract. Classifying handwriting into a writer demographic category, e.g., the gender, age, or handedness of the writer, is useful for more detailed analysis such as writer verification and identification. This paper describes classification into binary demographic categories using document macro features and several classification methods: a single feed-forward neural network classifier, and combinations of neural network classifiers based on boosting and bagging. The textual content of each document was the same, consisting of a full page of writing, and eleven macro features were used. Training set sizes for the gender, age and handedness populations were 800, 650 and 300 per class, respectively; test set sizes were 400, 350 and 150. Classification accuracy using a single neural network was 73.2%, 82.7% and 68.5%, respectively. Performance improved with the number of classifiers combined, saturating at around ten neural networks: 74.7%, 83.4% and 70.1% when combined by bagging, and 77.5%, 86.6% and 74.4% when combined by boosting. The accuracies achieved by boosting are significantly higher than those previously observed for demographic classification. The results also show the promise of boosting for weak classifiers.

1. Introduction

Handwriting is affected by several factors, such as biology, age and social habits. This intuition motivates the study of classification techniques suitable for identifying demographic classes, given a handwritten document. The demographic classes considered are male vs. female, left-handed vs. right-handed, and different age groups. In most cases where handwriting is used as evidence, only a few global features extracted from checks, tax forms, etc.,
are at the disposal of forensic document examiners. In these cases, knowledge of the individual features can be used to estimate the validity of the writer verification/identification tasks, or to improve their accuracy. Since partial information about the writer of the query document (his/her gender, approximate age, etc.) is sometimes available from other sources, we also evaluate how useful this information is for performing faster and more accurate writer verification.

Tomai, Kshirsagar and Srihari (2004) give a detailed analysis of identifying the features that best discriminate between demographic classes (using k-nearest-neighbor classification), and then using this information to enhance the writer verification task. In their work, the k-nearest-neighbor classifier used a set of features called micro features, based on the individual characters written by each writer. The micro features use the gradient, structural and concavity features of each character (Favata & Srikantan, 1996) and encode them in a 512-bit feature vector. Our paper is concerned only with demographic classification, and our approach is different: we use a neural network classifier and improve performance by combining several neural networks through bagging and boosting. We use a set of features called macro features, which are global to the document, unlike the micro features, which are specific to a single character occurring in the document.

2. Writing Element Extraction

Writing elements are the discriminating features of handwriting. There are many descriptions of writing elements for questioned document examination; for example, Huber and Headrick (1999) list 21 classes of discriminating elements. In order to match elements between two documents, the presence of the elements is first recognized in each document.
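As a hedged illustration only (not the system's actual feature extractor), two of the simplest document-level measurements used in this paper, a global grey-level threshold and the number of black pixels, might be sketched as follows; the real macro features (slant, word gap, entropy-based thresholding, etc.) involve considerably more processing:

```python
def grey_threshold(img):
    """A crude global grey-level threshold: the mean intensity of the
    page. Hypothetical stand-in; the system's threshold feature is
    computed differently."""
    pixels = [p for row in img for p in row]
    return sum(pixels) / len(pixels)

def num_black_pixels(img, threshold):
    """Count pixels darker than the threshold, i.e., the ink pixels."""
    return sum(1 for row in img for p in row if p < threshold)

# tiny 2x4 grey-scale "page": 0 = black ink, 255 = white paper
page = [[255, 255, 0, 0],
        [255, 0, 0, 255]]
t = grey_threshold(page)           # 127.5
ink = num_black_pixels(page, t)    # 4
```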
Elements, or features, that capture the global characteristics of a writer's individual writing habit and style can be regarded as macro features, and features that capture finer details at the character level as micro features. In this paper we are concerned with the global, or macro, features only. The system captures twelve macro features as described by (Srihari, Cha, Arora & Lee, 2002), such as slant, word gap and grey-scale threshold. This study used eleven of the twelve macro features; the feature "number of black pixels" was not used, since it did not contribute towards demographic classification.

3. Handwriting Database

The CEDAR letter database consists of more than 3,000 handwritten document images written by more than 1,000 writers representative of the US population. Each individual provided three handwriting samples of the same text; the letter contains all 26 letters of the alphabet in upper and lower case, as well as the 10 digits. The database is stratified along six categories: gender (G) male, female; handedness (H) right, left; age (A) under 15, 15-24, 25-44, 45-64, 65-84, over 85; ethnicity (E) white, black, Hispanic, Asian and Pacific Islander, American Indian, Eskimo, Aleut; highest level of education (D) below high school graduate, above; and place of schooling (S) USA, foreign. The writer population is unequally distributed across these categories: for example, there are 443 × 3 male documents and 1119 × 3 female documents, 599 × 3 for high school graduates, 276 × 3 for bachelor's degree holders, 1406 × 3 right-handed, 156 × 3 left-handed, 337 × 3 under the age of 24, and 347 × 3 above the age of 45.

This study considered the following three binary classification problems: gender (male vs. female), handedness (left vs. right), and age (below 24 vs. above 45). The training set size for each class in the gender, age and handedness populations was 800, 650 and 300 respectively; the corresponding test sets were of size 400, 350 and 150. The choice of data set size was based on the availability of data and on maintaining an equal number of samples of each class in a given two-category classification problem; otherwise, the classification could be biased towards the class with more samples.

Fig 1 shows the result of applying principal component analysis to the gender data set. From the plot we can observe that the classification task is hard, and a single classifier may not achieve good performance.

Fig 1: Plot after applying PCA and choosing the best two components. + represents the female population and x represents the male population. The two axes correspond to the best two principal components.

4. Classifiers

The performance of a single neural network classifier, as well as of combined neural network classifiers, was evaluated. To combine the classifiers, both bagging and boosting (AdaBoost.M1) were used. Classifier performance was determined with the number of combined classifiers ranging from two to ten. The effect on performance of varying the number of features input to the classifier was also analyzed.

The classifier used was a feed-forward neural network with three layers. The transfer function at every neuron is the log-sigmoid function; the number of inputs is the number of input features and the number of outputs is 1. All problems we have considered were 2-class problems.
Thus an output of 0.05 represents class A and 0.95 represents class B, for any two classes A and B. In all experiments the size of the test set was roughly half the size of the training set.

4.1 Classifier Combination: Boosting (AdaBoost)

Boosting is a general technique for improving the performance of any learning algorithm that consistently generates classifiers with misclassification error less than 50% on a given problem. Schapire (1990) introduced the boosting algorithm. More recently, Freund and Schapire (1996, 1997) introduced AdaBoost, which has undergone intense theoretical study and empirical testing. The AdaBoost.M1 algorithm (Freund & Schapire, 1996) was implemented here as boosting by resampling; its pseudo code is given below.

AdaBoost.M1 takes as inputs a weak classification method (WeakLearn) and a data set D = {(x_i, y_i), i = 1, ..., m} with m examples, where x_i is a feature vector, x_i ∈ X (the instance space), and y_i ∈ Y (the set of class labels) is the corresponding class label. Here each problem has only two class labels. In each iteration t, the algorithm calls WeakLearn on a training data set D_t, obtained by re-sampling from D according to the distribution weight_t. WeakLearn finds a new classifier h_t : X → Y seeking to minimize the training error ε_t. This process is iterated T times, and the hypotheses h_1, ..., h_T obtained are combined into a final classifier H_final. The re-weighting of the examples in D and the way the successive hypotheses h_t are combined are indicated in the pseudo code below. The main idea of boosting is that examples correctly classified by most previous hypotheses get a small weight, while examples usually classified wrongly get larger weights; thus boosting concentrates the effort of the learning algorithm on the examples that are difficult to learn.
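A minimal runnable sketch of this resample-train-reweight loop may help alongside the formal pseudo code. The sketch below is an assumption-laden toy (a one-dimensional threshold stump as WeakLearn, tiny synthetic data), not the neural-network setup used in the experiments:

```python
import math
import random

def stump_learn(data):
    """WeakLearn for 1-D inputs: the threshold/sign pair with the
    fewest errors on the (resampled) training set D_t."""
    best = None
    for thr in sorted({x for x, _ in data}):
        for sign in (1, -1):
            err = sum(1 for x, y in data
                      if (1 if sign * (x - thr) >= 0 else -1) != y)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    _, thr, sign = best
    return lambda x: 1 if sign * (x - thr) >= 0 else -1

def adaboost_m1(data, T=10, seed=0):
    """AdaBoost.M1 by resampling: maintain a distribution weight_t over
    the m examples, resample D_t from it, train a weak classifier, and
    down-weight the examples it classified correctly."""
    rng = random.Random(seed)
    m = len(data)
    weight = [1.0 / m] * m
    ensemble = []                      # pairs (h_t, beta_t)
    for _ in range(T):
        D_t = rng.choices(data, weights=weight, k=m)
        h = stump_learn(D_t)
        # weighted error of h_t under the current distribution
        eps = sum(w for w, (x, y) in zip(weight, data) if h(x) != y)
        if eps > 0.5:
            break                      # exit: weak learner failed
        eps = max(eps, 1e-10)          # guard against a perfect stump
        beta = eps / (1.0 - eps)
        ensemble.append((h, beta))
        # correctly classified examples are multiplied by beta < 1
        weight = [w * beta if h(x) == y else w
                  for w, (x, y) in zip(weight, data)]
        total = sum(weight)
        weight = [w / total for w in weight]
    def H_final(x):
        # weighted plurality vote, each h_t voting with log(1/beta_t)
        votes = {}
        for h, beta in ensemble:
            votes[h(x)] = votes.get(h(x), 0.0) + math.log(1.0 / beta)
        return max(votes, key=votes.get)
    return H_final

# toy data: class -1 clustered at 0.0, class +1 clustered at 1.0
data = [(0.0, -1)] * 3 + [(1.0, 1)] * 3
H = adaboost_m1(data, T=10)
```

Note the correspondence to the pseudo code: the `eps > 0.5` check, the β_t = ε_t/(1 − ε_t) multiplier, and the final log(1/β_t)-weighted vote.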

AdaBoost.M1 Algorithm

Input: data set D = {(x_i, y_i), i = 1, ..., m}, where x_i ∈ X and y_i ∈ Y; base learning algorithm WeakLearn; number of rounds T.

Initialize: weight_1(i) = 1/m for all i.

For t = 1 to T:
1. Generate data set D_t by re-sampling m examples from D using the distribution weight_t.
2. Train WeakLearn on D_t to obtain the base classifier h_t : X → Y.
3. Compute the error of h_t: ε_t = Σ_{i: h_t(x_i) ≠ y_i} weight_t(i). If ε_t > 1/2, set T = t - 1 and exit.
4. Set β_t = ε_t / (1 - ε_t).
5. Update the distribution: weight_{t+1}(i) = weight_t(i) · β_t if h_t(x_i) = y_i; weight_t(i) otherwise.
6. Normalize weight_{t+1} to make it a distribution: weight_{t+1}(i) = weight_{t+1}(i) / N_t for i = 1, ..., m, where N_t = Σ_{i=1}^{m} weight_{t+1}(i) is the normalization constant.

Output: the final hypothesis H_final(x) = arg max_{y ∈ Y} Σ_{t: h_t(x) = y} log(1/β_t).

4.2 Bagging

Bagging is an ensemble method that creates the individual classifiers of the ensemble by training each one on a random redistribution of the training set. The training set for each classifier is generated by randomly drawing N examples, with replacement, from the original training set of size N; many of the original examples may be repeated in a given classifier's training set, while others may be left out. How classifier combination by bagging reduces error is discussed in (Breiman, 1996; Ali & Pazzani, 1996). Bagging is simple: it trains T base classifiers on bootstrap re-samples (Efron & Tibshirani, 1993) of the training data set and outputs a final hypothesis by plurality voting. Unlike in boosting, the re-sampling of the training set does not depend on the performance of the earlier classifiers. For our experimental analysis we also implemented bagging for the classifier combination task and compared its performance with boosting.

5. Experimental Results

Performance based on number of features considered: the graphs of Fig 2 show how the number of features input to the classifier affects performance. As expected, performance increases with the number of features considered. These results are for a single neural network classifier on the test data sets. Using a single neural network, performance on gender, age and handedness was 73.2%, 82.7% and 68.5% respectively.

Fig 2: Performance improvement with the number of features on all three test data sets (male vs. female +, left- vs. right-handed x, age < 24 vs. age > 45 o).
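For contrast with the boosting loop of Section 4.1, bagging's bootstrap-and-vote procedure is short enough to sketch directly. This is a hedged toy (a nearest-class-mean base learner on 1-D data), not the neural-network ensemble used in the experiments:

```python
import random
from collections import Counter

def bagging(train, data, T=10, seed=0):
    """Train T base classifiers, each on a bootstrap re-sample of the
    training set (N draws with replacement, independent of the other
    classifiers), and predict by plurality vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(T):
        bootstrap = [rng.choice(data) for _ in range(len(data))]
        models.append(train(bootstrap))
    def predict(x):
        return Counter(h(x) for h in models).most_common(1)[0][0]
    return predict

def nearest_mean(data):
    """Toy base learner: label a point by the nearer class mean."""
    means = {}
    for label in {y for _, y in data}:
        xs = [x for x, y in data if y == label]
        means[label] = sum(xs) / len(xs)
    return lambda x: min(means, key=lambda c: abs(x - means[c]))

data = [(0.0, 'left')] * 5 + [(1.0, 'right')] * 5
vote = bagging(nearest_mean, data, T=15)
```

The key design difference from AdaBoost.M1 is visible in the loop body: every bootstrap re-sample is drawn uniformly, with no dependence on how earlier classifiers performed, and every model's vote counts equally.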

Performance based on all features using bagging and boosting (test data sets): combining ten neural networks (Figs. 3(a)-(c)) produced the following results. Bagging: 74.7%, 83.4% and 70.1%; boosting: 77.5%, 86.6% and 74.4%.

Fig 3(a): Results of bagging (o) and boosting (+) versus the number of classifiers combined (male vs. female population).

Fig 3(b): Results of bagging (o) and boosting (+) versus the number of classifiers combined (left- vs. right-handed population).

Fig 3(c): Results of bagging (o) and boosting (+) versus the number of classifiers combined (age < 24 vs. age > 45 population).

The performance of the AdaBoost algorithm is clearly better than that of the bagging approach. From the graphs in Figures 3(a)-(c) it is also evident that increasing the number of classifiers improves performance for both bagging and boosting.

6. Concluding Remarks

Demographic classification of handwriting can be performed with accuracies of roughly 75% or higher for gender, age and handedness, and bagging and boosting both improve classification performance. In this work we considered documents with the same textual content. Demographic classification of documents with differing content would need to consider micro (character-level) features, since macro features tend to be content dependent. The problems we have considered are binary demographic classification problems; in future work we shall extend this to non-binary categories.

References

Tomai, C.I., Kshirsagar, D., and Srihari, S.N. (2004). Group discriminatory power of handwritten characters. Document Recognition and Retrieval XI, Proc. of SPIE-IS&T Electronic Imaging, Vol. 5296: 116-123.

Favata, J.T., and Srikantan, G. (1996). A multiple feature/resolution approach to handprinted digit and character recognition. International Journal of Imaging Systems and Technology, 7: 304-311.

Huber, R.A., and Headrick, A.M. (1999). Handwriting Identification: Facts and Fundamentals. Boca Raton: CRC Press.

Srihari, S.N., Cha, S.H., Arora, H., and Lee, S. (2002). Individuality of handwriting. Journal of Forensic Sciences, 44(4): 856-872.

Schapire, R.E. (1990). The strength of weak learnability. Machine Learning, 5(2): 197-227.

Freund, Y., and Schapire, R.E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1): 119-139.

Freund, Y., and Schapire, R.E. (1996). Experiments with a new boosting algorithm. Machine Learning: Proceedings of the Thirteenth International Conference, pages 148-156.

Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2): 123-140.

Ali, K., and Pazzani, M. (1996). Error reduction through learning multiple descriptions. Machine Learning, 24(3): 173-202.

Efron, B., and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York.