Resampling for Face Recognition

Xiaoguang Lu and Anil K. Jain
Dept. of Computer Science & Engineering, Michigan State University, East Lansing, MI 48824
{lvxiaogu,jain}@cse.msu.edu

(This research was supported by the NSF IUC on Biometrics (CITeR) at West Virginia University.)

Abstract. A number of applications require robust human face recognition under varying environmental lighting conditions and different facial expressions, which considerably change the appearance of the human face. However, in many face recognition applications only a small number of training samples are available for each subject, and these samples cannot capture all the facial appearance variations. We use resampling techniques to generate several subsets of samples from the original training dataset. A classic appearance-based recognizer, the LDA-based classifier, is applied to each of the generated subsets to construct an LDA representation for face recognition. The classification results from the subsets are integrated by two strategies: majority voting and the sum rule. Experiments conducted on a face database containing 206 subjects (2,060 face images) show that the proposed approaches improve the recognition accuracy of the classical LDA-based face classifier by about 7 percentage points.

1 Introduction

Human face recognition has drawn a great deal of attention in the past decade. A number of face recognition algorithms have been investigated [21] and several commercial face recognition products [9][20] are available. However, robust face recognition in unconstrained environments is still a very challenging problem.

A face recognition system has two stages, training and testing. In real applications, current face recognition systems encounter difficulties due to the small number of available training face images and the complicated facial variations encountered during the testing stage. In other words, the available training samples are not very representative. Human face appearance varies considerably with lighting conditions, head pose and facial expression; examples of these variations for one subject are illustrated in Fig. 1. In real-world situations, only a small number of samples per subject are available for training, and these samples cannot capture all the possible facial variations.

Among face recognition algorithms, appearance-based approaches [2][19], which utilize the intensity or intensity-derived features of the original images, have been developed with considerable success [21][13]. The dimensionality of the feature vector

used by these methods is often very high, while the training sample size is relatively small. A classifier trained on such data may be biased and have a large variance, resulting in poor performance [10][17].

To improve the performance of weak classifiers, a number of approaches have been proposed [4][6][8]. Breiman [4] proposed the bootstrap-aggregating (bagging) method: the training set is randomly resampled with replacement to generate independent bootstrap replicates, a classifier is trained on each replicate, and majority voting is applied to integrate the results of all the classifiers. Freund and Schapire [6] developed the boosting algorithm, which trains a series of classifiers on reweighted versions of the training set in a sequential manner; the final decision is again made by majority voting. In the random subspace method [8], classifiers are constructed in random subspaces of the feature space and combined by simple majority voting. Skurichina and Duin analyzed these methods for linear classifiers [17][16]. Bolle et al. [3] used bootstrap techniques for evaluating authentication systems. Boosting has also been applied to other problems, such as text categorization [15] and image retrieval [18], and Guo and Zhang [7] applied boosting to fast face recognition.

Fig. 1. Facial variations under different lighting conditions and facial expressions for the same subject [1].

We propose a resampling-integration scheme for face recognition. A resampling technique is used to generate a number of subsets from the original training dataset, and a classic appearance-based face recognizer based on the LDA representation is constructed on each of the generated subsets. Two integration strategies, majority voting and the sum rule, are used to combine the classification results and arrive at the final decision.

Section 2 presents the resampling and integration scheme. Section 3 provides the experimental results and discussion. Conclusions are summarized in Section 4.

2 Resampling and Integration

2.1 System Overview

Our resampling-integration scheme is illustrated in Fig. 2. The training dataset contains a small number of sample face images. A number of subsets are generated by resampling the training set, and each subset S_i is used to train a classifier C_i. In the test stage, the test image is fed to each component classifier. Two strategies, (i) simple majority voting and (ii) the sum rule, are used to integrate the outputs of the component classifiers; these outputs can be either classification labels or matching scores. Currently, a face recognizer based on the LDA representation is used as the component classifier, but the framework does not restrict the component classifiers to be of the same type.

Fig. 2. The resampling-integration scheme for face recognition. S_1 to S_K are the subsets resampled from the original training dataset, and C_1 to C_K are the classifiers trained using the corresponding subsets. Here, K is the total number of subsets.
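To make the data flow of Fig. 2 concrete, the following is a minimal, classifier-agnostic skeleton in Python/numpy. It is an illustrative sketch, not the authors' implementation: the names train_ensemble, classify, resample_fn, train_fn and combine_fn are assumed hooks for the pieces described in Sections 2.2 to 2.4.

    import numpy as np

    def train_ensemble(X, y, resample_fn, train_fn, K, seed=0):
        """Generate K subsets S_1..S_K of the training set (X, y) and train one
        component classifier C_i per subset.
        resample_fn(y, rng) -> array of selected training indices;
        train_fn(X_sub, y_sub) -> a fitted classifier, modeled as a callable
        that maps a test image to its output (a label or a score vector)."""
        rng = np.random.default_rng(seed)
        classifiers = []
        for _ in range(K):
            idx = resample_fn(y, rng)
            classifiers.append(train_fn(X[idx], y[idx]))
        return classifiers

    def classify(x, classifiers, combine_fn):
        """Feed the test image x to every component classifier and fuse their
        outputs with combine_fn (e.g. majority voting or the sum rule)."""
        outputs = [clf(x) for clf in classifiers]
        return combine_fn(outputs)

Passing the component classifier and the combination rule in as callables mirrors the point above that the framework is not tied to one classifier type.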

2.2 LDA-based Face Classifier

A two-dimensional face image is treated as a vector by concatenating its rows (or columns). Let X = (x_1, x_2, ..., x_i, ..., x_N) denote the data matrix, where N is the number of face images in the training set. Each x_i is a face vector of dimension n, concatenated from a p × p face image, so that n = p × p is the total number of pixels in the image. The Linear Discriminant Analysis (LDA) [5][2] representation is a linear transformation from the original image vector to a projection feature vector,

    Y = W_{LDA}^T X,                                                  (1)

where Y is the d × N feature vector matrix, d is the dimension of the feature vector (d ≪ n), and W_{LDA} is the transformation matrix, derived as

    W_{LDA} = \arg\max_W \frac{|W^T S_B W|}{|W^T S_W W|},             (2)

where S_B is the between-class scatter matrix and S_W is the within-class scatter matrix,

    S_B = \sum_{i=1}^{c} N_i (m_i - m)(m_i - m)^T,                    (3)

    S_W = \sum_{i=1}^{c} \sum_{x_k \in X_i} (x_k - m_i)(x_k - m_i)^T. (4)

In the above expressions, N_i is the number of training samples in class i; c is the number of distinct classes; m = \frac{1}{N} \sum_{i=1}^{N} x_i is the mean vector of all the samples; m_i is the mean vector of the samples belonging to class i; and X_i is the set of samples belonging to class i. In the face recognition problem the within-class scatter matrix S_W is often singular, because the rank of S_W is at most (N - c) and the number of training samples is generally smaller than the dimensionality of the face image (the number of pixels). In that case, PCA [19] can be used to reduce the dimensionality of the original face image space [2] prior to applying LDA.

LDA thus derives a low-dimensional representation of a high-dimensional face feature vector space. Each face vector is projected by the transformation matrix W_{LDA}, and the projection coefficients are used as the feature representation of the face image. The matching score between a test face image and a training image is calculated as the cosine of the angle between their coefficient vectors; a larger matching score means a better match.

2.3 Resampling

The resampling module generates a number of subsets from the original training set. Several resampling methods have been proposed in the literature. For instance, classic bagging [4] uses random sampling with replacement to generate independent bootstrap replicates whose size equals that of the original set. LDA-based face recognition uses both intra- and inter-class information (the within-class and between-class scatter matrices), so our strategy does not sample the whole training set at random; instead, it samples randomly within each class (subject), subject to the following conditions:

1. The number of sample images per subject in a subset is equal, or as close to equal as possible.
2. Sampling within each class follows a uniform distribution.

These requirements may not be optimal, but they work well, as demonstrated by the empirical evaluation.

2.4 Integration

After resampling, several LDA-based classifiers are constructed. Each component classifier computes the matching scores between the test face image and the training images. Let MS(i, j) be the matching score between the test image and the j-th training image, calculated by the i-th component classifier. For the i-th component classifier, the classification result for the test image is a subject label, denoted by Label(i); this classification can be obtained by the nearest neighbor rule.
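For concreteness, a minimal numpy sketch of how one component classifier C_i could produce the matching scores MS(i, j) and the label Label(i) is given below, using PCA followed by the LDA projection of eqs. (1)-(4) and cosine matching. This is not the authors' code; the function names and the pseudo-inverse eigen-solution of eq. (2) are illustrative assumptions.

    import numpy as np

    def fit_pca(X, n_components):
        """PCA pre-projection, used because S_W is typically singular (Sec. 2.2).
        X holds one face vector per row."""
        mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
        return mean, Vt[:n_components].T             # projection matrix (n, n_components)

    def fit_lda(Y, labels, n_components):
        """W_LDA of eq. (2), built from the scatter matrices of eqs. (3)-(4)
        on the PCA-projected training vectors Y (one row per image)."""
        m = Y.mean(axis=0)
        d = Y.shape[1]
        Sb, Sw = np.zeros((d, d)), np.zeros((d, d))
        for c in np.unique(labels):
            Yc = Y[labels == c]
            mc = Yc.mean(axis=0)
            Sb += len(Yc) * np.outer(mc - m, mc - m)  # eq. (3)
            Sw += (Yc - mc).T @ (Yc - mc)             # eq. (4)
        # Leading eigenvectors of pinv(S_W) S_B maximize the ratio in eq. (2);
        # the pseudo-inverse guards against a near-singular S_W.
        evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
        order = np.argsort(-evals.real)[:n_components]
        return evecs[:, order].real

    def matching_scores(test_vec, gallery, mean, W_pca, W_lda):
        """MS(i, .): cosine of the angle between the projection coefficient
        vectors of the test image and of each gallery (training) image."""
        project = lambda V: ((V - mean) @ W_pca) @ W_lda
        q = project(test_vec[None, :])[0]
        G = project(gallery)
        q = q / np.linalg.norm(q)
        G = G / np.linalg.norm(G, axis=1, keepdims=True)
        return G @ q                                  # larger score = better match

    def component_label(scores, gallery_labels):
        """Label(i): nearest-neighbor decision of this component classifier."""
        return gallery_labels[int(np.argmax(scores))]

At most c - 1 LDA components carry discriminative information, and the PCA dimension would normally be chosen so that S_W is well conditioned.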

Two integration strategies are applied, namely simple majority voting and the sum rule.

1. Simple majority voting. Assign to the test image the label that appears most frequently among Label(i), i = 1, ..., K.

2. The sum rule. Calculate

    MS_j = \sum_{i=1}^{K} MS(i, j)

and assign to the test image the label of the J-th training image, where

    J = \arg\max_j MS_j.                                              (5)

The integration rules may not give the desired results when the number of component classifiers, K, is too small, but with the resampling scheme presented here K can be made as large as needed.

3 Experiments and Discussion

Our database is a union of four different face databases that are available in the public domain (see Table 1). It contains 2,060 face images of 206 subjects, with 10 images per subject. The set of face images contains variations in pose, illumination and expression. Some images in the individual databases were not selected for our experiments because they either exhibited out-of-plane rotation of more than 45 degrees or were occluded by sunglasses or a scarf. Sample images from the databases are shown in Fig. 3. Face images are closely cropped to include only the internal facial structures, such as the eyebrows, eyes, nose and mouth, and are aligned by the centers of the two eyes. All cropped images are resized to 42 × 42 pixels, and each image vector is normalized to unit length.

Table 1. Database description

  Face database                        Number of subjects   Variations included
  ORL [14]                             40                   Slight pose and expression
  Yale [1]                             15                   Illumination and expression
  AR [12]                              120                  Illumination and expression
  NLPR+MSU (collected by the authors)  31                   Slight pose and expression

Fig. 3. Representative face images in the database: (a) ORL, (b) Yale, (c) AR and (d) NLPR+MSU.

The entire face database is divided into two parts: nine images of each subject are used to construct the original training data and the remaining one is used for testing. This partition is repeated 10 times so that every image of each subject is used for testing once, and the recognition accuracy is averaged over the ten test sets. In resampling, 8 of the 9 training images for each subject are randomly selected according to uniformly distributed seeds between 1 and 9; the sampling is without replacement, and each subject uses a different set of randomly generated seeds. Consequently, each resampled subset contains 8 × 206 = 1,648 images, and an LDA-based classifier is trained on each such subset.
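Under the assumption that every component classifier scores the same gallery of training images, the per-subject resampling just described (8 of the 9 training images per subject) and the two integration rules of Section 2.4 could be sketched as follows. The names resample_per_subject, majority_vote and sum_rule are illustrative, and a single random generator is used for brevity where the paper draws a different set of seeds per subject.

    import numpy as np
    from collections import Counter

    def resample_per_subject(labels, n_per_subject, rng):
        """Draw n_per_subject training indices per class, uniformly and without
        replacement (Section 2.3); e.g. 8 of the 9 images per subject."""
        idx = []
        for c in np.unique(labels):
            members = np.flatnonzero(labels == c)
            idx.extend(rng.choice(members, size=n_per_subject, replace=False))
        return np.asarray(idx)

    def majority_vote(component_labels):
        """Rule 1: the label that appears most frequently among Label(1..K)."""
        return Counter(component_labels).most_common(1)[0][0]

    def sum_rule(score_rows, gallery_labels):
        """Rule 2: score_rows holds K arrays, row i being MS(i, j) over a common
        gallery; return the label of the training image J of eq. (5)."""
        total = np.sum(score_rows, axis=0)        # MS_j = sum_i MS(i, j)
        return gallery_labels[int(np.argmax(total))]

These are the two rules whose accuracies are reported in Table 2 below for K = 20 subsets.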

The component classifiers compute the cosine of the angle between the two projection coefficient vectors (one from the test image and one from the database image) as the matching score, and the database image with the best match determines the classification result of that component classifier. The recognition accuracy of the different face recognition schemes is listed in Table 2, and Fig. 4 shows some images that were misclassified by the classic LDA-based face recognizer but correctly classified using the proposed scheme.

Table 2. Recognition accuracy (number of resampled subsets K = 20)

  Without resampling   Resampling + Majority voting   Resampling + Sum rule
  81.0%                88.7%                          87.9%

Fig. 4. Examples that are misclassified by the classic LDA-based face recognizer but correctly classified using the proposed scheme.

The number of subsets, K, is chosen empirically. In order to analyze the influence of K on our scheme, we conducted experiments with K ranging from 1 to 20. Figure 5 shows the recognition accuracy of the

proposed recognition schemes as the number of subsets changes. These results show that the proposed resampling-integration scheme generally improves the performance of the LDA-based face classifier as K increases up to 20.

Fig. 5. Recognition accuracy with respect to the number of subsets.

4 Conclusions and Future Work

A resampling-integration scheme is proposed to improve the recognition accuracy of face classification. The resampling module generates a number of subsets, which are used to train the component classifiers (parameter adjustment), and the integration rules combine the outputs of the component classifiers into the final decision. Two integration rules are presented and the corresponding experiments are carried out, with the LDA-based face classifier integrated into the scheme through the corresponding resampling design. Experiments conducted on a face database containing 206 subjects (2,060 face images) show that the recognition accuracy of the classical LDA-based face classifier is improved by the proposed scheme.

The framework is scalable in the number of subsets, the type of component classifiers and the resampling techniques, and different resampling techniques can be explored within this scheme. Since the subsets are resampled randomly, the resulting component classifiers may deserve different weights in the final decision, so classifier selection techniques could be applied. Although all the component classifiers in our experiments are LDA-based, the presented scheme does not limit the type of component classifier; however, there is currently no guarantee that the proposed scheme works for every type of classifier. From the perspective of classifier combination, many other integration rules can also be tried [11].

References

1. Yale University face database. <http://cvc.yale.edu/projects/yalefaces/yalefaces.html>.
2. P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Analysis and Machine Intelligence, 19(7):711-720, Jul. 1997.
3. R. Bolle, N. Ratha, and S. Pankanti. Evaluating authentication systems using bootstrap confidence intervals. In Proc. 1999 IEEE Workshop on Automatic Identification Advanced Technologies, pages 9-13, Morristown, NJ, 1999.
4. L. Breiman. Bagging predictors. Machine Learning, 24(2):123-140, 1996.
5. R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley, New York, 2nd edition, 2000.
6. Y. Freund and R. E. Schapire. Experiments with a new boosting algorithm. In Proc. International Conference on Machine Learning, pages 148-156, 1996.
7. G.-D. Guo and H.-J. Zhang. Boosting for fast face recognition. In Proc. IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, pages 96-100, 2001.
8. T. K. Ho. The random subspace method for constructing decision forests. IEEE Trans. Pattern Analysis and Machine Intelligence, 20(8):832-844, 1998.
9. Identix. <http://www.identix.com/>. Minnetonka, MN.
10. A. K. Jain and B. Chandrasekaran. Dimensionality and sample size considerations in pattern recognition practice. In P. R. Krishnaiah and L. N. Kanal, editors, Handbook of Statistics, 2:835-855, 1987.
11. J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas. On combining classifiers. IEEE Trans. Pattern Analysis and Machine Intelligence, 20(3):226-239, 1998.
12. A. M. Martinez and R. Benavente. The AR face database. CVC Technical Report #24, Jun. 1998.
13. P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss. The FERET evaluation methodology for face-recognition algorithms. IEEE Trans. Pattern Analysis and Machine Intelligence, 22(10):1090-1104, 2000.
14. F. Samaria and A. Harter. Parameterisation of a stochastic model for human face identification. In Proc. 2nd IEEE Workshop on Applications of Computer Vision, Sarasota, FL, Dec. 1994.
15. R. E. Schapire and Y. Singer. BoosTexter: A boosting-based system for text categorization. Machine Learning, 39(2-3):135-168, May/June 2000.
16. M. Skurichina and R. P. W. Duin. Bagging for linear classifiers. Pattern Recognition, 31(7):909-930, 1998.
17. M. Skurichina and R. P. W. Duin. Bagging, boosting and the random subspace method for linear classifiers. Pattern Analysis and Applications, 5(2):121-135, 2002.
18. K. Tieu and P. Viola. Boosting image retrieval. In Proc. CVPR, 2000.
19. M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71-86, Mar. 1991.
20. Viisage. <http://www.viisage.com/>. Littleton, MA.
21. W. Zhao, R. Chellappa, A. Rosenfeld, and P. J. Phillips. Face recognition: A literature survey. CVL Technical Report, University of Maryland, Oct. 2000. <ftp://ftp.cfar.umd.edu/trs/cvl-reports-2000/tr4167-zhao.ps.gz>.