Neurocomputing ] (]]]]) ]]] ]]] Contents lists available at ScienceDirect. Neurocomputing. journal homepage:

Size: px
Start display at page:

Download "Neurocomputing ] (]]]]) ]]] ]]] Contents lists available at ScienceDirect. Neurocomputing. journal homepage: www.elsevier."

Transcription

1 Neurocomputing ] (]]]]) ]]] ]]] Contents lists available at ScienceDirect Neurocomputing journal homepage: LIFT: A new framework of learning from testing data for face recognition Yuan Cao a, Haibo He b,, He (Helen) Huang b a Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ 070, USA b Department of Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, RI 02881, USA article info Article history: Received July 09 Received in revised form 23 September 10 Accepted 18 October 10 Communicated by D. Xu Keywords: Face recognition Semi-supervised learning One-against-all Feature extraction Data quality abstract In this paper, a novel learning methodology for face recognition, LearnIng From Testing data (LIFT) framework, is proposed. Considering many face recognition problems featured by the inadequate training examples and availability of the vast testing examples, we aim to explore the useful information from the testing data to facilitate learning. The one-against-all technique is integrated into the learning system to recover the labels of the testing data, and then expand the training population by such recovered data. In this paper, neural networks and support vector machines are used as the base learning models. Furthermore, we integrate two other transductive methods, consistency method and LRGA method into the LIFT framework. Experimental results and various hypothesis testing over five popular face benchmarks illustrate the effectiveness of the proposed framework. & 10 Elsevier B.V. All rights reserved. 1. Introduction Recently, many new theories and methodologies for face recognition have been developed in the community, and many new algorithms and practical tools have been designed and successfully applied to a wide range of applications, such as biometrics, surveillance, human computer interface, information security, and others [12,59,49,18]. Generally, the face recognition problem aims to identify or verify one or more persons in still or video images of a scene based on a stored database. Three important subtasks are generally considered in the solution to a face recognition problem: face segmentation/detection, feature extraction, and face recognition/ identification. For instance, face segmentation/detection aims to detect and localize an unknown number (if any) of faces from a simple or complex background in a still image or video stream data [22]. In this paper, we focus on the latter two phases and address the face recognition problem as to predict the identity label of a given still face image by designing an effective learning method. Feature extraction is an important step for a successful face recognition approach. Generally speaking, feature extraction techniques for face recognition can be categorized into two types [8]: feature-based matching and template matching (also called holistic [15]) methods. In the feature-based matching approach, the most characteristic face components (eyes, nose, mouth, chin, etc.) Corresponding author. addresses: ycao@stevens.edu (Y. Cao), he@ele.uri.edu (H. He), huang@ele.uri.edu (H. (Helen) Huang) and their features (colors, shapes, positions, etc.) are recognized and extracted within a face image. In the template matching approach, important features are extracted from the images and are represented as a bidimensional matrix of intensity values. Refs. 
[8] and [15] argue that although feature-based approach has many advantages, such as robust performance against rotation/scale and illumination variations, fast computation, and efficient memory utilization, its performance heavily relies on the facial feature detection methods and the quality of individual facial features. On the other hand, since many feature extraction methods, such as the principal component analysis (PCA) [48], independent component analysis (ICA) [1], Fisher s linear discriminant (FLD) or linear discriminant analysis (LDA) [2], and Gabor wavelets [27], have been applied to face recognition problems, the template matching approach has attracted significantly growing attention in the community. For instance, many variants of the aforementioned basic feature extraction methods have been extensively proposed in literature, including the fractional-step linear discriminant analysis (F-LDA) method [], the direct linear discriminant analysis (D-LDA) method [57], the direct fractional-step linear discriminant analysis (DF-LDA) method [31], regularized discriminant analysis (RDA) method [13], among others. In F-LDA [], the concept of fractional dimensionality was introduced and integrated into an incremental dimensionality reduction procedure based on linear discriminant analysis. Due to computational constraints, the traditional LDA approach was performed in the low-dimensional PCA subspace, which may result in a loss of significant discriminatory information contained in the discarded null space. Therefore, D-LDA was proposed to process data directly in the original high-dimensional input space by modifying the simultaneous /$ - see front matter & 10 Elsevier B.V. All rights reserved. doi: /j.neucom Please cite this article as: Y. Cao, et al., LIFT: A new framework of learning from testing data for face recognition, Neurocomputing (11), doi: /j.neucom

2 2 Y. Cao et al. / Neurocomputing ] (]]]]) ]]] ]]] diagonalization procedure of the LDA method [57]. DF-LDA [31] combines the strengths of the D-LDA and the F-LDA methods by conducting D-LDA and F-LDA sequentially. Based on LDA, RDA was developed in [13] by using a regularized discriminate scheme instead of optimizing the Fisher index. This scheme was used to solve the small sample-size problem and evaluated on three popular databases, ORL, Yale, and Feret database. However, this method suffers the drawback of high computational load. In [27], the Gabor wavelets were used for face recognition within a dynamic link architecture (DLA) framework. Gabor jets were first calculated and then used for a flexible template comparison between the resulting image decompositions. A Gabor Fisher classifier was proposed in [28], in which an augmented Gabor feature vector was derived from the Gabor wavelet representation of the face images. This method is illustrated to be robust to the variations in illumination and facial expression. Recently, new approaches and methods have been presented in literature. In [52], an iterative algorithm was proposed to rearrange elements of the data matrix in order to maximize intramatrix correlation. This algorithm was extended to supervised learning problems and simulation results demonstrated the effectiveness of the proposed algorithms. Another iterative method, adaptive regularization-based semi-supervised discriminant analysis with tensor representation (ARSDA/T) and its vector-based variant (ARSDA/V) were presented in [51]. In these algorithms, graph Laplacian regularization was used based on the data representation in the low-dimensional feature space. In [54], two null space-based schemes, NS2DLDA and NS2DMFA, extended from the traditional 2-dimensional LDA and MFA algorithms were presented. The proposed schemes could solve the convergence problem and the experiment results over CMU PIE and FERET datasets showed superior performance of the proposed methods when compared to the traditional approaches. Spatially constrained earth mover s distance (SEMD) was used in [53] in order to improve the robustness of the face recognition algorithms against image misalignments. The distances were treated as features in a Kernel Discriminant Analysis framework. Experiments over three benchmark face datasets illustrated the effectiveness of the proposed distance measure approach. In [55], a concurrent subspaces analysis method for object reconstruction and recognition was proposed. A high order tensor object was encoded in multiple concurrent subspaces that were learned by an iterative procedure sequentially. Simulation results on four popular face datasets showed that CSA outperforms the traditional PCA. Another key aspect for face recognition is the underline learning algorithms. For instance, the learning methods such as neural networks, support vector machines (s), k-nearest neighbors (KNN), among others, have been studied extensively in the community for face recognition. For instance, in [15], a high speed face recognition strategy using radial basis function (RBF) neural networks was proposed. This system involves two subtasks: feature extraction based on the discrete cosine transform (DCT) and Fisher s linear discriminant (FLD) analysis, and face classification using RBF neural networks. 
Simulation results on the Colorado State University (CSU) Face Identification Evaluation System and the Yale Database illustrates that the proposed method can reduce the computational cost and achieve competitive recognition accuracy compare to other face recognition methodologies, such as pseudo-2d hidden Markov models, probabilistic decision-based neural network, among others. A novel neural architecture, PyraNet, was proposed in [39] for visual pattern recognition. PyraNet consists of two layers: a pyramidal layer used for feature extraction and reduction, and a one-dimensional layer used for classification. Five training methods including gradient descent, gradient descent with momentum, resilient backpropagation, Polak Ribiere conjugate gradient, and Levenberg Marquadrt are analyzed in PyraNet for visual pattern recognition. In [], binary s are used to tackle the face recognition problem. One-against-one technique and a bottom-up binary tree structure were employed to solve a multiclass face recognition problem. Refs. [24] and [25] also discussed s in the context of face recognition. The authors argued that s can capture the relevant discriminatory information from the training data and provide superior learning performance compare to other classification methods such as Euclidean distance and normalized correlation method. However, if the data have been preprocessed by some feature extraction techniques that can capture the discriminatory information, such as FLD, s may not be able to outperform other classification methods. In [33], nearest neighbor classifier was developed for face recognition, in which a new linear feature extraction technique was proposed by transferring the problem of finding the optimal linear projection matrix in feature extraction to a classification problem that is solved by AdaBoost algorithm and multitask learning theory. In [26], a face recognition scheme was proposed by combining wavelet decomposition, Fisherface method, and fuzzy integral. The wavelet decomposition and Fisherface method are used to extract important features from the image, and fuzzy integral method is used to combine multiple classifiers that are trained on different subspaces generated by the wavelet decomposition. The effectiveness of the proposed method is indicated by simulation results over the Chungbuk National University face database and Yale database. In this paper, we assume a learner is provided with limited training data, whereas a large amount of testing data. Under this scenario, we propose a new learning methodology, Learning From Testing data (LIFT) framework, by exploring useful information from the accessible testing data to facilitate the final decisionmaking processes. Meanwhile, since the proposed framework has a modularized structure, the method in each module of the framework can be replaced by other approaches. For instance, in this paper, we substitute the one-against-all strategy with two other transductive learning methods, consistency method [63,61] and LRGA method [56], in the data selection step of the proposed framework. Consistency method and LRGA method employ manifold learning and are developed based on the global patterns and local structures in the data. The rest of the paper is organized as follows. Section 2 formulates the problem addressed in this paper. Section 3 presents the details of the LIFT approach for face recognition problems. System level framework and a learning algorithm are proposed in this section. 
Experimental results on five popular face databases and statistical analysis of these results are presented in Section 4 to show the effectiveness of this method. Meanwhile, the performance of the variants with consistency method and LRGA method over the five face datasets are presented. In Section 5, a brief analysis of the data quality is provided. Finally, a conclusion and a brief discussion on future research directions are outlined in Section Problem formulation In the traditional face recognition problems, we generally assume an adequate and representative training data is available to develop the decision boundaries for future prediction. However, in many realworld applications, collecting and acquisition of labeled face images is often expensive and time consuming. Meanwhile, such image collection process normally requires the efforts of experienced human annotators, which is not suitable for the automated face recognition systems. This introduced the semi-supervised learning scenario [64,42]. Generally speaking, the key idea of semi-supervised learning is to exploit the unlabeled training examples together with the labeled ones to modify and refine the hypothesis to improve learning accuracy [64,37,10,]. For instance, self-training methods [] first develop an initial classifier with labeled data examples alone. Then this classifier is used to recover the unlabeled data examples and append them to the labeled data to retrain the classifier. This Please cite this article as: Y. Cao, et al., LIFT: A new framework of learning from testing data for face recognition, Neurocomputing (11), doi: /j.neucom

3 Y. Cao et al. / Neurocomputing ] (]]]]) ]]] ]]] 3 procedure is repeated, and each time the most confident unlabeled examples will be labeled with the estimated labels. In this way, the classifier uses its own knowledge to teach itself iteratively. Some other representative work includes co-training methods [36,58,6,44], semi-supervised support vector machines [11,3],graph-based methods [62,5], EM with generative mixture models [38,], among others. In many applications, it is not uncommon that only scarce labeled training images are available originally, whereas large amount of unlabeled images become available at the online testing stage. For instance, in [46], an active learning paradigm was presented, in which the testing phase was regarded as the beginning of a machine learning experiment instead of the end in the traditional approaches. Therefore, the testing data were used iteratively to evaluate the learning process and, if necessary, construct the new training dataset in order to cover the instance space more completely. This learning paradigm was illustrated with a case study of the robotic soccer game. A semisupervised PCA-based face recognition algorithm based on selftraining was proposed in [41]. Unlabeled images are used to update the eigenspace and the templates in order to improve the performance of the face recognition systems. In our previous study, we proposed an iterative learning strategy of the incremental semisupervised learning problem by adaptively recovering the labels for the testing data that become available incrementally [9]. Motivated by these ideas, in this paper we consider the following face recognition problem: given inadequate labeled training images, can one use the unlabeled testing images to improve the recognition performance? To address this problem, we propose a novel face recognition framework by recovering the labels of the testing images and moving the most confidently recovered testing images into the training set to facilitate learning and recognition. To our best knowledge, this is the first study to regard the unknown testing images as a new source of information in the face recognition problems. One of the main contributions of this paper is that we provide a new direction of understanding the semi-supervised learning. Compared to our previous work in [9], in this article we develop a general-purpose learning framework by combining the feature extraction/reduction method, such as PCA, the one-against-all strategy, and the computational intelligence methods, such as neural networks and support vector machines as discussed in this paper. These techniques enable the proposed framework to deal with the high-dimensional and multiple-class databases effectively, which makes the proposed framework suitable for most face recognition problems. We investigate the use of the proposed method in the context of the face recognition problems, and test it on five popular face databases to illustrate its effectiveness. In this work, we also analyze several aspects of the data quality of the recovered testing data, such as the size, the accuracy rate, and the error type, in detail and show the impact of these attributes of the recovered data on the performance of the final decision-making processes for face recognition. Consider an original training dataset D tr with n tr samples, which can be represented as fx q,y q g,q ¼ 1,...,n tr, where x q is a face image sample and y q AY ¼f1,...,Cg is the subject identity label associated with x q. 
We assume that the testing dataset D te with n te images is available without the identity labels, i.e., D te can be represented as fx p g,p ¼ 1,...,n te. Moreover, we assume that n te is much greater than n tr. Due to the inadequate training dataset, the hypothesis built on D tr cannot provide satisfactory prediction performance. Therefore, the objective here is to design an effective learning framework to exploit the useful information from the testing samples in order to improve the face recognition performance. Before we proceed to the details of the proposed framework, we would like to note the major differences between the problem we aim to tackle in this paper compared to the traditional semi-supervise learning problems. In the traditional semi-supervised learning scenario, one aims to exploit the unlabeled training examples to benefit the learning process, which can be accomplished based on the labeled training data information. In this paper, we are interested in finding potential useful information from the testing data to improve the decision-making process. In addition, we investigate the impact of the data quality of the recovered testing data on the accuracy performance of the face recognition system. From our observation, three attributes of the recovered testing dataset greatly affect the system performance, including the sample sizes, the error rates, and the error types. Various experiments on five frequently used face benchmarks are used to demonstrate the effectiveness of the proposed framework by using neural networks and s with different kernel functions as thebaseclassifier. 3. LIFT: learning from testing data framework We propose the LIFT learning framework as illustrated in Fig. 1. Briefly speaking, the LIFT framework consists of three phases: feature extraction, data selection, and final training. All the images first go through a preprocessing procedure for dimensionality reduction to facilitate learning. Then the one-against-all technique with a base learning model is developed to estimate the identity Fig. 1. The proposed LIFT system diagram. Please cite this article as: Y. Cao, et al., LIFT: A new framework of learning from testing data for face recognition, Neurocomputing (11), doi: /j.neucom

4 4 Y. Cao et al. / Neurocomputing ] (]]]]) ]]] ]]] labels of the testing images. By adding the most confident testing images with the estimated labels into the original training dataset, the final classification hypothesis will be built on the expanded training dataset. We will present this system in detail in the following sections Feature extraction Similar to many of the existing face classification research, we use the PCA method for feature extraction to handle the highdimensional facial database. In PCA, feature extraction is conducted on the original face images in order to find the subset of basis images and in this new feature space, the original images can be represented by the coordinates that are uncorrelated [48]. According to the well-known eigenface methods proposed in [48], we generate the eigenfaces that correspond to the eigenvectors associated with the dominant eigenvalues of the facial image covariance matrix. These eigenfaces define the new feature space that greatly reduces the dimensionality of the original space, which allows the efficient learning from the reduce feature space. Specifically, we take the first N principal components and generate the subspaces with dimensionality of N for all face images. After this phase, each face image is expressed as a pair of fx i,y i g, where x i is an image vector with size of N, and for the images in the training set, y i AY ¼f1,...,Cg is the class identity label associated with x i, while for those in the testing set, y i ¼0 stands for an unknown label. In our current experiments, we set N to 100. We would like to note that other dimensional reduction methods can also be integrated into the proposed LIFT framework. Interested readers can find more details on this issue in [1,2] Data selection Due to the inadequate training samples, the classifier obtained based on such limited training data may not provide accurate and robust classification performance. The key question is whether one can take advantage of the testing data itself to benefit the learning process. To this end, we propose to use the one-against-all technique to estimate and recover the labels of the testing images to augment thetrainingdatasize. One-against-all method [4,29] is a standard technique to solve multiclass classification problems by transforming a multiclass classification problem to multiple binary classification problems. By focusing on one class each time, the one-against-all method can provide well-suited classification capability. For class label i, we partition the training dataset D tr into two subsets: D i tr that contains all the examples with label i and D i tr that contains all the examples that do not belong to class i. All the examples in D i tr are labeled as 1 and all the examples in D i tr are labeled as 2. Then a hypothesis h i is trained based on the newly labeled training data. Once the hypothesis h i is developed, all the testing examples are applied to the hypothesis to predict if these examples belong to class i or not. If the recovered label is 1, then we consider that this example may belong to class i and this example is added to the recovered testing dataset D i re,otherwise, this example is skipped for class i and may be evaluated for other class labels. Note that any testing example that is predicted to two or more different classes will be excluded from the recovered testing datasets. 
Finally, the recovered testing datasets for all labels are combined to form the recovered testing dataset D re as D re ¼ D 1 re [ D2 re [...[ DC re. Fig. 2 illustrates an example of the one-against-all technique by considering the class 3 data of the Yale face database used in this paper. We first divide all the training images into two groups, those that belong to class 3 and those that do not belong to class 3. The hypothesis h 3 is trained to predict the class 3 and non-class 3 examples. Then, this hypothesis is used to evaluate the testing dataset. Those that are Fig. 2. An example of the one-against-all technique. (a) The decision boundary of hypothesis h 3 based on the training data; (b) the prediction of the testing data based on the hypothesis h 3. classified as class 3, i.e., the images left to the h 3 decision boundary, are added to the training dataset. One should note that due to the limited learning capability of the learning method used to generate h i,thefalse positive failure (FP) (i.e., predicting an example as class i when the correct label is not i) and false negative failure (FN) (i.e.,predictingan example not as class i when the correct label is i) may occur when evaluating the testing data. For instance, the circled image left to the decision boundary h 3 is a false positive example. The correct label of this image is class 2 but it is recovered incorrectly as class 3. On the other hand, the circled image right to the h 3 boundary is a false negative misclassification, i.e., the correct label of this image is class 3 while it is misclassified not to class 3. Due to the existence of the false positive and false negative failures, the recovered data from the testing dataset may contain some misclassified examples. In this case, the inaccurate information learned from the testing data may undermine the final prediction performance. We will discuss the impact of the quality of the recovered data on the classification performance of the system in further detail in Section Final training Based on the discussions in Sections 3.1 and 3.2, we have three datasets, the training dataset D tr, the testing dataset D te, and the recovered dataset D re. We add D re into D tr and form an augmented training set ^D tr : ^D tr ¼ D tr [ D re. Based on ^D tr, we develop the final hypothesis h f for the final face recognition. The objective of the LIFT framework is to design an effective learning methodology to exploit the useful information from the testing samples in order to improve the face recognition performance. The main procedure of the learning framework is summarized as follows. Algorithm 1 (LIFT-Learning Algorithm). Input: Initial training dataset D tr ¼fx q,y q g,q ¼ 1,...,n tr, where x q is a face image sample and y q AY ¼f1,...,Cg is the subject identity label associated with x q ; Available testing dataset D te ¼fx p g,p ¼ 1,...,n te ; Recovered testing dataset D re that is empty initially D re ¼ F; Learning algorithm Learn1 used in the data selection phase; Learning algorithm Learn2 used to generate the final hypothesis; Procedure: Preprocessing (feature extraction) (1) PCA is performed to all images in D tr and D te, the first N principal components are chosen to transform all images into vectors with length N; Please cite this article as: Y. Cao, et al., LIFT: A new framework of learning from testing data for face recognition, Neurocomputing (11), doi: /j.neucom

5 Y. Cao et al. / Neurocomputing ] (]]]]) ]]] ]]] 5 Learning from testing data (data selection) Do for each potential class label i,iay ¼f1,...,Cg: (1) Partition D tr into D i tr and Di tr where D i tr ¼ffx k,y k g : fx k,y k gad tr,y k ¼ ig D i tr ¼ffx,y g : fx,y gad tr,y aig (2) Label all examples in Di tr as class 1 and all examples in Di tr as class 2 and form a binary classification training set, ^D i tr ¼ffx k,1g : y k ¼ ig[ffx,2g : y aig. (3) Train Learn1 on ^D i tr and return a hypothesis h i. (4) Apply the testing dataset D te to h i, and return the predicted labels ^y q. Add all testing examples that are predicted to class 1(i.e., class i), D i re ¼ffx q,ig : x q AD te, ^y q ¼ ig, into D re : D re ¼ D re [ D i re ð2þ Final training (1) Exclude any testing example that is recovered to two or more different classes from D re. (2) Combine D tr and D re to form the augmented training dataset ^D tr : ^D tr ¼ D tr [ D re ð3þ (3) Train Learn2 on ^D tr and return the final hypothesis H. Output: the final hypothesis H. Fig. 3 visualizes the main procedure of the proposed LIFT learning framework. In the traditional face recognition approaches, the training images and the testing images are handled separately. Normally, the hypothesis is obtained only based on the training images and applied to the testing images to predict the labels. Instead, the LIFT learning framework proposed in this paper aims to take advantage of the vast amount of the unlabeled testing images and learn from this unknown information. The recovered testing data can be integrated into the training process and largely improve the accuracy and robustness of the final hypothesis. Since many face recognition problems are characterized by the inadequate labeled training images and the large amount of testing images, we believe that this idea may provide important new insights to the face recognition applications. One should also note that LIFT is a general learning framework allowing a wide range of choices of the base classification models for Learn1 and Learn2. For instance, different kinds of the base learning algorithms, such as neural networks, s, decision tree, among others, can be integrated into this framework. Furthermore, the users can also choose different learning schemes for Learn1 and Learn2 in different applications. For example, when only weak learners that can merely do better than random guessing are available, then bootstrap aggregating (bagging) or boosting algorithm can be employed to construct a much stronger learner from these weak learners [16,17,7]. This provides the flexibility of using this framework as a general learning methodology in a wide range of real-world applications. ð1þ 4. Experimental results and analysis Five popular face benchmarks, the Yale face (YALE) database [2], the extended Yale face (EYB) database B [19], Cambridge ORL face (ORL) database [43], the CMU PIE face (PIE) database [] and the Japanese Female Facial Expression (JAFFE) database [32], are used to investigate the performance of the proposed LIFT learning framework. We obtained these databases from [14] and [23].TheYaleface database contains 165 grayscale face images for 15 persons with 11 images for each person. The extended Yale face database B consists of 2414 images of 38 subjects around 64 near frontal images under different illuminations per individual. The ORL database of faces consists of 0 images, which are 10 different images for distinct subjects. 
The CMU-PIE database contains images of 68 subjects around 170 images for each subject. The JAFFE database contains 213 images of seven facial expressions (angry, disappointed, fearful, happy, sad, surprised and neutral) posed by 10 Japanese female models. Each image in the first four databases is originally cropped into pixels and expressed as a 1024-dimensional vector. For JAFFE, the images are originally pixels. We first crop the images in JAFFE into pixels, and then use a 4 4averagefilter to reduce the JAFFE to pixels that are represented as a 1024-dimensional vector as the other four datasets. Table 1 summarizes the benchmark characteristics and Fig. 4 shows some examples of the databases used in the experiments. Because PCA is independent on the identity labels of the data, we can first apply the PCA paradigm on all the images to extract the face features from each image and compress the image vectors significantly. In our current experiments, we choose the first 100 principle components and transform each image to a 100-dimensional vector. Then, we randomly partition the whole image dataset into a training dataset and a testing set. For example, in the Yale Face database, for each subject, we randomly select three images as the training samples and use the remaining eight images as testing images. Therefore, a total of images are used for training and the other 1 images for testing. Table 2 shows the configurations of the training sets and the testing sets for the five image databases. Because some face datasets are not well-balanced, the numbers of the testing samples for each class in these datasets may not be the Table 1 The benchmark characteristics used in this paper. # example # class # feature YALE EYB ORL PIE JAFFE Fig. 3. Block diagram of the main procedure of the proposed LIFT learning framework. Please cite this article as: Y. Cao, et al., LIFT: A new framework of learning from testing data for face recognition, Neurocomputing (11), doi: /j.neucom

6 6 Y. Cao et al. / Neurocomputing ] (]]]]) ]]] ]]] Fig. 4. Examples of the five face databases used in the experiments. (a) The Yale face database; (b) the extended Yale face database B; (c) the ORL database; (d) the CMU-PIE database; (e) the JAFFE database. Table 2 The configuration of training set and testing set in each experiment. The training set The testing set Each class Total Each class Total YALE EYB ORL PIE JAFFE Table 3 Testing error performance comparison (in percentage). NN (Linear kernel) (Poly. kernel) (RBF kernel) Trad. LIFT Trad. LIFT Trad. LIFT Trad. LIFT YALE EYB ORL PIE JAFFE same. For example, for PIE benchmark, there is only 126 testing samples for the class 38 subject, whereas 1 for other classes; for EYB benchmark, the numbers of the testing samples for each class range from 44 to 49; for JAFFE benchmark, the numbers of the testing samples for each class range from 16 to 19. In our experiments, we verify our framework by using two different sets of the base algorithms, the neural networks with multi-layer perceptron (MLP) structure and. Moreover, we adopt the same base algorithms for Learn1 and Learn2. In other words, in the first set of experiments, we use neural networks for Learn1 and Learn2, in which the number of hidden neuron is, and the number of input neurons and output neurons are equal to the number of dimensions and classes for each dataset, respectively. Sigmoid function is used for the activation function and backpropagation is used to train the network. Parameter settings for the neural networks include a learning rate of and a training cycle of 00. In the second set of experiments, we use (linear kernel, polynomial kernel with degree of 3, and radial basis function (RBF)) for both Learn1 and Learn2. We compare our LIFT learning framework to the traditional learning scheme in which the final hypothesis is generated only based on the training dataset with the same base learning model. Tables 3 and 4 show the averaged testing error performance and error standard deviations of the results of the 100 random runs for the LIFT algorithm as well as the traditional learning method for the five benchmarks. From this table, one can see that the proposed LIFT framework can provide better classification accuracy performance over the traditional learning method. Table 4 Testing error standard deviation (in percentage). In order to further investigate the performance improvement of the proposed framework over the tradition method, we compare the statistical characteristics of the results of all the 100 runs from both methods by using two difference testing schemes, hypothesis testing of the average values and box plot. In the hypothesis testing, we calculate the mean and the standard deviation that are shown in Tables 3 and 4 using the following equations [34,21]: m ¼ 1 n X n i ¼ 1 NN err i (Linear kernel) sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi n P n i ¼ 1 s ¼ err2 i ðp n i ¼ 1 err iþ 2 nðn 1Þ (Poly. kernel) (RBF kernel) Trad. LIFT Trad. LIFT Trad. LIFT Trad. LIFT YALE EYB ORL PIE JAFFE ð4þ ð5þ Please cite this article as: Y. Cao, et al., LIFT: A new framework of learning from testing data for face recognition, Neurocomputing (11), doi: /j.neucom

7 Y. Cao et al. / Neurocomputing ] (]]]]) ]]] ]]] 7 We formulate the hypothesis as: Null hypothesis: H 0 : m 1 ¼ m 2 Alternative hypothesis: H 1 : m 1 am 2 Table 5 Hypothesis testing results of LIFT and traditional method. NN (Linear kernel) (Poly. kernel) ð6þ ð7þ (RBF kernel) YALE EYB ORL PIE JAFFE The test statistic is calculated as follows: Z ¼ m m 1 2 qffiffiffiffiffiffiffiffiffiffiffiffiffiffi s 2 1 n 1 þ s2 2 n 2 For a two-tailed test, we will reject H 0 if jzjo2:33. (2.33 is for a two-tailed test where the results are significant at a level of 0.02). Table 5 shows the hypothesis testing results. From Table 5 we can see all results are greater than 2.33 and most of them are even greater than 10. Therefore, we accept the alternative hypothesis H 1, which means there is statistically significant difference in the classification performance of the traditional method and the proposed LIFT framework. In others words, LIFT can significantly improve the recognition performance of the tradition method. The Boxplot method is a standard technique to depict groups of numerical data by presenting their 5-number summary including the minimum and maximum range values, the upper and lower quartiles, and the median [47,]. We have investigated the boxplot results for all the five face image sets for the tradition method and LIFT framework using MLP and s (linear kernel, polynomial kernel, and RBF kernel). Figs. 5 8 provide several snapshots of this analysis for the EYB database using the four base learners. The left parts of these figures are the error ð8þ 55 Traditional LIFT run 15 NN_Traditional NN_LIFT Fig. 5. Error performance and Boxplot results on the EYB database using NN Traditional LIFT run _Traditional (linear kernel) _LIFT (linear kernel) Fig. 6. Error performance and Boxplot results on the EYB database using (linear kernel). Please cite this article as: Y. Cao, et al., LIFT: A new framework of learning from testing data for face recognition, Neurocomputing (11), doi: /j.neucom

8 8 Y. Cao et al. / Neurocomputing ] (]]]]) ]]] ]]] Traditional LIFT run _Traditional (polynomial kernel) _LIFT (polynomial kernel) Fig. 7. Error performance and Boxplot results on the EYB database using (polynomial kernel) Traditional LIFT run 12 _Traditional (RBF kernel) _LIFT (RBF kernel) Fig. 8. Error performance and Boxplot results on the EYB database using (RBF kernel). Table 6 Numerical characteristics of the LIFT framework and the traditional strategy on the EYB database (In percentage). NN (Linear kernel) (Poly. kernel) (RBF kernel) Trad. LIFT Trad. LIFT Trad. LIFT Trad. LIFT Largest non-outlier Upper quartile Median Lower quartile Smallest non-outlier rates of the 100 runs and the right parts are the boxplot results of the 100 runs. Table 6 summarizes the corresponding numerical results of the box plot methods on the EYB database. One can see each numerical result of the LIFE framework is smaller than that of the traditional method. In other words, these statistical analysis results indicate that the proposed LIFT framework can greatly improve the classification performance over the traditional learning method. As we discussed in Section 1, the proposed framework has a modularized structure, which means that the methods used in the modules of the framework can be replaced seamlessly. For instance, in the data selection step, instead of using the oneagainst-all strategy, we can use other transductive learning algorithms to explore useful information from the testing dataset. In this work, we investigate the use of the consistency method [63,61] Please cite this article as: Y. Cao, et al., LIFT: A new framework of learning from testing data for face recognition, Neurocomputing (11), doi: /j.neucom

9 Y. Cao et al. / Neurocomputing ] (]]]]) ]]] ]]] 9 and LRGA method [56] in the data selection step. We use the JAFFE database and (linear kernel, polynomial kernel, and RBF kernel) as learners in the final training step to compare the performance of the LIFT framework with consistency method and LRGA method to the traditional approach in terms of the error rates, standard deviation of the error rates, and hypothesis testing in Table 7. Cross-validation is used to find the parameters used in these two methods. For consistency method, a is set to 0.25, which is consistent with the discussion in [56], and s is set to For LRGA, k is set to 2, and l is set to 10. In Fig. 9, we investigate the error performance of LIFT framework with LRGA method using different k values. The simulation results show that in this particular scenario, better performance can be obtained with a smaller k value. This is probably due to the small size of the datasets and large Table 7 Simulation results of the LIFT framework with consistency method and LRGA method compared to the traditional strategy on the JAFFE database and learners (in percentage). Learner Linear kernel Poly. kernel RBF kernel Trad. Error rate Std Consistency Error rate Std Hypothesis testing vs. Trad LRGA Error rate Std Hypothesis testing vs. Trad number of class categories. From Table 7 one can see that there are significant differences between the two variants of the LIFT framework with the consistency method, LRGA method, and the traditional approach, respectively, with confidence level Furthermore, we compare the proposed framework with a commonly used semi-supervised learning scheme, self-training method [], on the EYB dataset with neural network as the base learner. The experiment is designed in the following way. The EYB dataset is divided to three datasets: the labeled training dataset with 0 images, the unlabeled training dataset with 1057 images, and the testing dataset with the rest 1057 images. In the self-training scheme, we use the labeled training dataset to recover the labels of the unlabeled training dataset, and then with the labeled data and the recovered unlabeled data, a final classifier is trained and applied to the testing dataset. In our learning scheme, only the labeled training data and the testing data are used. Specifically, the labeled training data are used to recover the labels of the testing data, and the information explored from the testing dataset are integrated to the final learning procedure. Table 8 illustrates the simulation results for both methods. We also present the simulation results for the tradition approach, in which only the labeled training data are used to develop the final classifier. Here, and are the hypothesis testing results Table 8 Simulation results of the LIFT framework and self-training scheme compared to the traditional strategy on the EYB database and NN learners (in percentage). Scheme Error rate Standard deviation Hypothesis testing Trad Self-training LIFT Fig. 9. Error performance of the LIFT framework for LRGA method using different k on the JAFFE database and learners. Please cite this article as: Y. Cao, et al., LIFT: A new framework of learning from testing data for face recognition, Neurocomputing (11), doi: /j.neucom

10 10 Y. Cao et al. / Neurocomputing ] (]]]]) ]]] ]]] among the traditional approach, self-training, and LIFT, respectively, and 2. is the hypothesis testing result between self-training and LIFT. These results show that both the self-training scheme and LIFT outperform the traditional approach. LIFT can achieve better performance compared to the self-training scheme. 5. Data quality for LIFT framework An important question still remains for the proposed framework, i.e., to what extend or under what assumption that the proposed method can benefit the final decision-making processes? In this section, we provide some discussions of the impact of the data quality learned from the testing set on the performance of the face recognition system. Generally speaking, high quality of the recovered data can benefit learning, whereas the recovered testing data with low quality may indeed degrade the performance of the proposed system since it may simply introduce more noise into the learning system. Here we use the YALE benchmark and with a linear kernel as an example to explore how the quality of the recovered data impacts the performance of the LIFT framework. To do this, we explicitly add the recovered testing data into the training set and then train a final hypothesis based on the expanded training set. We adjust three attributes of the added testing dataset, i.e., the sample size, the accuracy rates, and the error type, to investigate their influences to the LIFT framework Size and accuracy rate Fig. 10 shows the adjustments of the sample size and accuracy rates to test their influences to the LIFT framework. The x-axis stands for the size of the recovered testing dataset to be added and the y-axis stands for the accuracy of such recovered testing data. We partition all the testing data (1 samples in this case) into chunks, each chunk with 6 samples. The first k chunks of the testing data are combined to generate the kth block. For example, block 1 contains six samples from chunk 1; block 15 contains 90 samples from chunk 1 to chunk 15. Along the y-axis, we explicitly change the accuracy rates of the recovered class labels from 0% to 100% with step size of 5%. For instance, in block 15 that contains 90 samples, when the accuracy rate is %, then 36 samples will be labeled with correct class labels and the rest 54 samples will be labeled with incorrect class labels Error type When we generate incorrect class labels for the recovered testing data to adjust the accuracy rate, we design two approaches to generate the incorrect labels: random error and biased error. In the random error approach, we randomly pick a class label other than the actual class label for a recovered testing sample. For example, for the recovered testing data x t with an actual class label y t, we randomly generate a label ^y t, ^y t Afy i : y i AY,y i ay t g and use this incorrect label as the recovered label for x t. On the other hand, for the biased error method, we directly use the misclassified label from LIFT framework as the recovered label for x t. Fig. 11 provides an example of how to generate incorrect labels in these two approaches. In this example, we assume that the size of the block is 12 and the accuracy rate is 75%, therefore, the recovered testing set contains nine samples with correct class labels and 3 with incorrect class labels. 
In the random error approach, all incorrect labels are generated by randomly picking a class label other than the correct label, whereas in the biased error approach, since the last three samples are incorrectly estimated by Learn1 in the LIFT framework in our experiments, then we directly use the incorrect labels estimated by Learn1 as the recovered labels for the last three samples. According to these discussions, Fig. 12 illustrates the results of the two sets of the experiments. In each set of experiment, we evaluate the recognition performance with respect to the changing sample size and accuracy rate of the added testing dataset when different error type methods are adopted. Specifically, in Fig. 12(a), biased error method is used, whereas in Fig. 12(b), random error method is used. In each figure, the x-axis and y-axis are the size and the accuracy of the added testing dataset, respectively, and the z- label is the final classification error rate of the final hypothesis. Since the result of the traditional approach e traditional is a constant value that is independent of the added testing data, therefore, we can draw e traditional as a plane parallel to xy-plane that is illustrated in Fig. 12(a) and (b). From Fig. 12, we can draw the contours of the differences of the performance between the approach based on the augmented training set and the traditional approach, i.e., De ¼ e LIFT e traditional where e LIFT are the error rates of the system based on the augmented training set (the LIFT framework) and e traditional are the error rates of the traditional approach based on the original training dataset. We illustrate the contours of De when biased error and random error method are used in Fig. 13(a) and (b), respectively. From Fig. 13, we can see that if the accuracy rate is high, with the increasing size of the recovered testing dataset, the performance will push to the upper-right corner with larger negative value of De. This means that the proposed LIFT framework can achieve better performance of recognition compare to the traditional approach. However, if the accuracy rate of the added testing Fig. 10. Examples of the adjustments in the size and the accuracy rate. Fig. 11. Examples of the adjustments in the error type. Please cite this article as: Y. Cao, et al., LIFT: A new framework of learning from testing data for face recognition, Neurocomputing (11), doi: /j.neucom

11 Y. Cao et al. / Neurocomputing ] (]]]]) ]]] ]]] 11 Fig. 12. The error rate results of the two sets of the experiments: (a) biased error; (b) random error. Fig. 13. Contour of De: (a) biased error; (b) random error. data is too low, larger sizes of the recovered testing dataset result in even worse performance, i.e., the upper-left corner. Meanwhile, contour curve with value of 0 can be considered as the performance boundary, i.e., any point left to the boundary stands for that the added testing data degrade the original recognition system, while for any point right to the boundary, the recovered testing data can benefits the final recognition performance. Furthermore, this boundary provides us a criterion to decide under what data quality the recovered testing data should be added to the training set. In fact, we can use cross-validation to obtain this contour to provide a criterion for this purpose. Figs. 14 and 15 show several snapshots of the error performance with the fixed accuracy rate and size of the added dataset, respectively. In each subfigure of Fig. 14, the accuracy rate of the added dataset is fixed. The x-axis represents the size of the added testing data increasing from 6 to 1, and the y-axis represents the error rate performance of the final system. In each subfigure of Fig. 15, the size of the added dataset is fixed. The x-axis represents the accuracy rate of the added testing data improving from 0% to 100%, and the y-axis represents the error rate performance of the final system. The results from both error type methods discussed in Section 5.2 (random error and biased error) are showed in each figure, as well as those from the traditional approach based only on the training set. From Fig. 14 one can see, if the accuracy rate of the added testing data is below %, the final recognition performance will indeed decrease with the increase of the number of augmented data. On the other hand, if the recovered accuracy is above %, then increasing the size of augmented data will benefit the final recognition process. If the accuracy rate of the added testing data is between % and %, increasing the size brings worse performance initially then further increasing the size benefits the performance. When the size of the added testing data is fixed as shown in Fig. 15, then increasing in the accuracy rate of the added testing data always benefits the final recognition. Also, another interesting phenomenon observed is that the results from random errors are always better than those from biased errors, except when the number of the added data is very large (i.e., the last subfigure in Fig. 15), both random error and biased error methods give the same level of performance (the two lines are overlapped). Please cite this article as: Y. Cao, et al., LIFT: A new framework of learning from testing data for face recognition, Neurocomputing (11), doi: /j.neucom

12 12 Y. Cao et al. / Neurocomputing ] (]]]]) ]]] ]]] 100 0% % % 70 % 70 random error biased error traditional method % 55% 55 % 65% % % 90% 100% Fig. 14. Some snapshots of the error performance with respect to the size when the accuracy rate is fixed in each subfigure. 6. Conclusion We propose the LIFT framework for face recognition problems in this paper. The key idea of this approach is to reinforce the final learning system based on the extra information learned from the testing data distribution. In order to effectively explore such useful information from the testing data, we use one-against-all technique to recover the labels of the testing examples. By adding the recovered testing examples into the training set, a more reliable and robust hypothesis can be developed based on the expanded training set. Neural network with multi-layer perceptron and support vector machines with three different kernels are integrated into the proposed learning framework. Furthermore, we investigate two variants of the proposed algorithm by integrating two other transductive methods, consistency method and LRGA method into the LIFT framework. Simulation results on five face benchmarks, including the Yale database, the extended Yale face database B, Cambridge ORL face database, the CMU PIE face database and the Japanese Female Facial Expression database, are used to demonstrate the effectiveness and robustness of the proposed learning methodology. There are several interesting directions that can be further studied. For instance, different feature extraction methods, such as ICA, FLD, and others, can be integrated into the LIFT framework. The influence of different feature extraction methods on classification accuracy and robustness of LIFT framework is an interesting future direction. Second, our framework requires good data quality of the recovered testing data. Therefore, the method used to recover the labels of the testing data is critical for this method. In our current study, we adopt the one-against-all technique in our experiments. It would be interesting to study other mechanisms for the label recovery process for the proposed LIFT framework. For instance, some of the existing semi-supervised learning methods, such as cotraining, self-training, among others, may be integrated into this framework to facilitate the learning process. Furthermore, large scale empirical study of the proposed method across different types of benchmarks will be necessary to fully justify the effectiveness of this framework across different application domains. Currently, we are investigating all these aspects and new results will be reported in future research publications. Motivated by our results in this paper, we believe the essential idea of LIFT, that is to say, the usage of testing data to reinforce the final decision-making process, may provide the community a new angle to address this issue, and can potentially be a powerful method for a wide range of real-world applications. Please cite this article as: Y. Cao, et al., LIFT: A new framework of learning from testing data for face recognition, Neurocomputing (11), doi: /j.neucom

Fig. 15. Some snapshots of the error performance with respect to the accuracy rates when the size is fixed in each subfigure. (The panels compare the biased error, traditional, and random error methods.)

References

[1] M.S. Bartlett, J.R. Movellan, T.J. Sejnowski, Face recognition by independent component analysis, IEEE Transactions on Neural Networks 13 (6) (2002).
[2] P. Belhumeur, J. Hespanha, D. Kriegman, Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997).
[3] M. Belkin, P. Niyogi, V. Sindhwani, Manifold regularization: a geometric framework for learning from labeled and unlabeled examples, Journal of Machine Learning Research 7 (2006).
[4] A. Beygelzimer, J. Langford, B. Zadrozny, Weighted one-against-all, in: Proceedings of the 20th National Conference on Artificial Intelligence (AAAI), 2005.
[5] A. Blum, S. Chawla, Learning from labeled and unlabeled data using graph mincuts, in: Proceedings of the International Conference on Machine Learning (ICML 2001), 2001.
[6] A. Blum, T. Mitchell, Combining labeled and unlabeled data with co-training, in: Proceedings of the Workshop on Computational Learning Theory (COLT 98), 1998.
[7] L. Breiman, Bagging predictors, Machine Learning 24 (2) (1996).
[8] R. Brunelli, T. Poggio, Face recognition: features versus templates, IEEE Transactions on Pattern Analysis and Machine Intelligence 15 (10) (1993).
[9] Y. Cao, H. He, Learning from testing data: a new view of incremental semi-supervised learning, in: Proceedings of the International Joint Conference on Neural Networks (IJCNN 2008), 2008.
[10] O. Chapelle, B. Schölkopf, A. Zien, Semi-Supervised Learning, MIT Press, 2006.
[11] O. Chapelle, V. Sindhwani, S.S. Keerthi, Branch and bound for semi-supervised support vector machines, in: Proceedings of Neural Information Processing Systems (NIPS 2006), 2006.
[12] R. Chellappa, C.L. Wilson, S. Sirohey, Human and machine recognition of faces: a survey, Proceedings of the IEEE 83 (5) (1995).
[13] D.-Q. Dai, P.C. Yuen, Face recognition by regularized discriminant analysis, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 37 (4) (2007).
[14] Four face databases in MATLAB format, [Online]. Available: uiuc.edu/homes/dengcai2/data/facedata.html.
[15] M.J. Er, W. Chen, S. Wu, High-speed face recognition based on discrete cosine transform and RBF neural networks, IEEE Transactions on Neural Networks 16 (3) (2005).
[16] Y. Freund, R.E. Schapire, Experiments with a new boosting algorithm, in: Proceedings of the International Conference on Machine Learning, 1996.
[17] Y. Freund, R.E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences 55 (1) (1997).
[18] Y. Fu, Z. Li, J. Yuan, Y. Wu, T.S. Huang, Locality versus globality: query-driven localized linear models for facial image computing, IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT) 18 (12) (2008).
[19] A.S. Georghiades, P.N. Belhumeur, D.J. Kriegman, From few to many: illumination cone models for face recognition under variable lighting and pose, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (6) (2001).
[20] G. Guo, S.Z. Li, K. Chan, Face recognition by support vector machines, in: Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, 2000.
[21] H. He, J.A. Starzyk, A self-organizing learning array system for power quality classification based on wavelet transform, IEEE Transactions on Power Delivery 21 (2006).
[22] E. Hjelmås, B.K. Low, Face detection: a survey, Computer Vision and Image Understanding 83 (3) (2001).
[23] JAFFE database download, [Online].
[24] K. Jonsson, J. Kittler, Y.P. Li, J. Matas, Support vector machines for face authentication, in: T. Pridmore, D. Elliman (Eds.), BMVC 99, 1999.
[25] K. Jonsson, J. Matas, J. Kittler, Y.P. Li, Learning support vectors for face verification and recognition, in: Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, 2000.
[26] K.-C. Kwak, W. Pedrycz, Face recognition using fuzzy integral and wavelet decomposition method, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 34 (4) (2004).
[27] M. Lades, J.C. Vorbruggen, J. Buhmann, J. Lange, C. von der Malsburg, R.P. Wurtz, W. Konen, Distortion invariant object recognition in the dynamic link architecture, IEEE Transactions on Computers 42 (1993).
[28] C. Liu, H. Wechsler, Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition, IEEE Transactions on Image Processing 11 (4) (2002).
[29] Y. Liu, Y.F. Zheng, One-against-all multi-class classification using reliability measures, in: Proceedings of the 2005 IEEE International Joint Conference on Neural Networks (IJCNN 2005), 2005.
[30] R. Lotlikar, R. Kothari, Fractional-step dimensionality reduction, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (6) (2000).

[31] J. Lu, K.N. Plataniotis, A.N. Venetsanopoulos, Face recognition using LDA-based algorithms, IEEE Transactions on Neural Networks 14 (1) (2003).
[32] M.J. Lyons, J. Budynek, S. Akamatsu, Automatic classification of single facial images, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (12) (1999).
[33] D. Masip, J. Vitria, Shared feature extraction for nearest neighbor face recognition, IEEE Transactions on Neural Networks 19 (4) (2008).
[34] I. Miller, J.E. Freund, Probability and Statistics for Engineers, Prentice-Hall, Englewood Cliffs, NJ.
[35] D.J. Miller, H.S. Uyar, A mixture of experts classifier with learning based on both labelled and unlabelled data, in: Proceedings of Neural Information Processing Systems (NIPS 97), 1997.
[36] T. Mitchell, The role of unlabeled data in supervised learning, in: Proceedings of the International Colloquium on Cognitive Science.
[37] T. Mitchell, The discipline of machine learning, Technical Report CMU-ML, Carnegie Mellon University, 2006.
[38] K. Nigam, A.K. McCallum, S. Thrun, T. Mitchell, Text classification from labeled and unlabeled documents using EM, Machine Learning 39 (2-3) (2000).
[39] S.L. Phung, A. Bouzerdoum, A pyramidal neural network for visual pattern recognition, IEEE Transactions on Neural Networks 18 (2) (2007).
[40] K. Potter, Methods for presenting statistical information: the box plot, in: H. Hagen, A. Kerren, P. Dannenmann (Eds.), Visualization of Large and Unstructured Data Sets, Lecture Notes in Informatics (LNI), Vol. S-4, 2006.
[41] F. Roli, G.L. Marcialis, Semi-supervised PCA-based face recognition using self-training, in: D.-Y. Yeung, J.T. Kwok, A.L.N. Fred, F. Roli, D. de Ridder (Eds.), Structural, Syntactic, and Statistical Pattern Recognition (SSPR/SPR), Springer, 2006.
[42] C. Rosenberg, M. Hebert, H. Schneiderman, Semi-supervised self-training of object detection models, in: Proceedings of the Seventh IEEE Workshops on Application of Computer Vision (WACV/MOTION 05), 2005.
[43] F. Samaria, A. Harter, Parameterisation of a stochastic model for human face identification, in: Proceedings of the 2nd IEEE Workshop on Applications of Computer Vision, Sarasota, 1994.
[44] A. Sarkar, Applying co-training methods to statistical parsing, in: Proceedings of the North American Chapter of the Association for Computational Linguistics on Language Technologies (NAACL 2001), 2001.
[45] T. Sim, S. Baker, M. Bsat, The CMU pose, illumination, and expression (PIE) database, in: Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, 2002.
[46] P. Stone, M. Veloso, Using testing to iteratively improve training, in: Working Notes of the AAAI 1995 Fall Symposium on Active Learning, 1995.
[47] J.W. Tukey, Exploratory Data Analysis, Addison-Wesley, Reading, MA, 1977.
[48] M. Turk, A. Pentland, Eigenfaces for recognition, Journal of Cognitive Neuroscience 3 (1) (1991).
[49] H. Wang, S. Yan, T. Huang, J. Liu, X. Tang, Misalignment-robust face recognition, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008.
[50] M. Wang, X.S. Hua, L.R. Dai, Y. Song, Enhanced semi-supervised learning for automatic video annotation, in: Proceedings of the IEEE International Conference on Multimedia and Expo, 2006.
[51] D. Xu, S. Yan, Semi-supervised bilinear subspace learning, IEEE Transactions on Image Processing 18 (7) (2009).
[52] D. Xu, S. Yan, S. Lin, T.S. Huang, S.-F. Chang, Enhancing bilinear subspace learning by element rearrangement, IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (10) (2009).
[53] D. Xu, S. Yan, J. Luo, Face recognition using spatially constrained earth mover's distance, IEEE Transactions on Image Processing 17 (11) (2008).
[54] D. Xu, S. Yan, L. Zhang, S. Lin, T.S. Huang, Convergent 2D subspace learning with null space analysis, IEEE Transactions on Circuits and Systems for Video Technology 18 (12) (2008).
[55] D. Xu, S. Yan, L. Zhang, S. Lin, H.-J. Zhang, T.S. Huang, Reconstruction and recognition of tensor-based objects with concurrent subspaces analysis, IEEE Transactions on Circuits and Systems for Video Technology 18 (1) (2008).
[56] Y. Yang, D. Xu, F. Nie, J. Luo, Y. Zhuang, Ranking with local regression and global alignment for cross media retrieval, in: Proceedings of the Seventeenth ACM International Conference on Multimedia, 2009.
[57] H. Yu, J. Yang, A direct LDA algorithm for high-dimensional data with application to face recognition, Pattern Recognition 34 (10) (2001).
[58] D. Zhang, W.S. Lee, Validating co-training models for web image classification, in: Proceedings of SMA Annual Symposium, NUS, 2005.
[59] W. Zhao, R. Chellappa, P.J. Phillips, A. Rosenfeld, Face recognition: a literature survey, ACM Computing Surveys (CSUR) 35 (4) (2003).
[60] Z.H. Zhou, M. Li, Tri-training: exploiting unlabeled data using three classifiers, IEEE Transactions on Knowledge and Data Engineering 17 (11) (2005).
[61] D. Zhou, O. Bousquet, T.N. Lal, J. Weston, B. Schölkopf, Learning with local and global consistency, in: S. Thrun, L. Saul (Eds.), Advances in Neural Information Processing Systems, vol. 16, MIT Press, Cambridge, MA, USA, 2004.
[62] D. Zhou, B. Schölkopf, T. Hofmann, Semi-supervised learning on directed graphs, in: Proceedings of Neural Information Processing Systems (NIPS 2005), 2005.
[63] D. Zhou, J. Weston, A. Gretton, O. Bousquet, B. Schölkopf, Ranking on data manifolds, MPI Technical Report 113, Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2003.
[64] X. Zhu, Semi-supervised learning literature survey, Technical Report TR-15, Department of Computer Sciences, University of Wisconsin at Madison, 2007.

Yuan Cao received the B.E. and M.S. degrees from Zhejiang University, China, in 2001 and 2004, respectively, and the M.S. degree from Oklahoma State University, Stillwater, in 2007, all in electrical engineering. He is currently a Ph.D. candidate in computer engineering at Stevens Institute of Technology, Hoboken. His current research interests include pattern recognition, machine learning, and data mining.

Haibo He received the B.S. and M.S. degrees in electrical engineering from Huazhong University of Science and Technology (HUST), Wuhan, China, in 1999 and 2002, respectively, and the Ph.D. degree in electrical engineering from Ohio University, Athens, in 2006. From 2006 to 2009, he was an assistant professor in the Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, New Jersey. He is currently an assistant professor in the Department of Electrical, Computer, and Biomedical Engineering at the University of Rhode Island, Kingston, Rhode Island. His research interests include self-adaptive intelligent systems, machine learning and data mining, computational intelligence, VLSI and FPGA design, and smart grid.
He has served regularly on the organization committees and the program committees of many international conferences and has also been a reviewer for the leading academic journals in his fields. He has also served as a guest editor for several international journals. Currently, he is an Associate Editor of the IEEE Transactions on Neural Networks, an Editor of the IEEE Transactions on Smart Grid, and the Editor of the IEEE Computational Intelligence Society (CIS) Electronic Letter (E-letter).

He (Helen) Huang received the B.S. degree from the School of Electronic and Information Engineering, Xi'an Jiaotong University, China, in 2000, and the M.S. and Ph.D. degrees from the Harrington Department of Bioengineering, Arizona State University, in 2002 and 2006, respectively. She worked as a postdoctoral research associate in the Neural Engineering Center for Artificial Limbs at the Rehabilitation Institute of Chicago from 2006 to 2008. She is currently an assistant professor in the Department of Electrical, Computer, and Biomedical Engineering at the University of Rhode Island. Dr. Huang's primary research interests include neural-machine interfaces, modeling and analysis of neuromuscular control of movement in normal and neurologically disordered humans, virtual reality in neuromotor rehabilitation, and the design and control of therapeutic robots, orthoses, and prostheses. Her specialties lie in machine learning, adaptive control, biomechanical modeling, signal and image processing, and motion analysis. She is a member of the IEEE Engineering in Medicine and Biology Society and the Society for Neuroscience.
