Neurocomputing ] (]]]]) ]]] ]]] Contents lists available at ScienceDirect. Neurocomputing. journal homepage:

Size: px
Start display at page:

Download "Neurocomputing ] (]]]]) ]]] ]]] Contents lists available at ScienceDirect. Neurocomputing. journal homepage: www.elsevier."

Transcription

1 Neurocomputing ] (]]]]) ]]] ]]] Contents lists available at ScienceDirect Neurocomputing journal homepage: LIFT: A new framework of learning from testing data for face recognition Yuan Cao a, Haibo He b,, He (Helen) Huang b a Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ 070, USA b Department of Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, RI 02881, USA article info Article history: Received July 09 Received in revised form 23 September 10 Accepted 18 October 10 Communicated by D. Xu Keywords: Face recognition Semi-supervised learning One-against-all Feature extraction Data quality abstract In this paper, a novel learning methodology for face recognition, LearnIng From Testing data (LIFT) framework, is proposed. Considering many face recognition problems featured by the inadequate training examples and availability of the vast testing examples, we aim to explore the useful information from the testing data to facilitate learning. The one-against-all technique is integrated into the learning system to recover the labels of the testing data, and then expand the training population by such recovered data. In this paper, neural networks and support vector machines are used as the base learning models. Furthermore, we integrate two other transductive methods, consistency method and LRGA method into the LIFT framework. Experimental results and various hypothesis testing over five popular face benchmarks illustrate the effectiveness of the proposed framework. & 10 Elsevier B.V. All rights reserved. 1. Introduction Recently, many new theories and methodologies for face recognition have been developed in the community, and many new algorithms and practical tools have been designed and successfully applied to a wide range of applications, such as biometrics, surveillance, human computer interface, information security, and others [12,59,49,18]. Generally, the face recognition problem aims to identify or verify one or more persons in still or video images of a scene based on a stored database. Three important subtasks are generally considered in the solution to a face recognition problem: face segmentation/detection, feature extraction, and face recognition/ identification. For instance, face segmentation/detection aims to detect and localize an unknown number (if any) of faces from a simple or complex background in a still image or video stream data [22]. In this paper, we focus on the latter two phases and address the face recognition problem as to predict the identity label of a given still face image by designing an effective learning method. Feature extraction is an important step for a successful face recognition approach. Generally speaking, feature extraction techniques for face recognition can be categorized into two types [8]: feature-based matching and template matching (also called holistic [15]) methods. In the feature-based matching approach, the most characteristic face components (eyes, nose, mouth, chin, etc.) Corresponding author. addresses: ycao@stevens.edu (Y. Cao), he@ele.uri.edu (H. He), huang@ele.uri.edu (H. (Helen) Huang) and their features (colors, shapes, positions, etc.) are recognized and extracted within a face image. In the template matching approach, important features are extracted from the images and are represented as a bidimensional matrix of intensity values. Refs. 
[8] and [15] argue that although feature-based approach has many advantages, such as robust performance against rotation/scale and illumination variations, fast computation, and efficient memory utilization, its performance heavily relies on the facial feature detection methods and the quality of individual facial features. On the other hand, since many feature extraction methods, such as the principal component analysis (PCA) [48], independent component analysis (ICA) [1], Fisher s linear discriminant (FLD) or linear discriminant analysis (LDA) [2], and Gabor wavelets [27], have been applied to face recognition problems, the template matching approach has attracted significantly growing attention in the community. For instance, many variants of the aforementioned basic feature extraction methods have been extensively proposed in literature, including the fractional-step linear discriminant analysis (F-LDA) method [], the direct linear discriminant analysis (D-LDA) method [57], the direct fractional-step linear discriminant analysis (DF-LDA) method [31], regularized discriminant analysis (RDA) method [13], among others. In F-LDA [], the concept of fractional dimensionality was introduced and integrated into an incremental dimensionality reduction procedure based on linear discriminant analysis. Due to computational constraints, the traditional LDA approach was performed in the low-dimensional PCA subspace, which may result in a loss of significant discriminatory information contained in the discarded null space. Therefore, D-LDA was proposed to process data directly in the original high-dimensional input space by modifying the simultaneous /$ - see front matter & 10 Elsevier B.V. All rights reserved. doi: /j.neucom Please cite this article as: Y. Cao, et al., LIFT: A new framework of learning from testing data for face recognition, Neurocomputing (11), doi: /j.neucom

2 2 Y. Cao et al. / Neurocomputing ] (]]]]) ]]] ]]] diagonalization procedure of the LDA method [57]. DF-LDA [31] combines the strengths of the D-LDA and the F-LDA methods by conducting D-LDA and F-LDA sequentially. Based on LDA, RDA was developed in [13] by using a regularized discriminate scheme instead of optimizing the Fisher index. This scheme was used to solve the small sample-size problem and evaluated on three popular databases, ORL, Yale, and Feret database. However, this method suffers the drawback of high computational load. In [27], the Gabor wavelets were used for face recognition within a dynamic link architecture (DLA) framework. Gabor jets were first calculated and then used for a flexible template comparison between the resulting image decompositions. A Gabor Fisher classifier was proposed in [28], in which an augmented Gabor feature vector was derived from the Gabor wavelet representation of the face images. This method is illustrated to be robust to the variations in illumination and facial expression. Recently, new approaches and methods have been presented in literature. In [52], an iterative algorithm was proposed to rearrange elements of the data matrix in order to maximize intramatrix correlation. This algorithm was extended to supervised learning problems and simulation results demonstrated the effectiveness of the proposed algorithms. Another iterative method, adaptive regularization-based semi-supervised discriminant analysis with tensor representation (ARSDA/T) and its vector-based variant (ARSDA/V) were presented in [51]. In these algorithms, graph Laplacian regularization was used based on the data representation in the low-dimensional feature space. In [54], two null space-based schemes, NS2DLDA and NS2DMFA, extended from the traditional 2-dimensional LDA and MFA algorithms were presented. The proposed schemes could solve the convergence problem and the experiment results over CMU PIE and FERET datasets showed superior performance of the proposed methods when compared to the traditional approaches. Spatially constrained earth mover s distance (SEMD) was used in [53] in order to improve the robustness of the face recognition algorithms against image misalignments. The distances were treated as features in a Kernel Discriminant Analysis framework. Experiments over three benchmark face datasets illustrated the effectiveness of the proposed distance measure approach. In [55], a concurrent subspaces analysis method for object reconstruction and recognition was proposed. A high order tensor object was encoded in multiple concurrent subspaces that were learned by an iterative procedure sequentially. Simulation results on four popular face datasets showed that CSA outperforms the traditional PCA. Another key aspect for face recognition is the underline learning algorithms. For instance, the learning methods such as neural networks, support vector machines (s), k-nearest neighbors (KNN), among others, have been studied extensively in the community for face recognition. For instance, in [15], a high speed face recognition strategy using radial basis function (RBF) neural networks was proposed. This system involves two subtasks: feature extraction based on the discrete cosine transform (DCT) and Fisher s linear discriminant (FLD) analysis, and face classification using RBF neural networks. 
Simulation results on the Colorado State University (CSU) Face Identification Evaluation System and the Yale Database illustrates that the proposed method can reduce the computational cost and achieve competitive recognition accuracy compare to other face recognition methodologies, such as pseudo-2d hidden Markov models, probabilistic decision-based neural network, among others. A novel neural architecture, PyraNet, was proposed in [39] for visual pattern recognition. PyraNet consists of two layers: a pyramidal layer used for feature extraction and reduction, and a one-dimensional layer used for classification. Five training methods including gradient descent, gradient descent with momentum, resilient backpropagation, Polak Ribiere conjugate gradient, and Levenberg Marquadrt are analyzed in PyraNet for visual pattern recognition. In [], binary s are used to tackle the face recognition problem. One-against-one technique and a bottom-up binary tree structure were employed to solve a multiclass face recognition problem. Refs. [24] and [25] also discussed s in the context of face recognition. The authors argued that s can capture the relevant discriminatory information from the training data and provide superior learning performance compare to other classification methods such as Euclidean distance and normalized correlation method. However, if the data have been preprocessed by some feature extraction techniques that can capture the discriminatory information, such as FLD, s may not be able to outperform other classification methods. In [33], nearest neighbor classifier was developed for face recognition, in which a new linear feature extraction technique was proposed by transferring the problem of finding the optimal linear projection matrix in feature extraction to a classification problem that is solved by AdaBoost algorithm and multitask learning theory. In [26], a face recognition scheme was proposed by combining wavelet decomposition, Fisherface method, and fuzzy integral. The wavelet decomposition and Fisherface method are used to extract important features from the image, and fuzzy integral method is used to combine multiple classifiers that are trained on different subspaces generated by the wavelet decomposition. The effectiveness of the proposed method is indicated by simulation results over the Chungbuk National University face database and Yale database. In this paper, we assume a learner is provided with limited training data, whereas a large amount of testing data. Under this scenario, we propose a new learning methodology, Learning From Testing data (LIFT) framework, by exploring useful information from the accessible testing data to facilitate the final decisionmaking processes. Meanwhile, since the proposed framework has a modularized structure, the method in each module of the framework can be replaced by other approaches. For instance, in this paper, we substitute the one-against-all strategy with two other transductive learning methods, consistency method [63,61] and LRGA method [56], in the data selection step of the proposed framework. Consistency method and LRGA method employ manifold learning and are developed based on the global patterns and local structures in the data. The rest of the paper is organized as follows. Section 2 formulates the problem addressed in this paper. Section 3 presents the details of the LIFT approach for face recognition problems. System level framework and a learning algorithm are proposed in this section. 
Experimental results on five popular face databases and statistical analysis of these results are presented in Section 4 to show the effectiveness of this method. Meanwhile, the performance of the variants with consistency method and LRGA method over the five face datasets are presented. In Section 5, a brief analysis of the data quality is provided. Finally, a conclusion and a brief discussion on future research directions are outlined in Section Problem formulation In the traditional face recognition problems, we generally assume an adequate and representative training data is available to develop the decision boundaries for future prediction. However, in many realworld applications, collecting and acquisition of labeled face images is often expensive and time consuming. Meanwhile, such image collection process normally requires the efforts of experienced human annotators, which is not suitable for the automated face recognition systems. This introduced the semi-supervised learning scenario [64,42]. Generally speaking, the key idea of semi-supervised learning is to exploit the unlabeled training examples together with the labeled ones to modify and refine the hypothesis to improve learning accuracy [64,37,10,]. For instance, self-training methods [] first develop an initial classifier with labeled data examples alone. Then this classifier is used to recover the unlabeled data examples and append them to the labeled data to retrain the classifier. This Please cite this article as: Y. Cao, et al., LIFT: A new framework of learning from testing data for face recognition, Neurocomputing (11), doi: /j.neucom

3 Y. Cao et al. / Neurocomputing ] (]]]]) ]]] ]]] 3 procedure is repeated, and each time the most confident unlabeled examples will be labeled with the estimated labels. In this way, the classifier uses its own knowledge to teach itself iteratively. Some other representative work includes co-training methods [36,58,6,44], semi-supervised support vector machines [11,3],graph-based methods [62,5], EM with generative mixture models [38,], among others. In many applications, it is not uncommon that only scarce labeled training images are available originally, whereas large amount of unlabeled images become available at the online testing stage. For instance, in [46], an active learning paradigm was presented, in which the testing phase was regarded as the beginning of a machine learning experiment instead of the end in the traditional approaches. Therefore, the testing data were used iteratively to evaluate the learning process and, if necessary, construct the new training dataset in order to cover the instance space more completely. This learning paradigm was illustrated with a case study of the robotic soccer game. A semisupervised PCA-based face recognition algorithm based on selftraining was proposed in [41]. Unlabeled images are used to update the eigenspace and the templates in order to improve the performance of the face recognition systems. In our previous study, we proposed an iterative learning strategy of the incremental semisupervised learning problem by adaptively recovering the labels for the testing data that become available incrementally [9]. Motivated by these ideas, in this paper we consider the following face recognition problem: given inadequate labeled training images, can one use the unlabeled testing images to improve the recognition performance? To address this problem, we propose a novel face recognition framework by recovering the labels of the testing images and moving the most confidently recovered testing images into the training set to facilitate learning and recognition. To our best knowledge, this is the first study to regard the unknown testing images as a new source of information in the face recognition problems. One of the main contributions of this paper is that we provide a new direction of understanding the semi-supervised learning. Compared to our previous work in [9], in this article we develop a general-purpose learning framework by combining the feature extraction/reduction method, such as PCA, the one-against-all strategy, and the computational intelligence methods, such as neural networks and support vector machines as discussed in this paper. These techniques enable the proposed framework to deal with the high-dimensional and multiple-class databases effectively, which makes the proposed framework suitable for most face recognition problems. We investigate the use of the proposed method in the context of the face recognition problems, and test it on five popular face databases to illustrate its effectiveness. In this work, we also analyze several aspects of the data quality of the recovered testing data, such as the size, the accuracy rate, and the error type, in detail and show the impact of these attributes of the recovered data on the performance of the final decision-making processes for face recognition. Consider an original training dataset D tr with n tr samples, which can be represented as fx q,y q g,q ¼ 1,...,n tr, where x q is a face image sample and y q AY ¼f1,...,Cg is the subject identity label associated with x q. 
We assume that the testing dataset D te with n te images is available without the identity labels, i.e., D te can be represented as fx p g,p ¼ 1,...,n te. Moreover, we assume that n te is much greater than n tr. Due to the inadequate training dataset, the hypothesis built on D tr cannot provide satisfactory prediction performance. Therefore, the objective here is to design an effective learning framework to exploit the useful information from the testing samples in order to improve the face recognition performance. Before we proceed to the details of the proposed framework, we would like to note the major differences between the problem we aim to tackle in this paper compared to the traditional semi-supervise learning problems. In the traditional semi-supervised learning scenario, one aims to exploit the unlabeled training examples to benefit the learning process, which can be accomplished based on the labeled training data information. In this paper, we are interested in finding potential useful information from the testing data to improve the decision-making process. In addition, we investigate the impact of the data quality of the recovered testing data on the accuracy performance of the face recognition system. From our observation, three attributes of the recovered testing dataset greatly affect the system performance, including the sample sizes, the error rates, and the error types. Various experiments on five frequently used face benchmarks are used to demonstrate the effectiveness of the proposed framework by using neural networks and s with different kernel functions as thebaseclassifier. 3. LIFT: learning from testing data framework We propose the LIFT learning framework as illustrated in Fig. 1. Briefly speaking, the LIFT framework consists of three phases: feature extraction, data selection, and final training. All the images first go through a preprocessing procedure for dimensionality reduction to facilitate learning. Then the one-against-all technique with a base learning model is developed to estimate the identity Fig. 1. The proposed LIFT system diagram. Please cite this article as: Y. Cao, et al., LIFT: A new framework of learning from testing data for face recognition, Neurocomputing (11), doi: /j.neucom

4 4 Y. Cao et al. / Neurocomputing ] (]]]]) ]]] ]]] labels of the testing images. By adding the most confident testing images with the estimated labels into the original training dataset, the final classification hypothesis will be built on the expanded training dataset. We will present this system in detail in the following sections Feature extraction Similar to many of the existing face classification research, we use the PCA method for feature extraction to handle the highdimensional facial database. In PCA, feature extraction is conducted on the original face images in order to find the subset of basis images and in this new feature space, the original images can be represented by the coordinates that are uncorrelated [48]. According to the well-known eigenface methods proposed in [48], we generate the eigenfaces that correspond to the eigenvectors associated with the dominant eigenvalues of the facial image covariance matrix. These eigenfaces define the new feature space that greatly reduces the dimensionality of the original space, which allows the efficient learning from the reduce feature space. Specifically, we take the first N principal components and generate the subspaces with dimensionality of N for all face images. After this phase, each face image is expressed as a pair of fx i,y i g, where x i is an image vector with size of N, and for the images in the training set, y i AY ¼f1,...,Cg is the class identity label associated with x i, while for those in the testing set, y i ¼0 stands for an unknown label. In our current experiments, we set N to 100. We would like to note that other dimensional reduction methods can also be integrated into the proposed LIFT framework. Interested readers can find more details on this issue in [1,2] Data selection Due to the inadequate training samples, the classifier obtained based on such limited training data may not provide accurate and robust classification performance. The key question is whether one can take advantage of the testing data itself to benefit the learning process. To this end, we propose to use the one-against-all technique to estimate and recover the labels of the testing images to augment thetrainingdatasize. One-against-all method [4,29] is a standard technique to solve multiclass classification problems by transforming a multiclass classification problem to multiple binary classification problems. By focusing on one class each time, the one-against-all method can provide well-suited classification capability. For class label i, we partition the training dataset D tr into two subsets: D i tr that contains all the examples with label i and D i tr that contains all the examples that do not belong to class i. All the examples in D i tr are labeled as 1 and all the examples in D i tr are labeled as 2. Then a hypothesis h i is trained based on the newly labeled training data. Once the hypothesis h i is developed, all the testing examples are applied to the hypothesis to predict if these examples belong to class i or not. If the recovered label is 1, then we consider that this example may belong to class i and this example is added to the recovered testing dataset D i re,otherwise, this example is skipped for class i and may be evaluated for other class labels. Note that any testing example that is predicted to two or more different classes will be excluded from the recovered testing datasets. 
Finally, the recovered testing datasets for all labels are combined to form the recovered testing dataset D re as D re ¼ D 1 re [ D2 re [...[ DC re. Fig. 2 illustrates an example of the one-against-all technique by considering the class 3 data of the Yale face database used in this paper. We first divide all the training images into two groups, those that belong to class 3 and those that do not belong to class 3. The hypothesis h 3 is trained to predict the class 3 and non-class 3 examples. Then, this hypothesis is used to evaluate the testing dataset. Those that are Fig. 2. An example of the one-against-all technique. (a) The decision boundary of hypothesis h 3 based on the training data; (b) the prediction of the testing data based on the hypothesis h 3. classified as class 3, i.e., the images left to the h 3 decision boundary, are added to the training dataset. One should note that due to the limited learning capability of the learning method used to generate h i,thefalse positive failure (FP) (i.e., predicting an example as class i when the correct label is not i) and false negative failure (FN) (i.e.,predictingan example not as class i when the correct label is i) may occur when evaluating the testing data. For instance, the circled image left to the decision boundary h 3 is a false positive example. The correct label of this image is class 2 but it is recovered incorrectly as class 3. On the other hand, the circled image right to the h 3 boundary is a false negative misclassification, i.e., the correct label of this image is class 3 while it is misclassified not to class 3. Due to the existence of the false positive and false negative failures, the recovered data from the testing dataset may contain some misclassified examples. In this case, the inaccurate information learned from the testing data may undermine the final prediction performance. We will discuss the impact of the quality of the recovered data on the classification performance of the system in further detail in Section Final training Based on the discussions in Sections 3.1 and 3.2, we have three datasets, the training dataset D tr, the testing dataset D te, and the recovered dataset D re. We add D re into D tr and form an augmented training set ^D tr : ^D tr ¼ D tr [ D re. Based on ^D tr, we develop the final hypothesis h f for the final face recognition. The objective of the LIFT framework is to design an effective learning methodology to exploit the useful information from the testing samples in order to improve the face recognition performance. The main procedure of the learning framework is summarized as follows. Algorithm 1 (LIFT-Learning Algorithm). Input: Initial training dataset D tr ¼fx q,y q g,q ¼ 1,...,n tr, where x q is a face image sample and y q AY ¼f1,...,Cg is the subject identity label associated with x q ; Available testing dataset D te ¼fx p g,p ¼ 1,...,n te ; Recovered testing dataset D re that is empty initially D re ¼ F; Learning algorithm Learn1 used in the data selection phase; Learning algorithm Learn2 used to generate the final hypothesis; Procedure: Preprocessing (feature extraction) (1) PCA is performed to all images in D tr and D te, the first N principal components are chosen to transform all images into vectors with length N; Please cite this article as: Y. Cao, et al., LIFT: A new framework of learning from testing data for face recognition, Neurocomputing (11), doi: /j.neucom

5 Y. Cao et al. / Neurocomputing ] (]]]]) ]]] ]]] 5 Learning from testing data (data selection) Do for each potential class label i,iay ¼f1,...,Cg: (1) Partition D tr into D i tr and Di tr where D i tr ¼ffx k,y k g : fx k,y k gad tr,y k ¼ ig D i tr ¼ffx,y g : fx,y gad tr,y aig (2) Label all examples in Di tr as class 1 and all examples in Di tr as class 2 and form a binary classification training set, ^D i tr ¼ffx k,1g : y k ¼ ig[ffx,2g : y aig. (3) Train Learn1 on ^D i tr and return a hypothesis h i. (4) Apply the testing dataset D te to h i, and return the predicted labels ^y q. Add all testing examples that are predicted to class 1(i.e., class i), D i re ¼ffx q,ig : x q AD te, ^y q ¼ ig, into D re : D re ¼ D re [ D i re ð2þ Final training (1) Exclude any testing example that is recovered to two or more different classes from D re. (2) Combine D tr and D re to form the augmented training dataset ^D tr : ^D tr ¼ D tr [ D re ð3þ (3) Train Learn2 on ^D tr and return the final hypothesis H. Output: the final hypothesis H. Fig. 3 visualizes the main procedure of the proposed LIFT learning framework. In the traditional face recognition approaches, the training images and the testing images are handled separately. Normally, the hypothesis is obtained only based on the training images and applied to the testing images to predict the labels. Instead, the LIFT learning framework proposed in this paper aims to take advantage of the vast amount of the unlabeled testing images and learn from this unknown information. The recovered testing data can be integrated into the training process and largely improve the accuracy and robustness of the final hypothesis. Since many face recognition problems are characterized by the inadequate labeled training images and the large amount of testing images, we believe that this idea may provide important new insights to the face recognition applications. One should also note that LIFT is a general learning framework allowing a wide range of choices of the base classification models for Learn1 and Learn2. For instance, different kinds of the base learning algorithms, such as neural networks, s, decision tree, among others, can be integrated into this framework. Furthermore, the users can also choose different learning schemes for Learn1 and Learn2 in different applications. For example, when only weak learners that can merely do better than random guessing are available, then bootstrap aggregating (bagging) or boosting algorithm can be employed to construct a much stronger learner from these weak learners [16,17,7]. This provides the flexibility of using this framework as a general learning methodology in a wide range of real-world applications. ð1þ 4. Experimental results and analysis Five popular face benchmarks, the Yale face (YALE) database [2], the extended Yale face (EYB) database B [19], Cambridge ORL face (ORL) database [43], the CMU PIE face (PIE) database [] and the Japanese Female Facial Expression (JAFFE) database [32], are used to investigate the performance of the proposed LIFT learning framework. We obtained these databases from [14] and [23].TheYaleface database contains 165 grayscale face images for 15 persons with 11 images for each person. The extended Yale face database B consists of 2414 images of 38 subjects around 64 near frontal images under different illuminations per individual. The ORL database of faces consists of 0 images, which are 10 different images for distinct subjects. 
The CMU-PIE database contains images of 68 subjects around 170 images for each subject. The JAFFE database contains 213 images of seven facial expressions (angry, disappointed, fearful, happy, sad, surprised and neutral) posed by 10 Japanese female models. Each image in the first four databases is originally cropped into pixels and expressed as a 1024-dimensional vector. For JAFFE, the images are originally pixels. We first crop the images in JAFFE into pixels, and then use a 4 4averagefilter to reduce the JAFFE to pixels that are represented as a 1024-dimensional vector as the other four datasets. Table 1 summarizes the benchmark characteristics and Fig. 4 shows some examples of the databases used in the experiments. Because PCA is independent on the identity labels of the data, we can first apply the PCA paradigm on all the images to extract the face features from each image and compress the image vectors significantly. In our current experiments, we choose the first 100 principle components and transform each image to a 100-dimensional vector. Then, we randomly partition the whole image dataset into a training dataset and a testing set. For example, in the Yale Face database, for each subject, we randomly select three images as the training samples and use the remaining eight images as testing images. Therefore, a total of images are used for training and the other 1 images for testing. Table 2 shows the configurations of the training sets and the testing sets for the five image databases. Because some face datasets are not well-balanced, the numbers of the testing samples for each class in these datasets may not be the Table 1 The benchmark characteristics used in this paper. # example # class # feature YALE EYB ORL PIE JAFFE Fig. 3. Block diagram of the main procedure of the proposed LIFT learning framework. Please cite this article as: Y. Cao, et al., LIFT: A new framework of learning from testing data for face recognition, Neurocomputing (11), doi: /j.neucom

6 6 Y. Cao et al. / Neurocomputing ] (]]]]) ]]] ]]] Fig. 4. Examples of the five face databases used in the experiments. (a) The Yale face database; (b) the extended Yale face database B; (c) the ORL database; (d) the CMU-PIE database; (e) the JAFFE database. Table 2 The configuration of training set and testing set in each experiment. The training set The testing set Each class Total Each class Total YALE EYB ORL PIE JAFFE Table 3 Testing error performance comparison (in percentage). NN (Linear kernel) (Poly. kernel) (RBF kernel) Trad. LIFT Trad. LIFT Trad. LIFT Trad. LIFT YALE EYB ORL PIE JAFFE same. For example, for PIE benchmark, there is only 126 testing samples for the class 38 subject, whereas 1 for other classes; for EYB benchmark, the numbers of the testing samples for each class range from 44 to 49; for JAFFE benchmark, the numbers of the testing samples for each class range from 16 to 19. In our experiments, we verify our framework by using two different sets of the base algorithms, the neural networks with multi-layer perceptron (MLP) structure and. Moreover, we adopt the same base algorithms for Learn1 and Learn2. In other words, in the first set of experiments, we use neural networks for Learn1 and Learn2, in which the number of hidden neuron is, and the number of input neurons and output neurons are equal to the number of dimensions and classes for each dataset, respectively. Sigmoid function is used for the activation function and backpropagation is used to train the network. Parameter settings for the neural networks include a learning rate of and a training cycle of 00. In the second set of experiments, we use (linear kernel, polynomial kernel with degree of 3, and radial basis function (RBF)) for both Learn1 and Learn2. We compare our LIFT learning framework to the traditional learning scheme in which the final hypothesis is generated only based on the training dataset with the same base learning model. Tables 3 and 4 show the averaged testing error performance and error standard deviations of the results of the 100 random runs for the LIFT algorithm as well as the traditional learning method for the five benchmarks. From this table, one can see that the proposed LIFT framework can provide better classification accuracy performance over the traditional learning method. Table 4 Testing error standard deviation (in percentage). In order to further investigate the performance improvement of the proposed framework over the tradition method, we compare the statistical characteristics of the results of all the 100 runs from both methods by using two difference testing schemes, hypothesis testing of the average values and box plot. In the hypothesis testing, we calculate the mean and the standard deviation that are shown in Tables 3 and 4 using the following equations [34,21]: m ¼ 1 n X n i ¼ 1 NN err i (Linear kernel) sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi n P n i ¼ 1 s ¼ err2 i ðp n i ¼ 1 err iþ 2 nðn 1Þ (Poly. kernel) (RBF kernel) Trad. LIFT Trad. LIFT Trad. LIFT Trad. LIFT YALE EYB ORL PIE JAFFE ð4þ ð5þ Please cite this article as: Y. Cao, et al., LIFT: A new framework of learning from testing data for face recognition, Neurocomputing (11), doi: /j.neucom

7 Y. Cao et al. / Neurocomputing ] (]]]]) ]]] ]]] 7 We formulate the hypothesis as: Null hypothesis: H 0 : m 1 ¼ m 2 Alternative hypothesis: H 1 : m 1 am 2 Table 5 Hypothesis testing results of LIFT and traditional method. NN (Linear kernel) (Poly. kernel) ð6þ ð7þ (RBF kernel) YALE EYB ORL PIE JAFFE The test statistic is calculated as follows: Z ¼ m m 1 2 qffiffiffiffiffiffiffiffiffiffiffiffiffiffi s 2 1 n 1 þ s2 2 n 2 For a two-tailed test, we will reject H 0 if jzjo2:33. (2.33 is for a two-tailed test where the results are significant at a level of 0.02). Table 5 shows the hypothesis testing results. From Table 5 we can see all results are greater than 2.33 and most of them are even greater than 10. Therefore, we accept the alternative hypothesis H 1, which means there is statistically significant difference in the classification performance of the traditional method and the proposed LIFT framework. In others words, LIFT can significantly improve the recognition performance of the tradition method. The Boxplot method is a standard technique to depict groups of numerical data by presenting their 5-number summary including the minimum and maximum range values, the upper and lower quartiles, and the median [47,]. We have investigated the boxplot results for all the five face image sets for the tradition method and LIFT framework using MLP and s (linear kernel, polynomial kernel, and RBF kernel). Figs. 5 8 provide several snapshots of this analysis for the EYB database using the four base learners. The left parts of these figures are the error ð8þ 55 Traditional LIFT run 15 NN_Traditional NN_LIFT Fig. 5. Error performance and Boxplot results on the EYB database using NN Traditional LIFT run _Traditional (linear kernel) _LIFT (linear kernel) Fig. 6. Error performance and Boxplot results on the EYB database using (linear kernel). Please cite this article as: Y. Cao, et al., LIFT: A new framework of learning from testing data for face recognition, Neurocomputing (11), doi: /j.neucom

8 8 Y. Cao et al. / Neurocomputing ] (]]]]) ]]] ]]] Traditional LIFT run _Traditional (polynomial kernel) _LIFT (polynomial kernel) Fig. 7. Error performance and Boxplot results on the EYB database using (polynomial kernel) Traditional LIFT run 12 _Traditional (RBF kernel) _LIFT (RBF kernel) Fig. 8. Error performance and Boxplot results on the EYB database using (RBF kernel). Table 6 Numerical characteristics of the LIFT framework and the traditional strategy on the EYB database (In percentage). NN (Linear kernel) (Poly. kernel) (RBF kernel) Trad. LIFT Trad. LIFT Trad. LIFT Trad. LIFT Largest non-outlier Upper quartile Median Lower quartile Smallest non-outlier rates of the 100 runs and the right parts are the boxplot results of the 100 runs. Table 6 summarizes the corresponding numerical results of the box plot methods on the EYB database. One can see each numerical result of the LIFE framework is smaller than that of the traditional method. In other words, these statistical analysis results indicate that the proposed LIFT framework can greatly improve the classification performance over the traditional learning method. As we discussed in Section 1, the proposed framework has a modularized structure, which means that the methods used in the modules of the framework can be replaced seamlessly. For instance, in the data selection step, instead of using the oneagainst-all strategy, we can use other transductive learning algorithms to explore useful information from the testing dataset. In this work, we investigate the use of the consistency method [63,61] Please cite this article as: Y. Cao, et al., LIFT: A new framework of learning from testing data for face recognition, Neurocomputing (11), doi: /j.neucom

9 Y. Cao et al. / Neurocomputing ] (]]]]) ]]] ]]] 9 and LRGA method [56] in the data selection step. We use the JAFFE database and (linear kernel, polynomial kernel, and RBF kernel) as learners in the final training step to compare the performance of the LIFT framework with consistency method and LRGA method to the traditional approach in terms of the error rates, standard deviation of the error rates, and hypothesis testing in Table 7. Cross-validation is used to find the parameters used in these two methods. For consistency method, a is set to 0.25, which is consistent with the discussion in [56], and s is set to For LRGA, k is set to 2, and l is set to 10. In Fig. 9, we investigate the error performance of LIFT framework with LRGA method using different k values. The simulation results show that in this particular scenario, better performance can be obtained with a smaller k value. This is probably due to the small size of the datasets and large Table 7 Simulation results of the LIFT framework with consistency method and LRGA method compared to the traditional strategy on the JAFFE database and learners (in percentage). Learner Linear kernel Poly. kernel RBF kernel Trad. Error rate Std Consistency Error rate Std Hypothesis testing vs. Trad LRGA Error rate Std Hypothesis testing vs. Trad number of class categories. From Table 7 one can see that there are significant differences between the two variants of the LIFT framework with the consistency method, LRGA method, and the traditional approach, respectively, with confidence level Furthermore, we compare the proposed framework with a commonly used semi-supervised learning scheme, self-training method [], on the EYB dataset with neural network as the base learner. The experiment is designed in the following way. The EYB dataset is divided to three datasets: the labeled training dataset with 0 images, the unlabeled training dataset with 1057 images, and the testing dataset with the rest 1057 images. In the self-training scheme, we use the labeled training dataset to recover the labels of the unlabeled training dataset, and then with the labeled data and the recovered unlabeled data, a final classifier is trained and applied to the testing dataset. In our learning scheme, only the labeled training data and the testing data are used. Specifically, the labeled training data are used to recover the labels of the testing data, and the information explored from the testing dataset are integrated to the final learning procedure. Table 8 illustrates the simulation results for both methods. We also present the simulation results for the tradition approach, in which only the labeled training data are used to develop the final classifier. Here, and are the hypothesis testing results Table 8 Simulation results of the LIFT framework and self-training scheme compared to the traditional strategy on the EYB database and NN learners (in percentage). Scheme Error rate Standard deviation Hypothesis testing Trad Self-training LIFT Fig. 9. Error performance of the LIFT framework for LRGA method using different k on the JAFFE database and learners. Please cite this article as: Y. Cao, et al., LIFT: A new framework of learning from testing data for face recognition, Neurocomputing (11), doi: /j.neucom

10 10 Y. Cao et al. / Neurocomputing ] (]]]]) ]]] ]]] among the traditional approach, self-training, and LIFT, respectively, and 2. is the hypothesis testing result between self-training and LIFT. These results show that both the self-training scheme and LIFT outperform the traditional approach. LIFT can achieve better performance compared to the self-training scheme. 5. Data quality for LIFT framework An important question still remains for the proposed framework, i.e., to what extend or under what assumption that the proposed method can benefit the final decision-making processes? In this section, we provide some discussions of the impact of the data quality learned from the testing set on the performance of the face recognition system. Generally speaking, high quality of the recovered data can benefit learning, whereas the recovered testing data with low quality may indeed degrade the performance of the proposed system since it may simply introduce more noise into the learning system. Here we use the YALE benchmark and with a linear kernel as an example to explore how the quality of the recovered data impacts the performance of the LIFT framework. To do this, we explicitly add the recovered testing data into the training set and then train a final hypothesis based on the expanded training set. We adjust three attributes of the added testing dataset, i.e., the sample size, the accuracy rates, and the error type, to investigate their influences to the LIFT framework Size and accuracy rate Fig. 10 shows the adjustments of the sample size and accuracy rates to test their influences to the LIFT framework. The x-axis stands for the size of the recovered testing dataset to be added and the y-axis stands for the accuracy of such recovered testing data. We partition all the testing data (1 samples in this case) into chunks, each chunk with 6 samples. The first k chunks of the testing data are combined to generate the kth block. For example, block 1 contains six samples from chunk 1; block 15 contains 90 samples from chunk 1 to chunk 15. Along the y-axis, we explicitly change the accuracy rates of the recovered class labels from 0% to 100% with step size of 5%. For instance, in block 15 that contains 90 samples, when the accuracy rate is %, then 36 samples will be labeled with correct class labels and the rest 54 samples will be labeled with incorrect class labels Error type When we generate incorrect class labels for the recovered testing data to adjust the accuracy rate, we design two approaches to generate the incorrect labels: random error and biased error. In the random error approach, we randomly pick a class label other than the actual class label for a recovered testing sample. For example, for the recovered testing data x t with an actual class label y t, we randomly generate a label ^y t, ^y t Afy i : y i AY,y i ay t g and use this incorrect label as the recovered label for x t. On the other hand, for the biased error method, we directly use the misclassified label from LIFT framework as the recovered label for x t. Fig. 11 provides an example of how to generate incorrect labels in these two approaches. In this example, we assume that the size of the block is 12 and the accuracy rate is 75%, therefore, the recovered testing set contains nine samples with correct class labels and 3 with incorrect class labels. 
In the random error approach, all incorrect labels are generated by randomly picking a class label other than the correct label, whereas in the biased error approach, since the last three samples are incorrectly estimated by Learn1 in the LIFT framework in our experiments, then we directly use the incorrect labels estimated by Learn1 as the recovered labels for the last three samples. According to these discussions, Fig. 12 illustrates the results of the two sets of the experiments. In each set of experiment, we evaluate the recognition performance with respect to the changing sample size and accuracy rate of the added testing dataset when different error type methods are adopted. Specifically, in Fig. 12(a), biased error method is used, whereas in Fig. 12(b), random error method is used. In each figure, the x-axis and y-axis are the size and the accuracy of the added testing dataset, respectively, and the z- label is the final classification error rate of the final hypothesis. Since the result of the traditional approach e traditional is a constant value that is independent of the added testing data, therefore, we can draw e traditional as a plane parallel to xy-plane that is illustrated in Fig. 12(a) and (b). From Fig. 12, we can draw the contours of the differences of the performance between the approach based on the augmented training set and the traditional approach, i.e., De ¼ e LIFT e traditional where e LIFT are the error rates of the system based on the augmented training set (the LIFT framework) and e traditional are the error rates of the traditional approach based on the original training dataset. We illustrate the contours of De when biased error and random error method are used in Fig. 13(a) and (b), respectively. From Fig. 13, we can see that if the accuracy rate is high, with the increasing size of the recovered testing dataset, the performance will push to the upper-right corner with larger negative value of De. This means that the proposed LIFT framework can achieve better performance of recognition compare to the traditional approach. However, if the accuracy rate of the added testing Fig. 10. Examples of the adjustments in the size and the accuracy rate. Fig. 11. Examples of the adjustments in the error type. Please cite this article as: Y. Cao, et al., LIFT: A new framework of learning from testing data for face recognition, Neurocomputing (11), doi: /j.neucom

11 Y. Cao et al. / Neurocomputing ] (]]]]) ]]] ]]] 11 Fig. 12. The error rate results of the two sets of the experiments: (a) biased error; (b) random error. Fig. 13. Contour of De: (a) biased error; (b) random error. data is too low, larger sizes of the recovered testing dataset result in even worse performance, i.e., the upper-left corner. Meanwhile, contour curve with value of 0 can be considered as the performance boundary, i.e., any point left to the boundary stands for that the added testing data degrade the original recognition system, while for any point right to the boundary, the recovered testing data can benefits the final recognition performance. Furthermore, this boundary provides us a criterion to decide under what data quality the recovered testing data should be added to the training set. In fact, we can use cross-validation to obtain this contour to provide a criterion for this purpose. Figs. 14 and 15 show several snapshots of the error performance with the fixed accuracy rate and size of the added dataset, respectively. In each subfigure of Fig. 14, the accuracy rate of the added dataset is fixed. The x-axis represents the size of the added testing data increasing from 6 to 1, and the y-axis represents the error rate performance of the final system. In each subfigure of Fig. 15, the size of the added dataset is fixed. The x-axis represents the accuracy rate of the added testing data improving from 0% to 100%, and the y-axis represents the error rate performance of the final system. The results from both error type methods discussed in Section 5.2 (random error and biased error) are showed in each figure, as well as those from the traditional approach based only on the training set. From Fig. 14 one can see, if the accuracy rate of the added testing data is below %, the final recognition performance will indeed decrease with the increase of the number of augmented data. On the other hand, if the recovered accuracy is above %, then increasing the size of augmented data will benefit the final recognition process. If the accuracy rate of the added testing data is between % and %, increasing the size brings worse performance initially then further increasing the size benefits the performance. When the size of the added testing data is fixed as shown in Fig. 15, then increasing in the accuracy rate of the added testing data always benefits the final recognition. Also, another interesting phenomenon observed is that the results from random errors are always better than those from biased errors, except when the number of the added data is very large (i.e., the last subfigure in Fig. 15), both random error and biased error methods give the same level of performance (the two lines are overlapped). Please cite this article as: Y. Cao, et al., LIFT: A new framework of learning from testing data for face recognition, Neurocomputing (11), doi: /j.neucom

12 12 Y. Cao et al. / Neurocomputing ] (]]]]) ]]] ]]] 100 0% % % 70 % 70 random error biased error traditional method % 55% 55 % 65% % % 90% 100% Fig. 14. Some snapshots of the error performance with respect to the size when the accuracy rate is fixed in each subfigure. 6. Conclusion We propose the LIFT framework for face recognition problems in this paper. The key idea of this approach is to reinforce the final learning system based on the extra information learned from the testing data distribution. In order to effectively explore such useful information from the testing data, we use one-against-all technique to recover the labels of the testing examples. By adding the recovered testing examples into the training set, a more reliable and robust hypothesis can be developed based on the expanded training set. Neural network with multi-layer perceptron and support vector machines with three different kernels are integrated into the proposed learning framework. Furthermore, we investigate two variants of the proposed algorithm by integrating two other transductive methods, consistency method and LRGA method into the LIFT framework. Simulation results on five face benchmarks, including the Yale database, the extended Yale face database B, Cambridge ORL face database, the CMU PIE face database and the Japanese Female Facial Expression database, are used to demonstrate the effectiveness and robustness of the proposed learning methodology. There are several interesting directions that can be further studied. For instance, different feature extraction methods, such as ICA, FLD, and others, can be integrated into the LIFT framework. The influence of different feature extraction methods on classification accuracy and robustness of LIFT framework is an interesting future direction. Second, our framework requires good data quality of the recovered testing data. Therefore, the method used to recover the labels of the testing data is critical for this method. In our current study, we adopt the one-against-all technique in our experiments. It would be interesting to study other mechanisms for the label recovery process for the proposed LIFT framework. For instance, some of the existing semi-supervised learning methods, such as cotraining, self-training, among others, may be integrated into this framework to facilitate the learning process. Furthermore, large scale empirical study of the proposed method across different types of benchmarks will be necessary to fully justify the effectiveness of this framework across different application domains. Currently, we are investigating all these aspects and new results will be reported in future research publications. Motivated by our results in this paper, we believe the essential idea of LIFT, that is to say, the usage of testing data to reinforce the final decision-making process, may provide the community a new angle to address this issue, and can potentially be a powerful method for a wide range of real-world applications. Please cite this article as: Y. Cao, et al., LIFT: A new framework of learning from testing data for face recognition, Neurocomputing (11), doi: /j.neucom

Fig. 15. Some snapshots of the error performance with respect to the accuracy rates when the size is fixed in each subfigure. (The panels compare the biased error, traditional, and random error methods.)

References

[1] M.S. Bartlett, J.R. Movellan, T.J. Sejnowski, Face recognition by independent component analysis, IEEE Transactions on Neural Networks 13 (6) (2002).
[2] P. Belhumeur, J. Hespanha, D. Kriegman, Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997).
[3] M. Belkin, P. Niyogi, V. Sindhwani, Manifold regularization: a geometric framework for learning from labeled and unlabeled examples, Journal of Machine Learning Research 7 (2006).
[4] A. Beygelzimer, J. Langford, B. Zadrozny, Weighted one-against-all, in: Proceedings of the 20th National Conference on Artificial Intelligence (AAAI), 2005.
[5] A. Blum, S. Chawla, Learning from labeled and unlabeled data using graph mincuts, in: Proceedings of the International Conference on Machine Learning (ICML 2001), 2001.
[6] A. Blum, T. Mitchell, Combining labeled and unlabeled data with co-training, in: Proceedings of the Workshop on Computational Learning Theory (COLT 98), 1998.
[7] L. Breiman, Bagging predictors, Machine Learning 24 (2) (1996).
[8] R. Brunelli, T. Poggio, Face recognition: features versus templates, IEEE Transactions on Pattern Analysis and Machine Intelligence 15 (10) (1993).
[9] Y. Cao, H. He, Learning from testing data: a new view of incremental semi-supervised learning, in: Proceedings of the International Joint Conference on Neural Networks (IJCNN 2008), 2008.
[10] O. Chapelle, B. Schölkopf, A. Zien, Semi-Supervised Learning, MIT Press, 2006.
[11] O. Chapelle, V. Sindhwani, S.S. Keerthi, Branch and bound for semi-supervised support vector machines, in: Proceedings of Neural Information Processing Systems (NIPS 2006), 2006.
[12] R. Chellappa, C.L. Wilson, S. Sirohey, Human and machine recognition of faces: a survey, Proceedings of the IEEE 83 (5) (1995).
[13] D.-Q. Dai, P.C. Yuen, Face recognition by regularized discriminant analysis, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 37 (4) (2007).
[14] Four face databases in MATLAB format, [Online]. Available: uiuc.edu/homes/dengcai2/data/facedata.html.
[15] M.J. Er, W. Chen, S. Wu, High-speed face recognition based on discrete cosine transform and RBF neural networks, IEEE Transactions on Neural Networks 16 (3) (2005).
[16] Y. Freund, R.E. Schapire, Experiments with a new boosting algorithm, in: Proceedings of the International Conference on Machine Learning, 1996.
[17] Y. Freund, R.E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences 55 (1) (1997).
[18] Y. Fu, Z. Li, J. Yuan, Y. Wu, T.S. Huang, Locality versus globality: query-driven localized linear models for facial image computing, IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT) 18 (12) (2008).
[19] A.S. Georghiades, P.N. Belhumeur, D.J. Kriegman, From few to many: illumination cone models for face recognition under variable lighting and pose, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (6) (2001).
[20] G. Guo, S.Z. Li, K. Chan, Face recognition by support vector machines, in: Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, 2000.
[21] H. He, J.A. Starzyk, A self-organizing learning array system for power quality classification based on wavelet transform, IEEE Transactions on Power Delivery 21 (2006).
[22] E. Hjelmås, B.K. Low, Face detection: a survey, Computer Vision and Image Understanding 83 (3) (2001).
[23] JAFFE database download, [Online].
[24] K. Jonsson, J. Kittler, Y.P. Li, J. Matas, Support vector machines for face authentication, in: T. Pridmore, D. Elliman (Eds.), BMVC 99, 1999.
[25] K. Jonsson, J. Matas, J. Kittler, Y.P. Li, Learning support vectors for face verification and recognition, in: Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, 2000.
[26] K.-C. Kwak, W. Pedrycz, Face recognition using fuzzy integral and wavelet decomposition method, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 34 (4) (2004).
[27] M. Lades, J.C. Vorbruggen, J. Buhmann, J. Lange, C. von der Malsburg, R.P. Wurtz, W. Konen, Distortion invariant object recognition in the dynamic link architecture, IEEE Transactions on Computers 42 (1993).
[28] C. Liu, H. Wechsler, Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition, IEEE Transactions on Image Processing 11 (4) (2002).
[29] Y. Liu, Y.F. Zheng, One-against-all multi-class classification using reliability measures, in: Proceedings of the 2005 IEEE International Joint Conference on Neural Networks (IJCNN 2005), 2005.
[30] R. Lotlikar, R. Kothari, Fractional-step dimensionality reduction, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (6) (2000).

[31] J. Lu, K.N. Plataniotis, A.N. Venetsanopoulos, Face recognition using LDA-based algorithms, IEEE Transactions on Neural Networks 14 (1) (2003).
[32] M.J. Lyons, J. Budynek, S. Akamatsu, Automatic classification of single facial images, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (12) (1999).
[33] D. Masip, J. Vitria, Shared feature extraction for nearest neighbor face recognition, IEEE Transactions on Neural Networks 19 (4) (2008).
[34] I. Miller, J.E. Freund, Probability and Statistics for Engineers, Prentice-Hall, Englewood Cliffs, NJ.
[35] D.J. Miller, H.S. Uyar, A mixture of experts classifier with learning based on both labelled and unlabelled data, in: Proceedings of Neural Information Processing Systems (NIPS 97), 1997.
[36] T. Mitchell, The role of unlabeled data in supervised learning, in: Proceedings of the International Colloquium on Cognitive Science.
[37] T. Mitchell, The discipline of machine learning, Technical Report CMU-ML, Carnegie Mellon University, 2006.
[38] K. Nigam, A.K. McCallum, S. Thrun, T. Mitchell, Text classification from labeled and unlabeled documents using EM, Machine Learning 39 (2-3) (2000).
[39] S.L. Phung, A. Bouzerdoum, A pyramidal neural network for visual pattern recognition, IEEE Transactions on Neural Networks 18 (2) (2007).
[40] K. Potter, Methods for presenting statistical information: the box plot, in: H. Hagen, A. Kerren, P. Dannenmann (Eds.), Visualization of Large and Unstructured Data Sets, Lecture Notes in Informatics (LNI), Vol. S-4, 2006.
[41] F. Roli, G.L. Marcialis, Semi-supervised PCA-based face recognition using self-training, in: D.-Y. Yeung, J.T. Kwok, A.L.N. Fred, F. Roli, D. de Ridder (Eds.), Structural, Syntactic, and Statistical Pattern Recognition (SSPR/SPR), Springer, 2006.
[42] C. Rosenberg, M. Hebert, H. Schneiderman, Semi-supervised self-training of object detection models, in: Proceedings of the Seventh IEEE Workshops on Application of Computer Vision (WACV/MOTION 05), 2005.
[43] F. Samaria, A. Harter, Parameterisation of a stochastic model for human face identification, in: Proceedings of the 2nd IEEE Workshop on Applications of Computer Vision, Sarasota, 1994.
[44] A. Sarkar, Applying co-training methods to statistical parsing, in: Proceedings of the North American Chapter of the Association for Computational Linguistics on Language Technologies (NAACL 2001), 2001.
[45] T. Sim, S. Baker, M. Bsat, The CMU pose, illumination, and expression (PIE) database, in: Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, 2002.
[46] P. Stone, M. Veloso, Using testing to iteratively improve training, in: Working Notes of the AAAI 1995 Fall Symposium on Active Learning, 1995.
[47] J.W. Tukey, Exploratory Data Analysis, Addison-Wesley, Reading, MA, 1977.
[48] M. Turk, A. Pentland, Eigenfaces for recognition, Journal of Cognitive Neuroscience 3 (1) (1991).
[49] H. Wang, S. Yan, T. Huang, J. Liu, X. Tang, Misalignment-robust face recognition, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008.
[50] M. Wang, X.S. Hua, L.R. Dai, Y. Song, Enhanced semi-supervised learning for automatic video annotation, in: Proceedings of the IEEE International Conference on Multimedia and Expo, 2006.
[51] D. Xu, S. Yan, Semi-supervised bilinear subspace learning, IEEE Transactions on Image Processing 18 (7) (2009).
[52] D. Xu, S. Yan, S. Lin, T.S. Huang, S.-F. Chang, Enhancing bilinear subspace learning by element rearrangement, IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (10) (2009).
[53] D. Xu, S. Yan, J. Luo, Face recognition using spatially constrained earth mover's distance, IEEE Transactions on Image Processing 17 (11) (2008).
[54] D. Xu, S. Yan, L. Zhang, S. Lin, T.S. Huang, Convergent 2D subspace learning with null space analysis, IEEE Transactions on Circuits and Systems for Video Technology 18 (12) (2008).
[55] D. Xu, S. Yan, L. Zhang, S. Lin, H.-J. Zhang, T.S. Huang, Reconstruction and recognition of tensor-based objects with concurrent subspaces analysis, IEEE Transactions on Circuits and Systems for Video Technology 18 (1) (2008).
[56] Y. Yang, D. Xu, F. Nie, J. Luo, Y. Zhuang, Ranking with local regression and global alignment for cross media retrieval, in: Proceedings of the Seventeenth ACM International Conference on Multimedia, 2009.
[57] H. Yu, J. Yang, A direct LDA algorithm for high-dimensional data with application to face recognition, Pattern Recognition 34 (10) (2001).
[58] D. Zhang, W.S. Lee, Validating co-training models for web image classification, in: Proceedings of SMA Annual Symposium, NUS, 2005.
[59] W. Zhao, R. Chellappa, P.J. Phillips, A. Rosenfeld, Face recognition: a literature survey, ACM Computing Surveys (CSUR) 35 (4) (2003).
[60] Z.H. Zhou, M. Li, Tri-training: exploiting unlabeled data using three classifiers, IEEE Transactions on Knowledge and Data Engineering 17 (11) (2005).
[61] D. Zhou, O. Bousquet, T.N. Lal, J. Weston, B. Schölkopf, Learning with local and global consistency, in: S. Thrun, L. Saul (Eds.), Advances in Neural Information Processing Systems, vol. 16, MIT Press, Cambridge, MA, USA, 2004.
[62] D. Zhou, B. Schölkopf, T. Hofmann, Semi-supervised learning on directed graphs, in: Proceedings of Neural Information Processing Systems (NIPS 2005), 2005.
[63] D. Zhou, J. Weston, A. Gretton, O. Bousquet, B. Schölkopf, Ranking on data manifolds, MPI Technical Report 113, Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2003.
[64] X. Zhu, Semi-supervised learning literature survey, Technical Report TR-15, Department of Computer Sciences, University of Wisconsin at Madison, 2007.

Yuan Cao received the B.E. and M.S. degrees from Zhejiang University, China, in 2001 and 2004, respectively, and the M.S. degree from Oklahoma State University, Stillwater, in 2007, all in electrical engineering. He is currently a Ph.D. candidate in computer engineering at Stevens Institute of Technology, Hoboken. His current research interests include pattern recognition, machine learning, and data mining.

Haibo He received the B.S. and M.S. degrees in electrical engineering from Huazhong University of Science and Technology (HUST), Wuhan, China, in 1999 and 2002, respectively, and the Ph.D. degree in electrical engineering from Ohio University, Athens, in 2006. From 2006 to 2009, he was an assistant professor in the Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, New Jersey. He is currently an assistant professor in the Department of Electrical, Computer, and Biomedical Engineering at the University of Rhode Island, Kingston, Rhode Island. His research interests include self-adaptive intelligent systems, machine learning and data mining, computational intelligence, VLSI and FPGA design, and smart grid.
He has served regularly on the organization committees and the program committees of many international conferences and has also been a reviewer for the leading academic journals in his fields. He has also served as a guest editor for several international journals. Currently, he is an Associate Editor of the IEEE Transactions on Neural Networks, an Editor of the IEEE Transactions on Smart Grid, and the Editor of the IEEE Computational Intelligence Society (CIS) Electronic Letter (E-letter).

He (Helen) Huang received the B.S. degree from the School of Electronic and Information Engineering, Xi'an Jiaotong University, China, in 2000, and the M.S. and Ph.D. degrees from the Harrington Department of Bioengineering, Arizona State University, in 2002 and 2006, respectively. She worked as a postdoctoral research associate in the Neural Engineering Center for Artificial Limbs at the Rehabilitation Institute of Chicago from 2006 to 2008. She is currently an assistant professor in the Department of Electrical, Computer, and Biomedical Engineering at the University of Rhode Island. Dr. Huang's primary research interests include neural-machine interfaces, modeling and analysis of neuromuscular control of movement in normal and neurologically disordered humans, virtual reality in neuromotor rehabilitation, and the design and control of therapeutic robots, orthoses, and prostheses. Her specialties lie in machine learning, adaptive control, biomechanical modeling, signal and image processing, and motion analysis. She is a member of the IEEE Engineering in Medicine and Biology Society and the Society for Neuroscience.
