Facing Imbalanced Data Recommendations for the Use of Performance Metrics
|
|
|
- Jessica Hamilton
- 10 years ago
- Views:
Transcription
1 Facing Imbalanced Data Recommendations for the Use of Performance Metrics László A. Jeni, Jeffrey F. Cohn, 2, and Fernando De La Torre Carnegie Mellon University, Pittsburgh, PA, 2 University of Pittsburgh, Pittsburgh, PA, Abstract Recognizing facial action units (AUs) is important for situation analysis and automated video annotation. Previous work has emphasized face tracking and registration and the choice of features classifiers. Relatively neglected is the effect of imbalanced data for action unit detection. While the machine learning community has become aware of the problem of skewed data for training classifiers, little attention has been paid to how skew may bias performance metrics. To address this question, we conducted experiments using both simulated classifiers and three major databases that differ in size, type of FACS coding, and degree of skew. We evaluated influence of skew on both threshold metrics (Accuracy, F-score, Cohen s kappa, and Krippendorf s alpha) and rank metrics (area under the receiver operating characteristic (ROC) curve and precision-recall curve). With exception of area under the ROC curve, all were attenuated by skewed distributions, in many cases, dramatically so. While ROC was unaffected by skew, precision-recall curves suggest that ROC may mask poor performance. Our findings suggest that skew is a critical factor in evaluating performance metrics. To avoid or minimize skewbiased estimates of performance, we recommend reporting skew-normalized scores along with the obtained ones. I. INTRODUCTION Our everyday communication is highly influenced by the emotional information available to us from other people. Recognizing facial expression is important for situation analysis and automated video annotation. In the last decade many approaches have been proposed for automatic facial expression recognition [7], [29]. Although, previous work has emphasized face tracking and registration and the choice of feature classifiers, relatively neglected is the effect of imbalanced data when evaluating action unit detection. In the case of facial expression data, the samples can be annotated using either emotion-specified labels (e.g., happy or sad) or action units, as defined by the Facial Action Coding System (FACS) []. Action units are anatomically defined facial actions that singly or in combinations can describe nearly all possible facial expressions or movements. Action unit (AU) detection, as well as expression detection of which AU detection is a subset, is a typical binary classification problem where the vast majority of examples are from one class, but the practitioner is typically interested in the minority (positive) class. The problem of learning from imbalanced data sets is twofold. First of all, from the perspective of classifier training, imbalance in training data distribution often causes learning algorithms to perform poorly on the minority class. This issue has been well addressed in the machine learning literature [4], [5], [27], [26], [8] A common solution is to sample the data prior to training to re-balance the class distribution [2], [27]. An alternative to sampling is to use cost-sensitive learning. This approach targets the problem of skew by applying different cost matrices that describe the costs for misclassifying any particular data point [26], [8]. For a more detailed survey on the problem see [6] and the references therein. Relatively little attention has been paid to how skew may spoil performance metrics. Facial expression data is typically highly skewed. Imbalance in the test data distribution might produce misleading conclusions with certain metrics. Percentage agreement, referred to as accuracy, is especially vulnerable to bias from skew. When base rate is low, high accuracy can result even when alternative methods rarely if ever agree [2], [4]. Agreement in that case is about the very large number of negative cases rather than the very few positive ones. Alternative metrics have been proposed to address this issue [24], [5]. Ferri et al. studied the relationship between different performance metrics and address the problem of rank correlations between them [2]. How does skewed data influence performance metrics for action unit detection? To address this question, we conducted experiments using both simulated classifiers and three major databases that include both posed and spontaneous facial expression and differ in database size, type of FACS coding [9], [], and degree of skew. The databases were Cohn- Kanade [2], RU-FACS [3], and UNBC-McMaster Pain Archive [22]. We included a broad range of metrics that included both threshold metrics (Accuracy, F -score, Cohen s kappa, and Krippendorf s alpha) and rank metrics (area under the ROC curve [] and precision-recall curve). With exception of area under the ROC curve, all were attenuated by skewed distributions; in many cases, dramatically so. Alpha and kappa were affected by skew in either direction; whereas F -score was affected by skew only in one direction. While
2 ROC was unaffected by skew, precision-recall curves can reveal differences between classifiers, because of the different visual representation of the curves. Very different precisionrecall can be associated with same ROC. Our findings suggest that skew is a critical factor in evaluating performance metrics. Metrics of classifier performance may reveal more about skew than they do about actual performance. Databases that are otherwise identical with respect to intensity of action units, head pose, and so on may give rise to very different metric values depending only on differences in skew. This finding has implications for testing classifiers so as to avoid or minimize confounds and for meta-analyses of classifier performance. Sensitivity of the threshold metrics for skewed distributions could be reduced by balancing the distribution of datasets. The paper is built as follows. Datasets and their properties are reviewed in Section 2. Theoretical components are described in Section 3. Experimental results on the effect of imbalanced data on performance metrics and AU classification are detailed in Section 4. Discussion and a summary conclude the paper (Section 5). II. DATASETS First, we describe the datasets (Section II.A-C). We then report findings with respect to skew for each AU (Section II.D). In our simulations we used three major databases that include both posed and spontaneous facial expression and differ in database size, type of FACS coding, and degree of skew. The databases were Cohn-Kanade, RU-FACS and UNBC-McMaster Pain Archive. occasions. The distribution has 2 video sequences with frames from 25 participants. All of the frames were FACS coded for 2 AU by certified FACS coders and have frame level pain scores, sequence-level self-report, and observer measures. Facial landmarks (66-point mesh) were tracked using person-specific active appearance models [28]. C. RU-FACS Database The version of RU-FACS available to us consisted of unscripted (i.e., spontaneous) facial behavior from 34 participants. Participants had been randomly assigned to either lie or tell the truth about an issue for which they had strong feelings. The scenario involves natural interaction with another person. AUs were manually coded for each video frame. Video from five participants had to be excluded due to excessive noise in the digitized video. Thus, video from 29 participants was used. Facial landmarks were tracked using a 68-point mesh using same AAM implementation [3]. D. Imbalance in the Datasets Action unit classification is a typical two-class problem. The positive class is the given action unit that we want to detect, and the negative class contains all of the other examples. Unless databases have been contrived to minimize skew, skew is quite common. Most facial actions have relatively low rates of occurrence. Smile controls, actions that counteract the upward pull of a smile (e.g., AU 4 or AU Table I DATABASE STATISTICS. FOR MORE DETAILS, SEE TEXT. A. Cohn-Kanade Extended The Cohn-Kanade Extended Facial Expression (CK+) Database [2] is an extension of the original Cohn-Kanade Database [8]. Cohn-Kanade has been widely used to compare the performance of different methods of automated facial expression analysis. CK+ includes 593 frontal image sequences of directed facial action tasks (i.e., posed AU and AU combinations) performed by 23 different participants. Facial landmarks (68-point mesh) were tracked using person-specific active appearance models [28]. Twentyseven action units were manually coded for presence or absence by certified FACS coders. For a subset of 8 sequences, the seven universal emotion expressions (anger, contempt, disgust, fear, happy, sad and surprise) plus neutral were labeled. We used all 593 sequences for the current study. B. McMaster Pain Archive The Pain Archive [22] consists of facial expressions of 29 participants who were suffering from shoulder pain. The participants performed different active and passive motion tests with their affected and unaffected limbs on two separate
3 5), occur less than 3% of the time even in a highly social context [25]. Thus, for action unit detection, the number of positive training examples will often be small, which can result in large imbalance between the positive and negative examples. While skew in training sets can be adjusted by under-sampling negative cases, skew in test sets remains. The imbalance of this type of data can be defined by the skew ratio between the classes: = negative examples positive examples, () Table I shows the skew ratios of action units from the three datasets. In the small, posed CK+ dataset, the average skew ratio is around 3. In the case of larger, spontaneous datasets the skew ratio is even more extreme: about 6 in the Pain Archive and over 8 in RU-FACS. III. METHODS We tuned high precision shape-based AU classifiers in each dataset. Details of the methods are presented in Section III.A. To evaluate the effect of skew on the classifiers, we used a broad range of both threshold and rank metrics. These are described in Section III.B. In Section III.C we describe random sampling methods to balance the distribution of the testing partition of the datasets. A. Training AU Classifiers Our method contains two main steps. First, we estimate 3D landmark positions on face images using a 2D/3D AAM method [23]. We describe the details of this technique in Section III-A. Second, we remove the rigid transformation from the acquired 3D shape and perform an SVM-based binary classification on it using the different AUs as the class labels. We show this method in Section III-A2 and III-A3. ) Active Appearance Models: As noted above, each of the datasets had been tracked using person-specific AAM. AAMs are generative parametric models for face alignment. A 3D shape model is defined by a 3D mesh and in particular the 3D vertex locations of the mesh, called landmark points. Consider the 3D shape as the coordinates of 3D vertices that make up the mesh: x = (x,y,z,...,x M,y M,z M ) T, (2) or, x = (x,...,x M ) T, where x i = (x i,y i,z i ) T. We have T samples: {x(t)} T t=. We assume that apart from scale, rotation, and translation all samples {x(t)} T t= can be approximated by means of the linear principal component analysis (PCA). The interested reader is referred to [23] for the details of the 2D/3D AAM algorithm. 2) Extracted Features: To register face images, 3D structure from motion first was estimated using the method of Xiao et al. [28]. We then extracted the normalized 3D shape parameters by removing the rigid transformation. Next, we performed a personal mean shape normalization [7]. We calculated an average shape for each subject (the so called personal mean shape) and computed the differences between the features of the actual shape and the features of the personal mean shape. This step removes within-person variation. 3) Support Vector Machine for AU Detection: After extracting the normalized 3D shape, we performed an SVMbased binary-class classification using each AU in turn as the positive class labels. Negative labels were all other AU. Support Vector Machines (SVMs) are powerful for binary and multi-class classification as well as for regression problems. They are robust against outliers []. For two-class separation, SVM estimates the optimal separating hyperplane between the two classes by maximizing the margin between the hyper-plane and closest points of the classes. The closest points of the classes are called support vectors. They determine the optimal separating hyper-plane, which lies at half distance between them. We are given sample and label pairs (x (k),y (k) ) with x (k) R m, y (k) {,}, and k =,...,K. Here, for class (class 2) y (k) = (y (k) = ). Assume further that we have a set of feature vectors φ(= [φ ;...;φ M ]) : R m R M, where M might be infinite. The support vector classification seeks to minimize the cost function min w,b,ξ 2 wt w+c K ξ i (3) i= y (k) (w T φ(x (k) )+b) ξ i,ξ i. (4) We used binary-class classification for each AU, where the positive class contains all shapes labelled by the given AU, and the negative class contains every other shapes. In all cases, we used only linear classifiers and also varied the regularization parameter by factors of ten from 4 to 2. B. Performance Metrics In a binary classification problem the labels are either positive or negative. The decision made by the classifier can be represented as a 2 2 confusion matrix. The matrix has four categories: True positives (TP) are examples correctly labeled as positives. False positives (FP) refer to negative examples incorrectly labeled as positive. True negatives (TN) correspond to negatives correctly labeled as negative and false negatives (FN) refer to positive examples incorrectly labeled as negative. Using these categories we can derive two performance metrics: the precision (P = TP TP+FP ) and the recall (R = TP TP+FN ) values of the classifier. Precision is
4 the fraction of recognized instances that are relevant, while recall is the fraction of relevant instances that are retrieved. For the comparison we used both threshold metrics (Accuracy, F -score, Cohen s kappa, and Krippendorf s alpha) and rank metrics (area under the ROC curve and precisionrecall curve). ) Threshold Metrics: The threshold metrics used in this paper are Accuracy, F -score, Cohen s kappa, and Krippendorf s alpha. These metrics have a threshold level, where examples above the threshold are predicted as positive and the rest as negative. For these metrics, it is not important how close a prediction is to the level, only if it is above or below threshold. Accuracy is the percentage of the correctly classified positive and negative examples: Acc = TP +TN TP +FP +TN +FN. (5) Accuracy is a widely used metric for measuring the performance of a classifier, however, when the prior probabilities of the classes are very different, this metric can be misleading. A better choice is F -score, which can be interpreted as a weighted average of the precision and recall values: F = 2 P R P +R Cohen s kappa is a coefficient developed to measure agreement among observers [6]. It shows the observed agreement normalized to the agreement by chance: (6) K = P Obs P Chance P Chance. (7) Krippendorff s α-reliability measures the observed disagreement normalized to the observed disagreement [9], [2]: α = D Obs D Chance. (8) 2) Rank Metrics: The rank metrics depend only on the ordering of the cases, not the actual predicted values. As long as ordering is preserved, it makes no difference whether predicted values fall between different intervals. These metrics measure how well the positive cases are ordered before negative cases and can be viewed as a summary of model performance across all possible thresholds. The rank metrics we use are area under the ROC curve (AUC-ROC) and area under Precision-Recall curve (AUC-PR). The ROC curve depicts the true positive rate as the function of the false positive rate, while the Precision-Recall curve shows the precision as the function of recall. Recall is the same as TPR, whereas Precision measures that fraction of examples classified as positive that are truly positive. C. Normalization using Random Sampling Different forms of re-sampling such as random over- and under-sampling can be used to balance the skewed distribution of the test partitions of the dataset before calculating the performance metrics. Random under-sampling tries to balance the class distribution through the random elimination of majority class examples. The major drawback of random under-sampling is that this method can discard examples that could be important for the performance metric. In this paper we used random under-sampling with averaging: first, we under-sample the majority class, then calculate the performance metrics. We repeat the process in the function of the skew present in the data. IV. EXPERIMENTS We executed a number of evaluations to judge the influence of the skewed distributions on the performance metrics. Studies concern (i) simulated classifiers with given relative misclassification rates, (ii) the effect of the skewed distributions on performance scores using different databases for AU classification. A. Experiments on Simulated Classifiers In this experiment we simulated binary classifiers with different properties to understand the effect of the skew on the performance metrics better. The classifiers were different in the relative misclassification rate: a fixed percentage of the positive (and negative) examples were misclassified in proportion to the number of positive (and negative) examples. For example, in the case of the positive examples were labelled as false negatives (FN), and of the negative examples were labelled as false positives (FP). In the case of the threshold metrics, the score was calculated from confusion matrices, while the rank metrics were calculated by drawing random samples from Gaussian distribution representing the decision values of the classifiers. Fig. depicts the different metric scores in the function of the skew ratio. = represents a fully balanced dataset, > shows that the negative samples are the majority, and the < values represent positive sample dominance in the distribution. With the exception of area under the ROC curve, all metrics are attenuated by skewed distributions. Alpha and kappa are affected by skew in either direction; whereas F - score is affected by skew in one direction only. Random performance in the alpha and kappa spaces is equivalent with the value, but in the F -space it changes as a function of skew: in the balanced case ( = ) is associated with.5 score and drops exponentially as skew increases. It is important to note, that even the best (% error rate) classifier s performance drops significantly in the high skew ratio part of the graph ( = 5). This imbalance range
5 Accuracy.6.4 % % Cohen s kappa.6.4 % % AUC ROC.6.4 % % (a) (b) (c) F score % % (d) Krippendorff s alpha % % (e) AUC PR % % (f) Negative examples Figure. The behaviour of different metrics using simulated classifiers. The horizontal axis depicts the skew ratio ( = Positive examples, while the vertical axis shows the given metric score. The metrics are (a): Accuracy, (b): Cohen s kappa, (c) Area Under ROC, (d) F score, (e) Krippendorff s alpha, (f) Area Under PR Curve. The different lines show the relative misclassification rates of the simulated classifiers. is equal or even below the skew ratio present in spontaneous facial behaviour datasets (see Table I). B. Experiments on Real data In this experiment we studied the effect of skewed AU distributions on the CK+, McMaster Pain Archive and RU- FACS. In the case of CK+ dataset we used leave-one-subjectout cross-validation to maximize the data available in the database. In the RU-FACS and Pain dataset for each AU we divided the data into a training and testing set in a way that the skew ratio of the two sets was similar. We calculated F score, kappa, alpha measures and area under ROC and PR curves. Tables II - III show these measures in the columns labelled original. To proceed, we repeated the same procedure, but this time we balanced the distribution of the classes in the testing set using random under-sampling and averaging. The performance scores are depicted in the normalized columns of Tables II - III. From the results, we can draw several observations as follows. First of all, by examining the scores in the imbalanced case of the CK+ dataset, we found that these performances are similar to other shape based methods in the literature [5], [7]. Second, by comparing the skew normalized results to the imbalanced ones, we noticed that (except the area under ROC curve) all scores improved. The average F score increased from.45 to.77 in the case of CK+, from 3 to.68 in the case of RU-FACS and from.7 to.65 on the Pain data. The difference between the scores is the smallest in the case of the CK+ data, because this is the smallest dataset with the smallest skew ratio (around 2) among the three. The improvement is smaller in kappa and alpha: these measures are somewhat more strict and a bit tolerant to the prior distributions of the classes. The differences in the case of the area under PR curve are comparable to the F score improvements. Third, while ROC was unaffected by skew, the precisionrecall curves suggest that ROC may mask poor performance in some cases. V. DISCUSSION AND SUMMARY In the present work, we addressed the question how do imbalanced datasets influence performance metrics. We conducted studies using three major databases that include both posed and spontaneous facial expression and differ in database size, type of FACS coding, and degree of imbalance. The databases were Cohn-Kanade, RU-FACS, and McMaster Pain Archive. We included metrics used in
6 facial behaviour analysis plus some others: we included both threshold metrics (Accuracy, F -score, Cohen s kappa, and Krippendorf s alpha) and rank metrics (area under the ROC curve and precision-recall curve). We used a variety of evaluations to study the influence of imbalanced distribution on performance metrics. We used simulated classifiers and binary SVMs trained on expert annotated datasets as well. We discovered that with exception of area under the ROC curve, all performance metrics were attenuated by imbalanced distributions; in many cases, dramatically so. Alpha and kappa measures were affected by skew in either direction; whereas F -score was affected by skew only in one direction. While ROC was unaffected by skew, precision-recall curves suggest that ROC may mask poor performance. Metrics of classifier performance may reveal more about skew than they do about actual performance. Databases that are otherwise identical with respect to intensity of action units, head pose, and so on may give rise to very different metric values depending only on differences in skew. To avoid or minimize biased estimates of performance metrics, we recommend that investigators report both obtained performance metrics and skew-normalized scores. Alternatively, report both the obtained scores and the degree of skew in databases. In these ways, classifiers can be compared across Table III PERFORMANCE SCORES ON COHN-KANADE EXTENDED. Code to compute skew-normalized scores for all of the metrics considered above and visualizations is available from jeffcohn/skew/ Table II PERFORMANCE SCORES FOR THE ORIGINAL AND THE = NORMALIZED VERSION OF UNBC-MCMASTER PAIN ARCHIVE AND RU-FACS.
7 databases free of confounds introduced by skew. VI. ACKNOWLEDGMENTS Research reported in this publication was supported in part by the National Institute of Mental Health of the National Institutes of Health under Award Number MH9695. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. REFERENCES [] S. Abe: Support vector machines for pattern classification. Springer, (2) 3 [2] R. Akbani, S. Kwek, N. Japkowicz. Applying support vector machines to imbalanced datasets. Machine Learning: ECML 24. Springer Berlin Heidelberg, (24) [3] A. B. Ashraf, S. Lucey, J. F. Cohn, T. Chen, Z. Ambadar, K. M. Prkachin, P. E. Solomon: The painful face Pain expression recognition using active appearance models, Image and Vision Computing 27(2), (29) 2 [4] N. V. Chawla, N. Japkowicz, A. Kotcz. Editorial: special issue on learning from imbalanced data sets. SIGKDD Explor. Newsl. 6, (June 24), -6. (24) [5] S. W. Chew, P. J. Lucey, S. Lucey, J. Saragih, J. F. Cohn S. Sridharan. Person-independent facial expression detection using constrained local models. In Proceedings of FG 2 Facial Expression Recognition and Analysis Challenge, Santa Barbara, CA, (2) 5 [6] J. Cohen, A coefficient of agreement for nominal scales. Educational and psychological measurement 2(), (96) 4 [7] J. F. Cohn and F. De la Torre. Automated face analysis for affective computing. In Handbook of Affective Computing, R. A. Calvo, S. K. D Mello, J. Gratch, and A. Kappas, Eds., ed New York, NY: Oxford, In press. [8] T. Eitrich, B. Lang. Efficient optimization of support vector machine learning parameters for unbalanced datasets. Journal of computational and applied mathematics 96(2) (26) [9] P. Ekman, W. Friesen: Facial action coding system: A technique for the measurement of facial movement. Consulting Psychologists Press, Palo Alto (978) [] P. Ekman, W. Friesen, J. Hager: Facial action coding system: Research nexus. Network Research Information, Salt Lake City, UT (22) [] T. Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, 26. Elsevier Science Inc., (26) [2] C. Ferri, J. Hernndez-Orallo, R. Modroiu. An experimental comparison of performance measures for classification. Pattern Recognition Letters 3() (29) [3] M. Frank, J. Movellan, M. Bartlett, G. Littleworth. RU-FACS- database, Machine Perception Laboratory, U.C. San Diego [4] V. Garcia, R. A. Mollineda, J. S. Sanchez. Index of balanced accuracy: A performance measure for skewed class distributions. Pattern Recognition and Image Analysis. Springer Berlin Heidelberg, (29) [5] V. Garcia, R. A. Mollineda, J. S. Sanchez. Theoretical analysis of a performance measure for imbalanced data. Pattern Recognition (ICPR), 2 2th International Conference on. IEEE, (2) [6] H. He, E. A. Garcia. Learning from imbalanced data. Knowledge and Data Engineering, IEEE Transactions on 2(9), (29) [7] L. A. Jeni, A. Lorincz, T. Nagy, Zs. Palotai, J. Sebok, Z. Szabo, D. Takacs, 3D shape estimation in video sequences provides high precision evaluation of facial expressions, Image and Vision Computing, 3(), February (22) 3, 5 [8] T. Kanade, J. F. Cohn, Y. Tian: Comprehensive database for facial expression analysis. Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition (FG ), Grenoble, France, (2) 2 [9] K. Krippendorff. Estimating the reliability, systematic error and random error of interval data. Educational and Psychological Measurement, 3, 6-7. (97) 4 [2] K. Krippendorff. Content analysis: An introduction to its methodology (2nd ed.). Thousand Oaks, CA: Sage (24) 4 [2] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion specified expression, In: 3rd IEEE Workshop on CVPR for Human Communicative Behavior Analysis (2), 2 [22] P. Lucey, J. F. Cohn, K. M. Prkachin, P. E. Solomon, I. Matthews. Painful data: The UNBC-McMaster shoulder pain expression archive database. IEEE International Conference on Automatic Face and Gesture Recognition and Workshops (FG 2), 57 64, 2-25 March (2), 2 [23] I. Matthews and S. Baker. Active appearance models revisited. Int. J. Comp. Vision, 6(2): 35 64, (24) 3 [24] R. Ranawana, V. Palade. Optimized Precision-A new measure for classifier performance evaluation. Evolutionary Computation, 26. CEC 26. IEEE Congress on. IEEE, (26) [25] M. A. Sayette, K. G. Creswell, J. D. Dimoff, C. E. Fairbairn, J. F. Cohn, B. W. Heckman, T. R. Kirchner, J. M. Levine, R. L. Moreland: Alcohol and group formation: A multimodal investigation of the effects of alcohol on emotion and social bonding. Psychological Science 23(8) (22) 3 [26] Y. Tang, Y.-Q. Zhang, N.V. Chawla, S. Krasser. SVMs modeling for highly imbalanced classification. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on 39() (29) [27] J. Van Hulse, T. M. Khoshgoftaar, A. Napolitano. Experimental perspectives on learning from imbalanced data. Proceedings of the 24th international conference on Machine learning. ACM, (27) [28] J. Xiao, S. Baker, I. Matthews, T. Kanade: Real-time combined 2D+3D active appearance models. Proceedings of the 24 IEEE computer society conference on Computer vision and pattern recognition, Washington, D.C., USA, , (24) 2, 3 [29] Z. Zeng, M. Pantic, G. I. Roisman, and T. S. Huang. A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 3(), (29)
E-commerce Transaction Anomaly Classification
E-commerce Transaction Anomaly Classification Minyong Lee [email protected] Seunghee Ham [email protected] Qiyi Jiang [email protected] I. INTRODUCTION Due to the increasing popularity of e-commerce
Evaluation & Validation: Credibility: Evaluating what has been learned
Evaluation & Validation: Credibility: Evaluating what has been learned How predictive is a learned model? How can we evaluate a model Test the model Statistical tests Considerations in evaluating a Model
How To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
Performance Metrics for Graph Mining Tasks
Performance Metrics for Graph Mining Tasks 1 Outline Introduction to Performance Metrics Supervised Learning Performance Metrics Unsupervised Learning Performance Metrics Optimizing Metrics Statistical
Open-Set Face Recognition-based Visitor Interface System
Open-Set Face Recognition-based Visitor Interface System Hazım K. Ekenel, Lorant Szasz-Toth, and Rainer Stiefelhagen Computer Science Department, Universität Karlsruhe (TH) Am Fasanengarten 5, Karlsruhe
Using Random Forest to Learn Imbalanced Data
Using Random Forest to Learn Imbalanced Data Chao Chen, [email protected] Department of Statistics,UC Berkeley Andy Liaw, andy [email protected] Biometrics Research,Merck Research Labs Leo Breiman,
Subspace Analysis and Optimization for AAM Based Face Alignment
Subspace Analysis and Optimization for AAM Based Face Alignment Ming Zhao Chun Chen College of Computer Science Zhejiang University Hangzhou, 310027, P.R.China [email protected] Stan Z. Li Microsoft
Face Model Fitting on Low Resolution Images
Face Model Fitting on Low Resolution Images Xiaoming Liu Peter H. Tu Frederick W. Wheeler Visualization and Computer Vision Lab General Electric Global Research Center Niskayuna, NY, 1239, USA {liux,tu,wheeler}@research.ge.com
Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm
Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm Martin Hlosta, Rostislav Stríž, Jan Kupčík, Jaroslav Zendulka, and Tomáš Hruška A. Imbalanced Data Classification
Choosing the Best Classification Performance Metric for Wrapper-based Software Metric Selection for Defect Prediction
Choosing the Best Classification Performance Metric for Wrapper-based Software Metric Selection for Defect Prediction Huanjing Wang Western Kentucky University [email protected] Taghi M. Khoshgoftaar
Document Image Retrieval using Signatures as Queries
Document Image Retrieval using Signatures as Queries Sargur N. Srihari, Shravya Shetty, Siyuan Chen, Harish Srinivasan, Chen Huang CEDAR, University at Buffalo(SUNY) Amherst, New York 14228 Gady Agam and
Random Forest Based Imbalanced Data Cleaning and Classification
Random Forest Based Imbalanced Data Cleaning and Classification Jie Gu Software School of Tsinghua University, China Abstract. The given task of PAKDD 2007 data mining competition is a typical problem
Azure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
Addressing the Class Imbalance Problem in Medical Datasets
Addressing the Class Imbalance Problem in Medical Datasets M. Mostafizur Rahman and D. N. Davis the size of the training set is significantly increased [5]. If the time taken to resample is not considered,
The face conveys information
[social SCIENCES] Jeffrey F. Cohn Advances in Behavioral Science Using Automated Facial Image Analysis and Synthesis The face conveys information about a person s age, sex, background, and identity; what
Overview. Evaluation Connectionist and Statistical Language Processing. Test and Validation Set. Training and Test Set
Overview Evaluation Connectionist and Statistical Language Processing Frank Keller [email protected] Computerlinguistik Universität des Saarlandes training set, validation set, test set holdout, stratification
How To Solve The Class Imbalance Problem In Data Mining
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART C: APPLICATIONS AND REVIEWS 1 A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches Mikel Galar,
Interactive person re-identification in TV series
Interactive person re-identification in TV series Mika Fischer Hazım Kemal Ekenel Rainer Stiefelhagen CV:HCI lab, Karlsruhe Institute of Technology Adenauerring 2, 76131 Karlsruhe, Germany E-mail: {mika.fischer,ekenel,rainer.stiefelhagen}@kit.edu
Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
Cross-Validation. Synonyms Rotation estimation
Comp. by: BVijayalakshmiGalleys0000875816 Date:6/11/08 Time:19:52:53 Stage:First Proof C PAYAM REFAEILZADEH, LEI TANG, HUAN LIU Arizona State University Synonyms Rotation estimation Definition is a statistical
Predicting Ad Liking and Purchase Intent: Large-scale Analysis of Facial Responses to Ads
1 Predicting Ad Liking and Purchase Intent: Large-scale Analysis of Facial Responses to Ads Daniel McDuff, Student Member, IEEE, Rana El Kaliouby, Jeffrey F. Cohn, and Rosalind Picard, Fellow, IEEE Abstract
Experiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
Statistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
Analyzing PETs on Imbalanced Datasets When Training and Testing Class Distributions Differ
Analyzing PETs on Imbalanced Datasets When Training and Testing Class Distributions Differ David Cieslak and Nitesh Chawla University of Notre Dame, Notre Dame IN 46556, USA {dcieslak,nchawla}@cse.nd.edu
The Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
Supervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
SVM Ensemble Model for Investment Prediction
19 SVM Ensemble Model for Investment Prediction Chandra J, Assistant Professor, Department of Computer Science, Christ University, Bangalore Siji T. Mathew, Research Scholar, Christ University, Dept of
A fast multi-class SVM learning method for huge databases
www.ijcsi.org 544 A fast multi-class SVM learning method for huge databases Djeffal Abdelhamid 1, Babahenini Mohamed Chaouki 2 and Taleb-Ahmed Abdelmalik 3 1,2 Computer science department, LESIA Laboratory,
Chapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
Performance Measures for Machine Learning
Performance Measures for Machine Learning 1 Performance Measures Accuracy Weighted (Cost-Sensitive) Accuracy Lift Precision/Recall F Break Even Point ROC ROC Area 2 Accuracy Target: 0/1, -1/+1, True/False,
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Lecture 15 - ROC, AUC & Lift Tom Kelsey School of Computer Science University of St Andrews http://tom.home.cs.st-andrews.ac.uk [email protected] Tom Kelsey ID5059-17-AUC
Discovering Criminal Behavior by Ranking Intelligence Data
UNIVERSITY OF AMSTERDAM Faculty of Science Discovering Criminal Behavior by Ranking Intelligence Data by 5889081 A thesis submitted in partial fulfillment for the degree of Master of Science in the field
Consolidated Tree Classifier Learning in a Car Insurance Fraud Detection Domain with Class Imbalance
Consolidated Tree Classifier Learning in a Car Insurance Fraud Detection Domain with Class Imbalance Jesús M. Pérez, Javier Muguerza, Olatz Arbelaitz, Ibai Gurrutxaga, and José I. Martín Dept. of Computer
Mining the Software Change Repository of a Legacy Telephony System
Mining the Software Change Repository of a Legacy Telephony System Jelber Sayyad Shirabad, Timothy C. Lethbridge, Stan Matwin School of Information Technology and Engineering University of Ottawa, Ottawa,
Making Sense of the Mayhem: Machine Learning and March Madness
Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University [email protected] [email protected] I. Introduction III. Model The goal of our research
Introduction to Support Vector Machines. Colin Campbell, Bristol University
Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.
Learning on the Border: Active Learning in Imbalanced Data Classification
Learning on the Border: Active Learning in Imbalanced Data Classification Şeyda Ertekin 1, Jian Huang 2, Léon Bottou 3, C. Lee Giles 2,1 1 Department of Computer Science and Engineering 2 College of Information
ClusterOSS: a new undersampling method for imbalanced learning
1 ClusterOSS: a new undersampling method for imbalanced learning Victor H Barella, Eduardo P Costa, and André C P L F Carvalho, Abstract A dataset is said to be imbalanced when its classes are disproportionately
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Mining Life Insurance Data for Customer Attrition Analysis
Mining Life Insurance Data for Customer Attrition Analysis T. L. Oshini Goonetilleke Informatics Institute of Technology/Department of Computing, Colombo, Sri Lanka Email: [email protected] H. A. Caldera
1816 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 7, JULY 2006. Principal Components Null Space Analysis for Image and Video Classification
1816 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 7, JULY 2006 Principal Components Null Space Analysis for Image and Video Classification Namrata Vaswani, Member, IEEE, and Rama Chellappa, Fellow,
FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS
FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS Breno C. Costa, Bruno. L. A. Alberto, André M. Portela, W. Maduro, Esdras O. Eler PDITec, Belo Horizonte,
Emotion Detection from Speech
Emotion Detection from Speech 1. Introduction Although emotion detection from speech is a relatively new field of research, it has many potential applications. In human-computer or human-human interaction
Selecting Data Mining Model for Web Advertising in Virtual Communities
Selecting Data Mining for Web Advertising in Virtual Communities Jerzy Surma Faculty of Business Administration Warsaw School of Economics Warsaw, Poland e-mail: [email protected] Mariusz Łapczyński
The Role of Size Normalization on the Recognition Rate of Handwritten Numerals
The Role of Size Normalization on the Recognition Rate of Handwritten Numerals Chun Lei He, Ping Zhang, Jianxiong Dong, Ching Y. Suen, Tien D. Bui Centre for Pattern Recognition and Machine Intelligence,
South Carolina College- and Career-Ready (SCCCR) Probability and Statistics
South Carolina College- and Career-Ready (SCCCR) Probability and Statistics South Carolina College- and Career-Ready Mathematical Process Standards The South Carolina College- and Career-Ready (SCCCR)
Active Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
Statistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees
Statistical Data Mining Practical Assignment 3 Discriminant Analysis and Decision Trees In this practical we discuss linear and quadratic discriminant analysis and tree-based classification techniques.
Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
A Hybrid Approach to Learn with Imbalanced Classes using Evolutionary Algorithms
Proceedings of the International Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE 2009 30 June, 1 3 July 2009. A Hybrid Approach to Learn with Imbalanced Classes using
Naive-Deep Face Recognition: Touching the Limit of LFW Benchmark or Not?
Naive-Deep Face Recognition: Touching the Limit of LFW Benchmark or Not? Erjin Zhou [email protected] Zhimin Cao [email protected] Qi Yin [email protected] Abstract Face recognition performance improves rapidly
CLASS imbalance learning refers to a type of classification
IEEE TRANSACTIONS ON SYSTEMS, MAN AND CYBERNETICS, PART B Multi-Class Imbalance Problems: Analysis and Potential Solutions Shuo Wang, Member, IEEE, and Xin Yao, Fellow, IEEE Abstract Class imbalance problems
Equity forecast: Predicting long term stock price movement using machine learning
Equity forecast: Predicting long term stock price movement using machine learning Nikola Milosevic School of Computer Science, University of Manchester, UK [email protected] Abstract Long
Simple and efficient online algorithms for real world applications
Simple and efficient online algorithms for real world applications Università degli Studi di Milano Milano, Italy Talk @ Centro de Visión por Computador Something about me PhD in Robotics at LIRA-Lab,
An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework
An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework Jakrarin Therdphapiyanak Dept. of Computer Engineering Chulalongkorn University
PASSIVE DRIVER GAZE TRACKING WITH ACTIVE APPEARANCE MODELS
PASSIVE DRIVER GAZE TRACKING WITH ACTIVE APPEARANCE MODELS Takahiro Ishikawa Research Laboratories, DENSO CORPORATION Nisshin, Aichi, Japan Tel: +81 (561) 75-1616, Fax: +81 (561) 75-1193 Email: [email protected]
Data Mining Practical Machine Learning Tools and Techniques
Credibility: Evaluating what s been learned Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 5 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Issues: training, testing,
AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM
AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM ABSTRACT Luis Alexandre Rodrigues and Nizam Omar Department of Electrical Engineering, Mackenzie Presbiterian University, Brazil, São Paulo [email protected],[email protected]
Class Imbalance Problem in Data Mining: Review
Class Imbalance Problem in Data Mining: Review 1 Mr.Rushi Longadge, 2 Ms. Snehlata S. Dongre, 3 Dr. Latesh Malik 1 Department of Computer Science and Engineering G. H. Raisoni College of Engineering Nagpur,
Decision Support Systems
Decision Support Systems 50 (2011) 602 613 Contents lists available at ScienceDirect Decision Support Systems journal homepage: www.elsevier.com/locate/dss Data mining for credit card fraud: A comparative
Classification of Bad Accounts in Credit Card Industry
Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition
ROC Graphs: Notes and Practical Considerations for Data Mining Researchers
ROC Graphs: Notes and Practical Considerations for Data Mining Researchers Tom Fawcett Intelligent Enterprise Technologies Laboratory HP Laboratories Palo Alto HPL-23-4 January 7 th, 23* E-mail: [email protected]
Intrusion Detection via Machine Learning for SCADA System Protection
Intrusion Detection via Machine Learning for SCADA System Protection S.L.P. Yasakethu Department of Computing, University of Surrey, Guildford, GU2 7XH, UK. [email protected] J. Jiang Department
A Two-Pass Statistical Approach for Automatic Personalized Spam Filtering
A Two-Pass Statistical Approach for Automatic Personalized Spam Filtering Khurum Nazir Junejo, Mirza Muhammad Yousaf, and Asim Karim Dept. of Computer Science, Lahore University of Management Sciences
Distances, Clustering, and Classification. Heatmaps
Distances, Clustering, and Classification Heatmaps 1 Distance Clustering organizes things that are close into groups What does it mean for two genes to be close? What does it mean for two samples to be
Component Ordering in Independent Component Analysis Based on Data Power
Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals
The Relationship Between Precision-Recall and ROC Curves
Jesse Davis [email protected] Mark Goadrich [email protected] Department of Computer Sciences and Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, 2 West Dayton Street,
MHI3000 Big Data Analytics for Health Care Final Project Report
MHI3000 Big Data Analytics for Health Care Final Project Report Zhongtian Fred Qiu (1002274530) http://gallery.azureml.net/details/81ddb2ab137046d4925584b5095ec7aa 1. Data pre-processing The data given
MAXIMIZING RETURN ON DIRECT MARKETING CAMPAIGNS
MAXIMIZING RETURN ON DIRET MARKETING AMPAIGNS IN OMMERIAL BANKING S 229 Project: Final Report Oleksandra Onosova INTRODUTION Recent innovations in cloud computing and unified communications have made a
Classifying Manipulation Primitives from Visual Data
Classifying Manipulation Primitives from Visual Data Sandy Huang and Dylan Hadfield-Menell Abstract One approach to learning from demonstrations in robotics is to make use of a classifier to predict if
Quality and Complexity Measures for Data Linkage and Deduplication
Quality and Complexity Measures for Data Linkage and Deduplication Peter Christen and Karl Goiser Department of Computer Science, Australian National University, Canberra ACT 0200, Australia {peter.christen,karl.goiser}@anu.edu.au
Journal of Industrial Engineering Research. Adaptive sequence of Key Pose Detection for Human Action Recognition
IWNEST PUBLISHER Journal of Industrial Engineering Research (ISSN: 2077-4559) Journal home page: http://www.iwnest.com/aace/ Adaptive sequence of Key Pose Detection for Human Action Recognition 1 T. Sindhu
VEHICLE LOCALISATION AND CLASSIFICATION IN URBAN CCTV STREAMS
VEHICLE LOCALISATION AND CLASSIFICATION IN URBAN CCTV STREAMS Norbert Buch 1, Mark Cracknell 2, James Orwell 1 and Sergio A. Velastin 1 1. Kingston University, Penrhyn Road, Kingston upon Thames, KT1 2EE,
FPGA Implementation of Human Behavior Analysis Using Facial Image
RESEARCH ARTICLE OPEN ACCESS FPGA Implementation of Human Behavior Analysis Using Facial Image A.J Ezhil, K. Adalarasu Department of Electronics & Communication Engineering PSNA College of Engineering
Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier
International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing
W6.B.1. FAQs CS535 BIG DATA W6.B.3. 4. If the distance of the point is additionally less than the tight distance T 2, remove it from the original set
http://wwwcscolostateedu/~cs535 W6B W6B2 CS535 BIG DAA FAQs Please prepare for the last minute rush Store your output files safely Partial score will be given for the output from less than 50GB input Computer
A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions
A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions Jing Gao Wei Fan Jiawei Han Philip S. Yu University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center
Introducing diversity among the models of multi-label classification ensemble
Introducing diversity among the models of multi-label classification ensemble Lena Chekina, Lior Rokach and Bracha Shapira Ben-Gurion University of the Negev Dept. of Information Systems Engineering and
Fraud Detection for Online Retail using Random Forests
Fraud Detection for Online Retail using Random Forests Eric Altendorf, Peter Brende, Josh Daniel, Laurent Lessard Abstract As online commerce becomes more common, fraud is an increasingly important concern.
How To Identify A Churner
2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management
Review of Ensemble Based Classification Algorithms for Nonstationary and Imbalanced Data
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 1, Ver. IX (Feb. 2014), PP 103-107 Review of Ensemble Based Classification Algorithms for Nonstationary
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
Machine Learning Logistic Regression
Machine Learning Logistic Regression Jeff Howbert Introduction to Machine Learning Winter 2012 1 Logistic regression Name is somewhat misleading. Really a technique for classification, not regression.
1. Classification problems
Neural and Evolutionary Computing. Lab 1: Classification problems Machine Learning test data repository Weka data mining platform Introduction Scilab 1. Classification problems The main aim of a classification
Consistent Binary Classification with Generalized Performance Metrics
Consistent Binary Classification with Generalized Performance Metrics Nagarajan Natarajan Joint work with Oluwasanmi Koyejo, Pradeep Ravikumar and Inderjit Dhillon UT Austin Nov 4, 2014 Problem and Motivation
Class Imbalance Learning in Software Defect Prediction
Class Imbalance Learning in Software Defect Prediction Dr. Shuo Wang [email protected] University of Birmingham Research keywords: ensemble learning, class imbalance learning, online learning Shuo Wang
Open Access A Facial Expression Recognition Algorithm Based on Local Binary Pattern and Empirical Mode Decomposition
Send Orders for Reprints to [email protected] The Open Electrical & Electronic Engineering Journal, 2014, 8, 599-604 599 Open Access A Facial Expression Recognition Algorithm Based on Local Binary
IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION
http:// IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION Harinder Kaur 1, Raveen Bajwa 2 1 PG Student., CSE., Baba Banda Singh Bahadur Engg. College, Fatehgarh Sahib, (India) 2 Asstt. Prof.,
ISSN: 2321-7782 (Online) Volume 2, Issue 10, October 2014 International Journal of Advance Research in Computer Science and Management Studies
ISSN: 2321-7782 (Online) Volume 2, Issue 10, October 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
Getting Even More Out of Ensemble Selection
Getting Even More Out of Ensemble Selection Quan Sun Department of Computer Science The University of Waikato Hamilton, New Zealand [email protected] ABSTRACT Ensemble Selection uses forward stepwise
Learning with Skewed Class Distributions
CADERNOS DE COMPUTAÇÃO XX (2003) Learning with Skewed Class Distributions Maria Carolina Monard and Gustavo E.A.P.A. Batista Laboratory of Computational Intelligence LABIC Department of Computer Science
Local features and matching. Image classification & object localization
Overview Instance level search Local features and matching Efficient visual recognition Image classification & object localization Category recognition Image classification: assigning a class label to
KNOWLEDGE-BASED IN MEDICAL DECISION SUPPORT SYSTEM BASED ON SUBJECTIVE INTELLIGENCE
JOURNAL OF MEDICAL INFORMATICS & TECHNOLOGIES Vol. 22/2013, ISSN 1642-6037 medical diagnosis, ontology, subjective intelligence, reasoning, fuzzy rules Hamido FUJITA 1 KNOWLEDGE-BASED IN MEDICAL DECISION
Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05
Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification
