Data Mining Algorithms to Classify Students
|
|
|
- Paul Stevenson
- 10 years ago
- Views:
Transcription
1 Data Mining Algorithms to Classify Students Cristóbal Romero, Sebastián Ventura, Pedro G. Espejo and César Hervás {cromero, sventura, pgonzalez, Computer Science Department, Córdoba University, Spain Abstract. In this paper we compare different data mining methods and techniques for classifying students based on their Moodle usage data and the final marks obtained in their respective courses. We have developed a specific mining tool for making the configuration and execution of data mining techniques easier for instructors. We have used real data from seven Moodle courses with Cordoba University students. We have also applied discretization and rebalance preprocessing techniques on the original numerical data in order to verify if better classifier models are obtained. Finally, we claim that a classifier model appropriate for educational use has to be both accurate and comprehensible for instructors in order to be of use for decision making. 1 Introduction The ability to predict/classify a student s performance is very important in web-based educational environments. A very promising arena to attain this objective is the use of Data Mining (DM) [26]. In fact, one of the most useful DM tasks in e-learning is classification. There are different educational objectives for using classification, such as: to discover potential student groups with similar characteristics and reactions to a particular pedagogical strategy [6], to detect students misuse or game-playing [2], to group students who are hint-driven or failure-driven and find common misconceptions that students possess [34], to identify learners with low motivation and find remedial actions to lower drop-out rates [9], to predict/classify students when using intelligent tutoring systems [16], etc. And there are different types of classification methods and artificial intelligent algorithms that have been applied to predict student outcome, marks or scores. Some examples are: predicting students grades (to classify in five classes: A, B, C, D and E or F) from test scores using neural networks [14]; predicting student academic success (classes that are successful or not) using discriminant function analysis [19]; classifying students using genetic algorithms to predict their final grade [21]; predicting a student s academic success (to classify as low, medium and high risk classes) using different data mining methods [30]; predicting a student s marks (pass and fail classes) using regression techniques in Hellenic Open University data [18] or using neural network models from Moodle logs [11]. In this paper we are going to compare different data mining techniques for classifying students based on both students usage data in a web-based course and the final marks obtained in the course. We have also developed a specific Moodle data mining tool for making this task easier for instructors. The paper is arranged in the following way: Section 2 describes the background of the main classification methods and algorithms; section 3 describes the Moodle data mining tool; section 4 details the comparison of the classification techniques; finally, the conclusions and further research are outlined.
2 2 Background Classification is one of the most frequently studied problems by DM and machine learning (ML) researchers. It consists of predicting the value of a (categorical) attribute (the class) based on the values of other attributes (the predicting attributes). There are different classification methods, such as: - Statistical classification is a procedure in which individual items are placed into groups based on the quantitative information of characteristics inherent in the items (referred to as variables, characters, etc.) and based on a training set of previously labelled items [23]. Some examples of statistical algorithms are linear discriminant analysis [21], least mean square quadratic [27], kernel [21] and k nearest neighbors [21]. - A decision tree is a set of conditions organized in a hierarchical structure [25]. It is a predictive model in which an instance is classified by following the path of satisfied conditions from the root of the tree until reaching a leaf, which will correspond to a class label. A decision tree can easily be converted to a set of classification rules. Some of the most well-known decision tree algorithms are C4.5 [25] and CART [4]. - Rule Induction is an area of machine learning in which IF-THEN production rules are extracted from a set of observations [11]. The algorithms included in this paradigm can be considered as a heuristic state-space search. In rule induction, a state corresponds to a candidate rule and operators correspond to generalization and specialization operations that transform one candidate rule into another. Examples of rule induction algorithms are CN2 [8], AprioriC [17], XCS [32], Supervised Inductive Algorithm (SIA) [31], a genetic algorithm using real-valued genes (Corcoran) [10] and a Grammar-based genetic programming algorithm (GGP) [15]. - Fuzzy rule induction applies fuzzy logic in order to interpret the underlying data linguistically [35]. To describe a fuzzy system completely, a rule base (structure) and fuzzy partitions have to be determined (parameters) for all variables. Some fuzzy rule learning methods are LogitBoost [23], MaxLogitBoost [29], AdaBoost [12], Grammarbased genetic Programming (GP) [28], a hybrid Grammar-based genetic Programming/genetic Algorithm method (GAP) [28], a hybrid Simulated Annealing/genetic Programming algorithm (SAP) [28] and an adaptation of the Wang- Mendel algorithm (Chi) [7]. - Neural Networks can also be used for rule induction. A neural network, also known as a parallel distributed processing network, is a computing paradigm that is loosely modeled after cortical structures in the brain. It consists of interconnected processing elements called nodes or neurons that work together to produce an output function. Examples of neural network algorithms are multilayer perceptron (with conjugate gradient-based training) [22], a radial basis function neural network (RBFN) [5], incremental RBFN [24], decremental RBFN [5], a hybrid Genetic Algorithm Neural Network (GANN) [33] and Neural Network Evolutionary Programming (NNEP) [20].
3 3 Moodle Data Mining Tool We have developed a specific Moodle data mining tool oriented for use by on-line instructors. It has a simple interface (see Figure 1) to facilitate the execution of data mining techniques. We have integrated this tool into the Moodle environment itself. In this way, instructors can both create/maintain courses and carry out all data mining processing with the same interface. Likewise, they can directly apply feedback and results obtained by data mining back into Moodle courses. We have implemented this tool in Java using the KEEL framework [1] which is an open source framework for building data mining models including classification (all the previously described algorithms in Section 2), regression, clustering, pattern mining, and so on. Figure 1. Moodle Data Mining Tool executing C4.5 algorithm. In order to use it, first of all the instructors have to create training and test data files starting from the Moodle database. They can select one or several courses and one Moodle table (mdl_log, mdl_chat, mdl_forum, mdl_quiz, etc.) or create a summary table (see Table 1). Then, data files will be automatically preprocessed and created. Next, they only have to select one of the available mining algorithms and the location of the output directory. For example, in Figure 1, we show the execution of the C4.5 algorithm over a summary file and the decision tree obtained. We can see that the results files (.tra and.test files with partial results and.txt file with the obtained model) appear in a new window (see Figure 1 down in the right hand corner). Finally, instructors can use this model for decision making concerning the suitability of the Moodle activities in each specific course and also to classify new students depending on the course usage data.
4 4 Experimental Results We have carried out some experiments in order to evaluate the performance and usefulness of different classification algorithms for predicting students final marks based on information in the students usage data in an e-learning system. Our objective is to classify students with equal final marks into different groups depending on the activities carried out in a web-based course. We have chosen the data of 438 Cordoba University students in 7 Moodle courses (security and hygienee in the work, projects, engineering firm, programming for enginnering, computer science basis, applied computer science, and scientific programming). Moodle ( is one of the most frequently used free Learning Content Management Systems (LCMS). Moodle keeps detailed logs of all activities that students perform in a data base. Information is available about the use of Moodle activities and resources (assignments, forums and quizzes). We have preprocessed the data in order to transform them into a suitable format to be used by our Moodle data mining tool. First, we have created a new summary table (see Table 1) which integrates the most important information for our objective (Moodle activities and the final marks obtained in the course). Using our Moodle mining tool a particular teacher could select these or other attributes for different courses during the data preprocessing phase. The Table 1 summarises row by row all the activities done by each student in the course (input variables) and the final mark obtained in this course (class). Table 1. Attributes used by each student. Name course n_assigment n_quiz n_quiz_a n_quiz_s n_posts n_read total_time_assignment total_time_quiz total_time_forum mark Description Identification number of the course. Number of assignments done. Number of quizzes taken. Number of quizzes passed. Number of quizzes failed. Number of messages sent to the forum. Number or messages read on the forum. Total time used on assignments. Total time used on quizzes. Total time used on forum. Final mark the student obtained in the course. Secondly, we have discretized all the numerical values of the summary table into a new summarization table. Discretization divides the numerical data into categorical classes that are easier for the teacher to understand. It consists of transforming continuous attributes into discrete attributes that can be treated as categorical attributes. Discretization is also a requirement for some algorithms. We have applied the manual method (in which you have to specify the cut-off points) to the mark attribute. We have used four intervals and labels (FAIL: if value is <5; PASS: if value is >=5 and <7; GOOD: if value is >=7 and <9; and EXCELLENT: if value is >=9). In addition, we have
5 applied the equal-width method [13] to all the other attributes with three intervals and labels (LOW, MEDIUM and HIGH). Then, we have exported both versions of the summary table (with numerical and categorical values) to text files with KEEL format [1]. Next, we have made partitions of whole files (numerical and categorical files) into pairs of training and test files. Each algorithm is evaluated using stratified 10-fold crossvalidation. The dataset is randomly divided into 10 disjointed subsets of equal size in a stratified way (maintaining the original class distribution). In each repetition, one of the 10 subsets is used as the test set and the other 9 subsets are combined to form the training set. In this work we also take into consideration the problem of learning from imbalanced data. We say data is imbalanced when some classes differ significantly from others with respect to the number of instances available. The problem with imbalanced data arises because learning algorithms tend to overlook less frequent classes (minority classes), paying attention just to the most frequent ones (majority classes). As a result, the classifier obtained will not be able to correctly classify data instances corresponding to poorly represented classes. Our data presents a clear imbalance since its distribution is: EXCELLENT 3.89%, GOOD 14.15%, PASS 22.15%, FAIL 59.81%. One of the most frequent methods used to learn from imbalanced data consists of resampling the data, either by over-sampling the minority classes or under-sampling the majority ones, until every class is equally represented [3]. When we deal with balanced data, the quality of the induced classifier is usually measured in terms of classification accuracy, defined as the fraction of correctly classified examples. But accuracy is known to be unsuitable to measure classification performance with imbalanced data. An evaluation measure well suited to imbalanced data is the geometric mean of accuracies per class (g-mean), defined as g mean = n n hitsi ins tan ces i= 1 i, where n is the number of classes, hits i is the number of instances of class i correctly classified and instances i is the number of instances of class i. In our work, we have used random over-sampling, a technique consisting of copying randomly chosen instances of minority classes in the dataset until all classes have the same number of instances, and we use the geometric mean to measure the quality of the induced classifiers. Finally, we have used three sets of 10-fold data files: the original numerical data, the categorical data and the numerical rebalanced data. We have carried out one execution with all the determinist algorithms and 5 executions with the nondeterministic algorithms. In Table 2 we show the global percentage of the accuracy rate and geometric means (the averages of 5 executions for nondeterministic algorithms). We have used the same default parameters for algorithms of the same type (For example, 1000 iterations in evolutionary algorithms and 4 labels in fuzzy algorithms). We have used these 25 specific classification algorithms due to they are implemented in Keel software, but there are some other classification techniquess such as bayesina networks, logistic regression, etc. The global percentage of those correctly classified (global PCC) shows the accuracy of the classifiers (see Table 2). More than half of the algorithms obtain their highest values using original numerical data, and the other algorithms obtain them using the categorical data. This can be due to the nature and implementation of each algorithm which might be more appropriate for using numerical or categorical data. As we have seen above, it is
6 easier to obtain a high accuracy rate when data are imbalanced, but when all the classes have the same number of instances it becomes more difficult to achieve a good classification rate. The best algorithms (with more than 65% global PCC) with original data (numerical) are CART, GAP, GGP and NNEP. The best algorithms (with over 65% global PCC) using categorical data are the two decision tree algorithms: CART and C4.5. The best algorithms (with over 60% global PCC) with balanced data are Corcoran, XCS, AprioriC and MaxLogicBoost. It is also important to note that no algorithm exceeds 70% global percentage of correctly classified results. One possible reason for this is due to the fact that we have used incomplete data, that is, we have used the data of all the students examined although some students who did not do all the course activities did do the final exam. In particular, about 30% of our students have not used the forum or have not done some quizzes. But we have not eliminated these students from the dataset because it shows a real problem about the students usage level of e-learning systems. So, we have used all the data although we know that this fact can affect the accuracy of the classification algorithms. Table 2. Classification results (Global percentage of correctly classified / Geometric Mean). Method Algorithm Numerical data Categorical data Rebalanced data Statistical Classifier ADLinear / / / 0.00 Statistical Classifier PolQuadraticLMS / / / Statistical Classifier Kernel / / / 0.00 Statistical Classifier KNN / / / Decision Tree C / / / 9.37 Decision Tree CART / 39, / 24, / 34,65 Rule Induction AprioriC / / / 0.00 Rule Induction CN / / / Rule Induction Corcoran / / / 0.00 Rule Induction XCS / / / Rule Induction GGP / / / Rule Induction SIA / / / Fuzzy Rule Learning MaxLogitBoost / / / 8.83 Fuzzy Rule Learning SAP / / / 3.20 Fuzzy Rule Learning AdaBoost / / / 0.00 Fuzzy Rule Learning LogitBoost / / / Fuzzy Rule Learning GAP / / / Fuzzy Rule Learning GP / / / Fuzzy Rule Learning Chi / / / Neural Networks NNEP / / / Neural Networks RBFN / / / 4.00 Neural Networks RBFN Incremental / / / Neural Networks RBFN Decremental / / / 8.41 Neural Networks GANN / / / Neural Networks MLPerceptron / / / 17.16
7 The geometric mean tells us about the effect of rebalancing on the performance of the classifiers obtained, since the geometric mean offers us a better view of the classification performance in each of the classes. We can see in Table 2 that the behavior depends to a great extent on the learning algorithm used. There are some algorithms which are not affected by rebalancing (Kernel, KNN, AprioriC, Corcoran, AdaBoost and LogitBoost): the two decision tree methods (CART and C4.5) give worse results with rebalanced data (C4.5) but most of the algorithms (all the rest, 17 out of 25) obtain better results with the rebalanced data. Thus we can see that the rebalancing of the data is generally beneficial for most of the algorithms. We can also see that many algorithms obtain a value of 0 in the geometric mean. This is because some algorithms do not classify any of the students correctly into a specific group. It is interesting to see that it only happens to the group of EXCELLENT students (EXCELLENT students are incorrectly classified as GOOD and PASS students). But in education this is not very dramatic after all since the most important thing is to be able to distinguish perfectly between FAIL students and PASS students (PASS, GOOD and EXCELENT). On the other hand, in our educational problem it is also very important for the classification model obtained to be user friendly, so that teachers can make decisions about some students and the on-line course to improve the students learning. In general, models obtained using categorical data are more comprehensible than when using numerical data because categorical values are easier for a teacher to interpret than precise magnitudes and ranges. Nonetheless, some models are more interpretable than others: - Decision trees are considered easily understood models because a reasoning process can be given for each conclusion. However, if the tree obtained is very large (a lot of nodes and leaves) then they are less comprehensible. A decision tree can be directly transformed into a set of IF-THEN rules that are one of the most popular forms of knowledge representation, due to their simplicity and comprehensibility. So, C4.5 and CART algorithms are simple for instructors to understand and interpret. - Rule induction algorithms are normally also considered to produce comprehensible models because they discover a set of IF-THEN classification rules that are a highlevel knowledge representation and can be used directly for decision making. And some algorithms such as GGP have a higher expressive power allowing the user to determine the specific format of the rules (number of conditions, operators, etc.). - Fuzzy rule algorithms obtain IF-THEN rules that use linguistic terms that make them more comprehensible/interpretable by humans. So, this type of rules is very intuitive and easily understood by problem-domain experts like teachers. - Statistical methods and neural networks are deemed to be less suitable for data mining purposes. This rejection is due to the lack of comprehensibility. Knowledge models obtained under these paradigms are usually considered to be black-box mechanisms, able to attain very good accuracy rates but very difficult for people to understand. However, some of the algorithms of this type obtain models people can understand easily. For example, ADLinear, PolQuadraticLMS, Kernel and NNEP algorithms obtain functions that express the possible strong interactions among the variables.
8 Finally, in our educational problem the final objective of using a classification model is to show the instructor interesting information about student classification (prediction of marks) depending on the usage of Moodle courses. Then, the instructor could use this discovered knowledge for decision making and for classifying new students. Some of the rules discovered show that the number of quizzes passed in Moodle was the main determiner of the final marks, but there are some others that could help the teacher to decide whether to promote the use of some activities to obtain higher marks, or on the contrary, to decide to eliminate some activities because they are related to low marks. It could be also possible for the teacher to detect new students with learning problems in time (students classified as FAIL). The teacher could use the classification model in order to classify new students and detect in time if they will have learning problems (students classified as FAIL) or not (students classified as GOOD or EXCELLENT). 5 Conclusions In this paper we have compared the performance and usefulness of different data mining techniques for classifying students using a Moodle mining tool. We have shown that some algorithms improve their classification performance when we apply such preprocessing tasks as discretization and rebalancing data, but others do not. We have also indicated that a good classifier model has to be both accurate and comprehensible for instructors. In future experiments, we want to measure the compressibility of each classification model and use data with more information about the students (i.e. profile and curriculum) and of higher quality (complete data about students that have done all the course activities). In this way we could measure how the quantity and quality of the data can affect the performance of the algorithms. Finally, we want also test the use of the tool by teachers in real pedagogical situations in order to prove on its acceptability. Acknowledgments. The authors gratefully acknowledge the financial subsidy provided by the Spanish Department of Research under TIN C05-02 projects. FEDER also provided additional funding. References [1] Alcalá, J., Sánchez, L., García, S., del Jesus, M. et. al. KEEL A Software Tool to Assess Evolutionary Algorithms to Data Mining Problems. Soft Computing, [2] Baker, R., Corbett, A., Koedinger, K. Detecting Student Misuse of Intelligent Tutoring Systems. Intelligent Tutoring Systems. Alagoas, pp [3] Barandela, R., Sánchez, J.S., García, V., Rangel, E. Strategies for Learning in Class Imbalance Problems. Pattern Recognition 2003, 36(3), pp [4] Breiman, L. Friedman, J.H., Olshen, R.A., Stone, C.J. Classification and Regression Trees. Chapman & Hall, New York, [5] Broomhead, D.S., Lowe, D. Multivariable Functional Interpolation and Adaptative Networks. Complex Systems 11, 1988, pp
9 [6] Chen, G., Liu, C., Ou, K., Liu, B. Discovering Decision Knowledge from Web Log Portfolio for Managing Classroom Processes by Applying Decision Tree and Data Cube Technology. Journal of Educational Computing Research 2000, 23(3), pp [7] Chi, Z., Yan, H., Pham, T. Fuzzy Algorithms: with Applications to Image Processing and Pattern Recognition. World Scientific, Singapore, [8] Clark, P., Niblett, T. The CN2 Induction Algorithm. Machine Learning 1989, 3(4), pp [9] Cocea, M., Weibelzahl, S. Can Log Files Analysis Estimate Learners Level of Motivation? Workshop on Adaptivity and User Modeling in Interactive Systems, Hildesheim, pp [10] Corcoran, A.L., Sen, S. Using Real-valued Genetic Algorithms to Evolve Rule Sets for Classification. Conference on Evolutionary Computation, Orlando, 1994, pp [11] Delgado, M., Gibaja, E., Pegalajar, M.C., Pérez, O. Predicting Students Marks from Moodle Logs using Neural Network Models. Current Developments in Technology- Assisted Education, Badajoz, pp [12] del Jesus, M.J., Hoffmann, F., Junco, L., Sánchez, L. Induction of Fuzzy-Rule-Based Classifiers With Evolutionary Boosting Algorithms. IEEE Transactions on Fuzzy Systems 2004, 12(3), pp [13] Dougherty, J., Kohavi, M., Sahami, M. Supervised and Unsupervised Discretization of Continuous Features. Conf. on Machine Learning, San Francisco, pp [14] Fausett, L., Elwasif, W. Predicting Performance from Test Scores using Backpropagation and Counterpropagation. IEEE Congress on Computational Intelligence, pp [15] Espejo, P.G., Romero, C., Ventura, S. Hervas, C. Induction of Classification Rules with Grammar-Based Genetic Programming. Conference on Machine Intelligence, pp [16] Hämäläinen, W., Vinni, M. Comparison of machine learning methods for intelligent tutoring systems. Conference Intelligent Tutoring Systems, Taiwan, pp [17] Jovanoski, V., Lavrac, N. Classification Rule Learning with APRIORI-C. In: Progress in Artificial Intelligence, Knowledge Extraction, Multi-agent Systems, Logic Programming and Constraint Solving, pp [18] Kotsiantis, S.B., Pintelas, P.E. Predicting Students Marks in Hellenic Open University. Conference on Advanced Learning Technologies. IEEE, pp [19] Martínez, D. Predicting Student Outcomes Using Discriminant Function Analysis. Annual Meeting of the Research and Planning Group. California, pp
10 [20] Martínez, F.J., Hervás, C., Gutiérrez, P.A., Martínez, A.C., Ventura, S. Evolutionary Product-Unit Neural Networks for Classification. Conference on Intelligent Data Engineering and Automated Learning pp [21] Minaei-Bidgoli, B., Punch, W. Using Genetic Algorithms for Data Mining Optimization in an Educational Web-based System. Genetic and Evolutionary Computation, Part II pp [22] Moller, M.F. A Scaled Conjugate Gradient Algorithm for Fast Supervised Learning. Neural Networks, 1993, 6(4), pp [23] Otero, J., Sánchez, L. Induction of Descriptive Fuzzy Classifiers with the Logitboost Algorithm. Soft Computing 2005, 10(9), pp [24] Platt, J. A Resource Allocating Network for Function Interpolation. Neural Computation, 1991, 3(2), pp [25] Quinlan, J.R. C4.5: Programs for Machine Learning. Morgan Kaufman [26] Romero, C., Ventura, S. Educational Data Mining: a Survey from 1995 to Expert Systems with Applications, 2007, 33(1), pp [27] Rustagi, J.S.: Optimization Techniques in Statistics. Academic Press, [28] Sánchez, L., Couso, I., Corrales, J.A. Combining GP Operators with SA Search to Evolve Fuzzy Rule Based Classifiers. Information Sciences, 2001, 136(1-4), pp [29] Sánchez, L., Otero, J. Boosting Fuzzy Rules in Classification Problems under Single-winner Inference (in press). International Journal of Intelligent Systems [30] Superby, J.F., Vandamme, J.P., Meskens, N. Determination of Factors Influencing the Achievement of the First-year University Students using Data Mining Methods. Workshop on Educational Data Mining, pp [31] Venturini, G. SIA A Supervised Inductive Algorithm with Genetic Search for Learning Attributes based Concepts. Conf. on Machine Learning, pp [32] Wilson, S.W. Classifier Fitness Based on Accuracy. Evolutionary Computation, 1995, 3(2), pp [33] Yao, X. Evolving Artificial Neural Networks. Proceedings of the IEEE, 1999, 87(9), pp [34] Yudelson, M.V., Medvedeva, O., Legowski, E., Castine, M., Jukic, D., Rebecca, C. Mining Student Learning Data to Develop High Level Pedagogic Strategy in a Medical ITS. AAAI Workshop on Educational Data Mining, pp.1-8.
Mining Wiki Usage Data for Predicting Final Grades of Students
Mining Wiki Usage Data for Predicting Final Grades of Students Gökhan Akçapınar, Erdal Coşgun, Arif Altun Hacettepe University [email protected], [email protected], [email protected]
BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL
The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College
DATA MINING APPROACH FOR PREDICTING STUDENT PERFORMANCE
. Economic Review Journal of Economics and Business, Vol. X, Issue 1, May 2012 /// DATA MINING APPROACH FOR PREDICTING STUDENT PERFORMANCE Edin Osmanbegović *, Mirza Suljić ** ABSTRACT Although data mining
Data Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
E-Learning Using Data Mining. Shimaa Abd Elkader Abd Elaal
E-Learning Using Data Mining Shimaa Abd Elkader Abd Elaal -10- E-learning using data mining Shimaa Abd Elkader Abd Elaal Abstract Educational Data Mining (EDM) is the process of converting raw data from
COURSE RECOMMENDER SYSTEM IN E-LEARNING
International Journal of Computer Science and Communication Vol. 3, No. 1, January-June 2012, pp. 159-164 COURSE RECOMMENDER SYSTEM IN E-LEARNING Sunita B Aher 1, Lobo L.M.R.J. 2 1 M.E. (CSE)-II, Walchand
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 [email protected]
Comparison of machine learning methods for intelligent tutoring systems
Comparison of machine learning methods for intelligent tutoring systems Wilhelmiina Hämäläinen 1 and Mikko Vinni 1 Department of Computer Science, University of Joensuu, P.O. Box 111, FI-80101 Joensuu
Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer
Machine Learning Chapter 18, 21 Some material adopted from notes by Chuck Dyer What is learning? Learning denotes changes in a system that... enable a system to do the same task more efficiently the next
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Erkan Er Abstract In this paper, a model for predicting students performance levels is proposed which employs three
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
Experiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
D A T A M I N I N G C L A S S I F I C A T I O N
D A T A M I N I N G C L A S S I F I C A T I O N FABRICIO VOZNIKA LEO NARDO VIA NA INTRODUCTION Nowadays there is huge amount of data being collected and stored in databases everywhere across the globe.
On the effect of data set size on bias and variance in classification learning
On the effect of data set size on bias and variance in classification learning Abstract Damien Brain Geoffrey I Webb School of Computing and Mathematics Deakin University Geelong Vic 3217 With the advent
Chapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION
ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical
How To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
Predicting Students Final GPA Using Decision Trees: A Case Study
Predicting Students Final GPA Using Decision Trees: A Case Study Mashael A. Al-Barrak and Muna Al-Razgan Abstract Educational data mining is the process of applying data mining tools and techniques to
Data Mining. Concepts, Models, Methods, and Algorithms. 2nd Edition
Brochure More information from http://www.researchandmarkets.com/reports/2171322/ Data Mining. Concepts, Models, Methods, and Algorithms. 2nd Edition Description: This book reviews state-of-the-art methodologies
Weather forecast prediction: a Data Mining application
Weather forecast prediction: a Data Mining application Ms. Ashwini Mandale, Mrs. Jadhawar B.A. Assistant professor, Dr.Daulatrao Aher College of engg,karad,[email protected],8407974457 Abstract
Comparison of K-means and Backpropagation Data Mining Algorithms
Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and
Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification
Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification Henrik Boström School of Humanities and Informatics University of Skövde P.O. Box 408, SE-541 28 Skövde
Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin
Data Mining for Customer Service Support Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin Traditional Hotline Services Problem Traditional Customer Service Support (manufacturing)
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA D.Lavanya 1 and Dr.K.Usha Rani 2 1 Research Scholar, Department of Computer Science, Sree Padmavathi Mahila Visvavidyalayam, Tirupati, Andhra Pradesh,
NEURAL NETWORKS IN DATA MINING
NEURAL NETWORKS IN DATA MINING 1 DR. YASHPAL SINGH, 2 ALOK SINGH CHAUHAN 1 Reader, Bundelkhand Institute of Engineering & Technology, Jhansi, India 2 Lecturer, United Institute of Management, Allahabad,
Learning is a very general term denoting the way in which agents:
What is learning? Learning is a very general term denoting the way in which agents: Acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);
Université de Montpellier 2 Hugo Alatrista-Salas : [email protected]
Université de Montpellier 2 Hugo Alatrista-Salas : [email protected] WEKA Gallirallus Zeland) australis : Endemic bird (New Characteristics Waikato university Weka is a collection
Predicting required bandwidth for educational institutes using prediction techniques in data mining (Case Study: Qom Payame Noor University)
260 IJCSNS International Journal of Computer Science and Network Security, VOL.11 No.6, June 2011 Predicting required bandwidth for educational institutes using prediction techniques in data mining (Case
Welcome. Data Mining: Updates in Technologies. Xindong Wu. Colorado School of Mines Golden, Colorado 80401, USA
Welcome Xindong Wu Data Mining: Updates in Technologies Dept of Math and Computer Science Colorado School of Mines Golden, Colorado 80401, USA Email: xwu@ mines.edu Home Page: http://kais.mines.edu/~xwu/
REVIEW OF ENSEMBLE CLASSIFICATION
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IJCSMC, Vol. 2, Issue.
Decision Tree Learning on Very Large Data Sets
Decision Tree Learning on Very Large Data Sets Lawrence O. Hall Nitesh Chawla and Kevin W. Bowyer Department of Computer Science and Engineering ENB 8 University of South Florida 4202 E. Fowler Ave. Tampa
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Data Mining: Overview. What is Data Mining?
Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,
Data Mining: A Preprocessing Engine
Journal of Computer Science 2 (9): 735-739, 2006 ISSN 1549-3636 2005 Science Publications Data Mining: A Preprocessing Engine Luai Al Shalabi, Zyad Shaaban and Basel Kasasbeh Applied Science University,
Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods
Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods Jerzy B laszczyński 1, Krzysztof Dembczyński 1, Wojciech Kot lowski 1, and Mariusz Paw lowski 2 1 Institute of Computing
An innovative application of a constrained-syntax genetic programming system to the problem of predicting survival of patients
An innovative application of a constrained-syntax genetic programming system to the problem of predicting survival of patients Celia C. Bojarczuk 1, Heitor S. Lopes 2 and Alex A. Freitas 3 1 Departamento
Classification of Learners Using Linear Regression
Proceedings of the Federated Conference on Computer Science and Information Systems pp. 717 721 ISBN 978-83-60810-22-4 Classification of Learners Using Linear Regression Marian Cristian Mihăescu Software
Neural Network Predictor for Fraud Detection: A Study Case for the Federal Patrimony Department
DOI: 10.5769/C2012010 or http://dx.doi.org/10.5769/c2012010 Neural Network Predictor for Fraud Detection: A Study Case for the Federal Patrimony Department Antonio Manuel Rubio Serrano (1,2), João Paulo
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm
Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm Martin Hlosta, Rostislav Stríž, Jan Kupčík, Jaroslav Zendulka, and Tomáš Hruška A. Imbalanced Data Classification
Principles of Data Mining by Hand&Mannila&Smyth
Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences
Learning outcomes. Knowledge and understanding. Competence and skills
Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges
Towards applying Data Mining Techniques for Talent Mangement
2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore Towards applying Data Mining Techniques for Talent Mangement Hamidah Jantan 1,
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
Model Trees for Classification of Hybrid Data Types
Model Trees for Classification of Hybrid Data Types Hsing-Kuo Pao, Shou-Chih Chang, and Yuh-Jye Lee Dept. of Computer Science & Information Engineering, National Taiwan University of Science & Technology,
Studying Auto Insurance Data
Studying Auto Insurance Data Ashutosh Nandeshwar February 23, 2010 1 Introduction To study auto insurance data using traditional and non-traditional tools, I downloaded a well-studied data from http://www.statsci.org/data/general/motorins.
Chapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2
2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam
CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht [email protected] 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht [email protected] 539 Sennott
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail
The Artificial Prediction Market
The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory
Predicting borrowers chance of defaulting on credit loans
Predicting borrowers chance of defaulting on credit loans Junjie Liang ([email protected]) Abstract Credit score prediction is of great interests to banks as the outcome of the prediction algorithm
Support Vector Machines with Clustering for Training with Very Large Datasets
Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France [email protected] Massimiliano
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Introduction to Data Mining Techniques
Introduction to Data Mining Techniques Dr. Rajni Jain 1 Introduction The last decade has experienced a revolution in information availability and exchange via the internet. In the same spirit, more and
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, [email protected] Abstract: Independent
Introduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
Manjeet Kaur Bhullar, Kiranbir Kaur Department of CSE, GNDU, Amritsar, Punjab, India
Volume 5, Issue 6, June 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Multiple Pheromone
Web Document Clustering
Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,
Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris
Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant
131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10
1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom
Clustering Marketing Datasets with Data Mining Techniques
Clustering Marketing Datasets with Data Mining Techniques Özgür Örnek International Burch University, Sarajevo [email protected] Abdülhamit Subaşı International Burch University, Sarajevo [email protected]
Identification of User Patterns in Social Networks by Data Mining Techniques: Facebook Case
Identification of User Patterns in Social Networks by Data Mining Techniques: Facebook Case A. Selman Bozkır 1, S. Güzin Mazman 2, and Ebru Akçapınar Sezer 1 1 Hacettepe University, Department of Computer
A Fuzzy Decision Tree to Estimate Development Effort for Web Applications
A Fuzzy Decision Tree to Estimate Development Effort for Web Applications Ali Idri Department of Software Engineering ENSIAS, Mohammed Vth Souissi University BP. 713, Madinat Al Irfane, Rabat, Morocco
AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM
AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM ABSTRACT Luis Alexandre Rodrigues and Nizam Omar Department of Electrical Engineering, Mackenzie Presbiterian University, Brazil, São Paulo [email protected],[email protected]
Data Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.7 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Linear Regression Other Regression Models References Introduction Introduction Numerical prediction is
Consolidated Tree Classifier Learning in a Car Insurance Fraud Detection Domain with Class Imbalance
Consolidated Tree Classifier Learning in a Car Insurance Fraud Detection Domain with Class Imbalance Jesús M. Pérez, Javier Muguerza, Olatz Arbelaitz, Ibai Gurrutxaga, and José I. Martín Dept. of Computer
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS Abstract D.Lavanya * Department of Computer Science, Sri Padmavathi Mahila University Tirupati, Andhra Pradesh, 517501, India [email protected]
IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES
IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES Bruno Carneiro da Rocha 1,2 and Rafael Timóteo de Sousa Júnior 2 1 Bank of Brazil, Brasília-DF, Brazil [email protected] 2 Network Engineering
Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1
Data Mining 1 Introduction 2 Data Mining methods Alfred Holl Data Mining 1 1 Introduction 1.1 Motivation 1.2 Goals and problems 1.3 Definitions 1.4 Roots 1.5 Data Mining process 1.6 Epistemological constraints
A Review of Data Mining Techniques
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing
CS Master Level Courses and Areas The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate
An Introduction to Data Mining
An Introduction to Intel Beijing [email protected] January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
An Evaluation of Neural Networks Approaches used for Software Effort Estimation
Proc. of Int. Conf. on Multimedia Processing, Communication and Info. Tech., MPCIT An Evaluation of Neural Networks Approaches used for Software Effort Estimation B.V. Ajay Prakash 1, D.V.Ashoka 2, V.N.
EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE
EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE S. Anupama Kumar 1 and Dr. Vijayalakshmi M.N 2 1 Research Scholar, PRIST University, 1 Assistant Professor, Dept of M.C.A. 2 Associate
ClusterOSS: a new undersampling method for imbalanced learning
1 ClusterOSS: a new undersampling method for imbalanced learning Victor H Barella, Eduardo P Costa, and André C P L F Carvalho, Abstract A dataset is said to be imbalanced when its classes are disproportionately
Data quality in Accounting Information Systems
Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania
Course Outline Department of Computing Science Faculty of Science. COMP 3710-3 Applied Artificial Intelligence (3,1,0) Fall 2015
Course Outline Department of Computing Science Faculty of Science COMP 710 - Applied Artificial Intelligence (,1,0) Fall 2015 Instructor: Office: Phone/Voice Mail: E-Mail: Course Description : Students
Course Syllabus. Purposes of Course:
Course Syllabus Eco 5385.701 Predictive Analytics for Economists Summer 2014 TTh 6:00 8:50 pm and Sat. 12:00 2:50 pm First Day of Class: Tuesday, June 3 Last Day of Class: Tuesday, July 1 251 Maguire Building
A Property & Casualty Insurance Predictive Modeling Process in SAS
Paper AA-02-2015 A Property & Casualty Insurance Predictive Modeling Process in SAS 1.0 ABSTRACT Mei Najim, Sedgwick Claim Management Services, Chicago, Illinois Predictive analytics has been developing
Equity forecast: Predicting long term stock price movement using machine learning
Equity forecast: Predicting long term stock price movement using machine learning Nikola Milosevic School of Computer Science, University of Manchester, UK [email protected] Abstract Long
Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski [email protected]
Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trakovski [email protected] Neural Networks 2 Neural Networks Analogy to biological neural systems, the most robust learning systems
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,
Addressing the Class Imbalance Problem in Medical Datasets
Addressing the Class Imbalance Problem in Medical Datasets M. Mostafizur Rahman and D. N. Davis the size of the training set is significantly increased [5]. If the time taken to resample is not considered,
Predicting Student Performance by Using Data Mining Methods for Classification
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 13, No 1 Sofia 2013 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2013-0006 Predicting Student Performance
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
Sub-class Error-Correcting Output Codes
Sub-class Error-Correcting Output Codes Sergio Escalera, Oriol Pujol and Petia Radeva Computer Vision Center, Campus UAB, Edifici O, 08193, Bellaterra, Spain. Dept. Matemàtica Aplicada i Anàlisi, Universitat
