Classification of poses and movement phases


Adam Świtoński 1,2, Henryk Josiński 1,2, Karol Jędrasiak 1, Andrzej Polański 1,2, and Konrad Wojciechowski 1,2

1 Polish-Japanese Institute of Information Technology, Aleja Legionów 2, 41-902 Bytom, Poland
{aswitonski, apolanski, hjosinski, kwojciechowski}@pjwstk.edu.pl
2 Silesian University of Technology, ul. Akademicka, Gliwice, Poland
{adam.switonski, henryk.josinski, andrzej.polanski, konrad.wojciechowski}@polsl.pl

Abstract. We focus on the problem of classifying motion frames that represent different poses, using supervised machine learning and dimensionality reduction techniques. We manually extracted motion frames from a global database, divided them into six classes and applied classifiers to detect the pose type automatically. We used statistical (naive) Bayes, neural network, random forest and Kernel PCA classifiers over a wide range of their parameters. We performed classification on the original frame data and, additionally, on data whose dimensionality was reduced by PCA and Kernel PCA. The results are satisfactory: in the best case, classifier efficiency reached 100 percent.

1 Introduction
Motion databases contain a very large amount of data. They store hundreds of motions, and each motion is a long sequence of frames, usually captured at a frequency of at least 1Hz. Motion data usually comes directly from motion capture (mocap) devices [8], or it can be estimated from static 2D images; [6] proposes such a method based on Markov chain Monte Carlo. In practice, it is impossible to analyze and search such databases manually. One of the first tasks in automatic motion analysis could be pose detection, that is, assigning a pose type to each frame. This could be useful for database search: on the basis of labeled frames, we can build database query criteria, for instance, "find a motion in which a human is sitting". Pose identification also has medical applications.
It could be used for the automatic detection of improper poses that are typical of certain diseases. Finally, pose identification could be useful in further automatic motion analysis, for instance, segmentation: the boundaries of motion segments could be placed at the moments when the pose changes.

Comparing motion frames is not a trivial problem. A frame is described by the positions of special markers located at the joints. The frame can be represented directly by the position of each marker in a 3D global coordinate system. However, the most commonly used representation is a kinematic chain, which has the format of a tree structure. The root object is placed at the top of the tree and is described by its position in the global coordinate system. Child objects are connected to their parents and store their transformation relative to the parent. Both formats contain exactly the same data, but the advantage of the kinematic chain is that identical poses captured in different places have almost the same numerical representation, except for the root object, whereas in the 3D global coordinate system their representations are completely different [8].

Pose similarity measures can be built on the basis of the frame format. The distance could be an aggregation of the distances between each pair of corresponding markers in the 3D global coordinate system, but this has the disadvantage described above, so in practice it is not used. The authors of [4] propose a 3D point cloud distance measure. First, they build point clouds for the compared frames and their temporal context. Next, they find the global transformation that matches both clouds, and finally they calculate the sum of distances between corresponding points of the matched clouds. In [5], the clouds are built from a downsampled frame representation, which avoids focusing on pose details.

In the kinematic chain format, transformations are usually encoded with unit quaternions, so the pose distance can be evaluated as a sum of quaternion distances. In [3], the frame distance is a weighted sum of quaternion distances, because the influence of a transformation on the pose differs from joint to joint.

The authors of [7] propose binary relational motion features as a pose description. A relational feature is enabled if the given joints and bones are in a defined relation, for example, the left knee is behind the right knee, or the right ankle is higher than the left knee. Having prepared such a set of features, we describe a pose by a binary vector, and the pose distance can be calculated as a distance metric between these vector descriptors.
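A minimal sketch of the relational-feature idea of [7] follows: each feature is a boolean test on joint positions, a pose becomes a bit vector, and two poses are compared by the Hamming distance of their vectors. The joint names (`lknee`, `rknee`, `rankle`), the axis convention (y up, z forward), the two example relations and the choice of Hamming distance are illustrative assumptions, not the feature set or metric of [7].

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical pose layout: joint name -> global (x, y, z) position,
# with y assumed to point up and z assumed to point forward.
Point = Tuple[float, float, float]
Pose = Dict[str, Point]
RelationalFeature = Callable[[Pose], bool]


def left_knee_behind_right_knee(pose: Pose) -> bool:
    # "Behind" is approximated by a smaller forward (z) coordinate.
    return pose["lknee"][2] < pose["rknee"][2]


def right_ankle_above_left_knee(pose: Pose) -> bool:
    # "Higher" compares the vertical (y) coordinates.
    return pose["rankle"][1] > pose["lknee"][1]


FEATURES: List[RelationalFeature] = [
    left_knee_behind_right_knee,
    right_ankle_above_left_knee,
]


def describe(pose: Pose) -> List[int]:
    """Binary pose descriptor: one bit per relational feature."""
    return [int(feature(pose)) for feature in FEATURES]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of relational features on which two descriptors disagree."""
    return sum(x != y for x, y in zip(a, b))


standing = {"lknee": (-0.1, 0.5, 0.0), "rknee": (0.1, 0.5, 0.0), "rankle": (0.1, 0.1, 0.0)}
raised_leg = {"lknee": (-0.1, 0.5, -0.2), "rknee": (0.1, 0.5, 0.2), "rankle": (0.1, 0.6, 0.3)}

print(describe(standing), describe(raised_leg))           # [0, 0] [1, 1]
print(hamming(describe(standing), describe(raised_leg)))  # 2
```

With only two features the descriptor is coarse; the difficulty discussed next is precisely which relations to include so that the bits separate pose types.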
The basic problem with relational motion features is choosing a proper set of features to distinguish between different kinds of poses. It is very difficult to prepare a single set of features that is applicable to recognizing every kind of pose. Features are usually dedicated to specialized detection tasks, and because they are relatively easy to interpret, they are prepared by medical experts who know the meaning of the given joint and bone dependencies. Large feature vectors can be generated from the generic feature set proposed in [7], but because it is difficult to identify the significant features, this leads to long pose descriptions and redundant data.

The problem of recognizing a pose type is much more general than evaluating the similarity of two poses, because a single pose type can be represented by different, dissimilar poses. Firstly, this is due to the different phases of each pose type; for example, jumping can be divided into starting, flying and landing phases. Secondly, it is due to different variants of the same pose type: fast running and slow running generate different pose frames. Each pose is represented by the locations of tens of markers, so manually discovering the dependencies within each pose and finding numerical boundaries for a given pose representation is almost impossible. Considering this, we decided to use machine learning techniques, which are able to explore data, find dependencies and generalize knowledge. We have tested supervised learning with a pose distance metric based on the tree-like representation. We have also tested linear and nonlinear dimensionality reduction methods to shorten the pose description.

2 Pose Database
We prepared pose data from the Carnegie Mellon University Motion Capture Database [2]. We analyzed motion clips and selected pose frames of six different pose types: climbing, jumping, running, sitting, standing and walking. Each pose type contains a wide range of instances, taken from different movements and different movement phases. Finally, we labeled the pose frames. Example poses are shown in Fig. 1.

Fig. 1. Randomly selected poses from the prepared test database. The rows represent the climbing, jumping, running, sitting, standing and walking pose types.

Each pose is described by six root attributes, giving the location and orientation of the global coordinate system, and 56 relative attributes, describing twenty-six body parts in a tree-like structure. The number of values describing a given body part depends on its degrees of freedom. In the preprocessing step, we removed the root attributes to prevent the classifiers from learning the pose type from the location and orientation of the global coordinate system: the data originates from tens of different motion clips, and pose instances are usually located in different places, which could make it easier for a classifier to learn from the frame location instead of the real pose state. At the current stage, we have not added pose dynamics attributes such as velocities and accelerations.
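The root-removal step can be illustrated on a single frame stored in a text layout similar to the AMC files distributed with the CMU database, where each line holds a segment name followed by its values and the `root` line carries the six global position and orientation values. The parsing format, the segment names and the sample numbers below are assumptions for illustration only; a complete frame would list every body part, and removing `root` should then leave the 56 relative attributes described above.

```python
from typing import Dict, List

ROOT_SEGMENT = "root"  # assumed to hold global position (3 values) + orientation (3 values)


def parse_frame(lines: List[str]) -> Dict[str, List[float]]:
    """Parse 'segment v1 v2 ...' lines of one frame; the dict keeps the segment order."""
    frame: Dict[str, List[float]] = {}
    for line in lines:
        name, *values = line.split()
        frame[name] = [float(v) for v in values]
    return frame


def relative_attributes(frame: Dict[str, List[float]]) -> List[float]:
    """Drop the root segment and flatten the remaining joint values into one vector."""
    return [v for name, values in frame.items() if name != ROOT_SEGMENT for v in values]


# Illustrative (made-up) values in the AMC-like layout assumed above.
sample = [
    "root 9.37 17.82 -17.63 -0.82 -8.48 -88.48",
    "lowerback 11.71 0.48 0.81",
    "rtibia 24.12",
]
frame = parse_frame(sample)
print(len(frame["root"]))          # 6
print(relative_attributes(frame))  # [11.71, 0.48, 0.81, 24.12]
```

Keeping the segment order fixed matters: every frame must flatten to the same attribute positions, otherwise the classifiers would compare unrelated values.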

To reduce the computational complexity of the machine learning methods, we prepared a test set of only 2 randomly selected pose frames.

3 Classification
First, we tested supervised learning methods on the raw data containing all 56 relative attributes. We used cross-validation to split the test set into training and test parts, and focused on classifier efficiency, that is, the percentage of correctly classified poses in the test sets. We chose the following classifiers: Naive Bayes [1] with normal and kernel-based density estimators, kNN [1] with various numbers k of analyzed nearest neighbors, starting from k = 1, Random Forest [1] with various numbers of features, and MultiLayer Perceptron [1] with various numbers of hidden-layer neurons and epochs, plus several different learning rates.

Fig. 2. Classification results: Naive Bayes with normal and kernel density estimation, kNN versus k, Random Forest versus the number of features, and MultiLayer Perceptron versus the number of hidden-layer neurons.

The efficiencies of all classifiers are over 95 percent, and in the best case, for the neural network classifier, efficiency reaches 100 percent. All results are presented in Fig. 2. For Naive Bayes, which achieved the worst efficiencies, there is a significant difference between the normal and the kernel-based density estimator. The advantage of the kernel-based estimator probably means that the assumption of normally distributed pose attributes is not entirely accurate; on the other hand, an efficiency of almost 95 percent does not rule out normal distributions.

The kNN classifier achieved very good results. Efficiency is inversely related to the number of analyzed nearest neighbors: the more neighbors, the worse the results. This is probably due to the nature of our dataset, in which a few untypical regions of the feature space contain only a weak representation of frames of a given pose type. The nearest neighbor (1NN) classifier fits the training dataset best regardless of that representation. Even so, the efficiency with larger numbers of neighbors is still acceptable and only slightly worse than that of 1NN. The number of features of the Random Forest classifier has a slightly noticeable influence, but the differences are not remarkable and all results are satisfactory. Overall, the best results are achieved with the neural network classifier. The results improve with the complexity of the network (the greater the complexity, the better the results), but even five hidden-layer neurons give an excellent efficiency of over 99 percent. We think this is because of the weak representation mentioned above, which can be better approximated by more complex networks.

4 Dimensionality reduction
We applied dimensionality reduction methods to shorten the pose descriptions. On the basis of the reduced feature space, we tested supervised learning methods and compared the results with those for the raw data. We used and compared linear Principal Component Analysis [1] and nonlinear Kernel Principal Component Analysis [9], thereby testing the nonlinearity of the feature space. We chose the radial kernel function $K(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)$ [1], with different values of $\sigma$ and with the Euclidean metric calculated on the normalized and on the raw feature space.

Fig. 3. PCA variance cover versus the number of components.

The variance cover of the PCA components shows that there is no short description that stores most of the dataset variance. Three components hold only 26 percent of the global variance, and 90 percent is reached with just over twenty components. In Fig. 4 we visualize the first three components of PCA and Kernel PCA in the reduced 3D feature space, and in Tab. 1 we present example confusion matrices for this 3D PCA feature space, obtained with the Naive Bayes and Random Forest classifiers. For Kernel PCA, we used the kernel function parameters for which we obtained the best classification results.

Fig. 4. Reduced feature space: PCA components and Kernel PCA components. Poses: blue - sitting, red - standing, green - jumping, yellow - climbing, black - walking, cyan - running.

Table 1. Example confusion matrices for the 3D PCA feature space, Naive Bayes and Random Forest classifiers. Rows and columns: sitting (sit), standing (sta), jumping (jum), climbing (cli), walking (wal), running (run).

We can notice general boundaries between the pose classes, but there is no accurate, simple distinction between them. In particular, the standing and walking poses are mixed together. This happens because slow walking contains some phases that look very similar to standing, and three values are insufficient to distinguish them. In Kernel PCA, some climbing poses are placed far from the rest of the instances. These are probably poses with the body leaning far forward, which produce large values of the distance metric to the rest of the poses and thus affect the kernel function values.

Fig. 5 shows aggregated classification results obtained by the classifiers for the PCA and Kernel PCA reduced feature spaces. We chose the results for the best parameters of each classifier and, in the case of Kernel PCA, the best pair of parameter sets for the classifier and the kernel function. The results are similar to those for the raw feature space. Naive Bayes is the worst, and the kernel density estimator is slightly better than the normal one. For kNN, the best is 1NN, but the variations are not remarkable. The only difference is that the multilayer perceptron is not better than the others and does not need such a complex structure to obtain optimal efficiency. Overall, 1NN is the best, but Random Forest and the neural network are almost as good. Acceptable results, with efficiency over 90 percent, require at least a three-dimensional feature space, 95 percent requires five dimensions, and an excellent 99 percent needs only seven.
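The radial kernel and the Kernel PCA projection discussed in this section can be sketched in NumPy. This is the textbook formulation of [9] (kernel matrix, centering in feature space, eigendecomposition, scaling by the square root of each eigenvalue), not necessarily the implementation behind the reported results. The random matrix `X` is a stand-in for real pose data, the value of `sigma` is arbitrary, and the `zscore` helper is an assumption about what the normalized feature space means.

```python
import numpy as np


def zscore(X: np.ndarray) -> np.ndarray:
    """Normalize each attribute to zero mean and unit variance."""
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant attributes unscaled
    return (X - X.mean(axis=0)) / sd


def rbf_kernel(A: np.ndarray, B: np.ndarray, sigma: float) -> np.ndarray:
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all row pairs of A and B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def kernel_pca(X: np.ndarray, n_components: int, sigma: float):
    """Project the training rows of X onto the leading Kernel PCA components."""
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    J = np.full((n, n), 1.0 / n)
    Kc = K - J @ K - K @ J + J @ K @ J     # center the data in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    lambdas = np.maximum(eigvals[order], 0.0)
    # With unit eigenvectors v, the projection Kc @ (v / sqrt(lambda)) equals v * sqrt(lambda).
    return eigvecs[:, order] * np.sqrt(lambdas), lambdas


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 56))  # stand-in for 60 frames with 56 relative attributes
Z_raw, _ = kernel_pca(X, n_components=3, sigma=10.0)
Z_norm, _ = kernel_pca(zscore(X), n_components=3, sigma=10.0)
print(Z_raw.shape, Z_norm.shape)  # (60, 3) (60, 3)
```

Running `kernel_pca` on `X` and on `zscore(X)` mirrors the raw versus normalized comparison; the three columns of the result give a 3D view of the kind shown in Fig. 4. The dense n-by-n kernel matrix is also why a reduced, randomly sampled set of frames keeps the computation tractable.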
There are no remarkable differences between PCA and Kernel PCA, except for the one-dimensional feature space, which favors PCA. Kernel PCA is slightly better for Naive Bayes but slightly worse for kNN and Random Forest. The kernel function has a great impact on the classification. In most cases, the distance metric calculated on the normalized feature space is preferable. There is no noticeable general dependency on the $\sigma$ parameter; it differs from case to case.

Fig. 5. PCA and Kernel PCA classification results versus the number of components, for Naive Bayes, kNN, Random Forest and MLP.

We also built a classifier based on the first components of Kernel PCA models, each trained on a dataset containing pose frames of only a single pose type. The kernel function reflects the similarity of its arguments: the more similar they are, the greater its value. Thus, the sum of kernel function values calculated against instances of the same pose type could be greater than against instances of another pose type. We decided to assign a pose to the class with the maximum value of the first Kernel PCA component trained on that class's instances.

Fig. 6. Kernel PCA classifier results versus $\sigma$, for the distance metric calculated on the raw and on the normalized feature space.

We obtained over 97 percent classifier efficiency in the best case. Regardless of the $\sigma$ values, the distance metric based on the normalized feature space gives better results. The choice of the analyzed Kernel PCA component is debatable, and this is a possible area for improving the results.

5 Conclusion
We have evaluated supervised learning techniques for detecting the pose type based only on the locations of body markers. We prepared a test database of 2 pose frames and six different pose types. We chose four different classifiers and tested them over a wide range of parameters. The results are very promising: we obtained up to 100 percent classifier efficiency for the multilayer perceptron with a complex structure. However, this could be an overtrained network, ideally fitted to the training dataset. Although the training and test datasets are disjoint, they come from the same database and have some kind of dependency. In fact, it is not possible to prepare a dataset that uniquely covers the possible class regions for data described by 56 attributes. A single pose type can be represented by a very large number of different frames, and some attributes may have no significance, such as the position of the hands in the sitting pose.

Dimensionality reduction techniques preserve the global information about the pose state. A three-dimensional feature space is sufficient to notice the inaccurate boundaries between pose types, but better results require more dimensions. For a ten-dimensional space, the results are at the level of 99 percent, which is only slightly worse than for the full 56 attributes. Dimensionality reduction generalizes the feature space; thus, it reduces the focus on pose details and strict fitting to the training dataset. We think these results are more reliable.

Our experiments are only an introductory stage toward real applications, which are a more challenging task because of the training set representation issue mentioned above and the larger number of pose types. Our final conclusion is that supervised machine learning techniques are able to recognize pose types.
Acknowledgment
This paper has been supported by the European Regional Development Project "System with a library of modules for advanced analysis and interactive synthesis of human motion", co-financed by the European Union from the European Regional Development Fund under the Innovative Economy Operational Programme, Action 1.3, Sub-action.

References
1. Boser B. E., Guyon I. M., Vapnik V.: A training algorithm for optimal margin classifiers. Fifth Annual Workshop on Computational Learning Theory, Pittsburgh (1992)
2. Carnegie-Mellon Mocap Database
3. Johnson M.: Exploiting Quaternions to Support Expressive Interactive Character Motion. PhD thesis, Massachusetts Institute of Technology
4. Kovar L., Gleicher M.: Flexible automatic motion blending with registration curves. Proc. 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (2003)
5. Kovar L., Gleicher M., Pighin F.: Motion graphs. ACM Trans. Graph. (2002)
6. Lee M. W., Cohen I.: A Model-Based Approach for Estimating Human 3D Poses in Static Images. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28, No. 6
7. Müller M., Röder T.: A Relational Approach to Content-based Analysis of Motion Capture Data. Vol. 36 of Computational Imaging and Vision, ch. 2
8. Röder T.: Similarity, Retrieval, and Classification of Motion Capture Data. PhD thesis, Massachusetts Institute of Technology
9. Schoelkopf B., Smola A., Mueller K.-R.: Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Technical Report No. 44, Max-Planck-Institut fuer biologische Kybernetik
10. Witten I., Frank E.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann (2005)


More information

Impact of Boolean factorization as preprocessing methods for classification of Boolean data

Impact of Boolean factorization as preprocessing methods for classification of Boolean data Impact of Boolean factorization as preprocessing methods for classification of Boolean data Radim Belohlavek, Jan Outrata, Martin Trnecka Data Analysis and Modeling Lab (DAMOL) Dept. Computer Science,

More information

Active Learning SVM for Blogs recommendation

Active Learning SVM for Blogs recommendation Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

Machine learning for algo trading

Machine learning for algo trading Machine learning for algo trading An introduction for nonmathematicians Dr. Aly Kassam Overview High level introduction to machine learning A machine learning bestiary What has all this got to do with

More information

Distance Learning and Examining Systems

Distance Learning and Examining Systems Lodz University of Technology Distance Learning and Examining Systems - Theory and Applications edited by Sławomir Wiak Konrad Szumigaj HUMAN CAPITAL - THE BEST INVESTMENT The project is part-financed

More information

MACHINE LEARNING IN HIGH ENERGY PHYSICS

MACHINE LEARNING IN HIGH ENERGY PHYSICS MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!

More information

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing

More information

A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic

A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic Report prepared for Brandon Slama Department of Health Management and Informatics University of Missouri, Columbia

More information

Data Mining. Nonlinear Classification

Data Mining. Nonlinear Classification Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15

More information

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr WEKA Gallirallus Zeland) australis : Endemic bird (New Characteristics Waikato university Weka is a collection

More information

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du fdu@cs.ubc.ca University of British Columbia

More information

Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data

Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data Fifth International Workshop on Computational Intelligence & Applications IEEE SMC Hiroshima Chapter, Hiroshima University, Japan, November 10, 11 & 12, 2009 Extension of Decision Tree Algorithm for Stream

More information

Enhanced Boosted Trees Technique for Customer Churn Prediction Model

Enhanced Boosted Trees Technique for Customer Churn Prediction Model IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V5 PP 41-45 www.iosrjen.org Enhanced Boosted Trees Technique for Customer Churn Prediction

More information

DATA MINING TECHNIQUES AND APPLICATIONS

DATA MINING TECHNIQUES AND APPLICATIONS DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,

More information

3 An Illustrative Example

3 An Illustrative Example Objectives An Illustrative Example Objectives - Theory and Examples -2 Problem Statement -2 Perceptron - Two-Input Case -4 Pattern Recognition Example -5 Hamming Network -8 Feedforward Layer -8 Recurrent

More information

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

More information

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical

More information

Early defect identification of semiconductor processes using machine learning

Early defect identification of semiconductor processes using machine learning STANFORD UNIVERISTY MACHINE LEARNING CS229 Early defect identification of semiconductor processes using machine learning Friday, December 16, 2011 Authors: Saul ROSA Anton VLADIMIROV Professor: Dr. Andrew

More information

Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang

Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang Classifying Large Data Sets Using SVMs with Hierarchical Clusters Presented by :Limou Wang Overview SVM Overview Motivation Hierarchical micro-clustering algorithm Clustering-Based SVM (CB-SVM) Experimental

More information

Learning outcomes. Knowledge and understanding. Competence and skills

Learning outcomes. Knowledge and understanding. Competence and skills Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges

More information

THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS

THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS O.U. Sezerman 1, R. Islamaj 2, E. Alpaydin 2 1 Laborotory of Computational Biology, Sabancı University, Istanbul, Turkey. 2 Computer Engineering

More information

Statistical Models in Data Mining

Statistical Models in Data Mining Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of

More information

Neural Networks and Support Vector Machines

Neural Networks and Support Vector Machines INF5390 - Kunstig intelligens Neural Networks and Support Vector Machines Roar Fjellheim INF5390-13 Neural Networks and SVM 1 Outline Neural networks Perceptrons Neural networks Support vector machines

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

Intrusion Detection via Machine Learning for SCADA System Protection

Intrusion Detection via Machine Learning for SCADA System Protection Intrusion Detection via Machine Learning for SCADA System Protection S.L.P. Yasakethu Department of Computing, University of Surrey, Guildford, GU2 7XH, UK. s.l.yasakethu@surrey.ac.uk J. Jiang Department

More information

Electroencephalography Analysis Using Neural Network and Support Vector Machine during Sleep

Electroencephalography Analysis Using Neural Network and Support Vector Machine during Sleep Engineering, 23, 5, 88-92 doi:.4236/eng.23.55b8 Published Online May 23 (http://www.scirp.org/journal/eng) Electroencephalography Analysis Using Neural Network and Support Vector Machine during Sleep JeeEun

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

More information

MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts

MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts Julio Villena-Román 1,3, Sara Lana-Serrano 2,3 1 Universidad Carlos III de Madrid 2 Universidad Politécnica de Madrid 3 DAEDALUS

More information

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

More information

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step

More information

Using Nonlinear Dimensionality Reduction in 3D Figure Animation

Using Nonlinear Dimensionality Reduction in 3D Figure Animation Using Nonlinear Dimensionality Reduction in 3D Figure Animation A. Elizabeth Seward Dept. of Electrical Engineering and Computer Science Vanderbilt University Nashville, TN 37235 anne.e.seward@vanderbilt.edu

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.

AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S. AUTOMATION OF ENERGY DEMAND FORECASTING by Sanzad Siddique, B.S. A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree

More information

Final Project Report

Final Project Report CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes

More information

Introduction to nonparametric regression: Least squares vs. Nearest neighbors

Introduction to nonparametric regression: Least squares vs. Nearest neighbors Introduction to nonparametric regression: Least squares vs. Nearest neighbors Patrick Breheny October 30 Patrick Breheny STA 621: Nonparametric Statistics 1/16 Introduction For the remainder of the course,

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin

More information

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,

More information

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

More information

Meta-learning. Synonyms. Definition. Characteristics

Meta-learning. Synonyms. Definition. Characteristics Meta-learning Włodzisław Duch, Department of Informatics, Nicolaus Copernicus University, Poland, School of Computer Engineering, Nanyang Technological University, Singapore wduch@is.umk.pl (or search

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Chapter 1. Introduction. 1.1 The Challenge of Computer Generated Postures

Chapter 1. Introduction. 1.1 The Challenge of Computer Generated Postures Chapter 1 Introduction 1.1 The Challenge of Computer Generated Postures With advances in hardware technology, more powerful computers become available for the majority of users. A few years ago, computer

More information

Tracking and Recognition in Sports Videos

Tracking and Recognition in Sports Videos Tracking and Recognition in Sports Videos Mustafa Teke a, Masoud Sattari b a Graduate School of Informatics, Middle East Technical University, Ankara, Turkey mustafa.teke@gmail.com b Department of Computer

More information

Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

More information

How To Predict Web Site Visits

How To Predict Web Site Visits Web Site Visit Forecasting Using Data Mining Techniques Chandana Napagoda Abstract: Data mining is a technique which is used for identifying relationships between various large amounts of data in many

More information

Evaluating an Integrated Time-Series Data Mining Environment - A Case Study on a Chronic Hepatitis Data Mining -

Evaluating an Integrated Time-Series Data Mining Environment - A Case Study on a Chronic Hepatitis Data Mining - Evaluating an Integrated Time-Series Data Mining Environment - A Case Study on a Chronic Hepatitis Data Mining - Hidenao Abe, Miho Ohsaki, Hideto Yokoi, and Takahira Yamaguchi Department of Medical Informatics,

More information

MACHINE LEARNING BASICS WITH R

MACHINE LEARNING BASICS WITH R MACHINE LEARNING [Hands-on Introduction of Supervised Machine Learning Methods] DURATION 2 DAY The field of machine learning is concerned with the question of how to construct computer programs that automatically

More information

How To Identify A Churner

How To Identify A Churner 2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management

More information

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION Chihli Hung 1, Jing Hong Chen 2, Stefan Wermter 3, 1,2 Department of Management Information Systems, Chung Yuan Christian University, Taiwan

More information

A Review of Data Mining Techniques

A Review of Data Mining Techniques Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

More information

ARTIFICIAL INTELLIGENCE METHODS IN EARLY MANUFACTURING TIME ESTIMATION

ARTIFICIAL INTELLIGENCE METHODS IN EARLY MANUFACTURING TIME ESTIMATION 1 ARTIFICIAL INTELLIGENCE METHODS IN EARLY MANUFACTURING TIME ESTIMATION B. Mikó PhD, Z-Form Tool Manufacturing and Application Ltd H-1082. Budapest, Asztalos S. u 4. Tel: (1) 477 1016, e-mail: miko@manuf.bme.hu

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM

AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM ABSTRACT Luis Alexandre Rodrigues and Nizam Omar Department of Electrical Engineering, Mackenzie Presbiterian University, Brazil, São Paulo 71251911@mackenzie.br,nizam.omar@mackenzie.br

More information