Two Heads Better Than One: Metric+Active Learning and Its Applications for IT Service Classification


2009 Ninth IEEE International Conference on Data Mining

Two Heads Better Than One: Metric+Active Learning and Its Applications for IT Service Classification

Fei Wang (1), Jimeng Sun (2), Tao Li (1), Nikos Anerousis (2)
(1) School of Computing and Information Sciences, Florida International University, Miami, FL
(2) IBM T. J. Watson Research Center, Service Department, Yorktown Heights, NY

Abstract -- Large IT service providers track service requests and their execution through problem/change tickets. It is important to classify the tickets based on the problem/change description in order to understand service quality and to optimize service processes. However, two challenges exist in solving this classification problem: 1) ticket descriptions from different classes have highly diverse characteristics, which invalidates most standard distance metrics; 2) it is very expensive to obtain high-quality labeled data. To address these challenges, we develop two seemingly independent methods: 1) Discriminative Neighborhood Metric Learning (DNML) and 2) Active Learning with Median Selection (ALMS), both of which are, however, based on the same core technique: iterated representative selection. A case study on a real IT service classification application is presented to demonstrate the effectiveness and efficiency of our proposed methods.

I. INTRODUCTION

In IT outsourcing environments, thousands of problem and change requests are generated every day on a diverse set of issues related to all kinds of software and hardware. Those requests need to be resolved correctly and quickly in order to meet service level agreements (SLAs). In this environment, when devices fail or applications need to be upgraded, the person recovering such failures or applying the patch is likely to be sitting thousands of miles from the affected component. The reliance on outsourced technology support has fundamentally shifted the dependencies between participating organizations.
In such a complex environment, all service requests are handled and tracked through tickets by various problem and change management systems. A ticket is opened with a certain symptom description and routed to the appropriate Service Matter Experts (SMEs) for resolution. The solution is documented when the ticket is closed. The job of SMEs is to resolve tickets quickly and correctly in order to meet SLAs. Another important job role is that of Quality Analysts (QAs), whose responsibility is to analyze recent tickets in order to identify opportunities for service quality improvement. For example, frequent password resets on one system may be due to an inconsistent password period on that system: instead of resetting the password every time, the password period should be adjusted properly. Or sometimes one patch or fix for a particular server should also be applied to all servers with the same configuration, instead of creating multiple tickets with exactly the same work order on each of those systems. Identifying such optimization opportunities requires QAs to have a good understanding of the current ticket distribution. In other words, it is important to classify those tickets accurately, based on their description and resolution, in a timely manner. In this paper, we address the ticket classification problem that lies at the core of IT service quality improvement, with significant practical impact and great challenges:
- Standard distance metrics do not apply due to the diverse characteristics of the raw features. More specifically, the ticket descriptions and solutions are highly diverse and noisy, since different SMEs can describe the same problem quite differently and the descriptions are typically short. Also, depending on the type of problem, the description can vary significantly.
- There is almost no high-quality labeled data.
Tickets are handled by SMEs, who often do not have the incentive or ability to classify the tickets accurately, due to heavy workload and incomplete information. On the other hand, QAs, who have the right ability and incentive, do not have the cycles to manually label all the tickets. To address these two challenges, we propose a novel hybrid approach that leverages both active learning and metric learning. The contributions of this paper are the following:
- We propose Discriminative Neighborhood Metric Learning (DNML), which learns a domain-specific distance metric using the overall data distribution and a small set of labeled data.
- We propose Active Learning with Median Selection (ALMS), which progressively selects the representative data points that need to be labeled, and which is naturally a multi-class algorithm.
- We combine the metric and active learning steps into a unified process over the data. Moreover, our algorithm can automatically detect the number of classes contained in the data set.
- We demonstrate the effectiveness and efficiency of DNML and ALMS over several data sets, compared to several existing methods.
Figure 1. The basic algorithm flowchart, where the red blobs represent the labeled points and the gray blobs correspond to unlabeled points.

The rest of the paper is organized as follows: Section II presents the methods for metric and active learning. Section III demonstrates a practical case study of the proposed methods on the IT ticket classification application. Section IV reviews related work, and Section V concludes.

II. METRIC+ACTIVE LEARNING

In this section we introduce our algorithm in detail. First we give an overview.

A. Algorithm Overview

The basic procedure of our algorithm is to iterate the following steps:
- Learn a distance metric from the labeled data set, and then classify the unlabeled data points using the nearest neighbor classifier with the learned metric. We call the data points in the same class a cluster.
- Select the median from each cluster. For each cluster X_i, we partition it into a labeled set X_i^L (whose labels are given initially) and an unlabeled set X_i^U (whose labels are predicted by the nearest neighbor classifier). Then the median for X_i is defined as

    m_i = arg min_{x in X_i^U} ||x - c_i||^2    (1)

  where c_i is the mean of X_i.
- Add the selected points into the labeled set.

Fig. 1 shows a graphical view of the basic algorithm flowchart.

B. Discriminative Neighborhood Metric Learning

A good distance metric plays a central role in many data mining and machine learning algorithms (e.g., the nearest neighbor classifier and the k-means algorithm). The Euclidean distance often cannot satisfy our requirements because of its homogeneous treatment of all the feature dimensions. Therefore many researchers propose to learn a Mahalanobis distance, which measures the distance between data points x_i and x_j by

    d_m(x_i, x_j) = sqrt( (x_i - x_j)^T C (x_i - x_j) )    (2)

where C in R^{d x d} is a positive semidefinite covariance matrix used to incorporate the correlations of different feature dimensions.
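For concreteness, one round of the iterate above can be sketched in Python. This is a minimal illustration with made-up conventions (label -1 marks an unlabeled point, and `dist` stands for whatever metric has been learned), not the authors' implementation:

```python
import numpy as np

def iterate_once(X, labels, dist):
    """One round of the loop above: label the unlabeled points by 1-NN
    under `dist`, then pick each cluster's median according to Eq. (1)."""
    labeled = np.where(labels >= 0)[0]       # convention: label -1 = unlabeled
    unlabeled = set(np.where(labels < 0)[0].tolist())
    pred = labels.copy()
    for i in unlabeled:                      # nearest neighbor classification
        j = min(labeled, key=lambda j: dist(X[i], X[j]))
        pred[i] = labels[j]
    medians = {}
    for c in np.unique(labels[labeled]):     # one median per cluster, Eq. (1)
        cluster = np.where(pred == c)[0]
        centroid = X[cluster].mean(axis=0)   # c_i, the mean of cluster X_i
        cand = [i for i in cluster if i in unlabeled]
        if cand:                             # median = unlabeled point nearest c_i
            medians[c] = min(cand, key=lambda i: np.sum((X[i] - centroid) ** 2))
    return pred, medians                     # the medians would be sent for labeling
```

The returned medians are the points that would be queried for labels and moved into the labeled pool before the next round.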
In this paper, we consider learning a low-rank covariance matrix C such that, under the learned distance metric, within-class compactness and between-class scatterness are maximized. Different from linear discriminant analysis [1], which seeks a discriminant subspace in a global sense, our algorithm aims to learn a distance metric with enhanced local discriminability. To define such local discriminability, we first introduce two types of neighborhoods [8]:

Definition 1: Homogeneous Neighborhood. The homogeneous neighborhood of x_i, denoted N_i^o, is the set of the |N_i^o| nearest data points of x_i with the same label, where |N_i^o| is the size of N_i^o.

Definition 2: Heterogeneous Neighborhood. The heterogeneous neighborhood of x_i, denoted N_i^e, is the set of the |N_i^e| nearest data points of x_i with different labels, where |N_i^e| is the size of N_i^e.

Based on the above two definitions, we can define the local compactness of point x_i as

    C_i = sum_{j: x_j in N_i^o} d_m^2(x_i, x_j)    (3)

and the local scatterness of point x_i as

    S_i = sum_{k: x_k in N_i^e} d_m^2(x_i, x_k)    (4)

Then the local discriminability of the data set X with respect to the distance metric d_m can be defined as

    J = (sum_i C_i) / (sum_i S_i)
      = [ sum_i sum_{j: x_j in N_i^o} (x_i - x_j)^T C (x_i - x_j) ] / [ sum_i sum_{k: x_k in N_i^e} (x_i - x_k)^T C (x_i - x_k) ]    (5)

The goal of our algorithm is to minimize J, which simultaneously minimizes the local compactness and maximizes the local scatterness. Fig. 2 provides an intuitive graphical illustration of the idea behind our algorithm. However, minimizing J in Eq. (5) to obtain an optimal C is not an easy task, as there are d(d+1)/2 variables to solve for, given that C is symmetric. Recall that we require C to be low-rank and positive semidefinite; then, by incomplete Cholesky factorization, we can decompose C as

    C = W W^T    (6)

where W in R^{d x r} and r is the rank of C. In this way, we only need to solve for W instead of the entire C.
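Putting Eqs. (3)-(6) together with the compactness and scatterness matrices M_C and M_S defined below in Eqs. (8)-(9), a compact sketch of DNML might look as follows. This is illustrative code, not the authors' implementation, and for simplicity it solves the trace-ratio problem by the plain fixed-point iteration lambda <- tr(W^T M_C W)/tr(W^T M_S W) rather than the full decomposed Newton procedure of Table I:

```python
import numpy as np

def dnml(X, y, n_o=3, n_e=3, r=2, iters=20):
    """Sketch of DNML: build the homogeneous/heterogeneous neighborhoods,
    accumulate M_C and M_S (Eqs. (8)-(9)), then find W by a simple
    trace-ratio iteration (a stand-in for Table I's Newton procedure)."""
    n, d = X.shape
    M_C = np.zeros((d, d))
    M_S = np.zeros((d, d))
    for i in range(n):
        diff = X - X[i]
        dist = np.einsum('ij,ij->i', diff, diff)   # squared Euclidean distances
        same = np.where((y == y[i]) & (np.arange(n) != i))[0]
        other = np.where(y != y[i])[0]
        for j in same[np.argsort(dist[same])][:n_o]:    # homogeneous neighbors
            M_C += np.outer(X[i] - X[j], X[i] - X[j])
        for k in other[np.argsort(dist[other])][:n_e]:  # heterogeneous neighbors
            M_S += np.outer(X[i] - X[k], X[i] - X[k])
    lam = 0.0
    for _ in range(iters):                              # trace-ratio iteration
        _, V = np.linalg.eigh(M_C - lam * M_S)          # ascending eigenvalues
        W = V[:, :r]                                    # r smallest eigenvectors
        lam = np.trace(W.T @ M_C @ W) / np.trace(W.T @ M_S @ W)
    return W, lam   # learned metric C = W W^T, i.e. d_m(x, x') = ||W^T (x - x')||
```

Under this factorization, computing the learned distance reduces to projecting both points by W^T and taking an ordinary Euclidean norm, which is why only W needs to be stored.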
Figure 2. The homogeneous and heterogeneous neighborhoods of x_i under (a) the regular Euclidean distance and (b) the learned Mahalanobis distance metric. The dark green blob in the middle of the circle is x_i, and the green and blue blobs correspond to the points in N_i^o and N_i^e. The goal of DNML is to learn a distance metric that pulls the points in N_i^o towards x_i while pushing the points in N_i^e away from x_i.

Table I
DECOMPOSED NEWTON'S PROCEDURE FOR SOLVING THE TRACE RATIO PROBLEM
Input: matrices M_C and M_S, precision epsilon, dimension d
Output: trace ratio value lambda* and matrix W
Procedure:
1. Initialize lambda_0 = 0, t = 0
2. Perform an eigenvalue decomposition of M_C - lambda_t M_S
3. Let (beta_k(lambda), w_k(lambda)) be the k-th eigenvalue-eigenvector pair obtained in step 2, and define the first-order Taylor expansion beta^_k(lambda) = beta_k(lambda_t) + beta'_k(lambda_t)(lambda - lambda_t), where beta'_k(lambda_t) = -w_k(lambda_t)^T M_S w_k(lambda_t)
4. Define f^_d(lambda) as the sum of the d smallest beta^_k(lambda), solve f^_d(lambda) = 0, and set the root to be lambda_{t+1}
5. If |lambda_{t+1} - lambda_t| < epsilon, go to step 6; otherwise set t = t + 1 and go to step 2
6. Output lambda* = lambda_t, and W as the eigenvectors corresponding to the d smallest eigenvalues of M_C - lambda* M_S

Combining Eq. (6) and Eq. (5), we can derive the following optimization problem:

    min_W  tr(W^T M_C W) / tr(W^T M_S W)    (7)

where

    M_C = sum_i sum_{j: x_j in N_i^o} (x_i - x_j)(x_i - x_j)^T    (8)
    M_S = sum_i sum_{k: x_k in N_i^e} (x_i - x_k)(x_i - x_k)^T    (9)

are the compactness and scatterness matrices, respectively. Therefore, our problem (7) becomes a trace ratio minimization problem, and we can make use of the decomposed Newton's method [3].

C. Active Learning with Median Selection

Recently, a novel active learning method called Transductive Experimental Design (TED) [11] was proposed, which aims to select the k most representative points in the data set.
Despite the theoretical soundness and empirical success of TED, it still has some limitations:
- Although the name suggests TED is transductive, it does not make use of any label information contained in the data set. In fact, TED just uses the whole data set (both labeled and unlabeled) to select the k most representative points, such that the linear reconstruction loss of the whole data set using the selected points is minimized. In this sense, TED is an unsupervised method.
- As the authors analyze in [11], TED tends to select data points with large norms (which the authors argue are hard to predict). However, these selected points lie on the border of the data distribution, and such points could be outliers that mislead the classification process.

Based on the above analysis, we propose to (1) make use of the label information and (2) select the representative points locally. Specifically, we first learn a distance metric using our DNML method introduced in the last section and then apply the nearest neighbor classifier to classify the unlabeled points. In this way, the whole data set is partitioned into several classes, and for each class we select the median point as defined in Eq. (1).

Figure 3. Active learning results on a toy data set of four Gaussian classes. (a) shows the results of transductive experimental design; (b) shows the results of our local median selection method. The selected points are shown as black triangles.

Fig. 3 illustrates a toy example of the difference between TED and our local median selection method. The data set here is generated from 4 Gaussians of equal size, each treated as a class. Initially we randomly label 1% of the data points and use TED to select the 4 most representative data points, shown as black triangles in Fig. 3(a), from which we can see that these points all lie on the border of the Gaussians.
Fig. 3(b) shows the results of our local median selection method, where we first apply DNML to learn a proper distance metric from the labeled points, then use that metric to classify the whole data set, and finally select one median from each class. From the figure we observe that the selected points are representative of each Gaussian. An issue worth mentioning here is that our algorithm can in fact be viewed as an approximated version of
local TED, where we first partition the data set into several local regions using the learned distance metric, and then select exactly ONE representative point in each region. As the data mean is the most representative point of a set of data in the sense of Euclidean loss, we select the median, i.e., the candidate point closest to the data mean. The whole algorithm procedure is summarized in Table II.

Table II
THE METRIC+ACTIVE LEARNING ALGORITHM
Inputs: training data, N_i^o, N_i^e, precision epsilon, dimension d, number of iteration steps T
Outputs: the selected points and the learned W
Procedure:
for t = 1 : T
  1. Construct M_S and M_C from the training data
  2. Learn a proper distance metric
  3. Count the number of classes k in the training data, and apply the learned metric to classify the unlabeled data using the nearest neighbor classifier
  4. Select the median in each class and add the medians to the training data pool
end

III. TICKET CLASSIFICATION: A CASE STUDY

In this section we present detailed experimental results from applying our proposed active learning scheme to ticket classification. First we describe the basic characteristics of the data set.

A. The Data Set

There are 4182 tickets in total, from 27 classes. We use bag-of-words features, which results in a 3882-dimensional space. After eliminating duplicate and null tickets, 2222 tickets remain. The class distribution is shown in Fig. 4(a), from which we can observe that the classes are highly imbalanced and there are many rare classes with only a few data points. We identify a class as a rare class if and only if the number of data points contained in it is less than 2. In our experiments, we eliminate those rare classes, which results in a data set of size 2161 spanning the remaining classes, whose class distribution is shown in Fig. 4(b). Besides rare classes, we also observe that the data set is highly sparse and there is also a set of rare features.
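The rare-class and rare-feature filtering described here can be sketched as follows on a dense bag-of-words matrix. The threshold values are placeholders (the exact cut-offs were partly lost in transcription), and the function names are illustrative:

```python
import numpy as np

def filter_rare(X, y, min_class_size=20, min_feature_count=10):
    """Drop rare classes and rare features from a bag-of-words matrix X
    (tickets x terms) with class labels y. Thresholds are illustrative."""
    # keep only classes with enough tickets
    classes, counts = np.unique(y, return_counts=True)
    keep_rows = np.isin(y, classes[counts >= min_class_size])
    X, y = X[keep_rows], y[keep_rows]
    # keep only features that appear often enough across the data set
    keep_feats = X.sum(axis=0) >= min_feature_count
    X = X[:, keep_feats]
    # drop tickets left with no surviving features
    nonempty = X.sum(axis=1) > 0
    return X[nonempty], y[nonempty]
```

The same three steps (rare classes, rare features, then emptied rows) reproduce the sequence of counts reported in this section for whatever thresholds are chosen.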
The original feature distribution is shown in Fig. 5(a), where we accumulate the number of times each feature appears in each class. We identify a feature as a rare feature if and only if its total number of appearances in the data set is less than 1. After eliminating those rare features, we obtain a data set with 669 features, whose distribution is shown in Fig. 5(b). Finally, we also eliminate the tickets left with only rare features, which makes the final data set contain 213 tickets with 669 features.

Figure 4. Class distribution of (a) the original data and (b) the data after eliminating the rare classes.

Figure 5. Feature distribution of (a) the original data and (b) the data with rare features eliminated.

B. Distance Metric Learning

In this part of the experiments, we first test the effectiveness of our DNML algorithm on the ticket data set: we use our algorithm to obtain a distance metric, then use that metric to perform nearest neighbor classification and obtain the final classification accuracy. This procedure is repeated over multiple runs, and we report the average classification accuracies and standard deviations in Fig. 6. The sizes of the homogeneous and heterogeneous neighborhoods are set to 3 manually, and the rank of the covariance matrix C is set to 4. From the figure we observe the superiority of our metric learning method. Specifically, with the learned metric, our DNML method clearly outperforms the original NN method, which validates that DNML can learn a better distance metric. There are two sets of parameters in our DNML method: the rank r of the covariance matrix C, and the sizes of the homogeneous neighborhood N^o and heterogeneous neighborhood N^e (denoted n_o and n_e).
Therefore we also conducted a set of experiments to test the sensitivity of DNML with respect to those parameters. Fig. 7 shows how the algorithm's performance varies with respect to the rank of the covariance matrix C, where we randomly label half of the tickets as the training set and use the remaining tickets for testing. We set the sizes of N^o and N^e to 3. The results in Fig. 7 are summarized over multiple independent runs. From the figure we can see that
the final ticket classification results are stable with respect to the choice of the rank of the covariance matrix, except when the rank is too small (i.e., 1 in our case), since in those cases too much information is lost. When the rank becomes too large, some of the noise contained in the data set may be retained, so the performance of our algorithm drops a little; choices of r in [2, 4] are all reasonable.

Figure 6. Classification accuracy comparison with different supervised learning methods (DNML, NN, NB, RLS, SVM). The x-axis represents the percentage of randomly labeled tickets, and the y-axis denotes the averaged classification accuracy.

Figure 7. The sensitivity of the performance of our algorithm with respect to the rank r of the covariance matrix C. We set |N_i^o| = |N_i^e| = 3, and half of the data set is labeled as training data.

We also test the sensitivity of our algorithm with respect to the choices of the sizes of N_i^o and N_i^e; the results are shown in Fig. 8, where the x-axis and y-axis correspond to the sizes of N_i^o and N_i^e, and the z-axis denotes the classification accuracy averaged over multiple independent runs. Here we assume that the sizes of the homogeneous and heterogeneous neighborhoods are the same for all data points. For each run, we randomly label 50% of the tickets as training data and use the rest for testing. From Fig. 8 we can clearly see that the whole surface z = f(x, y) is flat, which means that the performance of our algorithm is not very sensitive to the variation of N_i^o and N_i^e. We can also see that when the neighborhood sizes are small, the algorithm's performance is better than when they are large. This is possibly because the distribution of the data set is complicated and data in different classes overlap heavily: when we enlarge the neighborhoods to include more data points, the learned distance metric may be corrupted by noisy points, which makes the final classification results inaccurate.

Figure 8. The sensitivity of the performance of our algorithm with respect to the choices of N_i^o and N_i^e; half of the data set is labeled as training data.

C. Integrated Active Learning and Distance Metric Learning

In our implementation, we initially label 2% of the data set and then apply the various active learning methods. Matching the number of classes in the ticket data set, each method selects a batch of points from the unlabeled set in each round. For all the approaches that use DNML, we set N^o = N^e = 3, and the rank of the covariance matrix is set to 4. Fig. 9 illustrates the results of these algorithms summarized over multiple independent runs, where the x-axis represents the percentage of selected points and the y-axis denotes the averaged classification accuracy along with the standard deviation. From the figure we can clearly see that with our DNML+LMED method, the classification accuracy ascends faster than with the other methods.

IV. RELATED WORK

In this section we briefly review some previous work closely related to our metric+active learning method.

A. Distance Metric Learning

Distance metric learning plays a central role in real-world applications. According to [10], these approaches can mainly be categorized into two classes: unsupervised and supervised. Here we mainly review the supervised methods, which learn a distance metric from a data set with some supervised information. Usually the information takes the
form of pairwise constraints, indicating whether a pair of data points belongs to the same class (usually referred to as must-link constraints) or to different classes (cannot-link constraints). These algorithms then aim to learn a proper distance metric under which the data with must-link constraints are as compact as possible, while the data with cannot-link constraints are far apart from each other. Some typical approaches include the side-information method [9], Relevant Component Analysis (RCA) [5], and Discriminant Component Analysis (DCA) [2]. Our Discriminative Neighborhood Metric Learning (DNML) method can also be viewed as a supervised method; however, we make use of the labeled data together with their labels, which is different from those pairwise constraints.

Figure 9. Classification accuracy vs. the number of selected tickets for DNML+LMED, LMED, DNML+Rand, and DNML+TED. The x-axis represents the percentage of actively labeled tickets, and the y-axis represents the classification accuracy averaged over multiple independent runs.

B. Active Learning

In many real-world problems, unlabeled data are abundant but labeled data are expensive to obtain (e.g., in text classification it is expensive and time-consuming to ask users to label documents manually, yet it is quite easy to obtain a large amount of unlabeled documents by crawling the web). In such a scenario the learning algorithm can actively query the user/teacher for labels. This type of iterative supervised learning is called active learning. Since the learner chooses the examples, the number of examples needed to learn a concept can often be much lower than the number required in normal supervised learning. Two classical active learning algorithms are Tong and Koller's Simple SVM algorithm [6] and Seung et al.'s Query By Committee (QBC) algorithm [4].
However, the Simple SVM algorithm is coupled with the Support Vector Machine (SVM) classifier [7] and is only applicable to two-class problems. For the QBC algorithm, we need to construct a committee of models that represent different regions of the version space and have some measure of disagreement among committee members, which is usually difficult in real-world applications. Recently, Yu et al. [11] proposed another active learning algorithm called Transductive Experimental Design (TED), which aims to find the most representative points that can optimally reconstruct the whole data set in the Euclidean sense. Our median selection strategy introduced in this paper is similar to TED, and we analyzed the advantages of our algorithm in Section II-C.

V. CONCLUSIONS

In this paper we present a novel metric+active learning method for IT service ticket classification. Our method combines the strengths of both metric learning and active learning. Experimental results on both benchmark and real ticket data sets are presented to demonstrate the effectiveness of the proposed method.

ACKNOWLEDGMENT

The work is partially supported by NSF CAREER Award IIS4628 and a 2008 IBM Faculty Award.

REFERENCES

[1] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, San Diego, California, 1990.
[2] S. Hoi, W. Liu, M. Lyu, and W. Ma. Learning distance metrics with contextual constraints for image retrieval. In Proceedings of CVPR, 2006.
[3] Y. Jia, F. Nie, and C. Zhang. Trace ratio problem revisited. IEEE Transactions on Neural Networks, 2009.
[4] H. S. Seung, M. Opper, and H. Sompolinsky. Query by committee. In Proceedings of COLT, 1992.
[5] N. Shental, T. Hertz, D. Weinshall, and M. Pavel. Adjustment learning and relevant component analysis. In Proceedings of ECCV, 2002.
[6] S. Tong and D. Koller. Support vector machine active learning with applications to text classification. Journal of Machine Learning Research, 2:45-66, 2001.
[7] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, NY, USA, 1995.
[8] F. Wang and C. Zhang. Feature extraction by maximizing the neighborhood margin. In Proceedings of CVPR, 2007.
[9] E. P. Xing, A. Y. Ng, M. I. Jordan, and S. Russell. Distance metric learning, with application to clustering with side-information. In Advances in Neural Information Processing Systems, 2003.
[10] L. Yang. Distance metric learning: A comprehensive survey. Technical report, Michigan State University, 2006.
[11] K. Yu, J. Bi, and V. Tresp. Active learning via transductive experimental design. In Proceedings of ICML, 2006.
More information1816 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 7, JULY 2006. Principal Components Null Space Analysis for Image and Video Classification
1816 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 7, JULY 2006 Principal Components Null Space Analysis for Image and Video Classification Namrata Vaswani, Member, IEEE, and Rama Chellappa, Fellow,
More informationEM Clustering Approach for MultiDimensional Analysis of Big Data Set
EM Clustering Approach for MultiDimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin
More informationSupport Vector Machines with Clustering for Training with Very Large Datasets
Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano
More informationMachine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
More informationLecture 20: Clustering
Lecture 20: Clustering Wrapup of neural nets (from last lecture Introduction to unsupervised learning Kmeans clustering COMP424, Lecture 20  April 3, 2013 1 Unsupervised learning In supervised learning,
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationClustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University September 19, 2012
Clustering Big Data Anil K. Jain (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University September 19, 2012 EMart No. of items sold per day = 139x2000x20 = ~6 million
More informationA Survey on Outlier Detection Techniques for Credit Card Fraud Detection
IOSR Journal of Computer Engineering (IOSRJCE) eissn: 22780661, p ISSN: 22788727Volume 16, Issue 2, Ver. VI (MarApr. 2014), PP 4448 A Survey on Outlier Detection Techniques for Credit Card Fraud
More informationUnsupervised learning: Clustering
Unsupervised learning: Clustering Salissou Moutari Centre for Statistical Science and Operational Research CenSSOR 17 th September 2013 Unsupervised learning: Clustering 1/52 Outline 1 Introduction What
More informationAPPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder
APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large
More informationEnvironmental Remote Sensing GEOG 2021
Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class
More informationData Mining  Evaluation of Classifiers
Data Mining  Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
More informationAssessment. Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall
Automatic Photo Quality Assessment Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall Estimating i the photorealism of images: Distinguishing i i paintings from photographs h Florin
More informationTensor Methods for Machine Learning, Computer Vision, and Computer Graphics
Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Part I: Factorizations and Statistical Modeling/Inference Amnon Shashua School of Computer Science & Eng. The Hebrew University
More informationA Survey on Preprocessing and Postprocessing Techniques in Data Mining
, pp. 99128 http://dx.doi.org/10.14257/ijdta.2014.7.4.09 A Survey on Preprocessing and Postprocessing Techniques in Data Mining Divya Tomar and Sonali Agarwal Indian Institute of Information Technology,
More informationAdvanced Ensemble Strategies for Polynomial Models
Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer
More informationPrincipal Component Analysis Application to images
Principal Component Analysis Application to images Václav Hlaváč Czech Technical University in Prague Faculty of Electrical Engineering, Department of Cybernetics Center for Machine Perception http://cmp.felk.cvut.cz/
More informationRecognizing Cats and Dogs with Shape and Appearance based Models. Group Member: Chu Wang, Landu Jiang
Recognizing Cats and Dogs with Shape and Appearance based Models Group Member: Chu Wang, Landu Jiang Abstract Recognizing cats and dogs from images is a challenging competition raised by Kaggle platform
More informationA fast multiclass SVM learning method for huge databases
www.ijcsi.org 544 A fast multiclass SVM learning method for huge databases Djeffal Abdelhamid 1, Babahenini Mohamed Chaouki 2 and TalebAhmed Abdelmalik 3 1,2 Computer science department, LESIA Laboratory,
More informationSubspace Analysis and Optimization for AAM Based Face Alignment
Subspace Analysis and Optimization for AAM Based Face Alignment Ming Zhao Chun Chen College of Computer Science Zhejiang University Hangzhou, 310027, P.R.China zhaoming1999@zju.edu.cn Stan Z. Li Microsoft
More informationADVANCED MACHINE LEARNING. Introduction
1 1 Introduction Lecturer: Prof. Aude Billard (aude.billard@epfl.ch) Teaching Assistants: Guillaume de Chambrier, Nadia Figueroa, Denys Lamotte, Nicola Sommer 2 2 Course Format Alternate between: Lectures
More informationData, Measurements, Features
Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are
More informationRegression Using Support Vector Machines: Basic Foundations
Regression Using Support Vector Machines: Basic Foundations Technical Report December 2004 Aly Farag and Refaat M Mohamed Computer Vision and Image Processing Laboratory Electrical and Computer Engineering
More informationFeature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier
Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier D.Nithya a, *, V.Suganya b,1, R.Saranya Irudaya Mary c,1 Abstract  This paper presents,
More informationCLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA
CLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA Professor Yang Xiang Network Security and Computing Laboratory (NSCLab) School of Information Technology Deakin University, Melbourne, Australia http://anss.org.au/nsclab
More informationChapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
More informationSTATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and
Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Webbased Analytics Table
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 by Tan, Steinbach, Kumar 1 What is Cluster Analysis? Finding groups of objects such that the objects in a group will
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
More informationEcommerce Transaction Anomaly Classification
Ecommerce Transaction Anomaly Classification Minyong Lee minyong@stanford.edu Seunghee Ham sham12@stanford.edu Qiyi Jiang qjiang@stanford.edu I. INTRODUCTION Due to the increasing popularity of ecommerce
More informationA Lightweight Solution to the Educational Data Mining Challenge
A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China catch0327@yahoo.com yanxing@gdut.edu.cn
More informationPredict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons
More informationDistance Metric Learning for Large Margin Nearest Neighbor Classification
Journal of Machine Learning Research 10 (2009) 207244 Submitted 12/07; Revised 9/08; Published 2/09 Distance Metric Learning for Large Margin Nearest Neighbor Classification Kilian Q. Weinberger Yahoo!
More informationComponent Ordering in Independent Component Analysis Based on Data Power
Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals
More informationA New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication
2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management
More informationA Survey of Kernel Clustering Methods
A Survey of Kernel Clustering Methods Maurizio Filippone, Francesco Camastra, Francesco Masulli and Stefano Rovetta Presented by: Kedar Grama Outline Unsupervised Learning and Clustering Types of clustering
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, MayJun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
More informationClassification algorithm in Data mining: An Overview
Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department
More informationConcepts in Machine Learning, Unsupervised Learning & Astronomy Applications
Data Mining In Modern Astronomy Sky Surveys: Concepts in Machine Learning, Unsupervised Learning & Astronomy Applications ChingWa Yip cwyip@pha.jhu.edu; Bloomberg 518 Human are Great Pattern Recognizers
More information203.4770: Introduction to Machine Learning Dr. Rita Osadchy
203.4770: Introduction to Machine Learning Dr. Rita Osadchy 1 Outline 1. About the Course 2. What is Machine Learning? 3. Types of problems and Situations 4. ML Example 2 About the course Course Homepage:
More informationA Kmeanslike Algorithm for Kmedoids Clustering and Its Performance
A Kmeanslike Algorithm for Kmedoids Clustering and Its Performance HaeSang Park*, JongSeok Lee and ChiHyuck Jun Department of Industrial and Management Engineering, POSTECH San 31 Hyojadong, Pohang
More informationUnsupervised Learning: Clustering with DBSCAN Mat Kallada
Unsupervised Learning: Clustering with DBSCAN Mat Kallada STAT 2450  Introduction to Data Mining Supervised Data Mining: Predicting a column called the label The domain of data mining focused on prediction:
More informationAn Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
More informationDistances, Clustering, and Classification. Heatmaps
Distances, Clustering, and Classification Heatmaps 1 Distance Clustering organizes things that are close into groups What does it mean for two genes to be close? What does it mean for two samples to be
More informationDistance based clustering
// Distance based clustering Chapter ² ² Clustering Clustering is the art of finding groups in data (Kaufman and Rousseeuw, 99). What is a cluster? Group of objects separated from other clusters Means
More informationSearch Taxonomy. Web Search. Search Engine Optimization. Information Retrieval
Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!
More informationData Mining Project Report. Document Clustering. Meryem UzunPer
Data Mining Project Report Document Clustering Meryem UzunPer 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. Kmeans algorithm...
More informationStandardization and Its Effects on KMeans Clustering Algorithm
Research Journal of Applied Sciences, Engineering and Technology 6(7): 3993303, 03 ISSN: 0407459; eissn: 0407467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03
More informationPerformance Metrics for Graph Mining Tasks
Performance Metrics for Graph Mining Tasks 1 Outline Introduction to Performance Metrics Supervised Learning Performance Metrics Unsupervised Learning Performance Metrics Optimizing Metrics Statistical
More informationLearning a Metric during Hierarchical Clustering based on Constraints
Learning a Metric during Hierarchical Clustering based on Constraints Korinna Bade and Andreas Nürnberger OttovonGuerickeUniversity Magdeburg, Faculty of Computer Science, D39106, Magdeburg, Germany
More informationGoing Big in Data Dimensionality:
LUDWIG MAXIMILIANS UNIVERSITY MUNICH DEPARTMENT INSTITUTE FOR INFORMATICS DATABASE Going Big in Data Dimensionality: Challenges and Solutions for Mining High Dimensional Data Peer Kröger Lehrstuhl für
More informationFinal Project Report
CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes
More informationCrossValidation. Synonyms Rotation estimation
Comp. by: BVijayalakshmiGalleys0000875816 Date:6/11/08 Time:19:52:53 Stage:First Proof C PAYAM REFAEILZADEH, LEI TANG, HUAN LIU Arizona State University Synonyms Rotation estimation Definition is a statistical
More informationReview of Computer Engineering Research WEB PAGES CATEGORIZATION BASED ON CLASSIFICATION & OUTLIER ANALYSIS THROUGH FSVM. Geeta R.B.* Shobha R.B.
Review of Computer Engineering Research journal homepage: http://www.pakinsight.com/?ic=journal&journal=76 WEB PAGES CATEGORIZATION BASED ON CLASSIFICATION & OUTLIER ANALYSIS THROUGH FSVM Geeta R.B.* Department
More informationMulticlass Classification. 9.520 Class 06, 25 Feb 2008 Ryan Rifkin
Multiclass Classification 9.520 Class 06, 25 Feb 2008 Ryan Rifkin It is a tale Told by an idiot, full of sound and fury, Signifying nothing. Macbeth, Act V, Scene V What Is Multiclass Classification? Each
More informationIMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH Kalinka Mihaylova Kaloyanova St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics Sofia 1164, Bulgaria
More informationUse of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 4, April 2015,
More informationCLUSTER ANALYSIS FOR SEGMENTATION
CLUSTER ANALYSIS FOR SEGMENTATION Introduction We all understand that consumers are not all alike. This provides a challenge for the development and marketing of profitable products and services. Not every
More informationFace Recognition with Occlusions in the Training and Testing Sets
Face Recognition with Occlusions in the Training and Testing Sets Hongjun Jia and Aleix M. Martinez The Department of Electrical and Computer Engineering The Ohio State University, Columbus, OH 43210,
More informationOutlier Ensembles. Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY 10598. Keynote, Outlier Detection and Description Workshop, 2013
Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY 10598 Outlier Ensembles Keynote, Outlier Detection and Description Workshop, 2013 Based on the ACM SIGKDD Explorations Position Paper: Outlier
More informationDomain Classification of Technical Terms Using the Web
Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using
More informationMaximum Margin Clustering
Maximum Margin Clustering Linli Xu James Neufeld Bryce Larson Dale Schuurmans University of Waterloo University of Alberta Abstract We propose a new method for clustering based on finding maximum margin
More informationThe Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
More informationIN this paper we focus on the problem of largescale, multiclass
IEEE TRANSACTIONS ON PATTERN RECOGNITION AND MACHINE INTELLIGENCE 1 DistanceBased Image Classification: Generalizing to new classes at nearzero cost Thomas Mensink, Member IEEE, Jakob Verbeek, Member,
More informationCluster Analysis. Isabel M. Rodrigues. Lisboa, 2014. Instituto Superior Técnico
Instituto Superior Técnico Lisboa, 2014 Introduction: Cluster analysis What is? Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from
More informationFUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM
International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 3448 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT
More informationEfficient visual search of local features. Cordelia Schmid
Efficient visual search of local features Cordelia Schmid Visual search change in viewing angle Matches 22 correct matches Image search system for large datasets Large image dataset (one million images
More informationBEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
More informationLearning with Local and Global Consistency
Learning with Local and Global Consistency Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 7276 Tuebingen, Germany
More informationBizPro: Extracting and Categorizing Business Intelligence Factors from News
BizPro: Extracting and Categorizing Business Intelligence Factors from News Wingyan Chung, Ph.D. Institute for Simulation and Training wchung@ucf.edu Definitions and Research Highlights BI Factor: qualitative
More informationLearning with Local and Global Consistency
Learning with Local and Global Consistency Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 7276 Tuebingen, Germany
More information