Optimizing Area Under the ROC Curve using Ranking SVMs


Kaan Ataman
Department of Management Sciences
The University of Iowa

W. Nick Street
Department of Management Sciences
The University of Iowa

ABSTRACT

Area Under the ROC Curve (AUC), often used for comparing classifiers, is a widely accepted performance measure for ranking instances. Many researchers have studied optimization of AUC, usually via optimizing some approximation of a ranking function. Ranking SVMs are among the better performers, but their usage in the literature is typically limited to learning a total ranking from partial rankings. In this paper, we observe that a ranking SVM is in fact a direct optimization of AUC via optimizing the Wilcoxon-Mann-Whitney statistic. We compare a linear ranking SVM with some well-known linear classifiers, such as linear SVM, perceptron and logit, in the context of binary classification. We show that an SVM optimized for ranking not only achieves better AUC than other linear classifiers on average, but also performs comparably in accuracy.

Keywords: AUC, ranking, SVM, optimization

1. INTRODUCTION

Many real-world classification problems require an ordering rather than a simple classification. For example, in direct mail marketing, the marketer may want to know which potential customers to target with new catalogs so as to maximize revenue. If the number of catalogs the marketer can send is limited, then it is not enough to know who is likely to buy a product after receiving a catalog; it is more useful to distinguish the top n percent of potential customers with the highest likelihood of buying. By doing so, the marketer can more intelligently target a subset of her customer base, increasing the expected profit. A slightly different example is collaborative book recommendation. To recommend a book intelligently, an algorithm must exploit the similarities in book preferences among people who have read and submitted a score for the book. Combining these scores, it is possible to return a recommendation list to the user, ideally ranked so that the book at the top of the list is expected to appeal most to the user. There is a subtle difference between these two ranking problems: in the marketing problem the ranking is based on binary input (purchase, non-purchase), whereas in the book recommendation problem the ranking is based on a collection of partial rankings. In this paper we consider the problem of ranking based on binary input.

Ranking is a popular machine learning problem that has been addressed extensively in the literature [3, 4, 19]. The most commonly used measure of a classifier's ability to rank instances in binary classification problems is the area under the ROC curve. ROC curves are widely used for visualization and comparison of the performance of binary classifiers. ROC curves were originally used in signal detection theory; they were introduced to the machine learning community by Spackman in 1989, who showed that ROC curves can be used for evaluation and comparison of algorithms [17]. An ROC curve plots true positive rate against false positive rate as the decision threshold (on the estimated probability of class membership, or score) is varied. In ROC space, the upper left corner represents perfect classification, while the diagonal represents random classification. A point in ROC space that lies to the upper left of another point represents a better classifier.
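To make the construction of these curves concrete, the following minimal sketch (our illustration, not code from this paper) traces ROC points from classifier scores by sweeping the threshold down the sorted score list; tied scores would need a small refinement.

```python
import numpy as np

def roc_points(scores, labels):
    """Trace ROC points for binary labels (1 = positive, 0 = negative):
    sweep the decision threshold down the sorted scores and record the
    (false positive rate, true positive rate) reached at each step."""
    order = np.argsort(-np.asarray(scores, dtype=float))  # descending by score
    y = np.asarray(labels)[order]
    tpr = np.cumsum(y == 1) / max((y == 1).sum(), 1)  # true positive rate
    fpr = np.cumsum(y == 0) / max((y == 0).sum(), 1)  # false positive rate
    # Prepend the (0, 0) corner where nothing is predicted positive.
    return np.concatenate([[0.0], fpr]), np.concatenate([[0.0], tpr])

# Example: a perfect ranker's curve passes through the upper left corner (0, 1).
fpr, tpr = roc_points([0.9, 0.8, 0.4, 0.5, 0.2], [1, 1, 0, 1, 0])
```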
Some classifiers, such as neural networks or naive Bayes, naturally provide such probability estimates, while others, such as decision trees, do not. It is still possible to produce ROC curves for any type of classifier using minor adjustments [5]. The area under the ROC curve (AUC) summarizes the curve as a single scalar value for classifier comparison [2, 9]. Statistically speaking, the AUC of a classifier is the probability that it will rank a randomly chosen positive instance higher than a randomly chosen negative instance. Since AUC is a probability, its value varies between 0 and 1, where 1 means all positives are ranked higher than all negatives. Larger AUC values indicate better classifier performance across the full range of possible thresholds. Even though a classifier with high AUC can be outperformed by a lower-AUC classifier in some region of the ROC space, in general the high-AUC classifier is better.
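This pairwise interpretation gives a direct way to estimate AUC from scores: count the correctly ordered positive-negative pairs. A minimal sketch follows; counting tied scores as half a correct pair is a common convention, not something specified here.

```python
import numpy as np

def wmw_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) score pairs in which
    the positive is ranked above the negative; ties count as half."""
    diff = (np.asarray(scores_pos, dtype=float)[:, None]
            - np.asarray(scores_neg, dtype=float)[None, :])
    correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return correct / diff.size

# Example: 5 of the 6 pairs are ordered correctly, so AUC = 5/6.
print(wmw_auc([0.9, 0.8, 0.4], [0.5, 0.2]))
```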

AUC can also be represented as the ratio of the number of correct pairwise rankings to the number of all possible pairs [8]. This quantity is called the Wilcoxon-Mann-Whitney (WMW) statistic [13, 20]:

$$W = \frac{\sum_{i=1}^{p}\sum_{j=1}^{n} I(x_i, y_j)}{pn}, \qquad I(x_i, y_j) = \begin{cases} 1 & \text{if } x_i > y_j \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

where p is the number of positive points, n is the number of negative points, x_i is the i-th positive point and y_j is the j-th negative point. Note that in the binary classification problem we have no prior knowledge of intra-class rankings.

Mainstream classifiers are usually designed to minimize classification error, so they are not specifically designed to rank well. Support Vector Machines (SVMs) are an exception, since by nature (margin maximization) they turn out to be good rankers. It has been shown that accuracy is not always the best performance measure, especially for datasets with skewed class or cost distributions, as many real-world problems have. The machine learning community has recently explored the question of which performance measure is better in general. Ling, Huang and Zhang showed in a recent study that AUC is a statistically consistent and more discriminating measure than accuracy [12]. Recent research indicates an increase in the popularity of AUC as a performance measure, especially in cases where class priors and/or misclassification costs are unknown.

Previous work has introduced several ways to optimize rankings. Cohen, Schapire and Singer proposed a two-stage approach in which a preference function is first learned and new instances are then ordered to maximize agreement with the learned function [4]. Caruana, Baluja and Mitchell introduced Rankprop, an algorithm that improves rankings by backpropagation with sum-of-squares error on estimated ranks [3]. Vogt and Cottrell introduced d′, an optimization measure for ranking problems, and claim that optimizing it works as well as optimizing exact precision, a popular IR performance measure [19]. Recently, optimizing the area under the ROC curve has become a focal point for researchers. The problem can be framed as optimizing a total ranking either from a given collection of rankings or from 0-1 input, depending on what is available. Freund et al. introduced RankBoost, an algorithm for combining preferences [6]; in the context of collaborative filtering, the algorithm combines users' ranked preferences to create a recommendation list optimally ranked for a similar user. Yan et al. took a direct approach to optimizing AUC: they use a continuous function to approximate the WMW statistic, which is itself not continuous, and optimize this approximation directly [22]. Another recent paper, by Herschtal and Raskutti, introduces an algorithm that optimizes AUC using gradient descent [10]. The authors approximate the WMW statistic with a sigmoid function, making use of its differentiability, and also introduce an efficient way to reduce the number of constraints in the optimization problem, which reduces the overall complexity.
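To illustrate the sigmoid-approximation idea, here is a minimal sketch of a differentiable WMW surrogate in the spirit of [10, 22]; it is our paraphrase of the general approach, not either paper's exact objective, and the sharpness parameter beta is our own choice.

```python
import numpy as np

def smooth_wmw(w, X_pos, X_neg, beta=2.0):
    """Differentiable surrogate for the WMW statistic (1): replace the
    step indicator I(x_i > y_j) with a sigmoid of the pairwise score
    difference, so the objective can be maximized by gradient ascent."""
    diff = (X_pos @ w)[:, None] - (X_neg @ w)[None, :]  # all pairwise score differences
    return np.mean(1.0 / (1.0 + np.exp(-beta * diff)))
```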
Ranking SVMs were introduced by Herbrich et al. [15] and Joachims [11]. To our knowledge, a comparison of ranking SVMs to other baseline classifiers in the binary classification context has not been studied. The closest article we were able to find is the optimization of AUC using SVMs by Rakotomamonjy [16], which shows an application of a quadratic-programming-based algorithm to optimize AUC. Joachims used ranking SVMs in the context of optimizing search engines [11]. Even though his paper does not experiment with binary classification, here we show that his approach can be extended to binary classification problems. In this paper we focus on maximizing the number of correct pairwise rankings, which in turn optimizes the Wilcoxon-Mann-Whitney statistic. In the following section we explain the optimization problem and our experimental setup in detail. This is followed by Section 3, which covers the experimental results and discussion. Finally, we finish the paper with conclusions.

2. AUC OPTIMIZATION

2.1 Primal and Dual Formulations

To directly optimize the WMW statistic, we would like to have all positive points ranked higher than all negative points. We therefore construct a linear program with soft constraints penalizing negatives that are ranked above positives. This requires p (number of positive points) times n (number of negative points) constraints in total. A perfect linear separation of classes is impossible for most real-world datasets, so the LP has to be constructed to minimize some error term corresponding to the number of incorrect orderings. As in the classification case, our formulation avoids combinatorial complexity by minimizing the distance by which such pairs are mis-ordered. The primal form of the LP is:

$$\begin{aligned} \min_{w,v,z}\quad & e^T z + e^T v \\ \text{s.t.}\quad & Cw + z \ge e \\ & -v \le w \le v \\ & v, z \ge 0 \end{aligned} \tag{2}$$

where e is a vector of ones of the appropriate dimension, w represents the attribute weights, v bounds the magnitude of the weights, and z is a vector of error terms corresponding to each pairwise misranking. C represents a matrix of misrankings; more specifically,

$$C_i = A_i - B_i \tag{3}$$

where A_i is a point from the collection of positive points, B_i a point from the collection of negative points, and row C_i constrains the positive point A_i to be ranked higher than the negative point B_i. In the ideal case the C matrix has pn rows.

The result of this LP is a vector w such that the total distance of pairwise mis-rankings is minimized. By finding such a w without an intercept, we are in fact learning an optimal family of parallel planes; this is equivalent to scanning a single separating plane across the dataset parallel to its normal vector and constructing an optimal ROC curve from the series of intercept values.

At this point our LP looks similar to a soft-margin linear SVM [18]. Like an SVM, our LP maximizes the margin by minimizing the magnitude of the weights. The parameter that controls the balance between error and margin is set to one in our case, so that the margin term and the sum of errors are balanced; we observed that, on average, using one was as good as any other value for this coefficient. An SVM optimizes the margin of the relevant points (support vectors) relative to a single separating plane, while the ranking SVM optimizes the margin between each inter-class pair of points along the normal direction of the plane. As such, the number of points influencing the ranking solution is potentially much larger than the number affecting the separating plane.

We use the dual of the LP to reduce the number of constraints, as the soft constraints on w become simple bounds on the dual variables. The dual of the above LP can be arranged as:

$$\begin{aligned} \max_{\alpha,\beta,\gamma}\quad & e^T \alpha \\ \text{s.t.}\quad & C^T \alpha + \beta - \gamma = 0 \\ & \alpha \le 1 \\ & \beta + \gamma \le 1 \\ & \alpha, \beta, \gamma \ge 0 \end{aligned} \tag{4}$$
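For concreteness, here is a minimal dense sketch of the primal LP (2) using scipy.optimize.linprog (our illustration, not the paper's implementation, which uses GLPK). It enumerates all p·n rows of C up front, so it is a toy for small datasets only; Section 2.2 describes the incremental constraint generation used in practice.

```python
import numpy as np
from scipy.optimize import linprog

def rank_lp_weights(X_pos, X_neg):
    """Solve the primal ranking LP (2) over all p*n pairwise rows:
        min  e'z + e'v   s.t.  Cw + z >= e,  -v <= w <= v,  v, z >= 0
    Variable layout: x = [w (d), v (d), z (p*n)]."""
    (p, d), (n, _) = X_pos.shape, X_neg.shape
    m = p * n
    # One row of C per (positive, negative) pair: C_i = A_i - B_i.
    C = (X_pos[:, None, :] - X_neg[None, :, :]).reshape(m, d)
    cost = np.concatenate([np.zeros(d), np.ones(d), np.ones(m)])
    A_ub = np.block([
        [-C,         np.zeros((m, d)), -np.eye(m)],        # Cw + z >= e
        [np.eye(d),  -np.eye(d),       np.zeros((d, m))],  # w - v <= 0
        [-np.eye(d), -np.eye(d),       np.zeros((d, m))],  # -w - v <= 0
    ])
    b_ub = np.concatenate([-np.ones(m), np.zeros(2 * d)])
    bounds = [(None, None)] * d + [(0, None)] * (d + m)
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d]  # the attribute weights w
```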
2.2 The Algorithm

Ideally we would like to include all possible constraints, so that each positive point is ranked above each negative point. This, however, is computationally expensive, especially for large datasets. For instance, for a balanced dataset with 1000 points, the number of constraints necessary is 500 × 500 = 250,000. This is the worst case for a 1000-point dataset, since the class ratio is 1; for a skewed dataset with a class ratio of 1:100, the required number of constraints would be roughly 10 × 990 ≈ 9,900.

Initializing the weights randomly increases the run time of the algorithm by delaying the convergence of the weights. Instead, we initialize the weights using the solution of a linear SVM. This speeds up the algorithm dramatically, since starting with good weights yields fewer violations at each iteration. Figure 1 outlines the algorithm.

    Initialize the weight vector
    While the number of added violations is greater than k do
        For each positive point ranked below a negative point
            Create a constraint for each misranked positive point
            Create balancing, non-violated constraints
        End for
        Solve the LP and update the weight vector
    End while

Figure 1: Algorithm Outline

Our approach is to adjust the weight vector incrementally. To do this, we create constraints only for the misranked positives, a variation of the well-known column generation technique [7]. At each iteration, each newly misranked pair of points creates a new constraint that is added to the LP. Eventually, all non-slack constraints from the complete problem will be contained in our constraint set, resulting in an optimal solution. In practice, we stop the iteration early if only a few new constraints were added; in the experiments that follow, the parameter k in the algorithm is set to 10. In addition to these violated constraints, we also add constraints to prevent the weights from flipping sign (− to +, or + to −).

Flipping of the sign of the weight vector occurs when we create constraints corresponding only to the violations: with a constraint set consisting of only those pairs of points that were misranked in the previous iteration, an optimal solution can be achieved by simply reversing the signs of the coefficients. This does not minimize the objective for the whole problem, however, since it ranks most of the original points incorrectly. Adding violated constraints together with some balancing, non-violated constraints to avoid this reversal phenomenon dramatically reduces the total number of constraints compared to an LP that lists all possible p × n constraints. It takes only a few iterations for the objective function to converge, and the overall time to run the algorithm is usually much less than that of an LP which solves for all possible constraints.
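The loop of Figure 1 can be sketched roughly as follows. The `solve_lp` callback stands for any routine that solves the ranking LP restricted to the accumulated constraint rows (for example, a GLPK or linprog wrapper such as the sketch above), and the balancing-constraint choice below, pairing each violated positive with the lowest-scoring negative, is an illustrative stand-in; the text does not pin down the exact selection rule.

```python
import numpy as np

def ranking_svm_loop(X_pos, X_neg, w0, solve_lp, k=10, max_iter=50):
    """Constraint-generation loop of Figure 1 (sketch). Starts from
    linear-SVM weights w0; stops when fewer than k new violated
    constraints appear in an iteration."""
    w = np.asarray(w0, dtype=float)
    rows, seen = [], set()
    for _ in range(max_iter):
        s_pos, s_neg = X_pos @ w, X_neg @ w
        # New violations: a positive point scored at or below a negative.
        new = [(i, j) for i in range(len(s_pos)) for j in range(len(s_neg))
               if s_pos[i] <= s_neg[j] and (i, j) not in seen]
        if len(new) < k:                 # early stop: few new violations added
            break
        j_ok = int(np.argmin(s_neg))     # a currently well-ranked negative
        for i, j in new:
            seen.add((i, j))
            rows.append(X_pos[i] - X_neg[j])     # violated constraint
            rows.append(X_pos[i] - X_neg[j_ok])  # balancing, non-violated row
        w = solve_lp(np.array(rows))     # re-solve the LP, update the weights
    return w
```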
3. EXPERIMENTS AND RESULTS

In this study we used Matlab as the programming environment, with the GLPK LP solver [14]. We also used WEKA [21] for the comparison tests. We used 17 datasets from the UCI repository [1]. Some of the datasets are multi-class problems; these were converted to binary classification problems with a one-class-vs.-others split. An overview of the datasets and the splitting criterion for the multi-class datasets are given in Table 1. For our experiments we used 10-fold cross-validation, and for training we limited the number of data points to 500 to keep the number of constraints manageable for the GLPK solver. For the comparisons we ran WEKA using exactly the same training and testing sets in each fold, which makes the significance tests reliable. We present our results in Table 2, which reports averages and standard deviations over five 10-fold cross-validations.

Table 1: Overview of the datasets and modification details. [The point counts, attribute counts, and rare-class percentages did not survive extraction.]

    Dataset        Splitting criterion (multi-class datasets)
    abalone        female = 1, rest = 0
    blocks         text = 1, rest = 0
    boston         (MEDV < 35) = 1, rest = 0
    cancer(wbc)
    cancer(wpbc)
    diabetes
    ecoli          pp = 1, rest = 0
    glass          one class = 1, rest = 0
    heart
    ionosphere
    liver(bupa)
    pendigits      one class = 1, rest = 0
    sonar
    spambase
    spectf
    survival
    vehicle        opel = 1, rest = 0

Table 3 summarizes the number of data sets on which the various algorithms performed best and worst for AUC, and the significance test (at 95%) results are provided in Table 4.

Table 3: Summary of best-worst comparison results for the AUC analysis (number of datasets on which each of RankerSVM, SVM, Logit and Perc was best or worst). [The counts did not survive extraction.]

Table 4: Summary of pairwise significance (t-test) results for AUC

    Algorithm Pair            W-L (significant)
    Ranking vs. SVM           7-4
    Ranking vs. Logit         4-2
    Ranking vs. Perceptron    8-1

The results in Table 3 show that the ranker SVM returns the best AUC among all tested algorithms on 10 of the 17 data sets, while it is the worst in only one case. The significance analysis in Table 4 shows that in general the ranker SVM ranks better than the classifiers. The linear SVM tended to perform either very well or very poorly, while the perceptron did a generally poor job of ranking.

To test the impact of ranking optimization on classification accuracy, we also collected accuracy results from the same experiments. It is fairly simple to get classifications from the ranker: we find the threshold value that yields the best accuracy on the training data, and then apply this threshold to the separating surface on the test set to obtain accuracy (a sketch of this step follows at the end of this section). The accuracies of the other linear classifiers were obtained from WEKA using the same data sets in each fold. The results are provided in Table 5. Table 6 summarizes overall performance, and the 95% pairwise significance tests for the accuracy comparisons are presented in Table 7.

Table 6: Summary of best-worst comparison results for the accuracy analysis (same format as Table 3). [The counts did not survive extraction.]

Table 7: Summary of pairwise significance (t-test) results for accuracy

    Algorithm Pair            W-L (significant)
    Ranking vs. SVM           5-2
    Ranking vs. Logit         2-2
    Ranking vs. Perceptron    4-0

From Table 6 we see that the ranker SVM performs comparably in accuracy with the other classifiers. Even though it was not optimized for classification, by choosing a good threshold it still achieves comparable levels of accuracy. Although the ranker SVM did not score the best accuracies in our tests, it usually was not significantly worse, as can be seen in Table 7. We believe the basic reason for this comparable performance is the margin maximization in the objective function, which seems to help accuracy. While the linear SVM and perceptron were the best overall classifiers on accuracy, the ranker SVM beat both in head-to-head significance results.

The only drawback of the algorithm is the number of constraints created. In our tests we did not train with more than 500 data points because of the computational expense. Table 8 shows the number of iterations needed to converge and the range of the number of constraints created for each dataset, together with the maximum possible number of constraints for comparison. For instance, it is generally much faster to solve five LPs with 5,000 constraints each than a single LP with all possible constraints.
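The threshold-selection step mentioned above can be sketched as follows, assuming 0/1 labels and scores produced by the learned weight vector; this is our minimal rendering of the procedure, not the exact code used in the experiments.

```python
import numpy as np

def best_threshold(scores, labels):
    """Return the score threshold that maximizes training accuracy,
    searching midpoints between consecutive distinct sorted scores."""
    scores, labels = np.asarray(scores, dtype=float), np.asarray(labels)
    s = np.unique(scores)
    candidates = np.concatenate([[s[0] - 1.0], (s[:-1] + s[1:]) / 2.0, [s[-1] + 1.0]])
    accs = [np.mean((scores > t) == (labels == 1)) for t in candidates]
    return candidates[int(np.argmax(accs))]

# Usage with a learned w: t = best_threshold(X_train @ w, y_train)
#                         y_pred = (X_test @ w) > t
```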

Table 2: AUC results from five 10-fold cross-validations, reported as mean (standard deviation) for RankerSVM, SVM, Logit and Perc on each of the 17 datasets, with the best result per dataset shown in bold. [The mean values did not survive extraction.]

Table 5: Accuracy results from five 10-fold cross-validations, in the same format as Table 2. [The mean values did not survive extraction.]

Table 8: Constraint size comparison: for each dataset, the number of iterations needed to converge, the range of the number of constraints created, and the maximum possible number of constraints. [The numeric values did not survive extraction.]

4. CONCLUSIONS AND FUTURE WORK

It is common practice in supervised learning to use the output of a classifier as a score for ranking instances. However, it is reasonable to expect that a learner that directly optimizes ranking performance will produce better rankings. In this paper we provided a performance comparison between a linear ranking SVM and a number of linear classifiers. We showed that the ranker SVM, which is optimized for maximizing AUC via the Wilcoxon-Mann-Whitney statistic, has better ranking performance than commonly used linear classifiers. As a side effect, it is also comparable in accuracy to classifiers optimized for accuracy.

This study was limited to linear algorithms. As a future step it would be interesting to see how a nonlinear ranker SVM performs against nonlinear classifiers, and how the ranker SVM compares to other algorithms that are also optimized for AUC maximization. This study considered datasets with binary output, but the method can easily be extended to the problem of learning a total ranking from partial rankings, which remains a popular problem in collaborative filtering and meta-search. Computational complexity on mid-size to large problems is another issue; a number of papers have addressed this and improved efficiency, and we think that reducing the number of constraints in the AUC optimization problem without compromising AUC performance is still an open and interesting problem.

5. REFERENCES

[1] C.L. Blake and C.J. Merz. UCI repository of machine learning databases. mlearn/mlrepository.html.
[2] A.P. Bradley. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30.
[3] R. Caruana, S. Baluja, and T. Mitchell. Using the future to sort out the present: Rankprop and multitask learning for medical risk evaluation. In D.S. Touretzky, M.C. Mozer, and M.E. Hasselmo, editors, Advances in Neural Information Processing Systems, volume 8. The MIT Press.
[4] W.W. Cohen, R.E. Schapire, and Y. Singer. Learning to order things. In M.I. Jordan, M.J. Kearns, and S.A. Solla, editors, Advances in Neural Information Processing Systems, volume 10. The MIT Press.
[5] P. Domingos. MetaCost: A general method for making classifiers cost-sensitive. In Knowledge Discovery and Data Mining.
[6] Y. Freund, R. Iyer, R.E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4.
[7] P.C. Gilmore and R.E. Gomory. A linear programming approach to the cutting-stock problem. Operations Research, 8.
[8] D.J. Hand. Construction and Assessment of Classification Rules. John Wiley & Sons, Bans Lane, England.
[9] J.A. Hanley and B.J. McNeil. The meaning and the use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143:29–36.
[10] A. Herschtal and B. Raskutti. Optimising area under the ROC curve using gradient descent. In ICML '04: Twenty-First International Conference on Machine Learning. ACM Press.
[11] T. Joachims. Optimizing search engines using clickthrough data. In KDD '02: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press.
[12] C. Ling, J. Huang, and H. Zhang. AUC: A statistically consistent and more discriminating measure than accuracy.
[13] H.B. Mann and D.R. Whitney. On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics, 18:50–60.
[14] GLPK Project.
[15] R. Herbrich, T. Graepel, and K. Obermayer. Large margin rank boundaries for ordinal regression. In Advances in Large Margin Classifiers. The MIT Press.
[16] A. Rakotomamonjy. Optimizing area under ROC curve with SVMs. In ROC Analysis in Artificial Intelligence, pages 71–80.
[17] K.A. Spackman. Signal detection theory: Valuable tools for evaluating inductive learning. In Proceedings of the Sixth International Workshop on Machine Learning. Morgan Kaufmann.
[18] V. Vapnik. Statistical Learning Theory. John Wiley & Sons, New York.
[19] C. Vogt and G. Cottrell. Using d′ to optimize rankings. Technical Report CS98-601, U.C. San Diego, CSE Department.
[20] F. Wilcoxon. Individual comparisons by ranking methods. Biometrics, 1:80–83.
[21] I.H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann.
[22] L. Yan, R.H. Dodier, M. Mozer, and R.H. Wolniewicz. Optimizing classifier performance via an approximation to the Wilcoxon-Mann-Whitney statistic. In International Conference on Machine Learning, 2003.
