Utility-Based Fraud Detection
Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence

Utility-Based Fraud Detection
Luis Torgo and Elsa Lopes
Fac. of Sciences / LIAAD-INESC Porto LA, University of Porto, Portugal
[email protected] and [email protected]

Abstract

Fraud detection is a key activity with serious socioeconomic impact. Inspection activities associated with this task are usually constrained by limited available resources. Data analysis methods can help in the task of deciding where to allocate these limited resources in order to optimise the outcome of the inspection activities. This paper presents a multi-strategy learning method to address the question of which cases to inspect first. The proposed methodology is based on utility theory and provides a ranking ordered by decreasing expected outcome of inspecting the candidate cases. This outcome is a function not only of the probability of the case being fraudulent but also of the inspection costs and of the expected payoff if the case is confirmed as a fraud. The proposed methodology is general and can be useful in fraud detection activities with limited inspection resources. We experimentally evaluate our proposal on both an artificial domain and a real world task.

1 Introduction

Fraud detection is a very important application domain that has been addressed in several research areas (e.g. [Hand, 2002; Phua et al., 2005; Fawcett and Provost, 1997]). Due to the intrinsic characteristics of fraudulent events it is highly related to other research topics like outlier detection, anomaly detection or change detection. The connecting feature among these topics is the interest in deviations from the normal behaviour that is most frequently observed. Depending on the characteristics of the available fraud data, different data mining methodologies can be applied. Namely, when the available data includes information regarding each observation being (or not) fraudulent, supervised classification techniques are typically used (e.g. [Ghosh and Reilly, 1994]). On the contrary, in several domains such classifications do not exist and thus unsupervised techniques are required (e.g. [Bolton and Hand, 2001]). Finally, we may have a mix of both types of data, with a few labelled observations (e.g. resulting from past inspection activities) and a large set of unlabelled cases. These situations are often handled with semi-supervised approaches (e.g. [Nigam et al., 2000]).

Fraud detection usually involves two main steps: i) deciding which cases to inspect, and ii) the inspection activity itself. The latter step is frequently constrained by the available resources, e.g. time, manpower or financial means. An information system designed to support these activities should take this important aspect into account. These systems are typically used in the first step: helping to decide which observations should be inspected due to high suspicion of fraudulent behaviour. It is in this decision process that data mining can help, by providing some guidance regarding the priorities for the posterior inspection activities. Given a set of observations, data mining systems can signal which ones to inspect using data analysis techniques that detect deviations from normality. Due to the low frequency of fraudulent cases, this can be cast as an outlier detection task. Most outlier detection methods produce a yes/no answer for each observation.
This type of method is not adequate for most fraud detection tasks due to the above-mentioned limited inspection resources. In effect, such methods can lead to solutions where there are too many inspection signals for the available resources. If that occurs, the user is left alone in choosing which ones to address. In this context, we claim that the methods should instead produce an outlier ranking. With such a result users can easily adjust the solution to their available inspection resources, with the guarantee that the most promising cases are addressed first.

The way the ranking of the candidates is obtained may have a strong impact on the optimisation of the inspection results. Most outlier ranking methods rank the observations according to their probability of being outliers. Cases with a higher probability of being outliers appear in the top positions of the ranks. In this paper we claim that this approach may lead to sub-optimal results in terms of optimising the inspection resources. In typical real world fraud detection applications the costs of inspecting a certain entity may vary from case to case, and moreover, the payoffs of detecting a fraudulent case may also vary. In other words, the utility of the fraudulent cases is different. Outlier ranking methods that solely look at the probability of a certain case being or not an outlier (i.e. being suspiciously different from normality) do not take these costs and benefits into consideration. This may lead to situations where case A is ranked above case B, because it diverges more from normality, and yet case B, if detected, would be more rewarding in the sense that it has a
more favourable balance between inspection costs and detection benefits.

The main goal of this paper is to propose a fraud detection methodology that takes the utility of detection into account when producing inspection ranks. The paper's main contribution is the proposal of a general approach to fraud detection in a context of limited inspection resources. This approach can be seen as a form of multi-strategy learning as it integrates several learning steps to meet the application requirements.

2 Utility-based Rankings

The basic assumption behind the ideas described in this paper is that fraudulent cases have different importance, different inspection costs and different payoffs. Another obvious assumption of our work is that inspection resources are limited. In this context, we claim that the support for the inspection decisions should be in the form of a utility ranking, i.e. a ranking where the top positions are occupied by the cases that, if confirmed fraudulent, will bring a higher reward.

Utility theory [von Neumann and Morgenstern, 1944] describes means for characterising the risk preferences and the actions taken by a rational decision maker given a certain probability model. This theory assumes we have knowledge about the wealth associated with each possible decision. Given this estimated wealth and a probability associated with each alternative action to be taken, the decision maker uses a utility function to map the wealth into a utility score. The action/decision with higher utility is then selected by the decision maker. Different utility functions exist in the literature that map a wealth value into a utility score (e.g. [Friedman and Sandow, 2011]).

Our proposal casts the decision of which cases to inspect in the context of fraud detection into this utility theory framework. In general terms our method is the following. Given a set of potential candidates for inspection we start by estimating their probability of being a fraud. This is a classical outlier detection task. Additionally, for each of these candidates we calculate their potential wealth as a function of estimates of their inspection cost and payoff if confirmed fraudulent. For each case there are two possible outcomes of inspection: either the case is confirmed as fraudulent or not. We calculate the wealth of each of these possible outcomes. Using a utility function we can then calculate the utility of both possible outcomes for each case. Together with the estimated probability associated with each outcome we finally reach an estimated utility score of the case. Our proposal is to rank the cases according to the decreasing value of these estimated utilities and to use this ranking to guide the inspection phase. We thus propose to calculate the expected utility value of a case as follows:

    E[U_i] = P̂_i · u(B̂_i − Ĉ_i) + (1 − P̂_i) · u(−Ĉ_i)    (1)

where P̂_i is the estimated probability of case i being a fraud, B̂_i is the estimated benefit (payoff) of case i if confirmed fraudulent, Ĉ_i is the estimated inspection cost of case i, and u(.) is a utility function.

In order to adapt this general formulation to our target applications we need to consider how to obtain the several estimates that appear in Equation 1. The estimate of the probability that a case is a fraud can be obtained by any existing outlier detection method that is able to attach a probability of being an outlier to its results. In Section 3 we will describe concrete systems that can be used to obtain these scores for any set of cases.
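As a minimal sketch, Equation 1 translates directly into a vectorised R function (R being the language of the implementations referred to later); the candidate values in the example are purely illustrative.

```r
# Expected utility of inspecting each case (Equation 1).
# P, B and C are vectors over the candidate cases; u is a utility
# function applied element-wise (identity = linear utility).
expected.utility <- function(P, B, C, u = identity) {
  P * u(B - C) + (1 - P) * u(-C)
}

# Illustrative example: three candidates with different outlier
# probabilities, payoffs and inspection costs.
P <- c(0.9, 0.6, 0.3)
B <- c(100, 1000, 5000)
C <- c(20, 150, 400)
expected.utility(P, B, C)   # 70, 450, 1100
```

Note how the third case obtains the highest expected utility despite having the lowest outlier probability, purely because of its favourable balance between payoff and inspection cost; this is precisely the effect the utility ranking is designed to capture.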
With respect to the estimated inspection costs and payoffs (Ĉ_i and B̂_i) the problem is slightly different. These values are clearly application dependent. In this context, we need some user supervision to obtain these estimates. This can take two forms: either providing domain knowledge that allows us to obtain these values, or providing examples that we can use to learn a predictive model able to forecast them. The second form is probably more realistic, in the sense that in most real-world fraud detection tasks we will have access to a historical record of past inspection activities. These past data can be used as a supervised data set, where for each inspected case we have information on both the inspection costs and the outcome of the inspection (either the payoff or the conclusion that it was not a fraud). Using this training set we can obtain two models: one that predicts the estimated inspection cost, and another that forecasts the expected payoff. Both are supervised regression tasks.

In summary, our proposal includes three learning tasks that together provide the necessary information to obtain the utility-based outlier rankings using Equation 1. Algorithm 1 describes our proposed methodology in general terms. Its inputs are (1) the historical data set (HistD) with information on past inspection activities; and (2) the set of candidate cases for inspection (InspCand). The practical implementation of the ideas in Algorithm 1 requires some extra decisions. Namely, we need to decide which learning tools are to be included in Steps 1 and 2 of the algorithm, and also on the utility function to use. Note that our proposal is independent of this choice, only requiring that the outlier detection tool is able to attach a score in the interval [0, 1] to each member of the set of candidate cases for inspection.

3 An Implementation of the Proposal

3.1 Predicting Costs and Benefits

The two tasks of predicting the inspection costs and the potential benefits of an inspection are standard supervised regression problems. In both cases we are trying to forecast a continuous variable as a function of a set of predictors that describe the case under consideration. This means that any standard regression tool would be capable of handling this task, provided we have a training sample with the values of both the target variable and the predictors. Still, the problems have some characteristics that may bias our choice of models. Namely, the data concerning the benefits will have a proportion of the learning sample with a target value of zero. These are the past cases that after inspection were tagged as non-fraudulent. This problem should not occur in the task of predicting the inspection costs, as every inspection has some cost. Still, in both problems we can expect a rather diverse range of target values, and this is actually one of the motivations for the work presented in this paper.
Algorithm 1 High-level description of the methodology.

 1: procedure UOR(HistD, InspCand)
    where HistD = {⟨x_1, …, x_p, c, b⟩}, InspCand = {⟨x_1, …, x_p⟩}, and x_1, …, x_p are variables describing each case
    Step 1 - Train Cost and Benefit Prediction Models
 2:   DS_C ← {⟨x_1, …, x_p, c⟩ ∈ HistD}
 3:   DS_B ← {⟨x_1, …, x_p, b⟩ ∈ HistD}
 4:   C_model ← RegressionTool(DS_C)
 5:   B_model ← RegressionTool(DS_B)
    Step 2 - Obtain Outlier Probabilities for InspCand
 6:   P ← OutlierProbEstimator(HistD, InspCand)
    Step 3 - Estimate Utilities
 7:   for all i ∈ InspCand do
 8:     Ĉ_i ← Predict(C_model, i)
 9:     B̂_i ← Predict(B_model, i)
10:     EU_i ← P̂_i · u(B̂_i − Ĉ_i) + (1 − P̂_i) · u(−Ĉ_i)
11:   end for
    Step 4 - Obtain utility ranking (solution)
12:   return InspCand ranked by decreasing EU_i
13: end procedure

In our experiments with the proposed method we will use both regression trees and neural networks for illustrative purposes. These are two methodologies with rather different approaches and thus we expect this to be a good test of the robustness of the methodology to this choice.

3.2 Outlier Ranking

Our methodology also requires probabilities of being an outlier to be obtained. There are many existing outlier detection tools that can be used to estimate these probabilities. Once again we have selected two particular representatives to illustrate our methodology in the experimental section.

The first approach we have selected is the method ORh [Torgo, 2010]. This method obtains outlier rankings using a hierarchical agglomerative clustering algorithm. The overall motivation/idea of the method is that outliers should offer more resistance to being merged with large groups of normal cases, and this should be reflected in the information of the merging process of this clustering algorithm. The ORh method calculates the outlier score (a number in the interval [0, 1]) of each case as follows. For each merging step i involving two groups of cases (g_x,i and g_y,i) the following value is calculated,

    of_i(x) = max(0, (|g_y,i| − |g_x,i|) / (|g_y,i| + |g_x,i|))    (2)

where g_x,i is the group to which x belongs, and |g_x,i| is that group's cardinality. Each observation may be involved in several merges throughout the iterative process of the hierarchical clustering algorithm, sometimes as a member of the larger group, other times as a member of the smaller group. The final outlier score of each case is given by,

    ORh(x) = max_i of_i(x)    (3)

In our experiments we use an implementation of this method available in the R package DMwR [Torgo, 2010].

The second method we have used is the LOF outlier ranking algorithm [Breunig et al., 2000]. LOF obtains an outlier score for each case by estimating its degree of isolation with respect to its local neighbourhood. This method is based on the notion of local density of the observations. Cases in regions of very low density are considered outliers. The estimates of the densities are obtained from the distances between cases. The authors define a few concepts that drive the algorithm used to calculate the outlier score of each point. These are: (1) the core distance of a point p, defined as its distance to its k-th nearest neighbour; (2) the reachability distance between cases p_1 and p_2, given by the maximum of the core distance of p_1 and the distance between both cases; and (3) the local reachability density of a point, which is inversely proportional to the average reachability distance of its k neighbours. The local outlier factor (LOF) of a case is calculated as a function of its local reachability density.
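To make the pipeline concrete, the following is a minimal end-to-end sketch of Algorithm 1 in R, reusing the expected.utility() function from the earlier sketch. It assumes HistD is a data frame whose predictor columns are accompanied by observed cost and payoff columns named C and B, uses regression trees (rpart) as the RegressionTool, and uses DMwR's outliers.ranking() (an implementation of ORh) as the probability estimator. As we recall, that function exposes [0, 1] scores in a prob.outliers component, but this, like the column names, should be treated as an assumption of the sketch rather than a given.

```r
library(DMwR)    # outliers.ranking() (the ORh method) and lofactor() (LOF)
library(rpart)   # regression trees for the cost/benefit models

# Sketch of Algorithm 1. histD: data frame with predictor columns plus
# observed inspection cost C and payoff B; inspCand: same predictors only.
uor <- function(histD, inspCand, u = identity) {
  preds <- setdiff(names(histD), c("C", "B"))
  # Step 1: train the cost and benefit regression models
  Cmodel <- rpart(C ~ ., data = histD[, c(preds, "C")])
  Bmodel <- rpart(B ~ ., data = histD[, c(preds, "B")])
  # Step 2: outlier scores in [0, 1] for the candidates (ORh); we score
  # historical and candidate cases together, keeping the candidates' scores
  or <- outliers.ranking(rbind(histD[, preds], inspCand[, preds]))
  P <- tail(or$prob.outliers, nrow(inspCand))
  # Step 3: expected utility of inspecting each candidate (Equation 1)
  EU <- expected.utility(P, predict(Bmodel, inspCand),
                         predict(Cmodel, inspCand), u)
  # Step 4: return the candidates ranked by decreasing expected utility
  inspCand[order(EU, decreasing = TRUE), ]
}

# LOF alternative for Step 2: LOF scores are unbounded above, so we
# squash them into [0, 1] with a logistic soft-max of our own choosing
# (the exact scaling function used by the authors is not spelled out).
lof.probs <- function(data, k = 10) {
  lof <- lofactor(data, k = k)          # one LOF score per observation
  1 / (1 + exp(-(lof - mean(lof)) / sd(lof)))
}
```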
In our experiments with LOF we have used an R implementation of this method, available in the package DMwR [Torgo, 2010]. Please note that the scores produced by LOF are not in the interval [0, 1], so we have used a soft-max scaling function to cast these values into this interval. We should remark that neither the scores obtained by ORh nor those obtained by LOF can be considered proper probabilities. Still, they lead to values in a [0, 1] interval and we can look at them as badly calibrated probabilities of being an outlier. We should expect better results with more reliable estimates of these probabilities.

4 Experimental Evaluation

4.1 Artificially Generated Data

In this section we present a set of experiments [1] with artificially created data using an implementation of Algorithm 1. The goal of these experiments is to test the validity of some of our assumptions and also to observe the performance of the proposal in a controlled experiment. We have generated a data set with the following properties. For easier visualisation we have described each case by two variables, x_1 and x_2. Two well-separated clusters of data points were generated, each containing some local outliers, though in one of them these outliers are more marked. The idea here is that any outlier ranking method would easily be able to rank these outliers in top positions, but the ones that are more marked should appear in higher ranking positions. For all data points we have artificially generated costs and benefits of their inspection. We have done so in a way that the cluster containing the less evident outliers brings a higher utility, by having significantly higher payoffs for fraudulent cases, though with a higher inspection cost.

[1] All code and data available at pt/ ltorgo/ijcai
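The exact data set is available from the authors (see the footnote above). As a rough stand-in, the following R snippet generates data with the same qualitative structure; every numeric parameter is our own guess, and the zeroing of payoffs near the cluster centres (described next) is omitted for brevity.

```r
set.seed(1)  # all numeric parameters below are illustrative guesses

# n core points tightly around a centre, plus n.out more dispersed
# (local outlier) points; out.spread controls how marked the outliers are
gen.cluster <- function(n, n.out, centre, spread, out.spread) {
  s <- c(rep(spread, n), rep(out.spread, n.out))
  data.frame(x1 = rnorm(n + n.out, centre[1], s),
             x2 = rnorm(n + n.out, centre[2], s))
}

c1 <- gen.cluster(1000, 100, c(50, 50), 5, 40)      # marked outliers
c2 <- gen.cluster(1000, 100, c(1000, 1000), 5, 15)  # less marked outliers
dat <- rbind(c1, c2)                                # 2200 points in total
dat$C <- rep(c(10, 120), each = 1100)   # low vs. high inspection costs
dat$B <- rep(c(50, 1500), each = 1100)  # low vs. high payoffs
```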
Figure 1 shows the resulting data, formed by 2200 randomly generated data points following the general guidelines just described.

[Figure 1: The Artificial Data Set - two clusters in the (x_1, x_2) plane: Cluster 1 with marked outliers, low inspection costs and low rewards; Cluster 2 with less marked outliers, high inspection costs and high rewards.]

The payoffs were set to zero for the cases that we have considered non-fraudulent. We have chosen these cases to be the ones near the centre of the two clusters; more specifically, the cases for which (0 < x_1 < 100 ∧ 0 < x_2 < 100) ∨ (900 < x_1 < 1100 ∧ 900 < x_2 < 1100) is true. Still, all cases, even non-fraudulent ones, have an inspection cost. We have randomly selected 20% of these data to be the set for which we want to obtain an inspection priority ranking (i.e. a kind of test set). For this test set the information on the cost, payoff and utility was hidden from our algorithm. They are to be estimated by the regression models included in our methodology (lines 8-9 in Algorithm 1).

Looking at the characteristics of the above artificial problem, we would want our algorithm to include the outliers near Cluster 2 as the top priority cases for inspection, as these will bring the higher rewards, i.e. they are more useful in terms of optimising the inspection resources. Moreover, our hypothesis is that most standard outlier ranking methods will fail to achieve this goal and will put the outliers nearer Cluster 1 in top positions, as these deviate more obviously from the main (local) bulk of data points.

Although this is an artificially created data set we have tried to include properties of real world applications. In effect, if we think for instance of the scenario of detecting tax frauds, we can see that within this high impact application there are also marked clusters of tax payers with rather different characteristics (e.g. due to different economic activities). It is also easy to imagine that for these different groups of tax payers the inspection costs and eventual payoffs, if confirmed fraudulent, can be rather diverse. For instance, for low income professionals we could imagine a smaller inspection cost due to their simpler economic activity, and also a lower potential reward caused by the small monetary values involved in their activity (these are properties similar to points in Cluster 1). On the other hand, tax payers with a very high income (e.g. large companies) would probably involve a more complex (and thus costly) inspection phase, but if found fraudulent would also potentially bring a much higher payoff (the Cluster 2 situation). In summary, we think this artificial scenario is a simplified version of some highly relevant real world applications of fraud detection constrained by limited inspection resources.

We have applied an implementation of our algorithm to this problem. We have tested it using different setups regarding the learning components. Namely, we have used both a regression tree and a neural network as supervised regression learners, and both the ORh and LOF algorithms as outlier probability estimators. Figure 2 shows the results of two of the tested combinations: (1) regression trees with ORh and linear utility (u(w) = w); and (2) neural nets with LOF and also linear utility. We have also tested the methodology with other combinations of these components, like for instance a power utility function (u(w) = (w^(1−k) − 1) / (1 − k), with k = 0.2).
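For reference, the two utility functions just mentioned can be written in R as follows. The power utility is not real-valued for negative wealth, so the clipping below is our own workaround for the u(−Ĉ_i) branch of Equation 1, not something the text specifies.

```r
u.linear <- function(w) w   # linear utility: u(w) = w

# Power utility u(w) = (w^(1-k) - 1) / (1 - k), with k = 0.2 as in the
# text; w^(1-k) is undefined for negative wealth, hence the clipping
# at a small positive value (our own workaround)
u.power <- function(w, k = 0.2) (pmax(w, 1e-6)^(1 - k) - 1) / (1 - k)

u.power(100)   # (100^0.8 - 1) / 0.8, approximately 48.5
```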
The results with the other tested combinations were qualitatively similar to these, so we omit them for space reasons. Figure 2 includes information on the top 20 cases according to two different rankings. The first is obtained by looking only at the probabilities of being an outlier produced by the outlier ranking method (ORh in one case and LOF in the other). The second is the ranking obtained with our utility-based methodology.

[Figure 2: Results with the Artificial Data Set - for each setup (Reg.Trees+ORh+linear u and NNet+LOF+linear u), the top-20 cases of the plain outlier ranking and of the utility ranking, plotted in the (x_1, x_2) plane.]

As we can observe, in both setups the results provide clear evidence that our proposal is able to produce a ranking more adequate to the application goals, i.e. to better optimise the inspection resources. In effect, while most of the top positions of the standard outlier rankers belong to Cluster 1, our method ranks at top positions mostly outliers of Cluster 2. These are the cases that provide a larger payoff. We have confirmed this by calculating the total balance of both rankings if we inspect their respective top 20 suggestions. The scores are shown in Table 1.

[Table 1: Net Inspection Results - costs, benefits, net utility and % utility gain of the top-20 inspections under UOR(ORh) vs. P̂(ORh) for the Regr.Tree+ORh+linear u setup, and under UOR(LOF) vs. P̂(LOF) for the NNet+LOF+linear u setup.]

The results shown in Table 1 reveal an overwhelming difference in the net results of the inspection activity depending on whether we follow the advice of a standard outlier ranking method (P̂) or that of our proposed utility-based outlier ranker (UOR).

4.2 Foreign Trade Transactions

In this section we describe an application of our methodology to a real world domain. This domain consists of trying to detect errors and/or frauds in foreign trade transaction reports. The data was provided by a national institute of statistics and consists of information regarding these reports, which describe transactions of companies with foreign countries. Due to the impact that errors and/or frauds in this database can have on official statistics it is of high relevance to detect them. At the end of each month the institute faces the task of trying to detect these problems in the reported transactions, which cover
a wide range of rather diverse products. There are varying but limited human resources allocated to this task. In this context, it is of high importance to allocate these resources to the right reports. Given the diverse range of products, the impact of errors/frauds on official statistics is not uniform due to the amounts involved. This impact is a function of both the unit price of the product in question and the reported quantity involved in the transaction. Among the information in the transaction reports, domain experts recommend the use of the cost/weight ratio as the key variable for finding suspicious transactions. According to these experts, transactions of the same product should have a similar cost/weight ratio.

The available data set contains around … transaction reports concerning 8 months and covering a wide range of products. The information on past inspection activities on these transactions is rather limited. In effect, the sole information we have is a label on the transactions that were found to be errors. We have no information on transactions that were inspected but found correct, nor on the corrected values for the transactions labelled as errors. This limited information creates serious difficulties for applying our methodology, because we assume a relatively rich set of information on prior inspection activities. According to domain experts the inspection cost of each report is roughly constant and was set to 150 Euros (approximately 2 hours of work). This means that we do not need a prediction model to estimate this cost. Regarding the benefits, we do not have any information on past corrections, so we cannot build a training set to obtain a model for predicting benefits. However, we can try to estimate the benefit of inspecting a certain transaction report by confronting the values in the form with the expected values for transactions of the same product. The difference between the two can provide us with clues on the eventual corrections that are necessary to the report, and thus on the benefit of the correction in monetary terms. For instance, if we have a transaction of 1000 Kg of product A with a reported cost/weight ratio of 10, and we observe that the typical cost/weight of this product is 4, then if we decide to inspect this transaction and confirm it as an error the benefit can be approximated by (10 − 4) × 1000.

In summary, in the application of our method to this problem we have used a constant value of Ĉ_i = 150, and have estimated the benefit of a detection as B̂_i = |c/w_i − c/w̃| × w_i, where c/w̃ is the median cost/weight of the past transactions of the same product, and w_i and c/w_i are the weight and cost/weight reported in transaction i, respectively.

Our experimental analysis follows the procedure used at the institute of analysing each month's data independently. We also analyse the transactions of each product independently, given the rather diverse prices involved. Given the transactions of a product in a month, we calculate the expected utility of inspecting each of the transactions using our method, according to the procedure explained above concerning the estimates of costs and benefits. Using the estimated utilities for all transactions of all products in the current month we obtain a utility-based ranking of that month's transactions.
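A sketch of this benefit estimate in R, under hypothetical column names (product, weight and ratio for the reported cost/weight), reproduces the worked example above:

```r
# Estimated benefit of inspecting report i (the expression above):
#   Bhat_i = |c/w_i - median c/w of the same product| * w_i
est.benefit <- function(reports) {
  med <- ave(reports$ratio, reports$product, FUN = median)
  abs(reports$ratio - med) * reports$weight
}

reports <- data.frame(product = c("A", "A", "A", "B"),
                      weight  = c(1000, 500, 800, 50),
                      ratio   = c(10, 4, 4, 7))
reports$Bhat <- est.benefit(reports)  # first report: |10 - 4| * 1000 = 6000
reports$Chat <- 150                   # constant cost given by the experts
```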
This ranking of the transactions of a month obtained using our methodology was compared to the ranking of the same transactions obtained using solely the probabilities of being an outlier, i.e. taking into account neither benefits nor costs. The comparison was carried out for different inspection effort levels, namely inspecting the top 10%, 15%, …, 30% of the month's transactions according to each ranking method. So, for inspection effort x% we compare the top x% cases according to the two rankings. This is done by calculating the net utility of the respective x% set of cases for each of the rankings. The net utility is given by the sum of the benefits of the cases in the set that have the error label set (i.e. the ones the experts told us are errors), subtracted by the total inspection cost (i.e. 150 × |set|). Comparing the net utility values of each of the two rankings we obtain the % gain in utility of our proposal over the simple probability-of-outlier rankings. This comparison was carried out twice, once for each of the two methods that we considered to estimate these probabilities: LOF and ORh.
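A sketch of this evaluation in R, again under assumed column names (Bhat for the estimated benefit and a logical error column for the expert labels):

```r
# Net utility of inspecting the top x% of a ranking: sum of the benefits
# of the cases labelled as errors minus 150 Euros per inspected case.
# ranked is a data frame already sorted by the ranking under evaluation.
net.utility <- function(ranked, x) {
  top <- head(ranked, ceiling(x * nrow(ranked)))
  sum(top$Bhat[top$error]) - 150 * nrow(top)
}

# % utility gain of the utility-based ranking (uor) over the plain
# probability-of-outlier ranking (por) at inspection effort x
util.gain <- function(uor, por, x)
  100 * (net.utility(uor, x) - net.utility(por, x)) /
    abs(net.utility(por, x))
```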
[Figure 3: Results with the Foreign Trade Transactions Data Set - log(% utility gain) over the probability rankings (LOF on one graph, ORh on the other) at inspection efforts of 10% to 30%, with one line per month (Months 1-8).]

Figure 3 presents the results of these experiments. The lines on each graph represent the % utility gain of our method over the respective outlier probability ranking (results are shown on a log scale for better visualisation), for the different inspection effort levels. Once again we have observed a clear superiority of our rankings in terms of the net results of the inspection activities. In effect, in all setups the % utility gain is positive. We have also observed that our advantage tends to decrease for larger inspection efforts, which is to be expected. Moreover, the advantage also holds across the two alternative outlier probability estimators.

5 Conclusions

We have presented a new methodology for supporting fraud detection activities. The main distinguishing feature of this method is its focus on trying to optimise the available inspection resources. This is achieved by producing an inspection ranking ordered by decreasing expected utility of the posterior inspection. We use the theoretical framework provided by utility theory to integrate several learning steps that obtain probabilities of being an outlier and also forecast the expected inspection costs and resulting payoffs. We have presented a series of experimental tests of our proposal with both artificially created data and a real world fraud detection application. These experiments provide clear evidence of the advantages of our proposal in terms of the optimisation of the available inspection resources. Further work should focus on investigating the contributions of the different steps of our methodology, as well as on the exploration of different variants of its components.

Acknowledgments

This work is financially supported by FEDER funds through the COMPETE program and by FCT national funds in the context of the project PTDC/EIA/68322/2006.

References

[Bolton and Hand, 2001] R. J. Bolton and D. J. Hand. Unsupervised profiling methods for fraud detection. In Credit Scoring and Credit Control VII, 2001.

[Breunig et al., 2000] M. Breunig, H. Kriegel, R. Ng, and J. Sander. LOF: identifying density-based local outliers. In ACM Int. Conf. on Management of Data, pages 93-104, 2000.

[Fawcett and Provost, 1997] Tom Fawcett and Foster Provost. Adaptive fraud detection. Data Mining and Knowledge Discovery, 1(3):291-316, 1997.

[Friedman and Sandow, 2011] C. Friedman and S. Sandow. Utility-based Learning from Data. CRC Press, 2011.

[Ghosh and Reilly, 1994] S. Ghosh and D. Reilly. Credit card fraud detection with a neural-network. In Proceedings of the 27th Annual Hawaii International Conference on System Science, volume 3, Los Alamitos, CA, 1994.

[Hand, 2002] Richard J. Bolton and David J. Hand. Statistical fraud detection: A review. Statistical Science, 17(3), 2002.

[Nigam et al., 2000] K. Nigam, A. McCallum, S. Thrun, and T. Mitchell. Text classification from labeled and unlabeled documents using EM. Machine Learning, 39:103-134, 2000.

[Phua et al., 2005] C. Phua, V. Lee, K. Smith, and R. Gayler. A comprehensive survey of data mining-based fraud detection research. Artificial Intelligence Review, (submitted), 2005.

[Torgo, 2010] L. Torgo. Data Mining with R: Learning with Case Studies. CRC Press, 2010.

[von Neumann and Morgenstern, 1944] J. von Neumann and O. Morgenstern. Theory of Games and Economic Behavior. Princeton University Press, 1944.
