Combining Data Mining and Machine Learning for Effective Fraud Detection*
From: AAAI Technical Report WS. Compilation copyright 1997, AAAI. All rights reserved.

Tom Fawcett
NYNEX Science and Technology
400 Westchester Avenue
White Plains, New York
fawcett@nynexst.com

Foster Provost
NYNEX Science and Technology
400 Westchester Avenue
White Plains, New York
foster@nynexst.com

Abstract

This paper describes the automatic design of methods for detecting fraudulent behavior. Much of the design is accomplished using a series of machine learning methods. In particular, we combine data mining and constructive induction with more standard machine learning techniques to design methods for detecting fraudulent usage of cellular telephones based on profiling customer behavior. Specifically, we use a rule-learning program to uncover indicators of fraudulent behavior from a large database of cellular calls. These indicators are used to create profilers, which then serve as features to a system that combines evidence from multiple profilers to generate high-confidence alarms. Experiments indicate that this automatic approach performs nearly as well as the best hand-tuned methods for detecting fraud.

Introduction

In the United States, cellular fraud costs the telecommunications industry hundreds of millions of dollars per year (Walters & Wilkinson 1994). A specific kind of cellular fraud called cloning is particularly expensive and epidemic in major cities throughout the United States. Existing methods for detecting cloning fraud are ad hoc, and their evaluation is virtually nonexistent. We have embarked on a program of systematic analysis of cellular call data for the purpose of designing and evaluating methods for detecting fraudulent behavior.

This paper presents a framework for automatically generating fraud detectors. The framework has several components and uses data at two levels of aggregation.
Massive numbers of cellular calls are first analyzed to determine general patterns of fraudulent usage. These patterns are then used to profile each individual customer's usage on an account-day basis. The profiles determine when a customer's behavior has become uncharacteristic in a way that suggests fraud.

Our framework includes a data mining component for discovering indicators of fraud. A constructive induction component generates profiling detectors that use the discovered indicators. A final evidence-combining component determines how to combine signals from the profiling detectors to generate alarms. The rest of this paper describes the domain, the framework and the implemented system, the data, and results.

* This paper has also been published in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, edited by Simoudis, Han and Fayyad. Menlo Park, CA, pp. AAAI Press. An extended, updated version is available (Fawcett & Provost 1997).

Cellular Cloning Fraud and its Detection

Every cellular phone periodically transmits two unique identification numbers: its Mobile Identification Number (MIN) and its Electronic Serial Number (ESN). These two numbers are broadcast unencrypted over the airwaves, and can be received, decoded and stored using special equipment that is relatively inexpensive.

Cloning occurs when a customer's MIN and ESN are programmed into a cellular telephone not belonging to the customer. When this telephone is used, the network sees the customer's MIN and ESN and subsequently bills the usage to the customer. With the stolen MIN and ESN, a cloned phone user (whom we shall call a bandit) can make virtually unlimited calls, whose charges are billed to the customer.¹ If the fraudulent usage goes undetected, the customer's next bill will include the corresponding charges. Typically, the customer then calls the cellular service provider (the carrier) and denies the usage.
The carrier and customer then determine which calls were made by the "bandit" and which were legitimate calls. The fraudulent charges are credited to the customer's account, and measures are taken to prohibit further fraudulent charges, usually by assigning the customer a (new) Personal Identification Number.

Fraud causes considerable inconvenience both to the carrier and to the customer. Fraudulent usage also incurs significant financial losses due to costs of land-line usage (most cellular calls are to non-cellular destinations), costs of congestion in the cellular system, loss of revenue by the crediting process, and costs paid to other cellular companies when a customer's MIN and ESN are used outside the carrier's home territory. Cellular carriers therefore have a strong interest in detecting cloning fraud as soon as possible.

¹ According to the Cellular Telecommunications Industry Association, MIN-ESN pairs are sold on the streets of major US cities for between $5 and $50 apiece.

Standard methods of fraud detection include analyzing call data for overlapping calls (collisions), or for calls in temporal proximity that could not have been placed by the same user due to geographic dispersion (velocity checks) (Davis & Goyal 1993). More sophisticated methods involve profiling user behavior and looking for significant deviations from normal patterns. This paper addresses the automatic design of such methods.

One approach to detecting fraud automatically is to learn a classifier for individual calls. We have not had success using standard machine learning techniques to construct such a classifier. Context is very important: a call that would be unusual for one customer would be typical for another. Furthermore, legitimate subscribers occasionally make isolated calls that look suspicious, so in general, decisions of fraud should not be made on the basis of individual calls.

To detect fraud reliably, it is necessary to determine the normal behavior of each account with respect to certain indicators, and to determine when that behavior has deviated significantly. Three issues arise:

1. Which call features are important? Which features or combinations of features are useful for distinguishing legitimate behavior from fraudulent behavior?
2. How should profiles be created? Given an important feature identified in Step 1, how should we characterize the behavior of a subscriber with respect to the feature?
3. When should alarms be issued? Given a set of profiling criteria identified in Step 2, how should we combine them to determine when fraud has occurred?
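The two standard checks mentioned in this section, collisions and velocity checks, can be sketched as pairwise tests over one account's calls. This is a minimal sketch, not the procedure of Davis & Goyal (1993): the `Call` record, the flat kilometre coordinates for cell sites, and the 800 km/h speed limit are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Call:
    # Hypothetical call record for one MIN/ESN pair: start and end times in
    # minutes, and the (x, y) position of the originating cell site in km.
    start: float
    end: float
    x: float
    y: float


def collides(a: Call, b: Call) -> bool:
    """Collision: two calls on the same MIN/ESN overlap in time."""
    return a.start < b.end and b.start < a.end


def velocity_violation(a: Call, b: Call, max_kmh: float = 800.0) -> bool:
    """Velocity check: the later call starts too soon after the earlier one
    ends for a single user to have travelled between the two cell sites."""
    first, second = sorted((a, b), key=lambda c: c.start)
    distance_km = math.hypot(second.x - first.x, second.y - first.y)
    gap_hours = (second.start - first.end) / 60.0
    if gap_hours <= 0:
        # No time to travel at all: any change of location is impossible.
        return distance_km > 0
    return distance_km / gap_hours > max_kmh


def count_alarms(calls: list[Call], max_kmh: float = 800.0) -> tuple[int, int]:
    """Return (collisions, velocity violations) over all pairs of calls."""
    collisions = velocities = 0
    for a, b in combinations(calls, 2):
        if collides(a, b):
            collisions += 1
        elif velocity_violation(a, b, max_kmh):
            velocities += 1
    return collisions, velocities


calls = [
    Call(start=600, end=610, x=0, y=0),    # 10:00-10:10 at one site
    Call(start=605, end=615, x=1, y=0),    # overlaps the first call
    Call(start=640, end=645, x=500, y=0),  # 500 km away, about half an hour later
]
print(count_alarms(calls))  # (1, 2)
```

Counting the two alarm types per account mirrors the "Collisions and Velocities" baseline evaluated later, for which DC-1 learns a threshold on the number of such alarms needed to raise a fraud alarm.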
Our goal is to automate the design of user-profiling systems. Each of these three issues corresponds to a component of our framework.

The Framework and the DC-1 System

Our system framework is illustrated in Figure 1. The framework uses data mining to discover indicators of fraudulent behavior, and then builds modules to profile each user's behavior with respect to these indicators. The profilers capture the typical behavior of an account and, in use, describe how far an account is from this typical behavior. The profilers are combined into a single detector, which learns how to detect fraud effectively based on the profiler outputs. When the detector has enough evidence of fraudulent activity on an account, based on the indications of the profilers, it generates an alarm.

[Figure 1: A framework for automatically constructing fraud detectors. Call data feed a Data Mining stage that produces rules; Profiler Construction combines the rules with profiler templates to produce profilers; Weight Training learns how to combine the profilers.]

Figure 1 depicts the automatic generation of a fraud detector from a set of data on fraudulent and legitimate calls. The system takes as input a set of call data, which are chronological records of the calls made by each subscriber, organized by account. The call data describe individual calls using features such as TIME-OF-DAY, DURATION and CELL-SITE. The constructor also takes as input a set of profiler templates, which are the basis for the construction of the individual profilers.

Mining the Call Data

The first stage of detector construction, data mining, involves combing through the call data searching for indicators of fraud. In the DC-1 system, the indicators are conjunctive rules discovered by a standard rule-learning program. We use the RL program (Clearwater & Provost 1990), which is similar to other Meta-DENDRAL-style rule learners (Buchanan & Mitchell 1978; Segal & Etzioni 1994). RL searches for rules with certainty factors above a user-defined threshold.
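RL itself is not reproduced here. The following is a rough, brute-force sketch of the kind of per-account search this stage performs: it enumerates conjunctions of up to two attribute-value conditions that occur in an account's fraudulent calls, and keeps those whose certainty factor reaches a user-defined threshold. The certainty factor is assumed to be the Laplace estimate (p + 1) / (n + 2), where n calls match the rule and p of them are fraudulent; this is one common small-sample correction, chosen for illustration and not necessarily the one RL uses. The data layout, threshold, and minimum support are hypothetical.

```python
from itertools import combinations

# Hypothetical representations: a call is a dict of attribute -> value, and a
# rule is a frozenset of (attribute, value) conditions that must all hold.
Call = dict[str, str]
Rule = frozenset[tuple[str, str]]


def matches(rule: Rule, call: Call) -> bool:
    """A call satisfies a conjunctive rule if every condition holds."""
    return all(call.get(attr) == value for attr, value in rule)


def certainty_factor(rule: Rule, calls: list[Call], labels: list[bool]) -> float:
    """Laplace-corrected fraction of matching calls that are fraudulent
    (an assumed small-sample correction)."""
    matched = [is_fraud for call, is_fraud in zip(calls, labels) if matches(rule, call)]
    return (sum(matched) + 1) / (len(matched) + 2)


def mine_account(calls: list[Call], labels: list[bool], threshold: float = 0.75,
                 max_conditions: int = 2, min_support: int = 2) -> dict[Rule, float]:
    """Enumerate small conjunctions seen in this account's fraudulent calls and
    keep those whose certainty factor reaches the threshold."""
    candidates: set[Rule] = set()
    for call, is_fraud in zip(calls, labels):
        if is_fraud:
            items = sorted(call.items())
            for k in range(1, max_conditions + 1):
                candidates.update(frozenset(c) for c in combinations(items, k))
    rules = {}
    for rule in candidates:
        support = sum(matches(rule, call) for call in calls)
        cf = certainty_factor(rule, calls, labels)
        if support >= min_support and cf >= threshold:
            rules[rule] = cf
    return rules


night_bronx = {"TIME-OF-DAY": "NIGHT", "LOCATION": "BRONX"}
calls = [night_bronx, night_bronx, night_bronx,
         {"TIME-OF-DAY": "DAY", "LOCATION": "SCARSDALE"},
         {"TIME-OF-DAY": "NIGHT", "LOCATION": "SCARSDALE"},
         {"TIME-OF-DAY": "DAY", "LOCATION": "BRONX"}]
labels = [True, True, True, False, False, False]
print(mine_account(calls, labels))
```

On this toy account only the Bronx-at-night conjunction survives (certainty factor 4/5 = 0.8): each of its single conditions also matches one legitimate call, which pulls its estimate down to 4/6.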
The certainty factor we used for these runs was a simple frequency-based probability estimate, corrected for small samples (Quinlan 1987). The call data are organized by account, and each call record is labeled as fraudulent or legitimate. When RL is applied to an account's calls, it produces a set of rules that serve to distinguish, within that account, the fraudulent calls from the legitimate calls. As an example, the following rule would be a relatively good indicator of fraud:
(TIME-OF-DAY = NIGHT) AND (LOCATION = BRONX) ==> FRAUD
Certainty factor = 0.89

This rule denotes that a call placed at night from The Bronx (a borough of New York City) is likely to be fraudulent. A certainty factor of 0.89 means that, for this account, a call matching this rule has an 89% probability of being fraudulent.

Each account generates a set of such rules. Each rule is recorded along with the account from which it was generated. After all accounts have been processed, a rule selection step is performed, the purpose of which is to derive a general covering set of rules that will serve as fraud indicators. The set of accounts is traversed again. For each account, the list of rules generated by that account is sorted by the frequency of occurrence in the entire account set, and the highest-frequency unchosen rule is selected. If an account has already been covered by four chosen rules, it is skipped. The resulting set of rules is used in profiler construction.

Constructing Profilers

The second stage of detector construction, profiler construction, generates a set of profilers from the discovered fraud rules. The profiler constructor is given a set of rules and a set of templates, and generates a profiler from each rule-template pair by instantiating the template with the rule's conditions. Every profiler has a Training step, in which it is trained on typical (non-fraud) account activity, and a Use step, in which it describes how far from the typical behavior a current account-day is. For example, a simple profiler template would be:

Given: Rule conditions from a fraud rule.
Training: On a daily basis, count the number of calls that satisfy the rule conditions. Keep track of the maximum as daily-threshold.
Use: Given an account-day, output 1 if the number of calls in the day that satisfy the rule conditions exceeds daily-threshold, else output 0.

Assume the Bronx-at-night rule mentioned earlier was used with this template.
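The rule-selection step and the simple profiler template above can be sketched together. This is one reading of the description, not DC-1's code: it assumes an account counts as "covered" by a chosen rule when that account generated the rule, and that each account contributes at most one rule during the traversal. The names `select_rules` and `ThresholdingProfiler`, and the dict/frozenset representations of calls and rules, are hypothetical.

```python
from collections import Counter


def matches(rule, call) -> bool:
    """A call (dict of attribute -> value) satisfies a rule (frozenset of
    (attribute, value) pairs) if every condition holds."""
    return all(call.get(attr) == value for attr, value in rule)


def select_rules(rules_by_account: dict, max_cover: int = 4) -> list:
    """Covering-style rule selection over a second pass through the accounts."""
    # Frequency of a rule = number of accounts that generated it.
    frequency = Counter(r for rules in rules_by_account.values() for r in set(rules))
    chosen = []
    for rules in rules_by_account.values():
        # Assumption: an account is covered by a chosen rule it generated.
        if sum(1 for r in chosen if r in rules) >= max_cover:
            continue  # already covered by enough chosen rules
        unchosen = [r for r in rules if r not in chosen]
        if unchosen:
            # max() returns the first rule among ties in frequency.
            chosen.append(max(unchosen, key=lambda r: frequency[r]))
    return chosen


class ThresholdingProfiler:
    """The 'daily maximum' template instantiated with one rule."""

    def __init__(self, rule):
        self.rule = rule
        self.daily_threshold = 0

    def _count(self, day_calls) -> int:
        return sum(1 for call in day_calls if matches(self.rule, call))

    def train(self, fraud_free_days) -> None:
        # Training: the maximum daily number of calls satisfying the rule.
        self.daily_threshold = max((self._count(d) for d in fraud_free_days), default=0)

    def use(self, account_day) -> int:
        # Use: 1 if today's count exceeds daily-threshold, else 0.
        return int(self._count(account_day) > self.daily_threshold)


bronx_night = frozenset({("TIME-OF-DAY", "NIGHT"), ("LOCATION", "BRONX")})
night_call = {"TIME-OF-DAY": "NIGHT", "LOCATION": "BRONX"}
profiler = ThresholdingProfiler(bronx_night)
profiler.train([[night_call], [], [{"TIME-OF-DAY": "DAY", "LOCATION": "BRONX"}]])
print(profiler.daily_threshold, profiler.use([night_call, night_call]))  # 1 1
```

Here the three fraud-free training days contain at most one Bronx-at-night call, so daily-threshold becomes 1 and a day with two such calls yields 1.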
The resulting instantiated profiler would determine, for a given account, the maximum number of calls made from The Bronx at night in any 24-hour period. In use, this profiler would emit a 1 whenever an account-day exceeded this threshold.

Different kinds of profilers are possible. A thresholding profiler yields a binary feature corresponding to whether the user's behavior was above threshold for the given day. A counting profiler yields a feature corresponding to its count (e.g., the number of calls from BRONX at NIGHT). A percentage profiler yields a feature whose value is between zero and one hundred, representing the percentage of calls in the account-day that satisfy the conditions. Each type of profiler is produced by a different type of profiling template.

Combining Evidence from the Profilers

The third stage of detector construction learns how to combine evidence from the set of profilers generated by the previous stage. For this stage, the outputs of the profilers are used as features to a standard machine learning program. Training is done on account data, and profilers evaluate a complete account-day at a time. In training, the profilers' outputs are presented along with the desired output (the account-day's classification). The evidence-combining stage learns which combinations of profiler outputs indicate fraud with high confidence.

Many training methods for evidence combining are possible. After experimenting with several methods, we chose a simple Linear Threshold Unit (LTU) for our experiments. An LTU is simple and fast, and enables a good first-order judgment of the features' worth.

A feature selection process is used to reduce the number of profilers in the final detector. Some of the rules do not perform well when used in profilers, and some profilers overlap in their fraud detection coverage. We therefore employ a sequential forward selection process (Kittler 1986), which chooses a small set of useful profilers.
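The evidence-combining stage and the feature-selection loop can be sketched together. The paper names the learner (a Linear Threshold Unit) and the selection method (sequential forward selection) but not the LTU's training rule, so this sketch assumes the classic perceptron rule and scores each candidate subset of profilers by training-set accuracy. The function names and the toy profiler outputs are hypothetical.

```python
import random


def ltu_predict(model, x):
    """Output 1 (fraud) if the weighted sum of profiler outputs plus bias exceeds 0."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


def train_ltu(X, y, epochs=50, seed=0):
    """Train a linear threshold unit with the perceptron rule (an assumed
    training method; the paper does not specify one)."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = y[i] - ltu_predict((w, b), X[i])  # -1, 0 or +1
            if err:
                w = [wj + err * xj for wj, xj in zip(w, X[i])]
                b += err
    return w, b


def accuracy(model, X, y):
    return sum(ltu_predict(model, x) == t for x, t in zip(X, y)) / len(y)


def forward_select(X, y, max_features=9):
    """Sequential forward selection: repeatedly add the profiler (column of X)
    whose inclusion gives the most accurate LTU; stop when nothing helps."""
    selected, best_acc = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scores = []
        for j in remaining:
            cols = selected + [j]
            Xs = [[row[c] for c in cols] for row in X]
            scores.append((accuracy(train_ltu(Xs, y), Xs, y), j))
        acc, j = max(scores)
        if acc <= best_acc:
            break  # no remaining profiler improves accuracy
        selected.append(j)
        remaining.remove(j)
        best_acc = acc
    return selected, best_acc


# Rows are account-days; columns are outputs of three hypothetical profilers.
X = [[1, 0, 0], [1, 0, 1], [0, 0, 1], [0, 0, 0]]
y = [1, 1, 0, 0]  # 1 = fraud account-day
print(forward_select(X, y))  # ([0], 1.0)
```

Only the first profiler separates the fraud days, so selection stops after one feature. Real counting and percentage profilers produce values on different scales, so normalizing them before weighting (the "value normalization" step shown in Figure 2) would matter in practice; this sketch omits that step.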
Empirically, this feature selection simplifies the final detector and increases its accuracy.

The Detector

The final output of the constructor is a detector that profiles each user's behavior based on several indicators, and produces an alarm if there is sufficient evidence of fraudulent activity. Figure 2 shows an example of a simple detector evaluating an account-day.

[Figure 2: A DC-1 fraud detector processing a single account-day of data. In the example, seven Tuesday calls (with time, duration, origin and destination) pass through the profilers and an evidence-combining stage (value normalization and weighting), which decides whether to issue a FRAUD ALARM.]

Before being used on an account, the profilers undergo a profiling period (usually 30 days) during which they measure unfrauded usage. In our study, these initial 30 account-days were guaranteed free of fraud, but were not otherwise guaranteed to be typical. From this initial profiling period, each profiler measures a characteristic level of activity.

The Data

The call data used for this study are records of cellular calls placed over four months by users in the New York City area, an area with very high levels of fraud. The calls are labeled as legitimate or fraudulent by cross-referencing a database of all calls that were credited as being fraudulent for the same time period. Each call is described by 31 attributes, such as the phone number of the caller, the duration of the call, the geographical origin and destination of the call, and any long-distance carrier used.

The call data were separated carefully into several partitions for data mining, profiler training and testing. Data mining used 610 accounts comprising approximately 350,000 calls. Once the profilers are generated, the system transforms the raw call data into a series of account-days using the outputs of the profilers as features. Data for the profilers were drawn from a remaining pool of about 2500 accounts. We used randomly selected sets of 5000 account-days for training, and another set of 5000 account-days (drawn from separate accounts) for testing. Each account-day set was chosen to comprise 20% fraud and 80% non-fraud days. An account-day was classified as fraud if five or more minutes of fraudulent usage occurred; days including only one to four minutes of fraudulent usage were discarded.

Results

Data mining generated 3630 rules, each of which applied to two or more accounts. The rule selection process, in which rules are chosen in order of maximum account coverage, yielded a smaller set of 99 rules sufficient to cover the accounts. Each of the 99 rules was used to instantiate two profiler templates, yielding 198 profilers. The final feature selection step reduced this to nine profilers, with which the experiments were performed.

Each detector was run ten times on randomly selected training and testing accounts. Accuracy averages and standard deviations are shown in the first numeric column (Accuracy) of Table 1. For comparison, we evaluated DC-1 along with other detection strategies:

"Alarm on All" represents the policy of alarming on every account every day.

"Alarm on None" represents the policy of allowing fraud to go completely unchecked. This corresponds to the maximum likelihood classification.

"Collisions and Velocities" is a detector using the two common methods for detecting cloning fraud mentioned earlier. DC-1 was used to learn a threshold on the number of collision and velocity alarms necessary to generate a fraud alarm.

The "High Usage" detector generates an alarm on any day in which airtime usage exceeds a threshold. The threshold was found empirically from training data.

The best individual DC-1 profiler was used as an isolated detector. This experiment was done to determine the additional benefit of combining profilers. The best individual profiler was generated from the rule:

(TIME-OF-DAY = EVENING) ==> FRAUD

Data mining had discovered (in 119 accounts) that the sudden appearance of evening calls, in accounts that did not normally make them, was coincident with cloning fraud. The relatively high accuracy of this one profiler reveals that this is a valuable fraud indicator.

The DC-1 detector incorporates all the profilers chosen by feature selection. We used the weight learning method described earlier to determine the weights for evidence combining.

The SOTA ("State Of The Art") detector incorporates seven hand-crafted profiling methods that were the best individual detectors identified in a previous study. Each method profiles an account in a different way and produces a separate alarm. Weights for combining SOTA's alarms were determined by our weight-tuning algorithm.

In this domain, different types of errors have different costs, and a realistic evaluation must take these costs into account. A false positive error (a false alarm) corresponds to wrongly deciding that a customer has been cloned. Based on the cost of a fraud analyst's time, we estimate the cost of a false positive error to be about $5. A false negative error corresponds to letting a frauded account-day go undetected.
Rather than using a uniform cost for all false negatives, we estimated a false negative to cost $0.40 per minute of fraudulent airtime used on that account-day. This figure is based on the proportion of usage in local and non-local ("roaming") markets, and their corresponding costs.²

² We have still glossed over some complexity. For a given account, the only false negative fraud days that incur cost to the company are those prior to the first true positive alarm. After the fraud is detected, it is terminated. Thus, our analysis overestimates the costs slightly; a more thorough analysis would eliminate such days from the computation.

Table 1: A comparison of accuracies and costs of various detectors.

Detector | Accuracy (%) | Cost ($US) | Accuracy at cost (%)
Alarm on All | | |
Alarm on None | | |
Collisions + Velocities | | |
High Usage | 87 ± | |
Best individual DC-1 profiler | 88 ± | |
DC-1 detector | | |
State of the Art (SOTA) | | |

Because LTU training methods try to minimize errors but not error costs, we employed a second step in training. After training, the LTU's threshold is adjusted to yield minimum error cost on the training set. This adjustment is done by moving the decision threshold from -1 to +1 in increments of 0.01 and computing the resulting error cost. After the minimum cost on training data is found, the threshold is clamped and the testing data are evaluated. The second numeric column of Table 1 (Cost) shows the mean and standard deviation of test-set costs. The third, "Accuracy at cost," is the corresponding classification accuracy of the detector when the threshold is set to yield lowest-cost classifications.

Discussion

The results in Table 1 demonstrate that DC-1 performs quite well. Though there is room for improvement, the DC-1 detector performs better than all but the hand-coded SOTA detector.

It is surprising that Collisions and Velocity Checks, commonly thought to be reliable indicators of cloning, performed poorly in our experiments. Preliminary analysis suggests that call collisions and velocity alarms may be more common among legitimate calls in our region than is generally believed.

In our experiments, lowest-cost classification occurred at an accuracy somewhat lower than optimal. In other words, some classification accuracy could be sacrificed to decrease cost. More sophisticated methods could be used to produce cost-sensitive classifiers, which would probably produce better results.

Related Work

Yuhas (1993) and Ezawa and Norton (1995) address the problem of uncollectible debt in telecommunications services. However, neither work deals with characterizing typical customer behavior, so mining the data to derive profiling features is not necessary. Ezawa and Norton's method of evidence combining is much more sophisticated than ours and faces some of the same problems (unequal error costs, skewed class distributions).

Methods that deal with time series are relevant to our work. However, time series analysis (Chatfield 1984; Farnum & Stanton 1989) strives to characterize an entire time series or to forecast future events in the series. Neither ability is directly useful to fraud detection. Hidden Markov Models (Rabiner & Juang 1986) are concerned with distinguishing recurring sequences of states and the transitions between them. However, fraud detection usually deals with only two states (the "frauded" and "un-frauded" states) with a single transition between them. It may be useful to recognize recurring un-frauded states of an account, but this ability is likely peripheral to the detection task.

Conclusions and Future Work

The detection of cellular cloning fraud is a relatively young field. Fraud behavior changes frequently as bandits adapt to detection techniques, so a fraud detection system should be adaptive as well. However, in order to build usage profilers, we must know which aspects of customers' behavior to profile. Historically, determining such aspects has involved a good deal of manual work: hypothesizing useful features, building profilers and testing them. Determining how to combine them involves much trial and error as well. Our framework automates this process. Results show that the DC-1 detector performs better than the high-usage alarm and the collision/velocity alarm.
Even with relatively simple components, DC-1 is able to exploit mined data to produce a detector whose performance approaches that of the state of the art. The SOTA system took several person-months to build; the DC-1 detector took several CPU-hours. Furthermore, DC-1 can be retrained at any time as necessitated by the changing environment.

We believe our framework will be useful in other domains in which typical behavior is to be distinguished from unusual behavior. Prime candidates are similar domains involving fraud, such as credit-card fraud and toll fraud. In credit-card fraud, data mining may identify locations that arise as new hotbeds of fraud. The constructor would then incorporate profilers that notice if a customer begins to charge more than usual from that location.

The DC-1 system is an initial prototype. Further work will develop two aspects of DC-1 in preparation for its deployment. First, we intend to expand the data mining step, particularly to exploit available background knowledge. We believe that there is a good deal of relevant background knowledge (for example, hierarchical geographical knowledge) that can augment the current calling data. Along with this, we hope to be able to characterize and describe the knowledge discovered in our system. Second, we hope to improve the method of combining profilers. We chose an LTU initially because it is simple and fast. A neural network could probably attain higher accuracy for DC-1, possibly matching that of SOTA.

Acknowledgements

We would like to thank Nicholas Arcuri and the Fraud Control department at Bell Atlantic NYNEX Mobile for many useful discussions about cellular fraud and its detection.

References

Buchanan, B. G., and Mitchell, T. M. 1978. Model-directed learning of production rules. In Hayes-Roth, F., ed., Pattern-directed inference systems. New York: Academic Press.

Chatfield, C. 1984. The analysis of time series: An introduction (third edition). New York: Chapman and Hall.

Clearwater, S., and Provost, F. 1990. RL4: A tool for knowledge-based induction. In Proceedings of the Second International IEEE Conference on Tools for Artificial Intelligence. IEEE CS Press.

Davis, A., and Goyal, S. 1993. Management of cellular fraud: Knowledge-based detection, classification and prevention. In Thirteenth International Conference on Artificial Intelligence, Expert Systems and Natural Language.

Ezawa, K., and Norton, S. 1995. Knowledge discovery in telecommunication services data using Bayesian network models. In Fayyad, U., and Uthurusamy, R., eds., Proceedings of the First International Conference on Knowledge Discovery and Data Mining. Menlo Park, CA: AAAI Press.

Farnum, N., and Stanton, L. 1989. Quantitative forecasting methods. Boston, MA: PWS-Kent Publishing Company.

Fawcett, T., and Provost, F. Submitted. Data mining for adaptive fraud detection. Data Mining and Knowledge Discovery. Available as cs.umass.edu/ fawcett/dmkd-97.ps.gz.

Kittler, J. 1986. Feature selection and extraction. In Fu, K. S., ed., Handbook of pattern recognition and image processing. New York: Academic Press.

Quinlan, J. R. 1987. Generating production rules from decision trees. In Proceedings of the Tenth International Joint Conference on Artificial Intelligence. Morgan Kaufmann.

Rabiner, L. R., and Juang, B. H. 1986. An introduction to hidden Markov models. IEEE ASSP Magazine 3(1):4-16.

Segal, R., and Etzioni, O. 1994. Learning decision lists using homogeneous rules. In Proceedings of the Twelfth National Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press.

Walters, D., and Wilkinson, W. 1994. Wireless fraud, now and in the future: A view of the problem and some solutions. Mobile Phone News 4-7.

Yuhas, B. P. 1993. Toll-fraud detection. In Alspector, J.; Goodman, R.; and Brown, T., eds., Proceedings of the International Workshop on Applications of Neural Networks to Telecommunications. Hillsdale, NJ: Lawrence Erlbaum Associates.
A Study of Detecting Credit Card Delinquencies with Data Mining using Decision Tree Model ABSTRACT Mrs. Arpana Bharani* Mrs. Mohini Rao** Consumer credit is one of the necessary processes but lending bears
Chapter 20: Data Analysis
Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification
Data Mining in the Telecommunications Industry
486 Section: Service Data Mining in the Telecommunications Industry Gary M. Weiss Fordham University, USA INTRODUCTION The telecommunications industry was one of the first to adopt data mining technology.
Selection of Optimal Discount of Retail Assortments with Data Mining Approach
Available online at www.interscience.in Selection of Optimal Discount of Retail Assortments with Data Mining Approach Padmalatha Eddla, Ravinder Reddy, Mamatha Computer Science Department,CBIT, Gandipet,Hyderabad,A.P,India.
Computational Intelligence in Data Mining and Prospects in Telecommunication Industry
Journal of Emerging Trends in Engineering and Applied Sciences (JETEAS) 2 (4): 601-605 Scholarlink Research Institute Journals, 2011 (ISSN: 2141-7016) jeteas.scholarlinkresearch.org Journal of Emerging
Data Mining Approach For Subscription-Fraud. Detection in Telecommunication Sector
Contemporary Engineering Sciences, Vol. 7, 2014, no. 11, 515-522 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ces.2014.4431 Data Mining Approach For Subscription-Fraud Detection in Telecommunication
not possible or was possible at a high cost for collecting the data.
Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
CHURN PREDICTION IN MOBILE TELECOM SYSTEM USING DATA MINING TECHNIQUES
International Journal of Scientific and Research Publications, Volume 4, Issue 4, April 2014 1 CHURN PREDICTION IN MOBILE TELECOM SYSTEM USING DATA MINING TECHNIQUES DR. M.BALASUBRAMANIAN *, M.SELVARANI
Evaluating an Integrated Time-Series Data Mining Environment - A Case Study on a Chronic Hepatitis Data Mining -
Evaluating an Integrated Time-Series Data Mining Environment - A Case Study on a Chronic Hepatitis Data Mining - Hidenao Abe, Miho Ohsaki, Hideto Yokoi, and Takahira Yamaguchi Department of Medical Informatics,
Nine Common Types of Data Mining Techniques Used in Predictive Analytics
1 Nine Common Types of Data Mining Techniques Used in Predictive Analytics By Laura Patterson, President, VisionEdge Marketing Predictive analytics enable you to develop mathematical models to help better
SPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS Abstract D.Lavanya * Department of Computer Science, Sri Padmavathi Mahila University Tirupati, Andhra Pradesh, 517501, India [email protected]
Customer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH
EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH SANGITA GUPTA 1, SUMA. V. 2 1 Jain University, Bangalore 2 Dayanada Sagar Institute, Bangalore, India Abstract- One
IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES
IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES Bruno Carneiro da Rocha 1,2 and Rafael Timóteo de Sousa Júnior 2 1 Bank of Brazil, Brasília-DF, Brazil [email protected] 2 Network Engineering
REVIEW OF ENSEMBLE CLASSIFICATION
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IJCSMC, Vol. 2, Issue.
2.1. Data Mining for Biomedical and DNA data analysis
Applications of Data Mining Simmi Bagga Assistant Professor Sant Hira Dass Kanya Maha Vidyalaya, Kala Sanghian, Distt Kpt, India (Email: [email protected]) Dr. G.N. Singh Department of Physics and
Single Level Drill Down Interactive Visualization Technique for Descriptive Data Mining Results
, pp.33-40 http://dx.doi.org/10.14257/ijgdc.2014.7.4.04 Single Level Drill Down Interactive Visualization Technique for Descriptive Data Mining Results Muzammil Khan, Fida Hussain and Imran Khan Department
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
Dynamic Predictive Modeling in Claims Management - Is it a Game Changer?
Dynamic Predictive Modeling in Claims Management - Is it a Game Changer? Anil Joshi Alan Josefsek Bob Mattison Anil Joshi is the President and CEO of AnalyticsPlus, Inc. (www.analyticsplus.com)- a Chicago
STATISTICA. Financial Institutions. Case Study: Credit Scoring. and
Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT
DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support
DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support Rok Rupnik, Matjaž Kukar, Marko Bajec, Marjan Krisper University of Ljubljana, Faculty of Computer and Information
Categorical Data Visualization and Clustering Using Subjective Factors
Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,
Data Mining: A Preprocessing Engine
Journal of Computer Science 2 (9): 735-739, 2006 ISSN 1549-3636 2005 Science Publications Data Mining: A Preprocessing Engine Luai Al Shalabi, Zyad Shaaban and Basel Kasasbeh Applied Science University,
Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05
Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification
Intrusion Detection via Machine Learning for SCADA System Protection
Intrusion Detection via Machine Learning for SCADA System Protection S.L.P. Yasakethu Department of Computing, University of Surrey, Guildford, GU2 7XH, UK. [email protected] J. Jiang Department
Analyzing Customer Churn in the Software as a Service (SaaS) Industry
Analyzing Customer Churn in the Software as a Service (SaaS) Industry Ben Frank, Radford University Jeff Pittges, Radford University Abstract Predicting customer churn is a classic data mining problem.
Data Warehousing and Data Mining in Business Applications
133 Data Warehousing and Data Mining in Business Applications Eesha Goel CSE Deptt. GZS-PTU Campus, Bathinda. Abstract Information technology is now required in all aspect of our lives that helps in business
GEO-VISUALIZATION SUPPORT FOR MULTIDIMENSIONAL CLUSTERING
Geoinformatics 2004 Proc. 12th Int. Conf. on Geoinformatics Geospatial Information Research: Bridging the Pacific and Atlantic University of Gävle, Sweden, 7-9 June 2004 GEO-VISUALIZATION SUPPORT FOR MULTIDIMENSIONAL
Data Mining Applications in Higher Education
Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2
Chapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
Web Mining as a Tool for Understanding Online Learning
Web Mining as a Tool for Understanding Online Learning Jiye Ai University of Missouri Columbia Columbia, MO USA [email protected] James Laffey University of Missouri Columbia Columbia, MO USA [email protected]
A Content based Spam Filtering Using Optical Back Propagation Technique
A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT
FRAUD DETECTION IN MOBILE TELECOMMUNICATION
FRAUD DETECTION IN MOBILE TELECOMMUNICATION Fayemiwo Michael Adebisi 1* and Olasoji Babatunde O 1. 1 Department of Mathematical Sciences, College of Natural and Applied Science, Oduduwa University, Ipetumodu,
Towards applying Data Mining Techniques for Talent Mangement
2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore Towards applying Data Mining Techniques for Talent Mangement Hamidah Jantan 1,
Application of Data Mining Techniques in Intrusion Detection
Application of Data Mining Techniques in Intrusion Detection LI Min An Yang Institute of Technology [email protected] Abstract: The article introduced the importance of intrusion detection, as well as
The Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
Introduction. Background
Predictive Operational Analytics (POA): Customized Solutions for Improving Efficiency and Productivity for Manufacturers using a Predictive Analytics Approach Introduction Preserving assets and improving
Introduction. A. Bellaachia Page: 1
Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.
Enhanced Boosted Trees Technique for Customer Churn Prediction Model
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V5 PP 41-45 www.iosrjen.org Enhanced Boosted Trees Technique for Customer Churn Prediction
Building A Smart Academic Advising System Using Association Rule Mining
Building A Smart Academic Advising System Using Association Rule Mining Raed Shatnawi +962795285056 [email protected] Qutaibah Althebyan +962796536277 [email protected] Baraq Ghalib & Mohammed
Big Data with Rough Set Using Map- Reduce
Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,
International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET
DATA MINING TECHNIQUES AND STOCK MARKET Mr. Rahul Thakkar, Lecturer and HOD, Naran Lala College of Professional & Applied Sciences, Navsari ABSTRACT Without trading in a stock market we can t understand
DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM
INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM M. Mayilvaganan 1, S. Aparna 2 1 Associate
An effective approach to preventing application fraud. Experian Fraud Analytics
An effective approach to preventing application fraud Experian Fraud Analytics The growing threat of application fraud Fraud attacks are increasing across the world Application fraud is a rapidly growing
Local outlier detection in data forensics: data mining approach to flag unusual schools
Local outlier detection in data forensics: data mining approach to flag unusual schools Mayuko Simon Data Recognition Corporation Paper presented at the 2012 Conference on Statistical Detection of Potential
Information Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])
On the effect of data set size on bias and variance in classification learning
On the effect of data set size on bias and variance in classification learning Abstract Damien Brain Geoffrey I Webb School of Computing and Mathematics Deakin University Geelong Vic 3217 With the advent
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
Knowledge Based Descriptive Neural Networks
Knowledge Based Descriptive Neural Networks J. T. Yao Department of Computer Science, University or Regina Regina, Saskachewan, CANADA S4S 0A2 Email: [email protected] Abstract This paper presents a
On Correlating Performance Metrics
On Correlating Performance Metrics Yiping Ding and Chris Thornley BMC Software, Inc. Kenneth Newman BMC Software, Inc. University of Massachusetts, Boston Performance metrics and their measurements are
Using News Articles to Predict Stock Price Movements
Using News Articles to Predict Stock Price Movements Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 9237 [email protected] 21, June 15,
Data Mining System, Functionalities and Applications: A Radical Review
Data Mining System, Functionalities and Applications: A Radical Review Dr. Poonam Chaudhary System Programmer, Kurukshetra University, Kurukshetra Abstract: Data Mining is the process of locating potentially
Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health
Lecture 1: Data Mining Overview and Process What is data mining? Example applications Definitions Multi disciplinary Techniques Major challenges The data mining process History of data mining Data mining
Spam detection with data mining method:
Spam detection with data mining method: Ensemble learning with multiple SVM based classifiers to optimize generalization ability of email spam classification Keywords: ensemble learning, SVM classifier,
DATA MINING, DIRTY DATA, AND COSTS (Research-in-Progress)
DATA MINING, DIRTY DATA, AND COSTS (Research-in-Progress) Leo Pipino University of Massachusetts Lowell [email protected] David Kopcso Babson College [email protected] Abstract: A series of simulations
A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery
A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery Runu Rathi, Diane J. Cook, Lawrence B. Holder Department of Computer Science and Engineering The University of Texas at Arlington
