USING THE XLMINER TOOL FOR DATA MINING IN CUSTOMER RELATIONSHIP MANAGEMENT
|
|
|
- Suzanna Wilkerson
- 10 years ago
- Views:
Transcription
1 USING THE XLMINER TOOL FOR DATA MINING IN CUSTOMER RELATIONSHIP MANAGEMENT I. Polaka and A. Sukov Keywords: data mining software, classification, clustering, associative rules, XLMiner 1. Introduction Data mining is a set of methods that helps an expert to obtain new, previously unknown significant information from available data. To apply data mining possibilities practically software that deploys data mining algorithms is needed. This software may also be integrated into other tools. One of these tools is the QuantLink XLMiner tool for Microsoft Excel. In this paper the data mining possibilities of XLMiner are examined. More precisely this tool is used for extracting unknown client data, rules and patterns, as well as for extracting information that is significant for client relationship management. 2. The XLMiner software XLMiner is Microsoft Excel plug-in software [1], a product of QuantLink. It works with Microsoft Excel (English versions 2000, 2003 or 2007) environment. This tool offers solving many data mining tasks using the most popular methods [2]. Available tasks include classification, clustering, time series analysis and associative rules. XLMiner also offers a possibility of sampling data from external data bases or MS Excel spreadsheets, as well as partitioning data set into training, validation and test data sets and performing data preprocessing. The tool also includes some visualization options and allows using a pre-created model on new data. The tool works with the MS Excel environment but the work with the supported methods is carried out using a standard three step wizard that guides the user through the stages of input data information, method parameters and output data information. Figure 1. Interface of XLMiner a standard wizard 155
2 Results and model data are shown in specially designed MS Excel worksheets that make further analysis faster and easier. The tool also outlines the ranges of spreadsheets that will be needed for further work with this tool and this data (e.g. applying other methods). Most of the computer users are already familiar with the MS Excel interface therefore learning to work with XLMiner is quick and easy, it doesn t require an extended course on one program if users have some knowledge about data mining and the methods that are used in this tool. 3. Client information analysis In the recent years businesses have moved from production oriented strategies to those that put customer satisfaction as their goal. Product and service sales are now more oriented towards clients needs. Therefore it is necessary to know your customer. It is made possible by analyzing available customer data transactions and other committed information like filled in questionnaires etc. This information allows us to forecast further demand, personalize advertisement campaigns, cross-sell new or related products or services etc. In the experiments conducted, the data was provided by Southern-German Class lottery [3]; it holds personal information about the client and his living environment, and also the data that describes playing habits of that person. The goals of the experiments were to forecast for how long a person will be playing in this lottery (extract patterns that could help to point out potential loyal customers), to determine the age of the most active customer group and also to extract other significant rules that would help with the customer relationship management. The data set used in the case study contains 72 attributes and more than records. This data set was used for the classification task to determine the activity class to which a person would belong based on the information which was known previously about this customer. The results of classification will help to forecast the client flow. This data set was also used for clustering to determine the dependencies between the age of a person and the number of tickets bought by this person. This information may prove helpful in further market segmentation. To extract more relevant patterns that exist among the features of customers, the data was mined for associative rules Forecasting client activity class XLMiner offers to perform data sampling and partition using a data set that holds up to records and 200 attributes. This includes data sampling to reduce the size of a data set (numbers of records) by choosing a subset of records that would represent the whole set. It can be done by random partitioning or partitioning with oversampling setting a desired support limit for a class. Data set is larger than the limitations set by MS Excel or XLMiner; for this reason random partitioning was used to reduce the number of records. But the resulting data set would still have more attributes than it is allowed in this tool (30 attributes are allowed). Therefore a smaller subset of the attributes is needed, that doesn t lose much information in the reduction process. Therefore a tool named WEKA [4] was used to pick a subset of 13 best attributes. The data set was also divided into training and test data sets. Usually data sets are divided into training set holding 70% of the data and test set holding 30% of the data [5]. This ratio was also used in this case study wherever such division was necessary (e.g. classification and forecasting). XLMiner offers solving a classification task using the following methods: discriminant analysis, logistic regression (not to be confused with logic regression, which works with 156
3 binary data and classifies data using the Boolean algebra approach), classification trees (CART), naive Bayes classifier, neural networks (multilayer feedforward architectures) and k- nearest neighbours. Discriminant analysis does not work with the needed amount of data and logistic regression does not support 5 classes in the data in this tool (it only classifies 2 classes), therefore this task was solved using other methods and attributes of data types that are supported by these methods (e.g. continuous data was not used with naive Bayes classifier). The best method for this task was chosen using the total classification error and the wrong clasifications into most important class (that is class 4) (see Table 1). Each client was assigned an activity class which shows how long a client will participate in the lottery, which means we can forecast whether this client will be a loyal customer and for how long he will participate). We consider the following classes that show 5 common patterns of client behaviour: 0 no tickets were bought, 1 only the first ticket was bought, 2 no more than two tickets were bought, 3 client bought all tickets for one season, 4 client played during the first season and bought tickets for the second season (we consider this type of people loyal customers ). The importance of this task is its contribution to forecasting client behaviour for the company, it helps to forecast the number of players, money flow etc. If there is information about prospective customers (that are not clients of this lottery yet), it can also provide information for selective advertising campaigns, distributing the advertisements to prospective customers that will most likely become loyal customers. An analysis of the classifiers results enables us to evaluate the performance of the classifiers. It was done by evaluating the total classification error and the rates of the wrong classifications into each class (Table 1). Table 1 Classification errors (%) Class Classification K-nearest Neural Naïve Bayes tree neighbours network Total Some classifiers did not classify classes 1 to 3 but they were not the most important for this task. The most important was to determine the clients that will play in this lottery regularly (4 th class). The classifier that had the lowest error for this class was classification tree. The same method also had the lowest total classification error. This means that the most appropriate method to solve this task is to build a classification tree. The use of this method gives us 86 right answers out of 100 when we want to know whether the client will become a loyal customer. In this case study there were several inconveniences caused by data set limitations posed by software. Not only was it the record number limitation by Microsoft Excel, but also some unexpected limitations from the XLMiner tool. The work with records (XLMiner limit) could not be completed even if the data set had less than 200 attributes that were 157
4 Computer Science allowed. The set that did fit all the limitations had records while pre-processing the data and records in the training set for classification. The results of the classification were overall worse than previously assumed the error rate of the best classifier was above 40 % and that is big enough for the results to lose their impact on business decision making. But the classification error of the most important class (class 4 which means that the client will be a loyal customer and will take part in the lottery for a long time) was better it was 14% for the classification tree. This means that it can be used for further plans and business strategies Segmentation of participants age and number of tickets bought To determine the age at which people tend to play more, client data was clusterized using attributes Age and Number of tickets. Relations extracted here will help to determine the best segment of population that will be the right recipient (i.e. the one who will participate). At the beginning of the work, a graph was plotted to show the data relations in a two dimensional data chart, to visualize the data and see the possible patterns (see Figure 2) Number of tickets Age Figure 2. Data graph showing client s age and number of tickets bought It is obvious that there is a group of records that stand out in the graph they are all with the age value of 0. This means that these are the records that did not have the age specified. These records were excluded from the data set because they didn t hold any significant information that would contribute to the solution of this task greatly. In the hierarchical clustering clusters are built from records or subclusters until there is one cluster left that holds all of the records [6]. After the work is finished, the best number of subclusters is determined (usually using a dendrogram which shows merging of subclusters and distances at which it happens), that means the number of clusters is chosen where there is the longest distance among clusters. In this case there were three clusters. Data division was based on the number of tickets played in this case, attribute age did not contribute much (one group played 5-15 tickets, another group played tickets and another group played tickets). Partitional clustering means assigning each record to the cluster with the closest centroid (assigning records to clusters of previously specified number) [6]. Trying different
5 numbers of classes and coordinates of the centroids, the most typical and informative result was with 4 clusters. This method also showed different end clusters if compared to the hierarchical clustering. In this clustering, cluster centers, numbers of records in clusters and average distances in clusters were as shown in Table 2. Table 2 Cluster information (partitional clustering) Cluster Age Number of Average distance Records tickets in cluster Cluster Cluster Cluster Cluster We can see that people of age (Cluster-4) do not play often and end soon (1 to 8 tickets), and the percentage of this type of participants is only 20. Participants that have reached the age of 40 (up to 60 year olds) can be divided into two groups those who buy tickets (and more (around 11%)) and those who buy relatively few (4-8) tickets (~35% of the participants). Another group is people who have reached the age of 70 they don t buy as many tickets as group does but the still buy more tickets than the young participants. So we can see that older people buy more tickets, and most of the tickets are bought by the participants who are 40 to 60 years old. So people of this age will most likely answer the advertisements and will be willing to try demo or introduction packages. Also in the clustering tasks there were some unexpected limitations concerning the data set. The maximum number of records in this task is And even if the data can be clusterized, the visualization could be better the results could be better interpretable and easier to understand, if there were more visualization tools for clusterization. Even the dendrogram shows only up to 30 subclusters Discovering other significant patterns If we extract new interesting relationships about people s lifestyle and environment, it can be used for more efficient advertisement campaigns. Therefore data was also mined for association rules. In this task frequent patterns in the data can be found. They can be expressed as relations between objects of the data set in the form If A, then B where A and B can be sets of objects [5]. Data set was reduced because the XLMiner tool brought up an associative tree building error for the maximum supported data set. Attribute set was reduced excluding some attributes that describe others (e.g. the evaluated frequency of each car (by producers) in the client s heighbourhood). The rest of the attributes were transformed to have unique values compared to each other so that they can be distinguished. In this case transformation was carried out with MS Excel function, adding a unique position before the values for every attribute. Some interesting associative rules were found in the process. For example, if a person chose mail as a primary mean of communication, complaints arrived only in 2% of the cases. Also the participants who live in the old Federal States, participants who live in houses with no offices, and those who are driving Volkswagen or Seat, preferred to choose mail as a primary mean of communication. 159
6 And there was also discovered a sure rule (559 people of 559 matched) if the participant s purchasing power was evaluated 10 (maximum points), this person lived in the old Federal States. It is not surprising because economic growth of the East Germany was slower in the Soviet Union years compared to the growth of West Germany and this means that the company would benefit more from more expensive projects in the western part of the country. 4. Other options of the software XLMiner has the ability to solve forecasting tasks using multiple linear regression, k- nearest neighbours, regression trees or neural networks (multilayer feedforward architectures). It can also analyze time series autocorrelation function and partial autocorrelation function for data exploration and finding patterns in data, and autoregression and moving average integrated (ARIMA) for forecasting. XLMiner also supports smoothing functions exponential, double exponential, moving average and Holt-Winters smoothing. QuantLink also offers a tool named XLMcalc, which allows using models made with XLMiner on other data without a need of XLMiner on this machine. 5. Results This paper discusses the possibilities of XLMiner in solving data mining tasks, concentrating on tasks that are relevant for client relationship management. XLMiner can solve classification, forecasting, clustering, time series analysis and associative rules tasks. And the classification task is the one that is most emphasized. This task can be solved with various methods that work with both categorical (naive Bayes classifier) and continuous data (k-nearest neighbors and others) and the results are comprehensive and more processed. For time series analysis and associative rules there are some of the most popular methods but these tasks are not the main application of this tool, although they can be used successfully as secondary analysis tools. The same is with clustering some of the methods are offered for this task but result analysis is quite poor and so is visualization. This tool also lacks some data pre-processing utilities. In this case study some tools for attribute set analysis and reduction (subset selection) would have been handy but they are offered only for some regression methods (as built in functions). All of the data set was not used in any task because of the XLMiner limitations. So we have to conclude that this tool is more suitable for tasks with smaller data sets and as an illustration in the learning process for the tasks described previously. Acknowledgements Thanks to Dr.sc.ing., Assoc.prof. Ludmila Aleksejeva for help and support. References 1. XLMiner Data Mining. Retrieved October from: 2. Shmueli G., Patel, N.R., Bruce, P.C. Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner // Wiley- Interscience, Data Mining Cup. Retrieved May from: 160
7 4. Witten, I.H., Frank, E. Data Mining: Practical Machine Learning Tools and Techniques // Morgan Kaufmann, 2 nd Edition, Datu ieguve. Pamati. / A. Sukovs, L. Aleksejeva, K. Makejeva... [u.c.] Rīga: SIA Drukātava, Han, J., Kamber, M. Data Mining: Concepts and Techniques // Morgan Kaufmann, 2 nd Edition, Inese Polaka, B.sc.ing., Master s student, Department of Modelling and Simulation, Riga Technical University, 1 Kalku Street, Riga, LV- 1658, Latvia, [email protected]. Anatoly Sukov, Mg.sc.ing., Lecturer, Department of Modelling and Simulation, Riga Technical University, 1 Kalku Street, Riga, LV- 1658, Latvia, [email protected]. Poļaka Inese, Sukovs Anatolijs. Rīka XLMiner izmantošana datu ieguvei klientu attiecību vadībā Rakstā tika apskatītas dažas datu ieguves iespējas, kas ir svarīgas klientu attiecību vadībā. Datu ieguve tika veikta loterijas klientu datos, izmantojot Microsoft Excel programmu un QuantLink XLMiner pievienojumprogrammu. XLMiner atbalsta tādu datu ieguves uzdevumu risināšanu kā klasifikācija, klasterizācija, laika rindu analīze un asociatīvo likumu meklēšana, lielāku uzsvaru liekot uz klasifikācijas uzdevumu, piedāvājot lielāku metožu klāstu un plašāku rezultātu analīzi. Šā rīka piedāvātās iespējas tika izmantotas, lai no esošiem loterijas klientu datiem izgūtu klientu attiecību vadības procesiem svarīgu informāciju. Tika veikta klienta aktivitātes klases noteikšana ar dažādu piedāvāto klasifikatoru palīdzību (klasifikācijas koki, naivais Baijesa klasifikators, k-tuvākie kaimiņi un neironu tīkli), nosakot arī labāko klasifikatoru. Tika veikta arī klientu segmentācija pēc vecuma un nopirkto biļešu skaita, izmantojot hierarhisko un sadalošo klasterizāciju, nosakot loterijas mērķauditoriju, t.i. visvairāk spēlējošo vecumgrupu. Tika veikta arī citu zīmīgu likumu meklēšana (asociatīvie likumi). Polaka Inese, Sukov Anatoly. Using the XLMiner tool for data mining in customer relationship management This paper describes some of the data mining tasks that are important for customer relationship management. Data used in mining was lottery client data and it was carried out with Microsoft Excel software and QuantLink XLMiner plug-in software. XLMiner offers to solve data mining tasks like classification, clustering, time series analysis and associative rules, emphasizing the classification task by offering more methods and broader results analysis. The possibilities offered by this tool were put to use to extract data that is significant for customer relationship management from the available data. The following tasks were solved: assigning an activity class to each client, using some of the offered classification methods (classification trees, naïve Bayes classifier, k- nearest neighbours and neural networks), client segmentation by age of customers and number of lottery tickets bought, using both the hierarchical and the partitional clustering to determine the age group that contains people who are most likely to play more. The data was also mined for other significant rules (association rules). Поляка Инесе, Суков Анатолий. Использование инструмента XLMiner для добычи знаний в задачах управления взаимоотношениями с клиентами В статье рассмотрены различные возможности добычи знаний, которые важны для управления взаимоотношениями с клиентами. Добыча знаний была произведена на данных клиентов лотереи, используя среду Microsoft Excel и специальную программу QuantLink XLMiner. XLMiner поддерживает решение таких задач добычи знаний, как классификация, кластеризация, анализ временных рядов и поиск ассоциативных законов, уделяя особое внимание задаче классификации, предлагая более широкий выбор методов и более полный анализ результатов. Возможности этого инструмента были использованы для того, чтобы из имеющихся данных о клиентах лотереи получить важную информацию для процессов управления взаимоотношениями с клиентами. Проведено определение класса активности клиента с помощью различных предложенных классификаторов (деревья классификации, наивный классификатор Байеса, к-ближайшие соседи, нейронные сети), выбран наилучший классификатор. Проведена сегментация клиентов по возрасту и количеству купленных билетов, используя иерархическую и разделяющую кластеризацию, определена целевая аудитория лотереи, то есть наиболее играющая возрастная группа игроков. Также произведен поиск других показательных законов (ассоциативные законы). 161
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd Edition
Brochure More information from http://www.researchandmarkets.com/reports/2170926/ Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd
An Introduction to Data Mining
An Introduction to Intel Beijing [email protected] January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
Course Syllabus. Purposes of Course:
Course Syllabus Eco 5385.701 Predictive Analytics for Economists Summer 2014 TTh 6:00 8:50 pm and Sat. 12:00 2:50 pm First Day of Class: Tuesday, June 3 Last Day of Class: Tuesday, July 1 251 Maguire Building
not possible or was possible at a high cost for collecting the data.
Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day
City University of Hong Kong. Information on a Course offered by the Department of Management Sciences with effect from Semester A in 2012 / 2013
City University of Hong Kong Information on a Course offered by the Department of Management Sciences with effect from Semester A in 2012 / 2013 Part I Course Title: Customer Relationship Management with
Customer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
Principles of Data Mining by Hand&Mannila&Smyth
Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences
Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier
International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing
ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION
ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical
Nine Common Types of Data Mining Techniques Used in Predictive Analytics
1 Nine Common Types of Data Mining Techniques Used in Predictive Analytics By Laura Patterson, President, VisionEdge Marketing Predictive analytics enable you to develop mathematical models to help better
from Larson Text By Susan Miertschin
Decision Tree Data Mining Example from Larson Text By Susan Miertschin 1 Problem The Maximum Miniatures Marketing Department wants to do a targeted mailing gpromoting the Mythic World line of figurines.
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Data Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania [email protected] Over
Data Mining + Business Intelligence. Integration, Design and Implementation
Data Mining + Business Intelligence Integration, Design and Implementation ABOUT ME Vijay Kotu Data, Business, Technology, Statistics BUSINESS INTELLIGENCE - Result Making data accessible Wider distribution
Data Mining: Overview. What is Data Mining?
Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
Predictive Modeling Techniques in Insurance
Predictive Modeling Techniques in Insurance Tuesday May 5, 2015 JF. Breton Application Engineer 2014 The MathWorks, Inc. 1 Opening Presenter: JF. Breton: 13 years of experience in predictive analytics
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19
PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations
DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.
DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,
Data Mining Techniques
15.564 Information Technology I Business Intelligence Outline Operational vs. Decision Support Systems What is Data Mining? Overview of Data Mining Techniques Overview of Data Mining Process Data Warehouses
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
Data Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.7 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Linear Regression Other Regression Models References Introduction Introduction Numerical prediction is
Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs
1.1 Introduction Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs For brevity, the Lavastorm Analytics Library (LAL) Predictive and Statistical Analytics Node Pack will be
Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com
SPSS-SA Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Training Brochure 2009 TABLE OF CONTENTS 1 SPSS TRAINING COURSES FOCUSING
Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier
Data Mining: Concepts and Techniques Jiawei Han Micheline Kamber Simon Fräser University К MORGAN KAUFMANN PUBLISHERS AN IMPRINT OF Elsevier Contents Foreword Preface xix vii Chapter I Introduction I I.
Model Deployment. Dr. Saed Sayad. University of Toronto 2010 [email protected]. http://chem-eng.utoronto.ca/~datamining/
Model Deployment Dr. Saed Sayad University of Toronto 2010 [email protected] http://chem-eng.utoronto.ca/~datamining/ 1 Model Deployment Creation of the model is generally not the end of the project.
Subject Description Form
Subject Description Form Subject Code Subject Title COMP417 Data Warehousing and Data Mining Techniques in Business and Commerce Credit Value 3 Level 4 Pre-requisite / Co-requisite/ Exclusion Objectives
Data Mining: A Preprocessing Engine
Journal of Computer Science 2 (9): 735-739, 2006 ISSN 1549-3636 2005 Science Publications Data Mining: A Preprocessing Engine Luai Al Shalabi, Zyad Shaaban and Basel Kasasbeh Applied Science University,
Data Mining for Fun and Profit
Data Mining for Fun and Profit Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. - Ian H. Witten, Data Mining: Practical Machine Learning Tools
Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1
Data Mining 1 Introduction 2 Data Mining methods Alfred Holl Data Mining 1 1 Introduction 1.1 Motivation 1.2 Goals and problems 1.3 Definitions 1.4 Roots 1.5 Data Mining process 1.6 Epistemological constraints
BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts
BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an
Data Mining mit der JMSL Numerical Library for Java Applications
Data Mining mit der JMSL Numerical Library for Java Applications Stefan Sineux 8. Java Forum Stuttgart 07.07.2005 Agenda Visual Numerics JMSL TM Numerical Library Neuronale Netze (Hintergrund) Demos Neuronale
Data quality in Accounting Information Systems
Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania
How To Solve The Kd Cup 2010 Challenge
A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China [email protected] [email protected]
A New Approach for Evaluation of Data Mining Techniques
181 A New Approach for Evaluation of Data Mining s Moawia Elfaki Yahia 1, Murtada El-mukashfi El-taher 2 1 College of Computer Science and IT King Faisal University Saudi Arabia, Alhasa 31982 2 Faculty
ElegantJ BI. White Paper. The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis
ElegantJ BI White Paper The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis Integrated Business Intelligence and Reporting for Performance Management, Operational
Numerical Algorithms for Predicting Sports Results
Numerical Algorithms for Predicting Sports Results by Jack David Blundell, 1 School of Computing, Faculty of Engineering ABSTRACT Numerical models can help predict the outcome of sporting events. The features
LVQ Plug-In Algorithm for SQL Server
LVQ Plug-In Algorithm for SQL Server Licínia Pedro Monteiro Instituto Superior Técnico [email protected] I. Executive Summary In this Resume we describe a new functionality implemented
COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments
Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for
Predictive Dynamix Inc
Predictive Modeling Technology Predictive modeling is concerned with analyzing patterns and trends in historical and operational data in order to transform data into actionable decisions. This is accomplished
Data Mining Applications in Higher Education
Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2
Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors
Classification k-nearest neighbors Data Mining Dr. Engin YILDIZTEPE Reference Books Han, J., Kamber, M., Pei, J., (2011). Data Mining: Concepts and Techniques. Third edition. San Francisco: Morgan Kaufmann
STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and
Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table
Introducing Oracle Crystal Ball Predictor: a new approach to forecasting in MS Excel Environment
Introducing Oracle Crystal Ball Predictor: a new approach to forecasting in MS Excel Environment Samik Raychaudhuri, Ph. D. Principal Member of Technical Staff ISF 2010 Oracle Crystal
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, [email protected] Abstract: Independent
College of Health and Human Services. Fall 2013. Syllabus
College of Health and Human Services Fall 2013 Syllabus information placement Instructor description objectives HAP 780 : Data Mining in Health Care Time: Mondays, 7.20pm 10pm (except for 3 rd lecture
270107 - MD - Data Mining
Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 015 70 - FIB - Barcelona School of Informatics 715 - EIO - Department of Statistics and Operations Research 73 - CS - Department of
How To Predict Web Site Visits
Web Site Visit Forecasting Using Data Mining Techniques Chandana Napagoda Abstract: Data mining is a technique which is used for identifying relationships between various large amounts of data in many
USING THE AGGLOMERATIVE METHOD OF HIERARCHICAL CLUSTERING AS A DATA MINING TOOL IN CAPITAL MARKET 1. Vera Marinova Boncheva
382 [7] Reznik, A, Kussul, N., Sokolov, A.: Identification of user activity using neural networks. Cybernetics and computer techniques, vol. 123 (1999) 70 79. (in Russian) [8] Kussul, N., et al. : Multi-Agent
Predicting Student Performance by Using Data Mining Methods for Classification
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 13, No 1 Sofia 2013 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2013-0006 Predicting Student Performance
JetBlue Airways Stock Price Analysis and Prediction
JetBlue Airways Stock Price Analysis and Prediction Team Member: Lulu Liu, Jiaojiao Liu DSO530 Final Project JETBLUE AIRWAYS STOCK PRICE ANALYSIS AND PREDICTION 1 Motivation Started in February 2000, JetBlue
MS1b Statistical Data Mining
MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to
Healthcare Measurement Analysis Using Data mining Techniques
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik
Comparison of Data Mining Techniques used for Financial Data Analysis
Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract
Predictive Data modeling for health care: Comparative performance study of different prediction models
Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath [email protected] National Institute of Industrial Engineering (NITIE) Vihar
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
ANALYTICS CENTER LEARNING PROGRAM
Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals
ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)
ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
Chapter 20: Data Analysis
Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification
KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics
ROCHESTER INSTITUTE OF TECHNOLOGY COURSE OUTLINE FORM KATE GLEASON COLLEGE OF ENGINEERING John D. Hromi Center for Quality and Applied Statistics NEW (or REVISED) COURSE (KGCOE- CQAS- 747- Principles of
Data Mining and Soft Computing. Francisco Herrera
Francisco Herrera Research Group on Soft Computing and Information Intelligent Systems (SCI 2 S) Dept. of Computer Science and A.I. University of Granada, Spain Email: [email protected] http://sci2s.ugr.es
Chapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10
1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom
Data Mining and Visualization
Data Mining and Visualization Jeremy Walton NAG Ltd, Oxford Overview Data mining components Functionality Example application Quality control Visualization Use of 3D Example application Market research
Data Mining in CRM & Direct Marketing. Jun Du The University of Western Ontario [email protected]
Data Mining in CRM & Direct Marketing Jun Du The University of Western Ontario [email protected] Outline Why CRM & Marketing Goals in CRM & Marketing Models and Methodologies Case Study: Response Model Case
Data Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET
DATA MINING TECHNIQUES AND STOCK MARKET Mr. Rahul Thakkar, Lecturer and HOD, Naran Lala College of Professional & Applied Sciences, Navsari ABSTRACT Without trading in a stock market we can t understand
Introduction to Data Mining
Introduction to Data Mining José Hernández ndez-orallo Dpto.. de Systems Informáticos y Computación Universidad Politécnica de Valencia, Spain [email protected] Horsens, Denmark, 26th September 2005
STATISTICA. Financial Institutions. Case Study: Credit Scoring. and
Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT
Machine learning for algo trading
Machine learning for algo trading An introduction for nonmathematicians Dr. Aly Kassam Overview High level introduction to machine learning A machine learning bestiary What has all this got to do with
MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts
MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts Julio Villena-Román 1,3, Sara Lana-Serrano 2,3 1 Universidad Carlos III de Madrid 2 Universidad Politécnica de Madrid 3 DAEDALUS
Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca
Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?
Question 2 Naïve Bayes (16 points)
Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the
Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 by Tan, Steinbach, Kumar 1 What is Cluster Analysis? Finding groups of objects such that the objects in a group will
Get to Know the IBM SPSS Product Portfolio
IBM Software Business Analytics Product portfolio Get to Know the IBM SPSS Product Portfolio Offering integrated analytical capabilities that help organizations use data to drive improved outcomes 123
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 [email protected]
Active Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
Email: [email protected] Office: LSK 5045 Begin subject: [ISOM3360]...
Business Intelligence and Data Mining ISOM 3360: Spring 2015 Instructor Contact Office Hours Course Schedule and Classroom Course Webpage Jia Jia, ISOM Email: [email protected] Office: LSK 5045 Begin subject:
1. Classification problems
Neural and Evolutionary Computing. Lab 1: Classification problems Machine Learning test data repository Weka data mining platform Introduction Scilab 1. Classification problems The main aim of a classification
Introduction to Data Mining Techniques
Introduction to Data Mining Techniques Dr. Rajni Jain 1 Introduction The last decade has experienced a revolution in information availability and exchange via the internet. In the same spirit, more and
Machine Learning with MATLAB David Willingham Application Engineer
Machine Learning with MATLAB David Willingham Application Engineer 2014 The MathWorks, Inc. 1 Goals Overview of machine learning Machine learning models & techniques available in MATLAB Streamlining the
STOCK MARKET TRENDS USING CLUSTER ANALYSIS AND ARIMA MODEL
Stock Asian-African Market Trends Journal using of Economics Cluster Analysis and Econometrics, and ARIMA Model Vol. 13, No. 2, 2013: 303-308 303 STOCK MARKET TRENDS USING CLUSTER ANALYSIS AND ARIMA MODEL
How To Use Data Mining For Knowledge Management In Technology Enhanced Learning
Proceedings of the 6th WSEAS International Conference on Applications of Electrical Engineering, Istanbul, Turkey, May 27-29, 2007 115 Data Mining for Knowledge Management in Technology Enhanced Learning
Enhanced Boosted Trees Technique for Customer Churn Prediction Model
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V5 PP 41-45 www.iosrjen.org Enhanced Boosted Trees Technique for Customer Churn Prediction
Ira J. Haimowitz Henry Schwarz
From: AAAI Technical Report WS-97-07. Compilation copyright 1997, AAAI (www.aaai.org). All rights reserved. Clustering and Prediction for Credit Line Optimization Ira J. Haimowitz Henry Schwarz General
GEO-VISUALIZATION SUPPORT FOR MULTIDIMENSIONAL CLUSTERING
Geoinformatics 2004 Proc. 12th Int. Conf. on Geoinformatics Geospatial Information Research: Bridging the Pacific and Atlantic University of Gävle, Sweden, 7-9 June 2004 GEO-VISUALIZATION SUPPORT FOR MULTIDIMENSIONAL
Introduction. A. Bellaachia Page: 1
Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.
Data Mining Project Report. Document Clustering. Meryem Uzun-Per
Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...
Distance Learning and Examining Systems
Lodz University of Technology Distance Learning and Examining Systems - Theory and Applications edited by Sławomir Wiak Konrad Szumigaj HUMAN CAPITAL - THE BEST INVESTMENT The project is part-financed
Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.
Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví Pavel Kříž Seminář z aktuárských věd MFF 4. dubna 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics
Numerical Algorithms Group
Title: Summary: Using the Component Approach to Craft Customized Data Mining Solutions One definition of data mining is the non-trivial extraction of implicit, previously unknown and potentially useful
The Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
