What is Data Mining? MS4424 Data Mining & Modelling
- Barry Hodge
- 10 years ago
MS4424 Data Mining & Modelling
Lecturer : Dr Iris Yeung
Room No : P7509
Tel No : [email protected]

Aims
To introduce the basic concepts of data mining and its applications in various industries, with a focus on marketing, sales and customer relationship management.

Course Outline
- Introduction.
- Data preparation.
- Prediction modeling: multiple regression, logistic regression, neural networks, decision trees.
- Cluster analysis and self-organizing maps.
- Market basket analysis: association analysis and sequence discovery.

Assessment
- Coursework : 50% (test, project)
- Examination : 50% (three hours)

Textbooks
- Michael J A Berry and Gordon Linoff, Data Mining Techniques for Marketing, Sales, and Customer Support, John Wiley & Sons, 1997
- Michael J A Berry and Gordon Linoff, Mastering Data Mining: The Art and Science of Customer Relationship Management, John Wiley & Sons, 2000

Week Content Reading
1 Introduction B Ch 1-5,
Logistic Regression B Ch
Data Preparation B Ch 5
5 Variable Selection B Ch
Decision Tree B Ch
Neural Network B Ch
Cluster Analysis B Ch
Associations B Ch 8
13 Test

Introduction

What is Data Mining?
Data mining is the exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules.

Knowledge Discovery in Databases (KDD)
What is Data Mining?
It is an interdisciplinary field bringing together techniques from:
- Statistics
- Information Science
- Machine Learning
- Database Technology
- Visualization
- Other Disciplines

Role of the Data Warehouse

Why Data Mining Now?
- Data is being produced at an unprecedented rate.
- Data is being collected in data warehouses designed for data mining.
- Computing power is affordable.
- Competitive pressure is strong.
- Commercial data mining software products have become available.

Examples of Using Data Mining
- Direct Marketing
- Customer Acquisition
- Customer Retention
- Cross-Selling
- Fraud Detection
- Forecasting in Financial Markets

What Can Data Mining Do?
- Classification
- Estimation
- Prediction
- Affinity Grouping
- Clustering
- Description

Classification
Examine the features of a newly presented object and assign it to one of a predefined set of classes.
Examples:
- Classifying credit applicants as low, medium, or high risk
- Spotting fraudulent insurance claims
Estimation
Whereas classification deals with discrete outcomes, estimation deals with continuously valued outcomes.
Examples:
- Estimate a family's total household income
- Estimate the lifetime value of a customer
- Estimate the value of a piece of real estate

Prediction
The same as classification or estimation, except that it predicts future behavior.
Examples:
- Predicting which customers will leave within the next six months

Affinity Grouping or Market Basket Analysis
Determine which things go together.
Example: determine what things go together in a shopping cart at the supermarket, hence the term market basket analysis.
It is used to:
- plan the arrangement of items on store shelves
- identify cross-selling opportunities
- design attractive packages or groupings of products and services

Clustering
- Segmenting a heterogeneous population into a number of more homogeneous subgroups or clusters.
- Unlike classification, there are no predefined classes.
- Often precedes some other form of data mining or modeling. For example, first divide the customer base into clusters with similar buying habits, and then ask what kind of promotion works best for each cluster.

Description
- Describe what is going on in a complicated database in a way that increases our understanding of the people, products, or processes.
- A good enough description of a behavior will often suggest an explanation for it as well.

Data Mining Techniques
- Market Basket Analysis (MBA)
- Memory-Based Reasoning (MBR)
- Cluster Detection
- Link Analysis
- Decision Trees
- Neural Networks
- Genetic Algorithms
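The affinity grouping task described above can be made concrete with a small calculation. The sketch below is not taken from the course material: it is a minimal illustration, using the standard definitions of support (the share of baskets containing both items) and confidence (the share of baskets containing the first item that also contain the second). The function name `pair_rules` and the sample baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.0):
    """Return {(a, b): (support, confidence)} for every rule "a -> b".

    Illustrative helper (not from the slides): counts single items and
    item pairs, then derives support and confidence for each direction.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))          # ignore repeated items in a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules


# Made-up supermarket baskets, for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "milk", "yogurt"],
    ["bread", "eggs"],
    ["milk", "yogurt"],
]
rules = pair_rules(baskets)
for (a, b), (support, confidence) in sorted(rules.items()):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every basket with yogurt also contains milk, so the rule "yogurt -> milk" has confidence 1.0, which is the kind of pattern a shelf-arrangement or cross-selling decision would start from.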
Data Mining using SEMMA
SEMMA stands for:
1. Sample: identify input data sets (identify input data, sample from a larger data set, partition the data set into training, validation, and test data sets).
2. Explore: explore the data set statistically and graphically (plot the data, obtain descriptive statistics, identify important variables, perform association analysis).
3. Modify: prepare the data for analysis (create additional variables or transform existing variables for analysis, identify outliers, impute missing values, modify the way in which variables are used for the analysis, perform cluster analysis, analyze data with SOMs or Kohonen networks).
4. Model: fit a predictive model (model a target variable using a regression model, a decision tree, a neural network, or a user-defined model).
5. Assess: compare competing predictive models (build charts plotting percentage of respondents, percentage of respondents captured, lift charts, profit charts).

Getting Started with SAS Enterprise Miner

Setting up the initial project and diagram
1. Select File->New->Project.
2. Type in the name of the project (for example, My project).
3. Modify the location of the project folder if desired by selecting Browse.
4. Select Create. The project opens with an initial untitled diagram.
5. Click on the diagram title and type in a new title if desired (for example, My First Flow).
[Figures: After Selecting Name; Final Appearance]
Identifying the workspace components
1. Observe that the project window opens with the Diagrams tab activated. Select the Tools tab located to the right of the Diagrams tab in the lower-left portion of the project window. This tab enables you to see all of the tools (or nodes) that are available in Enterprise Miner.
Many of the commonly used tools are shown on the toolbar at the top of the window. If you want additional tools on this toolbar, you can drag them from the window above onto the toolbar. In addition, you can rearrange the tools on the toolbar by dragging each tool to the desired location on the bar.
2. Select the Reports tab located to the right of the Tools tab. This tab reveals any reports that have been generated for this project. This is a new project, so no reports are currently available.
3. Return to the Tools tab.

Sample Nodes
The Input Data Source node reads data sources and defines their attributes for later processing by Enterprise Miner.

The Sampling node enables you to take simple random samples, stratified random samples, and cluster samples of data sets. Sampling is recommended for extremely large databases because it can significantly decrease model training time. If the sample is sufficiently representative, relationships found in the sample can be expected to generalize to the complete data set. The Sampling node writes the sampled observations to an output data set and saves the seed values that are used to generate the random numbers for the samples so that you may replicate the samples.

The Data Partition node enables you to partition data sets into training, test, and validation data sets. The training data set is used for preliminary model fitting. The validation data set is used to monitor and tune the model weights during estimation and is also used for model assessment. The test data set is an additional holdout data set that you can use for model assessment. This node uses simple random sampling, stratified random sampling, or user-defined partitions to create partitioned data sets.

Explore Nodes
The Distribution Explorer node is a visualization tool that enables you to explore large volumes of data quickly and easily in multidimensional histograms. You can view the distribution of up to three variables at a time with this node. When the variable is binary, nominal, or ordinal, you can select specific values to exclude from the chart. To exclude extreme values for interval variables, you can set a range cutoff. The node also generates summary statistics for the charting variables.
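The Sampling and Data Partition nodes described above both depend on a saved random seed so that a sample or split can be replicated. The following is a minimal sketch of that idea in plain Python, not Enterprise Miner code: the function name `partition`, the seed value, and the 50/25/25 split are assumptions chosen for illustration, and only simple random sampling is shown.

```python
import random


def partition(rows, fractions=(0.5, 0.25, 0.25), seed=12345):
    """Split rows into (training, validation, test) by simple random sampling.

    Hypothetical helper: the seed is fixed so the same split can be
    reproduced later, mirroring the saved seed values mentioned above.
    """
    rng = random.Random(seed)       # private generator, so the seed fully determines the split
    shuffled = list(rows)
    rng.shuffle(shuffled)

    n = len(shuffled)
    n_train = int(n * fractions[0])
    n_valid = int(n * fractions[1])

    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]     # the remainder is the holdout set
    return train, valid, test


rows = list(range(100))             # stand-in for 100 observations
train, valid, test = partition(rows)
print(len(train), len(valid), len(test))
```

Calling `partition(rows)` again with the same seed returns the same three sets, which is what makes a model comparison on the validation and test data repeatable.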
Explore Nodes
The Multiplot node is another visualization tool that enables you to explore larger volumes of data graphically. Unlike the Insight or Distribution Explorer nodes, the Multiplot node automatically creates bar charts and scatter plots for the input and target variables without making several menu or window item selections. The code created by this node can be used to create graphs in a batch environment, whereas the Insight and Distribution Explorer nodes must be run interactively.

The Insight node enables you to open a SAS/INSIGHT session. SAS/INSIGHT software is an interactive tool for data exploration and analysis; with it you explore data through graphs and analyses that are linked across multiple windows. You can analyze univariate distributions, investigate multivariate distributions, and fit explanatory models using generalized linear models.

The Association node enables you to identify association relationships within the data. For example, if a customer buys a loaf of bread, how likely is the customer to also buy a gallon of milk? The node also enables you to perform sequence discovery if a time stamp variable (a sequence variable) is present in the data set.

The Variable Selection node enables you to evaluate the importance of input variables in predicting or classifying the target variable. To select the important inputs, the node uses either an R-square or a Chi-square (tree-based) selection criterion. The R-square criterion enables you to remove variables in hierarchies, remove variables that have large percentages of missing values, and remove class variables based on the number of unique values. The variables that are not related to the target are set to a status of rejected. Although rejected variables are passed to subsequent nodes in the process flow diagram, these variables are not used as model inputs by a more detailed modeling node, such as the Neural Network and Tree nodes. You can reassign the input model status to rejected variables.

Modify Nodes
The Data Set Attributes node enables you to modify data set attributes, such as data set names, descriptions, and roles. You can also use this node to modify the metadata sample that is associated with a data set and specify target profiles for a target. An example of a useful Data Set Attributes application is to generate a data set in the SAS Code node and then modify its metadata sample with this node.

The Transform Variables node enables you to transform variables by taking the square root of a variable, by taking the natural logarithm, by maximizing the correlation with the target, or by normalizing a variable. Additionally, the node supports user-defined formulas for transformations and provides a visual interface for grouping interval-valued variables into buckets or quantiles. This node also automatically bins interval variables into buckets using a decision-tree-based algorithm. Transforming variables to similar scale and variability may improve the fit of models and, subsequently, the classification and prediction precision of fitted models.

The Filter Outliers node enables you to identify and remove outliers from data sets. Checking for outliers is recommended as outliers may greatly affect modeling results and, subsequently, the classification and prediction precision of fitted models.

The Replacement node enables you to impute (fill in) values for observations that have missing values. You can replace missing values for interval variables with the mean, median, midrange, mid-minimum spacing, or distribution-based replacement, or you can use a replacement M-estimator such as Tukey's biweight, Huber's, or Andrew's Wave. You can also estimate the replacement values for each interval input by using a tree-based imputation method. Missing values for class variables can be replaced with the most frequently occurring value, distribution-based replacement, tree-based imputation, or a constant.
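Three of the replacement statistics named for the Replacement node (mean, median, and midrange) are simple enough to show directly. This is a plain-Python sketch rather than anything Enterprise Miner produces: the function name `impute`, the use of `None` to mark a missing value, and the sample numbers are all assumptions for illustration. The midrange is taken as the average of the smallest and largest observed values.

```python
from statistics import mean, median


def impute(values, method="mean"):
    """Replace None entries of an interval variable with one summary statistic.

    Hypothetical helper covering only three of the methods listed above.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")

    if method == "mean":
        fill = mean(observed)
    elif method == "median":
        fill = median(observed)
    elif method == "midrange":
        fill = (min(observed) + max(observed)) / 2
    else:
        raise ValueError(f"unknown method: {method}")

    return [fill if v is None else v for v in values]


# Made-up household incomes (in thousands) with two missing entries.
incomes = [30, None, 50, 100, None]
for method in ("mean", "median", "midrange"):
    print(method, impute(incomes, method))
```

The three methods give different fill values for the same data (60, 50, and 65 here) because the single large value pulls the mean and midrange upward, which is one reason to check for outliers before imputing.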
Modify Nodes
The Clustering node enables you to segment your data; that is, it enables you to identify data observations that are similar in some way. Observations that are similar tend to be in the same cluster, and observations that are different tend to be in different clusters. The cluster identifier for each observation can be passed to other nodes for use as an input, ID, or target variable. It can also be passed as a group variable that enables you to automatically construct separate models for each group.

The SOM/Kohonen node generates self-organizing maps, Kohonen networks, and vector quantization networks. Essentially the node performs unsupervised learning in which it attempts to learn the structure of the data. As with the Clustering node, after the network maps have been created, the characteristics can be examined graphically using the results browser. The node provides the analysis results in the form of an interactive map illustrating the characteristics of the clusters. Furthermore, it provides a report indicating the importance of each variable.

Model Nodes
The Regression node enables you to fit both linear and logistic regression models to your data. You can use continuous, ordinal, and binary target variables. You can use both continuous and discrete variables as inputs. The node supports the stepwise, forward, and backward selection methods. A point-and-click interaction builder enables you to create higher-order modeling terms.

The Tree node enables you to perform multi-way splitting of your database based on nominal, ordinal, and continuous variables. This is the SAS System implementation of decision trees, which represents a hybrid of the best of the CHAID, CART, and C4.5 algorithms. The node supports both automatic and interactive training. When you run the Tree node in automatic mode, it automatically ranks the input variables based on the strength of their contribution to the tree. This ranking may be used to select variables for use in subsequent modeling. In addition, dummy variables can be generated for use in subsequent modeling. You may override any automatic step with the option to define a splitting rule and prune explicit nodes or subtrees. Interactive training enables you to explore and evaluate a large set of trees as you develop them.

The Neural Network node enables you to construct, train, and validate multilayer feed-forward neural networks. By default, the Neural Network node constructs a network with one hidden layer consisting of three neurons. In general, each input is fully connected to the first hidden layer, each hidden layer is fully connected to the next hidden layer, and the last hidden layer is fully connected to the output. The Neural Network node supports many variations of this general form.

The User Defined Model node enables you to generate assessment statistics using predicted values from a model that you built with the SAS Code node (for example, a logistic model using the SAS/STAT LOGISTIC procedure) or the Variable Selection node. The predicted values can also be saved to a SAS data set and then imported into the process flow with the Input Data Source node.

The Ensemble node enables you to combine models. The usual combination function is the mean. Ensemble models are expected to exhibit greater stability than individual models. They are most effective when the individual models exhibit lower correlations. The node creates three different types of ensembles:
1. Combined model: for example, combining a decision tree and a neural network model. The combination function is the mean of the predicted values.
2. Stratified model: performing group processing over variable values. In this case, there is no combination function because each row in the data set is scored by a single model that is dependent on the value of one or more variables.
3. Bagging/boosting models: performing group processing with resampling. The combination function is the mean of the predicted values. Each observation in the data set is scored by n models and the probabilities are averaged. The only difference between bagging and boosting is that with boosting an intermediary data set is scored for use by the resampling algorithm.
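The combination function used by the combined and bagging ensembles above is the mean of the predicted values. A minimal sketch of that averaging step follows. The function name `combine_by_mean` and the probabilities are made up for illustration, and the two input lists stand in for the outputs of a decision tree and a neural network that have already been trained elsewhere.

```python
def combine_by_mean(predictions_per_model):
    """Average predicted probabilities across models, observation by observation.

    Hypothetical helper: each inner list holds one model's predictions,
    and all lists must cover the same observations in the same order.
    """
    n_models = len(predictions_per_model)
    if n_models == 0:
        raise ValueError("need at least one model")
    lengths = {len(p) for p in predictions_per_model}
    if len(lengths) != 1:
        raise ValueError("all models must score the same observations")
    # zip(*...) groups the i-th prediction of every model together.
    return [sum(column) / n_models for column in zip(*predictions_per_model)]


# Made-up predicted probabilities for three observations.
tree_probs = [0.9, 0.2, 0.6]
net_probs = [0.7, 0.4, 0.2]
combined = combine_by_mean([tree_probs, net_probs])
print([round(p, 3) for p in combined])
```

On the third observation the two models disagree (0.6 versus 0.2) and the ensemble lands between them, which illustrates why averaging models with lower correlation tends to give more stable predictions.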
Assess Nodes
The Assessment node provides a common framework for comparing models and predictions from any of the modeling nodes (Regression, Tree, Neural Network, and User Defined Model nodes). The comparison is based on the expected and actual profits or losses that would result from implementing the model. The node produces the following charts that help to describe the usefulness of the model: lift, profit, return on investment, receiver operating characteristic (ROC) curves, diagnostic charts, and threshold-based charts.

The Score node enables you to generate and manage predicted values from a trained model. Scoring formulas are created for both assessment and prediction. Enterprise Miner generates and manages scoring formulas in the form of SAS DATA step code, which can be used in most SAS environments even without the presence of Enterprise Miner.

The Report node assembles the results from a process flow analysis into an HTML report that can be viewed with your favorite web browser. Each report contains header information, an image of the process flow diagram, and a separate report for each node in the flow. Reports are managed in the Reports tab of the Project Navigator.

Utility Nodes
The Group Processing node enables you to perform group-by processing for class variables such as GENDER. You can also use this node to analyze multiple targets, and to process the same data source repeatedly by setting the group-processing mode to index.

The Data Mining Database node enables you to create a data mining database (DMDB) for batch processing. For nonbatch processing, DMDBs are automatically created as they are needed.

The SAS Code node enables you to incorporate new or existing SAS code into process flow diagrams. The ability to write SAS code enables you to include additional SAS System procedures in your data mining analysis. You can also use a SAS DATA step to create customized scoring code, to conditionally process data, and to concatenate or merge existing data sets. The node provides a macro facility to dynamically reference data sets used for training, validation, testing, or scoring, and variables such as input, target, and predict variables. After you run the SAS Code node, the results and the data sets can then be exported for use by subsequent nodes in the diagram.

The Control Point node enables you to establish a control point to reduce the number of connections that are made in process flow diagrams. For example, suppose three Input Data Source nodes are to be connected to three modeling nodes. If no Control Point node is used, then nine connections are required to connect all of the Input Data Source nodes to all of the modeling nodes. However, if a Control Point node is used, only six connections are required.

The Subdiagram node enables you to group a portion of a process flow diagram into a subdiagram. For complex process flow diagrams, you may want to create subdiagrams to better design and control the process flow.

Some general usage rules for nodes
- The Input Data Source node cannot be preceded by any other node.
- The Sampling node must be preceded by a node that exports a data set.
- The Assessment node must be preceded by one or more modeling nodes.
- The Score node must be preceded by a node that produces score code. For example, the modeling nodes produce score code.
- The SAS Code node can be defined at any stage of the process flow diagram. It does not require an input data set defined in the Input Data Source node.
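The nine-versus-six figure in the Control Point example above follows from simple counting: without a control point every source connects to every model, and with one each node needs only a single link to the control point. The sketch below just spells out that arithmetic. The function names are hypothetical and nothing here calls Enterprise Miner.

```python
def connections_without_control_point(n_sources, n_models):
    """Every source node is wired directly to every modeling node."""
    return n_sources * n_models


def connections_with_control_point(n_sources, n_models):
    """Each source and each modeling node has one link to the control point."""
    return n_sources + n_models


# The case from the text: three Input Data Source nodes, three modeling nodes.
direct = connections_without_control_point(3, 3)
via_control_point = connections_with_control_point(3, 3)
print(direct, via_control_point)
```

The saving grows with diagram size, since the direct count is a product and the control-point count is a sum. With two sources and two models both approaches need four connections, so the control point only pays off for larger flows.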
Real Case: Telecom Network Fault Prediction
Presented by Dr Yuen in SAS Seminars and Workshop, February 2001

Problem Formulation: Overview
- Messages about network performance are generated from transmission stations
- Messages are examined manually
- Messages are classified as urgent fault or non-urgent fault
- Goal: to build a model to predict whether a received message signals an urgent fault or not

Problem Formulation: The Data
- 5,924 past messages were collected
- Each message contains 1,082 variables
- Each message was examined manually
- The decision "Urgent" or "Non-Urgent" was set as the target variable
- Urgent case = "True"
- Non-Urgent case = "Null"

Problem Formulation: Distribution of the Target Variable
[Figure: distribution of the target variable, categories Null and True]

Problem Formulation: Selection of Cases
- Use the Sampling node of Enterprise Miner (EM) to select a sample

Variable Selection
- Using all of the variables in the model is not practical
- It is impractical to examine the associations between the target variable and the other input variables manually
- The Tree node and the Variable Selection node of Enterprise Miner were employed
Variable Selection
[Figure: Some results from Tree1]

Model Development
- Data are partitioned into three parts: Training (50%), Validation (25%), Testing (25%)
- Neural Network: a total of 23 variables are selected as input

Model Development: Process flow
- Model Manager: based on some criteria, choose a neural network model with the most predictive power

Implementing the Model
An incoming signal has predicted Prob(target variable = "True") = p.
- Class 1 (p cutoff 0.5): send technician
- else Class 2 (p cutoff 0.15): examine the signal manually
- else Class 3: ignore the signal
Benefits: saving in manpower; faster response time to problems

Real Case: Safeway UK
1. Find a certain low-selling yogurt product (rank 209)
2. It was purchased by the top-spending 25% of its customers
3. Identify items that sell well together and adjust in-store promotions, store layout and special offers to generate more profits
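For the telecom fault-prediction case above, the three-way routing of an incoming signal can be sketched as a small function. The slide gives the cutoffs 0.5 and 0.15 but the comparison operators did not survive transcription, so two things here are assumptions: that a signal belongs to the higher class when p is at or above the cutoff, and that the three actions map to the three classes in the order listed. The function name `route_signal` is hypothetical.

```python
def route_signal(p, high=0.5, low=0.15):
    """Map a predicted probability of an urgent fault to an action.

    ASSUMPTION: ">=" comparisons and the class-to-action order are
    inferred, not stated in the source; only the cutoff values are given.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if p >= high:
        return "send technician"              # Class 1
    if p >= low:
        return "examine the signal manually"  # Class 2
    return "ignore the signal"                # Class 3


# Made-up predicted probabilities for three incoming messages.
for p in (0.82, 0.30, 0.05):
    print(p, "->", route_signal(p))
```

Under these assumptions only the middle band of signals still needs a manual look, which is where the stated manpower saving would come from.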
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
M15_BERE8380_12_SE_C15.7.qxd 2/21/11 3:59 PM Page 1. 15.7 Analytics and Data Mining 1
M15_BERE8380_12_SE_C15.7.qxd 2/21/11 3:59 PM Page 1 15.7 Analytics and Data Mining 15.7 Analytics and Data Mining 1 Section 1.5 noted that advances in computing processing during the past 40 years have
A Hybrid Modeling Platform to meet Basel II Requirements in Banking Jeffery Morrision, SunTrust Bank, Inc.
A Hybrid Modeling Platform to meet Basel II Requirements in Banking Jeffery Morrision, SunTrust Bank, Inc. Introduction: The Basel Capital Accord, ready for implementation in force around 2006, sets out
CART 6.0 Feature Matrix
CART 6.0 Feature Matri Enhanced Descriptive Statistics Full summary statistics Brief summary statistics Stratified summary statistics Charts and histograms Improved User Interface New setup activity window
The Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
Application of SAS! Enterprise Miner in Credit Risk Analytics. Presented by Minakshi Srivastava, VP, Bank of America
Application of SAS! Enterprise Miner in Credit Risk Analytics Presented by Minakshi Srivastava, VP, Bank of America 1 Table of Contents Credit Risk Analytics Overview Journey from DATA to DECISIONS Exploratory
Data Mining Applications in Higher Education
Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2
Université de Montpellier 2 Hugo Alatrista-Salas : [email protected]
Université de Montpellier 2 Hugo Alatrista-Salas : [email protected] WEKA Gallirallus Zeland) australis : Endemic bird (New Characteristics Waikato university Weka is a collection
Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs
1.1 Introduction Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs For brevity, the Lavastorm Analytics Library (LAL) Predictive and Statistical Analytics Node Pack will be
Predictive Modeling in Workers Compensation 2008 CAS Ratemaking Seminar
Predictive Modeling in Workers Compensation 2008 CAS Ratemaking Seminar Prepared by Louise Francis, FCAS, MAAA Francis Analytics and Actuarial Data Mining, Inc. www.data-mines.com [email protected]
CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19
PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations
Decision Trees What Are They?
Decision Trees What Are They? Introduction...1 Using Decision Trees with Other Modeling Approaches...5 Why Are Decision Trees So Useful?...8 Level of Measurement... 11 Introduction Decision trees are a
Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management
Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management Paper Jean-Louis Amat Abstract One of the main issues of operators
Product recommendations and promotions (couponing and discounts) Cross-sell and Upsell strategies
WHITEPAPER Today, leading companies are looking to improve business performance via faster, better decision making by applying advanced predictive modeling to their vast and growing volumes of data. Business
Enhancing Compliance with Predictive Analytics
Enhancing Compliance with Predictive Analytics FTA 2007 Revenue Estimation and Research Conference Reid Linn Tennessee Department of Revenue [email protected] Sifting through a Gold Mine of Tax Data
A Demonstration of Hierarchical Clustering
Recitation Supplement: Hierarchical Clustering and Principal Component Analysis in SAS November 18, 2002 The Methods In addition to K-means clustering, SAS provides several other types of unsupervised
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Data Mining from A to Z: Better Insights, New Opportunities WHITE PAPER
Data Mining from A to Z: Better Insights, New Opportunities WHITE PAPER SAS White Paper Table of Contents Introduction.... 1 How Do Predictive Analytics and Data Mining Work?.... 2 The Data Mining Process....
SAS VISUAL ANALYTICS AN OVERVIEW OF POWERFUL DISCOVERY, ANALYSIS AND REPORTING
SAS VISUAL ANALYTICS AN OVERVIEW OF POWERFUL DISCOVERY, ANALYSIS AND REPORTING WELCOME TO SAS VISUAL ANALYTICS SAS Visual Analytics is a high-performance, in-memory solution for exploring massive amounts
How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK
How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK Agenda Analytics why now? The process around data and text mining Case Studies The Value of Information
Data Mining. Knowledge Discovery, Data Warehousing and Machine Learning Final remarks. Lecturer: JERZY STEFANOWSKI
Data Mining Knowledge Discovery, Data Warehousing and Machine Learning Final remarks Lecturer: JERZY STEFANOWSKI Email: [email protected] Data Mining a step in A KDD Process Data mining:
An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century
An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century Nora Galambos, PhD Senior Data Scientist Office of Institutional Research, Planning & Effectiveness Stony Brook University AIRPO
Regression Clustering
Chapter 449 Introduction This algorithm provides for clustering in the multiple regression setting in which you have a dependent variable Y and one or more independent variables, the X s. The algorithm
Make Better Decisions Through Predictive Intelligence
IBM SPSS Modeler Professional Make Better Decisions Through Predictive Intelligence Highlights Easily access, prepare and model structured data with this intuitive, visual data mining workbench Rapidly
How To Use Data Mining For Loyalty Based Management
Data Mining for Loyalty Based Management Petra Hunziker, Andreas Maier, Alex Nippe, Markus Tresch, Douglas Weers, Peter Zemp Credit Suisse P.O. Box 100, CH - 8070 Zurich, Switzerland [email protected],
Customer Analytics. Turn Big Data into Big Value
Turn Big Data into Big Value All Your Data Integrated in Just One Place BIRT Analytics lets you capture the value of Big Data that speeds right by most enterprises. It analyzes massive volumes of data
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction
Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.
The Basics of SAS Enterprise Miner 5.2
The Basics of SAS Enterprise Miner 5.2 1.1 Introduction to Data Mining...1 1.2 Introduction to SAS Enterprise Miner 5.2...4 1.3 Exploring the Data Set... 14 1.4 Analyzing a Sample Data Set... 19 1.5 Presenting
BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts
BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an
Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data
Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable
Introduction Predictive Analytics Tools: Weka
Introduction Predictive Analytics Tools: Weka Predictive Analytics Center of Excellence San Diego Supercomputer Center University of California, San Diego Tools Landscape Considerations Scale User Interface
Quick Start. Creating a Scoring Application. RStat. Based on a Decision Tree Model
Creating a Scoring Application Based on a Decision Tree Model This Quick Start guides you through creating a credit-scoring application in eight easy steps. Quick Start Century Corp., an electronics retailer,
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through
Easily Identify Your Best Customers
IBM SPSS Statistics Easily Identify Your Best Customers Use IBM SPSS predictive analytics software to gain insight from your customer database Contents: 1 Introduction 2 Exploring customer data Where do
MS1b Statistical Data Mining
MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to
Modeling Lifetime Value in the Insurance Industry
Modeling Lifetime Value in the Insurance Industry C. Olivia Parr Rud, Executive Vice President, Data Square, LLC ABSTRACT Acquisition modeling for direct mail insurance has the unique challenge of targeting
White Paper. Redefine Your Analytics Journey With Self-Service Data Discovery and Interactive Predictive Analytics
White Paper Redefine Your Analytics Journey With Self-Service Data Discovery and Interactive Predictive Analytics Contents Self-service data discovery and interactive predictive analytics... 1 What does
Getting Started with SAS Enterprise Miner 7.1
Getting Started with SAS Enterprise Miner 7.1 SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc 2011. Getting Started with SAS Enterprise Miner 7.1.
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant
Reevaluating Policy and Claims Analytics: a Case of Non-Fleet Customers In Automobile Insurance Industry
Paper 1808-2014 Reevaluating Policy and Claims Analytics: a Case of Non-Fleet Customers In Automobile Insurance Industry Kittipong Trongsawad and Jongsawas Chongwatpol NIDA Business School, National Institute
IBM SPSS Neural Networks 22
IBM SPSS Neural Networks 22 Note Before using this information and the product it supports, read the information in Notices on page 21. Product Information This edition applies to version 22, release 0,
Paper AA-08-2015. Get the highest bangs for your marketing bucks using Incremental Response Models in SAS Enterprise Miner TM
Paper AA-08-2015 Get the highest bangs for your marketing bucks using Incremental Response Models in SAS Enterprise Miner TM Delali Agbenyegah, Alliance Data Systems, Columbus, Ohio 0.0 ABSTRACT Traditional
GeoGebra Statistics and Probability
GeoGebra Statistics and Probability Project Maths Development Team 2013 www.projectmaths.ie Page 1 of 24 Index Activity Topic Page 1 Introduction GeoGebra Statistics 3 2 To calculate the Sum, Mean, Count,
International Journal of Computer Trends and Technology (IJCTT) volume 4 Issue 8 August 2013
A Short-Term Traffic Prediction On A Distributed Network Using Multiple Regression Equation Ms.Sharmi.S 1 Research Scholar, MS University,Thirunelvelli Dr.M.Punithavalli Director, SREC,Coimbatore. Abstract:
Agenda. Mathias Lanner Sas Institute. Predictive Modeling Applications. Predictive Modeling Training Data. Beslutsträd och andra prediktiva modeller
Agenda Introduktion till Prediktiva modeller Beslutsträd Beslutsträd och andra prediktiva modeller Mathias Lanner Sas Institute Pruning Regressioner Neurala Nätverk Utvärdering av modeller 2 Predictive
Business Analytics Using SAS Enterprise Guide and SAS Enterprise Miner A Beginner s Guide
Business Analytics Using SAS Enterprise Guide and SAS Enterprise Miner A Beginner s Guide Olivia Parr-Rud From Business Analytics Using SAS Enterprise Guide and SAS Enterprise Miner. Full book available
Polynomial Neural Network Discovery Client User Guide
Polynomial Neural Network Discovery Client User Guide Version 1.3 Table of contents Table of contents...2 1. Introduction...3 1.1 Overview...3 1.2 PNN algorithm principles...3 1.3 Additional criteria...3
IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA
CALIFORNIA STATE UNIVERSITY, LOS ANGELES INFORMATION TECHNOLOGY SERVICES IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA Summer 2013, Version 2.0 Table of Contents Introduction...2 Downloading the
The Science and Art of Market Segmentation Using PROC FASTCLUS Mark E. Thompson, Forefront Economics Inc, Beaverton, Oregon
The Science and Art of Market Segmentation Using PROC FASTCLUS Mark E. Thompson, Forefront Economics Inc, Beaverton, Oregon ABSTRACT Effective business development strategies often begin with market segmentation,
A New Approach for Evaluation of Data Mining Techniques
181 A New Approach for Evaluation of Data Mining s Moawia Elfaki Yahia 1, Murtada El-mukashfi El-taher 2 1 College of Computer Science and IT King Faisal University Saudi Arabia, Alhasa 31982 2 Faculty
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
