Agenda. Mathias Lanner Sas Institute. Predictive Modeling Applications. Predictive Modeling Training Data. Beslutsträd och andra prediktiva modeller
|
|
- Felicity Randall
- 8 years ago
- Views:
Transcription
1 Agenda Introduktion till Prediktiva modeller Beslutsträd Beslutsträd och andra prediktiva modeller Mathias Lanner Sas Institute Pruning Regressioner Neurala Nätverk Utvärdering av modeller 2 Predictive Modeling Applications Predictive Modeling Database marketing Financial risk management Fraud detection Process monitoring Pattern detection 1: inputs 2: inputs 3: inputs 4: inputs 5: inputs Numeric or categorical values 3 4
2 Predictive Modeling Score Data 1: inputs 2: inputs 3: inputs 4: inputs 5: inputs Predictions 1: inputs 2: inputs 3: inputs 4: inputs 5: inputs Predictions Score Data 1: inputs 2: inputs 3: inputs 4: inputs 5: inputs Only input values known Score Data 1: inputs 2: inputs 3: inputs 4: inputs 5: inputs 5 6 Predictions Predictive Modeling Essentials 1: inputs 2: inputs 3: inputs 4: inputs 5: inputs Score Data 1: inputs 2: inputs 3: inputs 4: inputs 5: inputs Predictions Predict s 7 8
3 Predictive Modeling Essentials Three Prediction Types Predict s 1: inputs 2: inputs 3: inputs 4: inputs 5: inputs Predictions Decisions Rankings Estimates 9 10 Decision Predictions Ranking Predictions 1: inputs 2: inputs 3: inputs 4: inputs 5: inputs Decisions primary secondary tertiary primary secondary Trained model uses input measurements to make best decision for each. 1: inputs 2: inputs 3: inputs 4: inputs 5: inputs Rankings Trained model uses input measurements to optimally rank each
4 Estimate Predictions Model Essentials Predict Review 1: inputs 2: inputs 3: inputs 4: inputs 5: inputs Estimates Trained model uses input measurements to optimally estimate value. Predict s Decide, rank, estimate Model Essentials Select Review Curse of Dimensionality Predict s 1 D 2 D 3 D 15 16
5 Input Selection Model Essentials Select Review Redundancy Irrelevancy Predict s Decide, rank, estimate Eradicate redundancies irrelevancies Model Essentials Optimize Fool s Gold Predict s My model fits the training data perfectly... I ve struck it rich! 19 20
6 Model Complexity Data Splitting Too flexible Not flexible enough Role Validation Data Role 1: inputs 2: inputs 3: inputs 4: inputs 5: inputs Validation Data 1: inputs 2: inputs 3: inputs 4: inputs 5: inputs Training data gives sequence of predictive models with increasing complexity. 1: inputs 2: inputs 3: inputs 4: inputs 5: inputs Validation Data 1: inputs 2: inputs 3: inputs 4: inputs 5: inputs Validation data helps select best model from sequence 23 24
7 Validation Data Role Model Essentials Optimize 1: inputs 2: inputs 3: inputs 4: inputs 5: inputs Validation Data 1: inputs 2: inputs 3: inputs 4: inputs 5: inputs Validation data helps select best model from Sequence. Predict s Decide, rank, estimate Eradicate redundancies irrelevancies Tune models with validation data Agenda Predictive Modeling Tools Introduktion till Prediktiva modeller Beslutsträd Primary Decision Tree Regression Neural Network Pruning Regressioner Specialty Dmine Regression MBR AutoNeural Neurala Nätverk Utvärdering av modeller Rule Induction DMNeural Multiple Model Ensemble Two Stage 27 28
8 Predictive Modeling Tools Predictive Modeling Tools Primary Decision Tree Regression Neural Network Primary Decision Tree Regression Neural Network Specialty Dmine Regression MBR AutoNeural Specialty Dmine Regression MBR AutoNeural Rule Induction DMNeural Rule Induction DMNeural Multiple Model Ensemble Two Stage Multiple Model Ensemble Two Stage Model Essentials Decision Trees Simple Prediction Illustration Predict s Prediction rules Split search Pruning Analysis goal: Predict the color of a dot based on its location in a scatter plot
9 Model Essentials Decision Trees Decision Tree Prediction Rules Predict s Prediction rules Split search Pruning 40% leaf node root node < interior node < < % 70% 55% Decision Tree Prediction Rules Decision Tree Prediction Rules Decision = Estimate = % < < < % 70% 55% % < < < % 70% 55%
10 Model Essentials Decision Trees Predict s Prediction rules Demo Beslutsträd Split search Pruning Agenda Model Essentials Decision Trees Introduktion till Prediktiva modeller Beslutsträd Predict s Prediction rules Pruning Regressioner Split search Neurala Nätverk Utvärdering av modeller Pruning 39 40
11 Binary Targets Decision Assessment 1: inputs 2: inputs 3: inputs 4: inputs 5: inputs =1 =0 =0 =1 =0 primary outcome secondary outcome 1: inputs 2: inputs 3: inputs 4: inputs 5: inputs =1 =0 =0 =1 =0 Decisions primary secondary primary secondary secondary Accuracy/Profit true positive true negative Focus on correct decisions Decision Assessment (for Pessimists) Ranking Assessment 1: inputs 2: inputs 3: inputs 4: inputs 5: inputs =1 =0 =0 =1 =0 Decisions primary secondary primary secondary secondary Misclassification/Loss false positive false negative 1: inputs =1 2: inputs =0 3: inputs =0 4: inputs =1 5: inputs =0 Rankings Concordance rank(=1) > rank(=0) Focus on incorrect decisions Focus on correct ordering 43 44
12 Ranking Assessment (for Pessimists) Estimate Assessment (only Pessimistic!) 1: inputs =1 2: inputs =0 3: inputs =0 4: inputs =1 5: inputs =0 Rankings Discordance rank(=1) < rank(=0) 1: inputs =1 2: inputs =0 3: inputs =0 4: inputs =1 5: inputs =0 Estimates Squared Error (-estimate) 2 Focus on incorrect ordering Focus on incorrect estimation Predictive Modeling Assessments Optimistic Assessment, Pessimistic Stats 42% Misclassification D 1 D 0 R r Decisions Rankings Accuracy Misclassification Concordance Discordance 40% 38% 45% 40% 35% 0.26 Discordance Squared Error Increasing leaf count improves assessment measures on training data p 1 Estimates Squared Error
13 Unbiased Assessment 42% 40% 38% Misclassification Demo på Pruning 45% 40% 35% Discordance Squared Error Increasing leaf count might worsen assessment measures on validation data Model Essentials Regressions Model Essentials Regressions Predict s Prediction formula Predict s Prediction formula Sequential selection Sequential selection Optimal sequence model Optimal sequence model 51 52
14 Linear Regression Prediction Formula Logistic Regression Prediction Formula intercept estimate input measurement y ^ = w^ 0 + w^ 1 + w^ 2 parameter estimate estimate log p^ 1 ^p ( ) = w^ 0 + w^ 1 + w^ 2 logit scores Choose intercept and parameter estimates to minimize. squared error function ( y i y ^ i ) 2 training data Choose intercept and parameter estimates to maximize. log-likelihood function log(p ^ i ) + log(1 ^ p i ) primary outcome training s secondary outcome training s Logit Link Function Simple Prediction Illustration Regressions log 5 p^ 1 ^p ( ) logit link function = w^ 0 + w^ 1 + w^ 2 doubling amount x i logit scores consequence 1 odds exp(w i ) 0.69 odds 2 w i Model interpretation odds ratio ^ p = logit equation logit( p ^) = w^ 0 + w^ 1 + w^ e -logit( p ^ ) logistic equation
15 Beyond the Prediction Formula Missing Values and Regression Modeling Missing values Inputs Extreme or unusual values C A B Non-numeric inputs Cases Nonlinearity and Non-additivity Missing Values and Regression Modeling Missing Values and the Prediction Formula Inputs Prediction Formula: logit( p )= Cases New Case: (,, ) = ( 2,,-1 ) Predicted Value: logit( p )=
16 Missing Value Causes Missing Value Remedies N/A Not applicable N/A Not applicable Synthetic distribution No match Non-disclosure No match Non-disclosure Estimation x i = f(,,x p ) Model Essentials Regressions Sequential Selection Forward Predict s Prediction formula Input p-value Entry Cutoff Sequential selection Optimal sequence model 63 64
17 Sequential Selection Forward Sequential Selection Backward Input p-value Entry Cutoff Input p-value Stay Cutoff Sequential Selection Backward Sequential Selection Stepwise Input p-value Stay Cutoff Input p-value Entry Cutoff Stay Cutoff 67 68
18 Sequential Selection Stepwise Model Essentials Regressions Input p-value Entry Cutoff Predict s Prediction formula Stay Cutoff Sequential selection Optimal sequence model Model Fit versus Complexity Model fit statistic Select Model with Optimal Validation Fit Model fit statistic validation training Evaluate each sequence step. Choose simplest optimal model. Evaluate each sequence step. Choose simplest optimal model
19 Agenda Model Essentials Neural Networks Introduktion till Prediktiva modeller Beslutsträd Predict s Prediction formula Pruning Regressioner None Neurala Nätverk Utvärdering av modeller Stopped training Model Essentials Neural Networks Neural Network Prediction Formula Predict s Prediction formula estimate weight estimate hidden unit ^ ^ ^ ^ ^ y = w 00 + w 01 H 1 + w 02 H 2 + w 03 H 3 bias estimate None Stopped training tanh 5 H 1 = tanh(w^ 10 + w^ 11 + w^ 12 ) H 2 = tanh(w^ 20 + w^ 21 + w^ 22 ) H 3 = tanh(w^ 30 + w^ 31 + w^ 32 ) activation function 75 76
20 Neural Network Binary Prediction Formula Neural Network Diagram ^ p log = w^ 00 + w^ 01 H 1 + w^ 02 H 2 + w^ ( 03 H 1 p ^ ) 3 logit link function H 1 = tanh(w^ 10 + w^ 11 + w^ 12 ) H 2 = tanh(w^ 20 + w^ 21 + w^ 22 ) H 3 = tanh(w^ 30 + w^ 31 + w^ 32 ) input layer ^ p log = w^ 00 + w^ 01 H 1 + w^ 02 H 2 + w^ ( 03 H 1 p ^ ) 3 H 1 H 2 H 3 hidden layer y layer H 1 = tanh(w^ 10 + w^ 11 + w^ 12 ) H 2 = tanh(w^ 20 + w^ 21 + w^ 22 ) H 3 = tanh(w^ 30 + w^ 31 + w^ 32 ) Prediction Illustration Neural Networks Agenda logit( p ^) = w^ 00 +w^ 01 H 1 +w^ 02 H 2 +w^ 03 H 3 ^ p = logit equation H 1 = tanh( w^ 10 +w^ 11 +w^ 12 ) H 2 = tanh( w^ 20 +w^ 21 +w^ 22 ) H 3 = tanh( w^ 30 +w^ 31 +w^ 32 ) e -logit( p ^ ) logistic equation Introduktion till Prediktiva modeller Beslutsträd Pruning Regressioner Neurala Nätverk Utvärdering av modeller 79 80
21 Assessment Types Summary Statistics Summary The Model Comparison tool provides KS C ASE Summary statistics Statistical graphics Prediction Type Decisions 1,2,3, Rankings Statistic Accuracy / Misclassification Profit / Loss KS-statistic ROC Index (concordance) Gini coefficient ^ p E(Y) Estimates Average squared error SBC / Likelihood Summary Statistics Summary Summary Statistics Summary Prediction Type Statistic Prediction Type Statistic Decisions Accuracy / Misclassification Profit / Loss KS-statistic Decisions Accuracy / Misclassification Profit / Loss KS-statistic 1,2,3, Rankings ROC Index (concordance) Gini coefficient 1,2,3, Rankings ROC Index (concordance) Gini coefficient ^ p E(Y) Estimates Average squared error SBC / Likelihood ^ p E(Y) Estimates Average squared error SBC / Likelihood 83 84
22 Statistical Graphics Summary Prediction Type Statistic Decisions 1,2,3, ^ p E(Y) Rankings Estimates Sensitivity charts Response rate charts 85
Data Mining Techniques Chapter 6: Decision Trees
Data Mining Techniques Chapter 6: Decision Trees What is a classification decision tree?.......................................... 2 Visualizing decision trees...................................................
More informationInternet Gambling Behavioral Markers: Using the Power of SAS Enterprise Miner 12.1 to Predict High-Risk Internet Gamblers
Paper 1863-2014 Internet Gambling Behavioral Markers: Using the Power of SAS Enterprise Miner 12.1 to Predict High-Risk Internet Gamblers Sai Vijay Kishore Movva, Vandana Reddy and Dr. Goutam Chakraborty;
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More informationEnhancing Compliance with Predictive Analytics
Enhancing Compliance with Predictive Analytics FTA 2007 Revenue Estimation and Research Conference Reid Linn Tennessee Department of Revenue reid.linn@state.tn.us Sifting through a Gold Mine of Tax Data
More informationA Property & Casualty Insurance Predictive Modeling Process in SAS
Paper AA-02-2015 A Property & Casualty Insurance Predictive Modeling Process in SAS 1.0 ABSTRACT Mei Najim, Sedgwick Claim Management Services, Chicago, Illinois Predictive analytics has been developing
More informationData Mining Using SAS Enterprise Miner Randall Matignon, Piedmont, CA
Data Mining Using SAS Enterprise Miner Randall Matignon, Piedmont, CA An Overview of SAS Enterprise Miner The following article is in regards to Enterprise Miner v.4.3 that is available in SAS v9.1.3.
More informationPredictive Modeling of Titanic Survivors: a Learning Competition
SAS Analytics Day Predictive Modeling of Titanic Survivors: a Learning Competition Linda Schumacher Problem Introduction On April 15, 1912, the RMS Titanic sank resulting in the loss of 1502 out of 2224
More informationImproving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP
Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation
More informationNew Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction
Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.
More informationASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS
DATABASE MARKETING Fall 2015, max 24 credits Dead line 15.10. ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS PART A Gains chart with excel Prepare a gains chart from the data in \\work\courses\e\27\e20100\ass4b.xls.
More informationDeveloping Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@
Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@ Yanchun Xu, Andrius Kubilius Joint Commission on Accreditation of Healthcare Organizations,
More information!"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"
!"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"!"#"$%&#'()*+',$$-.&#',/"-0%.12'32./4'5,5'6/%&)$).2&'7./&)8'5,5'9/2%.%3%&8':")08';:
More informationA fast, powerful data mining workbench designed for small to midsize organizations
FACT SHEET SAS Desktop Data Mining for Midsize Business A fast, powerful data mining workbench designed for small to midsize organizations What does SAS Desktop Data Mining for Midsize Business do? Business
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
More informationECLT5810 E-Commerce Data Mining Technique SAS Enterprise Miner -- Regression Model I. Regression Node
Enterprise Miner - Regression 1 ECLT5810 E-Commerce Data Mining Technique SAS Enterprise Miner -- Regression Model I. Regression Node 1. Some background: Linear attempts to predict the value of a continuous
More information11. Analysis of Case-control Studies Logistic Regression
Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More informationA Property and Casualty Insurance Predictive Modeling Process in SAS
Paper 11422-2016 A Property and Casualty Insurance Predictive Modeling Process in SAS Mei Najim, Sedgwick Claim Management Services ABSTRACT Predictive analytics is an area that has been developing rapidly
More informationGerry Hobbs, Department of Statistics, West Virginia University
Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit
More informationOverview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)
Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and
More informationLecture 10: Regression Trees
Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,
More informationChapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
More informationAn Overview and Evaluation of Decision Tree Methodology
An Overview and Evaluation of Decision Tree Methodology ASA Quality and Productivity Conference Terri Moore Motorola Austin, TX terri.moore@motorola.com Carole Jesse Cargill, Inc. Wayzata, MN carole_jesse@cargill.com
More informationChapter 7: Simple linear regression Learning Objectives
Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -
More informationLocal classification and local likelihoods
Local classification and local likelihoods November 18 k-nearest neighbors The idea of local regression can be extended to classification as well The simplest way of doing so is called nearest neighbor
More informationDeveloping Credit Scorecards Using Credit Scoring for SAS Enterprise Miner TM 12.1
Developing Credit Scorecards Using Credit Scoring for SAS Enterprise Miner TM 12.1 SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2012. Developing
More informationA Comparison of Decision Tree and Logistic Regression Model Xianzhe Chen, North Dakota State University, Fargo, ND
Paper D02-2009 A Comparison of Decision Tree and Logistic Regression Model Xianzhe Chen, North Dakota State University, Fargo, ND ABSTRACT This paper applies a decision tree model and logistic regression
More informationStatistics in Retail Finance. Chapter 2: Statistical models of default
Statistics in Retail Finance 1 Overview > We consider how to build statistical models of default, or delinquency, and how such models are traditionally used for credit application scoring and decision
More informationWhat is Data Mining? MS4424 Data Mining & Modelling. MS4424 Data Mining & Modelling. MS4424 Data Mining & Modelling. MS4424 Data Mining & Modelling
MS4424 Data Mining & Modelling MS4424 Data Mining & Modelling Lecturer : Dr Iris Yeung Room No : P7509 Tel No : 2788 8566 Email : msiris@cityu.edu.hk 1 Aims To introduce the basic concepts of data mining
More informationTHE HYBRID CART-LOGIT MODEL IN CLASSIFICATION AND DATA MINING. Dan Steinberg and N. Scott Cardell
THE HYBID CAT-LOGIT MODEL IN CLASSIFICATION AND DATA MINING Introduction Dan Steinberg and N. Scott Cardell Most data-mining projects involve classification problems assigning objects to classes whether
More informationPredictive Data Mining in Very Large Data Sets: A Demonstration and Comparison Under Model Ensemble
Predictive Data Mining in Very Large Data Sets: A Demonstration and Comparison Under Model Ensemble Dr. Hongwei Patrick Yang Educational Policy Studies & Evaluation College of Education University of Kentucky
More informationApplied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification
More informationAn Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
More informationSupervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
More informationFree Trial - BIRT Analytics - IAAs
Free Trial - BIRT Analytics - IAAs 11. Predict Customer Gender Once we log in to BIRT Analytics Free Trial we would see that we have some predefined advanced analysis ready to be used. Those saved analysis
More informationIndiana State Core Curriculum Standards updated 2009 Algebra I
Indiana State Core Curriculum Standards updated 2009 Algebra I Strand Description Boardworks High School Algebra presentations Operations With Real Numbers Linear Equations and A1.1 Students simplify and
More informationData Mining Classification: Decision Trees
Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous
More informationAdequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection
Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics
More informationClass #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris
Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines
More informationDECISION TREE ANALYSIS: PREDICTION OF SERIOUS TRAFFIC OFFENDING
DECISION TREE ANALYSIS: PREDICTION OF SERIOUS TRAFFIC OFFENDING ABSTRACT The objective was to predict whether an offender would commit a traffic offence involving death, using decision tree analysis. Four
More informationAPPLICATION PROGRAMMING: DATA MINING AND DATA WAREHOUSING
Wrocław University of Technology Internet Engineering Henryk Maciejewski APPLICATION PROGRAMMING: DATA MINING AND DATA WAREHOUSING PRACTICAL GUIDE Wrocław (2011) 1 Copyright by Wrocław University of Technology
More informationLeveraging Ensemble Models in SAS Enterprise Miner
ABSTRACT Paper SAS133-2014 Leveraging Ensemble Models in SAS Enterprise Miner Miguel Maldonado, Jared Dean, Wendy Czika, and Susan Haller SAS Institute Inc. Ensemble models combine two or more models to
More informationRole of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign
Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign Arun K Mandapaka, Amit Singh Kushwah, Dr.Goutam Chakraborty Oklahoma State University, OK, USA ABSTRACT Direct
More informationPredictive Dynamix Inc
Predictive Modeling Technology Predictive modeling is concerned with analyzing patterns and trends in historical and operational data in order to transform data into actionable decisions. This is accomplished
More informationUsing multiple models: Bagging, Boosting, Ensembles, Forests
Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or
More informationData Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression
Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction
More informationModeling to improve the customer unit target selection for inspections of Commercial Losses in Brazilian Electric Sector - The case CEMIG
Paper 3406-2015 Modeling to improve the customer unit target selection for inspections of Commercial Losses in Brazilian Electric Sector - The case CEMIG Sérgio Henrique Rodrigues Ribeiro, CEMIG; Iguatinan
More informationNominal and ordinal logistic regression
Nominal and ordinal logistic regression April 26 Nominal and ordinal logistic regression Our goal for today is to briefly go over ways to extend the logistic regression model to the case where the outcome
More informationData Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
More informationData Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through
More informationData Mining for Knowledge Management. Classification
1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh
More informationTitle. Introduction to Data Mining. Dr Arulsivanathan Naidoo Statistics South Africa. OECD Conference Cape Town 8-10 December 2010.
Title Introduction to Data Mining Dr Arulsivanathan Naidoo Statistics South Africa OECD Conference Cape Town 8-10 December 2010 1 Outline Introduction Statistics vs Knowledge Discovery Predictive Modeling
More informationQuestion 2 Naïve Bayes (16 points)
Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the
More informationModeling Lifetime Value in the Insurance Industry
Modeling Lifetime Value in the Insurance Industry C. Olivia Parr Rud, Executive Vice President, Data Square, LLC ABSTRACT Acquisition modeling for direct mail insurance has the unique challenge of targeting
More informationPredictive Analytics Modeling Methodology Document
Predictive Analytics Modeling Methodology Document Campaign Response Modeling 17 October- 2012 Version details Version number Date Author Reviewer name 1.0 16 October- 2012 Vikash chandra CONTENTS 1. TRAINING
More informationInsurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.
Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví Pavel Kříž Seminář z aktuárských věd MFF 4. dubna 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics
More informationJetBlue Airways Stock Price Analysis and Prediction
JetBlue Airways Stock Price Analysis and Prediction Team Member: Lulu Liu, Jiaojiao Liu DSO530 Final Project JETBLUE AIRWAYS STOCK PRICE ANALYSIS AND PREDICTION 1 Motivation Started in February 2000, JetBlue
More informationMORTGAGE LENDER PROTECTION UNDER INSURANCE ARRANGEMENTS Irina Genriha Latvian University, Tel. +371 26387099, e-mail: irina.genriha@inbox.
MORTGAGE LENDER PROTECTION UNDER INSURANCE ARRANGEMENTS Irina Genriha Latvian University, Tel. +371 2638799, e-mail: irina.genriha@inbox.lv Since September 28, when the crisis deepened in the first place
More informationUse Data Mining Techniques to Assist Institutions in Achieving Enrollment Goals: A Case Study
Use Data Mining Techniques to Assist Institutions in Achieving Enrollment Goals: A Case Study Tongshan Chang The University of California Office of the President CAIR Conference in Pasadena 11/13/2008
More informationBetter credit models benefit us all
Better credit models benefit us all Agenda Credit Scoring - Overview Random Forest - Overview Random Forest outperform logistic regression for credit scoring out of the box Interaction term hypothesis
More informationSAS ENTERPRISE MINER 5.3
FACT SHEET SAS ENTERPRISE MINER 5.3 Unearthing valuable insight profitable data mining results with less time and effort What does SAS Enterprise Miner do? SAS Enterprise Miner streamlines the data mining
More informationSmart Sell Re-quote project for an Insurance company.
SAS Analytics Day Smart Sell Re-quote project for an Insurance company. A project by Ajay Guyyala Naga Sudhir Lanka Narendra Babu Merla Kiran Reddy Samiullah Bramhanapalli Shaik Business Situation XYZ
More informationAuxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus
Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives
More informationPrediction of Stock Performance Using Analytical Techniques
136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University
More informationLogistic Regression. http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests
Logistic Regression http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests Overview Binary (or binomial) logistic regression is a form of regression which is used when the dependent is a dichotomy
More informationMethods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL
Paper SA01-2012 Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL ABSTRACT Analysts typically consider combinations
More informationData Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
More informationSimple Methods and Procedures Used in Forecasting
Simple Methods and Procedures Used in Forecasting The project prepared by : Sven Gingelmaier Michael Richter Under direction of the Maria Jadamus-Hacura What Is Forecasting? Prediction of future events
More informationS03-2008 The Difference Between Predictive Modeling and Regression Patricia B. Cerrito, University of Louisville, Louisville, KY
S03-2008 The Difference Between Predictive Modeling and Regression Patricia B. Cerrito, University of Louisville, Louisville, KY ABSTRACT Predictive modeling includes regression, both logistic and linear,
More informationDetecting Email Spam. MGS 8040, Data Mining. Audrey Gies Matt Labbe Tatiana Restrepo
Detecting Email Spam MGS 8040, Data Mining Audrey Gies Matt Labbe Tatiana Restrepo 5 December 2011 INTRODUCTION This report describes a model that may be used to improve likelihood of recognizing undesirable
More informationIdentifying and Overcoming Common Data Mining Mistakes Doug Wielenga, SAS Institute Inc., Cary, NC
Identifying and Overcoming Common Data Mining Mistakes Doug Wielenga, SAS Institute Inc., Cary, NC ABSTRACT Due to the large amount of data typically involved, data mining analyses can exacerbate some
More informationWeight of Evidence Module
Formula Guide The purpose of the Weight of Evidence (WoE) module is to provide flexible tools to recode the values in continuous and categorical predictor variables into discrete categories automatically,
More informationSection 6: Model Selection, Logistic Regression and more...
Section 6: Model Selection, Logistic Regression and more... Carlos M. Carvalho The University of Texas McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Model Building
More informationLogistic Regression (1/24/13)
STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used
More informationData Mining: A Magic Technology for College Recruitment. Tongshan Chang, Ed.D.
Data Mining: A Magic Technology for College Recruitment Tongshan Chang, Ed.D. Principal Administrative Analyst Admissions Research and Evaluation The University of California Office of the President Tongshan.Chang@ucop.edu
More informationCopyright 2006, SAS Institute Inc. All rights reserved. Predictive Modeling using SAS
Predictive Modeling using SAS Purpose of Predictive Modeling To Predict the Future x To identify statistically significant attributes or risk factors x To publish findings in Science, Nature, or the New
More informationVI. Introduction to Logistic Regression
VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models
More informationEXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d.
EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER ANALYTICS LIFECYCLE Evaluate & Monitor Model Formulate Problem Data Preparation Deploy Model Data Exploration Validate Models
More informationArtificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier
International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing
More informationX X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)
CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.
More informationCollege Tuition: Data mining and analysis
CS105 College Tuition: Data mining and analysis By Jeanette Chu & Khiem Tran 4/28/2010 Introduction College tuition issues are steadily increasing every year. According to the college pricing trends report
More information1/27/2013. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2
PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 Introduce moderated multiple regression Continuous predictor continuous predictor Continuous predictor categorical predictor Understand
More informationData Mining mit der JMSL Numerical Library for Java Applications
Data Mining mit der JMSL Numerical Library for Java Applications Stefan Sineux 8. Java Forum Stuttgart 07.07.2005 Agenda Visual Numerics JMSL TM Numerical Library Neuronale Netze (Hintergrund) Demos Neuronale
More informationStat 5303 (Oehlert): Tukey One Degree of Freedom 1
Stat 5303 (Oehlert): Tukey One Degree of Freedom 1 > catch
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for
More informationStudents' Opinion about Universities: The Faculty of Economics and Political Science (Case Study)
Cairo University Faculty of Economics and Political Science Statistics Department English Section Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study) Prepared
More informationAn Overview of Data Mining: Predictive Modeling for IR in the 21 st Century
An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century Nora Galambos, PhD Senior Data Scientist Office of Institutional Research, Planning & Effectiveness Stony Brook University AIRPO
More informationAccurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios
Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios By: Michael Banasiak & By: Daniel Tantum, Ph.D. What Are Statistical Based Behavior Scoring Models And How Are
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationCorrelation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers
Correlation Greg C Elvers What Is Correlation? Correlation is a descriptive statistic that tells you if two variables are related to each other E.g. Is your related to how much you study? When two variables
More informationData Mining Applications in Fund Raising
Data Mining Applications in Fund Raising Nafisseh Heiat Data mining tools make it possible to apply mathematical models to the historical data to manipulate and discover new information. In this study,
More informationBinary Logistic Regression
Binary Logistic Regression Main Effects Model Logistic regression will accept quantitative, binary or categorical predictors and will code the latter two in various ways. Here s a simple model including
More informationThe Operational Value of Social Media Information. Social Media and Customer Interaction
The Operational Value of Social Media Information Dennis J. Zhang (Kellogg School of Management) Ruomeng Cui (Kelley School of Business) Santiago Gallino (Tuck School of Business) Antonio Moreno-Garcia
More information17. SIMPLE LINEAR REGRESSION II
17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.
More informationLinear Classification. Volker Tresp Summer 2015
Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong
More informationCOMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction
COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised
More informationClovis Community College Core Competencies Assessment 2014 2015 Area II: Mathematics Algebra
Core Assessment 2014 2015 Area II: Mathematics Algebra Class: Math 110 College Algebra Faculty: Erin Akhtar (Learning Outcomes Being Measured) 1. Students will construct and analyze graphs and/or data
More informationCONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19
PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations
More informationUsing An Ordered Logistic Regression Model with SAS Vartanian: SW 541
Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541 libname in1 >c:\=; Data first; Set in1.extract; A=1; PROC LOGIST OUTEST=DD MAXITER=100 ORDER=DATA; OUTPUT OUT=CC XBETA=XB P=PROB; MODEL
More information