Decision Trees and Other Predictive Models (Beslutsträd och andra prediktiva modeller). Mathias Lanner, SAS Institute.


Agenda: Introduction to Predictive Models; Decision Trees; Pruning; Regressions; Neural Networks; Model Assessment.
Predictive Modeling Applications: database marketing; financial risk management; fraud detection; process monitoring; pattern detection.
Predictive Modeling Training Data: cases 1 to 5, each with inputs. Inputs are numeric or categorical values.
Predictive Modeling Score Data: cases 1 to 5, each with inputs. Only input values are known.
Predictions: the model supplies a prediction for each case in the score data.
Predictive Modeling Essentials: Predict.
Three Prediction Types: decisions; rankings; estimates.
Decision Predictions: the trained model uses input measurements to make the best decision for each case (primary, secondary, or tertiary in the example).
Ranking Predictions: the trained model uses input measurements to optimally rank each case.
Estimate Predictions: the trained model uses input measurements to optimally estimate a value for each case.
Model Essentials, Predict Review: decide, rank, estimate.
Model Essentials, Select Review.
Curse of Dimensionality: illustrated in 1-D, 2-D, and 3-D.
Input Selection: redundancy; irrelevancy.
Model Essentials, Select Review: Predict (decide, rank, estimate); Select (eradicate redundancies and irrelevancies).
Model Essentials, Optimize.
Fool's Gold: "My model fits the training data perfectly... I've struck it rich!"
Model Complexity: too flexible versus not flexible enough.
Data Splitting: the cases are divided into training data and validation data.
Training Data Role: training data gives a sequence of predictive models with increasing complexity.
Validation Data Role: validation data helps select the best model from the sequence.
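The data-splitting idea can be sketched in a few lines of Python. This is only an illustration: the function name `split_cases`, the 30% validation fraction, and the fixed random seed are assumptions made for the example and do not come from the slides.

```python
import random

def split_cases(cases, validation_fraction=0.5, seed=0):
    """Randomly partition cases into (training, validation) lists.

    The fraction and seed are illustrative choices, not values from the slides.
    """
    rng = random.Random(seed)
    shuffled = list(cases)          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_valid = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]

cases = list(range(1, 11))          # ten hypothetical case identifiers
training, validation = split_cases(cases, validation_fraction=0.3)
```

The training part would be used to fit the sequence of models, and the validation part would be held back to compare them.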
Model Essentials, Optimize: Predict (decide, rank, estimate); Select (eradicate redundancies and irrelevancies); Optimize (tune models with validation data).
Predictive Modeling Tools.
  Primary: Decision Tree; Regression; Neural Network.
  Specialty: Dmine Regression; MBR; AutoNeural; Rule Induction; DMNeural.
  Multiple Model: Ensemble; Two Stage.
Model Essentials, Decision Trees: prediction rules (Predict); split search (Select); pruning (Optimize).
Simple Prediction Illustration. Analysis goal: predict the color of a dot based on its location in a scatter plot.
Decision Tree Prediction Rules: the tree has a root node, interior nodes, and leaf nodes. Each split compares an input with a threshold (<), and each leaf carries a percentage; the leaf values that survive in the transcript are 40%, 55%, and 70%. For a given case, the leaf it reaches supplies both the Decision and the Estimate.
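A decision tree's prediction rules amount to nested threshold tests. The sketch below is hypothetical: the split thresholds (0.5 and 0.6), the input names `x1` and `x2`, and the 0.5 decision cutoff are invented for illustration, and only the leaf values 40%, 55%, and 70% are taken from the slide.

```python
def predict_dot(x1, x2):
    """Return (estimate, decision) for a dot at location (x1, x2).

    Thresholds are hypothetical; leaf estimates reuse the slide's 40%, 55%, 70%.
    """
    if x1 < 0.5:                    # root node split (assumed threshold)
        if x2 < 0.6:                # interior node split (assumed threshold)
            estimate = 0.40         # leaf node
        else:
            estimate = 0.70         # leaf node
    else:
        estimate = 0.55             # leaf node
    # Assumed rule: decide "primary" when the estimate is at least 0.5.
    decision = "primary" if estimate >= 0.5 else "secondary"
    return estimate, decision
```

Every case that lands in the same leaf receives the same estimate and the same decision, which is what makes the rules easy to read off the tree.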
Demo: Decision Trees.
Model Essentials, Decision Trees: prediction rules (Predict); split search (Select); pruning (Optimize).
Binary Targets: each case has inputs and a target of 1 (primary outcome) or 0 (secondary outcome). In the example, cases 1 to 5 have targets 1, 0, 0, 1, 0.
Decision Assessment: the example decisions for cases 1 to 5 are primary, secondary, primary, secondary, secondary. Accuracy/Profit counts true positives and true negatives. Focus on correct decisions.
Decision Assessment (for Pessimists): Misclassification/Loss counts false positives and false negatives. Focus on incorrect decisions.
Ranking Assessment: Concordance, rank(target=1) > rank(target=0). Focus on correct ordering.
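The decision assessment can be worked through with the five example cases from the slide (targets 1, 0, 0, 1, 0 and decisions primary, secondary, primary, secondary, secondary). The helper name `confusion_counts` is an assumption for this sketch.

```python
targets = [1, 0, 0, 1, 0]
decisions = ["primary", "secondary", "primary", "secondary", "secondary"]

def confusion_counts(targets, decisions):
    """Count true/false positives and negatives, treating 'primary' as 1."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for target, decision in zip(targets, decisions):
        predicted = 1 if decision == "primary" else 0
        if predicted == 1 and target == 1:
            counts["tp"] += 1
        elif predicted == 0 and target == 0:
            counts["tn"] += 1
        elif predicted == 1 and target == 0:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts

counts = confusion_counts(targets, decisions)
n = len(targets)
accuracy = (counts["tp"] + counts["tn"]) / n            # correct decisions
misclassification = (counts["fp"] + counts["fn"]) / n   # incorrect decisions
```

Accuracy and misclassification always sum to one, so the optimistic and pessimistic views carry the same information.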
Ranking Assessment (for Pessimists): Discordance, rank(target=1) < rank(target=0). Focus on incorrect ordering.
Estimate Assessment (only Pessimistic!): Squared Error, (target - estimate)^2. Focus on incorrect estimation.
Predictive Modeling Assessments: Decisions (accuracy / misclassification); Rankings (concordance / discordance); Estimates (squared error).
Optimistic Assessment, Pessimistic Stats: [Figure: misclassification, discordance, and squared error plotted against leaf count for the training data.] Increasing leaf count improves assessment measures on training data.
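Concordance, discordance, and squared error can be computed directly from targets and estimates. In this sketch the targets are the slide's five example cases, while the estimates are hypothetical values chosen for illustration; averaging the squared error over cases is also an assumption of the example.

```python
def concordance_discordance(targets, scores):
    """Compare every (target=1, target=0) pair of cases by score."""
    concordant = discordant = tied = 0
    for t1, s1 in zip(targets, scores):
        if t1 != 1:
            continue
        for t0, s0 in zip(targets, scores):
            if t0 != 0:
                continue
            if s1 > s0:
                concordant += 1     # primary case ranked above secondary case
            elif s1 < s0:
                discordant += 1     # primary case ranked below secondary case
            else:
                tied += 1
    pairs = concordant + discordant + tied
    return concordant / pairs, discordant / pairs

def average_squared_error(targets, estimates):
    """Mean of (target - estimate)^2 over all cases."""
    return sum((t - e) ** 2 for t, e in zip(targets, estimates)) / len(targets)

targets = [1, 0, 0, 1, 0]
estimates = [0.70, 0.40, 0.55, 0.55, 0.40]   # hypothetical model estimates

concordance, discordance = concordance_discordance(targets, estimates)
ase = average_squared_error(targets, estimates)
```

Tied pairs count as neither concordant nor discordant here, so the two fractions need not sum to one.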
Unbiased Assessment: [Figure: misclassification, discordance, and squared error plotted against leaf count for the validation data.] Increasing leaf count might worsen assessment measures on validation data.
Demo: Pruning.
Model Essentials, Regressions: prediction formula (Predict); sequential selection (Select); optimal sequence model (Optimize).
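Linear Regression Prediction Formula: $\hat{y} = \hat{w}_0 + \hat{w}_1 x_1 + \hat{w}_2 x_2$, where $\hat{y}$ is the prediction estimate, $\hat{w}_0$ the intercept estimate, $\hat{w}_1, \hat{w}_2$ the parameter estimates, and $x_1, x_2$ the input measurements. Choose the intercept and parameter estimates to minimize the squared error function $\sum_{\text{training data}} (y_i - \hat{y}_i)^2$.
Logistic Regression Prediction Formula: $\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = \hat{w}_0 + \hat{w}_1 x_1 + \hat{w}_2 x_2$; the right-hand side gives the logit scores. Choose the intercept and parameter estimates to maximize the log-likelihood function $\sum_{\text{primary outcome training cases}} \log(\hat{p}_i) + \sum_{\text{secondary outcome training cases}} \log(1-\hat{p}_i)$.
Logit Link Function: $\log\left(\frac{\hat{p}}{1-\hat{p}}\right)$ is the logit link function. Model interpretation: a unit change in $x_i$ multiplies the odds by $\exp(\hat{w}_i)$ (the odds ratio); a change of $0.69/\hat{w}_i$ in $x_i$ multiplies the odds by 2 (the doubling amount).
Simple Prediction Illustration, Regressions: logit equation $\operatorname{logit}(\hat{p}) = \hat{w}_0 + \hat{w}_1 x_1 + \hat{w}_2 x_2$; logistic equation $\hat{p} = \frac{1}{1 + e^{-\operatorname{logit}(\hat{p})}}$.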
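The logistic prediction formula and its interpretation can be checked numerically. This is a minimal sketch; the function names are assumptions for the example, and any weights passed in would be hypothetical rather than estimates from the presentation.

```python
import math

def logit_score(inputs, intercept, weights):
    """Logit equation: w0 + w1*x1 + w2*x2 + ..."""
    return intercept + sum(w * x for w, x in zip(weights, inputs))

def logistic(score):
    """Logistic equation: convert a logit score into a probability."""
    return 1.0 / (1.0 + math.exp(-score))

def odds_ratio(weight):
    """Factor by which the odds change for a unit change in the input."""
    return math.exp(weight)

def doubling_amount(weight):
    """Change in the input that doubles the odds: ln(2)/w, about 0.69/w."""
    return math.log(2.0) / weight
```

Because the odds equal the exponential of the logit score, adding `doubling_amount(w)` to an input with weight `w` adds ln 2 to the score and so doubles the odds.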
Beyond the Prediction Formula: missing values; extreme or unusual values; nonnumeric inputs; nonlinearity and nonadditivity.
Missing Values and Regression Modeling: [Figure: grid of cases by inputs with some values missing.]
Missing Values and the Prediction Formula: for a new case with a missing input, such as (2, missing, 1), the prediction formula for logit(p) cannot be evaluated.
Missing Value Causes: N/A (not applicable); no match; nondisclosure.
Missing Value Remedies: synthetic distribution; estimation, $x_i = f(x_1, \ldots, x_p)$.
Model Essentials, Regressions: prediction formula (Predict); sequential selection (Select); optimal sequence model (Optimize).
Sequential Selection, Forward: input p-values are compared with an entry cutoff.
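One simple remedy for missing values is to fill each gap with a single replacement value and record where the gaps were. The sketch below uses the mean of the observed values plus a missing-value indicator; treating this as an instance of the slide's "synthetic distribution" remedy is an interpretation, and the function name is hypothetical.

```python
def impute_with_mean(values):
    """Replace None with the mean of the observed values.

    Returns (filled_values, missing_flags); a flag of 1 marks an imputed value.
    """
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    filled = [mean if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return filled, flags

filled, flags = impute_with_mean([2.0, None, 4.0])
```

Keeping the indicator lets a regression use the fact that a value was missing as an input in its own right.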
Sequential Selection, Backward: input p-values are compared with a stay cutoff.
Sequential Selection, Stepwise: input p-values are compared with both an entry cutoff and a stay cutoff.
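Forward selection, the simplest of the sequential selection variants, can be sketched as a loop that keeps adding the input with the smallest p-value until none falls below the entry cutoff. Everything concrete here is an assumption: the `p_value` callback stands in for refitting the regression at each step, the 0.05 cutoff is illustrative, and the toy p-values are invented.

```python
def forward_selection(candidates, p_value, entry_cutoff=0.05):
    """Add inputs one at a time while the best candidate beats the entry cutoff.

    p_value(name, selected) is a caller-supplied function returning the p-value
    of adding `name` to a model that already contains `selected`.
    """
    selected = []
    remaining = list(candidates)
    while remaining:
        best = min(remaining, key=lambda name: p_value(name, selected))
        if p_value(best, selected) >= entry_cutoff:
            break                   # no remaining input qualifies for entry
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy stand-in: fixed p-values that ignore which inputs are already selected.
toy_p_values = {"x1": 0.001, "x2": 0.20, "x3": 0.03}
chosen = forward_selection(toy_p_values, lambda name, selected: toy_p_values[name])
```

Backward selection would run the same idea in reverse with a stay cutoff, and stepwise selection would apply both checks at every step.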
Model Essentials, Regressions: prediction formula (Predict); sequential selection (Select); optimal sequence model (Optimize).
Model Fit versus Complexity: [Figure: model fit statistic for training and validation data at each sequence step.] Evaluate each sequence step.
Select Model with Optimal Validation Fit: choose the simplest optimal model.
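Choosing the simplest model with the best validation fit reduces to a short search over the sequence steps. This sketch assumes the fit statistics are listed in order of increasing model complexity and that smaller values are better unless stated otherwise; the numbers are hypothetical.

```python
def select_step(validation_fit, lower_is_better=True):
    """Return the index of the simplest step with the best validation fit.

    validation_fit must be ordered from the simplest to the most complex model.
    """
    best = min(validation_fit) if lower_is_better else max(validation_fit)
    # list.index returns the first match, which is the simplest tied model.
    return validation_fit.index(best)

# Hypothetical validation error at each step of a model sequence.
validation_error = [0.30, 0.24, 0.21, 0.21, 0.23]
best_step = select_step(validation_error)
```

When two steps tie on validation fit, the earlier one is returned, which matches the advice to prefer the simpler model.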
Model Essentials, Neural Networks: prediction formula (Predict); none (Select); stopped training (Optimize).
Neural Network Prediction Formula: $\hat{y} = \hat{w}_{00} + \hat{w}_{01} H_1 + \hat{w}_{02} H_2 + \hat{w}_{03} H_3$, where $\hat{y}$ is the prediction estimate, $\hat{w}_{00}$ the bias estimate, $\hat{w}_{01}, \hat{w}_{02}, \hat{w}_{03}$ the weight estimates, and $H_1, H_2, H_3$ the hidden units:
$H_1 = \tanh(\hat{w}_{10} + \hat{w}_{11} x_1 + \hat{w}_{12} x_2)$
$H_2 = \tanh(\hat{w}_{20} + \hat{w}_{21} x_1 + \hat{w}_{22} x_2)$
$H_3 = \tanh(\hat{w}_{30} + \hat{w}_{31} x_1 + \hat{w}_{32} x_2)$
Here tanh is the activation function.
Neural Network Binary Prediction Formula: $\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = \hat{w}_{00} + \hat{w}_{01} H_1 + \hat{w}_{02} H_2 + \hat{w}_{03} H_3$ (logit link function), with $H_j = \tanh(\hat{w}_{j0} + \hat{w}_{j1} x_1 + \hat{w}_{j2} x_2)$ for $j = 1, 2, 3$.
Neural Network Diagram: an input layer, a hidden layer ($H_1, H_2, H_3$), and a target layer ($y$).
Prediction Illustration, Neural Networks: logit equation $\operatorname{logit}(\hat{p}) = \hat{w}_{00} + \hat{w}_{01} H_1 + \hat{w}_{02} H_2 + \hat{w}_{03} H_3$; logistic equation $\hat{p} = \frac{1}{1 + e^{-\operatorname{logit}(\hat{p})}}$.
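The binary neural network formula is a short forward pass: three tanh hidden units feed a logit score, which the logistic equation turns into a probability. The sketch below assumes two inputs, as the hidden-unit formulas suggest; the function names and any weight values supplied are hypothetical.

```python
import math

def hidden_units(x1, x2, hidden_weights):
    """Compute H_j = tanh(w_j0 + w_j1*x1 + w_j2*x2) for each hidden unit."""
    return [math.tanh(w0 + w1 * x1 + w2 * x2) for w0, w1, w2 in hidden_weights]

def predict_probability(x1, x2, hidden_weights, output_weights):
    """Return p-hat from logit(p-hat) = w00 + w01*H1 + w02*H2 + w03*H3."""
    h = hidden_units(x1, x2, hidden_weights)
    bias, *weights = output_weights
    logit = bias + sum(w * hj for w, hj in zip(weights, h))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical weights: three hidden units (w_j0, w_j1, w_j2), then w00..w03.
example_hidden = [(0.1, 0.5, -0.3), (-0.2, 0.4, 0.4), (0.0, -0.6, 0.2)]
example_output = [0.05, 1.2, -0.7, 0.3]
p_hat = predict_probability(1.0, 2.0, example_hidden, example_output)
```

With every weight set to zero the hidden units are all zero and the predicted probability is exactly 0.5, which makes a convenient sanity check.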
Assessment Types: the Model Comparison tool provides summary statistics and statistical graphics.
Summary Statistics Summary:
  Prediction type    Statistic
  Decisions          Accuracy / Misclassification; Profit / Loss; KS statistic
  Rankings           ROC Index (concordance); Gini coefficient
  Estimates          Average squared error; SBC / Likelihood
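Two of the ranking statistics are closely related: the Gini coefficient equals twice the ROC index minus one. The sketch below computes the ROC index as the fraction of concordant (target=1, target=0) pairs, counting ties as half, which is a common convention assumed here; the targets are the five example cases used earlier in the slides and the scores are hypothetical.

```python
def roc_index(targets, scores):
    """Fraction of (target=1, target=0) pairs ranked correctly; ties count half."""
    positives = [s for t, s in zip(targets, scores) if t == 1]
    negatives = [s for t, s in zip(targets, scores) if t == 0]
    total = 0.0
    for sp in positives:
        for sn in negatives:
            if sp > sn:
                total += 1.0
            elif sp == sn:
                total += 0.5
    return total / (len(positives) * len(negatives))

def gini_coefficient(targets, scores):
    """Gini = 2 * ROC index - 1."""
    return 2.0 * roc_index(targets, scores) - 1.0

targets = [1, 0, 0, 1, 0]
scores = [0.70, 0.40, 0.55, 0.55, 0.40]      # hypothetical model scores
c_index = roc_index(targets, scores)
gini = gini_coefficient(targets, scores)
```

A model that ranks at random has a ROC index near 0.5 and a Gini coefficient near 0, while a perfect ranking gives 1 for both.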
Statistical Graphics Summary (by prediction type: decisions, rankings, estimates): sensitivity charts; response rate charts.
More informationMORTGAGE LENDER PROTECTION UNDER INSURANCE ARRANGEMENTS Irina Genriha Latvian University, Tel. +371 26387099, email: irina.genriha@inbox.
MORTGAGE LENDER PROTECTION UNDER INSURANCE ARRANGEMENTS Irina Genriha Latvian University, Tel. +371 2638799, email: irina.genriha@inbox.lv Since September 28, when the crisis deepened in the first place
More informationAccurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios
Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios By: Michael Banasiak & By: Daniel Tantum, Ph.D. What Are Statistical Based Behavior Scoring Models And How Are
More informationStat 5303 (Oehlert): Tukey One Degree of Freedom 1
Stat 5303 (Oehlert): Tukey One Degree of Freedom 1 > catch
More informationMethods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL
Paper SA012012 Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL ABSTRACT Analysts typically consider combinations
More informationClassification and regression trees
Classification and regression trees December 9 Introduction We ve seen that local methods and splines both operate by partitioning the sample space of the regression variable(s), and then fitting separate/piecewise
More informationNeural Networks & Boosting
Neural Networks & Boosting Bob Stine Dept of Statistics, School University of Pennsylvania Questions How is logistic regression different from OLS? Logistic mean function for probabilities Larger weight
More informationKey Topics What will ALL students learn? What will the most able students learn?
2013 2014 Scheme of Work Subject MATHS Year 9 Course/ Year Term 1 Key Topics What will ALL students learn? What will the most able students learn? Number Written methods of calculations Decimals Rounding
More informationUse Data Mining Techniques to Assist Institutions in Achieving Enrollment Goals: A Case Study
Use Data Mining Techniques to Assist Institutions in Achieving Enrollment Goals: A Case Study Tongshan Chang The University of California Office of the President CAIR Conference in Pasadena 11/13/2008
More informationReevaluating Policy and Claims Analytics: a Case of NonFleet Customers In Automobile Insurance Industry
Paper 18082014 Reevaluating Policy and Claims Analytics: a Case of NonFleet Customers In Automobile Insurance Industry Kittipong Trongsawad and Jongsawas Chongwatpol NIDA Business School, National Institute
More informationSimple Methods and Procedures Used in Forecasting
Simple Methods and Procedures Used in Forecasting The project prepared by : Sven Gingelmaier Michael Richter Under direction of the Maria JadamusHacura What Is Forecasting? Prediction of future events
More informationArtificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing Email Classifier
International Journal of Recent Technology and Engineering (IJRTE) ISSN: 22773878, Volume1, Issue6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing
More informationStep 1 MATHS. To achieve Step 1 in Maths students must master the following skills and competencies: Number. Shape. Algebra
MATHS Step 1 To achieve Step 1 in Maths students must master the following skills and competencies: Number Add and subtract positive decimal numbers Add and subtract negative numbers in context Order decimal
More informationGCSE Statistics Revision notes
GCSE Statistics Revision notes Collecting data Sample This is when data is collected from part of the population. There are different methods for sampling Random sampling, Stratified sampling, Systematic
More informationVariable Selection and Transformation of Variables in SAS Enterprise Miner
Variable Selection and Transformation of Variables in SAS Enterprise Miner Kattamuri S. Sarma, Ph.D Ecostat Research Corp., White Plains NY kssarma@worldnet.att.net kssarma@ecostatresearch.com 2 Issues
More informationSemester 1 Statistics Short courses
Semester 1 Statistics Short courses Course: STAA0001 Basic Statistics Blackboard Site: STAA0001 Dates: Sat. March 12 th and Sat. April 30 th (9 am 5 pm) Assumed Knowledge: None Course Description Statistical
More informationData Mining: A Magic Technology for College Recruitment. Tongshan Chang, Ed.D.
Data Mining: A Magic Technology for College Recruitment Tongshan Chang, Ed.D. Principal Administrative Analyst Admissions Research and Evaluation The University of California Office of the President Tongshan.Chang@ucop.edu
More informationAuxiliary Variables in Mixture Modeling: 3Step Approaches Using Mplus
Auxiliary Variables in Mixture Modeling: 3Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives
More informationCollege Tuition: Data mining and analysis
CS105 College Tuition: Data mining and analysis By Jeanette Chu & Khiem Tran 4/28/2010 Introduction College tuition issues are steadily increasing every year. According to the college pricing trends report
More informationThe general form of the PROC GLM statement is
Linear Regression Analysis using PROC GLM Regression analysis is a statistical method of obtaining an equation that represents a linear relationship between two variables (simple linear regression), or
More information