Ensemble Modeling with R


1 Doctoral Candidate/Merchandise Data Scientist MatthewALanham.com Virginia Tech Department of Business Information Technology Advance Auto Parts, Inc.

2 Outline
- My Background and Research
- Pros and Cons of R for Data Science
- Modeling Using CRISP-DM Framework
- What is Ensemble Modeling?
- Fitting Models
- Bagging a decision tree
- Optimal Decision Cut Points for Binary Classification
SEPTEMBER 15,

3 Background
- (2005) B.A. Economics/Mathematics, Indiana University-Bloomington
- ( ) Genscape, Inc., Louisville, KY. Energy transparency start-up
- ( ) M.S. Biostatistics-Decision Science, University of Louisville
- ( ) M.S. Statistics, Virginia Tech
- ( Current) Ph.D. Business Information Technology, Virginia Tech
- ( Current) Advance Auto Parts, Inc. Fortune 500 Retailer (#402.. for now)
Research Focus
How can we build better predictive models that are empirically sound (stats) as input parameters to prescriptive models that are process representative (optimization) to provide the best (maintainable, timely, scalable, KPI fused) decision-support for a retailer's assortment plan?
- Why is assortment planning so important?
- Why is the assortment planning problem so challenging?
- Where do Data Science & Big Data Analytics (BDA) come into play?
- Where does Information Technology (IT) come into play?
- Where does Business come into play?

4 Predictive and Prescriptive Analytics INTEGRATING PREDICTIVE AND PRESCRIPTIVE ANALYTICS
[Diagram: predictive model(s) supplying inputs to a prescriptive model, with a loop to determine the optimal solution]
Prescriptive model: search algorithm, optimality conditions, decision model, sales model
Decision criteria: max profit
Decision variables: 1) Assortment 2) Prices 3) Promotion 4) Shelf Space*
Performance measures: 1) Obj. function - revenue 2) Constraints - budget(s)
Predictive model(s): data, estimation model, market specs, demand model, preference structure, similarity measures, utility model, demand forecast parameter(s)
Time summary (TS), e.g. sum, avg.: time summary performance measures (TSPM)
Scenario summary (SS), e.g. sum, avg.: scenario summary performance measures (SSPM)
Performance measures that are functions over a time horizon or random variables that must be summarized over their distributions

5 Oracle + SPSS Modeler WHAT I'M WORKING WITH CURRENTLY
My opinion: IBM SPSS Modeler and SAS Enterprise Miner are:
1) Great for teaching
2) Great for stand-alone data mining projects
3) Visually appealing to Management
4) Not great for real-time production analytics
5) Not great for customized solutions
6) Not designed for prescriptive analytics
[Figure: an example of an IBM SPSS Modeler stream building predictive models]
SEPTEMBER 17,

6 Data Mining, Data Science, and Predictive Modeling with R DATA SCIENCE WITH R
R is an open-source, freely available language for statistical and mathematical computing, released under the GNU General Public License, version 2 (Ihaka & Gentleman, 1996). R runs on many operating systems, including Windows, Macintosh, Unix, and Linux.
According to Eric Siegel, author of Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, R is "The leading free, open-source software tool for PA (Predictive Analytics), has a rapidly expanding base of users as well as enthusiastic volunteer developers who add to and support its functionalities" (Siegel, 2013).
Today there are several thousand user-developed packages (also referred to as libraries). Packages are collections of R functions, compiled code, and data put together in a specific format following CRAN's guidelines. You can search for packages by application area here (
As of July 2014, there are 33 different application areas. The Machine Learning application area lists 72 packages, with functions for nearly any methodology. Many newer techniques are available here that are not available in commercial software packages.
Cons
- Memory, memory, memory! See memory_example.r

7 Data Mining, Data Science, and Predictive Modeling with R WHAT IS ENSEMBLE MODELING?
Ensemble methods train multiple predictive models and then combine the predictions to achieve higher overall performance and stability.
Pros
- Ensemble methods require little tuning
- Ensemble methods operate on a variety of input types (categorical variables, integers, and real numbers)
- Ensemble methods can be used on a variety of problems (binary and multi-class classification and rankings, regression, etc.)
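The "train several models, then combine their predictions" idea can be sketched in a few lines of base R. Everything here is an illustrative assumption: the simulated data, the two logistic models, and the choice of a plain average as the combining rule. It is not the deck's data set or method, and it makes no claim about which model scores best.

```r
# Simulate a small binary-classification problem (illustrative data only).
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(1.5 * x1 - x2))
dat <- data.frame(y = y, x1 = x1, x2 = x2)

# Train two different models on the same data.
m1 <- glm(y ~ x1, family = binomial, data = dat)
m2 <- glm(y ~ x2, family = binomial, data = dat)

# Combine them by averaging the predicted probabilities.
p1    <- predict(m1, newdata = dat, type = "response")
p2    <- predict(m2, newdata = dat, type = "response")
p_ens <- (p1 + p2) / 2

# Accuracy of each model and of the ensemble at a 0.50 cutoff.
acc <- function(p, y) mean((p > 0.5) == (y == 1))
round(c(m1 = acc(p1, y), m2 = acc(p2, y), ensemble = acc(p_ens, y)), 3)
```

Averaging probabilities is only one combining rule; majority voting over predicted classes is a common alternative.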

8 CRISP-DM DATA MINING FRAMEWORK
The Cross-Industry Standard Process for Data Mining (CRISP-DM) is a general data mining process model that can be applied to solve any business problem.
There are other popular data mining and analytics process models, such as Sample-Explore-Modify-Model-Assess (SEMMA), but in my opinion CRISP-DM is more structured and detailed.
CRISP-DM was created and modified over time by leading practitioners and researchers in the data mining field and has been shown to lead to analytical results that align with business objectives.
The CRISP-DM process model and techniques primarily fall under the predictive analytics domain in business analytics, where the objective is to help organizations predict future events and proactively act upon such insights in a systematic fashion to drive better business outcomes (Provost & Fawcett, 2013). However, this process could be extended to prescriptive (i.e. optimization) analytics endeavors as well.

9 CRISP-DM CRISP-DM DETAILED VIEW
[Figure: CRISP-DM Model Phases and Tasks (Source: Modified from

10 Business Understanding BUSINESS UNDERSTANDING
Business Objectives
- Retail assortment planning, at the most basic level, asks which products to offer and how many (Mantrala et al., 2009).
- Assortment planning is one of the most important decisions faced by retailers (Sauré & Zeevi, 2013).
- Because of financial and physical capacity constraints, a retailer does not have the ability to stock, let alone hold in store, every possible product a consumer may desire (Sauré & Zeevi, 2013).
- You must get the project sponsor to detail the business success criteria. It's not some predictive model accuracy statistic. Examples: increased sales of X% at Stores Y and Z; reduced non-working inventory of W% at Stores Y and Z.
Assess Situation
- May use R and any of its available packages
- Deadline is September 17th at Meetup
- Competition winner gets $100, losers learn something, speaker gets feedback
Data Mining Goals
- Determine best overall test accuracy on a 10% out-of-training set
- Neither sensitivity nor specificity may fall below 0.70 on the out-of-training set to qualify.
Project Plan
- Lay out your expected work schedule, breaks, etc.
- Will vary depending on your experience using R
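The data mining goals above reduce to three confusion-matrix quantities plus a qualification check. Below is a minimal base R sketch of that scoring logic; the function names and the toy vectors are assumptions made for illustration, not the competition's actual scoring code.

```r
# Confusion-matrix metrics for a binary classifier.
# `actual` and `predicted` are 0/1 vectors of equal length.
classification_metrics <- function(actual, predicted) {
  tp <- sum(predicted == 1 & actual == 1)
  tn <- sum(predicted == 0 & actual == 0)
  fp <- sum(predicted == 1 & actual == 0)
  fn <- sum(predicted == 0 & actual == 1)
  c(accuracy    = (tp + tn) / length(actual),
    sensitivity = tp / (tp + fn),   # true positive rate
    specificity = tn / (tn + fp))   # true negative rate
}

# An entry qualifies only if both rates reach the 0.70 floor.
qualifies <- function(m, min_rate = 0.70) {
  unname(m["sensitivity"] >= min_rate && m["specificity"] >= min_rate)
}

# Toy example: 4 positives, 6 negatives.
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
predicted <- c(1, 1, 1, 0, 0, 0, 0, 0, 1, 1)
m <- classification_metrics(actual, predicted)
m
qualifies(m)
```

In this toy case accuracy is 0.70 and sensitivity is 0.75, but specificity is 4/6, so the entry would not qualify even though its accuracy looks acceptable.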

11 Data Understanding DATA UNDERSTANDING
Collect Data
Describe Data

Role | Variable | Description
  | store_number | A unique store identifier
  | sku_number | A unique SKU identifier
Y | SOLD | Whether the SKU in a respective store sold (1=yes, 0=no) in the last 13 periods after it was replenished/maxied.
  | NUM_SOLD | The number of realized unit SKU sales for a respective store over the past 1-13 periods.
X | NUM_SOLD_LAST | The number of realized unit SKU sales for a respective store over the past periods.
X | application_count | The total number of different year-make-model vehicle options that the respective SKU could be used for.
X | projected_growth_pct | The projected percentage growth for this SKU in the next 13 periods based on financial experts.
X | offset | For each store-sku, the positive deviation based on unit sales from the center of the part-type-specific distribution.
X | adjusted_offset | For each store-sku, the positive deviation based on unit sales from the center of the part-type-specific distribution, adjusted based on an ad-hoc calculation.
X | unit_sales_py | The total number of units sold for this particular SKU over all stores between the past 27 and 39 periods.
X | unit_sales_cy | The total number of units sold for this particular SKU over all stores between the past 14 and 26 periods.
X | unit_sales_fy | The total number of units sold for this particular SKU over all stores over the past 13 periods.
X | total_vio | The total number of "estimated" vehicles in operation associated with a particular store based on an ad-hoc calculation.
X | adjusted_total_vio | The total number of "estimated" vehicles in operation associated with a particular store based on an ad-hoc calculation.
X | vio_compared_to_cluster | The percentage of vehicles in operation (VIO) for a respective store compared to the total number of VIO for all stores associated with a cluster over the past 14 to 26 periods.
X | avg_cluster_cy_unit_sales | The average number of SKUs sold based on a clustering of all stores over the past 13 to 26 periods.
X | avg_cluster_cy_total_sales | The average number of total sales, which is a combination of unit and lost sales, based on store clusters over the past 14 to 26 periods.
X | avg_cluster_cy_lost_sales | The average number of lost sales, clustered by all stores over the past 14 to 26 periods.
X | pop_est_cy | Estimated number of persons in the population where the store is located based on the latest period.
X | pop_density_cy | Estimated density (a percentage) of the population where the store is located based on the latest period.
X | pct_white | Estimated number of Caucasian-identified persons where the store is located based on the latest period.
X | age | Estimated median person-age where the store is located based on the latest period.
X | pct_college | Estimated percentage of college-educated persons where the store is located based on the latest period.
X | pct_blue_collar | Estimated percentage of blue-collar type workers where the store is located based on the latest period.
X | median_household_income | Estimated median household income where the store is located based on the latest period.
X | establishments | Estimated number of physical locations where business is conducted or where services or industrial operations are performed where the store is located based on the latest period.
X | road_quality_index | A measure of the quality of the roads in the area where the store is located.

Usually you will create such a table yourself but make it more descriptive. The data scientist will ask the domain expert(s) questions such as:
- What are the variables' units of measure?
- Where does the data come from?
- When is it updated?
- How and why was clustering performed a particular way?
- How and why was a variable adjusted?

12 Data Understanding DATA UNDERSTANDING (CONT.)
Explore Data
Matt's source code: Find the main.r
Data Quality

    DataQualityReport(skus)
    DataQualityReportOverall(dataSetName=skus)

13 Data Preparation DATA PREPARATION
Data Description
The variable table from the Data Understanding slide is repeated on this slide, with one difference: unit_sales_fy no longer carries the X marker.

    ## keep only rows with no missing values, then rerun the overall quality report
    skus = skus[complete.cases(skus),]
    DataQualityReportOverall(dataSetName=skus)
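The deck's DataQualityReport() and DataQualityReportOverall() helpers are not shown in the transcription. As a rough idea of what a per-column quality report might contain, here is a base R sketch; `quality_report` and the toy data frame are hypothetical stand-ins, not the author's functions or data.

```r
# Hypothetical stand-in for a per-column data quality report.
quality_report <- function(df) {
  data.frame(
    variable    = names(df),
    type        = vapply(df, function(col) class(col)[1], character(1)),
    n_missing   = vapply(df, function(col) sum(is.na(col)), integer(1)),
    pct_missing = vapply(df, function(col) 100 * mean(is.na(col)), numeric(1)),
    n_unique    = vapply(df, function(col) length(unique(col[!is.na(col)])), integer(1)),
    row.names   = NULL
  )
}

# Toy data with a few missing values (illustrative only).
toy <- data.frame(store_number = c(1, 1, 2, 2),
                  sold         = c(1, 0, NA, 1),
                  age          = c(34.5, NA, NA, 41.0))
report <- quality_report(toy)
report

# Listwise deletion, as on the slide: keep only fully observed rows.
toy_complete <- toy[complete.cases(toy), ]
nrow(toy_complete)
```

Listwise deletion is simple but can discard many rows when missingness is spread across columns, so it is worth checking the report before and after.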

14 Modeling MODELING
Modeling Techniques
- C5.0 Decision tree
- Logistic Regression
- CART Decision tree

15 Modeling MODELING (CONT.)
Design
When building and testing predictive models using observational data (i.e. data that is not controlled as in laboratory experimentation), the question that must be answered is: how valid is my model with regard to what will happen next?
In a properly designed and controlled experiment, data (samples) used in the experiment are used to make inferences about the population. Regardless of how large or small the sample is compared to the true size of the population, this single randomly selected subset of the population allows conclusions to generalize to the remaining data not used in the study.
Cross-validation is the most practical and cost-effective means of obtaining a proxy for truth in predictive analytics.

    ## Randomly partition data into training and test sets
    my_seed =     # seed value not shown on the slide
    skus = GenerateTTV(dataSetName=skus, response='sold', trainpct=.90, testpct=.10, my_seed)
    GeneratePartitionPcts(dataSetName=skus)

Using the training data error rate as a proxy for a model's generalization error is not wise, especially when the training error is low to almost perfect. Most likely the model has been overfit (or over trained) and will not perform as well when new examples are fed through and evaluated from a validation data set (Zhou, 2012).
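GenerateTTV() is the author's own helper and its body is not part of the transcription. The sketch below shows one way a 90/10 partition could be drawn in base R. The function `stratified_partition`, the `partition` column name, and the choice to stratify on the response are all assumptions for illustration.

```r
# Hypothetical helper: 90/10 train/test split, stratified on a binary response
# so both partitions keep roughly the same share of the positive class.
stratified_partition <- function(df, response, trainpct = 0.90, seed = 1) {
  set.seed(seed)
  partition <- rep("test", nrow(df))
  for (cls in unique(df[[response]])) {
    idx     <- which(df[[response]] == cls)
    n_train <- round(trainpct * length(idx))
    partition[idx[sample.int(length(idx), n_train)]] <- "train"
  }
  df$partition <- partition
  df
}

# Toy data: 700 unsold and 300 sold rows (illustrative only).
toy <- data.frame(sold = rep(c(0, 1), times = c(700, 300)), x = rnorm(1000))
toy <- stratified_partition(toy, response = "sold")

table(toy$partition)                    # partition sizes
tapply(toy$sold, toy$partition, mean)   # share of sold == 1 per partition
```

Stratifying keeps the class balance identical across partitions, which matters when sensitivity and specificity are both part of the success criteria.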

16 Modeling MODELING (CONT.)
Design

    ## Percentage of Target is 1 (or Y='SOLD') in total data set
    dim(skus[which(skus$sold==1),])[[1]] /
      dim(skus[which(skus$sold==0 | skus$sold==1),])[[1]]

    ## Percentage of Target is 1 (or Y='SOLD') in training set
    dim(skus[which(skus$sold==1 & skus$spss_partition=='train'),])[[1]] /
      dim(skus[which((skus$sold==1 | skus$sold==0) & skus$spss_partition=='train'),])[[1]]

    ## Percentage of Target is 1 (or Y='SOLD') in test set
    dim(skus[which(skus$sold==1 & skus$spss_partition=='test'),])[[1]] /
      dim(skus[which((skus$sold==1 | skus$sold==0) & skus$spss_partition=='test'),])[[1]]

    ## remove independent variables that you don't want to use
    names(skus)
    skus2 = skus[,c(3,5:19,21:28)]
    head(skus2)
    skus2$sold = as.factor(skus2$sold)

17 Modeling MODELING (CONT.)

    ## set up data for algorithms
    trainx = skus2[which(skus2$spss_partition=='train'),2:(length(skus2)-1)]
    trainy = skus2[which(skus2$spss_partition=='train'),'sold']
    train = cbind(trainy,trainx)
    testx = skus2[which(skus2$spss_partition=='test'),2:(length(skus2)-1)]
    testy = skus2[which(skus2$spss_partition=='test'),'sold']
    test = cbind(testy,testx)

18 Modeling MODELING (CONT.)
Build Models
C5.0 Decision tree

    require(C50)   # fit classification tree models or rule-based models using Quinlan's C5.0 algorithm
    C5Params = C5.0Control( )
    C5 = C5.0(x=trainx, y=trainy
              ,control=C5Params   # control parameters defined above
              ,trials=1           # number of boosting iterations; 1 implies a single model is used
              )
    summary(C5)

Changing the trials from 1 to 1000 doesn't change the result in this case.
Annotations on the summary output:
- Overall error rate
- The confusion matrix is based on a decision cutoff threshold of 0.50
- Variables that were used to create the tree

19 Modeling MODELING (CONT.)
Build Models
C5.0 Decision tree

    ## type is either "class" for the predicted class or "prob" for model confidence values
    ## training probabilities and predicted classes
    C5trainp = predict(C5, newdata = trainx, trials = C5$trials["Actual"], type = "prob", na.action = na.pass)[,2]
    C5trainc = predict(C5, newdata = trainx, trials = C5$trials["Actual"], type = "class", na.action = na.pass)
    ## testing probabilities and predicted classes
    C5testp = predict(C5, newdata = testx, trials = C5$trials["Actual"], type = "prob", na.action = na.pass)[,2]
    C5testc = predict(C5, newdata = testx, trials = C5$trials["Actual"], type = "class", na.action = na.pass)

20 Modeling MODELING (CONT.)
Build Models
Logistic Regression

    logit.fit = glm(trainy ~ NUM_SOLD_LAST + TOTAL_VIO + ADJ_TOTAL_VIO + VIO_COMPARED_TO_CLUSTER
                    + POP_EST_CY + POP_DENSITY_CY + PCT_WHITE + AGE + PCT_COLLEGE + PCT_BLUE_COLLAR
                    + MEDIAN_HOUSEHOLD_INCOME + ESTABLISHMENTS + ROAD_QUALITY_INDEX
                    + APPLICATION_COUNT + PROJECTED_GROWTH_PCT + UNIT_SALES_CY + UNIT_SALES_PY
                    + OFFSET + ADJUSTED_OFFSET + AVG_CLUSTER_CY_UNIT_SALES + AVG_CLUSTER_CY_TOTAL_SALES
                    #+AVG_CLUSTER_CY_LOST_SALES
                    ,family = binomial
                    ,data = train)
    summary(logit.fit)

21 Modeling MODELING (CONT.)
Build Models
CART

    require(rpart)
    set.seed( )     # seed value not shown on the slide
    tree = rpart(trainy ~ ., data = train, method = "class", cp = 0, minsplit = 4, minbucket = 2,
                 parms = list(prior=c(0.5, 0.5)))
    #summary(tree)  # this will take a while

    ## find the best pruned tree
    i.min = which.min(tree$cptable[,"xerror"])
    i.se = which.min(abs(tree$cptable[,"xerror"] - (tree$cptable[i.min,"xerror"] + tree$cptable[i.min,"xstd"])))
    alpha.best = tree$cptable[i.se, "CP"]
    tree.p = prune(tree, cp=alpha.best)

    ## obtain predictions
    treetrainp = predict(tree.p, train)[,2]
    treetrainc = treetrainp
    treetrainc[which(treetrainc>.5)] = 1
    treetrainc[which(treetrainc<=.5)] = 0
    treetestp = predict(tree.p, test)[,2]
    treetestc = treetestp
    treetestc[which(treetestc>.5)] = 1
    treetestc[which(treetestc<=.5)] = 0
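Because the SKU data is not available, the grow-then-prune step can be tried on the `kyphosis` data that ships with rpart. This sketch uses the textbook one-standard-error rule, which picks the simplest tree whose cross-validated error is within one standard error of the minimum. That is a slightly different selection rule from the slide's, which picks the tree whose error is closest to that threshold. The data set, seed, and variable names are assumptions for illustration.

```r
library(rpart)

# kyphosis ships with rpart and stands in for the SKU data here.
set.seed(42)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class", cp = 0, minsplit = 4, minbucket = 2)

# cptable rows run from the simplest tree (largest CP) to the largest tree.
cp_tab <- fit$cptable
i_min  <- which.min(cp_tab[, "xerror"])

# One-standard-error rule: simplest tree within one SE of the minimum error.
threshold <- cp_tab[i_min, "xerror"] + cp_tab[i_min, "xstd"]
i_1se     <- min(which(cp_tab[, "xerror"] <= threshold))

pruned  <- prune(fit, cp = cp_tab[i_1se, "CP"])
probs   <- predict(pruned, kyphosis)[, "present"]
classes <- ifelse(probs > 0.5, "present", "absent")
table(actual = kyphosis$Kyphosis, predicted = classes)
```

The cross-validated error columns depend on the random fold assignment, so the selected tree can change with the seed.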

22 Modeling ENSEMBLING VIA BAGGING
Bootstrap aggregation (Bagging)
Bagging is a simple way to increase the predictive power of a model.
Pros
- Useful when the predictors are unstable: the more the fitted models vary from sample to sample, the more averaging them helps
Cons
- Using smaller samples will yield more instability, and samples that are too small yield poor models
How
- Take several random samples with replacement from the training data set
- Use each sample to construct a separate predictive model, and generate its predictions for the testing data set
- Average the predictions to come up with one final predicted value

    my_list = getbs_samples(seed=my_seed, Ntrees=100, SampleSize=1000)
    bagged_tree = BS_Trees(response=train[,1], datasetname=train, samplelist=my_list, Ntrees=100)
    bagged_probs = getfinalpredictions(bagged_tree, datasetname=train, Ntrees=100)

What R packages are available for bagging?
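The three bagging steps can be written out by hand with rpart. The author's getbs_samples(), BS_Trees(), and getfinalpredictions() are not shown in the transcription, so this is an independent sketch on the `kyphosis` data bundled with rpart. The number of trees, the seed, and the full-size bootstrap samples are illustrative assumptions.

```r
library(rpart)

set.seed(123)
n_trees <- 50
n       <- nrow(kyphosis)

# Steps 1 and 2: draw a bootstrap sample (with replacement) and fit one tree per sample.
trees <- lapply(seq_len(n_trees), function(b) {
  boot_rows <- sample.int(n, size = n, replace = TRUE)
  rpart(Kyphosis ~ Age + Number + Start,
        data = kyphosis[boot_rows, ], method = "class")
})

# Step 3: average the per-tree probabilities into one final prediction per row.
prob_matrix <- sapply(trees, function(tr) {
  predict(tr, newdata = kyphosis, type = "prob")[, "present"]
})
bagged_prob  <- rowMeans(prob_matrix)
bagged_class <- ifelse(bagged_prob > 0.5, "present", "absent")

table(actual = kyphosis$Kyphosis, predicted = bagged_class)
```

Ready-made implementations also exist; the ipred package, for example, provides a `bagging()` function, and random forests extend the same idea with random feature selection at each split.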

23 Modeling MODEL ASSESSMENT - TRAINING
[Figure: training-set assessment panels for C5.0, Bagged CART, Logit, and CART]
Why does the bagged tree perform worse?

24 Modeling MODEL ASSESSMENT - TESTING
[Figure: test-set assessment panels for C5.0, Bagged CART, Logit, and CART]
How do we store a Bagged Model in R?
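One possible answer to the storage question, offered as a sketch and not as the speaker's own: if the bagged model is kept as a plain list of fitted trees, it can be serialized like any other R object with saveRDS() and restored with readRDS(). The file name, the ten trees, and the `kyphosis` data are illustrative assumptions.

```r
library(rpart)

# Build a small bag of trees on rpart's bundled kyphosis data.
set.seed(7)
bag <- lapply(1:10, function(b) {
  rows <- sample.int(nrow(kyphosis), replace = TRUE)
  rpart(Kyphosis ~ Age + Number + Start, data = kyphosis[rows, ], method = "class")
})

# The bag is an ordinary list, so it can be written to and read from disk.
path <- file.path(tempdir(), "bagged_trees.rds")
saveRDS(bag, path)
restored <- readRDS(path)

# Averaged probabilities from the restored bag match the original.
avg_prob <- function(models, newdata) {
  rowMeans(sapply(models, function(tr) predict(tr, newdata)[, "present"]))
}
p_before <- avg_prob(bag, kyphosis)
p_after  <- avg_prob(restored, kyphosis)
all.equal(p_before, p_after)
```

Fitted rpart objects carry copies of model information, so a large bag can be memory-hungry, which ties back to the memory concern raised earlier in the deck.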

25 Modeling OPTIMAL DECISION CUTPOINTS
Why use a decision cutoff threshold of 0.50?
[Figure: example of an ROC curve]

26 Modeling OPTIMAL DECISION CUTPOINTS
See

    require(OptimalCutpoints)
    ## Define my methods list
    methodlist = list(
       "Youden"            #1 (Youden Index);
      ,"ROC01"             #2 (minimizes distance between ROC plot and point (0,1));
      ,"PROC01"            #3 (minimizes distance between PROC plot and point (0,1));
      #,"MaxAccuracyArea"  #4 (maximizes Accuracy Area);
      #,"AUC"              #5 (maximizes concordance which is a function of AUC);
      ,"MaxEfficiency"     #6 (maximizes Efficiency or Accuracy);
      ,"MaxKappa"          #7 (maximizes Kappa Index);
      #,"MinErrorRate"     #8 (minimizes Error Rate);

[Figure: C5.0 Training using the Youden cutoff method]
Using other cutoff methods..
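To make the first criterion in the list concrete, the Youden index J = sensitivity + specificity - 1 can be computed at every candidate cutoff in base R. The OptimalCutpoints package loaded on the slide covers this and the other listed criteria. The function `youden_cutoff` and the toy scores below are assumptions for illustration only.

```r
# Youden index J = sensitivity + specificity - 1, evaluated at each candidate cutoff.
# `prob` holds predicted probabilities, `actual` the 0/1 outcomes.
youden_cutoff <- function(prob, actual) {
  cutoffs <- sort(unique(prob))
  j <- vapply(cutoffs, function(cut) {
    pred <- as.integer(prob >= cut)   # classify as 1 at or above the cutoff
    sens <- sum(pred == 1 & actual == 1) / sum(actual == 1)
    spec <- sum(pred == 0 & actual == 0) / sum(actual == 0)
    sens + spec - 1
  }, numeric(1))
  best <- which.max(j)                # first cutoff attaining the maximum J
  c(cutoff = cutoffs[best], youden = j[best])
}

# Toy scores: 4 positives and 4 negatives (illustrative only).
prob   <- c(0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80)
actual <- c(0,    0,    0,    1,    0,    1,    1,    1)
res <- youden_cutoff(prob, actual)
res
```

Here the best cutoff is 0.35 with J = 0.75, not 0.50. A cutoff of 0.60 reaches the same J, and `which.max` simply reports the first one, which shows that ties between cutoffs need an explicit rule.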

MERGING BUSINESS KPIs WITH PREDICTIVE MODEL KPIs FOR BINARY CLASSIFICATION MODEL SELECTION

MERGING BUSINESS KPIs WITH PREDICTIVE MODEL KPIs FOR BINARY CLASSIFICATION MODEL SELECTION MERGING BUSINESS KPIs WITH PREDICTIVE MODEL KPIs FOR BINARY CLASSIFICATION MODEL SELECTION Matthew A. Lanham & Ralph D. Badinelli Virginia Polytechnic Institute and State University Department of Business

More information

How To Make A Credit Risk Model For A Bank Account

How To Make A Credit Risk Model For A Bank Account TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző csaba.fozo@lloydsbanking.com 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions

More information

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

Data Mining. Nonlinear Classification

Data Mining. Nonlinear Classification Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15

More information

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

More information

Predictive Data modeling for health care: Comparative performance study of different prediction models

Predictive Data modeling for health care: Comparative performance study of different prediction models Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath hiremat.nitie@gmail.com National Institute of Industrial Engineering (NITIE) Vihar

More information

THE THREE "Rs" OF PREDICTIVE ANALYTICS

THE THREE Rs OF PREDICTIVE ANALYTICS THE THREE "Rs" OF PREDICTIVE As companies commit to big data and data-driven decision making, the demand for predictive analytics has never been greater. While each day seems to bring another story of

More information

Better credit models benefit us all

Better credit models benefit us all Better credit models benefit us all Agenda Credit Scoring - Overview Random Forest - Overview Random Forest outperform logistic regression for credit scoring out of the box Interaction term hypothesis

More information

ANALYTICS CENTER LEARNING PROGRAM

ANALYTICS CENTER LEARNING PROGRAM Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals

More information

Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs

Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs 1.1 Introduction Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs For brevity, the Lavastorm Analytics Library (LAL) Predictive and Statistical Analytics Node Pack will be

More information

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

In this presentation, you will be introduced to data mining and the relationship with meaningful use. In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine

More information

Welcome. Data Mining: Updates in Technologies. Xindong Wu. Colorado School of Mines Golden, Colorado 80401, USA

Welcome. Data Mining: Updates in Technologies. Xindong Wu. Colorado School of Mines Golden, Colorado 80401, USA Welcome Xindong Wu Data Mining: Updates in Technologies Dept of Math and Computer Science Colorado School of Mines Golden, Colorado 80401, USA Email: xwu@ mines.edu Home Page: http://kais.mines.edu/~xwu/

More information

Gerry Hobbs, Department of Statistics, West Virginia University

Gerry Hobbs, Department of Statistics, West Virginia University Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

Overview, Goals, & Introductions

Overview, Goals, & Introductions Improving the Retail Experience with Predictive Analytics www.spss.com/perspectives Overview, Goals, & Introductions Goal: To present the Retail Business Maturity Model Equip you with a plan of attack

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right

More information

Data Mining Practical Machine Learning Tools and Techniques

Data Mining Practical Machine Learning Tools and Techniques Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea

More information

Data Mining Methods: Applications for Institutional Research

Data Mining Methods: Applications for Institutional Research Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014

More information

An Overview of Predictive Analytics for Practitioners. Dean Abbott, Abbott Analytics

An Overview of Predictive Analytics for Practitioners. Dean Abbott, Abbott Analytics An Overview of Predictive Analytics for Practitioners Dean Abbott, Abbott Analytics Thank You Sponsors Empower users with new insights through familiar tools while balancing the need for IT to monitor

More information

not possible or was possible at a high cost for collecting the data.

not possible or was possible at a high cost for collecting the data. Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day

More information

Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification

Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification Henrik Boström School of Humanities and Informatics University of Skövde P.O. Box 408, SE-541 28 Skövde

More information

IBM SPSS Direct Marketing 22

IBM SPSS Direct Marketing 22 IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release

More information

KnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES

KnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES Translating data into business value requires the right data mining and modeling techniques which uncover important patterns within

More information

Numerical Algorithms Group
