Survey Analysis: Data Mining versus Standard Statistical Analysis for Better Analysis of Survey Responses

1 Survey Analysis: Data Mining versus Standard Statistical Analysis for Better Analysis of Survey Responses
Salford Systems Data Mining 2006, March, San Diego, CA
By Dean Abbott, Abbott Analytics

2 Acknowledgements
Work done under contract with Seer Analytics
Subcontractors: Tessar and Associates (now Mobile Foundry), Abbott Consulting (now Abbott Analytics)
Seer Analytics, LLC ("we help you see what's there"), 518 North Tampa Street, Tampa, FL

3 About Abbott Analytics
Founded in 1999, based in San Diego, CA
Dedicated to data mining consulting and training
Principal: Dean Abbott
  Applied data mining for 19+ years in direct marketing, CRM, survey analysis, tax compliance, fraud detection, predictive toxicology, biological risk assessment
Course instruction
  Public 2-day data mining courses
  Conference tutorials
Customized training and knowledge transfer
  Data mining methodology (CRISP-DM)
  Training services for software products, including CART, Clementine, Affinium Model, Insightful Miner

4 Talk Outline
Member survey
  Survey description
  Results using statistical modeling
  Lessons learned
Employee survey
  Survey description
  Results using decision trees (CART)
  Lessons learned

5 Problem Setup: Member Survey
Question: What are the characteristics of members who indicated the highest overall satisfaction with their Club?
Data: 32,811 records containing survey answers
  No demographic data except what was on the survey (marital status, children, age, gender)
Approach: Create supervised learning models with target variable overall_satisfaction = 1

6 Data Preparation
Begin with 57 candidate inputs to model
All survey questions are multiple choice
  Treated as categories, not numbers
  Typically 6 categories per question (1-5, plus unknown); unknown initially coded as 0
No text comment fields included as inputs to model
Create new column for target variable
  If overall_satisfaction = 1, variable value = 1; otherwise, variable value = 0
Data very clean with respect to missing data
  Only needed to recode the number-of-children fields
  Number missing: 11,006 (children < 6); 10,701 (children 6-12); 10,873 (children 13-17); 4,936 (children overall)
  When missing, recoded values with -1 to indicate missing
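The target flag and the missing-value recode described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the project: the field names (children_under_6 and so on) are hypothetical stand-ins for the survey columns, and only the 1/0 target and the -1 sentinel come from the slide.

```python
# Sketch only: field names are hypothetical stand-ins for the survey columns.
MISSING = -1  # sentinel the slide uses for missing child counts

CHILD_FIELDS = ["children_under_6", "children_6_12", "children_13_17", "children"]


def prepare(record):
    """Return a copy of one survey record with the target flag and recoded child fields."""
    out = dict(record)
    # Binary target: 1 only for the top satisfaction answer, 0 for everything else
    out["target"] = 1 if record.get("overall_satisfaction") == 1 else 0
    # Blank child counts become -1 so that "missing" is its own category
    for field in CHILD_FIELDS:
        if record.get(field) in (None, ""):
            out[field] = MISSING
    return out


example = {"overall_satisfaction": 1, "children_under_6": None, "children_6_12": 2,
           "children_13_17": "", "children": 2}
prepared = prepare(example)
```

Keeping missing as an explicit category, rather than imputing it, lets a later model treat "did not answer" as information in its own right.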

7 Member Survey Question Categories
[Figure: member survey question categories]

8 Sampling
Begin with 32,811 responses
Set aside about half for validation (not used during modeling): 16,379 records
  These records will be used to provide final summaries of the segments
16,433 records used in creating and scoring model
  5,059 had overall satisfaction = 1 (30.8%)
  Model 1 splits data into training and testing data: 2/3 for training (creating model), 1/3 for testing (scoring and ranking models)
  Approximately 11,503 for training; 4,930 for testing
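A two-stage partition of this kind might be sketched as follows. The function name, the seed, and the use of a simple shuffle are assumptions for illustration; the slide does not say how records were assigned, and the nominal fractions used here will not reproduce the slide's exact counts.

```python
import random


def partition(records, holdout_fraction=0.5, train_fraction=2 / 3, seed=0):
    """Split records into (train, test, validation) lists.

    The validation part is set aside first and never used for modeling;
    the remainder is divided into training and testing data.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_holdout = int(len(shuffled) * holdout_fraction)
    validation = shuffled[:n_holdout]
    modeling = shuffled[n_holdout:]
    n_train = int(len(modeling) * train_fraction)
    return modeling[:n_train], modeling[n_train:], validation


# Record ids 0..32810 stand in for the 32,811 survey responses
train, test, validation = partition(range(32811))
```

Setting the validation records aside before any modeling is what allows them to give an unbiased final summary of the segments.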

9 Relationship of Overall Satisfaction to Recommend to Friends
Of the 4912 (30.2%) with Overall Satisfaction = 1, 86% have Recommend to Friends = 1
Of the 8708 (54%) with Recommend to Friends = 1, 49% have Overall Satisfaction = 1
26.0% have both overall satisfaction and recommend to friends equal to 1
  This is the biggest bin of the cross tab, followed by
    Overall = 2 / Recommend = 2 (24%; 3890 / 16739)
    Overall = 2 / Recommend = 1 (22%; 3565 / 16739)
  No other bin holds more than 5% of records
[Figure: cross tab of overall satisfaction (OVERALL.RA) by recommend to friend (RECOMMEND)]
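The cell shares and conditional percentages quoted on this slide are ordinary cross-tab calculations. The sketch below shows the arithmetic on a small made-up data set; the toy pairs and function names are illustrative only and do not reproduce the survey figures.

```python
from collections import Counter


def crosstab_shares(pairs):
    """Return {(overall, recommend): share of all records}."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def conditional_share(pairs, given_index, given_value, other_value):
    """Among records whose answer at given_index equals given_value,
    the share whose other answer equals other_value."""
    subset = [p for p in pairs if p[given_index] == given_value]
    hits = [p for p in subset if p[1 - given_index] == other_value]
    return len(hits) / len(subset)


# Toy data (made up): (overall_satisfaction, recommend_to_friends) per respondent
toy = [(1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2), (2, 2), (3, 3)]
shares = crosstab_shares(toy)
p_rec_given_sat = conditional_share(toy, 0, 1, 1)  # P(recommend = 1 | overall = 1)
p_sat_given_rec = conditional_share(toy, 1, 1, 1)  # P(overall = 1 | recommend = 1)
```

The two conditional shares differ because they use different denominators, which is why the slide reports 86% in one direction and 49% in the other.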

10 Objective and Data Challenges
Project objective
  Interpret results of survey for large health club (not a predictive model)
Challenges
  Missing data (some questions either N/A or blank)
    Solution: Impute values that least affect the information communicated by the question (not a mean or median!)
  Answers (target variables) highly correlated with one another
    Multicollinearity makes interpretation of results problematic
    Must reduce dimensionality without losing interpretation of results
    Solution: Factor analysis
  Target variable
    Three questions pointed to the important actionable information (related to how satisfied members were)
    Solution: Combine all three into a new index of excellence

11 Data Preprocessing Approach
Reduce input data (for understanding)
  Use factor analysis to identify groupings of variables that are interesting
  Factors can be candidate inputs to models, but didn't work as well on this data
  Selected as inputs those variables with the highest loadings, as representative of each type of factor
  Also retained key questions in addition to the factor analysis representative questions
The effect is to remove questions too highly correlated with one another, while maintaining relevant information for modeling.
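The selection step, keeping the highest-loading question for each factor and then adding the key questions back, can be sketched as below. The loading values are invented for illustration, the factor analysis itself is assumed to have been run elsewhere, and treating Q44, Q22 and Q25 as the retained key questions is an assumption.

```python
# Hypothetical loadings: {question: [loading on factor 1, loading on factor 2]}
loadings = {
    "Q2": [0.81, 0.10],
    "Q3": [0.77, 0.05],
    "Q12": [0.40, 0.62],
    "Q13": [0.12, -0.85],
    "Q14": [0.08, 0.70],
}


def representatives(loadings):
    """For each factor, pick the question with the largest absolute loading."""
    n_factors = len(next(iter(loadings.values())))
    picks = []
    for k in range(n_factors):
        best = max(loadings, key=lambda q: abs(loadings[q][k]))
        picks.append(best)
    return picks


def select_inputs(loadings, key_questions):
    """Representative questions plus retained key questions, without duplicates."""
    selected = representatives(loadings)
    for q in key_questions:
        if q not in selected:
            selected.append(q)
    return selected


inputs = select_inputs(loadings, key_questions=["Q44", "Q22", "Q25"])
```

Using an actual question rather than the factor score keeps each model input directly interpretable, at the cost of discarding some of the shared variance the factor captures.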

12 Predictive Modeling Approach
[Flow diagram]
Survey questions
Identify key questions: 3 key questions with high association with target
Factor analysis: 10 factors, or variables that loaded highest on each factor
Regression model, find significant variables: 13 fields down to 7
Regression model, find significant variables: variable ranks

13 Factor Analysis: Making the Complex Simple
[Figure: loading values by question for Factor1 through Factor10, with panels of top question loadings for Factor 1 (Q2 to Q12) and Factor 2 (Q12 to Q20, Q23)]

14 Member Survey Factor Analysis Loadings
[Figure: member survey factor analysis loadings]

15 Reduce Variables using Regression
Already beginning with only 13 variables
Question: how many of these are useful predictors?
Decided to retain 5 factors for final model
[Figure: Regression Rankings of Questions/Factors; regression coefficient by question/factor, in the order Q44, Q22, Q25, factor3.2, factor3.9, factor3.1, factor3.4, factor3.3, factor3.8, factor3.10, factor3.6, factor3.5, factor3.7]

16 Explaining Results Through Visualization
Customer was not interested in techno solutions
Customer was interested in what actions could be taken as a result of the data mining models
  Which characteristics are most correlated with best customers?
  What do they like and dislike about the club? Is it equipment? Relationships? Facility? Staff?
Show key contributors, how each club compared with other club locations, and whether the club is improving

17 Key: Explaining Results
Visualization shows key variables in survey associated with excellence, and performance metrics for each club
  How well did this club do?
  What is the change over last year's result?
Shows which attributes the club needs to improve in order to improve customer satisfaction
[Figure: Drivers of Satisfaction: Staff 2, Staff 1, equipment, facility, value, relationships, goals]

18 So What's The Problem with That?
Regression and neural networks are global estimators
  They operate over the entire data space
  Descriptors of regression represent average influence
Neither technique provides explicit localized characteristics
Customer would like actionable analytics
  Clear characteristics of subgroups
  Different strategies for subgroups
Conclusion: In Round 2 (Employee Survey), use another approach

19 Employee Survey Analysis
Problem setup
  Very similar to member survey
  60+ questions
  Few demographics
  Attitudes about the job
How to handle questions
  They are ordinal, but CART supports interval and nominal types
  Treat as categorical, but make sure values aren't split up
  If a split on a question groups values 1, 2, 4, rebuild with that question as an interval variable
  Didn't happen this way, though; all worked out well
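The check described here, whether a categorical split keeps adjacent answer values together, is easy to automate. The helper below is a hypothetical sketch: it takes the answer levels sent to one side of a binary split and reports whether both sides form unbroken runs on the ordinal scale.

```python
def is_contiguous(levels, all_levels):
    """True if `levels` is an unbroken run within the ordered scale `all_levels`."""
    ordered = sorted(all_levels)
    positions = sorted(ordered.index(v) for v in set(levels))
    return all(b - a == 1 for a, b in zip(positions, positions[1:]))


def split_respects_order(left_levels, all_levels):
    """A binary split respects the ordinal scale when both sides are contiguous."""
    left = set(left_levels)
    right_levels = [v for v in all_levels if v not in left]
    return is_contiguous(left_levels, all_levels) and is_contiguous(right_levels, all_levels)


scale = [1, 2, 3, 4, 5]
ok = split_respects_order([1, 2], scale)         # {1, 2} vs {3, 4, 5}
broken = split_respects_order([1, 2, 4], scale)  # {1, 2, 4} vs {3, 5}
```

A split flagged as broken is the signal to refit with that question declared as an interval variable, so that the tree can only cut the scale at a single threshold.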

20 Employee Survey Question Groupings
[Figure: employee survey question groupings]

21 Employee Survey: Target Variable Definition
Predict key attitudes that are consequents
  Satisfaction
  Recommend to a Friend
  Intend to Work Next Year at Club
  Club is Good Place to Work
Exclude these from each other's models
  They are highly correlated with each other
  Models that predict a target variable with these as inputs are not actionable
Key predictors, questions relating to:
  Communications with management
  Quality of supervisors
  Training received
  Effectiveness of club
  Fairness of policies
  Perceived member attitudes

22 Employee Satisfaction (=1) Model: Data Information
File: modeling data with binarized dependents w missing.txt
Target Variable: Q1_1
Predictor Variables: Q66, Q67, Q68, Q69, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q11, Q12, Q13, Q14, Q15, Q16, Q17, Q18, Q20, Q21, Q22, Q23, Q24, Q25, Q26, Q27, Q28, Q29, Q30, Q31, Q32, Q33, Q34, Q35, Q36, Q37, Q38, Q45, Q46, Q47, Q48, Q49, Q50, Q51, Q52, Q53, Q54, Q55, Q56, Q57, Q58, Q59, Q60, Q61, Q62, Q63, Q64, Q65
[Table: Class, N Cases, Pct Cases; class 0: 4,... cases; class 1: 1,... cases; remaining digits and percentages lost]

23 Employee Satisfaction Model: Performance
[Table: misclassification by class (Class, N Cases, N Misclassified, Pct.); values lost]
[Table: gains by terminal node (Node, Cases Target Class, % of Node Target Class, % Target Class, Cum % Target Class, Cum % Pop, % Pop, Cases in Node, Cum Lift, Lift); values lost]

24 Employee Satisfaction Model: Splits
[Tree diagram: root split on Q8; lower splits include Q7, Q18, and Q3]
Q8: Feel welcome
  Surrogates: Q27 (family friendly), Q28 (inclusive environment), Q18 (good working conditions)
Q18: Good working conditions
  Surrogate: Q17 (necessary support/materials to do job)
Q3: Feeling of accomplishment
  Surrogate: Q6 (responsibilities good fit with interests/skills)
Q7: Staff competent
  Surrogates: Q15 (supervisor lets know work is appreciated), Q33 (trust management to take interests into account), Q5 (good opportunities for professional growth)

25 Employee Satisfaction: Q8 Split (root node)
[Table: competitor splits and their improvement scores, with Q8 as the winner and Q7 among the competitors; values lost]
Strongly agree feel welcome
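The improvement score that ranks a winning split against its competitors can be illustrated with Gini impurity, the usual default criterion for CART classification trees. The slide does not state which criterion or priors were used, so the sketch below is an assumption, and the counts are made up.

```python
def gini(pos, neg):
    """Gini impurity of a node with `pos` target cases and `neg` non-target cases."""
    n = pos + neg
    if n == 0:
        return 0.0
    p = pos / n
    return 2 * p * (1 - p)


def improvement(left, right):
    """Impurity decrease for a binary split.

    `left` and `right` are (pos, neg) counts in the two child nodes.
    """
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    parent = gini(left[0] + right[0], left[1] + right[1])
    children = (n_left / n) * gini(*left) + (n_right / n) * gini(*right)
    return parent - children


# Made-up counts for two candidate splits of the same 1,000-case node
candidates = {
    "split_A": ((300, 100), (100, 500)),
    "split_B": ((220, 280), (180, 320)),
}
ranked = sorted(candidates, key=lambda name: improvement(*candidates[name]), reverse=True)
```

The first name in `ranked` plays the role of the winner and the rest are competitors; a split that leaves both children with the parent's class mix scores zero.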

26 Employee Satisfaction: Q18 Split (right side of root)
[Table: competitor splits and their improvement scores, with Q14 and Q13 among the competitors; values lost]
Strongly agree feel welcome
This is the best terminal node for satisfaction

27 Employee Satisfaction Model: Key Variables
[Table: variable importance scores for all variables; values lost]
[Table: variable importance scores, primary splitters only; Q8 scores 100, other values lost]

28 Member Satisfaction Model: Key Rules
/* Rules for terminal node 8 */
Matches 1,414 surveys (23.1%), 859 highly satisfied (60.8%), 58.4% of all highly satisfied
RULE: If (Q18 = 1 and Q8 = 1) Then Highly Satisfied
P(0) = 0.39; P(1) = 0.61; Lift 2.5
If strongly agree that there are good working conditions and strongly agree that member feels welcome, then highly satisfied
/* Rules for terminal node 7 */
Matches 476 surveys (7.8%), 201 highly satisfied (42.2%), 13.7% of all highly satisfied
RULE: If (Q8 = 1 and Q18 <> 1 and Q3 = 1 and Q32 = 1 or 2) Then Highly Satisfied
P(0) = 0.58; P(1) = 0.42; Lift 1.8
If strongly agree that feel welcome and strongly agree working at the club gives feeling of personal accomplishment, and agree management will take interests into account, even if don't strongly agree good working conditions, then highly satisfied
/* Rules for terminal node 4 */
Matches 218 surveys (3.6%), 95 highly satisfied (43.6%), 6.5% of all highly satisfied
RULE: If (Q8 <> 1 and Q7 = 1 or 2 and Q3 = 1 and Q36 = 1 or 2) Then Highly Satisfied
P(0) = 0.56; P(1) = 0.44; Lift 1.8
If agree that I'll be recognized for doing a good job, and strongly agree working at the club gives feeling of personal accomplishment, and agree that am paid fairly, even if don't strongly agree feel welcome, then highly satisfied
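The lift figures on this slide can be checked from the percentages reported with each rule: lift is the share of all highly satisfied respondents that a node captures, divided by the share of all surveys the node contains. The sketch below uses only numbers printed on the slide; the function name is illustrative.

```python
def lift(pct_of_target_in_node, pct_of_population_in_node):
    """Lift of a terminal node: share of all target cases it captures
    divided by the share of all surveys it contains."""
    return pct_of_target_in_node / pct_of_population_in_node


# (percent of all highly satisfied, percent of all surveys), as reported per node
nodes = {8: (58.4, 23.1), 7: (13.7, 7.8), 4: (6.5, 3.6)}
lifts = {node: round(lift(*pcts), 1) for node, pcts in nodes.items()}
```

For node 8, for example, 58.4 / 23.1 is about 2.5, matching the reported lift.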

29 Member Satisfaction Model: Unsatisfied Rules
/* Rules for terminal node 1 */
Matches 1,869 surveys (30.6%), 55 highly satisfied (2.9%), 3.7% of all highly satisfied, 39.0% of all not highly satisfied
RULE: If (Q8 <> 1 and Q7 <> 1 or 2) Then not highly satisfied
P(0) = 0.96; P(1) = 0.04; Lift 0.12
If don't strongly agree that feel welcome and don't agree that will be properly recognized for a good job, then not highly satisfied.
/* Rules for terminal node 2 */
Matches 1,225 surveys (20.0%), 124 highly satisfied (10.1%), 8.4% of all highly satisfied, 23.7% of all not highly satisfied
RULE: If (Q8 <> 1 and Q7 = 1 or 2 and Q3 <> 1) Then not highly satisfied
P(0) = 0.90; P(1) = 0.10; Lift 0.42
If don't strongly agree that feel welcome and work doesn't give a feeling of accomplishment, even though I agree that I will be properly recognized for a good job, then not highly satisfied.
/* Rules for terminal node 5 */
Matches 640 surveys (10.5%), 92 highly satisfied (14.4%), 6.3% of all highly satisfied, 11.8% of all not highly satisfied
RULE: If (Q8 = 1 and Q18 <> 1 and Q3 <> 1) Then not highly satisfied
P(0) = 0.86; P(1) = 0.14; Lift 0.58
If don't strongly agree that there are good working conditions and work doesn't give a feeling of accomplishment, even though strongly agree that feel welcome, then not highly satisfied.
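Because each rule is a path through the tree, the six satisfaction rules listed on the key-rules and unsatisfied-rules slides can be written as one nested function. This is a sketch under two assumptions: "Q7 = 1 or 2" is read as "Q7 is 1 or 2", and records that reach a node not listed on the slides return None.

```python
AGREE = (1, 2)  # "strongly agree" or "agree"


def satisfaction_node(r):
    """Map a survey record (dict of question -> answer) to one of the listed
    terminal nodes, or None if none of the listed rules applies."""
    if r.get("Q8") == 1:                 # strongly agree: feel welcome
        if r.get("Q18") == 1:            # strongly agree: good working conditions
            return 8
        if r.get("Q3") != 1:             # no strong feeling of accomplishment
            return 5
        if r.get("Q32") in AGREE:
            return 7
        return None
    if r.get("Q7") not in AGREE:
        return 1
    if r.get("Q3") != 1:
        return 2
    if r.get("Q36") in AGREE:
        return 4
    return None


HIGHLY_SATISFIED_NODES = {8, 7, 4}


def predicts_highly_satisfied(r):
    """True when the record lands in a node whose rule concludes 'highly satisfied'."""
    return satisfaction_node(r) in HIGHLY_SATISFIED_NODES
```

Written this way, it is easy to see that every rule begins with the Q8 test, which is why "feel welcome" dominates the interpretation.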

30 Recommend to Friend (=1) Model: Data Information
File: modeling data with binarized dependents w missing.txt
Target Variable: Q44_1
Predictor Variables: Q66, Q67, Q68, Q69, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q11, Q12, Q13, Q14, Q15, Q16, Q17, Q18, Q19, Q20, Q21, Q22, Q23, Q24, Q25, Q26, Q27, Q28, Q29, Q30, Q31, Q32, Q33, Q34, Q35, Q36, Q37, Q38, Q45, Q46, Q47, Q48, Q49, Q50, Q51, Q52, Q53, Q54, Q55, Q56, Q57, Q58, Q59, Q60, Q61, Q62, Q63, Q64, Q65
[Table: Class, N Cases, Pct; class 0: 3,... cases; class 1: 2,... cases; remaining digits and percentages lost]
This model includes Q19 (am treated with respect), and is the best model to report

31 Recommend to Friend Model Performance
[Table: misclassification by class (Class, N Cases, N Misclassified, Pct.); values lost]
[Table: gains by terminal node (Node, Cases Target Class, % of Node Target Class, % Target Class, Cum % Target Class, Cum % Pop, % Pop, Cases in Node, Cum Lift, Lift); values lost]

32 Recommend to Friend Model Splits
[Tree diagram: root split on Q19; lower splits include Q33, Q37, Q45, Q5, Q8, and Q35]
Q19: Treated with respect
  Surrogates: Q18 (good working conditions) and Q8 (feel welcome)
Q37: Compensation practice is fair
  Surrogate: Q36 (I am paid fairly)
Q45: How think members rate club
  Surrogates: Q47, Q46, Q60 (member-cleanliness, enough equip., check on progress)
Q33: Trust management to take interests into account
  Surrogates: Q32 (management keeps promises), Q34 (leaders remove roadblocks to inclusion)
Q5: Good opportunities for professional growth
  Surrogates: Q4 (responsibilities good fit with interests), Q7 (appropriately recognized)
Q8: Feel welcome
  Surrogate: Q7

33 Recommend to Friend Model: Key Variables
[Table: variable importance scores for all variables; Q4 scores 4.3, other values lost]
[Table: variable importance scores, primary splitters only; values lost]

34 Recommend to Friend Model: Key Rules
/* Rules for terminal node 10 */
Matches 1,548 surveys (25.3%), 1,113 recommend (71.9%), 51.6% of all strong recommends
RULE: If (Q19 = 1 and Q37 = 1 or 2) Then Recommend = 1
P(0) = 0.281; P(1) = 0.719; Lift = 2.0
If strongly agree that supervisors treat me with respect, and agree that compensation practice is fair, then strongly agree that will recommend to friend.
/* Rules for terminal node 9 */
Matches 188 surveys (3.1%), 110 recommend (58.5%), 5.1% of all strong recommends
RULE: If (Q19 = 1 and Q37 <> 1 or 2 and Q45 = 1) Then Recommend = 1
P(0) = 0.415; P(1) = 0.585; Lift = 1.7
If strongly agree that supervisors treat me with respect, and believe that members strongly agree they are highly satisfied, even though don't agree compensation practice is fair, then strongly agree that will recommend to friend.
/* Rules for terminal node 5 */
Matches 350 surveys (5.7%), 198 recommend (56.6%), 9.2% of all strong recommends
RULE: If (Q19 <> 1 and Q33 = 1 or 2 and Q45 = 1) Then Recommend = 1
P(0) = 0.434; P(1) = 0.566; Lift = 1.4
If agree that trust management will take my interests into account, and believe that members strongly agree they are highly satisfied, even though don't strongly agree supervisors treat me with respect, then strongly agree that will recommend to friend.

35 Recommend to Friend Model: Rules for Not Recommending
/* Rules for terminal node 1 */
Matches 1,784 surveys (29.2%), 130 highly recommend (7.3%), 94% don't highly rec., 6.0% of all highly recommend
RULE: If (Q31 <> 1 and Q22 <> 1) Then Don't Strongly Recommend
P(0) = 0.94; P(1) = 0.06
If don't strongly agree that supervisors treat me with respect, and don't agree that management will take interests into account, then don't strongly agree that will recommend to friend.
/* Rules for terminal node 2 */
Matches 846 surveys (13.84%), 132 highly recommend (15.6%), 84.4% don't highly rec., 6.1% of all highly recommend
RULE: If (Q19 <> 1 and Q33 = 1 or 2 and Q45 <> 1 and Q5 <> 1 or 2) Then Don't Strongly Recommend
P(0) = 0.84; P(1) = 0.16
If don't strongly agree that supervisors treat me with respect, and don't strongly believe that members are highly satisfied, and don't agree that there are good opportunities for professional growth, then even though agree that management will take interests into account, don't strongly agree that will recommend to friend.

36 Intend to Continue Working at Club (=1) Model: Data Information
File: modeling data with binarized dependents w missing.txt
Target Variable: Q39_1
Predictor Variables: Q66, Q67, Q68, Q69, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q11, Q12, Q13, Q14, Q15, Q16, Q17, Q18, Q20, Q21, Q22, Q23, Q24, Q25, Q26, Q27, Q28, Q29, Q30, Q31, Q32, Q33, Q34, Q35, Q36, Q37, Q38, Q45, Q46, Q47, Q48, Q49, Q50, Q51, Q52, Q53, Q54, Q55, Q56, Q57, Q58, Q59, Q60, Q61, Q62, Q63, Q64, Q65
[Table: Class, N Cases, Pct; class 0: 3,... cases; class 1: 3,... cases; remaining digits and percentages lost]

37 Intend to Continue Working at Club: Model Performance
[Table: misclassification by class (Class, N Cases, N Misclassified, Pct. Misclass); values lost]
[Table: gains by terminal node (Node, Cases Target Class, % of Node Target Class, % Target Class, Cum % Target Class, Cum % Pop, % Pop, Cases in Node, Cum Lift, Lift); values lost]

38 Intend to Continue Working at Club Model: Splitters
[Tree diagram: root split on Q8; lower splits include Q5, Q7, Q66, and Q18]
Q8: Feel welcome
  Surrogates: Q27 (family friendly place), Q28 (diverse environment), Q18 (good working conditions)
Q69: Age
  Surrogates: Q66 (how long worked at Club), Q68 (education)
Q18: Good working conditions
  Surrogate: Q17 (have necessary support and materials to do job)
Q5: Good opportunities for professional growth
  Surrogates: Q7, Q33 (management will take my interests into account)
Q7: Will be recognized for good job
  Surrogate: Q15 (work is appreciated)

39 Intend to Continue Working at Club Model: Key Variables
[Table: variable importance scores for all variables; Q8 scores 100, other values lost]
[Table: variable importance scores, primary splitters only; Q8 scores 100, other values lost]

40 Intend to Continue Working at Club Model: Key Rules
/* Rules for terminal node 10 */
Matches 1,360 surveys (22.2%), 1,099 intend to continue (80.8%), 35.6% of all intend to continue
RULE: If (Q8 = 1 and Q69 >= 2.5) Then Intend to continue
P(0) = 0.19; P(1) = 0.81; Lift = 1.6
If strongly agree that feel welcome and am 35 years old or older, then strongly agree that intend to continue working at the club.
/* Rules for terminal node 9 */
Matches 698 surveys (11.4%), 486 intend to continue (69.6%), 15.8% of all intend to continue
RULE: If (Q8 = 1 and Q18 = 1 and Q69 <= 2.5) Then Intend to continue
P(0) = 0.30; P(1) = 0.70; Lift = 1.4
If strongly agree that feel welcome and strongly agree that there are good working conditions, and am younger than 35 years old, then strongly agree that intend to continue working at the club.
/* Rules for terminal node 5 */
Matches 518 surveys (8.5%), 349 intend to continue (67.4%), 11.3% of all intend to continue
RULE: If (Q8 <> 1 and Q5 = 1 or 2 and Q7 = 1 or 2 and Q66 > 2.5) Then Intend to continue
P(0) = 0.32; P(1) = 0.68; Lift = 1.3
If I strongly agree that if I do a good job I'll be recognized, and I strongly agree that there are good opportunities for professional growth, and I have worked at the club for more than 2 years, even though don't strongly agree that feel welcome, then I strongly agree that intend to continue working at the club.

41 Intend to Continue Working at Club Model: Rules for Don't Strongly Intend to Continue
/* Rules for terminal node 1 */
Matches 1,863 surveys (30.5%), 442 strongly intend to continue working (23.7%), 14.3% of all strongly intend to continue working, 46.9% of all not strongly intending to continue
RULE: If (Q8 <> 1 and Q5 <> 1 or 2) Then not strongly intending to continue working at club
P(0) = 0.76; P(1) = 0.24; Lift 0.47
If don't strongly agree that feel welcome and don't strongly agree that there are good opportunities for professional growth, then don't strongly agree that intend to continue working at the club.
/* Rules for terminal node 2 */
Matches 634 surveys (10.4%), 224 strongly intend to continue working (35.3%), 7.3% of all strongly intend to continue working, 13.5% of all not strongly intending to continue working
RULE: If (Q8 <> 1 and Q5 = 1 or 2 and Q7 <> 1 or 2) Then not strongly intending to continue working at club
P(0) = 0.65; P(1) = 0.35; Lift 0.70
If don't strongly agree that feel welcome and don't strongly agree that if I do a good job I'll be recognized, even though I strongly agree that there are good opportunities for professional growth, then don't strongly agree that intend to continue working at the club.

42 Summary of Results
Satisfaction Model
  Top two rules identify 65% of most satisfied
  Top three rules identify 79% of most satisfied
Recommend to Friend
  Top three rules identify 66% of most likely to recommend to friend
Intend to Keep Working at Club
  Top three rules identify 63% of most likely to keep working
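These coverage figures follow from the per-rule percentages on the key-rules slides: add up the share of all target cases captured by the top rules. In the sketch below the rules are ranked by P(1), which is an assumption; it is the ordering under which the top two satisfaction rules (nodes 8 and 4) sum to about 65%.

```python
# (terminal node, P(1) in node, percent of all target cases captured), from the rules slides
satisfaction_rules = [(8, 0.61, 58.4), (7, 0.42, 13.7), (4, 0.44, 6.5)]
recommend_rules = [(10, 0.719, 51.6), (9, 0.585, 5.1), (5, 0.566, 9.2)]
intend_rules = [(10, 0.81, 35.6), (9, 0.70, 15.8), (5, 0.68, 11.3)]


def cumulative_coverage(rules, top_n):
    """Percent of all target cases captured by the top_n rules, ranked by P(1)."""
    ranked = sorted(rules, key=lambda rule: rule[1], reverse=True)
    return round(sum(captured for _, _, captured in ranked[:top_n]), 1)


summary = {
    "satisfaction_top2": cumulative_coverage(satisfaction_rules, 2),
    "satisfaction_top3": cumulative_coverage(satisfaction_rules, 3),
    "recommend_top3": cumulative_coverage(recommend_rules, 3),
    "intend_top3": cumulative_coverage(intend_rules, 3),
}
```

Rounded to whole percentages, the four values agree with the 65%, 79%, 66% and 63% on the slide.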

43 Summary of Results
Satisfaction keys:
  Make an environment where employees feel welcome and have a sense of purpose
Recommend to a Friend keys:
  Supervisors treat employees with respect, and either pay is good or it is perceived that members really like the club
Will work at club in a year's time:
  For those under 35: feel welcome (relationships)
  For those over 35 (or worked at club a long time): feel welcome and good working conditions
  For those who don't feel welcome, need good opportunities for professional growth

44 Conclusions
Trees can be used to provide concise summaries of behavioral tendencies from surveys
  Regression shows global, average attitudes
  Trees show specific, localized attitudes
Two or three rules can describe nearly 2/3 of all employee attitudes of interest
  Rules make sense, and are easy to explain
  Rules are actionable


More information

Application of SAS! Enterprise Miner in Credit Risk Analytics. Presented by Minakshi Srivastava, VP, Bank of America

Application of SAS! Enterprise Miner in Credit Risk Analytics. Presented by Minakshi Srivastava, VP, Bank of America Application of SAS! Enterprise Miner in Credit Risk Analytics Presented by Minakshi Srivastava, VP, Bank of America 1 Table of Contents Credit Risk Analytics Overview Journey from DATA to DECISIONS Exploratory

More information

How to Get More Value from Your Survey Data

How to Get More Value from Your Survey Data Technical report How to Get More Value from Your Survey Data Discover four advanced analysis techniques that make survey research more effective Table of contents Introduction..............................................................2

More information

How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK

How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK Agenda Analytics why now? The process around data and text mining Case Studies The Value of Information

More information

Customer and Business Analytic

Customer and Business Analytic Customer and Business Analytic Applied Data Mining for Business Decision Making Using R Daniel S. Putler Robert E. Krider CRC Press Taylor &. Francis Group Boca Raton London New York CRC Press is an imprint

More information
