Survey Analysis: Data Mining versus Standard Statistical Analysis for Better Analysis of Survey Responses

Salford Systems Data Mining 2006, March 27-31, 2006, San Diego, CA
By Dean Abbott, Abbott Analytics
http://www.abbottanalytics.com

Acknowledgements
  Work done under contract with Seer Analytics
  Subcontractors: Tessar and Associates (now Mobile Foundry), Abbott Consulting (now Abbott Analytics)
  Seer Analytics, LLC ("we help you see what's there")
  518 North Tampa Street, Tampa, FL 33602
  813-318-0111
  http://www.seeranalytics.com
  http://www.mobilefoundry.net

About Abbott Analytics
  Abbott Analytics
    Founded in 1999, based in San Diego, CA
    Dedicated to data mining consulting and training
  Principal: Dean Abbott
    Applied data mining for 19+ years in direct marketing, CRM, survey analysis, tax compliance, fraud detection, predictive toxicology, biological risk assessment
  Course instruction
    Public 2-day data mining courses
    Conference tutorials
  Customized training and knowledge transfer
    Data mining methodology (CRISP-DM)
    Training services for software products, including CART, Clementine, Affinium Model, Insightful Miner

Talk Outline
  Member survey
    Survey description
    Results using statistical modeling
    Lessons learned
  Employee survey
    Survey description
    Results using decision trees (CART)
    Lessons learned

Problem Setup: Member Survey
  Question: What are the characteristics of members who indicated the highest overall satisfaction with their Club?
  Data: 32,811 records containing survey answers
    No demographic data except what was on the survey (marital status, children, age, gender)
  Approach: Create supervised learning models with target variable overall_satisfaction = 1

Data Preparation
  Begin with 57 candidate inputs to model
  All survey questions are multiple choice
    Treated as categories, not numbers
    Typically 6 categories per question (answers 1-5, plus unknown, initially coded as 0)
  No text comment fields included as inputs to model
  Create new column for target variable
    If overall_satisfaction = 1, variable value = 1; otherwise, variable value = 0
  Data very clean with respect to missing data
    Only needed to recode the # children fields
    Number missing: 11,006 children < 6; 10,701 children 6-12; 10,873 children 13-17; 4,936 children (overall)
    When missing, recoded values with -1 to indicate missing
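The two recoding steps on this slide (binary target, -1 for missing child counts) can be sketched as follows. This is a minimal illustration and not the project's actual code: the field names (`overall_satisfaction`, `children_under_6`), the `target` column name, and the use of `None` for a blank answer are all assumptions.

```python
def prepare_record(record, children_fields=("children_under_6",)):
    """Return a copy of one survey record with a binary target added and
    missing #-children fields recoded to -1 (field names are hypothetical)."""
    prepared = dict(record)
    # Target is 1 only when the respondent gave the top rating (code 1).
    prepared["target"] = 1 if record.get("overall_satisfaction") == 1 else 0
    # A blank child count becomes -1 so "missing" stays a distinct category.
    for field in children_fields:
        if prepared.get(field) is None:
            prepared[field] = -1
    return prepared


# Hypothetical records; 0 is the "unknown" answer code mentioned on the slide.
sample = [
    {"overall_satisfaction": 1, "children_under_6": 2},
    {"overall_satisfaction": 3, "children_under_6": None},
    {"overall_satisfaction": 0, "children_under_6": 0},
]
prepared_sample = [prepare_record(r) for r in sample]
```

Keeping -1 as its own code, rather than imputing a number, lets a categorical model treat "did not answer" as a separate level.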

Member Survey Question Categories

Sampling
  Begin with 32,811 responses
  Set aside about half for validation (not used during modeling): 16,379 records
    These records will be used to provide final summaries of the segments
  16,433 records used in creating and scoring model
    5,059 had overall satisfaction = 1 (30.8%)
  Model 1 splits data into training and testing data
    Stated split: 2/3 for training (creating model), 1/3 for testing (scoring and ranking models)
    Approximately 11,503 for training; 4,930 for testing (about 70% and 30% of the 16,433 modeling records)
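A holdout-then-split scheme like the one on this slide could look roughly like the sketch below. The function name, the seed, and the exact fractions are assumptions: the slide says "about half" for validation, and the 0.7 training fraction is chosen only because the reported counts (11,503 and 4,930) are about 70% and 30% of 16,433. The resulting counts will not match the slide exactly.

```python
import random


def split_indices(n_records, holdout_fraction, train_fraction, seed=0):
    """Shuffle record indices, set aside a validation holdout, then split
    the remaining modeling records into training and testing sets."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # seeded for repeatability
    n_holdout = round(n_records * holdout_fraction)
    holdout = indices[:n_holdout]
    modeling = indices[n_holdout:]
    n_train = round(len(modeling) * train_fraction)
    return holdout, modeling[:n_train], modeling[n_train:]


holdout, train, test = split_indices(32811, holdout_fraction=0.5, train_fraction=0.7)
```

The holdout indices are never touched during modeling, which is what allows them to be used later for final segment summaries.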

Relationship of Overall Satisfaction to Recommend to Friends
  Of the 4912 / 16739 (30.2%) with Overall Satisfaction = 1, 86% have Recommend to Friends = 1
  Of the 8708 / 16739 (54%) with Recommend to Friends = 1, 49% have Overall Satisfaction = 1
  4227 / 16739 (26.0%) have overall satisfaction and recommend to friends both equal to 1
  This is the biggest bin of the cross tab, followed by
    Overall = 2 / Recommend = 2 (24%; 3890 / 16739)
    Overall = 2 / Recommend = 1 (22%; 3565 / 16739)
  No other bin greater than 5% of records
  [Cross-tab plot: Recommend to Friend (0-5) versus Overall satisfaction (0-4)]

Objective and Data Challenges
  Project objective
    Interpret results of survey for large health club (not a predictive model)
  Challenges
    Missing data (some questions either N/A or blank)
      Solution: Impute values that least affect the information communicated by the question (not a mean or median!)
    Answers (target variables) highly correlated with one another
      Multicollinearity makes interpretation of results problematic
      Must reduce dimensionality without losing interpretation of results
      Solution: Factor analysis
    Target variable
      Three questions pointed to the important actionable information (related to how satisfied members were)
      Solution: Combine all three into a new index of excellence

Data Preprocessing Approach
  Reduce input data (for understanding)
    Use factor analysis to identify groupings of variables that are interesting
    Factors can be candidate inputs to models, but didn't work as well on this data
    Selected as inputs those variables with the highest loadings, as representative of that type of factor
    Also retained key questions in addition to the factor analysis representative questions
  The effect is to remove questions too highly correlated with one another, while maintaining relevant information for modeling.
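Picking, for each factor, the question with the highest loading can be sketched as below. The loadings are hypothetical numbers made up for illustration (the slides do not list the actual values), and using the absolute value of the loading is an assumption about how "highest" was judged.

```python
def representatives(loadings):
    """loadings maps question -> list of loadings, one per factor.
    Returns {factor index: question with the largest absolute loading}."""
    n_factors = len(next(iter(loadings.values())))
    best = {}
    for factor in range(n_factors):
        best[factor] = max(loadings, key=lambda q: abs(loadings[q][factor]))
    return best


# Hypothetical loadings for four questions on two factors.
hypothetical_loadings = {
    "Q2": [0.82, 0.10],
    "Q3": [0.75, 0.05],
    "Q12": [0.20, 0.64],
    "Q13": [0.15, -0.71],
}
chosen = representatives(hypothetical_loadings)
```

With these made-up numbers, Q2 stands in for the first factor and Q13 for the second, so Q3 and Q12 would be dropped as redundant with their factor's representative.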

Predictive Modeling Approach
  [Process diagram, labels as extracted]
  60+ survey questions
  Identify key questions: 3 key questions (3 questions with high association with target)
  Factor analysis: 10 factors (10 factors, or variables that loaded highest on each factor)
  Regression model: find significant variables (13 fields down to 7)
  Regression model: find significant variables (variable ranks)

Factor Analysis: Making the Complex Simple
  [Charts: loadings for Factor1 through Factor10; Factor 1 top question loadings (Q2 through Q12); Factor 2 top question loadings (Q12 through Q20, Q23)]

Member Survey Factor Analysis Loadings

Reduce Variables using Regression
  Already beginning with only 13 variables
  Question: how many of these are useful predictors?
  Decided to retain 5 factors for final model
  [Chart: Regression Rankings of Questions/Factors; y-axis: regression coefficient (0 to 0.6); x-axis in plotted order: Q44, Q22, Q25, factor3.2, factor3.9, factor3.1, factor3.4, factor3.3, factor3.8, factor3.10, factor3.6, factor3.5, factor3.7]

Explaining Results Through Visualization
  Customer was not interested in techno solutions
  Customer was interested in what actions could be taken as a result of the data mining models
    Which characteristics are most correlated with best customers?
    What do they like and dislike about the club? Is it equipment? Relationships? Facility? Staff?
  Show key contributors, how each club compared with other club locations, and whether the club is improving

Key: Explaining Results
  Visualization shows key variables in survey associated with excellence, and performance metrics for each club
    How well did this club do?
    What is the change over last year's result?
  Shows which attributes the club needs to improve to raise customer satisfaction
  [Chart: Drivers of Satisfaction: Staff 2, Staff 1, equipment, facility, value, relationships, goals]

So What's the Problem with That?
  Regression and neural networks are global estimators
    They operate over the entire data space
    Descriptors of regression represent average influence
  Neither technique provides explicit localized characteristics
  Customer would like actionable analytics
    Clear characteristics of subgroups
    Different strategies for subgroups
  Conclusion: In Round 2 (Employee Survey), use another approach

Employee Survey Analysis Problem Setup
  Very similar to member survey
    60+ questions
    Few demographics
    Attitudes toward the job
  How to handle questions
    They are ordinal, but CART supports interval and nominal types
    Treat as categorical, but make sure values aren't split up
    If a split groups non-adjacent values of a question (e.g., 1, 2, 4), rebuild with that question as an interval variable
    Didn't happen this way, though; all worked out well

Employee Survey Question Groupings

Employee Survey: Target Variable Definition
  Predict key attitudes that are consequents
    Satisfaction
    Recommend to a Friend
    Intend to Work Next Year at Club
    Club is Good Place to Work
  Exclude these from each other's models
    They are highly correlated with each other
    Models that predict a target variable with these as inputs are not actionable
  Key predictors, questions relating to:
    Communications with management
    Quality of supervisors
    Training received
    Effectiveness of club
    Fairness of policies
    Perceived member attitudes

Employee Satisfaction (=1) Model: Data Information
  File: modeling data with binarized dependents w missing.txt
  Target Variable: Q1_1
  Predictor Variables: Q66, Q67, Q68, Q69, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q11, Q12, Q13, Q14, Q15, Q16, Q17, Q18, Q20, Q21, Q22, Q23, Q24, Q25, Q26, Q27, Q28, Q29, Q30, Q31, Q32, Q33, Q34, Q35, Q36, Q37, Q38, Q45, Q46, Q47, Q48, Q49, Q50, Q51, Q52, Q53, Q54, Q55, Q56, Q57, Q58, Q59, Q60, Q61, Q62, Q63, Q64, Q65

  Class  N Cases  Pct Cases
  0      4,645    76.0%
  1      1,470    24.0%

Employee Satisfaction Model: Performance

  Class  N Cases  N Misclassified  Pct. Misclassified
  0      4,645    953              20.52
  1      1,470    315              21.43

  Node  Cases Tgt. Class  % of Node Tgt. Class  % Tgt. Class  Cum % Tgt. Class  Cum % Pop  % Pop  Cases in Node  Cum Lift  Lift
  8     859               60.75                 58.44         58.44             23.12      23.12  1,414          2.53      2.53
  4     95                43.58                 6.46          64.90             26.69      3.57   218            2.43      1.81
  7     201               42.23                 13.67         78.57             34.47      7.78   476            2.28      1.76
  3     30                17.44                 2.04          80.61             37.29      2.81   172            2.16      0.73
  5     92                14.38                 6.26          86.87             47.75      10.47  640            1.82      0.60
  6     14                13.86                 0.95          87.82             49.40      1.65   101            1.78      0.58
  2     124               10.12                 8.44          96.26             69.44      20.03  1,225          1.39      0.42
  1     55                2.94                  3.74          100.00            100.00     30.56  1,869          1.00      0.12
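The lift columns in the performance table can be recomputed from the per-node counts alone. The sketch below is an illustration of that arithmetic, not CART's own code; the function and key names are invented here, while the node counts are the ones reported on the slide.

```python
def gains_table(nodes):
    """nodes: list of (node_id, target_cases, cases_in_node).
    Returns rows ordered by node target rate with lift and cumulative lift."""
    total_cases = sum(cases for _, _, cases in nodes)
    total_target = sum(target for _, target, _ in nodes)
    base_rate = total_target / total_cases
    rows, cum_target, cum_cases = [], 0, 0
    ordered = sorted(nodes, key=lambda r: r[1] / r[2], reverse=True)
    for node_id, target, cases in ordered:
        cum_target += target
        cum_cases += cases
        rows.append({
            "node": node_id,
            "pct_of_node": 100 * target / cases,
            "pct_of_target": 100 * target / total_target,
            "cum_pct_target": 100 * cum_target / total_target,
            "cum_pct_pop": 100 * cum_cases / total_cases,
            "lift": (target / cases) / base_rate,
            "cum_lift": (cum_target / cum_cases) / base_rate,
        })
    return rows


# (node, highly satisfied cases, cases in node) from the satisfaction model.
satisfaction_nodes = [
    (8, 859, 1414), (4, 95, 218), (7, 201, 476), (3, 30, 172),
    (5, 92, 640), (6, 14, 101), (2, 124, 1225), (1, 55, 1869),
]
table = gains_table(satisfaction_nodes)
```

Lift is the node's target rate divided by the overall rate (1,470 / 6,115, about 24%), so node 8's 60.75% rate gives a lift of about 2.53, in line with the table.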

Employee Satisfaction Model: Splits
  [Tree diagram: splits on Q8, Q7, Q3, Q36, Q18, Q3, Q32; terminal nodes 1-8]
  Q8: Feel welcome
    Surrogates: Q27 (family friendly), Q28 (inclusive environment), Q18 (good working conditions)
  Q18: Good working conditions
    Surrogate: Q17 (necessary support/materials to do job)
  Q3: Feeling of accomplishment
    Surrogate: Q6 (responsibilities good fit with interests/skills)
  Q7: Staff competent
    Surrogates: Q15 (supervisor lets know work is appreciated), Q33 (trust management to take interests into account), Q5 (good opportunities for professional growth)

Employee Satisfaction: Q8 Split (root node)

  Competitor  Variable  Split  Improvement
  Winner      Q8        1      0.1174
  1           Q18       1      0.1169
  2           Q3        1      0.0998
  3           Q35       1      0.0957
  4           Q6        1      0.0951
  5           Q7        1,2    0.094

  Split value 1 on Q8: strongly agree feel welcome
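For readers unfamiliar with the "Improvement" column, the sketch below computes a plain, unweighted Gini impurity decrease for the root split on Q8. It is not expected to reproduce the reported 0.1174, because CART's improvement depends on settings (such as class priors) that the slides do not state. The child counts are an assumption: they are obtained by summing the terminal nodes of the performance table, taking nodes 5-8 as the Q8 = 1 side and nodes 1-4 as the other side (the rules confirm this for nodes 1, 2, 4, 5, 7 and 8; nodes 3 and 6 are inferred).

```python
def gini(target_cases, cases):
    """Two-class Gini impurity, 2p(1 - p), for a node."""
    p = target_cases / cases
    return 2 * p * (1 - p)


def gini_improvement(left, right):
    """left and right are (target_cases, cases) for the two child nodes.
    Returns parent impurity minus the size-weighted child impurity."""
    target = left[0] + right[0]
    total = left[1] + right[1]
    weighted_children = (
        (left[1] / total) * gini(*left) + (right[1] / total) * gini(*right)
    )
    return gini(target, total) - weighted_children


q8_strongly_agree = (1166, 2631)  # assumed: terminal nodes 5, 6, 7, 8 combined
q8_other = (304, 3484)            # assumed: terminal nodes 1, 2, 3, 4 combined
improvement = gini_improvement(q8_strongly_agree, q8_other)
```

Competitor splits are ranked by this kind of quantity: the close values for Q8 (0.1174) and Q18 (0.1169) on the slide indicate that either question would have separated the satisfied group almost equally well at the root.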

Employee Satisfaction: Q18 Split (right side of root)

  Competitor  Variable  Split  Improvement
  Winner      Q18       1      0.0271
  1           Q3        1      0.0203
  2           Q35       1      0.0195
  3           Q6        1      0.0177
  4           Q14       1,5    0.0172
  5           Q13       1,5    0.0167

  Parent branch: strongly agree feel welcome
  This is the best terminal node for satisfaction

Employee Satisfaction Model: Key Variables

  Variable  Score
  Q18       100
  Q8        81.02
  Q14       72.03
  Q27       55.11
  Q26       50.53
  Q28       50.12
  Q5        17.66
  Q3        14.14
  Q17       14.05
  Q11       13.15
  Q7        11.89
  Q13       11.56
  Q6        11.27
  Q33       11.03
  Q16       9.6

  Primary splitters only
  Variable  Score
  Q8        100
  Q18       23.11
  Q3        17.46
  Q7        14.68
  Q36       2.88
  Q32       2.68

  Q8: Feel welcome
    Surrogates: Q27 (family friendly), Q28 (inclusive environment), Q18 (good working conditions)
  Q18: Good working conditions
    Surrogate: Q17 (necessary support/materials to do job)
  Q3: Feeling of accomplishment
    Surrogate: Q6 (responsibilities good fit with interests/skills)
  Q7: Staff competent
    Surrogates: Q15 (supervisor lets know work is appreciated), Q33 (trust management to take interests into account), Q5 (good opportunities for professional growth)

Employee Satisfaction Model: Key Rules
  /* Rules for terminal node 8 */
    Matches 1,414 surveys (23.1%), 859 highly satisfied (60.8%), 58.4% of all highly satisfied
    RULE: If (Q18 = 1 and Q8 = 1) Then Highly Satisfied
    P(0) = 0.39; P(1) = 0.61; Lift 2.5
    If strongly agree that there are good working conditions and strongly agree that feel welcome, then highly satisfied
  /* Rules for terminal node 7 */
    Matches 476 surveys (7.8%), 201 highly satisfied (42.2%), 13.7% of all highly satisfied
    RULE: If (Q8 = 1 and Q18 <> 1 and Q3 = 1 and Q32 = 1 or 2) Then Highly Satisfied
    P(0) = 0.58; P(1) = 0.42; Lift 1.8
    If strongly agree that feel welcome and strongly agree working at the club gives feeling of personal accomplishment, and agree management will take interests into account, even if don't strongly agree good working conditions, then highly satisfied
  /* Rules for terminal node 4 */
    Matches 218 surveys (3.6%), 95 highly satisfied (43.6%), 6.5% of all highly satisfied
    RULE: If (Q8 <> 1 and Q7 = 1 or 2 and Q3 = 1 and Q36 = 1 or 2) Then Highly Satisfied
    P(0) = 0.56; P(1) = 0.44; Lift 1.8
    If agree that I'll be recognized for doing a good job, and strongly agree working at the club gives feeling of personal accomplishment, and agree that am paid fairly, even if don't strongly agree feel welcome, then highly satisfied
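Because each rule is a conjunction of simple conditions, the key rules can be applied directly to a respondent's answers. The sketch below is one way to encode them. It assumes answers are stored as a dict of question code to integer response (1 = strongly agree), and it reads "= 1 or 2" as "answer is 1 or 2"; both are assumptions, as are the names `KEY_RULES` and `matching_node`.

```python
# Terminal-node rules for "highly satisfied", in the order listed on the slide.
KEY_RULES = [
    (8, lambda a: a["Q18"] == 1 and a["Q8"] == 1),
    (7, lambda a: a["Q8"] == 1 and a["Q18"] != 1
        and a["Q3"] == 1 and a["Q32"] in (1, 2)),
    (4, lambda a: a["Q8"] != 1 and a["Q7"] in (1, 2)
        and a["Q3"] == 1 and a["Q36"] in (1, 2)),
]


def matching_node(answers):
    """Return the terminal node whose rule the answers satisfy, or None
    if the respondent falls outside the three 'highly satisfied' rules."""
    for node, rule in KEY_RULES:
        if rule(answers):
            return node
    return None


# Hypothetical respondents (answer codes invented for illustration).
welcome_and_good_conditions = {"Q8": 1, "Q18": 1, "Q3": 3, "Q32": 3, "Q7": 3, "Q36": 3}
welcome_and_accomplished = {"Q8": 1, "Q18": 2, "Q3": 1, "Q32": 2, "Q7": 3, "Q36": 3}
recognized_and_paid_fairly = {"Q8": 2, "Q18": 2, "Q3": 1, "Q32": 4, "Q7": 2, "Q36": 1}
none_of_the_above = {"Q8": 3, "Q18": 3, "Q3": 3, "Q32": 3, "Q7": 3, "Q36": 3}
```

The three rules are mutually exclusive (nodes 8 and 7 differ on Q18, and node 4 requires Q8 <> 1), so the order in which they are checked does not change the result.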

Employee Satisfaction Model: Unsatisfied Rules
  /* Rules for terminal node 1 */
    Matches 1,869 surveys (30.6%), 55 highly satisfied (2.9%), 3.7% of all highly satisfied, 39.0% of all not highly satisfied
    RULE: If (Q8 <> 1 and Q7 <> 1 or 2) Then not highly satisfied
    P(0) = 0.96; P(1) = 0.04; Lift 0.12
    If don't strongly agree that feel welcome and don't agree that will be properly recognized for a good job, then not highly satisfied.
  /* Rules for terminal node 2 */
    Matches 1,225 surveys (20.0%), 124 highly satisfied (10.1%), 8.4% of all highly satisfied, 23.7% of all not highly satisfied
    RULE: If (Q8 <> 1 and Q7 = 1 or 2 and Q3 <> 1) Then not highly satisfied
    P(0) = 0.90; P(1) = 0.10; Lift 0.42
    If don't strongly agree that feel welcome and work doesn't give a feeling of accomplishment, even though I agree that I will be properly recognized for a good job, then not highly satisfied.
  /* Rules for terminal node 5 */
    Matches 640 surveys (10.5%), 92 highly satisfied (14.4%), 6.3% of all highly satisfied, 11.8% of all not highly satisfied
    RULE: If (Q8 = 1 and Q18 <> 1 and Q3 <> 1) Then not highly satisfied
    P(0) = 0.86; P(1) = 0.14; Lift 0.60
    If don't strongly agree that there are good working conditions and work doesn't give a feeling of accomplishment, even though strongly agree that feel welcome, then not highly satisfied.

Recommend to Friend (=1) Model: Data Information
  File: modeling data with binarized dependents w missing.txt
  Target Variable: Q44_1
  Predictor Variables: Q66, Q67, Q68, Q69, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q11, Q12, Q13, Q14, Q15, Q16, Q17, Q18, Q19, Q20, Q21, Q22, Q23, Q24, Q25, Q26, Q27, Q28, Q29, Q30, Q31, Q32, Q33, Q34, Q35, Q36, Q37, Q38, Q45, Q46, Q47, Q48, Q49, Q50, Q51, Q52, Q53, Q54, Q55, Q56, Q57, Q58, Q59, Q60, Q61, Q62, Q63, Q64, Q65

  Class  N Cases  Pct
  0      3,958    64.7%
  1      2,157    35.3%

  This model includes Q19 (am treated with respect), and is the best model to report

Recommend to Friend Model: Performance

  Class  N Cases  N Misclassified  Pct. Misclassified
  0      3,958    894              22.59
  1      2,157    525              24.34

  Node  Cases Tgt. Class  % of Node Tgt. Class  % Tgt. Class  Cum % Tgt. Class  Cum % Pop  % Pop  Cases in Node  Cum Lift  Lift
  10    1,113             71.90                 51.60         51.60             25.32      25.32  1,548          2.04      2.04
  9     110               58.51                 5.10          56.70             28.39      3.07   188            2.00      1.66
  5     198               56.57                 9.18          65.88             34.11      5.72   350            1.93      1.60
  4     128               49.81                 5.93          71.81             38.32      4.20   257            1.87      1.41
  8     83                45.36                 3.85          75.66             41.31      2.99   183            1.83      1.29
  3     215               29.49                 9.97          85.63             53.23      11.92  729            1.61      0.84
  7     36                24.83                 1.67          87.30             55.60      2.37   145            1.57      0.70
  2     132               15.60                 6.12          93.42             69.44      13.84  846            1.35      0.44
  6     12                14.12                 0.56          93.97             70.83      1.39   85             1.33      0.40
  1     130               7.29                  6.03          100.00            100.00     29.17  1,784          1.00      0.21

Recommend to Friend Model: Splits
  [Tree diagram: splits on Q19, Q33, Q37, Q45, Q5, Q8, Q35, Q45, Q50; terminal nodes 1-10]
  Q19: Treated with respect
    Surrogates: Q18 (good working conditions) and Q8 (feel welcome)
  Q37: Compensation practice is fair
    Surrogate: Q36 (I am paid fairly)
  Q45: How think members rate club
    Surrogates: Q47, Q46, Q60 (member-cleanliness, enough equip., check on progress)
  Q33: Trust management to take interests into account
    Surrogates: Q32 (management keeps promises), Q34 (leaders remove roadblocks to inclusion)
  Q5: Good opportunities for professional growth
    Surrogates: Q4 (responsibilities good fit with interests), Q7 (appropriately recognized)
  Q8: Feel welcome
    Surrogate: Q7

Recommend to Friend Model: Key Variables

  Variable  Score
  Q8        100.0
  Q19       99.1
  Q18       97.4
  Q15       64.5
  Q16       63.1
  Q14       61.3
  Q33       39.6
  Q35       33.8
  Q32       24.7
  Q34       23.9
  Q31       23.9
  Q9        21.5
  Q7        15.4
  Q45       14.8
  Q37       12.9
  Q5        10.0
  Q36       9.7
  Q4        4.3
  Q38       4.0
  Q22       1.6
  Q50       1.4
  Q26       1.0
  Q48       0.8
  Q47       0.7
  Q28       0.6
  Q46       0.6
  Q11       0.3
  Q51       0.3
  Q60       0.1
  Q49       0.0

  Primary splitters only
  Variable  Score
  Q19       100
  Q33       32.23
  Q45       14.94
  Q37       12.99
  Q5        8.98
  Q8        3.03
  Q35       1.67
  Q50       1.34

  Q19: Treated with respect
    Surrogates: Q18 (good working conditions) and Q8 (feel welcome)
  Q37: Compensation practice is fair
    Surrogate: Q36 (I am paid fairly)
  Q45: How think members rate club
    Surrogates: Q47, Q46, Q60 (member-cleanliness, enough equip., check on progress)
  Q33: Trust management to take interests into account
    Surrogates: Q32 (management keeps promises), Q34 (leaders remove roadblocks to inclusion)
  Q5: Good opportunities for professional growth
    Surrogates: Q4 (responsibilities good fit with interests), Q7 (appropriately recognized)
  Q8: Feel welcome
    Surrogate: Q7

Recommend to Friend Model: Key Rules
  /* Rules for terminal node 10 */
    Matches 1,548 surveys (25.3%), 1,113 recommend (71.9%), 51.6% of all strong recommends
    RULE: If (Q19 = 1 and Q37 = 1 or 2) Then Recommend = 1
    P(0) = 0.281; P(1) = 0.719; Lift = 2.0
    If strongly agree that supervisors treat me with respect, and agree that compensation practice is fair, then strongly agree that will recommend to friend.
  /* Rules for terminal node 9 */
    Matches 188 surveys (3.1%), 110 recommend (58.5%), 5.1% of all strong recommends
    RULE: If (Q19 = 1 and Q37 <> 1 or 2 and Q45 = 1) Then Recommend = 1
    P(0) = 0.415; P(1) = 0.585; Lift = 1.7
    If strongly agree that supervisors treat me with respect, and believe that members strongly agree they are highly satisfied, even though don't agree compensation practice is fair, then strongly agree that will recommend to friend.
  /* Rules for terminal node 5 */
    Matches 350 surveys (5.7%), 198 recommend (56.6%), 9.2% of all strong recommends
    RULE: If (Q19 <> 1 and Q33 = 1 or 2 and Q45 = 1) Then Recommend = 1
    P(0) = 0.434; P(1) = 0.566; Lift = 1.6
    If agree that trust management will take my interests into account, and believe that members strongly agree they are highly satisfied, even though don't strongly agree supervisors treat me with respect, then strongly agree that will recommend to friend.

Recommend to Friend Model: Rules for Not Recommending
  /* Rules for terminal node 1 */
    Matches 1,784 surveys (29.2%), 130 highly recommend (7.3%), 94% don't highly rec., 6.0% of all highly recommend
    RULE: If (Q31 <> 1 and Q22 <> 1) Then Don't Strongly Recommend
    P(0) = 0.94; P(1) = 0.06
    If don't strongly agree that supervisors treat me with respect, and don't agree that management will take interests into account, then don't strongly agree that will recommend to friend.
  /* Rules for terminal node 2 */
    Matches 846 surveys (13.84%), 132 highly recommend (15.6%), 84.4% don't highly rec., 6.1% of all highly recommend
    RULE: If (Q19 <> 1 and Q33 = 1 or 2 and Q45 <> 1 and Q5 <> 1 or 2) Then Don't Strongly Recommend
    P(0) = 0.84; P(1) = 0.16
    If don't strongly agree that supervisors treat me with respect, and don't strongly believe that members are highly satisfied, and don't agree that there are good opportunities for professional growth, then even though agree that management will take interests into account, don't strongly agree that will recommend to friend.

Intend to Continue Working at Club (=1) Model: Data Information
  File: modeling data with binarized dependents w missing.txt
  Target Variable: Q39_1
  Predictor Variables: Q66, Q67, Q68, Q69, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q11, Q12, Q13, Q14, Q15, Q16, Q17, Q18, Q20, Q21, Q22, Q23, Q24, Q25, Q26, Q27, Q28, Q29, Q30, Q31, Q32, Q33, Q34, Q35, Q36, Q37, Q38, Q45, Q46, Q47, Q48, Q49, Q50, Q51, Q52, Q53, Q54, Q55, Q56, Q57, Q58, Q59, Q60, Q61, Q62, Q63, Q64, Q65

  Class  N Cases  Pct
  0      3,030    49.6%
  1      3,085    50.4%

Intend to Continue Working at Club: Model Performance

  Class  N Cases  N Misclassified  Pct. Misclass
  0      3,030    868              28.65
  1      3,085    849              27.52

  Node  Cases Tgt. Class  % of Node Tgt. Class  % Tgt. Class  Cum % Tgt. Class  Cum % Pop  % Pop  Cases in Node  Cum Lift  Lift
  10    1,099             80.81                 35.62         35.62             22.24      22.24  1,360          1.60      1.60
  9     486               69.63                 15.75         51.38             33.66      11.42  698            1.53      1.38
  5     349               67.38                 11.31         62.69             42.13      8.47   518            1.49      1.34
  8     100               65.36                 3.24          65.93             44.63      2.50   153            1.48      1.30
  4     202               53.87                 6.55          72.48             50.76      6.13   375            1.43      1.07
  7     75                43.86                 2.43          74.91             53.56      2.80   171            1.40      0.87
  2     224               35.33                 7.26          82.17             63.93      10.37  634            1.29      0.70
  3     43                33.59                 1.39          83.57             66.02      2.09   128            1.27      0.67
  6     65                30.23                 2.11          85.67             69.53      3.52   215            1.23      0.60
  1     442               23.73                 14.33         100.00            100.00     30.47  1,863          1.00      0.47

Intend to Continue Working at Club Model: Splitters
  [Tree diagram: splits on Q8, Q5, Q7, Q66, Q56, Q18, Q5, Q6, Q69; terminal nodes 1-10]
  Q8: Feel welcome
    Surrogates: Q27 (family friendly place), Q28 (diverse environment), Q18 (good working conditions)
  Q69: Age
    Surrogates: Q66 (how long worked at Club), Q68 (education)
  Q18: Good working conditions
    Surrogate: Q17 (have necessary support and materials to do job)
  Q5: Good opportunities for professional growth
    Surrogates: Q7, Q33 (management will take my interests into account)
  Q7: Will be recognized for good job
    Surrogate: Q15 (work is appreciated)

Intend to Continue Working at Club Model: Key Variables

  Variable  Score
  Q8        100
  Q18       84.13
  Q27       63.23
  Q11       57.03
  Q28       50.45
  Q26       48.54
  Q7        43.43
  Q5        37.23
  Q33       32.81
  Q31       23.56
  Q69       22.21
  Q4        21.86
  Q9        18.79
  Q3        13.82
  Q13       9.98
  Q14       9.46
  Q16       8.12
  Q15       6.03
  Q66       5.26
  Q17       3.99
  Q56       2.15
  Q6        2.03
  Q23       1.63
  Q68       1.23

  Primary splitters only
  Variable  Score
  Q8        100
  Q5        37.07
  Q69       17.48
  Q7        11.24
  Q18       10.7
  Q66       5.19
  Q56       2.15
  Q6        2.03

  Q8: Feel welcome
    Surrogates: Q27 (family friendly place), Q28 (diverse environment), Q18 (good working conditions)
  Q69: Age
    Surrogates: Q66 (how long worked at Club), Q68 (education)
  Q18: Good working conditions
    Surrogate: Q17 (have necessary support and materials to do job)
  Q5: Good opportunities for professional growth
    Surrogates: Q7, Q33 (management will take my interests into account)
  Q7: Will be recognized for good job
    Surrogate: Q15 (work is appreciated)

Intend to Continue Working at Club Model: Key Rules
  /* Rules for terminal node 10 */
    Matches 1,360 surveys (22.2%), 1,099 intend to continue (80.8%), 35.6% of all intend to continue
    RULE: If (Q8 = 1 and Q69 >= 2.5) Then Intend to continue
    P(0) = 0.19; P(1) = 0.81; Lift = 1.6
    If strongly agree that feel welcome and am 35 years old or older, then strongly agree that intend to continue working at the club.
  /* Rules for terminal node 9 */
    Matches 698 surveys (11.4%), 486 intend to continue (69.6%), 15.8% of all intend to continue
    RULE: If (Q8 = 1 and Q18 = 1 and Q69 <= 2.5) Then Intend to continue
    P(0) = 0.30; P(1) = 0.70; Lift = 1.4
    If strongly agree that feel welcome and strongly agree that there are good working conditions, am older than 35 years old, then strongly agree that intend to continue working at the club.
  /* Rules for terminal node 5 */
    Matches 518 surveys (8.5%), 349 intend to continue (67.4%), 11.3% of all intend to continue
    RULE: If (Q8 <> 1 and Q5 = 1 or 2 and Q7 = 1 or 2 and Q66 > 2.5) Then Intend to continue
    P(0) = 0.32; P(1) = 0.68; Lift = 1.3
    If I strongly agree that if I do a good job I'll be recognized, and I strongly agree that there are good opportunities for professional growth, and I have worked at the club for more than 2 years, even though don't strongly agree that feel welcome, then I strongly agree that intend to continue working at the club.

Intend to Continue Working at Club Model: Rules for Don't Strongly Intend to Continue
  /* Rules for terminal node 1 */
    Matches 1,863 surveys (30.5%), 442 strongly intend to continue working (23.7%), 14.3% of all strongly intend to continue working, 46.9% of all not strongly intending to continue
    RULE: If (Q8 <> 1 and Q5 <> 1 or 2) Then not strongly intending to continue working at club
    P(0) = 0.76; P(1) = 0.24; Lift 0.47
    If don't strongly agree that feel welcome and don't strongly agree that there are good opportunities for professional growth, then don't strongly agree that intend to continue working at the club.
  /* Rules for terminal node 2 */
    Matches 634 surveys (10.4%), 224 strongly intend to continue working (35.3%), 7.3% of all strongly intend to continue working, 13.5% of all not strongly intending to continue working
    RULE: If (Q8 <> 1 and Q5 = 1 or 2 and Q7 <> 1 or 2) Then not strongly intending to continue working at club
    P(0) = 0.65; P(1) = 0.35; Lift 0.70
    If don't strongly agree that feel welcome and don't strongly agree that if I do a good job I'll be recognized, even though I strongly agree that there are good opportunities for professional growth, then don't strongly agree that intend to continue working at the club.

Summary of Results
  Satisfaction Model
    Top two rules identify 65% of most satisfied
    Top three rules identify 79% of most satisfied
  Recommend to Friend
    Top three rules identify 66% of most likely to recommend to friend
  Intend to Keep Working at Club
    Top three rules identify 63% of most likely to keep working
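These coverage figures follow from adding up the "% Tgt. Class" values of the top terminal nodes in each model's performance table. The short check below reproduces that arithmetic; the dictionary and function names are invented for illustration, and the shares are the per-node values reported on the performance slides.

```python
# Share of the target class captured by each model's top three terminal
# nodes, in gains-table order (values from the performance tables).
NODE_SHARES = {
    "satisfaction": [58.44, 6.46, 13.67],        # nodes 8, 4, 7
    "recommend": [51.60, 5.10, 9.18],            # nodes 10, 9, 5
    "intend_to_continue": [35.62, 15.75, 11.31], # nodes 10, 9, 5
}


def coverage(shares, top_k):
    """Percent of the target class matched by the first top_k rules."""
    return sum(shares[:top_k])


satisfaction_top2 = coverage(NODE_SHARES["satisfaction"], 2)
satisfaction_top3 = coverage(NODE_SHARES["satisfaction"], 3)
recommend_top3 = coverage(NODE_SHARES["recommend"], 3)
intend_top3 = coverage(NODE_SHARES["intend_to_continue"], 3)
```

Rounded to whole percentages, the sums give 65% and 79% for satisfaction, 66% for recommend, and 63% for intend to continue.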

Summary of Results
  Satisfaction keys
    Make an environment where employees feel welcome and have a sense of purpose
  Recommend to a Friend keys
    Supervisors treat employees with respect, and either pay is good or it is perceived that members really like the club
  Will work at club in a year's time
    For those under 35: feel welcome (relationships)
    For those over 35 (or worked at club a long time): feel welcome and good working conditions
    For those who don't feel welcome: need good opportunities for professional growth

Conclusions
  Trees can be used to provide concise summaries of behavioral tendencies from surveys
    Regression shows global, average attitudes
    Trees show specific, localized attitudes
  Two or three rules can describe nearly 2/3 of all employee attitudes of interest
  Rules make sense and are easy to explain
  Rules are actionable