Survey Analysis: Data Mining versus Standard Statistical Analysis for Better Analysis of Survey Responses
Salford Systems Data Mining 2006
March 27-31, 2006, San Diego, CA
By Dean Abbott, Abbott Analytics
http://www.abbottanalytics.com

Acknowledgements
  Work done under contract with Seer Analytics
  Subcontractors: Tessar and Associates (now Mobile Foundry), Abbott Consulting (now Abbott Analytics)
  Seer Analytics, LLC ("we help you see what's there.")
  518 North Tampa Street, Tampa, FL 33602
  813-318-0111
  http://www.seeranalytics.com
  http://www.mobilefoundry.net/

About Abbott Analytics
  Abbott Analytics
    Founded in 1999, based in San Diego, CA
    Dedicated to data mining consulting and training
  Principal: Dean Abbott
    Applied Data Mining for 19+ years in Direct Marketing, CRM, Survey Analysis, Tax Compliance, Fraud Detection, Predictive Toxicology, Biological Risk Assessment
  Course Instruction
    Public 2-day Data Mining Courses
    Conference Tutorials
  Customized Training and Knowledge Transfer
    Data mining methodology (CRISP-DM)
    Training services for software products, including CART, Clementine, Affinium Model, Insightful Miner

Talk Outline
  Member survey
    Survey description
    Results using statistical modeling
    Lessons learned
  Employee survey
    Survey description
    Results using decision trees (CART)
    Lessons learned

Problem Setup: Member Survey
  Question: What are the characteristics of members who indicated the highest overall satisfaction with their Club?
  Data: 32,811 records containing survey answers
    No demographic data except what was on the survey (marital status, children, age, gender)
  Approach: Create supervised learning models with target variable overall_satisfaction = 1

Data Preparation
  Begin with 57 candidate inputs to model
    All survey questions are multiple choice
    Treated as categories, not numbers
    Typically 6 categories per question (1-5, plus Unknown, initially coded as 0)
    No text comment fields included as inputs to model
  Create new column for target variable
    If overall_satisfaction = 1, variable value = 1; otherwise, variable value = 0
  Data very clean with respect to missing data
    Only needed to recode the # children fields
    Number missing: 11,006 children < 6; 10,701 children 6-12; 10,873 children 13-17; 4,936 children (overall)
    When missing, recoded values with -1 to indicate missing

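The two recoding steps on this slide (binary target, -1 for missing children counts) can be sketched as follows. This is a minimal illustration, not the code used in the project; the field names other than overall_satisfaction are hypothetical, since the slide does not give the actual column names.

```python
# Hypothetical field names; only overall_satisfaction appears on the slides.
CHILDREN_FIELDS = ("children_under_6", "children_6_12", "children_13_17", "children")
MISSING = -1


def prepare(record):
    """Return a copy of one survey record with a binary target and recoded children fields."""
    out = dict(record)
    # Target: 1 if the member gave the top overall-satisfaction rating, else 0.
    out["target"] = 1 if record.get("overall_satisfaction") == 1 else 0
    # Blank children answers become -1 so that "missing" stays its own category.
    for field in CHILDREN_FIELDS:
        value = record.get(field)
        out[field] = MISSING if value in (None, "") else value
    return out


example = {"overall_satisfaction": 1, "children_under_6": "", "children_6_12": 2,
           "children_13_17": None, "children": 2}
prepared = prepare(example)
print(prepared)
```

Keeping -1 as a separate code, rather than imputing a count, lets a categorical model treat "did not answer" as its own group.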
Member Survey Question Categories

Sampling
  Begin with 32,811 responses
  Set aside about half for validation (not used during modeling): 16,379 records
    These records will be used to provide final summaries of the segments
  16,433 records used in creating and scoring model
    5,059 had overall satisfaction = 1 (30.8%)
  Model 1 splits data into training and testing data: 2/3 for training (creating model), 1/3 for testing (scoring and ranking models)
    Approximately 11,503 for training; 4,930 for testing

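A two-stage random split of this kind can be sketched with the standard library alone. The function name, seeds, and exact fractions below are assumptions for illustration; the counts it produces will be close to, but not the same as, the counts reported on the slide.

```python
import random


def split(items, holdout_fraction, seed=0):
    """Shuffle items and return (kept, held_out); held_out gets holdout_fraction of them."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


record_ids = range(32811)                          # one id per survey response
modeling, validation = split(record_ids, 0.5)      # validation is never used in modeling
training, testing = split(modeling, 1 / 3, seed=1) # 2/3 train, 1/3 test

print(len(validation), len(training), len(testing))
```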
Relationship of Overall Satisfaction to Recommend to Friends
  Of the 4912 / 16739 (30.2%) with Overall Satisfaction = 1
    86% have Recommend to Friends = 1
  Of the 8708 / 16739 (54%) with Recommend to Friends = 1
    49% have Overall Satisfaction = 1
  4227 / 16739 (26.0%) have overall satisfaction and recommend to friends both equal to 1
    This is the biggest bin of the cross tab, followed by
      Overall = 2 / recommend = 2 (24%; 3890 / 16739)
      Overall = 2 / recommend = 1 (22%; 3565 / 16739)
    No other bin greater than 5% of records
  [Figure: cross tab plot of Recommend to Friend (0-5) against Overall satisfaction (0-4)]

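The two conditional percentages on this slide follow directly from the cross tab counts it quotes. A short check, using only those counts (the variable names are illustrative):

```python
overall_top = 4912     # respondents with Overall Satisfaction = 1
recommend_top = 8708   # respondents with Recommend to Friends = 1
both_top = 4227        # respondents with both equal to 1

# Share of the most satisfied who would also recommend, and the reverse.
pct_recommend_given_overall = 100 * both_top / overall_top
pct_overall_given_recommend = 100 * both_top / recommend_top

print(f"{pct_recommend_given_overall:.0f}% of Overall = 1 have Recommend = 1")
print(f"{pct_overall_given_recommend:.0f}% of Recommend = 1 have Overall = 1")
```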
Objective and Data Challenges
  Project Objective
    Interpret results of survey for large health club (not a predictive model)
  Challenges
    Missing data (some questions either N/A or blank)
      Solution: Impute values that least affect the information communicated by the question (not a mean or median!)
    Answers (target variables) highly correlated with one another
      Multi-collinearity makes interpretation of results problematic
      Must reduce dimensionality without losing interpretation of results
      Solution: Factor analysis
    Target variable
      Three questions pointed to the important actionable information (related to how satisfied members were)
      Solution: combine all three into a new index of excellence

Data Preprocessing Approach
  Reduce input data (for understanding)
    Use factor analysis to identify groupings of variables that are interesting
    Factors can be candidate inputs to models, but didn't work as well on this data
    Selected as inputs those variables with the highest loadings, as representatives of each type of factor
    Also retained key questions in addition to the representative questions from the factor analysis
  The effect is to remove questions too highly correlated with one another, while maintaining relevant information for modeling

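Choosing one representative question per factor can be sketched as picking, for each factor, the question with the largest absolute loading. The loading values below are made up for illustration and are not the survey's actual loadings; the function name is likewise an assumption.

```python
# Hypothetical loadings: question -> loading on each of two factors.
loadings = {
    "Q2": [0.81, 0.10],
    "Q3": [0.74, 0.05],
    "Q12": [0.12, 0.79],
    "Q13": [0.08, -0.85],
}


def representatives(loadings):
    """For each factor, return the question with the highest absolute loading."""
    n_factors = len(next(iter(loadings.values())))
    reps = []
    for k in range(n_factors):
        best = max(loadings, key=lambda q: abs(loadings[q][k]))
        reps.append(best)
    return reps


reps = representatives(loadings)
print(reps)
```

Using an original question instead of the factor score keeps the model input directly interpretable, which is the goal stated on the slide.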
Predictive Modeling Approach
  [Flow diagram]
  60+ survey questions
    Identify key questions: 3 key questions (3 questions with high association with target)
    Factor analysis: 10 factors (10 factors, or variables that loaded highest on each factor)
  Regression model: find significant variables (13 fields down to 7)
  Regression model: find significant variables (variable ranks)

Factor Analysis: Making the Complex Simple
  [Figure: bar chart of loadings for Factor1 through Factor10; bar charts of top question loadings for Factor 1 (Q2-Q12) and Factor 2 (Q12-Q23)]

Member Survey Factor Analysis Loadings

Reduce Variables using Regression
  Already beginning with only 13 variables
  Question: how many of these are useful predictors?
  Decided to retain 5 factors for final model
  [Figure: "Regression Rankings of Questions/Factors", bar chart of regression coefficient (0 to 0.6) by question/factor, in the order Q44, Q22, Q25, factor3.2, factor3.9, factor3.1, factor3.4, factor3.3, factor3.8, factor3.10, factor3.6, factor3.5, factor3.7]

Explaining Results Through Visualization
  Customer was not interested in techno solutions
  Customer was interested in what actions could be taken as a result of the data mining models
    Which characteristics are most correlated with best customers?
    What do they like and dislike about the club? Is it equipment? Relationships? Facility? Staff?
  Show key contributors, how each club compared with other club locations, and whether the club is improving

Key: Explaining Results
  Visualization shows key variables in survey associated with excellence, and performance metrics for each club
    How well did this club do?
    What is the change over last year's result?
  Shows which attributes the club needs to improve to raise customer satisfaction
  [Figure: "Drivers of Satisfaction" chart with categories Staff 2, Staff 1, equipment, facility, value, relationships, goals]

So What's the Problem with That?
  Regression and neural networks are global estimators
    They operate over the entire data space
    Descriptors of regression represent average influence
    Neither technique provides explicit localized characteristics
  Customer would like actionable analytics
    Clear characteristics of subgroups
    Different strategies for subgroups
  Conclusion: In Round 2 (Employee Survey), use another approach

Employee Survey Analysis Problem Setup
  Very similar to member survey
    60+ questions
    Few demographics
    Attitudes toward the job
  How to handle questions
    They are ordinal, but CART supports interval and nominal types
    Treat as categorical, but make sure values aren't split up
    If a split on a question groups values such as 1, 2, 4, rebuild with that question as an interval variable
    Didn't happen this way, though; all worked out well

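The check described here (categorical splits on ordinal questions should not break up the answer scale) can be sketched as a small helper. The function names and the assumed 1-5 scale are illustrative; this is not part of CART itself.

```python
def is_run(values, scale):
    """True if the values occupy consecutive positions on the ordinal scale."""
    positions = sorted(scale.index(v) for v in set(values))
    return all(b - a == 1 for a, b in zip(positions, positions[1:]))


def respects_order(left_values, scale=(1, 2, 3, 4, 5)):
    """True if a categorical split sends an unbroken run of answers to each side."""
    right_values = [v for v in scale if v not in set(left_values)]
    return is_run(left_values, scale) and is_run(right_values, scale)


print(respects_order([1, 2]))     # answers 1-2 versus 3-5
print(respects_order([1, 2, 4]))  # the problem case mentioned on the slide
```

A split that fails this check is the signal, per the slide, to rebuild the tree with that question treated as an interval variable.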
Employee Survey Question Groupings

Employee Survey: Target Variable Definition
  Predict key attitudes that are consequents
    Satisfaction
    Recommend to a Friend
    Intend to Work Next Year at Club
    Club is Good Place to Work
  Exclude these from each other's models
    They are highly correlated with each other
    Models that predict a target variable with these as inputs are not actionable
  Key predictors, questions relating to:
    Communications with management
    Quality of supervisors
    Training received
    Effectiveness of club
    Fairness of policies
    Perceived member attitudes

Employee Satisfaction (=1) Model: Data Information
  File: modeling data with binarized dependents w missing.txt
  Target Variable: Q1_1
  Predictor Variables: Q66, Q67, Q68, Q69, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q11, Q12, Q13, Q14, Q15, Q16, Q17, Q18, Q20, Q21, Q22, Q23, Q24, Q25, Q26, Q27, Q28, Q29, Q30, Q31, Q32, Q33, Q34, Q35, Q36, Q37, Q38, Q45, Q46, Q47, Q48, Q49, Q50, Q51, Q52, Q53, Q54, Q55, Q56, Q57, Q58, Q59, Q60, Q61, Q62, Q63, Q64, Q65

  Class  N Cases  Pct Cases
      0    4,645      76.0%
      1    1,470      24.0%

Employee Satisfaction Model: Performance

  Class  N Cases  N Misclassified  Pct. Misclass
      0    4,645              953          20.52
      1    1,470              315          21.43

  Node  Tgt. Cases  % of Node  % of Tgt.  Cum % Tgt.  Cum % Pop  % Pop  Node Cases  Cum Lift  Lift
     8         859      60.75      58.44       58.44      23.12  23.12       1,414      2.53  2.53
     4          95      43.58       6.46       64.90      26.69   3.57         218      2.43  1.81
     7         201      42.23      13.67       78.57      34.47   7.78         476      2.28  1.76
     3          30      17.44       2.04       80.61      37.29   2.81         172      2.16  0.73
     5          92      14.38       6.26       86.87      47.75  10.47         640      1.82  0.60
     6          14      13.86       0.95       87.82      49.40   1.65         101      1.78  0.58
     2         124      10.12       8.44       96.26      69.44  20.03       1,225      1.39  0.42
     1          55       2.94       3.74      100.00     100.00  30.56       1,869      1.00  0.12

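The lift and cumulative columns in the gains table can be recomputed from the per-node counts alone. The sketch below uses the node counts from this slide; the function and key names are illustrative and this is not CART's own reporting code.

```python
# (node, target-class cases in node, all cases in node), taken from the gains table.
nodes = [(8, 859, 1414), (4, 95, 218), (7, 201, 476), (3, 30, 172),
         (5, 92, 640), (6, 14, 101), (2, 124, 1225), (1, 55, 1869)]


def gains_table(nodes):
    """Rank nodes by target rate and compute lift, cumulative % of target, cumulative lift."""
    total_cases = sum(cases for _, _, cases in nodes)
    total_target = sum(target for _, target, _ in nodes)
    base_rate = total_target / total_cases
    rows, cum_target, cum_cases = [], 0, 0
    for node, target, cases in sorted(nodes, key=lambda r: r[1] / r[2], reverse=True):
        cum_target += target
        cum_cases += cases
        rows.append({
            "node": node,
            "lift": (target / cases) / base_rate,             # node rate vs. overall rate
            "cum_pct_target": 100 * cum_target / total_target,
            "cum_lift": (cum_target / cum_cases) / base_rate,
        })
    return rows


rows = gains_table(nodes)
for row in rows:
    print(row["node"], round(row["lift"], 2), round(row["cum_pct_target"], 2),
          round(row["cum_lift"], 2))
```

Lift above 1 marks a node whose share of highly satisfied employees exceeds the overall rate of 24.0%; the cumulative lift necessarily falls to 1.00 once every node is included.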
Employee Satisfaction Model: Splits
  [Tree diagram: root split on Q8; further splits on Q7, Q3, Q36, Q18, Q3, Q32; terminal nodes 1-8]
  Q8: Feel Welcome
    Surrogates: Q27 (family friendly), Q28 (inclusive environment), Q18 (good working conditions)
  Q18: Good working conditions
    Surrogate: Q17 (necessary support/materials to do job)
  Q3: Feeling of accomplishment
    Surrogate: Q6 (responsibilities good fit with interests/skills)
  Q7: Staff Competent
    Surrogates: Q15 (supervisor lets know work is appreciated), Q33 (trust management to take interests into account), Q5 (good opportunities for professional growth)

Employee Satisfaction: Q8 Split (root node)

  Competitor  Variable  Split  Improvement
  Winner      Q8        1           0.1174
  1           Q18       1           0.1169
  2           Q3        1           0.0998
  3           Q35       1           0.0957
  4           Q6        1           0.0951
  5           Q7        1,2         0.094

  [Diagram annotation: Strongly agree feel welcome]

Employee Satisfaction: Q18 Split (right side of root)

  Competitor  Variable  Split  Improvement
  Winner      Q18       1           0.0271
  1           Q3        1           0.0203
  2           Q35       1           0.0195
  3           Q6        1           0.0177
  4           Q14       1,5         0.0172
  5           Q13       1,5         0.0167

  [Diagram annotations: Strongly agree feel welcome; This is the best terminal node for satisfaction]

Employee Satisfaction Model: Key Variables

  Variable  Score
  Q18      100
  Q8        81.02
  Q14       72.03
  Q27       55.11
  Q26       50.53
  Q28       50.12
  Q5        17.66
  Q3        14.14
  Q17       14.05
  Q11       13.15
  Q7        11.89
  Q13       11.56
  Q6        11.27
  Q33       11.03
  Q16        9.6

  Primary splitters only
  Variable  Score
  Q8       100
  Q18       23.11
  Q3        17.46
  Q7        14.68
  Q36        2.88
  Q32        2.68

  Q8: Feel Welcome
    Surrogates: Q27 (family friendly), Q28 (inclusive environment), Q18 (good working conditions)
  Q18: Good working conditions
    Surrogate: Q17 (necessary support/materials to do job)
  Q3: Feeling of accomplishment
    Surrogate: Q6 (responsibilities good fit with interests/skills)
  Q7: Staff Competent
    Surrogates: Q15 (supervisor lets know work is appreciated), Q33 (trust management to take interests into account), Q5 (good opportunities for professional growth)

Employee Satisfaction Model: Key Rules

  /* Rules for terminal node 8 */
  Matches 1,414 surveys (23.1%), 859 highly satisfied (60.8%), 58.4% of all highly satisfied
  RULE: If (Q18 = 1 and Q8 = 1) Then Highly Satisfied
  P(0) = 0.39; P(1) = 0.61; Lift 2.5
  If strongly agree that there are good working conditions and strongly agree that feel welcome, then highly satisfied.

  /* Rules for terminal node 7 */
  Matches 476 surveys (7.8%), 201 highly satisfied (42.2%), 13.7% of all highly satisfied
  RULE: If (Q8 = 1 and Q18 <> 1 and Q3 = 1 and Q32 = 1 or 2) Then Highly Satisfied
  P(0) = 0.58; P(1) = 0.42; Lift 1.8
  If strongly agree that feel welcome, and strongly agree working at the club gives feeling of personal accomplishment, and agree management will take interests into account, even if don't strongly agree good working conditions, then highly satisfied.

  /* Rules for terminal node 4 */
  Matches 218 surveys (3.6%), 95 highly satisfied (43.6%), 6.5% of all highly satisfied
  RULE: If (Q8 <> 1 and Q7 = 1 or 2 and Q3 = 1 and Q36 = 1 or 2) Then Highly Satisfied
  P(0) = 0.56; P(1) = 0.44; Lift 1.8
  If agree that I'll be recognized for doing a good job, and strongly agree working at the club gives feeling of personal accomplishment, and agree that am paid fairly, even if don't strongly agree feel welcome, then highly satisfied.

Member Satisfaction Model: Unsatisfied Rules

  /* Rules for terminal node 1 */
  Matches 1,869 surveys (30.6%), 55 highly satisfied (2.9%), 3.7% of highly satisfied
  39.0% of all not highly satisfied
  RULE: If (Q8 <> 1 and Q7 <> 1 or 2) Then not highly satisfied
  P(0) = 0.96; P(1) = 0.04; Lift 0.12
  If don't strongly agree that feel welcome and don't agree that will be properly recognized for a good job, then not highly satisfied.

  /* Rules for terminal node 2 */
  Matches 1,225 surveys (20.0%), 124 highly satisfied (10.1%), 8.4% of highly satisfied
  23.7% of all not highly satisfied
  RULE: If (Q8 <> 1 and Q7 = 1 or 2 and Q3 <> 1) Then not highly satisfied
  P(0) = 0.90; P(1) = 0.10; Lift 0.42
  If don't strongly agree that feel welcome and work doesn't give a feeling of accomplishment, even though I agree that I will be properly recognized for a good job, then not highly satisfied.

  /* Rules for terminal node 5 */
  Matches 640 surveys (10.5%), 92 highly satisfied (14.4%), 6.3% of all highly satisfied
  11.8% of all not highly satisfied
  RULE: If (Q8 = 1 and Q18 <> 1 and Q3 <> 1) Then not highly satisfied
  P(0) = 0.86; P(1) = 0.14; Lift 0.58
  If don't strongly agree that there are good working conditions and work doesn't give a feeling of accomplishment, even though strongly agree that feel welcome, then not highly satisfied.

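The satisfaction rules on the rule slides translate directly into nested conditions. The sketch below encodes only the six terminal nodes whose rules are listed (8, 7, 4, 1, 2, 5) and returns None for paths not listed on the slides. It assumes "Q7 = 1 or 2" means the answer is 1 or 2; the function name and dictionary layout are illustrative.

```python
AGREE = (1, 2)  # assumed reading of "= 1 or 2": strongly agree or agree


def terminal_node(answers):
    """Map one response (question -> answer code) to a terminal node from the rule slides."""
    if answers["Q8"] == 1:                      # strongly agree: feel welcome
        if answers["Q18"] == 1:                 # strongly agree: good working conditions
            return 8
        if answers["Q3"] != 1:                  # no strong feeling of accomplishment
            return 5
        return 7 if answers["Q32"] in AGREE else None
    if answers["Q7"] not in AGREE:
        return 1
    if answers["Q3"] != 1:
        return 2
    return 4 if answers["Q36"] in AGREE else None


print(terminal_node({"Q8": 1, "Q18": 1}))
print(terminal_node({"Q8": 2, "Q7": 4}))
```

Each returned node number can then be looked up in the gains table to get that segment's share of highly satisfied employees.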
Recommend to Friend (=1) Model: Data Information
  File: modeling data with binarized dependents w missing.txt
  Target Variable: Q44_1
  Predictor Variables: Q66, Q67, Q68, Q69, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q11, Q12, Q13, Q14, Q15, Q16, Q17, Q18, Q19, Q20, Q21, Q22, Q23, Q24, Q25, Q26, Q27, Q28, Q29, Q30, Q31, Q32, Q33, Q34, Q35, Q36, Q37, Q38, Q45, Q46, Q47, Q48, Q49, Q50, Q51, Q52, Q53, Q54, Q55, Q56, Q57, Q58, Q59, Q60, Q61, Q62, Q63, Q64, Q65

  Class  N Cases    Pct
      0    3,958  64.7%
      1    2,157  35.3%

  This model includes Q19 (am treated with respect), and is the best model to report

Recommend to Friend Model: Performance

  Class  N Cases  N Misclassified  Pct. Misclass
      0    3,958              894          22.59
      1    2,157              525          24.34

  Node  Tgt. Cases  % of Node  % of Tgt.  Cum % Tgt.  Cum % Pop  % Pop  Node Cases  Cum Lift  Lift
    10       1,113      71.90      51.60       51.60      25.32  25.32       1,548      2.04  2.04
     9         110      58.51       5.10       56.70      28.39   3.07         188      2.00  1.66
     5         198      56.57       9.18       65.88      34.11   5.72         350      1.93  1.60
     4         128      49.81       5.93       71.81      38.32   4.20         257      1.87  1.41
     8          83      45.36       3.85       75.66      41.31   2.99         183      1.83  1.29
     3         215      29.49       9.97       85.63      53.23  11.92         729      1.61  0.84
     7          36      24.83       1.67       87.30      55.60   2.37         145      1.57  0.70
     2         132      15.60       6.12       93.42      69.44  13.84         846      1.35  0.44
     6          12      14.12       0.56       93.97      70.83   1.39          85      1.33  0.40
     1         130       7.29       6.03      100.00     100.00  29.17       1,784      1.00  0.21

Recommend to Friend Model: Splits
  [Tree diagram: splits on Q19, Q33, Q37, Q45, Q5, Q8, Q35, Q45, Q50; terminal nodes 1-10]
  Q19: Treated with respect
    Surrogates: Q18 (good working conditions) and Q8 (feel welcome)
  Q37: Compensation practice is fair
    Surrogate: Q36 (I am paid fairly)
  Q45: How think members rate club
    Surrogates: Q47, Q46, Q60 (member-cleanliness, enough equip., check on progress)
  Q33: Trust management to take interests into account
    Surrogates: Q32 (management keeps promises), Q34 (leaders remove roadblocks to inclusion)
  Q5: Good opportunities for professional growth
    Surrogates: Q4 (responsibilities good fit with interests), Q7 (appropriately recognized)
  Q8: Feel welcome
    Surrogate: Q7

Recommend to Friend Model: Key Variables

  Variable  Score
  Q8       100.0
  Q19       99.1
  Q18       97.4
  Q15       64.5
  Q16       63.1
  Q14       61.3
  Q33       39.6
  Q35       33.8
  Q32       24.7
  Q34       23.9
  Q31       23.9
  Q9        21.5
  Q7        15.4
  Q45       14.8
  Q37       12.9
  Q5        10.0
  Q36        9.7
  Q4         4.3
  Q38        4.0
  Q22        1.6
  Q50        1.4
  Q26        1.0
  Q48        0.8
  Q47        0.7
  Q28        0.6
  Q46        0.6
  Q11        0.3
  Q51        0.3
  Q60        0.1
  Q49        0.0

  Primary splitters only
  Variable  Score
  Q19      100
  Q33       32.23
  Q45       14.94
  Q37       12.99
  Q5         8.98
  Q8         3.03
  Q35        1.67
  Q50        1.34

  Q19: Treated with respect
    Surrogates: Q18 (good working conditions) and Q8 (feel welcome)
  Q37: Compensation practice is fair
    Surrogate: Q36 (I am paid fairly)
  Q45: How think members rate club
    Surrogates: Q47, Q46, Q60 (member-cleanliness, enough equip., check on progress)
  Q33: Trust management to take interests into account
    Surrogates: Q32 (management keeps promises), Q34 (leaders remove roadblocks to inclusion)
  Q5: Good opportunities for professional growth
    Surrogates: Q4 (responsibilities good fit with interests), Q7 (appropriately recognized)
  Q8: Feel welcome
    Surrogate: Q7

Recommend to Friend Model: Key Rules

  /* Rules for terminal node 10 */
  Matches 1,548 surveys (25.3%), 1,113 recommend (71.9%), 51.6% of all strong recommends
  RULE: If (Q19 = 1 and Q37 = 1 or 2) Then Recommend = 1
  P(0) = 0.281; P(1) = 0.719; Lift = 2.0
  If strongly agree that supervisors treat me with respect, and agree that compensation practice is fair, then strongly agree that will recommend to friend.

  /* Rules for terminal node 9 */
  Matches 188 surveys (3.1%), 110 recommend (58.5%), 5.1% of all strong recommends
  RULE: If (Q19 = 1 and Q37 <> 1 or 2 and Q45 = 1) Then Recommend = 1
  P(0) = 0.415; P(1) = 0.585; Lift = 1.7
  If strongly agree that supervisors treat me with respect, and believe that members strongly agree they are highly satisfied, even though don't agree compensation practice is fair, then strongly agree that will recommend to friend.

  /* Rules for terminal node 5 */
  Matches 350 surveys (5.7%), 198 recommend (56.6%), 9.2% of all strong recommends
  RULE: If (Q19 <> 1 and Q33 = 1 or 2 and Q45 = 1) Then Recommend = 1
  P(0) = 0.434; P(1) = 0.566; Lift = 1.4
  If agree that trust management will take my interests into account, and believe that members strongly agree they are highly satisfied, even though don't strongly agree supervisors treat me with respect, then strongly agree that will recommend to friend.

Recommend to Friend Model: Rules for Not Recommending

  /* Rules for terminal node 1 */
  Matches 1,784 surveys (29.2%), 130 highly recommend (7.3%), 94% don't highly rec.
  6.0% of all highly recommend
  RULE: If (Q31 <> 1 and Q22 <> 1) Then Don't Strongly Recommend
  P(0) = 0.94; P(1) = 0.06
  If don't strongly agree that supervisors treat me with respect, and don't agree that management will take interests into account, then don't strongly agree that will recommend to friend.

  /* Rules for terminal node 2 */
  Matches 846 surveys (13.84%), 132 highly recommend (15.6%), 84.4% don't highly rec.
  6.1% of all highly recommend
  RULE: If (Q19 <> 1 and Q33 = 1 or 2 and Q45 <> 1 and Q5 <> 1 or 2) Then Don't Strongly Recommend
  P(0) = 0.84; P(1) = 0.16
  If don't strongly agree that supervisors treat me with respect, and don't strongly believe that members are highly satisfied, and don't agree that there are good opportunities for professional growth, then even though agree that management will take interests into account, don't strongly agree that will recommend to friend.

Intend to Continue Working at Club (=1) Model: Data Information
  File: modeling data with binarized dependents w missing.txt
  Target Variable: Q39_1
  Predictor Variables: Q66, Q67, Q68, Q69, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q11, Q12, Q13, Q14, Q15, Q16, Q17, Q18, Q20, Q21, Q22, Q23, Q24, Q25, Q26, Q27, Q28, Q29, Q30, Q31, Q32, Q33, Q34, Q35, Q36, Q37, Q38, Q45, Q46, Q47, Q48, Q49, Q50, Q51, Q52, Q53, Q54, Q55, Q56, Q57, Q58, Q59, Q60, Q61, Q62, Q63, Q64, Q65

  Class  N Cases    Pct
      0    3,030  49.6%
      1    3,085  50.4%

Intend to Continue Working at Club: Model Performance

  Class  N Cases  N Misclassified  Pct. Misclass
      0    3,030              868          28.65
      1    3,085              849          27.52

  Node  Tgt. Cases  % of Node  % of Tgt.  Cum % Tgt.  Cum % Pop  % Pop  Node Cases  Cum Lift  Lift
    10       1,099      80.81      35.62       35.62      22.24  22.24       1,360      1.60  1.60
     9         486      69.63      15.75       51.38      33.66  11.42         698      1.53  1.38
     5         349      67.38      11.31       62.69      42.13   8.47         518      1.49  1.34
     8         100      65.36       3.24       65.93      44.63   2.50         153      1.48  1.30
     4         202      53.87       6.55       72.48      50.76   6.13         375      1.43  1.07
     7          75      43.86       2.43       74.91      53.56   2.80         171      1.40  0.87
     2         224      35.33       7.26       82.17      63.93  10.37         634      1.29  0.70
     3          43      33.59       1.39       83.57      66.02   2.09         128      1.27  0.67
     6          65      30.23       2.11       85.67      69.53   3.52         215      1.23  0.60
     1         442      23.73      14.33      100.00     100.00  30.47       1,863      1.00  0.47

Intend to Continue Working at Club Model: Splitters
  [Tree diagram: splits on Q8, Q5, Q7, Q66, Q56, Q18, Q5, Q6, Q69; terminal nodes 1-10]
  Q8: Feel Welcome
    Surrogates: Q27 (family friendly place), Q28 (diverse environment), Q18 (good working conditions)
  Q69: Age
    Surrogates: Q66 (how long worked at Club), Q68 (education)
  Q18: Good Working Conditions
    Surrogate: Q17 (have necessary support and materials to do job)
  Q5: Good Opportunities for Professional Growth
    Surrogates: Q7, Q33 (Management will take my interests into account)
  Q7: Will be Recognized for Good Job
    Surrogate: Q15 (Work is appreciated)

Intend to Continue Working at Club Model: Key Variables

  Variable  Score
  Q8       100
  Q18       84.13
  Q27       63.23
  Q11       57.03
  Q28       50.45
  Q26       48.54
  Q7        43.43
  Q5        37.23
  Q33       32.81
  Q31       23.56
  Q69       22.21
  Q4        21.86
  Q9        18.79
  Q3        13.82
  Q13        9.98
  Q14        9.46
  Q16        8.12
  Q15        6.03
  Q66        5.26
  Q17        3.99
  Q56        2.15
  Q6         2.03
  Q23        1.63
  Q68        1.23

  Primary splitters only
  Variable  Score
  Q8       100
  Q5        37.07
  Q69       17.48
  Q7        11.24
  Q18       10.7
  Q66        5.19
  Q56        2.15
  Q6         2.03

  Q8: Feel Welcome
    Surrogates: Q27 (family friendly place), Q28 (diverse environment), Q18 (good working conditions)
  Q69: Age
    Surrogates: Q66 (how long worked at Club), Q68 (education)
  Q18: Good Working Conditions
    Surrogate: Q17 (have necessary support and materials to do job)
  Q5: Good Opportunities for Professional Growth
    Surrogates: Q7, Q33 (Management will take my interests into account)
  Q7: Will be Recognized for Good Job
    Surrogate: Q15 (Work is appreciated)

Intend to Continue Working at Club Model: Key Rules

  /* Rules for terminal node 10 */
  Matches 1,360 surveys (22.2%), 1,099 intend to continue (80.8%), 35.6% of all intend to continue
  RULE: If (Q8 = 1 and Q69 >= 2.5) Then Intend to continue
  P(0) = 0.19; P(1) = 0.81; Lift = 1.6
  If strongly agree that feel welcome and am 35 years old or older, then strongly agree that intend to continue working at the club.

  /* Rules for terminal node 9 */
  Matches 698 surveys (11.4%), 486 intend to continue (69.6%), 15.8% of all intend to continue
  RULE: If (Q8 = 1 and Q18 = 1 and Q69 <= 2.5) Then Intend to continue
  P(0) = 0.30; P(1) = 0.70; Lift = 1.4
  If strongly agree that feel welcome and strongly agree that there are good working conditions, am older than 35 years old, then strongly agree that intend to continue working at the club.

  /* Rules for terminal node 5 */
  Matches 518 surveys (8.5%), 349 intend to continue (67.4%), 11.3% of all intend to continue
  RULE: If (Q8 <> 1 and Q5 = 1 or 2 and Q7 = 1 or 2 and Q66 > 2.5) Then Intend to continue
  P(0) = 0.32; P(1) = 0.68; Lift = 1.3
  If I strongly agree that if I do a good job I'll be recognized, and I strongly agree that there are good opportunities for professional growth, and I have worked at the club for more than 2 years, even though don't strongly agree that feel welcome, then I strongly agree that intend to continue working at the club.

Intend to Continue Working at Club Model: Rules for Don't Strongly Intend to Continue

  /* Rules for terminal node 1 */
  Matches 1,863 surveys (30.5%), 442 strongly intend to continue working (23.7%), 14.3% of all strongly intend to continue working
  46.9% of all not strongly intending to continue
  RULE: If (Q8 <> 1 and Q5 <> 1 or 2) Then not strongly intending to continue working at club
  P(0) = 0.76; P(1) = 0.24; Lift 0.47
  If don't strongly agree that feel welcome and don't strongly agree that there are good opportunities for professional growth, then don't strongly agree that intend to continue working at the club.

  /* Rules for terminal node 2 */
  Matches 634 surveys (10.4%), 224 strongly intend to continue working (35.3%), 7.3% of all strongly intend to continue working
  13.5% of all not strongly intending to continue working
  RULE: If (Q8 <> 1 and Q5 = 1 or 2 and Q7 <> 1 or 2) Then not strongly intending to continue working at club
  P(0) = 0.65; P(1) = 0.35; Lift 0.70
  If don't strongly agree that feel welcome and don't strongly agree that if I do a good job I'll be recognized, even though I strongly agree that there are good opportunities for professional growth, then don't strongly agree that intend to continue working at the club.

Summary of Results
  Satisfaction Model
    Top two rules identify 65% of most satisfied
    Top three rules identify 79% of most satisfied
  Recommend to Friend
    Top three rules identify 66% of most likely to recommend to friend
  Intend to Keep Working at Club
    Top three rules identify 63% of most likely to keep working

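These coverage figures are the cumulative share of target-class cases captured by the top-ranked terminal nodes in each model's gains table. A quick recomputation from the node counts reported on the performance slides (the dictionary layout and function name are illustrative):

```python
# Target-class cases in the three top-ranked nodes, and total target-class cases.
top_nodes = {
    "satisfaction": ([859, 95, 201], 1470),
    "recommend": ([1113, 110, 198], 2157),
    "intend": ([1099, 486, 349], 3085),
}


def coverage(counts, total, k):
    """Percent of all target-class cases captured by the first k rules."""
    return 100 * sum(counts[:k]) / total


for model, (counts, total) in top_nodes.items():
    print(model, round(coverage(counts, total, 2), 1), round(coverage(counts, total, 3), 1))
```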
Summary of Results
  Satisfaction keys:
    Make an environment where employees feel welcome and have a sense of purpose
  Recommend to a Friend keys:
    Supervisors treat employees with respect, and either pay is good or it is perceived that members really like the club
  Will work at club in a year's time:
    For those under 35: feel welcome (relationships)
    For those over 35 (or worked at club a long time): feel welcome and good working conditions
    For those who don't feel welcome: need good opportunities for professional growth

Conclusions
  Trees can be used to provide concise summaries of behavioral tendencies from surveys
    Regression shows global, average attitudes
    Trees show specific, localized attitudes
  Two or three rules can describe nearly 2/3 of all employee attitudes of interest
    Rules make sense, and are easy to explain
    Rules are actionable