2011 Data Miner Survey Highlights
The Views of 1,319 Data Miners

Predictive Analytics World, New York, NY, October 2011

Karl Rexer, PhD
President, Rexer Analytics
www.rexeranalytics.com
2011 Data Miner Survey: Overview

- 5th annual survey
- 52 questions
- 10,000+ invitations emailed, plus promoted by newsgroups, vendors, and bloggers
- Respondents: 1,319 data miners from over 60 countries

Respondents by type:
- Corporate: 40%
- Consultants: 27%
- Academics: 18%
- Vendors: 8%
- NGO / Gov't: 7%

Respondents by region:
- North America (47%): USA 44%, Canada 3%, Mexico 1%
- Europe (37%): Germany 9%, UK 4%, France 4%, Switzerland 3%
- Asia Pacific (10%): India 4%, Australia 2%, China 1%
- Central & South America (3%): Argentina 1%, Brazil 2%
- Middle East & Africa (3%): Israel 1%, South Africa 1%

Note: Data from software vendors was excluded from many analyses.

2011 Rexer Analytics
There's Strong Demand & We're Working Hard

- Data miner hiring is very strong*, and company use of data mining is increasing.
- 78% of data miners foresee increases in the number of data mining projects. This is on top of similar increases reported last year.
- Data miners working in diverse settings share this optimism.

[Chart: Number of Data Mining Projects Projected in 2011]

Question: How will the number of data mining projects your organization conducts in 2011 compare to what has been typical in the past few years?

* Multiple sources: Use of "data mining" in online job ads, KDnuggets job listings, recruiters, salary reports.
Data Miners are Working Everywhere

- More data miners report working in CRM / Marketing, Academia and Financial Services than any other fields.
  - These have been the three most commonly reported fields in each of the five annual Data Miner Surveys (2007-2011).
- Fewer data miners report working in CRM / Marketing this year (42% in 2010).
- Many data miners work in several fields.

Data Mining is everywhere! Data miners also report working in Non-profit (6%), Hospitality / Entertainment / Sports (3%), Military / Security (3%), and Other (9%).

[Bar chart: fields in which data miners typically apply data mining, axis 0% to 50%. Fields in chart order: CRM / Marketing, Academic, Financial, Insurance, Technology, Telecommunications, Retail, Medical, Internet-based, Government, Pharmaceutical, Manufacturing. The three leading fields are each reported by 27% to 33% of respondents; the remaining nine by 10% to 14% each.]

Question: In what fields do you TYPICALLY apply data mining? (Select all that apply)

Vendors were excluded from this analysis.
We Enjoy our Work

- Data miners are generally satisfied with their jobs, with more than a quarter reporting being very satisfied.
- 68% report being likely to remain with their current employer for the next two years.

Job satisfaction:
- Very Unsatisfied: (value not labeled on chart)
- Unsatisfied: 6%
- Neutral: 19%
- Satisfied: 46%
- Very Satisfied: 27%

Likelihood of remaining with current employer:
- Very Unlikely: (value not labeled on chart)
- Unlikely: 8%
- Neutral: 18%
- Likely: 32%
- Very Likely: 36%

Questions: What is your current level of job satisfaction? How likely are you to remain with your current employer for the next two years?
The Algorithms Data Miners are Using

- Decision trees, regression, and cluster analysis continue to form a triad of core algorithms for most data miners. This has been very consistent over time.
- However, a wide variety of algorithms are being used.

[Bar chart: algorithms typically used, axis 0% to 70%. Algorithms in chart order: Regression, Decision Trees, Cluster Analysis, Time Series, Neural Nets, Factor Analysis, Text Mining, Association Rules, Support Vector, Bayesian, Ensemble Models, Survival Analysis, Anomaly Detection, Social Network Analysis, Rule Induction, Genetic Algorithms, Link Analysis, Uplift Modeling, MARS. The three core algorithms are each used by 58% to 69% of respondents; the remaining sixteen by 9% to 35% each.]

Consultants are more likely to use Ensemble Models:
- Corporate: 22%
- Consultants: 29%
- Academic: 22%
- NGO / Gov't: 23%

Consultants and corporate data miners are more likely to use Uplift Modeling:
- Corporate: 10%
- Consultants: 15%
- Academic: 3%
- NGO / Gov't: 6%

Question: What algorithms/analytic methods do you TYPICALLY use? (Select all that apply)

Vendors were excluded from this analysis.
The Tools We're Using

- The average data miner reports using 4 software tools.
- R is used by the most data miners (47%).
- STATISTICA is the primary data mining tool chosen most often (17%).

[Charts: tool usage, shown Overall and separately for Corporate, Consultants, Academics, and NGO / Gov't respondents]

Survey Questions: What data mining/analytic tools did you use in 2010? (Rate each as "never", "occasionally", or "frequently".) What one data mining software package do you use most frequently?
Tools: Satisfaction & Continued Use

- STATISTICA, KNIME, RapidMiner and Salford Systems received the highest satisfaction ratings.
- The users of these tools are also the most likely to continue using them as their primary tools for the next three years.

[Charts: Satisfaction (scale from Extremely Dissatisfied to Extremely Satisfied) and Continued Use (scale from Extremely Unlikely to Extremely Likely), by primary tool]

Satisfaction question: Please rate your overall satisfaction with your primary Data Mining software package.

Continued Use question: What is the likelihood that you will continue to use this tool as your primary Data Mining software package over the next 3 years?

Vendors were excluded from this analysis.
The Popularity of R Software is Growing Fast

- The proportion of data miners using R is rapidly growing!
  - R is also the #1 most used data mining tool (in both 2010 & 2011), up from #5 in 2007.
- An increasing number of data miners consider R their primary tool.
  - R is now #2 in primary tool rankings, up from #7 in 2008.
- Half of R data miners use the command line interface. Among the rest, RStudio, scripts, R Commander, and STATISTICA are popular interfaces.

[Line chart: R Usage, 2007 to 2011, axis 0% to 50%. Series: Use R; Primary Tool]

R Interface:
- R Command Line: 51%
- Other: 16%
- RStudio: 8%
- Scripts: 6%
- R Commander: 5%
- STATISTICA: 5%
- RapidMiner: 4%
- Rattle: 3%
- KNIME: 2%

Question: If you use the R software package, what is your primary interface to R?

Vendors were excluded from these analyses.
Room for Improvement, and It Matters!

- Analytic capability: There's room to improve if we're going to Compete on Analytics.
  - Only 12% of corporate respondents rate their company as having very high analytic sophistication.
- Analytic capabilities boost company performance.
  - Companies with better analytic capabilities are outperforming their peers!

[Chart: Company Performance by Corporate Analytic Sophistication (Very Low, Low, Moderate, High, Very High)]

Caution: this is self-report data and correlation analysis. But go talk with Tom Davenport about the causal direction.

Analytic Capability Question: In general, with what degree of sophistication does your company / organization approach analytic problems?

Company Performance Question: Which statement best describes the recent performance of your company / organization?
How to Get More Information

Questions?
- Talk with me at PAW
- Call or email me if you don't see me in the hallways

Copy of these slides
- Available now

2011 Data Miner Survey Summary Report (Free)
- Available later this Fall
- Available at PAW website or email me

2010 Data Miner Survey Summary Report (Free)
- 37 page report, available now
- Available at PAW website or email me

Karl Rexer, PhD
krexer@rexeranalytics.com
www.rexeranalytics.com
617-233-8185