Research Report Abstract
The Impact of Big Data on Data Analytics
By Julie Lockner and Bill Lundell
With Jennifer Gahm and John McKnight
September 2011
© 2011 Enterprise Strategy Group, Inc. All Rights Reserved.
Introduction

Research Objectives

In order to assess current data analytics and data management trends, as well as plans for the next 12-18 months, ESG recently surveyed 270 North American IT professionals representing large midmarket (500 to 999 employees) and enterprise-class (1,000 employees or more) organizations. Respondents were familiar with their organization's current database environment as well as forward-looking strategies involving data analytics and integration initiatives. The survey was designed to answer the following questions:

  How important is the enhancement of data analytics capabilities relative to all of an organization's IT priorities?
  What challenges do organizations face with respect to their current data analytics technologies and processes?
  How do organizations plan to deal with larger data sets during data analytics exercises?
  What are organizations' spending plans for data analytics in 2011 and beyond?
  How are organizations planning to address their data analytics and data integration challenges?
  What is driving the adoption or need for a MapReduce compute platform?
  What challenges do organizations face with respect to their data integration needs?
  How does data growth impact organizations in general?

Survey participants represented a wide range of industries including manufacturing, financial services, communications and media, health care, and retail. For more details, please see the Research Methodology and Respondent Demographics sections of this report.
Research Methodology

To gather data for this report, ESG conducted a comprehensive online survey of IT professionals from private- and public-sector organizations in North America (United States and Canada) between July 7, 2011 and July 18, 2011. To qualify for this survey, respondents were required to be IT managers personally responsible for their organization's database environment(s), as well as forward-looking data management strategies and initiatives pertaining to data analytics and integration. All respondents were provided an incentive to complete the survey in the form of cash awards and/or cash equivalents.

After filtering out unqualified respondents, removing duplicate responses, and screening the remaining completed responses (on a number of criteria) for data integrity, we were left with a final total sample of 270 IT managers. Please see the Respondent Demographics section of this report for more information on these respondents.

Note: Totals in figures and tables throughout this report may not add up to 100% due to rounding.
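The rounding note can be checked against the published figures. The sketch below is illustrative only: it sums the whole-number percentages exactly as printed in two of the demographic figures (number of employees, and average data sources integrated per data analytics activity). The variable and function names are hypothetical and not part of the ESG report.

```python
# Percentages as printed in the report's demographic figures.
# Names below are illustrative; only the values come from the report.
employees_pct = {
    "500 to 999": 17,
    "1,000 to 2,499": 19,
    "2,500 to 4,999": 17,
    "5,000 to 9,999": 13,
    "10,000 to 19,999": 7,
    "20,000 or more": 28,
}

data_sources_pct = {
    "No integration from multiple sources": 7,
    "2 unique data sources": 11,
    "3 unique data sources": 35,
    "4 unique data sources": 19,
    "5 unique data sources": 16,
    "More than 5 unique data sources": 7,
    "Don't know": 4,
}


def percent_total(shares):
    """Sum the published (already rounded) percentages for one figure."""
    return sum(shares.values())


print(percent_total(employees_pct))     # 101
print(percent_total(data_sources_pct))  # 99
```

Because each share is rounded to a whole percent independently, a figure's total can land a point above or below 100% even though the underlying responses account for every respondent.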
Respondent Demographics

The data presented in this report is based on a survey of 270 qualified respondents. The figures below detail the demographics of the respondent base, including individual respondents' current job responsibility as well as respondent organizations' total number of employees, primary industry, annual revenue, and number of databases.

Respondents by Primary Area of Technology Responsibility

Respondents' primary area of technology responsibility is shown in Figure 1.

Figure 1. Survey Respondents, by Primary Area of Technology Responsibility
Which of the following best describes your primary area of technology responsibility? (Percent of respondents, N=267)
  All or multiple of the above, 34%
  Enterprise architect, 19%
  Application developer, 13%
  Data architect/data scientist, 3%
  Data warehouse/business intelligence, 7%
  Application administrator, 13%
  Database administrator, 11%

Respondents by Job Responsibility

Respondents' current job responsibility is shown in Figure 2.

Figure 2. Survey Respondents, by Job Responsibility
Which of the following best describes your current responsibility within your organization? (Percent of respondents, N=270)
  IT staff, 13%
  Non-IT business manager, 1%
  Senior IT management (e.g., CIO, VP of IT, Director of IT, etc.), 29%
  IT management, 57%
Respondents by Number of Employees

The number of employees in respondents' organizations is shown in Figure 3.

Figure 3. Survey Respondents, by Number of Employees
How many total employees does your organization have worldwide? (Percent of respondents, N=270)
  20,000 or more, 28%
  500 to 999, 17%
  10,000 to 19,999, 7%
  1,000 to 2,499, 19%
  5,000 to 9,999, 13%
  2,500 to 4,999, 17%

Respondents by Industry

Respondents were asked to identify their organization's primary industry. In total, ESG received completed, qualified responses from individuals in 21 distinct vertical industries, plus an Other category. Respondents were then grouped into the broader categories shown in Figure 4.

Figure 4. Survey Respondents, by Industry
What is your organization's primary industry? (Percent of respondents, N=270)
  Government (Federal/National, State/Province/Local), 3%
  Business Services (accounting, consulting, legal, etc.), 6%
  Other, 20%
  Communications & Media, 8%
  Retail/Wholesale, 10%
  Manufacturing, 25%
  Health Care, 10%
  Financial (banking, securities, insurance), 18%
Respondents by Annual Revenue

Respondent organizations' annual revenue is shown in Figure 5.

Figure 5. Survey Respondents, by Annual Revenue
What is your organization's total annual revenue ($US)? (Percent of respondents, N=270)
  $20 billion or more, 16%
  Not applicable (e.g., public sector, nonprofit), 3%
  Less than $100 million, 9%
  $100 million to $499 million, 17%
  $10 billion to $19.999 billion, 10%
  $5 billion to $9.999 billion, 11%
  $1 billion to $4.999 billion, 20%
  $500 million to $999 million, 14%

Respondents by Total Number of Databases

Respondent organizations' total number of production and non-production databases is shown in Figure 6.

Figure 6. Survey Respondents, by Total Number of Production and Non-production Databases
How many production databases does your organization currently have deployed? How many total non-production databases does your organization currently have deployed? (Percent of respondents, N=270)
(Bar chart with two series, "Total number of production databases" and "Total number of non-production databases," across the categories: Less than 5, 5 to 10, 11 to 25, 26 to 50, 51 to 75, 76 to 100, 101 to 150, 151 to 200, More than 200, Don't know.)
Respondents by Total Amount of Database Data

Respondent organizations' total amount of database data is shown in Figure 7.

Figure 7. Survey Respondents, by Total Amount of Database Data
Approximately how much total data is stored in all of your organization's databases (production and non-production)? Please include OLTP, DWH, OLAP, and departmental databases in your calculation. (Percent of respondents, N=270)
(Bar chart across the categories: Less than 1 TB, 1 TB to 4 TB, 5 TB to 9 TB, 10 TB to 24 TB, 25 TB to 49 TB, 50 TB to 99 TB, 100 TB to 249 TB, 250 TB to 499 TB, 500 TB or more, Don't know.)

Respondents by Annual Growth Rate of Database Data

Respondent organizations' annual growth rate of database data is shown in Figure 8.

Figure 8. Survey Respondents, by Annual Growth Rate of Database Data
On average, at approximately what rate do you believe your organization's total database data is growing annually in size? (Percent of respondents, N=270)
  More than 75% annually, 2%
  51% to 75% annually, 4%
  26% to 50% annually, 19%
  Don't know, 3%
  Less than 10% annually, 17%
  10% to 25% annually, 56%
Respondents by Average Data Sources Integrated per Data Analytics Activity

Respondent organizations' average number of data sources integrated per data analytics activity is shown in Figure 9.

Figure 9. Survey Respondents, by Average Data Sources Integrated per Data Analytics Activity
On average, how many data sources does your organization need to integrate in order to support data analytics activities (i.e., feeds to a data warehouse, business intelligence system, etc.)? (Percent of respondents, N=270)
  We do not integrate data from multiple sources, 7%
  We typically integrate from more than 5 unique data sources, 7%
  We typically integrate from 5 unique data sources, 16%
  Don't know, 4%
  We typically integrate from 2 unique data sources, 11%
  We typically integrate from 3 unique data sources, 35%
  We typically integrate from 4 unique data sources, 19%
Contents

List of Figures
List of Tables
Executive Summary
  Report Conclusions
Introduction
  Research Objectives
Research Findings
  The Expanding Landscape of Data Analytics
  Use of MapReduce Frameworks
  Big Data's Impact on Data Integration
Conclusion
Research Methodology
Respondent Demographics
  Respondents by Primary Area of Technology Responsibility
  Respondents by Job Responsibility
  Respondents by Number of Employees
  Respondents by Industry
  Respondents by Annual Revenue
  Respondents by Total Number of Databases
  Respondents by Total Amount of Database Data
  Respondents by Annual Growth Rate of Database Data
  Respondents by Average Data Sources Integrated per Data Analytics Activity
List of Figures

Figure 1. Importance of Data Analytics Activities over the Next 12-18 Months, by Company Size
Figure 2. Amount of Data Processed as Part of a Typical Data Analytics Exercise
Figure 3. Types of Data Analytics Solutions Currently in Use
Figure 4. Data Analytics Challenges
Figure 5. Plans to Deploy New Data Analytics Solutions in the Next 12-18 Months
Figure 6. New Data Analytics Solutions Expected to Be Deployed over the Next 12-18 Months
Figure 7. Requirements Driving New Data Analytics Purchases
Figure 8. Expected Benefits of Deploying a New Data Analytics Solution
Figure 9. Plans to Implement a MapReduce Framework
Figure 10. Amount of Data Processed by MapReduce Frameworks
Figure 11. Drivers for Current or Potential MapReduce Framework Implementations
Figure 12. MapReduce Framework Distributions Currently in Use or Being Considered
Figure 13. Data Sources for Which MapReduce Frameworks are Being Used
Figure 14. Frequency with Which Data is Typically Added or Updated During the Integration Process
Figure 15. Data Integration Challenges
Figure 16. Steps Expected to Be Taken over the Next 12-18 Months to Address Data Integration Challenges
Figure 17. Survey Respondents, by Primary Area of Technology Responsibility
Figure 18. Survey Respondents, by Job Responsibility
Figure 19. Survey Respondents, by Number of Employees
Figure 20. Survey Respondents, by Industry
Figure 21. Survey Respondents, by Annual Revenue
Figure 22. Survey Respondents, by Total Number of Production and Non-production Databases
Figure 23. Survey Respondents, by Total Amount of Database Data
Figure 24. Survey Respondents, by Annual Growth Rate of Database Data
Figure 25. Survey Respondents, by Average Data Sources Integrated per Data Analytics Activity

List of Tables

Table 1. Amount of Data Processed as Part of a Typical Data Analytics Exercise, by Total Production Databases and Total Amount of Database Data
Table 2. Types of Data Analytics Solutions Currently in Use, by Total Amount of Database Data
Table 3. Data Analytics Challenges, by Total Amount of Database Data
Table 4. Plans to Deploy New Data Analytics Solutions, by Annual Rate of Database Data Growth
Table 5. New Data Analytics Solutions Expected to Be Deployed over the Next 12-18 Months, by Total Amount of Database Data
Table 6. Expected Benefits of Deploying a New Data Analytics Solution, by Total Amount of Database Data
Table 7. Plans to Implement a MapReduce Framework, by Average Amount of Data Processed per Data Analytics Exercise and Annual Rate of Database Data Growth
Table 8. Data Integration Challenges, by Number of Integration Sources
Table 9. Steps Expected to Be Taken over the Next 12-18 Months to Address Data Integration Challenges, by Number of Integration Sources

All trademark names are property of their respective companies. Information contained in this publication has been obtained by sources The Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express consent of the Enterprise Strategy Group, Inc., is in violation of U.S. Copyright law and will be subject to an action for civil damages and, if applicable, criminal prosecution. Should you have any questions, please contact ESG Client Relations at (508) 482.0188.
20 Asylum Street, Milford, MA 01757
Tel: 508.482.0188
Fax: 508.482.0128
www.enterprisestrategygroup.com