Research Report Abstract: The Convergence of Big Data Processing and Integrated Infrastructure
By Evan Quinn, Senior Principal Analyst, and Bill Lundell, Senior Research Analyst
With Brian Babineau, Vice President of Research and Analyst Services
July 2012
Introduction

Research Objectives

In order to assess current data analytics and processing trends, as well as plans for the next 12-18 months, ESG recently surveyed 399 North American IT and business professionals representing midmarket (100 to 999 employees) and enterprise-class (1,000 employees or more) organizations. Respondents were familiar with their organization's current data analytics environment and processes, as well as forward-looking strategies involving the infrastructure and platforms necessary to support data analytics initiatives. The survey was designed to answer the following questions:

How important is the enhancement of data analytics capabilities relative to all of an organization's business and IT priorities?
What is associated with the term "big data"?
What are the trends for current usage and planned adoption of MapReduce framework technology?
What is the size of the largest data set upon which an organization conducts data analytics activities?
How many unique sources do organizations integrate as part of their largest data sets?
How frequently do organizations update their largest data set?
What kind of tools do organizations use to integrate the data sources populating their largest data sets?
What sources and data types comprise organizations' largest data sets?
What data analytics and/or processing challenges do organizations face with respect to their largest data sets?
What types of data analytics platforms have organizations deployed to support their largest data sets? What benefits have they derived from these platforms?
What types of data analytics platforms do organizations anticipate deploying in support of their fastest growing data sets? What requirements are driving these changes?
Are the sources populating organizations' largest data sets geographically dispersed? What challenges does this present?
What are the must-have data management features/functionality for data analytics platforms and infrastructure?
What kind of storage technologies do organizations use to support their data analytics and processing activities? Which are most pervasive and how will this change going forward?
How much downtime can organizations tolerate when it comes to their data analytics platforms? What data protection technologies do they have in place to support these requirements?

Survey participants represented a wide range of industries including manufacturing, financial services, communications and media, health care, and retail. For more details, please see the Research Methodology and Respondent Demographics sections of this report.
Research Methodology

To gather data for this report, ESG conducted a comprehensive online survey of IT professionals from private- and public-sector organizations in North America (United States and Canada) between March 5, 2012 and March 12, 2012. To qualify for this survey, respondents were required to be IT or business professionals personally responsible for their organization's data analytics and processing environment, including the software/applications and/or the underlying platforms and systems. All respondents were provided an incentive to complete the survey in the form of cash awards and/or cash equivalents.

After filtering out unqualified respondents, removing duplicate responses, and screening the remaining completed responses (on a number of criteria) for data integrity, we were left with a final total sample of 399 IT and business professionals. Please see the Respondent Demographics section of this report for more information on these respondents.

Note: Totals in figures and tables throughout this report may not add up to 100% due to rounding.
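The rounding note above can be illustrated with a short sketch. The respondent counts in it are hypothetical, chosen only so that they sum to the report's sample size of 399; they are not survey data. The second list uses the percentages printed in the employee-count figure of this report.

```python
# Illustration only: why rounded percentages may not total 100%.
# The counts below are hypothetical and are NOT taken from the survey;
# they were chosen so that they sum to the report's sample size of 399.
hypothetical_counts = [133, 133, 133]
total = sum(hypothetical_counts)  # 399

# Round each category's share to the nearest whole percent.
rounded = [round(100 * c / total) for c in hypothetical_counts]
print(rounded, sum(rounded))  # each share is 33.3...%, so the rounded total is 99

# Percentages as printed in the employee-count figure of this report.
reported_by_employees = [18, 17, 11, 9, 12, 8, 8, 18]
print(sum(reported_by_employees))  # 101
```

Each category is rounded independently, so the small rounding errors can accumulate in either direction and push the displayed total slightly below or above 100%.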
Respondent Demographics

The data presented in this report is based on a survey of 399 qualified respondents. The figures below detail the demographics of the respondent base, including individual respondents' current job responsibility, technology responsibility, and job function, as well as the respondent organizations' total number of employees, primary industry, and annual revenue.

Respondents by Data Analytics Job Responsibility

The breakdown of current job responsibility within an organization among survey respondents is shown in Figure 1.

Figure 1. Survey Respondents, by Data Analytics Job Responsibility
Which of the following best describes your current responsibility with respect to your organization's data analytics and processing environment? (Percent of respondents, N=399)
  IT Operations: my primary responsibility includes supporting the underlying data analytics and processing infrastructure, 50%
  IT Application Development & Support: my primary responsibilities include the support and maintenance of data analytics and processing software, 28%
  Line-of-business support (non-IT): my responsibilities include data analytics and processing support for the business, 22%

Respondents by Technology Responsibility

IT operations respondents' primary area of technology responsibility is shown in Figure 2.

Figure 2. Survey Respondents, by Technology Responsibility
Which of the following would you consider to be your primary area of technology responsibility? (Percent of respondents, N=199)
  IT operations, 29%
  Applications/database, 22%
  IT architecture/planning, 20%
  General IT, 16%
  Data protection, 6%
  Servers, 5%
  Storage / SAN, 2%
  Other, 1%
Respondents by Job Function

The primary job function among survey respondents responsible for their organization's application environment is shown in Figure 3.

Figure 3. Survey Respondents, by Job Function
Which of the following best describes your primary job function? (Percent of respondents, N=200)
  Business manager, 41%
  Business analyst, 15%
  Applications/database, 13%
  Data analyst, 11%
  Other, 10%
  Data warehouse/business intelligence, 5%
  Data scientist, 3%
  Reports administrator, 3%

Respondents by Number of Employees

The number of employees in respondents' organizations is shown in Figure 4.

Figure 4. Survey Respondents, by Number of Employees
How many total employees does your organization have worldwide? (Percent of respondents, N=399)
  100 to 249, 18%
  250 to 499, 17%
  500 to 999, 11%
  1,000 to 2,499, 9%
  2,500 to 4,999, 12%
  5,000 to 9,999, 8%
  10,000 to 19,999, 8%
  20,000 or more, 18%
Respondents by Industry

Respondents were asked to identify their organization's primary industry. In total, ESG received completed, qualified responses from individuals in 20 distinct vertical industries, plus an Other category. Respondents were then grouped into the broader categories shown in Figure 5.

Figure 5. Survey Respondents, by Industry
What is your organization's primary industry? (Percent of respondents, N=399)
  Manufacturing, 18%
  Financial (banking, securities, insurance), 13%
  Government (Federal/National, State/Province/Local), 13%
  Communications & Media, 10%
  Business Services (accounting, consulting, legal, etc.), 9%
  Retail/Wholesale, 8%
  Health Care, 6%
  Other, 25%

Respondents by Annual Revenue

The annual revenue of respondents' organizations is shown in Figure 6.

Figure 6. Survey Respondents, by Annual Revenue
What is your organization's total annual revenue ($US)? (Percent of respondents, N=399)
  Less than $50 million, 20%
  $50 million to $99 million, 15%
  $100 million to $499 million, 15%
  $500 million to $999 million, 8%
  $1 billion to $4.999 billion, 12%
  $5 billion to $9.999 billion, 7%
  $10 billion to $19.999 billion, 5%
  $20 billion or more, 10%
  Not applicable (e.g., public sector, nonprofit), 9%
Contents

List of Figures
List of Tables
Executive Summary
  Report Conclusions
Introduction
  Research Objectives
Research Findings
  The Increasing Importance of Analytics: Thank You, Big Data
  The Impact of Big Data on Analytics
  Big Data Analytics Platforms
  Security Considerations for Big Data
  Data Analytics Storage and IT Infrastructure Requirements
  Increasing Interest in Hadoop MapReduce Framework Technology
Conclusion
  Research Implications for Technology Vendors
  Research Implications for IT Professionals
Research Methodology
Respondent Demographics
  Respondents by Data Analytics Job Responsibility
  Respondents by Technology Responsibility
  Respondents by Job Function
  Respondents by Number of Employees
  Respondents by Industry
  Respondents by Annual Revenue
List of Figures

Figure 1. Importance of Enhancing Data Processing and Analytics Activities
Figure 2. Meaning of the Term Big Data
Figure 3. Size of Largest Data Set for Data Analytics and Processing Functions
Figure 4. Number of Data Sources Integrated to Support Data Analytics Activities on Largest Data Set
Figure 5. Number of Data Sources Integrated to Support Data Analytics Activities on Largest Data Set, by Company Size
Figure 6. Update Frequency of Largest Data Set
Figure 7. Primary Method of Integrating Data Sources in Largest Data Set
Figure 8. Primary Method of Integrating Data Sources in Largest Data Set, by Largest Data Set Update Frequency
Figure 9. Sources Responsible for Populating Largest Data Set
Figure 10. Types of Data in Largest Data Set
Figure 11. Types of Data Processing and Analytics Activities Conducted on Largest Data Set
Figure 12. Data Processing and/or Analytics Challenges with Largest Data Set
Figure 13. Data Processing and Analytics Platforms Currently Deployed to Support Largest Data Set
Figure 14. Key Benefits Organizations Have Derived from Data Analytics Platforms
Figure 15. Plans to Deploy New Data Analytics Platform to Support Fastest Growing Data Set
Figure 16. Data Analytics Platform Organizations Plan to Deploy to Support Fastest Growing Data Set
Figure 17. Requirements Driving Organizations to Evaluate New Data Analytics Solutions for Fastest Growing Data Set
Figure 18. Geographic Dispersion of Largest Data Set
Figure 19. Challenges of a Geographically Dispersed Data Set
Figure 20. Importance of Features/Functionality in Considering Data Analytics Infrastructure and Platforms
Figure 21. Disk-based Storage Used to Support Data Analytics and Processing Activities
Figure 22. Percent of Total Volume of Data Analytics/Processing Activity Stored on Disk-based Storage
Figure 23. Challenges Scaling Storage Environment to Support Data Analytics and/or Processing Activities
Figure 24. Infrastructure for Data Analytics and Processing Activities
Figure 25. Amount of Downtime Data Analytics Platforms Can Tolerate
Figure 26. Data Protection / Availability Technologies Currently Deployed to Support Data Analytics Platforms
Figure 27. Interest in MapReduce Technology
Figure 28. Interest in MapReduce Technology, by Company Size
Figure 29. Interest in MapReduce Technology, by Size of Largest Data Set
Figure 30. Survey Respondents, by Data Analytics Job Responsibility
Figure 31. Survey Respondents, by Technology Responsibility
Figure 32. Survey Respondents, by Job Function
Figure 33. Survey Respondents, by Number of Employees
Figure 34. Survey Respondents, by Industry
Figure 35. Survey Respondents, by Annual Revenue
List of Tables

Table 1. Size of Largest Data Set for Data Analytics and Processing Functions, by Company Size
Table 2. Sources Responsible for Populating Largest Data Set, by Company Size
Table 3. Data Processing and/or Analytics Challenges with Largest Data Set, by Role
Table 4. Geographic Dispersion of Largest Data Set, by Company Size
Table 5. Challenges of a Geographically Dispersed Data Set, by Company Size
Table 6. Disk-based Storage Used to Support Data Analytics and Processing Activities, by Role and Size of Largest Data Set
Table 7. Percent of Total Volume of Data Analytics/Processing Activity Stored on SAN-based Storage, by Size of Largest Data Set

All trademark names are property of their respective companies. Information contained in this publication has been obtained by sources The Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express consent of The Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and, if applicable, criminal prosecution. Should you have any questions, please contact ESG Client Relations at 508.482.0188.
20 Asylum Street Milford, MA 01757 Tel: 508.482.0188 Fax: 508.482.0128 www.enterprisestrategygroup.com