GOVERNANCE MOVES BIG DATA FROM HYPE TO CONFIDENCE
By Elliot King, Research Analyst
Produced by Unisphere Research, a Division of Information Today, Inc.
June 2014
Sponsored by
TABLE OF CONTENTS
Introduction
The New Data Growth
The Confidence Factor
Information Governance and Management
Data Security and Privacy
Data Lifecycle Management
Conclusion
Appendix
INTRODUCTION

The emergence of big data has been a key trend in information management for the past five years. Data is being produced faster than ever before, leading to a world in which more data is available to more organizations in more formats. But what is being called big data represents more than just data volume and data velocity. It represents a qualitative shift in the variety of data that organizations can apply to solve their business problems. In fact, big data analytics promises to have a significant impact on a huge array of fields, from medicine to sports to entertainment.

With the game-changing potential of big data come several challenges that must be addressed by organizations big and small. The information that makes up big data comes from both inside and outside a single organization and may be used by many different stakeholders. As a result, data governance continues to grow in importance. Can the data be trusted? Is the data that comes from outside the organization as reliable as data generated internally? How do companies ensure data quality, institute master data management and data lifecycle management processes and, most importantly, free up resources to pursue new opportunities?

Not surprisingly, as organizations try to expand the data pool for enterprise analytics, analysts may find themselves spending more time defending the quality of their data than ever before. And the problems of security lurk beneath the surface of every conversation about data and data analytics. As companies accumulate more and more sensitive data about their customers, the need to keep that information private and secure is paramount.

To address these challenges, organizations are enriching their capabilities in areas like data governance, data quality, master data and data lifecycle management, and data security. These are ongoing processes, not end states.
Some companies have more routine, established and comprehensive processes and policies in place, while others have ad hoc approaches to the same challenges.

To better understand the impact of new data sources on these well-established data governance practices, in the first quarter of 2014, IBM commissioned Unisphere Research, a division of Information Today, Inc., to survey 304 managers responsible for data management in their organizations. Approximately 75% were IT and database managers and staff or were otherwise identified with IT operations, while the remainder held business positions ranging from C-level executive roles to line-of-business and project management. In short, the respondents make up a fairly good cross-section of the IT management universe. Study demographics are more fully reported in Appendix A.

Among the survey's key findings are these:

- Organizations are investing heavily in initiatives that will increase the amount of data at their disposal. As the amount of data grows, they are spending more time finding needed data rather than analyzing it.
- The percentage of organizations with big data projects in production is expected to triple in the next 18 months.
- Very few companies feel entirely confident about their data from all sources, and they are significantly less confident in data gathered through social media and public cloud applications than they are in data generated internally. Internal, structured data evokes the highest level of confidence.
- Managers generally trust reports based on the analysis of big data, even though the data quality may not be as good as that of reports based on traditional, internal data.
- Most respondents are not satisfied with the development of their information governance programs. Information governance initiatives often have limited scope, lack business sponsors and have trouble winning funding when competing with other IT priorities.
- Of the information governance projects that have moved forward, data security and privacy are ongoing top-of-the-agenda issues.
- The demands associated with maintaining legacy applications are consuming resources and thus deterring some organizations from investing in promising new technologies.
THE NEW DATA GROWTH

The history of computing has been the history of the growth of data. Each successive generation has produced new types of data, and currently the information infrastructure is at another major turning point. Companies are not only generating data in new ways; they are accessing, analyzing and applying data from new sources to achieve new goals. As Figure 1 shows, the applications seen as having the most potential for creating positive change in the enterprise in the immediate future are mobile applications, big data, customer intelligence and cloud computing. (Some totals in the study do not equal 100% due to rounding or the acceptance of multiple responses per question.)

Figure 1: What Trends Will Have the Most Positive Impact on Your Organization?
Mobile applications                  52%
Big data                             51%
Customer intelligence & analytics    50%
Cloud (private)                      41%
Cloud (public)                       24%
Interestingly, more companies anticipate positive benefits from new trends than actually have applications in production to capitalize on them right now. For example, while mobile applications are at the top of the list of trends that respondents believe will have a positive impact on their organizations, only 39% actually have mobile applications in production. And mobile applications are the new application type that has moved farthest into production. Only 22% of the respondents have big data projects in production, and about 20% are working with public cloud applications. (See Figure 2.)

What companies currently have in production, however, does not tell the whole story, or even the most compelling part of the story, of new technology adoption. The story unfolds in the pipeline of technologies that will be put into production over the next 18 months. Viewed through that lens, as Figure 2 shows, within 18 months nearly four-fifths of the respondents will have customer intelligence applications in place, and around three quarters will have mobile applications up and running. Similarly, while only 22% currently have big data applications in production, big growth is on the horizon: roughly two-thirds anticipate using big data within that period.

Figure 2: What is the Time Frame for Implementing New Technologies?
                                     Already in     Total in production, or planning
                                     production     or implementing in 18 months
Mobile                               39%            74%
Big data                             22%            66%
Customer intelligence & analytics    33%            78%
Real-time analytics                  27%            71%
Cloud (private)                      24%            62%
Cloud (public)                       20%            47%
Regardless of the specific timetables for each new application, as Figure 3 indicates, companies believe that the amount of data they have under management will grow steadily over the next three years. Today, only 30% of respondents say they are managing more than 100TB of data. In three years, 49% expect to be doing so, a 63% increase in the number of environments over 100TB.

Figure 3: Data Growth Over the Next Three Years
                     Currently    In three years
1TB to 100TB         56%          37%
>100TB               30%          49%
Don't know/unsure    14%          14%
(63% increase in environments over 100TB)

One of the factors affecting the rate of data growth in many organizations is the influx of data through many different avenues and in many different formats. Not surprisingly, as Figure 4 shows, structured data is still the dominant form of data in many settings. But clearly, new forms of data are gaining ground: more than a quarter of respondents (26%) indicate that structured data now accounts for 50% or less of their data under management.

Figure 4: Structured Data as a Percentage of Total Data Under Management
>75%                 34%
>50% and ≤75%        29%
>25% and ≤50%        17%
≤25%                 9%
Don't know/unsure    12%
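The 63% cited with Figure 3 is a relative increase in the share of respondents above 100TB, not a difference in percentage points. A minimal sketch of the arithmetic, using only the two shares reported in Figure 3 (the variable and function names are illustrative and not part of the study):

```python
# Shares of respondents managing more than 100TB, as reported in Figure 3.
current_share = 0.30      # today
projected_share = 0.49    # expected in three years


def relative_increase(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a fraction of `before`."""
    return (after - before) / before


# Absolute change, expressed in percentage points.
point_change = (projected_share - current_share) * 100

# Relative change, which is the figure the report quotes.
growth = relative_increase(current_share, projected_share)

print(f"Change in share: {point_change:.0f} percentage points")
print(f"Relative increase: {growth:.0%}")
```

The share rises by 19 percentage points, which corresponds to a relative increase of about 63% over the current 30%.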
THE CONFIDENCE FACTOR

As the information mix evolves, the trend is clear. Over the next 18 months, most companies are going to witness a sharp increase in data generated by or used by mobile and customer intelligence applications, and more than half will also have implemented big data, private cloud and/or social technology applications.

What new types of information are important to organizations? As shown in Figure 5, information about the relationships between customers and products, such as purchase history and customer sentiment about products, tops the list.

Figure 5: Importance of Types of Data Available via Social Media
                                                                     Not important   Neutral   Important
Product interests (e.g., sentiment on products, purchase history)    28%             20%       51%
Personal attributes (e.g., marital status, interests, hobbies)       50%             18%       32%
Life events (e.g., having a baby, relocating, buying a house)        47%             19%       33%
Relationships (e.g., family, business)                               38%             28%       34%
Effective analytics depends on having confidence in the underlying data, whatever the data type and whatever its source. Confidence in data is an ongoing concern that will only take on more urgency in the future as new data and new data types from different, often external, sources are integrated into the ongoing operations of many companies.

And the situation becomes complicated when the confidence in different data types is assessed. As Figures 6a and 6b show, most companies are relatively confident in their structured data generated internally. But the confidence level in data from virtually all other sources drops significantly. In fact, half or more of the respondents did not believe that data from social media or data stored in the public cloud could be considered reliable.

Figure 6a & 6b: What is Your Confidence Level in Data From Each of These Sources?
                                              6a Confident   6b Not confident
Structured data in your internal systems      63%            14%
Data provided by your business partners       37%            29%
Unstructured data in your internal systems    21%            44%
Data stored in a public cloud                 23%            50%
Social media data                             13%            61%
Note: The remaining respondents for each data type had neutral responses (neither confident nor not confident).
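The note to Figures 6a and 6b implies that the neutral share for each source is whatever remains once the confident and not-confident shares are subtracted. A minimal sketch of that arithmetic, assuming each row sums to 100% before rounding (the dictionary and function names are illustrative and not part of the study):

```python
# (confident %, not confident %) per data source, as reported in Figures 6a and 6b.
confidence = {
    "Structured data in internal systems": (63, 14),
    "Data provided by business partners": (37, 29),
    "Unstructured data in internal systems": (21, 44),
    "Data stored in a public cloud": (23, 50),
    "Social media data": (13, 61),
}


def neutral_share(confident: int, not_confident: int) -> int:
    """Return the remaining (neutral) percentage, assuming the row totals 100%."""
    return 100 - confident - not_confident


# Derive the neutral share for every source.
neutral = {
    source: neutral_share(confident, not_confident)
    for source, (confident, not_confident) in confidence.items()
}

for source, share in neutral.items():
    print(f"{source}: {share}% neutral")
```

Because the published percentages are rounded, the derived neutral shares may be off by a point or so from the underlying survey data.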
The degree of confidence that managers have in data raises a perplexing dilemma. On the one hand, as Figure 7 shows, a majority of respondents (58%) say that data from outside sources must carry the same level of trust as enterprise data before they will use it, while 42% are willing to incorporate outside data even if their confidence in it is lower than their confidence in their internal data. On the other hand, as Figure 8 indicates, managers generally have about as much confidence in reports based on the analysis of big data as in reports based only on traditional data, even though the data quality of the big data-based reports may not be as good.

Figure 7: When Using Data from Outside Sources, Do You Require the Same Accuracy You Expect from Corporate Data?
Yes. The data must have the same level of trust as enterprise data if we are going to use it.    58%
No. The data can be useful even if it has a lower level of trust than enterprise data.           42%
(Of those who answered Yes or No, 58% said Yes.)

Figure 8: How Does Your Level of Confidence in Reports Based on Big Data Compare to Reports Based on Traditional Data?
Less confident                   31%
No change or more confident      69%
Companies' confidence in their data is going to be an ongoing issue and an ongoing challenge that will only grow over the years as the volume and number of sources of data climb. As Figure 9 shows, both IT and business personnel spend more time finding data and putting it into an appropriate form to analyze than actually analyzing it. While analysts might expect to spend most of their time analyzing, in reality, analysis accounts for little more than a quarter of the time spent on analytics-related tasks.

Figure 9: Time Your Team Spends on Analytics-related Tasks
Finding data (e.g., from discovery to putting data in an appropriate place to be analyzed)    31%
Analyzing data (e.g., A-B testing, multivariate testing, correlation, etc.)                   27%
Validating data (e.g., ensuring that the data analysis is correct)                            21%
Defending data and/or analytics (e.g., convincing users that the data is correct)             21%
Neither IT personnel nor people aligned with line-of-business operations are happy with that time distribution. As shown in Figure 10, respondents felt that people have to spend too much time in the data discovery process, as well as too much time defending the data used for analytics. On the other hand, respondents felt that in some cases their teams spend too little time analyzing the data and validating its quality. (See Figure 11.)

Figure 10: Do You Think Your Team Spends Too Much Time on These Activities?
                                                                                              IT teams   Business teams
Finding data (e.g., from discovery to putting data in an appropriate place to be analyzed)    41%        49%
Analyzing data (e.g., A-B testing, multivariate testing, correlation, etc.)                   19%        15%
Validating data (e.g., ensuring that the data analysis is correct)                            25%        33%
Defending data (e.g., convincing users that the data is correct)                              35%        28%
Defending analytics (e.g., convincing users that the analysis is correct)                     41%        26%
Figure 11: Do You Think Your Team Spends Too Little Time on These Activities?
                                                                                              IT teams   Business teams
Finding data (e.g., from discovery to putting data in an appropriate place to be analyzed)    10%        7%
Analyzing data (e.g., A-B testing, multivariate testing, correlation, etc.)                   36%        37%
Validating data (e.g., ensuring that the data analysis is correct)                            32%        20%
Defending data (e.g., convincing users that the data is correct)                              20%        17%
Defending analytics (e.g., convincing users that the analysis is correct)                     10%        13%

Taken together, Figures 10 and 11 paint a clear picture. In many settings, IT and business teams find themselves spending too much time doing data discovery and too little time analyzing and validating the data, the activities that have the potential for delivering big benefits to the business.
INFORMATION GOVERNANCE AND MANAGEMENT

Delivering those big benefits requires the organization to manage larger volumes of data and incorporate new data types from new data sources. These changes have increased the focus on the cluster of activities known collectively as information or data governance. Data or information governance is a discipline that embraces processes, people, and technology. It defines who owns the data and who is responsible for the data. It incorporates technical capabilities like data quality, data lifecycle management, data security and more. Taken together, data governance activities can increase the confidence level in the data used for decision making.

The data governance maturity model is a method for evaluating how far each of the areas associated with data governance has developed. The lowest ratings mean that a company has no processes and policies, or inconsistent ones; the highest rating indicates integrated and consistent processes that address data issues on an enterprise basis. As Figure 12 shows, most companies do not believe that they have a very mature information governance infrastructure in place.

Figure 12: How Would You Rate Your Organization's Current Information Governance Maturity Level?
[Chart. Response categories: Immature, Somewhat immature, Neither mature nor immature, Somewhat mature, Mature. Values shown: 28%, 26%, 21%, 7%.]
Even though around three quarters of the respondents do not feel their information governance structures are even somewhat mature, many companies have at least some of the needed infrastructure in place, and many have projects in the pipeline. As Figure 13 shows, data security and privacy, followed by information integration, are the aspects of information governance that have received the greatest amount of attention and development, while data lifecycle management and master data management have received the least attention and investment. Other than information governance projects in general, master data management has the largest number of projects in the planning stage.

Figure 13: What is the Status of the Following Projects Associated With Information Governance?
                             Production   Planning/implementing   Not planning at this time
Data security and privacy    43%          42%                     14%
Information integration      38%          52%                     10%
Data quality                 32%          53%                     16%
Master data management       29%          49%                     22%
Data lifecycle management    28%          48%                     23%
One reason for the variation in the rate of implementation among the applications associated with information governance is that companies frequently do not look at the issue holistically but instead address it on a project-by-project basis. (See Figure 14.)

Figure 14: What is Your Organization's Planning Process for Information Governance?
We plan each project separately                                              34%
We develop a data governance plan for the year, across multiple projects     25%
We only address data governance when it complements another initiative       22%
We do not address data governance                                            12%
Don't know/unsure                                                            8%

Planning projects separately has its benefits and drawbacks. One of the primary benefits is that companies can focus on the area of most pressing need. The primary drawback is that planning projects individually makes it more difficult to systematically establish an integrated infrastructure. Moreover, each project needs to find business sponsors, and the overall process is not as streamlined or systematic as perhaps desired. Potential synergies across multiple similar projects, which might benefit from shared skills, best practices or tools, are often missed when activities are planned and executed in silos. In short, project-by-project planning and implementation do not support a consistent, efficient and effective data strategy.
The greatest hurdles to be overcome in planning and implementing new projects generally are organizational issues. Process and change management issues, as well as data confidence issues, follow. Those obstacles can come into play during prolonged but sometimes ad hoc efforts to create a mature information governance infrastructure. (See Figure 15.)

Figure 15: Rate the Following Potential Inhibitors to Implementing New Projects
                                    Minor   Moderate   Major
Organizational/people issues        22%     21%        57%
Technology issues                   30%     31%        39%
Process/change management issues    28%     27%        45%
Data confidence issues              32%     25%        43%

Among the organizational issues, the top obstacle is not surprising: competing budget priorities. The next obstacle in importance is the lack of a business sponsor, as Figure 16 shows.

Figure 16: What Were the Biggest Obstacles to Implementing Information Governance Projects?
(Respondents were asked to select up to two answers.)
Competing priorities for budget and resources              56%
Lack of a business sponsor                                 33%
Lack of a compelling event or catalyst                     32%
Massive data volume, variety and velocity                  26%
Change management difficulties within the organization     19%
One reason for the lack of business sponsors for information governance projects may be the fact that few companies have chief data officers or other C-level officers specifically and directly responsible for the wide range of issues associated with data. As Figure 17 shows, fewer than one-fifth of the companies surveyed have a chief data officer, and 65% have no plans to add one any time soon.

Figure 17: Does Your Organization Have an Officer With Any of the Following Titles?
                                      Yes    No, but expect to add this year   No, and no plans to add
Chief data officer                    17%    17%                               65%
Chief digital officer                 9%     15%                               76%
Chief information security officer    43%    12%                               44%
DATA SECURITY AND PRIVACY

While not many organizations have chief data officers, 43% do have chief information security officers. Organizations seem to recognize that beneath the hubbub about mobile applications, customer intelligence, big data and social media lies the simple fact that the walls between organizations and the outside world are becoming more porous. This shift has made data security and privacy one of the hottest issues today in information governance and management. As Figure 18 shows, 60% of the respondents say that their concerns about data security and privacy have grown as their companies have gathered more information and data types from more sources.

Figure 18: As the Volume, Types and Sources of Data Have Grown, How Have Your Concerns About Privacy and Security Changed?
Grown significantly       19%
Grown somewhat            41%
Stayed the same           34%
Lessened somewhat         5%
Lessened significantly    1%
Several factors can stimulate investment in data security and privacy technology. The most important is the existence of corporate policies for data privacy. (See Figure 19.) Moreover, data security and privacy safeguards are seen as particularly important for big data analytics projects. (See Figure 20.)

Figure 19: What was the Primary Reason for Your Company's Last Investment in Data Security and Privacy?
Corporate policy for data privacy     55%
Upcoming audit/compliance deadline    19%
Response to data security breach      17%
Other                                 10%

Figure 20: How Important are Data Security and Privacy to Initiatives Like Big Data Analytics?
Critical          31%
Very important    45%
Important         21%
Not important     3%
* 97% of those with a big data initiative underway thought that data security and privacy were important, very important, or critical to initiatives like big data analytics.

The drivers for continued investment in data security and privacy will not go away soon. About 60% of the respondents indicated that they currently include sensitive data such as patient data, financials, customer data and proprietary data in their big data analytics projects. Of those that do not currently use sensitive information in their big data initiatives, about half expressed a willingness to do so, if appropriate. As the use of sensitive data increases, safeguards for the data have to be strengthened.
DATA LIFECYCLE MANAGEMENT

Despite the importance of data protection, the ability to invest in new data security initiatives or in projects related to issues such as information governance and big data is often constrained by several factors. Perhaps the greatest constraint is the cost to maintain existing applications. As Figure 21 reveals, for 30% of the respondents, maintaining existing applications accounts for more than half of their IT staff time.

Figure 21: Total IT Staff Time Devoted to Maintaining Applications
≤25%                 23%
>25% and ≤50%        35%
>50% and ≤80%        27%
>80%                 3%
Don't know/unsure    12%
Obviously, this represents a daunting commitment of resources. Increasing the challenge is the fact that many organizations, including 49% of respondents, are running multiple instances of at least one major enterprise application. As Figure 22 shows, the resources tied up in maintaining applications, including multiple instances of the same applications, can deter organizations from investing in new applications.

Figure 22: Do the Demands of Maintaining Legacy Applications Inhibit You From Investing in New Technologies Such as Mobile Applications and Cloud Technologies?
             Yes    No
Time         82%    19%
Resources    80%    19%
Cost         82%    18%

But the demands of time, cost and resources are often not the root reasons why companies do not retire or consolidate applications. There is risk involved in retiring and consolidating applications. Without a clear business case or business need, companies do not want to take that risk. (See Figure 23.)

Figure 23: Primary Reason You Have Not Retired or Consolidated Applications
Business interruption/risk     43%
Lack of sound business case    23%
Lack of business need          20%
Other                          13%
CONCLUSION

Amid today's explosive growth in data volume, variety and velocity, organizations are identifying and overcoming barriers that make it hard to locate and leverage the hidden value in the data. Too often, they find themselves spending so much time discovering and defending data that they can't devote as much time as they should to analyzing the information to drive business decisions and operations. Moreover, as sensitive data is added to the new data mix, they are increasingly focused on addressing concerns about data security, privacy and compliance.

To understand and utilize the best data, enterprises are deploying various information integration and governance capabilities, including data quality and lineage, data lifecycle management and master data management, while continuing to invest in data security and privacy measures. One key challenge is to build confidence that the data used for analysis can be trusted.

Whether they are consolidating and retiring applications to modernize their infrastructures, creating an enhanced 360-degree view of customers, or tackling other data-intensive projects, they are competing for budget and resources with other priorities on a list headed by the maintenance of existing applications. The challenge is to find a sponsor, make a successful business case, and move ahead while the opportunity still exists to leverage information and gain a competitive edge.

For more insights into information governance for all data, visit
APPENDIX A: DEMOGRAPHIC FIGURES

The respondents represent a broad range of industries. No single industry accounted for 20% of the total. Respondents also came from a wide spectrum of company sizes and from companies that managed a wide range of data volumes. Nearly 60% indicated that their enterprises managed 10TB of data or less, but 17% came from organizations with 1PB of data or more under management.

Figure 24: Job Titles of Respondents
IT director or manager                                                                                   21%
Database administrator                                                                                   16%
IT staff                                                                                                 16%
Analyst                                                                                                  11%
CIO or CTO                                                                                               10%
Project manager                                                                                          9%
LOB director or manager                                                                                  3%
Other LOB role                                                                                           2%
Chief data officer (CDO), chief privacy officer (CPO) or chief information security officer              2%
Other C-level                                                                                            9%
Figure 25: Industry Classification of Respondents' Organizations
IT services/consulting/system integration    19%
Government (all levels)                      10%
Financial services                           10%
Software/application development             10%
Healthcare/medical                           10%
Manufacturing                                7%
Business or consumer services                6%
Retail/distribution                          6%
Utility/telecommunications/transportation    6%
Education (all levels)                       5%
Insurance                                    3%
High-tech manufacturing                      0%
Other                                        7%