IDC EXECUTIVE BRIEF

How Big Data Transforms Data Protection and Storage

August 2012
Written by Carla Arend
Sponsored by CommVault

Omøgade 8, P.O.Box 2609, 2100 Copenhagen, Denmark, P.45.39.16.2222

Introduction: How Big Data Transforms Storage

Big Data is one of the transformative forces that are impacting the IT industry today. Attitudes to Big Data range from sarcastic to enthusiastic, but IDC is certain that Big Data will transform the way that we architect and use IT, and even more importantly, Big Data will change the way that business decisions are taken based on the accuracy and timeliness of data available for decision making. This paper discusses how the emergence of Big Data use cases will impact and transform storage infrastructure requirements.

What is Big Data?

Big Data has an analytics dimension and a storage dimension, and much of the discussion of Big Data is focused on how companies can gain competitive advantage from analyzing existing and emerging data sources in real time. These new requirements on the analytics side have an impact on the way storage is architected as well. IDC defines Big Data as follows: "Big Data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high velocity capture, discovery, and/or analysis."
Figure 1: The Four Elements of Big Data
Source: IDC, 2012

Big Data is described using the following four elements:
Volume. The challenge of handling ever-larger data volumes is nothing new for storage administrators. However, Big Data might actually drive companies to the limits of their current architecture faster.
Variety. Big Data enables organizations to analyze data that has been generated outside of the organization, such as social media data and weather data, as well as data that is generated from sensors, point-of-sale systems, RFID tags, video surveillance cameras, etc. These new data types raise new questions around information governance and add to the volume of data stored.
Velocity. Data is coming into the organization at an increasing speed, and Big Data analytics aims to take advantage of it in real time. Consequently, performance is a key element of the underlying IT infrastructure.
Value. Big Data analysis is done to create a unique competitive advantage for organizations by understanding their customers' preferences better, segmenting customers more granularly, and targeting specific offers at precise segments. But public sector organizations are also using Big Data to prevent fraud and save taxpayer money and to provide better services to citizens, for example in healthcare.

Big Data use cases are emerging in every single industry; enthusiasm and creativity are what they have in common. Overall, Big Data approaches can be divided into two groups: those that optimize current data and analytics processes with new technology, and those that use technology to open new business opportunities for their organizations and think outside the box.

Storage Challenges Stemming From Big Data

How do these four parameters change the need for data protection? What are the challenges IT managers are facing?
Volume. Increasing data volumes are the most commonly understood challenge for storage managers. They struggle with shrinking backup windows, yet longer backup cycles due to the larger volumes. They also struggle with requirements for shorter restore processes. Big Data accelerates these challenges and raises questions about rearchitecting backup processes, about the value of data, and about whether all data should be treated equally.
Variety. Different data types, not all of which are generated within the organization, raise the question of information governance. How do you protect data that has been generated on the social Web? How can you apply policy to data that lives in the cloud, is analyzed in the cloud, but forms the basis for important business decisions?
Velocity. Performance is a key attribute of Big Data, and shorter time to decision is one of its benefits. This increases the requirement for performance on the storage infrastructure.
Value. The purpose of Big Data analytics is to create additional value for the organization. This raises the old question about the value of data that is stored. Differentiating between data continues to be a challenge, and many companies treat all data equally, for lack of an efficient alternative.
Another dimension of value is to find the relevant data and make it available in the decision process, particularly unstructured information.

Benefits From Big Data

How can storage help derive value and competitive advantage from data? Even though most of the competitive advantage is created from advances on the analytics side, storage also plays a key role in enabling Big Data by:
Providing policy-based data management. Information architecture and information governance should be reviewed when organizations are starting to embrace Big Data.
Making data searchable through intelligent indexing. Finding data and making it available for management and decision making is another way of adding value.
Ensuring storage performance. Performance is a key parameter within Big Data, as value comes from real-time analysis of data. Efficient data management ensures optimum storage performance.
Storing data very efficiently to contain the storage footprint (dedupe, single instancing, compression, thin provisioning, snapshots). Storage efficiency is one key means of providing value to the organization. If the data is stored in the most efficient way, the storage footprint can be kept to a minimum, and the organization can free up resources and money to deploy for innovation.
Providing data access from mobile devices. Data is increasingly being accessed by mobile workers and through smart mobile devices. This is particularly true for data for decision making. IT managers need to plan for this.
Using cloud storage where appropriate. Some Big Data is created, analyzed, and stored in the cloud. The movement of large amounts of data over networks remains a performance challenge, so cloud storage needs to be part of the storage mix where appropriate.

Storage Best Practices to Support Big Data

From the currently known use cases, we can highlight some emerging best practices around data management for Big Data:
Revisit your storage architecture. Some Big Data datasets require multiple active copies that are protected with replication instead of traditional backup. Many companies are using a mix of snapshots, replication, and backup to protect Big Data datasets. Take your current storage infrastructure as the point of departure, and understand how you can evolve it by benefitting from a new architecture or new technologies.
Understand your data. Especially in Big Data, not all data is equally important or needs the same protection. When looking at the Big Data process, input data will most likely need to be stored, but in some cases it is transient and just passes through the organization without being kept.
The algorithms are usually the most valuable part because they are a unique differentiator for any organization. The outcomes of the analysis do not necessarily need to be stored because some datasets are faster to recreate than to restore.
Data governance gets more complex. When using additional data types in the analytics mix, organizations need to understand the privacy regulations associated with this data. This is true for data generated externally to the organization, but also for data that lives in the cloud.
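
The "Understand your data" practice above can be illustrated with a small decision rule. The following Python sketch is only an illustration under stated assumptions: the `Dataset` fields, the role labels ("input", "algorithm", "output"), the protection labels, and the restore throughput figure are all hypothetical and are not taken from IDC or from any vendor product. It compares the estimated time to restore an analysis output from backup with the estimated time to recreate it, and treats algorithms as the most heavily protected class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dataset:
    """Hypothetical description of one dataset in a Big Data process."""
    name: str
    role: str                      # assumed labels: "input", "algorithm", "output"
    size_gb: float
    transient: bool = False        # input that only passes through and is not kept
    recreate_hours: Optional[float] = None  # None means it cannot be recomputed


def restore_hours(size_gb: float, restore_gb_per_hour: float) -> float:
    """Estimated time to restore a dataset from backup at a given throughput."""
    return size_gb / restore_gb_per_hour


def protection_for(ds: Dataset, restore_gb_per_hour: float) -> str:
    """Pick an illustrative protection label for a dataset."""
    if ds.role == "algorithm":
        # Algorithms are described as the most valuable part, so this
        # sketch assumes they get the strongest protection.
        return "replicate_and_backup"
    if ds.role == "input":
        # Transient input is not kept, so there is nothing to protect.
        return "no_protection" if ds.transient else "backup"
    if ds.role == "output":
        # Skip backup only when recreating is faster than restoring.
        if (ds.recreate_hours is not None
                and ds.recreate_hours < restore_hours(ds.size_gb, restore_gb_per_hour)):
            return "recreate_on_demand"
        return "backup"
    raise ValueError(f"unknown role: {ds.role}")


if __name__ == "__main__":
    # Example figures are invented for illustration only.
    examples = [
        Dataset("sensor_feed", "input", size_gb=50, transient=True),
        Dataset("scoring_model", "algorithm", size_gb=1),
        Dataset("segment_scores", "output", size_gb=2000, recreate_hours=3),
    ]
    for ds in examples:
        print(ds.name, protection_for(ds, restore_gb_per_hour=500))
```

In practice the inputs to such a rule (restore throughput, recreate time, and the business value of each dataset) would come from measurement and from the organization's own governance policy; the point of the sketch is only that the protection decision can differ per stage of the Big Data process.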
Conclusion: Big Data Will Transform Storage

How Can You Benefit?

As Big Data is adopted across Europe, organizations will need to evolve their storage infrastructures as well. However, many of the storage challenges created by Big Data are well-known and well-understood, just on a smaller scale. Organizations are therefore advised to evolve their storage infrastructure rather than rip and replace what they currently have. Storage vendors are constantly innovating in order to tackle the rising challenges, and Big Data is on the roadmap for some of them already. Consult your storage vendor or channel partner and ask them about their vision for the Big Data market.

IDC also recommends the use of architectural services to understand the impact of Big Data usage in your organization. Big Data use cases differ greatly by industry and company size, and so does the value of the data used for analysis. Understanding the value creation throughout the process helps architect an efficient storage infrastructure.

COPYRIGHT NOTICE

The analyst opinion, analysis, and research results presented in this IDC Executive Brief are drawn directly from the more detailed studies published in IDC Continuous Intelligence Services. Any IDC information that is to be used in advertising, press releases, or promotional materials requires prior written approval from IDC. Contact IDC Go-to-Market Services at gms@idc.com or the GMS information line at 508-988-7610 to request permission to quote or source IDC or for more information on IDC Executive Briefs. Visit www.idc.com to learn more about IDC subscription and consulting services or www.idc.com/gms to learn more about IDC Go-to-Market Services.

Copyright 2012 IDC. Reproduction is forbidden unless authorized.