WHITE PAPER

Common Gaps in Data Control: Identifying, Quantifying, and Solving Them Using Best Practices

Sponsored by: Actifio

Phil Goodwin
September 2015

IDC OPINION

In an effort to ensure absolute data protection, IT organizations make numerous copies of their data, both structured and unstructured. This may be done through snapshots, mirrors, and various replication methodologies (both local and remote) as well as backup/recovery. In fact, each technique helps organizations address different sources of potential data loss. IDC has defined this as an issue of copy data management. Our research indicates that the median number of copies made of critical databases and file systems is 13.

While having 13 copies of data may ensure that the data can be recovered, too few IT organizations address the flip side: They now have dozens of copies (when considering multiple databases and file systems). In fact, our research revealed that on average, organizations have 375 separate copy data repositories. Each repository is vulnerable to inadvertent exposure of sensitive data or outright nefarious attempts to access the data.

Data control and data security are different, although related. Data control is a superset of data security that adds centralized access management and auditing to the encryption and authentication of security. With organizations now having data spread across private clouds, public clouds, backup to tape, backup as a service, and disaster recovery (DR) as a service, the complexity and possible permutations of data locality are simply too significant for IT administrators to manage manually or ad hoc. Furthermore:

- The permutations of databases, copies, locations, policies, and access control methods make copy data management a complex matrix of dozens of possibilities that is impractical, if not impossible, to track and manage manually.
- Organizations that rely on manual processes for data control are unlikely to ever achieve best practices because the IT environment is too dynamic for such methods to be consistently successful.
- Failure to address the data control issue virtually guarantees that sensitive data will be exposed to the risk of unauthorized access (whether nefarious or not), placing organizations at risk of regulatory sanction and negative publicity.

Organizations need automated methods to consistently address the data control problem in a best practice manner.

September 2015, IDC #259143
METHODOLOGY

Actifio is well known for backup solutions designed to virtualize data and address the copy data problem in IT organizations. These solutions focus on reducing the significant financial toll that copy data exacts from the IT budget. Regular security breaches in the news highlight the risk that IT organizations face regarding either accidental exposure of sensitive data or the deliberate intrusion of hackers to retrieve sensitive data.

The people at Actifio reasoned that the more copies of a data set that exist, the more opportunity there is for disclosure of sensitive information. Moreover, data in different stages, such as development, test/dev, and production, has different data access profiles. Thus there may be points at which some data is more vulnerable than other data. Actifio engaged IDC to determine answers to the following questions:

- How pervasive is the copy data access problem?
- At what point in the data life cycle is sensitive data most vulnerable to accidental disclosure?
- Are data managers aware of the copy data access problem, and how many follow best practices to deal with it?
- Who is responsible for dealing with the copy data access problem in most organizations?

To answer these questions, IDC developed a survey of high-level decision makers regarding their organization's practices around copy data access. The survey netted 429 responses, with key demographics as follows:

- Organizations with more than 1,000 employees
- Respondents with titles such as CIO, VP of infrastructure, database architect (DBA), enterprise architect, and application architect
- Respondents who are intimately or very familiar with the organization's data access practices
- Organizations from financial services, education (K-16), healthcare, government, and retail industries

We chose to focus on the aforementioned industries because all of them deal with some sensitive information, whether it is related to HIPAA, personally identifiable information (PII), or PCI.
Each industry was represented equally in our survey results.

Some highlights of the respondent profiles are as follows:

- CIO (45%), VP of infrastructure (29%), and DBA (18%)
- Firms with 1,000 to 4,999 employees (44%), 5,000 to 9,999 employees (43%), 10,000 or more employees (13%)

All responses were strictly confidential to ensure respondent candor.
IN THIS WHITE PAPER

This white paper uses the primary research gathered by IDC from 429 midsize companies to large-scale enterprises to quantify the risks associated with inadequate data control and its sources and to identify commonly vulnerable areas. The intended audience of this study is senior IT professionals responsible for data control and security, including CIOs, CSOs, VPs of infrastructure, and datacenter managers.

This examination included structured and unstructured data, on-premise and off-premise repositories, data at rest, data in flight, and backup sources of data control vulnerabilities. Best practice guidelines and operations at the various stages in the data life cycle are provided. In addition, the document includes an overview of Actifio's data control solution and discusses challenges and opportunities for organizations addressing the copy data and copy data access problems.

SITUATION OVERVIEW

IDC defines copy data as all secondary copies of data compared with primary data. As data enters a system via transaction or other input, it is primary as long as it is unique. When IT organizations make copies of the data for data protection, test data, analytics, ETL, or other purposes, these copies collectively constitute copy data.

Copy data is a necessity for IT organizations for normal business operations as well as ensuring data survivability for any potential data loss event. Copy data becomes a "problem" when the number of copies exceeds the utility of each copy. Organizations may keep an excessive number of copies out of an abundance of caution bordering on paranoia or simply because copies are created and rarely deleted due to either immature processes or lack of oversight. IDC estimates that copy data will cost IT organizations $50.63 billion in 2018. This is obviously a significant sum, especially when one considers that it represents 60% of the typical storage hardware budget.
In addition to the financial costs of copy data, excess copies represent unnecessary risk of sensitive data disclosure. When copies are made for data protection, analytics, and the like, the new copy not only must inherit the security of the primary system but may need to be scrubbed for sensitive information. For example, during the development or test/dev process, IT organizations prefer to use real data. However, developers are rarely authorized to view the sensitive information that they can easily see in the data. To avoid this, data scrubbing will either scramble sensitive data or mask it in such a way that it cannot be viewed.

Organizations may have as much legal exposure through inadvertent disclosure as they do if a hacker were to gain access. The price may include financial penalties, civil legal liability, and damage to organizational reputation.

So, how pervasive is the copy data access problem? The results from our study showed that on average, organizations have 13 copies of individual data repositories containing sensitive information. In fact, 13% of respondents had more than 15 copies. This copy rate is consistent within the margin of error for both databases and file systems. In other words, organizations appear to apply copy policies regardless of whether the data is structured or unstructured.
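As an illustration of the scrubbing step described in this section (scrambling or masking sensitive values before a copy reaches developers), the following Python sketch shows one way such a step might look. It is a minimal sketch under stated assumptions, not a description of any vendor's product: the field names, the `scrub_record` helper, and the salt are all hypothetical, and a salted hash of a low-entropy value such as a Social Security number is not a vetted anonymization scheme.

```python
import hashlib

# Hypothetical field names; a real schema would define which columns are sensitive.
SENSITIVE_FIELDS = {"ssn", "card_number"}


def mask_value(value: str, visible: int = 4) -> str:
    """Mask all but the last `visible` characters of a value."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def scramble_value(value: str, salt: str) -> str:
    """Replace a value with a truncated, salted SHA-256 digest (one-way, repeatable)."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:16]


def scrub_record(record: dict, salt: str, mode: str = "mask") -> dict:
    """Return a scrubbed copy of `record`; the original record is left untouched."""
    scrubbed = {}
    for name, value in record.items():
        if name in SENSITIVE_FIELDS:
            text = str(value)
            if mode == "mask":
                scrubbed[name] = mask_value(text)
            else:
                scrubbed[name] = scramble_value(text, salt)
        else:
            scrubbed[name] = value
    return scrubbed


if __name__ == "__main__":
    # Illustrative data only.
    production_row = {
        "name": "A. Customer",
        "ssn": "123-45-6789",
        "card_number": "4111111111111111",
    }
    print(scrub_record(production_row, salt="example-salt"))
    print(scrub_record(production_row, salt="example-salt", mode="scramble"))
```

Masking keeps a value recognizable in format while hiding most of it; scrambling replaces it entirely but keeps equal inputs equal, which can matter when test data must still join across tables.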
A bit of simple math will help us fully quantify the depth of the issue. In addition to finding that organizations make 13 copies of data, we also learned that organizations have an average of 12.95 databases, of which 42% contain sensitive information. So, we calculated the following:

12.95 databases x 13 copies of data x 0.42 sensitive information = 70.7 database copies with sensitive information

We can run a similar calculation for mission-critical file systems based on responses to our survey:

14 file systems x 12.53 copies of data x 0.60 sensitive information = 105.2 file system copies containing sensitive information

In addition, we asked respondents to tell us how many copies of sensitive data are kept on tape. The answers average out to 7.4 backup copies per database or file system. Again, some quick math:

12.95 databases x 7.4 backup copies per database = 95.5 database backup copies

14 file systems x 7.4 backup copies per file system = 103.6 file system backup copies

Summing these four figures (70.7 + 105.2 + 95.5 + 103.6), our calculations revealed that a typical organization has an astounding 375 extra copies of data that contain sensitive information. Every one of these extra copies is a point of vulnerability for the organization, and each extra copy must be secured against possible disclosure.

Given all these points of vulnerability, it's important that organizations follow formalized procedures and best practices to reduce the risk of data exposure. To find out how well organizations secure access to these repositories, we asked respondents a series of questions regarding their data scrubbing and encryption practices.

Respondents were asked whether they scrubbed data routinely whenever they make a copy of a mission-critical, sensitive database or file system. Responses ranged from "never" to "every time without exception." Responses showed that an average of just 27% of organizations scrub data every time without exception.
Among different industries, government is the most diligent sector, but at 40%, the share of government organizations that scrub data every time without exception is still well below half. Educational organizations had the lowest incidence of data scrubbing, with just 14% doing so every time without exception. Even financial services (27.3%) and healthcare (20.3%) organizations had a surprisingly low incidence of data scrubbing every time without exception, given how heavily regulated these industries are and the high-profile nature of any data breaches among them. Retail rounded out the group at 31.4% (see Figure 1).
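The copy-count arithmetic earlier in this section can be reproduced in a few lines of Python. This is a sketch that uses only the survey averages quoted in the text; the variable and function names are illustrative. With the inputs as printed, the database backup term works out to roughly 95.8 rather than 95.5, presumably because the published averages are rounded (an assumption), and the total still rounds to 375.

```python
# Survey averages as quoted in the text.
DATABASES = 12.95
DB_COPIES = 13
DB_SENSITIVE_SHARE = 0.42
FILE_SYSTEMS = 14
FS_COPIES = 12.53
FS_SENSITIVE_SHARE = 0.60
TAPE_COPIES_PER_REPOSITORY = 7.4


def copy_estimate() -> dict:
    """Return the four copy-count terms and their total."""
    parts = {
        "database copies with sensitive data": DATABASES * DB_COPIES * DB_SENSITIVE_SHARE,
        "file system copies with sensitive data": FILE_SYSTEMS * FS_COPIES * FS_SENSITIVE_SHARE,
        "database backup copies": DATABASES * TAPE_COPIES_PER_REPOSITORY,
        "file system backup copies": FILE_SYSTEMS * TAPE_COPIES_PER_REPOSITORY,
    }
    parts["total"] = sum(parts.values())
    return parts


if __name__ == "__main__":
    for label, value in copy_estimate().items():
        print(f"{label}: {value:.1f}")
```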
FIGURE 1

Data Scrubbing Practices by Industry

Q. When test/dev, analytical, or any other nonbackup copies are made of either mission-critical or business-critical databases or file systems containing sensitive information, how often is the data scrubbed or masked to eliminate the possibility of exposing sensitive information to unauthorized viewers?

[Bar chart not reproduced. Response categories: Never; In some cases, but not a majority of the time; In most cases; Every time without exception; Don't know. Series: Financial services (n = 88), Education (n = 84), Government (n = 85), Healthcare (n = 86), Retail (n = 86). Axis: % of respondents, 0 to 70. Chart annotations: "Government most secure, education least secure" and "Best practice."]

n = 429
Base = all respondents
Notes: This survey is managed by IDC's Quantitative Research Group. Data is not weighted. Use caution when interpreting small sample sizes.
Source: IDC and Actifio's Data Control Survey, June 2015

Encryption is another method of thwarting unauthorized access, so respondents were asked about their practices in that regard. As with data scrubbing, only a minority of organizations always encrypt data at rest. In this case, an average of only 29% of organizations surveyed always encrypt data at rest.

From the data that we gathered, we found that 22% of database copies that contain sensitive information are never encrypted. This means that on average, organizations have 15.5 database copies that are vulnerable to data disclosure. Of course, this is an average, so weaker organizations may have substantially more.
Results from our questions regarding encryption in flight were slightly higher. In this regard, one in three organizations always encrypts data in flight. However, we were able to determine that it is highly likely that the same 22% of database copies not encrypted at rest are not encrypted in flight either.

With respect to encryption, we also asked about respondents' encryption practices for backup tapes. Here, the results were more encouraging. According to the survey, 47.5% of organizations encrypt tape every time and 47.3% encrypt specific data on certain tapes. However, given that 96% of the organizations surveyed move tapes offsite, it makes sense to simply encrypt all tapes by default.

After data scrubbing and encryption, access control is the best means of ensuring that sensitive data is viewed only by those authorized to do so. By access control, we mean that the organization authenticates every user, user permissions are set strictly, audit reports can be run to determine when and by whom the data was accessed, and the organization can detect when the data is copied outside the organization. Several key findings emerged from our study:

- Only 39% of organizations have best practice data access (as described previously) for databases.
- Just 29.6% of organizations have best practices for file system access.
- In 34% of cases, data access policies are applied ad hoc.
- 38% of organizations either do not audit their security and access policies or do so only on an ad hoc basis.

The point at which policies are applied is also important. In this regard, we found that 36.6% of policies are applied at the time of application development, 34% are applied ad hoc, and 29.6% are applied during dev/test. The earlier that policies are applied, the less opportunity there is for data disclosure; the ideal is at the time of repository (copy) creation.

Of course, someone must be responsible for dealing with copy data access.
In our study, fully 10 different job titles had responsibility for participating in the establishment and implementation of data access policies. However, one title emerged more than any other: the CIO. According to the responses we received, the CIO was involved in setting policies 71.3% of the time and in implementing policies 68% of the time. The CIO was followed by the storage administrator, who is involved 50.3% of the time. This certainly means that in the event of a data access problem, all eyes will turn to the CIO.

In summary, no more than one in three organizations can be considered best practice in any given area of vulnerability, including the areas of data at rest, data in the datacenter, data in flight, backup, and access control. However, being best practice in one area does not ensure best practice in all areas. An organization can measure well in one aspect and be exposed in another.

FUTURE OUTLOOK

The copy data access problem is simply too large and broad for IT organizations to handle manually. Even if the best processes are in place, the bureaucratic effort associated with managing 375 data copies impedes the agility that organizations need to meet business needs. Moreover, if human error enters the process just 5% of the time, then on average, nearly 20 copies of data will be vulnerable in the organization. An automated solution is the only practical means of addressing the copy data access problem.
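The exposure estimates quoted in this paper follow from multiplying a copy count by a failure rate. The Python sketch below reproduces two of them from figures in the text: a 5% human error rate across 375 copies, and the 22% of the 70.7 sensitive database copies that are never encrypted. The `expected_exposed` helper is an illustrative name, and treating each copy as failing independently at the same rate is a simplifying assumption.

```python
def expected_exposed(copies: float, failure_rate: float) -> float:
    """Expected number of copies left unprotected at a given per-copy failure rate."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    return copies * failure_rate


if __name__ == "__main__":
    # Figures taken from the text.
    total_copies = 375
    sensitive_db_copies = 70.7

    human_error = expected_exposed(total_copies, 0.05)
    never_encrypted = expected_exposed(sensitive_db_copies, 0.22)

    print(f"Copies exposed at a 5% error rate: {human_error:.2f}")
    print(f"Sensitive database copies never encrypted: {never_encrypted:.2f}")
```

Under these assumptions the first figure comes to 18.75 copies (the "nearly 20" cited above) and the second to roughly 15.5 copies.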
Actifio was founded to address the copy data problem with its Copy Data Virtualization solution. Actifio Copy Data Virtualization technology offers a central point of control for all copies of data within the organization. It eliminates the multiple separate copy data infrastructure stacks. The key aspect of this solution is its "golden image" management, which can significantly reduce the need to create additional copies. It virtualizes the data: it takes a single copy of the data (the golden image) and provides different views of the data. Views function as virtual copies in that they have all the functionality of a full read/writable copy without actually making a copy. There is no practical limit to the number of virtual copies that can be made.

There are several advantages to the virtual copy approach. The first is economic in that the use of physical storage can be significantly reduced. This benefit goes beyond just hardware because software licensing is often tied to hardware capacity. In addition, each data copy is normally backed up, even if multiple other replicas exist. This not only represents additional physical cost but also hurts the organization's ability to meet RTO/RPO service-level requirements for backup and restore operations.

With respect to copy data access management, having a single image means that data access policies can be applied once and inherited by all subsequent virtual images. Policies are applied at the time that the image is created, meaning there is never a gap or an opportunity to be exploited for any nefarious activity. For example, Actifio encrypts the golden image by default, both at rest and in flight. Comprehensive encryption is the first, most obvious, and easiest way to reduce the risk of unauthorized data access. As data is added to or deleted from the golden image, the product uses an "incremental forever" approach that reduces resource and bandwidth consumption. This includes copies created for business resilience and DR.
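The idea that policies applied once to a golden image are inherited by every virtual copy can be pictured with a small Python model. This is a toy sketch for illustration only: it does not represent Actifio's API or internals, and every class, field, and role name in it is hypothetical. The only point it demonstrates is that a view which stores no policy of its own cannot drift from the policy on its source image.

```python
from dataclasses import dataclass, field, replace
from typing import FrozenSet, List


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical policy record; encryption is on by default."""
    encrypt_at_rest: bool = True
    encrypt_in_flight: bool = True
    allowed_roles: FrozenSet[str] = frozenset()


@dataclass
class GoldenImage:
    """Single physical copy; the only place a policy is stored."""
    name: str
    policy: AccessPolicy
    views: List["VirtualCopy"] = field(default_factory=list, repr=False, compare=False)

    def create_view(self, purpose: str) -> "VirtualCopy":
        # The policy is in force from the moment the view exists.
        view = VirtualCopy(purpose=purpose, source=self)
        self.views.append(view)
        return view


@dataclass
class VirtualCopy:
    """A view onto the golden image; it holds no policy of its own."""
    purpose: str
    source: GoldenImage

    @property
    def policy(self) -> AccessPolicy:
        # Always read through to the golden image's current policy.
        return self.source.policy

    def can_read(self, role: str) -> bool:
        return role in self.policy.allowed_roles


if __name__ == "__main__":
    image = GoldenImage("orders-db", AccessPolicy(allowed_roles=frozenset({"dba"})))
    test_dev = image.create_view("test/dev")
    analytics = image.create_view("analytics")
    print(test_dev.can_read("developer"), analytics.can_read("dba"))

    # Changing the policy once on the golden image changes it for every view.
    image.policy = replace(image.policy, allowed_roles=frozenset({"dba", "auditor"}))
    print(test_dev.can_read("auditor"))
```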
To further reduce the possibility of sensitive data exposure, Actifio includes workflow automation to mask sensitive data. This feature automates data scrubbing at the time of virtual copy creation. Therefore, whether the copy is used for application development, test/dev, or any other nonproduction purpose, the data is protected from even the most innocent or inadvertent disclosure. When workflow automation is combined with Actifio's ability to provide audit trails, organizations can assess who accessed what and when.

Actifio data virtualization technology is deployed via a virtual or physical appliance. That appliance can reside in the datacenter or in private, public, or hybrid cloud environments. For public cloud environments, Actifio Sky for Amazon Web Services (AWS) gives customers the option of purchasing the appliance through the AWS Marketplace. In fact, many buyers consume Actifio as a cloud service.

CHALLENGES AND OPPORTUNITIES

IDC's best practice guidelines for addressing the copy data and copy data access problems are as follows:

- Quantify and understand what copies are made, how many, and why. Having copies of data is OK and, in fact, is a necessity for resilience, ETL, DR, and so forth. However, copies should be made consciously and within the context of business requirements. The organization should have as many copies as it needs to conduct business, but no more.
- Encrypt data by default, both at rest and in flight. Encryption does involve some processing overhead, but it is a small cost compared with the cost of sensitive data exposure. Moreover, hardware performance simply continues to increase according to Moore's law, meaning that the cost of encryption goes down over time.
- Scrub copy data of sensitive information by default. In other words, data users should need to justify why the data should not be scrubbed rather than the other way around.
- Apply data security access policies automatically and systematically at the time of repository creation. Ad hoc or case-by-case policies may seem to be a way to optimize access, but such practices open security gaps because of inadvertent loss of oversight, if nothing else. Waiting to apply access policies, such as during application deployment or test/dev, leaves the data vulnerable in the intervening time.
- Encrypt all data on tape. It should be simply assumed that tapes will move offsite. If a tape that ought to remain onsite is unexpectedly moved outside the datacenter (including theft), it is too late to encrypt the data.

IT organizations must incorporate all best practices to be a best practice organization. While it is possible to do so without an automated approach, the level of effort is so great that it is difficult to sustain over the long haul as various priorities arise and staff members change.

CONCLUSION

Copy data not only is costly but also introduces significant risk. IT managers must address two key issues: the copy data problem and the copy data access problem. IDC research shows that the copy data problem consumes up to 60% of the IT storage hardware budget. While it may not be possible to completely eliminate copy data, any incremental reduction returns real money to the IT budget.

Controlling costs is certainly important, but the bigger organizational risk may be in copy data access. Given that IT organizations have an average of 375 data copies, the time at which data is inappropriately disclosed would seem to be more a matter of "when" than "if." Government regulations regarding disclosure of sensitive data are becoming increasingly strict, led by the European Union. Exposure of sensitive data, no matter how inadvertent or benign, can lead to government sanction and fines.
With so much potentially on the line, CIOs should take a proactive approach to solving the copy data access problem, even if they see little need for urgency in addressing the copy data problem. Fortunately, it is possible to address both problems at the same time using automated systems.
About IDC

International Data Corporation (IDC) is the premier global provider of market intelligence, advisory services, and events for the information technology, telecommunications and consumer technology markets. IDC helps IT professionals, business executives, and the investment community make fact-based decisions on technology purchases and business strategy. More than 1,100 IDC analysts provide global, regional, and local expertise on technology and industry opportunities and trends in over 110 countries worldwide. For 50 years, IDC has provided strategic insights to help our clients achieve their key business objectives. IDC is a subsidiary of IDG, the world's leading technology media, research, and events company.

Global Headquarters

5 Speen Street
Framingham, MA 01701
USA
508.872.8200
Twitter: @IDC
idc-insights-community.com
www.idc.com

Copyright Notice

External Publication of IDC Information and Data: Any IDC information that is to be used in advertising, press releases, or promotional materials requires prior written approval from the appropriate IDC Vice President or Country Manager. A draft of the proposed document should accompany any such request. IDC reserves the right to deny approval of external usage for any reason.

Copyright 2015 IDC. Reproduction without written permission is completely forbidden.