A CommVault Business-Value White Paper

Archiving Strategies for Successful ediscovery

Gary Cooke
Archiving Technical Specialist, CommVault
Contents
Why Archive When You Have Backups?
Archiving Advancements Optimize Storage & Information Governance
Modern Data Management Model Unifies Archives & Backups
This article first appeared in the June 2011 issue of Legal Tech Newsletter, an ALM Publication.

In today's data-intensive organizations, daily backups are de rigueur, yet archiving corporate information assets to preserve them over the long haul is still considered a nice-to-have option. This dichotomy stems from the fact that companies initially viewed backups and archives in similar fashion, since both play pivotal roles in safeguarding e-mail systems, file servers and other data repositories. Still, archive and backup differ in a variety of ways, which makes it increasingly important for companies to develop distinct yet complementary strategies that serve each purpose and process effectively.

In recent years, the importance of strategic archiving has intensified amid growing demands to control data growth and infrastructure costs while reducing the risks associated with corporate, legal or regulatory compliance. As legal teams and other corporate stakeholders strive to decrease the cost and complexity of responding to a growing volume of ediscovery and governance requests, archiving continues to gain prominence as a critical corporate IT priority. Organizations therefore need to understand the differences between archives and backups in order to determine the best way to integrate these core functions into a cohesive, overall data management structure. This article offers insights into both processes before presenting an updated view of how to align backups and archives to ease business continuity, ediscovery and information lifecycle management.

Why Archive When You Have Backups?

The primary function of backups is to create copies of primary data to protect it from loss due to hardware failures, user errors or catastrophic events.
Typically, these copies of production data are stored on low-cost media, such as tape, and then shipped offsite as part of an overarching disaster recovery plan. Performing daily or other regularly scheduled backups is an essential element of most corporate data protection policies, yet it's important to recognize that backups are tactically driven and typically focused on the short term. While more organizations are opting for disk-based backup and recovery to expedite both backup and restore processes, a large percentage of companies still rely heavily on tape. Restoring from tape can be a tedious, time-consuming and unreliable process, and countless examples exist of companies that couldn't reproduce crucial evidence from tape-based backups.

Archiving, on the other hand, is designed to store all data, not just what resided on the server at a given point in time. Typically, archives are kept on disk in a format that enables the content to be searched and accessed quickly and easily when needed. With archiving, the accessibility and readability of content are preserved throughout its entire lifecycle, which is why this process is so relevant to ediscovery.
In most cases, data can be produced from an archive in a fraction of the time and cost it takes to reproduce content from backups alone.

Most integrated content archiving solutions encompass e-mail systems, which are becoming a front-and-center issue for a variety of reasons. According to various surveys conducted by Osterman Research, corporate users spend an average of 152 minutes per workday using e-mail. In fact, Osterman reports that users send and receive more than 120 e-mails each day, and that the majority of the content they need is tied up in e-mail.[1] Aside from the storage and network implications of dealing with massive volumes of e-mail, the ability to enable proactive legal discovery in messaging environments is driven by archiving, not backup. In fact, the list of reasons to archive both e-mail and other data is compelling and includes the following:

- ediscovery
- Regulatory compliance
- Storage management
- Knowledge management
- Retention lifecycle management
- Disaster recovery and business continuity

Archiving Advancements Optimize Storage & Information Governance

In recent years, advances in archiving technology have made it much easier to address the challenges associated with growing volumes of e-mail and file shares. Of particular interest is the policy-driven approach of moving stale and inactive data from expensive primary storage to low-cost secondary storage. Compression and deduplication have proven immensely useful in further reducing storage requirements, while encryption provides an extra measure of security. In messaging environments, multi-platform support for major e-mail and collaboration systems, such as Microsoft Exchange and SharePoint as well as IBM Lotus Notes, is key. Equally important is the ability to replace e-mails on primary storage with self-contained stub files, which retain key properties of the original at a fraction of the storage requirement.
E-mail can be moved and archived automatically based on defined policies, either by journaling a copy of every e-mail or by capturing it from the user's mailbox based on parameters such as date and/or message size.

[1] Various Osterman Research surveys, as described in "Why Cloud-Based Security and Archiving Makes Sense," an Osterman Research white paper published in March 2010.
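To make the policy-driven mechanics concrete, here is a minimal Python sketch of the mailbox-capture path, assuming a hypothetical policy that archives any message older than 90 days or larger than 1 MB. It is an illustration only, not CommVault's implementation, and the names Message, Stub, ArchivePolicy, ArchiveStore and archive_mailbox are hypothetical. Deduplication is shown as single-instance storage keyed by a SHA-256 content hash with zlib compression, one common technique among several; each archived message is replaced by a stub that keeps its sender, subject and date plus a pointer into the archive. Journaling is not shown.

```python
import hashlib
import zlib
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Message:
    msg_id: str
    sender: str
    subject: str
    received: date
    body: bytes


@dataclass
class Stub:
    """Placeholder left on primary storage: key properties plus an archive pointer."""
    msg_id: str
    sender: str
    subject: str
    received: date
    archive_key: str


@dataclass
class ArchivePolicy:
    # Hypothetical thresholds; real values would be set by the organization.
    max_age_days: int = 90
    max_size_bytes: int = 1_000_000

    def applies_to(self, msg: Message, today: date) -> bool:
        too_old = today - msg.received > timedelta(days=self.max_age_days)
        too_big = len(msg.body) > self.max_size_bytes
        return too_old or too_big


class ArchiveStore:
    """Single-instance store: identical content is kept once, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        key = hashlib.sha256(content).hexdigest()
        if key not in self.blobs:  # deduplication: skip content already stored
            self.blobs[key] = zlib.compress(content)  # compress before storing
        return key

    def get(self, key: str) -> bytes:
        return zlib.decompress(self.blobs[key])


def archive_mailbox(mailbox: list, policy: ArchivePolicy, store: ArchiveStore, today: date) -> list:
    """Move qualifying messages into the archive and leave stubs in their place."""
    result = []
    for item in mailbox:
        if isinstance(item, Message) and policy.applies_to(item, today):
            key = store.put(item.body)
            result.append(Stub(item.msg_id, item.sender, item.subject, item.received, key))
        else:
            result.append(item)  # recent, small messages and existing stubs stay as-is
    return result


if __name__ == "__main__":
    store = ArchiveStore()
    mailbox = [
        Message("m1", "a@example.com", "Q1 report", date(2011, 1, 10), b"same attachment"),
        Message("m2", "b@example.com", "Fwd: Q1 report", date(2011, 1, 12), b"same attachment"),
        Message("m3", "c@example.com", "Lunch?", date(2011, 5, 30), b"noon?"),
    ]
    mailbox = archive_mailbox(mailbox, ArchivePolicy(), store, today=date(2011, 6, 1))
    print([type(item).__name__ for item in mailbox])  # ['Stub', 'Stub', 'Message']
    print(len(store.blobs))  # 1 (both archived messages share one stored copy)
```

Because the two old messages carry identical content, the store keeps a single compressed copy and both stubs point at the same key, which is the storage saving deduplication is meant to deliver.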
Not only does policy-driven archiving help organizations reclaim valuable storage space; this level of archive functionality also helps ensure litigation and audit readiness by enforcing retention and disposition policies. In addition, the ability to extend e-mail archiving to file archiving enables organizations to address the growing trend of including files in ediscovery actions.

Modern archiving solutions include the features and functionality to improve information governance and optimize data tiering, ensuring that costs can be managed and that processes are repeatable and defensible. With a robust archiving system in place, companies can also reduce the amount of data to be backed up while increasing the number of data copies available to meet Recovery Time Objectives and Recovery Point Objectives (RTOs/RPOs).

Another advantage of today's modern content archiving is the opportunity to reduce costs even further through cloud integration. Integrated deduplication and encryption permit efficient movement of archive data across a network, allowing organizations to reap the compelling economic benefits of long-term cloud storage. Integrated alerting, reporting and data verification help ensure that data has safely reached the cloud without the risks associated with manual scripting or standalone gateway appliances.

As more companies adopt disk-based backups, thinking about modern archiving is shifting toward a more consolidated approach to both areas. Ultimately, this can streamline information lifecycle management while further lowering costs and risks.

Modern Data Management Model Unifies Archives & Backups

Conventional wisdom about separating backups and archives is giving way to a new approach that blends the two to ease business continuity, data recovery and ediscovery. This unified approach addresses the age-old question of what to do when data backed up from an e-mail application becomes a candidate for archiving from the primary application.
Traditionally, this data would be moved from the production environment into the archive independently of the backup. It's become increasingly clear, however, that unifying this process can reduce processing effort and storage costs, which is why momentum is growing for modern data management models that provide the flexibility to move data from backups to archives while minimizing interaction with core applications. Accomplishing this requires a singular information framework in which all data is managed from a single source across all locations, media and public or private clouds, from cradle to grave. Such an approach lets all organizational data of a similar nature (e.g., e-mail, attachments, databases, video, audio and instant messages) be viewed, stored and managed using integrated policies and procedures. In this fashion, information can be stored, organized and retained according to its business value. The result can dramatically improve access while enabling end-to-end information governance.
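One way to picture integrated policies that retain information according to its business value is a single retention schedule keyed by data category, from which a disposition date follows for every item. The Python sketch below assumes hypothetical retention periods (roughly 7 years for e-mail, 3 for instant messages and 1 for video, approximated as 365-day blocks); the names RETENTION_DAYS, Item, disposition_date and due_for_disposal are illustrative and not part of any product.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention schedule, in days, per category of data.
# The categories echo the text; the periods are illustrative only.
RETENTION_DAYS = {
    "e-mail": 7 * 365,
    "instant message": 3 * 365,
    "video": 365,
}


@dataclass
class Item:
    item_id: str
    category: str
    created: date


def disposition_date(item: Item) -> date:
    """First date on which the item becomes eligible for disposal under its category's policy."""
    try:
        days = RETENTION_DAYS[item.category]
    except KeyError:
        raise ValueError(f"no retention policy for category {item.category!r}") from None
    return item.created + timedelta(days=days)


def due_for_disposal(items: list[Item], today: date) -> list[str]:
    """IDs of items whose retention period has elapsed, in input order."""
    return [item.item_id for item in items if disposition_date(item) <= today]


if __name__ == "__main__":
    items = [
        Item("e1", "e-mail", date(2003, 1, 1)),
        Item("im1", "instant message", date(2007, 6, 1)),
        Item("v1", "video", date(2011, 1, 1)),
    ]
    print(due_for_disposal(items, today=date(2011, 6, 1)))  # ['e1', 'im1']
```

Keeping the schedule in one table, rather than in per-application settings, is what lets the same rule apply to e-mail wherever a copy of it lives; an unknown category raises an error instead of silently defaulting to "keep forever" or "dispose".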
Automated content classification is another important attribute of modern data management, as it helps companies retain the right data and curbs the unnecessary and expensive impulse to save everything. Content-inspired management, along with options for discrete legal preservation/hold policies and purpose-built records management solutions, including Microsoft Office SharePoint Server (MOSS), can elevate collaboration and knowledge sharing. Additionally, full content indexing enables all copies of indexed data (e.g., backup and archive data) to be combined in a single, searchable archive, which improves productivity by providing self-serve access to data for search and discovery. Search interfaces optimized for non-IT staff will become more prevalent, especially as it becomes more evident that easy, fast and accurate searching can eliminate the need for expensive, outsourced discovery services.

In time, the overarching benefits of seamlessly integrating backup and archiving will change the way companies think about both areas, especially once the benefits of faster large-volume discovery searches and secure legal holds are realized firsthand. A common, intuitive interface to a centralized archive of all electronically stored information, including e-mail, file, SharePoint and backup data, is the ultimate solution for enabling a proactive and legally defensible information management strategy and ediscovery workflow. To provide the data flexibility and consolidation required to fully realize these advantages, solutions need to be heterogeneous so that companies aren't locked into specific hardware. Fortunately, modern data management has evolved to the point where organizations can transform their backup and archive strategies to elevate policy management, security, auditing, capacity planning and ediscovery, all through a single interface that provides a picture-perfect view of the entire organization.
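The idea of combining backup and archive copies in one searchable index, and then placing legal holds on search results, can be sketched as a small inverted index. This is a toy illustration assuming simple keyword matching with AND semantics; Document, UnifiedIndex, place_hold and can_dispose are hypothetical names, and production indexing engines handle far more (file formats, metadata, duplicate copies, access control).

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    source: str  # where the copy lives, e.g. "backup" or "archive"
    text: str


class UnifiedIndex:
    """Toy inverted index over backup and archive copies, with a legal-hold set."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)  # term -> set of doc IDs containing it
        self.docs: dict[str, Document] = {}
        self.on_hold: set[str] = set()

    @staticmethod
    def _terms(text: str) -> set[str]:
        # Lowercase alphanumeric tokens only; no stemming or metadata fields.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def add(self, doc: Document) -> None:
        self.docs[doc.doc_id] = doc
        for term in self._terms(doc.text):
            self.postings[term].add(doc.doc_id)

    def search(self, query: str) -> list[str]:
        """Return sorted IDs of documents containing every query term, regardless of source."""
        terms = self._terms(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)

    def place_hold(self, doc_ids) -> None:
        self.on_hold.update(doc_ids)

    def can_dispose(self, doc_id: str) -> bool:
        # A held document must be preserved even if its retention period has elapsed.
        return doc_id not in self.on_hold


if __name__ == "__main__":
    index = UnifiedIndex()
    index.add(Document("b-001", "backup", "Merger terms draft attached"))
    index.add(Document("a-001", "archive", "Re: merger terms, final version"))
    index.add(Document("a-002", "archive", "Lunch on Friday?"))
    hits = index.search("merger terms")
    print(hits)  # ['a-001', 'b-001']
    index.place_hold(hits)
    print(index.can_dispose("b-001"), index.can_dispose("a-002"))  # False True
```

A single query returns matches from both the backup copy and the archive copy, and the hold applies to whatever the search found, which is the workflow the paragraph above describes at a much larger scale.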
As an archiving technical specialist, Gary Cooke has helped numerous organizations develop archiving strategies for litigation readiness, compliance and records retention. He has spent his career building expertise in the content archiving field, both while managing enterprise-level messaging environments and as a senior systems engineer for a solution provider. Cooke can be reached at gcooke@commvault.com.

www.commvault.com | 888.746.3849 | E-mail: info@commvault.com
CommVault Worldwide Headquarters | 2 Crescent Place | e, NJ 07757 | 888-746-3849 | Fax: 732-870-4525
CommVault Regional Offices: United States | Europe | Middle East & Africa | Asia-Pacific | Latin America & Caribbean | Canada | India | Oceania

© 1999-2011 CommVault Systems, Inc. All rights reserved. CommVault, CommVault and logo, the CV logo, CommVault Systems, Solving Forward, SIM, Singular Information Management, Simpana, CommVault Galaxy, Unified Data Management, QiNetix, Quick Recovery, QR, CommNet, GridStor, Vault Tracker, InnerVault, QuickSnap, QSnap, SnapProtect, Recovery Director, CommServe, CommCell, ROMS and CommValue are trademarks or registered trademarks of CommVault Systems, Inc. All other third party brands, products, service names, trademarks, or registered service marks are the property of and used to identify the products or services of their respective owners. All specifications are subject to change without notice.