White Paper: Central Administration of Data Archiving
Contents

Archiving and Securing Corporate Data
The Growing Need for Data Archive Solutions
Determining Data Archiving Policy
Establishing the Data Archiving Processes
Manning the Data Archive Solution
Choosing the Right Product
Summary
Archiving and Securing Corporate Data

Corporations run on information. Firms hire armies of people who generate data as their primary work product. From data entry to document creation to analyses and even email, the flow of data is relentless. The use and flow of this information is at the heart of the decision-making processes that allow businesses to function and thrive. It is only reasonable that the investment made in creating and organizing this data be protected.

This white paper discusses the policies, processes, people, and products that can help ensure that the data on which an enterprise depends is secure and easily accessible. It examines the central administration of data archiving: the growth of enterprise data, the policy considerations that allow the data to be adequately secured, the processes needed to validate that the data is securely copied and quickly retrieved when required, the people needed to oversee those processes, and the product requirements that help put the policies and processes into practice efficiently.

The Growing Need for Data Archive Solutions

Most enterprises face an ever-growing mountain of data. The sheer volume of information stored within most organizations is overwhelming and continues to increase each year. Simply making copies of the information is not enough. The archived data must be organized so that it can be identified, collected, tracked, audited, and managed.

The complexity of the enterprise infrastructure makes the identification and collection of data more challenging. Corporate system data may be stored on mainframes; minicomputers; or UNIX, Linux, Windows, and Apple servers. Each of these platforms handles its data in a different manner: the file storage and data storage systems differ, and each monitors system activity differently. A comprehensive data archiving solution must span all of these platforms.

The complexity of data archiving is exacerbated by the diverse types of data to be stored. Data resides in file systems, database systems, messaging systems, message queues, and the like. Some files can simply be copied, while others require the application that manages the data to prepare a copy for backup. The data is also protected by a wide range of security mechanisms. Some data is stored in secure file systems; other data requires application credentials and permissions to copy. Some sources need only a simple username and password; others require LDAP or Kerberos authentication to read, and still others require security certificates.

With the types of data so widely distributed and the mechanisms to access the data so varied, a successful solution must provide the means for backing up all the required forms of data. It must work with each distinct security mechanism required to reach the data, and it must manage the credentials that allow access to the data and keep those credentials secure.
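To make the point about varied access mechanisms concrete, the sketch below (Python, with hypothetical source names and a hypothetical vault-reference convention) shows one way a solution might model each data source's authentication requirement while keeping only a reference to the credential, never the secret itself. It is an illustration of the idea, not a description of any particular product.

```python
from dataclasses import dataclass
from enum import Enum, auto


class AuthMethod(Enum):
    """Access mechanisms an archive agent may need to support."""
    PASSWORD = auto()     # simple username/password
    LDAP = auto()
    KERBEROS = auto()
    CERTIFICATE = auto()


@dataclass(frozen=True)
class DataSource:
    name: str
    platform: str          # e.g. "linux", "windows", "mainframe"
    auth: AuthMethod
    credential_ref: str    # reference into a secured vault, never the secret itself


# A registry of sources to archive; real entries would come from the policy.
SOURCES = [
    DataSource("hr-file-share", "windows", AuthMethod.PASSWORD, "vault:hr-share"),
    DataSource("billing-db", "linux", AuthMethod.KERBEROS, "vault:billing-db"),
    DataSource("mq-broker", "unix", AuthMethod.CERTIFICATE, "vault:mq-broker"),
]


def plan_backup(source: DataSource) -> dict:
    """Describe how a backup job would authenticate to a source.

    Only the credential reference is recorded; the agent resolves the
    actual secret from the vault at run time.
    """
    return {
        "source": source.name,
        "platform": source.platform,
        "auth_method": source.auth.name,
        "credential_ref": source.credential_ref,
    }


if __name__ == "__main__":
    for src in SOURCES:
        print(plan_backup(src))
```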
Most organizations store their data near where people use it. For organizations spread across geographic regions or around the globe, data archiving poses additional challenges. The solution must be able to run on servers in different locations, and the process must be managed in each of them. For security purposes, the archived data should be stored at a safe distance from the primary storage location, which requires coordinating the transport of the data, whether physically or by electronic means.

To successfully manage the archiving of enterprise data, a centralized point of control often proves invaluable. A system that can abstract the backup process without miring the operator in the details of the operation makes the process easier to schedule, control, and monitor.

Any successful solution must be monitored to prove its effectiveness. A data archive solution that provides centralized monitoring of backup activities will help the operations staff ensure that corporate information is secure. Not only must the system audit the process of collecting and storing data, it must also help identify the information so that retrieval is quick and simple. It should track the life cycle of the retained records and safely destroy records that have reached the end of their useful life.

The needs of most enterprises are in a constant state of flux. Servers are commissioned, decommissioned, and consolidated. Services move from one data center to another. Applications are upgraded and replaced, changing the requirements of the data archive system. Companies purchase new applications, or merge and must combine assets with another firm. A data archive solution that can adjust quickly and efficiently to these changes makes it much simpler and safer to maintain data security.

Determining Data Archiving Policy

Ultimately, corporate policy will dictate the requirements of the data archive solution. A carefully crafted policy will help IT develop the right system for the needs of the organization. Key tenets of the policy must be considered before the policy can be published and the appropriate solution designed and implemented. Policies must be developed to identify the types of data to be archived and the life cycle of that data. Policies must provide parameters for devising a schedule for storing data. The data archive process must be integrated into the change management policies of all the IT systems within the organization. And the policies must be validated against all regulatory and corporate compliance standards.

First, the data to be archived must be defined and a taxonomy developed for identifying the critical corporate data that should be protected and preserved. For every type of data, a retention policy should be developed. This policy will specify how often the data should be backed up and how long it should be preserved. The policy should provide the IT team developing the solution with enough information to locate and classify all the information that is to be secured by the solution.
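As an illustration of what such a retention policy might look like once it is expressed in a form an archive system can consume, here is a minimal sketch in Python. The data classes, backup intervals, and retention periods are illustrative only, not recommendations.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RetentionPolicy:
    """One row of the archiving policy: what to protect, how often, and for how long."""
    data_class: str          # taxonomy label from the corporate policy
    backup_interval: timedelta
    retention_period: timedelta
    offsite_copy: bool       # whether a copy must be kept away from the primary site


# Example policy table; every value here is a placeholder.
POLICY = [
    RetentionPolicy("financial-records", timedelta(days=1),  timedelta(days=7 * 365), True),
    RetentionPolicy("email",             timedelta(days=1),  timedelta(days=3 * 365), True),
    RetentionPolicy("engineering-docs",  timedelta(days=7),  timedelta(days=5 * 365), False),
    RetentionPolicy("temp-build-output", timedelta(days=30), timedelta(days=90),      False),
]


def policy_for(data_class: str) -> RetentionPolicy:
    """Look up the retention rules that govern a given class of data."""
    for rule in POLICY:
        if rule.data_class == data_class:
            return rule
    raise KeyError(f"no retention policy defined for {data_class!r}")


if __name__ == "__main__":
    rule = policy_for("email")
    print(f"back up every {rule.backup_interval.days} day(s), retain for {rule.retention_period.days} days")
```

Keeping the policy in a structured, machine-readable form also makes it easier to audit later: the same table that drives the schedule can be compared against what the archive actually contains.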
Once the data has been identified, policies must be created that direct when and by what means the data is to be preserved. Backups require schedules, and the schedule is determined by the availability of the data being preserved and the resources on which it is stored. For instance, file shares can typically be backed up at will, but the backup will generate disk activity and network traffic. These considerations should be factored into the scheduling policy. Every type of data will have its own issues, and the policy should provide enough guidance that operations personnel can make good choices.

Once the policies for identifying and scheduling data archives have been established, the implemented processes need governance. These policies include validation of the processes, change management control for the data archive solution, and plans for re-evaluating and updating the data archive policy itself.

It is easy to miss very practical matters in data archiving policy. For instance, if a backup is never restored, the organization has little assurance that the archive solution works or that the operators know how to execute it on demand. A simple policy that requires a periodic drill to demonstrate the ability to restore data before a disaster can save a great deal of struggle when a real need to restore the data arises.

Preparing change management controls for the data archive solution will also help maintain the system and ensure that no critical data is missed. When systems change (for instance, when the organization consolidates servers or moves an application from one data center to another), it is important to confirm that the data archive solution has been altered to meet the changed circumstances. By integrating the data archive policy into the change management policies of the organization as a whole, there is little chance that changes to the IT infrastructure will leave mission-critical data unprotected.

An overarching goal of these policies is to ensure that the data archive solution meets both corporate and regulatory standards. The solution should be able to provide validation that the data has been handled as required and is secure at all times. The policy must address regulatory compliance issues as well. Standards such as the Health Insurance Portability and Accountability Act (HIPAA) and the PCI data security standards require data to be held securely; this holds true for archived data as well as the data in the system of record. Regulations such as the Sarbanes-Oxley Act (SOX) require control over what data is backed up and when, and they require auditing procedures that validate that the policy is implemented. Thus, policies must be defined so that they can be implemented and executed.
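To illustrate the kind of auditing such a policy calls for, the following is a minimal sketch (in Python, with hypothetical data classes and fields) of a compliance check that flags data classes whose most recent backup failed, is older than policy allows, or was stored without encryption. It shows the shape of the check only; a real audit would draw these records from the archive catalog.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class BackupRecord:
    """What the archive catalog knows about the most recent backup of a data class."""
    data_class: str
    completed_at: datetime
    succeeded: bool
    encrypted: bool


def audit_compliance(records: list[BackupRecord],
                     max_age: timedelta,
                     now: datetime) -> list[str]:
    """Return human-readable findings for an auditor.

    A data class is flagged if its last backup failed, is older than the
    window required by policy, or was stored without encryption.
    """
    findings = []
    for rec in records:
        if not rec.succeeded:
            findings.append(f"{rec.data_class}: last backup failed")
        elif now - rec.completed_at > max_age:
            findings.append(f"{rec.data_class}: last good backup is older than policy allows")
        if not rec.encrypted:
            findings.append(f"{rec.data_class}: archive copy is not encrypted")
    return findings


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    catalog = [
        BackupRecord("financial-records", now - timedelta(hours=20), True, True),
        BackupRecord("email", now - timedelta(days=4), True, False),
    ]
    for finding in audit_compliance(catalog, max_age=timedelta(days=1), now=now):
        print(finding)
```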
Establishing the Data Archiving Processes

The policies effectively provide the requirements for the data archiving solution. The IT staff can use the policy to design a solution that implements the commitments for service defined in the policy and then implement that design throughout the organization. They must establish an effective system for managing and monitoring the processes, and the system should validate execution of the policies. When things go awry, a well-designed and well-implemented system can be quickly and easily corrected. The system will provide fast, reliable access to all archived data and assurance that it can be restored on demand.

With a clear set of policies in hand, the IT staff can implement a solution that meets the requirements of the policy. During the design phase of the solution architecture, there will be some interaction between the solution designers and the policy makers: the policy might call for a solution that exceeds the available budget, in which case the policy will need to be reconciled with the budget of the system that is ultimately designed. The cost of the system should reflect the value of the data and the cost and risk of replacing it. The total cost of ownership (TCO) of the solution, from software to hardware to operating costs, needs to be considered in this equation.

Once a system is designed that fulfills the requirements of the policy, implementation can begin. Most enterprises host a number of applications, many on disparate server platforms. The operations staff will be required to manage the backup process on all of these servers and applications. The task becomes more difficult when servers and data centers are geographically dispersed and separated by multiple time zones.

The development of these processes can be complex. For example, if a database backup requires file space in which to dump its transaction logs and data backup files, that drive space is not available to other systems. The length of time the files remain on the drive before they are secured in the data archive may affect more than just the backup process, because other systems may need the drive space. The processes must be coordinated with all the resources within the enterprise.

Managing such a system requires a great deal of coordination. Communicating across time, language, cultural, and technological barriers can be time consuming and costly for staff. A system that consolidates control of the backup processes will help centralize oversight of the process as a whole and minimize staffing and training requirements.

The data stored within the servers must be kept secure. A carefully crafted set of processes will allow the operators to collect and archive the data without granting them permission to access the data with their own credentials. The system will manage the full range of credentials required by the servers and applications within the system, keep them secure, and make them easy to maintain.
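As a concrete illustration of the staging concern described above (the database dump that occupies drive space until it is secured in the archive), here is a minimal sketch in Python. It assumes a PostgreSQL database backed up with the standard pg_dump utility; the staging and archive paths are placeholders, and a real process would also handle scheduling, encryption, and offsite transport.

```python
import hashlib
import shutil
import subprocess
from pathlib import Path

STAGING_DIR = Path("/backup/staging")      # scratch space shared with other systems
ARCHIVE_DIR = Path("/mnt/archive/db")      # stand-in for the archive target


def checksum(path: Path) -> str:
    """SHA-256 of a file, used to verify the archived copy matches the staged dump."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_database(db_name: str) -> Path:
    """Dump the database, copy it to the archive, verify it, then release the staging space."""
    STAGING_DIR.mkdir(parents=True, exist_ok=True)
    ARCHIVE_DIR.mkdir(parents=True, exist_ok=True)

    staged = STAGING_DIR / f"{db_name}.dump"
    # 1. Dump the database to staging (pg_dump's custom format is compressed).
    subprocess.run(["pg_dump", "--format=custom", f"--file={staged}", db_name], check=True)

    # 2. Copy the dump into the archive and confirm it arrived intact.
    archived = ARCHIVE_DIR / staged.name
    shutil.copy2(staged, archived)
    if checksum(staged) != checksum(archived):
        raise RuntimeError(f"archived copy of {db_name} does not match the staged dump")

    # 3. Free the staging space as soon as the archive copy is verified,
    #    so the drive is available to other systems again.
    staged.unlink()
    return archived
```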
The only way to validate that the system is operational is to monitor it. The solution should provide reports that satisfy the needs of the data archive policy and document the security of the archives. It should catalog the entire data store and provide quick, reliable access to any of the data stored. The processes of the solution should report their activity on an ongoing basis to a centralized location. This data can be used to report on activity and to develop baselines for projecting the future needs of the system.

As with any system, things will occasionally go wrong. The data archive solution should alert operators quickly when backup routines fail. It should collect and present helpful information concerning the nature of the failure and give the operations staff what they need to remediate the error and quickly secure the missing data.

Procedures for changing the source, location, and type of data to be archived will help keep the system flexible and responsive to the needs of the enterprise. Streamlining processes so that backups can be implemented quickly with minimal overhead reduces costs and makes the entire organization more nimble and better able to respond to new challenges.

Manning the Data Archive Solution

The impact of staff on the cost of the data archive solution is significant. Systems that can minimize labor costs while maintaining the full integrity of the data archive provide greater value. Implementing processes and tools that relieve staff of tedious tasks that are easily automated will help control costs, and centralized monitoring and control help keep staffing requirements down.

The solution will require a staff to oversee its operation and maintenance. The challenge with staff begins with training. Data archives come from a wide range of platforms, so a great deal of cross-training might be required to help staff understand all the applications and platforms from which they collect data. This could be avoided if the experts on each platform and application assumed the added responsibility of archiving their own data; however, when responsibility is spread across many individuals, each with different priorities, it becomes difficult to manage.

The staff bears the responsibility for executing the data archive processes. They must implement the initial processes, then schedule and monitor them. They need tools to isolate and correct errors. They need to organize the data archives; without the ability to quickly locate archived data, its value is questionable. They must be able to restore data quickly when required, report on the condition of the system, and satisfy any audits.

The cost of staff is always a significant component in any solution, so a solution that minimizes that cost is desirable. All areas of staffing should be considered: administration, operations, configuration, troubleshooting, and training. Finding solutions that run with minimal staff and yet do not compromise the security of the data should be a top priority.

Regardless of the solution devised, the staff remains accountable for the outcome of the system. They must be able to deliver on the commitments dictated by the data archive policy. Equipping the staff with the proper set of tools can help them deliver on that commitment while controlling costs.
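One class of tooling that relieves staff of tedious work is the centralized activity reporting and failure alerting described at the start of this section. The sketch below (Python, with a hypothetical activity-log location and a log message standing in for a real alert channel) shows the general shape: every job result is recorded centrally so that reports and baselines can be built from one place, and only failures raise an operator-facing alert.

```python
import json
import logging
from datetime import datetime, timezone
from pathlib import Path

ACTIVITY_LOG = Path("archive_activity.jsonl")   # stand-in for a central activity store

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("archive")


def report_job(job_name: str, source: str, succeeded: bool, detail: str = "") -> None:
    """Append a job result to the central activity log and alert operators on failure."""
    record = {
        "job": job_name,
        "source": source,
        "succeeded": succeeded,
        "detail": detail,
        "finished_at": datetime.now(timezone.utc).isoformat(),
    }
    with ACTIVITY_LOG.open("a") as handle:
        handle.write(json.dumps(record) + "\n")

    if not succeeded:
        # In a real deployment this would page or email the on-call operator.
        log.error("backup job %s on %s failed: %s", job_name, source, detail)


if __name__ == "__main__":
    report_job("nightly-fileshare", "hr-file-share", succeeded=True)
    report_job("nightly-db", "billing-db", succeeded=False, detail="connection timed out")
```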
Choosing the Right Product

For most organizations, the data archive solution will be centered on a product. Given the diversity of platforms and types of data, few piecemeal solutions can provide the level of reliability and security required to protect enterprise data. Products that support people and simplify the implementation and maintenance of processes will provide superior value.

The product needs to be able to archive all the data types identified in the corporate policy. It should work with every platform in the infrastructure to reliably gather the required data, and it should be able to secure that data. The solution needs to work with many different types of server configurations, from clusters to virtualized servers to server farms, and it should grow with the inevitable evolution of the technology it supports.

The product should make implementation of backups simple and error-resistant. Agents that handle the technological variations among platforms and applications make it possible to back up data from a wide variety of sources without specialized training in the products being backed up.

The product should help manage the scheduling of backups. Backups must be sensitive to the consumption of resources, whether individual server resources or shared resources such as storage devices and network bandwidth. The ability to schedule tasks nimbly helps optimize the utilization of resources, and a centralized view of all backup activities helps manage the process enterprise-wide.

The product should monitor the execution of the backup schedule, providing validation of the security of the data, and it should alert operators when a backup fails. Products that give operators relevant information help them quickly correct any issues they encounter and keep the backup process running as designed.

The product should track the storage of all the data archived by the system so that operators can quickly retrieve data on demand. It should also help maintain the data for the appropriate length of time to ensure that data retention policies are fulfilled. In addition, the product should help minimize the space used by the data archive, which helps manage the hardware costs of the solution.

Products should allow for quick extension of the system. If new applications are added, the product should allow their data to be added to the archive quickly. When infrastructures are reconfigured and servers are moved, consolidated, or decommissioned, the system should be able to be reconfigured to meet the need. Because data centers are scattered across multiple locations, this reconfiguration is best handled remotely.

The product will also need to grow with the technologies it supports. It should have a good record of supporting a wide range of servers and applications, and it should demonstrate that it quickly provides agents that support new product updates and improvements.
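To make the cataloging and retention requirements concrete, the following is a minimal sketch (in Python, with hypothetical item identifiers and storage locations) of a catalog that can locate archived copies on demand and identify records whose retention period has lapsed and that are due for safe destruction. A product's actual catalog would be far richer, but the two operations shown here are the core of it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class CatalogEntry:
    """One archived item as a product's catalog might describe it."""
    item_id: str
    source: str            # where the data came from
    location: str          # where the archive copy is stored
    archived_at: datetime
    retain_for: timedelta  # retention period required by policy


def find(catalog: list[CatalogEntry], source: str) -> list[CatalogEntry]:
    """Locate every archived copy taken from a given source, newest first."""
    hits = [entry for entry in catalog if entry.source == source]
    return sorted(hits, key=lambda entry: entry.archived_at, reverse=True)


def expired(catalog: list[CatalogEntry], now: datetime) -> list[CatalogEntry]:
    """Entries whose retention period has lapsed and that are due for safe destruction."""
    return [entry for entry in catalog if now - entry.archived_at > entry.retain_for]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    catalog = [
        CatalogEntry("a1", "hr-file-share", "tape-0042", now - timedelta(days=10), timedelta(days=365)),
        CatalogEntry("a2", "temp-build-output", "disk-archive-1", now - timedelta(days=120), timedelta(days=90)),
    ]
    print([e.item_id for e in find(catalog, "hr-file-share")])   # -> ['a1']
    print([e.item_id for e in expired(catalog, now)])            # -> ['a2']
```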
The total cost of the product needs to be considered before making a choice. It is more than the cost of the software and hardware; it includes the cost of operating the solution and the risks incurred if it is inoperable or cannot support some or all of the requirements stated in the data archive policy. Supporting services and infrastructure should also be considered.

An ideal system will allow centralized control and monitoring through a console. The central monitoring control should be distributable to multiple locations to provide redundancy. The system should support the requirements for regulatory compliance, such as securing data through encryption and providing reports that clearly document how the data is handled, where it is located, and who has accessed it.

Summary

Enterprise data is vital to the ongoing success of any organization. Protecting that data protects the hundreds or thousands of man-hours and dollars spent collecting and organizing it, and it preserves the information every organization uses to make critical decisions and keep operations going. A well-designed data archive strategy can protect that data.

Because of the geographically dispersed and technologically diverse nature of data in most organizations, most corporations need a centralized solution. A system that can manage backup operations in many locations and on many different platforms can secure data in a cost-effective manner.

A data archive system begins with a well-crafted policy that defines what data needs to be protected and provides the guidance that allows IT to design and implement a system that secures it. The procedures should be simple to schedule and maintain. The processes should be easy to troubleshoot. A full audit trail should be maintained. Data should be well cataloged and quickly available on demand. The cost of staff should be minimized without endangering the data archive policy. And products should be chosen that meet the needs of the organization with the lowest possible TCO.