August 2013
Business-driven governance: Managing policies for data retention
Establish and support enterprise data retention policies
Table of contents
Step 1: Identify the complete business context
Step 2: Classify and define data
Step 3: Archive and manage data
Step 4: Dispose of data
Many organizations today find it extremely challenging to efficiently manage the growing volumes of data stored in repositories across the organization, particularly in enterprise applications, databases and data warehouses. The arrival of big data will amplify these challenges in the coming years.

Attempting to retain growing data volumes adds not only management complexity but also cost. More data means more disk storage. Organizations can easily become drawn into a never-ending cycle of purchasing additional production storage to retain valuable historical data, and the resulting infrastructure sprawl can become more expensive than the initial investment. Many organizations still turn a blind eye to information governance issues and rely instead on IT to keep adding storage capacity as a way to manage data growth, often past the budgeted limits.[1]

Industry experts recommend database archiving as a best practice for managing data growth and meeting data retention needs for business-critical applications. Archiving is the intelligent process of moving inactive or infrequently accessed data that still has value. The archive process should let functional users search and retrieve the data in the form they need for data retention and analytical purposes.

In some cases, organizations simply store these growing data volumes without considering governance issues. Many organizations may need to keep both current and historical data to comply with data retention rules. But are they storing this data appropriately?

According to Gartner research,
Different types of data have different data retention requirements. In establishing information governance and database archiving policies, take a holistic approach:

Understand where the data exists. Your organization cannot properly retain and archive data unless you know where data resides and how different pieces of information relate to one another across the enterprise.

Classify and define data. Define what data needs to be archived and for how long, based on business and retention needs.

Archive and manage data. Once data is defined and classified, archive data appropriately, based on business access needs. Manage that archival data in a way that supports the defined data retention policies.

IBM InfoSphere solutions are designed to support this holistic approach to information governance and database archiving. They incorporate intelligence that enables organizations to establish data retention policies and manage data growth across a heterogeneous enterprise.
Define and support data retention policies to sustain compliance

Step 1: Identify the complete business context
Step 2: Classify and define data
Step 3: Archive and manage data
Step 4: Dispose of data

To build an effective overall database archiving and data retention strategy, consider the following guidelines:

Involve all stakeholders in the process of aligning the business and legal requirements for the data retention policies, along with the technology infrastructure required to execute them.
Define clear lines of accountability and responsibility while ensuring that IT, business units and groups work together.
Establish common objectives for supporting archiving and data retention best practices within the organization.
Make sure business users are appropriately involved and informed about how information will be managed and how their business requirements for data access will be met.
Monitor, review and update documented data retention policies and archiving procedures. Continue to improve archive processes so that they deliver appropriate service levels while supporting retention requirements.

Step 1: Identify the complete business context

First, find out where the data is located, and then determine the relationships among pieces of information within a business context. Different types of data are important to different departments, so as a prerequisite, examine how data relates to applications and functions (see Figure 1).

For example, you could identify the rows and tables associated with data in a customer order scenario to provide the complete business context. The data related to the customer order might include information about the salesperson, the customer, the product or products that make up the order, and the order details, such as shipping or delivery information.
Figure 1. Strive to understand how all the pieces of information associated with a customer order relate to one another.
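The customer order scenario can be sketched in a few lines of Python. This is only an illustration of gathering a complete business object: the table names, column names and sample rows below are hypothetical and do not come from any product. In-memory dictionaries stand in for database tables.

```python
# Hypothetical in-memory tables standing in for production database tables.
ORDERS = {1001: {"customer_id": 7, "salesperson_id": 3, "shipping": "ground"}}
ORDER_ITEMS = [
    {"order_id": 1001, "product_id": 55, "quantity": 2},
    {"order_id": 1001, "product_id": 56, "quantity": 1},
]
CUSTOMERS = {7: {"name": "Acme Corp"}}
SALESPEOPLE = {3: {"name": "J. Rivera"}}
PRODUCTS = {55: {"name": "Widget"}, 56: {"name": "Gasket"}}


def build_business_object(order_id):
    """Collect the order row and every row related to it into one dictionary."""
    order = ORDERS[order_id]
    items = [row for row in ORDER_ITEMS if row["order_id"] == order_id]
    return {
        "order": order,
        "customer": CUSTOMERS[order["customer_id"]],
        "salesperson": SALESPEOPLE[order["salesperson_id"]],
        "items": items,
        # Master data for each product referenced by the order's line items.
        "products": {row["product_id"]: PRODUCTS[row["product_id"]] for row in items},
    }


business_object = build_business_object(1001)
```

The point of the sketch is that the order is only meaningful together with its customer, salesperson, line items and product details, so all of them are collected as one unit.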
Step 2: Classify and define data

The classification of data can be based on any criteria. However, as a simple example, you can classify data based on its business value or the frequency with which it is accessed (see Table 1).

Functional usage/access requirements over time

Functional data | Frequent, application-based access | Infrequent ad hoc, query-based access (self-help) | Exception-based, application-independent access (24-hour IT response) | Complete deletion (dictates storage planning)
Ledgers (GL)    | Current - 2Y | Years 3-5 | Years 6-10 | Year 11
Journals (GL)   | Current - 2Y | Years 3-5 | Years 6-10 | Year 11
Payments (AP)   | Current - 2Y | Years 3-5 | Years 6-10 | Year 11
Invoices (AR)   | Current - 2Y | Years 3-5 | Years 6-10 | Year 11
Items (AR)      | Current - 2Y | Years 3-5 | Years 6-10 | Year 11
Invoices (BI)   | Current - 2Y | Years 3-5 | Years 6-10 | Year 11

Table 1. Business application data may be classified in multiple ways.
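As a rough illustration, the age bands in Table 1 can be expressed as a small classification function. The sketch below assumes that "Current - 2Y" means the first two years of a record's life, that "Year 11" means the eleventh year and beyond, and that age is measured from a single record date. These interpretations and the function name are assumptions, not part of any product.

```python
from datetime import date


def access_tier(record_date, today):
    """Map a record's age to an access tier from Table 1 (assumed boundaries)."""
    # Completed years between record_date and today.
    completed_years = today.year - record_date.year - (
        (today.month, today.day) < (record_date.month, record_date.day)
    )
    year_of_life = completed_years + 1  # a brand-new record is in year 1

    if year_of_life <= 2:
        return "frequent, application-based access"
    if year_of_life <= 5:
        return "infrequent ad hoc, query-based access"
    if year_of_life <= 10:
        return "exception-based, application-independent access"
    return "complete deletion"


tier = access_tier(date(2010, 1, 1), today=date(2013, 8, 1))
```

In practice each functional data type (ledgers, payments, invoices and so on) could carry its own boundaries; the table happens to use the same bands for every row.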
By classifying these objects, you can begin to define the rules for managing them at different stages in the information lifecycle. Ask yourself the following questions:

Who needs access to archived data and why?
How fast do they need it?
Do access requirements change as the archives age?
How long do we need to keep the archived data?
When should it be disposed of or deleted?

To effectively define and classify business information for retention and disposal, consider the following best practices.

Promote cross-functional ownership. Typically, business units own their data and set the data retention policies, while IT owns the infrastructure and controls data management processes. Accordingly, business managers are responsible for defining who can touch the data and what they can do with it. IT must implement a technology infrastructure that supports these policies. Cross-functional ownership of archiving, retention and disposal policies is a strong indicator of project success, because all groups then have a vested interest in a positive outcome. These retention policy definitions can then be saved to a glossary and used throughout the data lifecycle, providing the context and metadata needed to define, manage and validate retention policies.

Plan and practice data retention and orderly disposal. After all stakeholders have signed off on the archiving and data retention policies, IT can develop a plan to implement those policies. Consider solutions that manage enterprise-wide retention policies for both structured and unstructured data, supporting the defensible disposal of unneeded information as well as the retention of information based on its business value and on regulatory or legal obligations.
Also, consider solutions that generate notification reports identifying which archives are nearing expiration.
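A notification report of this kind can be sketched as a simple filter over archive metadata. The record layout (`name`, `expires_on`) and the 90-day warning window below are illustrative assumptions; a real solution would read this information from its own catalog.

```python
from datetime import date, timedelta


def archives_nearing_expiration(archives, today, warning_days=90):
    """Return archives that expire within the warning window, soonest first.

    Archives that have already expired are excluded; they belong in the
    disposal process rather than in a "nearing expiration" report.
    """
    cutoff = today + timedelta(days=warning_days)
    upcoming = [a for a in archives if today <= a["expires_on"] <= cutoff]
    return sorted(upcoming, key=lambda a: a["expires_on"])


# Hypothetical archive catalog entries.
catalog = [
    {"name": "AR invoices 2002", "expires_on": date(2013, 9, 15)},
    {"name": "GL ledgers 2003", "expires_on": date(2014, 6, 30)},
    {"name": "AP payments 2002", "expires_on": date(2013, 8, 20)},
    {"name": "GL journals 2001", "expires_on": date(2013, 1, 31)},
]

report = archives_nearing_expiration(catalog, today=date(2013, 8, 1))
```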
Step 3: Archive and manage data

Once the data to be archived has been identified and defined, the complete business context of that data, or complete business object, can be archived. As indicated in Step 1, this business object represents a historical point-in-time snapshot of a business transaction and includes both transaction details and related master information.

After capturing the complete business object, the archive process should also perform the appropriate functional condition checks to identify which specific records in a defined group are safe and appropriate to archive. For example, a customer order should not necessarily be archived just because it is three years old. Before it moves to the archive, the order must also be fully paid and posted (see Figure 2).

Figure 2. Archiving policies should be able to check for certain conditions before taking action, such as making sure customer orders over three years old have been fully paid and posted before being archived.
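The condition check described for customer orders can be sketched as a single predicate. The field names (`order_date`, `fully_paid`, `posted`) are hypothetical, and the three-year threshold comes from the example above rather than from any fixed rule.

```python
from datetime import date


def is_safe_to_archive(order, today, min_age_years=3):
    """Return True only if the order is old enough AND fully paid AND posted."""
    try:
        cutoff = today.replace(year=today.year - min_age_years)
    except ValueError:
        # today is Feb 29 and the target year is not a leap year.
        cutoff = today.replace(year=today.year - min_age_years, day=28)

    old_enough = order["order_date"] < cutoff  # strictly over the age threshold
    return bool(old_enough and order["fully_paid"] and order["posted"])


today = date(2013, 8, 1)
paid_old_order = {"order_date": date(2010, 7, 1), "fully_paid": True, "posted": True}
unpaid_old_order = {"order_date": date(2010, 7, 1), "fully_paid": False, "posted": True}

eligible = is_safe_to_archive(paid_old_order, today)
blocked = is_safe_to_archive(unpaid_old_order, today)
```

Age alone never qualifies an order here: the unpaid order stays in production no matter how old it is.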
Depending on defined retention policies and condition checks, inactive yet still-valuable data is removed from the production environment and stored as compressed archive files. Compressed data takes up less space in the archive environment, and moving it out of production reduces the burden on the production server. Because the complete business object is captured, the archives can serve as an intact, accurate, standalone repository of transaction history. This information can then be queried to respond to customer inquiries or electronic discovery requests without restoring it to production or referencing information stored in a separate repository.

Step 4: Dispose of data

In a business climate conditioned to keep everything forever, the concept of data disposal may seem counterintuitive and daunting. Business executives and IT managers hesitate to delete data for fear of business or legal repercussions. However, keeping everything forever is not only expensive but also risky, because any existing data can become a target for discovery.

With a data retention and disposal plan in place, your organization can confidently delete data when its retention period has expired. At first, you might want to run the delete process manually until deleting expired data becomes a normal practice. Also, consider a solution that lets you verify the data targeted for deletion before running the delete process. Later, you might want to delete expired data automatically. Finally, make sure your solution provides an adequate audit trail so you can verify compliance with your stated deletion policies.
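The verify-then-delete workflow with an audit trail might look like the following sketch. The archive record layout, the `approved_by` field and the in-memory audit log are assumptions made for illustration; they do not describe how any particular product implements disposal.

```python
from datetime import date


def preview_expired(archives, today):
    """List archives whose retention period has ended, without deleting anything."""
    return [a for a in archives if a["expires_on"] < today]


def dispose_expired(archives, today, audit_log, approved_by):
    """Remove expired archives and record one audit entry per deletion.

    Returns the archives that remain after disposal.
    """
    for archive in preview_expired(archives, today):
        audit_log.append({
            "archive": archive["name"],
            "expired_on": archive["expires_on"],
            "deleted_on": today,
            "approved_by": approved_by,
        })
    return [a for a in archives if a["expires_on"] >= today]


# Hypothetical archive catalog entries.
catalog = [
    {"name": "GL journals 2001", "expires_on": date(2013, 1, 31)},
    {"name": "AR invoices 2005", "expires_on": date(2016, 12, 31)},
]
audit_log = []

# Step 1: review what would be deleted. Step 2: run the delete.
to_review = preview_expired(catalog, today=date(2013, 8, 1))
remaining = dispose_expired(catalog, date(2013, 8, 1), audit_log, approved_by="records manager")
```

Keeping the preview separate from the delete lets the process start out manual, with a person reviewing `to_review`, and be automated later by calling `dispose_expired` on a schedule.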
Manage and support data retention policies

The InfoSphere Optim Archive solution can help enterprises manage and support data retention policies by archiving historical data and storing that data in its original business context, all while controlling growing data volumes and improving application performance. This approach helps support long-term data retention by archiving data in a way that allows it to be accessed independently of the original application.

InfoSphere Optim Archive data growth management capabilities enable you to apply business policies to govern data retention, access and disposal. You can automate data retention to support compliance initiatives and respond quickly and accurately to audit and discovery requests. For organizations that use InfoSphere Business Glossary to define and document retention rules for business content, these rules can be easily integrated into InfoSphere Optim Archive. You can also manage data retention policies within InfoSphere Optim or import policies into InfoSphere Optim with solutions such as IBM Global Retention Policy and Schedule Management for better management of data retention and defensible disposal. Applying suitable and secure methods for governance and compliance helps you prevent your information assets from becoming liabilities.

Included with InfoSphere Optim Archive Enterprise Edition, InfoSphere Discovery provides a full range of data analysis capabilities to understand where related data resides and bring data clearly into view. Techniques include single-source and cross-source data overlap analysis, advanced matching key discovery, reverse discovery based on transformation logic and more. The relationships identified during the discovery process are then aggregated to create the baseline business objects for archiving. Organizations can use InfoSphere Discovery to help ensure accuracy and completeness, and to speed the successful implementation of data archiving projects.
In cases where the originating application has been retired or is not available, InfoSphere Optim offers application-independent access to archived transactions. Users can perform ad hoc searches using InfoSphere Data Explorer (included with InfoSphere Optim Archive Enterprise Edition), a quick, web-based search engine for archived data. Archived data can also be accessed independently of the application through industry-standard methods such as ODBC/JDBC, XML or SQL, and through reporting tools such as IBM Cognos Business Intelligence, SAP Crystal Reports and even Microsoft Excel.

InfoSphere Optim supports all leading enterprise databases, including IBM DB2, Oracle, Sybase, Microsoft SQL Server, IBM Informix, IBM IMS and IBM Virtual Storage Access Method (VSAM), and all leading operating systems, including Microsoft Windows, UNIX, Linux and IBM z/OS. It also supports the key enterprise resource planning (ERP) and customer relationship management (CRM) applications in use today: Oracle E-Business Suite, PeopleSoft Enterprise, JD Edwards EnterpriseOne and Amdocs CRM, along with custom and packaged applications. InfoSphere Optim provides the flexibility to manage large volumes of data over long periods of time, allowing you to deploy appropriate data retention policies for managing your valuable application data.
With 23 colleges on 40 campuses throughout the Commonwealth of Virginia, the Virginia Community College System (VCCS) delivers quality education and workforce training with programs and courses to serve the distinct demands of every region. Students who attend community colleges transition in and out of programs based on their interests and needs. Since 1966, the flexible admission policies at VCCS have allowed students to return at any point in time and continue their education. To support these policies and comply with state law, the VCCS retains all academic records and related information on instructors, classroom scheduling and the use of facilities indefinitely.
The challenge

VCCS uses PeopleSoft Enterprise Campus Solutions to manage day-to-day academic and business activities to support 373,000 students and 15,000 staff and faculty. However, increasing data volumes were affecting service levels in all aspects of production and operations. VCCS first tried to address the issue by buying more storage, but the team had difficulty keeping up with the growth rate and with the time required to implement and tune the database storage to manage performance. It became clear that enterprise-wide database archiving was necessary to manage data growth while supporting long-term retention needs. The staff then went on to define specific archiving criteria to meet their needs. The solution had to support:

Archiving complete historical student records in batches based on the age of the data, measured by how long the student has been inactive (versus graduation date)
Viewing and accessing the archived data for reporting, research and analysis
Processing requests for transcripts against archived student data without having to restore the data to the production environment
Selectively restoring a complete record for a single student on demand from the archive

The solution

Based on these criteria, the college implemented a policy-based archiving solution and was able to effectively manage data growth, improve service levels and enhance the flexibility with which it stored data. With less data remaining on expensive production-level storage systems, IT staff can manage these systems more efficiently and conduct backups more rapidly. Archiving dormant data helps to increase application performance and improve employee productivity while still supporting data retention and compliance requirements.

"Frankly, I cannot say enough about how well we partnered with the IBM Optim development team. We knew the areas of Campus Solutions and our data well, but they were great at identifying all the necessary records and fine-tuning the archive criteria."
Andy Clark, Technical Lead, Virginia Community College System
Now is the time to leverage the power of business-driven governance solutions to realize measurable business value for your enterprise. To learn more about these strategies, explore the following resources:

Analyst webcast: Building an Enterprise-wide Data Archiving Strategy
e-book: Business-driven data privacy policies - Establish and enforce enterprise data privacy policies to support and protect sensitive data
Solution brief: InfoSphere Optim Archive solution
InfoSphere Optim Archive solution web page

For more information

To learn more about the InfoSphere Optim Archive solution, please contact your IBM representative or IBM Business Partner, or visit the following website: ibm.com/software/products/us/en/infosphereoptim-archive/

Additionally, IBM Global Financing can help you acquire the software capabilities that your business needs in the most cost-effective and strategic way possible. We'll partner with credit-qualified clients to customize a financing solution to suit your business and development goals, enable effective cash management, and improve your total cost of ownership. Fund your critical IT investment and propel your business forward with IBM Global Financing. For more information, visit: ibm.com/financing
Copyright IBM Corporation 2013

IBM Corporation
Software Group
Route 100
Somers, NY 10589

Produced in the United States of America
August 2013

IBM, the IBM logo, ibm.com, Cognos, DB2, IMS, Informix, InfoSphere, Optim, and z/OS are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at "Copyright and trademark information" at ibm.com/legal/copytrade.shtml

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates.

The performance data and client examples cited are presented for illustrative purposes only. Actual performance results may vary depending on specific configurations and operating conditions.

THE INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS" WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.

The client is responsible for ensuring compliance with laws and regulations applicable to it. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the client is in compliance with any law or regulation.

[1] Sheila Childs and Alan Dayley, Best Practices for Storage Administrators: Staying Relevant in an Information-Centric Data Center, Gartner, March 2013, www.gartner.com/id=2368715.
Please Recycle
IMM14132-USEN-00