WHITE PAPER
Reducing Dormant Data: 7 Tips for Delivering Data Warehouse Performance and Cost Savings
Reducing Dormant Data

Minimizing dormant data reduces system costs and improves performance, service levels, and IT staff productivity.

Defining Dormant Data

Studies show that much of the data loaded into data warehouses and analytical application databases is dormant; that is, it is infrequently used or never used. Unlike OLTP databases, data warehouses continuously collect and store detailed and summary historical information for business analysis. Data warehouses frequently include information meant to satisfy unknown requirements, so data is loaded that may or may not ever be used. These databases expand significantly over time as new information is added from internal and external data sources.

Dormant data can take various forms. One kind evolves when historical data is kept in the database beyond its useful life. This information accumulates much like geological layers buried deep in the earth, hidden and unused. A second form develops when data elements initially thought to be relevant are included in the data warehouse but in practice are not useful for business analysis. A third type is summary data that is created over time but is no longer used; summary tables can grow to be a large percentage of the overall data warehouse. Finally, a fourth kind stems from the disuse of detailed data over time as users find summary-level information more useful.

[Figure: Users, Database, Dormant Data]

Estimating Dormant Data

Bill Inmon, a noted data warehouse expert, states that as warehouses grow, the ratio of dormant data to total data increases dramatically. He asserts that dormant data may account for as much as 65% to 70% of data warehouses that are a terabyte or greater in size. Inmon recommends a simple formula for estimating data dormancy: the number of queries per year, times the average amount of data per query, divided by total data warehouse space. This quotient approximates the fraction of the warehouse that queries actually touch, so the dormant fraction is roughly one minus that value.
This ratio may run high because it does not account for queries that inevitably read the same data, but it provides a rule of thumb for making ballpark estimates.

Teleran White Paper: Reducing Dormant Data to Improve Performance and Reduce Cost
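To make Inmon's rule of thumb concrete, here is a minimal Python sketch, assuming all data volumes are expressed in gigabytes. The function names and the example figures are hypothetical; only the formula comes from the text. Because the quotient estimates the share of data that queries touch, the sketch reports the dormant share as one minus that value, clamped at zero.

```python
def inmon_activity_ratio(queries_per_year: float,
                         avg_gb_per_query: float,
                         total_gb: float) -> float:
    """Inmon's rule of thumb: data touched by a year of queries / total space.

    Queries that read the same data are double-counted, so this value
    overstates how much of the warehouse is actually used.
    """
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    return queries_per_year * avg_gb_per_query / total_gb


def estimated_dormant_fraction(queries_per_year: float,
                               avg_gb_per_query: float,
                               total_gb: float) -> float:
    """Rough dormant share: one minus the activity ratio, never below zero."""
    ratio = inmon_activity_ratio(queries_per_year, avg_gb_per_query, total_gb)
    # If the touched volume exceeds the warehouse size, the estimate says
    # nothing useful about dormancy, so report 0 rather than a negative value.
    return max(0.0, 1.0 - ratio)
```

For a hypothetical 1,000 GB warehouse serving 20,000 queries a year that average 0.015 GB each, queries touch roughly 300 GB, giving an activity ratio of about 0.3 and an estimated dormant fraction of about 0.7. Because overlapping queries inflate the activity ratio, the true dormant share is likely to be higher still.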
Identifying and Reducing Dormant Data with Query Monitoring

Armed with this estimate, you can get a sense of the magnitude of the performance improvements and system savings that reducing dormant data can generate. But how do you actually identify dormant data? In his book Data Warehouse Performance, Bill Inmon writes, "Understanding that there is dormant data in a data warehouse is one thing. Finding the dormant data is another matter altogether. The best way to find the dormant data is to monitor the end-users' query activity against the data warehouse... the monitor sits between the end-users' query activity and the data warehouse server."

An effective means of capturing end-user queries and database usage is Teleran's isight usage monitor. isight identifies and reports on dormant data by building a comprehensive, continuous profile of all SQL application queries against relational database objects, including tables, columns, rows, views, stored procedures, and indexes. isight accomplishes this without requiring database agents, traces, or monitors that consume a significant portion of database resources.

[Figure: Users, isight Usage Monitor, Database, Dormant Data]

Benefits of Reducing Dormant Data
Lower Costs, Better Performance

The server resources and disk storage consumed by loading and storing dormant data can be very large. Minimizing dormant data enables organizations to recover significant server and storage resources and to raise service levels substantially by shortening database load-time windows. It also delivers two additional and important benefits: reduced DBA effort in maintaining your databases, and improved query performance. Studies show that the amount of data administration is directly related to database size: the smaller the database, the less effort and expense required to maintain it. And with less data to process, most data-intensive end-user queries will run faster.
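isight observes queries on the network, without database agents. The sketch below does not reproduce that architecture. It uses Python's standard sqlite3 module and SQLite's authorizer callback (Connection.set_authorizer) as a stand-in monitor, simply to show what a usage profile looks like: for every column a statement reads, it records the first and last access times and a count. The UsageProfile class and the sample schema are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


class UsageProfile:
    """First/last access time and a read count per (table, column).

    Stand-in monitor: SQLite calls the authorizer while preparing each
    statement, and Python's sqlite3 caches prepared statements, so repeated
    identical SQL may not be counted again.
    """

    def __init__(self):
        # (table, column) -> [first_seen, last_seen, reads]
        self.columns = {}

    def _authorizer(self, action, arg1, arg2, db_name, source):
        # For SQLITE_READ, arg1 is the table name and arg2 the column name.
        if action == sqlite3.SQLITE_READ and arg1 and not arg1.startswith("sqlite_"):
            now = datetime.now(timezone.utc)
            entry = self.columns.setdefault((arg1, arg2), [now, now, 0])
            entry[1] = now
            entry[2] += 1
        return sqlite3.SQLITE_OK  # observe only; never deny a statement

    def attach(self, conn: sqlite3.Connection) -> None:
        conn.set_authorizer(self._authorizer)

    def tables_accessed(self) -> set:
        return {table for table, _ in self.columns}
```

Attach the profile after the schema exists, then run the workload as usual: a query such as SELECT id, region FROM orders WHERE region = 'NE' records reads of orders.id and orders.region, while an untouched column or table never appears in the profile.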
The following case studies show how these benefits translate directly into improved business performance and reduced costs.
Dormant Data Case Study 1
Saving $800,000 in distribution costs by reducing data load time

One Teleran customer, a global office products company, reduced its terabyte-sized data warehouse by more than 30% using isight query monitoring to identify and eliminate dormant data. The data reduction allowed the company to recover almost one-third of its disk storage and to shrink its daily load-time window by 30%. The shorter load time made the data available 1½ hours earlier each day.

The business impact of this earlier availability is significant. Each morning, the company's customer service reps must handle millions of dollars of returned products. With return goods and new order information available 1½ hours earlier, the service reps can now arrange for returned goods to be shipped directly to another customer whose order would otherwise be filled from a company warehouse. This avoids the extra expense of shipping the returned goods back to the warehouse. By providing this critical information 1½ hours earlier, the company reduced shipping expenses by over $800,000 in the first 12 months. Its first-year return on investment in isight was over 800%.

These returns were achieved by identifying the following:
• More than one-third of the database tables being loaded daily had not been accessed during the past three months. By storing these tables offline, the company reduced the nightly load by 600 million rows.
• Most of the remaining tables contained columns that had not been used in three months or more. Removing these unused columns further reduced load time and storage requirements.
• 20% of all indexes had not been used in the past three months and could also be dropped. Because database indexes take time and resources to build and maintain, dropping them added resource savings to the overall improvement in availability and service levels.
Dormant Data Case Study 2
$120,000 saved in server and storage costs in two months

Another Teleran customer, a food manufacturing and distribution company, saved over $120,000 in its first two months of using isight query monitoring. The company's data warehouse had grown beyond 1 terabyte with the addition of several new subject areas. To meet its nightly batch load service level agreement, the company planned a $60,000 server CPU upgrade to process the additional data. However, after reviewing isight dormant data reports, the company learned that users looked at a large portion of the data at a weekly level, not a daily one. By loading that portion once a week, summarized at the weekly level, the company significantly reduced nightly load volumes. This enabled it to meet its service level without upgrading its server processor, generating $60,000 in immediate savings.

After reviewing additional isight usage reports, the company eliminated 200 gigabytes of unused data from its database. Specifically, it identified tables, columns, and views that had not been accessed for more than eight months. In addition, row-level usage reports showed that historical data prior to 2000 was rarely accessed and could be removed from the data warehouse and archived. Eliminating the 200 gigabytes reduced the company's disk storage capacity requirement by 20% and saved an additional $60,000 in planned disk storage upgrade costs.

From its investment in isight, this company achieved:
• An initial savings of $120,000
• Payback in less than 2 months
• 160% ROI in 2 months

Seven Steps to Reducing Dormant Data

Based on Teleran's real-world experience with a wide range of organizations, we have identified seven proven steps to help organizations identify and reduce the dormant data contained in their data warehouses.
Following these steps will enable you and your organization to enjoy the benefits of performance and productivity improvements, as well as operational cost savings.
Step 1: Assess Dormant Data at a Table Level

The most logical place to begin identifying data that is infrequently or never used is table usage over a particular time period. Choosing the appropriate time period requires judgment based on knowledge of how tables are used or were intended to be used. If a company's business is heavily seasonal, as with soft drink companies that generate a large portion of their annual sales in the summer, it may make sense to evaluate data usage over at least a one-year period. If your business is relatively consistent across business periods, a shorter period may be sufficient.

The isight Table Usage Summary report below identifies database table dormancy by reporting when each table was first and last accessed over a specified time period.

Step 2: Evaluate Dormant Data at the Column Level

Once you understand your table usage, both active and dormant, the next step is to review column usage within your active tables over the relevant time period. This sample isight report details dormant columns by table, reporting when each column was first and last accessed. Note that the last three columns were never accessed in a twelve-month period.
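Given a usage profile that records the last access time of each table and column, Steps 1 and 2 reduce to comparing those times against a cutoff. The sketch below assumes the catalog lists and last-access dictionaries are supplied by some usage monitor; all function and table names are hypothetical. The window_days parameter encodes the judgment call described in Step 1, for example 365 days for a seasonal business or about 90 days (three months, as in Case Study 1) for a steadier one.

```python
from datetime import datetime, timedelta


def dormant_tables(catalog_tables, last_access, as_of: datetime, window_days: int):
    """Step 1: tables never read, or last read before the start of the window.

    catalog_tables: table names, e.g. read from the system catalog.
    last_access: dict of table name -> datetime of the latest observed read.
    """
    cutoff = as_of - timedelta(days=window_days)
    return sorted(t for t in catalog_tables
                  if t not in last_access or last_access[t] < cutoff)


def dormant_columns(catalog_columns, column_last_access, active_tables,
                    as_of: datetime, window_days: int):
    """Step 2: within active tables, columns not read during the window.

    catalog_columns: dict of table name -> list of column names.
    column_last_access: dict of (table, column) -> datetime of latest read.
    """
    cutoff = as_of - timedelta(days=window_days)
    result = {}
    for table in active_tables:
        unused = [c for c in catalog_columns.get(table, [])
                  if (table, c) not in column_last_access
                  or column_last_access[(table, c)] < cutoff]
        if unused:
            result[table] = unused
    return result
```

A typical flow runs dormant_tables first and then passes the tables that remain active to dormant_columns, mirroring the order of the two steps.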
Step 3: Identify Dormant Columns

As you continue your dormant data evaluation, it is helpful to run usage reports that show only the dormant database objects. The sample isight report below identifies only those columns that users have not accessed over a twelve-month period. It also indicates whether each dormant column is indexed. Because indexes also take up database space, eliminating unused indexed columns, together with their indexes, lets you reduce your database size even further.

Step 4: Assess Unused or Infrequently Used Views

Database views, materialized views in particular, can add materially to database volume and deserve serious attention in the dormant data evaluation of your data warehouse. Because views are often created for individual users or specific analyses, they can easily fall into disuse as people change jobs or as analytical and reporting requirements change. The following isight report example reveals which views have not been used in the twelve-month period and which views should be considered active.
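For Steps 3 and 4, here is a sketch against SQLite, whose catalog can be queried through the pragma_table_info, pragma_index_list, and pragma_index_info table-valued functions (available in SQLite 3.16.0 and later) and the sqlite_master table. Other databases expose the same facts through their own system catalogs. The sets of accessed columns and objects are assumed to come from a usage monitor; function, table, and view names are hypothetical.

```python
import sqlite3


def indexed_columns(conn: sqlite3.Connection, table: str) -> set:
    """Names of columns that appear in any index on `table`."""
    index_names = [row[0] for row in
                   conn.execute("SELECT name FROM pragma_index_list(?)", (table,))]
    cols = set()
    for index_name in index_names:
        for (col_name,) in conn.execute(
                "SELECT name FROM pragma_index_info(?)", (index_name,)):
            if col_name is not None:  # expression indexes report NULL here
                cols.add(col_name)
    return cols


def dormant_column_report(conn: sqlite3.Connection, table: str, accessed_columns):
    """Step 3: never-accessed columns of `table`, each flagged as indexed or not.

    accessed_columns: set of (table, column) pairs observed by a usage monitor.
    Returns a list of (column, is_indexed) tuples.
    """
    all_cols = [row[0] for row in
                conn.execute("SELECT name FROM pragma_table_info(?)", (table,))]
    indexed = indexed_columns(conn, table)
    return [(col, col in indexed) for col in all_cols
            if (table, col) not in accessed_columns]


def dormant_views(conn: sqlite3.Connection, accessed_objects):
    """Step 4: views defined in the schema that no monitored query referenced."""
    views = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'view' ORDER BY name")]
    return [v for v in views if v not in accessed_objects]
```

Flagging indexed dormant columns separately reflects the point made in Step 3: dropping such a column also frees the space and maintenance cost of its index.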
Step 5: Identify Dormant Columns within Views

In your assessment of column usage, remember to include columns that are accessed through views. This isight usage report shows which columns within particular views have not been accessed. In this case, none of the columns have been accessed over a twelve-month period, and most, if not all, are good candidates for elimination.

Step 6: Evaluate Dormant Stored Procedures

Much like views, stored procedures are often designed for very specific reports or applications. As conditions change, stored procedures become dormant, accumulating over time and adding to the overall size of the data warehouse. The following isight report shows stored procedure usage over a twelve-month period and confirms that a number of unused stored procedures should probably be deleted.
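SQLite has no stored procedures, so this sketch for Steps 5 and 6 works from plain Python data: catalog lists assumed to come from the database, a set of (view, column) pairs assumed to come from a usage monitor, and a captured log of SQL text. The regular expression recognizes only simple EXEC, EXECUTE, and CALL statements; it is an illustrative assumption, not how isight parses SQL, and a real tool would need a dialect-aware parser. All names are hypothetical.

```python
import re
from collections import Counter

# Assumed call syntax: "EXEC name ...", "EXECUTE name ...", or "CALL name(...)".
_PROC_CALL = re.compile(r"^\s*(?:EXEC(?:UTE)?|CALL)\s+([\w.]+)", re.IGNORECASE)


def dormant_view_columns(view_columns, accessed_view_columns):
    """Step 5: for each view, the columns never selected through that view.

    view_columns: dict of view name -> list of its column names.
    accessed_view_columns: set of (view, column) pairs seen by a monitor.
    """
    result = {}
    for view, cols in view_columns.items():
        unused = [c for c in cols if (view, c) not in accessed_view_columns]
        if unused:
            result[view] = unused
    return result


def procedure_call_counts(query_log):
    """Count calls per procedure name (lower-cased) in captured SQL text."""
    counts = Counter()
    for sql in query_log:
        match = _PROC_CALL.match(sql)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


def dormant_procedures(catalog_procedures, query_log):
    """Step 6: catalog procedures never called during the monitored period."""
    counts = procedure_call_counts(query_log)
    return sorted(p for p in catalog_procedures if counts[p.lower()] == 0)
```

Matching procedure names case-insensitively avoids reporting a procedure as dormant merely because callers spell it differently from the catalog.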
Step 7: Assess Dormant Data at the Row Level

Eliminating or archiving unused row-level data can materially reduce database size; the challenge is finding it. Identifying row-level data usage requires a deeper analysis of the monitored SQL queries than the object-level analyses described above. Row-level data is generally specified in a query by the predicate in the WHERE clause. Predicate values can be, for example, dates or date ranges, specific product codes, or geographic areas. The isight report below reveals row-level dormant data by showing how many times each predicate value occurs in the database and how many times it is specified in queries. In this case, the predicate value MA (as in Massachusetts) occurs 1808 times in the database but is never specified. It is most likely a good candidate for archiving or deletion.

Summary
Reducing Dormant Data Improves Performance and Reduces Costs

The steps for evaluating and reducing dormant data with Teleran isight are relatively easy and offer a large payoff. Identifying which tables, columns, indexes, views, stored procedures, and rows are not being used can dramatically reduce the size of your database. Minimizing dormant data on an ongoing basis generates immediate, measurable returns and clear cost justification through server and storage hardware savings. By reducing load times, improved data availability yields quantifiable business benefits, including lower operating costs and increased revenue generation. Finally, speeding up queries by minimizing the overall size of the data warehouse improves business productivity while reducing IT overhead.

Source: Inmon, Rudin, Buss, and Sousa, Data Warehouse Performance, Wiley, 1999.
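Step 7's row-level check can be sketched by comparing how many rows carry each value of a predicate column with how often that value appears in captured WHERE clauses, mirroring the MA example (1808 rows, never specified). This sketch uses sqlite3 and a regular expression that handles only single-quoted equality predicates; the table, column, and data values are hypothetical, and the identifiers are assumed to be trusted because they are interpolated into the SQL.

```python
import re
import sqlite3
from collections import Counter


def predicate_value_counts(query_log, column):
    """Count literals used in `column = '...'` predicates across captured SQL.

    Simplification: only single-quoted equality predicates are recognized;
    IN lists, ranges, and bind variables would need a real SQL parser.
    """
    pattern = re.compile(rf"\b{re.escape(column)}\s*=\s*'([^']*)'", re.IGNORECASE)
    counts = Counter()
    for sql in query_log:
        counts.update(pattern.findall(sql))
    return counts


def dormant_row_values(conn: sqlite3.Connection, table, column, query_log):
    """Step 7: (value, rows_in_table, times_queried) for never-queried values.

    `table` and `column` are interpolated into SQL, so they must be trusted.
    """
    used = predicate_value_counts(query_log, column)
    rows = conn.execute(
        f'SELECT "{column}", COUNT(*) FROM "{table}" '
        f'GROUP BY "{column}" ORDER BY "{column}"'
    ).fetchall()
    return [(value, n, used[value]) for value, n in rows if used[value] == 0]
```

Values returned by dormant_row_values are candidates for archiving, in the same way the report singles out MA; a date column would be handled the same way once range predicates are parsed.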
Teleran Technologies is the leading provider of software for managing business intelligence (BI) activity in data warehouses, CRM, supply chain, and analytical applications. Through end-to-end knowledge of the BI environment (users, queries, applications, and databases), Teleran software aligns IT processes with business needs, reducing costs and improving performance and productivity.

isight continuously profiles application performance and the use of corporate data enterprise-wide, helping IT to better understand, manage, and secure BI activity.
iguard controls queries and users to ensure that all BI applications are performing optimally, improving resource efficiency and reducing system costs.
Automated Helpdesk guides users with real-time messages, maximizing user performance and productivity while reducing helpdesk calls and support costs.
Service Level Manager automatically maintains service levels over time by generating predictive iguard query performance policies as usage patterns and system resources change.

Teleran's Access Architecture enables these products to install quickly and operate continuously on the network without degrading database or application performance. Founded in 1996, Teleran pioneered the concept of BI activity monitoring and management for data warehouses and analytic applications with its patented policy engine and management process. Today the company provides solutions for many of the world's leading companies, including Allstate, Aventis, Ernst & Young, Gordon Food Service, Horizon Blue Cross Blue Shield, JPMorgan Chase, Merrill Lynch, MetLife, State of Texas, Sun Microsystems, Unisys, and Wells Fargo.

Teleran Technologies, Inc.
PO Box 667
Roseland, NJ 07068
973.439.1820 Phone
973.439.1821 Fax
info@teleran.com
www.teleran.com

© 2005 Teleran Technologies, Inc. All rights reserved.
Teleran and the Teleran logo are registered trademarks and isight, iguard, Discovery, Automated Helpdesk, Access Architecture, and InfoUse Knowledge Base are trademarks of Teleran Technologies, Inc. All other names are the property of their respective owners. SO1204.3