When to consider OLAP? Author: Prakash Kewalramani Organization: Evaltech, Inc. Evaltech Research Group, Data Warehousing Practice. Date: 03/10/08 Email: erg@evaltech.com Abstract: Do you need an OLAP technology to analyze your data? OR the Reporting tool that can connect to the database is sufficient for a business user. Read this article to get the answers for these questions before you spend lots of money on buying an OLAP tool and then realize you really don t need the OLAP tool to extract the information from the data. Intellectual Property / Copyright Material All text and graphics found in this article are the property of the Evaltech, Inc. and cannot be used or duplicated without the express written permission of the corporation through the Office of Evaltech, Inc. Evaltech, Inc. Copyright 2008 Page 1 of 6
When should we consider implementing OLAP technology? Basically, any business process that requires us to analyze (roll up, drill down etc.) transactional data across a variety of categories is an excellent application of OLAP technology. Akey tenet of OLAP is that users should see consistent response times for each view, or slice, of the data they request. Because data is collected at the detail level only, the information summary usually computed in advance. These precomputed values, or aggregations, are the basis of the OLAP performance gains. OLAP services include a middle tier server that allows users to perform sophisticated analysis on large volumes of data with exceptional performance. Another feature of OLAP services is PivotTable service, which allows users to conduct analyses while disconnected from the corporate network. OLAP services organize data from a data warehouse into multidimensional cubes with pre calculated summary information to provide a answers to complex analytical queries. OLAP services can access source data in any supported OLE DB data provider, which including not only SQL server but also a large number of desktop and server databases including Microsoft Access, Microsoft, FoxPro, Oracle, Sybase and Informix. Performance of OLAP depends up on these things Aggregations Materializing aggregations usually lead to a faster query response since we probably need to do less work to answer a request for cell values. Partitions Partitions give you the ability to choose different storage strategies to optimize the tradeoff between processing and querying performance. Data slices on partitions Setting a data slice is an efficient way to avoid querying irrelevant partitions. Among the key features of OLAP services are Ease of use provides by user interface wizards A flexible, robust data model for cube definition and storage Write enabled cubes for what if scenarios analysis Scalable architecture that provides a variety of storage scenarios and an automated solution to the data-explosion syndrome that plagues traditional OLAP technologies Integration of administration tools, security, data sources and client/server caching. Widely supported APIs and open architecture to support custom applications Evaltech, Inc. Copyright 2008 Page 2 of 6
OLAP and Data Warehouse Data Warehouse When should we consider a data warehousing solution? When users are requesting access to large amounts of historical information for reporting purposes, we should strongly consider a warehouse or mart. The user will be benefit when the information is organized in an efficient manner for this type of access. A data warehouse is often used as the basis for a decision support system. It is designed to overcome problems encountered when an organization attempts to perform strategic analysis using the same database that is used for online transaction processing (OLTP).Data warehouses are read-only, integrated databases designed to answer comparative and what if questions. OLAP is a key component of data warehousing, and OLAP Services provides essential functionality for a wide array of applications ranging from reporting to advanced decision support. Unlike OLTP systems that store data in a highly normalized fashion, the data in the data warehouse is stored in a very de normalized manner to improve query performance. Data warehouses often use star and snowflake schemas to provide the fastest possible response times to complex queries, and the basis for aggregations managed by OLAP tools. Difficulties often encountered when OLTP databases are used for online analysis include the following: Analysts do not have the technical expertise required to create ad hoc queries against the complex data structure. Analytical queries that summarize large volumes of data adversely affect the ability of the system to respond to online transactions. System performance when responding to complex analysis queries can be slow or unpredictable, providing inadequate support to online analytical users. Constantly changing data interferes with the consistency of analytical information. Evaltech, Inc. Copyright 2008 Page 3 of 6
Security becomes more complicated when online analysis is combined with online transaction processing. Data warehousing provides one of the keys to solving these problems, organizing data for the purpose of analysis. Data warehouses: Can combine data from heterogeneous data sources into a single homogenous structure. Organize data in simplified structures for efficiency of analytical queries rather than for transaction processing. Contain transformed data that is valid, consistent, consolidated, and formatted for analysis. Provide stable data that represents business history. Are updated periodically with additional data rather than frequent transactions. Simplify security requirements. Provide a database organized for OLAP rather than OLTP. Data Warehousing Architecture Two basic types of data warehouse architecture exist: enterprise data warehouses and data marts. The enterprise data warehouse contains enterprise-wide information integrated from multiple operational data sources for consolidated data analysis. The data mart contains a subset of enterprise-wide data that is built for use by an individual department or division in an organization. Data Granularity A data warehouse typically stores data in different levels of granularity or summarization, depending on the data requirements of the business. If an enterprise needs data to assist strategic planning, then only highly summarized data is required. The lower the level of granularity of data required by the enterprise, the higher the number of resources (specifically data storage) required to build the data warehouse. The different levels of summarization in order of increasing granularity are: Current operational data Historical operational data Aggregated data Metadata Evaltech, Inc. Copyright 2008 Page 4 of 6
The components of schema design are dimensions, keys, and fact and dimension tables. Fact tables o Contain data that describes a specific event within a business, such as a bank transaction or product sale. Alternatively, fact tables can contain data aggregations, such as sales per month per region. Except in cases such as product or territory realignments, existing data within a fact table is not updated; new data is simply added. o Because fact tables contain the vast majority of the data stored in a data warehouse, it is important that the table structure be correct before data is loaded. Expensive table restructuring can be necessary if data required by decision support queries is missing or incorrect. o The characteristics of fact tables are: Many rows; possibly billions Primarily numeric data; rarely character data. Multiple foreign keys (into dimension tables). Static data. Dimension tables o Contain data used to reference the data stored in the fact table, such as product descriptions, customer names and addresses, and suppliers. Separating this verbose (typically character) information from specific events, such as the value of a sale at one point in time, makes it possible to optimize queries against the database by reducing the amount of data to be scanned in the fact table. o Dimension tables do not contain as many rows as fact tables, and dimensional data is subject to change, as when a customer s address or telephone number changes. Dimension tables are structured to permit change. o The characteristics of dimension tables are: Fewer rows than fact tables; possibly hundreds to thousands. Primarily character data. Multiple columns that are used to manage dimension hierarchies. One primary key (dimensional key). Updatable data. Dimensions o Are categories of information that organize the warehouse data, such as time, geography, organization, and so on. Dimensions are usually hierarchical in that one member may be a child of another member Dimensional keys o Are unique identifiers used to query data stored in the central fact table Evaltech, Inc. Copyright 2008 Page 5 of 6
Changes in the Data Warehouse Data is usually added periodically to the data warehouse to include more recent information about the organization s business activities. Changes to data already in the data warehouse are less frequent and usually made only to incorporate corrections to errors discovered in the source from which the data was extracted, or to restructure data due to organizational changes. Structural changes to the data warehouse design typically are the least common. Referential integrity must be maintained when data warehouse data is added, changed, or deleted. Loss of referential integrity can cause errors during cube processing, fact table records to be bypassed, or result in inaccurate OLAP information. Creating the informational data, that is, the data warehouse, from the operational systems is a key part of the overall data warehousing solution. Building the informational database is done with the use of transformation or propagation tools. These tools not only move the data from multiple operational systems, but often manipulate the data into a more appropriate format for the warehouse. This could mean: The creation of new fields that are derived from existing operational data Summarizing data to the most appropriate level needed for analysis Denormalizing the data for performance purposes Cleansing of the data to ensure that integrity is preserved Evaltech, Inc. Copyright 2008 Page 6 of 6