OLAP
Outline: Business Intelligence; OLAP definition & application; Multidimensional data representation
Business Intelligence: Accompanying the growth in data warehousing is an ever-increasing demand by users for more powerful access tools that provide advanced analytical capabilities. There are two main types of access tools available to meet this demand, namely Online Analytical Processing (OLAP) and data mining.
Business Intelligence: OLAP and data mining differ in what they offer the user, and because of this they are complementary technologies. An environment that includes a data warehouse (or, more commonly, one or more data marts) together with tools such as OLAP and/or data mining is collectively referred to as Business Intelligence (BI) technologies. BI = DW + OLAP; BI = DW + DM.
OLAP definition & application: "The dynamic synthesis, analysis, and consolidation of large volumes of multidimensional data" (Codd, 1993). Describes a technology that uses a multi-dimensional view of aggregate data to provide quick access to strategic information for the purposes of advanced analysis.
OLAP definition & application: Can easily answer who? and what? questions; however, it is the ability to answer what if? and why? questions that distinguishes OLAP from general-purpose query tools. Types of analysis range from basic navigation and browsing (slicing and dicing), to calculations, to more complex analyses such as time series and complex modeling.
OLAP definition & application: Examples of OLAP applications.
OLAP definition & application: Although OLAP applications are found in widely divergent functional areas, they all have the following key features: multi-dimensional views of data; support for complex calculations; time intelligence.
OLAP - multi-dimensional views of data: Core requirement of building a realistic business model. Provides the basis for analytical processing through flexible access to corporate data. The underlying database design that provides the multi-dimensional view of data should treat all dimensions equally.
OLAP - support for complex calculations: Must provide a range of powerful computational methods such as those required by sales forecasting, which uses trend algorithms such as moving averages and percentage growth. Mechanisms for implementing computational methods should be clear and non-procedural.
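The two trend calculations named on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the quarterly figures are hypothetical, and an OLAP tool would normally expose such calculations declaratively rather than as hand-written code.

```python
def moving_average(values, window):
    """Simple moving average over every full window of `values`."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def percentage_growth(values):
    """Growth of each value over its predecessor, in percent."""
    return [
        (curr - prev) * 100.0 / prev
        for prev, curr in zip(values, values[1:])
    ]


# Hypothetical quarterly sales, invented for illustration only.
quarterly_sales = [100.0, 120.0, 90.0, 150.0]

print(moving_average(quarterly_sales, 2))      # [110.0, 105.0, 120.0]
print(percentage_growth(quarterly_sales)[:2])  # [20.0, -25.0]
```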
OLAP - time intelligence: Key feature of almost any analytical application, as performance is almost always judged over time. The time hierarchy is not always used in the same manner as other hierarchies. Concepts such as year-to-date and period-over-period comparisons should be easily defined.
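As a rough illustration of the two time concepts mentioned here, the sketch below computes a year-to-date running total and a comparison against the same period one year earlier. The `revenue` figures and both function names are assumptions made up for this example.

```python
from itertools import accumulate

# Hypothetical quarterly revenue per year, invented for illustration only.
revenue = {
    2003: [10, 12, 11, 15],
    2004: [12, 15, 11, 18],
}


def year_to_date(year):
    """Running total within one year: entry i sums quarters 1..i+1."""
    return list(accumulate(revenue[year]))


def year_over_year(year):
    """Change of each quarter against the same quarter of the previous year."""
    return [cur - prev for cur, prev in zip(revenue[year], revenue[year - 1])]


print(year_to_date(2004))    # [12, 27, 38, 56]
print(year_over_year(2004))  # [2, 3, 0, 3]
```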
Representation of Multidimensional Data: Example of a two-dimensional query: What is the total revenue generated by property sales in each city, in each quarter of 2004? The choice of representation is based on the types of queries the end-user may ask. Compare the representations: three-field relational table versus two-dimensional matrix.
Figure: Multi-dimensional data as three-field table versus two-dimensional matrix. Pearson Education Limited 1995, 2005
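The comparison between the two representations can be sketched as follows. The rows use hypothetical cities and revenue figures (the values in the original figure are not reproduced here), and `to_matrix` is an illustrative helper, not part of any library.

```python
# Three-field relational representation: one row per (city, quarter) pair.
# Cities and revenue figures are hypothetical, invented for illustration.
rows = [
    ("Glasgow", "Q1", 100),
    ("Glasgow", "Q2", 120),
    ("London", "Q1", 200),
    ("London", "Q2", 210),
]


def to_matrix(rows):
    """Turn (city, quarter, revenue) rows into a city x quarter matrix."""
    cities = sorted({city for city, _, _ in rows})
    quarters = sorted({quarter for _, quarter, _ in rows})
    lookup = {(city, quarter): revenue for city, quarter, revenue in rows}
    # Cells with no matching row are left as None.
    matrix = [[lookup.get((c, q)) for q in quarters] for c in cities]
    return cities, quarters, matrix


cities, quarters, matrix = to_matrix(rows)
print(quarters)
for city, line in zip(cities, matrix):
    print(city, line)
```

The matrix form answers "revenue for a given city and quarter" by position, whereas the three-field table has to be searched for the matching row.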
Representation of Multidimensional Data: Example of a three-dimensional query: What is the total revenue generated by property sales for each type of property (Flat or House) in each city, in each quarter of 2004? Compare the representations: four-field relational table versus three-dimensional cube.
Figure: Multi-dimensional data as four-field table versus three-dimensional cube.
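A minimal sketch of the same idea in three dimensions: four-field rows are loaded into a nested list indexed by property type, city, and quarter. All values are hypothetical, and `build_cube` and `cell` are names chosen for this example only.

```python
# Four-field relational representation: (type, city, quarter, revenue).
# All values are hypothetical, invented for illustration.
rows = [
    ("Flat", "Glasgow", "Q1", 60),
    ("House", "Glasgow", "Q1", 40),
    ("Flat", "London", "Q1", 90),
    ("House", "London", "Q2", 110),
]


def build_cube(rows):
    """Build a 3-D array plus the member-to-index maps for each dimension."""
    axes = [sorted({row[i] for row in rows}) for i in range(3)]
    pos = [{member: k for k, member in enumerate(axis)} for axis in axes]
    cube = [[[None] * len(axes[2]) for _ in axes[1]] for _ in axes[0]]
    for ptype, city, quarter, revenue in rows:
        cube[pos[0][ptype]][pos[1][city]][pos[2][quarter]] = revenue
    return axes, pos, cube


axes, pos, cube = build_cube(rows)


def cell(ptype, city, quarter):
    """Direct lookup of one cell by its three coordinates."""
    return cube[pos[0][ptype]][pos[1][city]][pos[2][quarter]]


print(cell("House", "London", "Q2"))  # 110
print(cell("Flat", "London", "Q2"))   # None (empty cell)
```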
Representation of Multidimensional Data: A sample data cube.
Representation of Multidimensional Data: A cube represents data as cells in an array. A relational table only represents multidimensional data in two dimensions. Multi-dimensional structures are best visualized as cubes of data, and cubes within cubes of data. Each side of a cube is a dimension. A cube can be expanded to include other dimensions.
Representation of Multidimensional Data: As the number of dimensions increases, the number of the cube's cells increases exponentially. However, the majority of multidimensional queries use summarized, high-level data. The solution is to pre-aggregate (consolidate) all logical subtotals and totals along all dimensions.
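One way to picture pre-aggregation is to compute, once and up front, every subtotal obtained by summing over any subset of dimensions. The sketch below does this for a small two-dimensional example; the `ALL` marker, the function name, and the sales figures are assumptions for illustration, not a description of how any particular OLAP server stores its aggregates.

```python
from collections import defaultdict
from itertools import product

ALL = "ALL"  # hypothetical marker meaning "summed over this dimension"


def preaggregate(facts):
    """Return base cells plus every subtotal and the grand total.

    `facts` maps coordinate tuples to a numeric measure.
    """
    totals = defaultdict(int)
    for coords, value in facts.items():
        # For each dimension, either keep the member or replace it with ALL.
        for mask in product((False, True), repeat=len(coords)):
            key = tuple(ALL if hide else c for c, hide in zip(coords, mask))
            totals[key] += value
    return dict(totals)


# Hypothetical (product, city) sales, invented for illustration only.
facts = {
    ("Juice", "NY"): 40,
    ("Cola", "NY"): 10,
    ("Cola", "CA"): 20,
    ("Cream", "MN"): 10,
}

agg = preaggregate(facts)
print(agg[(ALL, "NY")])    # 50, subtotal for NY over all products
print(agg[("Cola", ALL)])  # 30, subtotal for Cola over all cities
print(agg[(ALL, ALL)])     # 80, grand total
```

Each base cell contributes to 2^n aggregate keys for n dimensions, which is why the number of stored values grows quickly as dimensions are added.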
Representation of Multidimensional Data: Pre-aggregation is valuable, as typical dimensions are hierarchical in nature (e.g., the Time dimension hierarchy: years, quarters, months, weeks, and days). A predefined hierarchy allows logical pre-aggregation and, conversely, allows for a logical drill-down.
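The role of a predefined hierarchy can be sketched with the Time dimension: each daily value is mapped to its ancestor at a chosen level and summed there. The daily figures and function names are hypothetical, and the week level is left out of this sketch because weeks do not nest cleanly inside months.

```python
from collections import defaultdict
from datetime import date


def time_key(day, level):
    """Map a date to its ancestor at the given hierarchy level."""
    if level == "year":
        return (day.year,)
    if level == "quarter":
        return (day.year, (day.month - 1) // 3 + 1)
    if level == "month":
        return (day.year, day.month)
    if level == "day":
        return (day.year, day.month, day.day)
    raise ValueError(f"unknown level: {level}")


def aggregate(daily, level):
    """Sum daily values at the requested level of the Time hierarchy."""
    totals = defaultdict(int)
    for day, value in daily.items():
        totals[time_key(day, level)] += value
    return dict(totals)


# Hypothetical daily sales, invented for illustration only.
daily_sales = {
    date(2004, 1, 15): 10,
    date(2004, 2, 3): 20,
    date(2004, 4, 9): 5,
    date(2004, 12, 31): 7,
}

print(aggregate(daily_sales, "quarter"))  # {(2004, 1): 30, (2004, 2): 5, (2004, 4): 7}
print(aggregate(daily_sales, "year"))     # {(2004,): 42}
```

Moving from "year" to "quarter" to "month" with the same function is the drill-down direction; the reverse order is consolidation.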
Representation of Multidimensional Data: Supports common analytical operations: consolidation; drill-down; slicing and dicing.
Representation of Multidimensional Data: Pivoting selects two dimensions used to aggregate a measure, e.g., the Sales for a Product in a particular City.

Source table (Date values not shown):
Product  City  Sales
Juice    NY    10
Cola     CA    20
Juice    NY    30
Cola     NY    10
Cream    MN    10

Pivoted by Product and City (x = no sales):
         NY   CA   MN
Juice    40   x    x
Cola     10   20   x
Cream    x    x    10
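The pivot on this slide can be reproduced with a short sketch that sums Sales per (Product, City) pair, using the Product, City, and Sales values given on the slide. The Date column is omitted because its values are not given, and `pivot` is an illustrative helper name.

```python
from collections import defaultdict

# (product, city, sales) rows taken from the pivoting slide.
sales = [
    ("Juice", "NY", 10),
    ("Cola", "CA", 20),
    ("Juice", "NY", 30),
    ("Cola", "NY", 10),
    ("Cream", "MN", 10),
]


def pivot(rows):
    """Aggregate sales over the two chosen dimensions, product and city."""
    totals = defaultdict(int)
    for product, city, amount in rows:
        totals[(product, city)] += amount
    return dict(totals)


table = pivot(sales)
cities = ["NY", "CA", "MN"]
for product in ["Juice", "Cola", "Cream"]:
    # "x" marks a (product, city) combination with no sales.
    print(product, [table.get((product, city), "x") for city in cities])
```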
Representation of Multidimensional Data: Rollup takes the current data and groups it further along another dimension, e.g., rolling up the total sales on city by product. Drill-down is the reverse operation.
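A rollup can be sketched as the same group-by run with fewer grouping dimensions. The rows below reuse the Product, City, and Sales values from the pivoting slide; `group_by` is a hypothetical helper written for this example.

```python
from collections import defaultdict

# (product, city, sales) rows from the pivoting slide.
sales = [
    ("Juice", "NY", 10),
    ("Cola", "CA", 20),
    ("Juice", "NY", 30),
    ("Cola", "NY", 10),
    ("Cream", "MN", 10),
]


def group_by(rows, key_positions):
    """Sum the last field of each row, grouped by the fields at key_positions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in key_positions)] += row[-1]
    return dict(totals)


by_product_city = group_by(sales, (0, 1))  # detailed level
by_product = group_by(sales, (0,))         # rolled up over city
by_city = group_by(sales, (1,))            # rolled up over product

print(by_product)  # {('Juice',): 40, ('Cola',): 30, ('Cream',): 10}
print(by_city)     # {('NY',): 50, ('CA',): 20, ('MN',): 10}
```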
Representation of Multidimensional Data: Slice and dice reduces the dimensionality of the data, e.g., the (city, date) table for a specific product.
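Slicing can be sketched as fixing one dimension to a single member and dropping that dimension from the coordinates. In the example below the cube is a plain dictionary keyed by (product, city, date); the date labels "d1" and "d2" are hypothetical placeholders, and `slice_cube` is an illustrative name.

```python
def slice_cube(cube, axis, member):
    """Fix one dimension to `member` and return the lower-dimensional result."""
    return {
        coords[:axis] + coords[axis + 1:]: value
        for coords, value in cube.items()
        if coords[axis] == member
    }


# (product, city, date) -> sales; the date labels are hypothetical.
cube = {
    ("Juice", "NY", "d1"): 10,
    ("Cola", "CA", "d1"): 20,
    ("Juice", "NY", "d2"): 30,
    ("Cola", "NY", "d2"): 10,
    ("Cream", "MN", "d2"): 10,
}

# The (city, date) table for one specific product.
juice = slice_cube(cube, axis=0, member="Juice")
print(juice)  # {('NY', 'd1'): 10, ('NY', 'd2'): 30}
```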
Representation of Multidimensional Data: Consolidation is the aggregation of data, such as simple roll-ups or complex expressions involving inter-related data. Drill-down is the reverse of consolidation and involves displaying the detailed data that comprises the consolidated data.
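The relationship between the two operations can be sketched as a pair of functions: one returns a consolidated figure, the other returns the detail rows behind it. The rows reuse the Product, City, and Sales values from the pivoting slide, and both function names are assumptions for this example.

```python
# Detail rows from the pivoting slide (the Date column is omitted because
# its values are not given there).
detail = [
    {"product": "Juice", "city": "NY", "sales": 10},
    {"product": "Cola", "city": "CA", "sales": 20},
    {"product": "Juice", "city": "NY", "sales": 30},
    {"product": "Cola", "city": "NY", "sales": 10},
    {"product": "Cream", "city": "MN", "sales": 10},
]


def drill_down(rows, **fixed):
    """Return the detail rows that make up the figure for `fixed` members."""
    return [r for r in rows if all(r[dim] == m for dim, m in fixed.items())]


def consolidate(rows, **fixed):
    """Sum the sales of all rows matching the fixed dimension members."""
    return sum(r["sales"] for r in drill_down(rows, **fixed))


print(consolidate(detail, product="Juice"))  # 40
print([r["sales"] for r in drill_down(detail, product="Juice")])  # [10, 30]
```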
Representation of Multidimensional Data: Slicing and dicing (also called pivoting) refers to the ability to look at the data from different viewpoints.
Representation of Multidimensional Data: Can store data in a compressed form by dynamically selecting physical storage organizations and compression techniques that maximize space utilization. Dense data (that is, data that exists for a high percentage of cells) can be stored separately from sparse data (that is, data for which a significant percentage of cells are empty).
Representation of Multidimensional Data: The ability to omit empty or repetitive cells can greatly reduce the size of the cube and the amount of processing. Allows analysis of exceptionally large amounts of data.
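One simple way to omit empty cells is to store only the populated ones, keyed by their coordinates. The class below is a minimal sketch of that idea, not the storage scheme of any real OLAP server; the class name, its methods, and the 3 x 3 example data are all hypothetical.

```python
from math import prod


class SparseCube:
    """Stores only non-empty cells, keyed by their coordinate tuple."""

    def __init__(self, shape):
        self.shape = tuple(shape)
        self.cells = {}

    def set(self, coords, value):
        if len(coords) != len(self.shape) or any(
            not 0 <= c < n for c, n in zip(coords, self.shape)
        ):
            raise IndexError(f"coordinates {coords} outside shape {self.shape}")
        self.cells[tuple(coords)] = value

    def get(self, coords, default=None):
        """Return the stored value, or `default` for an empty cell."""
        return self.cells.get(tuple(coords), default)

    def density(self):
        """Fraction of cells that actually hold a value."""
        return len(self.cells) / prod(self.shape)


# Hypothetical 3 x 3 cube (product x city) with four populated cells.
cube = SparseCube((3, 3))
for coords, value in {(0, 0): 40, (1, 0): 10, (1, 1): 20, (2, 2): 10}.items():
    cube.set(coords, value)

print(len(cube.cells), "of", prod(cube.shape), "cells stored")  # 4 of 9 cells stored
```

A dense array would allocate all nine cells regardless of content; the saving from a sparse layout grows as more dimensions are added and the share of empty cells rises.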
Representation of Multidimensional Data: In summary, pre-aggregation, dimensional hierarchy, and sparse data management can significantly reduce the size of the cube and the need to calculate values on-the-fly. This removes the need for multi-table joins and provides quick and direct access to arrays of data, thus significantly speeding up execution of multi-dimensional queries.