OLAP
Learning Objectives
- Definition of OLAP
- Data cubes
- OLAP operations
- MDX
- OLAP servers
What is OLAP?
- The name OLAP (online analytical processing) has two immediate consequences:
  - "Online" requires query answers to be fast.
  - "Analytical" hints that the queries themselves are complex.
- Complex questions with fast answers!
Why OLAP?
- Empowers end-users to do their own analysis
- Increases the productivity of business end-users and, consequently, of the entire organization
- Frees IT from report requests
- Reduces the application development backlog for IT staff by making end-users self-sufficient enough to build their own models
- Requires no knowledge of tables or SQL
OLAP Applications
- Marketing: market research analysis, sales forecasting, promotions analysis, customer analysis, and market/customer segmentation
- Sales: sales analysis and sales forecasting
- Finance: budgeting, activity-based costing, financial performance analysis, and financial modeling
- Manufacturing: production planning and defect analysis
OLAP Clients
- Visualization
- OLAP capabilities
- Interactive manipulation
Excel as OLAP Client
From Tables and Spreadsheets to Data Cubes
- OLAP is based on a multidimensional data model, which views data in the form of a data cube.
- A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions.
- Dimension tables describe each dimension, such as item (item_name, brand, type) or time (day, week, month, quarter, year).
- The fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables.
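The split between a fact table and its dimension tables can be sketched in plain Python. This is only an illustration: the rows, the key names (item_key, time_key) and all values are made up, and only item_name, brand, type, the time attributes and dollars_sold come from the slide.

```python
from collections import defaultdict

# Dimension tables: surrogate key -> descriptive attributes (illustrative rows).
item_dim = {
    1: {"item_name": "TV-40", "brand": "Acme", "type": "television"},
    2: {"item_name": "TV-55", "brand": "Acme", "type": "television"},
    3: {"item_name": "Radio-1", "brand": "Bolt", "type": "radio"},
}
time_dim = {
    10: {"day": "2001-01-15", "month": "Jan", "quarter": "Q1", "year": 2001},
    11: {"day": "2001-02-03", "month": "Feb", "quarter": "Q1", "year": 2001},
    12: {"day": "2001-04-20", "month": "Apr", "quarter": "Q2", "year": 2001},
}

# Fact table: one row per sale, holding dimension keys plus the measure.
sales_fact = [
    {"item_key": 1, "time_key": 10, "dollars_sold": 400.0},
    {"item_key": 2, "time_key": 11, "dollars_sold": 900.0},
    {"item_key": 3, "time_key": 11, "dollars_sold": 50.0},
    {"item_key": 1, "time_key": 12, "dollars_sold": 420.0},
]


def total_by(item_attr, time_attr):
    """Sum dollars_sold grouped by one item attribute and one time attribute."""
    totals = defaultdict(float)
    for row in sales_fact:
        item = item_dim[row["item_key"]]    # follow the key into the item dimension
        when = time_dim[row["time_key"]]    # follow the key into the time dimension
        totals[(item[item_attr], when[time_attr])] += row["dollars_sold"]
    return dict(totals)


brand_by_quarter = total_by("brand", "quarter")
```

Calling total_by with other attribute pairs, such as ("type", "year"), views the same facts along different dimension levels without touching the fact table.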
Representing Multi-Dimensional Data
- Example of a two-dimensional query: What is the total revenue generated by property sales in each city, in each quarter?
- Compare representations: three-field relational table versus two-dimensional matrix.
Multi-Dimensional Data as Three-Field Table versus Two-Dimensional Matrix
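The two representations hold the same information, which a small sketch can make concrete. The city names and revenue figures below are placeholders, not data from the original figure, and to_matrix is a hypothetical helper.

```python
# Three-field relational form: (city, quarter, revenue). Values are illustrative.
rows = [
    ("CityA", "Q1", 100),
    ("CityA", "Q2", 150),
    ("CityB", "Q1", 200),
    ("CityB", "Q2", 50),
    ("CityA", "Q1", 25),   # a second Q1 sale in CityA, summed into the same cell
]


def to_matrix(records):
    """Turn (city, quarter, revenue) records into a city-by-quarter matrix."""
    cities = sorted({city for city, _, _ in records})
    quarters = sorted({quarter for _, quarter, _ in records})
    matrix = [[0] * len(quarters) for _ in cities]
    for city, quarter, revenue in records:
        matrix[cities.index(city)][quarters.index(quarter)] += revenue
    return cities, quarters, matrix


cities, quarters, matrix = to_matrix(rows)
```

In the matrix form, each dimension member appears once as a row or column label, and the answer to "revenue in each city, in each quarter" is read directly from one cell.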
Representing Multi-Dimensional Data
- Example of a three-dimensional query: What is the total revenue generated by sales for each type of property (Flat or House) in each city, in each quarter?
- Compare representations: four-field relational table versus three-dimensional cube.
Multi-Dimensional Data as Four-Field Table versus Three-Dimensional Cube
Example: 3-D data cube
Definition of Cube
The data cube summarizes the measure with respect to a set of n dimensions and provides summarizations for all subsets of them.

Data cube (rows: product, columns: year)

product   1999  2000  2001  2002   ALL
chairs      25    37    89    21   172
tables      10    30     0    45    85
desks       56    84     9    35   184
shelves     19    20     0    71   110
boards       5    16    11    15    47
ALL        115   187   109   187   598
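The ALL row and column of the table can be derived from the detailed cells. The sketch below does this for the product-by-year figures on the slide; build_cube and the dictionary layout are illustrative choices, not part of the original material.

```python
PRODUCTS = ["chairs", "tables", "desks", "shelves", "boards"]
YEARS = [1999, 2000, 2001, 2002]

# Detailed cells from the slide: one row per product, one column per year.
BASE = [
    [25, 37, 89, 21],
    [10, 30, 0, 45],
    [56, 84, 9, 35],
    [19, 20, 0, 71],
    [5, 16, 11, 15],
]


def build_cube(products, years, base):
    """Add every detailed cell into each summary cell that covers it."""
    cube = {}
    for product, row in zip(products, base):
        for year, value in zip(years, row):
            for key in ((product, year), (product, "ALL"), ("ALL", year), ("ALL", "ALL")):
                cube[key] = cube.get(key, 0) + value
    return cube


cube = build_cube(PRODUCTS, YEARS, BASE)
```

Each detailed cell contributes to four entries: itself, its product total, its year total, and the grand total. With two dimensions this yields the 2 x 2 = 4 groupings that together make up the cube.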
Cube as set of cuboids
- The most detailed part of the cube is called the base cuboid.
- The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid.
- The lattice of cuboids forms a data cube.
- In the product-by-year data cube (products chairs, tables, desks, shelves, boards; years 1999 to 2002), the individual product/year cells form the base cuboid, and the ALL/ALL cell (598) is the apex cuboid.
Cube as set of cuboids
Lattice for the dimensions product, date, country:
- 0-D (apex) cuboid: all
- 1-D cuboids: product; date; country
- 2-D cuboids: product, date; product, country; date, country
- 3-D (base) cuboid: product, date, country
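The lattice is simply the set of all subsets of the dimensions, grouped by size. A minimal sketch, assuming a hypothetical helper named cuboid_lattice:

```python
from itertools import combinations


def cuboid_lattice(dimensions):
    """Return {k: list of k-dimensional cuboids} for every subset of dimensions."""
    return {
        k: list(combinations(dimensions, k))
        for k in range(len(dimensions) + 1)
    }


lattice = cuboid_lattice(["product", "date", "country"])

# Print one lattice level per line, from the apex down to the base cuboid.
for k, cuboids in lattice.items():
    print(f"{k}-D:", cuboids)
```

The empty tuple at level 0 plays the role of the apex cuboid ("all"), and the single tuple at level 3 is the base cuboid.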
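Example: Cube and cuboids
Dimensions: color, size. Measure: count.

color \ size    S    M    L   TOT
Red            20    3    5    28
Blue            3    3    8    14
Gray            0    0    5     5
TOT            23    6   18    47

Cuboids: the inner cells form the (color; size) cuboid, the TOT column the (color) cuboid, the TOT row the (size) cuboid, and the grand total (47) the φ (apex) cuboid. Together they make up the data cube.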
Typical OLAP Operations
- Roll up (drill-up): summarize data by climbing up a hierarchy or by dimension reduction
- Drill down (roll down): the reverse of roll-up; go from a higher-level summary to a lower-level summary or detailed data, or introduce new dimensions
- Slice and dice: project and select
- Pivot (rotate): reorient the cube for visualization, e.g. turn a 3-D cube into a series of 2-D planes
- Other operations
  - Drill across: involves (goes across) more than one fact table
  - Drill through: goes through the bottom level of the cube to its back-end relational tables
Example of operations on a data cube
Dimensions: color, size. Measure: count.

color \ size    S    M    L   TOT
Red            20    3    5    28
Blue            3    3    8    14
Gray            0    0    5     5
TOT            23    6   18    47
Roll-up
- In this example we reduce one dimension of the color-by-size cube.
- It is also possible to climb up one hierarchy.
- Example: (product, city) to (product, country)
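Both flavors of roll-up can be sketched in a few lines. The color-by-size counts are the ones from the example cube. The (product, city) cells and the city-to-country mapping are hypothetical, as are the helper names roll_up and climb.

```python
from collections import defaultdict

# Base cuboid (color; size) from the example cube.
base = {
    ("Red", "S"): 20, ("Red", "M"): 3, ("Red", "L"): 5,
    ("Blue", "S"): 3, ("Blue", "M"): 3, ("Blue", "L"): 8,
    ("Gray", "S"): 0, ("Gray", "M"): 0, ("Gray", "L"): 5,
}


def roll_up(cells, keep):
    """Dimension reduction: aggregate away every dimension index not in keep."""
    out = defaultdict(int)
    for key, count in cells.items():
        out[tuple(key[i] for i in keep)] += count
    return dict(out)


def climb(cells, dim_index, parent_of):
    """Hierarchy climb: replace one dimension's members by their parents."""
    out = defaultdict(int)
    for key, value in cells.items():
        new_key = list(key)
        new_key[dim_index] = parent_of[key[dim_index]]
        out[tuple(new_key)] += value
    return dict(out)


by_color = roll_up(base, keep=[0])      # (color; size) -> (color)
by_size = roll_up(base, keep=[1])       # (color; size) -> (size)
grand_total = roll_up(base, keep=[])    # (color; size) -> apex

# Hypothetical (product, city) cells rolled up to (product, country).
sales = {("pen", "Lyon"): 4, ("pen", "Paris"): 6, ("pen", "Rome"): 1}
country_of = {"Lyon": "France", "Paris": "France", "Rome": "Italy"}
by_country = climb(sales, 1, country_of)
```

Drill-down is the reverse direction. It cannot be computed from the rolled-up cells alone and needs the more detailed data.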
Drill-down
- In this example we add one dimension to a summary of the color-by-size cube.
- It is also possible to climb down one hierarchy.
- Example: (product, year) to (product, month)
Slice
- Perform a selection on one dimension of the cube (here, the color-by-size example cube).
Dice
- Perform a selection on two or more dimensions of the cube (here, the color-by-size example cube).
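Slice and dice can be sketched as filters over the cells of the color-by-size example. The helper names are hypothetical. Dropping the sliced dimension from the result is one common convention, not something the slides prescribe.

```python
# Base cuboid (color; size) from the example cube.
base = {
    ("Red", "S"): 20, ("Red", "M"): 3, ("Red", "L"): 5,
    ("Blue", "S"): 3, ("Blue", "M"): 3, ("Blue", "L"): 8,
    ("Gray", "S"): 0, ("Gray", "M"): 0, ("Gray", "L"): 5,
}


def slice_cube(cells, dim_index, member):
    """Slice: fix one dimension to a single member and drop that dimension."""
    return {
        key[:dim_index] + key[dim_index + 1:]: value
        for key, value in cells.items()
        if key[dim_index] == member
    }


def dice_cube(cells, allowed):
    """Dice: keep cells whose members are allowed on every listed dimension.

    allowed maps a dimension index to the set of members to keep.
    """
    return {
        key: value
        for key, value in cells.items()
        if all(key[i] in members for i, members in allowed.items())
    }


red_slice = slice_cube(base, 0, "Red")
small_dice = dice_cube(base, {0: {"Red", "Blue"}, 1: {"S", "M"}})
```

The slice yields a lower-dimensional view (counts by size for Red only), while the dice yields a smaller sub-cube that still has both dimensions.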
Slice/Dice
- Easy terms, comparable to SELECT ... WHERE in SQL
- [Figure: cube with customers and store dimensions]
Multidimensional Expressions (MDX)
- Microsoft SQL Server OLAP Services provides an architecture for access to multidimensional data.
- For expressing queries to this data, OLAP employs a full-fledged, highly functional expression syntax: Multidimensional Expressions (MDX).
- OLAP Services supports MDX functions as a full language implementation for creating and querying cube data.
MDX Introductory Tutorial
Dimensions used in the examples:

Product
  Hierarchy: Product Family, Product Department, Product Category, Product Subcategory, Brand Name, Product Name
  Description: The products that are on sale in the FoodMart stores
Promotions
  Hierarchy: Promotion Name
  Description: Identifies the promotion that triggered the sale
Store
  Hierarchy: Store Country, Store State, Store City, Store Name
  Description: Geographical hierarchy for different stores in the chain (country, state, city)
Customer
  Hierarchy: Country, State or Province, City, Name
  Description: Geographical hierarchy for registered customers
Time
  Hierarchy: Years, Quarters, Months
  Description: Time period when the sale was made
MDX Introductory Tutorial
Outline of an expression returning two cube dimensions:

    SELECT axis_specification ON COLUMNS,
           axis_specification ON ROWS
    FROM cube_name
    WHERE slicer_specification

- axis_specification: members of a dimension (all levels of the hierarchy)
- If only a single dimension is returned, it must be on COLUMNS.
- For more dimensions, the named axes are PAGES, CHAPTERS and, finally, SECTIONS.
- The WHERE clause is optional and acts as the slicer specification.
MDX Introductory Tutorial
Two dimensions (or levels of a dimension) and a slice:

    SELECT NON EMPTY {[Store Type].MEMBERS} ON COLUMNS,
           NON EMPTY {[Store].[Store State].MEMBERS} ON ROWS
    FROM [Sales]
    WHERE (Measures.[Sales Average], [Time].[1997])
MDX Introductory Tutorial
Query the top n members of a list:

    SELECT Measures.[Profit] ON COLUMNS,
           TOPCOUNT([Store].[Store City].MEMBERS, 5, Measures.[Profit]) ON ROWS
    FROM [Sales]

Syntax: TOPCOUNT(set, count, numeric_expression)
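For readers more familiar with general-purpose languages, the effect of TOPCOUNT(set, count, numeric_expression) can be approximated in Python. This is an analogy, not MDX and not how an OLAP server evaluates the function. The city names and profit figures are invented.

```python
def topcount(members, count, numeric_expression):
    """Return the `count` members with the largest values, largest first."""
    return sorted(members, key=numeric_expression, reverse=True)[:count]


# Hypothetical profit per store city (not FoodMart data).
profit = {"CityA": 120.0, "CityB": 340.0, "CityC": 90.0, "CityD": 210.0}

# Rough analogue of TOPCOUNT([Store].[Store City].MEMBERS, 2, Measures.[Profit]).
top_two = topcount(profit, 2, profit.get)
```

As in the MDX query, the set to rank, the number of members to keep, and the measure used for ranking are three separate arguments.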
Conceptual vs. Actual
- The cube is a logical way of visualizing the data in an OLAP setting.
- It is not how the data is actually represented.
- Two opposite ways of storing data:
  - ROLAP: Relational OLAP
  - MOLAP: Multidimensional OLAP
OLAP Server Architectures
- Relational OLAP (ROLAP)
  - Uses a relational or extended-relational DBMS to store and manage warehouse data, and OLAP middleware to support the missing pieces
  - Includes optimization of the DBMS back end, implementation of aggregation navigation logic, and additional tools and services
  - Greater scalability
- Multidimensional OLAP (MOLAP)
  - Array-based multidimensional storage engine (sparse matrix techniques)
  - Pre-calculates cuboids (space overhead)
  - Fast indexing to pre-computed summarized data
- Hybrid OLAP (HOLAP)
  - User flexibility, e.g. low level: relational; high level: array
ROLAP
- ROLAP supports RDBMS products through the use of an application logic layer.
- To improve performance, some ROLAP products have enhanced SQL engines to support the complexity of multi-dimensional analysis.
- The development issues associated with ROLAP technology:
  - Performance problems associated with the processing of complex queries that require multiple passes through the relational data
  - Development of middleware to facilitate the development of multi-dimensional applications
MOLAP
- MOLAP tools use specialized data structures and multi-dimensional database management systems (MDDBMS) to organize, navigate, and analyze data.
- MOLAP data structures use array technology and efficient storage techniques that minimize the disk space requirements through sparse data management.
- The development issues associated with MOLAP:
  - Only a limited amount of data can be efficiently stored and analyzed.
  - MOLAP products require a different set of tools to build and maintain the database.

OLAP, by Dr. Khalil
ROLAP vs. MOLAP
- Performance: How fast will the system appear to the end-user? MOLAP vendors believe this is a key point in their favor.
- Data volume and scalability: While MOLAP servers can handle up to 100 GB of storage, ROLAP servers can handle hundreds of gigabytes and terabytes.
HOLAP
- HOLAP tools deliver selected data directly from the DBMS or via a MOLAP server.
- HOLAP is the fastest-growing type of OLAP tool.
HOLAP
- Partial materialization
- 2^n views for n dimensions (no hierarchies)
- Storage/update-time explosion
- More precomputation doesn't mean better performance!
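The 2^n figure, and why full materialization quickly becomes impractical, can be checked with a short calculation. The second function goes beyond the slide: it uses the standard textbook count for dimensions with hierarchies, the product of (levels + 1) over all dimensions, where the extra 1 stands for the "all" level. The function names and sample level counts are illustrative.

```python
from math import prod


def cuboid_count(n_dimensions):
    """Number of views (cuboids) for n dimensions without hierarchies."""
    return 2 ** n_dimensions


def cuboid_count_with_hierarchies(levels_per_dimension):
    """Number of cuboids when dimension i has levels_per_dimension[i] levels.

    Each dimension can appear at any one of its levels or be aggregated
    away entirely (the implicit "all" level), hence levels + 1 choices.
    """
    return prod(levels + 1 for levels in levels_per_dimension)


# Growth in the number of views as dimensions are added.
for n in (3, 10, 20):
    print(n, "dimensions ->", cuboid_count(n), "views")
```

With one level per dimension the two counts agree, which is the "no hierarchies" case on the slide.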