Data Warehousing OLAP




References
Wei Wang. A Brief MDX Tutorial Using Mondrian. School of Computer Science & Engineering, University of New South Wales.
Toon Calders. Querying OLAP Cubes.
Wolf-Tilo Balke, Silviu Homoceanu. Data Warehousing & Data Mining, part 6: OLAP Operations & Queries.

Outline
OLAP Operations
  Roll-up, Drill-down, Slice and dice, Pivot
OLAP Data Visualization
From OLAP Operations to the Data
MDX
  Motivation
  Basic Concepts
  Advanced Operations
    FILTER, ORDER, HEAD, TOPCOUNT, CROSSJOIN, NON EMPTY, WITH MEMBER

Data Warehouse Queries
DW queries are big queries
  They touch a large portion of the data
  Mostly read queries
Redundancy is a necessity
  Materialized views, special-purpose indexes, de-normalized schemas
Data is refreshed periodically
  Daily or weekly
Their purpose is to analyze data
  OLAP (OnLine Analytical Processing)
OLAP usage fields
  Management information: sales per product group / area / year
  Government: population census
  Scientific databases: geo- and bio-informatics
Goal: response times of seconds to a few minutes

OLAP Operations
  Roll-up
  Drill-down
  Slice and dice
  Pivot (rotate)

Roll-up
Takes the current aggregation level of the fact values and aggregates further
Summarizes data by
  Climbing up a hierarchy (hierarchical roll-up)
  Dimensional reduction (dimensional roll-up)
Used to obtain a more general view
  E.g., from Time.Week to Time.Year
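The two kinds of roll-up can be sketched in a few lines of Python. This is only an illustration under assumed data: the fact rows, the dimension names (year, week, region) and the helper aggregate are hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, week, region, units sold).
FACTS = [
    (2011, 1, "North", 10),
    (2011, 1, "South", 5),
    (2011, 2, "North", 7),
    (2012, 1, "North", 4),
    (2012, 1, "South", 6),
]


def aggregate(facts, key):
    """Sum the measure (last field) over all facts that share key(fact)."""
    totals = defaultdict(int)
    for fact in facts:
        totals[key(fact)] += fact[-1]
    return dict(totals)


# Base level: units per (year, week, region).
by_week_region = aggregate(FACTS, lambda f: (f[0], f[1], f[2]))

# Hierarchical roll-up: climb the Time hierarchy from week to year, keep Region.
by_year_region = aggregate(FACTS, lambda f: (f[0], f[2]))

# Dimensional roll-up: drop the Region dimension entirely.
by_year = aggregate(FACTS, lambda f: (f[0],))

print(by_year_region)
print(by_year)
```

Drill-down is the inverse direction: going from by_year_region back to by_week_region cannot be computed from the aggregate alone, it needs the more detailed data.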

Hierarchical roll-ups

Dimensional roll-ups

Drill-down

Roll-up Drill-down Example

Slice

Dice

Pivot
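The figures for slice, dice and pivot did not survive in this transcription, so here is a minimal Python sketch of the usual textbook meaning of the three operations. The cube contents, the dimension names and all function names are assumptions made for illustration.

```python
# Hypothetical cube: (product, store, quarter) -> units sold.
CUBE = {
    ("Tools", "Berlin", "Q1"): 3,
    ("Tools", "Berlin", "Q2"): 4,
    ("Tools", "Munich", "Q1"): 2,
    ("Toys", "Berlin", "Q1"): 8,
    ("Toys", "Munich", "Q2"): 5,
}
DIMS = ("product", "store", "quarter")


def slice_cube(cube, dim, member):
    """Slice: fix one dimension to a single member and drop it from the keys."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == member}


def dice_cube(cube, allowed):
    """Dice: keep cells whose member is in the allowed set for each listed dimension."""
    return {
        k: v
        for k, v in cube.items()
        if all(k[DIMS.index(d)] in members for d, members in allowed.items())
    }


def pivot(table):
    """Pivot: swap the two axes of a 2-D table keyed by (row, column)."""
    return {(col, row): v for (row, col), v in table.items()}


q1 = slice_cube(CUBE, "quarter", "Q1")            # product x store for Q1
sub = dice_cube(CUBE, {"product": {"Tools"}, "store": {"Berlin", "Munich"}})
rotated = pivot(q1)                               # store x product for Q1
print(q1, sub, rotated, sep="\n")
```

A slice reduces the number of dimensions by one, a dice keeps all dimensions but restricts their members, and a pivot changes only the presentation, not the values.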

Typical Analytical Requests
OLAP operations are hard to express in query languages
  Most analysts and decision makers won't enjoy writing them
    SELECT f.region, z.month, sum(a.price * a.volume)
    FROM Order a, Time z, PoS f
    WHERE a.pos = f.name AND a.date = z.date
    GROUP BY f.region, z.month
OLAP clients allow operations to be performed through GUIs

OLAP Data Visualization
What do these operations look like to the user?
  Two dimensions are trivial
    E.g., Products by Store
  Three dimensions: the sold quantity can be visualized as layers
    Another way is to nest two dimensions on the same axis
OLAP reporting has to be very flexible
  Example: the IBM InfoSphere web-based OLAP report
Drill-down operation
  Can be performed easily by going down the hierarchy and choosing the granularity
Trend visualization
  With the help of charts

From Presentation to Data
Client/server architecture
  The client displays reports and lets the end user perform OLAP operations and other custom queries
  The server is responsible for providing the requested data
    How? It depends on whether it is MOLAP, ROLAP, HOLAP, etc.

OLAP Server
High-capacity, multi-user data manipulation engine specifically designed to support and operate on multidimensional data structures
Optimized for fast, flexible calculation and transformation of raw data based on formulaic relationships
An OLAP server can either
  Physically stage the processed multidimensional information to deliver consistent and rapid response times to end users (MOLAP)
  Store data in relational databases and simulate multidimensionality with special schemas (ROLAP)
  Or offer a choice of both (HOLAP)

Getting from OLAP Operations to the Data

Typical OLAP Queries
The idea is to
  Select by attributes of dimensions
    E.g., region = 'Europe'
  Group by attributes of dimensions
    E.g., region, month, quarter
  Aggregate on measures
    E.g., sum(price * volume)
OLAP queries in SQL (schematic; the join conditions between Fact and the dimension tables are omitted)
    SELECT d1.x, d2.y, d3.z, sum(f.t1), avg(f.t2)
    FROM Fact f, Dim1 d1, Dim2 d2, Dim3 d3
    WHERE d1.field > a AND d1.field < b AND d2.field = c
    GROUP BY d1.x, d2.y, d3.z;
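The select / group / aggregate pattern can be tried out with Python's built-in sqlite3 module. The sketch below assumes a tiny star schema; the table names (sales, pos_dim, time_dim), the columns and the data are invented for illustration and are not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical star schema: one fact table, two dimension tables.
    CREATE TABLE pos_dim  (name TEXT PRIMARY KEY, region TEXT);
    CREATE TABLE time_dim (date TEXT PRIMARY KEY, month TEXT);
    CREATE TABLE sales    (pos TEXT, date TEXT, price REAL, volume INTEGER);

    INSERT INTO pos_dim VALUES
        ('Berlin', 'Europe'), ('Paris', 'Europe'), ('Austin', 'America');
    INSERT INTO time_dim VALUES
        ('2011-01-10', '2011-01'), ('2011-02-03', '2011-02');
    INSERT INTO sales VALUES
        ('Berlin', '2011-01-10', 2.0, 10),
        ('Paris',  '2011-01-10', 3.0, 5),
        ('Berlin', '2011-02-03', 2.0, 4),
        ('Austin', '2011-01-10', 1.0, 100);
""")

rows = conn.execute("""
    SELECT p.region, t.month, SUM(s.price * s.volume) AS revenue
    FROM sales s
    JOIN pos_dim  p ON s.pos  = p.name   -- join the fact table to its dimensions
    JOIN time_dim t ON s.date = t.date
    WHERE p.region = 'Europe'            -- select by a dimension attribute
    GROUP BY p.region, t.month           -- group by dimension attributes
    ORDER BY t.month
""").fetchall()

print(rows)
conn.close()
```

The explicit JOIN conditions matter: without them the query would pair every fact row with every dimension row and the sums would be inflated.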

MDX - MultiDimensional eXpressions
Developed by Microsoft
  Not really brilliant
  But adopted by major OLAP providers due to Microsoft's market-leading position
Used in specifications and industry standards for multidimensional data processing
  OLE DB for OLAP (ODBO), with API support
  XML for Analysis (XMLA): specification of web services for OLAP
Supported by many data warehousing systems
  MS SQL Server
  SAS OLAP Server
  Drivers for MDX exist for Oracle OLAP
For ROLAP to support MDX, it is usually translated into SQL

Motivation

Pivot and Unpivot

MDX Syntax
Similar to SQL syntax
    SELECT {Germany, Niedersachsen, Bayern, Frankfurt} ON COLUMNS,
           {Qtr1.CHILDREN, Qtr2, Qtr3} ON ROWS
    FROM SalesCube
    WHERE (Measures.Sales, Time.[2011], Products.[All Products]);
SELECT
  Dimensions, on columns and rows
FROM
  Data source cube specification
  If joined, data cubes must share dimensions
WHERE
  Slicer: restricts the data area
  Specifies the measures to return

Example

Example: Cross Tabulation

Basic Concepts

Tuples
A tuple is a combination of members from one or more dimensions
  When a tuple spans more than one dimension, it has only one member from each dimension
    ([Customer].[Chicago, IL], [Time].[Jan, 2005], [Time].[Feb, 2005]) is not valid
  Any dimension can be part of a tuple, including measures
  () is not a valid tuple
  A tuple cannot be composed from other tuples
    ([Time].[2004], ([Customer].[Chicago, IL], [Product].[Tools])) is not valid
In calculations and queries, cell identification is based on tuples
  When a tuple is used in an expression where a number or string might be used, the default behavior is to reference the value in the cell that the tuple specifies
  ([Product].[Leather Jackets], [Time].[June-2005], [Store].[Fifth Avenue NYC], [Measures].[Dollar Sales]) may define a cell with a value of $13,000
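The tuple rules above can be mimicked with a small Python validator. This is a simplified sketch, not how an MDX engine works: the helpers parse_member, make_tuple and cell_value are hypothetical, and unlike real MDX the sketch requires every dimension of the cell to be named explicitly instead of falling back to default members.

```python
def parse_member(member):
    """Split '[Time].[Jan, 2005]' into ('Time', 'Jan, 2005')."""
    dim, name = member.split("].[", 1)
    return dim.lstrip("["), name.rstrip("]")


def make_tuple(*members):
    """Check the tuple rules and return an order-independent cell key."""
    if not members:
        raise ValueError("() is not a valid tuple")
    by_dim = {}
    for m in members:
        if not isinstance(m, str):
            raise ValueError("a tuple cannot be composed from other tuples")
        dim, name = parse_member(m)
        if dim in by_dim:
            raise ValueError(f"more than one member from dimension {dim}")
        by_dim[dim] = name
    return frozenset(by_dim.items())


# Hypothetical cube with a single populated cell (the $13,000 example).
CELLS = {
    make_tuple(
        "[Product].[Leather Jackets]",
        "[Time].[June-2005]",
        "[Store].[Fifth Avenue NYC]",
        "[Measures].[Dollar Sales]",
    ): 13000,
}


def cell_value(*members):
    """A tuple used as a value references the cell it identifies (None if empty)."""
    return CELLS.get(make_tuple(*members))


print(cell_value(
    "[Measures].[Dollar Sales]",
    "[Store].[Fifth Avenue NYC]",
    "[Time].[June-2005]",
    "[Product].[Leather Jackets]",
))
```

Because the key is a frozenset of (dimension, member) pairs, the order in which the members are written does not change which cell is addressed.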

Axis Specification

Sets of Members

FoodMart

CHILDREN and MEMBERS

Slicer Dimension vs. Filter

ORDER (I)

ORDER (II)
Example
    SELECT { [Measures].[Dollar Sales] } ON COLUMNS,
           ORDER( [Product].[Product Category].Members, [Measures].[Dollar Sales], BDESC ) ON ROWS
    FROM [Sales]
    WHERE [Time].[2004]
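The B in BDESC stands for "break hierarchy". The difference from plain DESC can be sketched in Python as follows. This is a simplified model under assumed data: the parent families, categories and sales figures are hypothetical, and a real MDX engine handles multi-level sets in more detail than this.

```python
from itertools import groupby

# Hypothetical members, in hierarchy order: (parent family, category, dollar sales).
MEMBERS = [
    ("Indoor", "Toys", 50),
    ("Indoor", "Games", 90),
    ("Outdoor", "Tools", 70),
    ("Outdoor", "Tents", 20),
]


def order_bdesc(members):
    """BDESC: ignore the hierarchy and sort all members by value, descending."""
    ranked = sorted(members, key=lambda m: m[2], reverse=True)
    return [name for _, name, _ in ranked]


def order_desc(members):
    """DESC (simplified): keep parents in place, sort siblings within each parent."""
    result = []
    for _, siblings in groupby(members, key=lambda m: m[0]):
        ranked = sorted(siblings, key=lambda m: m[2], reverse=True)
        result += [name for _, name, _ in ranked]
    return result


print(order_bdesc(MEMBERS))
print(order_desc(MEMBERS))
```

With BDESC, Tools (70) moves ahead of Toys (50) even though they belong to different parents; with DESC the Indoor categories stay together.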

HEAD

TOPCOUNT, TOPPERCENT, TOPSUM
The 10 products that have the highest Internet Sales Amount
    SELECT {[Measures].[Internet Sales Amount]} ON COLUMNS,
           TOPCOUNT([Product].[Product].Members, 10, [Measures].[Internet Sales Amount]) ON ROWS
    FROM [Adventure Works]
The products with the highest Sales Amount whose cumulative total reaches at least 10 percent of the overall Sales Amount
    SELECT {[Measures].[Sales Amount]} ON COLUMNS,
           TOPPERCENT([Product].[Product].[Product].Members, 10, [Measures].[Sales Amount]) ON ROWS
    FROM [Adventure Works]
The products with the highest values whose cumulative Sales Amount total is greater than or equal to 10,000,000
    SELECT {[Measures].[Sales Amount]} ON COLUMNS,
           TOPSUM([Product].[Product].[Product].Members, 10000000, [Measures].[Sales Amount]) ON ROWS
    FROM [Adventure Works]
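The three functions differ only in when they stop taking members from the descending ranking. A minimal Python sketch of that idea, with hypothetical product names and sales values (the function names mirror the MDX ones but are plain Python helpers):

```python
def _sorted_desc(sales):
    """Members with their values, highest value first."""
    return sorted(sales.items(), key=lambda kv: kv[1], reverse=True)


def topcount(sales, n):
    """The n members with the highest values."""
    return [name for name, _ in _sorted_desc(sales)[:n]]


def topsum(sales, threshold):
    """Top members, taken until their cumulative total reaches the threshold."""
    picked, running = [], 0
    for name, value in _sorted_desc(sales):
        if running >= threshold:
            break
        picked.append(name)
        running += value
    return picked


def toppercent(sales, percent):
    """Top members whose cumulative total reaches percent of the grand total."""
    return topsum(sales, sum(sales.values()) * percent / 100)


# Hypothetical sales amounts per product (grand total: 100).
SALES = {"A": 50, "B": 30, "C": 15, "D": 5}

print(topcount(SALES, 2))
print(toppercent(SALES, 60))
print(topsum(SALES, 90))
```

Note that TOPPERCENT with 10 does not mean "10 percent of the products": a single dominant product can already satisfy the threshold on its own.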

CROSSJOIN
    SELECT CROSSJOIN( { [Time].[Q1, 2005], [Time].[Q2, 2005] },
                      { [Measures].[Dollar Sales], [Measures].[Unit Sales] } ) ON COLUMNS,
           { [Product].[Tools], [Product].[Toys] } ON ROWS
    FROM Sales
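CROSSJOIN builds the Cartesian product of two sets, so the query above gets four column headers. A small Python sketch of that behavior, where sets are modeled as lists of Python tuples (a modeling assumption, not MDX syntax):

```python
from itertools import product


def crossjoin(set_a, set_b):
    """Cartesian product: every tuple of set_a combined with every tuple of set_b."""
    return [a + b for a, b in product(set_a, set_b)]


quarters = [("[Time].[Q1, 2005]",), ("[Time].[Q2, 2005]",)]
measures = [("[Measures].[Dollar Sales]",), ("[Measures].[Unit Sales]",)]

columns = crossjoin(quarters, measures)
for column in columns:
    print(column)
```

The first set varies slowest, so both measures for Q1 come before both measures for Q2.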

NON EMPTY (I)
    SELECT [Time].[1997].CHILDREN ON COLUMNS,
           CROSSJOIN([Store].[Store State].MEMBERS, [Product].[Product Family].MEMBERS) ON ROWS
    FROM [Sales]
    WHERE (Measures.[Profit])
Problem with the query: many members on the ROWS axis are empty, hence many empty rows
It needs simple filtering: removing empty members from the axis
Solution: NON EMPTY (CROSSJOIN( ) )

NON EMPTY (II)
    SELECT { [Time].[Jan,2005], [Time].[Feb,2005] } ON COLUMNS,
           NON EMPTY { [Product].[Toys], [Product].[Toys].Children } ON ROWS
    FROM Sales
    WHERE ([Measures].[Dollar Sales], [Customer].[TX])
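The effect of NON EMPTY on an axis can be sketched as a filter: a row survives if at least one of its cells, taken across the other axis, holds a value. The cell data and member names below are hypothetical.

```python
# Hypothetical cells after slicing: (product, month) -> dollar sales.
# A missing key means the cell is empty.
CELLS = {
    ("Toys", "Jan"): 120,
    ("Dolls", "Feb"): 40,
}


def non_empty(rows, columns, cells):
    """Keep only the rows that have at least one non-empty cell across the columns."""
    return [r for r in rows if any((r, c) in cells for c in columns)]


rows = ["Toys", "Dolls", "Puzzles"]
columns = ["Jan", "Feb"]

print(non_empty(rows, columns, CELLS))
```

Puzzles is dropped because it has no value in either month, while Dolls stays even though its Jan cell is empty.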

[Advanced MDX].[Calculated Member]