Data Warehousing & OLAP

Motivation: Business Intelligence
- Customer information (customer-id, gender, age, home-address, occupation, income, family-size, ...)
- Product information (product-id, category, manufacturer, made-in, stock-price, ...)
- Sales information (customer-id, product-id, #units, unit-price, sales-representative, ...)
- Business queries
  - Which categories of products are most popular for customers in Vancouver?
  - Find pairs (customer groups, most popular products)
Jian Pei: Big Data Analytics -- Multidimensional Analysis

In what aspect is he most similar to cases of coronary artery disease and, at the same time, dissimilar to adiposity?
- Symptoms: overweight, high blood pressure, back pain, shortness of breath, chest pain, cold sweat
J. Pei: Finding Outstanding Aspects and Contrast Subspaces

Don't You Ever Google Yourself?
- Big data makes one know oneself better
- 57% of American adults search for themselves on the Internet
- Good news: those people are better paid than those who haven't done so! (Investors.com)
- Egocentric analysis becomes more and more important with big data

Egocentric Analysis
- How am I different from (more often than not, better than) others?
- In what aspects am I good?
- Figure: http://img03.deviantart.net/a670/i/2010/219/a/e/glee egocentric_by_gleeondoodles.jpg

Dimensions
- "An aspect or feature of a situation, problem, or thing, a measurable extent of some kind" (dictionary definition)
- Dimensions/attributes are used to model complex objects in a divide-and-conquer manner
- Objects are compared in selected dimensions/attributes
- More often than not, objects have more dimensions/attributes than one is interested in and can handle

Multi-dimensional Analysis
- Find interesting patterns in multi-dimensional subspaces
  - Michael Jordan is outstanding in subspaces (total points, total rebounds, total assists) and (number of games played, total points, total assists)
  - Different patterns may be manifested in different subspaces
- Feature selection (machine learning and statistics): select a subset of relevant features for use in model construction, that is, one set of features for all objects
  - Different subspaces may manifest different patterns

OLAP
- Conceptually, we may explore all possible subspaces for interesting patterns
- What patterns are interesting?
- How can we explore all possible subspaces systematically and efficiently?
- Fundamental problems in analytics and data mining

OLAP
- Aggregates and group-bys are frequently used in data analysis and summarization
  SELECT time, altitude, AVG(temp) FROM weather GROUP BY time, altitude;
- In TPC, the 6 standard benchmarks have 83 queries; aggregates are used 59 times and group-bys 20 times
- Online analytical processing (OLAP): the techniques that answer multi-dimensional analytical (MDA) queries efficiently
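The group-by query on this slide can be tried out with a small sketch like the one below. It uses Python's standard `sqlite3` module and an in-memory table; the `weather` rows and the helper name `average_temp_by_time_and_altitude` are made up for illustration, and an `ORDER BY` is added only so the result order is predictable.

```python
import sqlite3

# Hypothetical weather readings: (time, altitude, temp). The values are made up.
ROWS = [
    ("06:00", 100, 10.0),
    ("06:00", 100, 12.0),
    ("06:00", 500, 6.0),
    ("12:00", 100, 20.0),
    ("12:00", 500, 14.0),
    ("12:00", 500, 16.0),
]


def average_temp_by_time_and_altitude(rows):
    """Run the slide's GROUP BY query against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE weather (time TEXT, altitude INTEGER, temp REAL)")
        conn.executemany("INSERT INTO weather VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT time, altitude, AVG(temp) FROM weather "
            "GROUP BY time, altitude ORDER BY time, altitude"
        )
        return cursor.fetchall()
    finally:
        conn.close()


result = average_temp_by_time_and_altitude(ROWS)
```

Each output row is one (time, altitude) group with the average temperature of the readings that fall into it.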

OLAP Operations
- Roll up (drill-up): summarize data by climbing up a hierarchy or by dimension reduction
  - (Day, Store, Product type, SUM(sales)) → (Month, City, *, SUM(sales))
- Drill down (roll down): the reverse of roll-up, going from a higher-level summary to a lower-level summary or detailed data, or introducing new dimensions
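As a rough illustration of the roll-up in the example above, the sketch below climbs two hierarchies (day to month, store to city) and drops the product-type dimension. The store-to-city mapping, the base cells, and the function name `roll_up` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical hierarchy: store -> city. Days are ISO dates, so the
# month is simply the first seven characters ("YYYY-MM").
STORE_TO_CITY = {"S1": "Vancouver", "S2": "Vancouver", "S3": "Burnaby"}

# Hypothetical base cells: (day, store, product_type, sales).
BASE = [
    ("2024-01-03", "S1", "book", 10),
    ("2024-01-15", "S2", "toy", 5),
    ("2024-01-20", "S3", "book", 7),
    ("2024-02-01", "S1", "toy", 4),
]


def roll_up(cells):
    """(Day, Store, Product type) -> (Month, City, *), summing sales."""
    totals = defaultdict(int)
    for day, store, _product_type, sales in cells:
        month = day[:7]               # climb the time hierarchy: day -> month
        city = STORE_TO_CITY[store]   # climb the location hierarchy: store -> city
        totals[(month, city, "*")] += sales  # "*" marks the removed dimension
    return dict(totals)


rolled = roll_up(BASE)
```

Drill-down goes the other way, so it cannot be computed from `rolled` alone; it needs the more detailed cells in `BASE`.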

Roll Up
- Figure: http://www.tutorialspoint.com/dwh/images/rollup.jpg

Drill Down
- Figure: http://www.tutorialspoint.com/dwh/images/drill_down.jpg

Other Operations
- Dice: pick specific values or ranges on some dimensions
- Pivot: rotate a cube, changing the order of dimensions in visual analysis
- Figure: http://en.wikipedia.org/wiki/file:olap_pivoting.png
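A minimal sketch of dice and pivot, assuming a cube stored as a Python dictionary from dimension values to a measure. The cell values are a few rows borrowed from the SALES example in these slides; the function names `dice` and `pivot` and the dictionary layout are illustrative assumptions, not a standard API.

```python
DIMENSIONS = ("model", "year", "color")

# A handful of cells: (model, year, color) -> sales.
CELLS = {
    ("Chevy", 1990, "red"): 5,
    ("Chevy", 1991, "red"): 54,
    ("Ford", 1990, "red"): 64,
    ("Ford", 1991, "blue"): 55,
}


def dice(cells, dimensions, **allowed):
    """Keep cells whose value on each named dimension is in the allowed set or range."""
    positions = {name: dimensions.index(name) for name in allowed}
    return {
        key: value
        for key, value in cells.items()
        if all(key[positions[name]] in allowed[name] for name in allowed)
    }


def pivot(cells, dimensions, new_order):
    """Reorder the dimensions of every cell key; the measures are unchanged."""
    indices = [dimensions.index(name) for name in new_order]
    return {tuple(key[i] for i in indices): value for key, value in cells.items()}


# Dice: specific value on model, a range on year.
diced = dice(CELLS, DIMENSIONS, model={"Ford"}, year=range(1990, 1992))
# Pivot: view the same cells with color first.
pivoted = pivot(CELLS, DIMENSIONS, ("color", "model", "year"))
```

Dice shrinks the cube to a sub-cube, while pivot only changes how the same cells are addressed or displayed.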

Dice
- Figure: http://www.tutorialspoint.com/dwh/images/dice.jpg

Relational Representation
- If there are n dimensions, there are 2^n possible aggregation columns
- Figure: roll-up by model, by year, by color in a table

Difficulties
- Many group-bys are needed
  - 6 dimensions → 2^6 = 64 group-bys
- In most SQL systems, the resulting query needs 64 scans of the data, 64 sorts or hashes, and a long wait!
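The count of 2^n comes from the fact that each dimension is either kept in a group-by or aggregated away. A short sketch that enumerates the subsets makes this concrete; the dimension names `d1` to `d6` and the function name `all_group_bys` are placeholders.

```python
from itertools import combinations


def all_group_bys(dimensions):
    """Return every subset of the dimensions, from the empty set to the full set."""
    return [
        subset
        for size in range(len(dimensions) + 1)
        for subset in combinations(dimensions, size)
    ]


six = all_group_bys(["d1", "d2", "d3", "d4", "d5", "d6"])
three = all_group_bys(["Model", "Year", "Color"])
```

Here `len(six)` is 64 and `len(three)` is 8; the empty subset corresponds to the grand total over all rows.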

Dummy Value ALL

CUBE

SALES
Model  Year  Color  Sales
Chevy  1990  red        5
Chevy  1990  white     87
Chevy  1990  blue      62
Chevy  1991  red       54
Chevy  1991  white     95
Chevy  1991  blue      49
Chevy  1992  red       31
Chevy  1992  white     54
Chevy  1992  blue      71
Ford   1990  red       64
Ford   1990  white     62
Ford   1990  blue      63
Ford   1991  red       52
Ford   1991  white      9
Ford   1991  blue      55
Ford   1992  red       27
Ford   1992  white     62
Ford   1992  blue      39

SELECT Model, Year, Color, SUM(sales) AS Sales
FROM Sales
WHERE Model IN ('Ford', 'Chevy') AND Year BETWEEN 1990 AND 1992
GROUP BY CUBE(Model, Year, Color);

DATA CUBE
Model  Year  Color  Sales
Chevy  1990  blue      62
Chevy  1990  red        5
Chevy  1990  white     87
Chevy  1990  ALL      154
Chevy  1991  blue      49
Chevy  1991  red       54
Chevy  1991  white     95
Chevy  1991  ALL      198
Chevy  1992  blue      71
Chevy  1992  red       31
Chevy  1992  white     54
Chevy  1992  ALL      156
Chevy  ALL   blue     182
Chevy  ALL   red       90
Chevy  ALL   white    236
Chevy  ALL   ALL      508
Ford   1990  blue      63
Ford   1990  red       64
Ford   1990  white     62
Ford   1990  ALL      189
Ford   1991  blue      55
Ford   1991  red       52
Ford   1991  white      9
Ford   1991  ALL      116
Ford   1992  blue      39
Ford   1992  red       27
Ford   1992  white     62
Ford   1992  ALL      128
Ford   ALL   blue     157
Ford   ALL   red      143
Ford   ALL   white    133
Ford   ALL   ALL      433
ALL    1990  blue     125
ALL    1990  red       69
ALL    1990  white    149
ALL    1990  ALL      343
ALL    1991  blue     104
ALL    1991  red      106
ALL    1991  white    104
ALL    1991  ALL      314
ALL    1992  blue     110
ALL    1992  red       58
ALL    1992  white    116
ALL    1992  ALL      284
ALL    ALL   blue     339
ALL    ALL   red      233
ALL    ALL   white    369
ALL    ALL   ALL      941
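The cube cells can be recomputed from the SALES rows with a short sketch. This is not how a SQL engine evaluates GROUP BY CUBE; it is a single-pass illustration in which every base row adds its sales to each of the 2^3 = 8 cells it belongs to. The string `"ALL"` stands in for the dummy value, and the function name `data_cube` is an assumption.

```python
from collections import defaultdict
from itertools import product

ALL = "ALL"  # stand-in for the dummy value ALL

# The SALES table from the slide: (model, year, color, sales).
SALES = [
    ("Chevy", 1990, "red", 5), ("Chevy", 1990, "white", 87), ("Chevy", 1990, "blue", 62),
    ("Chevy", 1991, "red", 54), ("Chevy", 1991, "white", 95), ("Chevy", 1991, "blue", 49),
    ("Chevy", 1992, "red", 31), ("Chevy", 1992, "white", 54), ("Chevy", 1992, "blue", 71),
    ("Ford", 1990, "red", 64), ("Ford", 1990, "white", 62), ("Ford", 1990, "blue", 63),
    ("Ford", 1991, "red", 52), ("Ford", 1991, "white", 9), ("Ford", 1991, "blue", 55),
    ("Ford", 1992, "red", 27), ("Ford", 1992, "white", 62), ("Ford", 1992, "blue", 39),
]


def data_cube(rows):
    """Sum sales into every (model, year, color) cell, with ALL as a wildcard."""
    cube = defaultdict(int)
    for model, year, color, sales in rows:
        # Each dimension either keeps its value or is replaced by ALL,
        # so one base row contributes to 8 cells.
        for cell in product((model, ALL), (year, ALL), (color, ALL)):
            cube[cell] += sales
    return dict(cube)


cube = data_cube(SALES)
```

The result has 3 × 4 × 4 = 48 cells, for example `cube[("Chevy", 1990, ALL)]` for all Chevy sales in 1990 and `cube[(ALL, ALL, ALL)]` for the grand total.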

Semantics of ALL
- ALL is a set
  - Model.ALL = ALL(Model) = {Chevy, Ford}
  - Year.ALL = ALL(Year) = {1990, 1991, 1992}
  - Color.ALL = ALL(Color) = {red, white, blue}
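One way to read "ALL is a set" is that a cube cell containing ALL covers every base combination obtained by substituting each member of the corresponding set. The sketch below expands a cell in that way; the `DOMAINS` dictionary and the function name `expand` are illustrative assumptions.

```python
from itertools import product

ALL = "ALL"

# The sets that ALL stands for, per dimension, as listed on the slide.
DOMAINS = {
    "Model": ["Chevy", "Ford"],
    "Year": [1990, 1991, 1992],
    "Color": ["red", "white", "blue"],
}


def expand(cell):
    """List the base (Model, Year, Color) combinations covered by a cube cell."""
    choices = [
        DOMAINS[name] if value == ALL else [value]
        for name, value in zip(DOMAINS, cell)
    ]
    return list(product(*choices))


covered = expand(("Chevy", ALL, "red"))
everything = expand((ALL, ALL, ALL))
```

Here `covered` lists the three red Chevy cells, one per year, and `everything` lists all 18 base combinations.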

OLTP Versus OLAP
                    OLTP                                    OLAP
users               clerk, IT professional                  knowledge worker
function            day-to-day operations                   decision support
DB design           application-oriented                    subject-oriented
data                current, up-to-date, detailed,          historical, summarized, multidimensional,
                    flat relational, isolated               integrated, consolidated
usage               repetitive                              ad-hoc
access              read/write, index/hash on prim. key     lots of scans
unit of work        short, simple transaction               complex query
# records accessed  tens                                    millions
# users             thousands                               hundreds
DB size             100MB-GB                                100GB-TB
metric              transaction throughput                  query throughput, response

What Is a Data Warehouse?
- "A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." (W. H. Inmon)
- Data warehousing: the process of constructing and using data warehouses

Subject-Oriented
- Organized around major subjects, such as customer, product, sales
- Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing
- Providing a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process

Integrated
- Integrating multiple, heterogeneous data sources
  - Relational databases, flat files, on-line transaction records
- Data cleaning and data integration
  - Ensuring consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
  - E.g., hotel price: currency, tax, breakfast covered, etc.
  - When data is moved to the warehouse, it is converted
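The hotel-price example can be sketched as a conversion step applied when records are loaded. Everything specific here is hypothetical: the record fields, the exchange rates, the tax rates, and the target convention (USD per night, tax included) are assumptions chosen only to show the idea.

```python
# Hypothetical conversion rates to USD; real rates would come from a reference source.
RATE_TO_USD = {"USD": 1.0, "CAD": 0.75, "EUR": 1.1}


def to_warehouse_price(record):
    """Convert a source record to one convention: USD per night, tax included."""
    price = record["price"] * RATE_TO_USD[record["currency"]]
    if not record["tax_included"]:
        price *= 1 + record["tax_rate"]
    return {"hotel": record["hotel"], "price_usd_with_tax": round(price, 2)}


# Two hypothetical sources that quote prices under different conventions.
source_a = {"hotel": "A", "price": 100, "currency": "CAD", "tax_included": False, "tax_rate": 0.1}
source_b = {"hotel": "B", "price": 200, "currency": "EUR", "tax_included": True, "tax_rate": 0.0}

converted = [to_warehouse_price(source_a), to_warehouse_price(source_b)]
```

After conversion the two prices are directly comparable, which is the point of integrating before analysis.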

Time Variant
- The time horizon for the data warehouse is significantly longer than that of operational systems
  - Operational databases: current value data
  - Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years)
- Every key structure in the data warehouse contains an element of time, explicitly or implicitly
  - But the key of operational data may or may not contain a time element

Nonvolatile
- A physically separate store of data transformed from the operational environment
- Operational updates of data do not occur in the data warehouse environment
  - Do not require transaction processing, recovery, and concurrency control mechanisms
  - Require only two operations in data accessing
    - Initial loading of data
    - Access of data

Why Separate Data Warehouse?
- High performance for both
  - Operational DBMS: tuned for OLTP
  - Warehouse: tuned for OLAP
- Different functions and different data
  - Historical data: data analysis often uses historical data that operational databases do not typically maintain
  - Data consolidation: data analysis requires consolidation (aggregation, summarization) of data from heterogeneous sources

To-Do List
- Read Section 4.1
Jian Pei: CMPT 741/459 Data Warehousing and OLAP (1)