Week 3 lecture slides




Topics
Data Warehouses
Online Analytical Processing
Introduction to Data Cubes
Textbook reference: Chapter 3

Data Warehouses
A data warehouse is a collection of data specifically designed for data mining activities.
One such data mining activity is Online Analytical Processing (OLAP).
OLAP is the interactive analysis of multidimensional data stored as data cubes.

OLAP: Online Analytical Processing
OLAP is a decision support system commonly associated with data mining.
OLAP supports interactive complex queries on aggregations of data, e.g. sum, average, count.
Conceptually, aggregate data is stored as data cubes, also known as a multidimensional database.
Data cubes are pre-calculated from individual data records.
Pre-calculation avoids the slow response time normally expected from complex query execution.
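
The pre-calculation step described above can be sketched in Python. This is a minimal illustration, not the method of any particular OLAP product: the record layout, the cinema name "Carlton" and all ages are invented for the example.

```python
from collections import defaultdict

# Hypothetical individual viewing records: (cinema, movie, gender, age).
records = [
    ("Belgrave", "Moulin Rouge", "Male", 24),
    ("Belgrave", "Moulin Rouge", "Male", 28),
    ("Belgrave", "Moulin Rouge", "Female", 31),
    ("Carlton", "The Well", "Female", 45),
]

def build_cube(rows):
    """Pre-calculate count and sum of ages for every (cinema, movie, gender) cell."""
    cube = defaultdict(lambda: {"count": 0, "sum_age": 0})
    for cinema, movie, gender, age in rows:
        cell = cube[(cinema, movie, gender)]
        cell["count"] += 1
        cell["sum_age"] += age
    return dict(cube)

# Built once, ahead of time; later queries read the cells instead of the records.
cube = build_cube(records)
```

Each query then touches a handful of cells rather than every record, which is where the faster response time comes from.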

Information Reporting Systems vs OLAP
Routine or on-demand operational reports are fixed; an ad-hoc report would require software development.
Database query systems provide ad-hoc reporting but are inefficient (slow) for complex querying.
OLAP queries on precalculated data cubes handle a range of complex queries in reasonable response time.

Data Mining and OLAP
Data mining investigates data in order to discover actionable information.
OLAP provides different summarized views of data and is therefore a data mining tool, aiding the discovery of useful information.
Data Mining/OLAP synergy examples:
  OLAP may identify anomalies for further investigation by other data mining techniques.
  The attributes used in the upper nodes of decision trees are chosen because they are the most predictive, so they would be good choices for data cube dimensions.
  The break points used on continuous attributes (e.g. age < 20) indicate effective bin sizes for continuous OLAP dimensions.

Normalized Database Schema
Normalization is the standard relational database schema technique.
It is designed to eliminate duplicated (redundant) information rather than to speed up access.
It is not suited to OLAP.

Star Schema
OLAP systems favour schemas that optimize access time.
The star schema has a fact table and connected dimension tables.
The dimensions are chosen according to anticipated OLAP queries, e.g.
  For movies seen more than 5 times, how often was each seen?
  For what movies is the average viewer aged over 30?
  How many people of each gender have been to each cinema?
The fact table contains whatever key attributes make a fact (record) unique.
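
As a rough sketch of the star schema above, the following uses Python's built-in sqlite3 module to create one fact table and three dimension tables, then answers the query "How many people of each gender have been to each cinema?". The column names cid, cname, mid and mname follow the slides; the table names, the viewer table, the cinema "Carlton" and all rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables
    CREATE TABLE cinema (cid INTEGER PRIMARY KEY, cname TEXT);
    CREATE TABLE movie  (mid INTEGER PRIMARY KEY, mname TEXT);
    CREATE TABLE viewer (vid INTEGER PRIMARY KEY, gender TEXT, age INTEGER);
    -- Fact table: one row per viewing, identified by its dimension keys
    CREATE TABLE viewing (
        cid INTEGER REFERENCES cinema(cid),
        mid INTEGER REFERENCES movie(mid),
        vid INTEGER REFERENCES viewer(vid)
    );
    INSERT INTO cinema VALUES (1, 'Belgrave'), (2, 'Carlton');
    INSERT INTO movie  VALUES (1, 'Moulin Rouge'), (2, 'The Well');
    INSERT INTO viewer VALUES (1, 'Male', 24), (2, 'Female', 31), (3, 'Female', 45);
    INSERT INTO viewing VALUES (1, 1, 1), (1, 1, 2), (2, 2, 3), (1, 2, 3);
""")

# Join the fact table to two of its dimensions and aggregate.
rows = conn.execute("""
    SELECT c.cname, v.gender, COUNT(DISTINCT f.vid) AS people
    FROM viewing AS f
    JOIN cinema AS c ON c.cid = f.cid
    JOIN viewer AS v ON v.vid = f.vid
    GROUP BY c.cname, v.gender
    ORDER BY c.cname, v.gender
""").fetchall()
# rows == [('Belgrave', 'Female', 2), ('Belgrave', 'Male', 1), ('Carlton', 'Female', 1)]
```

Every join goes directly from the fact table to a dimension table, so no query needs more than one hop from the centre of the star.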

Other OLAP Schemas
The snowflake schema uses fact and dimension tables, but normalizes the dimensions; that is, dimension tables are split into separate tables.
The fact constellation schema has multiple fact tables that share dimension tables.

Aggregation
As well as fact data, dimension tables also require aggregate data.
Aggregations are chosen in anticipation of queries and may be partially precalculated.
Partial aggregations are stored by dimension, in data cubes.

Data Cubes
Conceptually, aggregated data is in the form of an n-cube, where n is the number of dimensions.
The size of each dimension determines the number of subcubes.
E.g. 2 cinemas, 2 genders and 8 movies make 2 x 2 x 8 = 32 subcubes.

Data Cubes
Each subcube contains specific information.
For example, the subcube at the bottom left front is identified by cid=1 (cname=Belgrave), mid=1 (mname=Moulin Rouge), Gender=Male.
It contains the aggregated data: number of viewers=56, sum of ages of viewers=1456.
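
The arithmetic on the two Data Cubes slides can be checked with a short Python sketch. The dimension sizes and the cell contents (56 viewers, ages summing to 1456) come from the slides; the placeholder member names such as "cinema_1" are assumptions.

```python
from itertools import product
from math import prod

# Placeholder dimension members; only the sizes (2, 2, 8) come from the slides.
cinemas = ["cinema_1", "cinema_2"]
genders = ["Male", "Female"]
movies = [f"movie_{i}" for i in range(1, 9)]

# Number of subcubes is the product of the dimension sizes: 2 * 2 * 8 = 32.
n_subcubes = prod(len(dim) for dim in (cinemas, genders, movies))

# Enumerating every combination of members gives one key per subcube.
cells = list(product(cinemas, genders, movies))

# The example subcube stores a count and a sum, so an average can be derived
# later without going back to the individual records.
cell = {"viewers": 56, "sum_ages": 1456}
average_age = cell["sum_ages"] / cell["viewers"]  # 1456 / 56 = 26.0
```

Storing the sum and the count, rather than the average itself, keeps the cells combinable: sums and counts of several subcubes can be added, while averages cannot.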

OLAP Aggregate Queries
Data cubes support aggregate queries by dimension, e.g. How many people of each gender have been to each cinema?
This could be answered by a table of viewer counts with cinemas along one axis and genders along the other.

OLAP Aggregate Queries
Tables are calculated by taking sections of the cube along the required dimensions, then calculating using the information in the subcubes.

Multiple Views
To solve a query like "How many people have seen Moulin Rouge?", we view the data cube by movie and calculate using the 4 relevant subcubes (2 cinemas x 2 genders).
To solve a query like "What is the ratio of attendance at the same movie at the two different cinemas?", we view the data cube by movie and by cinema.
Thus storing partial aggregate data in subcubes supports multiple views of the aggregate data.
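
The two queries above can be sketched as sums over subcubes. In this Python illustration only the count of 56 for (Belgrave, Moulin Rouge, Male) comes from the slides; the cinema "Carlton", the other counts and the helper name `total` are assumptions.

```python
# (cinema, movie, gender) -> number of viewers. Only the first value is from the slides.
counts = {
    ("Belgrave", "Moulin Rouge", "Male"): 56,
    ("Belgrave", "Moulin Rouge", "Female"): 44,
    ("Carlton", "Moulin Rouge", "Male"): 20,
    ("Carlton", "Moulin Rouge", "Female"): 30,
    ("Belgrave", "The Well", "Male"): 10,
    ("Carlton", "The Well", "Female"): 15,
}

def total(cells, cinema=None, movie=None, gender=None):
    """Sum the subcubes that match every dimension value given (None = any)."""
    wanted = (cinema, movie, gender)
    return sum(
        n for key, n in cells.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    )

# View by movie: the 4 Moulin Rouge subcubes are combined (56 + 44 + 20 + 30).
seen_moulin_rouge = total(counts, movie="Moulin Rouge")  # 150

# View by movie and by cinema: compare attendance at the two cinemas (100 / 50).
ratio = (total(counts, cinema="Belgrave", movie="Moulin Rouge")
         / total(counts, cinema="Carlton", movie="Moulin Rouge"))  # 2.0
```

Both answers come from the same stored subcubes; only the choice of which dimensions to fix and which to sum over changes.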

Concept Hierarchies
Concept hierarchies provide summarization at different levels of a dimension.
Example one: the Time dimension might be Year -> Quarter -> Month -> Week -> Day.
Example two: a Location dimension might be Country -> Region -> City.
We could then view data summarised at each of these levels of detail.
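
A concept hierarchy on the Time dimension can be sketched as a set of functions that map a day to its ancestor at each level. The daily figures below are invented, and the Week level is left out of this sketch to keep it short.

```python
from collections import Counter
from datetime import date

# Hypothetical viewer counts at the most detailed level (Day).
daily_viewers = {
    date(2001, 1, 15): 10,
    date(2001, 2, 3): 5,
    date(2001, 4, 20): 7,
    date(2002, 1, 1): 3,
}

# Each level maps a day to the key of the group it belongs to at that level.
levels = {
    "year": lambda d: (d.year,),
    "quarter": lambda d: (d.year, (d.month - 1) // 3 + 1),
    "month": lambda d: (d.year, d.month),
    "day": lambda d: (d.year, d.month, d.day),
}

def summarise(data, level):
    """Total the daily counts at the requested level of the hierarchy."""
    key_of = levels[level]
    totals = Counter()
    for day, n in data.items():
        totals[key_of(day)] += n
    return dict(totals)

by_year = summarise(daily_viewers, "year")        # {(2001,): 22, (2002,): 3}
by_quarter = summarise(daily_viewers, "quarter")  # {(2001, 1): 15, (2001, 2): 7, (2002, 1): 3}
```

Moving from `by_quarter` to `by_year` is a rollup through the hierarchy; moving the other way is a drilldown.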

OLAP Operations
Rollup: view data in more summarised form.
Drilldown: view data in more detailed form.
Slice: view data along part of one dimension.
Dice: view data along parts of two or more dimensions.
Pivot: view data from different orientations.

Rollup
Rollup is an OLAP operation that summarizes the view by combining subcubes.
Example: we are currently viewing the data by cinema and by gender. A summarization is to view the data regardless of gender; that is, gender is combined.
Rollup decreases the level of detail provided in the view.

Drilldown
Drilldown is an OLAP operation that expands the view by splitting along a dimension. It is the opposite of rollup.
Example: we are currently viewing the data by cinema. We can drill down to a view of the data that includes gender; that is, the data cube is split along the gender dimension.
Drilldown increases the level of detail provided in the view.
The most detailed view is individual records (the fact table).
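
Rollup and drilldown can both be sketched as regrouping cells over a chosen set of dimensions. In this Python illustration the function name `view`, the cinema "Carlton" and all counts except 56 are assumptions.

```python
from collections import defaultdict

DIMENSIONS = ("cinema", "movie", "gender")

# Most detailed cells kept in the cube: (cinema, movie, gender) -> viewers.
base_cells = {
    ("Belgrave", "Moulin Rouge", "Male"): 56,
    ("Belgrave", "Moulin Rouge", "Female"): 44,
    ("Carlton", "Moulin Rouge", "Male"): 20,
    ("Carlton", "The Well", "Female"): 15,
}

def view(cells, cell_dims, keep):
    """Combine cells over every dimension in cell_dims that is not in keep."""
    positions = [cell_dims.index(d) for d in keep]
    result = defaultdict(int)
    for key, n in cells.items():
        result[tuple(key[p] for p in positions)] += n
    return dict(result)

# Current view: by cinema and by gender.
by_cinema_gender = view(base_cells, DIMENSIONS, ("cinema", "gender"))

# Rollup: combine gender, computed directly from the current view.
by_cinema = view(by_cinema_gender, ("cinema", "gender"), ("cinema",))

# Drilldown: the summarised view has lost the gender detail, so the split
# must be recomputed from more detailed cells.
drilled = view(base_cells, DIMENSIONS, ("cinema", "gender"))
```

The asymmetry is the point of the sketch: a rollup can be derived from the view in hand, while a drilldown needs access to finer-grained data.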

Rollup / Drilldown Through Concept Hierarchies
Rollup can also collapse data to higher levels of the concept hierarchy.
Drilldown expands to lower levels in the concept hierarchy.

Slice
A slice is a selection on one dimension of a cube.
Example: the cube with Cinema=Belgrave.

Dice
A dice is a section of the cube, for example: total people who have been to see Moulin Rouge and The Well.
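
Slice and dice can be sketched as filters on the cell keys. In this Python illustration the helper name `select`, the cinema "Carlton", the entry "Other Movie" and all counts except 56 are assumptions.

```python
DIMENSIONS = ("cinema", "movie", "gender")

# (cinema, movie, gender) -> viewers; values other than 56 are invented.
cells = {
    ("Belgrave", "Moulin Rouge", "Male"): 56,
    ("Belgrave", "The Well", "Female"): 12,
    ("Belgrave", "Other Movie", "Male"): 7,
    ("Carlton", "Moulin Rouge", "Female"): 30,
    ("Carlton", "Other Movie", "Female"): 9,
}

def select(cube, **allowed):
    """Keep the cells whose value on each named dimension is in the allowed set."""
    checks = [(DIMENSIONS.index(name), set(values)) for name, values in allowed.items()]
    return {
        key: n for key, n in cube.items()
        if all(key[pos] in values for pos, values in checks)
    }

# Slice: a selection on one dimension (Cinema=Belgrave).
belgrave_slice = select(cells, cinema={"Belgrave"})

# Dice: a selection on parts of two dimensions at once.
dice = select(cells, cinema={"Belgrave"}, movie={"Moulin Rouge", "The Well"})
dice_total = sum(dice.values())  # 56 + 12 = 68
```

Both operations return a smaller cube with the same dimensions, so rollup or pivot can still be applied to the result.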

Pivot
A pivot is the same cube viewed from a different orientation.
Example: Cinema by Gender, or Gender by Cinema.
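
For a two-dimensional view, a pivot amounts to swapping rows and columns. A minimal Python sketch, with invented counts and a hypothetical second cinema "Carlton":

```python
# Cinema by Gender: outer keys are rows, inner keys are columns.
cinema_by_gender = {
    "Belgrave": {"Male": 66, "Female": 44},
    "Carlton": {"Male": 20, "Female": 45},
}

def pivot(table):
    """Swap the row and column dimensions of a nested-dict table."""
    turned = {}
    for row, columns in table.items():
        for column, value in columns.items():
            turned.setdefault(column, {})[row] = value
    return turned

# Gender by Cinema: same numbers, different orientation.
gender_by_cinema = pivot(cinema_by_gender)
# {'Male': {'Belgrave': 66, 'Carlton': 20}, 'Female': {'Belgrave': 44, 'Carlton': 45}}
```

No value is added, removed or recalculated; pivoting twice returns the original table.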

Data Warehouse Tools
Data warehouse systems may include tools to support warehouse setup, maintenance and usage (data mining).
Back-end tools: for extraction, cleaning, transformation, refreshing etc.
Front-end tools: to support specific tasks, e.g. multidimensional views (aggregations by attributes), rollup (summarize) and drilldown (detail).
Extended SQL queries, e.g. statistical analysis (mean, standard deviation, ...), time window operations (moving average, ...) and comparison operations.
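
The statistical and time-window operations mentioned above can be illustrated in Python with the standard statistics module. This is not the extended SQL of any particular product; the weekly figures and the function name `moving_average` are assumptions.

```python
from statistics import mean, stdev

# Hypothetical viewer totals for six consecutive weeks.
weekly_viewers = [120, 135, 129, 150, 162, 141]

def moving_average(values, window):
    """Average each run of `window` consecutive values (a sliding time window)."""
    return [
        mean(values[end - window:end])
        for end in range(window, len(values) + 1)
    ]

overall_mean = mean(weekly_viewers)         # 837 / 6 = 139.5
spread = stdev(weekly_viewers)              # sample standard deviation
smoothed = moving_average(weekly_viewers, 3)
# smoothed == [128, 138, 147, 151]
```

A moving average smooths week-to-week variation, which makes a trend easier to see than in the raw series.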

Data Hierarchy
Metadata is the logical view.
The database schema is the way the data is physically stored.
Transformed data is the data configured for data mining purposes.
Source data is the corporate operational data.

Metadata
Metadata is data about data. The metadata contains the following models.
Business Model: the description of the data that is presented to the users (entities, relationships, attributes). The users' view may be different for different user groups.
Administration Model: how the data is derived (the source, extraction method, required transformation, when it should be updated, the current status such as current or out-of-date, user authorization and access control, where the data is stored).
Operational Model: information about the usage of data (usage statistics, error reports, audit trails).

Data Warehouse Architectures
Middleware is an interfacing system to allow user access to disparate source systems.
The data warehouse system may be a large database system with an extended query language to support data mining.

ROLAP versus MOLAP
ROLAP (Relational Online Analytical Processing) is OLAP on a standard relational database platform.
Star schemas are used to support OLAP operations.
Standard SQL is used to generate views.
The decision support environment takes advantage of the well developed security, concurrency and maintenance features of relational database technology.
BUT relational databases come with overhead designed to support OLTP, which can make OLAP activities inefficient.
MOLAP (Multidimensional OLAP) is designed specifically for OLAP and is not based on a relational database.
Aggregated data is stored in multidimensional array structures.
Specialized tools are used to generate views.
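
The idea of storing aggregated data in a multidimensional array can be sketched in Python with one flat list addressed in row-major order. This illustrates the principle only and does not reflect how any specific MOLAP server lays out its storage; the cinema "Carlton" and the count 15 are invented.

```python
cinemas = ["Belgrave", "Carlton"]
movies = ["Moulin Rouge", "The Well"]
genders = ["Male", "Female"]

# One slot per subcube (2 * 2 * 2 = 8); 0 means no viewers recorded.
array = [0] * (len(cinemas) * len(movies) * len(genders))

def offset(cinema, movie, gender):
    """Row-major position of a cell: cinema varies slowest, gender fastest."""
    i = cinemas.index(cinema)
    j = movies.index(movie)
    k = genders.index(gender)
    return (i * len(movies) + j) * len(genders) + k

array[offset("Belgrave", "Moulin Rouge", "Male")] = 56   # position 0
array[offset("Carlton", "The Well", "Female")] = 15      # position 7

# All Belgrave cells are contiguous, so a rollup over them is one slice sum.
cells_per_cinema = len(movies) * len(genders)
belgrave_total = sum(array[:cells_per_cinema])  # 56
```

A cell is reached by computing its position rather than by searching or joining, which is the access-time advantage the slide contrasts with ROLAP. The cost is that empty cells still occupy space.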

Data Marts
Data marts are mini data warehouses designed for specific groups of users (e.g. departments).
They support different logical views among different user groups.
They keep each data warehouse of manageable proportions.
Efficiency and response time improve because operations are on smaller data warehouses.
Not all source system data is relevant to all user groups.

Multilayered Architecture
A multilayered architecture incorporates most architectural features.
Source extraction is a regular process (as new data is gathered).
Data mining also creates data that can be stored for future use.
The central repository is ideally scalable, so that it can grow as data and user demand require.
It may use tertiary storage technology and parallel processing units to serve data to the data marts.

Example: Indonesian Ministry of Education
Subgroups of the Ministry with overlapping interests:
  Directorate of Primary Education
  Directorate of General Secondary Education
  Directorate of Private Schooling
  Directorate of Vocational Education
  Directorate of Higher Education
  Coordinating Body for Private Universities
  General Directorate of Informal Education, Youth and Sport
Recommended architecture: integrated data marts forming a distributed data warehouse.
Standardised client-server interface for all data marts.
Source: Deborah Wyburn, "Decision Support Systems in the MOEC, Indonesia", Master of Computing Studies thesis, 2001.