Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

Chapter 29 Overview of Data Warehousing and OLAP

Chapter 29 Outline
- Purpose of Data Warehousing
- Introduction, Definitions, and Terminology
- Comparison with Traditional Databases
- Characteristics of Data Warehouses
- Classification of Data Warehouses
- Multi-dimensional Schemas
- Building a Data Warehouse
- Functionality of a Data Warehouse
- Warehouse vs. Data Views
- Implementation difficulties and open issues

Purpose of Data Warehousing
- Traditional databases are not optimized for data access alone; they have to balance the requirement of data access with the need to ensure the integrity of data.
- Most of the time, data warehouse users need only read access, but they need that access to be fast over a large volume of data.
- Most of the data required for data warehouse analysis comes from multiple databases, and these analyses are recurrent and predictable enough that specific software can be designed to meet the requirements.
- There is a great need for tools that provide decision makers with information to make decisions quickly and reliably based on historical data.
- This functionality is achieved by data warehousing and online analytical processing (OLAP).

Introduction, Definitions, and Terminology
- W. H. Inmon characterized a data warehouse as: "A subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions."

Introduction, Definitions, and Terminology
- Data warehouses have the distinguishing characteristic that they are mainly intended for decision support applications.
- Traditional databases are transactional.
- Applications that a data warehouse supports include:
  - OLAP (Online Analytical Processing): a term used to describe the analysis of complex data from the data warehouse.
  - DSS (Decision Support Systems), also known as EIS (Executive Information Systems): support an organization's leading decision makers in making complex and important decisions.
  - Data mining: used for knowledge discovery, the process of searching data for unanticipated new knowledge.

Conceptual Structure of Data Warehouse
Data warehouse processing involves:
- Cleaning and reformatting of data
- OLAP
- Data mining
[Figure: conceptual structure of a data warehouse. Labels: Databases, Other Data Inputs, Updates/New Data, Cleaning, Reformatting, Back Flushing, Data Warehouse (Data, Metadata), OLAP, DSSI EIS, Data Mining.]

Comparison with Traditional Databases
- Data warehouses are mainly optimized for appropriate data access.
- Traditional databases are transactional and are optimized for both access mechanisms and integrity assurance measures.
- Data warehouses place more emphasis on historical data, as their main purpose is to support time-series and trend analysis.
- Compared with transactional databases, data warehouses are nonvolatile.
- In transactional databases, the transaction is the mechanism of change to the database. By contrast, information in a data warehouse is relatively coarse grained, and the refresh policy is carefully chosen, usually incremental.

Characteristics of Data Warehouses
- Multidimensional conceptual view
- Generic dimensionality
- Unlimited dimensions and aggregation levels
- Unrestricted cross-dimensional operations
- Dynamic sparse matrix handling
- Client-server architecture
- Multi-user support
- Accessibility
- Transparency
- Intuitive data manipulation
- Consistent reporting performance
- Flexible reporting

Classification of Data Warehouses
Generally, data warehouses are an order of magnitude larger than the source databases. The sheer volume of data is an issue, and based on it data warehouses can be classified as follows:
- Enterprise-wide data warehouses: huge projects requiring massive investment of time and resources.
- Virtual data warehouses: provide views of operational databases that are materialized for efficient access.
- Data marts: generally targeted to a subset of the organization, such as a department, and more tightly focused.

Data Modeling for Data Warehouses
- Traditional databases generally deal with two-dimensional data (similar to a spreadsheet).
- However, querying is much more efficient in a multidimensional data storage model.
- Data warehouses can take advantage of this because, in general:
  - they are nonvolatile;
  - the analysis that will be performed on them is highly predictable.

Data Modeling for Data Warehouses
Example of Two-Dimensional vs. Multi-Dimensional
[Figure: a two-dimensional model with products (P123, P124, P125, P126, ...) as rows and regions (REG1, REG2, REG3) as columns, next to a three-dimensional data cube with axes Product (P123, P124, P125, P126, ...), Fiscal Quarter (Qtr 1 to Qtr 4), and Region (Reg 1 to Reg 3).]
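The relationship between the two layouts above can be sketched in a few lines of Python. This is only an illustration, not the chapter's own representation: the sales figures, the dictionary-based cube, and the function name `collapse_quarters` are all assumptions. It stores the cube as a mapping from (product, region, quarter) to an amount and sums away the quarter axis to obtain the two-dimensional product-by-region table.

```python
from collections import defaultdict

# Hypothetical sales figures keyed by (product, region, fiscal quarter).
cube = {
    ("P123", "REG1", "Qtr1"): 10,
    ("P123", "REG1", "Qtr2"): 12,
    ("P123", "REG2", "Qtr1"): 7,
    ("P124", "REG1", "Qtr1"): 5,
    ("P124", "REG3", "Qtr4"): 9,
}


def collapse_quarters(cube):
    """Sum over the fiscal-quarter axis, leaving a product x region table."""
    table = defaultdict(int)
    for (product, region, _quarter), amount in cube.items():
        table[(product, region)] += amount
    return dict(table)


two_dimensional = collapse_quarters(cube)
print(two_dimensional[("P123", "REG1")])  # 22
```

Only non-empty cells are stored, which is one simple way to cope with a sparse cube.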

Data Modeling for Data Warehouses
Advantages of a multi-dimensional model:
- Multi-dimensional models lend themselves readily to hierarchical views in what is known as roll-up display and drill-down display.
- The data can be directly queried in any combination of dimensions, bypassing complex database queries.

Multi-dimensional Schemas
Multi-dimensional schemas are specified using:
- Dimension table: consists of tuples of attributes of the dimension.
- Fact table: each tuple is a recorded fact. The fact contains one or more measured or observed variables and identifies them with pointers to dimension tables. The fact table contains the data, and the dimensions identify each tuple in that data.

Multi-dimensional Schemas
Two common multi-dimensional schemas are:
- Star schema: consists of a fact table with a single table for each dimension.
- Snowflake schema: a variation of the star schema in which the dimension tables from a star schema are organized into a hierarchy by normalizing them.
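As a rough illustration of the two schemas, the sketch below builds both in an in-memory SQLite database through Python's standard `sqlite3` module. All table names, column names, and rows (`sales_fact`, `product_dim`, `category_dim`, and so on) are made up for the example. The star version keeps the product category inside the single product dimension table; the snowflake version normalizes it into its own table, one level up the hierarchy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star schema: one fact table, one denormalized table per dimension.
    CREATE TABLE product_dim (
        product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE region_dim (region_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales_fact (
        product_id INTEGER REFERENCES product_dim(product_id),
        region_id  INTEGER REFERENCES region_dim(region_id),
        quarter    TEXT,
        amount     REAL);
    INSERT INTO product_dim VALUES (1, 'P123', 'Tools'), (2, 'P124', 'Toys');
    INSERT INTO region_dim VALUES (1, 'REG1'), (2, 'REG2');
    INSERT INTO sales_fact VALUES
        (1, 1, 'Qtr1', 10.0), (1, 2, 'Qtr1', 7.0), (2, 1, 'Qtr2', 5.0);

    -- Snowflake variant: the product dimension is normalized, so the
    -- category moves into its own table.
    CREATE TABLE category_dim (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_snow (
        product_id  INTEGER PRIMARY KEY, name TEXT,
        category_id INTEGER REFERENCES category_dim(category_id));
    INSERT INTO category_dim VALUES (1, 'Tools'), (2, 'Toys');
    INSERT INTO product_snow VALUES (1, 'P123', 1), (2, 'P124', 2);
""")

# Star: one join from the fact table to the dimension table.
star_rows = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM sales_fact AS f
    JOIN product_dim AS p ON f.product_id = p.product_id
    GROUP BY p.category ORDER BY p.category
""").fetchall()

# Snowflake: the same question needs one extra join up the hierarchy.
snow_rows = conn.execute("""
    SELECT c.name, SUM(f.amount)
    FROM sales_fact AS f
    JOIN product_snow AS s ON f.product_id = s.product_id
    JOIN category_dim AS c ON s.category_id = c.category_id
    GROUP BY c.name ORDER BY c.name
""").fetchall()

print(star_rows)  # [('Tools', 17.0), ('Toys', 5.0)]
```

Both queries return the same totals; the trade-off shown is less redundancy in the snowflake dimension against an additional join at query time.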

Multi-dimensional Schemas
Fact Constellation
- A fact constellation is a set of fact tables that share some dimension tables.
- However, fact constellations limit the possible queries for the warehouse.
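A minimal sketch of a fact constellation, again using Python's standard `sqlite3` module with invented names (`sales_fact`, `forecast_fact`, `product_dim`) and invented figures: two fact tables share one product dimension, so actual and forecast amounts can be lined up through it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One shared dimension table.
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT);
    -- Two fact tables that both point at it.
    CREATE TABLE sales_fact    (product_id INTEGER, quarter TEXT, amount REAL);
    CREATE TABLE forecast_fact (product_id INTEGER, quarter TEXT, amount REAL);
    INSERT INTO product_dim VALUES (1, 'P123'), (2, 'P124');
    INSERT INTO sales_fact VALUES (1, 'Qtr1', 10.0), (2, 'Qtr1', 5.0);
    INSERT INTO forecast_fact VALUES (1, 'Qtr1', 12.0), (2, 'Qtr1', 4.0);
""")

# Compare actual and forecast amounts through the shared dimension.
comparison = conn.execute("""
    SELECT p.name, s.amount, f.amount
    FROM product_dim AS p
    JOIN sales_fact AS s ON s.product_id = p.product_id
    JOIN forecast_fact AS f
      ON f.product_id = p.product_id AND f.quarter = s.quarter
    ORDER BY p.name
""").fetchall()

print(comparison)  # [('P123', 10.0, 12.0), ('P124', 5.0, 4.0)]
```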

Multi-dimensional Schemas
Indexing
- Data warehouses also use indexing to support high-performance access.
- A technique called bitmap indexing constructs a bit vector for each value in the domain being indexed.
- Bitmap indexing works very well for domains of low cardinality.
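The idea of one bit vector per domain value can be sketched with plain Python integers standing in for the bit vectors. The column contents and the helper names `build_bitmap_index` and `rows_of` are assumptions made for this example; real systems typically use compressed bitmaps rather than this naive form.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = {}
    for row, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def rows_of(bitmap):
    """Decode a bit vector back into a list of row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Two low-cardinality columns of a hypothetical five-row fact table.
region = ["REG1", "REG2", "REG1", "REG3", "REG2"]
quarter = ["Qtr1", "Qtr1", "Qtr2", "Qtr1", "Qtr2"]
region_index = build_bitmap_index(region)
quarter_index = build_bitmap_index(quarter)

# region = 'REG1' AND quarter = 'Qtr1' becomes a single bitwise AND.
both = region_index["REG1"] & quarter_index["Qtr1"]
# region = 'REG1' OR region = 'REG2' becomes a bitwise OR.
either = region_index["REG1"] | region_index["REG2"]

print(rows_of(both))    # [0]
print(rows_of(either))  # [0, 1, 2, 4]
```

With few distinct values there are few bit vectors to keep, which is why the approach suits low-cardinality domains.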

Building A Data Warehouse
- The builders of a data warehouse should take a broad view of the anticipated use of the warehouse.
- The design should support ad-hoc querying.
- An appropriate schema should be chosen that reflects the anticipated usage.

Building A Data Warehouse
The design of a data warehouse involves the following steps:
- Acquisition of data for the warehouse.
- Ensuring that data storage meets the query requirements efficiently.
- Giving full consideration to the environment in which the data warehouse resides.

Building A Data Warehouse
Acquisition of data for the warehouse:
- The data must be extracted from multiple, heterogeneous sources.
- The data must be formatted for consistency within the warehouse.
- The data must be cleaned to ensure validity.
  - The cleaning process is difficult to automate.
  - Back flushing: upgrading the source data with the cleaned data.

Building A Data Warehouse
Acquisition of data for the warehouse (contd.):
- The data must be fitted into the data model of the warehouse.
- The data must be loaded into the warehouse.
- A proper design for the refresh policy should be considered.
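The acquisition steps (extract, format, clean, fit, load) can be sketched as a small pipeline. Everything concrete here is hypothetical: the two source layouts, the field names, and the validity rule (amounts must be non-negative) are assumptions chosen only to make the steps visible.

```python
from datetime import datetime

# Two hypothetical sources with different field names and date formats.
source_a = [
    {"prod": "P123", "sold": "2007-01-15", "amt": "10.5"},
    {"prod": "P124", "sold": "2007-02-01", "amt": "-3"},  # invalid amount
]
source_b = [{"product_id": "p125", "date": "15/03/2007", "amount": 7}]


def from_source_a(rec):
    """Format a source A record into the warehouse's common layout."""
    return {"product": rec["prod"].upper(),
            "date": datetime.strptime(rec["sold"], "%Y-%m-%d").date(),
            "amount": float(rec["amt"])}


def from_source_b(rec):
    """Format a source B record into the warehouse's common layout."""
    return {"product": rec["product_id"].upper(),
            "date": datetime.strptime(rec["date"], "%d/%m/%Y").date(),
            "amount": float(rec["amount"])}


def is_valid(row):
    """Assumed cleaning rule: a sale amount cannot be negative."""
    return row["amount"] >= 0


def acquire():
    """Extract and format every record, then split valid rows from rejects."""
    formatted = ([from_source_a(r) for r in source_a]
                 + [from_source_b(r) for r in source_b])
    clean = [row for row in formatted if is_valid(row)]
    rejected = [row for row in formatted if not is_valid(row)]
    return clean, rejected


warehouse, rejected = acquire()
print([row["product"] for row in warehouse])  # ['P123', 'P125']
```

Keeping the rejected rows, rather than silently dropping them, leaves room for the manual review that a hard-to-automate cleaning step usually needs.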

Building A Data Warehouse
Storing the data according to the data model of the warehouse:
- Creating and maintaining required data structures
- Creating and maintaining appropriate access paths
- Providing for time-variant data as new data are added
- Supporting the updating of warehouse data
- Refreshing the data
- Purging data
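Refreshing and purging can be illustrated with a short sketch. It assumes an incremental refresh that appends only the rows recorded since the previous refresh, and a purge that drops rows older than a cutoff date; the function names, the row layout, and the dates are all made up for the example.

```python
from datetime import date


def refresh(warehouse, source_rows, last_refresh):
    """Append only the source rows recorded after the previous refresh."""
    new_rows = [row for row in source_rows if row["date"] > last_refresh]
    warehouse.extend(new_rows)
    return len(new_rows)


def purge(warehouse, cutoff):
    """Drop rows older than the cutoff date, modifying the list in place."""
    warehouse[:] = [row for row in warehouse if row["date"] >= cutoff]


warehouse = [
    {"date": date(2005, 6, 1), "amount": 4.0},
    {"date": date(2007, 1, 15), "amount": 10.5},
]
source = [
    {"date": date(2007, 1, 15), "amount": 10.5},  # already loaded
    {"date": date(2007, 3, 15), "amount": 7.0},   # new since last refresh
]

added = refresh(warehouse, source, last_refresh=date(2007, 1, 31))
purge(warehouse, cutoff=date(2006, 1, 1))
print(added, len(warehouse))  # 1 2
```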

Building A Data Warehouse
- Usage projections
- The fit of the data model
- Characteristics of available resources
- Design of the metadata component
- Modular component design
- Design for manageability and change
- Considerations of distributed and parallel architecture
  - Distributed vs. federated warehouses

Functionality of a Data Warehouse
Functionality that can be expected:
- Roll-up: data is summarized with increasing generalization.
- Drill-down: increasing levels of detail are revealed.
- Pivot: cross tabulation is performed.
- Slice and dice: projection operations are performed on the dimensions.
- Sorting: data is sorted by ordinal value.
- Selection: data is available by value or range.
- Derived attributes: attributes are computed by operations on stored and derived values.
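Several of these operations reduce to grouping the same fact rows by different sets of dimensions. The sketch below is one way to show that; the fact rows, the `FIELDS` tuple, and the helpers `aggregate` and `select` are assumptions invented for the example, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, quarter, month, amount).
facts = [
    ("P123", "REG1", "Qtr1", "Jan", 4),
    ("P123", "REG1", "Qtr1", "Feb", 6),
    ("P123", "REG2", "Qtr2", "Apr", 7),
    ("P124", "REG1", "Qtr1", "Jan", 5),
]
FIELDS = ("product", "region", "quarter", "month")


def aggregate(rows, dims):
    """Sum amounts grouped by the named dimensions."""
    positions = [FIELDS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[p] for p in positions)] += row[-1]
    return dict(totals)


def select(rows, dim, value):
    """Selection: keep the rows where one dimension has the given value."""
    position = FIELDS.index(dim)
    return [row for row in rows if row[position] == value]


# Drill-down view: detail at the month level.
by_month = aggregate(facts, ["product", "quarter", "month"])
# Roll-up: the same data generalized from months to quarters.
by_quarter = aggregate(facts, ["product", "quarter"])
# Pivot: a product-by-region cross tabulation.
pivot = aggregate(facts, ["product", "region"])
# Selection by value on the region dimension.
reg1_only = select(facts, "region", "REG1")

print(by_quarter[("P123", "Qtr1")])  # 10
```

Moving between `by_month` and `by_quarter` is the roll-up and drill-down pair: the only difference is how far down the time hierarchy the grouping goes.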

Warehouse vs. Data Views
Views and data warehouses are alike in that both provide read-only extracts from the databases. However, data warehouses differ from views in the following ways:
- Data warehouses exist as persistent storage instead of being materialized on demand.
- Data warehouses are not usually relational, but rather multidimensional.
- Data warehouses can be indexed for optimization.
- Data warehouses provide specific support of functionality.
- Data warehouses deal with huge volumes of data that is generally contained in more than one database.
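The first and third differences can be demonstrated with Python's standard `sqlite3` module. The sketch is an illustration under assumptions: the table, view, and index names are invented, and a plain summary table stands in for the warehouse. The view is recomputed each time it is queried, while the persisted table keeps its stored contents until it is refreshed and can carry its own index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (region TEXT, amount REAL);
    INSERT INTO orders VALUES ('REG1', 10.0), ('REG2', 7.0);

    -- A view: evaluated on demand against the source table.
    CREATE VIEW region_totals_view AS
        SELECT region, SUM(amount) AS total FROM orders GROUP BY region;

    -- A persisted extract standing in for the warehouse, with an index.
    CREATE TABLE region_totals_wh AS
        SELECT region, SUM(amount) AS total FROM orders GROUP BY region;
    CREATE INDEX idx_wh_region ON region_totals_wh(region);
""")

# The source changes after the extract was taken.
conn.execute("INSERT INTO orders VALUES ('REG1', 5.0)")


def total(table):
    # Table names come from the fixed strings below, never from user input.
    query = f"SELECT total FROM {table} WHERE region = 'REG1'"
    return conn.execute(query).fetchone()[0]


view_total = total("region_totals_view")     # sees the new order
warehouse_total = total("region_totals_wh")  # unchanged until refreshed
print(view_total, warehouse_total)  # 15.0 10.0
```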

Difficulties of implementing Data Warehouses
- Lead time is huge in building a data warehouse: it can take years to build and efficiently maintain one.
- Both quality and consistency of data are major concerns.
- The usage projections must be revised regularly to meet current requirements.
- The data warehouse should be designed to accommodate addition and attrition of data sources without major redesign.
- Administration of a data warehouse requires far broader skills than are needed for a traditional database.

Open Issues in Data Warehousing
- Data cleaning, indexing, partitioning, and views could be given new attention from the perspective of data warehousing.
- Automation of:
  - data acquisition
  - data quality management
  - selection and construction of access paths and structures
  - self-maintainability
  - functionality and performance optimization
- Incorporating domain and business rules more intelligently into the warehouse creation and maintenance process.

Recap
- Purpose of Data Warehousing
- Introduction, Definitions, and Terminology
- Comparison with Traditional Databases
- Characteristics of Data Warehouses
- Classification of Data Warehouses
- Multi-dimensional Schemas
- Building A Data Warehouse
- Functionality of a Data Warehouse
- Warehouse vs. Data Views
- Implementation difficulties and open issues