CHAPTER 3. Data Warehouses and OLAP



Similar documents
CHAPTER 4 Data Warehouse Architecture

Data W a Ware r house house and and OLAP Week 5 1

DATA WAREHOUSING AND OLAP TECHNOLOGY

2 Data Warehouse and OLAP Technology for Data Mining What is a data warehouse? Amultidimensional data model... 6

CHAPTER-24 Mining Spatial Databases

Introduction to Data Warehousing. Ms Swapnil Shrivastava

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( 28

Part 22. Data Warehousing

Data Warehousing and Online Analytical Processing

New Approach of Computing Data Cubes in Data Warehousing

Basics of Dimensional Modeling

Data Warehousing and OLAP Technology for Knowledge Discovery

Data Warehousing Systems: Foundations and Architectures

CHAPTER-12. Analytical Characterization : Analysis of Attribute Relevance

14. Data Warehousing & Data Mining

An Overview of Data Warehousing, Data mining, OLAP and OLTP Technologies

DATA WAREHOUSING - OLAP

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

A Critical Review of Data Warehouse

Week 3 lecture slides

Dimensional Modeling for Data Warehouse

Chapter 3, Data Warehouse and OLAP Operations

Overview of Data Warehousing and OLAP

IST722 Data Warehousing

Database Applications. Advanced Querying. Transaction Processing. Transaction Processing. Data Warehouse. Decision Support. Transaction processing

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

Sizing Logical Data in a Data Warehouse A Consistent and Auditable Approach

Data Warehousing and Data Mining

An Introduction to Data Warehousing. An organization manages information in two dominant forms: operational systems of

DATA CUBES E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 DATA CUBES

This tutorial will help computer science graduates to understand the basic-toadvanced concepts related to data warehousing.

Module 1: Introduction to Data Warehousing and OLAP

Republic Polytechnic School of Information and Communications Technology C355 Business Intelligence. Module Curriculum

Implementing Data Models and Reports with Microsoft SQL Server 20466C; 5 Days

DATA WAREHOUSING APPLICATIONS: AN ANALYTICAL TOOL FOR DECISION SUPPORT SYSTEM

Turkish Journal of Engineering, Science and Technology

Data warehouses. Data Mining. Abraham Otero. Data Mining. Agenda

What is Management Reporting from a Data Warehouse and What Does It Have to Do with Institutional Research?

Data Warehouse Snowflake Design and Performance Considerations in Business Analytics

Microsoft Implementing Data Models and Reports with Microsoft SQL Server

Course DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Fluency With Information Technology CSE100/IMT100

LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES

B.Sc (Computer Science) Database Management Systems UNIT-V

Building Data Warehousing and Data Mining from Course Management Systems: A Case Study of FUTA Course Management Information Systems

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Why Business Intelligence

Designing a Dimensional Model

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

Data Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina

Lection 3-4 WAREHOUSING

The Role of Data Warehousing Concept for Improved Organizations Performance and Decision Making

DATA WAREHOUSE CONCEPTS DATA WAREHOUSE DEFINITIONS

University of Gaziantep, Department of Business Administration

Understanding Data Warehousing. [by Alex Kriegel]

PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions. A Technical Whitepaper from Sybase, Inc.

Student Performance Analytics using Data Warehouse in E-Governance System

Implementing Data Models and Reports with Microsoft SQL Server

CASE PROJECTS IN DATA WAREHOUSING AND DATA MINING

Data Warehousing & OLAP

Data Warehousing and OLAP Technology

When to consider OLAP?

A Design and implementation of a data warehouse for research administration universities

Course Design Document. IS417: Data Warehousing and Business Analytics

Methodology Framework for Analysis and Design of Business Intelligence Systems

LEARNING SOLUTIONS website milner.com/learning phone

Presented by: Jose Chinchilla, MCITP

Data warehousing. Han, J. and M. Kamber. Data Mining: Concepts and Techniques Morgan Kaufmann.

OLAP. Business Intelligence OLAP definition & application Multidimensional data representation

THE TECHNOLOGY OF USING A DATA WAREHOUSE TO SUPPORT DECISION-MAKING IN HEALTH CARE

Data Warehouse. MIT-652 Data Mining Applications. Thimaporn Phetkaew. School of Informatics, Walailak University. MIT-652: DM 2: Data Warehouse 1

Data Warehouse: Introduction

Data Warehousing and Data Mining in Business Applications


Speeding ETL Processing in Data Warehouses White Paper

Chapter 5. Warehousing, Data Acquisition, Data. Visualization

BUILDING DATA WAREHOUSING AND DATA MINING FROM COURSE MANAGEMENT SYSTEMS: A

Namrata 1, Dr. Saket Bihari Singh 2 Research scholar (PhD), Professor Computer Science, Magadh University, Gaya, Bihar

Data Warehousing Concepts

The Design and the Implementation of an HEALTH CARE STATISTICS DATA WAREHOUSE Dr. Sreèko Natek, assistant professor, Nova Vizija,

Review. Data Warehousing. Today. Star schema. Star join indexes. Dimension hierarchies

Concepts of Database Management Seventh Edition. Chapter 9 Database Management Approaches

How To Model Data For Business Intelligence (Bi)

Data Mining for Knowledge Management. Data Warehouses

Data warehousing and data mining an overview

Business Intelligence. 1. Introduction September, 2013.

Business Intelligence for SUPRA. WHITE PAPER Cincom In-depth Analysis and Review

Dimodelo Solutions Data Warehousing and Business Intelligence Concepts

MIS636 AWS Data Warehousing and Business Intelligence Course Syllabus

Advanced Data Management Technologies

Transcription:

CHAPTER 3 Data Warehouses and OLAP 3.1 Data Warehouse 3.2 Differences between Operational Systems and Data Warehouses 3.3 A Multidimensional Data Model 3.4Stars, snowflakes and Fact Constellations: 3.5 Review Questions 3.6 References

3.Data Warehouses and OLAP 3.1 Data Warehouse A data warehouse provides tools for executives and business managers to systematically organize, understand and use their data to make strategic decisions. It is a must have latest marketing weapon and a way to keep customers, by learning more about their needs. Many possible definitions are there for a data warehouse. A data warehouse is a copy of transaction data specifically structured for query and analysis. Sometimes non-transaction data are stored in a data warehouse-through probably 95-99% of the data usually are transaction data. It is query and analysis because the main output from data warehouse systems are either tabular listings (queries) with minimal formatting or highly formatted formal reports.w.h.inmon a leading architect in the construction of data warehouse systems defines it to be A data ware house is a subject-oriented, integrated and time variant volatile data in collection of data in support of management s decision making process. Subject oriented The data warehouse is organized around major subjects such as a customer, product, supplier, sales etc.data warehousing focuses on modeling and analysis of data for decision-makers. Integrated A data warehouse is made by collecting heterogeneous collection of data, which is, integrated e.g. flat files, relational databases etc.

Time variant Data is stored from a historical perspective (over the last ten years, twenty years etc.). Every key structure in the database has an element of time embedded within it. Non volatile It is data stored that is physically separate from the optional database-it requires only two operations loading and access to data unlike a transaction processing system which requires concurrently control, processing and recovery mechanisms. 3.2 Differences between Operational Systems and Data Warehouses The main feature of an online transaction processing system is its ability to perform transform and query processing. The systems are usually known as On Line Transaction Processing (OLTP) systems. They cover day to day operational and transactional data. Operational databases, historic, support large volumes of data (databases of size above 100 GB). Data warehouse on the other hand serve users or knowledge workers in the role of data analysis and decision making. These systems are known as Online Analytical Processing System (OLAP). This major distinguishing feature between OLAP & OLTP. User and system orientation: Clerks, clients and information technology professionals use OLTP systems and it is customer_ oriented whereas the OLAP is market-oriented used by knowledge workers, analysis and managers. Data contents: An OLTP system manages current data and is not used for decision making purposes. An OLAP manages large amounts of historical data with facilities for Summerton and aggregation. Database design: An OLTP system usually adopts an Entity-relationship model and an application oriented database design. An OLAP system uses a star or a snowflake model.

View: An OLTP system rustics itself to data available within a department or an organization whereas OLAP spans versions of database schemes and it makes use of information that generated from organizations integrating information from many data stores. Access Patterns: Short, atomic transactions are made on OLTP systems and OLAP systems deal with read only operations and complex queries on historical data. 3.3 A Multidimensional Data Model: The model views data in the form of a data cube. OLAP tools are based on multidimensional data model. Data cubes usually model n-dimensional data. From Tables Spreadsheets to Data Cubes A data cube allows data to be modeled and viewed in multiple dimensions. Dimensions are facts that define a data cube. Dimensions are the perspective or entities with respect to which organizations would like to keep records. For example National Bank may create a customer warehouse in order to keep records of the bank s customers with respect to the dimension time, transaction, branch and location. These dimensions allow the bank to keep track of things like monthly transactions, branches and locations where the transactions were made. Each dimension may have a table associated with it, called the dimension table. For example the dimension tables for a transaction might include amount, type of transaction etc. A multidimensional data model is typically organized around a central; theme like transactions. A fact table represents this theme where facts are numerical measures. Facts are usually quantities, which are used for analyzing relationship between dimensions. The fact table contains then names of facts, or measures, as well as keys related dimensions. Although we hand to visualize data cubes three-dimensional geometric structures in the data warehouse the data cube inn n-dimensional.

To gain a better understanding of data cubes let us look at a simple 2-D data cube: a spreadsheet from a ABC company. In particular we would like to look at the ABC Company sales data for items sold per quarter in the city of Hyderabad. These data are shown. 3.1 Table 3.1 Location= Hyderabad Item(type) Time(quarter) Home entertainment Computer Phone Security Q1 605 825 14 400 Q2 680 952 31 512 Q3 812 1023 35 502 Q4 997 1089 48 630 Table 3.2 Location = Chennai location= Bangalore location= Calcutta Item Item Item time Home ent. Comp. Phone sec. Home Ent. Comp. phone sec. Home Ent. Comp. Phone sec. Q1 845 828 90 623 1027 986 34 978 678 825 13 430 Q2 943 870 5 6 645 1130 1024 45 940 645 765 25 421 Q3 768 890 45 6 57 1002 940 54 945 789 876 54 543

Now suppose we would like to view the sales data with a third dimension. For instance, according to time, item as well as location for the cities Calcutta, Bangalore and Chennai. This 3-D data is shown in Table 3.2. Conceptually, we may also represent the same data in the form of a 3-D data cube. Suppose that we would like to view our sales data with the additional fourth dimension, such as supplier. Viewing these 4-D cubes becomes tricky. However, we can think of 4-D cube as being a series of 3-D cubes. In data warehousing literature, the data cube such as of the above is referred to as a cuboids. Given a set of dimensions we can construct a lattice of cuboids, each showing data at a different level of summarization, or group by. The lattice of cuboids is then to as a data cube. 3.4Stars, snowflakes and Fact Constellations: Schemas for Multidimensional Databases Unlike an entity-relationship model used for relational databases a data warehouse requires a concise subject oriented schema that facilities on-line data analysis. The most commonly used data model for a data warehouse is a multidimensional model. Such a model can exist in the form of a star schema, a snowflake schema or a fact constellation schema. Star Schema: The most common modeling paradigm is the star schema, in which the data warehouse contains (1) a large central table (fact table) containing the bulk of data with no redundancy, and (2) a set of smaller attunement tables (dimension tables), one for each dimension.

Snowflake: The snowflake schema is the variant of the star schema model, where some of the dimension tables are normalized, thereby splitting the table into additional tables. The resulting schema graph forms a shape similar or a snowflake. Fact Constellation: Sophisticated applications may require multiple fact tables to share dimension tables. This kind of schema can be viewed as a collection of stars. This kind of schema can be viewed as a collection of stars, and hence is called as a galaxy schema or a fact constellation. Example for defining Star, Snowflake and Fact Constellation Schema Just as we use relational query languages like SQL, a data miming query language can be used to query a data-mining task DMQL, whi9ch contains language primitives for defining data warehouse and data marts. Data warehouse and data marts can be defined using two language primitives, one for cube definition and another for dimension definition. The cube definition has the following syntax: Define cube <cube_name> [(dimensional list)]:<measure list> The dimension definition has the following syntax: Define dimension<dimension_name> as (<attribute or sub-dimension list>)

3.5 Review questions 1 Expalin about Data Warehouse 2 list Differences between Operational Systems and Data Warehouses 3 Expalin abouta Multidimensional Data Model 4 Discus about Stars, snowflakes and Fact Constellations: 3.6 References [1]. Data Mining Techniques, Arun k pujari 1 st Edition [2].Data warehousung,data Mining and OLAP, Alex Berson,smith.j. Stephen [3].Data Mining Concepts and Techniques,Jiawei Han and MichelineKamber [4]Data Mining Introductory and Advanced topics, Margaret H Dunham PEA [5] The Data Warehouse lifecycle toolkit, Ralph Kimball Wiley student Edition