CHAPTER : 5. MODELS & HYBRID METHODS




5.1 Data Warehouse Modeling

Data warehouse modeling is the process of designing the schemas of the detailed and summarised data of the data warehouse. The aim of data warehouse modeling is to design a schema representing the reality, or at least a part of the reality, which the data warehouse is required to support.

Data warehouse modeling is an important stage of building a data warehouse for two main reasons. Firstly, through the schema, data warehouse users can visualise the relationships among the warehouse data, so as to use them with greater ease. Secondly, a well-designed schema allows an effective data warehouse architecture to emerge, which helps reduce the cost of implementing the warehouse and improves the efficiency of using it.

Data modeling in data warehouses is rather different from data modeling in operational database systems. The main function of data warehouses is to support DSS processes. Thus, the aim of data warehouse modeling is to make the data warehouse efficiently support complex queries on long-term information. In contrast, data modeling in operational database systems focuses on efficiently supporting simple transactions in the database, such as retrieving, inserting, deleting and changing data. Moreover, data warehouses are designed for users with general knowledge about the enterprise, whereas operational database systems are more oriented toward use by software specialists for creating specific applications.

Modeling warehouse data requires information about both the source data and the target warehouse data. The source data can be treated as inputs which are transformed into the target warehouse data. How this transformation happens must be reflected in data warehouse modeling.

Multidimensional data modeling is a commonly used technique to conceptualize and visualize schemas by using the major components of the business, such as customers, products, services, prices and sales. This data modeling technique is especially used for summarising and rearranging data and presenting views of the data to support DSS. In particular, multidimensional data modeling focuses on numeric data such as sales, counts, balances and costs.

In multidimensional data modeling, the data warehouse is designed to collect facts on one or more measures, each measure depending on a set of dimensions. For example, a sales measure may depend on three dimensions: products, times and locations. Facts are collections of related data items, which are stored within fact tables in the data warehouse. Dimensions are collections of the items of one component of the business, such as the products dimension, the times dimension and the locations dimension for sales. The items of a dimension are stored within a dimension table in the data warehouse.

The primary key of a fact table is a concatenation of the primary keys of one or more dimension tables. Thus, every row in the fact table is associated with one and only one row from each dimension table. Measures are the non-key attributes of fact tables, and they represent information relating to the dimension key attributes of the fact table.
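The sales example above can be made concrete with a small sketch. The following uses Python's built-in sqlite3 module; the table names, column names and sample rows are all hypothetical and chosen only for illustration. It shows a fact table whose primary key is the concatenation of three dimension keys, with a single measure (amount) that is then summarised by two dimension attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Three dimension tables and one fact table (all names are illustrative assumptions).
conn.executescript("""
CREATE TABLE products  (product_id  INTEGER PRIMARY KEY, model TEXT, category TEXT);
CREATE TABLE times     (time_id     INTEGER PRIMARY KEY, the_date TEXT, month TEXT);
CREATE TABLE locations (location_id INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE sales (
    product_id  INTEGER REFERENCES products(product_id),
    time_id     INTEGER REFERENCES times(time_id),
    location_id INTEGER REFERENCES locations(location_id),
    amount      REAL,                                  -- the measure (non-key attribute)
    PRIMARY KEY (product_id, time_id, location_id)     -- concatenated dimension keys
);
""")

conn.executemany("INSERT INTO products VALUES (?,?,?)",
                 [(1, "M100", "Phones"), (2, "T200", "Tablets")])
conn.executemany("INSERT INTO times VALUES (?,?,?)",
                 [(1, "2024-01-15", "2024-01"), (2, "2024-02-10", "2024-02")])
conn.executemany("INSERT INTO locations VALUES (?,?,?)",
                 [(1, "Paris", "France"), (2, "Lyon", "France")])
conn.executemany("INSERT INTO sales VALUES (?,?,?,?)",
                 [(1, 1, 1, 100.0), (1, 1, 2, 50.0), (2, 1, 1, 200.0), (1, 2, 1, 70.0)])

# Summarise the measure along two dimensions: product category and month.
totals = conn.execute("""
    SELECT p.category, t.month, SUM(s.amount)
    FROM sales s
    JOIN products p ON p.product_id = s.product_id
    JOIN times    t ON t.time_id    = s.time_id
    GROUP BY p.category, t.month
    ORDER BY p.category, t.month
""").fetchall()

for row in totals:
    print(row)
```

Because the fact table key is composite, a second row for the same product, time and location would be rejected, which matches the statement that each fact row maps to exactly one row in each dimension table.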

The non-key attributes of a dimension table may be organised as a dimension hierarchy. For example, the times dimension may consist of the dates, months and weeks attributes; the products dimension may consist of the category, model and producer attributes; and the locations dimension may consist of the city, region and country attributes.

There are two kinds of schemas used in multidimensional data modeling: star schemas and snowflake schemas. A star schema typically has one fact table and a set of smaller dimension tables. The links between the foreign keys in the fact table and the primary keys of the dimension tables can be visualized as a radial pattern with the fact table in the middle. The dimension tables may contain data redundancies. For example, in the dimension table Locations(LocationID, Address, City, Region, Country), the City and Region information may be stored repeatedly for locations in the same city. This kind of data redundancy incurs storage overheads and may lead to update anomalies and poor update performance.

If necessary, snowflake schemas can be used to avoid such data redundancies. A snowflake schema is the result of normalizing the dimensions of a star schema, in which there are links between primary keys and foreign keys of tables in the dimension hierarchy.

Fully normalizing the dimension tables may not be necessary in a data warehouse environment. There are generally no updates to individual rows in the dimension tables, although new rows may be added when the data warehouse is refreshed with new data, and existing rows may be deleted when the data warehouse is purged of out-of-date data. The issues of update anomalies and poor update performance will therefore generally not arise in the data warehouse.
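As a rough illustration of the difference, the sketch below takes the Locations(LocationID, Address, City, Region, Country) dimension from the text and normalizes it into a snowflake of three tables using Python's sqlite3 module. The split into cities and regions tables, the surrogate keys and the sample rows are assumptions made for this example, not part of the original design.

```python
import sqlite3

# Star-style dimension rows: city, region and country repeat for locations in the same city.
star_rows = [
    (1, "1 Main St", "Lyon", "Rhone", "France"),
    (2, "2 Side St", "Lyon", "Rhone", "France"),
    (3, "3 Quay Rd", "Nice", "Riviera", "France"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locations_star (
    location_id INTEGER PRIMARY KEY, address TEXT, city TEXT, region TEXT, country TEXT
);
-- Snowflake version: one table per level of the dimension hierarchy (assumed split).
CREATE TABLE regions (region_id INTEGER PRIMARY KEY, region TEXT, country TEXT);
CREATE TABLE cities  (city_id INTEGER PRIMARY KEY, city TEXT,
                      region_id INTEGER REFERENCES regions(region_id));
CREATE TABLE locations_snow (location_id INTEGER PRIMARY KEY, address TEXT,
                             city_id INTEGER REFERENCES cities(city_id));
""")
conn.executemany("INSERT INTO locations_star VALUES (?,?,?,?,?)", star_rows)

# Normalize: store each region and each city once, then link locations to cities.
conn.executescript("""
INSERT INTO regions (region, country)
    SELECT DISTINCT region, country FROM locations_star;
INSERT INTO cities (city, region_id)
    SELECT DISTINCT s.city, r.region_id
    FROM locations_star s
    JOIN regions r ON r.region = s.region AND r.country = s.country;
INSERT INTO locations_snow (location_id, address, city_id)
    SELECT s.location_id, s.address, c.city_id
    FROM locations_star s
    JOIN regions r ON r.region = s.region AND r.country = s.country
    JOIN cities  c ON c.city = s.city AND c.region_id = r.region_id;
""")

# Each city is now stored once instead of once per location.
city_count = conn.execute("SELECT COUNT(*) FROM cities").fetchone()[0]

# Reading the dimension back requires two extra joins compared with the star table.
rebuilt = conn.execute("""
    SELECT l.location_id, l.address, c.city, r.region, r.country
    FROM locations_snow l
    JOIN cities  c ON c.city_id = l.city_id
    JOIN regions r ON r.region_id = c.region_id
    ORDER BY l.location_id
""").fetchall()

print(city_count)
print(rebuilt == star_rows)
```

The extra joins needed to rebuild the flat rows are the query-time cost that the un-normalized star dimension avoids.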

In addition, the storage consumption of the data warehouse is dominated by the fact tables, and the space saved by normalizing the dimension tables would generally be comparatively small. Moreover, un-normalized dimension tables can reduce the time required to combine information in the fact table with dimension information, which is a main performance criterion of a data warehouse.

5.2 Types of models

Figure 5.1. Relational model
[Figure content: tables ACCOUNT_DIM (ACCOUNT ID, STATE CD, PLAN TYPE, DEAL PCT), POLICY (POLICY ID, TERRITORY, CAT CLASS, SCOPE IND), VEHICLE (VECHILE ID, VEHICLE T, ENGINE CC, COLOR), COVERAGE (COVERAGE, DEDUCTIBL, PERS LIMIT, ACCD LIMIT) and PREMIUM (PAY DATE, PREMIUM A).]

The hybrid model is based on the relational model, with two changes that derive from dimensional modeling practices: (1) create a relationship from the PREMIUM table to each table in the upper portion of the hierarchy, and (2) add the time dimension. The existing star, snowflake and galaxy schemas of the data warehouse are discussed, and a hybrid model is suggested.

Figure 5.2. Dimensional model
[Figure content: tables PREMIUM_FACT, COVERAGE, TIME_DIM, POLICY, VEHICLE and ACCOUNT_DIM.]

Figure 5.3. Hybrid model
[Figure content: tables ACCOUNT (ACCOUNT ID, STATE CD, PLAN TYPE, DEAL PCT), POLICY (POLICY ID, TERRITORY, CAT CLASS, SCOPE IND), VEHICLE (VECHILE ID, VEHICLE TYP, ENGINE CC, COLOR), COVERAGE (COVERAGE I, DEDUCTIBLE, PERS LIMIT, ACCD LIMIT), PREMIUM and TIME_DIM (YMD, THE YEAR, THE MONTH, THE DAY).]
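The two changes that turn the relational model into the hybrid model can be sketched as follows, again with Python's sqlite3 module. Several details are assumptions made for illustration only: the direction of the foreign keys in the ACCOUNT, POLICY, VEHICLE, COVERAGE hierarchy, the reduced column lists, the hypothetical premium_amt column name, and the use of PAY DATE as the link to TIME_DIM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized hierarchy (foreign-key direction is an assumption).
CREATE TABLE account  (account_id  INTEGER PRIMARY KEY, state_cd TEXT);
CREATE TABLE policy   (policy_id   INTEGER PRIMARY KEY, territory TEXT,
                       account_id  INTEGER REFERENCES account(account_id));
CREATE TABLE vehicle  (vehicle_id  INTEGER PRIMARY KEY, color TEXT,
                       policy_id   INTEGER REFERENCES policy(policy_id));
CREATE TABLE coverage (coverage_id INTEGER PRIMARY KEY, deductible REAL,
                       vehicle_id  INTEGER REFERENCES vehicle(vehicle_id));
-- Change (2): the added time dimension.
CREATE TABLE time_dim (ymd TEXT PRIMARY KEY, the_year INTEGER,
                       the_month INTEGER, the_day INTEGER);
-- Change (1): PREMIUM references every table above it, not only COVERAGE.
CREATE TABLE premium (
    coverage_id INTEGER REFERENCES coverage(coverage_id),
    vehicle_id  INTEGER REFERENCES vehicle(vehicle_id),
    policy_id   INTEGER REFERENCES policy(policy_id),
    account_id  INTEGER REFERENCES account(account_id),
    pay_date    TEXT    REFERENCES time_dim(ymd),
    premium_amt REAL    -- hypothetical name for the premium measure
);
INSERT INTO account  VALUES (1, 'NY'), (2, 'CA');
INSERT INTO policy   VALUES (10, 'East', 1), (20, 'West', 2);
INSERT INTO vehicle  VALUES (100, 'red', 10), (200, 'blue', 20);
INSERT INTO coverage VALUES (1000, 250.0, 100), (2000, 500.0, 200);
INSERT INTO time_dim VALUES ('2023-06-01', 2023, 6, 1), ('2024-01-15', 2024, 1, 15);
INSERT INTO premium  VALUES (1000, 100, 10, 1, '2023-06-01', 300.0),
                            (1000, 100, 10, 1, '2024-01-15', 320.0),
                            (2000, 200, 20, 2, '2024-01-15', 500.0);
""")

# Dimensional-style query: PREMIUM joins directly to ACCOUNT and TIME_DIM.
hybrid_totals = conn.execute("""
    SELECT a.state_cd, t.the_year, SUM(p.premium_amt)
    FROM premium p
    JOIN account  a ON a.account_id = p.account_id
    JOIN time_dim t ON t.ymd = p.pay_date
    GROUP BY a.state_cd, t.the_year
    ORDER BY a.state_cd, t.the_year
""").fetchall()

# Purely relational path: walk the whole hierarchy and derive the year from the date.
relational_totals = conn.execute("""
    SELECT a.state_cd, CAST(substr(p.pay_date, 1, 4) AS INTEGER) AS yr, SUM(p.premium_amt)
    FROM premium p
    JOIN coverage c  ON c.coverage_id = p.coverage_id
    JOIN vehicle  v  ON v.vehicle_id  = c.vehicle_id
    JOIN policy   po ON po.policy_id  = v.policy_id
    JOIN account  a  ON a.account_id  = po.account_id
    GROUP BY a.state_cd, yr
    ORDER BY a.state_cd, yr
""").fetchall()

print(hybrid_totals)
print(hybrid_totals == relational_totals)
```

Both queries return the same totals, but the hybrid path needs two joins where the relational path needs four, while the normalized hierarchy remains available for queries that need it.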

Figure 5.4. REAL model

Figure 5.5. Star model
[Figure content: an Economic Event fact table with the Economic Resource, Time Period (for roll-up), External Agent, Location (for roll-up) and Internal Agent dimensions.]

The revised model can be represented with a star or a snowflake schema. At the center of the star is the event table (the fact table), capturing the critical information about the event. Surrounding the event are the dimensions of resources, agents, location, and time period as they relate to the event, resulting in a star schema.

Figure 5.6. Sample snowflake schema

5.3 A Data Warehouse Hybrid Model

In the hybrid model, additional tables are included to provide information about time period and manufacturer. As in the REA and REAL models, the particular process being modeled influences which resources, events, agents, and locations are included and the number of tables used to represent each. Additional information changes the star model into a snowflake model. Control information can also be added with links from agents to economic unit, or agent information can be linked to resource information.

Data Warehouse Synchronism Model

Relational and dimensional modeling are often used separately, but they can be successfully incorporated into a single design when needed. Doing so starts with a normalized relational model and then adds dimensional constructs, primarily at the physical level. The result is a single model that provides the strengths of its parent models fairly well: it represents entities and relationships with the precision of the traditional relational model, and it processes dimensionally filtered, fact-aggregated queries with speed approaching that of the traditional dimensional model.

Real-world experience was the motivation for this analysis: on three separate data warehousing projects where I worked as programmer, architect, and manager, respectively, I found a consistent pattern of data and database behavior that lent itself far more to a hybrid combination of dimensional and relational modeling than to either one alone. This article discusses the hybrid design and provides a fully functional reference implementation.

Figure 5.7. Data Warehouse Synchronism Model

Dynamic Synchronism Model

- Dynamic synchronism: updates on the transactional environment are reflected immediately on the analytical environment
- DW loading is divided among various maintenance transactions
- Data portions are based on business niches
    o Solution for parallelization of static and dynamic loadings

Hybrid Synchronism Model

- Portions are synchronized in different time intervals
- Based on characteristics of:
    o Transaction/operational environment
    o Analytical applications environment

Hybrid tables

After further experience and discussion, it was realized that the functional tables could be hybridized to satisfy specific reporting needs and to provide a transitional bridge to the future.

Definition: a hybrid functional table is one that has data derived from disparate systems (normally legacy and PeopleSoft).

Hybrid tables can become transitional tables. With time, hybrid tables can become normal functional tables. (When the legacy data is no longer required, the columns cease to be filled or are removed.)

Legacy

Stdnt No.   Addr_type   Addr_line1
12345678    L           BVRIT, NSP
23456789    M           LBRCE, MYM
34567890    B           VSCE, KMM

Table 5.1

Converted PeopleSoft

Stdnt No.   Addr_type   PS_Addr_type   Addr_line1
12345678    L           LOC            BVRIT, NSP
23456789    M           Mail           LBRCE, MYM
34567890    B           BUS            VSCE, KMM

Table 5.2

- Table name (address) is unchanged
- Legacy code changes kept to a minimum

Figure 5.8. Hybrid table process
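The step from Table 5.1 to Table 5.2 can be sketched in a few lines of Python. The code mapping (L to LOC, M to Mail, B to BUS) is read off the two tables; the function name to_hybrid and the choice to leave PS_Addr_type empty for an unknown legacy code are assumptions made for this illustration.

```python
# Mapping from legacy address-type codes to PeopleSoft codes, as shown in Tables 5.1 and 5.2.
PS_ADDR_TYPE = {"L": "LOC", "M": "Mail", "B": "BUS"}

# Rows of the legacy address table: (Stdnt No., Addr_type, Addr_line1).
legacy_rows = [
    ("12345678", "L", "BVRIT, NSP"),
    ("23456789", "M", "LBRCE, MYM"),
    ("34567890", "B", "VSCE, KMM"),
]


def to_hybrid(rows, mapping):
    """Add a PS_Addr_type column next to the legacy Addr_type column.

    Unknown legacy codes get None (an assumption; the source does not say
    how unmapped codes are handled).
    """
    return [(stdnt_no, addr_type, mapping.get(addr_type), addr_line1)
            for stdnt_no, addr_type, addr_line1 in rows]


hybrid_rows = to_hybrid(legacy_rows, PS_ADDR_TYPE)

# Legacy code can keep reading its original three columns from the hybrid table.
legacy_view = [(stdnt_no, addr_type, addr_line1)
               for stdnt_no, addr_type, _ps_type, addr_line1 in hybrid_rows]

for row in hybrid_rows:
    print(row)
```

Keeping the legacy column alongside the new one is what lets the table name and most legacy code stay unchanged during the transition; once the legacy data is no longer required, the Addr_type column can be dropped.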

Table 5.3. Hybrid Table