Introduction to Data Warehousing. Ms Swapnil Shrivastava swapnil@konark.ncst.ernet.in



Similar documents
Overview of Data Warehousing and OLAP

DATA WAREHOUSING - OLAP

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

DATA WAREHOUSING AND OLAP TECHNOLOGY

Data Warehousing: Data Models and OLAP operations. By Kishore Jaladi

Part 22. Data Warehousing

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

Data Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina

Data W a Ware r house house and and OLAP Week 5 1

Week 13: Data Warehousing. Warehousing

OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

Database Applications. Advanced Querying. Transaction Processing. Transaction Processing. Data Warehouse. Decision Support. Transaction processing

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( 28

DATA CUBES E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 DATA CUBES

Data Warehousing and OLAP Technology for Knowledge Discovery

Data Warehousing and OLAP

CHAPTER 3. Data Warehouses and OLAP

Week 3 lecture slides

Turkish Journal of Engineering, Science and Technology

This tutorial will help computer science graduates to understand the basic-toadvanced concepts related to data warehousing.

Bussiness Intelligence and Data Warehouse. Tomas Bartos CIS 764, Kansas State University

Outline. Data Warehousing. What is a Warehouse? What is a Warehouse?

Enterprise Data Warehouse (EDW) UC Berkeley Peter Cava Manager Data Warehouse Services October 5, 2006

Module 1: Introduction to Data Warehousing and OLAP

IST722 Data Warehousing

A Critical Review of Data Warehouse

Monitoring Genebanks using Datamarts based in an Open Source Tool

Data warehousing. Han, J. and M. Kamber. Data Mining: Concepts and Techniques Morgan Kaufmann.

Data Warehousing & OLAP

Data Warehousing. Overview, Terminology, and Research Issues. Joachim Hammer. Joachim Hammer

Data Warehousing. Outline. From OLTP to the Data Warehouse. Overview of data warehousing Dimensional Modeling Online Analytical Processing

Data W a Ware r house house and and OLAP II Week 6 1

When to consider OLAP?

2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000

14. Data Warehousing & Data Mining

Data Warehousing Systems: Foundations and Architectures

Chapter 3, Data Warehouse and OLAP Operations

Data Warehouse: Introduction

Data Warehousing and Online Analytical Processing

Anwendersoftware Anwendungssoftwares a. Data-Warehouse-, Data-Mining- and OLAP-Technologies. Online Analytic Processing

DATA WAREHOUSING APPLICATIONS: AN ANALYTICAL TOOL FOR DECISION SUPPORT SYSTEM

CS2032 Data warehousing and Data Mining Unit II Page 1

OLAP Theory-English version

CHAPTER 4 Data Warehouse Architecture

Why Business Intelligence

Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Chapter 23, Part A

DATA WAREHOUSE CONCEPTS DATA WAREHOUSE DEFINITIONS

Data Warehousing, OLAP, and Data Mining


Data Mining for Knowledge Management. Data Warehouses

Data Warehouse. MIT-652 Data Mining Applications. Thimaporn Phetkaew. School of Informatics, Walailak University. MIT-652: DM 2: Data Warehouse 1

Data Warehousing and Data Mining

CHAPTER 4: BUSINESS ANALYTICS

Data Warehousing: A Technology Review and Update Vernon Hoffner, Ph.D., CCP EntreSoft Resouces, Inc.

A Comparative Study on Operational Database, Data Warehouse and Hadoop File System T.Jalaja 1, M.Shailaja 2

OLAP and Data Warehousing! Introduction!

Migrating a Discoverer System to Oracle Business Intelligence Enterprise Edition

BUILDING OLAP TOOLS OVER LARGE DATABASES

Data Warehousing and OLAP Technology

Unit -3. Learning Objective. Demand for Online analytical processing Major features and functions OLAP models and implementation considerations

Designing a Dimensional Model

LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES

CHAPTER 5: BUSINESS ANALYTICS

UNIT-3 OLAP in Data Warehouse

Lection 3-4 WAREHOUSING

VIJAY SHANKER PANDEY. Academic/Professional Qualification

BUSINESS ANALYTICS AND DATA VISUALIZATION. ITM-761 Business Intelligence ดร. สล ล บ ญพราหมณ

Business Intelligence, Analytics & Reporting: Glossary of Terms

Namrata 1, Dr. Saket Bihari Singh 2 Research scholar (PhD), Professor Computer Science, Magadh University, Gaya, Bihar

Data Warehousing OLAP

An Overview of Data Warehousing, Data mining, OLAP and OLTP Technologies

M Designing and Implementing OLAP Solutions Using Microsoft SQL Server Day Course

Data Warehousing. Paper

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence

Business Intelligence Solutions. Cognos BI 8. by Adis Terzić

The Role of Data Warehousing Concept for Improved Organizations Performance and Decision Making

Data Warehouse. The term Data Warehouse was coined by Bill Inmon in 1990, which he defined in the following way:

Presented by: Jose Chinchilla, MCITP

Dimensional Modeling for Data Warehouse

Overview. Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Data Warehousing. An Example: The Store (e.g.

Building Data Cubes and Mining Them. Jelena Jovanovic

Decision Support. Chapter 23. Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1

Lecture 2: Introduction to Business Intelligence. Introduction to Business Intelligence

A Technical Review on On-Line Analytical Processing (OLAP)

2 Data Warehouse and OLAP Technology for Data Mining What is a data warehouse? Amultidimensional data model... 6

PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions. A Technical Whitepaper from Sybase, Inc.

DATA MINING AND WAREHOUSING CONCEPTS

Fluency With Information Technology CSE100/IMT100

Transcription:

Introduction to Data Warehousing Ms Swapnil Shrivastava swapnil@konark.ncst.ernet.in

Necessity is the mother of invention Why Data Warehouse?

Scenario 1 ABC Pvt Ltd is a company with branches at Mumbai, Delhi, Chennai and Banglore. The Sales Manager wants quarterly sales report. Each branch has a separate operational system.

Scenario 1 : ABC Pvt Ltd. Mumbai Delhi Chennai Sales per item type per branch for first quarter. Sales Manager Banglore

Solution 1:ABC Pvt Ltd. Extract sales information from each database. Store the information in a common repository at a single site.

Solution 1:ABC Pvt Ltd. Mumbai Report Delhi Data Warehouse Query & Analysis tools Sales Manager Chennai Banglore

Scenario 2 One Stop Shopping Super Market has huge operational database.whenever Executives wants some report the OLTP system becomes slow and data entry operators have to wait for some time.

Scenario 2 : One Stop Shopping Data Entry Operator Report Wait Operational Database Management Data Entry Operator

Solution 2 Extract data needed for analysis from operational database. Store it in warehouse. Refresh warehouse at regular interval so that it contains up to date information for analysis. Warehouse will contain data with historical perspective.

Solution 2 Data Entry Operator Report Transaction Operational database Extract data Data Warehouse Manager Data Entry Operator

Scenario 3 Cakes & Cookies is a small,new company.president of the company wants his company should grow.he needs information so that he can make correct decisions.

Solution 3 Improve the quality of data before loading it into the warehouse. Perform data cleaning and transformation before loading the data. Use query analysis tools to support adhoc queries.

Solution 3 Expansio n Data Warehouse Query and Analysis tool sales President time Improvemen t

What is Data Warehouse??

Inmons s definition A data warehouse is -subject-oriented, -integrated, -time-variant, -nonvolatile collection of data in support of management s decision making process.

Subject-oriented Data warehouse is organized around subjects such as sales,product,customer. It focuses on modeling and analysis of data for decision makers. Excludes data not useful in decision support process.

Integration Data Warehouse is constructed by integrating multiple heterogeneous sources. Data Preprocessing are applied to ensure consistency. RDBMS Legacy System Data Warehouse Flat File Data Processing Data Transformation

Integration In terms of data. encoding structures. Measurement of attributes. physical attribute. of data remarks naming conventions. Data type format

Time-variant Provides information from historical perspective e.g. past 5-10 years Every key structure contains either implicitly or explicitly an element of time

Nonvolatile Data once recorded cannot be updated. Data warehouse requires two operations in data accessing Initial loading of data Access of data load access

Operational v/s Information System Features Characteristics Orientation User Function Data View DB design Unit of work Access Current Operational Operational processing Transaction Clerk,DBA,database professional Day to day operation Detailed,flat relational Application oriented Short,simple transaction Read/write Informational processing Analysis Knowledge workers Decision support Historical Information Summarized, multidimensional Subject oriented Complex query Mostly read

Operational v/s Information System Features Focus Number of records accessed Number of users DB size Priority Metric Operational Data in tens thousands 100MB to GB High performance,high availability Transaction throughput Information Information out millions hundreds 100 GB to TB High flexibility,enduser autonomy Query througput

Data Warehousing Architecture Monitoring & Administration OLAP Servers Metadata Repository Reconciled data Analysis External Sources Operational Dbs Extract Transform Load Refresh Serve Query/Reporting Data Mining DATA SOURCES TOOLS DATA MARTS

Data Warehouse Architecture Data Warehouse server almost always a relational DBMS,rarely flat files OLAP servers to support and operate on multi-dimensional data structures Clients Query and reporting tools Analysis tools Data mining tools

Data Warehouse Schema Star Schema Fact Constellation Schema Snowflake Schema

Star Schema A single,large and central fact table and one table for each dimension. Every fact points to one tuple in each of the dimensions and has additional attributes. Does not capture hierarchies directly.

Benefits: Easy to understand, easy to define hierarchies, reduces no. of physical joins. (.. contd ) Star Schema Store Dimension Store Key Store Name City State Region Fact Table Store Key Product Key Period Key Units Price Time Dimension Period Key Year Quarter Month Product Key Product Desc Product Dimension

SnowFlake Schema Variant of star schema model. A single,large and central fact table and one or more tables for each dimension. Dimension tables are normalized i.e. split dimension table data into additional tables

(.. contd ) SnowFlake Schema Store Dimension Store Key Store Name City Key City Dimension Fact Table Store Key Product Key Period Key Units Price Time Dimension Period Key Year Quarter Month City Key Product Key City Product Desc State Region Product Dimension Drawbacks: Time consuming joins,report generation slow

Fact Constellation Multiple fact tables share dimension tables. This schema is viewed as collection of stars hence called galaxy schema or fact constellation. Sophisticated application requires such schema.

(.. contd ) Fact Constellation Store Key Product Key Period Key Units Price Sales Fact Table Product Dimension Product Key Product Desc Store Dimension Shipping Fact Table Shipper Key Store Key Product Key Period Key Units Price Store Key Store Name City State Region

Building Data Warehouse Data Selection Data Preprocessing Fill missing values Remove inconsistency Data Transformation & Integration Data Loading Data in warehouse is stored in form of fact tables and dimension tables.

Case Study Afco Foods & Beverages is a new company which produces dairy,bread and meat products with production unit located at Baroda. There products are sold in North,North West and Western region of India. They have sales units at Mumbai, Pune, Ahemdabad,Delhi and Baroda. The President of the company wants sales information.

Sales Information Report: The number of units sold. 113 Report: The number of units sold over time January February March April 14 41 33 25

Sales Information Report : The number of items sold for each product with time Jan Feb Mar Apr Wheat Bread 6 17 Cheese 6 16 6 8 Time Swiss Rolls 8 25 21 Product

Sales Information Report: The number of items sold in each City for each product with time Mumbai Wheat Bread Jan Feb Mar 3 Apr 10 City Cheese 3 16 6 Swiss Rolls 4 16 6 Time Pune Wheat Bread 3 7 Cheese 3 8 Product Swiss Rolls 4 9 15

Sales Information Report: The number of items sold and income in each region for each product with time. Jan Feb Mar Apr Rs U Rs U Rs U Rs U Mumbai Wheat Bread 7.44 3 24.80 10 Cheese 7.95 3 42.40 16 15.90 6 Swiss Rolls 7.32 4 29.98 16 10.98 6 Pune Wheat Bread 7.44 3 17.36 7 Cheese 7.95 3 21.20 8 Swiss Rolls 7.32 4 16.47 9 27.45 15

Sales Measures & Dimensions Measure Units sold, Amount. Dimensions Product,Time,Region.

Sales Data Warehouse Model Fact Table City Product Month Units Rupees Mumbai Wheat Bread January 3 7.95 Mumbai Cheese January 4 7.32 Pune Wheat Bread January 3 7.95 Pune Cheese January 4 7.32 Mumbai Swiss Rolls February 16 42.40

Sales Data Warehouse Model City_ID Prod_ID Month Units Rupees 1 589 1/1/1998 3 7.95 1 1218 1/1/1998 4 7.32 2 589 1/1/1998 3 7.95 2 1218 1/1/1998 4 7.32 1 590 2/1/1998 16 42.40

Sales Data Warehouse Model Product Dimension Tables Prod_ID 589 590 288 Product_Name Wheat Bread Swiss Rolls Coconut Cookies Product_Category_ID 1 1 2 Product_Category_Id 1 2 Product_Category Bread Cookies

Sales Data Warehouse Model Region Dimension Table City_ID City Region Country 1 Mumbai West India 2 Pune NorthWest India

Sales Data Warehouse Model Time Sales Fact Product Product Category Region

( Processing(OLAP Online Analysis It enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user. Product Data Warehouse Region Time

OLAP Cube City Product Time Units Dollars All All All 113 251.26 Mumbai All All 64 146.07 Mumbai White Bread All 38 98.49 Mumbai Wheat Bread All 13 32.24 Mumbai Wheat Bread Qtr1 3 7.44 Mumbai Wheat Bread March 3 7.44

OLAP Operations Drill Down Product Category e.g Electrical Appliance Sub Category e.g Kitchen Region Product e.g Toaster Time

OLAP Operations Drill Up Product Category e.g Electrical Appliance Sub Category e.g Kitchen Region Product e.g Toaster Time

OLAP Operations Slice and Dice Product Product=Toaster Region Region Time Time

OLAP Operations Pivot Product Product Region Time Time Region

OLAP Server An OLAP Server is a high capacity,multi user data manipulation engine specifically designed to support and operate on multi-dimensional data structure. OLAP server available are MOLAP server ROLAP server HOLAP server

Presentation Product Region Time Reporting Tool Report

Data Warehousing includes Build Data Warehouse Online analysis processing(olap). Presentation. RDBMS Cleaning,Selection & Integration Presentation Flat File Warehouse & OLAP server Client

Need for Data Warehousing Industry has huge amount of operational data Knowledge worker wants to turn this data into useful information. This information is used by them to support strategic decision making.

(.. contd ) Need for Data Warehousing It is a platform for consolidated historical data for analysis. It stores data of good quality so that knowledge worker can make correct decisions.

(.. contd ) Need for Data Warehousing From business perspective -it is latest marketing weapon -helps to keep customers by learning more about their needs. -valuable tool in today s competitive fast evolving world.

Data Warehousing Tools Data Warehouse SQL Server 2000 DTS Oracle 8i Warehouse Builder OLAP tools SQL Server Analysis Services Oracle Express Server Reporting tools MS Excel Pivot Chart VB Applications

References Building Data Warehouse by Inmon Data Mining:Concepts and Techniques by Han,Kamber. www.dwinfocenter.org www.datawarehousingonline.com www.billinmon.com

Thank You