Lection 3-4 WAREHOUSING



Similar documents
Fluency With Information Technology CSE100/IMT100

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( 28

Jagir Singh, Greeshma, P Singh University of Northern Virginia. Abstract

IT0457 Data Warehousing. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

DATA WAREHOUSING AND OLAP TECHNOLOGY

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

An Introduction to Data Warehousing. An organization manages information in two dominant forms: operational systems of

Data Warehousing Systems: Foundations and Architectures

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

Data Warehousing and Data Mining in Business Applications

IST722 Data Warehousing

Part 22. Data Warehousing

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

An Overview of Data Warehousing, Data mining, OLAP and OLTP Technologies

Turkish Journal of Engineering, Science and Technology

Data Warehouse Overview. Srini Rengarajan

資 料 倉 儲 (Data Warehousing)

Data Warehousing and OLAP Technology for Knowledge Discovery

Data W a Ware r house house and and OLAP Week 5 1

Enterprise Data Warehouse (EDW) UC Berkeley Peter Cava Manager Data Warehouse Services October 5, 2006

Data Warehousing. Jens Teubner, TU Dortmund Winter 2015/16. Jens Teubner Data Warehousing Winter 2015/16 1

The Role of Data Warehousing Concept for Improved Organizations Performance and Decision Making

14. Data Warehousing & Data Mining

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence

The Data Warehouse ETL Toolkit

B.Sc (Computer Science) Database Management Systems UNIT-V

THE DATA WAREHOUSE ETL TOOLKIT CDT803 Three Days

DATA WAREHOUSING APPLICATIONS: AN ANALYTICAL TOOL FOR DECISION SUPPORT SYSTEM

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Master Data Management and Data Warehousing. Zahra Mansoori

Data Warehouse Database Design Student Guide

Data Warehousing and Data Mining

Course DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Sizing Logical Data in a Data Warehouse A Consistent and Auditable Approach

A Survey on Data Warehouse Architecture

Chapter 5. Learning Objectives. DW Development and ETL

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

Chapter 5. Warehousing, Data Acquisition, Data. Visualization

Business Intelligence. 1. Introduction September, 2013.

Data Warehouse: Introduction

Integrating Netezza into your existing IT landscape

Overview of Data Warehousing and OLAP

Data Warehousing: A Technology Review and Update Vernon Hoffner, Ph.D., CCP EntreSoft Resouces, Inc.

Data Warehouse (DW) Maturity Assessment Questionnaire

Framework for Data warehouse architectural components

Data Warehousing & OLAP

A Critical Review of Data Warehouse

An Overview of Database management System, Data warehousing and Data Mining

Overview. DW Source Integration, Tools, and Architecture. End User Applications (EUA) EUA Concepts. DW Front End Tools. Source Integration

The Oracle Enterprise Data Warehouse (EDW)

Vertical Data Warehouse Solutions for Financial Services

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

Week 13: Data Warehousing. Warehousing

When to consider OLAP?

Introduction to Datawarehousing

PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions. A Technical Whitepaper from Sybase, Inc.

Extraction Transformation Loading ETL Get data out of sources and load into the DW

3.1 Data Warehouse Methodology. Literature Review on Data Warehouse

Databases in Organizations

Practical meta data solutions for the large data warehouse

Class News. Basic Elements of the Data Warehouse" 1/22/13. CSPP 53017: Data Warehousing Winter 2013" Lecture 2" Svetlozar Nestorov" "

ORACLE BUSINESS INTELLIGENCE SUITE ENTERPRISE EDITION PLUS

Master Data Management. Zahra Mansoori

Data-Warehouse-, Data-Mining- und OLAP-Technologien

INTERACTIVE DECISION SUPPORT SYSTEM BASED ON ANALYSIS AND SYNTHESIS OF DATA - DATA WAREHOUSE

ORACLE BUSINESS INTELLIGENCE SUITE ENTERPRISE EDITION PLUS

SAS BI Course Content; Introduction to DWH / BI Concepts

Hybrid Support Systems: a Business Intelligence Approach

Understanding Data Warehousing. [by Alex Kriegel]

DATA WAREHOUSE CONCEPTS DATA WAREHOUSE DEFINITIONS

Data warehouse and Business Intelligence Collateral

BENEFITS OF AUTOMATING DATA WAREHOUSING

Week 3 lecture slides

META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING

BUILDING OLAP TOOLS OVER LARGE DATABASES

DATA MINING AND WAREHOUSING CONCEPTS

MDM for the Enterprise: Complementing and extending your Active Data Warehousing strategy. Satish Krishnaswamy VP MDM Solutions - Teradata

Applied Business Intelligence. Iakovos Motakis, Ph.D. Director, DW & Decision Support Systems Intrasoft SA

BUSINESSOBJECTS DATA INTEGRATOR

Introduction to Data Warehousing. Ms Swapnil Shrivastava

Data Warehousing and Data Mining Introduction

Advanced Data Management Technologies

Dimensional Data Modeling for the Data Warehouse

A Model-based Software Architecture for XML Data and Metadata Integration in Data Warehouse Systems

Namrata 1, Dr. Saket Bihari Singh 2 Research scholar (PhD), Professor Computer Science, Magadh University, Gaya, Bihar

This tutorial will help computer science graduates to understand the basic-toadvanced concepts related to data warehousing.

Technology-Driven Demand and e- Customer Relationship Management e-crm

Mastering Data Warehouse Aggregates. Solutions for Star Schema Performance

Enterprise Solutions. Data Warehouse & Business Intelligence Chapter-8

Designing a Dimensional Model

Transcription:

Lection 3-4 DATA WAREHOUSING

Learning Objectives Understand d the basic definitions iti and concepts of data warehouses Understand data warehousing architectures Describe the processes used in developing and managing data warehouses Explain data warehousing operations Explain the role of data warehouses in decision support 2

Data Warehousing Definitions and Concepts Data warehouse A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format 3

Data Warehousing Definitions and Concepts Characteristics of data warehousing Subject oriented Integrated Time variant (time series) Nonvolatile Web based Relational/multidimensional Client/server Real-time Include metadata 4

What is Data Warehouse? A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management s decision-making process. W. H. Inmon, the father of the term data warehouse 5

What is Data Warehouse? Subject Oriented Integrated Data Warehouse Non Volatile Time Variant 6

DW: Subject-Oriented Data is categorized and stored by business subject rather than by application Equity Plans Savings Customer Financial Information Loans Shares Data Warehouse Subject Area Operational Systems Insurance 7

DW: Subject-Oriented Organized around major subjects, such as customer, product, sales. Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing. Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision i support process. 8

DW: Integrated Data on a given subject is defined and stored once Savings Application Current Accounts Application No Application Flavor Loans Application Subject = Customer Operational Environment Data Warehouse 9

DW: Integrated Constructed by integrating multiple, heterogeneous data sources relational databases, flat files, on-line transaction records One set of consistent, accurate, quality information Standardization Naming conventions Coding structures Data attributes Measures 10

DW: Integrated Data cleaning and data integration techniques are applied. Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources When data is moved to the warehouse, it is converted 11

DW: Time Variant Data is stored as a series of snapshots, each representing a period of time Time 01/97 02/97 Data Data for January Data for February 03/97 Data for March Data Warehouse 12

DW: Time Variant The time horizon for the data warehouse is significantly longer than that of operational systems Operational database: current value data Data warehouse data: provide information from a historical i perspective (e.g., past 5-10 years) Every key structure in the data warehouse Contains an element of time, explicitly or implicitly But the key of operational data may or may not contain 13 time element

DW: Non-Volatile Typically data in the data warehouse is not deleted Load Operational Databases Warehouse Database INSERT Read UPDATE DELETE Read 14

DW: Non-Volatile A physically separate store of data transformed from the operational environment Operational update of data does not occur in the data warehouse environment Does not require transaction processing, recovery, and concurrency control mechanisms Requires only two operations in data accessing: initial loading of data and access of data 15

Other definitions A decision-support database, which separately maintained from the operational database of the organization S. Chaudhuri, U. Dayal, VLDB 96 tutorial A database specifically modeled and fine-tuned for analysis and decision making A single, integrated store of corporate data Ab body of fdata, extracted t dfrom a variety of sources... a strategic collection of all types of data in support of the decision-making process at all levels of the enterprise Oracle datawarehouse 16

Other definitions A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available a ab to end users s in a way they can understand and use it in a business context. -- Barry Devlin, IBM Consultant 17

Data Warehousing Process Overview 18

Data Warehousing Definitions and Concepts Data mart A departmental data warehouse that stores only relevant data Dependent data mart A subset that is created directly from a data warehouse Independent data mart A small data warehouse designed for a strategic business unit or a department 19

Data Warehousing Definitions and Concepts Operational data stores (ODS) A type of database often used as an interim area for a data warehouse, especially for customer information files Oper marts An operational data mart. An oper mart is a small-scale scale data mart typically used by a single department or functional area in an organization 20

Data Warehousing Definitions and Concepts Enterprise data warehouse (EDW) A technology that provides a vehicle for pushing data from source systems into a data warehouse Metadata Data about data. In a data warehouse, metadata describe the contents of a data warehouse and the manner of its use 21

Data Warehousing Process Overview Organizations continuously collect data, information, and knowledge at an increasingly accelerated rate and store them in computerized ed systems s The number of users needing to access the information continues to increase as a result of improved reliability and availability of network access, especially the Internet 22

Data Warehousing Process Overview The major components of a data warehousing process Data sources Data extraction Data loading Comprehensive database Metadata Middleware tools 23

DW components (according to Kimball) 24

Operational Source Systems Each source system is a stovepipe application, little investment to sharing common data (e.g., product, customer) Reengineering with consistent view would be great Enterprise Application i Integration (EAI) effort will make the pass to DW more easy 25

Data staging area It is off-limits it to the business users Does not provide query and presentation services Normalization is not the end goal Normalized databases are excluded from the presentation area, so no need to normalize in data staging area 26

Data presentation Where the data are organized, stored, made available for querying This is the data warehouse for business community (remember, they can t see data staging area) Series of integrated data marts A data mart presents the data from a single business process Business processes cross the boundaries of organizational functions 27

Data presentation Data must be stored and accessed in dimensional schemas No normalization (3NF) should be used Dimensional schemas are simple and intuitive for business users; Normalized schemas are difficult to grasp by them Data must be atomic (at lower granularity) Not only summarized they don t allow for arbitrary, complex queries Data marts must be build on dimensions and facts that are conformed Otherwise, data marts are stovepipes Conformation leads to bus architecture data marts can cooperate 28

Data access tools Ad hoc, complex queries are targeted t to small percentage of business users 80-90% of the potential users will be served by canned applications Canned: pre-build parameter-driven analytic applications 29

Again on Metadata and ODS Metadata t repository Management of metadata Operational Data Store (ODS) Management of almost real-time data 30

Metadata Repository Meta data is the data defining warehouse objects. It has the following kinds Description of the structure of the warehouse schema, view, dimensions, hierarchies, derived data defn, data mart locations and contents Operational meta-data data lineage (history of migrated data and transformation path), currency of data (active, archived, or purged), monitoring information (warehouse usage statistics, error reports, audit trails) The algorithms used for summarization (measures, gran, etc) The mapping from operational environment to the data warehouse Data related to system performance warehouse schema, view and derived data definitions Business data business terms and definitions, ownership of data, charging policies 31

Importance of metadata repository Ultimate goal of metadata repositories: to corral, catalog, integrate, and leverage the disparate varieties of metadata t (like the resources of a library) The task looms, but we can t ignore it Need to develop an overall metadata plan 32

Standards for metadata Metadata Coalition proposed: MetaData Interchange Specification (MDIS) Open Information Model (OIM) OMG Common Warehouse Model (CWM) Microsoft Repository (in the Office suit) Includes some simple Information Models for data warehousing 33

Operational Data Store (ODS) What is it? Do we need it? If yes, when we need it? How is it usually implemented? 34

What is ODS? There is no single definition for the ODS, if and where to belong to a DW ODS is a frequently updated, somewhat integrated copies of operational data It stands between operational databases and the rest of DW The frequency of update and degree varies on application requirements Operation al Data ODS Warehouse 35

When we need an ODS? ODS is implemented to deliver operational reporting, especially when the OLTP operational databases do not provide any reporting capabilities The reports address the organization s tactical requirements, especially those that need the most current information No aggregation g abilities, much simpler than the OLAP analysis 36

How is ODS usually implemented? In CRM (Customer Relationship Management), especially in the case of e- commerce, need near-real-time data In such cases, they are implemented within the DW ODS then feeds the DW Alternatively, ODS is a third, physically separated system Conclusion: Get an ODS when OLTP operational databases and the DW cannot answer immediate operational questions (if you need them) 37

Data Warehousing Architectures Three parts of the data warehouse The data warehouse that contains the data and associated software Data acquisition (back-end) software that extracts data from legacy systems and external sources, consolidates and summarizes them, and loads them into the data warehouse Client (front-end) software that allows users to access and analyze data from the warehouse 38

Data Warehousing Process Overview 39

Data Warehousing Process Overview 40

Data Warehousing Process Overview 41

Data Warehousing Architectures Issues to consider when deciding which h architecture to use: Which database management system (DBMS) should be used? Will parallel processing and/or partitioning be used? Will data migration tools be used to load the data warehouse? What tools will be used to support data retrieval and analysis? 42

Data Warehousing Process Overview 43

Data Warehousing Process Overview 44

Case study from Teradata http://searchdatamanagement.techtarget. hd t t t t com/news/article/0,289142,sid91_gci113 7386,00.html 45

Distributed DW 46

Data Warehousing Architectures Ten factors that potentially affect the architecture selection decision: 5. Constraints on resources 6. Strategic view of the data warehouse prior to implementation 7. Compatibility with existing systems 8. Perceived ability of the in- house IT staff 9. Technical issues 10. Social/political factors 1. Information interdependence between organizational units 2. Upper management s information needs 3. Urgency of need for a data warehouse 4. Nature of end-user tasks 47

Controversial issues http://en.wikipedia.org/wiki/data_warehou / iki/d t h se 48