Data Warehousing: Overview, Terminology, and Research Issues. Joachim Hammer





Heterogeneous Database Integration
- Sources: World Wide Web, digital libraries, scientific databases, personal databases
- Integration system:
  - Collects and combines information
  - Provides integrated view, uniform user interface
  - Supports sharing

The Traditional Research Approach
- Query-driven (lazy, on-demand)
[Diagram: Clients query an Integration System (with Metadata), which reaches each Source through its own Wrapper]

Disadvantages of Query-Driven Approach
- Delay in query processing
  - Slow or unavailable information sources
  - Complex filtering and integration
- Inefficient and potentially expensive for frequent queries
- Competes with local processing at sources
- Hasn't caught on in industry

The Warehousing Approach
- Information integrated in advance
- Stored in warehouse for direct querying and analysis
[Diagram: Clients query the Data Warehouse; an Integration System (with Metadata) fills it from Extractor/Monitor components, one per Source]
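
The contrast between the query-driven (lazy) and warehousing (eager) approaches can be sketched in a few lines of Python. This is only an illustration: the two in-memory source lists, the `qty` field, and the names `query_driven_total` and `Warehouse` are assumptions made for the example, not part of the slides.

```python
# Two hypothetical information sources, modelled as plain lists of rows.
source_a = [{"item": "Computer", "qty": 2}]
source_b = [{"item": "Printer", "qty": 1}, {"item": "Computer", "qty": 3}]
SOURCES = [source_a, source_b]


def query_driven_total(item):
    """Lazy approach: visit every source each time a query arrives."""
    return sum(row["qty"] for src in SOURCES for row in src if row["item"] == item)


class Warehouse:
    """Eager approach: copy and integrate the data in advance."""

    def __init__(self):
        self.rows = []

    def load(self, sources):
        # Copy the rows so later queries never touch the sources.
        self.rows = [dict(row) for src in sources for row in src]

    def total(self, item):
        return sum(row["qty"] for row in self.rows if row["item"] == item)


warehouse = Warehouse()
warehouse.load(SOURCES)

# A later change at a source is visible to the lazy query immediately,
# but not to the warehouse until it is refreshed.
source_a.append({"item": "Computer", "qty": 10})
lazy_answer = query_driven_total("Computer")
warehouse_answer = warehouse.total("Computer")
```

Here the warehouse answers from its own copy without touching the sources, at the price of possibly stale data until the next refresh.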

Advantages of Warehousing Approach
- High query performance
  - But not necessarily most current information
- Doesn't interfere with local processing at sources
  - Complex queries at warehouse
  - OLTP at information sources
- Information copied at warehouse
  - Can modify, annotate, summarize, restructure, etc.
  - Can store historical information
  - Security, no auditing
- Has caught on in industry

Not Either-Or Decision
- Query-driven approach still better for:
  - Rapidly changing information
  - Rapidly changing information sources
  - Truly vast amounts of data from large numbers of sources
  - Clients with unpredictable needs

Data Warehousing: Two Distinct Issues
- (1) How to get information into warehouse: "Data warehousing"
- (2) What to do with data once it's in warehouse: "Warehouse DBMS"
- Terms coined by Jennifer Widom (WHIPS)
- Both rich research areas
- Industry has focused on (2)

What is a Warehouse?
- Stored collection of diverse data
  - A solution to data integration problem
  - Single repository of information
- Subject-oriented
  - Organized by subject, not by application
  - Used for analysis, data mining, etc.
- Optimized differently from transaction-oriented db
- User interface aimed at executive

What is a Warehouse? (2)
- Large volume of data (GB, TB)
- Non-volatile
  - Historical
  - Time attributes are important
- Updates infrequent
- May be append-only
- Examples:
  - All transactions ever at WalMart
  - Complete client histories at insurance firm
  - Stockbroker financial information and portfolios

Warehouse is a Specialized DB
Standard DB                                | Warehouse
Mostly updates                             | Mostly reads
Many small transactions                    | Queries are long and complex
MB-GB of data                              | GB-TB of data
Current snapshot                           | History
Index/hash on p.k.                         | Lots of scans
Raw data                                   | Summarized, consolidated data
Thousands of users (e.g., clerical users)  | Hundreds of users (e.g., decision-makers, analysts)

Warehousing and Industry
- Warehousing is big business
  - $2 billion in 1995
  - $3.5 billion in early 1997
  - Predicted: $8 billion in 1998 [Metagroup]
- WalMart has largest warehouse
  - 4 TB
  - 200-300 MB per day

Warehouse DBMS Buzzwords
- Used primarily for decision support (DSS)
  - A.K.A. On-Line Analytical Processing (OLAP)
  - Complex queries, substantial aggregation
  - TPC-D benchmark
- May support multidimensional database (MDDB)
  - A.K.A. Data Cube
  - View of relational data: all possible groupings and aggregations
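
To make "all possible groupings and aggregations" concrete, the following minimal sketch computes a data cube over a tiny in-memory table by summing a measure for every subset of the dimensions. The function name `data_cube`, the sample rows, the `amount` field, and the clerk "John" are assumptions for illustration; a real warehouse DBMS would compute this far more efficiently.

```python
from collections import defaultdict
from itertools import combinations


def data_cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims` (2**len(dims) groupings)."""
    cube = {}
    for size in range(len(dims) + 1):
        for group in combinations(dims, size):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in group)
                totals[key] += row[measure]
            cube[group] = dict(totals)
    return cube


# Hypothetical sales facts with two dimensions and one measure.
sales = [
    {"item": "Computer", "clerk": "Mary", "amount": 1000},
    {"item": "Computer", "clerk": "John", "amount": 1200},
    {"item": "Printer", "clerk": "Mary", "amount": 300},
]

cube = data_cube(sales, ("item", "clerk"), "amount")
grand_total = cube[()][()]        # aggregation over everything
by_item = cube[("item",)]         # grouped by item only
by_clerk = cube[("clerk",)]       # grouped by clerk only
```

With two dimensions the cube holds four groupings: the grand total, one per single dimension, and the full (item, clerk) detail.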

Warehouse DBMS Buzzwords (2)
- ROLAP vs. MOLAP
  - Special-purpose OLAP servers that directly implement multidimensional data and operations
- Roll-up = aggregate on some dimension
- Drill-down = deaggregate on some dimension
- ROLAP: Oracle, Sybase IQ, RedBrick
- MOLAP: Pilot, Essbase, Gentia
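
Roll-up and drill-down can be illustrated as moving between coarser and finer groupings of the same facts. The sketch below is an assumption-laden toy: the `region` and `month` dimensions, the sample rows, and the helper names are invented for the example, and drill-down is implemented by re-aggregating the detailed rows, which is only one of several ways a server might do it.

```python
from collections import defaultdict


def aggregate(rows, dims, measure):
    """Group rows by `dims` and sum `measure`."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


def roll_up(rows, dims, dim_to_remove, measure):
    """Aggregate away one dimension, giving a coarser view."""
    remaining = tuple(d for d in dims if d != dim_to_remove)
    return remaining, aggregate(rows, remaining, measure)


def drill_down(rows, dims, dim_to_add, measure):
    """Bring a dimension back, giving a finer view (needs the detail rows)."""
    finer = tuple(dims) + (dim_to_add,)
    return finer, aggregate(rows, finer, measure)


# Hypothetical facts.
facts = [
    {"region": "East", "month": "Jan", "amount": 10},
    {"region": "East", "month": "Feb", "amount": 20},
    {"region": "West", "month": "Jan", "amount": 5},
]

coarse_dims, coarse = roll_up(facts, ("region", "month"), "month", "amount")
fine_dims, fine = drill_down(facts, coarse_dims, "month", "amount")
```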

Warehouse DBMS Buzzwords (3)
- Clients:
  - Query and reporting tools
  - Analysis tools
  - Data mining: discovering patterns of various forms
- Poses many new research issues in:
  - Query processing and optimization
  - Database design
  - View management

Data Warehousing: Basic Architecture
[Diagram: Clients query the Data Warehouse; an Integration System (with Metadata) fills it from Extractor/Monitor components, one per Source]

Data Warehousing: Current Practice
- Warehouse is standard or specialized DBMS
- Everything is relational
- Extraction and integration are done in batch, usually off-line, often by hand
- Integrator never queries sources
- Everything is replicated at warehouse
- Generalizing introduces many research issues

Research Issues in Data Warehousing
- Extraction
- Integration
- Warehousing specification
- Optimizations
- Miscellaneous

Data Extraction
- Translation
  - Translate information to warehouse data model
  - Similar to wrapper in query-driven approach [Stanford TSIMMIS project]
- Change detection
  - For on-line data propagation and incremental warehouse refresh
  - Different classes of information sources: cooperative, logged, queryable, snapshot
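
For a snapshot source, the extractor sees only periodic dumps and has to infer the changes itself. The sketch below shows a naive record-oriented snapshot differential keyed on an assumed primary key; it is not one of the efficient algorithms the slides allude to, and the function name `snapshot_diff` and the sample Emp rows (including "John" and "Ann") are illustrative assumptions.

```python
def snapshot_diff(old, new, key):
    """Compare two snapshots of a relation and report the changes.

    Assumes `key` names a field that uniquely identifies each record.
    Returns (inserts, deletes, updates), where updates holds (before, after) pairs.
    """
    old_by_key = {rec[key]: rec for rec in old}
    new_by_key = {rec[key]: rec for rec in new}

    inserts = [rec for k, rec in new_by_key.items() if k not in old_by_key]
    deletes = [rec for k, rec in old_by_key.items() if k not in new_by_key]
    updates = [
        (old_by_key[k], rec)
        for k, rec in new_by_key.items()
        if k in old_by_key and old_by_key[k] != rec
    ]
    return inserts, deletes, updates


# Hypothetical snapshots of Emp(clerk, age) taken at two points in time.
yesterday = [{"clerk": "Mary", "age": 25}, {"clerk": "John", "age": 40}]
today = [{"clerk": "Mary", "age": 26}, {"clerk": "Ann", "age": 31}]

inserts, deletes, updates = snapshot_diff(yesterday, today, "clerk")
```

This version holds both snapshots in memory, which is exactly what becomes impractical at warehouse scale and motivates the more efficient algorithms.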

Data Extraction (2)
- Efficient change detection algorithms for snapshot sources
  - Record-oriented
  - Hierarchical data
- Data scrubbing (or cleansing)
  - Discard or correct erroneous data
  - Insert default values
  - Eliminate duplicates and inconsistencies
  - Aggregate, summarize, sample
- Implementing extractors
  - Specification-based or toolkit approach
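
Three of the scrubbing steps listed above (insert default values, discard erroneous data, eliminate duplicates) can be sketched as a single pass over the extracted records. The function name `scrub`, the `dept` field, the validity rule, and the sample rows are assumptions chosen for illustration; real cleansing rules are source-specific.

```python
def scrub(records, defaults, is_valid):
    """Fill in defaults, drop invalid records, and remove exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        # Insert default values for fields that are missing or None.
        present = {k: v for k, v in rec.items() if v is not None}
        filled = {**defaults, **present}
        # Discard records that fail the caller-supplied validity check.
        if not is_valid(filled):
            continue
        # Eliminate duplicates by comparing full record contents.
        fingerprint = tuple(sorted(filled.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(filled)
    return cleaned


# Hypothetical raw extract with a duplicate, a missing value, and a bad age.
raw = [
    {"clerk": "Mary", "age": 25, "dept": "Sales"},
    {"clerk": "Mary", "age": 25, "dept": "Sales"},
    {"clerk": "John", "age": 40, "dept": None},
    {"clerk": "Ann", "age": -3, "dept": "Sales"},
]

clean = scrub(raw, defaults={"dept": "unknown"}, is_valid=lambda r: r["age"] >= 0)
```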

Data Integration
- Warehouse data ≈ materialized view
  - Initial loading
  - View maintenance
- But: differs from conventional materialized view maintenance
  - Warehouse views may be highly aggregated
  - Warehouse views may be over history of base data
  - Schema may evolve
- Base data doesn't participate in view maintenance
  - Simply reports changes
  - Loosely coupled
  - Absence of locking, global transactions
  - May not be queryable

Warehouse Maintenance Anomalies
- Materialized view maintenance in a loosely coupled, non-transactional environment
- Simple example:
  - Data Warehouse: Sold(item, clerk, age), defined as Sold = Sale ⋈ Emp
  - Source (Sales Comp.): Sale(item, clerk), Emp(clerk, age)
  - An Integrator sits between the source and the warehouse

Warehouse Maintenance Anomalies (2)
- Data Warehouse: Sold(item, clerk, age); Source (Sales Comp.): Sale(item, clerk), Emp(clerk, age)
1. Insert into Emp (Mary, 25), notify integrator
2. Insert into Sale (Computer, Mary), notify integrator
3. (1) → integrator adds Sale ⋈ (Mary, 25)
4. (2) → integrator adds (Computer, Mary) ⋈ Emp
5. View incorrect (duplicate tuple): both joins are evaluated after both inserts have reached the source, so each one yields (Computer, Mary, 25)
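
The five steps can be replayed in a short simulation. This is a sketch under the assumption that the integrator answers each notification by joining the reported tuple against the source's current state; the helper names `join` and `run_anomaly` are invented for the example.

```python
def join(sale_rows, emp_rows):
    """Natural join of Sale(item, clerk) and Emp(clerk, age) on clerk."""
    return [
        (item, clerk, age)
        for (item, clerk) in sale_rows
        for (emp_clerk, age) in emp_rows
        if emp_clerk == clerk
    ]


def run_anomaly():
    sale, emp = [], []  # relations held at the source
    sold = []           # warehouse view, kept as a bag so duplicates are visible

    new_emp = ("Mary", 25)
    new_sale = ("Computer", "Mary")

    emp.append(new_emp)    # step 1: source inserts into Emp, notifies integrator
    sale.append(new_sale)  # step 2: source inserts into Sale, notifies integrator

    # Step 3: integrator handles notification (1) against the current Sale,
    # which already contains the tuple from step 2.
    sold += join(sale, [new_emp])
    # Step 4: integrator handles notification (2) against the current Emp.
    sold += join([new_sale], emp)

    # Step 5: compare the maintained view with a fresh recomputation.
    return sold, join(sale, emp)


sold, correct = run_anomaly()
```

The maintained view ends up with the tuple twice, while recomputing the join from scratch yields it once.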

Warehouse Self-Maintainability
- Goal: no queries back to sources
- Research issues:
  - What views are self-maintainable
  - Store auxiliary views so original + auxiliary views are self-maintainable
- Examples:
  - Keep keys
  - Inserts into Sale, maintain auxiliary view: Emp − π_{clerk,age}(Sold)
- Lots of issues...
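
One way to read the Sale/Emp example: when a Sale insert arrives, the clerk's age is either already present in Sold or available in the auxiliary view Emp − π_{clerk,age}(Sold), so no query back to the source is needed. The sketch below assumes that `clerk` is a key of Emp, that the view has set semantics, and that only inserts into Sale occur; the class and method names are hypothetical.

```python
class SelfMaintainingSold:
    """Maintains Sold = Sale JOIN Emp under Sale inserts without source queries."""

    def __init__(self, emp_rows):
        self.sold = set()               # view tuples (item, clerk, age)
        # Auxiliary view: employees not yet appearing in Sold (clerk -> age).
        self.aux_emp = dict(emp_rows)

    def _age_of(self, clerk):
        if clerk in self.aux_emp:
            return self.aux_emp[clerk]
        for (_item, sold_clerk, age) in self.sold:
            if sold_clerk == clerk:
                return age
        return None

    def insert_sale(self, item, clerk):
        age = self._age_of(clerk)
        if age is None:
            return  # no matching employee known, so the join adds nothing
        self.sold.add((item, clerk, age))
        # The clerk now appears in the projection of Sold, so drop it from aux.
        self.aux_emp.pop(clerk, None)


view = SelfMaintainingSold([("Mary", 25), ("John", 40)])
view.insert_sale("Computer", "Mary")  # age found in the auxiliary view
view.insert_sale("Printer", "Mary")   # age found in Sold itself
```

Handling inserts into Emp, deletions, and updates would need more bookkeeping, which is part of the "lots of issues" the slide mentions.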

Warehouse Specification
- Ideal scenario:
[Diagram: View definitions, change detection requirements, and integration rules feed a Warehousing Configuration Module; below it sit the Data Warehouse, the Integrator (with Metadata), and one Extractor per Info Source]

Optimizations
- Update filtering at extractor
- Multiple view optimization
  - If warehouse contains several views
  - Exploit shared sub-views
- Others?
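
Update filtering at the extractor means dropping source updates that cannot affect any warehouse view before they are shipped to the integrator. The sketch below assumes a hypothetical view that keeps only sales of computers; the update format, the predicate, and the function name `filter_updates` are illustrative assumptions.

```python
def filter_updates(updates, view_predicates):
    """Keep only updates whose row is relevant to at least one view."""
    return [
        update
        for update in updates
        if any(pred(update["row"]) for pred in view_predicates)
    ]


# Hypothetical selection condition of a warehouse view over Sale(item, clerk).
def computer_sales_only(row):
    return row["item"] == "Computer"


# Hypothetical stream of updates detected at the source.
detected = [
    {"op": "insert", "row": {"item": "Computer", "clerk": "Mary"}},
    {"op": "insert", "row": {"item": "Printer", "clerk": "Mary"}},
    {"op": "delete", "row": {"item": "Computer", "clerk": "John"}},
]

forwarded = filter_updates(detected, [computer_sales_only])
```

Only the two updates that touch computer sales are forwarded; the printer sale never leaves the extractor.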

Additional Research Issues
- Warehouse management
- Schema design
- Initial loading
- Metadata management