Data Warehousing. Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.




Data Warehousing & Mining Techniques
Wolf-Tilo Balke, Silviu Homoceanu
Institut für Informationssysteme, Technische Universität Braunschweig
http://www.ifis.cs.tu-bs.de

2. Architecture
2.1 Basics
2.2 Storage structures
2.3 Tier architectures
2.4 Distributed DW
2.5 Middleware

Data Warehousing & OLAP, Wolf-Tilo Balke, Institut für Informationssysteme, TU Braunschweig

2.1 Basics
Architecture of a DW
- Data is stored in a predefined database
- As in OLTP, the database is maintained by a DBMS
- The usual database functionality is ensured: storage, update, delete, locate
[Figure: source systems and individually, departmentally, and organizationally structured data]

2.1 Basics: Databases & DBMS
[Figure: an application sends the query SELECT id FROM revenues WHERE val > 50000 to the DBMS, which reads the data from disks 1 to N]
Example table revenues:
ID   VAL
1    37 000
2    67 000
3    45 000
...
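The interaction on this slide can be reproduced with Python's built-in sqlite3 module, which here stands in for the DBMS. The table and column names follow the slide; using SQLite (rather than a server DBMS with several disks) is an assumption made only for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the DBMS on the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenues (id INTEGER PRIMARY KEY, val INTEGER)")
conn.executemany(
    "INSERT INTO revenues (id, val) VALUES (?, ?)",
    [(1, 37000), (2, 67000), (3, 45000)],  # sample rows from the slide
)

# The application only states *what* it wants; the DBMS decides how to
# locate the matching records on disk.
rows = conn.execute("SELECT id FROM revenues WHERE val > 50000").fetchall()
print(rows)  # [(2,)]
```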

2.1 Basics: Indexing
A DW is characterized by:
- A large volume of information
- Mostly read access: outside the ETL phase, data is rarely updated or deleted
These characteristics make indexes a must-have in a DW, so let's recall how indexes work

2.1 Basics: Indexing
Indexes are additional data structures that help locate records in a DB
- Creating indexes is part of the DB administrators' physical tuning task
- Indexes can influence where a record is actually stored: sequentially, or via a hash function
- If the storage location is determined by the index, not all attributes can be indexed this way, because a file can be physically organized by only one of them (primary vs. secondary indexes)

2.1 Basics: Indexing
Indexes are useful for speeding up access to the data
- They are ordered by the indexing field (search key); the search key is the attribute used to look up records in a file
- An index file consists of index entries, i.e., records of the form (search key, location)

2.1 Primary Index
Primary indexes
- Order the data by some unique attribute as the indexing field (primary key) and store the database records in this order
- An index record contains a pointer to the respective storage location
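Because the records are stored in primary-key order, a primary index does not need an entry for every record. The sketch below assumes the common sparse variant with one (search key, block) entry per block, anchored at the block's first key; the block size and records are hypothetical. A lookup binary-searches the small index and then reads a single block.

```python
from bisect import bisect_right

BLOCK_SIZE = 3  # assumed number of records per disk block

# Records stored physically in primary-key order, as a primary index requires.
records = [(k, f"record-{k}") for k in (2, 5, 8, 11, 14, 17, 20, 23)]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Sparse primary index: the first key of each block acts as its anchor.
index_keys = [block[0][0] for block in blocks]

def lookup(key):
    """Return the payload of the record with this primary key, or None."""
    # Last block whose anchor key is <= key.
    pos = bisect_right(index_keys, key) - 1
    if pos < 0:
        return None
    # Only this one block has to be read and scanned.
    for k, payload in blocks[pos]:
        if k == key:
            return payload
    return None

print(lookup(14))  # record-14
print(lookup(15))  # None
```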

2.1 Secondary Index
Secondary indexes point to the locations of records with respect to a non-ordering attribute
- Indexing does not affect the storage order
- There can be multiple secondary indexes for the same DB file
- Secondary indexes are usually dense

2.1 Secondary Index
Characteristics of secondary indexes
- They speed up retrieval: if no secondary index exists on the searched attribute, the entire file has to be searched linearly
- They need more time and space, because they are dense
- They provide a logical ordering; accessing records in this order might not be the most efficient way in terms of block accesses
- In a DW, due to the large amount of data, multilevel ordered indexes are used
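A dense secondary index can be sketched as a mapping from each value of a non-ordering attribute to the positions of all records holding it. The record layout and city values are hypothetical. Note that the Python dict used here is a hash structure, not the ordered multilevel index recommended above; the sketch only illustrates density and the contrast with a linear scan.

```python
from collections import defaultdict

# Hypothetical file of (id, city, revenue) records, stored in id order.
records = [
    (1, "Berlin", 37000),
    (2, "Hamburg", 67000),
    (3, "Berlin", 45000),
    (4, "Munich", 52000),
    (5, "Hamburg", 12000),
]

def linear_scan(city):
    """Without a secondary index, every record must be inspected."""
    return [pos for pos, rec in enumerate(records) if rec[1] == city]

# Dense secondary index on the non-ordering attribute 'city':
# every record position appears in the index, grouped by search key.
city_index = defaultdict(list)
for pos, (_, city, _) in enumerate(records):
    city_index[city].append(pos)

def indexed_lookup(city):
    """With the index, only the matching record positions are touched."""
    return city_index.get(city, [])

print(indexed_lookup("Berlin"))  # [0, 2]
print(linear_scan("Berlin"))     # [0, 2]
```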

2.1 Indexes
Here's a great idea: why not index every attribute?
- Have a physical index on the primary key and logical indexes on every other attribute
- This results in good read efficiency, but really terrible write/update efficiency
- But don't data warehouses only need good read efficiency?

2.1 Indexes
Whenever a DB is modified, most of the indexes have to be updated
- This results in a large overhead on operations like insert, delete, or update
- If the indexes are multilevel, every level has to be updated
Why should we care? We have a DW, not an OLTP system
- The majority of operations in a DW are reads
- But remember ETL?
We should use considerably more indexes than in OLTP, but loading data into the DW should not take forever!
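To make the write overhead concrete, here is a toy table (hypothetical, not the API of any DBMS) that keeps a sorted index on every attribute, as in the "index every attribute" idea. Each inserted row touches every index, so an ETL load of n rows into a table with k indexes costs n × k index updates, before even counting the extra levels of a multilevel index.

```python
from bisect import insort

class IndexedTable:
    """Toy table that keeps a sorted index on every listed attribute."""

    def __init__(self, attributes):
        self.attributes = attributes
        self.rows = []
        # One sorted list of (value, row_id) pairs per indexed attribute.
        self.indexes = {a: [] for a in attributes}
        self.index_writes = 0  # index entries inserted so far

    def insert(self, row):
        row_id = len(self.rows)
        self.rows.append(row)
        # Every index has to be updated for every inserted row.
        for a in self.attributes:
            insort(self.indexes[a], (row[a], row_id))
            self.index_writes += 1

table = IndexedTable(["model", "color", "dealer", "year"])
batch = [  # hypothetical ETL batch
    {"model": "Sedan", "color": "Blue", "dealer": "Berg", "year": 2008},
    {"model": "Coupe", "color": "Red", "dealer": "Berg", "year": 2008},
    {"model": "Mini VAN", "color": "Black", "dealer": "Hahn", "year": 2007},
]
for row in batch:
    table.insert(row)

print(table.index_writes)  # 12 = 3 rows x 4 indexes
```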

2.1 Indexes in DW
In a DW, the underlying technology has to support:
- Creation and loading of new indexes
- Efficient access to the indexes
Efficient access can be accomplished in different ways:
- Using bitmaps
- Having multilevel indexes
- Storing all or parts of an index in main memory
- Compacting the index entries when the order of the indexed data allows such compaction
- Creating selective indexes and range indexes

2.1 Indexes in DW
Recommended index structures are:
- B-tree indexes on high-cardinality attribute columns (due to the bushy nature of B-trees)
- Bitmap indexes on all medium- and low-cardinality attributes

2.1 B-Trees
Recap from Rel. DB 2: basic structure of a B-tree node
- A node contains key values and the respective data (block) pointers to the actual data records
- Additionally, there are node pointers to the subtrees covering the intervals to the left and right of each key value
[Figure: B-tree node with key values, data pointers, and node pointers]
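The node layout above directly determines the search procedure: look for the key among the node's sorted key values, and if it is not there, follow the node pointer for the interval the key falls into. The following is a search-only sketch over a small hand-built tree with hypothetical keys and block names; insertion with node splits is omitted.

```python
from bisect import bisect_left
from dataclasses import dataclass, field

@dataclass
class BTreeNode:
    """B-tree node: sorted keys, one data pointer per key, node pointers.

    children[i] covers keys smaller than keys[i]; children[-1] covers keys
    larger than keys[-1]. Leaves have no children.
    """
    keys: list
    data_ptrs: list  # e.g. block addresses of the records (strings here)
    children: list = field(default_factory=list)

def search(node, key):
    """Return the data pointer stored with `key`, or None if absent."""
    while node is not None:
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.data_ptrs[i]   # key found in this node
        if not node.children:
            return None                # reached a leaf: key absent
        node = node.children[i]        # descend into the matching interval
    return None

# Small hand-built tree for illustration.
leaf1 = BTreeNode([5, 10], ["blk5", "blk10"])
leaf2 = BTreeNode([25], ["blk25"])
leaf3 = BTreeNode([40, 50], ["blk40", "blk50"])
root = BTreeNode([20, 30], ["blk20", "blk30"], [leaf1, leaf2, leaf3])

print(search(root, 25))  # blk25
print(search(root, 7))   # None
```

Since every node holds many keys in a real B-tree, the tree stays shallow ("bushy"), which is why it suits high-cardinality columns.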

2.1 Bitmap indexes
Bitmap indexes
- Work well with a small number of distinct values, e.g., gender data:

Identifier  Gender       Bitmap F  Bitmap M
1           Female       1         0
2           Female       1         0
3           Male         0         1
4           Unspecified  0         0
5           Male         0         1

- Have a significant space and performance advantage over other structures for this type of data
- Useful in a DW for joining a large fact table to smaller dimension tables
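The gender example can be sketched with Python integers used as bit vectors: bit i is set when record i+1 has the given value. This representation is an assumption for illustration (real systems use compressed bitmaps), but it shows why predicates turn into cheap bitwise operations.

```python
# Gender column from the slide, records 1..5.
genders = ["Female", "Female", "Male", "Unspecified", "Male"]

def build_bitmaps(values, keys):
    """One bitmap per key; bit i is set if record i+1 has that value."""
    bitmaps = {k: 0 for k in keys}
    for i, v in enumerate(values):
        if v in bitmaps:
            bitmaps[v] |= 1 << i
    return bitmaps

def record_ids(bitmap, n):
    """Translate set bits back into 1-based record identifiers."""
    return [i + 1 for i in range(n) if (bitmap >> i) & 1]

bitmaps = build_bitmaps(genders, ["Female", "Male"])
n = len(genders)
female, male = bitmaps["Female"], bitmaps["Male"]
all_rows = (1 << n) - 1

print(record_ids(female, n))                       # [1, 2]
print(record_ids(female | male, n))                # [1, 2, 3, 5]
print(record_ids(all_rows & ~(female | male), n))  # [4]
```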

2.1 Basic architecture
Architecture of a DW
[Figure: data sources (operational systems, flat files) → staging area → warehouse (metadata, summary data, raw data) → data marts (purchasing, sales, inventory) → users (analysis, reporting, mining)]

2.1 Basic architecture
The data staging area
- Is both a storage area and a process area (the ETL process)
- It represents everything that happens between the operational source systems and the data presentation area
- The key architectural requirement for the data staging area is that it is off-limits to business users and does not provide query and presentation services

2.1 Basic architecture
Customers aren't invited to visit the kitchen
- Like a restaurant's kitchen, the data staging area should be accessible only to skilled professionals

2.1 Basic architecture
The data presentation area
- Is where data is organized, stored, and made available for queries, report writers, and other analytical processing
- As far as the business community is concerned, this area is the warehouse

2.2 Storage structures
Storage structure
- After extraction from the operational data, DW information is stored in databases
- The databases are operated by a DBMS
- Different database structures can be used for a DW:
  - Relational model (RDB), operated by an RDBMS
  - Multidimensional model (MDB), operated by an MDBMS

2.2 Storage structures
RDB and MDB are complementary and do not exclude each other
- An RDBMS can be used in the staging area; however, the staging area must be off-limits to user queries for performance reasons
- By default, normalized databases are excluded from the presentation area, which should be strictly multidimensional (MDBMS)

2.2 Relational DB
DB in the relational model
- A database is seen as a collection of predicates over a finite set of variables
- The content of the DB is modeled as a set of relations in which all predicates are satisfied
Example schema:
Books(Title, ISBN (PK), Price, Publisher (FK), Category (FK))
Publisher(Name, ID (PK))
BookCategory(Cat_ID (PK), Description)

2.2 Relational DB
A relation is defined as a set of tuples that have the same attributes
- It is usually represented as a table
[Figure: a table with a column labeled attribute, a row labeled tuple, and the whole table labeled relation]

2.2 Multidimensional DB
A multidimensional DB (MDB) is optimized for DW and OLAP applications
- MDBs are created using input from the staging area
- Their purpose is to answer questions like "How many Nokia 5800 phones have we sold so far this year in Braunschweig?"
- MDBs are RDBSs optimized for OLAP queries

2.2 Multidimensional DB
MDBs are
- Designed for efficient and convenient storage and retrieval of large volumes of data
- Organized so that data is stored, viewed, and analyzed from different perspectives called dimensions

2.2 Multidimensional DB
MDB example
- An automobile manufacturer wants to increase sales volumes
- The evaluation requires viewing historical sales volume figures from multiple dimensions
- E.g., sales volume by model, by color, by dealer, over time

2.2 Multidimensional DB
A relational structure for this evaluation would be:

Model         Color  Sales volume
Mini VAN      Blue   324
Mini VAN      Black  113
Mini VAN      Red    18
Sedan         Black  160
Sedan         Blue   115
Sedan         Red    6
Sports coupe  Red    16
Sports coupe  Black  16
Sports coupe  Blue   12

2.2 Multidimensional structure
The same data as a two-dimensional array, with totals in the rows and columns marked *:

           Black  Blue  Red  *
Mini VAN   113    324   18   455
Sedan      160    115   6    281
Coupe      16     12    16   44
*          289    451   40   780
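The cross-tab is a pivot of the relational rows: each row adds its volume to its own cell, to its model's total, to its color's total, and to the grand total. This sketch uses a plain dictionary keyed by (model, color), with "*" standing for "all"; the dictionary representation is an assumption, and "Sports coupe" is abbreviated to "Coupe" as in the array above.

```python
from collections import defaultdict

# Relational rows (model, color, sales volume) from the previous slide.
rows = [
    ("Mini VAN", "Blue", 324), ("Mini VAN", "Black", 113), ("Mini VAN", "Red", 18),
    ("Sedan", "Black", 160), ("Sedan", "Blue", 115), ("Sedan", "Red", 6),
    ("Coupe", "Red", 16), ("Coupe", "Black", 16), ("Coupe", "Blue", 12),
]

cube = defaultdict(int)
for model, color, volume in rows:
    cube[(model, color)] += volume
    cube[(model, "*")] += volume  # row total over all colors
    cube[("*", color)] += volume  # column total over all models
    cube[("*", "*")] += volume    # grand total

models = ["Mini VAN", "Sedan", "Coupe", "*"]
colors = ["Black", "Blue", "Red", "*"]
print(f"{'':10}" + "".join(f"{c:>7}" for c in colors))
for m in models:
    print(f"{m:10}" + "".join(f"{cube[(m, c)]:>7}" for c in colors))
```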

2.2 Multidimensional DB
The complexity grows quickly with the number of dimensions and the number of positions per dimension
- Example: 3 dimensions with 10 values each and no indexes
- Viewing this information in an RDB would, in the worst case, involve 10^3 = 1000 records

2.2 Multidimensional DB
Now consider performance
- To answer a query with car type = Sedan, color = Blue, and dealer = Berg:
  - The RDBMS has to search through up to 1000 records to find the right one
  - The MDB has more knowledge about where the data lies: at most 30 positions have to be searched (10 per dimension)
  - Average case: 18 vs. 501 positions
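The difference can be sketched by storing the same 1000 measures twice: as relational records that must be scanned, and as a dense array whose cell offset is computed from the position of each coordinate in its dimension. Only Sedan, Blue, and Berg come from the slide; the other dimension members, their positions, and the measure values are hypothetical. The MDB lookup compares at most 10 members per dimension, i.e., at most 30 positions.

```python
from itertools import product

N = 10  # members per dimension, as in the slide's example

# Hypothetical dimension members; only the three named ones are from the slide.
types = [f"type{i}" for i in range(N)]
colors = [f"color{i}" for i in range(N)]
dealers = [f"dealer{i}" for i in range(N)]
types[3], colors[5], dealers[7] = "Sedan", "Blue", "Berg"

# Relational view: one record per combination, 10**3 rows, measure = row number.
records = [(t, c, d, n) for n, (t, c, d) in enumerate(product(types, colors, dealers))]
# Multidimensional view: the same measures in a dense array addressed by position.
cube = [rec[3] for rec in records]

def rdb_lookup(t, c, d):
    """Scan records until the match is found; return (value, comparisons)."""
    for steps, rec in enumerate(records, start=1):
        if rec[:3] == (t, c, d):
            return rec[3], steps
    return None, len(records)

def mdb_lookup(t, c, d):
    """Locate each coordinate in its dimension, then compute the cell offset."""
    steps, coords = 0, []
    for dim, member in ((types, t), (colors, c), (dealers, d)):
        for pos, m in enumerate(dim):
            steps += 1
            if m == member:
                coords.append(pos)
                break
        else:
            return None, steps  # member does not exist in this dimension
    i, j, k = coords
    return cube[(i * N + j) * N + k], steps

print(rdb_lookup("Sedan", "Blue", "Berg"))  # (357, 358)
print(mdb_lookup("Sedan", "Blue", "Berg"))  # (357, 18)
```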

2.2 Multidimensional DB
If the query is more relaxed, e.g., total sales across all dealers and all colors when car type = Sedan:
- The RDBMS still has to go through all 1000 records
- The MDB, however, only goes through a 10x10 slice
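A sketch of the relaxed query on a dense 10x10x10 cube held as nested Python lists (cube[type][color][dealer]). The measure values and the position of Sedan in the car-type dimension are assumptions. Both approaches compute the same total, but the RDB scan reads 1000 records while the MDB reads only the 100 cells of the Sedan slice.

```python
N = 10
SEDAN = 3  # assumed position of "Sedan" in the car-type dimension

# Hypothetical dense cube: cube[t][c][d] = sales volume.
cube = [[[t * 100 + c * 10 + d for d in range(N)] for c in range(N)] for t in range(N)]
records = [(t, c, d, cube[t][c][d]) for t in range(N) for c in range(N) for d in range(N)]

# RDB: every record is read, although only one car type is relevant.
rdb_total, rdb_reads = 0, 0
for t, c, d, v in records:
    rdb_reads += 1
    if t == SEDAN:
        rdb_total += v

# MDB: only the 10x10 slice for Sedan is read.
sedan_slice = cube[SEDAN]
mdb_total = sum(sum(row) for row in sedan_slice)
mdb_reads = sum(len(row) for row in sedan_slice)

print(rdb_total, rdb_reads)  # 34950 1000
print(mdb_total, mdb_reads)  # 34950 100
```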

2.2 Multidimensional DB
Performance advantages
- MDBs are an order of magnitude faster than RDBMSs
- The performance benefits are greatest for queries that generate cross-tab views of the data (the typical case in a DW)
Conclusion
- The performance advantages offered by MDBs facilitate the development of interactive decision support applications like OLAP, which can be impractical in a relational environment

2.2 RDB vs. MDB
Any database manipulation is possible with both technologies
MDBs, however, offer some advantages in the context of a DW:
- Ease of data presentation
- Ease of maintenance
- Performance

2.2 RDB vs. MDB
Ease of data presentation
- Data views are the natural output of MDBs
- Obtaining the same views in an RDB requires a complex query
- Example with Walmart and Sybase:

    select sum(sales.quantity_sold)
    from sales, products, product_categories, manufacturers, stores, cities
    where manufacturer_name = 'Colgate'
      and product_category_name = 'toothpaste'
      and cities.population < 40000
      and trunc(sales.date_time_of_sale) = trunc(sysdate - 1)
      and sales.product_id = products.product_id
      and sales.store_id = stores.store_id
      and products.product_category_id = product_categories.product_category_id
      and products.manufacturer_id = manufacturers.manufacturer_id
      and stores.city_id = cities.city_id

2.2 RDB vs. MDB
Ease of data presentation
- Top-k queries cannot be expressed well in SQL
- Find the five cheapest hotels in Frankfurt:

    SELECT * FROM hotels h
    WHERE h.city = 'Frankfurt'
      AND 5 > (SELECT count(*) FROM hotels h1
               WHERE h1.city = 'Frankfurt' AND h1.price < h.price);

- Some RDBMSs extended SQL with a STOP AFTER clause:

    SELECT * FROM hotels WHERE city = 'Frankfurt' ORDER BY price STOP AFTER 5;
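Both formulations can be tried in SQLite through Python's sqlite3 module. SQLite has no STOP AFTER; its LIMIT clause plays that role here (the SQL:2008 standard later added FETCH FIRST n ROWS ONLY for the same purpose). The hotel data is hypothetical, and an ORDER BY was added to the correlated query only so the two results can be compared. The correlated subquery is evaluated once per candidate hotel, and with tied prices it can return more than five rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hotels (name TEXT, city TEXT, price REAL)")
conn.executemany(
    "INSERT INTO hotels VALUES (?, ?, ?)",
    [("A", "Frankfurt", 120), ("B", "Frankfurt", 80), ("C", "Frankfurt", 95),
     ("D", "Frankfurt", 150), ("E", "Frankfurt", 60), ("F", "Frankfurt", 110),
     ("G", "Frankfurt", 70), ("H", "Berlin", 50)],
)

# Correlated subquery from the slide: keep a hotel if fewer than 5
# Frankfurt hotels are cheaper than it.
correlated = conn.execute("""
    SELECT name FROM hotels h
    WHERE h.city = 'Frankfurt'
      AND 5 > (SELECT count(*) FROM hotels h1
               WHERE h1.city = 'Frankfurt' AND h1.price < h.price)
    ORDER BY price
""").fetchall()

# LIMIT stops after the first 5 rows of the sorted result.
limited = conn.execute("""
    SELECT name FROM hotels WHERE city = 'Frankfurt'
    ORDER BY price LIMIT 5
""").fetchall()

print(correlated)             # [('E',), ('G',), ('B',), ('C',), ('F',)]
print(limited == correlated)  # True
```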

2.2 RDB vs. MDB
Ease of maintenance
- No additional overhead to translate user queries into requests for data
- Data is stored as it is viewed
- RDBs use indexes and sophisticated joins, which require significant maintenance and storage to provide the same intuitiveness

2.2 RDB vs. MDB
Performance
- The performance of MDBs can be matched by RDBs through database tuning
- However, it is not possible to tune the database for all possible ad-hoc queries
- Aggregate navigators are helping RDBs catch up with MDBs as far as aggregation queries are concerned

2.2 MDB
When are MDBs inappropriate?
- If the dataset types are not highly interrelated, using an MDB results in a sparse representation
[Figure: the model × color sales array extended with unrelated salesperson data (Smith, James, Fox), leaving most cells of the combined structure empty]

2.2 MDB
When are MDBs appropriate?
- In the case of highly interrelated dataset types, MDBs are recommended for the greatest ease of access and analysis
- Examples of applications:
  - Financial analysis and reporting
  - Budgeting
  - Promotion tracking
  - Quality assurance and quality control
  - Product profitability

2.3 Tier architectures
Popular DW architectures
- Generic two-tier architecture
- Independent data mart
- Dependent data mart and operational data store
- Logical data mart and active warehouse
- Three-tier architecture
- Other:
  - One-tier architecture
  - N-tier architecture
  - Web-based architecture

2.3 Layered architectures
Generic two-tier architecture
- Data in the DW is not completely current
- Periodic extraction

2.3 Layered architectures
Data analysis comes in two flavors, depending on where the analysis is executed
Thin client
- Analytics are executed on the server; the client just displays the results
- This architecture fits well for Internet/intranet DW access
[Figure: client connected to the server via HTTP or IIOP; analysis and data storage both reside on the server]

2.3 Layered architectures
Fat client
- The server just delivers the data; analytics are executed on the client
- Communication between client and server must be able to sustain large data transfers
[Figure: client running the analysis, connected via ODBC, JDBC, or NFS to the server holding the data storage]

2.3 Layered architectures
Independent data mart
- Mini warehouses, limited in scope
- Separate ETL for each independent data mart
- High complexity of data mart access

2.3 Layered architectures
Dependent data mart and operational data store
- Single ETL for the DW
- Data marts are loaded from the DW
- Simpler data access than in the previous case

2.3 Layered architectures
Logical data mart and active warehouse
- The ETL is near real-time
- Data marts are not separate databases, but logical views of the DW
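A logical data mart can be sketched as an SQL view over the central DW tables, shown here with SQLite via Python's sqlite3 module. The table, view name, and data are hypothetical. Because the mart is only a view, it needs no separate database or separate ETL, and new rows loaded into the DW become visible in the mart immediately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical central DW fact table.
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("North", "Sedan", 100), ("South", "Sedan", 80), ("North", "Coupe", 40),
])

# The "data mart" is just a view defined over the DW.
conn.execute("""
    CREATE VIEW north_mart AS
    SELECT product, SUM(amount) AS total
    FROM sales WHERE region = 'North' GROUP BY product
""")

mart = conn.execute("SELECT * FROM north_mart ORDER BY product").fetchall()
print(mart)  # [('Coupe', 40), ('Sedan', 100)]

# A near-real-time load into the DW shows up in the mart right away.
conn.execute("INSERT INTO sales VALUES ('North', 'Sedan', 25)")
sedan_total = conn.execute(
    "SELECT total FROM north_mart WHERE product = 'Sedan'").fetchone()
print(sedan_total)  # (125,)
```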

2.3 DW vs. Data Marts

Aspect                | DW                                                                  | Data Marts
Scope                 | Application independent; centralized; planned                       | Specific DSS application; decentralized by user area; organic, possibly not planned
Data                  | Historical, detailed, summarized; lightly denormalized              | Some history, detailed, summarized; highly denormalized
Subjects              | Multiple subjects                                                   | One central subject
Sources               | Many internal and external sources                                  | Few internal and external sources
Other characteristics | Flexible; data-oriented; long life; large; single complex structure | Restrictive; project-oriented; short life; start small, become large; multiple semi-complex structures, together complex

2.3 Layered architectures
Generic three-tier architecture
- Derived data: data that has been selected, formatted, and aggregated for DSS support
- Reconciled data: detailed, current data intended to be the single, authoritative source for all decision support
[Figure: three layers, each with its own metadata: operational data (operational metadata), reconciled data in the DW and ODS (DW metadata), derived data in the data marts (data mart metadata)]

2.3 Layered architectures
One-tier architecture
- Theoretically possible
- Might be interesting for mobile applications
N-tier architecture
- Architectures with more tiers are also possible
- But the complexity grows with the number of tier interfaces
Web-based architecture
- Advantages: use of existing software, reduced costs, platform independence
- Disadvantages: security issues (data encryption, user access and identification)

2.4 Distributed DW
In most cases, the economics and technology greatly favor a single centralized DW
But in some cases, distributed DWs make sense
Types of distributed DW
- Geographically distributed: local DWs/global DW
- Technologically distributed DW: logically one DW, physically several DWs
- Independently evolving distributed DW: uncontrolled growth

2.4 Distributed DW
Geographically distributed
- Typical for corporations spread around the world
- Information is needed both locally and globally
- A distributed DW makes sense when much processing occurs at the local level: even though the local branches report to the same balance sheet, the local organizations are companies in their own right

2.4 Distributed DW
[Figure: geographically distributed DW. Site A (Europe, all IBM), HQ (USA, IBM/Teradata), and site B (Asia, Sybase) each run local operational processing and a local DW, plus a global DW]

2.4 Distributed DW
Technologically distributed DW
- The DW is placed on a vendor's distributed technology
- Advantages:
  - The entry cost is low, whereas large centralized hardware is expensive
  - There is no theoretical limit to how much data can be placed in the DW: we can add new servers to the network

2.4 Distributed DW
As the DW expands, network data communication starts to play an important role
- Example: to simplify, consider 4 nodes, each holding the data of one of the last 4 years (2005, 2006, 2007, 2008)
- Now consider a query that needs to access the data of all 4 years: such a query raises the issue of transporting large amounts of data between processors
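One common way to limit that traffic, which goes beyond this slide, is to push aggregation down to the nodes so that only partial results travel. The sketch below simulates 4 hypothetical nodes, one per year, in a single Python process and counts the values "transferred" to a coordinator. Shipping all rows moves 40 000 rows, while local SUM aggregation moves only 4 partial sums. This works for decomposable aggregates such as SUM or COUNT; not every query can be split this way.

```python
import random

random.seed(0)
# Hypothetical setup: 4 nodes, each holding the (model, amount) sales rows of one year.
nodes = {
    year: [("Sedan" if random.random() < 0.5 else "Coupe", random.randint(1, 100))
           for _ in range(10_000)]
    for year in (2005, 2006, 2007, 2008)
}

def ship_all_rows():
    """Naive plan: every node sends all of its rows to the coordinator."""
    shipped = [row for rows in nodes.values() for row in rows]
    total = sum(amount for model, amount in shipped if model == "Sedan")
    return total, len(shipped)

def push_down_aggregation():
    """Each node aggregates locally; only one partial sum per node travels."""
    partials = [sum(a for m, a in rows if m == "Sedan") for rows in nodes.values()]
    return sum(partials), len(partials)

naive_total, naive_transferred = ship_all_rows()
smart_total, smart_transferred = push_down_aggregation()
print(naive_total == smart_total)            # True
print(naive_transferred, smart_transferred)  # 40000 4
```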

2.4 Distributed DW
Independently evolving distributed DW
- In practice, there are many cases in which independent DWs are developed concurrently and without coordination in the same organization
- The first step many corporations take is to build a DW for finance or marketing
- Once it is successfully set up, other parts of the organization independently follow the same process, resulting in several independent DWs coexisting in the same organization
- This problem will be addressed later

2.5 Middleware
Middleware systems
- Provide an interconnectivity layer between heterogeneous platforms and the applications on top of them
[Figure: applications access the middleware system via APIs; the middleware system connects through platform interfaces to several platforms (hardware, operating system)]

2.5 Middleware
Middleware in DW?
- A DW usually implies heterogeneous hardware, databases, operating systems, networks, and applications
- Middleware serves both users and developers: it shields both from differences in the services and resources used by applications
- Without middleware, changes at the lower layers could require propagating changes by updating the higher layers

2.5 Middleware
Roles of middleware
- Assist the developer in ETL and in populating the DW
- Assist DW users in accessing the DW
- It is therefore needed at different points in the life cycle
Types:
- Copy management: data extraction, transformation
- Gateways: DB and independent gateways
- Program to program: RPC, ORBs
- Message oriented

2.5 Middleware
Most common middleware technologies
- CORBA (Common Object Request Broker Architecture)
- DCOM (Distributed Component Object Model)
- J2EE in DW

2.5 Middleware
CORBA
- Mechanism for normalizing method-call semantics between application objects on the same host or on a remote host
[Figure: the client main() holds an object reference and calls through generated stub code and ORB vendor code, across the ORB network, to generated skeleton code and the object implementation in the server main(); legend: ORB vendor code, ORB vendor-tool generated code, user-defined application code]

2.5 Middleware
ORB
- Is a middleware technology that manages communication and data exchange between objects in object-oriented programming and databases
[Figure: the client app asks the ORB to establish a connection, then communicates with the object implementation (service)]

2.5 Middleware
- Client: the application program that invokes a method or operation on an object implementation
- Stub: precompiled interface between the client and the ORB, generated by the ORB tool
- ORB: an interface containing helper functions and APIs that can be used by a client or an object implementation
- BOA (Basic Object Adapter): the part of the ORB responsible for managing server-side operations; replaced by the POA (Portable Object Adapter)
- Skeleton: the server-side analogue of stubs
- Implementation: called a service or method in object-oriented terminology; implements the operations of an interface specified in the Interface Definition Language (IDL)

2.5 How CORBA works
1. The client makes the call through the stub to the ORB
2. The ORB dispatches the call to the BOA, which performs the object activation
3. The implementation registers itself, if necessary, and declares itself ready
4. The BOA, now signaled ready, invokes the implementation via the skeleton generated from IDL
5. A response or exception propagates back up to the client caller
[Figure: Client → Stub → ORB → BOA → Skeleton → Implementation, annotated with steps 1 to 5]
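The five steps can be mimicked in a single Python process to show who calls whom. This is only an analogy: all class names, the object id, and the sales figures are hypothetical, and nothing here is a real CORBA API (no IDL compiler, marshalling, or network is involved). The adapter activates the servant on the first request, mirroring steps 2 and 3.

```python
class SalesImplementation:
    """Servant: the object implementation providing the service."""
    def total_sales(self, region):
        return {"North": 140, "South": 80}[region]

class Skeleton:
    """Server side: unpacks a request and calls the implementation."""
    def __init__(self, impl):
        self.impl = impl
    def dispatch(self, operation, args):
        return getattr(self.impl, operation)(*args)

class ObjectAdapter:
    """Plays the BOA/POA role: activates servants and calls their skeletons."""
    def __init__(self):
        self.factories = {}  # object id -> function creating the servant
        self.active = {}     # object id -> skeleton of an activated servant
    def add_factory(self, object_id, factory):
        self.factories[object_id] = factory
    def invoke(self, object_id, operation, args):
        if object_id not in self.active:
            # Steps 2-3: activate the object; the servant is now ready.
            self.active[object_id] = Skeleton(self.factories[object_id]())
        # Step 4: invoke the implementation via the skeleton.
        return self.active[object_id].dispatch(operation, args)

class ORB:
    """Routes requests coming from stubs to the object adapter (step 2)."""
    def __init__(self, adapter):
        self.adapter = adapter
    def request(self, object_id, operation, args):
        return self.adapter.invoke(object_id, operation, args)

class SalesStub:
    """Client side: looks like the remote object, forwards calls to the ORB."""
    def __init__(self, orb, object_id):
        self.orb, self.object_id = orb, object_id
    def total_sales(self, region):
        # Step 1: the client calls through the stub.
        return self.orb.request(self.object_id, "total_sales", (region,))

adapter = ObjectAdapter()
adapter.add_factory("sales-service", SalesImplementation)
client_view = SalesStub(ORB(adapter), "sales-service")
# Step 5: the result (or an exception, e.g. KeyError) propagates back to the caller.
print(client_view.total_sales("North"))  # 140
```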

2.5 Middleware
CORBA in DW
- Query Service: supports SQL and OQL
- Object Transaction Service:
  - Ensures the correct state of transactional objects
  - Distributed commit/rollback
  - Guarantees ACID properties
- It is able to send copies of multidimensional data

2.5 Middleware
DCOM
- Microsoft's competitor to CORBA
- Can access distributed stored data through ADO (ActiveX Data Objects)
- For the actual database access, ADO uses OLE DB (Object Linking and Embedding Database) and ODBC (Open Database Connectivity)
- There is also a multidimensional ADO, ADO MD, which contains objects for communicating data cubes

2.5 Middleware
J2EE in DW
- Not suited for storing and analyzing a multidimensional DB
- JOLAP offers a programming interface for analytical access to the DW
  - A Java community initiative, supported by Sun and Oracle
  - Lacks effective support
- OLAP4J is, simply put, a multidimensional JDBC

2.5 Middleware
So why is middleware important?
- Heterogeneity: hardware, data sources, data targets, platforms, operating systems, communication protocols
- Connectivity
- Platform and application independence
- Support of standard protocols and interfaces

Next lecture
Modeling
- Basics of data modeling
- Data models in DW