




DW Source Integration, Tools, and Architecture

Original slides were written by Torben Bach Pedersen
Aalborg University 2007 - DWML course

Overview
- DW Front End Tools
- Source Integration
- DW architecture

End User Applications (EUA)
- The business impact of the DW!
- Canned reports
- End user application templates
  - Provide answers to common questions
  - Can be used as (quality-assured) building blocks for other reports
- Two extremes
  - Ad hoc: strategic analysis, power users, DIY query tools
  - Fixed: operational analysis, report consumers, operational reporting
- EUA fills the gap
  - Tactical analysis, push-button knowledge workers

EUA Concepts
- Templates
  - Layout/structure + parameters
  - Example: Compare sales per product in <area> for <period1> and <period2>
- Parameters are chosen at run-time
  - Come from any level of the given dimension, so drill-down is possible
  - Time (All time, 2002, 2002 4Q, 2002 Dec, 2002 Dec 1)
  - Many different parameter choices possible
- Identify report candidates
  - Produce a list of candidates
  - Consolidate candidate list
  - Categorize candidates by data elements
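The template idea from the EUA Concepts slide (fixed layout plus run-time parameters taken from any level of a dimension) can be sketched in a few lines of Python. This is only an illustration: the fact rows, the tuple encoding of the time hierarchy, and the names `FACTS`, `matches`, and `compare_sales` are assumptions, not part of the original slides.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, area, (year, quarter, month, day), sales)
FACTS = [
    ("Milk", "Aalborg", (2002, 4, 12, 1), 10.0),
    ("Milk", "Aalborg", (2001, 4, 12, 1), 8.0),
    ("Bread", "Aalborg", (2002, 4, 12, 2), 5.0),
    ("Bread", "Copenhagen", (2002, 4, 12, 1), 7.0),
]


def matches(date, period):
    """A period is a prefix of (year, quarter, month, day); () means 'All time'."""
    return date[:len(period)] == tuple(period)


def compare_sales(area, period1, period2, facts=FACTS):
    """Template: compare sales per product in <area> for <period1> and <period2>."""
    result = defaultdict(lambda: [0.0, 0.0])
    for product, fact_area, date, sales in facts:
        if fact_area != area:
            continue
        if matches(date, period1):
            result[product][0] += sales
        if matches(date, period2):
            result[product][1] += sales
    return dict(result)


# Parameters chosen at run-time, here at the year level of the time dimension
print(compare_sales("Aalborg", (2002,), (2001,)))
```

Because a period is just a prefix of the time hierarchy, the same template accepts (2002,), (2002, 4), or (2002, 4, 12, 1) without any change to the layout.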

What Templates to Choose?
- Analytical cycle steps (repeats)
  1) How's business? (current performance)
  2) What are the trends? (performance over time)
  3) What's unusual? (quick identification of exceptions, +/-)
  4) What is driving the exceptions? (find causes for exceptions)
  5) What if? (play around with parameters and see effect)
  6) Make a business decision (small as well as big decisions)
  7) Implement the decision (feed analysis results into operational systems)
- Prioritize template list
  - Rank or group templates
  - Implement the 15 most important at first

Overview
- DW Front End Tools
- Source Integration
- DW architecture

Data Integration Research Projects
- Focus on source integration and update propagation
- Wrapper: converts source data into a standard format
- Information Manifold
  - Sources: databases, SGML docs, unstructured files
  - Relational integration data model
- TSIMMIS
  - Wrapper/mediator
  - Semi-structured OEM integration data model
- Squirrel
  - Powerful integration mediator
- WHIPS
  - Wrapper/mediator
  - Relational integration data model

Views on DW Metadata
- Most DW projects: DW architecture as a stepwise flow of information from source to analyst
- No conceptual domain model used for integration
- Some questions cannot be answered
- DWQ project: extended metamodel to capture all relevant aspects

Using DW Metadata in the Enterprise
- Analyst wants to analyze data
- Gather data from operational departments through OLTP
- Question travels through (1)-(5)
- Traditional DW (previous slide) only describes steps (3)-(4)
  - Cannot solve problems like "why can't I answer question X?"
- Conceptual relationships between enterprise model, operational models + DW must be captured
- Everything is a view on the enterprise model! ("local as view"), unlike the previous slide

Analyst: Why can't I answer question X?
- Possible reasons
  - Certain measures not included in fact table
  - Granularity of facts too coarse
  - Particular dimensions not in DW
  - Descriptive attributes missing from dimensions
  - Meaning of attributes/measures deviates from the analyst's expectation

DWQ Metadata
- Three metadata perspectives must be captured
  - Conceptual (enterprise)
  - Logical (data model)
  - Physical (data flow)
- Framework instantiated by conceptual, logical, and physical information models
- DW quality heavily depends on DW processes rather than schemas
- A process meta model is needed to capture process definitions and the relationships to DW quality

Source integration practice
- Focus on information integration in databases (schema and data)
- Two main approaches
  - Constructing integrated enterprise model
  - Focus on mappings between sources and DW
- Tools for DW management
  - Schema integration
  - Metadata management
  - Based on modeling tools
- Tools for data integration
  - Mapping specification
  - ETL tools, as in the last lecture

Schema Integration
- Producing one global schema (one-shot or incremental)
- Pre-integration
  - Analyzing and annotating source schemata
  - Semantic enrichment of schema, often in richer data model
- Schema comparison
  - Determine correlations/conflicts among schema concepts
  - Heterogeneity conflicts: different source data models
  - Naming conflicts: homonyms and synonyms
  - Semantic conflicts: different abstraction levels
  - Structural conflicts: different constructs
- Schema conforming
  - Conform/align schemas to make them compatible
  - Typically semi-automatic process
- Schema merging and restructuring
  - Superimpose conformed schemas
  - Quality: completeness, correctness, minimality, understandability

Virtual Data Integration
- Only data definition is integrated
  - Data only in sources, queries on views, queries shipped to sources
  - Not suited for DW?
- Carnot
  - Individual schemata mapped onto rich GCL ontology (first-order logic)
  - Articulation axioms specify mappings, queries mapped to GCL
- SIMS
  - Creates common class-based domain model to describe sources
  - Sources are dynamically chosen and integrated at query time
  - Query reformulation, access planning, optimization, execution
- Information Manifold
  - Relational world view + information source description + correspondences
  - Metamodel enriched using description logic/datalog rules
  - Datalog queries, optimized by choosing minimal sources
- TSIMMIS
  - Wrappers wrap sources using semi-structured OEM model
  - Mediator performs its own integration; no global integration (global as view)

Materialized Data Integration
- Views on source data are materialized in integrated DB
- Squirrel
  - Integration mediators incrementally maintain materialized views
  - Cooperation of sources required
- WHIPS
  - Relational SPJ + aggregation views specified in view tree
  - View manager computes view and handles updates
  - Integrator ensures view maintainability
  - Global query processor queries sources using wrappers/mediators
- In combination with virtual integration?
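
Incremental maintenance of a materialized view, as mentioned for Squirrel and WHIPS, can be illustrated with a small Python sketch. It is not the algorithm of either system: it assumes a single SUM ... GROUP BY view and assumes the source reports every change as an insert or delete delta (one possible reading of "cooperation of sources required"). The class and method names are hypothetical.

```python
class MaterializedSumView:
    """Materialized SUM(amount) GROUP BY key, updated from source deltas."""

    def __init__(self):
        self.totals = {}   # key -> current sum
        self.counts = {}   # key -> number of contributing source rows

    def apply(self, op, key, amount):
        """Apply one source delta without recomputing the view from scratch."""
        sign = {"insert": 1, "delete": -1}[op]
        self.totals[key] = self.totals.get(key, 0) + sign * amount
        self.counts[key] = self.counts.get(key, 0) + sign
        if self.counts[key] == 0:
            # The last contributing row disappeared, so the group disappears too
            del self.totals[key], self.counts[key]


view = MaterializedSumView()
view.apply("insert", "Milk", 57)
view.apply("insert", "Milk", 45)
view.apply("insert", "Bread", 123)
view.apply("delete", "Milk", 45)
print(view.totals)
```

The row count per group is kept alongside the sum so that a delete which removes the last row of a group removes the group instead of leaving a zero total behind.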

DWQ Source Integration
- Current DW tools cannot fully support DW quality
  - No support for validation of interschema assertions and other specified relationships, i.e., the DW design process
- Conceptual perspective
  - Domain model = enterprise model + source models
  - Consolidated and reconciled description of important concepts
  - Not all enterprise data captured (at first, incremental approach)
  - Logic-based formalism allows reasoning over metadata
  - Intermodel assertions capture interdependencies
- Logical perspective
  - Source schemata + DW schema in logical data model (relational)
  - Defined as queries over the corresponding conceptual component
- Physical perspective
  - The actual data stores

DWQ Source Integration Architecture
- Note explicit mappings!

DWQ Source Integration Methodology
- Source-driven integration
  - Enterprise and source model construction
  - Source model integration (into the domain model)
  - Source and DW schema specification (+ mappings)
  - Data integration and reconciliation
  - Quality analysis steps in all phases above
- Client-driven integration
  - New client query considered
  - Reasoning determines whether query can be answered by materialized views already in DW
    - Query containment reasoning
  - If DW not sufficient, materialize new concepts in domain model?
  - Otherwise, new sources must be added using source-driven integration

Overview
- DW Front End Tools
- Source Integration
- DW architecture

Lifecycle Overview
- Project Planning
- Business Requirements Definition
- Technical Architecture Design
- Product Selection & Installation
- Dimensional Modeling
- Physical Design
- Data Staging Design & Development
- End-User Application Specification
- End-User Application Development
- Deployment
- Maintenance and Growth
- Project Management
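The client-driven integration step of the DWQ methodology asks whether a new query can be answered from views already materialized in the DW. The sketch below is a heavily simplified stand-in for query containment reasoning, not the logic-based reasoning used in DWQ: it assumes additive (SUM) aggregate views, ignores dimension hierarchies, and treats a query as answerable when every attribute it groups or filters on is kept by the view. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggQuery:
    """A SUM query or view: which measure, grouped by what, filtered on what."""
    measure: str
    group_by: frozenset
    filter_on: frozenset = frozenset()


def answerable_from(query, view):
    """True if the query can be computed by rolling up the materialized view."""
    needed = query.group_by | query.filter_on
    return query.measure == view.measure and needed <= view.group_by


def choose_view(query, views):
    """Pick the coarsest usable view, or None if new material is needed."""
    candidates = [v for v in views if answerable_from(query, v)]
    return min(candidates, key=lambda v: len(v.group_by), default=None)


views = [
    AggQuery("sales", frozenset({"product", "city", "year"})),
    AggQuery("sales", frozenset({"product", "year"})),
]
q1 = AggQuery("sales", frozenset({"product"}), frozenset({"year"}))
q2 = AggQuery("sales", frozenset({"product", "customer"}))
print(choose_view(q1, views))  # the (product, year) view is enough
print(choose_view(q2, views))  # None: "customer" is not in any view
```

A `None` result corresponds to the case on the slide where the DW is not sufficient, so new concepts must be materialized or new sources added.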

Technical DW Architecture
- Existing databases and systems (OLTP)
- ETL
- New databases and systems (OLAP): Data Warehouse, Data Marts
- Clients: OLAP, data mining, visualization
- How to organize DW and data marts?
[Figure: example cubes over product (Milk, Bread), city (Aalborg, Copenhagen), and year (2000, 2001)]

Central DW
- All data in one, central DW
- All client queries directly on the central DW
- Pros
  - Simplicity
  - Easy to manage
- Cons
  - Bad performance due to no redundancy/workload distribution
[Figure: Sources -> Central DW -> Clients]

Federated DW
- Data stored in separate data marts, aimed at special departments
- Logical DW (i.e., virtual)
- Data marts contain detail data
- Pros
  - Performance due to distribution
- Cons
  - More complex
[Figure: Sources -> Finance mart, Marketing mart, Distribution mart (together the logical DW) -> Clients]

Tiered Architecture
- Central DW is materialized
- Data is distributed to data marts in one or more tiers
- Only aggregated data in cube tiers
- Data is aggregated/reduced as it moves through tiers
- Pros
  - Best performance due to redundancy + distribution
- Cons
  - Most complex
  - Hard to manage
[Figure: Central DW feeding cubes over product (Milk, Bread), city (Aalborg, Copenhagen), and year (2000, 2001)]
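The statement that data is aggregated/reduced as it moves through tiers can be made concrete with a small Python sketch. The detail rows and the function `roll_up` are hypothetical; the values are made up for illustration and are not the numbers from the slide figures.

```python
from collections import Counter

# Hypothetical detail rows in the central DW: (product, city, year, month, sales)
DETAIL = [
    ("Milk", "Aalborg", 2000, 1, 3),
    ("Milk", "Aalborg", 2000, 2, 4),
    ("Milk", "Copenhagen", 2000, 1, 5),
    ("Bread", "Aalborg", 2001, 1, 2),
]


def roll_up(rows, keep):
    """Sum the last field of each row, grouping by the column positions in keep."""
    totals = Counter()
    for row in rows:
        totals[tuple(row[i] for i in keep)] += row[-1]
    return [key + (total,) for key, total in sorted(totals.items())]


# Tier 1 data mart: month dropped, rows are (product, city, year, sales)
tier1 = roll_up(DETAIL, keep=(0, 1, 2))
# Tier 2 cube: city dropped as well, rows are (product, year, sales)
tier2 = roll_up(tier1, keep=(0, 2))

print(len(DETAIL), len(tier1), len(tier2))  # row counts shrink tier by tier
```

Each tier is computed from the tier before it rather than from the sources, which is where both the redundancy and the extra management effort of the tiered architecture come from.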

Coordination w. Development Strategy
- Different development strategies pose different demands to the architecture elements
- Example: Kimball Dimensional Modeling
  - Centralized design of (conforming) dimensions
  - First, design of a single-source data mart
  - Later, design of multi-source data marts
  - Integration of existing data marts into new data marts
  - The DW is just the union of the marts it is composed of
  - Entails top-down ("Bus Architecture") and bottom-up elements
- Consequences
  - No initial design of DW, from which data marts are extracted
  - Data is extracted directly from sources to data marts
  - Allows distribution of data marts and computation on them

Operational Data Store (ODS)
[Figure labels: existing databases and systems (OLTP); new: ODS, ETL, DW; clients: OLAP, data mining, visualization]

Operational Data Store I
- "a subject oriented, integrated, volatile, current valued data store containing only corporate detailed data" (Inmon et al.)
- A database which integrates and accumulates operational data in a subject-oriented structure
- Not dimensional, but ordinary relational
- An extra level between operational systems and dimensional structures
- Two benefits sought
  - Integration of operational systems
  - Basis for data warehouse

Operational Data Store II
- ODS pros
  - More modeling choices
  - The dimensional "straitjacket" can force sub-optimal design decisions hiding the true semantics of data
  - No need to choose a granularity, and no need to exclude data
  - In summary, no need to make design decisions that cannot be changed subsequently
  - This means extra flexibility
- ODS cons
  - Not feasible to do analysis directly on ODS: extra complexity
  - Separate ODS unnecessary, DW = ODS (Kimball et al.)

MS Analysis Services
- Cheap
- Easy to use
- (R/M/H)OLAP technology
  - Data placement as desired
- Intelligent pre-aggregation
- Server and client parts
  - Reporting Services is a separate tool
- Built-in data mining
  - Decision trees
  - Clustering
- MS OLE DB for OLAP interface

IBM DB2 OLAP Server
- Light version of Hyperion Essbase (OLAP market leader)
- Extra product on top of DB2
- (R/M/H)OLAP
  - Data in DB2 or in multidimensional structures
- Interfaces
  - Hyperion Essbase API
  - OLE DB for OLAP (promised)
- DB2 can also handle aggregates
  - Automatic summary tables
  - Used by DB2 optimizer
  - Automatic maintenance by DB2

Oracle 10g BI
- Based on Express OLAP product
  - On the market since 1970!
- (R/M/H)OLAP
  - Flexible data placement
  - Integrates ROLAP strategy and Express OLAP
- Total integration with Oracle 10g RDBMS
  - Storage, security, management, ...
  - Best integration compared to MS and IBM
- Add-on data mining (10g Data Mining)
  - Associations, classification, prediction, clustering

Architecture Alternatives
- Cubes are smart
  - Intuitive model
  - Better overview
  - Better suited for data analysis
- But logical cubes suffice
  - Implementation hidden from user
- Architecture alternatives
  - MS, IBM, Oracle
  - Virtual cubes, physical cubes
  - ROLAP, MOLAP
  - Separate relational DW, cubes directly on source data
  - Client tools
- 3 * 2^3 = 24 different possibilities (without clients), fewer in reality
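The count 3 * 2^3 = 24 on the Architecture Alternatives slide follows from one three-way choice (vendor) and three two-way choices. A short Python check, with the option labels taken from the slide and the variable names chosen here for illustration:

```python
from itertools import product

vendors = ["MS", "IBM", "Oracle"]
cube_kind = ["virtual cubes", "physical cubes"]
storage = ["ROLAP", "MOLAP"]
integration = ["separate relational DW", "cubes directly on source data"]

# Every combination of one option per decision, client tools not included
alternatives = list(product(vendors, cube_kind, storage, integration))

print(len(alternatives))  # 24
print(alternatives[0])
```

Not every combination is meaningful in practice, which is the point of the slide's remark that the real number is smaller.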

MS vs. IBM vs. Oracle
- All
  - Good scalability
  - Good analysis facilities
  - Flexible storage (MOLAP, ROLAP, HOLAP)
  - Incremental update
  - Many client tools
- MS Analysis Server
  - Built-in mining + good integration with MS SQL Server
- DB2 OLAP Server
  - Good integration with DB2
- Oracle
  - Best RDBMS/MOLAP integration of the three
- All three products are good
  - Dependent on the other choices + existing technical architecture

Virtual vs. Physical Cubes
- Virtual cubes
  - Logical cube specification directly on source data
  - ROLAP implementation without aggregates
  - + flexible, design can be changed quickly
  - - performance, constant load on source
- Physical cubes
  - Data for cube extracted and stored on OLAP server
  - Several implementation choices possible
  - + good performance, only source load at creation/update
  - - harder to change design

MOLAP vs. ROLAP
- MOLAP
  - Data in specialized data structure, optimized for OLAP
  - + best performance, least space consumption
  - - changing design requires rebuilding, scalability at detail level?, detail data stored several times
- ROLAP
  - Data in RDBMS
  - + more flexible change of design, scalable for detail data
  - - not as good performance, larger space consumption
- HOLAP
  - Detail data in RDBMS (can be source DB)
  - Aggregates in multidimensional structure
  - + good performance for higher-level queries, detail data only stored once
  - - handling design changes, operational complexity

Separate Data Warehouse?
- Separate DW
  - Integration of source data in DW
  - Cubes built from DW
  - Sometimes the only solution
  - + better integration and cleansing, less load on existing servers
  - - larger complexity, design changes, updating DW
- Cubes directly on source data
  - Cubes built directly from source data
  - Cannot handle all cases
  - + less complexity, easier to change design, no update of DW
  - - cannot handle all forms of integration and cleansing, more load on operational servers
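The trade-off between virtual and physical cubes can be demonstrated with Python's standard `sqlite3` module. This is a toy sketch under stated assumptions, not how any of the three OLAP servers work: an in-memory SQLite table stands in for the source, a GROUP BY query run on demand stands in for a virtual cube, and a stored summary table stands in for a physical cube. Table and function names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (product TEXT, city TEXT, year INTEGER, amount INTEGER)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("Milk", "Aalborg", 2000, 3), ("Milk", "Copenhagen", 2000, 5), ("Bread", "Aalborg", 2001, 2)],
)


def virtual_cube():
    """Aggregate the source table at query time: always current, always loads the source."""
    return con.execute(
        "SELECT product, year, SUM(amount) FROM sales GROUP BY product, year ORDER BY product, year"
    ).fetchall()


def refresh_physical_cube():
    """Extract and store the aggregated data: source is only read at creation/update."""
    con.execute("DROP TABLE IF EXISTS sales_cube")
    con.execute(
        "CREATE TABLE sales_cube AS "
        "SELECT product, year, SUM(amount) AS amount FROM sales GROUP BY product, year"
    )


def physical_cube():
    """Read the stored aggregates without touching the source table."""
    return con.execute(
        "SELECT product, year, amount FROM sales_cube ORDER BY product, year"
    ).fetchall()


refresh_physical_cube()
same_after_refresh = virtual_cube() == physical_cube()

# A new source row is visible to the virtual cube at once, but not to the stored one
con.execute("INSERT INTO sales VALUES ('Milk', 'Aalborg', 2000, 4)")
fresh = virtual_cube()
stale = physical_cube()
print(same_after_refresh, fresh, stale)
```

The stored table answers queries without touching `sales`, at the cost of going stale until `refresh_physical_cube()` runs again.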

Choosing Client Tools
- Many OLAP clients on the market, e.g., Hyperion, Targit, Oracle, MS Reporting Services
- Client and server communicate via an API
  - MS OLE DB for OLAP
    - De facto standard
    - Supported by almost all client tools
  - Hyperion Essbase API
    - Supported by many client tools
- Some criteria
  - Functionality (web distribution, analysis, reporting, ...)
  - Support
  - Price

Architecture Alternatives - Conclusion
- Architecture alternatives, their pros and cons
- No simple general choices
  - Choices dependent on the concrete situation
- Look at books
- Look at requirements specs
- Look at the latest products
- Think about prototyping

Summary
- DW Front End Tools
- Source Integration
- DW architecture

Mini Project
- New subtask
  - Build a few reports in Reporting Services to answer important business questions you proposed in part (1a)
  - Discuss the architecture of your DW system
  - Discuss source integration in your system
- MS Reporting Services Tutorial
  - http://msdn2.microsoft.com/en-us/library/ms170246.aspx