Data warehouse Architectures and processes



Similar documents
Data Warehouse: Introduction

Data warehouse design

Introduction to Datawarehousing

Data Warehousing Concepts

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

Chapter 3 - Data Replication and Materialized Integration

Introduction. Introduction to Data Warehousing

META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING

Original Research Articles

When to consider OLAP?


DATA WAREHOUSING AND OLAP TECHNOLOGY

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

Fluency With Information Technology CSE100/IMT100

CHAPTER SIX DATA. Business Intelligence The McGraw-Hill Companies, All Rights Reserved

ETL-EXTRACT, TRANSFORM & LOAD TESTING

Understanding Data Warehousing. [by Alex Kriegel]

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( 28

OPERATIONAL DATA STORE

Optimizing Your Data Warehouse Design for Superior Performance

Overview. DW Source Integration, Tools, and Architecture. End User Applications (EUA) EUA Concepts. DW Front End Tools. Source Integration

INTERACTIVE DECISION SUPPORT SYSTEM BASED ON ANALYSIS AND SYNTHESIS OF DATA - DATA WAREHOUSE

Framework for Data warehouse architectural components

Whitepaper. Data Warehouse/BI Testing Offering YOUR SUCCESS IS OUR FOCUS. Published on: January 2009 Author: BIBA PRACTICE

Data Warehousing Systems: Foundations and Architectures

Jagir Singh, Greeshma, P Singh University of Northern Virginia. Abstract

Patterns of Information Management

ETL Overview. Extract, Transform, Load (ETL) Refreshment Workflow. The ETL Process. General ETL issues. MS Integration Services

LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES

Data Warehousing. Jens Teubner, TU Dortmund Winter 2015/16. Jens Teubner Data Warehousing Winter 2015/16 1

Extraction Transformation Loading ETL Get data out of sources and load into the DW

Advanced Data Management Technologies

Data Warehousing and OLAP Technology for Knowledge Discovery

MDM and Data Warehousing Complement Each Other

Moving Large Data at a Blinding Speed for Critical Business Intelligence. A competitive advantage

IT0457 Data Warehousing. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

Enterprise Data Warehouse (EDW) UC Berkeley Peter Cava Manager Data Warehouse Services October 5, 2006

Data-Warehouse-, Data-Mining- und OLAP-Technologien

Oracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc.

SAS BI Course Content; Introduction to DWH / BI Concepts

Mastering Data Warehouse Aggregates. Solutions for Star Schema Performance

Dimensional Modeling for Data Warehouse

Data Integration and ETL Process

Mario Guarracino. Data warehousing

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence

Week 3 lecture slides

Architecting Real-Time Data Warehouses with SQL Server

ETL Process in Data Warehouse. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

Lection 3-4 WAREHOUSING

Data Warehouse Design

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

A Survey on Data Warehouse Architecture

An Introduction to Data Warehousing. An organization manages information in two dominant forms: operational systems of

IST722 Data Warehousing

Data Warehousing. Overview, Terminology, and Research Issues. Joachim Hammer. Joachim Hammer

Enterprise Data Quality

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

COURSE OUTLINE. Track 1 Advanced Data Modeling, Analysis and Design

The Data Quality Continuum*

10 Biggest Causes of Data Management Overlooked by an Overload

GEOG 482/582 : GIS Data Management. Lesson 10: Enterprise GIS Data Management Strategies GEOG 482/582 / My Course / University of Washington

DATA WAREHOUSE CONCEPTS DATA WAREHOUSE DEFINITIONS

International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April ISSN

Sizing Logical Data in a Data Warehouse A Consistent and Auditable Approach

Addressing Risk Data Aggregation and Risk Reporting Ben Sharma, CEO. Big Data Everywhere Conference, NYC November 2015

The Quality Data Warehouse: Solving Problems for the Enterprise

Building an Effective Data Warehouse Architecture James Serra

An Oracle White Paper March Best Practices for Real-Time Data Warehousing

The Role of the BI Competency Center in Maximizing Organizational Performance

How to Enhance Traditional BI Architecture to Leverage Big Data

IBM WebSphere DataStage Online training from Yes-M Systems

Data Warehousing and Data Mining

Sterling Business Intelligence

ETL PROCESS IN DATA WAREHOUSE

CONCEPTUALIZING BUSINESS INTELLIGENCE ARCHITECTURE MOHAMMAD SHARIAT, Florida A&M University ROSCOE HIGHTOWER, JR., Florida A&M University

Applied Business Intelligence. Iakovos Motakis, Ph.D. Director, DW & Decision Support Systems Intrasoft SA

Multidimensional Modeling - Stocks

DATA MINING AND WAREHOUSING CONCEPTS

Data warehousing with PostgreSQL

Course Outline. Module 1: Introduction to Data Warehousing

PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions. A Technical Whitepaper from Sybase, Inc.

An Architecture for Integrated Operational Business Intelligence

Oracle Architecture, Concepts & Facilities

Cúram Business Intelligence and Analytics Guide

A Service-oriented Architecture for Business Intelligence

DATA WAREHOUSE E KNOWLEDGE DISCOVERY

Reverse Engineering in Data Integration Software

Part 22. Data Warehousing

and NoSQL Data Governance for Regulated Industries Using Hadoop Justin Makeig, Director Product Management, MarkLogic October 2013

Implementing a Data Warehouse with Microsoft SQL Server 2012 MOC 10777

Data Warehousing Fundamentals Student Guide

Designing a Dimensional Model

Establish and maintain Center of Excellence (CoE) around Data Architecture

SQL Server 2012 Business Intelligence Boot Camp

[callout: no organization can afford to deny itself the power of business intelligence ]

Enterprise Solutions. Data Warehouse & Business Intelligence Chapter-8

LEARNING SOLUTIONS website milner.com/learning phone

Transcription:

Database and data mining group, Data warehouse Architectures and processes DATA WAREHOUSE: ARCHITECTURES AND PROCESSES - 1 Database and data mining group, Data warehouse architectures Separation between transactional computing and data analysis avoid one level architectures Architectures characterized by two or more levels separate to a different extent data incoming into the data warehouse from analyzed data more scalable DATA WAREHOUSE: ARCHITECTURES AND PROCESSES - 2 Pag. 1

Two level architecture Database and data mining group, Metadata DW management OLAP servers ETL tools Analysis tools Data sources (operational and esternal) Source level Data warehouse Data marts Data warehouse level DATA WAREHOUSE: ARCHITECTURES AND PROCESSES - 3 Data analysis Two level architecture features Database and data mining group, Decoupling between source and DW data management of external (not OLTP) data sources (e.g., text files) data modelling suited for OLAP analysis physical design tailored for OLAP load Easy management of different temporal granularity of operational and analytical data Partitioning between transactional and analytical load On the fly data transformation and cleaning (ETL) DATA WAREHOUSE: ARCHITECTURES AND PROCESSES - 4 Pag. 2

Three level architecture Database and data mining group, OLAP servers ETL tools Data sources (operational and external) Metadata DW management Staging area ETL level Loading Data warehouse Analysis tools Data analysis Source level Data marts Data warehouse level DATA WAREHOUSE: ARCHITECTURES AND PROCESSES - 5 Three level architecture features Database and data mining group, Staging area: buffer area allowing the separation between ET management and data warehouse loading complex transformation and cleaning operations are eased provides an integrated model of business data, still close to OLTP representation sometime denoted as Operational Data Store (ODS) Introduces further redundancy more disk space is required for data storage DATA WAREHOUSE: ARCHITECTURES AND PROCESSES - 6 Pag. 3

Database and data mining group, Extraction, Transformation and Loading (ETL) Prepares data to be loaded into the data warehouse data extraction from (OLTP and external) sources data cleaning data transformation data loading Eased by exploiting the staging area Performed when the DW is first loaded during periodical DW refresh DATA WAREHOUSE: ARCHITECTURES AND PROCESSES - 7 Extraction Database and data mining group, Data acquisition from sources Extraction methods static: snapshot of operational data performed during the first DW population incremental: selection of updates that took place after last extraction exploited for periodical DW refresh immediate or deferred The selection of which data to extract is based on their quality DATA WAREHOUSE: ARCHITECTURES AND PROCESSES - 8 Pag. 4

Extraction Database and data mining group, It depends on how operational data is collected historical: all modifications are stored for a given time in the OLTP system bank transactions, insurance data operationally simple partly historical: only a limited number of states is stored in the OLTP system operationally complex transient: the OLTP system only keeps the current data state example: stock inventory operationally complex DATA WAREHOUSE: ARCHITECTURES AND PROCESSES - 9 Incremental extraction Database and data mining group, Application assisted data modifications are captured by ad hoc application functions requires changing OLTP applications (or APIs for database access) increases application load hardly avoidable in legacy systems Log based log data is accessed by means of appropriate APIs log data format is usually proprietary efficient, no interference with application load DATA WAREHOUSE: ARCHITECTURES AND PROCESSES - 10 Pag. 5

Incremental extraction Database and data mining group, Trigger based triggers capture interesting data modifications does not require changing OLTP applications increases application load Timestamp based modified records are marked by the (last) modification timestamp requires modifying the OLTP database schema (and applications) deferred extraction, may lose intermediate states if data is transient DATA WAREHOUSE: ARCHITECTURES AND PROCESSES - 11 Data cleaning Database and data mining group, Techniques for improving data quality (correctness and consistency) duplicate data missing data unexpected use of a field impossible or wrong data values inconsistency between logically connected data Problems due to data entry errors different field formats evolving business practices DATA WAREHOUSE: ARCHITECTURES AND PROCESSES - 12 Pag. 6

Data cleaning Database and data mining group, Each problem is solved by an ad hoc technique data dictionary appropriate for data entry errors or format errors can be exploited only for data domains with limited cardinality approximate fusion appropriate for detecting duplicates/similar data correlations approximate join: join on similar fields, without foreign key purge/merge problem: duplicate records identification outlier identification, deviations from business rules Prevention is the best strategy reliable and rigorous OLTP data entry procedures DATA WAREHOUSE: ARCHITECTURES AND PROCESSES - 13 Database and data mining group, Transformation Data conversion from operational format to data warehouse format requires data integration A uniform operational data representation (reconciled schema) is needed Two steps from operational sources to reconciled data in the staging area conversion and normalization matching (possibly) significant data selection from reconciled data to the data warehouse surrogate keys generation aggregation computation DATA WAREHOUSE: ARCHITECTURES AND PROCESSES - 14 Pag. 7

Data cleaning and transformation example C.so Duca degli Abruzzi 24 20129 Torino (I) name: Elena surname: Baralis Normalization address: C.so Duca degli Abruzzi 24 ZIP: 20129 city: Torino country: I Database and data mining group, name: Elena surname: Baralis Standardization address: Corso Duca degli Abruzzi 24 ZIP: 20129 city: Torino country: Italia name: Elena surname: Baralis Correction address: Corso Duca degli Abruzzi 24 ZIP: 10129 city: Torino country: Italia Adapted from Golfarelli, Rizzi, Data warehouse, teoria e pratica della progettazione, McGraw Hill 2006 DATA WAREHOUSE: ARCHITECTURES AND PROCESSES - 15 Data warehouse loading Database and data mining group, Update propagation to the data warehouse Update order that preserves data integrity 1. dimensions 2. fact tables 3. materialized views and indices Limited time window to perform updates Transactional properties are needed reliability atomicity DATA WAREHOUSE: ARCHITECTURES AND PROCESSES - 16 Pag. 8