Advanced Data Management Technologies



Similar documents
Data Warehousing and Data Mining Introduction

Data Warehousing and Data Mining

A Survey on Data Warehouse Architecture

An Introduction to Data Warehousing. An organization manages information in two dominant forms: operational systems of

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( 28

Data Warehouse: Introduction

DATA WAREHOUSING AND OLAP TECHNOLOGY

Enterprise Solutions. Data Warehouse & Business Intelligence Chapter-8

Week 3 lecture slides

Why Business Intelligence

Data Warehouse Design

Overview. DW Source Integration, Tools, and Architecture. End User Applications (EUA) EUA Concepts. DW Front End Tools. Source Integration

Bussiness Intelligence and Data Warehouse. Tomas Bartos CIS 764, Kansas State University

DATA WAREHOUSE CONCEPTS DATA WAREHOUSE DEFINITIONS

Data warehouse and Business Intelligence Collateral

Data Warehousing Systems: Foundations and Architectures

Dimensional Modeling for Data Warehouse

Data Mining for Successful Healthcare Organizations

Introduction to Datawarehousing

Data Warehousing. Overview, Terminology, and Research Issues. Joachim Hammer. Joachim Hammer

An Overview of Data Warehousing, Data mining, OLAP and OLTP Technologies

CHAPTER SIX DATA. Business Intelligence The McGraw-Hill Companies, All Rights Reserved

Data Warehouse Overview. Srini Rengarajan

MDM and Data Warehousing Complement Each Other

Data Warehousing. Jens Teubner, TU Dortmund Winter 2015/16. Jens Teubner Data Warehousing Winter 2015/16 1

Part 22. Data Warehousing

Lection 3-4 WAREHOUSING

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

Jagir Singh, Greeshma, P Singh University of Northern Virginia. Abstract

OLAP Theory-English version

OLAP. Business Intelligence OLAP definition & application Multidimensional data representation

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

Understanding Data Warehousing. [by Alex Kriegel]

Class News. Basic Elements of the Data Warehouse" 1/22/13. CSPP 53017: Data Warehousing Winter 2013" Lecture 2" Svetlozar Nestorov" "

Sizing Logical Data in a Data Warehouse A Consistent and Auditable Approach

14. Data Warehousing & Data Mining

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

Fluency With Information Technology CSE100/IMT100

Data warehouse Architectures and processes

IST722 Data Warehousing

Enterprise Data Warehouse (EDW) UC Berkeley Peter Cava Manager Data Warehouse Services October 5, 2006

Data Warehousing and Data Mining in Business Applications

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence

Introduction to Data Warehousing. Ms Swapnil Shrivastava

ETL and Advanced Modeling

Data warehouse life-cycle and design

Chapter 3 - Data Replication and Materialized Integration

Turkish Journal of Engineering, Science and Technology

ETL-EXTRACT, TRANSFORM & LOAD TESTING

Information assets are immensely valuable to any enterprise, and because of this,

When to consider OLAP?

COURSE OUTLINE. Track 1 Advanced Data Modeling, Analysis and Design

Meta-data and Data Mart solutions for better understanding for data and information in E-government Monitoring

Data Warehousing and Decision Support. Torben Bach Pedersen Department of Computer Science Aalborg University

A Review of Data Warehousing and Business Intelligence in different perspective

Business Intelligence. 1. Introduction September, 2013.

New Approach of Computing Data Cubes in Data Warehousing

Data Warehousing and OLAP Technology for Knowledge Discovery

Driving Peak Performance IBM Corporation

Application Of Business Intelligence In Agriculture 2020 System to Improve Efficiency And Support Decision Making in Investments.

Week 13: Data Warehousing. Warehousing

DATA WAREHOUSING APPLICATIONS: AN ANALYTICAL TOOL FOR DECISION SUPPORT SYSTEM

Business Intelligence Solutions. Cognos BI 8. by Adis Terzić

Republic Polytechnic School of Information and Communications Technology C355 Business Intelligence. Module Curriculum

Overview of Data Warehousing and OLAP

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

B.Sc (Computer Science) Database Management Systems UNIT-V

Designing a Dimensional Model

SAS BI Course Content; Introduction to DWH / BI Concepts

Data Mart/Warehouse: Progress and Vision

CHAPTER 4 Data Warehouse Architecture

Hybrid Support Systems: a Business Intelligence Approach

SENG 520, Experience with a high-level programming language. (304) , Jeff.Edgell@comcast.net

<Insert Picture Here> Extending Hyperion BI with the Oracle BI Server

A Comparative Study on Operational Database, Data Warehouse and Hadoop File System T.Jalaja 1, M.Shailaja 2

Data Warehouse Snowflake Design and Performance Considerations in Business Analytics

CHAPTER 3. Data Warehouses and OLAP

Moving Large Data at a Blinding Speed for Critical Business Intelligence. A competitive advantage

Indexing Techniques for Data Warehouses Queries. Abstract

Data Warehousing: A Technology Review and Update Vernon Hoffner, Ph.D., CCP EntreSoft Resouces, Inc.

Lecture 2: Introduction to Business Intelligence. Introduction to Business Intelligence

Outline. Data Warehousing. What is a Warehouse? What is a Warehouse?

Paper DM10 SAS & Clinical Data Repository Karthikeyan Chidambaram

BUILDING A HEALTH CARE DATA WAREHOUSE FOR CANCER DISEASES

Chapter 5. Learning Objectives. DW Development and ETL

Data Warehousing and OLAP

Multi-dimensional index structures Part I: motivation

Presented by: Jose Chinchilla, MCITP

Transcription:

ADMT 2015/16 Unit 2 J. Gamper 1/44 Advanced Data Management Technologies Unit 2 Basic Concepts of BI and Data Warehousing J. Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE Acknowledgements: I am indebted to Michael Böhlen and Stefano Rizzi for providing me their slides, upon which these lecture notes are based.

ADMT 2015/16 Unit 2 J. Gamper 2/44 Outline 1 Definition of Data Warehouse 2 Multidimensional Model 3 Methodological Framework to Build a DW 4 DW Project Management

Definition of Data Warehouse ADMT 2015/16 Unit 2 J. Gamper 3/44 Outline 1 Definition of Data Warehouse 2 Multidimensional Model 3 Methodological Framework to Build a DW 4 DW Project Management

Definition of Data Warehouse BI: Key Problems 1 Complex and unusable models in operational systems Many DB models are difficult to understand DB models do not focus on a single clear business purpose 2 Same data found in many different systems Examples: customer data in many different systems, residential address of citizens in many public administration DBs, etc. The same concept is defined and stored differently 3 Data is suited only for operational systems Accounting, billing, etc. Do not support analysis across business functions 4 Data quality is bad Missing data, imprecise data, different use of systems 5 Data are volatile Data deleted in operational systems (6 months) Data change over time no historical information ADMT 2015/16 Unit 2 J. Gamper 4/44

Definition of Data Warehouse ADMT 2015/16 Unit 2 J. Gamper 5/44 BI: Solution A new analysis environment with a data warehouse at the core, where data is integrated (logically and physically), subject oriented (versus function oriented), supporting management decisions (different from organization), stable (data is not deleted, several versions), time variant (data can always be related to time).

Definition of Data Warehouse ADMT 2015/16 Unit 2 J. Gamper 6/44 Definition of a Data Warehouse [Barry Devlin, IBM] A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.

Definition of Data Warehouse ADMT 2015/16 Unit 2 J. Gamper 7/44 Definition of a Data Warehouse [William H. Inmon] A data warehouse is a subject-oriented, integrated, time-varying, and non-volatile collection of data that is used primarily in organizational decision making.

Definition of Data Warehouse ADMT 2015/16 Unit 2 J. Gamper 8/44 Requirements for a DW Architecture Separation: Analytical and transaction processing should be kept apart as much as possible. Scalability: HW and SW should be easy to upgrade as the data volume and the number of user requirements increase. Extensibility: Should be possible to host new applications and technologies without redesigning the whole system. Security: Monitoring accesses is essential because of the strategic data stored in DW. Administerability: Administration not too difficult.

Definition of Data Warehouse ADMT 2015/16 Unit 2 J. Gamper 9/44 Single-layer DW Architecture [Golfarelli & Rizzi] Only source layer is physical DW exists only virtually as view Not frequently used in practice + Mimimizes amount of stored data No separation between analytical and transactional processing, hence queries affect regular workload No additional data can be stored

Definition of Data Warehouse ADMT 2015/16 Unit 2 J. Gamper 10/44 Two-layer DW Architecture [Golfarelli & Rizzi] Source layer and DW exist physically clear separation Data staging: extraction, transformation, and cleaning of data (Primary) DW can be source for data marts + Clearly separates analytical and transactional processing, hence queries do not affect regular workload + DW is structured according to multidimensional model + DW is accessible, even if source systems are unavailable

Definition of Data Warehouse ADMT 2015/16 Unit 2 J. Gamper 11/44 Data Mart (DM) A data mart is a subset or an aggregation of the data stored in a primary DW. It includes a set of information pieces relevant to a specific business area, corporate department, or category of users. DMs are typically populated from a primary DW (dependent DM) Might also be populated direclty by data sources (independent DM) Used as building blocks in incremental DW design Can deliver better performance

Definition of Data Warehouse ADMT 2015/16 Unit 2 J. Gamper 12/44 Three-layer DW Architecture [Golfarelli & Rizzi] Reconciled layer (in addition to source and DW layer): Materialization of integrated, clean and consistent operational data + Reconciled layer is common reference data model for whole enterprise, i.e., single, detailed, comprehensive, and top-quality data source + Separates source data extraction from DW population More data redundancy

Definition of Data Warehouse ADMT 2015/16 Unit 2 J. Gamper 13/44 Three-layer DW Architecture [Kimball]

Definition of Data Warehouse ADMT 2015/16 Unit 2 J. Gamper 14/44 Data Integration Two different ways to integrate the data from the sources: Query-driven Warehouse-driven

Definition of Data Warehouse ADMT 2015/16 Unit 2 J. Gamper 15/44 Query-driven Data Integration Data is integrated on demand (lazy) PROS Access to most up-to-date data (all source data directly available) No duplication of data CONS Delay in query processing due to slow (or currently unavailable) information sources complex filtering and integration Inefficient and expensive for frequent queries Competes with local processing at sources Data loss at the sources (e.g., historical data) cannot be recovered Has not caught on in industry

Definition of Data Warehouse ADMT 2015/16 Unit 2 J. Gamper 16/44 Warehouse-driven Data Integration Data is integrated in advance (eager) Data is stored in DW for querying and analysis PROS High query performance Does not interfere with local processing at sources Assumes that DW update is possible during downtime of local processing Complex queries are run at the DW OLTP queries are run at the source systems CONS Duplication of data The most current source data is not available Has caught on in industry

Multidimensional Model ADMT 2015/16 Unit 2 J. Gamper 17/44 Outline 1 Definition of Data Warehouse 2 Multidimensional Model 3 Methodological Framework to Build a DW 4 DW Project Management

Multidimensional Model ADMT 2015/16 Unit 2 J. Gamper 18/44 Multidimensional Model/1 Key for representing and querying information in a DW. Stores information about enterprise-specific facts that affect decisions The occurrence of a fact is often called event Facts are characterized by measures: numerical values that provide a quantitative description of events dimensional attributes: provide different perspectives for analyzing the facts (Data) Cube: metaphor to store facts/events in an n-dimensional space cells store measures axes store dimensions

Multidimensional Model ADMT 2015/16 Unit 2 J. Gamper 19/44 Multidimensional Model/2 Multidimensional model is very easy to use and understand (main reason for success) Examples of typical queries in DW What is the total amount of receipts recorded last year per state and per product category? What is the relationship between the trend of PC manufacturers shares and quarter gains over the last five years? Which orders maximize receipts? What is the relationship between profit gained by the shipments consisting of less than 10 items and the profit gained by the shipments of more than 10 items? There are a number of operators that allow to answer such queries easily ( OLAP)

Multidimensional Model ADMT 2015/16 Unit 2 J. Gamper 20/44 Online Analytical Processing (OLAP) OLAP is an approach to answer multi-dimensional analytical queries swiftly Essentially aggregations along different dimensions Slicing and dicing Aggregation

Multidimensional Model ADMT 2015/16 Unit 2 J. Gamper 21/44 OLTP versus OLAP/1 Different query types in operational systems and DW OLTP in operational DB OLAP in DW On-Line Transaction Processing (OLTP) Many small queries on a small number of tuples from many tables that need to be joined Frequent updates The system is always available for both updates and reads Smaller data volume (few historical data) Complex data model (normalized) On-Line Analytical Processing (OLAP) Fewer, but bigger queries that typically need to scan a huge amount of records and doing some aggregation Frequent reads, in-frequent updates (daily, weekly) 2-phase operation: either reading or updating Larger data volumes (collection of historical data) Simple data model (multidimensional/de-normalized)

Multidimensional Model ADMT 2015/16 Unit 2 J. Gamper 22/44 OLTP versus OLAP/2 A mix of analytical queries (OLAP) with transactional routine queries (OLTP) inevitably slows down the system This does not meet the needs of users of both types of queries Separate OLAP from OLTP by Creating a new repository (DW) that integrates data from various sources Makes data available for analysis and evaluation aimed at decision-making processes

Multidimensional Model ADMT 2015/16 Unit 2 J. Gamper 23/44 OncoNet Example OncoNet is a (small) system for the management of patients undergoing a cancer therapy used in the Hospital of Meran > 200 tables Well-suited for daily management of patients But: statistical analysis are expensive takes up to 12 hours tables are locked for that time run queries over weekend A DW approach reduced the runtime of the same queries to a few seconds (BSc-thesis of A. Heinisch)

Multidimensional Model ADMT 2015/16 Unit 2 J. Gamper 24/44 OLTP versus OLAP/3

Multidimensional Model ADMT 2015/16 Unit 2 J. Gamper 25/44 OLTP versus OLAP/4 OLTP is function-oriented, OLAP subject-oriented

Methodological Framework to Build a DW ADMT 2015/16 Unit 2 J. Gamper 26/44 Outline 1 Definition of Data Warehouse 2 Multidimensional Model 3 Methodological Framework to Build a DW 4 DW Project Management

Methodological Framework to Build a DW ADMT 2015/16 Unit 2 J. Gamper 27/44 Methodological Framework Building a DW is a very complex task It requires an accurate planning aimed at devising satisfactory answers to organizational and architectural questions A large number of organizations lack experience and skills that are required to meet the challenges involved in DW projects Reports of DW project failures state that a major cause lies in the absence of a global view of the design process, i.e., absence of a design methodology

Methodological Framework to Build a DW ADMT 2015/16 Unit 2 J. Gamper 28/44 Many Ways not to Do/1

Methodological Framework to Build a DW ADMT 2015/16 Unit 2 J. Gamper 29/44 Many Ways not to Do/2

Methodological Framework to Build a DW ADMT 2015/16 Unit 2 J. Gamper 30/44 Top-Down Approach Top-down approach: Analyze global business needs, plan how to develop a data warehouse, design it, and implement it as a whole Looks promising as it is based on a global picture of the goal to achieve, and in principle it ensures consistent, well integrated DW High cost estimates with long-term implementations discourage company managers Analyzing/integrating all relevant sources at the same time is a very difficult task, even because it is not very likely that they are all available and stable at the same time. It is extremely difficult to forecast the specific needs of every department, which can result in the analysis process coming to a standstill. Since no working system is delivered in the short term, users cannot check for this project to be useful, so they lose trust and interest in it.

Methodological Framework to Build a DW ADMT 2015/16 Unit 2 J. Gamper 31/44 Bottom-Up Approach Bottom-up approach: DW is incrementally built by iteratively creating several data marts Each data mart is based on a set of facts that are linked to a specific department and that can be interesting for a user group Leads to concrete results in a short time Does not require huge investments Enables designers to investigate one area at a time Gives managers a quick feedback about the actual benefits of the system being built Keeps the interest for the project constantly high May determine a partial vision of the business domain

Methodological Framework to Build a DW ADMT 2015/16 Unit 2 J. Gamper 32/44 The Life-cycle Goal Setting and Planning Set system goals, borders, and size Select an approach for design and ipmlementation Estimate costs and benefits Analyze risks and expectaions Examine the skills of the working team

Methodological Framework to Build a DW ADMT 2015/16 Unit 2 J. Gamper 33/44 The Life-cycle Infrastructure Design Analyze and compare the possible architectural solutions Assess the available technologies and tools Create a preliminary plan of the whole system

Methodological Framework to Build a DW ADMT 2015/16 Unit 2 J. Gamper 34/44 The Life-cycle Design and Development of DMs Every iteration causes a new DM and new applications to be created and progressivley added to the DW system.

Methodological Framework to Build a DW ADMT 2015/16 Unit 2 J. Gamper 35/44 Data Mart Design Phases

Methodological Framework to Build a DW ADMT 2015/16 Unit 2 J. Gamper 36/44 The First Data Mart Is the one playing the most strategic role for the enterprise Should be a backbone for the whole DW Should lean on available and consistent data sources

Methodological Framework to Build a DW ADMT 2015/16 Unit 2 J. Gamper 37/44 Supply- vs. Demand-driven DM Design Supply-driven (data-driven) approach Begin with an analysis of operational data sources User requirements show designers which groups of data should be selected - User requirements play a minor role Demand-driven (requirement-driven) approach Begin with the definition of information requirements of DM users The problem of how to map those requirements into existing data sources is addressed at a later stage + User requirements play a leading role

Methodological Framework to Build a DW ADMT 2015/16 Unit 2 J. Gamper 38/44 Supply-driven DM Design

Methodological Framework to Build a DW ADMT 2015/16 Unit 2 J. Gamper 39/44 Demand-driven DM Design

DW Project Management ADMT 2015/16 Unit 2 J. Gamper 40/44 Outline 1 Definition of Data Warehouse 2 Multidimensional Model 3 Methodological Framework to Build a DW 4 DW Project Management

DW Project Management ADMT 2015/16 Unit 2 J. Gamper 41/44 DW Project Management DW projects are large and different from ordinary SW projects 12-36 months and $1+ million per project Data marts are smaller and safer (bottom up approach) Reasons for failure Lack of proper design methodologies High HW+SW cost Deployment problems (lack of training) Organizational changes are difficult (new processes, data ownership,... ) Ethical issues (security, privacy,... ) Creation of a Business Intelligence Competence Center (BICC) is crucial for success

DW Project Management ADMT 2015/16 Unit 2 J. Gamper 42/44 Business Intelligence Competence Center (BICC)/1 Combines competences from different but crucial sectors Leads and is responsible for the DW project

DW Project Management ADMT 2015/16 Unit 2 J. Gamper 43/44 Business Intelligence Competence Center (BICC)/2 BICC requires a change in the organization (difficult!) No best place, but has strategic importance

ADMT 2015/16 Unit 2 J. Gamper 44/44 Summary A Data Warehouse (DW) is at the core of BI; provides a complete, consistent, subject-oriented and time-varying collection of the data; provides comprehensive knowledge about your business; allows to separte OLAP from OLTP. A good DW is a prerequisite for BI, but a DW is a means rather than a goal... it is only a success if it is heavily used. Single-, two-, and three-layer architectures of DWs Separate OLAP from OLTP by creating a DW Building a DW is a complex task There is a lack of experience and of a methodological framework; Top-down versus Bottom-up design Supply-driven versus demand-driven DM design Life-cycle of DW: goal setting and planning, infrastructure design, iterative design and development of DMs Creation of a Business Intelligence Competence Center is crucial for success.