Terminology and Definitions. Data Warehousing and OLAP. Data Warehouse characteristics. Data Warehouse Types. Typical DW Implementation



Similar documents
Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

14. Data Warehousing & Data Mining

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

Week 3 lecture slides

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

Overview of Data Warehousing and OLAP

CS2032 Data warehousing and Data Mining Unit II Page 1

DATA WAREHOUSING AND OLAP TECHNOLOGY

Data Warehouse: Introduction

DATA WAREHOUSE E KNOWLEDGE DISCOVERY

Data W a Ware r house house and and OLAP II Week 6 1

Week 13: Data Warehousing. Warehousing

(Week 10) A04. Information System for CRM. Electronic Commerce Marketing

Data Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina

Course DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Data Warehousing. Outline. From OLTP to the Data Warehouse. Overview of data warehousing Dimensional Modeling Online Analytical Processing

Outline. Data Warehousing. What is a Warehouse? What is a Warehouse?

LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Part 22. Data Warehousing

Turkish Journal of Engineering, Science and Technology

Data Warehousing and OLAP

When to consider OLAP?

Data Warehousing: Data Models and OLAP operations. By Kishore Jaladi

2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000

Data Warehousing. Overview, Terminology, and Research Issues. Joachim Hammer. Joachim Hammer

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

Lecture Data Warehouse Systems

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing

DATA WAREHOUSING APPLICATIONS: AN ANALYTICAL TOOL FOR DECISION SUPPORT SYSTEM

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

Oracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc.

Data Warehousing and OLAP Technology for Knowledge Discovery

Namrata 1, Dr. Saket Bihari Singh 2 Research scholar (PhD), Professor Computer Science, Magadh University, Gaya, Bihar

Chapter 5. Warehousing, Data Acquisition, Data. Visualization

Database Applications. Advanced Querying. Transaction Processing. Transaction Processing. Data Warehouse. Decision Support. Transaction processing

Lection 3-4 WAREHOUSING

DATA WAREHOUSING - OLAP

Unit -3. Learning Objective. Demand for Online analytical processing Major features and functions OLAP models and implementation considerations

BUSINESS ANALYTICS AND DATA VISUALIZATION. ITM-761 Business Intelligence ดร. สล ล บ ญพราหมณ

UNIT-3 OLAP in Data Warehouse

A Design and implementation of a data warehouse for research administration universities

Data Warehousing, OLAP, and Data Mining

CHAPTER 4 Data Warehouse Architecture

Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Chapter 23, Part A

Hybrid OLAP, An Introduction

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes

OLAP and Data Warehousing! Introduction!

Dimensional Modeling for Data Warehouse

SAS BI Course Content; Introduction to DWH / BI Concepts

Data Warehousing Systems: Foundations and Architectures

New Approach of Computing Data Cubes in Data Warehousing

Concepts of Database Management Seventh Edition. Chapter 9 Database Management Approaches

Fluency With Information Technology CSE100/IMT100

Why Business Intelligence

MDM and Data Warehousing Complement Each Other

Overview. DW Source Integration, Tools, and Architecture. End User Applications (EUA) EUA Concepts. DW Front End Tools. Source Integration

M Designing and Implementing OLAP Solutions Using Microsoft SQL Server Day Course

DATA CUBES E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 DATA CUBES

IBM WebSphere DataStage Online training from Yes-M Systems

A Critical Review of Data Warehouse

OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

OLAP Theory-English version

Data Warehousing. Jens Teubner, TU Dortmund Winter 2015/16. Jens Teubner Data Warehousing Winter 2015/16 1

IST722 Data Warehousing

Evaluation of view maintenance with complex joins in a data warehouse environment (HS-IDA-MD )

CHAPTER 5: BUSINESS ANALYTICS

Data warehousing with PostgreSQL

Business Intelligence: Multidimensional Data Analysis

A Technical Review on On-Line Analytical Processing (OLAP)

1960s 1970s 1980s 1990s. Slow access to

Chapter 3 - Data Replication and Materialized Integration

Foundations of Business Intelligence: Databases and Information Management

Alejandro Vaisman Esteban Zimanyi. Data. Warehouse. Systems. Design and Implementation. ^ Springer

Introduction to Data Warehousing. Ms Swapnil Shrivastava

ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process

Chapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT Learning Objectives

IBM Cognos Training: Course Brochure. Simpson Associates: SERVICE associates.co.uk


Course 6234A: Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services

Overview. Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Data Warehousing. An Example: The Store (e.g.

Multi-dimensional index structures Part I: motivation

Data Warehouses & OLAP

DATA WAREHOUSE CONCEPTS DATA WAREHOUSE DEFINITIONS

Data Mining as Part of Knowledge Discovery in Databases (KDD)

Foundations of Business Intelligence: Databases and Information Management

Introduction to Datawarehousing

Data warehouse Architectures and processes

Foundations of Business Intelligence: Databases and Information Management

On-Line Application Processing. Warehousing Data Cubes Data Mining

Data W a Ware r house house and and OLAP Week 5 1

Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier

Data Warehousing Concepts

Data Warehousing with Oracle

CHAPTER 4: BUSINESS ANALYTICS

Decision Support. Chapter 23. Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1

OLAP. Business Intelligence OLAP definition & application Multidimensional data representation

SQL Server Analysis Services Complete Practical & Real-time Training

SAS Business Intelligence Online Training

Transcription:

Data Warehousing and OLAP Topics Introduction Data modelling in data warehouses Building data warehouses View Maintenance OLAP and data mining Reading Lecture Notes Elmasriand Navathe, Chapter 26 Ozsu and Valduriez, Chapter 16 U. Dayal and S. Chaudhuri. An Overview of Data Warehousing and OLAP Technology. SIGMOD Record 26, 1, (1997) pp 65-74. Terminology and Definitions DW provides access to data for complex analysis, knowledge discovery and decision support High performance demands OLAP - on line analytical support DSS - decision support systems also known as EIS (executive information systems) Traditional Databases support on-line transaction processing (OLTP) - updates, deletions and insertions 1 2 Data Warehouse characteristics Multidimensional conceptual view Unlimited dimensions and aggregation levels Unrestricted cross-dimensional operations Client-server architecture Multi user support Accessibility Transparency Intuitive data manipulation Flexible reporting Scalability Data Warehouse Types Enterprise-wide data warehouses - large projects with massive investment of time and resources Virtual data warehouses - provide views of operational Databases that are materialized for efficiency Data Marts - target a subset of the organization, such as departments 3 4 External Data Sources Operational Data Typical DW Implementation Business Modelling Backbone Networking Extraction/ Transformation/ Scrubbing Data Extraction Data Warehouse Metadata Warehouse Design Warehouse Management Performance Tuning Data Access & Analysis Application Integration Query Design & Management Data Warehousing Environments In a DW, the data that is subject to analysis is decoupled from the data produced at the source. Information in DW can be organised in a form that makes it easy to use for applications. Views: (from simple replication to arbitrarily complex processing). Information is available independently at the availability of the source. The views are materialised. Information is structured and stored as to optimise processing of queries against the DW. A small amount of cooperation is required from the source to keep the warehouse in sync when the sources change. 5 6

Data Modeling for DWs Multidimensional models hyper-cubes if more then 3 dimensions, region Two-dimensional matrix query performance much better then in the relational model. product there are typically three dimensions in corporate DW : fiscal periods, products and regions, 7 8 fiscal period A data cube Multidimensional displays Roll-up Display - moves up the hierarchy, grouping into larger units along a dimension product region for example from days to weeks, from weeks to months, from months to quarters, from quarters to years, etc 9 10 region The roll-up operation Multidimensional displays Drill-down Display - provides a finer grain view by dis -aggregating some dimensions Products 1xx Products 2xx for example from months to weeks, from weeks to days, etc Products nxx 11 12

The drill-down operation Tables in multi-dimensional model P123 styles P124 styles A B C A B C Sub-regions Two types of tables in multi-dimensional model: Dimension table - tuples of attributes of dimension, Fact table - contains measured variable(s) and identifies it (them) with pointers to dimension tables. 13 14 Multi-dimensional schemas Two common multi-dimensional schemas: Star schema - consists of a fact table with a single table for each dimension, Snowflake schema - a variation on the star schema - dimensional tables are organized into a hierarchy by normalizing them, product Prod No Fact table - business results Product Quarter Region A Star Schema Fiscal quarters Qtr Sales revenue for regions Region 15 16 Dimension tables A Snowflake Schema Dimension tables A fact constellation Fiscal qtrs FQ dates Prod. name Product Fact table Fact table 1 Business results Dimension Table Product Fact table 2 Business forecast P. line Sales revenue 17 18

Building a Data Warehouse The data must be extracted from multiple, heterogeneous sources The data must be formatted for consistency The data must be cleaned to ensure the validity The data must be fitted to the DW data model The data must be loaded into the DW Typical Data Warehouse Reports, analysis, strategic planning DW DB1 DB2 DB3 DBi Operational databases/ outside sources, heterogeneous pre-existing databases 19 20 Important Questions How up-to-date must the data be? Can the DW go off-line, and for how long? What are the data interdependencies? What is the storage availability? What are the distribution requirements - replication and partitioning? What is the loading time - including cleaning, formatting, coping, transmitting, index rebuilding? Data Storage Processes Storing the data according to the DW data model Creating and maintaining required data structures Creating and maintaining appropriate access path Supporting the updating of VIEWS Refreshing the data 21 22 Design Considerations Usage projection The fit of the data model Characteristics of available data sources Design of the meta data components Modular component design Design for a change Considerations of distributed and parallel architectures Difficulties with implementation Schema conflicts resolutions The administration of a DW is an intensive enterprise The quality control of the data Correct estimations of usage - optimization issues Selection of highly specialized team 23 24

Query Mediation Traditional, virtual tables A user query is decomposed into sub-queries that are executed by the data sources Answers are based always on current data Query performance (at several sites) for large sets of data is a problem A Query Mediation Architecture User query Virtual views Mediated sub-query Data source 25 26 Monitors Views are materialised Need for a view maintenance The question is: WHERE is the view maintenance performed? In the monitor architecture, the responsibility rests with the sources. Monitors are installed. A few prototypes/products adopted monitors IBM Starburst ConceptBase system Disadvantage is that additional work load is imposed on the data sources A Monitor Architecture User query Materialised views Data source View update Local update Sources are responsible for view refresh. 27 28 Data Warehouse Architecture User query Materialised views Base update Source data Local updates The warehouse is responsible for view refresh View Maintenance in DW Full re-computation DW taken down periodically for scheduled maintenance During this period all the views are re-derived from scratch from the data sources It is frequently accepted as the simplest and safest policy However, for yesterday s data, it is often wasteful, Time limitation for re-computation is often a big problem 29 30

Incremental Maintenance Only parts of DW that change are computed. 1. DW scheduled for maintenance as before but views are incrementally done. All changes made to the data sources during operation hours to be logged. 2. Maintenance can be dynamic, views always reflect fresh data. Problem is efficiency of dynamic maintenance and data quality. Approaches to Incremental View Maintenance Unrestricted base access Self-maintainable DW Run-time DW self-maintainable 31 32 Web Browser Web Server Platform Applications Web Browser The Internet Web Access to DW Web Browser Data Warehouse Management Middleware Metadata Web-Based DW Access How do you hone the environment to provide good performance to large numbers of users? How can you serve the needs of both web and nonweb clients? How can you provide Internet access to internal data warehousing applications without opening a security hole? How can you help users design queries for efficient processing? How do you limit unreasonable queries that would jam the system? 33 34 OLAP and Data Mining Provide a strong decision support environment Data Mining: Characterization of patterns inherent in the data, Development of a hypothesis from these patterns, Prediction of future behaviour. OLAP The first approach consists of the use of relational technology,suitably adapted and extended. The data is stored using tables, but the analysis operations are carried out efficiently using special data structures. This type of system is normally called ROLAP (Relational OLAP). The second, more radical, approach consists of storing data directly in a multi- dimensional form, using vector data structures. This type of system is called MOLAP (Multi-dimensional OLAP). Goals of Data Mining Prediction - to show a future potential behavior Identification - data patterns indicate the existence of an item, an event, or an activity Classification - partition the data based on identified combinations of parameters Optimization - optimize the use of limited resources such as time, money, space, material 35 36

Types of knowledge discovered during Data Mining Association rules - correlate the presence of the set of items with another set of values for another set of variables Classification hierarchies - to create a hierarchy of classes Sequential patterns - a sequence of action or events is sought Patterns within time series - similarities detected within positions of the time series Next Lecture Introduction to Business Process Technology 37 38