Query-Driven Method for Improvement of Data Warehouse Conceptual Model



Darja Solodovnikova, Laila Niedrite, Aivars Niedritis, University of Latvia
International Conference on Information Systems Development 2012, Prato, Italy
This work has been supported by ESF project No. 2009/0216/1DP/1.1.1.2.0/09/APIA/VIAA/044.

Multidimensional Model of Data Warehouses
Data warehouses (DW) are based on multidimensional models, which contain facts (the goal of the analysis), measures (quantitative data), dimensions (qualifying data), and dimension attributes that form classification hierarchies. Approaches to developing conceptual models of a DW are either supply-driven (also known as data-driven) or demand-driven. Demand-driven approaches are named after the requirements elicitation method: user-driven, process-driven, goal-driven, etc.

Data Warehouse Evolution
Data warehouse conceptual models tend to evolve because of changes in the data sources of the DW and in the information requirements of users. In our approach, the data warehouse is part of a data warehouse evolution framework. We use metadata about the existing data warehouse schema, the mappings of its elements to data source elements, and the existing reports defined on the data warehouse schema.

Research problem and our proposal
The problem addressed in our research is how to reconcile new information requirements with the existing conceptual model of the data warehouse. We propose a new type of demand-driven method, the query-driven method. It elicits requirements from existing queries on data sources and their usage statistics, and so combines the needs of users with the existing data supply. The method recommends changes to existing data warehouse schemata, and a new version of the DW can be constructed according to the recommendations.

Related work
Our method for eliciting user information requirements presumes that the queries against the source database reflect the interests and needs of users. Some other works exploit a similar idea, but they aim to improve query performance rather than to obtain a conceptual model, analyze expected queries (documented during interviews) instead of real queries, or analyze only the structure of the queries implemented in source systems, regardless of their real usage. The proposed method takes into account not only the structure but also the usage statistics of queries in source systems, and it is tailored to a definite purpose: to recommend the necessary changes to an existing data warehouse schema.

Example Data Warehouse: the existing version
The data is obtained from the university data management system.

Example
The student status changing process is redesigned, and the source system model is changed (shown in grey). How should the DW model be changed to reflect the analysis needs of users and the new data model of the source system?

Information Sources for Query Usage Study (1)
We examine a source system that collects different logs about function and data usage.
Function usage: logging is invoked when a function is performed. Log format: Function_name, user, date. This log reveals the most frequently used functions; the data manipulation statements used within these functions should then be analyzed.
Data changes: each table has its own trigger for logging insert, update and delete statements. Log format: Table_name, user, date, change_type, change_text. Statistics about changes are computed for each table and attribute.
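As an illustration of how the function usage log could reveal the most frequently used functions, here is a minimal Python sketch. The log rows, function names and user names are invented for the example; only the three-field format (Function_name, user, date) comes from the slide.

```python
from collections import Counter

# Hypothetical log rows in the format: Function_name, user, date
function_log = [
    ("change_student_status", "anna", "2012-03-01"),
    ("print_transcript", "janis", "2012-03-01"),
    ("change_student_status", "anna", "2012-03-02"),
    ("change_student_status", "liga", "2012-03-02"),
]

def most_used_functions(log, n):
    """Return the n most frequently invoked functions with their counts."""
    counts = Counter(function_name for function_name, _user, _date in log)
    return counts.most_common(n)

top_functions = most_used_functions(function_log, 1)
```

The functions ranked highest this way are the ones whose data manipulation statements would be examined first.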

Information Sources for Query Usage Study (2)
Reports are a special kind of function with a wide range of parameters that the user can change. The queries in reports also represent the information requirements of users. The format of the report log is: Function_name, user, date, parameter_text, where parameter_text contains values in the format p1=<value1> ... pn=<valuen>, and p1 ... pn are in the format <table_name.column_name>.
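A possible way to turn the report log into column usage statistics is sketched below in Python. The separator between the parameter pairs is not recoverable from the slide, so a semicolon is assumed here; the log rows and the helper name parse_parameter_text are also hypothetical.

```python
from collections import Counter

def parse_parameter_text(parameter_text, separator=";"):
    """Split 'table.column=value' pairs into a dict.

    The separator is an assumption; the slide does not state it.
    """
    params = {}
    for pair in parameter_text.split(separator):
        pair = pair.strip()
        if not pair:
            continue
        name, _, value = pair.partition("=")
        params[name.strip()] = value.strip()
    return params

# Hypothetical report log rows: Function_name, user, date, parameter_text
report_log = [
    ("status_report", "anna", "2012-03-01",
     "Student_Order.New_Value=active; Type.Name=status"),
    ("status_report", "liga", "2012-03-02", "Type.Name=reason"),
]

# Count how often each table.column is used as a report parameter.
parameter_usage = Counter()
for _function, _user, _date, text in report_log:
    parameter_usage.update(parse_parameter_text(text).keys())
```

Columns that users set most often as report parameters are natural candidates for the usage analysis in the later steps.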

Information Sources for Query Usage Study (3)
The structure of the data manipulation statements used in functions of source systems is examined, and two kinds of information sources can be prepared. For INSERT and UPDATE statements: Function_name, Action_Type, List_of_columns, Where_condition. For SELECT statements: Function_name, Joins, Where_condition, Group_By, Columns_Select, Aggregation.

Query-Driven Method (Step 1): Pre-processing of Information Sources
Two usage tables are constructed. The data modification table contains data about the frequency of INSERT and UPDATE statements performed by functions on tables: Function_name, List_of_columns, Where_condition, count. The data selection table contains data about the frequency of SELECT statements performed by functions on tables: Function_name, Joins, Where_condition, Group_By, Columns_Select, Aggregation, count.
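The construction of a usage table can be sketched as follows, assuming each logged execution has already been decomposed into the statement structure fields. The sample rows and the function name build_usage_table are illustrative and not taken from the presentation.

```python
from collections import Counter

# Hypothetical SELECT executions, each already decomposed into
# (Function_name, Joins, Where_condition, Group_By, Columns_Select, Aggregation).
select_executions = [
    ("status_report", "Student_Order JOIN Type", "Type.Name",
     "Type.Name", "Type.Name", "COUNT(*)"),
    ("status_report", "Student_Order JOIN Type", "Type.Name",
     "Type.Name", "Type.Name", "COUNT(*)"),
    ("order_list", "", "Group_Order.Order_Date",
     "", "Group_Order.Order_Number", ""),
]

def build_usage_table(executions):
    """Collapse identical statement structures into one row with a count."""
    counts = Counter(executions)
    return [structure + (count,) for structure, count in counts.items()]

data_selection_table = build_usage_table(select_executions)
```

The data modification table could be built the same way from the INSERT and UPDATE structures.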

Query-Driven Method (Step 2): Analysis of Data Modification Statements
The data modification table is analysed. For each column, we calculate the number of times it was updated or inserted by summing count over every occurrence of the column in List_of_columns of the data modification table. We choose the Top-N columns that were updated or inserted most often and consider them potential measures. We identify the potential dimension attributes as the Top-N columns used most often in Where_condition in the data modification table. We assume that potential fact measures could be analysed together with potential dimension attributes if they are used together in one data modification statement.
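A minimal Python sketch of this step is given below. It assumes that List_of_columns and Where_condition have already been parsed into lists of referenced columns; the sample rows, the column Student_Id and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical data modification table; Student_Id is an invented column.
data_modification_table = [
    {"Function_name": "change_status",
     "List_of_columns": ["Student_Order.New_Value", "Student_Order.Prev_Value"],
     "Where_condition": ["Student_Order.Student_Id"], "count": 40},
    {"Function_name": "correct_status",
     "List_of_columns": ["Student_Order.New_Value"],
     "Where_condition": ["Student_Order.Student_Id", "Group_Order.Order_Number"],
     "count": 25},
    {"Function_name": "edit_order",
     "List_of_columns": ["Group_Order.Order_Date"],
     "Where_condition": ["Group_Order.Order_Number"], "count": 10},
]

def top_n_columns(table, field, n):
    """Sum count per column occurrence in the given field, keep the top n."""
    totals = Counter()
    for row in table:
        for column in row[field]:
            totals[column] += row["count"]
    return [column for column, _total in totals.most_common(n)]

def co_occurring_pairs(table, measures, attributes):
    """Pair measures and attributes that appear in the same statement."""
    pairs = set()
    for row in table:
        for measure in row["List_of_columns"]:
            for attribute in row["Where_condition"]:
                if measure in measures and attribute in attributes:
                    pairs.add((measure, attribute))
    return pairs

potential_measures = top_n_columns(data_modification_table, "List_of_columns", 2)
potential_attributes = top_n_columns(data_modification_table, "Where_condition", 2)
pairs = co_occurring_pairs(
    data_modification_table, potential_measures, potential_attributes)
```

The value of N is left to the analyst; it is set to 2 here only to keep the example small.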

Query-Driven Method (Step 3): Analysis of Data Selection Statements
The Top-N most frequently used columns in Where_condition or Group_By are considered potential dimension attributes. The Top-N columns used most often in Aggregation are considered potential measures. Several potential measures belonging to one source table are united into a potential data warehouse fact table, and several potential attributes belonging to one source table are united into a potential data warehouse dimension table.
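The following Python sketch shows one way this step could be carried out. It assumes the fields of the data selection table have been parsed into lists of table.column names, and it counts a column once per statement even if it appears in both Where_condition and Group_By (a design choice of this example, not stated in the slides). The column Credit_Points and all function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical data selection table; Credit_Points is an invented column.
data_selection_table = [
    {"Where_condition": ["Type.Name"],
     "Group_By": ["Group_Order.Order_Date"],
     "Aggregation": ["Student_Order.Credit_Points"], "count": 30},
    {"Where_condition": ["Group_Order.Order_Number", "Type.Name"],
     "Group_By": [],
     "Aggregation": ["Student_Order.Credit_Points"], "count": 20},
    {"Where_condition": ["Student_Order.New_Value"],
     "Group_By": ["Student_Order.New_Value"],
     "Aggregation": [], "count": 5},
]

def top_n(table, fields, n):
    """Rank columns by summed count over the given fields."""
    totals = Counter()
    for row in table:
        # A set counts a column once per statement across all fields.
        columns = set()
        for field in fields:
            columns.update(row[field])
        for column in columns:
            totals[column] += row["count"]
    return [column for column, _total in totals.most_common(n)]

def group_by_source_table(columns):
    """Unite columns of the same source table into one potential DW table."""
    tables = defaultdict(list)
    for column in columns:
        table_name, column_name = column.split(".", 1)
        tables[table_name].append(column_name)
    return dict(tables)

attributes = top_n(data_selection_table, ["Where_condition", "Group_By"], 3)
measures = top_n(data_selection_table, ["Aggregation"], 1)
potential_dimensions = group_by_source_table(attributes)
potential_facts = group_by_source_table(measures)
```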

Query-Driven Method (Step 4): Conflict Resolution
The same source column can be identified both as a potential fact measure and as a potential dimension attribute. In that case we analyse the data type of the column: if it is numeric, we consider the column a measure; if it is character or date, we treat the column as a dimension attribute.
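The data type rule can be expressed compactly, as in this Python sketch. The type labels ("numeric", "character", "date"), the sample columns and the function name are assumptions made for the example; columns of any other type are left in both sets for manual review.

```python
def resolve_conflicts(potential_measures, potential_attributes, column_types):
    """Assign each conflicting column to one role based on its data type."""
    measures = set(potential_measures)
    attributes = set(potential_attributes)
    # The intersection is a new set, so the originals can be modified safely.
    for column in measures & attributes:
        data_type = column_types[column]
        if data_type == "numeric":
            attributes.discard(column)   # numeric: keep as measure
        elif data_type in ("character", "date"):
            measures.discard(column)     # character or date: keep as attribute
    return measures, attributes

# Hypothetical input; Credit_Points is an invented column.
column_types = {
    "Student_Order.Credit_Points": "numeric",
    "Group_Order.Order_Date": "date",
    "Type.Name": "character",
}
measures, attributes = resolve_conflicts(
    ["Student_Order.Credit_Points", "Group_Order.Order_Date"],
    ["Student_Order.Credit_Points", "Group_Order.Order_Date", "Type.Name"],
    column_types,
)
```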

Query-Driven Method (Step 5): Generation of Potential Improvements
The potential data warehouse schema elements are compared with the existing data warehouse schema, using the mappings between source table columns and existing data warehouse elements. Potential data warehouse schema elements that do not yet exist are recommended for addition to the existing data warehouse schema. The schema changes supported by the data warehouse evolution framework are: add a new dimension attribute or fact measure; add a new dimension or fact table; connect a dimension to a fact table; add a new dimension hierarchy or a new hierarchy level. Recommended changes should be accepted by the DW administrator.
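A simplified Python sketch of the comparison is shown below. It covers only the recommendation of new attributes and measures, and it assumes the mappings are available as a dictionary from source column to existing data warehouse element. The mapping contents, the sample elements and the function name are hypothetical.

```python
# Hypothetical mappings: source column -> existing DW element.
mappings = {
    "Student.Status": "Student_Dim.Status",
}

# Potential elements found in the earlier steps: (source column, role).
potential_elements = [
    ("Student.Status", "dimension attribute"),
    ("Group_Order.Order_Date", "dimension attribute"),
    ("Student_Order.Credit_Points", "fact measure"),
]

def recommend_additions(potential_elements, mappings):
    """Recommend only elements whose source column is not yet mapped."""
    return [
        f"add a new {role} for {column}"
        for column, role in potential_elements
        if column not in mappings
    ]

recommendations = recommend_additions(potential_elements, mappings)
```

The resulting list is only a proposal; in the method each recommendation still has to be accepted by the DW administrator.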

Example (1)
The proposed method was applied to the student status change data warehouse and the student management system. The student status changing process was logged and analyzed. The following columns are encountered frequently in Where_Condition, Group_By and Columns_Select of the data selection table: the attribute Name of the table Type; Order_Date and Order_Number of the table Group_Order; Prev_Value and New_Value of the table Student_Order. These columns are considered potential dimension attributes.

Example (2)
The proposed method recommends extending the data warehouse schema with the dimensions Group_Order, Student_Order and Type.

Example (3)
After manual revision of the recommended improvements, it was decided to unite the dimensions Group_Order, Student_Order and Type into one dimension Order with the following attributes: Order_Date, Order_Number, Prev_Value, New_Value. The column Name of the table Type was added as three attributes of the dimension Order: Order_Type, Reason_Type and Status_Type.

Conclusions and Future Work
The proposed method makes it possible to keep track of the actual analysis tasks performed in operational data sources and to adjust the existing data warehouse schema according to the most frequent queries. The method is used to recommend necessary changes for a new data warehouse schema version. A data warehouse schema also serves as a representation of the existing data warehouse analysis needs. Future work: we are working on applying the method to more complicated cases and on evaluating the constructed models, to confirm that the new version of the data warehouse is adjusted to real analysis needs.

Thank you!