Index Selection Techniques in Data Warehouse Systems


Aliaksei Holubeu
Written as part of the seminar "Databases and Data Warehouses. Implementation and usage."
Konstanz, June 3, 2005

Contents
1 Data warehouses
  1.1 Data warehouse design
    Logical design
    Physical design
2 Architectural sketch
3 Plan generation
  3.1 Execution plan operators
  3.2 Selection of the execution plan
  3.3 The cost model
4 The index selection algorithm
5 Experimental results
6 Related work
7 Summary
Literature
A Index selection algorithm in pseudo code
B Cost functions

Introduction

This paper gives a brief overview of certain phases and techniques of the data warehouse (DW) design process that aim at optimizing query processing. The main focus is on physical design. An Execution Paths Generator tool and a cost evaluation model are introduced as important components and, finally, a greedy algorithm is presented that selects an optimal set of indexes to be built in a DW implemented on a relational DBMS (RDBMS), subject to a constraint on the disk space devoted to indexing.

1 Data warehouses

A data warehouse [1] is a collection of data from multiple sources, integrated into a common repository and extended by summary information (such as aggregate views), that is used primarily in organizational decision making.

1.1 Data warehouse design

Data warehouse design methods take into account the read-oriented character of warehouse data and enable efficient query processing over huge amounts of data. DW design includes a logical and a physical phase, both aimed at improving system performance. At the logical level, view materialization [2], which strongly impacts performance, takes place. However, indexing techniques, which are considered during physical design together with all the issues related to implementing the DW on a specific DBMS, are of fundamental importance.

Logical design

In DWs implemented on relational DBMSs, the multidimensional view of data is achieved by adopting different schemas. A special type of relational schema, called the star schema (Picture 1), is often used to model the multiple dimensions of warehouse data (in contrast to the two-dimensional representation of normal relational schemas). In this case, the database consists of a central fact table (FT) and several dimension tables (DTs). The FT contains tuples that represent the business facts (measures) to be analyzed, e.g., sales or shipments. Each FT tuple references multiple DT tuples, each representing a dimension of interest such as products, customers, time, region, or salesperson. Dimensions usually have associated hierarchies that specify aggregation levels and hence the granularity at which data are viewed. Since DTs are not normalized, joining the FT with the DTs provides different views (dimensions) of the warehouse data in an efficient way.

Picture 1
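To make the star schema concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (sales, product, customer, time_dim), their columns and the sample rows are illustrative assumptions, not taken from the paper. The dimension tables are deliberately denormalized: a product's category is stored directly in product rather than in a separate table, so one join per dimension is enough to aggregate at the category level.

```python
import sqlite3

# Hypothetical toy star schema: one fact table (sales) and three dimension tables.
SCHEMA = """
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE time_dim (time_id     INTEGER PRIMARY KEY, month TEXT, year INTEGER);
-- Fact table: one foreign key per dimension plus the measure to be analyzed.
CREATE TABLE sales (
    product_id  INTEGER REFERENCES product(product_id),
    customer_id INTEGER REFERENCES customer(customer_id),
    time_id     INTEGER REFERENCES time_dim(time_id),
    amount      REAL,
    PRIMARY KEY (product_id, customer_id, time_id)
);
"""

def build_demo() -> sqlite3.Connection:
    """Create the schema in memory and load a handful of sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                     [(1, "Pen", "Office"), (2, "Desk", "Furniture")])
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                     [(1, "Alice", "North"), (2, "Bob", "South")])
    conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)",
                     [(1, "2005-06", 2005), (2, "2005-07", 2005)])
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 10.0), (2, 1, 1, 200.0),
                      (1, 2, 2, 5.0), (2, 2, 2, 150.0)])
    return conn

# Star join: the FT is joined with its DTs; the DT attributes in GROUP BY
# (category, region) fix the aggregation level of the result.
STAR_QUERY = """
SELECT p.category, c.region, SUM(s.amount) AS total
FROM sales s
JOIN product  p ON s.product_id  = p.product_id
JOIN customer c ON s.customer_id = c.customer_id
JOIN time_dim t ON s.time_id     = t.time_id
WHERE t.year = 2005
GROUP BY p.category, c.region
ORDER BY p.category, c.region
"""

if __name__ == "__main__":
    for row in build_demo().execute(STAR_QUERY):
        print(row)
```

Running the script prints one aggregated row per (category, region) pair, for example ('Furniture', 'North', 200.0).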

One of the most effective ways to minimize query response time during logical design is view materialization. The underlying algorithm, driven by a workload (the set of queries expected to be submitted to the system during operation), selects an optimal set of materialized views. Each view contains aggregated data obtained from the base fact table (FT), which holds the elemental data; the aggregation level characterizing a view consists of attributes of the dimension tables (DTs).

Physical design

Once the logical schema is available, further optimizations can be carried out in the physical design phase. The set of indexes to be built on both FTs and DTs, defined in this phase, is another very important tool for reducing query response times. One possible way to choose this set is to implement an optimizer model capable of determining an execution plan for each query, and then let a greedy algorithm choose the most beneficial indexes with respect to the space constraint. Unlike the previous phases, physical design strongly depends on the features of the specific DBMS: the categories of indexes available, the types of execution plans generated, and the statistics consulted by the optimizer.

2 Architectural sketch

M. Golfarelli, S. Rizzi and E. Saltarelli (DEIS, University of Bologna) proposed the optimization method whose functional architecture is represented in Figure 1. The approach consists of several components which, given some input (e.g., the DW logical schema, the workload, the data volume, system constraints), return some output according to the function they are responsible for (e.g., bound workload, indexable attributes, candidate indexes, optimal indexes). This output either serves as input for other components or is returned as the final solution.
The components involved carry out the following functions:

Aggregate Navigator: given a workload and a logical schema including one or more materialized views, this component selects the best view on which each query should be solved. The aggregate navigator does not usually take indexes into account.
Indexable Attributes Selector: based on the structure of the queries, this component determines which DT attributes could usefully be indexed.
Candidate Indexes Selector: for each indexable attribute, this component evaluates which type of index is the most convenient. The indexes selected by this component, defined by couples (attribute, index type), are called candidate indexes.
Optimal Indexes Selector: this component implements the algorithm that selects the indexes to be created. The optimal index set includes a subset of the candidate indexes on DT attributes as well as all the indexes built on the primary keys of DTs and FTs.
Cost Evaluator: both the Candidate Indexes Selector and the Optimal Indexes Selector need it to evaluate the access cost of each index.
Plan Generator: given a physical schema, a query, and the FT on which the query should be solved, it returns the best execution plan for the query.

Figure 1
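The Indexable Attributes Selector can be sketched as follows, under the simplifying assumption (ours, not the authors') that a DT attribute is indexable when at least one workload query expresses a selection condition on it. The Query class, the Attribute alias and the function name indexable_attributes are hypothetical. Besides the attributes, the function records for each one the set of queries that filter on it, which roughly corresponds to what Section 4 calls the queries in an index's support.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Attribute = Tuple[str, str]  # hypothetical: (dimension table, attribute name)

@dataclass(frozen=True)
class Query:
    """Hypothetical workload query; only its selection conditions are modelled."""
    name: str
    conditions: FrozenSet[Attribute]  # DT attributes the query filters on

def indexable_attributes(workload: Iterable[Query]) -> Dict[Attribute, Set[str]]:
    """Map every conditioned DT attribute to the names of the queries filtering on it."""
    support: Dict[Attribute, Set[str]] = defaultdict(set)
    for q in workload:
        for attr in q.conditions:
            support[attr].add(q.name)
    return dict(support)
```

For example, a workload in which q1 filters on product.category and time_dim.year, q2 only on time_dim.year, and q3 on nothing yields time_dim.year with support {q1, q2} and product.category with support {q1}.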

3 Plan generation

As mentioned before, several components have to be implemented in order to perform optimal indexing. One of them is a rule-based [3] optimizer that estimates the best execution plan for each query according to the view on which the query will be solved.

3.1 Execution plan

A query execution plan is a sequence of elementary operators applied to the physical schema. Each operator (Table scan, Index scan, Table access, Index access, Hash join, Tid intersection) models a function carried out by the DBMS on either tables or indexes; when the operator allows a local predicate to be specified, it can additionally filter its output.

3.2 Selection of the best execution plan

The execution plan for a query q is determined mainly by the number of DTs on which at least one condition is expressed, called conditioned dimension tables (DTc):

No DTc: the FT is sequentially scanned and then joined with all the DTs involved in q through a nested-loop join on their primary key indexes.
Exactly one DTc: the algorithm checks whether there is an index that allows the FT to be accessed through its foreign key referencing the DTc. If so, for each DTc tuple that satisfies the condition, this index and then the FT are accessed. Otherwise, a hybrid hash join between the DTc and the FT is executed. The result is joined with the other DTs requested in the output.
Two or more DTc: for each DTc, the algorithm decides how to carry out the join with the FT. The tid sets obtained from the different DTs are then intersected, and the resulting FT tuples are accessed.
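The case analysis above can be sketched as a small rule-based function that lists, as plain strings, the steps a plan would use to reach the FT. The function name, its arguments (in particular fk_indexed, the set of DTs whose foreign key in the FT carries an index) and the step labels are hypothetical. The final join with the remaining DTs is stated in the text only for the single-DTc case; applying it to the multi-DTc case as well is our assumption.

```python
from typing import Iterable, List, Set

def fact_access_plan(conditioned: Iterable[str], involved: Iterable[str],
                     fk_indexed: Set[str]) -> List[str]:
    """Rule-based sketch of how a query reaches the fact table (FT).

    conditioned -- the DTc: DTs on which the query expresses a condition
    involved    -- every DT the query needs
    fk_indexed  -- DTs whose foreign key in the FT is indexed (hypothetical input)
    """
    dtc_set, dt_set = set(conditioned), set(involved)
    if not dtc_set:
        # No DTc: scan the FT, then join every DT through its primary key index.
        return ["Table scan(FT)"] + [
            f"Nested-loop join(FT, {dt}) via PK index" for dt in sorted(dt_set)]
    if len(dtc_set) == 1:
        (dtc,) = dtc_set
        if dtc in fk_indexed:
            # Probe the FT foreign key index once per qualifying DTc tuple.
            steps = [f"Index access(FT.{dtc}_fk) per qualifying {dtc} tuple",
                     "Table access(FT)"]
        else:
            steps = [f"Hybrid hash join({dtc}, FT)"]
    else:
        # Two or more DTc: one tid set per DTc, intersected before the FT is read.
        steps = [f"Join({dtc}, FT) -> tid set" for dtc in sorted(dtc_set)]
        steps += ["Tid intersection", "Table access(FT)"]
    # Other DTs requested in the output are joined last (assumed also for several DTc).
    return steps + [f"Join(result, {dt})" for dt in sorted(dt_set - dtc_set)]
```

Sorting the table names only makes the output order deterministic; it carries no meaning for the plan itself.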

The number α of DT attributes on which a filter is defined and an index is built drives the choice of the plan as follows:
1. If α = 0, a sequential scan of the DT is executed, applying the filter to each tuple.
2. If α = 1, an index exists on exactly one conditioned attribute. In this case an index scan is executed and, for each tid retrieved, the DT is accessed.
3. If α ≥ 2, all indexes on conditioned attributes are accessed, the resulting tid sets are intersected, further filters are possibly applied, and finally the DT is accessed.

Thus, by means of a set of heuristic rules based on the database structure, the optimizer produces the execution plan without taking statistics into account. Nevertheless, the chosen physical schema is also valid for cost-based DBMSs, since there the statistics on the data are used to further evaluate the cost of an execution plan.

3.3 The cost model

The cost model is used to compare different physical schemas; cost is measured as the number of logical pages that must be read to execute a plan. A cost function is assigned to each operator in the execution plan (Appendix B), except aggregation, which is assumed to have null cost. The cost of the full plan is the sum of the costs of all its operators. Thus, given the information required by the cost model, the total cost of each execution plan can be calculated.
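A minimal sketch of the α rule and of the additive cost model follows. The per-operator page counts stand in for the cost functions of Appendix B, which are not reproduced in this text, so they are supplied as plain inputs. The names Operator, dt_access_operators and plan_cost are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Operator:
    name: str     # e.g. "Table scan", "Index scan", "Tid intersection", "Aggregation"
    pages: float  # logical pages read; stands in for the Appendix B cost function

def dt_access_operators(alpha: int) -> List[str]:
    """Operators used to access a DT, where alpha counts its attributes
    that are both filtered by the query and indexed."""
    if alpha == 0:
        return ["Table scan"]                  # filter applied to every tuple
    if alpha == 1:
        return ["Index scan", "Table access"]  # fetch the DT for each retrieved tid
    # alpha >= 2: scan every relevant index, intersect the tid sets, then fetch
    return ["Index scan"] * alpha + ["Tid intersection", "Table access"]

def plan_cost(plan: List[Operator]) -> float:
    """Plan cost = sum of operator costs; aggregation is assumed to cost nothing."""
    return sum(0.0 if op.name == "Aggregation" else op.pages for op in plan)
```

For instance, a plan consisting of a table scan reading 120 pages, a hash join reading 30 pages and a final aggregation costs 150 pages, whatever page figure is attached to the aggregation.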

4 The index selection algorithm

Due to its high complexity, the index selection problem is usually tackled heuristically. As already stated, the view used to solve each query is selected during logical design, which is carried out without considering indexing issues. Indexable attributes and primary keys of tables are the only elements that may be indexed. Note that indexing an indexable attribute does not necessarily lead to a performance improvement; on the other hand, once an index on an indexable attribute is built, the execution plans of all the queries in its support will use that index. Indexes in the physical schema are independent of each other, but their contribution to the query execution cost depends on the table they are built on.

The index selection algorithm (Appendix A) can be subdivided into three distinct sections:
1. Initialization of the sets of candidate and optimal indexes, as well as of the space available for indexes on attributes other than primary keys.
2. A while loop that carries out a greedy selection of indexes from the set C of candidate indexes for the workload, based on the benefit per index page. If, after a new index is inserted into the set O of optimal indexes, all the prime attributes of a fact table turn out to be indexed, one of these indexes must be transformed into the multiple-attribute index on the FT primary key; the choice is driven by the decay per index page caused by the transformation.
3. Setting up the primary key indexes for the remaining FTs. If, for a given FT, a non-empty set of candidate indexes still exists, the one whose insertion into O as a multiple-attribute index on the primary key is cheapest is chosen. Otherwise, a non-indexable attribute is randomly chosen to build the multiple-attribute index.
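The greedy core of the second section can be sketched as below. This is a simplified illustration, not the authors' algorithm from Appendix A: it omits the transformation into multiple-attribute primary key indexes (the second half of step 2 and all of step 3), and the workload cost evaluator and the index sizes are hypothetical inputs standing in for the Cost Evaluator and the cost model. At every iteration the benefit of each remaining candidate is recomputed against the indexes already chosen, and only candidates that still fit into the remaining space are considered.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple

# A candidate index is a couple (attribute, index type), as defined in Section 2.
Index = Tuple[str, str]

def greedy_index_selection(
    candidates: Set[Index],
    pages: Dict[Index, float],                           # hypothetical: index size in pages
    workload_cost: Callable[[FrozenSet[Index]], float],  # hypothetical cost evaluator
    space: float,                                        # pages available for non-PK indexes
) -> Set[Index]:
    C = set(candidates)      # remaining candidate indexes
    O: Set[Index] = set()    # indexes selected so far
    current = workload_cost(frozenset(O))
    while C:
        best, best_ratio = None, 0.0
        for idx in sorted(C):  # sorted only to make tie-breaking deterministic
            if pages[idx] > space:
                continue       # does not fit in the remaining space
            benefit = current - workload_cost(frozenset(O | {idx}))
            ratio = benefit / pages[idx]  # benefit per index page
            if ratio > best_ratio:
                best, best_ratio = idx, ratio
        if best is None:
            break              # nothing fits, or nothing reduces the cost further
        O.add(best)
        C.remove(best)
        space -= pages[best]
        current = workload_cost(frozenset(O))
    return O
```

With three candidates of 10, 20 and 30 pages saving 100, 300 and 200 pages of workload cost respectively, and 35 pages of space, the sketch first picks the 20-page index (15 pages saved per index page), then the 10-page one, and stops because the 30-page index no longer fits.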

5 Experimental results

The approach has been tested on the TPC-H benchmark: 20 GPSJ (generalized projection, selection, join) queries inspired by those in the benchmark were executed, varying both their selectivity and the storage available for materialized views and indexes. The results show that indexing can considerably reduce the workload execution cost.

Views space constraint   Basic indexing space   Sel=0.1%   Sel=2%   Sel=10%
100 Mb                   190 Mb                 43.99%     1.43%    0.01%
300 Mb                   198 Mb                 41.06%     1.52%    0.01%
500 Mb                   226 Mb                 41.01%     1.40%    0.01%
Table 1

Table 1 shows how the space required for full indexing shrinks as selectivity becomes lower, since the average utility of indexes decreases. Remarkably, the indexes created are always used by the DBMS, and each index actually reduces the overall execution cost. Moreover, there is a correlation between the workload selectivity and the best trade-off between the space used for views and the space used for indexing. Figure 2 shows the workload costs for different ratios between the space constraints on views and on indexes (S/VS).

Figure 2

High selectivity clearly encourages indexing, while at low selectivity view materialization is more convenient.

6 Related work

Only a few works in the literature focus on the selection of indexes for DWs. In [6] the authors propose both an optimal algorithm and a set of rules of thumb to be adopted when the problem size is intractable. These rules, justified by the adoption of appropriate cost functions, state that indexes should be created on keys and on attributes involved in joins, as well as when their size fits into main memory. In [3] the problem of simultaneously choosing views and B+-tree indexes is investigated; the linear cost function adopted is very simple, and no specific optimizer model is considered.

7 Summary

Implementing a complete DW requires integrating a set of tools into a concrete warehousing solution. This paper concentrated on the most important level of the relational database modeling cycle, DW design, with the main focus on its physical phase. A heuristic approach to the index selection problem in a data warehouse with materialized views was presented; its experimental performance evaluation on TPC-H benchmark data demonstrated sound improvements. However, the approach still has to be extended to other types of indexes and join algorithms.

Literature

[1] R. Kimball. The Data Warehouse Toolkit. John Wiley & Sons.
[2] Informix. Administrator's Guide, Informix Red Brick Decision Server, Version 6.0. November.
[3] M. Golfarelli, S. Rizzi, E. Saltarelli. Index selection techniques in data warehouse systems. D2.R5, 4 February 2002.
[4] G. Graefe. Query Evaluation Techniques for Large Databases. ACM Computing Surveys, 25(2):73-170, June.
[5] A. Gupta, V. Harinarayan, D. Quass. Aggregate-Query Processing in Data Warehousing Environments. In Proc. 21st VLDB, Zurich, Switzerland, 1995.

A Index selection algorithm in pseudo code

B Cost functions


More information

Lection 3-4 WAREHOUSING

Lection 3-4 WAREHOUSING Lection 3-4 DATA WAREHOUSING Learning Objectives Understand d the basic definitions iti and concepts of data warehouses Understand data warehousing architectures Describe the processes used in developing

More information

Query Optimization Approach in SQL to prepare Data Sets for Data Mining Analysis

Query Optimization Approach in SQL to prepare Data Sets for Data Mining Analysis Query Optimization Approach in SQL to prepare Data Sets for Data Mining Analysis Rajesh Reddy Muley 1, Sravani Achanta 2, Prof.S.V.Achutha Rao 3 1 pursuing M.Tech(CSE), Vikas College of Engineering and

More information

Logical Design of Data Warehouses from XML

Logical Design of Data Warehouses from XML Logical Design of Data Warehouses from XML Marko Banek, Zoran Skočir and Boris Vrdoljak FER University of Zagreb, Zagreb, Croatia {marko.banek, zoran.skocir, boris.vrdoljak}@fer.hr Abstract Data warehouse

More information

Using the column oriented NoSQL model for implementing big data warehouses

Using the column oriented NoSQL model for implementing big data warehouses Int'l Conf. Par. and Dist. Proc. Tech. and Appl. PDPTA'15 469 Using the column oriented NoSQL model for implementing big data warehouses Khaled. Dehdouh 1, Fadila. Bentayeb 1, Omar. Boussaid 1, and Nadia

More information

Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.

Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc. Oracle BI EE Implementation on Netezza Prepared by SureShot Strategies, Inc. The goal of this paper is to give an insight to Netezza architecture and implementation experience to strategize Oracle BI EE

More information

Data Warehousing. Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de. Winter 2015/16. Jens Teubner Data Warehousing Winter 2015/16 1

Data Warehousing. Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de. Winter 2015/16. Jens Teubner Data Warehousing Winter 2015/16 1 Jens Teubner Data Warehousing Winter 2015/16 1 Data Warehousing Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Winter 2015/16 Jens Teubner Data Warehousing Winter 2015/16 13 Part II Overview

More information

COST-EFFECTIVE DATA ALLOCATION IN DATA WAREHOUSE STRIPING

COST-EFFECTIVE DATA ALLOCATION IN DATA WAREHOUSE STRIPING COST-EFFECTIVE DATA ALLOCATION IN DATA WAREHOUSE STRIPING Raquel Almeida 1, Jorge Vieira 2, Marco Vieira 1, Henrique Madeira 1 and Jorge Bernardino 3 1 CISUC, Dept. of Informatics Engineering, Univ. of

More information

DBMS / Business Intelligence, SQL Server

DBMS / Business Intelligence, SQL Server DBMS / Business Intelligence, SQL Server Orsys, with 30 years of experience, is providing high quality, independant State of the Art seminars and hands-on courses corresponding to the needs of IT professionals.

More information

Data warehouse design

Data warehouse design DataBase and Data Mining Group of DataBase and Data Mining Group of DataBase and Data Mining Group of Database and data mining group, Data warehouse design DATA WAREHOUSE: DESIGN - 1 Risk factors Database

More information

Investigating the Effects of Spatial Data Redundancy in Query Performance over Geographical Data Warehouses 1

Investigating the Effects of Spatial Data Redundancy in Query Performance over Geographical Data Warehouses 1 Investigating the Effects of Spatial Data Redundancy in Query Performance over Geographical Data Warehouses 1 Thiago Luís Lopes Siqueira 1, Ricardo Rodrigues Ciferri 1, Valéria Cesário Times 2, Cristina

More information

INTEROPERABILITY IN DATA WAREHOUSES

INTEROPERABILITY IN DATA WAREHOUSES INTEROPERABILITY IN DATA WAREHOUSES Riccardo Torlone Roma Tre University http://torlone.dia.uniroma3.it/ SYNONYMS Data warehouse integration DEFINITION The term refers to the ability of combining the content

More information

www.gr8ambitionz.com

www.gr8ambitionz.com Data Base Management Systems (DBMS) Study Material (Objective Type questions with Answers) Shared by Akhil Arora Powered by www. your A to Z competitive exam guide Database Objective type questions Q.1

More information

Designing and Using Views To Improve Performance of Aggregate Queries

Designing and Using Views To Improve Performance of Aggregate Queries Designing and Using Views To Improve Performance of Aggregate Queries Foto Afrati 1, Rada Chirkova 2, Shalu Gupta 2, and Charles Loftis 2 1 Computer Science Division, National Technical University of Athens,

More information

Indexing Techniques in Data Warehousing Environment The UB-Tree Algorithm

Indexing Techniques in Data Warehousing Environment The UB-Tree Algorithm Indexing Techniques in Data Warehousing Environment The UB-Tree Algorithm Prepared by: Yacine ghanjaoui Supervised by: Dr. Hachim Haddouti March 24, 2003 Abstract The indexing techniques in multidimensional

More information

Lecture Data Warehouse Systems

Lecture Data Warehouse Systems Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART A: Architecture Chapter 1: Motivation and Definitions Motivation Goal: to build an operational general view on a company to support decisions in

More information

Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC 10.1.3.4.1

Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC 10.1.3.4.1 Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC 10.1.3.4.1 Mark Rittman, Director, Rittman Mead Consulting for Collaborate 09, Florida, USA,

More information

When to consider OLAP?

When to consider OLAP? When to consider OLAP? Author: Prakash Kewalramani Organization: Evaltech, Inc. Evaltech Research Group, Data Warehousing Practice. Date: 03/10/08 Email: erg@evaltech.com Abstract: Do you need an OLAP

More information

Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices

Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices Proc. of Int. Conf. on Advances in Computer Science, AETACS Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices Ms.Archana G.Narawade a, Mrs.Vaishali Kolhe b a PG student, D.Y.Patil

More information

The Methodology Behind the Dell SQL Server Advisor Tool

The Methodology Behind the Dell SQL Server Advisor Tool The Methodology Behind the Dell SQL Server Advisor Tool Database Solutions Engineering By Phani MV Dell Product Group October 2009 Executive Summary The Dell SQL Server Advisor is intended to perform capacity

More information

IMPLEMENTING SPATIAL DATA WAREHOUSE HIERARCHIES IN OBJECT-RELATIONAL DBMSs

IMPLEMENTING SPATIAL DATA WAREHOUSE HIERARCHIES IN OBJECT-RELATIONAL DBMSs IMPLEMENTING SPATIAL DATA WAREHOUSE HIERARCHIES IN OBJECT-RELATIONAL DBMSs Elzbieta Malinowski and Esteban Zimányi Computer & Decision Engineering Department, Université Libre de Bruxelles 50 av.f.d.roosevelt,

More information

www.ijreat.org Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 1

www.ijreat.org Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 1 Data Warehouse Security Akanksha 1, Akansha Rakheja 2, Ajay Singh 3 1,2,3 Information Technology (IT), Dronacharya College of Engineering, Gurgaon, Haryana, India Abstract Data Warehouses (DW) manage crucial

More information

Heuristics for Selecting Robust Database Structures with Dynamic Query Patterns ABSTRACT

Heuristics for Selecting Robust Database Structures with Dynamic Query Patterns ABSTRACT Heuristics for Selecting Robust Database Structures with Dynamic Query Patterns Andrew N. K. Chen School of Accountancy and Information Management College of Business Arizona State University Tempe, Arizona

More information

Data Warehousing and OLAP: Improving Query Performance Using Distributed Computing

Data Warehousing and OLAP: Improving Query Performance Using Distributed Computing Data Warehousing and OLAP: Improving Query Performance Using Distributed Computing Jorge Bernardino Instituto Superior de Engenharia de Coimbra (ISEC) Dept. Eng. Informática e de Sistemas Coimbra, Portugal

More information

My Favorite Issues in Data Warehouse Modeling

My Favorite Issues in Data Warehouse Modeling University of Münster My Favorite Issues in Data Warehouse Modeling Jens Lechtenbörger University of Münster & ERCIS, Germany http://dbms.uni-muenster.de Context Data Warehouse (DW) modeling ETL design

More information

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

More information

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya Chapter 6 Basics of Data Integration Fundamentals of Business Analytics Learning Objectives and Learning Outcomes Learning Objectives 1. Concepts of data integration 2. Needs and advantages of using data

More information

Data Management in Forecasting Systems: Case Study Performance Problems and Preliminary Results

Data Management in Forecasting Systems: Case Study Performance Problems and Preliminary Results Data Management in Forecasting Systems: Case Study Performance Problems and Preliminary Results Haitang Feng 1,2, Nicolas Lumineau 1, Mohand-Saïd Hacid 1, and Richard Domps 2 1 Université de Lyon, CNRS

More information

Data Warehouse Schema Design

Data Warehouse Schema Design Data Warehouse Schema Design Jens Lechtenbörger Dept. of Information Systems University of Münster Leonardo-Campus 3 D-48149 Münster, Germany lechten@wi.uni-muenster.de 1 Introduction A data warehouse

More information

Binary search tree with SIMD bandwidth optimization using SSE

Binary search tree with SIMD bandwidth optimization using SSE Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous

More information

Open Problems in Data Warehousing: 8 Years Later... Stefano Rizzi DEIS - University of Bologna srizzi@deis.unibo.it Summary Archeology The early 90 s Back to 1995 Into 2k At present Achievements Hot issues

More information

Data Warehouse Logical Modeling and Design (6)

Data Warehouse Logical Modeling and Design (6) Data Warehouse Logical Modeling and Design (6) Bernard ESPINASSE Professeur à Aix-Marseille Université (AMU) Ecole Polytechnique Universitaire de Marseille October 2013 Methodological framework Logical

More information

A Framework for Developing the Web-based Data Integration Tool for Web-Oriented Data Warehousing

A Framework for Developing the Web-based Data Integration Tool for Web-Oriented Data Warehousing A Framework for Developing the Web-based Integration Tool for Web-Oriented Warehousing PATRAVADEE VONGSUMEDH School of Science and Technology Bangkok University Rama IV road, Klong-Toey, BKK, 10110, THAILAND

More information

A DATA WAREHOUSE SOLUTION FOR E-GOVERNMENT

A DATA WAREHOUSE SOLUTION FOR E-GOVERNMENT A DATA WAREHOUSE SOLUTION FOR E-GOVERNMENT Xiufeng Liu 1 & Xiaofeng Luo 2 1 Department of Computer Science Aalborg University, Selma Lagerlofs Vej 300, DK-9220 Aalborg, Denmark 2 Telecommunication Engineering

More information

Bitmap Index an Efficient Approach to Improve Performance of Data Warehouse Queries

Bitmap Index an Efficient Approach to Improve Performance of Data Warehouse Queries Bitmap Index an Efficient Approach to Improve Performance of Data Warehouse Queries Kale Sarika Prakash 1, P. M. Joe Prathap 2 1 Research Scholar, Department of Computer Science and Engineering, St. Peters

More information

Why Query Optimization? Access Path Selection in a Relational Database Management System. How to come up with the right query plan?

Why Query Optimization? Access Path Selection in a Relational Database Management System. How to come up with the right query plan? Why Query Optimization? Access Path Selection in a Relational Database Management System P. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, T. Price Peyman Talebifard Queries must be executed and execution

More information

1Z0-117 Oracle Database 11g Release 2: SQL Tuning. Oracle

1Z0-117 Oracle Database 11g Release 2: SQL Tuning. Oracle 1Z0-117 Oracle Database 11g Release 2: SQL Tuning Oracle To purchase Full version of Practice exam click below; http://www.certshome.com/1z0-117-practice-test.html FOR Oracle 1Z0-117 Exam Candidates We

More information

How To Evaluate Web Applications

How To Evaluate Web Applications A Framework for Exploiting Conceptual Modeling in the Evaluation of Web Application Quality Pier Luca Lanzi, Maristella Matera, Andrea Maurino Dipartimento di Elettronica e Informazione, Politecnico di

More information

Self-Tuning Database Systems: A Decade of Progress Surajit Chaudhuri Microsoft Research

Self-Tuning Database Systems: A Decade of Progress Surajit Chaudhuri Microsoft Research Self-Tuning Database Systems: A Decade of Progress Surajit Chaudhuri Microsoft Research surajitc@microsoft.com Vivek Narasayya Microsoft Research viveknar@microsoft.com ABSTRACT In this paper we discuss

More information

Alejandro Vaisman Esteban Zimanyi. Data. Warehouse. Systems. Design and Implementation. ^ Springer

Alejandro Vaisman Esteban Zimanyi. Data. Warehouse. Systems. Design and Implementation. ^ Springer Alejandro Vaisman Esteban Zimanyi Data Warehouse Systems Design and Implementation ^ Springer Contents Part I Fundamental Concepts 1 Introduction 3 1.1 A Historical Overview of Data Warehousing 4 1.2 Spatial

More information

Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations

Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations Binomol George, Ambily Balaram Abstract To analyze data efficiently, data mining systems are widely using datasets

More information

Graph Database Proof of Concept Report

Graph Database Proof of Concept Report Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment

More information

Research on the data warehouse testing method in database design process based on the shared nothing frame

Research on the data warehouse testing method in database design process based on the shared nothing frame Research on the data warehouse testing method in database design process based on the shared nothing frame Abstract Keming Chen School of Continuing Education, XinYu University,XinYu University, JiangXi,

More information

Database Design Patterns. Winter 2006-2007 Lecture 24

Database Design Patterns. Winter 2006-2007 Lecture 24 Database Design Patterns Winter 2006-2007 Lecture 24 Trees and Hierarchies Many schemas need to represent trees or hierarchies of some sort Common way of representing trees: An adjacency list model Each

More information

Data Warehousing. Paper 133-25

Data Warehousing. Paper 133-25 Paper 133-25 The Power of Hybrid OLAP in a Multidimensional World Ann Weinberger, SAS Institute Inc., Cary, NC Matthias Ender, SAS Institute Inc., Cary, NC ABSTRACT Version 8 of the SAS System brings powerful

More information