Using Materialized Views To Speed Up Data Warehousing


Michael Teschke, Achim Ulbrich
IMMD 6, University of Erlangen-Nuremberg, Martensstr. 3, Erlangen, Germany

Abstract

Running analytical queries directly against the huge raw data volume of a data warehouse results in unacceptable query performance. The solution to this problem is storing materialized views in the warehouse, which pre-aggregate the data and thus avoid raw data access and speed up queries. In this paper, we first discuss the problems concerning the selection of the right pre-aggregations and their utilization. The main focus then is the maintenance of the pre-aggregates due to changes of the raw data. We emphasize the specific aspects of warehouse maintenance compared to common view maintenance and present a warehouse model optimized for speeding up view maintenance.

1 Motivation

Lately, the notion of a data warehouse (DW) has become extremely popular. A DW is an integrated collection of data that is extracted from distributed and heterogeneous (legacy) database systems. Various kinds of applications such as on-line analytical processing (OLAP), knowledge discovery or data mining can use the DW as a consistent data basis that contains information which could not be accessed together before. What distinguishes a DW from a federated database system? In the latter, distributed and heterogeneous database systems are integrated as well, but in a DW the data is redundantly stored and not only integrated in a view-based manner. Beyond that, the data from the sources is not just copied: it is cleaned from inconsistencies and transformed for optimal access and performance. Additionally, the raw data is aggregated according to application specifications, because analytical queries perform badly when they have to scan the enormous raw data volume.

For example, imagine a DW for market research in which sales data such as the number of sold products or price figures are stored. Let the products be organized into product groups and product groups into product areas. If the sum of the sales belonging to the same product group is aggregated and stored (materialized) in the warehouse, a query about the market share of product areas can be calculated much faster, because it does not have to access the raw data but can use the higher classificatory level of product groups.

The market research example comes from various modeling and performance case studies we have performed with the GfK, Europe's largest market research company ([LeRT95], [LeRT96]). In [LeRT96] we have shown that the performance gain achieved by storing redundant pre-aggregations outweighs the additional storage overhead by far. These materialized views (MVs) or pre-aggregations are the main focus of this paper. We give an overview of all the aspects that must be considered to speed up data warehousing with MVs, together with a summary of related work.

The remainder of the paper is organized as follows. In section 2 we discuss ways to decide which views should be materialized in the DW. In section 3 we show how the selected set of MVs can be used in query evaluation. Once these views have been computed, they have to be maintained to keep the degree of actuality the users or applications have specified. An overview of the related work concerning view maintenance aspects is given in section 4. In section 5 we present our approach to speeding up warehouse maintenance. The paper concludes with a summary and some ideas we want to work on in the near future.

2 Selection of Views

The most essential issue in speeding up data warehousing with MVs is to select which views should be materialized. Of course, there is a trade-off between time and space requirements: the more views are materialized, the more likely it is to find an appropriate view for answering a query. Generally, the following approaches have been brought up recently:

- materialize all possible aggregations
- algorithmic selection
- empirical selection

In the following, we discuss these points in more detail. The basis for selection and analysis is always the notion of modeling the data in a multidimensional way, with dimensions for accessing single cells (or slices) of the cube. Furthermore, different hierarchies can be defined on top of the dimensions to ease selection. Figure 2.1 sketches an example in which sales data is stored in a 3-dimensional array (cube). On top of the product dimension, a hierarchy is defined which groups single products up to product areas, as mentioned in section 1. Candidates for pre-aggregations are on the one hand all possible combinations of dimensional elements; on the other hand, all kinds of SPJ views (views defined by a combination of selections, projections and joins) together with aggregate functions can be used.

Fig. 2.1: 3-dimensional sales cube with the dimensions time (year, quarter, month, ...), location (country, region, shop, ...) and product (product areas, main groups, product groups, items)
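To give a feeling for the size of this candidate space, the following sketch enumerates the granularities that result from combining one hierarchy level per dimension of the cube in Figure 2.1. It is only an illustration under our own assumptions: the level names are taken from the figure, the Python representation is ours, and feature-oriented combinations (section 2.1) are not considered.

from itertools import product

# Hierarchy levels per dimension of the sales cube (Fig. 2.1),
# from finest to coarsest granularity.
hierarchies = {
    "time":     ["month", "quarter", "year", "ALL"],
    "location": ["shop", "region", "country", "ALL"],
    "product":  ["item", "product group", "main group", "product area", "ALL"],
}

# Every combination of one level per dimension is a candidate pre-aggregation.
candidates = list(product(*hierarchies.values()))
print(len(candidates))   # 4 * 4 * 5 = 80 candidate granularities
print(candidates[0])     # ('month', 'shop', 'item') -- the raw data itself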

2.1 The Cube-Operator

[GBLP96] proposed a cube operator which calculates all possible combinations between the elements of the existing dimensions. For three dimensions, this results in 2^3 = 8 different aggregations. This number sounds rather small, but it does not take hierarchies into account. The problem gets even worse if, besides this classification-oriented analysis, features (attributes) are also taken into account. In feature-oriented analysis, data is aggregated according to some features of the raw data. For example, the average price of TVs with remote control (feature 1) and Hi8 stereo sound (feature 2) would be an interesting question in the context of market research. Even for a small number of features per dimension, say 7, pre-aggregating all feature combinations would blow up the cube and result in about 2^21, i.e. roughly two million, combinations. Products which stick to this paradigm cannot solve the real-world problems of the GfK, where each product has up to 20 features ([LTAK97]).

2.2 Algorithmic Selection

Several algorithms for selecting views have been proposed in the literature lately. To the best of our knowledge, most of them are based on the same greedy approach: given a certain amount of space S, find the best set of views ([HaRU96]); a simplified sketch of this scheme is given at the end of this section. It is shown that the performance gain achievable with these greedy algorithms is at least 67% of the optimal solution, which is NP-hard to compute. In [GHRU97], this approach is extended to include the selection of indices, so that S is shared between views and indices. In [Gupt97] candidate views are weighted by a frequency factor stemming from monitoring the users' access behavior.

2.3 Empirical Selection

An empirical way to select the views to pre-aggregate is monitoring the queries submitted by the users. Based on factors like query frequency or data volume, a number of views can be calculated and materialized which supports the most important queries. This approach is used in the Informix MetaCube Aggregator, where user queries are monitored and views to materialize are proposed to the database administrator. To make our point clear, we do not propose to materialize one support view per query. On the contrary, we believe that views can be selected which support a whole set of queries when the classification hierarchy of the qualifying data is taken into account. The performance gain has been validated in a case study ([LeRT96]). In this study, an aggregation hierarchy has been built which supports a typical set of market research queries provided by the GfK. One of the aggregations in the hierarchy is not used directly by any query at all, but supports about seven other aggregations. Depending on their position in the aggregation hierarchy (the higher the position, the larger the data volume of the aggregations), query response times have been improved from hours or minutes to minutes and seconds.
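As a recap of the greedy scheme sketched in section 2.2, the following fragment selects views under a space budget. It is a simplified sketch, not the algorithm of [HaRU96] itself: size and benefit are placeholder functions that a caller would have to supply, and the benefit metric is an assumption of ours, not taken from the paper.

def greedy_select(candidates, space_budget, size, benefit):
    """Greedily pick views until the space budget is exhausted.

    candidates           -- iterable of candidate view identifiers
    space_budget         -- total space available for materialization
    size(v)              -- estimated size of view v
    benefit(v, selected) -- estimated benefit of adding v given the views
                            already selected (placeholder for a real metric)
    """
    selected, used = [], 0
    remaining = set(candidates)
    while remaining:
        # Consider only views that still fit into the remaining space.
        affordable = [v for v in remaining if used + size(v) <= space_budget]
        if not affordable:
            break
        # Pick the view with the highest benefit per unit of space.
        best = max(affordable, key=lambda v: benefit(v, selected) / size(v))
        if benefit(best, selected) <= 0:
            break
        selected.append(best)
        used += size(best)
        remaining.remove(best)
    return selected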

3 Using Views in Queries

To achieve better query performance, queries have to be transformed to access the MVs instead of the raw data. We can distinguish two ways of using MVs in queries:

- Direct match: The query can be answered completely with one MV; other relations or views are not necessary.
- Partial match: Only a part of the query can be computed with a MV. This view is not sufficient to answer the whole query; other relations or views are necessary.

Direct match means using a single MV to process the query. Of course it is not possible to materialize a view for every query; this technique can only be used for standard queries. To reduce the number of views, they should be defined as universally as possible, so that they satisfy several queries without decreasing efficiency ([LeRT96]). For example, in [Hans87] it is shown that projecting out some attributes does not decrease the cost of querying. For this reason no projections are required in any MV, unless some attributes are never used. Selections, too, can be performed very fast using index structures, so selections are only useful if some data is never required. Especially for ad-hoc queries, a direct match is not always possible. Partial-match usage of MVs can be applied in more cases, because only parts of queries are calculated with MVs; the non-matching parts of the queries can (and must) access the raw data.

There are two methods to replace the relations in the original query by MVs:

- Explicit use: The user is aware of the views and uses them explicitly when formulating the query.
- Transparent use: Instead of hard-coding the view in the query, the query optimizer is used to answer the query as efficiently as possible with the help of MVs.

Explicit use is not very user-friendly. The user has to know all relevant views, and if the views change, all users must be informed. Furthermore, as shown in section 4.4, if multiple maintenance policies are supported, only querying a single viewgroup is allowed; otherwise inconsistencies are possible, which could lead to incorrect answers. Thus the user would be responsible for data consistency. To avoid these drawbacks, the query optimizer has to be extended to detect and use the MVs that yield the best query performance. Then the user does not have to know anything about the MVs, and the query modification is transparent to him. [GuHQ95] have introduced the generalized projection approach, which explores the query tree with respect to whether a given MV can be integrated in place of a part of the tree. For that, the query tree and the tree of the MV have to be normalized. To implement transparent use of MVs for speeding up queries in DWs, a lot of information has to be stored as meta data: beside the view definitions, the normalized query tree of each view is required. By implementing the transparent-use extension of the query optimizer, administration capabilities for the MVs can also be enhanced: it becomes possible to analyze the queries submitted to the warehouse and to detect whether a new MV would increase query performance (section 2.3).
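Whether a MV matches a query depends, among other things, on whether the grouping of the query can be derived from the grouping of the view. The following single-dimension sketch illustrates this derivability test for the product hierarchy of Figure 2.1; it is a deliberate simplification of what a query optimizer based on normalized query trees would do, and the level ranks are our own encoding.

# Coarseness rank of each hierarchy level on the product dimension
# (higher = coarser); names follow Fig. 2.1.
LEVELS = {"item": 0, "product group": 1, "main group": 2, "product area": 3, "ALL": 4}

def derivable(query_level, view_level):
    """A query grouped at query_level can be answered from a MV grouped at
    view_level iff the view is at least as fine-grained, because coarser
    groups can be obtained by further aggregation."""
    return LEVELS[view_level] <= LEVELS[query_level]

# A query on product areas can be answered from a MV on product groups ...
assert derivable("product area", "product group")
# ... but a MV on product areas cannot answer a query on product groups.
assert not derivable("product group", "product area")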

4 View Maintenance

One of the major performance problems in DWs is the maintenance process. Whenever source data changes, the warehouse no longer reflects the correct state of this source and has to be maintained. Fortunately, only parts of the warehouse are affected by a modification to a single source, so the obvious solution would be to reload these parts. But the huge data inventory and the great number of changes forbid reloading even parts of the DW from scratch. For this reason, only the changes resulting from the source modifications are propagated to the warehouse.

Maintaining a DW is based on the theory of maintaining MVs. The process of updating MVs in response to changes to the underlying base relations is called view maintenance. Instead of recomputing the view, incremental view maintenance can be used: the view is computed incrementally from its old state and its changes Δ(V) due to the changes in the base relations. The maintenance process can be written as V_new = (V ⊖ Δ⁻(V)) ⊎ Δ⁺(V), where Δ⁻(V) and Δ⁺(V) contain the tuples that have to be deleted from or inserted into the old materialization, respectively (Δ(V) = Δ⁻(V) ∪ Δ⁺(V)); ⊖ denotes the contained difference and ⊎ the disjoint union ([QiWi91]). In [QiWi91] it is shown how to calculate the changes to the MVs with algebraic differencing. In [GrLi95] this approach is extended to views with duplicates. The technique for computing Δ(V) depends on several influence factors listed below (an extension of [GuMu95]):

- Type of modification: For maintenance, at least insertions and deletions have to be handled. Updates can be handled directly or modeled as deletions followed by insertions. Another sort of modification are changes to the view definition, which we omit due to space constraints ([GuMR95]).
- Query language expressions: The expressions used in the definition of the MVs influence the maintenance techniques. In this paper we focus on MVs with aggregate functions to stress their importance for speeding up query performance.
- Information available for maintenance: There are four different information sources: the view definition, the modifications, the contents of the MV (often called the materialization) and the base relations. The definition of the view and the modifications are at least necessary for incremental maintenance. Furthermore, additional information sources can be used, as shown in section 4.3.
- System environment: We can distinguish between local view maintenance, where the base relations are local to the views, and derived view maintenance, where they reside on heterogeneous, distributed sources. For the latter, maintenance is more difficult and expensive.
- Moment of update: In [CKL+97] three different maintenance policies are distinguished:
  - Immediate views: These views are updated immediately within the same transaction as the changes to the base relations. Immediate views allow fast querying with maximum actuality, at the expense of slowing down the update transactions.

  - Deferred views: Deferred views are maintained asynchronously to the modifications. They are typically refreshed when the view is queried. This leads to faster updates of the base data, with the disadvantage of slowing down querying.
  - Snapshot views: If staleness can be tolerated at the moment of access, views can be maintained asynchronously both to the modifications and to the moment of querying. Thus it is possible for queries to access old data.

In the following section we present techniques to detect the modifications applied to the base relations of a MV. In section 4.2 we analyze which expressions in MVs can be updated without access to the base relations in order to speed up maintenance. After that we describe techniques to reduce the access to the base relations during the maintenance of MVs that are not self-maintainable. In the last section we explain possible consistency problems.

4.1 Detecting Changes

A prerequisite for maintenance is the detection of the changes made to the base relations. If the database system supports triggers, the approach of Ceri and Widom ([CeWi91]) can be used to detect modifications with production rules. Production rules in database systems allow the specification of data manipulation operations that are executed automatically when certain events occur. For every base relation, three triggers have to be implemented (one each for insertions, deletions and, if supported, updates). The benefit of this method is that changes are detected at the moment they occur, so immediate maintenance is possible.

Another technique for detecting changes is the use of log tables, as shown in [KäRi87]. The system log file is parsed to obtain the relevant modifications. Since log files are maintained for recovery anyway, this approach may not require any modification to the applications. Because the log file can only be parsed periodically, only deferred maintenance is possible.

The source databases in most DW environments are heterogeneous, autonomous and often legacy systems. For this reason, logging or trigger mechanisms cannot generally be expected. The modifications then have to be extracted by monitor programs that compare a current source snapshot with an earlier one. The problem of detecting the differences between two snapshots is called the snapshot differential problem ([LaGa96]); a small sketch is given at the end of this subsection. Due to the periodic nature of the snapshots, immediate maintenance is not possible.

To reduce the number of modifications, irrelevant updates can be detected and removed. Irrelevant updates are modifications which do not affect the materialization. The cost of filtering such modifications is small, because neither access to the materialization nor to the base relations is required. Previous work has been done on detecting irrelevant updates ([BlCL89], [LeSa93]). If a tuple of one of the base relations of a MV is inserted or deleted and at least one of the attributes of the tuple does not fulfill the restriction in the selection condition, this tuple has no effect on the materialization and can be ignored.
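The following naive sketch illustrates the snapshot differential problem: two snapshots, keyed by primary key, are compared to derive the insertions, deletions and updates. The relation contents are invented, and [LaGa96] present considerably more efficient algorithms for large snapshots.

def snapshot_diff(old, new):
    """Naive snapshot differential: old and new map primary keys to tuples.
    Returns the insertions, deletions and updates needed to turn old into new."""
    inserts = {k: v for k, v in new.items() if k not in old}
    deletes = {k: v for k, v in old.items() if k not in new}
    updates = {k: (old[k], new[k]) for k in old.keys() & new.keys()
               if old[k] != new[k]}
    return inserts, deletes, updates

old = {1: ("TV", 499), 2: ("VCR", 199)}
new = {1: ("TV", 479), 3: ("Camcorder", 899)}
print(snapshot_diff(old, new))
# ({3: ('Camcorder', 899)}, {2: ('VCR', 199)}, {1: (('TV', 499), ('TV', 479))})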

4.2 Self-Maintainable Views

Most important for the performance of MV maintenance is the information required for refresh. Depending on the types of modification and on the expressions used in the view definition, maintenance may or may not require access to the base relations. Such access considerably reduces maintenance performance, because huge base tables have to be scanned. For this reason it is very important to use MVs which can be maintained without accessing the base relations. A MV that can be maintained without access to the base relations is called self-maintainable ([BlCL89], [GuJM96], [Huyn96]). This property depends on the expressions in the definition of the MV and on the types of modification. Views that are self-maintainable for all types of modifications are called fully self-maintainable; otherwise they are called partially self-maintainable. In the following we investigate the property of self-maintainability for aggregate functions. For a detailed discussion of other view expressions, see [GuJM96] or [Huyn96].

Self-maintainability of aggregate functions

An aggregate function is self-maintainable if the new value of the function can be computed solely from the old value of the aggregation and from the changes to the base relations. In [ChMM88] such functions are called additive. With respect to this definition we can classify the different aggregate functions as follows:

- Additive functions: The new value of the aggregation can be computed from its old value and the change to the base relations. Examples are the COUNT and SUM functions. To maintain COUNT with respect to insertions, only the number of modifications with the same group-by attributes has to be added to the count. For deletions it has to be determined whether the value of the COUNT aggregation just has to be decreased or whether a tuple has to be deleted from the materialization (if COUNT = 0). The same applies to updates of group-by attributes, because they correspond to a deletion of the old group-by values and an insertion of the new ones. Maintaining SUM aggregations is likewise easy; problems resulting from null values can be handled with the counting-algorithm (section 4.3.1).
- Additive-computable functions: These are functions that cannot directly be computed from the old value and the update, but can be transformed into additive functions. For example, the aggregation AVG is not additive, but it can be replaced by the additive functions SUM and COUNT (AVG = SUM / COUNT).
- Partly-additive functions: These functions are additive only for some kinds of updates and cannot be transformed into additive functions. For example, MIN and MAX are partly-additive aggregations. Consider the case when a new tuple is inserted into a MV with the aggregation MIN. If the new value is higher than or equal to the stored minimum, the materialization remains unchanged; otherwise the inserted tuple is the new minimum. Thus, for insertions, MIN and MAX are self-maintainable. If a tuple is deleted from the base tables and its value is higher than the stored minimum, the update can be dropped; otherwise the deleted tuple is the minimum and access to the base relations is necessary to get the new minimum. Thus MIN and MAX are not self-maintainable with respect to deletions.

- Non-additive functions: If no type of modification can be applied to an aggregation without access to the base relations and the aggregation cannot be transformed into an additive function, it is called non-additive. One example is the median, which bisects the set of all values ordered by size.

4.3 Making Views Self-Maintainable

As shown in [GuJM96], only a small subset of view definitions is self-maintainable without transformation. For this reason, techniques to make views self-maintainable have been developed. The first approach, presented in section 4.3.1, is representative of a class of approaches in which additional attributes are added to the view definition. The second class uses auxiliary MVs to make a set of views self-maintainable.

4.3.1 The Counting-Algorithm

If a tuple is to be deleted from the materialization, it has to be determined whether the deleted base tuple is the only derivation in the base relations for the tuple in the view or not. In the first case the tuple has to be deleted from the materialization; in the second case the materialization remains unchanged. Querying the base relations to get the number of derivations can be avoided by changing the definition of the MV. With the counting-algorithm ([GuMS93], [MuQM97]), the number of derivations for a tuple in the materialization (count) is added as an extra attribute to the view definition. If a tuple is inserted into the materialization and there is no other derivation for this tuple, the count is initialized with 1; otherwise there are other derivations for this tuple and only the count has to be increased. If a derivation for a tuple in the materialization is deleted, the count has to be decreased, and if the count equals zero the tuple has to be deleted from the materialization. A short sketch follows at the end of section 4.3.

4.3.2 Auxiliary Materialized Views

To achieve self-maintainability, the use of full outer joins has been proposed (cf. section 4.2). Instead, we can also use other MVs to make a set of views self-maintainable; these views are called auxiliary materialized views. In [QGMW96] an algorithm is presented for detecting suitable auxiliary views. Querying these views instead of the base relations increases the maintenance performance for local as well as for derived view maintenance. Changing the view definitions to reference other views instead of the base relations requires that the MVs be maintained in the correct hierarchical order. For this reason a dependency graph ([QGMW96], [CKL+97]) is needed. The dependency graph G(V) is a directed graph with a node for every base relation and view used in the definition of the view V, either directly or through other views. There is an edge from a node X to a node Y if X is used to derive Y. All nodes in G(V) have to be maintained before V.
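To make the counting-algorithm of section 4.3.1 concrete, the following sketch maintains a small SUM view whose groups carry a derivation count, so that deletions never require access to the base relation. It is our own illustration, not code from the cited papers, and the group-by key and values are invented.

class SumCountView:
    """Self-maintainable SUM view using the counting-algorithm: each group
    stores (sum, count), so deletions can be handled without querying the
    base relation."""

    def __init__(self):
        self.groups = {}   # group-by key -> (sum, count)

    def insert(self, key, value):
        s, c = self.groups.get(key, (0, 0))
        self.groups[key] = (s + value, c + 1)

    def delete(self, key, value):
        s, c = self.groups[key]
        if c == 1:
            del self.groups[key]             # last derivation: drop the group
        else:
            self.groups[key] = (s - value, c - 1)

view = SumCountView()
view.insert("product group A", 100)
view.insert("product group A", 50)
view.delete("product group A", 100)
print(view.groups)    # {'product group A': (50, 1)}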

4.4 Consistency Considerations

If the update of a base relation is done separately from the update transaction of the MV (deferred and snapshot refresh) and maintaining the view requires access to the base relations, distributed incremental view maintenance anomalies ([ZhGW96]) can occur. This happens when one base relation is queried to maintain the view while another base relation is modified again. These anomalies can be avoided either by compensating the new modifications (Strobe algorithms, described in [ZhGW96]) or by using only self-maintainable MVs (section 4.2). As the first alternative considerably reduces maintenance performance, it is preferable to use only self-maintainable deferred and snapshot views whenever possible.

Different maintenance policies like immediate, deferred and periodic refresh can cause consistency problems for hierarchical maintenance structures ([CKL+97]). The maintenance policy of a view cannot be chosen independently of the policies of the related views or relations. For example, assume two views V1 and V2 which both have snapshot maintenance policies but different refresh cycles. If they are used to derive a third view V3, then V3 can also only be maintained periodically, and even then it is possible that this view reflects an inconsistent state of the raw data because of the different refresh cycles of V1 and V2. In order to provide consistent views in a system allowing multiple maintenance policies, [CKL+97] developed a model based on the notion of viewgroups. A viewgroup is a collection of views that are required to be mutually consistent, that is, every MV in a viewgroup relates to the same state of the underlying base relations. Viewgroups have to be isolated in the sense that maintenance of a view in a viewgroup must not cause changes to views in other viewgroups. Furthermore, it must be possible to answer a query without looking outside the queried viewgroup. For this reason it is not only necessary to search for other views to speed up maintenance, but also to check whether the auxiliary views and the new MV have consistency-preserving maintenance policies.

5 Warehouse Maintenance

In this section we present our approach to data warehousing. First we explain some basic design decisions, before a number of measures to speed up the maintenance process are discussed.

5.1 The Model

One basic assumption in our warehouse model is to store the normalized and filtered source relations in the warehouse. Thus we are able to answer any query and avoid accessing the sources for view maintenance. This leads to a distinction between two kinds of warehouse maintenance: external maintenance stands for the maintenance process between the sources and the warehouse, internal maintenance describes the process of maintaining the pre-aggregations in the warehouse. Because in most cases changes to the source data are transmitted periodically to the warehouse, external maintenance can only be done periodically. After the transmission, the modifications are stored in update tables to separate the moment of transmission from the moment of maintenance. There is one update table for each source relation, with an extra attribute indicating the modification type.

Beside the update tables, delta tables are used for deferred and snapshot views. These tables contain the modifications to be applied to the respective views. The main aspects of our warehouse model are sketched in Figure 5.1.

Fig. 5.1: Data warehouse model (analytical systems and a query transformer on top of the warehouse, which stores a hierarchy of pre-aggregations, the filtered and normalized raw data, meta data, delta tables and update tables; internal maintenance takes place inside the warehouse, external maintenance transmits the modifications from the sources)

5.2 Speeding up Warehouse Maintenance

Most important for speeding up warehouse maintenance is to use only self-maintainable MVs, because they avoid querying the base relations for maintenance. Using self-maintainable views also avoids the distributed incremental view maintenance anomalies discussed in section 4.4. From the set of self-maintainable MVs, we consider only those with aggregations (pre-aggregations); compared to simple SPJ views, they provide a much larger performance benefit. As only a subset of MVs is self-maintainable (section 4.2), a generation component is needed to transform the materializations selected by algorithms or empirical methods (section 2) with the techniques described in section 4.3. If querying the base relations is not avoidable (e.g. for deletions in MVs with MIN), the cost of querying is reduced thanks to the mirrored source relations. But deletions are an exception in DWs due to their characteristic as non-volatile storage.

Another approach to increase maintenance performance is to use other MVs as the basis of a MV (section 4.3.2). The generation component then has to extract the best maintenance path in the dependency graph. Due to varying requirements, the set of MVs changes frequently; for this reason, a dynamic dependency graph concept has to be developed. The best dependency graph has to be found not only when a new MV is generated; for already stored views it also has to be detected whether the new view could improve their maintenance performance. Furthermore, the elimination of a MV can cause changes to the maintenance structure of other views.
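Maintaining views in the correct hierarchical order amounts to processing the dependency graph in topological order, as the following minimal sketch shows. The graph of view names is invented; Python's standard graphlib module (available since Python 3.9) is used purely for illustration.

from graphlib import TopologicalSorter

# Dependency graph: each view maps to the relations/views it is derived from.
# All predecessors of a view must be maintained before the view itself.
depends_on = {
    "sales_by_group": ["sales_raw"],
    "sales_by_area":  ["sales_by_group"],
    "market_share":   ["sales_by_area", "sales_by_group"],
}

order = list(TopologicalSorter(depends_on).static_order())
print(order)   # ['sales_raw', 'sales_by_group', 'sales_by_area', 'market_share']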

For internal maintenance, immediate, deferred or snapshot refresh can be used. To avoid consistency problems, querying the mirrored base relations is not allowed during their refresh. For the same reason, access to the immediate MVs based on these relations is also forbidden during maintenance. To shorten refresh time, deferred and snapshot views can be used. Snapshot maintenance allows fast querying and updates, but queries can read data that is not up to date. Unfortunately, in most papers snapshots are refreshed only periodically, in a time-based manner. In our opinion, value-based maintenance is even more interesting (e.g. the value of the aggregate must not deviate by more than three percent from the actual value). Especially for aggregations, a deviation from the base relations is tolerable if the difference between the old value and the new value is insignificant. This insignificance threshold must be defined by the user and stored as part of the meta data. Only if the threshold is exceeded does the view have to be maintained.

Since the number of views to be maintained is often large, further methods to improve performance have to be applied. The techniques we consider most useful are discussed in the remainder of the paper.

5.2.1 Modification Compression

Because mostly heterogeneous and autonomous sources are used in DWs, the modifications of the source data are transmitted periodically, in batches, to relieve the network. The number of updates sent to the warehouse can be reduced by modification compression, i.e. by reducing all modifications of the same tuple (with the same primary key) to a single one; a short sketch follows below. For example, an insertion followed by an update of the same tuple can be replaced by a modified insertion, and two different updates can be combined into a single update.

5.2.2 Premaintenance

For deferred and snapshot views, the relevant modifications have to be stored in the delta tables. When all relevant immediate views and delta tables have been updated, a modification can be deleted from the update table. In this section we present a technique to reduce the number of modifications in the delta tables in order to speed up maintenance. First of all, we can eliminate the updates that are irrelevant to the MVs by checking the view definitions, as described in section 4.1. In [MuQM97] a premaintenance technique is described which increases maintenance performance by dividing the update process into two separate functions: propagate and refresh. The propagate function can be processed without locking the MVs, so querying is not disturbed. The MVs are not locked until the refresh function is processed, which applies the modifications in the delta tables to the corresponding views. Therefore, the goal of the propagate function is to do as much work as possible in order to minimize the time required by the refresh function. With the delta tables and parallel processing, every self-maintainable view can be refreshed at the same time.
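As an illustration of the modification compression described in section 5.2.1, the following sketch collapses a batch of modifications so that at most one net modification per primary key remains. The operation encoding and the small rule set are simplifying assumptions of ours, not a complete treatment of all operation combinations.

def compress(modifications):
    """Collapse a batch of (operation, key, tuple) modifications so that at
    most one net modification per primary key is propagated to the warehouse."""
    net = {}                                  # key -> (operation, tuple)
    for op, key, row in modifications:
        prev = net.get(key)
        if prev is None:
            net[key] = (op, row)
        elif prev[0] == "insert" and op == "update":
            net[key] = ("insert", row)        # insert + update -> modified insert
        elif prev[0] == "insert" and op == "delete":
            del net[key]                      # insert + delete cancel out
        elif prev[0] == "update" and op == "update":
            net[key] = ("update", row)        # two updates -> one update
        else:
            net[key] = (op, row)              # fall back to the latest operation
    return [(op, key, row) for key, (op, row) in net.items()]

batch = [("insert", 7, {"price": 499}), ("update", 7, {"price": 479})]
print(compress(batch))   # [('insert', 7, {'price': 479})]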

5.2.3 Fully Self-Maintainable Aggregations

In this section we consider only fully self-maintainable aggregations and the counting-algorithm (section 4.3.1) for maintenance, to show how the refresh process can be made more efficient. The propagate function combines all modifications with the same group-by attributes into one tuple, so the refresh function only has to handle one modification per group-by combination instead of many. This is done by pre-computing the aggregate functions, which we call aggregate compression. Let t be a new modification for a given MV. Either there is already a tuple in the delta table with the same group-by attributes or not. In the latter case, the modification is inserted into the delta table without any compression. Otherwise, the new modification can be combined with the already stored entry, as shown in Table 5.1. Here v is the value of the aggregation, computed over the expression exp, that is already stored in the delta table, Δv denotes the aggregation value of the new modification (over the same expression and with the same group-by attributes), and v' is the value of the aggregation after maintenance. As mentioned, the counting-algorithm is needed to maintain SUM without accessing the base relations for deletions; for this reason the count is represented by c (and c' after maintenance). Of course the table can be extended to other aggregate functions.

Aggregation        insert(Δv)                              delete(Δv)
v = COUNT(*)       v' = v + 1                              v' = v - 1
v = COUNT(exp)     if Δv = 0: v' = v, else v' = v + 1      if Δv = 0: v' = v, else v' = v - 1
v = SUM(exp)       v' = v + Δv; c' = c + 1                 v' = v - Δv; c' = c - 1

Tab. 5.1: Aggregation compression

For the refresh process, the following alternatives result from the values of v and c stored in the delta table: if v = 0 and c = 0 (only for SUM), no modification has to be applied to the tuple in the view with the same group-by attributes; otherwise the values have to be updated with the values from the delta table. We can see that this aggregation compression is possible even without reading the values in the views. We considered only insertions and deletions here, because direct updates to MVs have to be transformed into insertions and deletions anyway: either the value of the aggregated attribute or one of the group-by attributes changes. In the first case, the difference between the old and the new value has to be added to or subtracted from the MV. If the group-by attributes change, the tuple with the old values has to be deleted from the MV and the tuple with the new values has to be inserted.

5.2.4 Partly Self-Maintainable Aggregations

For MIN and MAX, aggregation compression is not possible; the deletion problem makes it necessary to store every modification in the delta table. As an example, consider a MV storing the total minimum of an attribute (with no group-by attributes) of a relation R.

We could imagine storing only the minimum of the insertions and the minimum of the deletions in the delta table, because only the minimum is required to modify the view. There is no problem when the value of the deletion is higher than the value of the insertion: either the minimum in the view is smaller than the insertion (no modification is required during the refresh procedure) or the insertion is the new minimum. Problems occur when the value of the deletion is equal to or smaller than the value of the insertion. If they are equal, we have to delete both the deletion and the insertion from the delta table, because they neutralize each other. But unfortunately there could be another insertion (not the minimum of all insertions), already removed from the delta table, that could be the new minimum. If the value of the deletion is smaller, the insertion is irrelevant: either the value of the deletion is higher than the minimum in the view (no modification of the view is required) or it is equal, and in the second case the new minimum has to be found by querying R. So even with access to the materialization during the propagate process, we have to store every modification in the delta table; only modifications that neutralize each other can be removed from it.

It can be useful, however, to create an auxiliary view V_aux containing a number of the smallest values (say five). Any modification can be applied to V_aux without interfering with warehouse queries. Whenever a tuple that has a derivation in V_aux is deleted from R, we delete this derivation; but instead of querying R for the fifth-smallest value, V_aux remains unchanged (now containing only the four smallest values). Only when the last minimum is deleted does R have to be queried for the five smallest values during the refresh process. Whenever an insertion is made to R, V_aux has to be adapted if the value of the insertion is smaller than the highest value in V_aux; if more than five values are stored, we can delete the highest one. If the new value is smaller than the absolute minimum, V itself has to be maintained as well, which is marked by an extra attribute. Thus V_aux is used like a kind of stack to reduce the cost of querying.

6 Conclusion and Future Work

In this paper we discussed the most important issues for speeding up data warehouse performance with materialized views: view selection, view usage and view maintenance. The latter field has been widely covered by the research community in recent years. We have shown that many approaches dealing with the maintenance of materialized views can be applied to data warehouses. Nevertheless, a lot of work still has to be done, particularly when considering aggregate views and view consistency problems. Whereas the view usage issue is discussed in some papers, the problem of view selection still remains unsolved. Cube approaches, which compute the full set of possible aggregations, became very popular but have been shown to be unrealistic in [LeTW97].

In the future, we will try to extend our work on all three issues. Using the classification hierarchy to reduce the number of possible combinations is discussed in [WLTA97]. Part of the Cube-Star project of our institute is the construction of a query optimizer which is able to determine which materialized view (or views) currently stored in the system is best suited to answer a certain query ([Cube97]). If no such view can be found, the query is computed from the raw data and the result is stored as a materialized view for later queries.
In our View-Star project, we investigate the possibilities of adjusting the view dependency paths dynamically. This enables us to optimize the maintenance of existing views when new materialized views are added to the data warehouse.

References

[BlCL89] Blakeley, J.; Coburn, N.; Larson, P.: Updating Derived Relations: Detecting Irrelevant and Autonomously Computable Updates, in: ACM Transactions on Database Systems 14(3), 1989
[CeWi91] Ceri, S.; Widom, J.: Deriving Production Rules for Incremental View Maintenance, in: Proc. of the 17th Int. Conf. on Very Large Data Bases (VLDB 91, Barcelona, Spain, September 3-6), 1991
[ChMM88] Chen, M.; McNamee, L.; Melkanoff, M.: A Model of Summary Data and its Applications in Statistical Databases, in: Rafanelli, M.; Klensin, J. C.; Svensson, P. (eds.): Proc. of the 4th Int. Working Conf. on Statistical and Scientific Database Management (4SSDBM, Rome, Italy, June 21-23), Lecture Notes in Computer Science 339, Berlin et al.: Springer-Verlag, 1988
[CKL+97] Colby, L.; Kawaguchi, A.; Lieuwen, D.; Mumick, I.; Ross, K.: Supporting Multiple View Maintenance Policies, to appear in: SIGMOD 1997
[Cube97] Cubestar: the Cube-Star project of our institute
[GBLP96] Gray, J.; Bosworth, A.; Layman, A.; Pirahesh, H.: Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals, in: Proc. of the 12th IEEE Int. Conf. on Data Engineering (ICDE 96, New Orleans, LA, Feb. 26 - March 1), 1996
[GHRU97] Gupta, H.; Harinarayan, V.; Rajaraman, A.; Ullman, J.: Index Selection for OLAP, in: Proc. of the 13th Int. Conf. on Data Engineering (ICDE 97, Birmingham, UK, April 7-11), 1997
[GrLi95] Griffin, T.; Libkin, L.: Incremental Maintenance of Views with Duplicates, in: Proc. of the 1995 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD 95, San Jose, USA, May 23-25), SIGMOD Record 24(2), 1995
[GuHQ95] Gupta, A.; Harinarayan, V.; Quass, D.: Aggregate-Query Processing in Data Warehousing Environments, in: Proc. of the 21st Int. Conf. on Very Large Data Bases (VLDB 95, Zurich, Switzerland, September 11-15), 1995
[GuJM96] Gupta, A.; Jagadish, H.; Mumick, I.: Data Integration using Self-Maintainable Views, Technical Report, Dept. of CS, Stanford University, 1996
[GuMR95] Gupta, A.; Mumick, I.; Ross, K.: Adapting Materialized Views after Redefinition, in: Proc. of the 1995 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD 95, San Jose, USA, May 23-25), SIGMOD Record 24(2), 1995

[GuMS93] Gupta, A.; Mumick, I.; Subrahmanian, V.: Maintaining Views Incrementally, in: Proc. of the 1993 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD 93, Washington, USA, May 26-28), SIGMOD Record 22(2), 1993
[GuMu95] Gupta, A.; Mumick, I.: Maintenance of Materialized Views: Problems, Techniques and Applications, in: IEEE Data Engineering Bulletin, Special Issue on Materialized Views & Data Warehousing 18(2), June 1995
[Gupt97] Gupta, H.: Selection of Views to Materialize in a Data Warehouse, in: Proc. of the 6th Int. Conf. on Database Theory (ICDT 97, Delphi, Greece, Jan 8-10), 1997
[Hans87] Hanson, E.: A Performance Analysis of View Materialization Strategies, in: Proc. of the 1987 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD 87, San Francisco, USA, May 27-29), SIGMOD Record 16(3), 1987
[HaRU96] Harinarayan, V.; Rajaraman, A.; Ullman, J.: Implementing Data Cubes Efficiently, in: Proc. of the 1996 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD 96, Montreal, Quebec, June), 1996
[Huyn96] Huyn, N.: Efficient View Self-Maintenance, Technical Report, Dept. of CS, Stanford University, 1996
[KäRi87] Kähler, B.; Risnes, O.: Extending Logging for Database Snapshot Refresh, in: Proc. of the 13th Int. Conf. on Very Large Data Bases (VLDB 87, Brighton, Great Britain, September 1-4), 1987
[KLM+97] Kawaguchi, A.; Lieuwen, D.; Mumick, I.; Quass, D.; Ross, K.: Concurrency Control Theory for Deferred Materialized Views, in: Proc. of the 6th Int. Conf. on Database Theory (ICDT 97, Delphi, Greece, Jan 8-10), 1997
[LaGa96] Labio, W. J.; Garcia-Molina, H.: Efficient Snapshot Differential Algorithms for Data Warehousing, Technical Report, Dept. of CS, Stanford University, 1996
[LeRT95] Lehner, W.; Ruf, T.; Teschke, M.: Data Management in Scientific Computing: A Study in Market Research, in: Proc. of the Int. Conf. on Applications of Databases (ADB 95, Santa Clara, California, Dec. 1995)
[LeRT96] Lehner, W.; Ruf, T.; Teschke, M.: Improving Query Response Time in Scientific Databases Using Data Aggregation, in: Proc. of the 7th Int. Conf. and Workshop on Database and Expert Systems Applications (DEXA 96, Zurich, Switzerland, September 9-13), 1996
[LeSa93] Levy, A. Y.; Sagiv, Y.: Queries Independent of Updates, in: Proc. of the 19th Int. Conf. on Very Large Data Bases (VLDB 93, Dublin, Ireland, August 24-27), 1993
[LeTW97] Lehner, W.; Teschke, M.; Wedekind, H.: Über Aufbau und Auswertung multidimensionaler Daten (On the Construction and Analysis of Multidimensional Data), in: Proc. of the Conf. on Datenbanksysteme in Büro, Technik und Wissenschaft (Databases in Office, Engineering and Science) (BTW 97, Ulm, Germany, March 5-7), 1997

[LTAK97] Lehner, W.; Teschke, M.; Albrecht, J.; Kirsche, T.: Building a Real Data Warehouse for Market Research, submitted to DEXA 97
[MuQM97] Mumick, I.; Quass, D.; Mumick, B.: Maintenance of Data Cubes and Summary Tables in a Warehouse, to appear in: SIGMOD 97
[QGMW96] Quass, D.; Gupta, A.; Mumick, I.; Widom, J.: Making Views Self-Maintainable for Data Warehousing, Technical Report, Dept. of CS, Stanford University, 1996
[QiWi91] Qian, X.; Wiederhold, G.: Incremental Recomputation of Active Relational Expressions, in: IEEE Transactions on Knowledge and Data Engineering 3(3), September 1991
[Quas96] Quass, D.: Maintenance Expressions for Views with Aggregation, Technical Report, Dept. of CS, Stanford University, 1996
[WLTA97] Wedekind, H.; Lehner, W.; Teschke, M.; Albrecht, J.: Preaggregation in Multidimensional Data Warehouse Environments, submitted for publication
[ZhGW96] Zhuge, Y.; Garcia-Molina, H.; Wiener, J.: The Strobe Algorithms for Multi-Source Warehouse Consistency, Technical Report, Dept. of CS, Stanford University, 1996


More information

The Benefits of Data Modeling in Business Intelligence

The Benefits of Data Modeling in Business Intelligence WHITE PAPER: THE BENEFITS OF DATA MODELING IN BUSINESS INTELLIGENCE The Benefits of Data Modeling in Business Intelligence DECEMBER 2008 Table of Contents Executive Summary 1 SECTION 1 2 Introduction 2

More information

New Approach of Computing Data Cubes in Data Warehousing

New Approach of Computing Data Cubes in Data Warehousing International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 14 (2014), pp. 1411-1417 International Research Publications House http://www. irphouse.com New Approach of

More information

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes Final Exam Overview Open books and open notes No laptops and no other mobile devices

More information

<Insert Picture Here> Enhancing the Performance and Analytic Content of the Data Warehouse Using Oracle OLAP Option

<Insert Picture Here> Enhancing the Performance and Analytic Content of the Data Warehouse Using Oracle OLAP Option Enhancing the Performance and Analytic Content of the Data Warehouse Using Oracle OLAP Option The following is intended to outline our general product direction. It is intended for

More information

Data Warehouse Snowflake Design and Performance Considerations in Business Analytics

Data Warehouse Snowflake Design and Performance Considerations in Business Analytics Journal of Advances in Information Technology Vol. 6, No. 4, November 2015 Data Warehouse Snowflake Design and Performance Considerations in Business Analytics Jiangping Wang and Janet L. Kourik Walker

More information

Vertica Live Aggregate Projections

Vertica Live Aggregate Projections Vertica Live Aggregate Projections Modern Materialized Views for Big Data Nga Tran - HPE Vertica - Nga.Tran@hpe.com November 2015 Outline What is Big Data? How Vertica provides Big Data Solutions? What

More information

Review. Data Warehousing. Today. Star schema. Star join indexes. Dimension hierarchies

Review. Data Warehousing. Today. Star schema. Star join indexes. Dimension hierarchies Review Data Warehousing CPS 216 Advanced Database Systems Data warehousing: integrating data for OLAP OLAP versus OLTP Warehousing versus mediation Warehouse maintenance Warehouse data as materialized

More information

Oracle OLAP 11g and Oracle Essbase

Oracle OLAP 11g and Oracle Essbase Oracle OLAP 11g and Oracle Essbase Mark Rittman, Director, Rittman Mead Consulting Who Am I? Oracle BI&W Architecture and Development Specialist Co-Founder of Rittman Mead Consulting Oracle BI&W Project

More information

When to consider OLAP?

When to consider OLAP? When to consider OLAP? Author: Prakash Kewalramani Organization: Evaltech, Inc. Evaltech Research Group, Data Warehousing Practice. Date: 03/10/08 Email: erg@evaltech.com Abstract: Do you need an OLAP

More information

Data Warehousing & OLAP

Data Warehousing & OLAP Data Warehousing & OLAP What is Data Warehouse? A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management s decisionmaking process. W.

More information

Data Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina

Data Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina Data Warehousing Read chapter 13 of Riguzzi et al Sistemi Informativi Slides derived from those by Hector Garcia-Molina What is a Warehouse? Collection of diverse data subject oriented aimed at executive,

More information

CS54100: Database Systems

CS54100: Database Systems CS54100: Database Systems Date Warehousing: Current, Future? 20 April 2012 Prof. Chris Clifton Data Warehousing: Goals OLAP vs OLTP On Line Analytical Processing (vs. Transaction) Optimize for read, not

More information

Data Warehouse: Introduction

Data Warehouse: Introduction Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of base and data mining group,

More information

How To Model Data For Business Intelligence (Bi)

How To Model Data For Business Intelligence (Bi) WHITE PAPER: THE BENEFITS OF DATA MODELING IN BUSINESS INTELLIGENCE The Benefits of Data Modeling in Business Intelligence DECEMBER 2008 Table of Contents Executive Summary 1 SECTION 1 2 Introduction 2

More information

low-level storage structures e.g. partitions underpinning the warehouse logical table structures

low-level storage structures e.g. partitions underpinning the warehouse logical table structures DATA WAREHOUSE PHYSICAL DESIGN The physical design of a data warehouse specifies the: low-level storage structures e.g. partitions underpinning the warehouse logical table structures low-level structures

More information

Building Data Cubes and Mining Them. Jelena Jovanovic Email: jeljov@fon.bg.ac.yu

Building Data Cubes and Mining Them. Jelena Jovanovic Email: jeljov@fon.bg.ac.yu Building Data Cubes and Mining Them Jelena Jovanovic Email: jeljov@fon.bg.ac.yu KDD Process KDD is an overall process of discovering useful knowledge from data. Data mining is a particular step in the

More information

Search and Data Mining Techniques. OLAP Anna Yarygina Boris Novikov

Search and Data Mining Techniques. OLAP Anna Yarygina Boris Novikov Search and Data Mining Techniques OLAP Anna Yarygina Boris Novikov The Database: Shared Data Store? A dream from database textbooks: Sharing data between applications This NEVER happened. Applications

More information

Consistent Query Answering in Data Warehouses

Consistent Query Answering in Data Warehouses Consistent Query Answering in Data Warehouses Leopoldo Bertossi 1, Loreto Bravo 2 and Mónica Caniupán 3 1 Carleton University, Canada 2 Universidad de Concepción, Chile 3 Universidad de Bio-Bio, Chile

More information

ESSBASE ASO TUNING AND OPTIMIZATION FOR MERE MORTALS

ESSBASE ASO TUNING AND OPTIMIZATION FOR MERE MORTALS ESSBASE ASO TUNING AND OPTIMIZATION FOR MERE MORTALS Tracy, interrel Consulting Essbase aggregate storage databases are fast. Really fast. That is until you build a 25+ dimension database with millions

More information

Turkish Journal of Engineering, Science and Technology

Turkish Journal of Engineering, Science and Technology Turkish Journal of Engineering, Science and Technology 03 (2014) 106-110 Turkish Journal of Engineering, Science and Technology journal homepage: www.tujest.com Integrating Data Warehouse with OLAP Server

More information

Optimized Cost Effective Approach for Selection of Materialized Views in Data Warehousing

Optimized Cost Effective Approach for Selection of Materialized Views in Data Warehousing Optimized Cost Effective Approach for Selection of aterialized Views in Data Warehousing B.Ashadevi Assistant Professor, Department of CA Velalar College of Engineering and Technology Erode, Tamil Nadu,

More information

Multi-dimensional index structures Part I: motivation

Multi-dimensional index structures Part I: motivation Multi-dimensional index structures Part I: motivation 144 Motivation: Data Warehouse A definition A data warehouse is a repository of integrated enterprise data. A data warehouse is used specifically for

More information

Data warehouse Architectures and processes

Data warehouse Architectures and processes Database and data mining group, Data warehouse Architectures and processes DATA WAREHOUSE: ARCHITECTURES AND PROCESSES - 1 Database and data mining group, Data warehouse architectures Separation between

More information

Survey On: Nearest Neighbour Search With Keywords In Spatial Databases

Survey On: Nearest Neighbour Search With Keywords In Spatial Databases Survey On: Nearest Neighbour Search With Keywords In Spatial Databases SayaliBorse 1, Prof. P. M. Chawan 2, Prof. VishwanathChikaraddi 3, Prof. Manish Jansari 4 P.G. Student, Dept. of Computer Engineering&

More information

THE DATA WAREHOUSE ETL TOOLKIT CDT803 Three Days

THE DATA WAREHOUSE ETL TOOLKIT CDT803 Three Days Three Days Prerequisites Students should have at least some experience with any relational database management system. Who Should Attend This course is targeted at technical staff, team leaders and project

More information

Data Warehousing. Outline. From OLTP to the Data Warehouse. Overview of data warehousing Dimensional Modeling Online Analytical Processing

Data Warehousing. Outline. From OLTP to the Data Warehouse. Overview of data warehousing Dimensional Modeling Online Analytical Processing Data Warehousing Outline Overview of data warehousing Dimensional Modeling Online Analytical Processing From OLTP to the Data Warehouse Traditionally, database systems stored data relevant to current business

More information

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA OLAP and OLTP AMIT KUMAR BINDAL Associate Professor Databases Databases are developed on the IDEA that DATA is one of the critical materials of the Information Age Information, which is created by data,

More information

Aggregates Caching in Columnar In-Memory Databases

Aggregates Caching in Columnar In-Memory Databases Aggregates Caching in Columnar In-Memory Databases Stephan Müller and Hasso Plattner Hasso Plattner Institute University of Potsdam, Germany {stephan.mueller, hasso.plattner}@hpi.uni-potsdam.de Abstract.

More information

1960s 1970s 1980s 1990s. Slow access to

1960s 1970s 1980s 1990s. Slow access to Principles of Knowledge Discovery in Fall 2002 Chapter 2: Warehousing and Dr. Osmar R. Zaïane University of Alberta Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta

More information

Advanced Data Management Technologies

Advanced Data Management Technologies ADMT 2015/16 Unit 2 J. Gamper 1/44 Advanced Data Management Technologies Unit 2 Basic Concepts of BI and Data Warehousing J. Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE Acknowledgements:

More information

OLAP Online Privacy Control

OLAP Online Privacy Control OLAP Online Privacy Control M. Ragul Vignesh and C. Senthil Kumar Abstract--- The major issue related to the protection of private information in online analytical processing system (OLAP), is the privacy

More information

Lection 3-4 WAREHOUSING

Lection 3-4 WAREHOUSING Lection 3-4 DATA WAREHOUSING Learning Objectives Understand d the basic definitions iti and concepts of data warehouses Understand data warehousing architectures Describe the processes used in developing

More information

Integrating Pattern Mining in Relational Databases

Integrating Pattern Mining in Relational Databases Integrating Pattern Mining in Relational Databases Toon Calders, Bart Goethals, and Adriana Prado University of Antwerp, Belgium {toon.calders, bart.goethals, adriana.prado}@ua.ac.be Abstract. Almost a

More information

META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING

META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING Ramesh Babu Palepu 1, Dr K V Sambasiva Rao 2 Dept of IT, Amrita Sai Institute of Science & Technology 1 MVR College of Engineering 2 asistithod@gmail.com

More information

Namrata 1, Dr. Saket Bihari Singh 2 Research scholar (PhD), Professor Computer Science, Magadh University, Gaya, Bihar

Namrata 1, Dr. Saket Bihari Singh 2 Research scholar (PhD), Professor Computer Science, Magadh University, Gaya, Bihar A Comprehensive Study on Data Warehouse, OLAP and OLTP Technology Namrata 1, Dr. Saket Bihari Singh 2 Research scholar (PhD), Professor Computer Science, Magadh University, Gaya, Bihar Abstract: Data warehouse

More information

The Cubetree Storage Organization

The Cubetree Storage Organization The Cubetree Storage Organization Nick Roussopoulos & Yannis Kotidis Advanced Communication Technology, Inc. Silver Spring, MD 20905 Tel: 301-384-3759 Fax: 301-384-3679 {nick,kotidis}@act-us.com 1. Introduction

More information

CHAPTER SIX DATA. Business Intelligence. 2011 The McGraw-Hill Companies, All Rights Reserved

CHAPTER SIX DATA. Business Intelligence. 2011 The McGraw-Hill Companies, All Rights Reserved CHAPTER SIX DATA Business Intelligence 2011 The McGraw-Hill Companies, All Rights Reserved 2 CHAPTER OVERVIEW SECTION 6.1 Data, Information, Databases The Business Benefits of High-Quality Information

More information

Snapshots in the Data Warehouse BY W. H. Inmon

Snapshots in the Data Warehouse BY W. H. Inmon Snapshots in the Data Warehouse BY W. H. Inmon There are three types of modes that a data warehouse is loaded in: loads from archival data loads of data from existing systems loads of data into the warehouse

More information

Oracle Database In-Memory The Next Big Thing

Oracle Database In-Memory The Next Big Thing Oracle Database In-Memory The Next Big Thing Maria Colgan Master Product Manager #DBIM12c Why is Oracle do this Oracle Database In-Memory Goals Real Time Analytics Accelerate Mixed Workload OLTP No Changes

More information

Tracking System for GPS Devices and Mining of Spatial Data

Tracking System for GPS Devices and Mining of Spatial Data Tracking System for GPS Devices and Mining of Spatial Data AIDA ALISPAHIC, DZENANA DONKO Department for Computer Science and Informatics Faculty of Electrical Engineering, University of Sarajevo Zmaja

More information

Framework for Data warehouse architectural components

Framework for Data warehouse architectural components Framework for Data warehouse architectural components Author: Jim Wendt Organization: Evaltech, Inc. Evaltech Research Group, Data Warehousing Practice. Date: 04/08/11 Email: erg@evaltech.com Abstract:

More information

Portable Bushy Processing Trees for Join Queries

Portable Bushy Processing Trees for Join Queries Reihe Informatik 11 / 1996 Constructing Optimal Bushy Processing Trees for Join Queries is NP-hard Wolfgang Scheufele Guido Moerkotte 1 Constructing Optimal Bushy Processing Trees for Join Queries is NP-hard

More information

VALLIAMMAI ENGNIEERING COLLEGE SRM Nagar, Kattankulathur 603203.

VALLIAMMAI ENGNIEERING COLLEGE SRM Nagar, Kattankulathur 603203. VALLIAMMAI ENGNIEERING COLLEGE SRM Nagar, Kattankulathur 603203. DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING Year & Semester : II / III Section : CSE - 1 & 2 Subject Code : CS 6302 Subject Name : Database

More information

Application Tool for Experiments on SQL Server 2005 Transactions

Application Tool for Experiments on SQL Server 2005 Transactions Proceedings of the 5th WSEAS Int. Conf. on DATA NETWORKS, COMMUNICATIONS & COMPUTERS, Bucharest, Romania, October 16-17, 2006 30 Application Tool for Experiments on SQL Server 2005 Transactions ŞERBAN

More information

Budgeting and Planning with Microsoft Excel and Oracle OLAP

Budgeting and Planning with Microsoft Excel and Oracle OLAP Copyright 2009, Vlamis Software Solutions, Inc. Budgeting and Planning with Microsoft Excel and Oracle OLAP Dan Vlamis and Cathye Pendley dvlamis@vlamis.com cpendley@vlamis.com Vlamis Software Solutions,

More information

Web Log Data Sparsity Analysis and Performance Evaluation for OLAP

Web Log Data Sparsity Analysis and Performance Evaluation for OLAP Web Log Data Sparsity Analysis and Performance Evaluation for OLAP Ji-Hyun Kim, Hwan-Seung Yong Department of Computer Science and Engineering Ewha Womans University 11-1 Daehyun-dong, Seodaemun-gu, Seoul,

More information

The Benefits of Data Modeling in Business Intelligence. www.erwin.com

The Benefits of Data Modeling in Business Intelligence. www.erwin.com The Benefits of Data Modeling in Business Intelligence Table of Contents Executive Summary...... 3 Introduction.... 3 Why Data Modeling for BI Is Unique...... 4 Understanding the Meaning of Information.....

More information

International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 442 ISSN 2229-5518

International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 442 ISSN 2229-5518 International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 442 Over viewing issues of data mining with highlights of data warehousing Rushabh H. Baldaniya, Prof H.J.Baldaniya,

More information

Data Warehousing and OLAP Technology for Knowledge Discovery

Data Warehousing and OLAP Technology for Knowledge Discovery 542 Data Warehousing and OLAP Technology for Knowledge Discovery Aparajita Suman Abstract Since time immemorial, libraries have been generating services using the knowledge stored in various repositories

More information

Part 22. Data Warehousing

Part 22. Data Warehousing Part 22 Data Warehousing The Decision Support System (DSS) Tools to assist decision-making Used at all levels in the organization Sometimes focused on a single area Sometimes focused on a single problem

More information

Data Warehousing. Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de. Winter 2014/15. Jens Teubner Data Warehousing Winter 2014/15 1

Data Warehousing. Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de. Winter 2014/15. Jens Teubner Data Warehousing Winter 2014/15 1 Jens Teubner Data Warehousing Winter 2014/15 1 Data Warehousing Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Winter 2014/15 Jens Teubner Data Warehousing Winter 2014/15 152 Part VI ETL Process

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Foundations of Business Intelligence: Databases and Information Management Content Problems of managing data resources in a traditional file environment Capabilities and value of a database management

More information

A Model For Revelation Of Data Leakage In Data Distribution

A Model For Revelation Of Data Leakage In Data Distribution A Model For Revelation Of Data Leakage In Data Distribution Saranya.R Assistant Professor, Department Of Computer Science and Engineering Lord Jegannath college of Engineering and Technology Nagercoil,

More information

Andreas Rauber and Philipp Tomsich Institute of Software Technology Vienna University of Technology, Austria {andi,phil}@ifs.tuwien.ac.

Andreas Rauber and Philipp Tomsich Institute of Software Technology Vienna University of Technology, Austria {andi,phil}@ifs.tuwien.ac. An Architecture for Modular On-Line Analytical Processing Systems: Supporting Distributed and Parallel Query Processing Using Co-operating CORBA Objects Andreas Rauber and Philipp Tomsich Institute of

More information

Investigating the Effects of Spatial Data Redundancy in Query Performance over Geographical Data Warehouses

Investigating the Effects of Spatial Data Redundancy in Query Performance over Geographical Data Warehouses Investigating the Effects of Spatial Data Redundancy in Query Performance over Geographical Data Warehouses Thiago Luís Lopes Siqueira Ricardo Rodrigues Ciferri Valéria Cesário Times Cristina Dutra de

More information