# Bitmap Index Design and Evaluation

Save this PDF as:

Size: px
Start display at page:

## Transcription

3 C E E E a sequence of numbers arithmetic, this digit being encoded in bits by turning exactly one out of C bits on The arithmetic we choose for the value representation, ie, the decomposition of the value in digits according to some base, the encoding scheme of each digit in bits are the two dimensions of the space are analyzed below (1 Attribute Value ecomposition Consider an attribute value Fur thermore, define into a sequence of digits Then can be decomposed as follows: where! ", #( * ( E E+,, each digit is in the range ( Based on the above, each choice of sequence gives a different representation of attribute values, therefore a different index An index is welldefined if,, The sequence is the base of the index, which is in / turn called a base index If, then the base is called uniform the index is called base for short The index consists of components, ie, one component per digit Each component individually is now a collection of bitmaps, constituting essentially a base index Let denote the number of bitmaps in the!# component of an index *,,,3 denote the collection of bitmaps that form the "!# component Figure 3 shows a base ValueList index (based on the same 1record relation in Figure 1 By decomposing a singlecomponent index into a component index, the number of bitmaps has been reduced from 7 to / 3 7A 7<E (a 7AH 7A E 7<E =<= = E9:<H = => 1 1 =<= = Ḧ 9:< = => 1 1 =<= = Ḧ 9:< = => E 1 1 =<= = Ḧ 9:< = => 1 1 =<= = 9:< = => 1 1 =<= = Ḧ 9:< = => 1 1 =<= = Ḧ 9:< = => 1 1 =<= = Ḧ 9:<H = => 1 1 =<= = 9:< = => E 1 1 =<= = E9:< = => 1 1 =<= = 9:<H = => 1 1 =<= = E9:< = => E 1 1 Figure 3: Example of a Component ValueList Index (a Projection of indexed attribute values with duplicates preserved (b Base? ValueList Index (b 7AH ( Bitmap Encoding Scheme Consider the!# component of an index with attribute cardinality There are essentially two major schemes to directly encode the corresponding values (( in bits : Equality Encoding: There are bits, one for each possible value The representation of value has all bits set to (, except for the bit corresponding to, which is set to Clearly, an equalityencoded component consists of bitmaps Range Encoding: There are bits again, one for each possible value The representation of value has the rightmost bits set to ( the remaining bits (starting from the one cor Intuitively, each responding to to the left set to bitmap, has in all the records whose!# component value is less than or equal to * Since the bitmap, has all bits set to, it does not need to be stored, so a rangeencoded component consists of bitmaps Figures (b (c show the rangeencoded indexes corresponding to the equalityencoded indexes in Figure 1 Figure 3, respectively 7 / 3 79:7A> 7AB7FEG79H E 7 H 7 E 7 H (a (b (c Figure : Examples of RangeEncoded Bitmap Indexes (a Projection of indexed attribute values with duplicates preserved (b Component,! Base9 RangeEncoded Bitmap Index (c Base RangeEncoded Bitmap Index Figure shows the twodimensional design space of indexes for selection queries, with the two existing alternatives, namely, the ValueList index the BitSliced index [], classified as shown The ValueList index, which is the simplest index is the most commonly implemented, has a single component is equalityencoded The BitSliced index [] has a uniform base is rangeencoded A base BitSliced index has been used in MOEL for evaluating range predicates [] The BitSliced index is also used in Sybase IQ for evaluating range predicates E We have excluded the third basic encoding scheme (the binary encoding scheme discussed in [13, 1], as it can be characterized in our framework as a base decomposition with one of the two encodings we do present Clearly, a symmetric scheme exists as well, where the roles of right left are exchanged 3

4 b < b, b,, b> ATTRIBUTE log b times VALUE ECOMPOSITION < b, b 1 > BITMAP ENCOING SCHEME Equality Range ValueList Index BitSliced Index bitmap operations bitmap scans, compared to Algorithm RangeEval, which incurs ( bitmap operations bitmap scans Moreover, while efficient processing of Algorithm BSRange Opt requires buffering for only one bitmap (the intermediate/final result, efficient processing of Algorithm RangeEval requires buffering for two bitmaps (for,,, Result < b 3, b, b 1 > Figure : esign Space of Bitmap Indexes for Selection Queries Result B 3 1 B 1 3 B 1 performing aggregation [] In this paper, we consider all welldefined decompositions, just the two encoding schemes presented above, which are used in practice = 7 A A = 9 * = 7 > B 7 3 B B B B 7 3 B 3 An evaluation algorithm for selection queries based on rangeencoded indexes has been proposed by O Neil Quass (Algorithm 3 in [], which we referred to as Algorithm RangeEval In this section, we present an improved evaluation algorithm (Algorithm RangeEvalOpt that reduces the number of bitmap operations by about (#/ requires one less bitmap scan for a range predicate evaluation Given this, in the rest of the paper, the improved evaluation algorithm is used as the basis for the time analysis of rangeencoded indexes Both evaluation algorithms are shown in Figure Let denote bitmaps with all bits set to (, respectively Also let,, denote the logical AN, OR, XOR operations, respectively,, denote the complement of a bitmap, epending on the input predicate operator (, Algorithm RangeEval evaluates each range predicate by computing two bitmaps: the, bitmap either the, or, bitmap For example, if the predicate is, then the result bitmap, is obtained by computing the bitmaps, involved,,,, or,, steps that are not required The final result bitmap that is returned is either,,,,,,,,,, or,, corresponding to the predicate operator,,,,, or ", respectively As we show below, such an evaluation strategy not only costs more bitmap operations to incrementally build the two intermediate bitmaps, but also incurs more bitmap scans since the evaluation of the bitmap,, which corresponds to an equality predicate evaluation, is the most expensive predicate in terms of bitmap scans Algorithm RangeEvalOpt avoids the intermediate equality predicate evaluation by evaluating each range query in terms of only the predicate based on the following three identities: (1, (, (3 Furthermore, the evaluation of the predicate itself does not require an equality predicate evaluation Consequently, Algorithm RangeEvalOpt requires the computation of only one bitmap, for any predicate evaluation, as opposed to two bitmaps (, either, or, required in Algorithm RangeEval Figure 7 compares the two algorithms for evaluating the predicate? using a component base ( index Clearly, Algorithm RangeEvalOpt is more efficient as it requires only B 3 B 7 3 (a Figure 7: Comparison of Evaluation Strategies for Selection Predicate? using a Component, Base ( RangeEncoded Bitmap Index (a Algorithm RangeEval (b Algorithm RangeEvalOpt! A " * A# = A A = 9 = 7 > Evaluation Number of Bitmap Operations Num of Predicate AN OR NOT XOR Total Scans RangeEval (*+, /1 /1, (*3+, /1 /1, (*7+,, (*+,, (:9+,, (=< 9+ 1,/1, (*+ RangeEvalOpt 1,/1, = 1 (*3+ 1,/1, = 1 (*7+,, = 1 (*+,, = 1 (:9+,, (=< 9+ 1,/1, Table 1: Worst Case Analysis of Evaluation Algorithms is the number of index components Table 1 shows the worst case analysis of the performance of the evaluation algorithms in terms of the number of bitmap operations scans For Algorithm RangeEval, since each range predicate evaluation entails an equality predicate evaluation (for,> bitmap, the number of bitmap operations of a range predicate evaluation is at least twice that of the predicate evaluation By using a more direct evaluation strategy, Algorithm RangeEvalOpt B (b B B 1

5 ( C Evaluation Algorithms for Selection Queries Using RangeEncoded Bitmap Indexes is the number of components in the rangeencoded index is the base of the index!#" is the predicate operator, is the predicate value, # is a bitmap representing the set of records with nonnull values for the indexed attribute Output: A bitmap representation of the set of records that satisfies the predicate Algorithm RangeEval 1,,,, # 3 let for i = downto do if ( ( then,,,, 7 if ( then,,,, 9,,:,, else 11,,:, 1 else * 13,,,, 1,,, 1,,:, # 1,,,,,, 17 return, Algorithm RangeEvalOpt 1, if ( then 3 let if ( then if ( then,, E for i = to do 7 if ( " then,,, if ( " ( then,,, 9 else for i = 1 to do 11 if ( ( then,,, 1 else if ( then,,, 13 else, ",,, 1 if ( then 1 return,, 1 else 17 return,, * Figure : Algorithms RangeEval RangeEvalOpt reduces the number of bitmap operations for range predicate evaluations by about half the number of bitmap scans for range predicates is reduced by one Both algorithms have the same cost for an equality predicate evaluation For both evaluation algorithms, the worst case occurs when (,, which corresponds also to the most probable case! >#7 = A # = 7 1 :9 A A " = 9 * = 7 #> In this section, we compare the performance of the evaluation algorithms in terms of the average number of bitmap scans the average number of bitmap operations Indexes with uniform base are generated by varying the two main parameters: the attribute cardinality C the base number We experimented with C ( ( (, ( ( ( for each value of C, the entire space of baserangeencoded indexes are generated by varying from to C For each index, we generated a total of C selection queries!" of the form by varying the predicate ( ( The graphs the predicate constant ( C Each query is evaluated using Algorithm RangeEval Algorithm RangeEvalOpt For a given rangeencoded index an evaluation algorithm, the average number of bitmap scans (operations is computed by taking the average of the number of bitmap scans (operations over all #C queries Figure compares the performance of the two evaluation algorithms as a function of the base number with C clearly demonstrate that Algorithm RangeEvalOpt is more efficient than Algorithm RangeEval Similar trends are obtained for other values of C # A 9 7 = A " Our cost model for the spacetime tradeoff analysis uses the following two metrics: The space metric is in terms of the number of bitmaps stored The time metric is in terms of the expected number of bitmap scans for a selection query evaluation, is based on the assumption that the queries in the query space are uniformly!" distributed, : where Note that our time metric incorporates only the I/O cost (ie, number of bitmap scans does not include the CPU cost (ie, number of bitmap operations as the number of bitmap scans the number of bitmap operations for a query evaluation are correlated We denote the space time metric of an index by : :, respectively # = 7 " :9< = 7 * = This section compares the performance of the two basic encoding schemes, namely, equality encoding range encoding, for selection queries Our result shows that range encoding provides better spacetime tradeoff than equality encoding in most cases Let denote an component index with base Based on the cost model, we have the following result : Theorem 1 If is equalityencoded, then where if "!#( (1

6 3 Average Number of Bitmap Scans RangeEval RangeEvalOpt Base Number (a Average Number of Bitmap Scans as a Function of Base Number 1 RangeEval RangeEvalOpt Given the overall superiority of rangeencoding, in the remainder of this paper, we focus on the spacetime tradeoff of rangeencoded indexes Henceforth, we use the term index as an abbreviation for a rangeencoded index 7 <7 = A =,7 = A = 7 #>? In this section, we identify both the spaceoptimal timeoptimal indexes (ie, points (A ( in Figure We refer to the most spaceefficient (respectively, timeefficient index among all indexes with components as the ncomponent spaceoptimal index (respectively, ncomponent timeoptimal index Note that the component spaceoptimal index might not be unique for example, if C ( ( (, then the base base indexes are both component spaceoptimal indexes Based on Equations (3 (, we have the following result Average Number of Bitmap Operations Base Number (b Average Number of Bitmap Operations as a Function of Base Number Figure : RangeEval vs RangeEvalOpt for Uniform Base RangeEncoded Bitmap Indexes, C ( ( :! * * "! * where ( * if (!# : : If is rangeencoded, then (3 ( The time analysis for rangeencoded indexes is based on Algorithm RangeEvalOpt The evaluation algorithm for equalityencoded indexes is not shown here due to space constraint details can be found elsewhere [] A rangeencoded index requires one or two bitmap scans per component for a selection predicate evaluation On the other h, an equalityencoded index requires only one bitmap scan per component for an equality predicate evaluation, but for a range predicate evaluation, the number of bitmap scans required per component is between two half the number of bitmaps in that component Figure 9 compares the spacetime tradeoff of range equalityencoded indexes for C (, ( (, ( ( ( The results show that rangeencoding in general provides better spacetime tradeoff than equalityencoding in most cases Theorem 1 (1 The number of bitmaps in an component, where spaceoptimal index is given by C is the smallest positive integer such that C The index with base is an component spaceoptimal index ( The spaceefficiency of spaceoptimal indexes is a nondecreasing function of the number of components (3 The base of the component timeoptimal index is "! ( E ( The timeefficiency of timeoptimal indexes is a nonincreasing function of the number of components By Theorem 1, it follows that the spaceoptimal index corresponds to the index with the maximum number of components (ie, # C ( with each base number equal to, the timeoptimal index corresponds to the index with the minimum number of components (ie, These are probably immediate corollaries of information theory /or number theory results as well, but they are easy enough to prove that doing so seemed easier than tracking down the existing literature * = 7 #>,+ <7 = A 7 = / 1 In this section, we characterize the knee of the spacetime tradeoff graph, referred to as the knee index (point (C in Figure, which corresponds to the index with the best spacetime tradeoff Note that our characterization is an approximate one as finding a precise analytical characterization is generally a difficult problem However, it turns out that our approximate characterization has very good accuracy We first give a definition of the knee index that will serve as a basis for comparing with our approximate characterization Let " denote the set of optimal indexes corresponding to the points in the spacetime tradeoff graph such that : : 7 7 For, let :9 +9 denote the gradients of the line segments formed by C The set of optimal indexes < is the maximal subset of all possible indexes <>= such that for there does not exist another index in <>= that is more efficient than? in both time space,

7 ! + RangeEncoded Index EqualityEncoded Index RangeEncoded Index EqualityEncoded Index RangeEncoded Index EqualityEncoded Index Time (Expected Number of Bitmap Scans Time (Expected Number of Bitmap Scans Time (Expected Number of Bitmap Scans Space (Number of Bitmaps Space (Number of Bitmaps Space (Number of Bitmaps (a C ( (b C ( ( (c C ( ( ( Figure 9: Comparison of SpaceTime Tradeoff of Range EqualityEncoded Bitmap Indexes with, respectively are defined as follows:! "#! where : : is a normalizing factor The index " with the maximum ratio is the knee index Time (Expected Number of Bitmap Scans TimeOptimal Index SpaceOptimal Index All Index Space (Number of Bitmaps Figure : SpaceTime Tradeoff of Bitmap Indexes, C ( ( ( Time (Expected Number of Bitmap Scans Space (Number of Bitmaps Figure 11: SpaceTime Tradeoff of SpaceOptimal Bitmap Indexes, C ( ( ( 1 We now motivate our approximate characterization, which is based on the results of Theorem 1 Figure compares the spacetime tradeoff graphs for three classes of indexes: the class of spaceoptimal indexes, the class of timeoptimal indexes, the entire class of indexes, for C C ( Note that since the spaceoptimal index is generally not unique, each point in the spaceoptimal graph shown corresponds to the most timeefficient index among all equally spaceefficient indexes with the same number of components Figure shows that the tradeoff graph for spaceoptimal indexes provides a good approximation to that for all indexes In particular, the set of points on the graph for spaceoptimal indexes is a subset of the set of points on the graph for all indexes Figure 11 shows the same spaceoptimal tradeoff graph as in Figure but with each point labelled with the number of components of the corresponding spaceoptimal index We observe that the knee of the spacetime tradeoff graph for the spaceoptimal index corresponds to a component index, something that was consistent throughout our experimentation Hence, we characterize the knee index as the most timeefficient component spaceoptimal index, which is obtained from the following result ( ( ( similar results are obtained for other values of C The graph for spaceoptimal # (respectively, timeoptimal indexes consist of at most C ( points, where each point corresponds to an component spaceoptimal (respec # tively, timeoptimal index, Theorem 71 The base of the most timeefficient component spaceoptimal index is given by (, where C,! *,+ # * * E * * E/ * E, *( We have compared the knee index based on our approximate characterization with that based on the definition for various values of attribute cardinality the results show that both knee indexes match exactly for all the cases that we compared 1 = <7 = A = 7 #>39 7 # In this section, we consider the following practical optimization problem (point (B in Figure : Given a constraint on the available disk space to store an index, say at most bitmaps (or equivalently, at most bits, where is the number of records, determine the most timeefficient index We first present an algorithm that finds the optimal solution, then present a more efficient heuristic approach, which is nearoptimal Both algorithms are shown in Figure 1 In the following, let denote an component index 79: =<! : denote the component space timeoptimal indexes, respectively 7

8 Algorithms to Find TimeOptimal Bitmap Index Under Space Constraint Input: is the space constraint in terms of the maximum number of bitmaps C is the attribute cardinality Output: The timeoptimal bitmap index with no more than Algorithm TimeOptAlg 1 let be the smallest number such that : 97: =< if ( :! : =< then return! : =< : 3 let be the smallest number such that :! let : "! = return such that :, bitmaps =< : = Algorithm TimeOptHeur 1 = FindSmallestN( <,C if ( :! : < then return! : 3 return RefineIndex(I Figure 1: Algorithms TimeOptAlg TimeOptHeur 1! <7 = A SpaceOptimal Indexes 77 : Algorithm TimeOptAlg finds the actual optimal solution By result ( of Theorem 1, the solution must have at least components, where is the smallest number of components such that If the component timeoptimal index satisfies the space constraint (step, then by result ( of Theorem 1, it is the solution otherwise, by result ( of Theorem 1, the solution index must have no more than components where is the smallest number of components such that the component timeoptimal index satisfies the space constraint Figure 13 shows two cases to illustrate the bounds on the number of components of the solution index each point on the spacetime tradeoff graphs shown is labelled with the number of components of the corresponding index The time complexity of the algorithm is determined by the size of the set of cidate indexes defined by (step This search space is large as we need to enumerate for each value of,, all possible component indexes such that C Figure 1 shows the size of as a function of for C ( ( ( 1! " To avoid the costly exhaustive search in the optimal approach (steps, we present in this section a heuristic approach (Algorithm TimeOptHeur in Figure 1 to find an approximate optimal solution Our experimental results show that the heuristic approach is nearoptimal with the optimal index being selected at least 7#/ of the time The main idea is to first select a seed index that satisfies the space constraint, then to improve the timeefficiency of the seed index by adjusting its base numbers For the seed index, our approach uses an component index with :, where is the least number of components such that : 77 : Both are determined < by Algorithm FindSmallestN (Figure 1 Note that if! : satisfies the space constraint =< (step, then, as in step of Algorithm TimeOptAlg,! : is the optimal solution index Algorithm RefineIndex (Figure 1 improves the timeefficiency of an index its correctness is based on the following result Theorem 1 Let be an component index with base Suppose there exists two base numbers,, some integer constant (, such that ( if C, where if otherwise Time (Expected Number of Bitmap Scans Time (Expected Number of Bitmap Scans n = 3 n = 3 M Space (Number of Bitmaps n = (a n = 3 M Space (Number of Bitmaps (b TimeOptimal Indexes 3 1 SpaceOptimal Indexes TimeOptimal Indexes Figure 13: Bounds on Number of Components of Solution Index in Algorithm TimeOptAlg Let be another component index with base as defined above Then : : Based on Theorem 1, Algorithm RefineIndex essentially tries to maximize the number of components with small base numbers in particular, step determines the largest possible value for adjusting a pair of base numbers To illustrate Algorithm Refine Index, consider a component index with base C ( ( ( In the first iteration, with, 7 the base is refined to 1( In the second iteration, with 1(, the base is refined to ( After the final refinement step, the base of the refined index is ( ( By Equation (3, : 1

9 * Algorithm FindSmallestN: Algorithm to Find the Bitmap Index with Least Number of Components Under Space Constraint Input: is the space constraint in terms of the maximum number of bitmaps C is the attribute cardinality Output:, the smallest number of components such that : 97:, An component index with : 1 ( // is the number of components repeat 3 " // is the number of components with base ( until ( C 7 let be the component bitmap index with base return is a component bitmap index Algorithm RefineIndex: Algorithm to Improve the TimeEfficiency of a Bitmap Index Input: C is the attribute cardinality Output: A component bitmap index such that 1 let,! be the base of,? " 3 for = downto do let be the smallest base number from, delete from, if ( then 7 let be the smallest base number from, * * * * let " 9 if ((, then * " * / * * * /, # ( : ?!? +, + = 1 return component bitmap index with base : Figure 1: Algorithms FindSmallestN RefineIndex : : ( 7 By Equation (, : ( The time complexities of Algorithms FindSmallestN RefineIndex are C, respectively, where # is at most C ( Therefore, the time complexity of Algorithm TimeOptHeur is C C Attribute Percentage of Max ifference in Cardinality, Solutions Expected Number that are Optimal of Bitmap Scans / , ,, 1,1 Table : Effectiveness of Heuristic Approach to Select Time Optimal Bitmap Index under Space Constraint Table shows the effectiveness of the heuristic approach for various values of attribute cardinality The second column shows the percentage of solutions selected by the heuristic approach that are optimal for the attribute cardinality value indicated in the first column For those cases where the optimal heuristic approaches differ, the third column shows the maximum difference in the expected number of bitmap scans of their selected indexes The result indicates that the proposed heuristic approach is nearoptimal 3 * :9 = 7# = 7 7 = This section examines the effect of bitmap compression on the spacetime tradeoff of bitmap indexes 3! = 7 > = We first investigate different physical organizations for bitmap indexes to identify storage schemes that have good spacetime trade 9

10 ( 7 Number of Cidate Bitmap Indexes ata Set 1 ata Set Relation Lineitem Order Relation Cardinality ( ( ( ( ( ( ( Attribute Quantity Order ate Attribute Cardinality, C ( Table 3: Characteristics of TPC Benchmark ata used in Experiments 1( Space (Number of Bitmaps Figure 1: Size of Set of Cidate Bitmap Indexes,, as a Function of the Space Constraint for C ( ( ( off Consider an component bitmap index for a record relation, where the!# index component comprises of bitmaps, Let The entire bitmap index is essentially a bitmatrix, where the "!# component is a bitmatrix, each bitmap is a bitvector Based on the above view of a bitmap index, we compare the following three storage schemes: BitmapLevel Storage (BS Each bitmap is stored individually in a single bitmap file of bits Thus, the bitmap index is stored in bit bitmap files ComponentLevel Storage (CS Each index component is stored individually in a rowmajor order in a single bitmap file of bits, Thus, the bitmap index is stored in bitmap files IndexLevel Storage (IS The entire index is stored in a rowmajor order in a single bitmap file of bits We refer to a bitmap index that is organized using storage scheme (without any compression as a index, refer to a index with all its bitmaps compressed as a index (ie, for compressed For a component bitmap index, both a CSindex an ISindex are equivalent For a bitmap index with the maximum number of components ie, each component is of base, (1 both a BSindex a CSindex are equivalent, ( an ISindex is a projection index In a CSindex, each row of a component bitmatrix has either a consecutive stream of bits followed by a consecutive stream of ( bits if the index is rangeencoded, or exactly one bit set to if the index is equalityencoded In contrast, in a BSindex, the distribution of the bits in each bitmap is dependent on the distribution of the attribute values Thus, we expect a CSindex to be more compressible than a BSindex 3! >#7 = A 7 Our experimental study uses two data sets extracted from the TPC Benchmark []: data set 1 is for small attribute cardinality, data set is for large attribute cardinality Table 3 shows the key characteristics of our experimental data To limit the number of experiments for each data set, our comparison of the spacetime tradeoff is restricted to the class of spaceoptimal indexes (as a function of the number of index components, with being varied A projection index [] on an attribute ( in a relation is simply the projection of ( on with duplicates preserved stored in RIorder to this is motivated by our results in Section 7 which show that the spacetime tradeoff of spaceoptimal indexes provides a good approximation to the spacetime tradeoff of all indexes The data compression code used is from the zlib library Table compares the compressibility of the three storage schemes for both data sets For each component index each com pressed storage scheme, where, C, we compute the percentage of the size of under to the size of under BS For both data sets, the results show that CS Base Size of I Compressibility of of under BS Storage Scheme ( Index I (in bytes cbs ccs cis! "# #! # " " (a ata Set 1 Base Size of I Compressibility of of under BS Storage Scheme ( Index I (in bytes cbs ccs cis ( " " " # (b ata Set Table : Comparison of Compressibility of ifferent Storage Schemes indexes give the best compression In terms of query evaluation cost, a BSindex requires at most bitmap file scans per index component On the other h, both a CSindex an ISindex require all the bitmap files to be scanned moreover, they also incur additional CPU overhead to extract the bits of each relevant bitmap from the component bitmap files (which are stored in rowmajor order Thus, we expect cbsindexes to be more timeefficient than ccs cisindexes this is indeed supported by our experimental results For example, consider the base!+*,!!,*(!/ cbsindex,!, the average size of each compressed bitmap is + bytes Even when two bitmaps (an average of? bytes are scanned to evaluate an equality query, using the cbsindex is still significantly cheaper than using the ccsindex (or cisindex, which requires # 7 7 (1 bytes to be scanned Since cbsindexes are the most timeefficient ccsindexes are the most spaceefficient, we shall compare the spacetime trade? The zlib library is written by Jeanloup Gailly Mark Adler ( the compression method is based on a LZ77 variant called deflation

11 ( C ( < " C < offs of only cbs ccsindexes against BSindexes for the rest of this section Each bitmap index is used to evaluate a set of #C selection queries (using Algorithm RangeEvalOpt, where? : The predicate operator is restricted to (for range queries (for equality queries to limit the number of queries The performance of each index is measured in terms of its space timeefficiency The space metric is in terms of the total space of all its bitmaps (in MBytes, the time metric is in terms of the average predicate evaluation time (in secs which includes (1 the time to read the bitmaps, ( the time for inmemory bitmap decompression (for compressed bitmaps, (3 the time for bitmap operations The average predicate evaluation time,, is computed! as follows: C time to evaluate the predicate 3! >#7 = A A, where denote the This section presents experimental results for indexes under the storage schemes BS, cbs, ccs for data set 1 results for data set are not shown due to space limitation Figure 1(a shows the timeefficiency of the indexes as a function of the number of index components Both BS cbsindexes outperform ccsindexes significantly in particular, for ccsindexes, over (/ of the total evaluation time is due to bitmap decompression The decompression cost for ccsindexes generally increases with the number of bitmap components since the number of bits to be extracted is an increasing function of the number of components For ccsindexes, since all the compressed bitmap files need to be scanned for a query evaluation, their I/O cost is a function of the total size of all compressed bitmap files, which generally decreases as the number of components increases In contrast, for both BS cbsindexes, since the number of bitmaps to be scanned increases with the number of components, their I/O cost is an increasing function of the number of components Comparing BS cbsindexes, while cbsindexes incur a lower I/O cost, this is often offset by their bitmap decompression cost our results show that both BS cbsindexes are almost comparable in their timeefficiency Figure 1(b compares the spaceefficiency of the indexes as a function of the number of index components There are two main observations First, as we already know from Table, ccsindexes are the most spaceefficient Second, the results show that the effectiveness of bitmap compression decreases as the number of components increases This can be explained by the fact that the number of bitmaps is a decreasing function of the number of components (result ( of Theorem 1 in particular, there is a significant space reduction when an index is decomposed from one to two components Consequently, the gain of bitmap compression, with respect to space reduction, is only marginal once an index has been decomposed (ie, Thus, in terms of improving space efficiency, decomposing an component BSindex to an component BSindex could be more effective than compressing the component index Figure 1(c compares the spacetime tradeoff of the indexes The result shows that BS cbsindexes have comparable spacetime tradeoff, which is better than that of ccsindexes * :9< 7 1 = This section considers the effect of bitmap buffering on the spacetime tradeoff issues that we have addressed As the typical size of buffer space is at least GB in database systems running on SMP systems for large data warehouse applications [11], it is likely that a good number of bitmaps can remain memoryresident By taking into account the amount of mainmemory allocated for buffering bitmaps, more optimal indexes with better spacetime tradeoff can be designed The unit of buffering that we consider here is the number of bitmaps Consider an component with base Let denote the number of bitmaps that can be buffered in mainmemory, where bitmap buffer assignment for by, where : We denote a is the number of bitmaps in the "!# component of that are buffered A bitmap buffer assignment for is welldefined if In addition to the uniform query distribution assumption stated in Section, we further assume that the buffer hit rate for each referenced bitmap is uniformly distributed (ie, the probability that a reference to any bitmap in the!# component is a buffer hit is given by * Then, the expected number of bitmaps scans for a selection query evaluation using an component index with a bitmap buffer assignment is given by : ( We now present an optimal bitmap buffering policy for indexes From result (3 of Theorem 1 Theorem 1, we know that an index is more timeefficient if it has more components with smaller base numbers Since buffering the bitmaps in an index component effectively reduces the base of that component, it follows that it is more beneficial (in terms of reducing the expected number of bitmap scans to buffer a bitmap from a component with a smaller base number than from one with a larger base number The following optimal bitmap buffering policy is based on a prioritization of the index components using their base numbers Theorem 1 Consider the following priority assignment to bitmap components: 1 Components in have higher priority than those in 3 Within each set ( or 1 Partition the components of an index into two disjoint sets such that 1, components with smaller base numbers have higher priority Based on the above priority assignment, a bitmap buffering policy that favors bitmaps from a higher priority component over those from a lower priority component is optimal Based on Equation ( the above optimal bitmap buffering policy, we have the following result Theorem If index with base (, the timeoptimal index is an component Figure 17 compares the spacetime tradeoff of indexes (based on the optimal bitmap buffering policy as a function of the number of buffered bitmaps for C ( ( ( As expected, the spacetime tradeoff improves as increases Based on our experimental observations, we have the following approximate characterization of the knee: 11

### Indexing Techniques for Data Warehouses Queries. Abstract

Indexing Techniques for Data Warehouses Queries Sirirut Vanichayobon Le Gruenwald The University of Oklahoma School of Computer Science Norman, OK, 739 sirirut@cs.ou.edu gruenwal@cs.ou.edu Abstract Recently,

### PartJoin: An Efficient Storage and Query Execution for Data Warehouses

PartJoin: An Efficient Storage and Query Execution for Data Warehouses Ladjel Bellatreche 1, Michel Schneider 2, Mukesh Mohania 3, and Bharat Bhargava 4 1 IMERIR, Perpignan, FRANCE ladjel@imerir.com 2

### In-Memory Data Management for Enterprise Applications

In-Memory Data Management for Enterprise Applications Jens Krueger Senior Researcher and Chair Representative Research Group of Prof. Hasso Plattner Hasso Plattner Institute for Software Engineering University

### Lecture Data Warehouse Systems

Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches Column-Stores Horizontal/Vertical Partitioning Horizontal Partitions Master Table Vertical Partitions Primary Key 3 Motivation

### Bitmap Index an Efficient Approach to Improve Performance of Data Warehouse Queries

Bitmap Index an Efficient Approach to Improve Performance of Data Warehouse Queries Kale Sarika Prakash 1, P. M. Joe Prathap 2 1 Research Scholar, Department of Computer Science and Engineering, St. Peters

### Index Selection Techniques in Data Warehouse Systems

Index Selection Techniques in Data Warehouse Systems Aliaksei Holubeu as a part of a Seminar Databases and Data Warehouses. Implementation and usage. Konstanz, June 3, 2005 2 Contents 1 DATA WAREHOUSES

### low-level storage structures e.g. partitions underpinning the warehouse logical table structures

DATA WAREHOUSE PHYSICAL DESIGN The physical design of a data warehouse specifies the: low-level storage structures e.g. partitions underpinning the warehouse logical table structures low-level structures

### In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

In-Memory Databases Algorithms and Data Structures on Modern Hardware Martin Faust David Schwalb Jens Krüger Jürgen Müller The Free Lunch Is Over 2 Number of transistors per CPU increases Clock frequency

### 2.3 Scheduling jobs on identical parallel machines

2.3 Scheduling jobs on identical parallel machines There are jobs to be processed, and there are identical machines (running in parallel) to which each job may be assigned Each job = 1,,, must be processed

### Bitmap Indices for Data Warehouses

Bitmap Indices for Data Warehouses Kurt Stockinger and Kesheng Wu Computational Research Division Lawrence Berkeley National Laboratory University of California Mail Stop 50B-3238 1 Cyclotron Road Berkeley,

### Big Data Technology Map-Reduce Motivation: Indexing in Search Engines

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Edward Bortnikov & Ronny Lempel Yahoo Labs, Haifa Indexing in Search Engines Information Retrieval s two main stages: Indexing process

### Evaluation of Bitmap Index Compression using Data Pump in Oracle Database

IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 3, Ver. III (May-Jun. 2014), PP 43-48 Evaluation of Bitmap Index Compression using Data Pump in Oracle

### An Efficient Bitmap Encoding Scheme for Selection Queries

An Efficient Bitmap Encoding Scheme for Selection Queries CheeYong Chan Department of Computer Sciences University of WisconsinMadison cychan@cs.wisc.edu Yannis E. Ioannidis* t Department of Computer Sciences

### Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices

Proc. of Int. Conf. on Advances in Computer Science, AETACS Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices Ms.Archana G.Narawade a, Mrs.Vaishali Kolhe b a PG student, D.Y.Patil

### DATA WAREHOUSING II. CS121: Introduction to Relational Database Systems Fall 2015 Lecture 23

DATA WAREHOUSING II CS121: Introduction to Relational Database Systems Fall 2015 Lecture 23 Last Time: Data Warehousing 2 Last time introduced the topic of decision support systems (DSS) and data warehousing

### Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs

CSE599s: Extremal Combinatorics November 21, 2011 Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs Lecturer: Anup Rao 1 An Arithmetic Circuit Lower Bound An arithmetic circuit is just like

### Oracle Database In-Memory The Next Big Thing

Oracle Database In-Memory The Next Big Thing Maria Colgan Master Product Manager #DBIM12c Why is Oracle do this Oracle Database In-Memory Goals Real Time Analytics Accelerate Mixed Workload OLTP No Changes

### ICS 434 Advanced Database Systems

ICS 434 Advanced Database Systems Dr. Abdallah Al-Sukairi sukairi@kfupm.edu.sa Second Semester 2003-2004 (032) King Fahd University of Petroleum & Minerals Information & Computer Science Department Outline

### Bitmap Index as Effective Indexing for Low Cardinality Column in Data Warehouse

Bitmap Index as Effective Indexing for Low Cardinality Column in Data Warehouse Zainab Qays Abdulhadi* * Ministry of Higher Education & Scientific Research Baghdad, Iraq Zhang Zuping Hamed Ibrahim Housien**

### RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems CLOUD COMPUTING GROUP - LITAO DENG

1 RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems CLOUD COMPUTING GROUP - LITAO DENG Background 2 Hive is a data warehouse system for Hadoop that facilitates

### Linear Codes. Chapter 3. 3.1 Basics

Chapter 3 Linear Codes In order to define codes that we can encode and decode efficiently, we add more structure to the codespace. We shall be mainly interested in linear codes. A linear code of length

### Investigating the Effects of Spatial Data Redundancy in Query Performance over Geographical Data Warehouses

Investigating the Effects of Spatial Data Redundancy in Query Performance over Geographical Data Warehouses Thiago Luís Lopes Siqueira Ricardo Rodrigues Ciferri Valéria Cesário Times Cristina Dutra de

### Multi-Dimensional Database Allocation for Parallel Data Warehouses

Multi-Dimensional Database Allocation for Parallel Data Warehouses Thomas Stöhr Holger Märtens Erhard Rahm University of Leipzig, Germany {stoehr maertens rahm}@informatik.uni-leipzig.de Abstract Data

### MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With

### Information management software solutions White paper. Powerful data warehousing performance with IBM Red Brick Warehouse

Information management software solutions White paper Powerful data warehousing performance with IBM Red Brick Warehouse April 2004 Page 1 Contents 1 Data warehousing for the masses 2 Single step load

### A Comparison of General Approaches to Multiprocessor Scheduling

A Comparison of General Approaches to Multiprocessor Scheduling Jing-Chiou Liou AT&T Laboratories Middletown, NJ 0778, USA jing@jolt.mt.att.com Michael A. Palis Department of Computer Science Rutgers University

### Load Balancing in MapReduce Based on Scalable Cardinality Estimates

Load Balancing in MapReduce Based on Scalable Cardinality Estimates Benjamin Gufler 1, Nikolaus Augsten #, Angelika Reiser 3, Alfons Kemper 4 Technische Universität München Boltzmannstraße 3, 85748 Garching

### Offline sorting buffers on Line

Offline sorting buffers on Line Rohit Khandekar 1 and Vinayaka Pandit 2 1 University of Waterloo, ON, Canada. email: rkhandekar@gmail.com 2 IBM India Research Lab, New Delhi. email: pvinayak@in.ibm.com

### Information Systems. Computer Science Department ETH Zurich Spring 2012

Information Systems Computer Science Department ETH Zurich Spring 2012 Lecture VIII: Performance Tuning and Benchmarking in Databases Performance Tuning Performance tuning involves adjusting various parameters

### Improved Query Performance with Variant Indexes

Improved Query Performance with Variant Indexes Patrick O Neil Dallan Quass Department of Mathematics and Computer Science Department of Computer Science University of Massachusetts at Boston Stanford

### ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #2610771

ENHANCEMENTS TO SQL SERVER COLUMN STORES Anuhya Mallempati #2610771 CONTENTS Abstract Introduction Column store indexes Batch mode processing Other Enhancements Conclusion ABSTRACT SQL server introduced

### CH 7. MAIN MEMORY. Base and Limit Registers. Memory-Management Unit (MMU) Chapter 7: Memory Management. Background. Logical vs. Physical Address Space

Chapter 7: Memory Management CH 7. MAIN MEMORY Background Swapping Contiguous Memory Allocation Paging Structure of the Page Table Segmentation adapted from textbook slides Background Base and Limit Registers

### Research Statement Immanuel Trummer www.itrummer.org

Research Statement Immanuel Trummer www.itrummer.org We are collecting data at unprecedented rates. This data contains valuable insights, but we need complex analytics to extract them. My research focuses

### Data mining knowledge representation

Data mining knowledge representation 1 What Defines a Data Mining Task? Task relevant data: where and how to retrieve the data to be used for mining Background knowledge: Concept hierarchies Interestingness

### Integrating Apache Spark with an Enterprise Data Warehouse

Integrating Apache Spark with an Enterprise Warehouse Dr. Michael Wurst, IBM Corporation Architect Spark/R/Python base Integration, In-base Analytics Dr. Toni Bollinger, IBM Corporation Senior Software

### A Constraint Programming Application for Rotating Workforce Scheduling

A Constraint Programming Application for Rotating Workforce Scheduling Markus Triska and Nysret Musliu Database and Artificial Intelligence Group Vienna University of Technology {triska,musliu}@dbai.tuwien.ac.at

### Efficient Processing of Joins on Set-valued Attributes

Efficient Processing of Joins on Set-valued Attributes Nikos Mamoulis Department of Computer Science and Information Systems University of Hong Kong Pokfulam Road Hong Kong nikos@csis.hku.hk Abstract Object-oriented

### Introduction: Modeling:

Introduction: In this lecture, we discuss the principles of dimensional modeling, in what way dimensional modeling is different from traditional entity relationship modeling, various types of schema models,

### Handling large amount of data efficiently

Handling large amount of data efficiently 1. Storage media and its constraints (magnetic disks, buffering) 2. Algorithms for large inputs (sorting) 3. Data structures (trees, hashes and bitmaps) Lecture

### ProTrack: A Simple Provenance-tracking Filesystem

ProTrack: A Simple Provenance-tracking Filesystem Somak Das Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology das@mit.edu Abstract Provenance describes a file

### Data Warehouse: Introduction

Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of base and data mining group,

### Lecture 3: Linear Programming Relaxations and Rounding

Lecture 3: Linear Programming Relaxations and Rounding 1 Approximation Algorithms and Linear Relaxations For the time being, suppose we have a minimization problem. Many times, the problem at hand can

### Chapter 12 File Management. Roadmap

Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 12 File Management Dave Bremer Otago Polytechnic, N.Z. 2008, Prentice Hall Overview Roadmap File organisation and Access

### Chapter 12 File Management

Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 12 File Management Dave Bremer Otago Polytechnic, N.Z. 2008, Prentice Hall Roadmap Overview File organisation and Access

### Binary search tree with SIMD bandwidth optimization using SSE

Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous

### Sizing ASAP FOR BW ACCELERATOR SAP BUSINESS INFORMATION WAREHOUSE. How to size of a BW system SIZING AND PERFORMANCE. Version 3.

Sizing ASAP FOR BW ACCELERATOR SAP BUSINESS INFORMATION WAREHOUSE How to size of a BW system Version 3.1 December 2004 2004 SAP AG 1 Table of Contents ASAP for BW Accelerator... 1 1 INTRODUCTION... 1 2

### EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES

ABSTRACT EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES Tyler Cossentine and Ramon Lawrence Department of Computer Science, University of British Columbia Okanagan Kelowna, BC, Canada tcossentine@gmail.com

### OPRE 6201 : 2. Simplex Method

OPRE 6201 : 2. Simplex Method 1 The Graphical Method: An Example Consider the following linear program: Max 4x 1 +3x 2 Subject to: 2x 1 +3x 2 6 (1) 3x 1 +2x 2 3 (2) 2x 2 5 (3) 2x 1 +x 2 4 (4) x 1, x 2

### MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM

MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM J. Arokia Renjit Asst. Professor/ CSE Department, Jeppiaar Engineering College, Chennai, TamilNadu,India 600119. Dr.K.L.Shunmuganathan

### Rethinking SIMD Vectorization for In-Memory Databases

SIGMOD 215, Melbourne, Victoria, Australia Rethinking SIMD Vectorization for In-Memory Databases Orestis Polychroniou Columbia University Arun Raghavan Oracle Labs Kenneth A. Ross Columbia University Latest

### Distributed Aggregation in Cloud Databases. By: Aparna Tiwari tiwaria@umail.iu.edu

Distributed Aggregation in Cloud Databases By: Aparna Tiwari tiwaria@umail.iu.edu ABSTRACT Data intensive applications rely heavily on aggregation functions for extraction of data according to user requirements.

### Oracle EXAM - 1Z0-117. Oracle Database 11g Release 2: SQL Tuning. Buy Full Product. http://www.examskey.com/1z0-117.html

Oracle EXAM - 1Z0-117 Oracle Database 11g Release 2: SQL Tuning Buy Full Product http://www.examskey.com/1z0-117.html Examskey Oracle 1Z0-117 exam demo product is here for you to test the quality of the

### Performance Tuning for the Teradata Database

Performance Tuning for the Teradata Database Matthew W Froemsdorf Teradata Partner Engineering and Technical Consulting - i - Document Changes Rev. Date Section Comment 1.0 2010-10-26 All Initial document

### IBM Data Retrieval Technologies: RDBMS, BLU, IBM Netezza, and Hadoop

IBM Data Retrieval Technologies: RDBMS, BLU, IBM Netezza, and Hadoop Frank C. Fillmore, Jr. The Fillmore Group, Inc. Session Code: E13 Wed, May 06, 2015 (02:15 PM - 03:15 PM) Platform: Cross-platform Objectives

Oracle Database In-Memory Power the Real-Time Enterprise Saurabh K. Gupta Principal Technologist, Database Product Management Who am I? Principal Technologist, Database Product Management at Oracle Author

### MS SQL Performance (Tuning) Best Practices:

MS SQL Performance (Tuning) Best Practices: 1. Don t share the SQL server hardware with other services If other workloads are running on the same server where SQL Server is running, memory and other hardware

### GTC 2014 San Jose, California

GTC 2014 San Jose, California An Approach to Parallel Processing of Big Data in Finance for Alpha Generation and Risk Management Yigal Jhirad and Blay Tarnoff March 26, 2014 GTC 2014: Table of Contents

### Data Management, Analysis Tools, and Analysis Mechanics

Chapter 2 Data Management, Analysis Tools, and Analysis Mechanics This chapter explores different tools and techniques for handling data for research purposes. This chapter assumes that a research problem

### SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications Jürgen Primsch, SAP AG July 2011 Why In-Memory? Information at the Speed of Thought Imagine access to business data,

### 1Z0-117 Oracle Database 11g Release 2: SQL Tuning. Oracle

1Z0-117 Oracle Database 11g Release 2: SQL Tuning Oracle To purchase Full version of Practice exam click below; http://www.certshome.com/1z0-117-practice-test.html FOR Oracle 1Z0-117 Exam Candidates We

### The Cubetree Storage Organization

The Cubetree Storage Organization Nick Roussopoulos & Yannis Kotidis Advanced Communication Technology, Inc. Silver Spring, MD 20905 Tel: 301-384-3759 Fax: 301-384-3679 {nick,kotidis}@act-us.com 1. Introduction

### Lecture Notes on Data Warehousing by examples

Lecture Notes on Data Warehousing by examples Daniel Lemire and Owen Kaser October 25, 2004 1 Introduction Data Warehousing is an application-driven field. The Data Warehousing techniques can be applied

### A COOL AND PRACTICAL ALTERNATIVE TO TRADITIONAL HASH TABLES

A COOL AND PRACTICAL ALTERNATIVE TO TRADITIONAL HASH TABLES ULFAR ERLINGSSON, MARK MANASSE, FRANK MCSHERRY MICROSOFT RESEARCH SILICON VALLEY MOUNTAIN VIEW, CALIFORNIA, USA ABSTRACT Recent advances in the

### Multi-dimensional index structures Part I: motivation

Multi-dimensional index structures Part I: motivation 144 Motivation: Data Warehouse A definition A data warehouse is a repository of integrated enterprise data. A data warehouse is used specifically for

### An On-Line Algorithm for Checkpoint Placement

An On-Line Algorithm for Checkpoint Placement Avi Ziv IBM Israel, Science and Technology Center MATAM - Advanced Technology Center Haifa 3905, Israel avi@haifa.vnat.ibm.com Jehoshua Bruck California Institute

### Improved Query Performance with Variant Indexes

Improved Query Performance with Variant Indexes Patrick O Neil Dallan Quass Department of Mathematics and Computer Science Department of Computer Science University of Massachusetts at Boston Stanford

### What Is a Data Warehouse?

What Is a Data Warehouse? A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management s decision-making process. W. H. Inmon Data warehousing:

### Data Warehousing Concepts

Data Warehousing Concepts JB Software and Consulting Inc 1333 McDermott Drive, Suite 200 Allen, TX 75013. [[[[[ DATA WAREHOUSING What is a Data Warehouse? Decision Support Systems (DSS), provides an analysis

### Roots of Polynomials

Roots of Polynomials (Com S 477/577 Notes) Yan-Bin Jia Sep 24, 2015 A direct corollary of the fundamental theorem of algebra is that p(x) can be factorized over the complex domain into a product a n (x

### Arithmetic Coding: Introduction

Data Compression Arithmetic coding Arithmetic Coding: Introduction Allows using fractional parts of bits!! Used in PPM, JPEG/MPEG (as option), Bzip More time costly than Huffman, but integer implementation

Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650

### 2. Basic Relational Data Model

2. Basic Relational Data Model 2.1 Introduction Basic concepts of information models, their realisation in databases comprising data objects and object relationships, and their management by DBMS s that

### A Catalogue of the Steiner Triple Systems of Order 19

A Catalogue of the Steiner Triple Systems of Order 19 Petteri Kaski 1, Patric R. J. Östergård 2, Olli Pottonen 2, and Lasse Kiviluoto 3 1 Helsinki Institute for Information Technology HIIT University of

### FCE: A Fast Content Expression for Server-based Computing

FCE: A Fast Content Expression for Server-based Computing Qiao Li Mentor Graphics Corporation 11 Ridder Park Drive San Jose, CA 95131, U.S.A. Email: qiao li@mentor.com Fei Li Department of Computer Science

### The Curious Case of Database Deduplication. PRESENTATION TITLE GOES HERE Gurmeet Goindi Oracle

The Curious Case of Database Deduplication PRESENTATION TITLE GOES HERE Gurmeet Goindi Oracle Agenda Introduction Deduplication Databases and Deduplication All Flash Arrays and Deduplication 2 Quick Show

### arxiv:1112.0829v1 [math.pr] 5 Dec 2011

How Not to Win a Million Dollars: A Counterexample to a Conjecture of L. Breiman Thomas P. Hayes arxiv:1112.0829v1 [math.pr] 5 Dec 2011 Abstract Consider a gambling game in which we are allowed to repeatedly

### Fairness in Routing and Load Balancing

Fairness in Routing and Load Balancing Jon Kleinberg Yuval Rabani Éva Tardos Abstract We consider the issue of network routing subject to explicit fairness conditions. The optimization of fairness criteria

### Statistical Machine Translation: IBM Models 1 and 2

Statistical Machine Translation: IBM Models 1 and 2 Michael Collins 1 Introduction The next few lectures of the course will be focused on machine translation, and in particular on statistical machine translation

### 8. Query Processing. Query Processing & Optimization

ECS-165A WQ 11 136 8. Query Processing Goals: Understand the basic concepts underlying the steps in query processing and optimization and estimating query processing cost; apply query optimization techniques;

### Further Analysis Of A Framework To Analyze Network Performance Based On Information Quality

Further Analysis Of A Framework To Analyze Network Performance Based On Information Quality A Kazmierczak Computer Information Systems Northwest Arkansas Community College One College Dr. Bentonville,

### Storage Layout and I/O Performance in Data Warehouses

Storage Layout and I/O Performance in Data Warehouses Matthias Nicola 1, Haider Rizvi 2 1 IBM Silicon Valley Lab 2 IBM Toronto Lab mnicola@us.ibm.com haider@ca.ibm.com Abstract. Defining data placement

### Introduction to Decision Support, Data Warehousing, Business Intelligence, and Analytical Load Testing for all Databases

Introduction to Decision Support, Data Warehousing, Business Intelligence, and Analytical Load Testing for all Databases This guide gives you an introduction to conducting DSS (Decision Support System)

### JUST-IN-TIME SCHEDULING WITH PERIODIC TIME SLOTS. Received December May 12, 2003; revised February 5, 2004

Scientiae Mathematicae Japonicae Online, Vol. 10, (2004), 431 437 431 JUST-IN-TIME SCHEDULING WITH PERIODIC TIME SLOTS Ondřej Čepeka and Shao Chin Sung b Received December May 12, 2003; revised February

### Estimating Deduplication Ratios in Large Data Sets

IBM Research labs - Haifa Estimating Deduplication Ratios in Large Data Sets Danny Harnik, Oded Margalit, Dalit Naor, Dmitry Sotnikov Gil Vernik Estimating dedupe and compression ratios some motivation

### Data Warehousing und Data Mining

Data Warehousing und Data Mining Multidimensionale Indexstrukturen Ulf Leser Wissensmanagement in der Bioinformatik Content of this Lecture Multidimensional Indexing Grid-Files Kd-trees Ulf Leser: Data

### Storage Management for Files of Dynamic Records

Storage Management for Files of Dynamic Records Justin Zobel Department of Computer Science, RMIT, GPO Box 2476V, Melbourne 3001, Australia. jz@cs.rmit.edu.au Alistair Moffat Department of Computer Science

### Benchmarking Cassandra on Violin

Technical White Paper Report Technical Report Benchmarking Cassandra on Violin Accelerating Cassandra Performance and Reducing Read Latency With Violin Memory Flash-based Storage Arrays Version 1.0 Abstract

### Compact Representations and Approximations for Compuation in Games

Compact Representations and Approximations for Compuation in Games Kevin Swersky April 23, 2008 Abstract Compact representations have recently been developed as a way of both encoding the strategic interactions

### Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

Slide 29-1 Chapter 29 Overview of Data Warehousing and OLAP Chapter 29 Outline Purpose of Data Warehousing Introduction, Definitions, and Terminology Comparison with Traditional Databases Characteristics

### Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints

Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints Michael Bauer, Srinivasan Ravichandran University of Wisconsin-Madison Department of Computer Sciences {bauer, srini}@cs.wisc.edu

### ! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions

Chapter 13: Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions Basic Steps in Query

### Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Chapter 23, Part A

Data Warehousing and Decision Support Chapter 23, Part A Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1 Introduction Increasingly, organizations are analyzing current and historical

### Access Paths for Data Mining Query Optimizer

Access Paths for Data Mining Query Optimizer Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science {marekw, mzakrz}@cs.put.poznan.pl Abstract Data mining

### Two General Methods to Reduce Delay and Change of Enumeration Algorithms

ISSN 1346-5597 NII Technical Report Two General Methods to Reduce Delay and Change of Enumeration Algorithms Takeaki Uno NII-2003-004E Apr.2003 Two General Methods to Reduce Delay and Change of Enumeration

### Data warehouse Architectures and processes

Database and data mining group, Data warehouse Architectures and processes DATA WAREHOUSE: ARCHITECTURES AND PROCESSES - 1 Database and data mining group, Data warehouse architectures Separation between

### Oracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc.

Oracle9i Data Warehouse Review Robert F. Edwards Dulcian, Inc. Agenda Oracle9i Server OLAP Server Analytical SQL Data Mining ETL Warehouse Builder 3i Oracle 9i Server Overview 9i Server = Data Warehouse

### An Architecture for Using Tertiary Storage in a Data Warehouse

An Architecture for Using Tertiary Storage in a Data Warehouse Theodore Johnson Database Research Dept. AT&T Labs - Research johnsont@research.att.com Motivation AT&T has huge data warehouses. Data from

### Amadeus SAS Specialists Prove Fusion iomemory a Superior Analysis Accelerator

WHITE PAPER Amadeus SAS Specialists Prove Fusion iomemory a Superior Analysis Accelerator 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com SAS 9 Preferred Implementation Partner tests a single Fusion