Bitmap Index Design and Evaluation


 Ruby Miles
 1 years ago
 Views:
Transcription
1 Bitmap Index esign Evaluation CheeYong Chan Yannis E Ioannidis epartment of Computer Sciences epartment of Computer Sciences University of WisconsinMadison University of WisconsinMadison Bitmap indexing has been touted as a promising approach for processing complex adhoc queries in readmostly environments, like those of decision support systems Nevertheless, only few possible bitmap schemes have been proposed in the past very little is known about the spacetime tradeoff that they offer In this paper, we present a general framework to study the design space of bitmap indexes for selection queries examine the diskspace time characteristics that the various alternative index choices offer In particular, we draw a parallel between bitmap indexing number representation in different number systems, define a space of two orthogonal dimensions that captures a wide array of bitmap indexes, both old new Within that space, we identify (analytically or experimentally the following interesting points: (1 the timeoptimal bitmap index ( the spaceoptimal bitmap index (3 the bitmap index with the optimal spacetime tradeoff (knee ( the timeoptimal bitmap index under a given diskspace constraint Finally, we examine the impact of bitmap compression bitmap buffering on the spacetime tradeoffs among those indexes As part of this work, we also describe a bitmapindexbased evaluation algorithm for selection queries that represents an improvement over earlier proposals We believe that this study offers a useful first set of guidelines for physical database design using bitmap indexes While the query performance issues of online transaction processing (OLTP systems have been extensively studied [7] are pretty much wellunderstood, the stateoftheart for ecision Support Systems (SS is still evolving as indicated by the growing active research in this area [3] Current database systems, which are optimized mainly for OLTP applications, are not suitable for SS applications due to their different requirements workload [] In particular, SS operate in readmostly environments, which are Partially supported by the National Science Foundation under Grant IRI9173 (PYI Award by grants from EC, IBM, HP, ATT, Oracle, Informix Author s present address: epartment of Informatics, University of Athens, Hellas (Greece dominated by complex adhoc queries that have high selectivity factors (ie, large foundsets A promising approach to process complex queries in SS is the use of bitmap indexing [, 9, ] Bitmap manipulation techniques have already been used in some commercial products [1] to speed up query processing: a notable example is Model, a prerelational BMS from Computer Corporation of America [] More recently, various BMS vendors, including Oracle, Red Brick, Sybase, have introduced bitmap indexes into their products to meet the performance requirements of SS applications [, ] In its simplest form, a bitmap index on an indexed attribute consists of one vector of bits (ie, bitmap per attribute value, where the size of each bitmap is equal to the cardinality of the indexed relation The bitmaps are encoded such that the "!# record has a value of in the indexed attribute if only if the!# bit in the bitmap associated with the attribute value is set to, the!# bit in each of the other bitmaps is set to ( This is called a ValueList index [] An example of a ValueList index for a * record relation + is shown in Figure 1, where each column in Figure 1(b represents a bitmap, associated with an attribute value / (a (b Figure 1: Example of a ValueList Index (a Projection of indexed attribute values with duplicates preserved (b ValueList Index Consider a highselectivityfactor query with selection predicates on two different attributes A conventional OLTP optimizer would generate one of the following three query plans: (P1 a full relation scan, (P an index scan (using the predicate with lower selectivity factor followed by a partial relation scan to filter out the nonqualifying tuples, or (P3 an index scan for each selection predicate, followed by a merge of the results from the two index scans ue to the compact sizes of bitmaps (especially for attributes with low cardinality the efficient hardware support for bitmap operations (AN, OR, XOR, NOT, plan (P3 using bitmap indexes is likely to be more efficient than a plan that re
2 quires a partial or full relation scan (plans (P1 (P A simple cost analysis shows that evaluating plan (P3 with bitmap indexes is more efficient than using the conventional RIlist based indexes for queries with selectivity factor above some threshold Let be the relation query result cardinalities, respectively Assume that each RI is bytes long that one bitmap is scanned per predicate In terms of the number of bytes read, using bitmap indexes for plan (P3 is more efficient than using RIlist based indexes if ie, Furthermore, operations on bitmaps are more CPUefficient than merging RIlists Various bitmap indexes [, 9,, 13, 1] have been designed for different query types, including range queries, aggregation queries, OLAPstyle queries However, as there is no overall best bitmap index over all kinds of queries, maintaining multiple types of bitmap indexes for an attribute may be necessary in order to achieve the desired level of performance While the gains in query performance using a multipleindex approach might be offset by the high update cost in OLTP applications, this is not an issue in the readmostly environment of SS applications Indeed, the multipleindex approach is adopted by Sybase IQ, a BMS specifically designed for data warehousing applications, which supports five different types of bitmap indexes, requires the database to be fully inverted [1] Maintaining multiple indexes for an attribute, however, further increases the disk space requirement of data warehouse applications Understing the spacetime tradeoff of the various bitmap indexes is therefore essential for a good physical database design In this paper, we study the spacetime tradeoff of bitmap indexes for the class of selection queries, ie, queries with predicates of the form, where refers to the indexed!#" attribute, is one of six comparison operators, is the predicate!" constant We refer to selection predicates where ( as equality, predicates, to selection predicates where *+ as range predicates Information on bitmap indexes for other types of queries (eg, starjoin groupby queries can be found elsewhere [9, ] The main contributions in this paper are as follows: The presentation of a general framework to study the design space of bitmap indexes for selection queries The framework not only captures the existing proposed ideas but reveals several other design alternatives as well An analysis of the spacetime tradeoff of bitmap indexes in particular, we identify analytically or experimentally four interesting points in a typical spacetime tradeoff graph, as shown in Figure : (A The spaceoptimal bitmap index (B The timeoptimal bitmap index under a given space constraint (C The bitmap index with the optimal spacetime tradeoff, ie, the knee of the spacetime tradeoff graph ( The timeoptimal bitmap index An experimental study of the effects of bitmap compression on the spacetime tradeoff issues An analytical study of the effects of bitmap buffering on the spacetime tradeoff issues The proposal of a more efficient evaluation algorithm for bitmap indexes The new algorithm reduces the number of bitmap operations by about (/ incurs one less bitmap scan for a range predicate evaluation The rest of this paper is organized as follows In Section, we present a framework to study the design space of bitmap indexes for Time M (A SpaceOptimal (B TimeOptimal under Space Constraint M Infeasible Region (C Optimal SpaceTime Tradeoff (knee Figure : SpaceTime Tradeoff Issues ( TimeOptimal Space selection queries Section 3 presents an improved evaluation algorithm for bitmap indexes, including both analytical experimental performance evaluations Section presents an analytical cost model for the spacetime tradeoff study Section compares the spacetime tradeoff of two basic bitmap encoding schemes In Section, we examine the spaceoptimal (point (A timeoptimal (point ( bitmap indexes In Section 7, we characterize the knee of the spacetime tradeoff graph of bitmap indexes (point (C Section presents an optimal as well as a heuristic approach to find the timeoptimal bitmap index under space constraint (point (B In Section 9, we present an experimental study of the impact of bitmap compression on the spacetime tradeoff issues Section presents an analytical study of the effect of bitmap buffering on the spacetime tradeoff issues Finally, we summarize our results in Section 11 For the rest of this paper, we use the term index to refer to a bitmap index ue to lack of space, the derivations of analytical results the proofs of theorems algorithms correctness are omitted from this paper full details are given elsewhere [] 13 7 :9< = 7 B In this section, we present a framework to examine the design space of indexes for selection queries The framework has been inspired by the work of Wong et al [13, 1] Let C denote the attribute cardinality ie, the number of distinct actual values of the indexed attribute The attribute cardinality is generally smaller than the cardinality of the attribute domain ie, the number of all possible values of the indexed attribute Without loss of generality to keep the presentation simple, we assume in this paper that the actual attribute values are consecutive integer values from ( to CE For the general case where the actual attribute values are not necessarily consecutive, a bitmap index can be built either on the entire attribute domain, or on C consecutive values by mapping each actual attribute value to its rank via a lookup table As described in Section 1, the ValueList index is a set of bitmaps, one per attribute value In other words, if one views this set as a twodimensional bit matrix (Figure 1(b, the focus is on the columns If the focus moves on the rows, however, then the index can be seen as the list of attribute values represented in some particular way As there is a large number of possible representations for attribute values (integers in this case, this clearly opens up the way for defining a whole host of bitmap indexing schemes In classifying all these representations, we identify two major orthogonal parameters that define the overall space of alternatives In particular, considering the ValueList index again, we observe that each attribute value is represented as a single digit (in basec
3 C E E E a sequence of numbers arithmetic, this digit being encoded in bits by turning exactly one out of C bits on The arithmetic we choose for the value representation, ie, the decomposition of the value in digits according to some base, the encoding scheme of each digit in bits are the two dimensions of the space are analyzed below (1 Attribute Value ecomposition Consider an attribute value Fur thermore, define into a sequence of digits Then can be decomposed as follows: where! ", #( * ( E E+,, each digit is in the range ( Based on the above, each choice of sequence gives a different representation of attribute values, therefore a different index An index is welldefined if,, The sequence is the base of the index, which is in / turn called a base index If, then the base is called uniform the index is called base for short The index consists of components, ie, one component per digit Each component individually is now a collection of bitmaps, constituting essentially a base index Let denote the number of bitmaps in the!# component of an index *,,,3 denote the collection of bitmaps that form the "!# component Figure 3 shows a base ValueList index (based on the same 1record relation in Figure 1 By decomposing a singlecomponent index into a component index, the number of bitmaps has been reduced from 7 to / 3 7A 7<E (a 7AH 7A E 7<E =<= = E9:<H = => 1 1 =<= = Ḧ 9:< = => 1 1 =<= = Ḧ 9:< = => E 1 1 =<= = Ḧ 9:< = => 1 1 =<= = 9:< = => 1 1 =<= = Ḧ 9:< = => 1 1 =<= = Ḧ 9:< = => 1 1 =<= = Ḧ 9:<H = => 1 1 =<= = 9:< = => E 1 1 =<= = E9:< = => 1 1 =<= = 9:<H = => 1 1 =<= = E9:< = => E 1 1 Figure 3: Example of a Component ValueList Index (a Projection of indexed attribute values with duplicates preserved (b Base? ValueList Index (b 7AH ( Bitmap Encoding Scheme Consider the!# component of an index with attribute cardinality There are essentially two major schemes to directly encode the corresponding values (( in bits : Equality Encoding: There are bits, one for each possible value The representation of value has all bits set to (, except for the bit corresponding to, which is set to Clearly, an equalityencoded component consists of bitmaps Range Encoding: There are bits again, one for each possible value The representation of value has the rightmost bits set to ( the remaining bits (starting from the one cor Intuitively, each responding to to the left set to bitmap, has in all the records whose!# component value is less than or equal to * Since the bitmap, has all bits set to, it does not need to be stored, so a rangeencoded component consists of bitmaps Figures (b (c show the rangeencoded indexes corresponding to the equalityencoded indexes in Figure 1 Figure 3, respectively 7 / 3 79:7A> 7AB7FEG79H E 7 H 7 E 7 H (a (b (c Figure : Examples of RangeEncoded Bitmap Indexes (a Projection of indexed attribute values with duplicates preserved (b Component,! Base9 RangeEncoded Bitmap Index (c Base RangeEncoded Bitmap Index Figure shows the twodimensional design space of indexes for selection queries, with the two existing alternatives, namely, the ValueList index the BitSliced index [], classified as shown The ValueList index, which is the simplest index is the most commonly implemented, has a single component is equalityencoded The BitSliced index [] has a uniform base is rangeencoded A base BitSliced index has been used in MOEL for evaluating range predicates [] The BitSliced index is also used in Sybase IQ for evaluating range predicates E We have excluded the third basic encoding scheme (the binary encoding scheme discussed in [13, 1], as it can be characterized in our framework as a base decomposition with one of the two encodings we do present Clearly, a symmetric scheme exists as well, where the roles of right left are exchanged 3
4 b < b, b,, b> ATTRIBUTE log b times VALUE ECOMPOSITION < b, b 1 > BITMAP ENCOING SCHEME Equality Range ValueList Index BitSliced Index bitmap operations bitmap scans, compared to Algorithm RangeEval, which incurs ( bitmap operations bitmap scans Moreover, while efficient processing of Algorithm BSRange Opt requires buffering for only one bitmap (the intermediate/final result, efficient processing of Algorithm RangeEval requires buffering for two bitmaps (for,,, Result < b 3, b, b 1 > Figure : esign Space of Bitmap Indexes for Selection Queries Result B 3 1 B 1 3 B 1 performing aggregation [] In this paper, we consider all welldefined decompositions, just the two encoding schemes presented above, which are used in practice = 7 A A = 9 * = 7 > B 7 3 B B B B 7 3 B 3 An evaluation algorithm for selection queries based on rangeencoded indexes has been proposed by O Neil Quass (Algorithm 3 in [], which we referred to as Algorithm RangeEval In this section, we present an improved evaluation algorithm (Algorithm RangeEvalOpt that reduces the number of bitmap operations by about (#/ requires one less bitmap scan for a range predicate evaluation Given this, in the rest of the paper, the improved evaluation algorithm is used as the basis for the time analysis of rangeencoded indexes Both evaluation algorithms are shown in Figure Let denote bitmaps with all bits set to (, respectively Also let,, denote the logical AN, OR, XOR operations, respectively,, denote the complement of a bitmap, epending on the input predicate operator (, Algorithm RangeEval evaluates each range predicate by computing two bitmaps: the, bitmap either the, or, bitmap For example, if the predicate is, then the result bitmap, is obtained by computing the bitmaps, involved,,,, or,, steps that are not required The final result bitmap that is returned is either,,,,,,,,,, or,, corresponding to the predicate operator,,,,, or ", respectively As we show below, such an evaluation strategy not only costs more bitmap operations to incrementally build the two intermediate bitmaps, but also incurs more bitmap scans since the evaluation of the bitmap,, which corresponds to an equality predicate evaluation, is the most expensive predicate in terms of bitmap scans Algorithm RangeEvalOpt avoids the intermediate equality predicate evaluation by evaluating each range query in terms of only the predicate based on the following three identities: (1, (, (3 Furthermore, the evaluation of the predicate itself does not require an equality predicate evaluation Consequently, Algorithm RangeEvalOpt requires the computation of only one bitmap, for any predicate evaluation, as opposed to two bitmaps (, either, or, required in Algorithm RangeEval Figure 7 compares the two algorithms for evaluating the predicate? using a component base ( index Clearly, Algorithm RangeEvalOpt is more efficient as it requires only B 3 B 7 3 (a Figure 7: Comparison of Evaluation Strategies for Selection Predicate? using a Component, Base ( RangeEncoded Bitmap Index (a Algorithm RangeEval (b Algorithm RangeEvalOpt! A " * A# = A A = 9 = 7 > Evaluation Number of Bitmap Operations Num of Predicate AN OR NOT XOR Total Scans RangeEval (*+, /1 /1, (*3+, /1 /1, (*7+,, (*+,, (:9+,, (=< 9+ 1,/1, (*+ RangeEvalOpt 1,/1, = 1 (*3+ 1,/1, = 1 (*7+,, = 1 (*+,, = 1 (:9+,, (=< 9+ 1,/1, Table 1: Worst Case Analysis of Evaluation Algorithms is the number of index components Table 1 shows the worst case analysis of the performance of the evaluation algorithms in terms of the number of bitmap operations scans For Algorithm RangeEval, since each range predicate evaluation entails an equality predicate evaluation (for,> bitmap, the number of bitmap operations of a range predicate evaluation is at least twice that of the predicate evaluation By using a more direct evaluation strategy, Algorithm RangeEvalOpt B (b B B 1
5 ( C Evaluation Algorithms for Selection Queries Using RangeEncoded Bitmap Indexes is the number of components in the rangeencoded index is the base of the index!#" is the predicate operator, is the predicate value, # is a bitmap representing the set of records with nonnull values for the indexed attribute Output: A bitmap representation of the set of records that satisfies the predicate Algorithm RangeEval 1,,,, # 3 let for i = downto do if ( ( then,,,, 7 if ( then,,,, 9,,:,, else 11,,:, 1 else * 13,,,, 1,,, 1,,:, # 1,,,,,, 17 return, Algorithm RangeEvalOpt 1, if ( then 3 let if ( then if ( then,, E for i = to do 7 if ( " then,,, if ( " ( then,,, 9 else for i = 1 to do 11 if ( ( then,,, 1 else if ( then,,, 13 else, ",,, 1 if ( then 1 return,, 1 else 17 return,, * Figure : Algorithms RangeEval RangeEvalOpt reduces the number of bitmap operations for range predicate evaluations by about half the number of bitmap scans for range predicates is reduced by one Both algorithms have the same cost for an equality predicate evaluation For both evaluation algorithms, the worst case occurs when (,, which corresponds also to the most probable case! >#7 = A # = 7 1 :9 A A " = 9 * = 7 #> In this section, we compare the performance of the evaluation algorithms in terms of the average number of bitmap scans the average number of bitmap operations Indexes with uniform base are generated by varying the two main parameters: the attribute cardinality C the base number We experimented with C ( ( (, ( ( ( for each value of C, the entire space of baserangeencoded indexes are generated by varying from to C For each index, we generated a total of C selection queries!" of the form by varying the predicate ( ( The graphs the predicate constant ( C Each query is evaluated using Algorithm RangeEval Algorithm RangeEvalOpt For a given rangeencoded index an evaluation algorithm, the average number of bitmap scans (operations is computed by taking the average of the number of bitmap scans (operations over all #C queries Figure compares the performance of the two evaluation algorithms as a function of the base number with C clearly demonstrate that Algorithm RangeEvalOpt is more efficient than Algorithm RangeEval Similar trends are obtained for other values of C # A 9 7 = A " Our cost model for the spacetime tradeoff analysis uses the following two metrics: The space metric is in terms of the number of bitmaps stored The time metric is in terms of the expected number of bitmap scans for a selection query evaluation, is based on the assumption that the queries in the query space are uniformly!" distributed, : where Note that our time metric incorporates only the I/O cost (ie, number of bitmap scans does not include the CPU cost (ie, number of bitmap operations as the number of bitmap scans the number of bitmap operations for a query evaluation are correlated We denote the space time metric of an index by : :, respectively # = 7 " :9< = 7 * = This section compares the performance of the two basic encoding schemes, namely, equality encoding range encoding, for selection queries Our result shows that range encoding provides better spacetime tradeoff than equality encoding in most cases Let denote an component index with base Based on the cost model, we have the following result : Theorem 1 If is equalityencoded, then where if "!#( (1
6 3 Average Number of Bitmap Scans RangeEval RangeEvalOpt Base Number (a Average Number of Bitmap Scans as a Function of Base Number 1 RangeEval RangeEvalOpt Given the overall superiority of rangeencoding, in the remainder of this paper, we focus on the spacetime tradeoff of rangeencoded indexes Henceforth, we use the term index as an abbreviation for a rangeencoded index 7 <7 = A =,7 = A = 7 #>? In this section, we identify both the spaceoptimal timeoptimal indexes (ie, points (A ( in Figure We refer to the most spaceefficient (respectively, timeefficient index among all indexes with components as the ncomponent spaceoptimal index (respectively, ncomponent timeoptimal index Note that the component spaceoptimal index might not be unique for example, if C ( ( (, then the base base indexes are both component spaceoptimal indexes Based on Equations (3 (, we have the following result Average Number of Bitmap Operations Base Number (b Average Number of Bitmap Operations as a Function of Base Number Figure : RangeEval vs RangeEvalOpt for Uniform Base RangeEncoded Bitmap Indexes, C ( ( :! * * "! * where ( * if (!# : : If is rangeencoded, then (3 ( The time analysis for rangeencoded indexes is based on Algorithm RangeEvalOpt The evaluation algorithm for equalityencoded indexes is not shown here due to space constraint details can be found elsewhere [] A rangeencoded index requires one or two bitmap scans per component for a selection predicate evaluation On the other h, an equalityencoded index requires only one bitmap scan per component for an equality predicate evaluation, but for a range predicate evaluation, the number of bitmap scans required per component is between two half the number of bitmaps in that component Figure 9 compares the spacetime tradeoff of range equalityencoded indexes for C (, ( (, ( ( ( The results show that rangeencoding in general provides better spacetime tradeoff than equalityencoding in most cases Theorem 1 (1 The number of bitmaps in an component, where spaceoptimal index is given by C is the smallest positive integer such that C The index with base is an component spaceoptimal index ( The spaceefficiency of spaceoptimal indexes is a nondecreasing function of the number of components (3 The base of the component timeoptimal index is "! ( E ( The timeefficiency of timeoptimal indexes is a nonincreasing function of the number of components By Theorem 1, it follows that the spaceoptimal index corresponds to the index with the maximum number of components (ie, # C ( with each base number equal to, the timeoptimal index corresponds to the index with the minimum number of components (ie, These are probably immediate corollaries of information theory /or number theory results as well, but they are easy enough to prove that doing so seemed easier than tracking down the existing literature * = 7 #>,+ <7 = A 7 = / 1 In this section, we characterize the knee of the spacetime tradeoff graph, referred to as the knee index (point (C in Figure, which corresponds to the index with the best spacetime tradeoff Note that our characterization is an approximate one as finding a precise analytical characterization is generally a difficult problem However, it turns out that our approximate characterization has very good accuracy We first give a definition of the knee index that will serve as a basis for comparing with our approximate characterization Let " denote the set of optimal indexes corresponding to the points in the spacetime tradeoff graph such that : : 7 7 For, let :9 +9 denote the gradients of the line segments formed by C The set of optimal indexes < is the maximal subset of all possible indexes <>= such that for there does not exist another index in <>= that is more efficient than? in both time space,
7 ! + RangeEncoded Index EqualityEncoded Index RangeEncoded Index EqualityEncoded Index RangeEncoded Index EqualityEncoded Index Time (Expected Number of Bitmap Scans Time (Expected Number of Bitmap Scans Time (Expected Number of Bitmap Scans Space (Number of Bitmaps Space (Number of Bitmaps Space (Number of Bitmaps (a C ( (b C ( ( (c C ( ( ( Figure 9: Comparison of SpaceTime Tradeoff of Range EqualityEncoded Bitmap Indexes with, respectively are defined as follows:! "#! where : : is a normalizing factor The index " with the maximum ratio is the knee index Time (Expected Number of Bitmap Scans TimeOptimal Index SpaceOptimal Index All Index Space (Number of Bitmaps Figure : SpaceTime Tradeoff of Bitmap Indexes, C ( ( ( Time (Expected Number of Bitmap Scans Space (Number of Bitmaps Figure 11: SpaceTime Tradeoff of SpaceOptimal Bitmap Indexes, C ( ( ( 1 We now motivate our approximate characterization, which is based on the results of Theorem 1 Figure compares the spacetime tradeoff graphs for three classes of indexes: the class of spaceoptimal indexes, the class of timeoptimal indexes, the entire class of indexes, for C C ( Note that since the spaceoptimal index is generally not unique, each point in the spaceoptimal graph shown corresponds to the most timeefficient index among all equally spaceefficient indexes with the same number of components Figure shows that the tradeoff graph for spaceoptimal indexes provides a good approximation to that for all indexes In particular, the set of points on the graph for spaceoptimal indexes is a subset of the set of points on the graph for all indexes Figure 11 shows the same spaceoptimal tradeoff graph as in Figure but with each point labelled with the number of components of the corresponding spaceoptimal index We observe that the knee of the spacetime tradeoff graph for the spaceoptimal index corresponds to a component index, something that was consistent throughout our experimentation Hence, we characterize the knee index as the most timeefficient component spaceoptimal index, which is obtained from the following result ( ( ( similar results are obtained for other values of C The graph for spaceoptimal # (respectively, timeoptimal indexes consist of at most C ( points, where each point corresponds to an component spaceoptimal (respec # tively, timeoptimal index, Theorem 71 The base of the most timeefficient component spaceoptimal index is given by (, where C,! *,+ # * * E * * E/ * E, *( We have compared the knee index based on our approximate characterization with that based on the definition for various values of attribute cardinality the results show that both knee indexes match exactly for all the cases that we compared 1 = <7 = A = 7 #>39 7 # In this section, we consider the following practical optimization problem (point (B in Figure : Given a constraint on the available disk space to store an index, say at most bitmaps (or equivalently, at most bits, where is the number of records, determine the most timeefficient index We first present an algorithm that finds the optimal solution, then present a more efficient heuristic approach, which is nearoptimal Both algorithms are shown in Figure 1 In the following, let denote an component index 79: =<! : denote the component space timeoptimal indexes, respectively 7
8 Algorithms to Find TimeOptimal Bitmap Index Under Space Constraint Input: is the space constraint in terms of the maximum number of bitmaps C is the attribute cardinality Output: The timeoptimal bitmap index with no more than Algorithm TimeOptAlg 1 let be the smallest number such that : 97: =< if ( :! : =< then return! : =< : 3 let be the smallest number such that :! let : "! = return such that :, bitmaps =< : = Algorithm TimeOptHeur 1 = FindSmallestN( <,C if ( :! : < then return! : 3 return RefineIndex(I Figure 1: Algorithms TimeOptAlg TimeOptHeur 1! <7 = A SpaceOptimal Indexes 77 : Algorithm TimeOptAlg finds the actual optimal solution By result ( of Theorem 1, the solution must have at least components, where is the smallest number of components such that If the component timeoptimal index satisfies the space constraint (step, then by result ( of Theorem 1, it is the solution otherwise, by result ( of Theorem 1, the solution index must have no more than components where is the smallest number of components such that the component timeoptimal index satisfies the space constraint Figure 13 shows two cases to illustrate the bounds on the number of components of the solution index each point on the spacetime tradeoff graphs shown is labelled with the number of components of the corresponding index The time complexity of the algorithm is determined by the size of the set of cidate indexes defined by (step This search space is large as we need to enumerate for each value of,, all possible component indexes such that C Figure 1 shows the size of as a function of for C ( ( ( 1! " To avoid the costly exhaustive search in the optimal approach (steps, we present in this section a heuristic approach (Algorithm TimeOptHeur in Figure 1 to find an approximate optimal solution Our experimental results show that the heuristic approach is nearoptimal with the optimal index being selected at least 7#/ of the time The main idea is to first select a seed index that satisfies the space constraint, then to improve the timeefficiency of the seed index by adjusting its base numbers For the seed index, our approach uses an component index with :, where is the least number of components such that : 77 : Both are determined < by Algorithm FindSmallestN (Figure 1 Note that if! : satisfies the space constraint =< (step, then, as in step of Algorithm TimeOptAlg,! : is the optimal solution index Algorithm RefineIndex (Figure 1 improves the timeefficiency of an index its correctness is based on the following result Theorem 1 Let be an component index with base Suppose there exists two base numbers,, some integer constant (, such that ( if C, where if otherwise Time (Expected Number of Bitmap Scans Time (Expected Number of Bitmap Scans n = 3 n = 3 M Space (Number of Bitmaps n = (a n = 3 M Space (Number of Bitmaps (b TimeOptimal Indexes 3 1 SpaceOptimal Indexes TimeOptimal Indexes Figure 13: Bounds on Number of Components of Solution Index in Algorithm TimeOptAlg Let be another component index with base as defined above Then : : Based on Theorem 1, Algorithm RefineIndex essentially tries to maximize the number of components with small base numbers in particular, step determines the largest possible value for adjusting a pair of base numbers To illustrate Algorithm Refine Index, consider a component index with base C ( ( ( In the first iteration, with, 7 the base is refined to 1( In the second iteration, with 1(, the base is refined to ( After the final refinement step, the base of the refined index is ( ( By Equation (3, : 1
9 * Algorithm FindSmallestN: Algorithm to Find the Bitmap Index with Least Number of Components Under Space Constraint Input: is the space constraint in terms of the maximum number of bitmaps C is the attribute cardinality Output:, the smallest number of components such that : 97:, An component index with : 1 ( // is the number of components repeat 3 " // is the number of components with base ( until ( C 7 let be the component bitmap index with base return is a component bitmap index Algorithm RefineIndex: Algorithm to Improve the TimeEfficiency of a Bitmap Index Input: C is the attribute cardinality Output: A component bitmap index such that 1 let,! be the base of,? " 3 for = downto do let be the smallest base number from, delete from, if ( then 7 let be the smallest base number from, * * * * let " 9 if ((, then * " * / * * * /, # ( : ?!? +, + = 1 return component bitmap index with base : Figure 1: Algorithms FindSmallestN RefineIndex : : ( 7 By Equation (, : ( The time complexities of Algorithms FindSmallestN RefineIndex are C, respectively, where # is at most C ( Therefore, the time complexity of Algorithm TimeOptHeur is C C Attribute Percentage of Max ifference in Cardinality, Solutions Expected Number that are Optimal of Bitmap Scans / , ,, 1,1 Table : Effectiveness of Heuristic Approach to Select Time Optimal Bitmap Index under Space Constraint Table shows the effectiveness of the heuristic approach for various values of attribute cardinality The second column shows the percentage of solutions selected by the heuristic approach that are optimal for the attribute cardinality value indicated in the first column For those cases where the optimal heuristic approaches differ, the third column shows the maximum difference in the expected number of bitmap scans of their selected indexes The result indicates that the proposed heuristic approach is nearoptimal 3 * :9 = 7# = 7 7 = This section examines the effect of bitmap compression on the spacetime tradeoff of bitmap indexes 3! = 7 > = We first investigate different physical organizations for bitmap indexes to identify storage schemes that have good spacetime trade 9
10 ( 7 Number of Cidate Bitmap Indexes ata Set 1 ata Set Relation Lineitem Order Relation Cardinality ( ( ( ( ( ( ( Attribute Quantity Order ate Attribute Cardinality, C ( Table 3: Characteristics of TPC Benchmark ata used in Experiments 1( Space (Number of Bitmaps Figure 1: Size of Set of Cidate Bitmap Indexes,, as a Function of the Space Constraint for C ( ( ( off Consider an component bitmap index for a record relation, where the!# index component comprises of bitmaps, Let The entire bitmap index is essentially a bitmatrix, where the "!# component is a bitmatrix, each bitmap is a bitvector Based on the above view of a bitmap index, we compare the following three storage schemes: BitmapLevel Storage (BS Each bitmap is stored individually in a single bitmap file of bits Thus, the bitmap index is stored in bit bitmap files ComponentLevel Storage (CS Each index component is stored individually in a rowmajor order in a single bitmap file of bits, Thus, the bitmap index is stored in bitmap files IndexLevel Storage (IS The entire index is stored in a rowmajor order in a single bitmap file of bits We refer to a bitmap index that is organized using storage scheme (without any compression as a index, refer to a index with all its bitmaps compressed as a index (ie, for compressed For a component bitmap index, both a CSindex an ISindex are equivalent For a bitmap index with the maximum number of components ie, each component is of base, (1 both a BSindex a CSindex are equivalent, ( an ISindex is a projection index In a CSindex, each row of a component bitmatrix has either a consecutive stream of bits followed by a consecutive stream of ( bits if the index is rangeencoded, or exactly one bit set to if the index is equalityencoded In contrast, in a BSindex, the distribution of the bits in each bitmap is dependent on the distribution of the attribute values Thus, we expect a CSindex to be more compressible than a BSindex 3! >#7 = A 7 Our experimental study uses two data sets extracted from the TPC Benchmark []: data set 1 is for small attribute cardinality, data set is for large attribute cardinality Table 3 shows the key characteristics of our experimental data To limit the number of experiments for each data set, our comparison of the spacetime tradeoff is restricted to the class of spaceoptimal indexes (as a function of the number of index components, with being varied A projection index [] on an attribute ( in a relation is simply the projection of ( on with duplicates preserved stored in RIorder to this is motivated by our results in Section 7 which show that the spacetime tradeoff of spaceoptimal indexes provides a good approximation to the spacetime tradeoff of all indexes The data compression code used is from the zlib library Table compares the compressibility of the three storage schemes for both data sets For each component index each com pressed storage scheme, where, C, we compute the percentage of the size of under to the size of under BS For both data sets, the results show that CS Base Size of I Compressibility of of under BS Storage Scheme ( Index I (in bytes cbs ccs cis! "# #! # " " (a ata Set 1 Base Size of I Compressibility of of under BS Storage Scheme ( Index I (in bytes cbs ccs cis ( " " " # (b ata Set Table : Comparison of Compressibility of ifferent Storage Schemes indexes give the best compression In terms of query evaluation cost, a BSindex requires at most bitmap file scans per index component On the other h, both a CSindex an ISindex require all the bitmap files to be scanned moreover, they also incur additional CPU overhead to extract the bits of each relevant bitmap from the component bitmap files (which are stored in rowmajor order Thus, we expect cbsindexes to be more timeefficient than ccs cisindexes this is indeed supported by our experimental results For example, consider the base!+*,!!,*(!/ cbsindex,!, the average size of each compressed bitmap is + bytes Even when two bitmaps (an average of? bytes are scanned to evaluate an equality query, using the cbsindex is still significantly cheaper than using the ccsindex (or cisindex, which requires # 7 7 (1 bytes to be scanned Since cbsindexes are the most timeefficient ccsindexes are the most spaceefficient, we shall compare the spacetime trade? The zlib library is written by Jeanloup Gailly Mark Adler (http://wwwcdromcom/pub/infozip/zlib the compression method is based on a LZ77 variant called deflation
11 ( C ( < " C < offs of only cbs ccsindexes against BSindexes for the rest of this section Each bitmap index is used to evaluate a set of #C selection queries (using Algorithm RangeEvalOpt, where? : The predicate operator is restricted to (for range queries (for equality queries to limit the number of queries The performance of each index is measured in terms of its space timeefficiency The space metric is in terms of the total space of all its bitmaps (in MBytes, the time metric is in terms of the average predicate evaluation time (in secs which includes (1 the time to read the bitmaps, ( the time for inmemory bitmap decompression (for compressed bitmaps, (3 the time for bitmap operations The average predicate evaluation time,, is computed! as follows: C time to evaluate the predicate 3! >#7 = A A, where denote the This section presents experimental results for indexes under the storage schemes BS, cbs, ccs for data set 1 results for data set are not shown due to space limitation Figure 1(a shows the timeefficiency of the indexes as a function of the number of index components Both BS cbsindexes outperform ccsindexes significantly in particular, for ccsindexes, over (/ of the total evaluation time is due to bitmap decompression The decompression cost for ccsindexes generally increases with the number of bitmap components since the number of bits to be extracted is an increasing function of the number of components For ccsindexes, since all the compressed bitmap files need to be scanned for a query evaluation, their I/O cost is a function of the total size of all compressed bitmap files, which generally decreases as the number of components increases In contrast, for both BS cbsindexes, since the number of bitmaps to be scanned increases with the number of components, their I/O cost is an increasing function of the number of components Comparing BS cbsindexes, while cbsindexes incur a lower I/O cost, this is often offset by their bitmap decompression cost our results show that both BS cbsindexes are almost comparable in their timeefficiency Figure 1(b compares the spaceefficiency of the indexes as a function of the number of index components There are two main observations First, as we already know from Table, ccsindexes are the most spaceefficient Second, the results show that the effectiveness of bitmap compression decreases as the number of components increases This can be explained by the fact that the number of bitmaps is a decreasing function of the number of components (result ( of Theorem 1 in particular, there is a significant space reduction when an index is decomposed from one to two components Consequently, the gain of bitmap compression, with respect to space reduction, is only marginal once an index has been decomposed (ie, Thus, in terms of improving space efficiency, decomposing an component BSindex to an component BSindex could be more effective than compressing the component index Figure 1(c compares the spacetime tradeoff of the indexes The result shows that BS cbsindexes have comparable spacetime tradeoff, which is better than that of ccsindexes * :9< 7 1 = This section considers the effect of bitmap buffering on the spacetime tradeoff issues that we have addressed As the typical size of buffer space is at least GB in database systems running on SMP systems for large data warehouse applications [11], it is likely that a good number of bitmaps can remain memoryresident By taking into account the amount of mainmemory allocated for buffering bitmaps, more optimal indexes with better spacetime tradeoff can be designed The unit of buffering that we consider here is the number of bitmaps Consider an component with base Let denote the number of bitmaps that can be buffered in mainmemory, where bitmap buffer assignment for by, where : We denote a is the number of bitmaps in the "!# component of that are buffered A bitmap buffer assignment for is welldefined if In addition to the uniform query distribution assumption stated in Section, we further assume that the buffer hit rate for each referenced bitmap is uniformly distributed (ie, the probability that a reference to any bitmap in the!# component is a buffer hit is given by * Then, the expected number of bitmaps scans for a selection query evaluation using an component index with a bitmap buffer assignment is given by : ( We now present an optimal bitmap buffering policy for indexes From result (3 of Theorem 1 Theorem 1, we know that an index is more timeefficient if it has more components with smaller base numbers Since buffering the bitmaps in an index component effectively reduces the base of that component, it follows that it is more beneficial (in terms of reducing the expected number of bitmap scans to buffer a bitmap from a component with a smaller base number than from one with a larger base number The following optimal bitmap buffering policy is based on a prioritization of the index components using their base numbers Theorem 1 Consider the following priority assignment to bitmap components: 1 Components in have higher priority than those in 3 Within each set ( or 1 Partition the components of an index into two disjoint sets such that 1, components with smaller base numbers have higher priority Based on the above priority assignment, a bitmap buffering policy that favors bitmaps from a higher priority component over those from a lower priority component is optimal Based on Equation ( the above optimal bitmap buffering policy, we have the following result Theorem If index with base (, the timeoptimal index is an component Figure 17 compares the spacetime tradeoff of indexes (based on the optimal bitmap buffering policy as a function of the number of buffered bitmaps for C ( ( ( As expected, the spacetime tradeoff improves as increases Based on our experimental observations, we have the following approximate characterization of the knee: 11
12 < BS cbs ccs 3 BS cbs ccs 3 BS cbs ccs 3 Time (secs 1 Space (MB Time (secs Number of Components 1 3 Number of Components Space (MB (a Time vs Number of Components (b Space vs Number of Components (c Time vs Space Figure 1: Comparison of SpaceTime Tradeoff of Compressed Bitmap Indexes under ifferent Storage Schemes for ata Set 1 The knee of the spacetime tradeoff graph corresponds approximately to an component index with, where C!, base C,!* = E, *,+ # * ( * E * * E / Note that the approximate knee characterization in Section 7 is a special case of the above result Our experimental results show that the approximate knee characterization has very good accuracy Time (Expected Number of Bitmap Scans 1 Space (Number of Bitmaps Figure 17: SpaceTime Tradeoff as a Function of the Number of Buffered Bitmaps, C ( ( ( # A In this paper, we have presented a general framework to study the design space of bitmap indexes for selection queries, have examined several spacetime tradeoff issues To the best of our knowledge, this study represents a first set of guidelines for physical database design using bitmap indexes Our results indicate that rangeencoded bitmap indexes offer, in most cases, better spacetime tradeoff than equalityencoded bitmap indexes Concentrating on these, we have identified the timeoptimal index, the spaceoptimal index, the index with the optimal spacetime tradeoff, the timeoptimal index under diskspace constraints In addition, we have proposed an optimal bitmap buffering policy examined its impact on the spacetime tradeoff issues All these results are based on an improved evaluation algorithm for! = m= m=1 m= m=3 m= m= m= selection queries that we have proposed for rangeencoded bitmap indexes For future work, we plan to explore bitmap indexes for the more general class of selection queries that is based on the membership operator 9 [1] AIP Technical Publications Sybase IQ Indexes In Sybase IQ Administration Guide, Sybase IQ Release 11 Collection, chapter Sybase Inc, March admin/1toc [] CY Chan YE Ioannidis Bitmap Index esign Evaluation Computer Sciences ept, University of WisconsinMadison, cychan/paper1ps [3] S Chaudhuri U ayal An Overview of ata Warehousing OLAP Technology ACM SIGMO Record, (1: 7, March 1997 [] Transaction Processing Performance Council, May 199 [] H Edelstein Faster ata Warehouses Information Week, pages 77, ecember 199 [] C French One Size Fits All atabase Architectures do not work for SS In SIGMO 9, pages 9, San Jose, California, May 199 [7] G Graefe Query Evaluation Techniques for Large atabases Computing Surveys, (:73 17, 1993 [] P O Neil Model Architecture Performance In Proceedings of the nd International Workshop on High Performance Transactions Systems, pages 9, Asilomar, CA, 197 SpringerVerlag In Lecture Notes in Computer Science 39 [9] P O Neil G Graefe MultiTable Joins Through Bitmapped Join Indices ACM SIGMO Record, pages 11, September 199 [] P O Neil Quass Improved Query Performance with Variant Indexes In SIGMO 97, pages 3 9, Tucson, Arizona, May 1997 [11] TF Tsuei, AN Packer, KT Ko atabase Buffer Size Investigation for OLTP Workloads In SIGMO 97, pages 11 1, Tucson, Arizona, May 1997 [1] J Winchell Rushmore s Bald Spot BMS, (:, September 1991 [13] HKT Wong, JZ Li, F Olken, Rotem, L Wong Bit Tranposition for Very Large Scientific Statistical atabases Algorithmica, 1(3:9 39, 19 [1] HKT Wong, HF Liu, F Olken, Rotem, L Wong Bit Tranposed Files In VLB, pages 7, Stockholm, 19 1
An Efficient Bitmap Encoding Scheme for Selection Queries
An Efficient Bitmap Encoding Scheme for Selection Queries CheeYong Chan Department of Computer Sciences University of WisconsinMadison cychan@cs.wisc.edu Yannis E. Ioannidis* t Department of Computer Sciences
More informationEfficient Processing of Joins on Setvalued Attributes
Efficient Processing of Joins on Setvalued Attributes Nikos Mamoulis Department of Computer Science and Information Systems University of Hong Kong Pokfulam Road Hong Kong nikos@csis.hku.hk Abstract Objectoriented
More informationBitmap Index Design Choices and Their Performance Implications
Bitmap Index Design Choices and Their Performance Implications Elizabeth O Neil and Patrick O Neil University of Massachusetts at Boston Kesheng Wu Lawrence Berkeley National Laboratory {eoneil, poneil}@cs.umb.edu
More informationExtracting k Most Important Groups from Data Efficiently
Extracting k Most Important Groups from Data Efficiently Man Lung Yiu a, Nikos Mamoulis b, Vagelis Hristidis c a Department of Computer Science, Aalborg University, DK9220 Aalborg, Denmark b Department
More informationINTERNATIONAL JOURNAL OF COMPUTERS AND COMMUNICATIONS Issue 2, Volume 2, 2008
1 An Adequate Design for Large Data Warehouse Systems: Bitmap index versus Btree index Morteza Zaker, Somnuk PhonAmnuaisuk, SuCheng Haw Faculty of Information Technologhy Multimedia University, Malaysia
More informationApproximately Detecting Duplicates for Streaming Data using Stable Bloom Filters
Approximately Detecting Duplicates for Streaming Data using Stable Bloom Filters Fan Deng University of Alberta fandeng@cs.ualberta.ca Davood Rafiei University of Alberta drafiei@cs.ualberta.ca ABSTRACT
More informationES 2 : A Cloud Data Storage System for Supporting Both OLTP and OLAP
ES 2 : A Cloud Data Storage System for Supporting Both OLTP and OLAP Yu Cao, Chun Chen,FeiGuo, Dawei Jiang,YutingLin, Beng Chin Ooi, Hoang Tam Vo,SaiWu, Quanqing Xu School of Computing, National University
More informationSizeConstrained Weighted Set Cover
SizeConstrained Weighted Set Cover Lukasz Golab 1, Flip Korn 2, Feng Li 3, arna Saha 4 and Divesh Srivastava 5 1 University of Waterloo, Canada, lgolab@uwaterloo.ca 2 Google Research, flip@google.com
More informationDULO: An Effective Buffer Cache Management Scheme to Exploit Both Temporal and Spatial Locality. Abstract. 1 Introduction
: An Effective Buffer Cache Management Scheme to Exploit Both Temporal and Spatial Locality Song Jiang Performance & Architecture Laboratory Computer & Computational Sciences Div. Los Alamos National Laboratory
More informationSubspace Pursuit for Compressive Sensing: Closing the Gap Between Performance and Complexity
Subspace Pursuit for Compressive Sensing: Closing the Gap Between Performance and Complexity Wei Dai and Olgica Milenkovic Department of Electrical and Computer Engineering University of Illinois at UrbanaChampaign
More informationWeaving Relations for Cache Performance
Weaving Relations for Cache Performance Anastassia Ailamaki David J. DeWitt Mark D. Hill Marios Skounakis Carnegie Mellon University Univ. of WisconsinMadison Univ. of WisconsinMadison Univ. of WisconsinMadison
More informationOPRE 6201 : 2. Simplex Method
OPRE 6201 : 2. Simplex Method 1 The Graphical Method: An Example Consider the following linear program: Max 4x 1 +3x 2 Subject to: 2x 1 +3x 2 6 (1) 3x 1 +2x 2 3 (2) 2x 2 5 (3) 2x 1 +x 2 4 (4) x 1, x 2
More informationChord: A Scalable Peertopeer Lookup Service for Internet Applications
Chord: A Scalable Peertopeer Lookup Service for Internet Applications Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT Laboratory for Computer Science chord@lcs.mit.edu
More informationCumulus: Filesystem Backup to the Cloud
Cumulus: Filesystem Backup to the Cloud Michael Vrable, Stefan Savage, and Geoffrey M. Voelker Department of Computer Science and Engineering University of California, San Diego Abstract In this paper
More informationLIRS: An Efficient Low Interreference Recency Set Replacement Policy to Improve Buffer Cache Performance
: An Efficient Low Interreference ecency Set eplacement Policy to Improve Buffer Cache Performance Song Jiang Department of Computer Science College of William and Mary Williamsburg, VA 231878795 sjiang@cs.wm.edu
More informationSupporting Keyword Search in Product Database: A Probabilistic Approach
Supporting Keyword Search in Product Database: A Probabilistic Approach Huizhong Duan 1, ChengXiang Zhai 2, Jinxing Cheng 3, Abhishek Gattani 4 University of Illinois at UrbanaChampaign 1,2 Walmart Labs
More informationMining Data Streams. Chapter 4. 4.1 The Stream Data Model
Chapter 4 Mining Data Streams Most of the algorithms described in this book assume that we are mining a database. That is, all our data is available when and if we want it. In this chapter, we shall make
More informationWhat s the Difference? Efficient Set Reconciliation without Prior Context
hat s the Difference? Efficient Set Reconciliation without Prior Context David Eppstein Michael T. Goodrich Frank Uyeda George arghese,3 U.C. Irvine U.C. San Diego 3 ahoo! Research ABSTRACT e describe
More informationEfficient set intersection for inverted indexing
Efficient set intersection for inverted indexing J. SHANE CULPEPPER RMIT University and The University of Melbourne and ALISTAIR MOFFAT The University of Melbourne Conjunctive Boolean queries are a key
More informationOutOfCore Algorithms for Scientific Visualization and Computer Graphics
OutOfCore Algorithms for Scientific Visualization and Computer Graphics Course Notes for IEEE Visualization 2002 Boston, Massachusetts October 2002 Organizer: Cláudio T. Silva Oregon Health & Science
More informationBenchmarking Cloud Serving Systems with YCSB
Benchmarking Cloud Serving Systems with YCSB Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears Yahoo! Research Santa Clara, CA, USA {cooperb,silberst,etam,ramakris,sears}@yahooinc.com
More informationThe Set Covering Machine
Journal of Machine Learning Research 3 (2002) 723746 Submitted 12/01; Published 12/02 The Set Covering Machine Mario Marchand School of Information Technology and Engineering University of Ottawa Ottawa,
More informationLow Overhead Concurrency Control for Partitioned Main Memory Databases
Low Overhead Concurrency Control for Partitioned Main Memory bases Evan P. C. Jones MIT CSAIL Cambridge, MA, USA evanj@csail.mit.edu Daniel J. Abadi Yale University New Haven, CT, USA dna@cs.yale.edu Samuel
More informationStreaming Submodular Maximization: Massive Data Summarization on the Fly
Streaming Submodular Maximization: Massive Data Summarization on the Fly Ashwinkumar Badanidiyuru Cornell Unversity ashwinkumarbv@gmail.com Baharan Mirzasoleiman ETH Zurich baharanm@inf.ethz.ch Andreas
More informationStoring a Collection of Polygons Using Quadtrees
Storing a Collection of Polygons Using Quadtrees HANAN SAMET University of Maryland and ROBERT E. WEBBER Rutgers University An adaptation of the quadtree data structure that represents polygonal maps (i.e.,
More informationSpeeding up Distributed RequestResponse Workflows
Speeding up Distributed RequestResponse Workflows Virajith Jalaparti (UIUC) Peter Bodik Srikanth Kandula Ishai Menache Mikhail Rybalkin (Steklov Math Inst.) Chenyu Yan Microsoft Abstract We found that
More informationSummary Cache: A Scalable WideArea Web Cache Sharing Protocol
IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 8, NO. 3, JUNE 2000 281 Summary Cache: A Scalable WideArea Web Cache Sharing Protocol Li Fan, Member, IEEE, Pei Cao, Jussara Almeida, and Andrei Z. Broder AbstractThe
More informationThe Role Mining Problem: Finding a Minimal Descriptive Set of Roles
The Role Mining Problem: Finding a Minimal Descriptive Set of Roles Jaideep Vaidya MSIS Department and CIMIC Rutgers University 8 University Ave, Newark, NJ 72 jsvaidya@rbs.rutgers.edu Vijayalakshmi Atluri
More informationScalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights
Seventh IEEE International Conference on Data Mining Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights Robert M. Bell and Yehuda Koren AT&T Labs Research 180 Park
More informationRobust Set Reconciliation
Robust Set Reconciliation Di Chen 1 Christian Konrad 2 Ke Yi 1 Wei Yu 3 Qin Zhang 4 1 Hong Kong University of Science and Technology, Hong Kong, China 2 Reykjavik University, Reykjavik, Iceland 3 Aarhus
More information