Encoded Bitmap Indexing for Data Warehouses

Ming-Chuan Wu
DVS, Computer Science Department, Technische Universität Darmstadt, GERMANY
wu@dvs1.informatik.tu-darmstadt.de

Alejandro P. Buchmann
DVS, Computer Science Department, Technische Universität Darmstadt, GERMANY
buchmann@dvs1.informatik.tu-darmstadt.de

Abstract

Complex query types, huge data volumes, and very high read/update ratios make the indexing techniques designed and tuned for traditional database systems unsuitable for data warehouses (DW). We propose an encoded bitmap indexing for DWs which improves the performance of known bitmap indexing in the case of large cardinality domains. A performance analysis and theorems which identify properties of good encodings for better performance are presented. We compare encoded bitmap indexing with related techniques, such as bit slicing, projection-, dynamic-, and range-based indexing.

1 Introduction

Complex query types, huge data volumes and very high read/update ratios play crucial roles in query processing in data warehouses (DW). These factors make the query processing/optimization techniques designed and tuned for On-Line Transaction Processing (OLTP) systems unsuitable for the DW environment. Many approaches have been proposed for query processing in DWs, such as precomputation of summarized data, predefined access paths, special index techniques, etc. In this paper, we propose encoded bitmap indexing, an extension of known bitmap indexing, first proposed by O'Neil in the Model 204 DBMS [9].

In Section 2, we discuss bitmap indexing as proposed in [9, ], and propose an encoded bitmap indexing to deal with large cardinality domains. We thus correct a shortcoming of simple bitmap indexing, which is best suited for low cardinality attributes. The basic idea of encoded bitmap indexing is to encode the attribute domain. Therefore, we also discuss how the encoding affects the performance of the index. We define the concepts of binary distance, chain and well-defined encoding, and derive theorems that define the properties of a good encoding. Some potential applications and variations of encoded bitmap indexing specific to the DW environment are also discussed, such as hierarchy encoding, total-order preserving encoding and range-based indexes using encoded bitmap indexing.

In Section 3 we give a comparative performance analysis of simple and encoded bitmap indexes. The result shows that encoded bitmap indexes perform better in most cases. Even if the problem size increases dramatically, their performance degrades logarithmically, while the performance of simple bitmap indexes degrades linearly. In Section 4 we discuss related indexing techniques, the problems they solve and their differences with encoded bitmap indexing. In Section 5 we present conclusions and future work.

2 Bitmap Indexing Techniques

We present a brief overview of simple bitmap indexing and the application domain for which it is ideally suited. The limitations of simple bitmap indexing lead us to propose a new indexing technique, encoded bitmap indexing. The main advantages of encoded bitmap indexes are a drastic reduction in space requirements and corresponding performance gains.

2.1 Simple bitmap indexing revisited

The basic idea behind simple bitmap indexing is to use a string of bits (0 or 1) to indicate whether an attribute in a tuple is equal to a specific value or not. The position of a bit in the bit string denotes the position of a tuple in the table. The bit is set if the content of an attribute is associated with a specific value. For example, a simple bitmap index on an attribute GENDER, with domain {Male, Female}, results in two bitmap vectors, say $B_M$ and $B_F$. For $B_M$, the bit is set to 1 if the corresponding tuple has the attribute GENDER = Male, otherwise the bit is set to 0. For $B_F$, the bit is set to 1 if the associated tuple has the attribute GENDER = Female, otherwise the bit is set to 0. The simple bitmap index on the attribute GENDER, $B^{GENDER}$, is the collection of bitmap vectors $\{B_M, B_F\}$. (1)

(Footnote 1: Note that the negation of $B_M$ need not be equivalent to $B_F$, because of missing information and NULLs.)

B-trees and their variants (later simply denoted as B-trees) have been widely adopted in database systems as external indexing. They provide efficient mechanisms for searching and require time and space only logarithmic in the number of indexed keys. Their strength is their dynamic nature, performance and stability under update, properties that are not required in DW. In the DW environment, simple bitmap indexing has advantages over B-trees, since 1) building/maintaining simple bitmap indexes usually costs less time and space, and 2) bitmap indexes can efficiently work together to reduce the search space before really accessing the data.

Cost Analysis

Let $T$ be a table and $T = \{t_1, \ldots, t_n\}$. Define the cardinality of $T$ as $|T|$ = the number of distinct tuples in $T$. Then, building a simple bitmap index on an attribute $A$ ($A \in \{a_1, \ldots, a_m\}$) of the table $T$ requires $\frac{|T|\,|A|}{8} = \frac{nm}{8}$ bytes, where $m$ is the cardinality of $A$, defined as $|A|$ = the number of distinct values in the domain of $A$. On the other hand, building a B-tree on attribute $A$ requires about $1.44\,n\,\frac{p}{M}$ bytes, where $p$ is the page size and $M$ is the degree of the B-tree [2, ]. If $m < 11.52\,\frac{p}{M}$, then simple bitmap indexes are more space efficient than B-trees. In other words, assume that $p$ = 4K and $M$ = 512; then, if the cardinality of $A$ is smaller than 93 (i.e., $m < 93$), building a simple bitmap index on $A$ is more economic in size than building B-trees.

As for the time complexity, the complexity of building a B-tree on $A$ is $O(n \log_{M/2} m) + O(n \log_2 \frac{p}{4})$, where $p$ is the page size and 4 is the size of a tuple-id. The first term denotes the cost of traversing from the root to the leaf nodes, and the second term denotes the cost of inserting the tuple-ids into the corresponding leaf nodes. On the other hand, the complexity of building a simple bitmap index on $A$ is $O(n\,m)$. If $n$ (the cardinality of the indexed table) is very large, and $m$ (the cardinality of the indexed attribute) is very small, then $O(n \log_{M/2} m) + O(n \log_2 \frac{p}{4}) > O(n\,m)$, i.e., the time complexity of building B-trees is larger than that of simple bitmap indexes.

Cooperativity of Indexes

The main function of indexes is to accelerate query processing by sizing down the search space. Both B-trees and simple bitmap indexes can achieve this. However, if two or more selection conditions are given in a query, say $A = i$ AND $B = j$, separate B-trees on attribute $A$ and attribute $B$ cannot efficiently cooperate with each other. (2) We need to build another B-tree on the compound key $(A, B)$. In contrast, separate simple bitmap indexes on $A$ and $B$ can efficiently work together to fetch the desired data by simply performing a logical operator, AND, on the corresponding bitmap vectors.

(Footnote 2: Although multiple index accesses on value-list based indexes have been implemented in DB2 [5], the cost of multiple index accesses for bitmap indexing is much smaller than that of B-trees.)

The impact of the cooperativity of simple bitmap indexes is that if the top $n$ attributes with the highest referenced rate in users' queries are chosen, and indexes are to be built on them, we only need $n$ simple bitmap indexes. Any combination of selection conditions involving any subset of the $n$ attributes can be efficiently evaluated by applying corresponding logical operations on the bitmap vectors. If B-trees on compound keys are built, in order to cover all possible combinations of selection conditions among these $n$ attributes, we need $\binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = 2^n - 1$ B-trees. The cost of maintaining so many B-trees would be unacceptable. If we consider index cooperativity, simple bitmap indexes have dominating advantages.

Restrictions

However, as the cardinalities of the keys increase, both the time and space complexity of building and maintaining simple bitmap indexes rapidly become higher. The sparsity of the bitmap vectors is another problem which comes with high cardinality. The sparsity of a bitmap vector is on average $\frac{m-1}{m}$, where $m$ is the cardinality of the attribute. As $m$ increases, the space utilization degrades. Second, for queries involving large range searches (range searches denote both IN-lists and range selections of the form $i < a < j$), the number of bitmap vectors which need to be processed also increases. For large bitmap vectors, the cost cannot be ignored. In this case, simple bitmap indexes might perform worse than B-trees. To solve the problems derived from high cardinality, a new indexing technique, encoded bitmap indexing, is proposed.

2.2 Encoded bitmap indexing

Suppose that we have a fact table, SALES, with $N$ tuples and a dimension table, PRODUCTS, containing 12000 different products. Traditionally, if we want to build a simple bitmap index on the PRODUCTS dimension, it will result in 12000 bitmap vectors of $N$ bits in length. In encoded bitmap indexing, instead of 12000 bitmap vectors, $\lceil \log_2 12000 \rceil = 14$ bitmap vectors, plus a mapping table, are used.

For example, suppose that the domain of attribute $A$ of table $T$ is $\{a, b, c\}$. (As for the cases of NULL values, or non-existing tuples, simple bitmap indexing uses separate bitmap vectors to represent them, while in encoded bitmap indexing, they are encoded together with other domain values. Further discussion can be found later in this section.) Instead of using 3 bitmap vectors, we use $\lceil \log_2 3 \rceil = 2$ bitmap vectors to build the index on attribute $A$. As Figure 1 shows, we use 2 bits to encode the domain $\{a, b, c\}$, where $a$ is encoded as 00, $b$ as 01 and $c$ as 10, respectively. For those tuples with $A = a$, we set the corresponding positions in both bitmap vectors $B_1$ and $B_0$ to 0; for those with $A = b$, $B_1 = 0$ and $B_0 = 1$; and so on. In principle, the bitmap vector $B_i$ stores the $i$-th bit (from the least significant bit, LSB, to the most significant bit, MSB) of the encoded value of attribute $A$.

Figure 1: An example of encoded bitmap indexing (table $T$ with attribute $A$; the simple bitmap index with vectors $B_a$, $B_b$, $B_c$; the encoded bitmap index with vectors $B_1$, $B_0$; and the mapping table).

To retrieve data, we define the retrieval Boolean function for each value as follows. A retrieval Boolean function, or shortly retrieval function, is a $k$-variable min-term, where $k = \lceil \log_2 |A| \rceil = 2$ in this example. If a value $v$ is encoded as $b_1 b_0$ ($b_i \in \{0, 1\}$, $i = 0, 1$), then the retrieval function for $v$ is defined as $x_1 x_0$, where $x_i = B_i$ if $b_i = 1$, otherwise $x_i$ is the negation of $B_i$, i.e., $x_i = \bar{B}_i$. For the above example, the retrieval functions for $a$, $b$ and $c$ are $f_a = \bar{B}_1 \bar{B}_0$, $f_b = \bar{B}_1 B_0$ and $f_c = B_1 \bar{B}_0$, where $\bar{x}$ denotes the negation of the variable $x$, $xy$ denotes ($x$ AND $y$), and $x + y$ denotes ($x$ OR $y$). If we want to select data where $A = a$ OR $A = b$, then we simply apply an OR operator on $f_a$ and $f_b$, i.e., $f_a + f_b = \bar{B}_1 \bar{B}_0 + \bar{B}_1 B_0$, which can be further reduced to $\bar{B}_1$. In other words, to retrieve tuples with $A = a$ OR $A = b$, we simply use the inverse of the bitmap vector $B_1$, and the 1's indicate those tuples satisfying the selection conditions.

We define the encoded bitmap index as follows:

Definition 2.1 (Encoded Bitmap Index) Given a table $T = \{t_1, \ldots, t_n\}$, where $t_j$ is a tuple of $T$ ($j = 1, \ldots, n$), let $A$ be an attribute of $T$, denoted by $T.A$, and let the domain of $A$ be $\{a_1, \ldots, a_m\}$. Then, an encoded bitmap index, $B^A$, on $T.A$ is a set of bitmap vectors $\{B_{k-1}, \ldots, B_0\}$, a one-to-one mapping ($M^A : A \rightarrow \{\langle b_{k-1} \ldots b_0 \rangle \mid b_i \in \{0, 1\},\ i = 0, \ldots, k-1,\ \text{and}\ k = \lceil \log_2 m \rceil\}$) and a set of retrieval Boolean functions ($\{f_{a_1}, \ldots, f_{a_m}\}$). The bitmap vectors are defined as follows: for all $B_i$ ($i = 0, \ldots, k-1$) and all $t_j$ ($j = 1, \ldots, n$), $B_i[j] = 1$ if $M^A(t_j.A)[i] = 1$, else $B_i[j] = 0$, where $B_i[j]$ denotes the $j$-th bit of $B_i$ and $M^A(t_j.A)[i]$ the $i$-th bit (from LSB to MSB) of $M^A(t_j.A)$. In addition, for all $a \in \{a_1, \ldots, a_m\}$, the retrieval function for $a$, $f_a$, is a $k$-variable min-term (fundamental conjunction) $x_{k-1} \ldots x_0$, where $x_i = B_i$ if $M^A(a)[i] = 1$, otherwise $x_i = \bar{B}_i$ ($i = 0, \ldots, k-1$).

MAINTENANCE OF ENCODED BITMAP INDEXES

As data is updated, the encoded bitmap indexes need to be maintained. We discuss the maintenance for updates without domain expansion and updates with domain expansion.

Updates Without Domain Expansion
Following the example above, if a tuple whose value of $A$ is already in the domain $\{a, b, c\}$ is appended to table $T$, then we only need to append the two bits of its code, $B_1[j] = M^A(t_j.A)[1]$ and $B_0[j] = M^A(t_j.A)[0]$, at the end of the bitmap vectors $B_1$ and $B_0$, where $j$ is the position of the newly inserted tuple in table $T$.

Updates With Domain Expansion
If a tuple with $A = d$ is appended to $T$, i.e., the domain of $A$ is now expanded to $\{a, b, c, d\}$, then the following equation should first be tested:

$\lceil \log_2 |A^{(m-1)}| \rceil = \lceil \log_2 |A^{(m)}| \rceil$   (1)

where $|A^{(m-1)}|$ denotes the cardinality of $A$ before insertion, and $|A^{(m)}|$ denotes the cardinality of $A$ after insertion. If Equation (1) is true, as is the case in our example, then add the mapping $M^A(d) = 11$ into the mapping table and set $B_i[j] = M^A(d)[i]$ (where $i = 0, 1$ and $j$ = the position of the newly inserted tuple in $T$), as Figure 2(a) shows, and set $f_d = B_1 B_0$.

If another tuple with $A = e$ is further appended to $T$, i.e., the domain of $A$ is now expanded to $\{a, b, c, d, e\}$, then $\lceil \log_2 |A^{(m-1)}| \rceil < \lceil \log_2 |A^{(m)}| \rceil$. The resulting bitmap vectors and the mapping table are shown in Figure 2(b). The following actions need to be taken to reflect the change to the encoded bitmap index:

1. Expand the mapping $M^A : \{a \mid a \in \{a, b, c, d\}\} \rightarrow \{\langle b_1 b_0 \rangle \mid b_i \in \{0, 1\},\ i = 0, 1\}$ to $M^A : \{a \mid a \in \{a, b, c, d, e\}\} \rightarrow \{\langle b_2 b_1 b_0 \rangle \mid b_i \in \{0, 1\},\ i = 0, 1, 2\}$.
2. Add a bitmap vector $B_2$ to $B^A$, and set $B_2$ to 0.
3. Set $B_i[j] = M^A(e)[i]$, where $i = 0, 1, 2$ and $j$ = the position of the newly inserted tuple in $T$.
4. Add the Boolean function $f_e$ for the value $e$ (a min-term over $B_2$, $B_1$, $B_0$ in which $B_2$ appears un-negated) and revise the Boolean functions for $a$, $b$, $c$ and $d$ by ANDing $\bar{B}_2$ to them, i.e., $f_a = \bar{B}_2 \bar{B}_1 \bar{B}_0$, $f_b = \bar{B}_2 \bar{B}_1 B_0$, $f_c = \bar{B}_2 B_1 \bar{B}_0$ and $f_d = \bar{B}_2 B_1 B_0$.

A general algorithm for maintaining the encoded bitmap indexes with respect to both types of updates can be found in [8].

Figure 2: Updates with domain expansions. (a) No bitmap vector is inserted (domain $\{a, b, c, d\}$, vectors $B_1$, $B_0$). (b) A new bitmap vector is inserted (domain $\{a, b, c, d, e\}$, vectors $B_2$, $B_1$, $B_0$).

Some questions which still need to be clarified in encoded bitmap indexing are the representations for tuples which are deleted or non-existing, and for tuples with NULL values. A simple way of solving these problems is to add bitmap vectors, $B_{NotExist}$ and $B_{NULL}$, indicating the non-existing (or deleted) tuples and the tuples with NULL values, by setting the corresponding bit to 1. Another method is to assign the non-existing tuples and the tuples with NULL value artificial key values, and to encode these values together with the other key values. Intuitively, the second method is expected to perform better, since it reduces the number of bitmap vectors which need to be accessed while processing queries. In the above example, the domain of attribute $A$, which is to be encoded, should be considered as $\{NotExist, NULL\} \cup \{a, b, c, d, e\}$.

The assignment of the encoded value for non-existing (void) tuples is arbitrary. Nonetheless, we suggest reserving the value 0 for non-existing tuples for the sake of performance. For the above example, suppose we encode $\{NotExist, NULL, a, b, c, d, e\}$ with seven distinct 3-bit codes, with $000_{(2)}$ reserved for NotExist. Then, for the selection condition "$A$ IN $\{NULL, a, b, c\}$", the retrieval Boolean expression is $(f_{NULL} + f_a + f_b + f_c)\,\overline{\bar{B}_2 \bar{B}_1 \bar{B}_0}$, where the last term restricts the selection to existing tuples. Since $\overline{\bar{B}_2 \bar{B}_1 \bar{B}_0} = B_2 + B_1 + B_0$, and the four min-terms reduce to a sum of two two-literal terms, the expression becomes that two-term sum multiplied by $(B_2 + B_1 + B_0)$, which further reduces to the two-term sum alone. The result is the same expression as if we had not taken the term $\overline{\bar{B}_2 \bar{B}_1 \bar{B}_0}$ into consideration. This is because all tuples with any of the three bitmap vectors $B_2$, $B_1$ and $B_0$ set to 1 exist. The following theorem certifies our suggestion to reserve 0 for non-existing tuples.

Theorem 2.1 Let void tuples of a table, $T$, be encoded as 0. Given any selection on attribute $A$ of $T$ on any subset of existing tuples, the corresponding retrieval Boolean expression, $f(A)$ AND $\overline{f_{void}}$, can be reduced to $f(A)$, i.e., ignoring the selection condition on the existing tuples.

In other words, in such an encoding, any selection on any subset of non-void tuples can be evaluated without taking the function $f_{void}$ into consideration. Therefore, it reduces the processing time, while in simple bitmap indexing, the existence bit vector must always be ANDed to the resulting bit vector to obtain the final bitmap for a selection. For the proof of Theorem 2.1 please refer to [8].

THE ENCODING

In Definition 2.1, we have defined that an encoded bitmap index includes a set of bitmap vectors, a one-to-one mapping and a set of retrieval functions. As the name suggests, the domain of the indexed attribute is encoded by the mapping. So far, we did not mention how to define this mapping and how it would affect the performance of query processing. We will define a well-defined encoding for the improvement of performance next.

Let us first state the idea of well-defined encoding by the following example. Given an attribute $A$ with the domain $\{a, b, c, d, e, f, g, h\}$, suppose it is known that tuples with $A$ in $\{a, b, c, d\}$, or in $\{c, d, e, f\}$, are likely to be accessed together. Then, if we define the mapping as Figure 3(a) shows, to select tuples with conditions "$A$ IN $\{a, b, c, d\}$" or "$A$ IN $\{c, d, e, f\}$", only one bitmap vector needs to be accessed in each case. For each of the two selections, the retrieval Boolean function is a sum of four three-literal min-terms over $B_2$, $B_1$ and $B_0$, which can be reduced to a single literal.

Figure 3: Examples of proper and improper mappings (mapping tables (a), (a') and (b) for the domain $\{a, b, c, d, e, f, g, h\}$).

In contrast, subject to the two selections above, the mapping in Figure 3(b) is not well-defined. The retrieval functions for "$A$ IN $\{a, b, c, d\}$" and "$A$ IN $\{c, d, e, f\}$" are each a sum of three two-literal terms over $B_2$, $B_1$ and $B_0$, and they cannot be further reduced, i.e., to evaluate the two selections three bitmap vectors must be read instead of one. The idea is that, by a well-defined encoding (with respect to certain selection conditions), the number of bitmap vectors accessed is minimized, as a result shortening the processing time.

Before going to the definition of well-defined encoding, let us first define binary distance and chain.

Definition 2.2 (Binary Distance) Given two binary numbers, $x$ and $y$, the binary distance of $x$ and $y$ is a function, $\delta(\cdot)$, defined by $\delta(x, y) = Count(x \oplus y)$, where $Count(z)$ is a function which returns the number of 1 bits in $z$, and $\oplus$ is the bitwise XOR operation.

Definition 2.3 (Chain) Given a set of distinct binary numbers, $s = \{c_0, \ldots, c_{n-1}\}$ ($n \geq 2$). A chain in $s$ is defined as a sequence on $s$, say $\langle c_{o_0}, \ldots, c_{o_{n-1}} \rangle$, such that $\delta(c_{o_i}, c_{o_{i+1}}) = 1$ ($i = 0, \ldots, n-2$) and $\delta(c_{o_{n-1}}, c_{o_0}) = 1$.

Definition 2.4 (Prime Chain) Given a set of distinct binary numbers, $s = \{c_0, \ldots, c_{n-1}\}$, and $|s| = 2^p$ ($p \in \mathbb{N} \cup \{0\}$). A chain on $s$ is said to be a prime chain if, for all $c_i, c_j$ ($i, j = 0, \ldots, n-1$), $\delta(c_i, c_j) \leq p$.

For example, a prime chain can be defined on some sets of binary numbers, while on other sets no chain can be defined at all. Now, we define the well-defined encoding as follows.

Definition 2.5 (Well-Defined Encoding) Given is a subdomain, $s = \{v_0, \ldots, v_{n-1}\}$ ($n \geq 2$), of an attribute $A$, and let $p = \lfloor \log_2 n \rfloor$. A mapping on attribute $A$, $M^A : A \rightarrow \{\langle b_{k-1} \ldots b_0 \rangle \mid b_i \in \{0, 1\},\ i = 0, \ldots, k-1,\ k = \lceil \log_2 |A| \rceil\}$, is said to be well-defined with respect to the selection "$A$ IN $\{v_0, \ldots, v_{n-1}\}$", if the following conditions are satisfied:

i) If $n = 2^p$, then there exists a prime chain in $\{M^A(v) \mid v \in s\}$.
ii) If $2^p < |s| < 2^{p+1}$ and $|s|$ is even, then there exists $s' \subset s$, $|s'| = 2^p$, such that there exists a prime chain in $\{M^A(v) \mid v \in s'\}$, there exists a chain in $\{M^A(v) \mid v \in s\}$, and for all $v, v' \in s$, $\delta(M^A(v), M^A(v')) \leq p + 1$.
iii) If $2^p < |s| < 2^{p+1}$ and $|s|$ is odd, then there exists $s' \subset s$, $|s'| = 2^p$, such that there exists a prime chain in $\{M^A(v) \mid v \in s'\}$, and there exists $w \notin s$ but $w \in A$ such that there exists a chain in $\{M^A(v) \mid v \in s \cup \{w\}\}$, and for all $v, v' \in s \cup \{w\}$, $\delta(M^A(v), M^A(v')) \leq p + 1$.

Theorem 2.2 A well-defined encoding minimizes the number of bit vectors which need to be accessed.

The proof of the theorem can be found in [8]. Obviously, Theorem 2.2 results in a local optimum, since Definition 2.5 takes only one subdomain (or, one selection condition) into consideration. Theorem 2.3 is a revision of Theorem 2.2 for describing the optimum relative to a set of selection conditions.

Theorem 2.3 Given a set of (range) selection predicates on attribute $A$, $P(A) = \{p_1, \ldots, p_n\}$, where each $p_i$ ($1 \leq i \leq n$) corresponds to one subdomain of $A$, i.e., $s_1, \ldots, s_n$. The number of bit vectors which must be read while evaluating the selection predicates is minimized, if the encoding on $A$ is well-defined with respect to all $p_i$ ($1 \leq i \leq n$).

Again, the proof can be found in [8]. The subdomains, $s_1, \ldots, s_n$, are not necessarily disjoint, and the optimal solution need not exist, or be unique. In the above example, both the mappings in Figure 3(a) and (a') are optimal for both selections, "$A$ IN $\{a, b, c, d\}$" and "$A$ IN $\{c, d, e, f\}$".

A well-defined encoding is desirable for optimization but not essential. An efficient algorithm for finding a well-defined encoding is needed, since the brute-force approach has a complexity that is an exponential function of the cardinality of the attribute and the number of selection conditions. We have explored some heuristics for finding a well-defined encoding. However, they are beyond the scope of this paper.

Second, intuitively, whether an encoding is well-defined is subject to the types of selections. In Definition 2.5, we define the well-defined encoding with respect to range selections of the form "Attribute IN {...}". For other selection conditions, e.g., "$j$ < Attribute < $i$", we have to redefine the well-defined encoding, though, for discrete domains, conditions of the form "$j$ < Attribute < $i$" can be expressed as "Attribute IN {...}". In the next subsection, we give examples of handling range searches on numeric data types.

Third, as stated above, whether an encoding is well-defined is specific to selections. As the selections change over time, a model is needed to evaluate when re-mapping is desirable, or how to make use of don't-care values in the encoding to optimize the performance. (3)

2.3 Applications and variations of encoded bitmap indexing

HIERARCHY ENCODING

The warehouse data is usually modeled as a star schema, which consists of one (or more) fact table(s) and some dimensions. Hierarchies might exist in dimensions. As Figure 4 shows, the dimension SALESPOINT of the sales data can be classified into three categories (hierarchy elements): branch, company and alliance.

(Footnote 3: For optimization of the retrieval Boolean expression, we might take the don't-care conditions into account. For example, if we want to select data with the selection condition $A = b$ OR $A = c$, then we consider the following two expressions: $f_b + f_c$ and $f_b + f_c + f_{don't\text{-}care}$, where, in the example at the beginning of Section 2.2, $f_{don't\text{-}care} = B_1 B_0$. Since $f_b + f_c = \bar{B}_1 B_0 + B_1 \bar{B}_0 = B_1 \oplus B_0$ and $f_b + f_c + f_{don't\text{-}care} = \bar{B}_1 B_0 + B_1 \bar{B}_0 + B_1 B_0 = B_1 + B_0$, for computers without a hardware implementation of the bitwise XOR operation, we might use $B_1 + B_0$ to retrieve the data.)
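The retrieval functions of Section 2.2 and the don't-care reduction in footnote 3 can be checked with a small sketch. It assumes the mapping a → 00, b → 01, c → 10 from the example at the beginning of Section 2.2. The function names (`build_encoded_index`, `select`) and the sample column are illustrative only and do not come from the paper, and bitmap vectors are modeled as plain Python lists rather than packed bit strings.

```python
def build_encoded_index(column, mapping, k):
    """Return k bitmap vectors; vectors[i][j] is bit i (LSB = 0) of the
    code assigned to the value of tuple j."""
    return [[(mapping[v] >> i) & 1 for v in column] for i in range(k)]


def select(vectors, mapping, values):
    """Evaluate the OR of the min-terms (retrieval functions) of `values`
    tuple by tuple, without any logical reduction."""
    k = len(vectors)
    n = len(vectors[0])
    result = []
    for j in range(n):
        hit = any(
            all(vectors[i][j] == ((mapping[v] >> i) & 1) for i in range(k))
            for v in values
        )
        result.append(int(hit))
    return result


# Assumed mapping from the Section 2.2 example; code 0b11 is unused (don't-care).
MAPPING = {"a": 0b00, "b": 0b01, "c": 0b10}
column = ["a", "c", "b", "b", "a", "c"]  # hypothetical contents of T.A

B0, B1 = build_encoded_index(column, MAPPING, 2)

# A = a OR A = b reduces to the inverse of B1.
select_ab = select([B0, B1], MAPPING, {"a", "b"})
not_b1 = [1 - bit for bit in B1]

# A = b OR A = c is B1 XOR B0; with the don't-care code 11 it becomes B1 OR B0.
select_bc = select([B0, B1], MAPPING, {"b", "c"})
xor_form = [x ^ y for x, y in zip(B1, B0)]
or_form = [x | y for x, y in zip(B1, B0)]
```

On this sample column the unreduced and reduced forms agree, because no tuple carries the unused code 11. The OR form is only valid under that condition, which is exactly what makes 11 a don't-care value.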

Figure 4: Hierarchies along dimensions (TIME: year, season, month, day; GEOGRAPHIC: region, zone, city; SALESPOINT: alliance, company, branch).

Suppose that we have 12 branches, $\{1, 2, 3, \ldots, 12\}$, 5 companies, $\{a, b, c, d, e\}$, and 3 alliances, $\{X, Y, Z\}$. Some branches belong to a company, and some companies form an alliance, e.g., branches $\{1, 2, 3, 4\}$ belong to company $a$, branches $\{5, 6\}$ belong to company $b$, ..., companies $\{a, b, c\}$ form the alliance $X$, and so on, as Figure 5(a) shows.

Figure 5: SALESPOINT hierarchy and its encoding. (a) Members of hierarchy elements (the branches of each company $a$ to $e$, and the companies of each alliance $X$, $Y$, $Z$). (b) Hierarchy encoding for company and alliance (mapping table for branches 1 to 12).

Note that some companies join two different alliances. In the real world, the relationships between hierarchy elements are not necessarily 1 : N; they could also be M : N, as is the case in the above example.

One essential operation of OLAP is the manipulation along dimensions [7], e.g., roll-ups/drill-downs and data analysis along dimension hierarchies. All these operations are based on selections on dimensions, or on dimension elements, e.g., selecting sales data of all companies in alliance $Z$. Therefore, data of the same dimension hierarchies is very likely to be accessed together in the DW environment. The idea of hierarchy encoding is to build encoded bitmap indexes with respect to selections on hierarchy elements. For the above example, the domains of the hierarchy elements "company" and "alliance" are $\{a, b, c, d, e\}$ and $\{X, Y, Z\}$, respectively, and the set of selection predicates on either "company" or "alliance" will be $P = \{\text{company} = i \mid i \in \{a, b, c, d, e\}\} \cup \{\text{alliance} = j \mid j \in \{X, Y, Z\}\}$. A well-defined encoded bitmap index with respect to $P$, as Figure 5(b) shows, is optimized for selections along the dimension elements "company" or "alliance". For example, for the selection "alliance = X", only one bit vector is accessed.

This idea can be further extended to build a groupset index using encoded bitmap indexes. A groupset index corresponds to the Group-By clauses in users' queries. Because of the limitation of space, we do not further discuss this case.

TOTAL-ORDER PRESERVING ENCODING

Another type of range selection, such as "$j$ < Attribute < $i$", is performed on numeric or ordinal types of attributes. Numeric or ordinal types have a special property, namely, there exists a total-order relation in their domain. As a result, if the encoding in encoded bitmap indexes destroys the total-order relation, then selections of the form "$j$ < Attribute < $i$" must be rewritten to ones of the form "Attribute IN {...}". An encoding which preserves the total-order property of the attributes is called a total-order preserving encoding. A simple total-order preserving encoding is the internal representation of integers in computers, e.g., "8" is encoded as "1000", "7" as "0111". If we define the encoding as the internal representation of computers, the resulting encoded bitmap index is a set of bit slices of the original attribute. In [], O'Neil and Quass defined such an index as a bit-sliced index and proposed algorithms for evaluating some query types directly from the bit-sliced index.

Figure 6: Total-order preserving encoding (mapping table for the values 1 to 6).

However, the bit-sliced index is not the only answer to numeric (or ordinal) attributes. For example, consider an attribute $A$ with domain $\{1, 2, 3, 4, 5, 6\}$, where there exists a total order in $A$, i.e., $1 < 2 < 3 < 4 < 5 < 6$. In addition, tuples with $A$ in $\{1, 2, 4, 5\}$ are usually accessed together. The mapping in Figure 6 preserves on the one hand the total-order property, and on the other hand is optimized for the selection "$A$ IN $\{1, 2, 4, 5\}$".

RANGE-BASED ENCODING

A possible variation of encoded bitmap indexing is to use it for range-based indexing. Because of space limitations, instead of giving a formal definition of range-based encoded bitmap indexes, we demonstrate the idea by a simple example. Given is an attribute $A$ with the domain $6 \leq A < 20$, $A \in \mathbb{N}$. Suppose that the following range selections are pre-defined by the end users: "$6 \leq A < 10$", "$8 \leq A < 12$", "$10 \leq A < 13$" and "$16 \leq A < 20$". According to the predefined selections, the domain of attribute $A$ should first be divided into 6 disjoint partitions, as Figure 7 shows. Next, we encode the set of intervals {[6,8), [8,10), [10,12), [12,13), [13,16), [16,20)} as Figure 8(a) shows.
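The partitioning step can be sketched as follows: split the domain at every endpoint of a predefined range, then find which partitions make up a given selection. This is a minimal illustration under the stated example, not the paper's algorithm. The helper names `disjoint_partitions` and `covering` are hypothetical, and the assignment of bit codes to the partitions (Figure 8(a)) is deliberately left out.

```python
def disjoint_partitions(domain, ranges):
    """Split the half-open domain [lo, hi) at every endpoint of the
    predefined half-open ranges and return the resulting intervals."""
    points = sorted({*domain, *(p for r in ranges for p in r)})
    return list(zip(points, points[1:]))


def covering(partitions, rng):
    """Return the partitions that lie inside the range [lo, hi)."""
    lo, hi = rng
    return [p for p in partitions if lo <= p[0] and p[1] <= hi]


domain = (6, 20)                                  # 6 <= A < 20
ranges = [(6, 10), (8, 12), (10, 13), (16, 20)]   # predefined selections

partitions = disjoint_partitions(domain, ranges)
# The selection 8 <= A < 12 is the union of two partitions, so its
# unreduced retrieval function is the OR of two min-terms.
parts_8_12 = covering(partitions, (8, 12))
```

Each partition then plays the role of a single domain value in an encoded bitmap index, so a well-defined encoding with respect to the four predefined selections can be sought exactly as in Definition 2.5.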

- - -- - - 6 8 2 3 6 2 Figure 7: Pre-dened rnges Then, for exmple, for rnge selection \8 A<2", the retrievl function is IB 2 IB IB +IB 2 IB IB, which cn e reduced to IB IB The (reduced) retrievl functions for ll predened rnge selections re listed in Figure 8() Mpping Tle [6,8) [8,) [,2) [2,3) [3,6) [6,2) 6 A< : IB 2 IB 8 A<2 : IB IB A<3 : IB2 IB 6 A<2 : IB2 IB ()Rnge encoding ()Retrievl functions Figure 8: Rnge-sed encoded itmp index If the rnges of selections re not pre-denle, or the rnges re so evenly scttered on the ttriute domin (which will result in mny -element disjoint prtitions), then the rnge-sed itmp index will reduce to n encoded itmp index on set of single vlues, insted of on set of rnges 3 Performnce Anlysis: Anlyticl Approch In Section 2, we hve discussed the dvntges of simple itmp indexes over B-trees in the DW environment under some restrictions, nd proposed encoded itmp indexing to compenste for the limittions of simple itmp indexing In the following, we compre encoded itmp indexing with simple itmp indexing By showing the dvntges of encoded itmp indexing over simple itmp indexing, the dvntges of encoded itmp indexing over B-trees cn e inferred 3 Compring encoded itmp indexing with simple itmp indexing The spce requirement of uilding oth simple nd encoded itmp indexes is jtjh ytes, nd the time 8 complexity is O(jTj h), where h is the numer of itmp vectors In ddition, the time complexity for mintennce with respect to updtes without domin expnsion is O(h) for oth simple nd encoded itmp indexing As for updtes with domin expnsion, the time complexity iso(jtj) + O(h) for simple itmp indexing, nd etween O(h) nd O(jTj) +O(h) for encoded itmp indexing The min dierence is tht for simple itmp indexing, h = jaj, while for encoded itmp indexing, h = dlog 2 jaje Oviously, jaj > dlog 2 jaje, for ll jaj > nd jaj 2 N jaj dlog 2 jaje, if the crdinlity ofa is lrge Besides, the sprsity of simple itmp indexes is on verge m; m, where m is the crdinlity of the indexed 
attribute, while the sparsity of encoded bitmap indexes is about $1/2$ (independent of $m$). As a result, building and maintaining encoded bitmap indexes becomes more economical than building and maintaining simple bitmap indexes as the cardinality of the indexed attribute increases.

However, maintenance cost is not the only factor when evaluating the performance of indexes. We should also compare the complexity of query processing for both bitmap indexing techniques. For both, the complexity is a function of the number of bitmap vectors which are accessed and the number of logical operations performed on the bitmaps (footnote 4). Following the example at the beginning of Section 2.2, suppose that we have an attribute $A$ with domain $\{a, b, c\}$. Both a simple bitmap index and an encoded bitmap index are built on $A$, as Figure 1 shows. Consider the following two queries:

Q1: SELECT A FROM T WHERE A = a
Q2: SELECT A FROM T WHERE A in {a, b}

If the simple bitmap index is used, then $(B_a)$ and $(B_a \text{ OR } B_b)$ are used for retrieving tuples for Q1 and Q2, respectively. If the encoded bitmap index is used, then $(\bar{B}_1 \bar{B}_0)$ and $(\bar{B}_1)$ are used to select the tuples for Q1 and Q2, respectively.

Footnote 4: Compared with the disk access costs, it is reasonable to ignore the CPU time needed for performing logical operations, such as AND and OR. In addition, in the following discussion, we consider the number of bitmap vectors which must be accessed for query processing using encoded bitmap indexing as the number of bitmaps after performing logical reduction on the retrieval Boolean expressions, e.g., if the retrieval Boolean expression is $\bar{B}_1 \bar{B}_0 + \bar{B}_1 B_0$, then it is first reduced to $\bar{B}_1$, and the number of bitmaps which need to be accessed is considered as one.

Generally speaking, for single value selection, simple bitmap indexing performs better than encoded bitmap indexing. However, for range searches, especially large range searches, encoded bitmap indexing performs better than simple bitmap indexing. As the above example shows, for single value selection (Q1), one bitmap vector is accessed if simple bitmap indexing is used, while two bitmap vectors are accessed if encoded bitmap indexing is used. In contrast, for range search (Q2), one bitmap vector is accessed if encoded bitmap indexing is used, while two bitmap vectors are accessed if simple bitmap indexing is used.

Let $c_s$ and $c_e$ denote the number of bitmap vectors accessed by simple bitmap indexing and encoded bitmap indexing, respectively. Obviously, $c_s \le |A|$ and $c_e \le \lceil \log_2 |A| \rceil$. For simple bitmap indexing, $c_s = \delta$, where $\delta$ denotes the size of the interval of the range search and $\delta \le |A|$. For example, $\delta = 2$ in Q2. For encoded bitmap indexing, $c_e$ is a function of $\delta$, where $\delta \le |A|$, and of the distribution of selected values. For worst cases, $c_e = \lceil \log_2 |A| \rceil$. For best cases, $c_e = \lceil \log_2 |A| \rceil - r$, where $r$ is the number of bitmap vectors reduced by performing logical reduction (for details please refer to Property 3 in [18]).

From the above discussion, we can see that $c_e < c_s$ if $\delta > \log_2 |A| + 1$. In addition, the cost of processing simple bitmap indexes is a linear function of $\delta$, while the cost of processing encoded bitmap indexes is upper-bounded by the step function $\lceil \log_2 |A| \rceil$. In other words, encoded bitmap indexes perform stably, even when $\delta$ is large, while simple bitmap indexes degrade relatively fast. Figure 9(a) and (b) depict $c_e$ and $c_s$ with $|A| = 50$ and $|A| = 1000$, respectively. ($c_e$ is calculated according to Property 3 in [18], i.e., for the best cases. For worst cases, $c_e = \lceil \log_2 |A| \rceil$, namely, $c_e = 6$ in Figure 9(a) and $c_e = 10$ in Figure 9(b), which are still much less than $c_s$.)

Figure 9: Performance analysis. Number of bitmap vectors accessed versus interval of range searches for the simple bitmap index ($c_s$) and the encoded bitmap index ($c_e$, with worst case $c_{e\_w}$). (a) $|A| = 50$, $\lceil \log_2 |A| \rceil = 6$. (b) $|A| = 1000$, $\lceil \log_2 |A| \rceil = 10$.

Figure 10 depicts the number of bit vectors required for building both simple and encoded bitmap indexes with respect to the cardinality of indexed attributes. Again, the space requirement of a simple bitmap index is linear in the cardinality of the attribute, while that of an encoded bitmap index is logarithmic in the cardinality of the attribute.

Figure 10: Space requirements. Number of bit vectors required versus cardinality of indexed attributes, for the simple and the encoded bitmap index.

3.2 Worst case analysis

Even for the worst case scenario, encoded bitmap indexing performs better than simple bitmap indexing if $\delta > \log_2 |A|$, as Figure 9(a) shows. Two reasons that might lead to such behavior are discussed below.

Improper Encoding. Given a selection, if the encoding is not well-defined with respect to that selection, then in the worst case the number of bit vectors which must be accessed in query processing is $\lceil \log_2 |A| \rceil$. An extreme case is that, among all types of selections, there does not exist any selection for which the encoding is well-defined. The line $c_{e\_w} = 6$ in Figure 9(a) depicts the extreme case. The ratio between the area under the curve of the best case and the area under the line $c_{e\_w} = 6$ denotes the average benefit gained from well-defined encodings. The ratio for the case in Figure 9(a) is 0.84, i.e., a 16% saving of the processing cost is gained, and the ratio for the case in Figure 9(b) is 0.90, i.e., a 10% saving of the processing cost is gained. Note that the above calculation does not take the frequencies of selection types into consideration. The average savings are not very large in magnitude, so having a well-defined encoding is desirable but not essential. For specific situations, the saving could be up to 83% (for the case where $\delta = 32$ in Figure 9(a)), or even up to 90% (for the case where $\delta = 512$ in Figure 9(b)).

Logical Reduction. A well-defined encoding only makes sense together with the logical reduction of the retrieval functions. The complexity of performing logical reduction using a brute-force method is, however, exponential in the number of bit vectors. For the performance gain from a well-defined encoding, we have to pay the price of finding a well-defined encoding and the cost of the logical reduction in exchange. We do not think it is unfeasible, though the complexities of both finding a well-defined encoding and performing logical reduction are exponential in the problem size. We have explored some heuristics to solve the problem, but discussion of these preliminary results is beyond the scope of this paper. Another straightforward, but effective, approach is: since the ranges of selection predicates are pre-definable
(well-defined encodings are subject to predefined selections), the retrieval functions for all the predefined predicates can also be reduced by human experts, and be verified with the assistance of computers. Furthermore, the cost of finding a well-defined encoding is a one-time cost, unless dynamic re-encoding is desired, which is also beyond the scope of this paper.

The above analysis shows that, for range searches, encoded bitmap indexes perform essentially better and more stably, even when the range of selection increases. For single value selections, encoded bitmap indexing is second to simple bitmap indexing. However, according to TPC-D [14], out of 17 query types, 12 query types involve range search (they are Q1, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q12, Q14 and Q16). Therefore, we believe that encoded bitmap indexing can play an important role in improving query processing in the DW environment. In addition, the worst case analysis shows that even if the best cases cannot be reached, the worst-case performance of encoded bitmap indexes in range searches is still better and more stable than that of simple bitmap indexes.

In practice, in order to improve the performance of both B-trees and simple bitmap indexes, a hybrid method is implemented, i.e., instead of storing tuple-ids (value-lists) at the leaf nodes of B-trees, bitmap vectors are stored. As the sparsity increases (one consequence of high cardinality), the bit vectors are expressed as value-lists. However, when the cardinality is very high (exactly the circumstance where encoded bitmap indexing is well-suited), the hybrid method might degrade to a pure B-tree. In such cases, the cooperativity of bitmap indexing in the hybrid method is lost.

4 Other Indexing Techniques

In this section, we discuss some other indexing techniques which have been proposed in the literature for DWs.

Projection Indexing. In [11], a projection index is defined as a materialization of all values of an attribute in tuple-id order. We can treat a projection index as an encoded bitmap index, where the mapping table is simply the table of internal codes, i.e., the mapping function is $M_A(a) = b_{k-1} \ldots b_1 b_0$, where $k$ is the length in bits of the internal representation of attribute $A$, and $b_i$ is the $i$-th bit (from LSB to MSB) of the attribute value ($i = 0, \ldots, k-1$). One difference between a projection index and an encoded bitmap index which uses the table of internal codes as the mapping table is the physical storage allocation. A projection index stores the values horizontally, while an encoded bitmap index stores the values vertically. A projection index stores the bits of a value contiguously, while an encoded bitmap index stores the bits of the same position of different values contiguously, which resembles the physical storage allocation of bit-sliced indexes.

Bit Slicing. In [11], a bit-sliced index is defined as a set of bitmap slices which are orthogonal to the data held in a projection index. In other words, a bit-sliced index is a bitwise vertical partition of a projection index. Bit-sliced indexes are suitable for numeric (fixed-point) or ordinal attributes, and are especially good for wide-range searches. Bit-sliced indexing with a non-binary base was also introduced in [11]. Bit-sliced indexes can also be treated as special cases of encoded bitmap indexing. They are encoded bitmap indexes with a total-order preserving encoding, which is trivially the internal representation of fixed-point numbers.

Both projection indexes and bit-sliced indexes are comparable to transposed files [16]. A transposed file stores one column from a main table in a row, namely, one row per transposed column. A projection index stores only one transposed column, and a bit slice is a transposed representation of a column of bits from the same bit position of an attribute.

Value List Indexes. Traditionally, a value list index stores key values and a list of tuple-ids for each key value. A value list index can be structured as a B-tree, or simply as an inverted file. A hybrid indexing scheme using simple bitmap indexes and value list indexes was said to resolve the problems of sparsity in simple bitmap indexes caused by high cardinality domains. The B-tree structure is first used to index the key values, and at the leaf nodes simple bitmap vectors are stored. However, if a bit vector is too sparse, a list of tuple-ids is stored instead of the bit vector.
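
As a rough illustration of the hybrid scheme just described, the following Python sketch chooses, for each key value, between a bit vector and a tuple-id list. The function names `build_hybrid_leaves` and `lookup`, the density threshold, and the use of a plain dictionary in place of a B-tree are assumptions made for illustration only; none of them is taken from the cited systems.

```python
from typing import Dict, Hashable, List, Sequence, Tuple, Union

# A leaf is either ("bitmap", int) or ("tids", list of tuple-ids).
Leaf = Tuple[str, Union[int, List[int]]]

# Hypothetical cut-off: keep a bit vector only if at least this
# fraction of its bits would be set; otherwise keep a tuple-id list.
DENSITY_THRESHOLD = 1 / 32


def build_hybrid_leaves(column: Sequence[Hashable],
                        density_threshold: float = DENSITY_THRESHOLD
                        ) -> Dict[Hashable, Leaf]:
    """Build one leaf per key value (a dict stands in for the B-tree)."""
    n = len(column)
    positions: Dict[Hashable, List[int]] = {}
    for tid, value in enumerate(column):
        positions.setdefault(value, []).append(tid)

    leaves: Dict[Hashable, Leaf] = {}
    for value, tids in positions.items():
        if len(tids) / n >= density_threshold:
            bits = 0
            for tid in tids:
                bits |= 1 << tid          # bit i is set iff tuple i matches
            leaves[value] = ("bitmap", bits)
        else:
            leaves[value] = ("tids", tids)  # too sparse: store a value list
    return leaves


def lookup(leaves: Dict[Hashable, Leaf], value: Hashable) -> List[int]:
    """Return the tuple-ids matching a single key value."""
    kind, payload = leaves.get(value, ("tids", []))
    if kind == "tids":
        return list(payload)
    return [tid for tid in range(payload.bit_length()) if (payload >> tid) & 1]
```

With a fixed threshold, every key value of a high-cardinality column tends to fall below the cut-off, so in this sketch all leaves end up as tuple-id lists.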
A contradiction arises: a B-tree is efficient for random access if the number of key values is large. However, if the number of key values is large, i.e., the cardinality of the indexed attribute is large, then the problem of sparsity is more severe. As a result, value lists are stored at the leaf nodes instead of bit vectors. Then, the so-called hybrid index reduces to a B-tree. On the other hand, if the cardinality of the indexed attribute is very small, the benefit of building a B-tree on top of the bitmap vectors is also small.

In [11], the authors proposed algorithms for evaluating some aggregate functions and range selections directly on projection indexes, bit-sliced indexes and value list indexes. The range selection predicates considered in [11] were only of the form "$i < A < j$", while in our paper we have generalized the cases by taking both in-lists and conventional range predicates into consideration. For the special cases of numeric/ordinal attributes, if the encoding is total-order preserving, the algorithms proposed by O'Neil and Quass are also applicable to encoded bitmap indexes. Slight changes might, however, be required.

Group-Set Indexes. Group-By operations are often used for grouping the results of queries for better understanding and analysis. A groupset bitmap index was introduced in [11] to select tuples which satisfy the group-by condition. The proposed groupset bitmaps face the same sparsity problem as simple bitmap indexes. Some other approaches, such as clustering or segmentation, can help to process Group-By operations more efficiently. However, clustering can be performed according to only one selection condition or one grouping condition. Therefore, secondary indexes are needed. An eligible candidate for group-set indexing is the encoded bitmap index. Suppose we had 3 attributes in the Group-By clause whose cardinalities give $10^7$ possible combinations. This means $10^7$ bit vectors if simple bitmap indexing is used, and only $\lceil \log_2 10^7 \rceil = 24$ bit vectors if encoded bitmap indexing is used (footnote 5). Furthermore, if hierarchy encoding (discussed in Section 2.3) is applied, groupset indexes can be dynamically calculated at run-time, which results in more flexibility, since it is not feasible to pre-compute all possible Group-By combinations if the number of dimensions is large.

Footnote 5: Naturally, in this problem, the density of the products of the dimensions should also be considered, e.g., although there are $10^7$ combinations, there might only be $10^6$ meaningful combinations, i.e., the density is only 10%.

Dynamic Bitmaps. Dynamic bitmaps are built dynamically from high cardinality attributes [13]. If there are $n$ different values in the attribute domain, they are encoded onto $n$ contiguous $(\log_2 n)$-bit binary integers. Dynamic bitmaps are special cases of encoded bitmap indexes, where the encoding trivially maps the domain onto a contiguous integer set. The significance of encoding was not discussed for dynamic bitmaps.

Range-Based Indexing. A dynamic range-based bitmap indexing for high cardinality attributes with skew was proposed in [19]. The idea is to partition the domain into some equal-population subsets, and simple bitmap vectors are constructed, one for each subset. In that work, the authors also took the distribution of the attributes into consideration. In Section 2.3, we have also introduced a similar idea of building range-based indexes using encoded bitmap indexing. The two approaches differ from each other in the following aspects: (1) In [19], partitioning is done by the distribution of the attribute values, while we propose to partition according to pre-defined range selections. (2) In [19], Wu and Yu investigate how to dynamically adjust the partition of the ranges to balance the population of all buckets with respect to the distribution of attribute values. However, we do not have the problem of imbalance. Because we use the predefined selection predicates to partition the attribute domains, the retrieval functions will exactly match the desired tuples. Even in the cases where selection predicates are not pre-definable, or the predicates result in a very large number of small partitions, encoded bitmap indexing can handle a much larger number of small partitions than simple bitmap indexing can. As a matter of fact, if the ranges of the selections are not pre-definable, range-based indexes do not make any sense. In this case, we propose to use an encoded bitmap index with a total-order preserving encoding, such that any range selection predicate can be efficiently evaluated directly on the bit vectors.

Other Techniques. Other indexing techniques for the warehouse environment include multidimensional B-trees [8, 4], compression techniques (e.g., run-length) for simple bitmap indexes, hierarchical indexes [6, 7], join indexes [15, 10] and multidimensional indexing for spatial data [12]. Index techniques used in Sybase IQ, Red Brick Warehouse and Oracle are discussed in [3].

5 Concluding Remarks and Future Work

We introduced encoded bitmap indexing for the DW environment. The merits of this technique are:

1. It inherits the good properties of simple bitmap indexing, such as cooperativity of different bitmap vectors, low cost of
construction and maintenance, and low processing cost.
2. It differs from simple bitmap indexing in encoding. Because of encoding, it solves the problems of sparsity and, at the same time, improves the space utilization, shortens the maintenance and processing time, and also improves the performance of processing range searches. Most of all, the cardinality of the indexed attribute no longer has dramatic effects on the maintenance and processing cost of the encoded bitmap indexes.
3. With customized definitions of encodings, encoded bitmap indexes are suitable for and capable of (but not limited to) indexing OLAP data.

We have discussed some of its applications, such as hierarchy encoding for indexing dimensions with hierarchies, total-order preserving encoding for numeric/ordinal attributes, range-based encoded bitmap indexes, etc. Theorems were derived for identifying the properties of a well-defined encoding with respect to a given set of predefined selection predicates. Under this encoding, the number of bit vectors which must be accessed in query processing is minimized. We have given a comparative performance analysis of both simple and encoded bitmap indexes using an analytical approach. The result is satisfactory and shows that as the cardinality and the range of selections increase, encoded bitmap indexes perform better and more stably than simple bitmap indexes (even if the best cases described by Theorems 2.2 and 2.3 cannot be reached).

There are still some problems to be solved. First, an efficient algorithm for logical reduction of the retrieval Boolean functions is needed. Second, an efficient algorithm for finding well-defined encodings is required to take full advantage of optimization. Third, for application domains where the set of predefined selection predicates changes over time, a model for evaluating the cost-effectiveness of reconstructing the encoded bitmap indexes is desirable. Fourth, if selection predicates are not predictable, a proper encoding is nevertheless achievable through an analysis of the history of users' queries. In other words, in such an environment, data mining might be applied for finding a good encoding. Fifth, in the text, we have concentrated on how range selections are evaluated directly on the encoded bitmap indexes, since selection is the basic operation underlying other operations. However, in addition to range predicates, some aggregate functions or operations can also be evaluated directly on the bitmaps, such as sum(), average(), median, N-tile, column-product aggregations, joins, etc. Algorithms for performing these functions or operations using encoded bitmap indexes, though of no difficulty, must be defined.

References

[1] J.-H. Chu, G. Knott, An Analysis of B-Trees and Their Variants, Information Systems, Vol. 14, No. 5, 1989.
[2] D. Comer, The Ubiquitous B-Tree, Computing Surveys, Vol. 11, No. 2, 1979.
[3] H. Edelstein, Technology Analysis: Faster Data Warehouses, Information Week, Dec. 4, 1995.
[4] M. Freeston, A General Solution of the n-dimensional B-tree Problem, SIGMOD Conf., San
Jose, CA, 1995.
[5] J. Gray, A. Reuter, Transaction Processing: Concepts and Techniques, Morgan Kaufmann, 1993.
[6] T. Johnson, D. Shasha, Hierarchically Split Cube Forests for Decision Support: description and tuned design, TR 727, NYU, le://csnyuedu/pu/tech-reports/, Nov. 1996.
[7] T. Johnson, D. Shasha, Some Approaches to Index Design for Cube Forests, Bulletin of the Technical Committee on Data Eng., Vol. 20, No. 1, Mar. 1997.
[8] H. Leslie, R. Jain, D. Birdsall, H. Yaghmai, Efficient Search of Multidimensional B-Trees, VLDB Conf., Zurich, Switzerland, 1995.
[9] P. O'Neil, Model 204 Architecture and Performance, Springer-Verlag LNCS 359, 2nd Intl. Workshop on High Performance Transactions Systems, Asilomar, CA, Sept. 1987.
[10] P. O'Neil, G. Graefe, Multi-Table Joins Through Bitmapped Join Indices, SIGMOD Record, Vol. 24, No. 3, September 1995.
[11] P. O'Neil, D. Quass, Improved Query Performance with Variant Indexes, SIGMOD Conf., Tucson, AZ, May 1997.
[12] S. Sarawagi, M. Stonebraker, Efficient Organization of Large Multidimensional Arrays, ICDE, Houston, 1994.
[13] S. Sarawagi, Indexing OLAP Data, Bulletin of the Technical Committee on Data Eng., Vol. 20, No. 1, Mar. 1997.
[14] Transaction Processing Performance Council (TPC), TPC Benchmark D, Decision Support, Standard Specification Revision 2, Dec. 5, 1996.
[15] P. Valduriez, Join Indices, ACM TODS, 12(2), June 1987.
[16] G. Wiederhold, Database Design, 2nd Ed., McGraw-Hill Book Co., 1983.
[17] M.C. Wu, A. Buchmann, Research Issues in Data Warehousing, Datenbanksysteme in Büro, Technik und Wissenschaft, Editors: K.R. Dittrich and A. Geppert, Springer Verlag, 1997.
[18] M.C. Wu, A. Buchmann, Encoded Bitmap Indexing for Data Warehouses, Tech. Report DVS97-3, CS Dept., Technische Universität Darmstadt, (http://wwwinformtiktudrmstdtde/dvs/st/wugermnhtml), July 1997.
[19] K.L. Wu, P.S. Yu, Range-Based Bitmap Indexing for High Cardinality Attributes with Skew, Research Report, IBM Watson Research Center, May 1996.