IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 5, NO. 2, APRIL-JUNE 2008

Practical Inference Control for Data Cubes

Haibing Lu and Yingjiu Li, Member, IEEE

Abstract. The fundamental problem for inference control in data cubes is how to efficiently calculate the lower and upper bounds for each cell value given the aggregations of cell values over multiple dimensions. In this paper, we provide the first practical solution for estimating exact bounds in two-dimensional irregular data cubes (that is, data cubes in which certain cell values are known to a snooper). Our results imply that the exact bounds cannot be obtained by a direct application of the Fréchet bounds in some cases. We then propose a new approach to improve the classic Fréchet bounds for any high-dimensional data cube in the most general case. The proposed approach improves upon the Fréchet bounds in the sense that it gives bounds that are at least as tight as those computed by Fréchet yet is simpler in terms of time complexity. Based on our solutions to the fundamental problem, we discuss various security applications such as privacy protection of released data, fine-grained access control, and auditing, and identify some future research directions.

Index Terms. Data cube, bound, inference problem.

1 INTRODUCTION

SINCE its introduction, the data cube model [1] has found widespread applications in decision support systems such as online analytic processing (OLAP), data warehousing [2], and data mining [3]. A data cube can be considered a high-dimensional data abstraction that allows one to view aggregated data at different levels. Fig. 1 illustrates a data cube example with three feature dimensions: agent, time, and stock. The aggregation measure of the data cube is the stock volume. In the core cuboid, each cell has a nonnegative value indicating the stock volume bought by a particular agent at a particular time.
Besides the core cuboid, the data cube consists of three two-dimensional (2D) cuboids (denoted by "by stock and agent," "by agent and time," and "by time and stock," respectively), three one-dimensional (1D) cuboids (denoted by "by stock," "by agent," and "by time," respectively), and one zero-dimensional cuboid (the grand total). These cuboids can be computed by aggregating the cell values in the core cuboid across one or more dimensions. In general, an $n$-dimensional cube is associated with $2^n$ cuboids. The various cuboids, except the core cuboid, are called star cuboids in this paper.

H. Lu is with the Management Science and Information Systems Department, Rutgers University, 180 University Avenue, Newark, NJ. E-mail: haibing@cimic3.rutgers.edu.
Y. Li is with the School of Information Systems, Singapore Management University, 80 Stamford Road, Singapore. E-mail: yjli@smu.edu.sg.
Manuscript received 2 June 2006; accepted 17 June 2007; published online 31 July. For information on obtaining reprints of this article, please send e-mail to tdsc@computer.org, and reference IEEECS Log Number TDSC. Digital Object Identifier no. /TDSC.

We consider the following inference problem in data cubes. Assuming that the core cuboid contains sensitive information about each cell but that none of the star cuboids contain sensitive information, can a data snooper infer accurate sensitive information about any cell using the nonsensitive information provided in the star cuboids? In the above data cube example, each cell in the core cuboid shows which agent has bought which stock at what time and in what volumes. Such information can be considered sensitive, as it reveals each agent's strategy for stock investment. In many cases, the cell values in a core cuboid reveal private information about individuals. For example, in a patient-treatment cube, each cell indicates the number of times that a patient undergoes a certain treatment (for example, for AIDS), which is highly sensitive in real life. In student record management, each cell in the data cube shows the grade a student received for a particular course.
The sensitive information in these cases should not be released to the public.

However, although the data in a core cuboid must be protected, the aggregated information in a star cuboid is considered nonsensitive in most cases. Thus, the star cuboids can usually be provided to the public for statistical analysis, data mining, and OLAP services. The inference problem exists since aggregations do not completely protect the sensitive information [4]. It is possible for data snoopers to use the remaining vestiges, together with external knowledge, to infer sensitive information in a core cuboid. Traditional access control [5] cannot capture these inferences, as the aggregations themselves are seemingly innocent. However, limiting such malicious inference of sensitive information is a realistic concern in practice, especially when large data cube products such as a national census or survey are released. This concern is demonstrated by the US Department of Commerce requirement that national statistical offices prevent unauthorized disclosure of sensitive subject-level data when releasing aggregations.

To limit possible disclosure of sensitive information in a data cube, we need to know how accurately a data snooper can estimate the sensitive information. In particular, we need to know how to calculate the lower and upper bounds for each cell value given the aggregation values in the star cuboids. This is the fundamental problem for inference control in data cubes. The lower and upper bounds induced by some fixed set of aggregations are of great importance in measuring the disclosure risk associated with the release of aggregations [6].

In recent years, the problem has become more acute in that applications of the data cube model enable online and query-based accesses to large-scale data sets. Even national statistical offices are moving from periodic releases of static tabulations to online services that provide a large number of users with dynamically updated data sets. The traditional linear programming approach to inference control would not be efficient in such a scenario.

Fig. 1. A data cube example.

In this paper, we revisit the Fréchet bounds [7] to solve the inference problem for 2D regular data cubes. The Fréchet bounds of a cell value are first proven to be exact lower and upper bounds. We then propose the first practical solution for estimating exact bounds in 2D irregular data cubes, which are data cubes that contain cell values that are known to a snooper. Our results imply that the exact bounds cannot be obtained by a straightforward extension of the Fréchet bounds in some cases. We then propose a new approach to improve the classic Fréchet bounds for any high-dimensional data cube in the most general case. The proposed approach improves upon the Fréchet bounds in the sense that it gives no-looser bounds yet is simpler in terms of time complexity. Based on our solutions to the inference problem, we discuss various security applications of our results, including privacy protection for released data, fine-grained access control, and auditing.

The rest of this paper is organized as follows: Section 2 formulates the inference problem in data cubes. Section 3 argues why traditional linear programming is highly impractical for solving the inference problem. Section 4 solves the inference problem in two dimensions based on the Fréchet bounds. Section 5 presents our new approach to improve the Fréchet bounds in high-dimensional data cubes. Section 6 discusses various security applications of our results. Section 7 reviews related work. Finally, Section 8 concludes this paper and identifies some future research directions. The appendices provide formal proofs for the theorems proposed in this paper.

A six-page extended abstract of this paper appeared in the 2006 IEEE Symposium on Security and Privacy [8].
Added in this complete version are all of the formal proofs and detailed discussions, which are important and represent a major contribution of this paper.

2 PROBLEM FORMULATION

An $n$-dimensional data cube $C$ is a collection of cuboids, including a core cuboid and star cuboids, across the spectrum of $n-1$ dimensions to zero dimensions. Each dimension $i$ ($1 \le i \le n$) has $d_i$ index values $1, \ldots, d_i$. The core cuboid is an $n$-dimensional array with $\prod_{i=1}^{n} d_i$ cell values. Let $a_{t_1 t_2 \cdots t_n}$ be the value at cell $(t_1, t_2, \ldots, t_n)$, where $1 \le t_i \le d_i$. There are $\binom{n}{m}$ $(n-m)$-dimensional star cuboids for data cube $C$, where $1 \le m \le n$. Each $(n-m)$-dimensional star cuboid is an $(n-m)$-dimensional array derived from the core cuboid by aggregating the cell values along $m$ dimensions. The aggregation function is SUM.¹ Let $\{a_{+ t_2 \cdots t_n}\}$ be the $(n-1)$-dimensional star cuboid derived by aggregating the cell values along the first dimension. For any meaningful $t_2, \ldots, t_n$, we have

$$a_{+ t_2 \cdots t_n} = \sum_{t_1=1}^{d_1} a_{t_1 t_2 \cdots t_n}.$$

(There is no ambiguity when $+$ is used in a subscript; it does not mean a literal addition operation.) Other star cuboids can be denoted similarly.

The inference problem in data cube $C$ is stated as follows: given all $(n-1)$-dimensional star cuboids, compute the lower and upper bounds for each cell value $a_{t_1 t_2 \cdots t_n}$ in the core cuboid. In mathematical terms, this can be framed as follows: compute the lower and upper bounds for each cell $a_{t_1 \cdots t_n}$ such that

$$\sum_{t'_i=1}^{d_i} a_{t'_1 \cdots t'_n} = a_{t'_1 \cdots t'_{i-1} + t'_{i+1} \cdots t'_n}$$

holds for any $1 \le i \le n$ and for any meaningful combination of $t'_1, \ldots, t'_{i-1}, t'_{i+1}, \ldots, t'_n$.

In the formulation of the inference problem, only the aggregations in $(n-1)$-dimensional star cuboids are considered. The reason is that the aggregations in other star cuboids (that is, aggregations of the cell values along two or more dimensions) can be easily derived from those aggregations provided in the $(n-1)$-dimensional star cuboids.
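The notation above can be made concrete with a small sketch. The function name `star_cuboid`, the dictionary representation of a cuboid, and the sample values are illustrative assumptions, not part of the paper; the sketch only shows how an $(n-1)$-dimensional star cuboid is obtained with SUM, and how a lower-dimensional star cuboid follows from it.

```python
def star_cuboid(core, axis):
    """Aggregate a cuboid along one dimension with SUM.

    `core` maps index tuples (t_1, ..., t_n) to cell values. The result maps
    the same tuples, with "+" written at position `axis`, to the sums.
    """
    star = {}
    for cell, value in core.items():
        key = cell[:axis] + ("+",) + cell[axis + 1:]
        star[key] = star.get(key, 0) + value
    return star


# Hypothetical 2 x 2 core cuboid with 1-based index values.
core = {(1, 1): 3, (1, 2): 1, (2, 1): 2, (2, 2): 4}

column_sums = star_cuboid(core, 0)          # the star cuboid {a_{+j}}
row_sums = star_cuboid(core, 1)             # the star cuboid {a_{i+}}
grand_total = star_cuboid(column_sums, 1)   # a_{++}, derived from {a_{+j}}
```

The last line illustrates why only the $(n-1)$-dimensional star cuboids need to be considered: the grand total is recomputed from a released star cuboid without touching the core cuboid.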
A value $\underline{a}_{t_1 t_2 \cdots t_n}$ is said to be a lower bound of cell value $a_{t_1 t_2 \cdots t_n}$ in data cube $C$ if for any possible core cuboid $\{a'_{t_1 t_2 \cdots t_n}\}$ from which the star cuboids of $C$ can be derived, the inequality $a'_{t_1 t_2 \cdots t_n} \ge \underline{a}_{t_1 t_2 \cdots t_n}$ holds. A value $\underline{a}_{t_1 t_2 \cdots t_n}$ is said to be the exact lower bound of cell value $a_{t_1 t_2 \cdots t_n}$ in data cube $C$ if 1) it is a lower bound and 2) there exists a core cuboid $\{a'_{t_1 t_2 \cdots t_n}\}$ from which the star cuboids of $C$ can be derived and the equality $a'_{t_1 t_2 \cdots t_n} = \underline{a}_{t_1 t_2 \cdots t_n}$ holds. An upper bound or exact upper bound $\overline{a}_{t_1 t_2 \cdots t_n}$ can be defined similarly.

An upper/lower bound $a'$ of cell value $a$ is said to be tighter (no tighter, respectively) than another upper/lower bound $a''$ of the same cell value $a$ if $a'$ is closer (no closer, respectively) to the exact upper/lower bound of $a$ than $a''$; otherwise, $a'$ is said to be a no-looser (looser, respectively) bound in comparison with $a''$, meaning it is at least as tight as $a''$.

Without loss of generality, we assume throughout this paper that all cell values $a_{t_1 t_2 \cdots t_n}$ in a data cube are nonnegative real numbers. If this is not the case, one can add an appropriate constant positive value to all cell values so as to transform the data cube to a data cube with nonnegative cell values. After a bound is computed for a transformed cell value in the new data cube, one can subtract the constant value from it in order to get the bound for the original cell value. Note that in the statistical data protection literature, a core cuboid with nonnegative integer cell values is often called a contingency table.

1. SUM can be extended to AVG provided that the number of cells involved in aggregation is known.

3 THE IMPRACTICALITY OF USING LINEAR PROGRAMMING

The exact bounds $[\underline{a}_{t_1 t_2 \cdots t_n}, \overline{a}_{t_1 t_2 \cdots t_n}]$ for any cell value $a_{t_1 t_2 \cdots t_n}$ in our inference problem are the solutions to the following two linear programming problems (LPs): 1) $\underline{a}_{t_1 t_2 \cdots t_n} = \min a_{t_1 t_2 \cdots t_n}$, and 2) $\overline{a}_{t_1 t_2 \cdots t_n} = \max a_{t_1 t_2 \cdots t_n}$. Both are subject to the linear constraints

$$\sum_{t'_i=1}^{d_i} a_{t'_1 \cdots t'_n} = a_{t'_1 \cdots t'_{i-1} + t'_{i+1} \cdots t'_n}$$

for any $1 \le i \le n$ and for any meaningful combination of $t'_1, \ldots, t'_{i-1}, t'_{i+1}, \ldots, t'_n$. The $\sum_{i=1}^{n} d_1 \cdots d_{i-1} d_{i+1} \cdots d_n$ constraints define a nonempty convex feasibility set for the two LPs. According to linear programming theory, there exist optimal solutions $\underline{a}_{t_1 t_2 \cdots t_n}$ and $\overline{a}_{t_1 t_2 \cdots t_n}$, and these solutions can be computed in polynomial time.²

We argue that linear programming does not scale sufficiently for solving the inference problem for realistic data cubes. One of the most efficient algorithms for linear programming is Karmarkar's algorithm [9], whose time complexity is $O(N^{3.5} L)$, where $N$ is the number of variables, and $L$ is the number of bits required to store the LP in a computer. In the LPs given above, $N = \prod_{i=1}^{n} d_i$, and $L = O(\prod_{i=1}^{n} d_i)$. Thus, the time complexity of solving each LP is $O((\prod_{i=1}^{n} d_i)^{4.5})$, which is prohibitive for processing realistic data cubes. This conclusion has also been drawn by Dobra et al. in [10] by showing a realistic data cube (14-dimensional public survey table) with 4.5 billion cells.

4 TWO-DIMENSIONAL DATA CUBES

In this section, we consider the inference problem for 2D data cubes. Based on the early work of Fréchet [7], it is well known that the following Fréchet bounds are exact for solving the inference problem.

Statement 1 (2D Fréchet bounds). Given two star cuboids $\{a_{+j}\}$ and $\{a_{i+}\}$ of 2D data cube $C$, the 2D Fréchet bounds for any cell value $a_{ij}$ in $C$ are

$$\max\{0,\; a_{i+} + a_{+j} - a_{++}\} \le a_{ij} \le \min\{a_{i+},\; a_{+j}\}.$$

Theorem 4.1 (solution to the inference problem in two dimensions). Two-dimensional Fréchet bounds are exact.

A proof sketch of the above theorem was outlined by Cox in [11] via a stepping-stone algorithm. A formal construction proof is presented in Appendix A. Compared with LP, the Fréchet bounds reduce the time complexity of computing the exact bounds of a cell value from $O((d_1 d_2)^{4.5})$ to two addition/subtraction operations and two comparison operations.

Unfortunately, Theorem 4.1 may not hold if some of the cell values are known to a data snooper. This is shown by a counterexample in Appendix B.
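As an illustration of Statement 1, the following minimal sketch computes the 2D Fréchet bounds of one cell from the two released star cuboids. The function name, the 0-based list representation, and the sample sums are assumptions made for the example only.

```python
def frechet_bounds_2d(row_sums, col_sums, i, j):
    """Return the 2D Frechet bounds (lower, upper) of cell a_ij.

    row_sums[i] plays the role of a_{i+}, col_sums[j] the role of a_{+j};
    indices are 0-based here, unlike the 1-based notation of the text.
    """
    grand_total = sum(row_sums)  # a_{++}; equals sum(col_sums) for a valid cube
    lower = max(0, row_sums[i] + col_sums[j] - grand_total)
    upper = min(row_sums[i], col_sums[j])
    return lower, upper


# Hypothetical released aggregations of a hidden 2 x 2 core cuboid.
row_sums = [4, 6]
col_sums = [5, 5]
print(frechet_bounds_2d(row_sums, col_sums, 0, 0))  # (0, 4)
print(frechet_bounds_2d(row_sums, col_sums, 1, 1))  # (1, 5)
```

Once the grand total is available, each cell costs two additions/subtractions and two comparisons, which matches the operation count quoted above.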
A snooper may know some of the cell values either because these values are nonsensitive and, thus, not protected or because the snooper has some external knowledge about these cells. For example, in a patient-treatment data set in which each cell indicates the number of times that a patient undergoes a certain treatment, a snooper who is also a patient would know his or her own cell value and may also know some of the other cell values for his or her patient friends. We investigate how to estimate the exact bounds in such a scenario.

Assume that one or more subcore cuboids are known to a snooper. A subcore cuboid is a subset of the cell values defined as $\{a_{ij} \mid i \in S_1,\, j \in S_2\}$, where $S_1 \subseteq \{1, \ldots, d_1\}$, and $S_2 \subseteq \{1, \ldots, d_2\}$. Then, the inference problem becomes the following: given all $(n-1)$-dimensional star cuboids and a collection of subcore cuboids, calculate the lower and upper bounds for each cell value in the core cuboid in excess of (that is, outside) the union of the given subcore cuboids. This problem is more generic due to the modeling of unprotected cells and/or a snooper's external knowledge. From the linear programming point of view, this problem is simpler than the original one as it has fewer variables (the number of constraints is not necessarily smaller). However, from the Fréchet bounds point of view, this problem is more difficult to solve since we need to consider what additional information a data snooper may obtain from the known cells.

2. If the core cuboid consists of integer counts, the LPs become integer programming problems (IPs). Since the feasibility set of the IPs is nonempty and finite, there exist optimal solutions in this context as well. An IP usually takes a much longer time to solve than the corresponding LP.

Fig. 2. Fréchet upper bound is not the exact upper bound in an irregular data cube.

To solve this problem, we transform it into a normalized form. Let $\{A_k\}$ be the set of subcore cuboids whose cell values are known to a snooper. Let the core cuboid in excess of the union of the subcore cuboids be called the irregular core cuboid.
Let the star cuboids derived from the irregular core cuboid be $\{a_{i+} - \sum_{j:\, a_{ij} \in \cup_k A_k} a_{ij}\}$, $\{a_{+j} - \sum_{i:\, a_{ij} \in \cup_k A_k} a_{ij}\}$, and $a_{++} - \sum_{i,j:\, a_{ij} \in \cup_k A_k} a_{ij}$. Let the union of the irregular core cuboid and the star cuboids derived from it be called the irregular data cube. The normalized form of the inference problem is described as follows: given all $(n-1)$-dimensional star cuboids in an irregular data cube, calculate the lower and upper bounds for each cell value in the irregular core cuboid.

It is clear that the normalized form is equivalent to the original inference problem. (After normalization, the known values can be marked as zeros in the original core cuboid. With no ambiguity, we still use $\{a_{i+}\}$, $\{a_{+j}\}$, and $a_{++}$ to represent the star cuboids in an irregular data cube after normalization.) Consequently, the Fréchet bounds of $a_{ij}$ are still in the form of

$$\max\{a_{i+} + a_{+j} - a_{++},\; 0\} \le a_{ij} \le \min\{a_{i+},\; a_{+j}\}$$

in an irregular data cube. It is easy to verify that the Fréchet bounds after the normalization are no looser than the Fréchet bounds before the normalization. Below, the Fréchet bounds in an irregular data cube always refer to those after the normalization.

Lemma 4.2 (exact lower bound in a particular irregular data cube). Given a 2D irregular data cube, if no cell values in row $i$ or column $j$ are known to a snooper, then the Fréchet lower bound of $a_{ij}$ is exact.

A construction proof of this lemma is provided in Appendix C. Note that the Fréchet upper bound of $a_{ij}$ may not be exact. Fig. 2 illustrates a counterexample in which two zero values are known to a snooper. In this example, the Fréchet upper bound of $a_{11}$ is 15, whereas the exact upper bound of $a_{11}$ is 11.

In reality, certain cell values in row $i$ or column $j$ may be known to the snooper. Then, Lemma 4.2 cannot be applied for computing the exact lower bound. To improve the lower bound given in Lemma 4.2, we define the companion cuboid of $a_{ij}$ to be the subcore cuboid $A_{ij} = \{a_{t_1 t_2} \mid a_{t_1 j},\, a_{i t_2} \notin \cup_k A_k\}$, where $\{A_k\}$ is a collection of subcore cuboids that are known to a snooper. Then, we have the following theorem.

Theorem 4.3 (exact lower bound in an irregular data cube). Given a 2D irregular data cube, if the sum of all cell values in the companion cuboid of $a_{ij}$ is known to a snooper, then the Fréchet lower bound of $a_{ij}$ in the companion cuboid is exact.

A construction proof is provided in Appendix D. Note that Theorem 4.3 cannot be proven by a direct application of the Fréchet bounds for two reasons: 1) in the companion cuboid of $a_{ij}$, one may not know all aggregations except $a_{i+}$ and $a_{+j}$, and 2) there could be some cells inside the companion cuboid that are known to a data snooper. It might be possible that the first reason gives a snooper less information, whereas the second reason gives more. Regardless of these reasons, Theorem 4.3 asserts that the Fréchet lower bound is still exact.

In the case that the sum of all of the cell values in the companion cuboid of $a_{ij}$ is not known to a snooper, the Fréchet lower bound in the companion cuboid cannot be computed by the snooper. We will show that the Fréchet lower bound in the companion cuboid is at least as tight as the exact lower bound of $a_{ij}$. An inference auditor who has access to all of the cell values can always calculate the grand total of the companion cuboid and use the Fréchet lower bound in the companion cuboid to estimate the exact lower bound.

Theorem 4.4 (no-looser estimate of the exact lower bound in an irregular data cube). Given a 2D irregular data cube, if the sum of all cell values in the companion cuboid of $a_{ij}$ is unknown to a snooper, then the Fréchet lower bound of $a_{ij}$ in the companion cuboid is at least as tight as the exact lower bound of $a_{ij}$.

A formal proof is given in Appendix E. Although Theorem 4.4 gives a no-looser estimate of the exact lower bound in any irregular data cube, we now propose a no-tighter estimate of the exact lower bound.
First, we define two companion sums of $a_{ij}$ to be

$$c^1_{ij} = \sum_{t_2} \{a_{+t_2} \mid a_{i t_2} \notin \cup_k A_k\} \quad \text{and} \quad c^2_{ij} = \sum_{t_1} \{a_{t_1 +} \mid a_{t_1 j} \notin \cup_k A_k\},$$

where $\{A_k\}$ is a collection of subcore cuboids that are known to a snooper (note that a snooper can calculate the companion sums from released aggregations). Then, we have the following theorem.

Theorem 4.5 (no-tighter estimate of the exact lower bound in an irregular data cube). Given a 2D irregular data cube, if the sum of all cell values in the companion cuboid of $a_{ij}$ is unknown to a snooper, then

$$\max\{a_{+j} + a_{i+} - c^1_{ij},\; a_{+j} + a_{i+} - c^2_{ij},\; 0\}$$

is a lower bound of $a_{ij}$.

A formal proof is given in Appendix F. Since the companion sums are no greater than the grand total, the lower bound proposed in Theorem 4.5 is no looser than the Fréchet lower bound (after normalization). Thus, the bounds proposed in Theorems 4.3 and 4.4 are also no looser than the Fréchet lower bound.

So far, our study has been primarily focused on estimating the exact lower bound in an irregular data cube. In inference control, a frequently asked question is whether a particular cell value is greater than zero (for example, a patient is HIV positive) or greater than a threshold (for example, an agent buys a large-enough volume of stock). Estimating the exact lower bound of a cell value is the most useful way to answer such questions. To estimate the exact upper bound in an irregular data cube, one can use the Fréchet upper bound (after normalization) and further improve it by using the shuttle algorithm (see Section 5.2) based on the estimates of the exact lower bound provided above.

5 HIGH-DIMENSIONAL DATA CUBES

For regular data cubes, the Fréchet bounds have been extended to $n$ dimensions [12].

Statement 2 ($n$-dimensional Fréchet bounds).
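In an $n$-dimensional data cube, the Fréchet lower bound for any cell $a_{t_1 \cdots t_n}$ equals the maximum of zero and the $\binom{n}{2}$ possible 2D Fréchet lower bounds:

$$\max\{0,\; a_{t_1 \cdots t_{i-1} + t_{i+1} \cdots t_n} + a_{t_1 \cdots t_{j-1} + t_{j+1} \cdots t_n} - a_{t_1 \cdots t_{i-1} + t_{i+1} \cdots t_{j-1} + t_{j+1} \cdots t_n} \mid 1 \le i < j \le n\},$$

and the Fréchet upper bound of $a_{t_1 \cdots t_n}$ is the minimum of the aggregation values in the $(n-1)$-dimensional star cuboids to which the cell value contributes:

$$\min\{a_{t_1 \cdots t_{i-1} + t_{i+1} \cdots t_n} \mid 1 \le i \le n\}.$$

In particular, the three-dimensional Fréchet bounds for cell value $a_{ijk}$ are

$$\max\left\{\begin{array}{l} 0 \\ a_{+jk} + a_{i+k} - a_{++k} \\ a_{+jk} + a_{ij+} - a_{+j+} \\ a_{i+k} + a_{ij+} - a_{i++} \end{array}\right\} \le a_{ijk} \le \min\left\{\begin{array}{l} a_{+jk} \\ a_{i+k} \\ a_{ij+} \end{array}\right\}.$$

The time complexity of computing the $n$-dimensional Fréchet bounds for each cell value is $O(\binom{n}{2}) = O(n^2)$. This complexity is significantly lower than the complexity $O((\prod_{i=1}^{n} d_i)^{4.5})$ of using linear programming to compute the exact bounds.

Unfortunately, the Fréchet bounds may not be exact for any high-dimensional data cube in general (there are special cases based on decomposability into graph structures, as discussed in Section 7). This has been proven by Cox [11] with counterexamples. Below, we propose a formulation of new bounds that are no looser than the Fréchet bounds in the most general case and whose time complexity is simpler than that of the Fréchet bounds. Our bounds are also no looser than the recent improvements on the Fréchet bounds in high dimensions (see Section 5.2).

Statement 3 ($n$-dimensional new bounds). Given the star cuboids of an $n$-dimensional data cube $C$, the new lower bound for any cell value $a_{t_1 \cdots t_n}$ in $C$ is

$$\max\Big\{0,\; a_{t_1 \cdots t_{i-1} + t_{i+1} \cdots t_n} - \sum_{t \ne t_i} \min\{a_{+ t_2 \cdots t_{i-1} t\, t_{i+1} \cdots t_n},\; a_{t_1 + t_3 \cdots t_{i-1} t\, t_{i+1} \cdots t_n},\; \ldots,\; a_{t_1 \cdots t_{i-1} t\, t_{i+1} \cdots t_{n-1} +}\} \;\Big|\; 1 \le i \le n\Big\},$$

where the inner minimum ranges over the aggregations along every dimension other than $i$. Let $\underline{a}_{t_1 \cdots t_n}$ be the new lower bound of $a_{t_1 \cdots t_n}$. The new upper bound of $a_{t_1 \cdots t_n}$ is

$$\min\Big\{a_{t_1 \cdots t_{i-1} + t_{i+1} \cdots t_n} - \sum_{t \ne t_i} \underline{a}_{t_1 \cdots t_{i-1} t\, t_{i+1} \cdots t_n} \;\Big|\; 1 \le i \le n\Big\}.$$

In particular, the 3D new bounds for cell value $a_{ijk}$ are

$$\underline{a}_{ijk} = \max\left\{\begin{array}{l} 0 \\ a_{+jk} - \sum_{t \ne i} \min\{a_{t+k},\, a_{tj+}\} \\ a_{i+k} - \sum_{t \ne j} \min\{a_{+tk},\, a_{it+}\} \\ a_{ij+} - \sum_{t \ne k} \min\{a_{+jt},\, a_{i+t}\} \end{array}\right\}, \qquad \overline{a}_{ijk} = \min\left\{\begin{array}{l} a_{+jk} - \sum_{t \ne i} \underline{a}_{tjk} \\ a_{i+k} - \sum_{t \ne j} \underline{a}_{itk} \\ a_{ij+} - \sum_{t \ne k} \underline{a}_{ijt} \end{array}\right\}.$$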
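The 3D case of Statement 3 can be sketched directly from the formulas above. The function names, the nested-list layout (`a_pjk[j][k]` for $a_{+jk}$, `a_ipk[i][k]` for $a_{i+k}$, `a_ijp[i][j]` for $a_{ij+}$), the 0-based indices, and the sample core cuboid are all assumptions chosen for the illustration; this is the straightforward computation, not the faster variant discussed in Section 5.1.

```python
from itertools import product


def marginals_3d(core):
    """Compute the three 2D star cuboids of a 3D core cuboid core[i][j][k]."""
    d1, d2, d3 = len(core), len(core[0]), len(core[0][0])
    a_pjk = [[sum(core[i][j][k] for i in range(d1)) for k in range(d3)] for j in range(d2)]
    a_ipk = [[sum(core[i][j][k] for j in range(d2)) for k in range(d3)] for i in range(d1)]
    a_ijp = [[sum(core[i][j][k] for k in range(d3)) for j in range(d2)] for i in range(d1)]
    return a_pjk, a_ipk, a_ijp


def new_bounds_3d(a_pjk, a_ipk, a_ijp):
    """Return dicts (lower, upper) keyed by (i, j, k), following the 3D formulas."""
    d1, d2, d3 = len(a_ipk), len(a_pjk), len(a_pjk[0])
    cells = list(product(range(d1), range(d2), range(d3)))
    lower = {}
    for i, j, k in cells:
        lower[i, j, k] = max(
            0,
            a_pjk[j][k] - sum(min(a_ipk[t][k], a_ijp[t][j]) for t in range(d1) if t != i),
            a_ipk[i][k] - sum(min(a_pjk[t][k], a_ijp[i][t]) for t in range(d2) if t != j),
            a_ijp[i][j] - sum(min(a_pjk[j][t], a_ipk[i][t]) for t in range(d3) if t != k),
        )
    upper = {}
    for i, j, k in cells:  # upper bounds reuse the lower bounds of sibling cells
        upper[i, j, k] = min(
            a_pjk[j][k] - sum(lower[t, j, k] for t in range(d1) if t != i),
            a_ipk[i][k] - sum(lower[i, t, k] for t in range(d2) if t != j),
            a_ijp[i][j] - sum(lower[i, j, t] for t in range(d3) if t != k),
        )
    return lower, upper


# Hypothetical 2 x 2 x 2 core cuboid, used only to produce star cuboids.
core = [[[5, 0], [1, 2]], [[0, 3], [4, 1]]]
lower, upper = new_bounds_3d(*marginals_3d(core))
```

Only the star cuboids enter `new_bounds_3d`; the core cuboid is used here just to generate consistent inputs and to check that every true cell value lies between its computed bounds.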

Theorem 5.1 (comparing new bounds with Fréchet bounds). The new bounds are at least as tight as the Fréchet bounds in $n$ dimensions.

The proof is given in Appendix G. It is not difficult to prove that the new bounds are the same as the Fréchet bounds in two dimensions. We leave this as an exercise.

Note that the new bounds can be directly applied to irregular data cubes following the normalization process shown in Section 4. An alternative approach is to resort to the high-dimensional Fréchet bounds. Recall that in $n$ dimensions, the Fréchet bounds are derived from $\binom{n}{2}$ 2D Fréchet bounds, each of which can be computed using the method proposed in Section 4. However, this approach is more complex than the application of our new bounds.

5.1 Complexity Reduction

At first glance, computing our new bounds appears to be more complex than computing the Fréchet bounds. If computed in a straightforward manner, the new lower bound for each cell requires $n-2$ comparison operations to compute each $\min$, and $(d_i - 1)(n-2)$ comparison operations and $(d_i - 2)$ addition operations to get each $\sum$; thus, it requires $(\sum_i d_i - n)(n-2) + n$ comparison operations and $\sum_i d_i - n$ addition and subtraction operations to get the final $\max$. The time complexity³ of computing the new lower bound in this way is $O(n \sum_{i=1}^{n} d_i)$. After all of the lower bounds are obtained, the new upper bound for each cell can be computed in $\sum_{i=1}^{n} d_i - n$ addition and subtraction operations and $n-1$ comparison operations. Since the computation of the upper bound depends on the lower bounds, its complexity is also $O(n \sum_{i=1}^{n} d_i)$. In comparison, the time complexity of computing the Fréchet bounds (which is dominated by computing the lower bound) is $O(n^2)$.

However, one can reduce the time complexity of computing the new bounds by precomputation and transformation. Let $\tilde{a}_{t_1 \cdots t_n} = \min\{a_{+ t_2 \cdots t_n},\; a_{t_1 + t_3 \cdots t_n},\; \ldots,\; a_{t_1 \cdots t_{n-1} +}\}$. We have the following theorem.

Theorem 5.2 (transformation of the new lower bound).
The new lower bound for cell value $a_{t_1 \cdots t_n}$ can be transformed as

$$\max\Big\{0,\; a_{t_1 \cdots t_{i-1} + t_{i+1} \cdots t_n} - \sum_{t \ne t_i} \tilde{a}_{t_1 \cdots t_{i-1} t\, t_{i+1} \cdots t_n} \;\Big|\; 1 \le i \le n\Big\}.$$

The proof is given in Appendix H. According to this theorem, one can precompute all $\tilde{a}_{t_1 \cdots t_n}$ before computing the new bounds. During this process, each cell requires at most $n-1$ comparison operations. The computation of the new lower bound for each cell requires $\sum_{i=1}^{n} d_i - n$ addition and subtraction operations and $n$ comparison operations. After all of the lower bounds are obtained, each new upper bound requires $\sum_{i=1}^{n} d_i - n$ addition and subtraction operations and $n-1$ comparison operations. The time complexity of computing the new bounds in this manner is thus $O(\sum_{i=1}^{n} d_i)$.

This complexity $O(\sum_{i=1}^{n} d_i)$ is not only much simpler than that of linear programming, $O((\prod_{i=1}^{n} d_i)^{4.5})$, but also simpler than that of the Fréchet bounds, $O(n^2)$, in the case that $d_i$ is bounded. In real-world applications, a data cube is usually built from a database relation with a large number of attributes (it is common to see tens or hundreds of attributes in applications); however, the number of categories (that is, $d_i$) for each attribute is usually bounded (certain attributes such as binary attributes have very small $d_i$). In such cases, the time complexity of our new bounds is linear in $n$, whereas that of the Fréchet bounds is quadratic.

3. The time complexity is derived solely based on the number of addition, subtraction, or comparison operations. We do not address issues such as data structure, memory cost, and I/O cost in this paper.

Fig. 3. Comparison of bounds using Fienberg's example [13, Table 1].

5.2 Comparisons with Other Solutions

In recent years, rigorous efforts have been made to improve the Fréchet bounds in high dimensions. Most of the improvements take place in three dimensions, although some of them can be extended to $n$ dimensions. In this section, we compare our new bounds with the recent improvements, including Fienberg's bounding approach [13], Chowdhury et al.'s network models for bounds [14], Buzzigoli and Giusti's shuttle algorithm [15], and Dobra and Fienberg's generalized Fréchet bounds [6], [10], [16]. A review of most of these methods was given by Cox in [12].

5.2.1 Fienberg's Bounding Approach

Fienberg's bounding approach works in three dimensions [13]. As correctly pointed out by Cox in [12], the lower bound provided by Fienberg is equivalent to the Fréchet lower bound, whereas the upper bound (also called the Bonferroni upper bound of Fienberg) is no looser:

$$a_{ijk} \le \min\{a_{+jk},\; a_{i+k},\; a_{ij+},\; a_{+++} - a_{i++} - a_{+j+} - a_{++k} + a_{ij+} + a_{i+k} + a_{+jk}\} \quad \text{(Fienberg bound)}$$
$$\phantom{a_{ijk}} \le \min\{a_{+jk},\; a_{i+k},\; a_{ij+}\} \quad \text{(Fréchet bound)}.$$

Theorem 5.3 (comparing new bounds with Fienberg bounds). The new bounds are at least as tight as the Fienberg bounds in three dimensions.

The proof is given in Appendix I. The above theorem can be illustrated using the example shown in Fig. 3. The core cuboid in this example is a table of sample counts taken from the 1990 Decennial Census Public Use Sample. This example has also been used by Fienberg [13] and Cox [12] for comparing bounds. In this example, the Fienberg bounds are exactly the same as the Fréchet bounds.⁴ In comparison, our new bounds are the same as the exact bounds and tighter than the Fienberg bounds for certain cells.

4. Certain numeric errors in [13] regarding this example have been pointed out and corrected by Cox in [12].

5.2.2 Network Models for Bounds

Chowdhury et al. [14] presented network models for computing the exact bounds for integer cells in three dimensions. The network models provide a natural language to express 2D tables (or 2D star cuboids) and an efficient mechanism to compute the exact bounds. The problem addressed in [14] is, assuming that one 3D core cuboid and one of its three 2D star cuboids are protected, how to calculate the exact lower bound and upper bound for each aggregation value in the protected star cuboid, given the other two star cuboids. Chowdhury et al. constructed networks for expressing the connections between the star cuboids and proposed two simple matrix operators for obtaining the exact bounds. Although the method is very efficient, it deals with 2D star cuboids only. Cox's comments [12] on this method are that most generalizations beyond two dimensions are apt to fail and that the problem can be solved directly using the Fréchet bounds without recourse to networks.

5.2.3 Shuttle Algorithm

The shuttle algorithm is an iterative algorithm proposed by Buzzigoli and Giusti [15]. The basic idea is that for each cell value in three dimensions and each 2D aggregation containing the cell, a candidate lower bound is computed by subtracting from the aggregation the sum of the current upper bounds of all of the other cells contained in the aggregation. If the candidate lower bound improves the current lower bound, it replaces it. A similar procedure is used to improve the current upper bound with a candidate computed from the sum of the current lower bounds. The two-step procedure is repeated until the lower bounds or upper bounds for all of the cells are stationary. The shuttle algorithm can be easily extended to $n$ dimensions. The shuttle algorithm can work with any initial lower and upper bounds.
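The iteration described above can be sketched as follows. This is a simplified variant written for illustration, not Buzzigoli and Giusti's implementation: it treats every released aggregation as a generic constraint (a list of cells and their sum), updates lower and upper bounds in the same pass rather than in two separate steps, and caps the number of rounds with a hypothetical `max_rounds` parameter. All names and sample values are assumptions.

```python
def shuttle(constraints, lower, upper, max_rounds=100):
    """Tighten cell bounds by shuttling between lower and upper bounds.

    constraints: list of (cells, total) pairs, one per released aggregation.
    lower, upper: dicts with any valid initial bounds for every cell.
    """
    lower, upper = dict(lower), dict(upper)
    for _ in range(max_rounds):
        changed = False
        for cells, total in constraints:
            for c in cells:
                others = [o for o in cells if o != c]
                # Candidate lower bound: aggregation minus the others' upper bounds.
                cand_low = total - sum(upper[o] for o in others)
                # Candidate upper bound: aggregation minus the others' lower bounds.
                cand_up = total - sum(lower[o] for o in others)
                if cand_low > lower[c]:
                    lower[c], changed = cand_low, True
                if cand_up < upper[c]:
                    upper[c], changed = cand_up, True
        if not changed:  # all bounds are stationary
            break
    return lower, upper


# Hypothetical 2 x 2 example: row sums 4 and 6, column sums 5 and 5.
cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
constraints = [
    ([(0, 0), (0, 1)], 4), ([(1, 0), (1, 1)], 6),  # rows
    ([(0, 0), (1, 0)], 5), ([(0, 1), (1, 1)], 5),  # columns
]
# Deliberately weak initial bounds: zero and the grand total.
low, up = shuttle(constraints, {c: 0 for c in cells}, {c: 10 for c in cells})
```

Starting from the trivial bounds, this run settles on the same intervals as the 2D Fréchet bounds for these aggregations, which illustrates that the choice of initial bounds affects the work done rather than the validity of the result.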
The initial lower and upper bounds could be chosen from the Fréchet bounds, the Fienberg bounds, or our new bounds. In this sense, the shuttle algorithm is complementary to our work. Cox has correctly pointed out in [12] that the shuttle algorithm converges in a finite number of iterations if all of the cell values are integers. However, it is not clear how fast the algorithm converges. The time complexity of this algorithm is at least as high as that of the algorithm used for providing the initial bounds. A generalized version of this algorithm was developed by Dobra et al. [10].

5.2.4 Dobra and Fienberg's Generalized Fréchet Bounds

Dobra and Fienberg [6], [10], [16] studied exact lower and upper bounds, which they called generalized Fréchet bounds, for a specific type of high-dimensional statistical tables. A statistical table can be considered a data cube in which a nonnegative random variable is assigned to each cell. They assumed that the released set of marginals (that is, values in star cuboids) is the set of minimum sufficient statistics of a decomposable or reducible log-linear model. Under such an assumption, the exact lower and upper bounds of each cell can be expressed as explicit functions of the related marginals.

The difference between our work and Dobra and Fienberg's is clear. Since we do not make any assumption about the distribution of cell values, our results can be applied to any data cube in the most general case, regardless of the distribution of cell values. In contrast, Dobra and Fienberg's results pertain only to the reducible log-linear models with minimal sufficient statistics. Their results "represent only a small part of those needed to allow the computation of upper and lower bounds [...]" [16]. In a recent development, Dobra et al. [10] presented a hash-table-based structure and a generalized shuttle algorithm to exploit the extreme sparsity of large data sets.

6 DISCUSSIONS ON SECURITY APPLICATIONS

In this section, we discuss some security applications based on the solutions to the inference problem in data cubes.
6.1 Privacy Protection for Released Data

Privacy protection for released data has been a major concern in many applications such as statistical data publication, survey, and data mining. This concern is about how to preserve an individual's privacy in subject-level data when aggregation data is released. We consider data cubes in this application scenario (for example, data cube products such as a national census or survey are released).

When data aggregations are released, it is critical to ensure that the released data cannot be utilized by data snoopers to obtain private information. We classify the disclosure of private information into the following types based on what the private information means:

- Existence disclosure. The lower bound of a cell value is greater than zero (for example, a patient visits a doctor at least one time for a certain disease).
- Threshold upward disclosure. The lower bound of a cell value is greater than a certain threshold (for example, an agent buys a large-enough volume of certain stock).
- Threshold downward disclosure. The upper bound of a cell value is less than a certain threshold (for example, an agent does not buy a large-enough volume of certain stock).
- Approximation disclosure. The difference between the upper bound and lower bound of a cell value is less than a certain threshold (for example, a professor's salary falls into a small-enough range).

The existence and threshold upward disclosures are determined by the lower bounds that a snooper can infer, whereas the threshold downward disclosure and the approximation disclosure involve the upper bounds of protected cell values.

For any type of disclosure, we can determine which cells are subject to disclosure according to the exact bounds that a snooper may obtain (for example, through LP). There will be no mistakes in determining the cells if we use the exact bounds.
If the o-tighter bouds are used istead, there might be false egatives (cells subject to disclosure are cosidered subject to o disclosure) but o false positives (cells subject to o disclosure are cosidered subject to disclosure). If we use o-looser bouds, it may lead to false positives but o false egatives.
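The classification above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `Bounds` and `disclosure_types` and the three threshold parameters are assumptions introduced here, not part of the paper, and the bounds passed in are whatever bounds the auditor has computed for a cell (exact, no-tighter, or no-looser).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Lower and upper bound that a snooper can infer for one cell (hypothetical container)."""
    lower: float
    upper: float


def disclosure_types(b, up_threshold, down_threshold, width_threshold):
    """Return the set of disclosure types that the inferred bounds trigger for a cell."""
    found = set()
    if b.lower > 0:                        # existence disclosure
        found.add("existence")
    if b.lower > up_threshold:             # threshold upward disclosure
        found.add("threshold_upward")
    if b.upper < down_threshold:           # threshold downward disclosure
        found.add("threshold_downward")
    if b.upper - b.lower < width_threshold:  # approximation disclosure
        found.add("approximation")
    return found
```

With exact bounds the returned set is exactly the set of disclosures; with no-tighter or no-looser bounds it inherits the false negatives or false positives described above.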

Given a set $A^0 = \{a_{t_1^0 \cdots t_n^0}\}$ of cells that might be subject to disclosure, we now propose a generic approach, called k-anonymity partition, to limit the disclosure of those cells. Define the projection of $A^0$ to each dimension $i$ as $P_i = \{t_i^0\}$. Assume that $|P_i| = \min\{|P_1|, \ldots, |P_n|\}$ and $0 < k \le d_i$. The k-anonymity partition from dimension $i$ is defined by the following procedure:

- Partition the values in $P_i$ into groups of $k$ values. If $|P_i| \ge k$, then the last group may consist of more than $k$ values (for simplicity, we describe our method only for the groups of $k$ values). If $|P_i| < k$, then $k - |P_i|$ values from $D_i - P_i$ are combined with the values in $P_i$ to form a group, where $D_i = \{1, \ldots, d_i\}$ is the set of index values for dimension $i$.
- For each group of $k$ values $t_i^1, \ldots, t_i^k$ and for each dimension $j \ne i$ (without loss of generality, assume $j > i$), release the aggregation of sum values
$$a_{t_1 \cdots t_i^1 \cdots t_{j-1} + t_{j+1} \cdots t_n} + \cdots + a_{t_1 \cdots t_i^k \cdots t_{j-1} + t_{j+1} \cdots t_n}$$
instead of the individual sums $a_{t_1 \cdots t_i^1 \cdots t_{j-1} + t_{j+1} \cdots t_n}, \ldots, a_{t_1 \cdots t_i^k \cdots t_{j-1} + t_{j+1} \cdots t_n}$ in the star cuboid $\{a_{t_1 \cdots t_{j-1} + t_{j+1} \cdots t_n}\}$. In other words, any $k$ values $a_{t_1 \cdots t_i^1 \cdots t_n}, \ldots, a_{t_1 \cdots t_i^k \cdots t_n}$ are summed together in all $(n-1)$-dimensional star cuboids.

Other star cuboids can be processed similarly if they are released to the public. From the released star cuboids only, a snooper cannot differentiate among any $k$ values $a_{t_1 \cdots t_i^1 \cdots t_n}, \ldots, a_{t_1 \cdots t_i^k \cdots t_n}$.

Now, consider any cell $a_{t_1^0 \cdots t_n^0}$ that might be subject to disclosure before the k-anonymity partition (that is, $a_{t_1^0 \cdots t_n^0} \in A^0$). Since $t_i^0 \in P_i$, there exists a set of $k$ values of the form $a_{t_1 \cdots t_i^1 \cdots t_n}, \ldots, a_{t_1 \cdots t_i^k \cdots t_n}$ such that 1) $a_{t_1^0 \cdots t_n^0}$ is one of these $k$ values and 2) these $k$ values are always summed together in all star cuboids. Therefore, $a_{t_1^0 \cdots t_n^0}$ cannot be differentiated from a group of $k$ cells after the k-anonymity partition. A k-anonymity protection is thus achieved for those cells that might be subject to disclosure, at the price of reducing the number of aggregated values that are released in the star cuboids.
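The two steps of the partition can be sketched as follows for the simplest setting: a 2D cube partitioned from dimension $i$, where the star cuboid to be coarsened is a map from index values of dimension $i$ to sums. The function names, the dictionary representation, and the choice to fold a short last group into the previous one are assumptions made for illustration; the paper does not prescribe an implementation, and which filler values are taken from $D_i - P_i$ is left open there (this sketch takes the first ones in domain order).

```python
def k_anonymity_groups(flagged, domain, k):
    """Split the flagged index values P_i into groups of at least k values.

    flagged: index values of dimension i that appear in cells subject to disclosure (P_i).
    domain:  all index values of dimension i (D_i).
    """
    domain = list(domain)
    if not 0 < k <= len(domain):
        raise ValueError("k must satisfy 0 < k <= d_i")
    values = sorted(set(flagged))
    if not values:
        return []
    if len(values) < k:
        # Too few flagged values: pad the single group with values from D_i - P_i.
        fillers = [v for v in domain if v not in values][: k - len(values)]
        return [sorted(values + fillers)]
    groups = [values[s:s + k] for s in range(0, len(values), k)]
    if len(groups[-1]) < k:
        # The last group may hold more than k values: absorb the short tail.
        tail = groups.pop()
        groups[-1].extend(tail)
    return groups


def release_grouped_sums(marginal, groups):
    """Release one sum per group instead of the individual sums inside the group."""
    grouped = {v for g in groups for v in g}
    released = {(v,): s for v, s in marginal.items() if v not in grouped}
    for g in groups:
        released[tuple(g)] = sum(marginal[v] for v in g)
    return released
```

For example, with flagged values `[3]`, domain `[1, 2, 3, 4]`, and `k = 3`, the single group is `[1, 2, 3]`, and the row sums `{1: 10, 2: 4, 3: 6, 4: 5}` are released as `{(1, 2, 3): 20, (4,): 5}`.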
Let us consider what a snooper can infer for each group of $k$ values after the k-anonymity partition. Assume that the snooper can infer $a_{t_1^0 \cdots t_n^0} > \tau$ for existence or threshold upward disclosure before the k-anonymity partition, where $\tau \ge 0$ is a predetermined threshold. After the k-anonymity partition, the snooper can, at best, infer that $a_{t_1 \cdots t_i^1 \cdots t_n} + \cdots + a_{t_1 \cdots t_i^k \cdots t_n} > \tau$. The snooper cannot infer that any of the $k$ values has a nonzero lower bound. Thus, all $k$ values are safe from existence and threshold upward disclosures. For threshold downward disclosure and approximation disclosure, however, the inference about a group of $k$ values is determined by its upper bound. Generally, assume that a snooper can obtain $\tau_1 \le a_{t_1 \cdots t_i^1 \cdots t_n} + \cdots + a_{t_1 \cdots t_i^k \cdots t_n} \le \tau_2$ after the k-anonymity partition; then, the snooper can infer that all of these $k$ values are in the range $[0, \tau_2]$. If $\tau_2$ is small enough, this may be considered a disclosure. In such a case, one can choose large-valued sums in the partition or increase $k$ so as to increase the upper bounds.

6.2 Fine-Grained Access Control and Auditing

If the aggregations in a data cube are not to be released for public access, fine-grained access control and auditing can be applied to protect private information when users query the data cube on the server side. In this scenario, a user may be granted access to certain cell values and/or aggregation values provided that no private information is revealed from these values. We assume that appropriate authentication is enforced when a user queries the data cube. For each user, a subset of cell values is defined as private information. The disclosure types defined in Section 6.1 can still be used to describe the leakage of private information. To ensure that the server only answers those queries that do not reveal any private information, an auditing monitor is implemented to record all of the queries that have been asked by and answered to each user. To preserve the integrity of the auditing records, the auditing monitor must not be bypassed or tampered with.
When a user constructs a new set of queries, fine-grained access control is implemented to check whether the answers to this set of queries, combined with historical auditing records, reveal any private information. If not, the access request is granted; otherwise, it is denied.

The fine-grained access control can be easily performed with an application of our results, for three reasons. First, the bounds in our results can be computed in the presence of known cell values (thus, we do not need to resort to LP). Second, the bounds of a cell can be computed with a minimum number of aggregation values (instead of all aggregation values, as in LP). As a result, the server can quickly locate the relevant cells and compute their bounds given a set of known cell values and aggregation values. Third, the high efficiency of our method is critical for enforcing the access control in an online environment.

This access control is fine grained because it deals with ad hoc sets of cell/aggregation values. In comparison, the previous access/inference control method proposed for data cubes [17] deals with cuboids or slices of data as authorization objects. That method derives privacy breaches based on the logical relationships among authorization objects, rather than the bounds of underlying cell values. Due to these differences, it is complementary to ours.

7 RELATED WORK

Although the need for security protection in data cubes has been identified [18], the fundamental problem of inference control, which is how to efficiently calculate the lower and upper bounds for each cell given the aggregations, has not yet been fully addressed. A special case of this problem, the inference of exact values (that is, the lower bounds and upper bounds are the same) in data cubes, has been studied recently [19], [20], [21]. In [19], Brankovic et al. gave the maximum number of queries that can be answered without compromising any previously unknown values in a data cube. In [20], Wang et al. gave a tight upper bound for the number of known values such that a data cube is inference-free. In [21], it is proven that even queries (that is, queries in which an even number of cell values are involved in multidimensional axis-parallel cuboids) are not subject to exact inferences. In comparison, we address a more generic and practical problem regarding the inference of bounds rather than exact values in data cubes.

In the context of statistical databases, inference control (or privacy protection) has been extensively studied [22], [23], [24]. The proposed techniques can be roughly classified into perturbation based and restriction based. The perturbation-based techniques protect data against possible disclosure by adding random noise to source data [25], [26], [27], [28], [29], [30], query answers [31], or database structures [32]. Since these techniques inevitably introduce errors, they may not be appropriate for certain applications. The restriction-based techniques limit possible disclosure by posing restrictions on queries and/or source data. The advantage of this approach is that it does not introduce any errors. The trade-off is that it may reduce the amount of information that is provided for data services.

Our k-anonymity partition method falls into the restriction-based category. It borrows the k-anonymity concept proposed by Samarati and Sweeney [33], [34] for protecting microdata (individual respondent data). We extend it for protecting cell values in data cubes. Our k-anonymity partition also borrows ideas from the partition approach [35], [36], in which individual entities are clustered into a number of mutually exclusive subsets (called atomic populations). The difference is that our method partitions the sum values in the star cuboids rather than the individual values at the lowest level. Our method is similar to the microaggregation approach [23], [37] in the sense that certain sum values are clustered into mutually exclusive groups prior to publication. The difference is that our method clusters only a selected set of sum values, whereas the microaggregation approach clusters all individual records and then publishes the average over each group instead of the individual records. Another related work is cell suppression [11], [38], [39], [40], in which all cell values that might cause disclosure are suppressed either fully or partially from the released table(s). In comparison, we do not suppress any cell values but aggregate selected sum values in the star cuboids.

8 CONCLUSIONS AND FUTURE DIRECTIONS

Data cubes, including those related to data warehouses, data mining, and OLAP, are important decision-support tools for business and scientific applications.
Data cubes can be used to discover trends and patterns in a multidimensional and multilevel manner. Although data cubes restrict user access to predefined aggregations, an inappropriate inference of sensitive or private information about cell values may still occur. To protect the data, it is critical to discover such disclosure effectively and efficiently.

The main purpose of this paper is to provide practical solutions for calculating the lower and upper bounds for each cell value given the aggregations in a data cube. The lower and upper bounds tell us to what extent a data snooper can compromise the protected values. Although this problem can be solved using linear programming, the time complexity of this solution makes it prohibitive in practice. The same problem has been studied using different forms and terms in statistical data protection and statistical databases. The best method for finding practical solutions to this problem is one that was formulated by Fréchet in 1940, providing exact lower and upper bounds (Fréchet bounds) in the 2D case. We advance the concept of Fréchet bounds by contributing the following:

- We provide the first practical solution for estimating the lower and upper bounds in 2D irregular data cubes. Our results can be considered a nontrivial extension of the Fréchet bounds in irregular data cubes. In particular, we give the exact lower bound for each cell value and no-tighter and no-looser estimates of the exact lower bound, all of which are at least as tight as a straightforward extension of the Fréchet lower bound (after normalization) in irregular data cubes. The upper bound for each cell value is the same as the Fréchet upper bound (after normalization), and it may be improved through the application of the shuttle algorithm based on our lower bounds.
- We provide the first improvement of the Fréchet bounds in arbitrary dimensions for any nonnegative data cubes. We prove that our new bounds for each cell in $n$ dimensions are at least as tight as the $n$-dimensional Fréchet bounds and that the time complexity of our approach can be reduced to be linear in terms of the total number of indices in all dimensions. In contrast, the Fréchet bounds are quadratic in terms of the total number of dimensions. We also compare our new bounds with recent improvements of the Fréchet bounds. In particular, we prove that our bounds are at least as tight as the Fienberg bounds, that they provide a good starting point for the shuttle algorithm, and that they are more generic than the network models for bounds and the generalized Fréchet bounds.
- Based on the bounds that a data snooper can obtain for each cell, we discuss two security applications: privacy protection for released data and fine-grained access control and auditing. We classify the disclosure of private information into existence, threshold, and approximation disclosures and propose a k-anonymity partition method to protect the private information.

Our ongoing work includes an extension to dynamic data cubes in which the cell values may be frequently updated over time. For dynamic data cubes, new issues arise, including but not limited to disclosure about which cells have been updated and to what extent they have been updated. It would also be interesting to develop practical algorithms for computing exact bounds for large sparse data cubes.

APPENDIX A
PROOF OF THEOREM 4.1

Lemma 1.1. Given two sets of nonnegative values $\{a_{+j}\}$ and $\{a_{i+}\}$ that satisfy the consistency condition $\sum_j a_{+j} = \sum_i a_{i+} = a_{++}$, there exists a 2D (nonnegative) core cuboid $\{a_{ij}\}$ such that $\{a_{+j}\}$ and $\{a_{i+}\}$ are star cuboids of it.

Proof. A construction proof is provided. Consider two cases: 1) $a_{+1} + a_{1+} - a_{++} \ge 0$ and 2) $a_{+1} + a_{1+} - a_{++} < 0$. For Case 1, choose
$$\begin{cases} a_{11} = a_{+1} + a_{1+} - a_{++} \\ a_{1j} = a_{+j} & (j \ne 1) \\ a_{i1} = a_{i+} & (i \ne 1) \\ a_{ij} = 0 & (i \ne 1,\ j \ne 1). \end{cases}$$
The 2D (nonnegative) core cuboid $\{a_{ij}\}$ constructed this way can derive the star cuboids $\{a_{+j}\}$ and $\{a_{i+}\}$. For Case 2, choose $a_{11} = 0$. Due to the consistency condition, there must exist $\{a_{1j}\}_{j \ne 1}$ and $\{a_{i1}\}_{i \ne 1}$ such that $a_{1j} \le a_{+j}$, $a_{i1} \le a_{i+}$, and
$$\sum_{j \ne 1} a_{1j} = a_{1+}, \qquad \sum_{i \ne 1} a_{i1} = a_{+1}.$$
Thus, the cell values in the first row and the first column are determined in the core cuboid that is to be constructed. Peeling off the first row and column, a smaller 2D core cuboid is to be constructed with the revised star cuboids (see Fig. 4):
$$\begin{cases} a'_{+j} = a_{+j} - a_{1j}, & j = 2, \ldots, d_2 \\ a'_{i+} = a_{i+} - a_{i1}, & i = 2, \ldots, d_1 \\ a'_{++} = a_{++} - a_{+1} - a_{1+}. \end{cases}$$
These revised star cuboids still satisfy the consistency condition. Therefore, the above process can be applied recursively. In any recursive step, if Case 1 happens, the construction stops; otherwise, the construction process continues until the last row or column is peeled off. Finally, a 2D nonnegative core cuboid $\{a_{ij}\}$ is constructed, from which the star cuboids $\{a_{+j}\}$ and $\{a_{i+}\}$ can be derived. □

Fig. 4. Constructing a 2D core cuboid in Lemma 1.1.

Proof of Theorem 4.1. Without loss of generality, the theorem is proven for cell value $a_{11}$ only. First, prove that $\min\{a_{+1}, a_{1+}\}$ is the exact upper bound of $a_{11}$. Since all cell values are nonnegative, $\min\{a_{+1}, a_{1+}\}$ is an upper bound of $a_{11}$. To prove that it is the exact upper bound, one needs to prove that there exists a core cuboid $\{a'_{ij}\}$ such that 1) $a'_{11} = \min\{a_{+1}, a_{1+}\}$ and 2) the star cuboids $\{a_{i+}\}$ and $\{a_{+j}\}$ can be derived from it. Without loss of generality, assume that $a_{+1} \le a_{1+}$. Let $a'_{11} = \min\{a_{+1}, a_{1+}\} = a_{+1}$ and $a'_{i1} = 0$ for $i \ne 1$. The first column in the core cuboid $\{a'_{ij}\}$ is thus constructed. Peeling off this first column, a smaller 2D core cuboid is to be constructed with revised aggregation values $a'_{1+} = a_{1+} - a_{+1}$, $a'_{++} = a_{++} - a_{+1}$, $a'_{+j} = a_{+j}$ for $j = 2, \ldots, d_2$, and $a'_{i+} = a_{i+}$ for $i = 2, \ldots, d_1$. These aggregation values satisfy the consistency condition. From Lemma 1.1, a nonnegative core cuboid $\{a'_{ij}\}$ can be constructed with these aggregation values. Combining this core cuboid with the peeled-off column, one obtains the required core cuboid.

Then, prove that $\max\{0, a_{1+} + a_{+1} - a_{++}\}$ is the exact lower bound of $a_{11}$. From $a_{11} + a_{12} + \cdots + a_{1 d_2} = a_{1+}$ and $a_{1i} \le a_{+i}$, one can derive
$$a_{11} \ge a_{1+} - (a_{+2} + a_{+3} + \cdots + a_{+d_2}) = a_{+1} + a_{1+} - a_{++}.$$
Thus, $\max\{0, a_{1+} + a_{+1} - a_{++}\}$ is a lower bound of $a_{11}$. To prove that it is the exact lower bound, one needs to prove that there exists a core cuboid $\{a'_{ij}\}$ such that 1) $a'_{11} = \max\{0, a_{1+} + a_{+1} - a_{++}\}$ and 2) the star cuboids $\{a_{i+}\}$ and $\{a_{+j}\}$ can be derived from it. The proof of this is exactly the same as that of Lemma 1.1. □

APPENDIX B
THEOREM 4.1 MAY NOT HOLD IN IRREGULAR DATA CUBES

We show that Theorem 4.1 may not hold in irregular data cubes. Consider the simple example shown in Fig. 5. In this example, a single subcore-cuboid $A_0$ is known to a snooper, whereas the other three subcore-cuboids $A_1$, $A_2$, and $A_3$ are protected. If the Fréchet bounds are directly applied to a cell value $a_{ij} \in A_1$, then
$$a_{i+} + a_{+j} - a_{++} = a_{i+} + a_{+j} - \sum (A_0 \cup A_1 \cup A_2 \cup A_3) \le a_{ij} \le \min\{a_{i+}, a_{+j}\},$$
where $\sum A_k$ denotes the sum of all cell values in subcore-cuboid $A_k$ ($k = 0$, 1, 2, or 3). These bounds may not be the exact bounds due to the existence of the no-looser bounds
$$a'_{i+} + a_{+j} - a'_{++} = a'_{i+} + a_{+j} - \sum (A_1 \cup A_2 \cup A_3) \le a_{ij} \le \min\{a'_{i+}, a_{+j}\},$$
where $a'_{i+} = a_{i+} - \sum_{t=1}^{s_1} a_{it}$ can be computed by a snooper. Moreover, one can verify that the above lower bound of $a_{ij}$ can be further improved by the following:
$$a'_{i+} + a_{+j} - \sum (A_1 \cup A_3) \le a_{ij}.$$

Fig. 5. Example of an irregular data cube.

APPENDIX C

Proof of Lemma 4.2. Without loss of generality, consider $a_{11}$ and assume that all $a_{i1}$ and $a_{1j}$ are not known to a snooper. It is clear that the Fréchet lower bound of $a_{11}$ is a lower bound of $a_{11}$. To prove that it is the exact lower bound, we construct an irregular core cuboid $\{a'_{ij}\}$ such that $a'_{11}$ has the value of the Fréchet lower bound and that the star cuboids $\{a_{i+}\}$ and $\{a_{+j}\}$ can be derived from it.

First, consider the case where $a_{1+} + a_{+1} - a_{++} \le 0$. From $a_{1+} + a_{+1} - a_{++} \le 0$, we have $a_{11} \le \sum_{i,j \ne 1} a_{ij}$. There exist $\{\delta_{ij}\}_{i,j \ne 1}$ such that $\sum_{i,j \ne 1} \delta_{ij} = a_{11}$ and $0 \le \delta_{ij} \le a_{ij}$. Let
$$\begin{cases} a'_{11} = 0 \\ a'_{1j} = a_{1j} + \sum_{i \ne 1} \delta_{ij} & (j \ne 1) \\ a'_{i1} = a_{i1} + \sum_{j \ne 1} \delta_{ij} & (i \ne 1) \\ a'_{ij} = a_{ij} - \delta_{ij} & (i \ne 1,\ j \ne 1). \end{cases}$$
From $\{a'_{ij}\}$, one can derive the star cuboids $\{a_{i+}\}$ and $\{a_{+j}\}$, because
$$\begin{cases} \sum_j a'_{1j} = \sum_{j \ne 1} \bigl(a_{1j} + \sum_{i \ne 1} \delta_{ij}\bigr) = a_{1+} \\ \sum_j a'_{ij} = a'_{i1} + \sum_{j \ne 1} a'_{ij} = a_{i1} + \sum_{j \ne 1} \delta_{ij} + \sum_{j \ne 1} (a_{ij} - \delta_{ij}) = a_{i+} & \text{for } i \ne 1. \end{cases}$$
Similarly, one can verify that $\sum_i a'_{ij} = a_{+j}$ for all $j$. The construction is complete.

Then, consider the case where $a_{1+} + a_{+1} - a_{++} > 0$. Let
$$\begin{cases} a'_{11} = a_{1+} + a_{+1} - a_{++} \\ a'_{1j} = a_{+j} & (j \ne 1) \\ a'_{i1} = a_{i+} & (i \ne 1) \\ a'_{ij} = 0 & (i \ne 1,\ j \ne 1). \end{cases}$$
It is clear that $\sum_j a'_{ij} = a_{i+}$ and $\sum_i a'_{ij} = a_{+j}$ for all $i$ and $j$. □

Fig. 6. Fréchet lower bound in the companion cuboid is not the exact lower bound.

APPENDIX D

Proof of Theorem 4.3. Without loss of generality, consider $a_{11}$ and its companion cuboid $A_{11}$ in an irregular cuboid $A$. Let $c$ be the sum of all cell values in $A_{11}$. The Fréchet lower bound of $a_{11}$ in the companion cuboid $A_{11}$ is $\max\{0, a_{1+} + a_{+1} - c\}$ (note that $a_{1+}$, $a_{+1}$, and $c$ are known to a snooper). It is clear that this bound is a lower bound of $a_{11}$. To prove that it is the exact lower bound, we need to construct another irregular cuboid $A' = \{a'_{ij}\}$ such that $a'_{11}$ has the value of the Fréchet lower bound in the companion cuboid and that the star cuboids of $A$ can be derived from it. We first construct another companion cuboid $A'_{11}$ such that $a'_{11}$ has the value of the Fréchet lower bound in the companion cuboid and that the 1D sums of the companion cuboid $A_{11}$ remain unchanged in $A'_{11}$. The construction of such an $A'_{11}$ follows the proof of Lemma 4.2. Then, the irregular cuboid $A'$ is constructed by combining $A'_{11}$ with those cells in $A - A_{11}$. It is clear that the star cuboids of $A$ can be derived from $A'$. □

APPENDIX E

Proof of Theorem 4.4. Without loss of generality, consider the Fréchet lower bound of $a_{11}$ and its companion cuboid $A_{11}$. According to the proof of Theorem 4.3, there exists another companion cuboid $A'_{11}$ such that $a'_{11}$ has the value of the Fréchet lower bound and that the 1D sums of the companion cuboid $A_{11}$ remain unchanged in $A'_{11}$. By combining $A'_{11}$ with those cells in $A - A_{11}$, one obtains an irregular core cuboid from which the star cuboids in the original irregular data cube can be derived.
Since the exact lower bound of $a_{11}$ is the lowest possible value in any irregular core cuboid from which the original star cuboids can be derived, the Fréchet lower bound of $a_{11}$ in its companion cuboid $A_{11}$ is no less than the exact lower bound of $a_{11}$. □

Fig. 6 gives an example showing that, in certain cases, the Fréchet lower bound in the companion cuboid is indeed tighter than the exact lower bound. Note that in Theorem 4.4, a snooper knows neither the grand total of the companion cuboid nor the Fréchet lower bound in the companion cuboid. The Fréchet lower bound in the companion cuboid is a lower bound from an auditor's perspective; it cannot be considered a lower bound from a snooper's perspective (as in the proof of Theorem 4.3).

Fig. 7. $\max\{a_{+1} + a_{1+} - c^1_{11},\ a_{+1} + a_{1+} - c^2_{11}\}$ is not the exact lower bound.

APPENDIX F

Proof of Theorem 4.5. Without loss of generality, consider $a_{11}$ and its companion sums $c^1_{11}$ and $c^2_{11}$. For any irregular core cuboid $\{a'_{ij}\}$ from which the star cuboids of the original cube can be derived, we have
$$a_{+1} + a_{1+} - c^1_{11} = a'_{11} - \sum_{t_1, t_2 \ne 1} \{a'_{t_1 t_2} \mid a'_{1 t_2} \notin \textstyle\bigcup_k A_k\} \le a'_{11}$$
and
$$a_{+1} + a_{1+} - c^2_{11} = a'_{11} - \sum_{t_1, t_2 \ne 1} \{a'_{t_1 t_2} \mid a'_{t_1 1} \notin \textstyle\bigcup_k A_k\} \le a'_{11};$$
therefore, $\max\{a_{+1} + a_{1+} - c^1_{11},\ a_{+1} + a_{1+} - c^2_{11}\}$ is a lower bound of $a_{11}$. □

Fig. 7 gives an example showing that, in certain cases, $\max\{a_{+1} + a_{1+} - c^1_{11},\ a_{+1} + a_{1+} - c^2_{11}\}$ is indeed looser than the exact lower bound of $a_{11}$. In this example, $a_{41} = a_{14} = a_{44} = 0$ are known to a snooper. The snooper can compute $\max\{a_{+1} + a_{1+} - c^1_{11},\ a_{+1} + a_{1+} - c^2_{11}\} = 3$. If three were the exact lower bound of $a_{11}$, then $a_{21}$ and $a_{31}$ would have to be five and four, respectively, to satisfy $a_{+1} = 12$. Consequently, $a_{2j} = a_{3j} = 0$ for $j \ne 1$ in order to satisfy $a_{2+} = 5$ and $a_{3+} = 4$. This is a contradiction, since $a_{+4} = 3$ could then never be satisfied. Therefore, $\max\{a_{+1} + a_{1+} - c^1_{11},\ a_{+1} + a_{1+} - c^2_{11}\}$ cannot be the exact lower bound in this example.

APPENDIX G

Proof of Theorem 5.1. First, prove that the new lower bound is indeed a lower bound for cell $a_{t_1 \cdots t_n}$.
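From
$$a_{t_1 \cdots t_n} = a_{t_1 \cdots t_{i-1} + t_{i+1} \cdots t_n} - \sum_{t \ne t_i} a_{t_1 \cdots t_{i-1} t\, t_{i+1} \cdots t_n}$$
and
$$a_{t_1 \cdots t_{i-1} t\, t_{i+1} \cdots t_n} \le \min\{a_{+ t_2 \cdots t_{i-1} t\, t_{i+1} \cdots t_n},\ a_{t_1 + t_3 \cdots t_{i-1} t\, t_{i+1} \cdots t_n},\ \ldots,\ a_{t_1 \cdots t_{i-1} t\, t_{i+1} \cdots t_{n-1} +}\},$$
we have
$$a_{t_1 \cdots t_n} \ge a_{t_1 \cdots t_{i-1} + t_{i+1} \cdots t_n} - \sum_{t \ne t_i} \min\{a_{+ t_2 \cdots t_{i-1} t\, t_{i+1} \cdots t_n},\ a_{t_1 + t_3 \cdots t_{i-1} t\, t_{i+1} \cdots t_n},\ \ldots,\ a_{t_1 \cdots t_{i-1} t\, t_{i+1} \cdots t_{n-1} +}\}.$$
Thus, the new lower bound is indeed a lower bound.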
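The inequality just derived can be checked numerically. The following is a minimal Python sketch, assuming a dense cube stored as a dictionary from index tuples to nonnegative values; the helper names (`star_cuboids`, `marg`, `new_lower_bound`, `frechet_lower_bound`) are hypothetical and the code is a direct, unoptimized reading of the formulas in this appendix, not the linear-time procedure mentioned elsewhere in the paper. The bound functions use only the $(n-1)$-dimensional star cuboids, never the cell values themselves.

```python
def star_cuboids(cube, n):
    """Return margs, where margs[ax][index without axis ax] is the sum over axis ax."""
    margs = [dict() for _ in range(n)]
    for cell, value in cube.items():
        for ax in range(n):
            key = cell[:ax] + cell[ax + 1:]
            margs[ax][key] = margs[ax].get(key, 0) + value
    return margs


def marg(margs, cell, ax):
    """Sum along axis `ax` of the line through `cell`."""
    return margs[ax][cell[:ax] + cell[ax + 1:]]


def new_lower_bound(margs, shape, cell):
    """max over i of: line total along i minus upper bounds of the other cells on that line."""
    n = len(shape)  # assumes n >= 2
    best = 0
    for i in range(n):
        others = 0
        for t in range(shape[i]):
            if t == cell[i]:
                continue
            sibling = cell[:i] + (t,) + cell[i + 1:]
            # Upper bound of the sibling cell: min of its sums along the other axes.
            others += min(marg(margs, sibling, ax) for ax in range(n) if ax != i)
        best = max(best, marg(margs, cell, i) - others)
    return best


def frechet_lower_bound(margs, shape, cell):
    """Pairwise Fréchet terms: a(+ at i) + a(+ at j) - a(+ at i and j), floored at 0."""
    n = len(shape)
    best = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            double = sum(marg(margs, cell[:i] + (t,) + cell[i + 1:], j)
                         for t in range(shape[i]))
            best = max(best, marg(margs, cell, i) + marg(margs, cell, j) - double)
    return best
```

On any nonnegative cube, `frechet_lower_bound(...) <= new_lower_bound(...) <= cube[cell]` should hold for every cell, which mirrors the two claims of this proof for the lower bound.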

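Then, prove that the new lower bound is greater than or equal to the $n$-dimensional Fréchet lower bound. For any term in the max bracket in the formula of the $n$-dimensional Fréchet lower bound, one has
$$\begin{aligned} & a_{t_1 \cdots t_{i-1} + t_{i+1} \cdots t_n} + a_{t_1 \cdots t_{j-1} + t_{j+1} \cdots t_n} - a_{t_1 \cdots t_{i-1} + t_{i+1} \cdots t_{j-1} + t_{j+1} \cdots t_n} \\ &\quad = a_{t_1 \cdots t_{i-1} + t_{i+1} \cdots t_n} - \sum_{t \ne t_i} a_{t_1 \cdots t_{i-1} t\, t_{i+1} \cdots t_{j-1} + t_{j+1} \cdots t_n} \\ &\quad \le a_{t_1 \cdots t_{i-1} + t_{i+1} \cdots t_n} - \sum_{t \ne t_i} \min\{a_{+ t_2 \cdots t_{i-1} t\, t_{i+1} \cdots t_n},\ a_{t_1 + t_3 \cdots t_{i-1} t\, t_{i+1} \cdots t_n},\ \ldots,\ a_{t_1 \cdots t_{i-1} t\, t_{i+1} \cdots t_{n-1} +}\}. \end{aligned}$$
Thus, for any of the $n^2$ terms in the max bracket of the Fréchet lower bound, there exists one out of the $n$ terms in the max bracket of our new lower bound such that the latter is greater than or equal to the former. Therefore, the new lower bound is greater than or equal to the Fréchet lower bound.

Now, consider the new upper bound for $a_{t_1 \cdots t_n}$. From $a_{t_1 \cdots t_n} = a_{t_1 \cdots t_{i-1} + t_{i+1} \cdots t_n} - \sum_{t \ne t_i} a_{t_1 \cdots t_{i-1} t\, t_{i+1} \cdots t_n}$ and $a_{t_1 \cdots t_{i-1} t\, t_{i+1} \cdots t_n} \ge \underline{a}_{t_1 \cdots t_{i-1} t\, t_{i+1} \cdots t_n}$, where $\underline{a}_{t_1 \cdots t_{i-1} t\, t_{i+1} \cdots t_n}$ is the new lower bound of $a_{t_1 \cdots t_{i-1} t\, t_{i+1} \cdots t_n}$, we have $a_{t_1 \cdots t_n} \le a_{t_1 \cdots t_{i-1} + t_{i+1} \cdots t_n} - \sum_{t \ne t_i} \underline{a}_{t_1 \cdots t_{i-1} t\, t_{i+1} \cdots t_n}$. Thus, the new upper bound is indeed an upper bound. Compared with the Fréchet upper bound, it is clear that the new upper bound is less than or equal to the Fréchet upper bound. □

APPENDIX H

Proof of Theorem 5.2. We prove that the transformed lower bound is the same as the new lower bound given before:
$$\max\Bigl\{0,\ a_{t_1 \cdots t_{i-1} + t_{i+1} \cdots t_n} - \sum_{t \ne t_i} \min\{a_{+ t_2 \cdots t_{i-1} t\, t_{i+1} \cdots t_n},\ a_{t_1 + t_3 \cdots t_{i-1} t\, t_{i+1} \cdots t_n},\ \ldots,\ a_{t_1 \cdots t_{i-1} t\, t_{i+1} \cdots t_{n-1} +}\} \Bigm| 1 \le i \le n \Bigr\}.$$
If for all $t \ne t_i$, one has
$$\min\{a_{+ t_2 \cdots t_{i-1} t\, t_{i+1} \cdots t_n},\ a_{t_1 + t_3 \cdots t_{i-1} t\, t_{i+1} \cdots t_n},\ \ldots,\ a_{t_1 \cdots t_{i-1} t\, t_{i+1} \cdots t_{n-1} +}\} = \overline{a}_{t_1 \cdots t_{i-1} t\, t_{i+1} \cdots t_n},$$
then the theorem is proven. Otherwise, there exists a $t \ne t_i$ such that the following holds:
$$\min\{a_{+ t_2 \cdots t_{i-1} t\, t_{i+1} \cdots t_n},\ a_{t_1 + t_3 \cdots t_{i-1} t\, t_{i+1} \cdots t_n},\ \ldots,\ a_{t_1 \cdots t_{i-1} t\, t_{i+1} \cdots t_{n-1} +}\} > \overline{a}_{t_1 \cdots t_{i-1} t\, t_{i+1} \cdots t_n} = a_{t_1 \cdots t_{i-1} + t_{i+1} \cdots t_n}.$$
Then,
$$a_{t_1 \cdots t_{i-1} + t_{i+1} \cdots t_n} - \sum_{t \ne t_i} \min\{a_{+ t_2 \cdots t_{i-1} t\, t_{i+1} \cdots t_n},\ a_{t_1 + t_3 \cdots t_{i-1} t\, t_{i+1} \cdots t_n},\ \ldots,\ a_{t_1 \cdots t_{i-1} t\, t_{i+1} \cdots t_{n-1} +}\} < 0$$
and
$$a_{t_1 \cdots t_{i-1} + t_{i+1} \cdots t_n} - \sum_{t \ne t_i} \overline{a}_{t_1 \cdots t_{i-1} t\, t_{i+1} \cdots t_n} \le 0.$$
The theorem is proven. □

APPENDIX I

Proof of Theorem 5.3.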
Since the Fienberg lower bound is equivalent to the Fréchet lower bound, we only need to prove that the new upper bound $\overline{a}_{ijk}$ is less than or equal to the Fienberg upper bound. On the one hand, one can verify that
$$a_{ijk} + \sum_{t_1 \ne i,\, t_2 \ne j,\, t_3 \ne k} a_{t_1 t_2 t_3} = a_{+++} - a_{i++} - a_{+j+} - a_{++k} + a_{ij+} + a_{i+k} + a_{+jk}. \qquad (1)$$
On the other hand, from the formula of $\overline{a}_{ijk}$, one can derive
$$\begin{aligned} \overline{a}_{ijk} &\le a_{+jk} - \sum_{t_1 \ne i} \underline{a}_{t_1 jk} \\ &\le a_{+jk} - \sum_{t_1 \ne i} \Bigl(a_{t_1 + k} - \sum_{t_2 \ne j} a_{t_1 t_2 +}\Bigr) \\ &= a_{+jk} - \sum_{t_1 \ne i} \Bigl(a_{t_1 jk} - \sum_{t_2 \ne j,\, t_3 \ne k} a_{t_1 t_2 t_3}\Bigr) \\ &= a_{ijk} + \sum_{t_1 \ne i,\, t_2 \ne j,\, t_3 \ne k} a_{t_1 t_2 t_3}. \end{aligned}$$
Combining this with (1) and given the obvious fact that $\overline{a}_{ijk} \le \min\{a_{+jk}, a_{i+k}, a_{ij+}\}$, one has $\overline{a}_{ijk} \le \min\{a_{+jk},\ a_{i+k},\ a_{ij+},\ a_{+++} - a_{i++} - a_{+j+} - a_{++k} + a_{ij+} + a_{i+k} + a_{+jk}\}$. □

ACKNOWLEDGMENTS

The authors would like to thank the Editor-in-Chief, Professor Virgil Gligor, and the anonymous reviewers for their help in the review process. H. Lu would like to thank Dr. Shuhong Wang from Singapore Management University for his help with the proof of Lemma 4.2. This work was conducted when he was at Singapore Management University. Y. Li would like to thank Professor Ramayya Krishnan from the Heinz School of Public Policy and Management, Carnegie Mellon University, for his valuable comments. This work was partially supported by the SMU Office of Research under 04-C220-SMU-003.

REFERENCES

[1] J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirahesh, "Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals," Data Mining and Knowledge Discovery, vol. 1, no. 1.
[2] S. Chaudhuri and U. Dayal, "An Overview of Data Warehousing and OLAP Technology," SIGMOD Record, vol. 26, no. 1.
[3] G. Dong, J. Han, J.M.W. Lam, J. Pei, and K. Wang, "Mining Multi-Dimensional Constrained Gradients in Data Cubes," Proc. 27th Int'l Conf. Very Large Data Bases.
[4] D.E. Denning, Cryptography and Data Security. Addison-Wesley, 1982.
[5] E. Bertino and R. Sandhu, "Database Security: Concepts, Approaches, and Challenges," IEEE Trans. Dependable and Secure Computing, vol. 2, no. 1, pp. 2-19, Jan.-Mar.
[6] A. Dobra and S.E. Fienberg, "Bounds for Cell Entries in Contingency Tables Induced by Fixed Marginal Totals with Applications to Disclosure Limitation," Statistical J. United States, vol. 18, 2001.
[7] M. Fréchet, Les Probabilities, Associées a un Système d'Evénements Compatibles et Dépendants, vol. Premiere Partie. Hermann & Cie.
[8] Y. Li, H. Lu, and R.H. Deng, "Practical Inference Control for Data Cubes (extended abstract)," Proc. IEEE Symp. Security and Privacy.
[9] J.P. Ignizio and T.M. Cavalier, Linear Programming. Prentice Hall.
[10] A. Dobra, A. Karr, and A. Sanil, "Preserving Confidentiality of High-Dimensional Tabulated Data: Statistical and Computational Issues," Statistics and Computing, vol. 13.
[11] L. Cox, "On Properties of Multi-Dimensional Statistical Tables," J. Statistical Planning and Inference, vol. 117, no. 2.
[12] L. Cox, "Bounding Entries in 3-Dimensional Contingency Tables," Inference Control in Statistical Databases: From Theory to Practice. Springer.
[13] S. Fienberg, "Fréchet and Bonferroni Bounds for Multi-Way Tables of Counts with Applications to Disclosure Limitation," Proc. Conf. Statistical Data Protection.
[14] S. Chowdhury, G. Duncan, R. Krishnan, S. Roehrig, and S. Mukherjee, "Disclosure Detection in Multivariate Categorical Databases: Auditing Confidentiality Protection through Two New Matrix Operators," Management Sciences, vol. 45.
[15] L. Buzzigoli and A. Giusti, "An Algorithm to Calculate the Lower and Upper Bounds of the Elements of an Array Given Its Marginals," Proc. Conf. Statistical Data Protection.
[16] A. Dobra and S.E. Fienberg, "Bounds for Cell Entries in Contingency Tables Given Fixed Marginal Totals and Decomposable Graphs," Proc. Nat'l Academy of Sciences, vol. 97, no. 22.
[17] L. Wang, S. Jajodia, and D. Wijesekera, "Securing OLAP Data Cubes against Privacy Breaches," Proc. IEEE Symp. Security and Privacy.
[18] B.K. Bhargava, "Security in Data Warehousing (Invited Talk)," Proc. Second Data Warehousing and Knowledge Discovery.
[19] L. Brankovic, P. Norak, M. Miller, and G. Wrightson, "Usability of Compromise-Free Statistical Databases," Proc. Ninth Int'l Conf. Scientific and Statistical Database Management.
[20] L. Wang, D. Wijesekera, and S. Jajodia, "Cardinality-Based Inference Control in Sum-Only Data Cubes," Proc. Seventh European Symp. Research in Computer Security.
[21] L. Wang, Y. Li, D. Wijesekera, and S. Jajodia, "Precisely Answering Multi-Dimensional Range Queries without Privacy Breaches," Proc. Eighth European Symp. Research in Computer Security.
[22] N.R. Adam and J.C. Wortmann, "Security-Control Methods for Statistical Databases: A Comparative Study," ACM Computing Surveys, vol. 21, no. 4, 1989.
[23] L. Willenborg and T. de Waal, Statistical Disclosure Control in Practice. Springer.
[24] J. Domingo-Ferrer, "Advances in Inference Control in Statistical Databases: An Overview," Inference Control in Statistical Databases: From Theory to Practice, pp. 1-7.
[25] J.F. Traub, Y. Yemini, and H. Wozniakowski, "The Statistical Security of a Statistical Database," ACM Trans. Database Systems, vol. 9, no. 4, 1984.
[26] Y. Li, L. Wang, and S. Jajodia, "Preventing Interval-Based Inference by Random Data Perturbation," Privacy Enhancing Technologies.
[27] D. Agrawal and C.C. Aggarwal, "On the Design and Quantification of Privacy Preserving Data Mining Algorithms," Proc. 20th ACM SIGACT-SIGMOD-SIGART Symp. Principles of Database Systems.
[28] K. Muralidhar and R. Sarathy, "A General Additive Data Perturbation Method for Database Security," Management Sciences, vol. 45.
[29] H. Kargupta, S. Datta, Q. Wang, and K. Sivakumar, "On the Privacy Preserving Properties of Random Data Perturbation Techniques," Proc. Third IEEE Int'l Conf. Data Mining.
[30] Z. Huang, W. Du, and B. Chen, "Deriving Private Information from Randomized Data," Proc. ACM SIGMOD '05, pp. 37-48.
[31] L.L. Beck, "A Security Mechanism for Statistical Databases," ACM Trans. Database Systems, vol. 5, no. 3, 1980.
[32] J. Schlörer, "Security of Statistical Databases: Multidimensional Transformation," ACM Trans. Database Systems, vol. 6, no. 1, 1981.
[33] P. Samarati and L. Sweeney, "Protecting Privacy When Disclosing Information: k-Anonymity and Its Enforcement through Generalization and Suppression," technical report, SRI Int'l, 1998.
[34] L. Sweeney, "Achieving k-Anonymity Privacy Protection Using Generalization and Suppression," Int'l J. Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 5.
[35] F.Y.L. Chin and G. Özsoyoglu, "Statistical Database Design," ACM Trans. Database Systems, vol. 6, no. 1, 1981.
[36] J. Schlörer, "Information Loss in Partitioned Statistical Databases," Computer J., vol. 26, no. 3, 1983.
[37] J. Domingo-Ferrer and J.M. Mateo-Sanz, "Practical Data-Oriented Microaggregation for Statistical Disclosure Control," IEEE Trans. Knowledge and Data Eng., vol. 14, no. 1, Jan.
[38] L.H. Cox, "Suppression Methodology and Statistical Disclosure Control," J. Am. Statistical Assoc., vol. 75, no. 370, 1980.
[39] M. Fischetti and J.J. Salazar, "Solving the Cell Suppression Problem on Tabular Data with Linear Constraints," Management Sciences, vol. 47.
[40] M. Fischetti and J.J. Salazar, "Partial Cell Suppression: A New Methodology for Statistical Disclosure Control," Statistics and Computing, vol. 13.

Haibing Lu received the BSc and MSc degrees in mathematics from Xi'an Jiaotong University, China, in 2002 and 2005, respectively. He is currently working toward the PhD degree in information technology in the Management Science and Information Systems Department, Rutgers University. He was a research assistant in the School of Information Systems, Singapore Management University, starting in 2005. His research interests include data security, data mining, access control models, and optimization.

Yingjiu Li received the PhD degree in information technology from George Mason University. He is currently an assistant professor in the School of Information Systems, Singapore Management University. His research interests include applications security, privacy protection, and data rights management. He has published 39 technical papers in refereed journals and conference proceedings. He is a member of the ACM and the IEEE.