Fast, Secure Encryption for Indexing in a Column-Oriented DBMS

Tingjian Ge, Stan Zdonik
Brown University
{tige, sbz}@cs.brown.edu

Abstract

Networked information systems require strong security guarantees because of the new threats that they face. Various forms of encryption have been proposed to deal with this problem. In a database system, there are often two contradictory goals: security of the encryption and fast performance of queries. There have been a number of proposals of database encryption schemes to facilitate queries on encrypted columns. Order-preserving encryption techniques are well-suited for databases since they support a simple and efficient way to build indices. However, as we will show, they are insecure under straightforward attack scenarios. We propose a new light-weight database encryption scheme (called FCE) for column stores in data warehouses with trusted servers. The low decryption overhead of FCE makes comparisons of ciphertexts, and hence indexing operations, very fast. Since it is hard to use classical security definitions in cryptography to prove the security of any existing symmetric encryption scheme, we propose a relaxed measure of security, called INFO-CPA-DB. INFO-CPA-DB is based on a well-established security definition in cryptography and relaxes it using information theoretic concepts. Using INFO-CPA-DB, we give strong evidence that FCE is as secure as any underlying block cipher (yet more efficient than using the block cipher itself). Using the same security measure we also show the inherent insecurity of any order preserving encryption scheme under straightforward attack scenarios. We discuss indexing techniques based on FCE as well.

1. Introduction

Typically a DBMS provides two ways to achieve security: access control and data encryption. Of the two, access control is a relatively older way to protect sensitive data. However, access control by itself is not sufficient. An adversary who gains access to the database files can access sensitive data, thus bypassing the access control mechanism. As a result, it is necessary to encrypt data in the DBMS. Encryption is well studied in cryptography.
However, when used in a DBMS, the traditional security definitions and properties of classical encryption schemes have a considerable performance impact on queries on encrypted data. First, standard definitions of security in cryptography [7,8,3] do not allow ciphertext values to reveal any information about the plaintext values, including the relative order information between their corresponding plaintexts. This implies that even comparisons have to go through decryption first. Second, encryption and decryption of existing cryptographic schemes have high CPU cost. Analogous to disk I/O, even though the speed of modern symmetric encryption schemes is improving, it remains costly for the database CPU. Therefore, data encryption significantly slows down query processing. For example, evaluating predicates that reference an encrypted column would generally require an expensive decryption step.

DBMS-specific encryption schemes that perform well for queries, but that preserve security, are thus very desirable. One state-of-the-art technique is order-preserving encryption (e.g., OPES []). Such a scheme supports direct comparison of ciphertexts, which allows us to build indices in support of range queries. However, as we will show, such schemes are inherently not secure under straightforward attack scenarios.

In this paper, we propose an efficient light-weight database encryption scheme (called FCE), in which comparisons can be done with partial decryption (Early Stopping). FCE uses any block cipher to encrypt only a few bytes of random seeds in each page of the database, and uses lighter-weight computation to encrypt the actual data in a page. The low overhead of FCE enables efficient comparison and, therefore, efficient indexing on the ciphertext. We present evidence regarding the security of this scheme.

There have been a few proposals of homegrown encryption schemes for database systems for the purpose of fast search [,,9,5]. But how secure are these schemes? We stress that the importance of security cannot be overstated, because after all, hiding the information in the data is the goal of using any encryption to begin with (otherwise there would be no reason to use it). Let us look at a specific example.
Suppose a mortgage company uses a customer table with schema Customer (name, age, address, loan type, net assets). Assume that only net assets is sensitive. So it is more efficient to encrypt only sensitive columns while leaving other columns in the clear. This way, fetching results of a non-sensitive column, say age, does not require decryption. For example, Oracle [4] provides column-level encryption. In this paper we assume this usage scenario. Consider using any order preserving encryption scheme (e.g., []) to encrypt net assets while leaving other columns in the clear. Then, an adversary who has access to the database files can discover the relative order of the net assets values between two customers identified by other attributes.

This phenomenon of insecurity of newly proposed database encryption schemes is fairly common. The bucketing approach of [9] reveals value range correlation between columns. For example, an adversary could discover that the records that have values in the same bucket on one column are very likely to have values in the same bucket on another column. Column value distributions can also be revealed, as pointed out by []. Similarly, column value distributions can be revealed by other schemes such as summation of random numbers [], or by polynomial functions [5].

The security of a database encryption scheme must be examined more carefully. Unfortunately it is hard to prove the security of any symmetric encryption scheme used in databases today using the established security definitions in cryptography [7,8,3] (public key encryption schemes, which are provably secure, are slower and generally not used in databases). In light of this fact, we propose a relaxed measure of security based on the Real-Or-Random definition [3] in cryptography. The relaxation uses the concept of entropy, and gives strong evidence that FCE is as secure as any block cipher that it uses for encrypting the random seeds (a few bytes in each page header in the database). We also show the insecurity of any order preserving encryption scheme using this same measure. As with any other new scheme, besides the theoretical analysis, the security of FCE needs to stand the scrutiny of cryptanalysts as well as the test of time. But those are beyond the scope of this paper.

FCE is specifically tailored to database systems in the following ways:
- Comparison is fast, which facilitates the search of indices.
- We show its security using INFO-CPA-DB, a security measure defined in the database context.
- Secret random functions are stored at the database page level, which corresponds to the unit of I/O.

The rest of the paper is organized as follows. We first discuss the threat model in Section 2. Section 3 presents our new database encryption scheme: FCE. We also show a variant of it, r-FCE, which will be used for analysis. Section 4 discusses our security measure. In Section 5, we provide a detailed analysis of the insecurity of any order-preserving scheme, and of FCE. Then, in Section 6, we discuss how indexing works with FCE.
Section 7 gives our experimental performance results. We present related work in Section 8 and summarize the paper in Section 9.

2. Threat Model

We separate the issue of the security of the communication channel between client and DBMS server, and the issue of the security of the on-disk data [], i.e., data security in the storage system of the server (a divide and conquer approach). We assume the database software is trusted; in particular, the adversary does not have access to the values in the memory of the database software, and the database software is trusted to decrypt column values to evaluate predicates on encrypted columns (otherwise efficient processing by the server on encrypted data is not possible anyway). See Figure 1 for the system model.

Figure 1: Divide and conquer approach for DBMS security. (Diagram: client; communication may be protected; trusted DBMS server; sensitive data encrypted on disk.)

In this work, we specifically aim at encryption to ensure the security of on-disk data. We leave open the security of the communication between the client and server as a separate orthogonal issue, which can be protected by a traditional symmetric key encryption scheme if needed. Our threat model and protection goal are also considered by [], which is not a surprise, as both consider efficient processing of encrypted data by the server.

3. The New Encryption Scheme

3.1 C-Store: A Column Oriented DBMS

We will analyze the usage of the newly proposed encryption scheme in the context of an open-source column-oriented DBMS called C-Store [9]. C-Store is a read-optimized relational DBMS. The most salient differences between C-Store and a traditional row-store system are that C-Store organizes data by column rather than by row, and that it makes heavy use of sorting and compression [8].

3.2 Fast Comparison Encryption (FCE)

3.2.1 r-FCE Algorithms

We first consider a version of FCE based on random permutations, hence the name r-FCE. C-Store stores the values of a column together in a set of pages. Suppose we somehow associate with each (encrypted) data page a truly random permutation whose size is the same as the page size (in bytes). Let the page size be P bytes (for C-Store, P = 64 KB).
Thus each page (of the encrypted column) is associated with one of the P! random permutations. To represent the random permutation, at least $\log_2(P!)$ random bits are needed. When P = 64K, using Stirling's formula [3], we can compute that each page needs $\log_2(64\mathrm{K}!) \approx 954037$ random bits to represent the permutation. This is unrealistic. Hence, r-FCE is an idealized version. But we use r-FCE just as an intermediate scheme, purely
for the purpose of information theoretic analysis. In Section 3.2.2, we'll give an actual FCE scheme that uses k-wise independent functions, and we'll argue that it is computationally indistinguishable [7] from r-FCE.

r-FCE is a symmetric key encryption scheme for a DBMS. As in C-Store, we assume we are encrypting a whole page of data values of some column. Let's denote the key by K, and its bit-length by |K|. A typical value is 32Kb ("K" here following a number denotes a unit of 1024, not to be confused with the secret key K; it should be clear from the context). The key generation algorithm is simply to generate a random |K|-bit number. We next describe the encryption algorithm for a page of plaintext values.

Encryption Algorithm
Input: encryption key K (one key for the entire database), a page of plaintext values (P bytes), and a random permutation (function) associated with the page (one for each page) perm: {1, ..., P} → {1, ..., P}.
Output: a page of ciphertext values.
We consider and encrypt each byte of the page separately. For i = 1 to P:
(1) Let $d_i = perm(i) \bmod |K|$, which is clearly in the range [0, |K|-1].
(2) Then the ciphertext byte $c_i$ of the plaintext byte $b_i$ of the page is simply the bitwise XOR of byte $b_i$ and the byte starting from the $d_i$-th bit of K (see Figure 2). If $d_i$ falls on the last 7 bits of K, wrap around and use both the ending bits and the starting bits of K to form a pad byte. For example, if $d_i = |K| - 2$, then the pad byte is the last two bits of K concatenated with the first 6 bits of K.

Example: Let's say we are encrypting the i-th byte of a page, whose plaintext value is the byte b. As in C-Store, let |K| = 32768 and P = 65536. In step (1) of the encryption algorithm, the i-th byte gets a random permutation value $d_i = perm(i) \bmod 32768$, where perm is the random permutation associated with the page. Suppose perm(i) = 33466, so $d_i$ = 33466 mod 32768 = 698. Next, in step (2) of the algorithm, we find the byte value starting from the 698th bit of the key K. Call this key byte p. Then the ciphertext byte is simply the XOR of this key byte and the plaintext byte: c = p XOR b.

Figure 2: Using d to find one byte in K as a pad. (Diagram: the key K, with the 8 bits starting at offset d used for the pad.)

The decryption algorithm is the reverse of the encryption.
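The per-byte step of r-FCE, including the wrap-around rule, can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names `pad_byte` and `rfce_xor_page`, the 0-based byte positions, and the convention that bit 0 is the most significant bit of the first key byte are all assumptions made for the example. Because the step is an XOR, the same function serves for both encryption and decryption.

```python
import random
from typing import Sequence


def pad_byte(key: bytes, d: int) -> int:
    """Return the 8 key bits that start at bit offset d, wrapping past the end of the key.

    Assumption: bit 0 is the most significant bit of key[0].
    """
    nbits = len(key) * 8
    value = 0
    for j in range(8):
        bit_index = (d + j) % nbits
        bit = (key[bit_index // 8] >> (7 - bit_index % 8)) & 1
        value = (value << 1) | bit
    return value


def rfce_xor_page(page: bytes, key: bytes, perm: Sequence[int]) -> bytes:
    """XOR every byte of the page with the pad selected by d_i = perm(i) mod |K|."""
    nbits = len(key) * 8
    return bytes(b ^ pad_byte(key, perm[i] % nbits) for i, b in enumerate(page))


if __name__ == "__main__":
    rng = random.Random(0)
    key = bytes(rng.randrange(256) for _ in range(32768 // 8))  # |K| = 32768 bits
    page = bytes(rng.randrange(256) for _ in range(1024))       # small demo page, not 64 KB
    perm = list(range(len(page)))
    rng.shuffle(perm)                                           # stand-in for the page's permutation

    cipher = rfce_xor_page(page, key, perm)
    assert rfce_xor_page(cipher, key, perm) == page             # applying the XOR twice restores the page
    print(33466 % 32768)                                        # the offset used in the text's example
```

The demo uses a 1 KB page only to keep the run short; the functions themselves do not depend on the page or key size.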
The details of decryption are omitted, as they are fairly easy to derive.

3.2.2 The FCE Algorithms for C-Store

The only difference between FCE and r-FCE is that in FCE, we replace the random permutation perm associated with each page by a secret k-wise independent function [] (formally, it means any k points of the function are completely independent). Specifically, we can use a 4-wise independent function family (i.e., k=4; we'll explain the reason in Section 5) in step (1) of the encryption algorithm. We use a rather natural and efficient construction of a 4-wise independent function family, namely, random polynomials of degree k-1 (where k=4) as described in []:

$$p(x) = ax^3 + bx^2 + cx + d, \quad \text{where } a, b, c, d, x \in [0, P-1].$$

We now describe the implementation in C-Store, where the page size is 64 KB (hence the domain $[0, 2^{16}-1]$), and $|K| = 2^{15}$ (hence mod $2^{15}$ in the FCE algorithms). We need one such (secret) random polynomial per page, which means we need four (secret) random values a, b, c, d per page, totaling 64 bits. Therefore, we can store a random 64-bit seed at the page header (for a 64 KB page, a 64-bit seed is certainly acceptable), and use a block cipher (say DES) to get a 64-bit actual (secret) seed from the original seed, and split it to get a, b, c, d. We will use the same block cipher key (e.g., a 64-bit DES key) for every page of the database, and a different seed for each page of the database. The key of the block cipher and the encryption key K described above together form the secret key of FCE.

Observe that in step (1) of the encryption algorithm, the function p(x) is applied on each byte position of the page, which means the set of input values ($[0, 2^{16}-1]$) is the same across pages. We therefore can pre-compute the $x^3$ and $x^2$ values for each byte position, and use them universally for any page. Thus an evaluation of the random polynomial function simply involves 3 multiplications and additions and is very efficient. For example, this is fewer than CPU cycles per encrypted byte on a TMS320C6-series processor [4, ], considering both computation and possible cache miss cost. The detailed analysis is in the full version of the paper, due to space constraints. In contrast, a DES implementation on the same processor needs 3 to 5 cycles per byte [6].
Note that we can increase the number of random bits for more security.
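A minimal sketch of how the per-page key offsets could be derived from the secret seed, under the C-Store parameters given above ($P = 2^{16}$, $|K| = 2^{15}$). The helper names `split_seed` and `fce_offsets`, and the big-endian split of the 64-bit seed into a, b, c, d, are assumptions for illustration. The block-cipher step that turns the stored seed into the secret seed is deliberately left out, so the functions take the already-derived secret seed as input.

```python
import struct
from typing import List, Tuple

PAGE_SIZE = 1 << 16  # P: bytes per page
KEY_BITS = 1 << 15   # |K|: bits in the encryption key

# x^2 and x^3 depend only on the byte position, so they can be computed once
# and shared by every page.
X2 = [x * x for x in range(PAGE_SIZE)]
X3 = [x * x * x for x in range(PAGE_SIZE)]


def split_seed(secret_seed: bytes) -> Tuple[int, int, int, int]:
    """Split a 64-bit secret seed into four 16-bit coefficients (a, b, c, d).

    Assumption: the four coefficients are stored big-endian, in that order.
    """
    if len(secret_seed) != 8:
        raise ValueError("expected a 64-bit (8-byte) seed")
    return struct.unpack(">4H", secret_seed)


def fce_offsets(secret_seed: bytes) -> List[int]:
    """Return d_x = p(x) mod |K| for every byte position x of one page."""
    a, b, c, d = split_seed(secret_seed)
    return [(a * X3[x] + b * X2[x] + c * x + d) % KEY_BITS for x in range(PAGE_SIZE)]
```

Each returned offset would then select the pad byte for position x, taking the place of perm(x) mod |K| in step (1) of the encryption algorithm.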
The comparison of two ciphertext values starts from the most significant byte (assuming this can be known from the value type) and proceeds byte by byte from left to right. It is essentially an Early Stopping (ES) partial decryption of the two ciphertext values. The procedure stops as soon as a byte difference is found. This is feasible with FCE because encryption is done byte by byte, whereas in other block ciphers (e.g., DES) it is done in a unit of 8 bytes or more. FCE uses a block cipher as a subroutine to encrypt only a small number of bytes per page. The encryption of the remainder of the data on the page is very light-weight.

In FCE, comparing two ciphertext values and comparing a ciphertext with a plaintext value are very similar, as both work in the same manner starting from the most significant byte. As a result, joining two FCE encrypted columns and joining an encrypted with a non-encrypted column will work similarly. We'll discuss in Section 5.1 that this is not the case with OPES.

C-Store is read-optimized and targets data warehousing applications [8]. In such a system, an UPDATE is rare and is of less concern. Updates are applied in batch, rather than incrementally. During batch updates, a fresh random seed is generated for a page that uses FCE, to ensure security.

4. The Security Measure

It would be best to prove the security of our scheme according to an established definition. Unfortunately, the fact is that no symmetric encryption scheme is provably secure in that regard. We therefore propose a relaxation of an existing security definition in cryptography, the so-called Real-Or-Random definition [3]. The relaxation gives a formal security measure of an encryption scheme. We use entropy, which is a basic concept in information theory [5] that gives a universal measure of randomness. The entropy in bits of a discrete random variable X is given by

$$H(X) = -\sum_{x} \Pr(X = x) \log_2 \Pr(X = x)$$

where the summation is over all values x in the range of X. We assume the threat model defined in Section 2 and that encryption is specified per attribute. Observe that a column-wise storage is most friendly to such selective encryption. Nonetheless, this definition can be easily extended for cases that must encrypt every column of the table.
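As a small illustration of the entropy formula in this section, the following sketch computes H(X) in bits for an explicit probability distribution. The function name `entropy_bits` is chosen for this example only.

```python
import math
from typing import Iterable


def entropy_bits(probabilities: Iterable[float]) -> float:
    """Shannon entropy in bits; outcomes with probability 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


if __name__ == "__main__":
    print(entropy_bits([1 / 256] * 256))  # a uniformly random byte
    print(entropy_bits([1.0]))            # a value with only one possible outcome
```

A uniformly random byte attains the maximum of 8 bits, while a value that is fully determined has entropy 0. The security measure places a scheme between these two extremes.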
We first describe the intuition behind the security measure. An encryption scheme is secure if the adversary cannot distinguish the ciphertexts of any two (equal length) messages (i.e., plaintext values). In turn, the scheme is secure if the adversary cannot tell apart the ciphertext of any Real message and that of an equal-length Random message (by transitivity, one cannot tell apart the ciphertexts of any two real messages). Namely, this is exactly what the Real-Or-Random definition requires. To capture the notion of "any message", we simply let the adversary (to her advantage) arbitrarily choose any message. The more power we give to the adversary (and if we can still demonstrate that certain security conditions are met), the more secure the system is. Such a notion in a security measure is termed Chosen Plaintext Attack (CPA) [7,3].

In our security measure (INFO-CPA-DB), we add a player (the Guard of the cryptosystem) to the game. The Guard has to come up with random messages which, under a legally generated key, encrypt to the exact same ciphertext as the messages that the adversary has chosen. Consequently, not knowing which key is actually used, but just seeing the ciphertext, no one can tell whether it was from a real message or from random garbage. The security measure leaves space for certain imperfections of the cryptosystem by measuring the entropy of the (supposedly random) string that the Guard comes up with. The closer it is to the maximum entropy, the more random it is, and hence, the more secure the scheme is. In other words, we use entropy as a metric that measures how far the system is from being perfectly secure. See Figure 3.

Definition 1 [INFO-CPA-DB] Let SE = (κ, ε, D) be a symmetric encryption scheme used for a DBMS (the three parameters are the key generation, encryption, and decryption algorithms respectively). A relational table T that has sensitive information includes two columns: ID, which is the primary key, and MSG, which is encrypted. Consider a game between player A (Adversary) and player G (Guard of the cryptosystem).
(1) First, the key generation algorithm is run for A and A gets key K.
To her own advantage, A arbitrarily generates q records for the table T (i.e., A chooses q ID values and MSG plaintext values $(id_i, m_i)$, $1 \le i \le q$). A encrypts the MSG values using her key K, and the table data is stored on disk. Let $N = \sum_{i=1}^{q} |m_i|$ and $c_i = enc(m_i, K)$. The database files (but not K) are passed to G.
(2) G, without knowing K, tries to come up with a simulation script that would create the exact same database files: G needs to run the key generation algorithm to get K', and come up with a sequence of random messages $m'_i$ ($1 \le i \le q$) such that $|m'_i| = |m_i|$ and $c_i = enc(m'_i, K')$ for all $1 \le i \le q$. In other words, using K' to encrypt the random messages, together with the same ID values used by A, must produce the exact same files that A passes to G. Let $m' = m'_1 \| m'_2 \| \ldots \| m'_q$ (where $\|$ is bit-string concatenation).
(3) The success of G (and the security of SE) is measured by how close m' is to a uniformly random bit string, specifically, how close H(m') is to N. Then we say that SE is (H(m'), N) secure. Clearly, the most secure scheme would be (N, N) secure.

Figure 3: A mental game between A & G in the INFO-CPA-DB definition. (Diagram: Player A (Adversary) generates K and encrypts; Player G (Guard) generates K' and encrypts; Judge: how close is this to a random string?)

Using entropy, INFO-CPA-DB relaxes the Real-Or-Random definition, and is a continuous measure of security. We note a caveat here that the exact relationship between the amount of entropy and the amount of resources necessary to break the scheme is unknown, and is left as future work.

5. Analysis of Order Preserving Encryption Schemes and of FCE

5.1 Analysis of Any Order Preserving Scheme

Under an order-preserving encryption scheme, ciphertext values preserve the order relationship of the corresponding plaintext values. It is ideal for query performance since comparisons can operate directly on ciphertext, saving the cost of expensive decryptions. A state-of-the-art order-preserving encryption scheme is OPES []. As expected, B-tree indices can be built and used on encrypted columns as well. However, as we will show, any order preserving encryption scheme is inherently not secure under the common usage scenario that only a subset of the columns is encrypted, which is our assumed usage model.

Intuitively, any order-preserving scheme reveals the order of column values between records. This information may be quite significant. We have seen an example in Section 1. Further, if the adversary somehow knows one or more plaintext values of the column, he or she can narrow down the possible range of other values. Thus, order-preserving schemes are prone to inference attacks. To be concrete, in our example, if we know Alice and Charles bracket Betty, and we have side information about the assets of Alice and Charles ($M and $.M), then we have a good estimate for Betty's assets. [] uses percentile exposure as the security measure.
However, that only tells whether the scheme hides the column value distribution (which an adversary might already know to begin with). Security via encryption must hide much more than that.

Now we formally analyze the insecurity of order-preserving encryption schemes using our information theoretic security measure. Recall that the idea of our definition is to give the adversary (player A) the advantage and freedom to choose an arbitrary set of plaintexts, and the player G (Guard of the cryptosystem) needs to respond with a set of random messages that (under some key) encrypts to the same ciphertext. Intuitively, this definition rules out the security of any order-preserving scheme, because if the adversary chooses two values $m_1 < m_2$, then their ciphertexts must satisfy $c_1 < c_2$. And whatever random messages $(r_1, r_2)$ the Guard comes up with must satisfy $r_1 < r_2$, which makes $r_1 \| r_2$ as a whole not random. Theorem 1 that follows is based on this observation and says that there is a set of messages (which the adversary may choose) that leaves the Guard nothing but one choice of plaintext, which is the same as the adversary's and hence is not at all random. So the entropy is 0, and the order preserving encryption is insecure.

Theorem 1: Consider the INFO-CPA-DB definition of security for an order preserving encryption scheme. In the game, there exists a strategy for player A such that, whatever player G's strategy is, it holds that H(m') = 0.

Proof: Let us give such a strategy for player A. We simply let plaintexts be fixed-length bit strings, of length l bits. We can thus represent the plaintext domain as $[0, 2^l - 1]$, in that order, i.e., $0 < 1 < 2 < \ldots < 2^l - 1$. Note that A has the freedom to choose l such that $2^l$ is a reasonable number and A can fill the table with this many records. A's strategy is to fill the table with $2^l$ records, where the MSG column values are distinct and in increasing order (as is the ID column). Now, because the encryption is order preserving, the ciphertext values must also be distinct and in increasing order. Player G, given the ciphertext, has to create a simulation script and come up with random MSG values $m'_i$ ($1 \le i \le 2^l$) with the same length (l bits), which can encrypt to the same set of ciphertexts with G's key K'.
Again due to the order-preserving property, it must be that $m'_1 < m'_2 < m'_3 < \ldots < m'_{2^l}$. Clearly, due to the plaintext domain, it must be that $m'_i = m_i$ ($1 \le i \le 2^l$). In other words, there is only one possibility for $m'_i$ ($1 \le i \le 2^l$). Therefore, H(m') = 0.

Theorem 1 verifies our observation that any order preserving scheme is inherently insecure. Put another way, the INFO-CPA-DB definition protects us from attacks based on the order-preserving property of the encryption.

OPES assumes that the adversary does not have prior information about the value distribution. But in reality, many applications may have a column of sensitive data whose distribution is well known by the adversary, or whose distribution can be easily guessed (e.g., if there are only a limited number of probable distributions). Therefore, OPES cannot be used in these cases. On the other hand, it should be noted that once plaintext values are encrypted, unlike FCE, OPES can be used in an untrusted server environment, where decryption is not an option.
Further, OPES requires that the distribution is well known to the database (e.g., when a large amount of data already exists) before the key can be generated and encryption can happen. If data updates change the distribution, this process has to be repeated. For many applications, the column data's distribution is unpredictable before encryption is required, and may change over time. A complete recoding costs too much. Also, consider the JOIN operation on two OPES encrypted columns of two tables. Most likely, the two columns do not have the same distribution, which means they are not directly comparable. Conversion from one side to the other must be carried out, which involves expensive decryptions and/or encryptions. Overall, the most severe problem with any order preserving encryption scheme is not its usage limitations, but the inherent security problem.

5.2 Security Analysis of r-FCE

We now show that r-FCE is indeed secure. In the game defined by INFO-CPA-DB, player A is the adversary, and we play the part of the player G (Guard). A picks a sequence of plaintext messages (and IDs) totaling N bits, then encrypts them using her key K under r-FCE. The resulting files on disk are handed to G. G now needs to create the simulation script. Our strategy for G is to simply call the key generation algorithm to generate a key K', and decrypt the ciphertext values on disk using K' (note that the ciphertext values were encrypted using A's key K). During the decryption, a fresh random permutation for each page (and hence the d values for each ciphertext byte) is obtained as in the encryption in r-FCE. Let the resulting plaintext values be m'. Now we try to obtain a lower bound of H(m'), the entropy of m'.

To compute the entropy of m', we first need to understand the random factors that determine the different outcomes of m'. In the game of the INFO-CPA-DB definition, A passes to G a sequence of ciphertext values, which we denote as c. G applies a randomized decryption algorithm to c. So, c is fixed. There are two probabilistic factors that determine the value of m':
- The random permutations for each page, which derive the bit offsets into the key for decryption of each byte.
- The key K', which determines what bits are actually XOR'ed with c to get m'.

We first consider the effect of the random permutations. Suppose a random K' (the second random factor) is fixed. The process of decrypting each byte of c uses some random d value (determined by the permutation) to get 8 bits from K' and XORs them with the byte of c. Two d values may result in the same sequence of 8 bits for the XOR. We put the 32K d values (0 to 32K-1) into $2^8 = 256$ groups, such that each group corresponds to a distinct sequence of 8 bits (let's call it a pad byte from now on). We consider random variables $X_i$ ($0 \le i \le 255$) that are the cardinalities of each group. We want to compute the number of unique assignments of pad bytes for a 64 KB page. Since a permutation of 64K values reduced mod 32K hits every d value exactly twice, this is the same as the number of ways to write 256 numbers, $2X_i$ times respectively, on a board that can hold 64K numbers. So the number of unique assignments in a page (resulting in unique m' values) is

$$\frac{(64\mathrm{K})!}{\prod_{i=0}^{255} (2X_i)!}.$$

There are $N/(8 \cdot 64\mathrm{K})$ pages, totaling

$$\left[ \frac{(64\mathrm{K})!}{\prod_{i=0}^{255} (2X_i)!} \right]^{N/(8 \cdot 64\mathrm{K})}$$

unique assignments, each with equal probability (note that the equal probability property will greatly simplify the computation of entropy), resulting in unique m' values.

Now we compute a lower bound of H(m'). Resorting to conditional entropy [5], we have:

$$H(m') > H(m' \mid X_0, X_1, \ldots, X_{255})$$
$$= \sum_{x_0, x_1, \ldots, x_{255}} \Pr(X_0 = x_0, X_1 = x_1, \ldots, X_{255} = x_{255}) \cdot \log_2 \left[ \frac{(64\mathrm{K})!}{\prod_{i=0}^{255} (2x_i)!} \right]^{N/(8 \cdot 64\mathrm{K})}$$
$$= E\left( \log_2 \left[ \frac{(64\mathrm{K})!}{\prod_{i=0}^{255} (2X_i)!} \right]^{N/(8 \cdot 64\mathrm{K})} \right) \qquad (1)$$

We can approximate this bound by

$$H(m') > \log_2 \left[ \frac{(64\mathrm{K})!}{\prod_{i=0}^{255} (2E(X_i))!} \right]^{N/(8 \cdot 64\mathrm{K})} \qquad (2)$$

For now we use this approximation, and later we use Chernoff bounds [3] to show that with high probability the actual bound will be very close to this one, so this is indeed a good approximation. What we have done is to simplify the problem of computing H(m') to giving a lower bound using the conditional entropy, conditioning on the second (harder) probabilistic factor. All that remains is to compute $E(X_i)$ ($0 \le i \le 255$).

Let $X_0$ be the cardinality of the group with pad byte value 0. Consider 32K random variables $Y_j$ ($0 \le j < 32\mathrm{K}$) satisfying

$$Y_j = \begin{cases} 1 & \text{if } d = j \text{ gives a pad byte value} = 0 \\ 0 & \text{if } d = j \text{ gives a pad byte value} \ne 0 \end{cases}$$

Therefore, $\Pr(Y_j = 1) = 1/2^8$ ($0 \le j < 32\mathrm{K}$) due to the randomness of K'. Then, $E(Y_j) = 1/2^8$ ($0 \le j < 32\mathrm{K}$). We also have

$$X_0 = \sum_{j=0}^{32\mathrm{K}-1} Y_j \qquad (3)$$

and from the linearity of expectation, we have

$$E(X_0) = \sum_{j=0}^{32\mathrm{K}-1} E(Y_j) = \frac{32\mathrm{K}}{2^8} = 128.$$

Because key K' is uniformly random, then from symmetry, we have $E(X_i) = 128$ ($0 \le i \le 255$). From all the above, we can compute the lower bound of H(m'):

$$H(m') > \log_2 \left[ \frac{(64\mathrm{K})!}{\prod_{i=0}^{255} (2E(X_i))!} \right]^{N/(8 \cdot 64\mathrm{K})} = \frac{N}{8 \cdot 64\mathrm{K}} \log_2 \frac{(64\mathrm{K})!}{(256!)^{256}}$$

We can use Stirling's formula [3] to compute $\log_2(64\mathrm{K}!)$ and $\log_2(256!)$. Finally we get the lower bound value: $H(m') > 0.9974N$.

Recall that from (1) to (2) we used an approximation. We can use Chernoff bounds, the union bound [3] (which basically says $\Pr(A \text{ or } B) \le \Pr(A) + \Pr(B)$), and the constraint $\sum_{i=0}^{255} X_i = 32\mathrm{K}$ to show that with high probability (2) indeed is very close to (1). We omit the details due to space constraints. They are in the full version of the paper.

The lower bound result (0.9974N) indicates that H(m') is very close to N, or in other words, m' is very close to a uniformly random bit string. This gives us great confidence in the security of r-FCE according to the INFO-CPA-DB and Real-Or-Random definitions. However, the missing entropy might be a concern for an application that requires strict security. We leave the problem of analyzing the effect of the leak as future work. We can also obtain a general lower bound of entropy as a function of page size P and key size |K|, which indicates that a bigger page size or key size implies more security, but higher overhead. The details are in the full version.

From the security analysis of an order preserving encryption scheme and FCE, we can see that to show something is secure, we give a simulation script or strategy for player G. To disprove the security of some scheme, we give a strategy for player A.

5.3 Connection Between FCE and r-FCE

As we mentioned earlier, r-FCE uses ideal random permutations. So we have to use cryptographic techniques to realize it. We have introduced the FCE scheme in Section 3.2.2. Most cryptographic techniques are based on the computational indistinguishability [7] framework. Informally, it means that given reasonable resources (e.g., probabilistic polynomial time), one cannot distinguish between two distributions. We use A ≈ B to denote that distribution A is computationally indistinguishable from distribution B.
Figure 4: From ideal to realization, a road connected by computational indistinguishability. (Diagram stages: r-FCE: random permutation mod m (family); random function family; pseudo-random function family; FCE: k-wise independent function family.)

As we show in Figure 4 (omitting the proofs here), r-FCE uses a random permutation return value mod |K| (as the d values to probe into the key). This is computationally indistinguishable from a random function family whose function has the same domain as the permutation, but has the range of {0, ..., |K|-1} (provided that the permutation size is a multiple of |K|). In turn, this random function family is computationally indistinguishable from a pseudo-random function family (with the same domain and range).

In the final step of Figure 4, the FCE scheme uses a k-wise independent function family. Hoory et al. [] discuss the motivation for understanding the relationship between k-wise independence and pseudo-randomness. They present an educated conjecture that 4-wise independence suffices to achieve cryptographic pseudo-randomness. FCE builds on this by using a 4-wise independent function family. We choose a rather natural and efficient construction of a 4-wise independent function family: random polynomials of degree k-1 (where k=4) as described in []:

$$p(x) = ax^3 + bx^2 + cx + d, \quad \text{where } a, b, c, d, x \in [0, P-1].$$

Clearly a higher k value in the k-wise independent function results in more security, but higher cost. In summary, we end up with an FCE scheme that is computationally indistinguishable from r-FCE, which we have proved to be information theoretically secure. Therefore, we combine the concepts of information theoretic security and computational security.

We can analyze the security of some other encryption schemes using our security measure. It is not hard to show that the ideal (and impractical) schemes of One-Time-Pad and the CTR scheme using a random function [3] are both (N, N) secure, and DES is (56, N) secure. We omit the analysis due to space constraints. While our analysis in Section 5.2 seems to suggest that r-FCE is more secure than
a block cipher (e.g., DES), FCE, unlike r-FCE, uses a block cipher to encrypt 8 bytes per page to obtain the a, b, c, d values. Thus the security of FCE is bounded above by the security of the (subroutine) block cipher.

6. Indexing with FCE

In this section, we describe indexing on FCE encrypted data. We will be exclusively talking about widely used tree indexing (e.g., B+ trees), although FCE is also applicable to hash indexing, etc. Each page of the B+ tree will be laid out as usual on disk, except that each page will have a 64-bit seed at the top and parts of the page will be encrypted using FCE. For an internal node of a B+ tree, we only encrypt the key values. We leave pointers (to other index nodes) in the clear. For a leaf node of a B+ tree, we encrypt both the key values and the record IDs. We need to encrypt record IDs, because otherwise the order of the records could be inferred from the leaf nodes (which is exactly the problem with OPES).

Efficient comparison between key values is the main challenge of indexing encrypted data. Therefore, we focus on how this works under FCE. Typically we are concerned with searching for a plaintext key value in the encrypted B+ tree, as this is what we need to do in processing a query. The comparison we do is between a plaintext and a ciphertext value, which, as we discussed in Section 3.2.2, is not much different from comparing two ciphertext values. As usual, the tree traversal starts from the root, and uses our special comparison method. Recall that we have the Early Stopping (ES) mechanism for comparisons. Observe that ES is more effective as the search is closer to the root of the B+ tree (upper levels), since it is more likely that the comparison is between two values with a big difference. As the search approaches the leaf level, key values approach the target value, and more byte comparisons are needed. Note that classical index key compression methods (on the plaintext key values), when applied, still work as usual, and in fact help ES, as redundant leading bytes are likely to be compressed, which further saves the CPU cost for decryption.
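The plaintext-versus-ciphertext comparison used during the tree traversal can be sketched as follows. This is an illustrative sketch under stated assumptions, not C-Store code: keys are taken to be fixed-width, big-endian unsigned integers, and `pad_at` is a hypothetical callback that returns the FCE pad byte for a given byte position in the index page. The function also reports how many ciphertext bytes had to be decrypted, which is the quantity that Early Stopping tries to keep small.

```python
from typing import Callable, Tuple


def es_compare(plain: bytes, cipher: bytes, start: int,
               pad_at: Callable[[int], int]) -> Tuple[int, int]:
    """Compare a plaintext key with an encrypted key stored at page offset `start`.

    Returns (sign, decrypted): sign is -1, 0 or 1 as plain is less than, equal to
    or greater than the stored key; decrypted is the number of ciphertext bytes
    that were decrypted before the comparison could stop.
    """
    for k, (p, c) in enumerate(zip(plain, cipher)):
        v = c ^ pad_at(start + k)  # decrypt one byte only when it is needed
        if p != v:
            return (-1 if p < v else 1), k + 1
    return 0, len(plain)


if __name__ == "__main__":
    def demo_pad(pos: int) -> int:
        # Hypothetical pad source; in FCE this would come from the key K and the page seed.
        return (pos * 37 + 11) & 0xFF

    stored = (1000).to_bytes(8, "big")
    cipher = bytes(b ^ demo_pad(100 + k) for k, b in enumerate(stored))
    print(es_compare((1 << 40).to_bytes(8, "big"), cipher, 100, demo_pad))
    print(es_compare((999).to_bytes(8, "big"), cipher, 100, demo_pad))
```

In the demo, the first probe differs from the stored key in a high-order byte and stops early, while the second differs only in the last byte and must decrypt the whole key. This mirrors the observation that Early Stopping helps most near the root of the tree.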
Uless the whole table s ecrypted, a clustered dex s geeral ot feasble wth a ecrypted colum. Ths s because the order of the cphertext values would be revealed by meas of assocato wth other colums otherwse (the same reaso as eedg to ecrypt record IDs at leaf odes. Ths s ot specfc to FCE ad s uversal for ay ecrypto method. As a cosequece, a sparse dex s also ot feasble geeral. Wth aother classcal ecrypto method, such as DES, B+ tree dexg s stll possble prcple. The dffereces are: It has bgger ecrypto blocks (e.g., 8 bytes for DES, hece a tree traversal may decrypt more tha eeded. Its mmum ut of decrypto s larger (e.g., 8 bytes for DES, so oe has to perform more decrypto all alog the search path. The decrypto has more overhead tha FCE. As a result, t s less effcet to buld a dex wth classcal ecrypto methods. The performace comparso wth dexg usg DES s further coducted the ext secto. 7. Expermets We have metoed that securty has to be show by aalyss/proof ad demostrated t for FCE Secto 5. I ths secto, we study the followg performace ssues through expermets: ( How much overhead does FCE decrypto/ecrypto have the database cotext? ( How much does the Early Stoppg mechasm help the dex search? (3 FCE has a small ecrypto block of oe byte, compared to eght bytes of, say, DES. Combed wth Early Stoppg, what performace mpact does ths have o varous kds of queres (e.g., whether or ot the dex s coverg for a query? 7. Setup We have mplemeted the FCE scheme, ad B+ tree dexg o FCE ecrypted colums, as well as DES ecrypted colums. We exteded the code to support DES ad FCE ecrypto C-Store o Deba Lux. We use the crypto lbrary OpeSSL.9.8b for DES. We use DES as the uderlyg block cpher of FCE, whch s used to ecrypt the 8-byte seed o each page. We have also mplemeted sort merge JOIN, whch was ot avalable C-Store before. The algorthms were mplemeted C++. The expermets were ru o a Lux workstato wth a AMD Athlo-64 Ghz processor, 5 MB memory ad a Samsug HD6JJ dsk. 7. 
Overhead of FCE

Our first experiment compares the retrieval overhead of FCE with DES and unencrypted data. In this experiment, we select a single integer-valued column from a database. More precisely, we sequentially scan all the B data pages, each containing 8K encrypted 8-byte <column value, recordid> pairs. All FCE runs include the cost of generating a, b, c, and d values for each page with DES. The file cache was warm in these experiments.

Figure 5 shows the retrieval cost per tuple. The FCE- line shows what happens when we only decrypt the column value, but not the record ID (since we are selecting a single column). With DES, in this case, we need to decrypt both the column values and the record IDs, as the DES block size (8 bytes) is larger than the column value size. (Some block ciphers require even bigger block sizes.) Therefore, in Figure 5 DES is decrypting twice as much ciphertext as
FCE-. To compare DES and FCE when both decrypt the same amount of data, we included runs where FCE decrypts the record IDs as well. This is shown by the FCE- line in Figure 5, which is slightly faster than DES. For both FCE and DES, the encryption cost is about the same as decryption, so we do not include those measurements here.

For an arbitrary data type, the difference in performance between DES and FCE for sequential scans will fall somewhere between the relative performance of DES and FCE- or DES and FCE-. The reason is that we need to decrypt a different amount of extra ciphertext depending on the size of a data value, especially if it is a variable size data type.

DES has been around for almost 30 years, and we believe that its OpenSSL implementation has been carefully tuned. Our implementation of FCE is not highly tuned, but is already outperforming DES. There is a good chance that with tuning, FCE can be even faster. This can also be seen from the cycle count comparison on the TMS3C6 processor, in Section 3...

7.3 Indexing with FCE for Range Queries

In this section, we look at the performance of a simple SELECT query that has a predicate on the encrypted column: SELECT COUNT(*) FROM t WHERE c>? AND c<?, where c and c are integer-valued columns. The query plan uses B+ indexes to find the record IDs that satisfy the range restrictions. The first step is to traverse the index to find the smallest value that satisfies the range restriction, then visit each subsequent value in the leaf nodes until it finds the smallest value that does not satisfy the range restriction. The count of the number of satisfying records is accumulated as the leaves are traversed. The data pages themselves are not visited, and the record IDs do not need to be decrypted.

Figure 6 shows the performance of this query under various data sizes, but fixed selectivity (5%). We can see that with DES encrypted columns, even though an index can be built (with more complex code changes), it is not as efficient as an FCE-based index, since an index search has to decrypt 8-byte DES blocks on the B+ tree search path.
On the other hand, comparisons during an FCE index search are efficient for three reasons: FCE has lower decryption overhead. FCE does not need to encrypt and decrypt pointers in internal nodes, whereas DES may have to, due to the 8-byte block size. Early Stopping (ES) happens during comparisons.

To evaluate how much savings ES contributes, we measure the cost with both ES disabled and enabled (FCEES and FCE-ES, respectively), in Figure 6. Observe that the effectiveness of ES depends on the value distribution of the column. If the column has mostly small integer values (say, all less than 6, for FCE-ES), the ES is less effective than when the column values are uniformly distributed in the range of [, 3 ] (FCE-ES). Note that for some data types, such as character strings, it is less likely to have many common, redundant prefix bits between values and ES is more effective.

7.4 Variations of Queries

The next set of experiments investigates the performance impact of having to fully decrypt the indexed column at leaf nodes for a query. (In the previous COUNT query, we may not fully decrypt it due to Early Stopping.) There are at least two cases for a query: The indexes are covering (i.e., only indexed columns appear in the query), thus there is no need to decrypt record IDs or read the base tables. Therefore only the keys need to be decrypted in the leaf nodes. The indexes are non-covering, therefore both the keys and record IDs in an index need to be decrypted.

Example queries corresponding to these two cases are SELECT c from t where c>? AND c<?, and SELECT c, c from t where c>? AND c<?, respectively, where integer-valued c is not encrypted and the index is on the encrypted integer-valued column c. In Figure 7, the lines corresponding to these two queries are labeled FCE-cover and FCE-nc, respectively. We also compare them with DES encryption and with the COUNT query of Section 7.3, which is shown as FCE-count in Figure 7; the index is covering in this case. In all cases, we exclude the cost of retrieving data from the base tables, as that cost is independent of the encryption scheme.
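The query variants above differ mainly in how many leaf-entry bytes must be decrypted, and a small back-of-the-envelope model makes that concrete. The sketch below is ours and rests on assumptions: the 4-byte key and 4-byte record ID split is inferred from the 8-byte <column value, recordid> pairs, the key is assumed to sit at the start of the entry, and the figure of 2 key bytes for an Early Stopping comparison is purely illustrative. It counts bytes only and says nothing about measured running time.

```python
import math

KEY_BYTES = 4  # assumed width of the encrypted key within a leaf entry
RID_BYTES = 4  # assumed width of the encrypted record ID


def decrypted_bytes(unit: int, key_bytes_needed: int, need_rid: bool) -> int:
    """Bytes decrypted for one leaf entry when the cipher works in `unit`-byte blocks.

    The entry is laid out as key followed by record ID. Needing the record ID
    is modeled as needing the whole entry (the non-covering case).
    """
    last_offset = KEY_BYTES + RID_BYTES if need_rid else key_bytes_needed
    # A block cipher must decrypt whole blocks, so round up to the unit size.
    return math.ceil(last_offset / unit) * unit


if __name__ == "__main__":
    variants = {
        "count (ES stops after 2 key bytes, illustrative)": (2, False),
        "covering (full key, no record ID)": (KEY_BYTES, False),
        "non-covering (full key and record ID)": (KEY_BYTES, True),
    }
    for name, (key_bytes, need_rid) in variants.items():
        fce = decrypted_bytes(1, key_bytes, need_rid)  # 1-byte unit
        des = decrypted_bytes(8, key_bytes, need_rid)  # 8-byte block
        print(f"{name}: 1-byte unit={fce} bytes, 8-byte block={des} bytes")
```

Under these assumptions, an 8-byte block always costs 8 bytes per entry regardless of the query, while a 1-byte unit costs 2, 4, and 8 bytes for the three variants.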
Due to its 8-byte block unit of decryption, DES always decrypts the keys and record IDs in the leaf nodes, and hence the DES cost is the same for both queries. Observe that these small variations of queries cause a performance difference for FCE. Due to Early Stopping, FCE-count, which only does comparisons, is faster than FCE-cover, which decrypts all of the key bytes at the leaves. In turn, FCE-cover is faster than FCE-nc because FCE has a small encryption block size (1 byte) and decrypts only as needed (FCE-cover does not decrypt the record IDs in the leaves). In all of the cases, FCE is more efficient than DES.

[Figure 5: Tuple retrieval cost. Time per tuple (microseconds) versus number of tuples retrieved, for Plaintext, DES, and the two FCE configurations.]
[Figure 6: Performance of a range query utilizing an index built under different encryption schemes. Query running time (seconds) versus number of qualified records (5% selectivity), for Plaintext, DES, and the FCE configurations with and without Early Stopping.]

[Figure 7: Performance of slightly different queries under DES and FCE. Execution time (seconds) versus number of qualified records (5% selectivity), for DES, FCE-count, FCE-cover, and FCE-nc.]

8. Related Work

The work of [8] (Goldwasser and Micali) and [3] (Bellare et al.) studied the formal notions of security for encryption. Our information theoretic measure is based on the Real-Or-Random (ROR) definition against chosen-plaintext attack (CPA) [3].

[, 9, 5, 7] are similar to our work in that they typically propose a new scheme of encryption in such a way that efficient query processing on encrypted data is possible. Although there are similarities with our work in [17], their goal is that an untrusted server cannot learn anything about the plaintext, but can still perform search, which is only equality search. We have a different threat model, and our goal is to support fast queries in a DBMS, in particular, to use indices on ciphertext.

The idea in [9] is to map encrypted values to buckets for early filtering without decrypting the value. The result of the rewritten query contains false hits that must be removed in a postprocessing step. We have discussed its security problem in Section , and the performance and security problems are also discussed in [, ].

[1] proposes an order preserving encryption scheme. Although ideal for comparison, it has inherent security problems that we have discussed at length.

9. Conclusions

Encrypting sensitive data in a DBMS becomes more and more crucial for protecting it from being misused by intruders who bypass conventional access control mechanisms and have direct access to the database files. One must study the security of a new scheme in a systematic way. In this paper, we proposed the FCE database encryption scheme and demonstrated its security and efficiency for databases. We discussed indexing issues with FCE and experimentally evaluated the overhead and performance.

10.
Acknowledgments & References

This work was supported by the NSF, under the grants IIS-8657 and IIS-35838, and a gift from Vertica Systems, Inc.

[1] R. Agrawal, J. Kiernan, R. Srikant, and Y. Xu. Order preserving encryption for numeric data. ACM SIGMOD 2004, June 13-18, 2004, Paris, France.
[2] G. Bebek. Anti-tamper database research: Inference control techniques. Technical Report EECS 433 Final Report, Case Western Reserve University, November.
[3] M. Bellare, A. Desai, E. Jokipii, P. Rogaway. A concrete security treatment of symmetric encryption. In Proceedings of the 38th Symposium on Foundations of Computer Science, IEEE, 1997.
[4] H. Cho. TMS3C6 Architecture Overview. http://cx.org/cotet/m87/latest/.
[5] T. M. Cover and J. A. Thomas. Elements of Information Theory. A Wiley-Interscience Publication, 1991.
[6] DES. Data Encryption Standard. FIPS PUB 46, Federal Information Processing Standards Publication, 1977.
[7] O. Goldreich. Foundations of Cryptography. Cambridge University Press, 2003.
[8] S. Goldwasser and S. Micali. Probabilistic Encryption. In J. of Computer and System Sciences, Vol. 28, April 1984, pp. 270-299.
[9] H. Hacigumus, B. R. Iyer, C. Li, and S. Mehrotra. Executing SQL over encrypted data in the database-service-provider model. In Proc. of the ACM SIGMOD Conf. on Management of Data, Madison, Wisconsin, June 2002.
[10] S. Hoory, A. Magen, S. Myers and C. Rackoff. Simple permutations mix well. The 31st International Colloquium on Automata, Languages and Programming (ICALP), Lecture Notes in Computer Science 3142, Springer, 2004, pp. 770-781.
[11] M. Kantarcioglu and C. Clifton. Security issues in querying encrypted data. The 19th Annual IFIP WG 11.3 Working Conference on Data and Applications Security. August 7-10, 2005, Storrs, Connecticut.
[12] E. Kaplan, M. Naor, and O. Reingold. Derandomized constructions of k-wise (almost) independent permutations. In APPROX-RANDOM, pages 354-365, 2005.
[13] M. Mitzenmacher, E. Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.
[14] Oracle Corporation. Database Encryption in Oracle 8i, August.
[15] G. Ozsoyoglu, D. Singer, and S. Chung. Anti-tamper databases: Querying encrypted databases. In Proc.
of the 17th Annual IFIP WG 11.3 Working Conference on Database and Applications Security, Estes Park, Colorado, August 2003.
[16] R. S. Preissig. Data Encryption Standard (DES) Implementation on the TMS3C6. In Texas Instruments Application Report, SPRA7, November.
[17] D. X. Song, D. Wagner, and A. Perrig. Practical techniques for searches on encrypted data. In IEEE Symp. on Security and Privacy, Oakland, California, 2000.
[18] M. Stonebraker, D. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. O'Neil, P. O'Neil, A. Rasin, N. Tran and S. Zdonik. C-Store: A Column Oriented DBMS. In VLDB 2005, Norway.
[19] http://db.csail.mit.edu/projects/cstore/.
[20] TMS3C6 Cache Analysis. In Texas Instruments Application Report, SPRA47, September, 1998.