International Journal of Data Warehousing and Mining, 10(4), 1-25, October-December 2014
DOI: 10.4018/ijdwm.2014100101

Cost Models for Selecting Materialized Views in Public Clouds

Romain Perriot, Clermont Université, Université Blaise Pascal, Aubière Cedex, France
Jérémy Pfeifer, Clermont Université, Université Blaise Pascal, Aubière Cedex, France
Laurent d'Orazio, Clermont Université, Université Blaise Pascal, Aubière Cedex, France
Bruno Bachelet, Clermont Université, Université Blaise Pascal, Aubière Cedex, France
Sandro Bimonte, IRSTEA, Clermont-Ferrand, France
Jérôme Darmont, Laboratoire ERIC, Université de Lyon, Lyon, France

ABSTRACT

Data warehouse performance is usually achieved through physical data structures such as indexes or materialized views. In this context, cost models can help select a relevant set of such performance optimization structures. Nevertheless, selection becomes more complex in the cloud. The criterion to optimize is indeed at least two-dimensional, with monetary cost balancing overall query response time. This paper introduces new cost models that fit into the pay-as-you-go paradigm of cloud computing. Based on these cost models, an optimization problem is defined to discover, among candidate views, those to be materialized to minimize both the overall cost of using and maintaining the database in a public cloud and the total response time of a given query workload. It experimentally shows that maintaining materialized views is always advantageous, both in terms of performance and cost.

Keywords: Cost Models, Materialized Views, Optimization, Pay-as-you-go, Public Clouds, Query Optimization

1. INTRODUCTION

Recently, cloud computing, led by companies such as Google, Microsoft and Amazon, has attracted special attention. This paradigm allows access to on-demand, configurable resources that can be quickly made available with minimal maintenance. According to the pay-as-you-go pricing model, customers only pay for the resources (storage and computing) they actually use.
Performance in the cloud usually relies upon the use of a large number of instances, with parallel computing being transparent to the user.

Data warehouses and OLAP (On-Line Analytical Processing) are technologies for decision support enabling the online analysis of large data volumes. These technologies rely on optimization techniques such as indexes, caches or denormalized logical models that allow multidimensional analysis (aggregations on multiple axes of analysis) while ensuring good performance. With the ever broader availability of clouds, organizations tend to deploy data analytics in the cloud to benefit from computing power and cheap storage, and to eliminate maintenance costs.

In this article, we focus on issues related to materializing views in the cloud and its impact on the pay-as-you-go pricing model. Materialized views are used to physically store the results of relevant and frequent queries to reduce response time. A major challenge is to select the best views to materialize. Traditionally, the criteria used for view selection mainly include storage and maintenance costs (Aouiche & Darmont, 2009; Baril & Bellahsene, 2003). In the cloud, storage is virtually infinite, so storing all views could be envisaged. However, materialized views still incur storage and maintenance costs. The performance optimization problem is then to find a trade-off between response time and cost, and depends on the needs and assets of a particular user. At one end of the spectrum, users under a hard budget constraint can accept long response times, while at the other end, users may disregard cost if they need very fast responses.

We address the multi-criteria optimization problem of selecting a set of views to materialize in order to optimize both the budgetary cost of storing and querying a data warehouse in the cloud, and the overall response time. To achieve this goal, our main contribution is the design of cost models for storing, maintaining and querying materialized views in the cloud.
This article extends our previous proposal (Nguyen, Bimonte, d'Orazio, & Darmont, 2012) in three ways. First, we propose more flexible cost models that can be applied to different vendors. Second, we introduce a new formulation that solves the optimization problem using a CPLEX solver. Finally, our solution is experimentally validated with the Star Schema Benchmark (O'Neil, O'Neil, Chen, & Revilak, 2009).

The remainder of this paper is organized as follows. In Section 2, we provide the background information that is used throughout the paper. In Sections 3 and 4, we define cost models for cloud data management and materializing views, respectively. In Section 5, we describe the optimization process that is based on these cost models. In Section 6, we present an experimental evaluation and the first performance analyses of our models. In Section 7, we discuss the state of the art and compare it to our approach. Finally, in Section 8, we conclude this paper and hint at future research directions.

2. BACKGROUND

We present in this section the background information related to view materialization in the cloud. We first introduce a simple fictitious use case that serves as a running example throughout this paper. Then, we describe different pricing models in the cloud. Finally, we briefly recall the principle of view materialization.

2.1. Running Example

To illustrate our work, we rely on a simulated dataset storing the sales of an international supply chain. Business users need to analyze the total profit per day, month, and year; and per administrative department, region, and country. Our full dataset stores 10 years (2000-2010) of sales data. Its size is 500 GB. We run over this dataset a query workload \(Q\) that includes such queries as \(Q_1\) = "sales per year and country", whose processing time is 0.2 hour. The size of \(Q\)'s result is 10 GB. A typical materialized view we may consider to optimize overall response time is \(V_1\) = "sales per month and country", whose processing time is 0.1 hour. The whole set of selected materialized views is denoted \(V\). The size of \(V\) is 50 GB. Finally, the times to process
\(Q\) with and without exploiting \(V\) are 40 hours and 50 hours, respectively.

2.2. Cloud Pricing Policies

Cloud Service Providers (CSPs) supply a pool of resources, such as hardware (CPU, storage, networks), development platforms, and services. There are many CSPs on the market, such as Amazon, Google, and Microsoft. Each CSP offers different services and pricing. This paper relies on limited, yet representative enough, models that include the main, commonly billed elements, i.e., CPU, storage, and bandwidth consumption. These models are fully compliant with both relational (Amazon RDS, SQL Azure, Google Cloud SQL) and data intensive systems (MapReduce, Pig, SCOPE, Hive, Jaql) since, for now, query response times are considered parameters of these models.

In order for the reader to have an overview of the pricing policies taken into account in the proposed models, we present in this section an example for both Microsoft Azure and Amazon Web Services (AWS). Even if performances (response time and storage volume) differ from one system to another, identical values will be used in the examples for the sake of clarity. The objective of this work is indeed not to compare the different providers.

Microsoft Azure (Microsoft, 2013) and Amazon Elastic Compute Cloud (EC2) (Amazon, 2013) provide computing resources. Different instance configurations can be rented (micro, extra small, small, large, extra large, etc.) at various prices, as illustrated in Table 1 and Table 2. For example, the costs for a small instance (consisting of 1.7 GB of RAM, 1 EC2 Computing Unit and 160 GB of local storage under Linux for Amazon EC2; and of 1.75 GB of RAM, 1 Computing Unit and 224 GB of local storage under Windows for Azure) are $0.06 and $0.09 per hour for Amazon and Azure, respectively.

Bandwidth consumption is billed with respect to data volume (Table 3). Within the Amazon and Azure models, input data transfers are free, whereas the cost of output data transfers varies with respect to data volume. Note that the same prices are applied by both providers.
Table 1. EC2 computing prices

Instance configuration    Price per hour
t1.micro                  $0.02
m1.small                  $0.06
m1.medium                 $0.12
m1.large                  $0.24
m1.xlarge                 $0.48

Table 2. Azure computing prices

Instance configuration    Price per hour
Extra small               $0.02
Small                     $0.09
Medium                    $0.18
Large                     $0.36
Extra large               $0.72

Table 3. Amazon and Microsoft bandwidth prices

              Data volume       Price per month
Input data    Any input data    Free
Output data   First 5 GB        Free
              Up to 10 TB       $0.12 per GB
              Next 40 TB        $0.09 per GB
              Next 100 TB       $0.07 per GB
              Next 350 TB       $0.05 per GB

Finally, CSPs supply storage capabilities. Prices usually vary with respect to data volume. However, as mentioned previously, CSPs provide different services, the pricing models differing from one to another. Amazon EBS proposes a per-instance model, whereas Amazon S3 and SQL Azure enable a global storage. Amazon EBS (Table 4) proposes a fixed price, whereas Amazon S3 (Table 5) and SQL Azure (Table 6) apply a decreasing rate as volume increases.

Table 4. Amazon EBS storage price

Price per month
$0.10 per GB

Table 5. Amazon S3 storage prices

Data volume    Price per month
First 1 TB     $0.095 per GB
Next 49 TB     $0.08 per GB
Next 450 TB    $0.07 per GB
...

Table 6. Microsoft Azure storage prices

Data volume    Price per month
First 1 TB     $0.053 per GB
Next 49 TB     $0.049 per GB
Next 450 TB    $0.045 per GB
...

2.3. Materialized Views

In Database Management Systems, a view is a virtual table associated to a query answer. Views help indirectly save complex queries, format the same data in different forms, support logical independence, and reinforce security by masking some pieces of data from unauthorized users. Materializing a view, i.e., storing it physically into a table, further helps improve response time by avoiding recomputing the corresponding query each time the view itself is queried. However, materialized views must be refreshed when source data are updated, which induces some maintenance overhead.

In this work, we assume that we have at our disposal a set of candidate views for materialization that have already been preselected by an existing view selection method (e.g., Baril & Bellahsene, 2003). We aim at choosing the best candidates with respect to the cloud pay-as-you-go model, taking pricing constraints into account before any view materialization. Research perspectives include extending existing view selection algorithms to consider pricing aspects in order to supply a uniform process.

3. CLOUD PRICING MODELS

This section presents general cost models for data management in the cloud, i.e., without considering the use of materialized views. In cloud computing, customers rent resources from a CSP to run some applications. Figure 1 recalls the costs involved (Section 2.2), i.e., bandwidth consumption for input data transfer and query result retrieval, data storage, and application processing time.

Figure 1. Costs involved in cloud data management

Let \(C_c\) be the sum of computing costs, \(C_s\) be the sum of storage costs, and \(C_t\) be the sum of data transfer costs. Then, the total cost \(C\) for cloud data management is:

\[ C = C_c + C_s + C_t \qquad (1) \]

Let us define the general parameters and functions that we use to express our cost models (Table 7). Let \(Q = \{Q_i\}_{i=1..n_Q}\) be the query workload and \(A = \{A_i\}_{i=1..n_Q}\) the answers to the queries. The whole dataset is denoted \(D\). Function \(s(X)\) returns the size in GB of \(X\), e.g., \(s(A_i)\) is the size of the answer \(A_i\). Function \(t(X)\)
returns the storage time of \(X\), e.g., \(t(D)\) is the storage time of dataset \(D\) in the cloud.

Table 7. General parameters

Parameter                           Description
\(Q = \{Q_i\}_{i=1..n_Q}\)          Query workload
\(A = \{A_i\}_{i=1..n_Q}\)          Query answers
\(D = \{D_k\}_{k=1..n_D}\)          Dataset
\(P\)                               Cloud service provider
\(IC = \{IC_j\}_{j=1..n_{IC}}\)     Instance configuration
\(s(X)\)                            Size in GB of \(X\)
\(t(X)\)                            Storage time of \(X\)

3.1. Piecewise Linear Functions

Some costs (transfer and storage costs especially) are piecewise linear functions. In this paper, we define a piecewise linear cost function \(C(x)\) as a function decomposed into segments (Figure 2). Each segment \(e\) represents \(C(x)\) in an interval of input values \([x_e; x_{e+1}[\) and is characterized by a gradient \(a_e\) and an initial cost \(b_e = C(x_e)\). Considering a segment \(e\) such that \(x \in [x_e; x_{e+1}[\), the cost is \(C(x) = c_e(x) = a_e \times (x - x_e) + b_e\).

Figure 2. Piecewise linear cost function

The piecewise linear function \(C(x)\) is thus expressed as follows:
\[ C(x) = c_e(x) = a_e \times (x - x_e) + b_e \qquad (2) \]

where \(e\) is such that \(x_e \le x < x_{e+1}\).

3.2. Data Transfer Cost

Data transfer cost \(C_t\) depends on the size of uploaded data, i.e., queries \(Q_i\) and dataset \(D\) (including the initial dataset and additional inserted data if any), on the size of downloaded data, i.e., query answers \(A_i\), and on the pricing model applied by the provider \(P\). The cost can be decomposed into an upload transfer cost \(C_t^-\) and a download transfer cost \(C_t^+\):

\[ C_t(D, Q, A, P) = C_t^-(D, Q, P) + C_t^+(A, P) \qquad (3) \]

Note that most cloud providers, such as Amazon or Microsoft, do not charge for input data transfer, so input queries, initial and inserted data can be ignored for now. As a consequence, data transfer cost can be reduced to \(C_t^+(A, P)\), and thus depends only on \(A\) and \(P\):

\[ C_t(A, P) = C_t^+(A, P) \qquad (4) \]

The pricing of Azure and Amazon EC2 is variable. It is null for the first 5 GB. Then, it becomes $0.12 per GB up to 10 TB, $0.09 per GB for the next 40 TB, and so on (see Section 2.2). The cost function \(C_t\) of such a pricing policy is a piecewise linear function (cf. Formula 2).

Example 1: In our running example, with 10 GB of bandwidth consumption, \(x_2 \le s(A) < x_3\), so the data transfer cost is:

\[ C_t(A, P) = c_2(s(A)) = a_2 \times (s(A) - x_2) + b_2 = 0.12 \times (10 - 5) + 0 = \$0.60 \]

3.3. Computing Cost

Computing cost \(C_c\) depends on the workload \(Q\), the instance configuration \(IC\), i.e., the type (micro, small, medium, etc.) and the number of nodes to be used, and the pricing model applied by provider \(P\): \(C_c(Q, IC, P)\). Both Amazon and Microsoft associate a price with a type of instance. Each instance may bear variable performance (with respect to its number of CPUs, its available RAM, etc.), and thus different costs. Providers then compute the price to be paid per instance as the product of usage time by instance price. Finally, they sum the results for all allocated instances. Let us assume that queries are executed on an instance configuration \(IC\) composed of \(n_{IC}\) computing instances \(IC_j\): \(IC = \{IC_j\}_{j=1..n_{IC}}\).
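Before moving on, the piecewise linear function of Formula 2 and the transfer cost of Example 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the helper name `piecewise_linear` is hypothetical, initial costs \(b_e\) are assumed to follow from continuity (with \(b_1 = 0\)), 1 TB is taken as 1024 GB as in the later examples, and the last segment of Table 3 is simply extended beyond the volumes the table lists.

```python
from bisect import bisect_right

def piecewise_linear(breakpoints, gradients):
    """Build C(x) from segment starts x_e and gradients a_e (Formula 2).

    Assumption: b_e = C(x_e) is derived by continuity, starting at b_1 = 0.
    """
    initial_costs = [0.0]
    for e in range(1, len(breakpoints)):
        width = breakpoints[e] - breakpoints[e - 1]
        initial_costs.append(initial_costs[-1] + gradients[e - 1] * width)

    def cost(x):
        if x < 0:
            raise ValueError("volume must be non-negative")
        # Index of the segment e such that x_e <= x < x_{e+1}
        e = bisect_right(breakpoints, x) - 1
        return gradients[e] * (x - breakpoints[e]) + initial_costs[e]

    return cost

TB = 1024  # GB per TB, as in the 0.5 TB = 512 GB convention of the examples

# Output bandwidth prices of Table 3 (volumes in GB, prices in $ per GB)
transfer_cost = piecewise_linear(
    breakpoints=[0, 5, 10 * TB, 50 * TB, 150 * TB],
    gradients=[0.0, 0.12, 0.09, 0.07, 0.05],
)

# Example 1: 10 GB downloaded, i.e., 0.12 * (10 - 5) + 0
print(round(transfer_cost(10), 2))
```

Encoding a pricing policy as two parallel lists keeps the function reusable for the storage prices of Tables 5 and 6, which have the same segment structure.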
The cost for renting instance \(IC_j\) is denoted \(c_c(IC_j)\). The processing time of query \(Q_i\) on instance \(IC_j\) is denoted \(t(Q_i, IC_j)\). Then, the processing cost of running the set of queries \(Q = \{Q_i\}_{i=1..n_Q}\) can be expressed by the following function:

\[ C_c(Q, IC, P) = \sum_{i=1}^{n_Q} \sum_{j=1}^{n_{IC}} t(Q_i, IC_j) \times c_c(IC_j) \qquad (5) \]

Example 2: In our running example, let us consider that query workload \(Q = \{Q_1\}\) is processed in 50 hours on two small instances of Amazon EC2. Then, its processing cost is:
\[ C_c(Q, IC, EC2) = t^{EC2}(Q_1, IC_1) \times c_c^{EC2}(IC_1) + t^{EC2}(Q_1, IC_2) \times c_c^{EC2}(IC_2) = 50 \times 0.06 + 50 \times 0.06 = \$6 \]

3.4. Storage Cost

Storage cost \(C_s\) depends on the size and storage time of the whole dataset \(D\), the instance configuration \(IC\), and the pricing policy of provider \(P\): \(C_s(D, IC, P)\). Storage time \(t(D)\) of dataset \(D\) can be divided into periods such that \(t(D) = \sum_{k=1}^{n_D} t(D_k)\), where \(D_k\) represents the whole dataset for period \(k\). In each period, the size \(s(D_k)\) of the stored data is fixed. Total storage cost is thus the sum of the prices to be paid for each period:

\[ C_s(D, IC, P) = \sum_{k=1}^{n_D} C_s(D_k, IC, P) \qquad (6) \]

Within Amazon EBS, total storage cost is the product of data size by storage time, fixed price \(c_s^{EBS}\) per GB, and the number of computing instances \(n_{IC}\). Indeed, as mentioned previously, each EC2 instance depends on a different EBS volume. Then storage cost can be expressed by:

\[ C_s(D, IC, EBS) = \sum_{k=1}^{n_D} c_s^{EBS} \times s(D_k) \times t(D_k) \times n_{IC} \qquad (7) \]

Example 3: We use Amazon EBS for storage pricing (Table 4) and two small EC2 instances, and we consider that 0.5 TB (512 GB) of data have been stored for 12 months. At the beginning of the 8th month, we insert 2 TB (2048 GB) of new data in the cloud. Thus we have two periods. The storage cost is:

\[
\begin{aligned}
C_s(D, IC, EBS) &= c_s^{EBS} \times s(D_1) \times n_{IC} \times t(D_1) + c_s^{EBS} \times s(D_2) \times n_{IC} \times t(D_2) \\
&= 0.10 \times 512 \times 2 \times 7 + 0.10 \times (512 + 2048) \times 2 \times (12 - 7) \\
&= \$3276.80
\end{aligned}
\]

The pricing of Amazon S3 is variable. It is $0.095 per GB for the first TB. Then, it becomes $0.08 per GB for the next 49 TB, and so on (Section 2.2). Unlike EBS, S3 price is independent from the number of EC2 instances. This pricing \(c_s^{S3}\) is a piecewise linear function (cf. Formula 2), and the storage cost can be expressed as follows:

\[ C_s(D, IC, S3) = \sum_{k=1}^{n_D} c_s^{S3}(s(D_k)) \times t(D_k) \qquad (8) \]

Example 4: We use Amazon S3 for storage pricing (Table 5) and consider the same scenario as in Example 3. Thus we have two periods and, due to their data volume, segment 1 of the cost function is considered for period 1, and segment 2 for period 2.
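The computing and storage models of Formulas 5, 7 and 8 lend themselves to a short numerical check. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, storage periods are given as (size in GB, duration in months) pairs, and the S3 price function covers only the first two segments of Table 5 (volumes up to 50 TB, with 1 TB = 1024 GB).

```python
def computing_cost(times, hourly_prices):
    """Formula 5: sum over queries i and instances j of t(Q_i, IC_j) * c_c(IC_j).

    times[i][j] is a processing time in hours, hourly_prices[j] a price in $/hour.
    """
    return sum(t * hourly_prices[j] for row in times for j, t in enumerate(row))

def ebs_storage_cost(periods, price_per_gb_month, n_instances):
    """Formula 7: fixed price, paid once per EC2 instance."""
    return sum(price_per_gb_month * size * months * n_instances
               for size, months in periods)

def s3_monthly_price(size_gb):
    """Piecewise linear S3 price (Table 5), first two segments only."""
    first_tb = 1024
    if size_gb <= first_tb:
        return 0.095 * size_gb
    return 0.095 * first_tb + 0.08 * (size_gb - first_tb)

def s3_storage_cost(periods):
    """Formula 8: the price depends on volume but not on the number of instances."""
    return sum(s3_monthly_price(size) * months for size, months in periods)

# Scenario of Examples 3 and 4: 512 GB for 7 months, then 512 + 2048 GB for 5 months
periods = [(512, 7), (512 + 2048, 5)]

print(round(computing_cost([[50, 50]], [0.06, 0.06]), 2))  # Example 2
print(round(ebs_storage_cost(periods, 0.10, 2), 2))        # Example 3
print(round(s3_storage_cost(periods), 2))                  # Example 4
```

Under these assumptions the three calls reproduce the amounts of Examples 2 to 4, which is a convenient way to validate price tables before plugging them into the optimization of Section 5.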
The whole storage cost is shown in Box 1.

Box 1.

\[
\begin{aligned}
C_s(D, IC, S3) &= c_s^{S3}(s(D_1)) \times t(D_1) + c_s^{S3}(s(D_2)) \times t(D_2) \\
&= \left( a_1^{S3} \times (s(D_1) - x_1^{S3}) + b_1^{S3} \right) \times t(D_1) + \left( a_2^{S3} \times (s(D_2) - x_2^{S3}) + b_2^{S3} \right) \times t(D_2) \\
&= (0.095 \times (512 - 0) + 0) \times 7 + (0.08 \times ((512 + 2048) - 1024) + 1024 \times 0.095) \times 5 \\
&= \$1441.28
\end{aligned}
\]

4. COST MODELS FOR MATERIALIZING VIEWS IN THE CLOUD

This section presents cost models for materializing views in the cloud, relying on the cost models developed in Section 3. We assume here that queries are executed on a constant number \(n_{IC}\) of identical instances \(IC_0\) (\(IC_j = IC_0, \forall j = 1..n_{IC}\)). In future work, we shall consider the evaluation process on multiple, variable instances.

Let \(V_{cand} = \{V_k\}_{k=1..n_V}\) be a set of candidate views for materialization provided by an existing view selection technique (cf. Section 2.3). In this section, we assume that the views to be materialized have been selected from the client side, outputting a final set of views \(V \subseteq V_{cand}\) that are materialized in the cloud. The problem of choosing the best set of views \(V\) from \(V_{cand}\) based on the cost models presented here is addressed in the next section.

4.1. Data Transfer Cost

Materializing views helps save bandwidth and benefit from the computing performance of the cloud. With materialized views created in the cloud, transfer costs due to materialization are null. As a consequence, total transfer cost \(C_t\) is not impacted and remains expressed by Formula 4.

4.2. Computing Cost

Using materialized views implies modifying the computing cost model, since query processing may exploit materialized views, and views must be materialized and maintained. Computing cost \(C_c\) now depends on \(V\): \(C_c(Q, V, IC, P)\). Applying Amazon and Microsoft pricing models with materialized views and considering that the cloud consists of a constant number of identical instances, Formula 5 thus becomes:

\[ C_c(Q, V, IC, P) = T(Q, V, P) \times c_c(IC_0) \times n_{IC} \qquad (9) \]

\(T(Q, V, P)\) is the total computing time, which is the sum of the time \(T_{proc}(Q, V, P)\) for processing the queries in workload \(Q\) using the set \(V\) of materialized views, the time \(T_{mat}(V, P)\) for materializing these views, and the time \(T_{maint}(V, P)\) for maintaining them.
As a consequence, the total computing time can be expressed as:

\[ T(Q, V, P) = T_{proc}(Q, V, P) + T_{mat}(V, P) + T_{maint}(V, P) \qquad (10) \]

If a view is materialized, its associated query must be executed, which must be paid for in the cloud. Let the materialization time of view \(V_k\) be \(t_{mat}(V_k)\). The total materialization time is:

\[ T_{mat}(V, P) = \sum_{V_k \in V} t_{mat}(V_k) \qquad (11) \]

The maintenance cost of materialized views is directly proportional to the time required for updating materialized views when they are impacted by modifications of the source dataset. Note that we consider that querying and
maintenance do not occur at the same time. For example, queries are posed during daytime and maintenance is performed during nighttime. Let the maintenance time of view \(V_k\) be \(t_{maint}(V_k)\). Then, the total maintenance time of \(V\) is:

\[ T_{maint}(V, P) = \sum_{V_k \in V} t_{maint}(V_k) \qquad (12) \]

When using materialized views, query processing time is defined by two main parameters: query workload \(Q\) and set of materialized views \(V\). Queries may use the contents of materialized views instead of recomputing their results. Note that we consider that \(Q\) is fixed; variable workloads are left for future work. Since views are usually materialized with respect to a given workload and ours is fixed, \(V\) is also fixed. Let \(t(Q_i, V)\) be the processing time of query \(Q_i\) when exploiting the set of materialized views \(V\). Thus, the total processing time of \(Q\) against \(V\) is:

\[ T_{proc}(Q, V, P) = \sum_{i=1}^{n_Q} t(Q_i, V) \qquad (13) \]

4.3. Storage Cost

Using materialized views does not impact the storage cost model as presented by Formulas 6, 7 and 8. Exploiting materialized views to enhance query performance implies storing them in the cloud and paying the corresponding cost. As a consequence, some data can be duplicated. In that case, the size of \(D + V\), i.e., \(s(D) + \sum_{V_k \in V} s(V_k)\), is used instead of the size \(s(D)\) of \(D\) alone in the storage cost models. Therefore, the storage cost model depending on \(D\) and \(V\) can be expressed as:

\[ C_s(D, V, IC, P) = C_s(D + V, IC, P) \qquad (14) \]

Note that we assume that original data and materialized views are stored for the whole considered storage period.

Example 5: In our running example with Amazon S3, the dataset (0.5 TB) has been stored for a year, and the size of duplicated data due to materialized views is 50 GB. In addition, no data are inserted during the considered period. Thus, we have a single period, and the storage cost is:

\[ C_s(D, V, IC, S3) = (512 + 50) \times 0.095 \times 12 = \$640.68 \]

5. OPTIMIZING VIEW MATERIALIZATION IN THE CLOUD

In this section, we investigate how to select the views to materialize in order to improve query performance with a minimum overhead of storage cost. We define optimization problems to select the best set of materialized views by exploiting the cost models introduced in Section 4. These problems are expressed here as linear programs with continuous and integer variables, in order to be solved efficiently using a mixed-integer programming (MIP) solver such as CPLEX. However, for large instances, MIP solvers may not provide an optimal solution in a limited time. Thus, a GRASP heuristic (Feo & Resende, 1995) is proposed to find good solutions faster.

5.1. Optimization Objectives

Based on the ideas in Kllapi, Sitaridi, Tsangaris, and Ioannidis (2011), we propose three optimization problems, labelled \(MV_1\) to \(MV_3\), with
different objective functions to satisfy the needs and capacity of customers:

Problem \(MV_1\): Find a set of views \(V\) that minimizes response time \(T_{proc}\) under budget limit \(C_{max}\) for total cost \(C\) (See Box 2);

Problem \(MV_2\): Find a set of views \(V\) that minimizes total cost \(C\) under limit \(T_{max}\) for response time \(T_{proc}\) (See Box 3).

Solving the bi-objective optimization problem (i.e., minimizing both \(C\) and \(T_{proc}\)) is not directly addressed in this paper, as it requires specific optimization techniques (e.g., Coello, Lamont, & Veldhuizen, 2007) that differ from mono-objective optimization techniques. It is generally not possible to provide a single solution that optimizes both objectives. However, nondominated solutions can be provided (i.e., solutions of the Pareto frontier), meaning solutions that cannot be improved on one objective without worsening the other objective (Ehrgott, 2005). The model presented here is fully valid for multi-objective optimization, by removing the budget and response time limits. We can consider using exact methods (e.g., the Two-Phase Method (Visée, Teghem, Pirlot, & Ulungu, 1998)) or approximate methods (e.g., NSGA-II, Non-dominated Sorting Genetic Algorithm-II (Deb, Pratap, Agarwal, & Meyarivan, 2002)) to solve the bi-objective problem. The former approach is based on solving the following problem \(MV_3\), which aims at optimizing both objectives with a coefficient \(\alpha\) setting the relative importance of the response time criterion against the cost criterion:

Problem \(MV_3\): Find a set of views \(V\) that is a trade-off between minimum response time \(T_{proc}\) and minimum total cost \(C\) (See Box 4).

5.2. Mixed Integer Programming Formulation

In order to select the set of views \(V\) to materialize, we rely on an existing algorithm, such as that of Baril and Bellahsene (2003), to obtain a set of candidate views for materialization \(V_{cand} = \{V_k\}_{k=1..n_V}\). In a first step, we make some assumptions on \(V_{cand}\).
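To make the three objectives concrete before the formulation is detailed, the following Python sketch evaluates the cost model of Section 4 (Formulas 9 to 14) on a tiny instance and solves \(MV_1\), \(MV_2\) and \(MV_3\) by exhaustive enumeration. It is a toy illustration, not the method of this paper: the second view, all maintenance times and the per-subset processing times are hypothetical, storage uses only the first S3 segment as a flat rate, and enumeration is only viable for a handful of candidates.

```python
from itertools import combinations

# Toy instance: values not taken from the running example are assumptions.
PRICE_PER_HOUR = 0.06   # c_c(IC_0), one small EC2 instance, $/hour
N_IC = 2                # number of identical instances
STORAGE_PRICE = 0.095   # flat $/GB/month (first S3 segment only)
MONTHS = 12             # storage period t(D)
DATASET_GB = 512        # s(D)
TRANSFER_COST = 0.60    # C_t, constant with respect to view selection

VIEWS = {               # name: (size in GB, t_mat in hours, t_maint in hours)
    "V1": (50, 0.1, 1.0),
    "V2": (200, 0.5, 4.0),   # hypothetical second candidate
}
T_PROC = {              # T_proc(Q, V) in hours for each subset V
    frozenset(): 50.0,
    frozenset({"V1"}): 40.0,
    frozenset({"V2"}): 35.0,
    frozenset({"V1", "V2"}): 30.0,
}

def total_cost(selected):
    """C = C_c + C_s + C_t for the set of materialized views `selected`."""
    t_mat = sum(VIEWS[v][1] for v in selected)
    t_maint = sum(VIEWS[v][2] for v in selected)
    computing = (T_PROC[selected] + t_mat + t_maint) * PRICE_PER_HOUR * N_IC
    size = DATASET_GB + sum(VIEWS[v][0] for v in selected)
    storage = size * STORAGE_PRICE * MONTHS   # single storage period
    return computing + storage + TRANSFER_COST

def all_subsets():
    names = sorted(VIEWS)
    return [frozenset(c) for r in range(len(names) + 1)
            for c in combinations(names, r)]

def solve_mv1(c_max):
    """Minimize T_proc subject to C <= C_max (None if infeasible)."""
    feasible = [v for v in all_subsets() if total_cost(v) <= c_max]
    return min(feasible, key=T_PROC.get) if feasible else None

def solve_mv2(t_max):
    """Minimize C subject to T_proc <= T_max (None if infeasible)."""
    feasible = [v for v in all_subsets() if T_PROC[v] <= t_max]
    return min(feasible, key=total_cost) if feasible else None

def solve_mv3(alpha):
    """Minimize alpha * T_proc + (1 - alpha) * C."""
    return min(all_subsets(),
               key=lambda v: alpha * T_PROC[v] + (1 - alpha) * total_cost(v))

print(sorted(solve_mv1(c_max=700)))   # best response time within budget
print(sorted(solve_mv2(t_max=36)))    # cheapest set meeting the time limit
print(sorted(solve_mv3(alpha=1.0)))   # response time only
```

With these made-up figures storage dominates the bill, so the budget limit in `solve_mv1` is what prevents materializing every view; different prices or workloads can reverse that balance.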
Notably, determining the gain of using a view, or several views, for a given query requires either knowing enough details on the functioning of the cloud to be able to express this gain analytically, or running many experiments to measure or estimate the gain.

Box 2.

\[
(MV_1) \quad
\begin{aligned}
&\text{minimize } T_{proc} \\
&\text{subject to } C \le C_{max} \\
&\text{and constraints that define: the cost model, the selection of views}
\end{aligned}
\qquad (15)
\]
Box 3.

\[
(MV_2) \quad
\begin{aligned}
&\text{minimize } C \\
&\text{subject to } T_{proc} \le T_{max} \\
&\text{and constraints that define: the cost model, the selection of views}
\end{aligned}
\qquad (16)
\]

Box 4.

\[
(MV_3) \quad
\begin{aligned}
&\text{minimize } \alpha \times T_{proc} + (1 - \alpha) \times C \\
&\text{subject to constraints that define: the cost model, the selection of views}
\end{aligned}
\qquad (17)
\]

To simplify matters, we assume here that each query \(Q_i\) can use only one single view among a set of candidate views, meaning that for each query \(Q_i\), there is a set \(V^i \subseteq V_{cand}\) of candidate views, and at most one view in this set can be selected for query \(Q_i\). In future work, one can consider the candidate views of query \(Q_i\) to be \(n_i\) candidate sets of views \(V^i = \{V_{ij}\}_{j=1..n_i}\), with \(V_{ij} \subseteq V_{cand}\). The objective will thus be to select a set of views \(V_{ij}\) for a query \(Q_i\) instead of a single view.

The decision of the optimization problem is to select a view \(V_k\) for each query \(Q_i\). For this purpose, decision variables \(x_{ik}\) are introduced: \(x_{ik} = 1\) if query \(Q_i\) uses \(V_k\), and \(x_{ik} = 0\) otherwise. Let us define \(g_{ik}\) as the gain on response time for query \(Q_i\) when using view \(V_k\) with cloud provider \(P\). Gains are considered constant in this problem, meaning that they have been measured or estimated upstream (from experiments, statistics, or models). The response time of query \(Q_i\) using views is expressed as follows:

\[ t(Q_i, V) = t_i - \sum_{k=1}^{n_V} g_{ik} \times x_{ik} \qquad (18) \]
where \(t_i\) is the response time with cloud provider \(P\) for query \(Q_i\) without any view, and assuming that no more than one view is selected for query \(Q_i\). This latter point is ensured by the following constraint:

\[ \sum_{k=1}^{n_V} x_{ik} \le 1, \quad \forall i = 1..n_Q \qquad (19) \]

Decision variables \(y_k\) are also introduced to determine whether view \(V_k\) is materialized: \(y_k = 1\) if view \(V_k\) is materialized, and \(y_k = 0\) otherwise. A view \(V_k\) is materialized if it is used by at least one query (i.e., when at least one query \(Q_i\) exists such that \(x_{ik} = 1\)), which is expressed by the following constraint:

\[ x_{ik} \le y_k, \quad \forall i = 1..n_Q, \ \forall k = 1..n_V \qquad (20) \]

Moreover, there is no need to materialize \(V_k\) if it is not used at all, which is expressed by the following constraint:

\[ y_k \le \sum_{i=1}^{n_Q} x_{ik}, \quad \forall k = 1..n_V \qquad (21) \]

As for the gains on response time, materialization and maintenance times (\(t_{mat}(V_k)\) and \(t_{maint}(V_k)\)) have been estimated upstream. These times must be considered for a view \(V_k\) only if this view is materialized:

\[
\begin{aligned}
T_{mat}(V, P) &= \sum_{k=1}^{n_V} t_{mat}(V_k) \times y_k \\
T_{maint}(V, P) &= \sum_{k=1}^{n_V} t_{maint}(V_k) \times y_k
\end{aligned}
\qquad (22)
\]

As long as we assume that no data are inserted in the dataset during operation, we can consider a single period of length \(t(D)\) for storage. The size \(S\) of the stored data is:

\[ S = s(D) + \sum_{k=1}^{n_V} s(V_k) \times y_k \qquad (23) \]

and their storage cost is:

\[ C_s(D, V, IC, P) = c_s(S) \times t(D) \times n \qquad (24) \]

where \(c_s\) is the storage cost function (we assume that it is piecewise linear, Section 3.4) applied by provider \(P\), and \(n = n_{IC}\) if provider \(P\) bills storage per instance (like Amazon EBS, Section 3.4) or \(n = 1\) otherwise.

Note that transfer cost \(C_t\) is not impacted by the selection of views. It is a constant value in the optimization problem and remains expressed by Formula 4.

To sum up, the whole optimization problem, for instance \(MV_1\) (\(MV_2\) and \(MV_3\) being very similar), is finally expressed as shown in Box 5.

Box 5.

\[
(MV_1) \quad
\begin{aligned}
&\text{minimize } T_{proc} = \sum_{i=1}^{n_Q} \left( t_i - \sum_{k=1}^{n_V} g_{ik} \times x_{ik} \right) \\
&\text{subject to} \\
&C = C_c + C_s + C_t \le C_{max} \\
&C_c = (T_{proc} + T_{mat} + T_{maint}) \times c_c(IC_0) \times n_{IC} \\
&C_s = c_s(S) \times t(D) \times n \\
&S = s(D) + \sum_{k=1}^{n_V} s(V_k) \times y_k \\
&T_{mat} = \sum_{k=1}^{n_V} t_{mat}(V_k) \times y_k \\
&T_{maint} = \sum_{k=1}^{n_V} t_{maint}(V_k) \times y_k \\
&\sum_{k=1}^{n_V} x_{ik} \le 1, \quad \forall i = 1..n_Q \\
&x_{ik} \le y_k, \quad \forall i = 1..n_Q, \ \forall k = 1..n_V \\
&y_k \le \sum_{i=1}^{n_Q} x_{ik}, \quad \forall k = 1..n_V \\
&x_{ik} \in \{0, 1\}, \quad \forall i = 1..n_Q, \ \forall k = 1..n_V \\
&y_k \in \{0, 1\}, \quad \forall k = 1..n_V
\end{aligned}
\qquad (25)
\]

Notice that \(c_s\) is piecewise linear. It can be reformulated with linear constraints on continuous and integer variables (cf. Chen et al., 2010, page 64), making the formulation fully linear.

5.3. GRASP Heuristic

GRASP (Greedy Randomized Adaptive Search Procedure) is a metaheuristic with two phases: a randomized construction and a local search (Feo & Resende, 1995). The two phases are repeated a given number of times (\(it_{GR}\) times), and the best solution of all iterations is kept. In the construction phase, which is a greedy approach, a solution is iteratively constructed by adding one element at a time in the solution. Then, the local search iteratively improves the solution obtained in the first phase by moving from solution to solution in the space of candidate solutions (by adding or removing elements in the solution).

There are two sets of decision variables in the view materialization problem: \(y_k\), which indicates whether view \(V_k\) is materialized, and \(x_{ik}\), which indicates whether view \(V_k\) is used by query \(Q_i\). Note that if all \(y_k\) are fixed, then finding the optimal value for all \(x_{ik}\) is straightforward: selecting the materialized view \(V_k\) that maximizes gain \(g_{ik}\) for each query \(Q_i\) provides an optimal solution for \(x_{ik}\). In our GRASP heuristic, a solution is thus represented by vector \(y = (y_k)_{k=1..n_V}\), and adding an element in the solution means materializing a view (i.e., for a given \(k\), setting \(y_k = 1\)).

5.3.1. Randomized Construction

The construction phase starts with a solution where no view is materialized. Iteratively, one view is selected and materialized in the solution. The aim here is to generate a feasible solution, meaning that the cost \(C\) of final solution \(y\) must be lower than \(C_{max}\). Therefore, only views that reduce cost \(C\) are inserted in the solution. Several indicators are necessary to estimate the impact of materializing a view. Let \(c_k\) be the sum of the cost \(C_c(V_k)\) of materializing view \(V_k\) (including processing the materialization and the maintenance of the view), and the cost \(C_s(V_k)\) of storing the view (See Box 6).
Box 6.

\[
\begin{aligned}
c_k &= C_c(V_k) + C_s(V_k) \\
C_c(V_k) &= (t_{mat}(V_k) + t_{maint}(V_k)) \times c_c(IC_0) \times n_{IC} \\
C_s(V_k) &= \left( c_s(S + s(V_k)) - c_s(S) \right) \times t(D) \times n
\end{aligned}
\qquad (26)
\]

Let \(g_k\) be the gain on total response time \(T_{proc}\) of materializing view \(V_k\), which is the sum of all the gains induced by the materialization of view \(V_k\) on each query \(Q_i\) (\(V_k\) provides a gain for query \(Q_i\) only if \(g_{ik} > g_{il}\), where \(V_l\) is the current view selected for query \(Q_i\)):

\[ g_k = \sum_{i=1}^{n_Q} \max\left( 0, \ g_{ik} - \sum_{l=1}^{n_V} g_{il} \times y_l \right) \qquad (27) \]

Let \(w_k\) be the benefit of materializing view \(V_k\), which is the difference between the cost of the gain on response time and the cost of materializing the view:

\[ w_k = g_k \times c_c(IC_0) \times n_{IC} - c_k \qquad (28) \]

At each iteration of the greedy construction, views that are not materialized yet are ranked according to \(w_k\). Only the views with \(w_k > 0\) are considered, and among a given proportion (\(sel_{RC}\) %) of the best candidates (i.e., with highest \(w_k\)), one is chosen randomly. This selected view is materialized in solution \(y\), and the procedure repeats until there is no more candidate view to add in the solution. At the end of the procedure, if cost \(C > C_{max}\) for solution \(y\), then a new attempt to build a feasible solution is performed. If, after a given number \(it_{RC}\) of attempts, no feasible solution is found, then the heuristic stops with no feasible solution.

5.3.2. Local Search

The goal of the local search is to improve the feasible solution obtained from the randomized construction, by reducing total processing time \(T_{proc}\). The procedure moves from solution to solution by adding a view to be materialized at each iteration. For this purpose, the indicator \(g_k\) of each view \(V_k\) that is not materialized yet is computed. Neighborhood solutions of \(y\) are solutions with one more materialized view \(V_k\) such that \(g_k > 0\), and that are still feasible, i.e., such that \(C - w_k \le C_{max}\). The heuristic moves to a solution that is randomly selected among a given proportion (\(sel_{LS}\) %) of the best solutions (i.e., with highest \(g_k\)) of the neighborhood.

Note that each time a new view is materialized, it can make some already materialized views useless, meaning that there can exist materialized views that are not used anymore by any query.
Not materializing such views reduces total cost without increasing total processing time. Therefore, such views are detected and removed from each new solution of the local search. The procedure ends when no more views can be added to improve the solution.

6. EXPERIMENTS

This section describes our experimental environment and the results we achieved. Experiments are run both in the cloud and on the client side. Both the dataset and the materialized views are stored in the cloud, and queries are processed in the cloud. View selection algorithms are executed at the client. The idea is to select views at the client and materialize them in the cloud.
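Since the view selection algorithms run at the client, it may help to see the GRASP heuristic of Section 5.3 (randomized greedy construction, then local search with removal of unused views) as a compact program. The following Python code is only a minimal sketch under simplifying assumptions, not the implementation used in the experiments: the total cost is reduced to billed processing time plus a per-view cost c_k, and all function and variable names (`grasp`, `construct`, `local_search`, `price`, `view_cost`, and so on) are hypothetical.

```python
import random

def best_gains(selected, gain):
    """For each query, the largest gain offered by the materialized views."""
    return [max((row[k] for k in selected), default=0.0) for row in gain]

def marginal_gain(k, selected, gain):
    """g_k: extra response-time gain of adding view k to `selected`."""
    return sum(max(0.0, row[k] - cur)
               for row, cur in zip(gain, best_gains(selected, gain)))

def total_time(selected, t, gain):
    """T_proc: total processing time when each query uses its best view."""
    return sum(t) - sum(best_gains(selected, gain))

def total_cost(selected, t, gain, view_cost, price):
    # Simplified cost (assumption): billed processing time plus c_k per view.
    return price * total_time(selected, t, gain) + sum(view_cost[k] for k in selected)

def drop_unused(selected, gain):
    """Keep only the views that are the best choice of at least one query."""
    used = set()
    for row in gain:
        best = max(selected, key=lambda k: row[k], default=None)
        if best is not None and row[best] > 0:
            used.add(best)
    return used

def pick(scored, proportion, rng):
    """Draw one view at random among the best `proportion` of candidates."""
    scored.sort(reverse=True)
    return rng.choice(scored[:max(1, int(len(scored) * proportion))])[1]

def construct(gain, view_cost, price, sel_rc, rng):
    """Randomized greedy construction driven by the benefit w_k."""
    selected = set()
    while True:
        scored = [(marginal_gain(k, selected, gain) * price - view_cost[k], k)
                  for k in range(len(view_cost)) if k not in selected]
        scored = [(w, k) for w, k in scored if w > 0]
        if not scored:
            return selected
        selected.add(pick(scored, sel_rc, rng))

def local_search(selected, t, gain, view_cost, price, c_max, sel_ls, rng):
    """Add gainful views while the budget holds; prune views left unused."""
    while True:
        scored = []
        for k in range(len(view_cost)):
            if k in selected:
                continue
            g = marginal_gain(k, selected, gain)
            if g > 0 and total_cost(selected | {k}, t, gain, view_cost, price) <= c_max:
                scored.append((g, k))
        if not scored:
            return selected
        selected = drop_unused(selected | {pick(scored, sel_ls, rng)}, gain)

def grasp(t, gain, view_cost, price, c_max,
          it_gr=100, it_rc=200, sel_rc=0.1, sel_ls=0.1, seed=0):
    """Return the best feasible set of views found, or None if there is none."""
    rng = random.Random(seed)
    best = None
    for _ in range(it_gr):
        for _ in range(it_rc):
            sol = construct(gain, view_cost, price, sel_rc, rng)
            if total_cost(sol, t, gain, view_cost, price) <= c_max:
                break
        else:
            return best  # no feasible construction after it_rc attempts
        sol = local_search(sol, t, gain, view_cost, price, c_max, sel_ls, rng)
        if best is None or total_time(sol, t, gain) < total_time(best, t, gain):
            best = sol
    return best

# Toy instance: 2 queries, 3 candidate views; gain[i][k] plays the role of g_ik.
t = [10.0, 10.0]
gain = [[6.0, 0.0, 5.0], [0.0, 7.0, 5.0]]
views = grasp(t, gain, view_cost=[1.0, 1.0, 1.5], price=1.0, c_max=100.0)
```

On this toy instance, the construction first picks the view that helps both queries, and the local search later replaces it with the two specialized views, which illustrates why views that are no longer used must be pruned after each move.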
6.1. Environment

We run our experiments on a cluster composed of 20 virtual machines with 8 GB hard drives, 2 GB of RAM and 1 vCPU. The physical architecture corresponds to four 2.2 GHz processors with 2 cores and 96 GB of RAM. All machines run Hadoop (version 0.20.2) and Pig (version 0.9.1). Since the client configuration has no effect on the following results, we do not detail it.

There are two main ways to generate workloads to test this kind of configuration. The first is using real traces. This approach usually helps provide good estimations of real use cases. However, a trace represents only a particular case and does not allow fully representing reality. Furthermore, if the main objective is to understand why a solution fits in a particular context, the use of one trace will be insufficient to highlight all operating mechanisms. The second approach consists in using a synthetic workload. Its main drawback lies in its artificial nature, but it allows comparing many configurations. In summary, if traces are available, using them helps choose the model and calibrate it. Choosing the model is very important to provide a good representation of the target context. Since our goal is to illustrate the interest of our approach in data warehouses, we use the Star Schema Benchmark (version 2.1.8.8). Queries are written in Pig Latin and executed by the Pig compiler as MapReduce tasks using a Hadoop cluster.

6.2. Querying

We have tested optimization problems MV_1, MV_2 and MV_3 described in Section 5.1 on a 5.5 GB database, with Amazon S3 and EC2 prices. Measures have been performed with variable parameters. The first parameter is experiment duration, from 1 to 24 months. The second parameter is workload frequency, i.e., the number of times the workload is executed during the considered period, from 1 to 5 executions per week. The third parameter is the number of nodes used to process queries, from 5 to 20. A typical experiment fixes two parameters and varies the last.
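The design in which two parameters are held fixed while the third one varies can be enumerated with a few lines of Python. This is only an illustrative sketch: the function name `one_factor_sweeps` and the dictionary keys are hypothetical, and the baseline dictionary simply holds the fixed values reported for these experiments.

```python
def one_factor_sweeps(ranges, baseline):
    """Yield (varied_parameter, configuration) pairs.

    Each configuration equals `baseline` except for one parameter,
    which takes every value of its range in turn.
    """
    for name, values in ranges.items():
        for value in values:
            config = dict(baseline)
            config[name] = value
            yield name, config

ranges = {
    "months": range(1, 25),        # experiment duration: 1 to 24 months
    "runs_per_week": range(1, 6),  # workload frequency: 1 to 5 executions per week
    "nodes": range(5, 21),         # cluster size: 5 to 20 nodes
}
baseline = {"months": 2, "runs_per_week": 4, "nodes": 10}

configs = list(one_factor_sweeps(ranges, baseline))
```

Each configuration would then be run twice, with and without materialized views, to obtain the response time and cost curves.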
Fixed values used for experiment duration, workload frequency and node number are 2, 4 and 10, respectively. The experimental results we achieved clearly show that our approach allows selecting views that significantly improve both response time and cost (Figure 3). Response time can indeed be divided by about 2 (for example, with 4 workload executions per week during 2 months on a 10-node cluster, response time is 20 hours with materialized views, while it is 42 hours without), and cost is about 30% lower (for example, still with 4 workload executions per week during 2 months on a 10-node cluster, cost with materialized views is $22, while it is $32 without).

Figure 3. Experimental results

Our solution allows reaching the two objectives. When the objective is to reduce cost, materializing views allows paying the minimum for a given response time. When the objective is performance, materializing views leads to the minimum response time for a given cost. Note, however, that the two objectives do not seem contradictory. When fixing response time, we decrease cost by about 10%. When fixing cost, we improve response time by about 5%. In large-scale environments, a gain of 5 to 10% still amounts to thousands of dollars and hours of processing.

6.3. View Selection

In this series of experiments, the computational performance and solution quality of the GRASP heuristic are compared with those obtained by solving the optimization problem MV_1 with CPLEX 12.4. The experiments are performed on an Intel Core 2 Quad processor (2.5 GHz) with 4 GB of RAM. We empirically tested different parameter values for GRASP and retained the following: it_GR = 100, it_RC = 200, sel_RC = 0.1, and sel_LS = 0.1.

Solvers are tested on instances where the view-related values (size, materialization time t_mat(V_k) and maintenance time t_maint(V_k)), the query processing times t_i and the gains g_ik are randomly generated. Instances of various sizes (i.e., with different numbers n_Q of queries and n_V of candidate views) are built. The pricing of Amazon EC2 and S3 services is used, with 2 small computing instances for an operating period t(D) = 1.

Moreover, we test the same instances for three different values of C_max, in order to consider a more or less restrictive budget constraint. The minimum cost C^- of the instance (obtained by solving problem MV_2 without response time limit) and the maximum cost C^+ of the instance (obtained by solving problem MV_1 without budget limit) are used to define three values for C_max: C_1 = C^- + 0.05 (C^+ - C^-) is 5% above minimum cost C^-, C_2 is 15% above C^-, and C_3 is 25% above C^-. Therefore, three groups are formed, denoted G_1, G_2, and G_3, with C_max being respectively equal to C_1, C_2, and C_3.

For each group, two tables are presented: in the first table, the number of views is fixed to 100 and the number of queries varies. In the second table, the number of queries is fixed to 100 and the number of views varies. Tables feature average values from 5 different instances of the same size. They show the CPU time needed by CPLEX and GRASP to solve the problem and the relative difference (gap) between the response times T_proc of the solutions found by both methods. A gap of n% means that GRASP achieved a response time n% greater than that found by CPLEX. Note that if CPLEX cannot find the optimal solution within two minutes, the gap is computed with the best solution found by CPLEX (such cases are marked with * in tables).

The results for group G_1 are presented in Tables 8 and 9. The time needed by CPLEX to solve the instances increases significantly with their size. For some instances, CPLEX cannot find an optimal solution within two minutes. As GRASP runs fast, i.e., in less than one second, it has difficulties finding an optimal solution. However, the response time T_proc of its solution is usually less than 1% greater than that of the best solution found by CPLEX. The results for group G_2 are presented in Tables 10 and 11. With the budget constraint relaxed, CPLEX seems to solve the instances more easily, about twice as fast, while GRASP finds better solutions (the gap is a little smaller). Finally, the results for group G_3 are presented in Tables 12 and 13. With the budget limit significantly loosened, CPLEX has no
Table 8. Results for G_1, with n_V = 100

Queries (n_Q)   CPU time, CPLEX (s)   CPU time, GRASP (s)   Gap (%)
10              0.1                   0.0                   0.5
20              2.6                   0.1                   1.1
30              8.6                   0.1                   0.4
40              7.8                   0.1                   0.7
50              4.1                   0.1                   0.5
60              33.3                  0.2                   1.0
70              5.8 *                 0.2                   0.3
80              67.6 *                0.2                   0.9
90              48.6                  0.3                   0.6
100             84.6 *                0.3                   0.4

Table 9. Results for G_1, with n_Q = 100

Views (n_V)     CPU time, CPLEX (s)   CPU time, GRASP (s)   Gap (%)
10              1.1                   0.0                   0.6
20              3.4                   0.0                   0.6
30              9.7                   0.1                   0.8
40              7.4                   0.1                   0.4
50              26.4                  0.1                   0.5
60              47.7                  0.2                   0.4
70              43.3                  0.2                   0.4
80              69.2 *                0.2                   0.3
90              73.4 *                0.3                   0.8
100             84.6 *                0.3                   0.4
Table 10. Results for G_2, with n_V = 100

Queries (n_Q)   CPU time, CPLEX (s)   CPU time, GRASP (s)   Gap (%)
10              0.4                   0.0                   1.5
20              1.8                   0.1                   1.2
30              4.3                   0.1                   0.5
40              3.5                   0.1                   0.3
50              0.4                   0.2                   0.3
60              0.7                   0.2                   0.3
70              2.1                   0.3                   0.2
80              27.6                  0.3                   0.3
90              7.1                   0.4                   0.4
100             45.1                  0.5                   0.2

Table 11. Results for G_2, with n_Q = 100

Views (n_V)     CPU time, CPLEX (s)   CPU time, GRASP (s)   Gap (%)
10              0.6                   0.0                   1.6
20              2.2                   0.0                   0.6
30              6.5                   0.1                   0.4
40              9.8                   0.1                   0.3
50              3.5                   0.2                   0.3
60              3.8                   0.2                   0.3
70              4.0                   0.3                   0.3
80              38.9                  0.3                   0.3
90              53.0 *                0.4                   0.3
100             45.1                  0.5                   0.2
Table 12. Results for G_3, with n_V = 100

Queries (n_Q)   CPU time, CPLEX (s)   CPU time, GRASP (s)   Gap (%)
10              0.2                   0.0                   1.0
20              0.5                   0.1                   0.7
30              0.2                   0.1                   0.0
40              0.8                   0.2                   0.1
50              0.6                   0.2                   0.0
60              0.8                   0.3                   0.0
70              5.1                   0.3                   0.0
80              2.6                   0.4                   0.0
90              1.5                   0.5                   0.0
100             1.0                   0.6                   0.0

Table 13. Results for G_3, with n_Q = 100

Views (n_V)     CPU time, CPLEX (s)   CPU time, GRASP (s)   Gap (%)
10              0.6                   0.0                   0.8
20              0.9                   0.1                   0.5
30              3.3                   0.1                   0.2
40              6.5                   0.2                   0.3
50              5.8                   0.2                   0.1
60              3.5                   0.3                   0.1
70              2.6                   0.3                   0.0
80              1.6                   0.4                   0.0
90              2.3                   0.5                   0.0
100             1.0                   0.6                   0.0
difficulty solving the instances, while GRASP does not always find an optimal solution (even if, for the biggest instances, optimal solutions are found).

7. RELATED WORKS

We discuss in this section previous research related to the main domains addressed in this paper, i.e., cloud data management, data access optimization through materialized views, and cost models for large-scale distributed systems.

Cloud data management brought about a lot of research and various operational systems. The most popular include so-called NoSQL systems, such as Amazon DynamoDB (DeCandia et al., 2007) or Cassandra (Lakshman & Malik, 2009), which scale up very efficiently but only offer eventual consistency, in contrast to traditional ACID (Atomicity, Consistency, Isolation, Durability) guarantees. Full cloud relational systems enforcing ACID constraints are also available, e.g., Microsoft SQL Azure (Campbell, Kakivaya, & Ellis, 2010), Amazon RDS (Amazon, 2013) and Oracle Database Cloud Service (Oracle, 2013), but they currently operate on a smaller scale. Finally, there exist large-scale data analytics systems that are specifically tailored for the cloud, such as Pig (Gates et al., 2009) and Hive (Thusoo et al., 2010). The solution we propose in this paper is generic and can be applied within any of these systems.

In the cloud, performance is mainly managed by exploiting computing power elasticity. Nevertheless, well-known performance optimization techniques from the database domain, such as indexing, view materialization or caching, may be used to decrease the global monetary cost of querying data in the cloud. In this paper, we particularly focus on view materialization.
Numerous approaches help select materialized views (Agrawal, Silberstein, Cooper, Srivastava, & Ramakrishnan, 2009; Ceri & Widom, 1991; Luo, Naughton, Ellmann, & Watzke, 2003; Mami & Bellahsene, 2012; Yang, Karlapalem, & Li, 1997; Zhou, Larson, & Elmongui, 2007), whether in transactional databases, in decision-support databases (i.e., data warehouses) or even on the Web. In the view selection problem we address, all combinations of attributes in a database constitute a lattice of candidate materialized views. Cost models then help determine the materialized views that allow the best global performance improvement, usually under disk space constraints. Various optimization techniques are used to exploit these cost models, ranging from simple greedy algorithms (Vijay Kumar & Ghoshal, 2009) to simulated annealing or genetic algorithms (Bellatreche et al., 2006). Finally, to reduce the dimensionality of the input candidate view set, materialized views may also be pre-filtered with respect to the query workload, e.g., with data mining techniques such as frequent itemset mining or clustering (Aouiche & Darmont, 2009). Our own cost models aim at extending existing materialized view selection strategies by replacing classical space constraints with the pay-as-you-go economic model of the cloud, in order to achieve the best trade-off between storage cost (including the cost of storing materialized views) and computation cost.

Existing cost models for cloud computing address a variety of problems. For instance, Kllapi et al. (2011) worked on data stream scheduling with respect to monetary cost and processing time. On each cloud node, they slice time into windows. Financial cost is then assumed to be the number of time windows that have at least one operator running, multiplied by the cost of leasing the node. Kantere, Dash, Gratsias, and Ailamaki (2011) sought to amortize the cost of data structures such as indexes and materialized views to ensure the economic viability of the cloud service provider.
With the help of stochastic models, they compute, among other indicators, the influence Inf(S) of the cost of a new data structure S on the economy of the cloud service provider. Then, the number n of amortization payments for building S is such that the gain probability of at most n payments is equal to Inf(S). Dash,
Kantere, and Ailamaki (2009) also worked on this topic, but for automatically managing caches. They consider the cost of a query plan Q as the sum of the cost of executing Q and the amortized cost of any structure used by Q. Then, cost models for cache queries, network queries, and building and maintaining caching structures are detailed. Finally, Upadhyaya, Balazinska, and Suciu (2012) envisaged the problem of selecting and pricing optimizations in the cloud as a mechanism design problem, i.e., maximizing the expected value to users of exploiting a set A of optimizations minus the actual cost of A, including its maintenance.

Other cost models target specific applications, especially in the field of astronomy. In this domain, simulation experiments showed that a good trade-off between storing optimization structures and computing power helps reduce the global cost of data processing in the cloud (more precisely, in Amazon's free solution) without reducing performance (Deelman, Singh, Livny, Berriman, & Good, 2008). The performance of three applications managing data streams, bearing various characteristics in terms of I/O, memory and CPU consumption, has also been compared on Amazon EC2 and a high-performance cluster, to identify which applications achieved the best performance at the lowest cost (Berriman et al., 2010).

Our work complements these existing cost models, but also differs from them in two ways. First, we introduce a new billing model that is generic enough to represent the billing models of all the cloud service providers we are aware of. Second, our proposal rests on a detailed model of the optimization process that leads to materialized view selection. Finally, our materialized view selection approach is also independent from any particular target application.

8. CONCLUSION

We propose in this paper an approach to improve data management in the cloud using materialized views.
Our main contributions are extended cost models for materializing views that take existing cloud pricing models into account. Our cost models are then exploited by an optimization process, which provides a compromise between the performance improvement due to materialization and budgetary constraints. Experiments on a private data center have highlighted the relevance of our approach.

This work opens many perspectives. First, we aim at extending our cost models to overcome some limits (multiple and variable instances, for example). Then, we plan to integrate our models in existing view selection algorithms to avoid splitting the selection process into two phases, i.e., we aim at fusing the candidate view generation and view selection processes. In addition, it would be relevant to consider other optimization techniques, such as indexing or caching. It has indeed been shown that, jointly employed, indexes and materialized views benefit from each other (Aouiche & Darmont, 2009). Finally, we plan to validate our proposal on a larger scale and on cloud-specific query workloads (with good and bad candidates for parallelism).

ACKNOWLEDGMENT

This work was supported by the French National Research Agency. We acknowledge the contribution to this work of Thi Van Anh Nguyen, Vilmar Jefté Rodrigues de Sousa, Michael David de Souza Dutra, and members of ERIC, LIMOS and IRSTEA. We also thank the anonymous referees for many helpful comments.

REFERENCES

Agrawal, P., Silberstein, A., Cooper, B. F., Srivastava, U., & Ramakrishnan, R. (2009). Asynchronous view maintenance for VLSD databases. In Proceedings of the International Conference on Management of Data (SIGMOD 2009), Providence, RI (pp. 179–192).

Amazon. (2013). Amazon EC2. Retrieved from http://aws.amazon.com/ec2/
Amazon. (2013). Amazon relational database service (Amazon RDS). Retrieved December 2013, from http://aws.amazon.com/rds/

Aouiche, K., & Darmont, J. (2009). Data mining-based materialized view and index selection in data warehouses. Journal of Intelligent Information Systems, 33(1), 65–93. doi:10.1007/s10844-009-0080-0

Baril, X., & Bellahsene, Z. (2003). Selection of materialized views: A cost-based approach. In Proceedings of the International Conference on Advanced Information Systems Engineering (CAISE 2003), Klagenfurt, Austria (pp. 665–680). doi:10.1007/3-540-45017-3_44

Bellatreche, L., Boukhalfa, K., & Abdalla, H. I. (2006). SAGA: A combination of genetic and simulated annealing algorithms for physical data warehouse design. In Proceedings of the British National Conference on Databases (BNCOD 2006), Belfast, Northern Ireland, UK (pp. 212–219). doi:10.1007/11788911_18

Berriman, G. B., Juve, G., Deelman, E., Regelson, M., & Plavchan, P. (2010). The application of cloud computing to astronomy: A study of cost and performance. In Workshop on e-Science challenges in Astronomy and Astrophysics, in conjunction with the International Conference on e-Science (e-Science 2010), Brisbane, Australia. doi:10.1109/eScienceW.2010.10

Campbell, D. G., Kakivaya, G., & Ellis, N. (2010). Extreme scale with full SQL language support in Microsoft SQL Azure. In Proceedings of the International Conference on Management of Data (SIGMOD 2010), Indianapolis, IN (pp. 1021–1024). doi:10.1145/1807167.1807280

Ceri, S., & Widom, J. (1991). Deriving production rules for incremental view maintenance. In Proceedings of the 17th International Conference on Very Large Data Bases (VLDB 1991), Barcelona, Spain (pp. 577–589).

Chen, D.-S., Batson, R. G., & Dang, Y. (2010). Applied integer programming: Modeling and solution. John Wiley and Sons.

Coello, C. A. C., Lamont, G. B., & Van Veldhuizen, D. A. (2007). Evolutionary algorithms for solving multi-objective problems. Springer.

Dash, D., Kantere, V., & Ailamaki, A. (2009). An economic model for self-tuned cloud caching. In Proceedings of the International Conference on Data Engineering (ICDE 2009), Shanghai, China (pp. 1687–1693). doi:10.1109/ICDE.2009.43

Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T. (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2), 182–197. doi:10.1109/4235.996017

DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., et al. (2007). Dynamo: Amazon's highly available key-value store. In Proceedings of the Symposium on Operating Systems Principles (SOSP 2007), Stevenson, WA (pp. 205–220).

Deelman, E., Singh, G., Livny, M., Berriman, G. B., & Good, J. (2008). The cost of doing science on the cloud: The Montage example. In Proceedings of the Conference on High Performance Computing (SC 2008), Austin, TX (p. 50). doi:10.1109/SC.2008.5217932

Ehrgott, M. (2005). Multicriteria optimization. Springer.

Feo, T. A., & Resende, M. G. C. (1995). Greedy randomized adaptive search procedures. Journal of Global Optimization, 6(2), 109–134. doi:10.1007/BF01096763

Gates, A., Natkovich, O., Chopra, S., Kamath, P., Narayanam, S., & Olston, C. et al. (2009). Building a high level dataflow system on top of MapReduce: The Pig experience. PVLDB, 2(2), 1414–1425.

Kantere, V., Dash, D., Gratsias, G., & Ailamaki, A. (2011). Predicting cost amortization for query services. In Proceedings of the International Conference on Management of Data (SIGMOD 2011), Athens, Greece (pp. 325–336). doi:10.1145/1989323.1989358

Kllapi, H., Sitaridi, E., Tsangaris, M. M., & Ioannidis, Y. E. (2011). Schedule optimization for data processing flows on the cloud. In Proceedings of the International Conference on Management of Data (SIGMOD 2011), Athens, Greece (pp. 289–300). doi:10.1145/1989323.1989355

Lakshman, A., & Malik, P. (2010). Cassandra: A decentralized structured storage system. Operating Systems Review, 44(2), 35–40. doi:10.1145/1773912.1773922

Luo, G., Naughton, J. F., Ellmann, C. J., & Watzke, M. (2003). A comparison of three methods for join view maintenance in parallel RDBMS. In Proceedings of the International Conference on Data Engineering (ICDE 2003), Bangalore, India (pp. 177–188). doi:10.1109/ICDE.2003.1260791
Mami, I., & Bellahsene, Z. (2012). A survey of view selection methods. SIGMOD Record, 41(1), 20–29. doi:10.1145/2206869.2206874

Microsoft. (2013). Microsoft Azure. Retrieved from www.windowsazure.com/

Nguyen, T.-V.-A., Bimonte, S., d'Orazio, L., & Darmont, J. (2012). Cost models for view materialization in the cloud. In DanaC (pp. 47–54). Berlin, Germany: EDBT. doi:10.1145/2320765.2320788

O'Neil, P. E., O'Neil, E. J., Chen, X., & Revilak, S. (2009). The star schema benchmark and augmented fact table indexing. In Proceedings of the TPC Technology Conference (TPCTC 2009), Lyon, France (pp. 237–252).

Oracle. (2013). Your Oracle database in the cloud. Retrieved December 30, 2013, from http://cloud.oracle.com/database

Thusoo, A., Sen Sarma, J., Jain, N., Shao, Z., Chakka, P., Anthony, S., & Murthy, R. (2010). Hive - a petabyte scale data warehouse using Hadoop. In Proceedings of the International Conference on Data Engineering (ICDE 2010) (pp. 996–1005). doi:10.1109/ICDE.2010.5447738

Upadhyaya, P., Balazinska, M., & Suciu, D. (2012). How to price shared optimizations in the cloud. PVLDB, 5(6), 562–573.

Vijay Kumar, T. V., & Ghoshal, A. (2009). Greedy selection of materialized views. International Journal of Computer and Communication Technology, 1(1), 156–172.

Visée, M., Teghem, J., Pirlot, M., & Ulungu, E. L. (1998). Two-phases method and branch and bound procedures to solve the bi-objective knapsack problem. Journal of Global Optimization, 12(2), 139–155. doi:10.1023/A:1008258310679

Yang, J., Karlapalem, K., & Li, Q. (1997). Algorithms for materialized view design in data warehousing environment. In Proceedings of the International Conference on Very Large Data Bases (VLDB 1997), Athens, Greece (pp. 136–145).

Zhou, J., Larson, P., & Elmongui, H. G. (2007). Lazy maintenance of materialized views. In Proceedings of the International Conference on Very Large Data Bases (VLDB 2007), Vienna, Austria (pp. 231–242).

ENDNOTES

1 Future work shall propose a generic framework to map to any CSP.
2 Considering variable instances is out of the scope of this paper and is part of our perspectives.
3 IBM ILOG CPLEX Optimizer: http://www.ibm.com/software/integration/optimization/cplex-optimizer
4 For clarity reasons, notation is simplified: parameters including Q, D, V and IC are masked.

Romain Perriot has been a PhD student at Université Blaise Pascal since 2013. He earned an engineering degree in computer science from the ISIMA school, France, in 2013. His research activities relate to query rewriting, caching, cloud computing and optimization.

Jérémy Pfeifer has been working for a leading provider of investment decision support tools since 2013. He earned an engineering degree in computer science from the ISIMA school, France, in 2013.
Laurent d'Orazio has been an assistant professor at Université Blaise Pascal, LIMOS CNRS, since 2008. He obtained his PhD from Grenoble Institut National Polytechnique, France (2004-2007). From 2007 to 2008, he carried out research at LCIS, Valence, France. He has been involved in several reviewing committees (such as Information Systems, Concurrency Practice and Experience, International Journal of Data Warehousing and Mining, Journal of Decision Systems, Knowledge And Information Systems) and program committees (such as International Conference on Business Process Management, International Conference on Management of Emergent Digital EcoSystems, International Workshop on Cloud Intelligence). His research activities concern Big Data, Cloud Computing and Optimization.

Bruno Bachelet has been an assistant professor of computer science at the Université Blaise Pascal, France, since 2007. He received his Ph.D. in 2003 from the Université Blaise Pascal. He carried out research at the LIMOS laboratory, CNRS, France, from 2003 to 2005, and at the INRA institute, France, from 2005 to 2007. His research activities concern operations research (modeling, simulation and optimization) and software engineering (design of libraries for scientific computing, generic programming, metaprogramming).

Sandro Bimonte, born in 1978, is a researcher at IRSTEA, more precisely at TSCF. He obtained his PhD at INSA-Lyon, France (2004-2007). From 2007 to 2008, he carried out research at IMAG, France. He is an Editorial Board member of the International Journal of Decision Support System Technology and of the International Journal of Data Mining, Modelling and Management, and a member of the Commission on GeoVisualization of the International Cartographic Association. His research activities concern Spatial Data Warehouses and Spatial OLAP, Visual Languages, Geographic Information Systems, Spatio-temporal Databases and GeoVisualization.
Jérôme Darmont is a full professor of computer science at the Université de Lyon, France, and the director of the ERIC laboratory. He received his Ph.D. in 1999 from the University of Clermont-Ferrand II, France, and then joined the University of Lyon 2 as an associate professor. He became full professor in 2008. His research interests mainly relate to database and data warehouse performance (performance optimization, auto-administration, benchmarking...) and cloud business intelligence (data security, query performance and cost, personal BI...). He is a member of several editorial boards and has served as a reviewer for numerous conferences and journals. Along with Torben Bach Pedersen, he initiated the Cloud Intelligence workshop series in 2012.