GSR: A Global Stripe-based Redistribution Approach to Accelerate RAID-5 Scaling

Chentao Wu and Xubin He
Department of Electrical & Computer Engineering, Virginia Commonwealth University
{wuc4,xhe2}@vcu.edu

Abstract—Under the severe energy crisis and the fast development of cloud computing, sustainability in large data centers nowadays receives more attention than ever. Due to its high performance and reliability, RAID, particularly RAID-5, is one of the most popular components in these data centers. However, a challenge to the sustainability of RAID-5 is its scalability: how to efficiently expand or reduce the number of disks. The main cause of this problem is the special layout of RAID-5 with parity blocks, which is difficult to extend efficiently. To address this problem, in this paper we propose a novel redistribution approach to accelerate RAID-5 scaling, called Global Stripe-based Redistribution (GSR). The basic idea is to maintain the layout of most stripes while sacrificing a small portion of stripes according to a global view of all stripes. GSR has four main advantages: 1) it supports bidirectional RAID-5 scaling (both scale-up and scale-down); 2) it minimizes the overhead of the scaling process, including the data migration cost, the parity modification and computation cost, and the operations on metadata; 3) different from previous approaches, GSR provides high flexibility and high availability for write requests; 4) a disk array can achieve higher capacity, performance and storage efficiency by extending more disks via GSR. In our mathematical analysis, GSR maintains a uniform distribution, saves up to 81.5% of I/O operations and reduces the migration time by up to 68.0%, which speeds up the scaling process by a factor of up to 3.13.

Index Terms—RAID-5; Scaling; Reliability; Scalability

I. INTRODUCTION

Redundant Arrays of Inexpensive (or Independent) Disks (RAID) [19] [4] is a popular choice to supply both high reliability and high performance storage services at acceptable spatial and monetary cost. In recent years, scalability in RAID systems has been in high demand for the following reasons:

1) To meet the requirements of larger capacity and higher throughput [22]. Adding more disks to an existing disk array is a cost-effective solution.
2) To fulfill the needs of energy saving. By removing some inefficient disks from a disk array, the power consumption can be reduced.
3) To match the increasing demands of online applications. Typically, RAID is widely used in various online services such as cloud computing [1]. High scalability not only accommodates the sharp increase of user data in various online applications [7], but also avoids extremely high downtime costs [18].
4) Necessity in data centers. RAID-based architectures are widely used for clusters and large scale storage systems, where scalability plays a significant role [15] [20].

Among the different RAID layouts, RAID-5 is one of the most significant forms and is widely used in large scale data centers. Recently, research on RAID-5 scaling¹ has received much attention and many approaches have been proposed in this area, including Round-Robin (RR) [9] [17] [23], Semi-RR [8], ALV [24], MDM [12], etc. However, there are two challenging issues in RAID-5 scaling. The first challenge is the high overhead of the scaling process. In traditional RR-based approaches [9] [17] [23], almost all data are migrated, thus all parities have to be recalculated and modified, which also causes additional updates to metadata. Semi-RR [8] suffers from unbalanced data distribution. ALV [24] aggregates the migration I/O and decreases the total number of redistribution I/Os, but it cannot decrease the total number of accesses to data blocks.
Although MDM [12] can decrease the data movements and the number of parity modifications, it causes new problems. Compared to the RR and Semi-RR approaches, the storage efficiency and the performance are not improved after scaling using MDM. Furthermore, MDM adds another parity into the original RAID-5 layout, which makes the data mapping more complicated when read and write requests are processed. The second challenge is the support of both scale-up (adding disks) and scale-down (removing disks). Except for RR, other approaches only support scale-up.

To address the above challenging issues, in this paper we propose Global Stripe-based Redistribution (GSR), a new approach to RAID-5 scaling. Based on a global view of all stripes, a proper number of stripes are retained in GSR while others are selected to fill the empty blocks in the extending disk(s). GSR has the following advantages: GSR provides bidirectional scaling by adding or removing any number of disks to/from a RAID-5. GSR not only minimizes the total number of migration and modification I/Os, but also reduces the parity computation cost and the operations on metadata. It dramatically accelerates the scaling process of RAID-5.

¹ In this paper, scaling is a process to add disks (scale-up) to or remove disks (scale-down) from an existing disk array.
TABLE I
SYMBOLS IN THIS PAPER

Parameter/Symbol — Description
n — number of disks in a disk array (before scaling)
m — scaled number of disk(s) (m is negative for scale-down)
B — total number of data blocks
S, S′ — total number of stripes (before/after scaling)
i, X, Y — stripe ID (row ID) before scaling
j — disk ID (column ID) before scaling
i′ — stripe ID (row ID) after scaling
j′ — disk ID (column ID) after scaling
P_i — parity block in stripe i before scaling
Q_i — parity block in stripe i′ after scaling
D_k — data block with ID k before scaling
D′_k — data block with ID k after scaling
S_s — stripe set ID
N_s — total number of stripe sets
S_r — total number of retained OUS/NUS
S_m — total number of remapped OUS/NUS
S_d — total number of destructed OUS/NUS
N_d — total number of migrated data blocks
N_p — total number of modified parity blocks
R_d — data migration ratio
R_p — parity modification ratio
R_m — metadata modification ratio
T_b — access time of a read/write request to a block
T — migration time

By efficiently adding more disks to a disk array, the performance and storage efficiency are improved. The rest of this paper continues as follows: Section II discusses the motivation of this work and details the background of existing scaling methods. The Global Stripe-based Redistribution (GSR) approach is described in detail in Section III. Section IV provides a quantitative analysis of performance and scalability. Finally, we conclude the paper in Section V.

II. BACKGROUND AND MOTIVATION

To improve the efficiency of RAID-5 scaling, different approaches have been proposed. In this section we discuss the background of the scaling schemes, the problems in existing schemes and the motivations of our work. To facilitate our discussion, we summarize the symbols used in this paper in Table I.

A. Desired Features to Scale RAID-5

To scale a disk array, some data need to be migrated to achieve a balanced data distribution. During the data migration process, we need to keep an approximately evenly distributed workload and minimize the data/parity movement. Considering the existing scaling approaches in RAID-5, the following six features are desired for efficient scaling:

Feature 1 (Uniform Data & Parity Distribution): Each disk has the same number of data and parity blocks, to maintain an evenly distributed workload.

Feature 2 (Minimal Data & Parity Migration): When increasing/decreasing a RAID-5 system of n disks storing B data blocks by m disks, the expected total number of data movements is mB/(n+m) (scale-up) or mB/n (scale-down). Parity movement should also be minimized.

Feature 3 (Fast Data Addressing): The locations of blocks in the array can be computed at low cost.

Feature 4 (Minimal Parity Computation & Modification): A movement of a data block incurs a modification cost on its corresponding original parity and a computation cost on the new parity, so the original parity chains should be preserved as much as possible.

Feature 5 (High Flexibility of the Scaling Process): Flexible schemes should be provided for the scaling process under various values of n and m.

Feature 6 (Better Storage Efficiency and Performance by Extending More Disks): In RAID-5, the storage efficiency is (n−1)/n. By adding m disks (m > 0), the storage efficiency is improved ((n+m−1)/(n+m) > (n−1)/n). The write performance and throughput should also be increased after scaling [22].

B. Existing Fast Scaling Approaches

Existing approaches to improve the scalability of a RAID-5 system include Round-Robin (RR) [9] [17] [23], Semi-RR [8], ALV [24], MDM [12], FastScale [25], etc. To clearly illustrate the various strategies in RAID-5, the default data and parity distribution is right-asymmetric².

1) Round-Robin (RR): As shown in Figure 1, the traditional RR scaling approach is based on round-robin order, where nearly all data are migrated except the first stripe (nearly 100% data migration).
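To make the migration overhead of RR concrete, the following sketch (our illustration, not the authors' code; parity placement is ignored, so each stripe holds n blocks rather than n−1) compares a block's (stripe, disk) address before and after adding m disks under pure round-robin striping:

def rr_addr(block, disks):
    # (stripe, disk) address of a data block under pure round-robin striping
    return block // disks, block % disks

def migrated_fraction(total_blocks, n, m):
    # fraction of blocks whose address changes when n disks grow to n + m
    moved = sum(rr_addr(b, n) != rr_addr(b, n + m) for b in range(total_blocks))
    return moved / total_blocks

print(migrated_fraction(10000, 4, 1))  # 0.9996: only the first stripe stays put

Only blocks 0..n−1 keep both coordinates, which matches the "all but the first stripe" behavior shown in Figure 1.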
Obviously, all parities need to be regenerated after data migration. RR is simple to implement on RAID-5 and has been used in some products [5] [13]. However, the overhead is high due to the large data migration. Gonzalez et al. [9] found that RR achieves better performance with left-symmetric or right-symmetric distributions, where a Gradual Assimilation (GA) algorithm is used for RAID-5 scaling (as shown in Figure 2). A few more data blocks can be preserved without any change, but all parities still need to be modified and recalculated after data migration. Based on the RR approach, Brown [17] designed a reshape toolkit in the Linux MD driver (MD-Reshape), which writes mapping metadata using a fixed-size window. Due to the limitation of the RR approach, metadata are updated frequently by calling an MD-Reshape function, which is inefficient.

2) Semi-RR: Semi-RR [8] was proposed to decrease the high migration cost of RR scaling, as shown in Figure 3. Unfortunately, when extending by multiple disks, the data distribution is not uniform after scaling [8]. This can easily lead to a load balancing problem, which is an important issue in disk arrays [14] [10].

3) ALV: ALV [24] is shown in Figure 4. Different from RR-based approaches, ALV changes the movement order of the migrated data and aggregates the small I/Os. However, ALV is essentially based on round-robin order and thus cannot decrease the total number of I/Os caused by data migration and parity modification.

² There are many layouts of RAID-5 based on the placement of parity blocks. Typically four types of data and parity distribution are preferred: left-symmetric, left-asymmetric, right-symmetric and right-asymmetric [16].
Fig. 1. The Round-Robin (RR) approach. (a) RAID-5 scaling from 4 disks to 5 disks (all data blocks need to be migrated except blocks 0, 1 and 2). (b) RAID-5 scaling down from 4 to 3 disks (all data blocks need to be migrated except blocks 0 and 1).

Fig. 2. RAID-5 scaling from 4 to 5 disks using the GA algorithm (nearly all data blocks need to be migrated except several special blocks: 0, 1, 2, 4, etc.).

Fig. 3. RAID-5 scaling from 4 to 5 disks using the Semi-RR approach (many blocks remain on the original disks by changing the metadata, e.g., blocks 6, 10 and 13).

Fig. 4. RAID-5 scaling from 4 to 5 disks using the ALV approach (all data blocks need to be migrated).

4) MDM: MDM [12] eliminates the parity modification/computation cost and decreases the migration cost; however, it causes new problems. For example, as shown in Figure 5, blocks 0, 4 and 8 are moved to the new disk and their original positions serve as a new parity (P4), which leads to an uneven data and parity distribution. In the MDM approach, all parity blocks are maintained, but MDM cannot improve the storage efficiency by adding more disks. The layout after scaling becomes much more complex than a typical RAID-5. Because the number of data blocks in a parity chain remains unchanged, the performance is limited.

Fig. 5. RAID-5 scaling from 4 to 5 disks using the MDM approach.

5) FastScale: FastScale [25] is the latest RAID-0 scaling approach, with low overhead and high performance. However, as shown in Figure 6, it cannot be used in RAID-5.

Fig. 6. RAID-0 scaling from 3 disks to 5 disks using the FastScale approach.

Beyond the above scaling approaches, some RAID-based systems focus on the scalability issue. In the 1990s, HP AutoRAID [21] permitted online expansion of a disk array. Later, several RAID-based architectures [15] [20] were proposed for large scale storage systems, where scalability is one of the most significant concerns. Brinkmann et al. [3] give a mathematical analysis of a storage system extended by several disks. Franklin and Wong [6] introduce a feasible method to support the extension of RAID systems, but it needs an additional disk as spare space. Recently, with the support of different file systems, RAID-Z [2] and HDFS RAID [11] achieve acceptable scalability in distributed storage systems.

C. Our Motivation

We summarize the existing scaling approaches in Table II. Although the existing scaling approaches offer some advantages, they also have drawbacks. First, previous approaches cause high overhead in the scaling process, including high data migration cost, heavy parity modification, XOR calculations and updates to metadata. Second, MDM has a low migration cost, but it cannot improve the performance and storage
TABLE II
SUMMARY OF VARIOUS FAST SCALING APPROACHES IN RAID-5 (FEATURES 1-6 COME FROM SECTION II-A)

Name — Down-scale support? — Others
RR — yes — none
Semi-RR — no — none
ALV — no — aggregates small I/Os
MDM — no — low storage efficiency
GSR — yes — high availability and flexibility

efficiency via scaling. The last problem is the reliability issue, particularly when moving data during the scaling process. In summary, existing scaling approaches are insufficient to scale a RAID-5 efficiently, which motivates us to present a new approach, GSR, to achieve efficient RAID-5 scaling.

III. GSR APPROACH

In this section, the Global Stripe-based Redistribution (GSR) approach is designed to accelerate RAID-5 scaling. The purpose of GSR is to minimize the data migration, parity modification and computation cost from a global view of all stripes, not limited to operations on any single data/parity element as in Round-Robin [9] [17] [23]. Besides reducing the overhead of the scaling process, GSR retains the original data and parity layout of the RAID-5 (unlike the MDM approach [12]), which achieves better performance after scaling. To clearly illustrate the stripes before/after scaling, we define four types of stripes as follows:

Old Used Stripe (OUS): a used stripe before scaling.
Old Empty Stripe (OES): an empty stripe before scaling.
New Used Stripe (NUS): a used stripe after scaling.
New Empty Stripe (NES): an empty stripe after scaling.

A. Overview of GSR

GSR, shown in Figure 7, is a stripe-level scaling approach. The data movement in scale-down (removing disks) is in the opposite direction of scale-up (adding disks). According to the difference between the parity chains before and after scaling, some stripes with shorter parity chains are retained on the original disks, while the others are destructed for migration. Based on their different functions, the stripes with shorter parity chains are further divided into three categories in GSR:

Retained OUS/NUS (stripes 0-2 with shorter parity chains in Figure 7): all data and parity blocks are retained on the same disks. The parity blocks will be modified if data blocks are migrated into (or removed from) the corresponding parity chain.

Remapped OUS/NUS (stripes 3-5 with shorter parity chains in Figure 7): all data blocks are retained on the same disks by remapping to a new stripe.

Destructed OUS/NUS (stripes 6-9 with shorter parity chains in Figure 7): all data blocks are migrated to other disk(s). In each destructed OUS/NUS, the blocks are migrated to the new disk(s) for scale-up or to the remaining disk(s) for scale-down.

Fig. 7. The GSR approach for RAID-5 scaling. (a) Scale-up (adding disks). (b) Scale-down (removing disks).

GSR abides by the following four steps:

Step 1 (Identification): Identify the disk array before scaling. Check the free space on each disk (including the new disk(s)) and acquire the related parameters, such as n and m.

Step 2 (Stripe Distribution): Calculate the numbers of retained, remapped and destructed OUS.

Step 3 (Stripe Processing): Handle the retained, remapped and destructed OUS concurrently. Reliability and availability schemes are provided.
(For retained OUS): update the stripe ID;
(For reapped OUS/NUS): Reap all data blocks ad distribute ew stripe IDs; (For destructed OUS/NUS): Migrate all data blocks ad distribute ew stripe IDs. Step 4 (Parity Processig): Modify all parities. Accordig to these four steps, i Figure 7(a), we take RAID-5 scalig fro 3 to 5 disks as a exaple ( = 3, = 2) ad the total uber of stripes is 1. After idetificatio, we calculate the aout of retaied, reapped ad destructed OUS which are 3, 3, ad 4, respectively. I the stripe processig step, blocks 6-11 are reapped ad the etadata iforatio are updated, blocks 12-19 are igrated to the ew disks. The correspodig stripe IDs are updated accordigly. Fially, we odify the parities Q Q4. For exaple, Q = P D 14 D 17. As show i Figure 7(b), scale-dow is the reverse process of scale-up. I this paper, we oly preset the theores, equatios, algoriths ad schees o scale-up (addig disks), the related theores ad equatios o scale-dow (reovig disks) ca be easily derived by siilar ethods or through atheatic trasforatios, ad are ot preseted here due to the page liit. B. Scalig Process i Sectio III-A describes the process to scale a RAID-5 of disks by disks. I this sectio, we give detailed descriptio of the scalig process. To siplify the descriptio, the default data ad parity distributio i RAID-5 is right-syetric or right-asyetric, siilar equatios ca be derived for the leftsyetric or left-asyetric distributio. Figure 7(a) shows a siple scale-up exaple. Actually, for large aout of stripes, a detailed scalig process is show i Figure 8, which presets ultiple stripe sets after scalig ad each stripe set cosists of + stripes. 1) Distributio: The portio of various types of stripes are based o the followig theore, Theore 1: I approach, the ratio aog the retaied, reapped ad destructed OUS is + 1 : ( 1)( + 1) : 1 Proof: Based o the layout of RAID-5, each stripe has ( 1) data blocks before scalig ad ( + 1) data blocks after scalig. The total uber of data blocks reais uchaged, { B = ( 1)S B = ( + 1)S (1) The total uber of stripe set is, N s = S + = B ( + )( + 1) Each stripe set cotais retaied OUS ad 1 reapped OUS, obviously, S r = N s = B ( + )( + 1) (2) (3) 1 2 3 4 5 X X+1 X+2 X+3 X+4 X+5 Y Y+1 Y+2 Y+3 Y+4 Y+5 Y+6 Y+7 X X+3 Y Y+3 X+1 X+4 Y+1 X+2 2X+1 2X+11 X+5 Y+4 Y+2 2Y+1 2Y+11 Y+5 Y+6 2Y+14 2Y+12 2Y+13 Y+7 2Y+15 2X+1 2X+11 2Y+11 2Y+1 2Y+14 2Y+13 2Y+15 2Y+12 1 2 3 4 5 6 7 8 9 Fig. 8. RAID-5 scalig fro 3 to 5 disks usig approach (ultiple stripe sets after scalig with = 3 ad = 2). S = 1 N B s = ( + )( 1)( + 1) (4) The reaiig stripes are destructed OUS, S d = S S r S = B ( + )( 1) Accordig to Equatios 3, 4 ad 5, the ratio aog the retaied, reapped ad destructed OUS is ( 1)(+ 1) : 1. (5) + 1 : Obviously, i Figure 8, for the stripe ID of the reapped OUS ad destrcuted OUS, X = S r, Y = S r + S. 2) Processig: Differet strategies are applied to various types of stripes i the stripe processig step. Assuig that the stripe ID ad disk ID of a OUS before scalig are i ad j, the correspodig stripe ID ad disk ID after scalig are i ad j. 2.1) For Retaied OUS: The stripe ID will be chaged for retaied OUS. Based o Theore 1, the followig equatio ca be derived, i = ( + ) i + (i od ) (6) For exaple, as show i Figure 8, if we eed reap 5 before scalig, first we should calculate the stripe set ID ( i = 5 3 = 1). Secod we have the correspodig stripe ID after scalig which is 7 (1 5 + 5 od 3 = 7). 5
TABLE III
DIAGONAL ORDER OF DATA MIGRATION USING GSR IN FIGURE 8

First Set: 2Y+2, 2Y+5, 2Y+1, 2Y+4, 2Y, 2Y+3, 2Y+8, 2Y+11
Second Set: 2Y+7, 2Y+10, 2Y+6, 2Y+9, 2Y+14, 2Y+13, 2Y+12, 2Y+15

It is also clear that the corresponding disk IDs of all data blocks in a retained OUS remain unchanged (j′ = j).

2.2) For Remapped OUS: For the data blocks in a remapped OUS, the key problem is to determine their corresponding positions. Assume the corresponding stripe set ID of a data block is denoted by S_s; it can be calculated by

  S_s = ⌈((i − S_r)(n−1) + j + 1)/(nm)⌉ − 1   (7)

Suppose the related data block after scaling is in stripe i′ with disk ID j′; we have the following equation:

  i′ = (n+m)·S_s + n + (((i − S_r)(n−1) + j) mod (nm)) mod m,  j′ = j   (8)

For example, as shown in Figure 8, if we want to remap block 2X+9 before scaling, first we calculate the set ID (⌈(4×2+1+1)/(2×3)⌉ − 1 = 1); second, we obtain the corresponding stripe ID after scaling, which is 9 (1×5 + 3 + 3 mod 2 = 9).

2.3) For Destructed OUS: Typically, GSR processes the blocks in destructed OUS for every n stripes to ensure high reliability. As shown in Figure 8, stripes Y to Y+2 are distributed to the new disks simultaneously (6 data blocks are processed together). Thus, for a data block in a destructed OUS, the range of its stripe set ID S_s after scaling is

  ((i − S_r − S_m)(n−1) + j)/(m(n+m−1)) − 1 ≤ S_s ≤ ((i − S_r − S_m)(n−1) + j + 1)/(m(n+m−1))   (9)

GSR first calculates the ranges of the stripe IDs and disk IDs of the blocks in a destructed OUS:

  S_s·(n+m) ≤ i′ ≤ (S_s+1)·(n+m) − 1,  0 ≤ j′ ≤ n+m−1   (10)

GSR then migrates the data blocks in diagonal order, as shown in Table III and Algorithm 1. Regarding the stripe processing, we have the following theorem on the total number of data movements.

Theorem 2: In the GSR approach, the total number of migrated data blocks is mB/(n+m).

Proof: For each stripe set, m(n+m−1) data blocks are migrated (as shown in Figure 8), so the total number of data blocks to be moved is

  N_d = N_s · m(n+m−1) = mB/(n+m)   (11)
∎

Algorithm 1: Get the Blocks in Diagonal Ordering
/* Get the data blocks in destructed OUS for every n stripes */
k: diagonal index; i: stripe ID among the n stripes, 0 ≤ i ≤ n−1; j: disk ID, 0 ≤ j ≤ n−1
forall k = 1; k ≤ n−1; k++ do
  forall j = 0; j ≤ n−1; j++ do
    i = (j + k) mod n;
    if i ≠ j (/* right-symmetric or right-asymmetric */) then
      get the block in stripe i and disk j;
    else
      break;

3) Parity Processing: In our scaling process, each parity is modified only once, saving the modification and computation cost of parity blocks. The total number of modified parities is

  N_p = S′ = B/(n+m−1)   (12)

According to the examples in the last subsection, by extending m disks, the length of each parity chain is increased by m. Thus m XOR calculations are taken for each modified parity, and the total number of XOR calculations is mB/(n+m−1).

C. Data Addressing Algorithm

In RAID scaling, a critical issue is to map the address of a block before scaling to its address after scaling. We propose the following data addressing algorithm (Algorithm 2) to calculate the addresses; it is a fast addressing method and can be easily implemented.

Algorithm 2: Data Addressing Algorithm of GSR
Calculate the numbers of retained OUS (S_r), remapped OUS (S_m) and destructed OUS (S_d).
if a data or parity block is in a retained OUS (0 ≤ i < S_r) then
  calculate i′ based on Equation 6, j′ = j.
if a data block is in a remapped OUS (S_r ≤ i < S_r + S_m) then
  the data block is remapped according to Equation 8.
if a data block is in a destructed OUS (S_r + S_m ≤ i < S_r + S_m + S_d) then
  (1) specify the address range based on Equations 9 and 10;
  (2) retrieve the data blocks in diagonal order (similar to Algorithm 1);
  (3) distribute new addresses sequentially:
forall i′ = 0; i′ ≤ (n−1)(S_r + S_m + S_d)/(n+m−1) − 1; i′++ do
  forall j′ = 0; j′ ≤ n+m−1; j′++ do
    if j′ ≠ i′ mod (n+m) (/* right-symmetric or right-asymmetric */) then
      distribute the address in stripe i′ and disk j′;
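Algorithm 1's diagonal ordering is straightforward to transcribe into runnable form. The version below is our reading of the pseudocode: within every group of n destructed stripes, blocks are visited diagonal by diagonal, and the position with i = j is treated as the parity slot and skipped (our interpretation of the i ≠ j test for right-symmetric/right-asymmetric layouts).

def diagonal_order(n):
    # yield (stripe, disk) positions of data blocks, diagonal by diagonal,
    # within one group of n destructed stripes
    order = []
    for k in range(1, n):          # k-th diagonal
        for j in range(n):         # walk across the disks
            i = (j + k) % n        # stripe hit by diagonal k on disk j
            if i != j:             # i == j is taken as the parity position
                order.append((i, j))
    return order

print(diagonal_order(3))  # [(1, 0), (2, 1), (0, 2), (2, 0), (0, 1), (1, 2)]

For n = 3, applying this visiting order to stripes Y..Y+2 of Figure 8 yields exactly the block sequence 2Y+2, 2Y+5, 2Y+1, 2Y+4, 2Y, 2Y+3 listed in Table III.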
D. Properties of GSR

Section II-A and Table II list six desired features for RAID-5 scaling. Our GSR satisfies all of these features. From the discussions in Sections III-B and III-C, GSR satisfies Features 1-3, which guarantee uniform data and parity distribution, minimal movement of data/parity elements, and fast data addressing. Features 4 and 6 are discussed in detail in Section IV. GSR also satisfies Feature 5 (flexibility), as explained below.

1) High Flexibility (Feature 5): Most previous approaches are not flexible enough to adapt to RAID-5. The Round-Robin approach behaves differently under the different data and parity distributions of RAID-5 [9]. FastScale has to consider different cases according to the number of extending disk(s) (the value of m) [25]. From the examples shown in Figures 7 and 8, our GSR performs well under any data and parity layout of RAID-5 and any value of m. Therefore GSR demonstrates higher flexibility than other approaches, due to its global view of all stripes.

In addition to satisfying all the desired features of RAID-5 scaling, our GSR also demonstrates high availability.

2) High Availability: The GSR approach provides an availability scheme for new write requests when no space is otherwise available:

If a new empty stripe (NES) is available, with the corresponding ID i′ ≥ S_r + S_m + S_d, write the NES sequentially.

In the scaling process, if no NES is available and a stripe set with empty blocks is available, GSR first completes the stripe and parity processing in this stripe set, and then writes the empty blocks for the new requests.

Let us take an example. Assume all disks in a RAID-5 disk array have the same capacity (including the extended disks); before scaling, 20% of the space is available and 80% of the space is used for storing data. If we expand the disk array according to GSR as in Figure 8, we can provide more than 69% free space for write requests³.

IV. SCALABILITY ANALYSIS

In this section, we evaluate the scalability of GSR compared to other approaches to show its advantages.

A. Evaluation Methodology

We compare the GSR approach to the Round-Robin (RR) [9] [17] [23], Semi-RR [8], ALV [24] and MDM [12] approaches. FastScale [25] is not compared because it cannot support RAID-5. In our comparison, a two-integer tuple (n, m) denotes scaling a RAID-5 of n disks by m disks. A negative value of m means removing disks from the array (scale-down). Our comparisons include:

1) Scale-up (adding disks) among various approaches: comparisons among RR, Semi-RR, ALV, MDM and GSR, where several representative values of n and m are chosen;

³ With n = 3 and m = 2 as in Figure 8: before scaling, the 80% used space on the 3 original disks contains user data plus one-third parity; after redistribution across 3+2 disks, the parity overhead drops to one-fifth and the two extended disks contribute their full capacity, which yields the free-space figure above.
Therefore, the total uber of odified etadata is (the total uber of data ad parity blocks ius data ad parity blocks i retaied OUS), B + + 1 S r = (2 + 2) B ( + )( + 1) The etadata odificatio ratio is, R = 2 + 2 ( + )( + 1) (15) I RAID-5 scalig, each data igratio oly costs two I/O operatios, ad the odificatio cost of each parity also causes two I/Os. Accordig to the data igratio ratio (R d ) ad parity odificatio ratio (R p ), the total uber of I/O operatios is 2 N d + 2 N p = 2 (R d + R p ) B. If we igore the coputatio tie ad assue the sae access tie o a read or write request to a block usig various RAID-5 scalig approaches (deoted by T b ), suppose the igratio I/O ca be processed i parallel o each disk, the igratio tie T usig approach for scale-up is (Assue the igratio tie of each origial disk is T 1 ad the igratio tie per exted disk is T 2 ), { T1 = (N d + T = ax(t 1, T 2 ), where + N p) T b / T 2 = (N d + + N p) T b / (16) I our aalysis, the default data ad parity distributio of RAID-5 is right-asyetric. Siilar results ca be derived for other distributios. 7
B. Numerical Results

In this section, we give numerical results on scalability using the different scaling approaches.

1) Data Distribution: Regarding data distribution, we use the coefficient of variation as a metric to examine whether the distribution is even or not, as in other approaches [8] [25]. A small coefficient of variation means a highly uniform distribution. From the introduction in Section II, Semi-RR and MDM suffer from the I/O load balancing problem, so they are chosen for comparison with GSR. The results are shown in Figure 9. We notice that Semi-RR and MDM cause excessive oscillation, by up to 46.8%, and thus fail to satisfy Feature 1 (uniform distribution).

Fig. 9. Data distribution (coefficient of variation, %) under various numbers of extended disk(s) (0 ≤ m ≤ 7, n = 3) for Semi-RR, MDM and GSR.

2) Storage Efficiency: Second, we compare the storage efficiency between GSR and MDM, as shown in Figure 10. The comparison clearly shows that GSR saves disk space by up to 23.3%.

Fig. 10. Storage efficiency (%) under various numbers of extended disk(s) (0 ≤ m ≤ 7, n = 3).

In the following Figures 11-16, the tuples (n, m) on the X-axis denote scaling a disk array of n disks by m disks. To the right of each figure, we also briefly list the results of scale-down, where m is a negative number.

3) Data Migration Ratio: Third, we calculate the data migration ratio (R_d) among the various fast scaling approaches, as shown in Figure 11. GSR achieves the same minimal data migration ratio as Semi-RR and MDM.

4) Parity Modification Ratio: Fourth, the parity modification ratio (R_p) among the various fast scaling approaches is presented in Figure 12. Compared to RR, Semi-RR and ALV, GSR reduces the parity modification ratio by up to 87.5%.

5) Metadata Modification Ratio: Fifth, Figure 13 shows the metadata modification ratio (R_m) under various scenarios. Compared to the other fast scaling approaches (excluding MDM), GSR reduces the metadata modification ratio by up to 69.2%.

6) Computation Cost: Next, we calculate the computation cost in terms of the total number of XOR operations under various cases, as shown in Figure 14. RR-based approaches have similar computation cost. Except for MDM, we notice that the GSR scheme sharply decreases the computation cost, by more than 66.7% compared to the other approaches. Figure 14(b) shows that GSR performs better for scale-up (adding disks), which is reasonable because the effect of optimizing the XOR calculations drops with a smaller number of disks and shorter parity chains.

7) Total Number of I/O Operations: The results are shown in Figure 15. Compared to RR, Semi-RR and ALV, GSR reduces the I/Os during the scaling process by up to 81.5%.

8) Migration Time: Next, we evaluate the migration time, shown in Figure 16 (the migration time of GSR is based on Equation 16). Due to their uneven data distributions, the migration times of Semi-RR and MDM cannot be calculated by our methodology. Compared to the other approaches, GSR performs well when extending multiple disks and decreases the migration time by up to 68.0%, which speeds up the scaling process by a factor of up to 3.13. Compared to RR, GSR is also efficient for scale-down, as shown in Figure 16(b).

9) Throughput: Finally, using the maximum throughput of a RAID-5 with n = 3 as the baseline (100%), the expected maximum I/O throughput after scaling can be calculated as shown in Figure 17. We can see a clear performance gap between GSR and the MDM approach: GSR improves the write performance of the storage system by up to 15.2%.

Fig. 17. Expected maximum I/O throughput after scaling under various numbers of extended disk(s) (0 ≤ m ≤ 7, n = 3, 100% write mode with uniform data access).
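The evenness metric of Section IV-B-1 is simply the coefficient of variation of the per-disk block counts. A minimal sketch of the metric (our illustration; the block counts below are toy values, not data from Figure 9):

import statistics

def coefficient_of_variation(blocks_per_disk):
    # std / mean of the per-disk block counts; 0 means perfectly uniform
    return statistics.pstdev(blocks_per_disk) / statistics.mean(blocks_per_disk)

print(coefficient_of_variation([100, 100, 100, 100, 100]))  # 0.0, uniform
print(coefficient_of_variation([140, 80, 120, 60, 100]))    # ~0.28, skewed layout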
C. Analysis

From the results in Section IV-B, GSR has great advantages compared to RR, Semi-RR and ALV. There are several reasons for these gains. First, GSR is a global management scheme that considers all stripes; it saves most stripes by retaining their data and parity blocks, which plays an important role in decreasing the migration cost. Second, by using a parallel method, GSR optimizes the XOR computations in the scaling process, which decreases the computation cost.
Fig. 11. Data migration ratio under different RAID-5 scaling approaches: (a) scale-up; (b) scale-down and scale-up.

Fig. 12. Parity modification ratio under different RAID-5 scaling approaches: (a) scale-up; (b) scale-down and scale-up.

Fig. 13. Metadata modification ratio under different RAID-5 scaling approaches: (a) scale-up; (b) scale-down and scale-up.

Fig. 14. Computation cost under different RAID-5 scaling approaches (the number of B XOR operations is normalized to 100%): (a) scale-up; (b) scale-down and scale-up.

Third, GSR sacrifices only a small number of destructed old used stripes (OUS), which helps keep the original data and parity layout of RAID-5. This maintains a uniform workload and achieves high storage efficiency. GSR also has the potential to further improve migration by aggregating small I/Os, as ALV [24] and FastScale [25] do.

Compared to the MDM approach, GSR has slightly higher parity/metadata modification and computation cost. This is reasonable because the MDM approach keeps whole parity chains intact, which saves as much parity modification cost as possible. However, as shown in Figure 5, MDM changes the original layout of RAID-5, which causes several problems, such as extremely uneven data distribution, low storage efficiency and poor write performance.
Fig. 15. Total number of I/O operations under different RAID-5 scaling approaches (the number of B I/O operations is normalized to 100%): (a) scale-up; (b) scale-down and scale-up.

Fig. 16. Migration time under different RAID-5 scaling approaches (the migration time of B·T_b is normalized to 100%): (a) scale-up; (b) scale-down and scale-up.

V. CONCLUSIONS

In this paper, we propose a Global Stripe-based Redistribution (GSR) approach for bidirectional RAID-5 scaling (both scale-up and scale-down). Our comprehensive mathematical analysis shows that GSR achieves better scalability in RAID-5 compared to other schemes in the following aspects: 1) uniform data distribution; 2) fewer operations for data migration, parity/metadata modification and XOR calculation; 3) migration cost reduced by up to 68.0% and a scaling process faster by a factor of up to 3.13; 4) high reliability and availability during the migration process; and 5) improved storage efficiency and performance after scaling.

VI. ACKNOWLEDGEMENTS

We thank the anonymous reviewers for their insightful comments. This research is sponsored by the U.S. National Science Foundation (NSF) Grants CCF-1102605, CCF-1102624, and CNS-1102629. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies.

REFERENCES

[1] M. Armbrust et al. Above the clouds: A Berkeley view of cloud computing. Technical Report EECS-2009-28, UC Berkeley, Feb. 2009.
[2] J. Bonwick. RAID-Z. http://blogs.sun.com/bonwick/entry/raid_z, 2010.
[3] A. Brinkmann et al. Efficient, distributed data placement strategies for storage area networks. In Proc. of the ACM SPAA, 2000.
[4] P. Chen, E. Lee, et al. RAID: High-performance, reliable secondary storage. ACM Computing Surveys, 26(2):145-185, June 1994.
[5] EMC Corporation. Leveraging EMC CLARiiON CX4 fully automated storage tiering (FAST) for enterprise application deployments. Technical Report H-6951, EMC Corporation, February 2010.
[6] C. Franklin and J. Wong. Expansion of RAID subsystems using spare space with immediate access to new space. US Patent 2003/0115412 A1, June 2003.
[7] S. Ghandeharizadeh and D. Kim. On-line reorganization of data in scalable continuous media servers. In Proc. of the DEXA '96, 1996.
[8] A. Goel et al. SCADDAR: An efficient randomized technique to reorganize continuous media blocks. In Proc. of the ICDE '02, 2002.
[9] J. Gonzalez and T. Cortes. Increasing the capacity of RAID5 by online gradual assimilation. In Proc. of the SNAPI '04, 2004.
[10] A. Gulati et al. BASIL: Automated I/O load balancing across storage devices. In Proc. of the USENIX FAST '10, 2010.
[11] Hadoop Wiki. HDFS RAID. http://wiki.apache.org/hadoop/HDFS-RAID, 2011.
[12] S. Hetzler. Storage array scaling method and system with minimal data movement. US Patent 2008/0276057, June 2008.
[13] Hitachi Data Systems. Hitachi Virtual Storage Platform Architecture Guide. http://www.hds.com/assets/pdf/hitachi-architecture-guide-virtual-storage-platform.pdf, March 2011.
[14] M. Holland and G. Gibson. Parity declustering for continuous operation in redundant disk arrays. In Proc. of the ASPLOS '92, 1992.
[15] K. Hwang et al. RAID-x: A new distributed disk array for I/O-centric cluster computing. In Proc. of the HPDC, 2000.
[16] E. Lee and R. Katz. Performance consequences of parity placement in disk arrays. In Proc. of the ASPLOS '91, 1991.
[17] N. Brown. Online RAID-5 resizing. drivers/md/raid5.c in the source code of Linux Kernel 2.6.18. http://www.kernel.org/, September 2006.
[18] D. Patterson. A simple way to estimate the cost of downtime. In Proc. of the USENIX LISA '02, Philadelphia, PA, October 2002.
[19] D. Patterson et al. A case for Redundant Arrays of Inexpensive Disks (RAID). In Proc. of the ACM SIGMOD '88, 1988.
[20] Y. Saito et al. FAB: Building distributed enterprise disk arrays from commodity components. In Proc. of the ASPLOS '04, 2004.
[21] J. Wilkes et al. The HP AutoRAID hierarchical storage system. ACM Transactions on Computer Systems, 14(1):108-136, February 1996.
[22] X. Yu et al. Trading capacity for performance in a disk array. In Proc. of the USENIX OSDI, 2000.
[23] G. Zhang et al. SLAS: An efficient approach to scaling round-robin striped volumes. ACM Transactions on Storage, 3(1):1-39, March 2007.
[24] G. Zhang et al. ALV: A new data redistribution approach to RAID-5 scaling. IEEE Transactions on Computers, 59(3):345-357, March 2010.
[25] W. Zheng and G. Zhang. FastScale: Accelerate RAID scaling by minimizing data migration. In Proc. of the USENIX FAST '11, 2011.