
Introduction to RAID

Team members: Electrical Engineering Year 1 94901150 王麒鈞, Electrical Engineering Year 1 94901151 吳炫逸, Electrical Engineering Year 1 94901154 孫維隆.

A. Motivation

Gosh, my hard disk is broken again, and my computer can't boot normally. I didn't even get a chance to burn my cartoons and dramas onto DVD, and downloading them again will cost me a lot of time. Is there a way to back up my data automatically whenever I download a file? Yes: RAID (Redundant Array of Independent Disks, or Redundant Array of Inexpensive Disks).

B. RAID LEVEL

The basic concept of RAID is to combine several small, cheap drives into an array that offers greater capacity, reliability, and speed than a single drive. That is why the "I" has two meanings: Independent and Inexpensive. RAID spreads data across many drives by using disk striping (RAID 0), disk mirroring (RAID 1), or disk striping with parity (RAID 5). Depending on the level chosen, the benefit of RAID is one or more of increased data integrity, fault tolerance, throughput, or capacity compared to single drives.

To spread data evenly over every drive, the data must be divided into many same-sized chunks (usually 32 KB or 64 KB). Each chunk is then written to the drives of the array according to the RAID level chosen. When data is read, the process is reversed. This creates the illusion that many drives are one big drive.

RAID 0

Simply put, we divide the data into parts and store them on two or more disks. Because the parts can be accessed in parallel, the array works faster than a single disk does. RAID 0 stores no duplicate data, so for a given amount of data it has the lowest disk capacity requirement. But if any block in RAID 0 goes wrong, the array has no ability to recover the data.

[Figure: RAID 0, blocks of the logical drive striped across the disks]

RAID 1

We divide the data into parts and store the same parts on two or more disks. In other words, we COPY the data and store it on different disks as a backup. RAID 1 has the highest disk capacity requirement, but it provides the most reliable data and the best recovery ability.

[Figure: RAID 1, Block1 and Block2 of the logical drive mirrored onto each disk]
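The chunking, striping, and mirroring described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: each "drive" is modeled as an in-memory list of chunks, and the function names (split_into_chunks, stripe, read_striped, mirror) are made up for this sketch rather than taken from any real RAID software.

```python
from __future__ import annotations


def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Divide data into same-sized chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def stripe(data: bytes, num_drives: int, chunk_size: int) -> list[list[bytes]]:
    """RAID 0 style: deal the chunks out to the drives in round-robin order."""
    drives: list[list[bytes]] = [[] for _ in range(num_drives)]
    for index, chunk in enumerate(split_into_chunks(data, chunk_size)):
        drives[index % num_drives].append(chunk)
    return drives


def read_striped(drives: list[list[bytes]]) -> bytes:
    """Reverse the striping: collect chunks row by row, drive by drive."""
    result = []
    longest = max(len(drive) for drive in drives)
    for row in range(longest):
        for drive in drives:
            if row < len(drive):  # later drives may hold one chunk fewer
                result.append(drive[row])
    return b"".join(result)


def mirror(data: bytes, num_drives: int, chunk_size: int) -> list[list[bytes]]:
    """RAID 1 style: every drive holds a full copy of all chunks."""
    chunks = split_into_chunks(data, chunk_size)
    return [list(chunks) for _ in range(num_drives)]


if __name__ == "__main__":
    data = b"ABCDEFGHIJ"
    striped = stripe(data, num_drives=2, chunk_size=4)
    print(striped)                 # each drive holds only part of the data
    print(read_striped(striped))   # the logical drive sees the whole data
    print(mirror(data, num_drives=2, chunk_size=4))  # each drive holds everything
```

In the striped layout, losing one list loses part of the data for good, while in the mirrored layout any single surviving list still contains everything. That is the capacity versus reliability trade-off between RAID 0 and RAID 1.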

RAID 0+1

RAID 0 and RAID 1 are simple, and each has its own advantages and disadvantages. If we combine them as RAID 0+1, we get a practical way to apply RAID. First, we divide the data into many parts and store them across two (or three, four, ...) disks, as in RAID 0. Second, we copy the divided data and store the copies on another two (or three, four, ...) disks, as in RAID 1. This way, the combined array has the advantages of both RAID 0 and RAID 1. In other words, it works efficiently and has recovery ability.

[Figure: RAID 0+1, Block1 and Block2 of the logical drive striped across disks and mirrored]

RAID 2 & RAID 3

RAID 2 and RAID 3 have the highest I/O speed because the controller runs all drives simultaneously (they divide a datum into bits or bytes and spread them across all the drives). But they cannot service multiple requests simultaneously: reading one datum keeps all of the drives busy, so none has time to read another datum. For this reason they are rarely used today.

RAID 4 & RAID 5

RAID 4 and RAID 5 both use parity to provide fault tolerance. Parity is computed from the data on the other drives of the array. When a drive in the array fails, its data can be computed from the parity and the data on the remaining drives. When the broken drive is replaced by a drive of the same specification, the original data can be rebuilt on the new one. In RAID 4 the parity is stored on one dedicated drive, so the write speed is limited by that parity drive. RAID 5 removes this limitation by spreading the parity across all drives in the array. The remaining speed limit is the parity computation itself, which can take a lot of time.

Now let's see how RAID 5 works when a drive is broken. Suppose RAID 5 uses XOR (⊕, the operator we learned in chapter 1) to compute parity. The form below shows why XOR can compute parity:

Normal:            data = 1, data = 1, parity = 1 ⊕ 1 = 0
One drive broken:  data = 1, data = (broken), parity = 0, so the broken data is computed as 1 ⊕ 0 = 1
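The parity computation and recovery described above can be tried out with a short Python sketch. It assumes that parity is the byte-wise XOR of equally sized blocks, one per data drive. The function names are hypothetical, and the parity rotation in parity_drive_for_stripe is just one possible layout chosen for illustration, not the layout of any particular controller.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_blocks: list[bytes]) -> bytes:
    """Parity block = XOR of all data blocks in the stripe."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks: list[bytes]) -> bytes:
    """Rebuild the one missing block from all survivors (data and parity)."""
    return xor_blocks(surviving_blocks)


def parity_drive_for_stripe(stripe_index: int, num_drives: int) -> int:
    """Assumed rotation: parity moves to a different drive on each stripe."""
    return num_drives - 1 - (stripe_index % num_drives)


if __name__ == "__main__":
    d0, d1, d2 = b"RAID", b"disk", b"data"
    parity = compute_parity([d0, d1, d2])

    # Pretend the drive holding d1 is broken: rebuild it from the rest.
    recovered = rebuild_missing([d0, d2, parity])
    print(recovered == d1)

    # Under the assumed rotation, parity visits every drive in turn.
    print([parity_drive_for_stripe(i, 4) for i in range(5)])
```

Rebuilding works because XOR-ing a value with itself cancels out: XOR-ing the parity with every surviving data block leaves exactly the missing block. Rotating the parity position is what distinguishes RAID 5 from RAID 4, where the function would always return the same drive.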

How RAID 5 works: see the illustrations.

[Figure: RAID 5 illustrations of a logical drive with data and parity blocks. Panels: normal situation; one drive broken (computing the lost data from parity); write (writing parity); rebuild data. Example values shown: data 1, data 1, parity 0.]

The different RAID levels are compared in the table below.

| Name | Description of disk array | Data reliability | Data transfer rate | Max I/O transfer rate |
|------|---------------------------|------------------|--------------------|-----------------------|
| RAID 0 | Stores data in parallel, but with no fault tolerance | Lower than a single disk | Very high | High for both read and write |
| RAID 1 | All data copied to N disks | Higher than RAID 2, 3, 4 but lower than RAID 6 | Read: higher than a single disk. Write: like a single disk | Read: twice that of a single disk. Write: like a single disk |
| RAID 4 | Data written to different disks in parallel | Much higher than a single disk; the same as RAID 5 | Read: just like RAID 0. Write: much lower than a single disk | Read: like RAID 0. Write: much lower than a single disk |
| RAID 5 | Data written to different disks in parallel | Much higher than a single disk; the same as RAID 4 | Read: just like RAID 0. Write: lower than a single disk | Read: like RAID 0. Write: usually lower than a single disk |
| RAID 1+0 | Has the advantages of RAID 0 and RAID 1 | Higher than RAID 2, 3, 4 | Very high | High for both read and write |

C. Hardware RAID & Software RAID

RAID implementations can be sorted into two groups, software and hardware, according to what controls the spanning, mirroring, or parity calculation.

In a software RAID implementation, the operating system handles the disks of the array through a normal disk drive controller, under the control of program code. The speed of software RAID depends on how fast the computer's CPU is. Since CPUs have improved a lot recently, the software approach can now match or even outperform the hardware one. However, software RAID has disadvantages: because the CPU has many other tasks to perform, it is not recommended to use it for RAID calculations, especially when the computer is busy. A crash can sometimes cause data loss, and waiting for the array to be rebuilt can take time.

That is where the hardware approach comes in. Unlike software RAID, this approach needs at least a special RAID controller. Sometimes the controller is built into the motherboard; sometimes it takes the form of an expansion card. The controller manages the parity calculations for the disks and relieves the operating system of that work. By the way, to boost speed, hardware RAID also has a special battery-backed write-back cache system.
This cache allows quick storage and access to the disk drives (it is flushed to disk by the controller). From the comparison above, the hardware implementation seems to be better, but in practice the controllers may sometimes make mistakes. So the most common form of RAID control nowadays is hybrid RAID (partly hardware, partly software). In this combined approach the controllers are ordinary ones, but there is a backup cache, and users can set up RAID control through the BIOS. To the OS, the disks still look like one big disk. The parity calculations are still performed by software, but the cache helps ensure the security and speed of data transactions.

D. Conclusion

Overall, RAID is not a new technology; at the least, it is mature enough. So why hasn't it become popular? The answer is clear: most of us don't need it. Actually, only servers that must store great amounts of data in a short time with fault prevention need RAID. For PCs it isn't necessary, and for supercomputers only a high transfer rate between CPU and RAM is needed. Therefore, only servers that contain a lot of data, like those of BBS boards, need RAID. Still, there is room for improvement in this interesting technology: even for the most common level, RAID 5, speed is still limited by the parity calculation, and the word "redundant" implies a waste of resources. Even though RAID is a useful concept, new approaches might replace it in the future.