HYDRAstor: New Architecture for Disk-based Backup




GlassHouse Whitepaper

Introduction

Disk is no longer an option in a properly designed backup system; it is an essential component. At this point in the industry, upgrading your tape drives without adding disk to the picture can actually result in a decrease in backup performance. This two-part paper will start with an explanation of why this is the case and will then explain how many of the current methods for adding disk to your backup system have limitations that can create significant performance, manageability, and data integrity issues. The second part of the paper will describe an ideal backup architecture and then explain how a new addition to the disk-based backup market, HYDRAstor from NEC, gives you the benefits of disk without the drawbacks associated with other disk solutions.

The Problem with Tape-based Backups

Most everyone acknowledges that disk increases the performance and reliability of a backup system. They also understand that disk helps them deal with the recurring problems of shorter backup windows, increasing amounts of data, and more stringent restore requirements. What isn't commonly understood is the actual problem that disk solves. The common misconception is that tape is slow and disk is fast; however, this perception masks the real reason why disk performs so well as a backup target. The real reason is that backups perform better when sent to a device that can match the speed of the backup, whether slow or fast, and disk does that much better than tape. Tape's inability to match incoming backup speeds is the key reason behind its performance and reliability issues during both backup and restore.

A fundamental architectural issue affecting performance for tape backups exists at a purely physical level. A tape drive recording head must be moved at a brisk pace across its recording medium in order to achieve a high signal-to-noise ratio, and a high signal-to-noise ratio is essential in reliably recording data to tape. Because of this, all tape drives have a minimum speed at which they can reliably write data to tape; they cannot write slower than that speed. This is true even for newer drives that support variable speeds; they have a minimum speed below which they cannot operate. In addition, compression actually increases this minimum speed. For example, if a tape drive's minimum speed is 30 MB/s and the data being sent to the drive is compressing at a rate of 1.5:1, the drive's real minimum speed is 45 MB/s.

When you send data to a tape drive at a rate slower than its minimum speed, it will write short bursts at that speed, stop, rewind, and then spin up to its minimum speed again once the tape drive buffer is full. This process of stopping, rewinding, and starting again is called backhitching, and if you do it a lot, it's called shoeshining. If your backup system is already performing a great deal of shoeshining (and this is typically the case) and you upgrade to faster tape drives, things could actually get worse, and you will experience an even greater hit to your performance. For example, if you are currently sending 15 MB/s to a tape drive whose minimum speed is 20 MB/s, and you upgrade to a drive whose minimum speed is 30-50 MB/s, you will actually increase the level of shoeshining. This so-called upgrade will cause a decrease in backup performance and an increase in backup failures due to media problems. Buying faster tape drives may actually be slower than you think.
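
The arithmetic is easy to sanity-check. The short Python sketch below is purely illustrative; the drive and stream speeds are the example figures quoted above, and the function names are invented for this sketch.

```python
# Back-of-the-envelope check for tape "shoeshining" (illustrative only).

def effective_min_speed(native_min_mb_s: float, compression_ratio: float = 1.0) -> float:
    """Compression raises the minimum host-side rate a drive can accept."""
    return native_min_mb_s * compression_ratio

def will_shoeshine(supply_mb_s: float, native_min_mb_s: float,
                   compression_ratio: float = 1.0) -> bool:
    """True if the backup stream is too slow to keep the drive streaming."""
    return supply_mb_s < effective_min_speed(native_min_mb_s, compression_ratio)

# A drive with a 30 MB/s floor fed 1.5:1-compressible data really needs 45 MB/s.
print(effective_min_speed(30, 1.5))   # 45.0
# Feeding 15 MB/s to a 20 MB/s drive already shoeshines...
print(will_shoeshine(15, 20))         # True
# ...and "upgrading" to a 30 MB/s drive only widens the gap.
print(will_shoeshine(15, 30))         # True
```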

Many readers may be surprised to learn that their servers cannot supply this kind of speed to a tape drive for backups. The problem is that fragmented file systems, fragmented databases, and file systems with many files often supply only a few megabytes per second, especially during incremental backups. As for the network, people often upgrade only a few servers to Gigabit Ethernet, and even Gigabit Ethernet is often unable to supply this kind of throughput. These two bottlenecks, the physical attributes of tape architecture and the file fragmentation that constrains supply speed, conspire to stop us from sending enough data to stream modern tape drives.

Some backup software products use multiplexing, also called interleaving, to send multiple simultaneous backups to the same tape drive in order to mitigate these issues. However, every additional backup job sent simultaneously to a tape drive reduces the restore speed of all jobs sent to that tape drive, because a restore of a single job from a multiplexed tape must read and throw away all other backups to get to the one backup from which it is restoring. This drawback is why many people consider multiplexing a necessary evil when backing up to tape, one that is getting more necessary and more evil all the time.

Disk-based Products Have Their Pros and Cons

Disk solves these challenges because disk can read or write data at any required speed. As a result, disk can be used to simultaneously and reliably receive all of your slower backups without the issues associated with multiplexing. Once backups have been sent to disk, they can then easily be streamed to tape at tape's native speeds, or sent to another disk backup device. In addition, there are a number of newer backup and recovery methods (CDP, near-CDP, and data de-duplication) that can be applied only if backups are sent to a disk target; they do not work with backups to tape.

Now that we've established how important disk is to a backup and recovery system and why it is becoming a required element in most enterprises, let's take a closer look at the usual methods for deploying disk in a backup system. Disk targets for backup systems are most often categorized into two distinct types: disk-as-disk and disk-as-tape. A disk-as-disk system presents to your backup server a target of either raw disk or a file system. A disk-as-tape target presents itself to your backup system as devices that look like tape drives (when in reality they are disk).

The first challenge with both types of current disk targets is cost. Where a fully configured mid-range tape library loaded with media tends to range from $2-5/GB, disk system prices tend to range from $3-30/GB, with the most recognizable names towards the high end of that scale. Most VTL systems range from $4-12/GB. The answer to the cost issue lies in the elimination of duplicate data. Writing two sequential full backups of 1TB each to disk or tape requires 2TB of capacity. Duplicate elimination technologies (also referred to as commonality factoring, data de-duplication, and similar terms) recognize that much data, perhaps as much as 99% of it, does not change from one backup to the next. Because of this, these technologies eliminate the duplicate data in the second backup. While you still need 1TB of capacity for the first backup, you may need only 0.05TB for the second. For a standard backup policy, this happens week after week. After two to three months of backups, you find that for 20TB written, you need only about 1TB of capacity, which is equivalent to a duplicate elimination ratio of 20:1. Therefore, effective pricing per GB will change as data de-duplication features become generally available, since less disk space will be required for data protection.
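
As a rough illustration of what that ratio does to the economics, the sketch below simply divides a list price by the achieved duplicate elimination ratio; the prices and the 20:1 figure are the ones quoted above, and the helper names are invented for this example.

```python
# Rough arithmetic: what de-duplication does to effective cost per GB
# of protected (logical) data. Illustrative only.

def dedup_ratio(logical_tb_written: float, physical_tb_stored: float) -> float:
    return logical_tb_written / physical_tb_stored

def effective_cost_per_gb(list_price_per_gb: float, ratio: float) -> float:
    """Price per GB of protected data once duplicates are eliminated."""
    return list_price_per_gb / ratio

print(dedup_ratio(20.0, 1.0))             # 20.0 -> 20 TB written, 1 TB actually stored
print(effective_cost_per_gb(5.00, 20.0))  # 0.25 -> $5/GB disk behaves like $0.25/GB
print(effective_cost_per_gb(5.00, 10.0))  # 0.5  -> at 10:1 the effective price doubles
```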

However, some vendors' implementations of de-duplication have come at the expense of performance or capacity. For example, some disk targets can de-duplicate only within a single appliance and can handle only a certain amount of storage within that appliance. If you need more performance or capacity than a single appliance can provide, you're forced to buy another appliance. Data will not be de-duplicated across those appliances, resulting in a significant loss of aggregate capacity, because the overall duplicate elimination ratio can drop from 20:1 to about 10:1 or even lower with each additional appliance. Thus, the higher the number of appliances, the lower the duplicate elimination ratio and the higher the cost of hardware (more capacity) and of operations (more systems to manage).

The second challenge with most disk targets is how they achieve resiliency, that is, how they ensure the system survives disk and system failures without suffering data loss. Resiliency is also increasingly important as data de-duplication becomes popular. In order to eliminate duplicate data, the data stream is split into small chunks, such as a part of a spreadsheet with financial information. The system then checks whether the same chunk has been stored before. If so, only a pointer to the stored copy of the chunk is saved, and the second copy is discarded. One piece of data (sometimes referred to as a fragment or a chunk, indicating a piece below the file or block level) is now used by potentially many different backups. Without sufficient resiliency, losing one data chunk can result in the loss of many, many backups.

Mirroring would be too expensive due to its 100% storage overhead, so most of these units use parity-based RAID (RAID 3-6) for resiliency purposes. Unfortunately, parity RAID comes with a number of limitations. RAID levels 3-5 can handle the loss of only a single disk in the RAID group, while RAID 6 can recover from two simultaneous drive failures. RAID has another disadvantage, and that is performance. The first performance penalty comes from the calculation and storage of parity information, which is why all parity RAID levels take a performance hit during write operations. There is also a significant performance penalty when reading data with a missing or damaged drive (referred to as degraded mode), because all data read from the RAID group must be rebuilt from parity. However, nothing compares to the performance hit of a rebuild. In addition to rebuilding any requested data from parity, an additional process is reading every parity block in the RAID group, using those parity blocks to recalculate the blocks of data for the missing disk, and then writing those blocks of data to the replacement drive.

These operations are all managed by the same RAID controller that is using the same RAID group to write new data for backups or to read data for a restore. Anybody who has suffered a parity RAID rebuild knows how big this performance hit can be, and the recent adoption of large ATA disk drives in RAID arrays has only made this issue worse. These larger disk drives have rebuild times that take days instead of hours, increasing the window during which you could lose more drives than your RAID configuration was built to withstand, leaving your data vulnerable and creating too much risk. RAID 5/6 arrays also cannot easily be expanded, which is why most disk units are expanded by adding another RAID 5/6 group alongside the RAID groups that are already configured.

Disk-as-disk targets

Disk-as-disk targets often experience a number of challenges when used with backup systems; the bigger the backup system, the more challenges you are likely to have with a typical disk-as-disk unit. To start with, it can be quite a challenge to provision a disk-as-disk system for use with multiple backup servers. You must decide how much data each backup server is likely to generate, and then create and provision a file system of appropriate size for each backup server (or group of backup servers in some cases). To prevent the challenges associated with under-provisioning, most people over-provision. Over-provisioning, however, results in a significant amount of wasted disk, which in turn makes the solution even more expensive.

Another method for dealing with the challenges of provisioning disk-as-disk systems is to create a large share on a NAS filer and share it between multiple backup servers. However, this method creates another performance bottleneck, as all data is routed through a single filer head. There are newer NAS filers that can help solve this problem through the use of a global namespace, but some filers are not as good at reading or writing very fast streams of data. In addition, they are still based on parity RAID and therefore suffer from the challenges associated with parity RAID.

Speaking of performance, although most backup systems derive the most performance from many smaller volumes, this setup is in direct opposition to the desire to minimize the number of file systems and so reduce the provisioning work they require. Finally, most disk-as-disk systems suffer from fragmentation when used as a target for backup and recovery systems. This fragmentation can result in a significant decrease in performance over time and often can be fixed only by re-provisioning and moving data from one file system to another.

Disk-as-tape (VTL) targets

Virtual tape libraries (i.e., disk-as-tape) solve many of the challenges listed above. For example, it is much easier to provision a VTL between multiple backup servers because backup software companies have already figured out how to share a tape library. VTLs also use customized file systems that were designed to store backup data, usually removing the fragmentation issue.

However, VTLs aren't perfect. Some still require a significant amount of provisioning work behind the scenes. For instance, some VTLs require you to create RAID groups and file systems just as you would with a disk-as-disk system. All VTLs also use parity RAID, which means they suffer the previously mentioned limitations of parity RAID, including its resiliency, rebuild, and risk problems. Not all VTLs are equally scalable, either. While some can scale with additional disk and CPUs, many are fixed-capacity units that can be scaled for performance or capacity only by buying another separate unit. These limitations increase management and load-balancing tasks. Finally, while VTLs are designed to enhance traditional backup software, they cannot be used as targets for advanced backup techniques such as CDP or near-CDP, or for other general-purpose storage needs, since these applications require targets that act like disk, not tape.

A new idea

For the second part of this white paper, let's consider a disk-as-disk target that solves all of the previously mentioned limitations of today's disk-based targets. To do so, it would have to have the following attributes:

- Affordable: an effective cost comparable to tape.
- System-wide data de-duplication that reduces the amount of disk required by a factor of 20:1 or more, but does not slow down as the amount of data stored on it increases.
- Enterprise-level scalability: a single system designed to minimize operational cost, provide hundreds of thousands of megabytes per second, and handle tens of petabytes as a single logical pool, all managed as a single entity.
- Capacity on demand, by simply plugging another storage node into the network. The system will automatically see and begin using the additional capacity, without having to enlarge or rebuild any RAID groups (since there aren't any), provision capacity, or perform any load balancing. Any required behind-the-scenes work is done automatically without affecting performance.
- Performance on demand, by simply adding accelerator nodes to boost overall performance of the system.
- Highest resiliency of any system on the market:
  - A user-configurable resiliency level that provides protection against as many concurrent disk or system failures as you would like, without the limitations of RAID. How many simultaneous failures do you want to prepare for? Three systems or disks? Four? Seventeen? The level of resiliency is up to you.
  - No RAID groups, volumes, or RAID controllers, hence none of their limitations.

- Self-managing and self-healing:
  - No capacity pre-allocation of any particular file system is necessary. Never again ask, "How big do I need to make this volume behind this backup server?" or "How much data will this backup server (or other application) generate?"
  - Automatic load balancing among all components to optimize capacity and performance.
  - Automatic self-healing of all components: the system recognizes failed disks and components automatically and rebuilds lost data in the background while maintaining full access and performance to all data.

HYDRAstor from NEC

NEC is offering a new answer to these age-old problems and limitations: the HYDRAstor data protection appliance. Although GlassHouse has not yet been able to perform testing of this new system, we discussed its design extensively with NEC, and they described to us a storage system based on a grid storage architecture that includes the essential attributes above and more.

The name HYDRAstor comes from Greek mythology. The Hydra was a multi-headed creature that was incredibly hard to kill, so hard that killing it was given as a test to Hercules. If he cut off one head, it grew another one, and that is what the modern HYDRAstor is like. Tell it how many heads you think you might lose at one time, and it will automatically grow heads as needed.

From one perspective, HYDRAstor is a distributed system of inexpensive, very resilient NAS shares that are managed as a single entity and collectively scale to tens of petabytes without requiring the customer to create or manage RAID groups, logical volumes, or physical volumes. All you need to determine in order to set up a HYDRAstor system are the following three requirements for your environment:

- Your throughput requirements (i.e., what you want your aggregate throughput to be)
- Your capacity requirements (i.e., how much data you want to store)
- Your resiliency requirements (i.e., how many simultaneous failures you can withstand without data loss)

Like the mythical hydra, a HYDRAstor consists of many heads: a series of independent, industry-standard servers (nodes) that act as a single entity and are connected via a private network. Some of HYDRAstor's characteristics include:

- Throughput is determined by the number of Accelerator Nodes (ANs) you deploy, as each provides at least one NAS share and over 100 MB/s of throughput.
- Capacity is determined by how many Storage Nodes (SNs) you have, as each Storage Node contains 3 TB of raw storage, which (after resiliency overhead considerations) can store roughly 40-50 TB of de-duplicated data.
- Resiliency is determined by your requirements and the total number of nodes you have; the more nodes you have, the more resilient your system will be. Even the smallest HYDRAstor is designed to survive at least three simultaneous independent disk failures and one complete system failure.

Once you determine your throughput, resiliency, and capacity requirements, you simply need to supply HYDRAstor with enough Accelerator Nodes and Storage Nodes to meet those requirements (the standard base package is 2 ANs with 4 SNs). Additional nodes can then be easily added at any time to increase throughput or capacity.

Storage in a HYDRAstor system is found inside Storage Nodes. There are no storage arrays and no RAID controllers, just a series of interconnected nodes. Consider the HYDRAstor depicted in Figure 1, which represents the minimum recommended configuration: four Storage Nodes (SNs), each of which currently has six 500 GB disk drives, and two Accelerator Nodes (ANs), each of which can access all Storage Nodes via a private network. As will be discussed later, additional nodes, including newer Accelerator Nodes with faster processors and/or Storage Nodes with larger or faster disks, can be added independently at any time to add capacity or throughput. HYDRAstor supports heterogeneous nodes, which allows IT to take advantage of new hardware technology without a forklift upgrade and lots of downtime. As you add SNs, you increase both capacity and the compute power for accessing the data on the disks within each SN, and this is a main key to HYDRAstor's scalability. With a standard storage array, you don't add more controllers every time you add more storage disks.

Figure 1: A minimum HYDRAstor configuration (two Accelerator Nodes and four Storage Nodes connected by a private network)
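
To make that sizing model concrete, here is a back-of-the-envelope sketch using the per-node figures quoted above (at least 100 MB/s per Accelerator Node, roughly 40 TB of de-duplicated data per Storage Node, taken at the conservative end). It is our own illustration, not an NEC sizing tool.

```python
import math

# Back-of-the-envelope HYDRAstor sizing sketch (our own illustration, not an
# NEC tool). Per-node figures are the conservative ends of the ranges above.

AN_THROUGHPUT_MB_S = 100   # each Accelerator Node provides over 100 MB/s
SN_LOGICAL_TB = 40         # each 3 TB Storage Node holds ~40-50 TB after de-duplication

def nodes_needed(target_mb_s: float, target_logical_tb: float) -> tuple:
    ans = max(2, math.ceil(target_mb_s / AN_THROUGHPUT_MB_S))   # base package: 2 ANs
    sns = max(4, math.ceil(target_logical_tb / SN_LOGICAL_TB))  # base package: 4 SNs
    return ans, sns

def surviving_throughput(deployed_ans: int, failed_ans: int) -> int:
    """Active-active ANs: losing a node loses only that node's share of bandwidth."""
    return max(0, deployed_ans - failed_ans) * AN_THROUGHPUT_MB_S

print(nodes_needed(400, 300))       # (4, 8)  -> 400 MB/s and 300 TB of backup data
print(surviving_throughput(2, 1))   # 100     -> the minimum grid drops to ~100 MB/s
print(surviving_throughput(6, 1))   # 500     -> a larger grid barely notices one failure
```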

The configuration depicted in Figure 1 will supply over 200 MB/s of throughput while also providing uninterrupted service to all NAS shares during any of the following scenarios:

- Simultaneous failure of multiple disk drives (the number is up to you). The default setting protects against a minimum of any three simultaneous disk failures, but you can increase this number if desired.
- Failure of a single Storage Node. The default setting also protects data against the failure of an entire Storage Node. As will be explained later, as more Storage Nodes are added to the system, HYDRAstor automatically increases the number of simultaneous Storage Node failures that can be tolerated without data loss.
- Failure of a single Accelerator Node. Each Accelerator Node in a HYDRAstor grid supplies over 100 MB/s of throughput, and all Accelerator Nodes run in an active-active configuration. If one Accelerator Node fails, another Accelerator Node can take over and serve the shares that the failed Accelerator Node was providing. In the minimum standard configuration of two ANs and four SNs (illustrated in Figure 1), all shares would end up being served by the remaining Accelerator Node in the event the other Accelerator Node failed. (Throughput would, of course, drop to roughly 100 MB/s, as that is the amount of throughput a single Accelerator Node provides.) In larger HYDRAstor deployments, the load could be spread across all remaining Accelerator Nodes. Just as with Storage Nodes, each Accelerator Node you add to the grid increases the number of simultaneous Accelerator Node failures you can tolerate. Since shares can be redistributed among all operational Accelerator Nodes in the event of an Accelerator Node failure, the more Accelerator Nodes you deploy, the smaller the performance effect the loss of a single Accelerator Node would have on your system.

But how does HYDRAstor work?

If you're like I was when NEC first explained HYDRAstor to me, you're chomping at the bit to know more about how this new thing works. Let's not prolong the suspense any longer. The following section provides an overview of how HYDRAstor works, starting with a minimal configuration example and then following a backup stream throughout its lifecycle to illustrate the unique features and benefits of the HYDRAstor architecture. We will then illustrate how the expansion of a HYDRAstor system increases its capacity, throughput, and resiliency.

The best way to explain how HYDRAstor works is to begin with powering up a system, understand how it initializes itself, and then follow how backup data is handled. Consider the HYDRAstor pictured in Figure 2, which represents the minimum configuration of two Accelerator Nodes and four Storage Nodes. For simplicity's sake in this example, we will use the default settings. (You wouldn't need to know or understand these settings for HYDRAstor to work, but we do want to explain them so you can see how HYDRAstor works.)

In normal operation, you can, of course, always accept the defaults or change them at any time. In either case, HYDRAstor adjusts automatically.

Initialize HYDRAstor

Ease of installation: HYDRAstor's automated setup simplifies data protection by establishing basic system parameters by default while also providing options to customize if you desire more control. We power on the HYDRAstor system by turning on all Accelerator Nodes and Storage Nodes in the HYDRAstor rack. Then, all we need to do is set up an administrator account and password and an optional email address for alerts. Next we tell HYDRAstor to create at least one share (i.e., a file system) on each of the Accelerator Nodes. To do so, we simply give each share a name; that's it, we are done. We don't have to specify how large the file system is, or associate it with any volumes, real or otherwise. (Remember, there are no RAID groups or logical volumes in HYDRAstor, a fact which greatly simplifies storage management overhead and configuration time.) All other tasks are handled by HYDRAstor's self-management features. The Storage Nodes automatically discover each other, detect the default resiliency setting of three, and automatically start setting up what needs to be created for a new storage system.

Data Backup: HYDRAstor De-duplicates Data

High-performance duplicate elimination: HYDRAstor uses patent-pending technology to break files into data chunks, tag them with their own unique DNA fingerprint, and eliminate duplicate chunks with very high efficiency. The following section explains the next step in the process, namely, backing up to the HYDRAstor as a backup target.

When backing up to the HYDRAstor, a backup application writes a file to one of the shares served by the HYDRAstor. HYDRAstor then breaks that file into chunks. Although a chunk is similar to a block, HYDRAstor does not always split data at specific block boundaries, so the term chunk is used to prevent possible confusion. (The detail of how data is split into chunks is beyond the scope of this paper. Suffice it to say that HYDRAstor dynamically splits data into variable-size chunks with very high performance, based on NEC's patented technology. The patents filed by NEC on chunking, hashing, and duplicate elimination focus on optimizing performance and speed.) Once HYDRAstor splits each file into chunks, it needs to verify that:

1. This same chunk has not been stored before.
2. The chunk is written in a way that meets the default or user-specified resiliency requirements.

The next thing that happens for each chunk of data is that the Accelerator Node which receives the chunk calculates a hash (a "DNA fingerprint") on the chunk. Think of the hash as a unique, or virtually unique, multi-bit digital DNA that is tied to that chunk of data. Regardless of the hashing algorithm, no vendor can state to a computational certainty that only one chunk of data would result in a given hash value. We can state that the odds of two chunks creating the same hash (referred to as a hash collision) are extremely remote, on the order of 1 in 2^160 to 1 in 2^256. Even so, NEC performs a bit-level comparison as a further safety net for data integrity, verifying that two chunks really are the same before discarding one of them.
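
To make the chunk-and-fingerprint idea concrete, here is a deliberately generic sketch. It is not NEC's patented chunking or hashing; it uses a toy sliding-window boundary rule and SHA-256 simply to show why identical regions of data collapse to identical fingerprints.

```python
import hashlib
import random

# Generic content-defined chunking plus per-chunk "DNA fingerprint" hashing.
# This is NOT NEC's patented algorithm; it is a toy sliding-window scheme.

WINDOW = 16     # bytes of local context that decide a chunk boundary
TARGET = 2048   # on average, one boundary every ~2 KB

def chunks(data: bytes):
    """Yield variable-size chunks; boundaries depend only on local content,
    so an identical region chunks identically wherever it appears."""
    start = 0
    for i in range(WINDOW, len(data)):
        window_hash = int.from_bytes(hashlib.sha256(data[i - WINDOW:i]).digest()[:4], "big")
        if window_hash % TARGET == 0:
            yield data[start:i]
            start = i
    yield data[start:]

def fingerprint(chunk: bytes) -> str:
    """A strong hash stands in for the chunk's digital DNA."""
    return hashlib.sha256(chunk).hexdigest()

random.seed(0)
block_a = bytes(random.randrange(256) for _ in range(50_000))
block_b = bytes(random.randrange(256) for _ in range(10_000))
stream = block_a + block_b + block_a      # the same 50 KB region appears twice

prints = [fingerprint(c) for c in chunks(stream)]
print(len(prints), "chunks,", len(set(prints)), "unique")  # the repeated region mostly collapses
```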

The hash key is then passed on (based on a least-recently-used algorithm) to an available Storage Node. Once a Storage Node has been selected to either write or discard a given chunk of data, it requests the actual chunk in question, so that it has both the hash, which represents the data chunk, and the data chunk itself. If the hash lookup matches a hash already seen by HYDRAstor, the Storage Node requests from the super node the previously stored data chunk that the new data chunk allegedly matches, then performs a bit-level comparison of the two chunks. As long as the bit-level comparison verifies the match, the new data chunk is discarded and a reference pointer is stored instead. In the extremely rare case that it doesn't match, the new chunk will be written to the Storage Node. The system will also write the data chunk to disk, of course, if the hash lookup does not show a match to a previous hash.

Storing Data with High Resiliency, Increasing Rebuild Performance

This section explains how HYDRAstor writes new data chunks in a manner that provides maximum resiliency while guaranteeing the default or user-defined resiliency settings. First, each data chunk is broken into nine fragments (by default). Three parity fragments are then calculated against the nine original data fragments in such a way that the data chunk can be recreated from any nine of the twelve fragments. To ensure maximum resiliency, HYDRAstor distributes the twelve fragments across as many Storage Nodes, and within each Storage Node across as many disks, as possible.

Management of all fragments of a chunk is handled by one of several virtual super nodes. A super node is a collection of several members, where a member is a portion of a disk. Hence, in our example, each super node has 12 members. With the default set so that three members share one disk, the 24 physical disks in the four Storage Nodes of our minimum configuration become 72 (24 x 3) members, which allows for 6 (72 / 12) super nodes. Without requiring any user intervention or configuration, HYDRAstor automatically determines how many members are in a super node based on the resiliency setting defaults. Simply powering on the HYDRAstor with its default values results in the creation of six super nodes, each with twelve members. Given the number of ANs and SNs in the configuration, the distribution of those super nodes is automatically optimized to achieve maximum resiliency for that configuration. A HYDRAstor system with four Storage Nodes automatically configured with standard default settings is depicted in Figure 2: Super Node A consists of the first third of the first twelve disks in the system, Super Node B consists of the second third of the first twelve disks, and so on.

Figure 2: Six super nodes in a four Storage Node configuration (a simplified representation for ease of understanding)
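
The layout arithmetic above is simple enough to reproduce. The sketch below is an illustration of that description, not NEC code; it derives the super node count for the minimum configuration and checks the default any-9-of-12 survival rule.

```python
# Default layout arithmetic as described above (illustrative only).

DATA_FRAGMENTS = 9
PARITY_FRAGMENTS = 3
MEMBERS_PER_SUPER_NODE = DATA_FRAGMENTS + PARITY_FRAGMENTS   # 12
MEMBERS_PER_DISK = 3                                         # default setting

def super_node_count(storage_nodes: int, disks_per_node: int = 6) -> int:
    members = storage_nodes * disks_per_node * MEMBERS_PER_DISK
    return members // MEMBERS_PER_SUPER_NODE

def chunk_survives(lost_fragments: int) -> bool:
    """A chunk stays readable as long as any 9 of its 12 fragments remain."""
    return MEMBERS_PER_SUPER_NODE - lost_fragments >= DATA_FRAGMENTS

print(super_node_count(4))   # 6     -> 24 disks x 3 members / 12 members per super node
print(chunk_survives(3))     # True  -> three simultaneous losses are tolerated
print(chunk_survives(4))     # False -> a fourth concurrent loss is too many
```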

In order to reduce the chance of data loss and to decrease the time needed to restore, not all members of A, B, and C (or D, E, and F) share the same physical disks. If you need to restore, or simply read and write your data, you want to spread the load across as many disks as possible.

While the final data write may be considered analogous to RAID, there are key differences between the resiliency offered by HYDRAstor and that offered by RAID technology. Besides the obvious fact that there are no RAID controllers, volume managers, or RAID groups in HYDRAstor (no small saving in time and management headache), the parity in HYDRAstor is always calculated in RAM. Because parity is calculated in RAM as the chunk is written, there is no need to re-read existing data to recalculate parity, as there is with RAID levels 5 and 6, which suffer significant performance impacts as a result. Compare the performance of an NEC system that is in rebuild mode for one of its disks or Storage Nodes to a RAID system rebuilding a disk, and there will be no doubt that there is a major difference between the two. The following explanation of how reads work helps show how HYDRAstor achieves this performance.

When an Accelerator Node requests a data chunk, it requests the chunk from one of the twelve members, which then requests the data chunk's fragments from the other members of the super node. HYDRAstor then takes the first nine fragments it has been provided in order to recreate the chunk. If some of the nine supplied fragments are parity fragments, the parity fragments are used to automatically reconstruct the missing data fragments in RAM before the chunk is passed on to the application. This unique rebuilding capability is how HYDRAstor is able to maintain its backup performance, even when one or more disks or Storage Nodes are being rebuilt. This rebuilding optimization is a result of the fact that when you add an SN, you not only get more capacity, you also get additional CPU processing power for that SN to handle the data on that node. The two-tier architecture and the unique method by which HYDRAstor provides resiliency allow the system to maintain performance even during a data rebuild operation. Unlike current storage products, HYDRAstor doesn't have one controller that becomes a bottleneck to the system.
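
The read path lends itself to a small runnable model. To stay short, the sketch below uses a single XOR parity fragment (any 9 of 10) rather than HYDRAstor's three-parity, any-9-of-12 scheme, and the encode/decode helpers are our own stand-ins, but it shows the essential behavior: the chunk is rebuilt in RAM from whichever fragments are available, data or parity alike.

```python
from functools import reduce

# Toy model of the read path: rebuild a chunk in RAM from surviving fragments.
# Uses ONE XOR parity fragment (any 9 of 10), not the real 3-parity scheme.

DATA_FRAGMENTS = 9

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(chunk: bytes):
    """Split a chunk into 9 equal data fragments plus one XOR parity fragment."""
    size = -(-len(chunk) // DATA_FRAGMENTS)  # ceiling division
    frags = [chunk[i * size:(i + 1) * size].ljust(size, b"\0") for i in range(DATA_FRAGMENTS)]
    return [(i, f) for i, f in enumerate(frags)] + [("parity", reduce(xor, frags))]

def decode(arrived, chunk_len: int) -> bytes:
    """Rebuild the chunk from any 9 surviving fragments, using parity if needed."""
    data = {i: f for i, f in arrived if i != "parity"}
    if len(data) < DATA_FRAGMENTS:                    # one data fragment never arrived
        missing = next(i for i in range(DATA_FRAGMENTS) if i not in data)
        data[missing] = reduce(xor, list(data.values()) + [dict(arrived)["parity"]])
    return b"".join(data[i] for i in range(DATA_FRAGMENTS))[:chunk_len]

chunk = b"financial spreadsheet contents " * 40
fragments = encode(chunk)
survivors = fragments[1:]                 # pretend the disk holding fragment 0 failed
assert decode(survivors, len(chunk)) == chunk
print("chunk rebuilt from", len(survivors), "of", len(fragments), "fragments")
```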

Speed and Scalability: How HYDRAstor Processes Hash Tables

HYDRAstor has a unique methodology for creating, distributing, and storing its hash table that greatly increases scalability while also improving de-duplication processing speed. As hashes for all data chunks are calculated, as discussed above, a list of all known hashes is also created; this list is referred to as the hash table. For performance reasons, each super node is automatically nominated to manage one portion of the overall hash table. Distributing the hash table in this way optimizes the scalability characteristics of the underlying data structure while greatly increasing the speed of de-duplication processing. To ensure proper distribution of both the hashes themselves and the de-duplication workload, each super node is given responsibility only for a certain range of hashes. Figure 2 illustrates this feature: for example, Super Node A will be responsible for any hashes starting with 00, Super Node B will be responsible for those beginning with 010, and so on.

For resiliency purposes, each Storage Node containing a member of any given super node is also given a copy of the portion of the hash table that manages the data for that super node. In the minimum configuration above, every Storage Node has a member from every super node; as a result, every Storage Node actually has a copy of the entire hash table. As Storage Nodes are added to the configuration, the distribution changes to spread the hash table across more Storage Nodes. (This built-in data protection functionality is one reason to consider deploying more than the minimum configuration.)

HYDRAstor's division of its hash table across multiple Storage Nodes is quite different from today's systems that use hashing, because in those systems, the more data the system references, the bigger the hash table grows and the slower the performance becomes. HYDRAstor's method of distributing the hash table across multiple Storage Nodes allows it to scale linearly as you add Storage Nodes, increasing the speed of hash table lookups as the HYDRA grows more heads.

It is important to note that all we did to make all the steps discussed so far happen was to create the two shares/file systems on the Accelerator Nodes and point backup processes at them. Everything else, including the creation of the super nodes, the system's resiliency, and the distribution of the hash table for maximum system performance, happened automatically when we powered on the system.
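
To picture the prefix-based routing described above, here is a toy version; the even division of the hash space is our simplification of the per-super-node hash ranges, and the function name is invented for this example.

```python
import hashlib

# Toy routing of a chunk fingerprint to the super node that owns its slice of
# the hash space. The even split below is a simplification for illustration.

def owning_super_node(fingerprint_hex: str, super_nodes: int) -> int:
    """Map a hash to one of N equal slices of the hash space."""
    prefix = int(fingerprint_hex[:8], 16)        # the top 32 bits are plenty for routing
    return prefix * super_nodes // 2**32

fp = hashlib.sha256(b"some chunk of a spreadsheet").hexdigest()
print(owning_super_node(fp, 6))    # one of super nodes 0-5 in the minimum configuration
print(owning_super_node(fp, 18))   # the same hash lands in a narrower slice of a larger grid
```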

Expanding a HYDRAstor System

Plug-and-play expansion: auto-discovery and automatic load balancing allow scalability to tens of petabytes. Even with the storage efficiencies of data de-duplication, you will at some point still need more storage. Unlike with other storage systems, adding capacity to an existing HYDRAstor system is simple: if you need more capacity, you simply plug another Storage Node into the private network. HYDRAstor automatically sees the new node, and the existing Accelerator Nodes automatically start using its capacity to write new data. HYDRAstor also automatically redistributes existing data to the new Storage Node in order to distribute the read/write load, optimize performance, and increase resiliency. This protection is automatically achieved by splitting hash tables and extending super nodes or creating new ones: the larger the HYDRAstor system gets, the more nodes it can create to distribute the load across, which is a key reason why HYDRAstor can scale to tens of petabytes with no decrease in performance. Auto load balancing occurs as a background task and does not impact the system's performance.

Figure 3: An expanded HYDRAstor (a simplified representation for ease of understanding)

Consider the HYDRAstor configuration depicted in Figure 3. We have now added eight more Storage Nodes and four more Accelerator Nodes, for a total of six ANs and 12 SNs. At least one share would need to be created on each new Accelerator Node in order to take advantage of that node's available bandwidth; as you may remember, you can accomplish this task by simply giving each share a name. You do not have to associate that name with any volumes, file systems, Storage Nodes, or super nodes, as HYDRAstor automatically completes that level of configuration for you.

If adding ANs sounds good, let's take a look at what happens when we add new Storage Nodes. This process is the true magic act, especially when you compare the following scenario to a typical storage system expansion: to expand HYDRAstor, all you have to do is plug additional Storage Nodes into the private network; that's it. During expansion, the HYDRAstor system automatically discovers the capacity of the additional Storage Nodes and starts expanding super nodes to increase its performance and resiliency. In our example, where we are expanding outward from the basic setup, HYDRAstor notices that the super nodes in the original configuration of four Storage Nodes could be organized with even better resiliency by assigning one member to each of the twelve Storage Nodes, instead of keeping three members on each of the original four SNs and none on the added eight SNs. HYDRAstor will therefore automatically move one member to each of the twelve Storage Nodes in our now-expanded configuration. Because there are now many more disks available, HYDRAstor has also automatically created more super nodes (super nodes G-R) and divided the hash table into smaller pieces for improved resiliency and improved speed.
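
The benefit of that redistribution can be seen with the same simple arithmetic used earlier. Under the default 9-data/3-parity scheme, this rough model (ours, not NEC's) suggests how many whole Storage Node failures a super node can ride out before and after the expansion.

```python
# Why spreading a super node's 12 members across more Storage Nodes helps.
# Rough model only; real placement is decided by HYDRAstor itself.

MEMBERS_PER_SUPER_NODE = 12
PARITY_FRAGMENTS = 3

def members_per_storage_node(storage_nodes: int) -> int:
    """Worst-case members of one super node that share a single Storage Node."""
    return -(-MEMBERS_PER_SUPER_NODE // storage_nodes)   # ceiling division

def sn_failures_tolerated(storage_nodes: int) -> int:
    """Whole-node failures survivable while at least 9 of 12 fragments remain."""
    return PARITY_FRAGMENTS // members_per_storage_node(storage_nodes)

print(members_per_storage_node(4), sn_failures_tolerated(4))    # 3 1 -> before expansion
print(members_per_storage_node(12), sn_failures_tolerated(12))  # 1 3 -> after expansion
```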

Replacing or Upgrading Nodes in a HYDRAstor System

Ease of expansion and manageability: HYDRAstor enables data migration with no active management required. As with any storage system, it's important to be able to replace or upgrade components on demand, as, for example, when larger, faster hard drives become available, as inevitably seems to occur. When that happens, HYDRAstor gives you the choice of seamlessly replacing the older units or leaving them in place and adding the newer units as additional capacity; the choice is yours. To update a Storage Node configuration, all you need to do is plug the newer, faster Storage Node(s) into the private network, then tell HYDRAstor which Storage Node(s) you want to retire. HYDRAstor then automatically migrates the data off the soon-to-be-retired Storage Node(s) and moves it onto the new Storage Node(s) you've added to the system, all without impacting end-user performance or requiring scheduled downtime. It is important to note just how low-impact this update process is: you don't have to actively manage any of this data migration. All you have to do is the equivalent of saying hello to the new nodes and goodbye to the old nodes, and HYDRAstor handles everything else.
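
Operationally, the whole exchange can be modeled in a few lines. The sketch below is purely hypothetical; the class and method names are invented for illustration and are not HYDRAstor's actual management interface.

```python
from dataclasses import dataclass, field

# Hypothetical model of the "say hello / say goodbye" workflow described above.
# These names are invented for illustration, not the real management interface.

@dataclass
class Grid:
    storage_nodes: set = field(default_factory=set)
    retiring: set = field(default_factory=set)

    def add_node(self, name: str) -> None:
        """'Say hello': the grid auto-discovers the node and starts using it."""
        self.storage_nodes.add(name)

    def retire_node(self, name: str) -> None:
        """'Say goodbye': data migrates off in the background, with no downtime."""
        self.retiring.add(name)

    def migration_complete(self, name: str) -> None:
        """Once background migration finishes, the old node can be unplugged."""
        self.retiring.discard(name)
        self.storage_nodes.discard(name)

grid = Grid({"sn-old-1", "sn-old-2"})
grid.add_node("sn-new-1")            # plug the newer, larger node into the private network
grid.retire_node("sn-old-1")         # mark the old node; users see no interruption
grid.migration_complete("sn-old-1")
print(sorted(grid.storage_nodes))    # ['sn-new-1', 'sn-old-2']
```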

Possibility for Improvement

Starting out as a NAS filer makes HYDRAstor easier to deploy and easier to interface with existing backup systems, and HYDRAstor, even in its initial rollout, actually has more available throughput than most backup servers can use. Since network backup servers typically handle only about 30-60 MB/s via their Ethernet interface, you would actually need multiple backup servers to generate enough data to saturate even one Accelerator Node. In some really large environments, individual servers may have more than a few terabytes behind them. These servers, however, have typically moved to LAN-free backups because of the IP load they were experiencing with LAN-based backups. Since HYDRAstor's initial release is IP-based, these servers will not be able to use HYDRAstor for LAN-free backups, since they would need to send their backup data via the LAN to the NFS/CIFS share provided by the HYDRAstor. For this reason, GlassHouse believes the addition of block-level access constitutes a possible area for improvement for HYDRAstor. GlassHouse has discussed this suggestion with NEC, and NEC has indicated plans to add this functionality according to customer demand. Due to the HYDRAstor architecture, an enhancement of this type would require a change only in the Accelerator Node code and no change at the Storage Node level, greatly simplifying future deployment should this capability be made available in a future release.

Total Cost of Ownership

On the surface, it's easy to see that a device that costs at most $.90/GB would be less expensive to operate than a device that costs $3-5/GB, which is what most mid-range fully populated tape libraries cost. The real savings, however, comes from the cost of management.

The first type of savings will come from doing away with the management activities typically associated with tape. Media errors will disappear, and failed backups due to media errors will disappear, so system administrators can spend less time ensuring that backups have completed. In addition, all of the time spent trying to ensure that the tape drives are streaming will also cease. A disk device doesn't need to be streamed, so these activities are simply not needed.

The biggest savings, though, will come to an environment that is deciding between HYDRAstor and any system that requires maintaining RAID arrays. Think of the time associated with laying out and creating RAID volumes. Consider the time spent ensuring that operations continue successfully when any RAID arrays run in degraded mode. The real time and management savings come when it's time to upgrade hardware. Most RAID-based systems require forklift upgrades and migration of data, costing management time, incurring downtime, and reducing system availability. Some of this cost can be mitigated with migration software, but that software doesn't come free either. With the HYDRAstor architecture, there are never any forklift hardware upgrades. You simply plug in new nodes and tell the system which nodes you want to retire, and HYDRAstor does everything else. Last but not least, HYDRAstor grows with your needs, and no matter how large it gets, there is only one system, which is mostly self-managed: a big difference from managing and provisioning many RAID arrays separately.

Summary

HYDRAstor truly is a unique creature. While much of the industry is talking about grid storage, NEC is delivering it today with the HYDRAstor platform. It's difficult to find fault with such a scalable, robust system that allows you to start small, yet incrementally and economically grow to hundreds of petabytes, only a few terabytes at a time. With beta deployments starting in Winter of 2006, GlassHouse Technologies looks forward to testing its capabilities as soon as we can.

HYDRAstor, DataRedux, and Distributed Resilient Data (DRD) are trademarks of NEC Corporation; NEC is a registered trademark of NEC Corporation. All other trademarks and registered trademarks are the property of their respective owners. All rights reserved. All specifications subject to change.

Copyright 2007 GlassHouse Technologies, Inc. All rights reserved.