Self-Protecting Storage™: Simplifying Your Data Storage Infrastructure

Dave Therrien, CTO and Founder, ExaGrid Systems Inc.

Overview

Today, storing, protecting and managing your corporate business data is a complex, costly, error-prone, and time-consuming endeavor. This paper details the challenges that IT departments face in delivering reliable data storage services to their business users and applications. Then, a collection of emerging data storage and data management technologies will be reviewed, technologies that are instrumental in addressing the requirements of an ideal storage system. Finally, an innovative Self-Protecting Storage™ system that leverages these emerging technologies will be described.

Top 8 Data Storage Challenges

IT departments have always been challenged to deliver increasing amounts of reliable, available data storage capacity. Today, for each and every data storage system that must be deployed, it takes a creative mix of multiple independent hardware storage components, software licenses and management tools to meet the budgetary limitations as well as the availability requirements of business units' applications.

Data storage and storage management weren't always this fractured and complicated. In the days of mainframe storage, disk and tape subsystems were highly integrated with data management software that guaranteed reliable, available data storage. With open systems, this integrated data storage and data management environment was replaced with dozens of independent, often incompatible hardware components, software packages and management tools. The storage administrator today must be an expert at properly installing, configuring, monitoring and maintaining dozens of disparate vendors' products. They must be able to blend their selected mix of data storage, data protection and data management products into a workable, reliable storage system.

Here's a list of 8 specific data storage challenges that IT administrators are facing today:

1. Primary Storage Management - Every time a filesystem runs out of available storage capacity, IT administrators must get involved in manually creating additional space. Out-of-space capacity management remains a complex, manually intensive, error-prone process that requires immediate attention from IT administrators, regardless of the hour of the day.
2. Primary Storage Utilization - IT administrators must often make decisions on the specific placement of data across multiple storage subsystems and technologies with little to no information on the criticality of data to the business.
3. Traditional Backup - Backup software and backup systems are complex to install, configure and manage on a daily basis. Manual backup operations are time-consuming and error-prone. Costly backup infrastructures (servers, backup software, tape libraries, drives, and media) are complex, difficult to maintain, and must be expanded each year to accommodate the increasing amount of data that has to be backed up every weekend on dozens of new backup tapes.
4. Traditional Restore - Customers have reported tape-based restore success rates as low as 70%. Anything less than a 100% restore success rate has never been acceptable to business users, but today, when regulatory auditors visit to request data that must be restored from tape, it's not good enough to restore just MOST of the data.
5. Traditional Archiving - When data is archived to tape, it's deleted from the servers to free up space for new data. This archive/delete model frustrates end-users and causes applications to fail when the wrong data is archived and deleted from servers. In addition, data on older generation tapes needs to be regularly upgraded to newer tape media technology as older tape drives become obsolete. This is prohibitively expensive to remedy for companies with hundreds or thousands of older generation tapes.
6. Tapes in Trucks for Disaster Recovery - Most customers don't build out a disaster recovery site until a site disaster occurs, so when disaster strikes, it takes weeks just to purchase, deploy, and configure the recovery systems, storage, software and networking assets. Once this infrastructure is in place and operational, the process of restoring data from tapes that are brought in from an offsite tape storage facility begins. It could easily take days, and sometimes weeks, to completely and successfully recover all of the data.
7. Over-replication - For each megabyte of user/application data that is created and modified, today's isolated data management products can easily create 10 to 20 megabytes of data in various replicated forms. RAID, snapshots, onsite backups, offsite backups, archive tapes, HSM tapes and offsite replication copies all contribute to the over-replication problem.
8. In Tapes, We Don't Trust - Storage administrators have lost their trust in successfully restoring data from backup magnetic tapes. And as these tapes age or are reused, they are less likely to reliably retain their data.

With so many data storage and data management issues, there needs to be some new thinking about how data storage systems should be architected in the future.

Emerging Storage Technologies

Fortunately, university and industry research in the field of data storage scalability, reliability, and availability is being adapted to address the issues that storage administrators are facing today. Many of these emerging storage technologies cannot just be layered onto today's existing data storage systems and data protection tools. It will take a different kind of storage architecture to bring these features and associated benefits to commercial use.

Here are 10 emerging storage technologies that will bring about dramatic improvements in reducing the cost and complexity of data storage and data management solutions while increasing the overall availability of data.

1. Integrated Data Management - Today's isolated data management products (backup, archiving, HSM, and replication) will be replaced by efficient, space-conserving integrated data management products. These will maximize availability of business data and consume the minimal amount of storage capacity. These integrated data management products will be driven by a single, simple protection policy that replaces today's multiple, independent data protection management interfaces.
2. Redundant Array of Inexpensive Servers (RAIS) - This new storage architecture will replace today's independent primary storage and data protection hardware and software products with low-cost compute/network/storage server bricks. Each brick will operate independently to reliably store and protect its own data, and all bricks will work with each other to ensure overall availability across multiple internetworked data centers.
3. Location Independent Storage - This is a key technology for providing high availability systems that can survive server, network or power failures. When a single brick fails, data can be re-replicated to surviving bricks without a loss of access by clients and applications.
4. Grid Computing - Grid computing systems today are delivering distributed, scalable, high-performance, high-availability processing power to business applications with heavy computational demands. Grid computers allow compute resources to be discovered automatically, to be serviced and upgraded non-disruptively, and to be shared by multiple applications to increase overall resource utilization rates. Now imagine a grid computer that is designed solely for providing primary storage, onsite and offsite backups, fast disaster recovery, long-term data preservation, and tiered storage along with all of the additional benefits of grid computing. And every time more storage capacity is added to the system, more processing power is added to deliver unprecedented scalability.
5. Reverse Delta Compression - This delivers constant full-backup grade restore performance from an incremental-only continual backup model. This technology consumes 40x to 1000x less capacity (and operator time) than today's backup processes and software.
6. Content Naming - Hashing codes like SHA-1 and MD5 provide a number of useful functions for underlying data management products. They can be used to uniquely name files, to automatically place files at specific destination nodes based on their content name, and to perform integrity checking and correction of all data on a continual basis.
7. Version Chains - These represent a concise packaging of the complete time-based lifecycle of a file from creation, through each modification, to deletion. This space-efficient storage mechanism replaces wasteful weekly full tape backups where more than 90% of the backup content is unchanged from week to week, but still gets written to another set of backup tapes anyway.
8. Delta-based WAN Transfers - In order to replace today's tapes-in-trucks model of offsite storage / disaster recovery management with MAN or WAN distributed disk-based repositories, smart difference-only inter-site replication is a requirement, since these MAN/WAN links have limited bandwidth compared to LAN bandwidth.
9. Data Integrity Scrubbers - Tens to hundreds of terabytes of backup and archive data can be checked AND corrected on a daily basis by scrubbers that run in parallel across tens to hundreds of independent, intelligent storage bricks. This helps in providing ultra-reliable data restores compared with today's lower restore success rates.
10. Hierarchical Storage Management (HSM) - What if 90% of your company's inactive data could be automatically migrated to a lower cost tier of disk storage, from high performance FibreChannel or SCSI disk storage to lower cost SATA disk storage? HSM is a technology that's been around since the heyday of the mainframe. The benefits of HSM are numerous:
- HSM eliminates the manual processes associated with file systems filling up: imagine eliminating the manual processes of LUN allocation, volume expansion and filesystem expansion activities.
- With HSM, all files remain visible to applications, whereas with archiving, inactive files are deleted from servers once they are committed to tape.
- HSM knows which files are inactive, so it takes the operator guesswork out of determining which files to migrate to lower cost storage and which files should remain on high performance storage.
- HSM can actually help to accelerate disaster recovery times by allowing users to access their data as soon as the much smaller pointer files have been re-established.

Storage product vendors that can incorporate these emerging storage technologies into their storage strategy will more effectively be able to address the requirements of the ideal storage system. One such system is explained below.

Self-Protecting Storage™

The emerging storage technologies outlined above can be leveraged to create a Self-Protecting Storage™ system that integrates primary storage with complete onsite and offsite backup, automated data migration, site disaster recovery, and long-term data preservation.

[Figure: diagram labeled with five InfiniteFilers and two Repositories]

A Self-Protecting Storage™ system provides the following features:

- InfiniteFilers provide distributed NAS primary storage for clients and applications that generate file data. HSM technology allows these NAS servers to effectively never run out of storage capacity.
- Repositories represent a virtual pool of scalable, disk-based storage capacity for storing InfiniteFiler backup data and for maintaining inactive InfiniteFiler data in a lower-cost tier of storage. Each repository comprises 2 or more [...] units that act like a distributed grid computer to deliver incrementally scalable, pay-as-you-grow storage capacity. GRID computing and RAIS technologies allow the [...] units of these repositories to be auto-discovered and auto-configured in order to eliminate tedious manual storage allocation tasks.
- Data is transparently migrated between high performance InfiniteFiler disk storage and lower-cost Repository disk storage based on client or application access patterns. HSM incorporated into the InfiniteFilers provides this automated migration capability.
- Fast, continual, unattended, incremental-only onsite and offsite backups. Version chains, incremental-only backups, delta compression and delta-based WAN compression all contribute to delivering dramatic improvements in backup capacity consumption as well as backup execution times.
- Fast and ultra-reliable restores of any version of a file, or any directory of files, from a previous point in time.
- Fast two-phase site disaster recovery. HSM and delta-based WAN compression help to reduce site disaster recovery time by 30x.
- Self-healing mechanism that checks and corrects its data continually. RAIS technology, GRID-computing technology, data integrity scrubbers, content naming, and location independent storage all contribute to a self-healing architecture.
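
As a rough illustration of how content naming and data integrity scrubbers could combine into the self-healing behavior described above, here is a minimal Python sketch. It is not ExaGrid's implementation. The `Brick` class, the `place`, `store` and `scrub` functions, the modulo placement rule and the two-copy default are all assumptions made for this example; only the idea of using a hash such as SHA-1 as a content name comes from the text.

```python
from __future__ import annotations

import hashlib


def content_name(data: bytes) -> str:
    """Name a blob by the SHA-1 digest of its content."""
    return hashlib.sha1(data).hexdigest()


class Brick:
    """Hypothetical storage brick: maps content name -> stored bytes."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.blobs: dict[str, bytes] = {}


def place(name: str, bricks: list[Brick], copies: int = 2) -> list[Brick]:
    """Pick `copies` bricks deterministically from the content name.

    Assumed rule: start at (name mod brick count) and take neighbours.
    Callers should keep copies <= len(bricks) so the replicas are distinct.
    """
    start = int(name, 16) % len(bricks)
    return [bricks[(start + i) % len(bricks)] for i in range(copies)]


def store(data: bytes, bricks: list[Brick], copies: int = 2) -> str:
    """Write the blob to every brick chosen by its content name."""
    name = content_name(data)
    for brick in place(name, bricks, copies):
        brick.blobs[name] = data
    return name


def scrub(bricks: list[Brick]) -> int:
    """Re-hash every stored blob and repair bad copies from a good replica.

    Returns the number of copies that were repaired.
    """
    repaired = 0
    for brick in bricks:
        for name, data in list(brick.blobs.items()):
            if content_name(data) == name:
                continue  # copy still matches its name, nothing to do
            for other in bricks:
                candidate = other.blobs.get(name)
                if (
                    other is not brick
                    and candidate is not None
                    and content_name(candidate) == name
                ):
                    brick.blobs[name] = candidate  # overwrite the bad copy
                    repaired += 1
                    break
    return repaired


if __name__ == "__main__":
    bricks = [Brick(f"brick-{i}") for i in range(4)]
    name = store(b"quarterly report", bricks)
    # Simulate silent corruption of one replica, then let the scrubber fix it.
    place(name, bricks)[0].blobs[name] = b"quarterly rep0rt"
    print("copies repaired:", scrub(bricks))
```

Because the name is derived from the content, any brick can verify a copy without consulting a central catalog, which is what lets a scrubber of this kind run independently on each brick. A real system would also have to handle the case where every replica is damaged, which this sketch simply leaves unrepaired.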

The benefits of Self-Protecting Storage™ include:

Reduced capital equipment costs
- Leverages low-cost commodity PC servers, high performance SCSI and low-cost SATA disk storage, gigabit Ethernet
- Inactive data is automatically placed at a lower cost tier of storage
- No tapes to purchase, no backup software, backup servers, tape library units or tape drives for backup
- No more replication software and replication servers to purchase
- No more archiving software and archive servers to purchase

Reduced operational costs
- An integrated approach to data storage and data protection reduces the complexity, eliminates the over-replication of data and increases the availability of corporate data.
- Simple capacity expansion: just add disks to the network, with no primary storage LUN allocation, volume management or filesystem management
- No tapes to purchase, load and unload, ship offsite and request onsite
- No tape backup process hassles (tapes stuck in drives, no available tapes in the tape pool, ...)
- No weekends tied up monitoring and managing backup job failures
- Simple point-and-click disaster recovery completed within hours
- Self-correcting data delivers ultra-reliable restores
- No more having to guess about what data to archive from high performance storage to lower cost storage, thanks to fully automated data migration
- No more having to hunt down old tapes from an archive: all data is accessible all of the time.

Reduced service costs
- No more storing tapes offsite at a tape storage vault

If you're interested in learning more about how a Self-Protecting Storage™ system can reduce your storage cost and complexity while increasing your applications' data availability, please contact us at www.exagrid.com.

ExaGrid Systems Inc., 2000 West Park Drive, Westboro, MA 01581
Telephone: 508-898-2872 x226
Email: marketing@exagrid.com