NEXTGEN BACKUP SEMINAR PRESENTATION DOWNLOAD
Presentation one: Optimizing Your Backup System
The backup system is one of the most expensive and troublesome systems in your data center, and yet it's rarely configured for optimum performance and maximum utilization of resources. This hour kicks off with an explanation of some overlooked features of typical commercial backup software products. Covering real-life examples of what to look for in your backup system to gauge whether or not it is optimized (including success rates, partial backups, consecutive failures, throughput rates and media utilization), Curtis drives home his points with customer data pulled from actual customers' backup systems. This session will also explore the importance of using disk in your backup system.
Also covered:
- The three rules of encryption: key management, key management, key management
- Role-based administration
- Separation of powers
- Background checks and other techniques
Backup School 2008 W. Curtis Preston VP Data Protection GlassHouse Technologies
Optimizing Your Backup System Things you might not know about your backup system and what to do about them
A Little About Me
- When I started as backup guy at a $35B company in 1993:
  - Tape drive: QIC 80 (80 MB capacity)
  - Tape drive: Exabyte 8200 (2.5 GB & 256 KB/s)
  - Biggest server: 4 GB ('93), 100 GB ('96)
  - Entire data center: 200 GB ('93), 400 GB ('96)
  - My TiVo now has 5 times the storage my data center did!
- Consulting in backup & recovery since '96
- Author of O'Reilly's Backup & Recovery and Using SANs and NAS
- Webmaster of BackupCentral.com
- VP of Data Protection, GlassHouse Technologies
A Little Bit About Where I Work
- GlassHouse is an independent professional services firm specializing in IT infrastructure
- This is important so you understand where I'm coming from
  - We don't make or resell any hardware or software
  - No reason to promote or bash any product
- The information you will hear today is based on real experiences with hundreds of companies, including the largest companies in the world
Optimizing Your Backup System
- Your true success rate
- Partial backups
- Consecutive failures
- Tape throughput rates
- Media utilization
- Backup software features
Real Customer Data
- The following slides use real customer data (anonymized) gathered using our own backup assessment tool, which collects and parses backup data and puts it in a database
- This allows us to collect and compare PIs across multiple backup servers and customers
- First time this data is being presented publicly
  - 112 customers
  - Average of 128K jobs per customer
  - 13.7M backup jobs
Average Customer Success Rate
- Average success rate of 78% (average of averages)
- Weighted average of 91.2% (of 13.7M jobs)
Customers by success rate
[Chart: y-axis from 0% to 100%; x-axis success-rate bands <10, <20, <30, <40, <50, <60, <70, <80, <90, <95, <96, <97, <98, <99, <100]
>50% have <90% success; >30% have <80% success!
Success Rate by Industry
[Chart: success rate (scale 0 to 100) for Retail, Oil/Gas, Manufacturing, Insurance, Healthcare, Government, Food, Finance/Banking, Entertainment, Software/Hardware, Communications, Biotech/Pharm, Airlines]
So what?
- A success rate of 91% sounds good, right?
  - A lot of customers assume that a success rate in the 90s is a good thing. Let's examine why it's not.
  - That's 1.2M failed backups!
- Overall success rate doesn't tell the whole story. What about:
  - Client-level success rates
  - Consecutive failures
Client-level success rates
- Must zoom in on an individual customer for this level of detail
- Stats of this customer:
  - 35 backup servers
  - 10,000 clients
  - 477,217 backup jobs
  - 92% successful
  - 5% partial
  - 3% failed
- Not too bad, right?
Success Rate by Backup Server
- Same view by backup server
- Some with rather high failure rates
- But, still. Nothing too terrible, right?

Successful   Partial   Failed   Number of Jobs
   50.00%     39.47%   10.53%               38
   52.49%     34.39%   13.12%            1,044
   62.50%     36.25%    1.25%               80
   68.30%     22.14%    9.56%           12,896
   68.48%     17.97%   13.55%           47,679
   68.94%     24.52%    6.55%            3,789
   71.90%     26.56%    1.54%            3,053
   74.37%     13.84%   11.79%           23,378
   75.23%     22.02%    2.75%              109
   76.20%     10.04%   13.76%              458
   77.77%     19.38%    2.85%            2,627
   79.79%     17.23%    2.98%            3,489
   82.05%     16.67%    1.28%               78
   82.74%     16.96%    0.31%            2,937
   83.78%     14.90%    1.32%            5,013
   86.45%     13.05%    0.50%            6,002
   87.47%     11.25%    1.29%            6,295
   92.12%      0.49%    7.39%              406
   93.31%      2.66%    4.03%           32,219
   94.44%      5.56%    0.00%              144
   94.97%      5.03%    0.00%              179
   95.96%      0.09%    3.95%           35,759
   98.56%      0.13%    1.30%           33,493
   98.67%      0.23%    1.10%          100,501
   99.25%      0.05%    0.70%           89,836
   99.47%      0.02%    0.51%            8,906
   99.57%      0.00%    0.43%           21,871
   99.70%      0.03%    0.27%           10,257
   99.93%      0.02%    0.05%           14,323
   99.95%      0.04%    0.01%            7,708
  100.00%      0.00%    0.00%            1,053
  100.00%      0.00%    0.00%            1,597
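The per-server figures on this slide also show why an "average of averages" and a job-weighted average can differ so much, as they did across all customers earlier in the deck. The sketch below is only an illustration: the function names are made up for this example, and the data is transcribed from the slide's Successful and Number of Jobs columns.

```python
# (success rate in %, number of jobs) per backup server, transcribed from the slide.
SERVERS = [
    (50.00, 38), (52.49, 1044), (62.50, 80), (68.30, 12896),
    (68.48, 47679), (68.94, 3789), (71.90, 3053), (74.37, 23378),
    (75.23, 109), (76.20, 458), (77.77, 2627), (79.79, 3489),
    (82.05, 78), (82.74, 2937), (83.78, 5013), (86.45, 6002),
    (87.47, 6295), (92.12, 406), (93.31, 32219), (94.44, 144),
    (94.97, 179), (95.96, 35759), (98.56, 33493), (98.67, 100501),
    (99.25, 89836), (99.47, 8906), (99.57, 21871), (99.70, 10257),
    (99.93, 14323), (99.95, 7708), (100.00, 1053), (100.00, 1597),
]


def average_of_averages(rows):
    """Unweighted mean: every server counts the same, however few jobs it ran."""
    return sum(rate for rate, _ in rows) / len(rows)


def weighted_average(rows):
    """Job-weighted mean: every backup job counts the same."""
    total_jobs = sum(jobs for _, jobs in rows)
    return sum(rate * jobs for rate, jobs in rows) / total_jobs


if __name__ == "__main__":
    print(f"servers listed:      {len(SERVERS)}")
    print(f"total jobs:          {sum(jobs for _, jobs in SERVERS):,}")
    print(f"average of averages: {average_of_averages(SERVERS):.1f}%")
    print(f"weighted average:    {weighted_average(SERVERS):.1f}%")
```

The busiest servers are also the healthiest ones here, so the weighted figure lands near the 92% quoted for this customer while the unweighted one is noticeably lower. Small, struggling servers are easy to overlook when only the job-weighted number is reported.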
Clients by Failure %
- Failure percentage is the total number of failures divided by the total number of backups for a given client
- Shown here is a summary of the top 1000 (out of 10,000) clients when sorted by percentage of failures
- Getting worse, right?

Failed %   Count
25-30        310
30-40        210
40-50         99
50-60         88
60-70         13
70-80        162
80-90         61
90-100        57
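As a rough sketch of the calculation behind this slide, the snippet below computes a client's failure percentage and assigns it to one of the slide's bands. The function names are hypothetical, and treating each band as including its lower bound (with 100% falling in the last band) is an assumption; the slide does not say how boundaries were handled.

```python
# Bands as shown on the slide; boundary handling is an assumption.
BANDS = [(25, 30), (30, 40), (40, 50), (50, 60),
         (60, 70), (70, 80), (80, 90), (90, 100)]


def failure_percentage(failed_jobs, total_jobs):
    """Total failures divided by total backups for one client, as a percentage."""
    if total_jobs <= 0:
        raise ValueError("client has no backup jobs")
    return 100.0 * failed_jobs / total_jobs


def band_for(pct):
    """Return the slide's band label for a failure percentage, or None if below 25%."""
    for low, high in BANDS:
        if low <= pct < high or (high == 100 and pct == 100):
            return f"{low}-{high}"
    return None


if __name__ == "__main__":
    # Hypothetical client: 9 failures out of 20 backups.
    pct = failure_percentage(9, 20)
    print(pct, band_for(pct))
```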
Consecutive Failures
- When two days go by without a single successful backup for a client, that's a consecutive failure.
- Shown here is a summary of the top 1000 (out of 10,000) clients when sorted by number of consecutive days of failure
- We collected 11 days of data
- Now things are getting ugly

Consecutive Failures   Count
 4                       499
 5                       258
 6                       108
 7                        57
 8                        32
 9                        15
10                        31
11                         5
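One way to measure this from job history is sketched below. It assumes one list of job outcomes per calendar day (True meaning a successful job) and counts a day with no jobs at all as a day without a successful backup; both the data layout and that choice are assumptions made for illustration, not a description of the assessment tool.

```python
def longest_failure_streak(daily_results):
    """Longest run of consecutive days on which no job succeeded."""
    longest = current = 0
    for jobs in daily_results:
        if any(jobs):          # at least one successful backup that day
            current = 0
        else:                  # every job failed, or nothing ran at all
            current += 1
            longest = max(longest, current)
    return longest


def has_consecutive_failure(daily_results, threshold=2):
    """The slide's definition: two or more days without a successful backup."""
    return longest_failure_streak(daily_results) >= threshold


if __name__ == "__main__":
    # Hypothetical client history, one inner list per day.
    history = [[True], [False, False], [], [False], [True, False], [False]]
    print(longest_failure_streak(history), has_consecutive_failure(history))
```

A client like the hypothetical one above can still show a tolerable overall success rate while having gone three days in a row with nothing restorable, which is why this view is worth reporting separately.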
Partial Backups
- Partial backups mean partial restores. People don't like partial restores!
- Definition: a backup that backs up some, but not all, of the files it's supposed to
- Valuable resources are wasted
  - Backing up database files in addition to using an agent
  - Constantly changing log files that no one cares about
- Important data gets missed
  - New databases
  - Applications that lock files with an exclusive read lock!
- If you're overlooking your partials, you could be in for another surprise!
- Either exclude it or figure out how to back it up properly
Success Rate Lessons
- Success rate isn't everything
  - Unless you're 100% successful, you have to look beyond it
- Different levels
  - By backup server
  - By client
  - By consecutive failures
- Don't forget partial backups
- Consecutive failures is the single most surprising section of our backup assessments
- The best way to gather this level of detail is a commercial data protection management tool. If the tool is free, you're getting what you paid for.
- I can't imagine maintaining a reasonably sized backup environment without such a tool
Tape Utilization
- Again, let's look at a large customer's data
  - 535,286 tapes with data on them
  - Approximately 64% utilized
(Footnote 1: pricing as of 2/08)
Real Money
- 1% increase in utilization = 5,352 fewer tapes for this customer
- That's real money!
  - $133,800 if LTO-1
  - $187,320 if LTO-3
  - $535,200 if LTO-4
- A 10% increase in utilization could save between $1.3M and $5.3M on media alone
- Further savings in tape library size and offsite vaulting contract
- How much could you save?
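The arithmetic on this slide can be reproduced with a few lines. The per-tape prices below are not stated on the slide; they are simply what you get by dividing each quoted total by 5,352 tapes, so treat them as implied values. The sketch also follows the slide's simplification that one point of utilization equals 1% of the tapes in use.

```python
TAPES_IN_USE = 535_286
TAPES_PER_POINT = TAPES_IN_USE // 100   # 5,352 tapes, the slide's "1%" figure

# Implied by the slide's totals (total / 5,352); not quoted on the slide itself.
IMPLIED_PRICE_PER_TAPE = {"LTO-1": 25, "LTO-3": 35, "LTO-4": 100}


def media_savings(points, price_per_tape):
    """Dollars saved on media for a given number of utilization points gained."""
    return points * TAPES_PER_POINT * price_per_tape


if __name__ == "__main__":
    for media, price in IMPLIED_PRICE_PER_TAPE.items():
        print(f"{media}: 1 point = ${media_savings(1, price):,}, "
              f"10 points = ${media_savings(10, price):,}")
```

Swapping in your own tape count and current media prices gives a quick first estimate; library slots and offsite vaulting costs would come on top of that.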
Increasing Tape Usage
- All: Reduce the number of pools, especially for offsite tape
- NBU: Do not allow multiple retention periods per tape
- NW: Use Full/Non-Full pools and expire Non-Fulls sooner than Fulls
- NBU/NW: Minimize number of MPX settings
- TSM: Use collocation groups instead of node-level collocation. Spend what you need to in order to get expiration & reclamation done. Start reclamation of emptiest tapes first by slowly lowering your reclamation threshold.
Tape Drive Utilization
- Again, we must look at an individual customer for this level of detail
- Stats from this customer:
  - Backup assessment of >100 locations
  - 285 backup servers
  - 777,422 backup jobs
  - 71% success rate
  - 2.84 PB of backup data
Lots of tape drives!
- This customer has deployed hundreds of tape drives over hundreds of locations
- 971 tape drives
  - DLT 7000
  - DLT 8000
  - LTO-1
  - LTO-2
  - LTO-3
More is less!
[Chart: 0% to 100% scale by drive type: DLT-7000, DLT-8000, LTO-1, LTO-2, LTO-3]
As tape drives get faster and faster, they are getting less and less out of them
More is less!
- They weren't streaming their 5 MB/s DLTs!
- LTO-3 =~ 40-150 MB/s with their compression ratio. They're getting 13 MB/s
- Yes, faster drives are variable speed, but they're not like a CVT. They're a multi-speed bike.
- Ever tried to pedal a multi-speed bike up a hill in a gear that was too high? That's your tape drive when you're not streaming it
[Chart: 0% to 100% scale by drive type: DLT-7000, DLT-8000, LTO-1, LTO-2, LTO-3]
Here's the problem
[Chart: per-server values, y-axis from 0 to 180]
- The servers can't get the data fast enough
- The spikes are LAN-free servers. Few others are >15-20 MB/s
- (For readability, 4 of the 285 servers were removed from this graph, >299 MB/s)
More Real Money
- What if average server throughput was increased from 20 MB/s to 60 MB/s (easily done with GbE)?
  - Reducing the server count by 185 (100 locations) would save this customer from $1M to $20M.
  - Could increase even further with 10GbE
- What if average tape drive utilization was increased from 20% to 60% (easy)?
  - Reducing the tape drive count by 600 would save them from $6M to $18M on tape drives alone.
- What server/drive money are you wasting?
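The reductions quoted on this slide follow from simple proportional scaling, sketched below. Two things here are interpretations rather than statements from the deck: that the "(100 locations)" note means at least one server must remain per location, and that the drive figure of 600 is a rounded-down version of the computed value. The function name is hypothetical.

```python
import math


def units_needed(current_units, current_rate, target_rate, minimum=0):
    """Units required to do the same work once each unit runs at target_rate.

    `minimum` is a floor, e.g. one backup server per location (an assumption).
    """
    needed = math.ceil(current_units * current_rate / target_rate)
    return max(needed, minimum)


if __name__ == "__main__":
    # Servers: 285 at 20 MB/s, raised to 60 MB/s, with a floor of 100 locations.
    servers = units_needed(285, 20, 60, minimum=100)
    print(f"servers needed: {servers}, reduction: {285 - servers}")

    # Tape drives: 971 at 20% utilization, raised to 60%.
    drives = units_needed(971, 20, 60)
    print(f"drives needed: {drives}, reduction: {971 - drives}")
```

Without the per-location floor the server math would suggest 95 servers; with it, the reduction matches the slide's 185. The drive calculation comes out somewhat above 600, so the slide's figure is on the conservative side.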
Get Better Plumbing
- Move what backups you can off the LAN
  - Use LAN-free backups
  - Use virtual full backups
- Increase LAN throughput any way you can
  - TCP Offload Engine
  - Updated TCP/IP stacks (e.g. Solaris 10)
  - Jumbo Frames
  - 10 GbE (600 MB/s+) backup network
- Don't buy another backup server, build a network for the ones you have!
  - The backbone's probably not ready, so build your own, just like we did before Fibre Channel and GbE were ready
  - A 10GbE switch with 24 GbE ports & 2 10GbE ports is <$4,000. Even if you had to buy 24 $1K NICs, that's under $25,000, which is less than the cost of most backup servers.
Get a Better Toilet
- Tape can be great if you keep it happy, and that's getting harder every day
- Using disk as an intermediary staging device to tape can make it much easier to stream your tape drive
- Storing all onsite backups on disk is now more possible and affordable than ever before
- Before you replace your existing tape library with yet another tape library, please seek independent advice on the cost of alternative solutions.
Important NetBackup Features
- Synthetic Full/Cumulative Backups
  - Possible to adopt an incremental-forever approach for filesystem backups
- FlashBackup for the million-file problem
- SharedDisk (formerly SSO for disk)
- Enhanced disk backups (6.5.1)
- Storage lifecycle policies (6.5.1)
- PureDisk functionality
Important TSM Features
- Expiration & Reclamation
  - Some people turn them off or cripple them
- Collocation Groups
  - Minimize number of tapes a given server will be on
- Active Data Pools
  1. FILE type DISK pool or sequential TAPE pool specifying pooltype=activedata
  2. Update node's domain(s) specifying ACTIVEDESTINATION=<active-data-pool>
  3. Issue COPY ACTIVEDATA <node_name>
Important NetWorker Features
- Max Sessions (7.3+)
  - Hard limit to the number of sessions to a device
- Usage of multiple groups
  - Previous versions did not handle multiple groups well
  - Current versions allow you to specify whatever makes sense for you
- Saveset consolidation
  - Possible to adopt an incremental-forever approach for filesystem backups
- EMC Avamar
Using Disk in Your Backup System
Disk Backup Targets
- Easier to stream tape drives when copying from backups sent to disk
  - A good D2D2T system should easily be able to stream any tape drive (not all can do this)
  - One reason is that randomly distributed data on source disks is serialized on backup disks
- Recoveries are also easier/faster if at least one copy of all backups is left on disk
- Enables other interesting possibilities
All your friends are doing it!
- 2007 survey of 163 respondents by the Enterprise Strategy Group
[Chart legend: Yes / No]
- 64% of respondents said they will implement disk backup by end of 2008
- 33% of respondents believed that deduplication was the key to making this happen
Disk Staging
- Backup to disk, copy/migrate to tape
- All else is the same
- Requires enough disk for one night's backups (e.g. 14%: 1/28th, or about 4%, for fulls and 10% for incrementals if you do a full backup once a month and incrementals daily)
- Helps backups, not restores
  - Restores still come from tape
  - Still requires shipping tape
- "Those who cannot remember the past are condemned to repeat it." (George Santayana, The Life of Reason: Reason in Common Sense, Scribner's, 1905)
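The 14% figure can be checked with a short calculation. This sketch assumes the monthly fulls are spread evenly over a 28-day cycle, so 1/28 of the data gets a full backup on any given night, which is how the slide's "1/28th" reads; the function name and defaults are illustrative only.

```python
def staging_disk_fraction(full_cycle_days=28, incremental_fraction=0.10):
    """Disk needed for one night's backups, as a fraction of primary data.

    Assumes fulls are spread evenly across the cycle and that the nightly
    incremental is a fixed fraction of the data (10% on the slide).
    """
    nightly_fulls = 1 / full_cycle_days
    return nightly_fulls + incremental_fraction


if __name__ == "__main__":
    fraction = staging_disk_fraction()
    print(f"staging disk: about {fraction:.1%} of primary data")
```

If all fulls run on the same night instead of being staggered, that night alone needs 100% of primary capacity, so the schedule matters as much as the change rate.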
Disk Backups
- Store all onsite backups on disk
- Offsite backups can be disk or tape
- Requires a bigger shift in thinking & procedures
- Operational restores come from disk
- Requires enough disk to hold all backups (e.g. roughly 2000%: 400% for 4 monthly fulls, 100 days of 15% incrementals = 1500%, 15 10% differentials = 150%)
- Requires deduplication to be as affordable as tape (next session)
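Multiplying out the inputs the slide gives (4 retained fulls at 100% each, 100 daily incrementals at 15% each, and 15 differentials at 10% each) yields a total in line with the rounded 2000% headline. The sketch below just does that multiplication; the function name and the idea of expressing everything as a percentage of primary data are choices made for this example.

```python
def retained_backup_percent(fulls=4, full_pct=100,
                            incrementals=100, incremental_pct=15,
                            differentials=15, differential_pct=10):
    """Disk needed to hold all retained backups, as a percent of primary data."""
    return (fulls * full_pct
            + incrementals * incremental_pct
            + differentials * differential_pct)


if __name__ == "__main__":
    total = retained_backup_percent()
    print(f"disk for all onsite backups: about {total}% of primary data")
```

That is about twenty times the primary data before any data reduction, which is why the slide ties the economics of this approach to deduplication.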
Resources from our sponsors
- Whitepaper: Deduplication Storage for Nearline Applications
- Best Practices Guide: Backup and Recovery for Microsoft Exchange Best Practices with Data Domain
- Storage Research Report: Why Deduplication Technology is Causing a Paradigm Shift in Storage Tiering
- The Growing Importance of Data De-Duplication
- ESG: Understanding the Power of Data De-Duplication
- Cool Vendors in Data Protection
- Regulatory Compliance: How Digital Data Protection Helps
- Data Protection and Recovery: The Why, The How, and Who to Go To
- Top Ten Reasons for Using Disk-Based Online Server Backup and Recovery
Resources from our sponsors
- Download this e-guide, featuring articles from Storage magazine, to learn how data deduplication works and how its products differ.
- Download this podcast for an insightful Q&A session with Curtis Preston, Vice President of Data Protection Services at GlassHouse Technologies, to learn all about data deduplication.
- Comparing Deduplication Approaches: Technology Considerations for Enterprise Environments
- The Forrester Wave: Enterprise Open Systems Virtual Tape Libraries, Q1 2008
- TCO Comparison Report: Reducing Costs in the Data Center with Deduplication
- Overview: Backup & Recovery Top 10 Reasons to Upgrade
- Symantec Backup Exec System Recovery 8: The Gold Standard in Complete Windows System Recovery
- Veritas NetBackup 6.5: Designing and Implementing Backups Using Storage Lifecycle Policies
Don't forget to download the other presentations from NEXTGEN BACKUP SCHOOL
Presentation two: Deduplication
This session delves into the most talked about new technology in years: deduplication. Curtis will explain the basics of deduplication, why it should work for you and how it should work for you. Learn the difference between inline and post-process dedupe, forward and reverse referencing, hashing and delta differentials, and source and target dedupe. Discuss which types are most appropriate for which types of data centers, and learn how deduplication affects (or doesn't affect) the most important thing of all: restores.
Presentation three: Protecting Stored Data, Backing Up Virtual Infrastructure & Remote Data
This session will start by explaining the challenges of backing up and recovering virtual machines residing in a virtual infrastructure such as VMware, Microsoft's Virtual Server or Virtual Iron. Curtis will cover the pros and cons of several backup techniques that can be used to back up any virtual machines, including VM-based backup and console-based backups. Curtis will conclude this hour by explaining several options that are specific to VMware, including VMware Consolidated Backup (VCB) and commercial tools aimed at this market. This session will also examine the various threats to your stored data, including your SAN, your backup system, your people and your tape, and what to do to protect against each type of threat.