A Druva Whitepaper
Corporate PC Backup - Best Practices

This whitepaper explains best practices for successfully implementing laptop backup for a corporate workforce.

White Paper WP /100 /009 Oct 10
Table of Contents

Introduction
Understanding Who Needs Data Backup in Your Organization
Deciding What Needs to be Backed Up
PC Imaging and Bare-Metal Restore
Deciding Backup Policies and Resources
Data Deduplication - Saving Bandwidth and Storage
WAN Optimization
Your Private Cloud - Planning Backup Infrastructure
Service Level Guarantees - Understanding RPO and RTO
Scale Up Vs Scale Out
Training Users - Power of Self Service
Management and Reporting
Summary
About Druva
Data Sources and References

Introduction

With the volume of enterprise data doubling every eighteen months, businesses are seeking new ways to tackle their data protection challenges. While data growth is not new, the pace of growth has become more rapid, the location of data more dispersed, and the linkage between data sets more complex. Backing up all that data is clearly a major and growing concern for enterprises.

Only 32% of enterprises report that they are able to protect their critical remote corporate data successfully. The reason: while PC backup initially appears similar to server backup, it has completely different goals and service requirements. With users' data scattered all across the organization, meeting these requirements calls for a different approach.

With over 80% of corporate data duplicated across PC users, data deduplication offers companies the opportunity to dramatically reduce the amount of storage and bandwidth required for backups. This whitepaper looks at the contemporary technologies available and recommends best practices for corporate PC backup.

2 Druva Software
Understanding Who Needs Data Backup in Your Organization

The best starting point for your corporate PC data backup strategy is to identify the users who are most in need of consistent data backup. Part of that is determining where the need is greatest, which for many companies translates into where the risk of lost data is greatest. Consider the following cases:

1. Critical user data which is not saved locally is lost
2. Critical data is modified unintentionally and an older version is needed
3. A remote or traveling user loses a laptop
4. A remote or traveling user is unable to find critical data on the laptop
5. A central copy of the data is maintained, but a self-service data restore could save a lot of time

Clearly, some departments easily qualify for the use cases mentioned above:

1. Marketing and sales staff working remotely
2. Accounts department working with critical data
3. Business-critical users working on sensitive data
4. CXOs with business-critical information
5. Consultants and remote workers

Some departments, like the software engineering team, could skip the backup, as most of their work is already maintained in a centrally managed revision control system.

Deciding What Needs to be Backed Up

The simplest approach would be to back up everything, but this may not be practical considering the average size of data on corporate PCs:

1. Operating system, applications and settings: 25 GB
2. Critical corporate user data: 10 GB
3. Browser cache and temporary files: 2 GB
4. Personal images and videos: 4 GB
5. Daily change in data: 2%

Simple math shows that the storage needed for just the initial backup of critical corporate data for 100 users comes to about 1 TB (100 users x 10 GB each). With daily changes in data, a simple approach of weekly full and daily incremental backups could consume about 5 TB. This clearly indicates that the data to back up must be chosen selectively.
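The arithmetic behind these figures can be sketched as follows. This is only an illustration under stated assumptions: the whitepaper does not say how the 5 TB figure was derived, so the four-week retention period, the six incrementals per week, and the reading of the 2% daily change as applying to the critical data are all assumptions made here.

```python
# Rough storage estimate for "weekly full + daily incremental" backups.
# Figures taken from the text: 100 users, 10 GB critical data each, 2% daily change.
# ASSUMPTIONS (not from the whitepaper): 4 weeks of retention, 6 incrementals
# per week, and the 2% daily change applied to the critical data only.

USERS = 100
CRITICAL_DATA_GB = 10
DAILY_CHANGE = 0.02
RETENTION_WEEKS = 4          # assumption
INCREMENTALS_PER_WEEK = 6    # assumption: one full plus six incrementals


def initial_backup_gb(users: int, data_gb: float) -> float:
    """Storage needed for one full backup of every user's critical data."""
    return users * data_gb


def retained_storage_gb(users: int, data_gb: float, daily_change: float,
                        weeks: int, incrementals_per_week: int) -> float:
    """Storage held when every weekly cycle within the retention window is kept."""
    full = initial_backup_gb(users, data_gb)
    incremental = full * daily_change
    return weeks * (full + incrementals_per_week * incremental)


if __name__ == "__main__":
    print(initial_backup_gb(USERS, CRITICAL_DATA_GB), "GB for the initial backup")
    print(retained_storage_gb(USERS, CRITICAL_DATA_GB, DAILY_CHANGE,
                              RETENTION_WEEKS, INCREMENTALS_PER_WEEK),
          "GB retained")
```

Under these assumptions the initial backup is 1,000 GB and the retained total is 4,480 GB, which is in line with the "about 5 TB" quoted above. A longer retention period or a larger data set per user raises the total proportionally.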
Blanket selections like the user's home directory should be avoided, as the directory may contain a large number of unwanted files, e.g. browser cache and temporary files.

Popular file formats which should be included:
o Microsoft Outlook
o Microsoft Office documents
o PDFs, HTML, text files

File formats which should be excluded from backup:
o Image files
o Video and audio files
o Executables and DLLs (the most common source of malware and viruses)
o Archives
o Temporary and cache files

Folders which should be filtered from backup:
o Browser cache and temporary files
o System files and folders

PC Imaging and Bare-Metal Restore

Backing up the complete PC, including the operating system, applications and settings, is called PC imaging. Restoring everything to a PC from scratch is known as Bare-Metal Restore (BMR). BMR makes the IT administrator's job a lot simpler: instead of reinstalling the operating system and applications on a faulty PC, the administrator can simply restore them from the backed-up data. But considering the size of modern operating systems, this may create a huge amount of data on the backup server, about 30 GB for each PC. Given these scalability issues, it's recommended to offer BMR to only a limited set of users. Imaging may also be unsuitable for remote users, as any automatic operating system update on the PC may trigger a large backup which may take a long time to complete.
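For users who are not imaged, the include and exclude rules listed under "Deciding What Needs to be Backed Up" could be expressed as a small path filter. The sketch below is illustrative only: the specific extensions and folder names are assumptions chosen to match the categories in the lists, and `should_back_up` is a hypothetical helper, not part of any particular backup product.

```python
from pathlib import PureWindowsPath

# ASSUMPTION: representative extensions for each category named in the text.
INCLUDE_EXTENSIONS = {
    ".pst", ".ost",                                     # Microsoft Outlook data files
    ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",  # Microsoft Office documents
    ".pdf", ".html", ".htm", ".txt",                    # PDFs, HTML, text files
}
EXCLUDE_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".gif", ".bmp",            # image files
    ".mp3", ".wav", ".mp4", ".avi", ".mov",             # audio and video files
    ".exe", ".dll",                                     # executables and DLLs
    ".zip", ".rar", ".7z",                              # archives
    ".tmp",                                             # temporary files
}
# ASSUMPTION: folder names treated as cache, temporary or system locations.
EXCLUDED_FOLDERS = {"windows", "program files", "temp", "cache",
                    "temporary internet files"}


def should_back_up(path: str) -> bool:
    """Return True if the file passes the folder and extension filters."""
    p = PureWindowsPath(path)
    # Any excluded folder anywhere above the file filters it out.
    folders = {part.lower() for part in p.parts[:-1]}
    if folders & EXCLUDED_FOLDERS:
        return False
    extension = p.suffix.lower()
    if extension in EXCLUDE_EXTENSIONS:
        return False
    # Only explicitly included formats are backed up (no blanket selection).
    return extension in INCLUDE_EXTENSIONS


if __name__ == "__main__":
    for candidate in (r"C:\Users\amy\Documents\plan.docx",
                      r"C:\Users\amy\Pictures\photo.jpg",
                      r"C:\Users\amy\AppData\Local\Temp\draft.docx"):
        print(candidate, should_back_up(candidate))
```

The filter uses an allow list and not only a block list, so an unknown file type is skipped by default. This matches the advice against blanket selections, at the cost of having to maintain the include list.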
Deciding Backup Policies and Resources

It is highly recommended that you set up the backup policies centrally for all users. Since users have different backup needs, it is recommended to group them into smaller sets based on the following needs:

1. Network availability: Users working locally vs. remotely
2. Bandwidth access: What percentage of bandwidth should be dedicated to backup
3. Storage quota: Power users vs. normal users
4. Backup frequency: How often to back up
5. Retention policy: How many data revisions to maintain
6. Special privileges: Should the user be allowed to add his/her own folders to the backup

Servicing remote users means that, just like the email and web servers, the backup server should be made available to remote users over VPN or through a firewall. It's highly recommended to choose backup software which supports client-triggered backup (rather than the more common server-triggered backup). Client-triggered backup ensures that the backup server only accepts and services incoming backup and restore requests, and can work with the same set of network and security policies that apply to the email and web servers.

Bandwidth is the most precious resource, especially for the remote user. The bandwidth allocation should be chosen carefully so that backups don't disrupt the user's work and yet have enough bandwidth to complete within the designated time.

Storage quota and retention policy greatly affect the central storage available for backed-up data. These should be chosen carefully to avoid excessive use of the central storage.

Data Deduplication - Saving Bandwidth and Storage

Bandwidth and storage are the two biggest concerns for corporate PC backup. But if you look closely, over 80% of data is duplicated across corporate users [3]. To avoid duplicating data on the backup server, it's highly recommended to use backup software which supports source-based data deduplication.
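The idea of source-based deduplication can be illustrated with a minimal sketch. It assumes whole-file deduplication keyed by a SHA-256 content hash, which is one common approach and not necessarily what any given product does; `BackupServer` is a hypothetical in-memory stand-in for a real server.

```python
import hashlib


class BackupServer:
    """Hypothetical in-memory stand-in for a deduplicating backup server."""

    def __init__(self) -> None:
        self.blobs = {}    # content hash -> file bytes, stored once
        self.catalog = {}  # (user, path) -> content hash

    def has_blob(self, digest: str) -> bool:
        return digest in self.blobs

    def store_blob(self, digest: str, data: bytes) -> None:
        self.blobs[digest] = data

    def record(self, user: str, path: str, digest: str) -> None:
        self.catalog[(user, path)] = digest


def back_up_file(server: BackupServer, user: str, path: str, data: bytes) -> int:
    """Back up one file and return the number of content bytes actually sent.

    The hash is computed on the client (the source), so duplicate content
    is detected before it consumes any upload bandwidth.
    """
    digest = hashlib.sha256(data).hexdigest()
    sent = 0
    if not server.has_blob(digest):
        server.store_blob(digest, data)
        sent = len(data)
    # The catalog entry is always recorded so the user can restore the file.
    server.record(user, path, digest)
    return sent


if __name__ == "__main__":
    server = BackupServer()
    deck = b"quarterly sales deck" * 1000
    print(back_up_file(server, "alice", "C:/docs/deck.pptx", deck))  # first copy is uploaded
    print(back_up_file(server, "bob", "D:/work/deck.pptx", deck))    # duplicate sends no content
```

Here the second user's copy costs only a hash lookup and a catalog entry. Real products typically deduplicate at a finer granularity than whole files, which this sketch does not attempt.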
Data deduplication ensures that files duplicated across users are backed up only once, saving over 90% of backup time, bandwidth and storage. Since a large majority of the data on PCs is in the form of emails, the deduplication technology should be smart enough to find email-level duplicates.

WAN Optimization

Users working remotely face another challenge: the high latency of VPN or WAN networks. To use the available bandwidth effectively, it is recommended to choose backup software or network infrastructure capable of WAN optimization. WAN optimization tunes packet size, removes redundancy in the protocol and maintains multiple network connections to overcome network latency. Since the backup software is aware of the type of data, it is highly recommended that WAN optimization is provided as part of the backup software.
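Why multiple connections help on a high-latency link can be shown with a short calculation. A single connection can have at most one window of unacknowledged data in flight per round trip, so its throughput is bounded by window size divided by round-trip time. The sketch below applies that bound; the 64 KiB window, 250 ms round trip and 10 Mbps link are illustrative assumptions, not measurements from the whitepaper.

```python
import math


def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on one connection's throughput: one window per round trip."""
    return window_bytes * 8 / rtt_seconds


def connections_to_fill(link_bps: float, window_bytes: int, rtt_seconds: float) -> int:
    """Number of parallel connections needed to saturate the link."""
    per_connection = max_throughput_bps(window_bytes, rtt_seconds)
    return max(1, math.ceil(link_bps / per_connection))


if __name__ == "__main__":
    window = 64 * 1024     # ASSUMPTION: 64 KiB window per connection
    rtt = 0.25             # ASSUMPTION: 250 ms round trip over the WAN or VPN
    link = 10_000_000      # ASSUMPTION: 10 Mbps link
    print(max_throughput_bps(window, rtt))
    print(connections_to_fill(link, window, rtt))
```

With these assumed values a single connection tops out at about 2.1 Mbps, so five parallel connections are needed to fill the 10 Mbps link. The bound ignores packet loss and protocol overhead, both of which lower real throughput further.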
Your Private Cloud - Planning Backup Infrastructure

PC backup is very different from server backup. In server backup, the data usually originates from a few servers, usually in the same location. PCs, in contrast, may be far larger in number, and their data is usually scattered across multiple offices or even travelling with the user. At any given point in time, the PC backup server could be serving several parallel backup and restore requests.

Building the backup infrastructure requires knowledge of the following important parameters:

1. Number of parallel backups
2. Service level guarantees
3. Total number of users
4. Data retention policy
5. Number of remote users

The biggest bottleneck for a PC backup setup is the network. A simple 10/100 Mbps Ethernet (LAN) network can only support 10-11 MBps (megabytes per second) of traffic, which means at most 50 parallel backups at roughly 200 KBps each. To plan a larger setup, it's recommended to consider multiple networks and NICs attached to the backup server.

The total number of users and the retention policy decide the size of the storage needed to manage the backed-up data. Using deduplication-aware backup software could greatly reduce the storage size. With over 80% of data being duplicated across users, deduplication could reduce the size of the storage by over 90%.

Service Level Guarantees - Understanding RPO and RTO

Recovery Point Objective (RPO) describes the acceptable amount of data loss measured in time, i.e. up to what point you can recover the data. Recovery Time Objective (RTO) describes the acceptable time taken to recover it. Both parameters form the foundation of any backup implementation.

For example, a 60-minute RPO means that every PC must be backed up at least once every hour, and a 15-minute RTO means that it should take no longer than 15 minutes to get the last copy back.

The service deployment, solution availability and disaster recovery plans should be designed to meet these goals.

A few important points to consider for a lower RPO:

1. Back up small, back up often
2. Provide backup to remote workers over WAN/VPN
3. The backup software should support near-CDP (continuous data protection), i.e. it shouldn't scan everything and take a full backup every time
4. Make sure a backup is scheduled immediately when the user connects to a network or tries to shut down the computer

A few important points to consider for a lower RTO:

1. Fast self-service online data restore
2. A search-based restore interface is a must
3. The traditional approach of incremental/differential backup is a major deal breaker
4. Service high availability is a must
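The RPO and RTO definitions above translate directly into checks that a monitoring script could run. The sketch below uses the 60-minute RPO and 15-minute RTO from the example; the function names, the restore sizes and the 5 MB/s restore throughput are hypothetical, and the restore estimate deliberately ignores search, queuing and protocol overhead.

```python
from datetime import datetime, timedelta

RPO = timedelta(minutes=60)   # from the example in the text
RTO = timedelta(minutes=15)   # from the example in the text


def meets_rpo(last_backup: datetime, now: datetime, rpo: timedelta = RPO) -> bool:
    """True if the newest backup is recent enough to satisfy the RPO."""
    return now - last_backup <= rpo


def estimated_restore_time(restore_bytes: int, bytes_per_second: float) -> timedelta:
    """Transfer-only estimate of how long a restore takes."""
    return timedelta(seconds=restore_bytes / bytes_per_second)


def meets_rto(restore_bytes: int, bytes_per_second: float, rto: timedelta = RTO) -> bool:
    """True if the estimated restore finishes within the RTO."""
    return estimated_restore_time(restore_bytes, bytes_per_second) <= rto


if __name__ == "__main__":
    now = datetime(2010, 10, 1, 12, 0)
    print(meets_rpo(datetime(2010, 10, 1, 11, 15), now))  # backed up 45 minutes ago
    print(meets_rpo(datetime(2010, 10, 1, 10, 30), now))  # backed up 90 minutes ago
    # ASSUMPTION: 5 MB/s sustained restore throughput.
    print(meets_rto(2 * 10**9, 5 * 10**6))    # 2 GB restore
    print(meets_rto(10 * 10**9, 5 * 10**6))   # 10 GB restore
```

The 45-minute-old backup satisfies the RPO and the 90-minute-old one does not. At the assumed throughput a 2 GB restore takes about 400 seconds and fits the 15-minute RTO, while a 10 GB restore takes about 2,000 seconds and misses it, which is one reason small, selective restores matter for a low RTO.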
Scale Up Vs Scale Out

The scale-up architecture uses fewer heavy-duty servers capable of handling the entire load. The scale-out architecture uses a number of small servers, each of which handles some of the load.

[Figure: Scale-Up Vs Scale-Out Architecture]

The scale-out architecture definitely requires more management time, but since the biggest bottleneck for PC backup is the network, it scales much better and is highly recommended. From the service availability point of view too, a scale-out deployment does not lose the entire service if one of the servers goes down.
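The availability argument can be made concrete with a small sketch of how users might be spread across scale-out servers. The whitepaper does not describe an assignment scheme, so the hash-modulo placement and the server names below are purely illustrative assumptions.

```python
import hashlib
from typing import List, Sequence


def assign_server(user_id: str, servers: Sequence[str]) -> str:
    """Pick a server for a user from a stable hash of the user id."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def users_affected_by_outage(user_ids: Sequence[str], servers: Sequence[str],
                             failed_server: str) -> List[str]:
    """Users whose backups are unavailable while one server is down."""
    return [u for u in user_ids if assign_server(u, servers) == failed_server]


if __name__ == "__main__":
    servers = ["backup-1", "backup-2", "backup-3", "backup-4"]  # hypothetical names
    users = [f"user{n:03d}" for n in range(200)]
    affected = users_affected_by_outage(users, servers, "backup-2")
    print(f"{len(affected)} of {len(users)} users affected by losing backup-2")
```

Only the users mapped to the failed server are affected, whereas a single scale-up server would take every user down with it. A plain modulo scheme reshuffles most users whenever the number of servers changes, so a real deployment would need a placement method that copes with adding and removing servers.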
Training Users - Power of Self Service

Servicing users, especially those working remotely, could be a major challenge if they don't understand or are not comfortable using the backup software. In fact, statistics show that remote users demand 240% more administration time than local users [10].

Before deciding upon the backup software, it's highly recommended to pilot the software with a variety of users to test its acceptance. Ease of use and self-service could really aid the adoption of the solution and greatly reduce the administration time.

Some of the highly recommended features for faster adoption are:

1. Completely automated deployment with remote push of the backup policy
2. A simple 5-step guide to start the first backup and restore data
3. Simple search-based self-service restore
4. Automated troubleshooting for simple issues like network errors, VSS failures, etc.
Management and Reporting

For larger organizations, it's recommended to choose software which supports role-based administration. This way the backup administrator can delegate some of the responsibilities for each user group.

Good reporting features ensure the reliability of the backup process and help the administrator manage large enterprise installations. Readily available information, such as live server statistics, instant alert notifications and scheduled comprehensive reports, makes administration interactive and allows the administrator to troubleshoot, manage configuration and fine-tune the system better.

It is highly recommended to configure the backup software for the following reports:

1. Failed backups: daily report
2. Restore failures: instant alert
3. Service availability: instant alerts
4. Status of last backup: weekly report
5. Storage utilization: weekly report
6. Bandwidth utilization: weekly report
7. Configuration changes: weekly report

Summary

This document presents the best practices an enterprise can adopt while deploying a backup solution for its critical data. Given the diversity of data, the rapidly increasing volume of critical enterprise data and the large number of remote users, enterprises need to be smart about identifying which users need backup and which data is critical enough to be backed up. Once this is done, they need a backup solution that saves storage and bandwidth, offers scalability, is simple to install and use, and allows hands-on, interactive administration.
About Druva

Druva provides premium enterprise-class solutions for data protection and disaster recovery. Our products, powered by our patented Continuous Data Protection and data deduplication technologies, are changing the way enterprises manage and protect their data. For more information, please refer to the website at www.druva.com.

Data Sources and References

1. Gartner Report #160375 - Options for PC Data Backup
2. Gartner Report #616611 - Storage Management Software Usage Driven by Replication, Deduplication and Virtualization (March 2008)
3. Microsoft Report - SIS and its effects at Microsoft
4. Computer Data Trends - Matt's computer trends - http://www.mattscomputertrends.com/
5. Hardware Trends - Tom's hardware trends - http://www.tomshardware.com/
6. Tom's 15 Years of Disk Performance (here)
7. Matt's hard disk data trends (here)
8. PC World / Ponemon Institute - Laptops lost like hot cakes (see here)
9. Detailed insync Benchmarks - Druva insync Benchmark Results
10. Biztalk - Data Leakage Prevention, Emerging Trends (here)