VM backup is the new standard fare
The right answer to backing up and restoring hundreds of VMs
Andreas Neufert, System Engineer
Next generation data recovery for large companies

Contents
Introduction
Starting situation as a central theme of this document
Considerations for goal definition
Project planning options
Basic backup and replication technologies
Differences between backup and replication
  Replication
  Backup
Basic recovery technologies (replication)
Basic recovery technologies (backup)
Virtual lab on demand test environment
Separation of production and backup networks
Incorporation into private or public cloud environments
Secure recovery by third parties
Central administration and scalability
Command Line Interface
Data centre mirroring (2nd data centre or on demand cloud)
External site backup
External site replication
Automation example
Summary
Introduction
Now more than ever, IT departments are under pressure to deliver high availability while reducing costs. Proven concepts, implemented in innovative ways, are needed to ease the strain on both staff and budgets. This white paper describes disaster and data recovery solutions for VMware and Hyper-V environments and their savings potential, particularly for large companies with more than 500 virtual servers. In doing so, it takes into account the specific scalability and network security requirements of such environments. All of the processes presented here are designed with a view to best performance and maximum automation, and they represent only a small part of the possibilities and range of functions offered by Veeam products. The document is aimed at technical decision makers with background knowledge of VMware infrastructure and backup systems, and provides technical information and processes that go beyond marketing information.
Starting situation as a central theme of this document
In this document, we assume the following starting situation with regard to your infrastructure:
- One main data centre containing the majority of the server systems
- One backup data centre with emergency systems. This data centre may be set up as a cold-standby cloud, or may itself contain productive systems which are mirrored in the main data centre
- Several external sites with substantial numbers of servers and data volumes
- WAN IP connections between sites
- A mixture of physical hardware systems and VMware virtualization
- Existing backup using an agent-based backup process (e.g. IBM TSM)
- Requirement: strict separation of productive networks and backup/administrative networks to increase security

Considerations for goal definition
- Increase the availability of all virtual systems
- Restart every virtual machine within ~5 minutes in the event of a restore, regardless of the volume of data present
- Automatic restore checks of each backup to guarantee almost 100% recoverability
- Two-site processes for critical systems and external sites
- Separation of production network areas from the backup/administrative network
- Independent, department-level restore (after authorisation by the backup admin)
- Incorporation into the existing hardware backup process
- Cost reduction
- A high degree of automation
Project planning options
- Automatic documentation of the entire virtual environment with the Veeam ONE reporting function:
  - Automatic documentation of all settings (Word/PDF/Excel)
Figure 1: Automatically created report of all VMware settings
  - Automatic logging of all capacity and performance figures
  - Automatic creation of layout drawings (Visio)
Figure 2: Automatically created Visio drawing with extended shape data
- Dependency checks in the recovery lab environment (which server/service depends on which server?)
- Logging of the daily change rate (at block level) for:
  - External site replication
  - Replication in the backup data centre
  - Backup (incl. logging of deduplication and compression rates)
Figure 3: Logging of change, deduplication and compression rates
- Logging of the backup space required
- Logging of the resources needed for fast restore and automatic restore checks
- Compilation of documentation and planning of the installation and backup jobs

Basic backup and replication technologies
Figure 4: Backup process overview
- Agentless, application- and operating system-consistent backup of every virtual machine, regardless of the file system or operating system installed on the virtual server
  - No resource load from agents
  - No effort for agent updates
  - Budget savings, as application backups require no additional licenses
  - Use of Microsoft VSS technology, established on the market for 10 years, or pre-/post-scripts in Linux
  - Appropriate handling of database log files
- Data export using changed block tracking (in Version 6, changed block tracking is available not only for VMware but also for Microsoft Hyper-V)
  - After the initial full backup, only the data blocks changed on the storage system (VMDK datastores/vRDM RAW LUNs) are backed up
  - Synthetic full and reverse incremental technology removes the need to export further full backups. This substantially reduces the load on VMware host disks and system resources.

Data can be backed up directly from the SAN (comparable with LAN-free backup). This is recommended for SAN disk systems and is also the fastest method of backing up virtual servers. For backups from local disks or NFS-based datastores, VM snapshots are mapped directly to the backup server (similar to VMware HotAdd) and backed up from there. A third method is backup over the LAN, which also serves as the fallback for the other two methods.
Figure 5: Backup methods
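The changed block tracking and reverse incremental ideas above can be illustrated with a small toy model. It is a sketch under simplifying assumptions: a virtual disk is modelled as a Python list of blocks, the set of changed block indices is supplied as if reported by CBT, and the class name ReverseIncrementalChain is hypothetical. It is not Veeam's backup file format.

```python
"""Toy model of changed block tracking (CBT) with a reverse incremental chain.

Assumptions: a virtual disk is modelled as a list of equally sized blocks, and
the set of changed block indices is supplied as if reported by CBT. This is an
illustration of the idea only, not Veeam's backup file format.
"""
from typing import Dict, List, Set


class ReverseIncrementalChain:
    """Hypothetical helper: the newest restore point is always a full image."""

    def __init__(self, disk: List[bytes]) -> None:
        # Initial full backup: a complete copy of every block.
        self.full: List[bytes] = list(disk)
        # Reverse increments, newest first. Each maps a block index to the
        # content that was overwritten in the full image by the next backup.
        self.reverse_increments: List[Dict[int, bytes]] = []

    def backup(self, disk: List[bytes], changed: Set[int]) -> int:
        """Read only the changed blocks; return how many blocks were read."""
        undo: Dict[int, bytes] = {}
        for index in changed:
            undo[index] = self.full[index]  # keep the previous version
            self.full[index] = disk[index]  # full image now holds the latest state
        self.reverse_increments.insert(0, undo)
        return len(changed)

    def restore(self, points_back: int = 0) -> List[bytes]:
        """Rebuild the disk as it was `points_back` backups before the latest."""
        image = list(self.full)
        for undo in self.reverse_increments[:points_back]:
            for index, old_block in undo.items():
                image[index] = old_block
        return image
```

Because the newest restore point is always the full image, restoring the latest state needs no chain traversal, and no further full export is ever required; older points are rebuilt by applying the reverse increments newest first.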
Backup storage requirements are reduced through integrated deduplication and compression.
For 20 years, backup software vendors have urged their customers to perform recovery checks regularly. Veeam automates this process with SureBackup technology and checks every backup in a sandbox environment (virtual lab). Check jobs can be run centrally or distributed across several VMware ESX(i) hosts, and both dedicated hosts and hosts that also run productive servers can be used. This saves the budget otherwise spent on quarterly or half-yearly manual recovery tests, including the necessary hardware and reserved disk storage space.
Figure 6: SureBackup automatic recovery check after every backup
During replication, several recovery points are retained, so in the event of an error you are not limited to the state of the last replication run. This gives you a longer recovery window if an error on a system is discovered late and has already been transferred to the replication site.
Figure 7: Replication options
- Application consistency is also taken into account
- Replication supports both ESX and ESXi systems, including replication between them
- Thick disks can be replicated onto thin disks to save space
- Storage systems from different manufacturers can be used

Veeam uses IP connections for replication. This offers high savings potential, typically six figures, as there is no need for dedicated SAN or dark fibre cables, SAN/IP routers, or storage replication and mirroring licenses. It is also possible to replicate between different VMware clusters and vCenter Servers, each with storage from a different vendor. The initial replication can be seeded from a removable storage medium, so that the initial volume of data transferred over the network is kept as small as possible. With both backup and replication, the data is deduplicated and compressed before it is transferred over the network or the WAN connection.
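To make the deduplication and compression step concrete, the following minimal sketch fingerprints fixed-size blocks, sends only blocks the target has not seen before, and compresses them with zlib. The 4 KiB block size, SHA-256 fingerprints, zlib and the function names are assumptions chosen for illustration; the actual block size and algorithms used by Veeam are not described in this document.

```python
"""Sketch of block-level deduplication followed by compression before transfer.

Assumptions: fixed 4 KiB blocks, SHA-256 fingerprints and zlib compression are
illustrative choices, not a description of Veeam's internal implementation.
"""
import hashlib
import zlib
from typing import Dict, List, Tuple

BLOCK_SIZE = 4096  # hypothetical block size


def dedupe_and_compress(data: bytes, known: Dict[str, bytes]) -> Tuple[List[str], int]:
    """Split data into blocks and send only unseen blocks, compressed.

    `known` stands in for the block store already present at the target.
    Returns the ordered block fingerprints (enough to rebuild the data at the
    target) and the number of compressed bytes that had to be sent.
    """
    recipe: List[str] = []
    sent = 0
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in known:
            payload = zlib.compress(block)
            sent += len(payload)
            known[digest] = payload  # target stores the compressed block
        recipe.append(digest)
    return recipe, sent


def rebuild(recipe: List[str], known: Dict[str, bytes]) -> bytes:
    """Reassemble the original data at the target from stored blocks."""
    return b"".join(zlib.decompress(known[digest]) for digest in recipe)
```

A second transfer of unchanged data sends nothing, because every fingerprint is already known at the target; this is the effect that keeps WAN traffic small for both backup and replication.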
Differences between backup and replication
Veeam uses the same technology to export data from VMware systems for both backup and replication (see above), and data transfer to the target site works the same way. The differences are as follows:

Replication
- The server is created in VMware, directly on a VMware datastore at the target site, so that it can start immediately
- Regardless of the source disk, the replica disk can be provisioned as a thick disk (100% space allocation) or as a thin disk (space allocated as needed) to save space
- Additional space is reserved for further recovery points, which are stored as increments only to save space
- In the event of an error, the server can be started immediately from any of the replicated states and runs with full performance

Backup
- Data is deduplicated and compressed before being stored in individual backup files to save space
- Target media: local hard drives, iSCSI or SAN disk systems, NAS/file servers with NFS or CIFS shares
- A whole server can be started from backup in ~5 minutes using Instant Restore, with reduced performance for a certain length of time
- Use of SureBackup (automatic recovery checks)
Figure 8: Recovery options (Backup & Replication Version 5)
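The value of keeping several replica recovery points can be shown with a small sketch: trim the points to a retention count, then pick the newest point taken before the error occurred. It assumes, hypothetically, that each point is identified by its snapshot time and that the time of the error is known (for example from logs); the function names are illustrative.

```python
"""Pick a replica restore point that predates an error.

Hypothetical model: each restore point is identified by the time its snapshot
was taken, and the time the error first occurred is known (e.g. from logs).
"""
from bisect import bisect_left
from datetime import datetime, timedelta
from typing import List, Optional


def retain(points: List[datetime], max_points: int) -> List[datetime]:
    """Keep only the newest `max_points` restore points; older ones are dropped."""
    if max_points <= 0:
        return []
    return sorted(points)[-max_points:]


def last_clean_point(points: List[datetime], error_time: datetime) -> Optional[datetime]:
    """Return the newest restore point taken strictly before the error, if any."""
    ordered = sorted(points)
    index = bisect_left(ordered, error_time)  # first point at or after the error
    return ordered[index - 1] if index > 0 else None


if __name__ == "__main__":
    start = datetime(2011, 10, 3)
    hourly = [start + timedelta(hours=h) for h in range(48)]  # 48 hourly runs
    kept = retain(hourly, 14)                                 # 14 points retained
    error = start + timedelta(hours=40, minutes=30)           # error at 40:30
    print(last_clean_point(kept, error))                      # point taken at hour 40
```

With only one retained point, an error discovered after the next replication run would already be present in the replica; with 14 hourly points, any error up to roughly 14 hours old can still be rolled back.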
Basic recovery technologies (replication)
- The replica is started in the event of complete failure of the source system
- Any replicated state can be started from the Veeam console (replica boot)
- Recovery points can also be started via vCenter (Version 5: latest state only; Version 6: all saved recovery points)
- Automatic adjustment of network settings and IP addresses (Windows) for the replicated server on start-up
- In Version 6, if the first site fails completely, it can be rebuilt efficiently and with reduced bandwidth usage. For this purpose, a backup of the VMs is restored at the first site and linked to the replication job. The job then transfers back over the network only the delta between the current state and the state restored from the backup.
- Recovery of a file from a replicated state: the selected state is mounted on the backup server, and individual files, whole folders or VMware system files can be recovered

Basic recovery technologies (backup)
Figure 9: One backup file, several recovery options
During backup, the current state is saved only once, and all of the recovery processes described here can be used with this single backup. There is therefore no need to create additional backup files or jobs to recover applications, which saves administrative effort and the cost of additional disk space.
- Wizard-based recovery of individual files (Windows/Linux/Unix)
- One-click recovery of individual files from a web search
Figure 10: One-click recovery via the central Enterprise Manager web interface
Instant Restore
- Restart of virtual servers from backup in ~5 minutes, regardless of the volume of data to be recovered
- Several servers can be started at the same time via Instant Restore
- For a restart from backup, the primary storage system does NOT need to be available
- Data is transferred in the background once the primary storage system is available again (VMware Storage vMotion or Veeam replication)
Instant Restore reduces production downtime during a recovery to a few minutes.
Figure 11: Instant Restore
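The reason a server can start from backup without first copying its data is a copy-on-write overlay: reads come from the read-only backup, writes go to a separate delta. The sketch below is a simplified model of that idea, under the assumption of a block-addressed disk; the class name OverlayDisk is hypothetical and this is not Veeam's vPower implementation.

```python
"""Copy-on-write overlay: run a VM disk from a read-only backup.

Simplified model (assumption) of the idea behind Instant Restore and the
virtual lab, not Veeam's vPower implementation. Reads fall through to the
backup unless the block was written since start-up; writes go to a separate
delta, so the backup data itself is never modified.
"""
from typing import Dict, List


class OverlayDisk:
    def __init__(self, backup_blocks: List[bytes]) -> None:
        self._backup = tuple(backup_blocks)  # read-only source data set
        self.delta: Dict[int, bytes] = {}    # blocks written by the running VM

    def read(self, index: int) -> bytes:
        # Prefer the VM's own writes; otherwise serve the block from backup.
        return self.delta.get(index, self._backup[index])

    def write(self, index: int, data: bytes) -> None:
        self.delta[index] = data

    def migrate(self) -> List[bytes]:
        """Produce the merged disk image for copying to primary storage."""
        return [self.read(i) for i in range(len(self._backup))]
```

Nothing is copied up front, which is why start-up time does not depend on the data volume; until the background migration finishes, every unwritten block is read from backup storage, which explains the temporarily reduced performance.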
Universal object recovery (U-AIR)
- Server backups are started in an isolated environment
- Both individual files and application objects from any operating system and any application can be restored from the started server:
  - Use of operating system-specific copy tools (e.g. a directory comparison between the live system and the system started from backup with Microsoft Robocopy)
  - Use of standard application and database management tools to recover individual objects, so that application administrators can use the tools they are accustomed to
Figure 12: U-AIR example: recovery of an SQL table via a standard Oracle administration tool
  - Use of Veeam U-AIR wizards for Active Directory, Exchange and SQL single object restore
Figure 13: U-AIR wizard example: recovery of an Active Directory user

Incorporation into existing hardware backup systems and data transfer
As well as transferring backup data to another site with the built-in replication function, you can also store it at another site or on offline tapes using your existing hardware backup process. A backup agent of your existing solution is installed on the Veeam Backup & Replication (repository) servers. This agent transfers the Veeam backup files to tape so that they can be stored, for instance, in a safe or bunker. Existing tape encryption methods can continue to be used. Depending on the hardware backup system you use, Veeam jobs are adapted to your system, allowing smooth cooperation with minimal use of media. Veeam backup jobs can be controlled both through Veeam and through your existing hardware backup software, and existing processes for monitoring backup jobs can continue to be used.
The (central) Veeam Backup & Replication server is replicated promptly to a further site to allow immediate recovery in the event of an error. It can additionally be protected by your existing hardware backup. Stored backup files can even be transferred to and imported into a freshly installed Veeam Backup & Replication environment, so that all of the restore processes described above are available within just a few minutes.
Existing processes for backup monitoring, job scheduling and backup storage (safe/bunker) can usually continue to be used without adjustment or with minimal changes. This protects your existing investments.
Virtual lab on demand test environment
Figure 14: Virtual lab
Veeam can start one or several systems in the virtual lab, a 'sandbox' isolated from the production environment. These systems work with the latest backup data set, or another one you select, and require no space on the primary storage system. The backup files are used as the source data set, and any changes are stored in separate files, similar to the well-known snapshot process.
Possible application scenarios are:
- Patch testing
- Patch testing that takes into account the effects on several systems
- Intrusion, penetration, virus attack and security testing without endangering live systems. The test environment is automatically provided with the state of the last backup.
- Data mining analysis jobs without any load on the primary storage system
- Script and group policy tests: testing infrastructure-wide changes in the lab environment with live data sets, but without any risk to the live system
- Testing migration processes (e.g. Exchange 2003 to Exchange 2010, SAP Version X to Version Y)
- Flexible on demand multi-server development environments
Using Veeam Lab Manager, an application administrator or end user can request these test environments for a specific time period through a ticket system, and the Veeam administrator can release them on resources of their choice.
Figure 14: Virtual lab ticket system
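The time-boxed request and approval workflow described above can be modelled as a small data structure. This is a sketch only: LabRequest, approve, is_active and the example names ("lab-01", "DC01") are hypothetical and do not represent a Veeam API.

```python
"""Sketch of a time-boxed virtual lab request with administrator approval.

All names (LabRequest, approve, is_active, all_systems) are hypothetical; they
model the ticket workflow described in the text, not a Veeam API.
"""
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class LabRequest:
    requester: str
    systems: List[str]
    start: datetime
    duration: timedelta
    approved: bool = False
    lab: Optional[str] = None  # chosen by the backup administrator
    extra_systems: List[str] = field(default_factory=list)

    def approve(self, lab: str, extra_systems: List[str]) -> None:
        """Admin approves, picks the lab/resources and adds dependent servers."""
        self.approved = True
        self.lab = lab
        self.extra_systems = list(extra_systems)

    def is_active(self, now: datetime) -> bool:
        """The environment is available only if approved and inside the window."""
        return self.approved and self.start <= now < self.start + self.duration

    def all_systems(self) -> List[str]:
        return self.systems + self.extra_systems
```

For example, a request for three systems for two hours from Friday at 12 noon stays inactive until the administrator approves it, then is available only inside that window, together with any supporting servers the administrator added.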
For example, the SAP administrator requests SAP systems 1, 2 and 3 for testing on Friday for two hours starting at 12 noon. The Veeam administrator approves this request, adds any further servers needed for operation, and decides in which virtual lab the systems will be started and with which resources. On Friday at 12 noon, this environment is then automatically made available to the SAP administrator. The SAP administrator only has access to the systems started in the virtual lab, not to the Veeam administration itself or to the backup files.
The potential cost savings of on demand virtual lab test environments compared with traditional test, consolidation and integration systems are considerable, and at the same time the quality and precision of test runs and their results improve.

Separation of production and backup networks
Backup data streams are generally routed through a separate network or VLAN. However, this means that every productive server has access to the backup network, so an attacker who gains access to a server also gains access to the backup network. With Veeam Backup & Replication, it is possible for the first time to separate the production network completely from the backup network.
Figure 15: Complete separation of the productive network from the backup and administrative network
This is made possible by the following Veeam technologies:
- Agentless, application- and operating system-consistent backup: Veeam Backup & Replication communicates with the operating system via VMware interfaces and ensures consistency
- Data exported directly from VMware or the SAN at block level
- Recovery of whole servers via Instant Restore
- Recovery of files and application objects via the virtual lab, using the U-AIR technology described above. If, for example, a database object needs to be restored from the backup, the server concerned is first additionally started in the virtual lab from the backup.
  Application objects (or files) can then be transferred to the productive server from the server started in the virtual lab. The backup network and the backup files remain protected against access from the virtual lab or the productive network.
- One-click recovery of files via the web interface
- Recovery of files via volumes mapped directly from the backup to the productive system. For example, the C:\ drive from the backup is additionally mapped as the G:\ drive with an older state, and files can then be copied back to C:\.

Incorporation into private or public cloud environments
Veeam Backup & Replication can be fully controlled by script, and can therefore be incorporated into any cloud management system. When servers are provisioned in the cloud, backup or replication can be set up automatically according to the selected service level. In the event of an error, a virtual server can be restored within approx. 5 minutes, controlled from the cloud management system. The next section describes further recovery processes that the respective server administrators can carry out directly, without direct access to backups or the backup server.
Figure 16: Incorporation into cloud management systems

Secure recovery by third parties
Files and application objects can also be recovered by an end user or application administrator via the virtual lab ticket system. They receive NO access to the backup network, backup jobs or backup files. They can also use the one-click file restore website to restore individual files quickly.
The virtual lab ticket system frees your infrastructure department from the time-consuming installation and configuration of test system environments. Together with the U-AIR wizards, it allows responsibility for recovery to be passed to the application administrators trained for each application. Both free your backup administrator from time-consuming application recoveries.
The following recovery options can be requested by the application administrators responsible for the servers:
- Recovery of files and folders via a web interface
- Recovery of one or more servers via, for example, a cloud management system
- Wizard-based recovery of Active Directory objects (e.g. users), emails (Exchange) and Microsoft SQL Server contents
- U-AIR recovery of application objects, files and folders from any application, operating system and file system

Central administration and scalability
Veeam Backup & Replication can protect thousands of virtual machines. With Version 5, additional instances were installed for this purpose as required (scale-out). These can be managed centrally through Veeam Enterprise Manager, while jobs are set up on the respective backup server.
Figure 17: Scaling of Backup & Replication in large environments
Version 6 refines this approach further:
- Jobs are set up centrally on the management server
- Backup and replication jobs are allocated to and carried out by Veeam Backup & Replication proxy servers; job allocation across the proxies is load-balanced and follows a fully automated best practice process
- For backup, the data backed up by the proxy server (the delta to the last backup) is transferred to repository servers, which store it efficiently
- For replication, the data is transferred to a further proxy server, which writes it to a datastore
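To illustrate what load-balanced job allocation across proxies means, the sketch below assigns each job to the currently least-loaded proxy, largest jobs first. This longest-processing-time-first heuristic is a textbook method swapped in for illustration; it assumes equally fast proxies and known per-job data sizes, and it is not Veeam's actual scheduling algorithm.

```python
"""Greedy allocation of backup jobs to proxy servers.

Assumptions: each job has an estimated amount of data to process, and all
proxies are equally fast. Longest-processing-time-first is a textbook
load-balancing heuristic used here for illustration; it is not Veeam's
actual scheduling algorithm.
"""
import heapq
from typing import Dict, List


def allocate(jobs: Dict[str, float], proxies: List[str]) -> Dict[str, List[str]]:
    """Assign each job to the currently least-loaded proxy, largest jobs first."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    heap = [(0.0, name) for name in proxies]  # (assigned load, proxy name)
    heapq.heapify(heap)
    plan: Dict[str, List[str]] = {name: [] for name in proxies}
    for job, size in sorted(jobs.items(), key=lambda item: item[1], reverse=True):
        load, proxy = heapq.heappop(heap)  # least-loaded proxy so far
        plan[proxy].append(job)
        heapq.heappush(heap, (load + size, proxy))
    return plan
```

Placing the largest jobs first keeps the busiest proxy's total close to the average, which is what shortens the overall backup window.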
Figure 18: Veeam won the "BEST OF vmworld 2011 New Technology GOLD AWARD" for this scale-out and load balancing approach.
Optimum allocation of backup jobs further shortens the backup window, makes optimum use of the backup server systems, and thereby saves hardware (backup servers). The volume of data transferred from a proxy server to another proxy or repository server is reduced to the minimum needed for the backup or replication process: during both backup and replication, deduplicated and compressed data is transferred efficiently over the network or the WAN connection. Compared with Version 5, which performs all of these roles on the same server, this further increases backup and replication performance over WAN or LAN connections (up to 11x faster).

Command Line Interface
Veeam Backup & Replication can be controlled via Windows PowerShell. In addition to job control, this command line interface allows the setup of backup, replication, recovery checks and restore processes to be controlled and, if necessary, integrated into any number of scripts.

Data centre mirroring (2nd data centre or on demand cloud)
Veeam Backup & Replication includes an IP-based replication function in the standard version. Replication is application- and operating system-consistent and runs on an adjustable schedule (monthly, weekly, daily or hourly) up to a mode we call "smart CDP". In smart CDP mode, a new replication run starts as soon as the previous one has completed successfully. For each virtual server, you can specify whether it is replicated and on which schedule. Expensive storage-based mirroring or replication functions and licenses are not required.
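The difference between a fixed schedule and smart CDP can be quantified with a simple worst-case model. It assumes (a simplification, not a Veeam specification) that every run takes the same time, that a restore point reflects the VM state at the start of its run, and that it becomes usable only when the run finishes. A failure just before a run completes then leaves the previous run's point as the newest usable one.

```python
"""Worst-case data loss (RPO) for interval-based versus back-to-back replication.

Simplified model (assumption): every run takes the same time `duration_min`,
a restore point reflects the VM state at the start of its run, and it becomes
usable only when the run finishes. A new run starts every `interval_min`, or,
in smart CDP mode, as soon as the previous run completes.
"""
from typing import Optional


def worst_case_rpo(duration_min: float, interval_min: Optional[float] = None) -> float:
    """Return the worst-case age (minutes) of the newest usable restore point.

    interval_min=None models smart CDP: runs follow each other back to back.
    """
    # A run cannot start before the previous one has finished.
    gap = duration_min if interval_min is None else max(interval_min, duration_min)
    # Failure just before a run completes: the newest usable point is the one
    # taken at the start of the previous run, i.e. one gap plus one run ago.
    return gap + duration_min
```

Under this model, with a hypothetical 10-minute run, an hourly schedule risks up to 70 minutes of changes, while smart CDP reduces the worst case to 20 minutes.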
Figure 7: Replication options
In Version 6, IP and network settings are adjusted automatically to the replication target, and extended options for transferring data back to the main site are available. Only the blocks that have changed are transferred between sites (changed block tracking). The initial replication of the whole data volume can also be done via a removable storage medium, and is therefore independent of the available WAN bandwidth.
Veeam uses IP connections for replication. This offers high savings potential, typically six figures, as there is no need for dedicated SAN cables, SAN/IP routers, or storage replication and mirroring licenses. It is also possible to replicate between different VMware hosts with different disk storage.
Further savings are possible if the data centre is supplemented by the cloud as on demand infrastructure. Here, the virtual servers are replicated via IP VPN connections to a cloud set up specifically for you. As the target servers are only started in an emergency, a cost-effective on demand computing contract can be signed for this, which removes the need to purchase, build and operate a second data centre of your own.

External site backup
External sites can be backed up to local hard drives or disk storage systems, and the backups can also be transferred efficiently to the main site. At smaller sites, the data is backed up incrementally to the main data centre via the WAN IP connection and additionally stored on low-cost local hard drives or mid-range NAS systems for a quick restore. The initial full backup can be done using removable storage media to ease the load on the WAN IP connection.
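A quick calculation shows why seeding the initial full backup via removable media pays off. The sketch below estimates transfer times over a WAN link; the data volume, daily change rate, data reduction factor and link speed are hypothetical example figures, not measurements, and protocol overhead is ignored.

```python
"""Back-of-the-envelope WAN transfer times for an external site.

All figures (data volume, change rate, data reduction, link speed) are
hypothetical examples, not measurements. Uses decimal units (1 GB = 1e9 bytes).
"""


def transfer_hours(data_gb: float, link_mbit_s: float, reduction: float = 1.0) -> float:
    """Hours to send `data_gb` over a link after dividing by `reduction`.

    `reduction` is the combined deduplication/compression factor
    (2.0 means the data shrinks to half); protocol overhead is ignored.
    """
    bits = data_gb * 1e9 * 8 / reduction
    seconds = bits / (link_mbit_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    full_gb = 2000                # hypothetical site data volume
    daily_change_gb = full_gb * 0.03  # hypothetical 3% daily change rate
    print(f"initial full: {transfer_hours(full_gb, 10, 2):.0f} h")
    print(f"daily incremental: {transfer_hours(daily_change_gb, 10, 2):.1f} h")
```

With these assumed figures, the initial full would occupy a 10 Mbit/s link for about 222 hours (more than nine days), whereas each daily incremental takes under seven hours and fits into a nightly window.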
External site replication
Critical systems can also be replicated efficiently from external sites to another site via WAN IP connections, in order to keep the gap between a restore and the last backup/replication state as small as possible. With Version 6, IP and network settings can be adjusted automatically to match the replication target, and extended failback functionality is available (back to the main site, transferring back only the changed data).

Automation example
1. In the automatic server creation process, a virtual machine, including its operating system, is set up by script.
2. The server is picked up automatically by Veeam (backup of whole datastores), or the script sets up a Veeam backup job for the server (backup per VM).
3. The Veeam SureBackup job is adjusted, or the script extended, so that the server's recoverability is checked automatically after backup.
4. A backup job is set up in the existing hardware backup system, or an existing job there is extended, so that the data is written to tape after the Veeam backup to disk.
5. In the backup data centre, a replica of the server is also set up, and the replication schedule is adjusted according to the agreed service level (daily, hourly, ... up to continuous replication).
6. In Veeam ONE, new documentation is stored in the file system via the reporting function and/or sent by email to those responsible.
7. The documentation is extended with all of the virtual server's settings, the Visio overview plans are updated automatically with an entry, and the performance and capacity overviews, including trend logging, are adjusted.
8. The changes made in VMware are logged in Veeam ONE change management and, if necessary, also written to your existing change management database (CMDB).
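Steps 2 to 8 can be expressed as a data-driven provisioning plan that a script generates for each new VM. In this sketch, the service level names, the schedule mapping and the action strings are hypothetical placeholders; a real implementation would call the Veeam PowerShell interface, the VMware API and the tape backup software instead of returning text, and those calls are deliberately not shown here.

```python
"""Sketch of automation steps 2 to 8 as a provisioning plan.

Hypothetical throughout: service levels, schedules and action strings are
illustrative placeholders. A real implementation would call the Veeam
PowerShell interface, the VMware API and the tape backup software instead
of returning text.
"""
from typing import List

# Hypothetical mapping from service level to replication schedule (step 5).
REPLICATION_SCHEDULE = {"gold": "smart CDP", "silver": "hourly", "bronze": "daily"}


def provisioning_plan(vm: str, service_level: str) -> List[str]:
    """Return the ordered actions for a newly created VM (step 1 happens before)."""
    schedule = REPLICATION_SCHEDULE[service_level]  # KeyError for unknown levels
    return [
        f"add {vm} to backup job",                                    # step 2
        f"add {vm} to SureBackup verification job",                   # step 3
        f"include {vm} backup files in tape backup",                  # step 4
        f"create replica of {vm} in backup data centre ({schedule})", # step 5
        f"regenerate Veeam ONE documentation for {vm}",               # steps 6-7
        f"record change for {vm} in CMDB",                            # step 8
    ]
```

Keeping the plan as data makes it easy to review or log before any action is executed, and the service level is the single input that drives the replication schedule.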
Summary
Veeam Backup & Replication offers extremely fast restore from backup, together with the confidence that recovery will be 100% successful, because the recoverability of every backup can be checked automatically. Flexible management, simple operation, application-consistent replication via IP networks and the on demand virtual lab offer massive savings potential while extending the service offering with options that stand out on the market. Integration into existing hardware backup systems safeguards the budgets already invested in them while delivering additional benefits and savings.
Veeam Backup & Replication, used in conjunction with VMware Enterprise functions and current hardware systems, gives you THE opportunity to virtualize tier 1 applications securely, flexibly and with additional benefits. Introduce your application managers to the virtual lab: they'll be thrilled.
About the author
Andreas Neufert, System Engineer Central Europe, Veeam Software
With a 14-year background in IT infrastructure, Andreas Neufert is a consultant in the fields of high availability and operational optimisation of virtual environments for strategic key accounts and partners at Veeam Software. The focus of his career has been consulting and project management of major IT infrastructure projects for public sector customers in the IBM storage and virtualization environment. As a business development manager, he gained several years of additional experience implementing growth plans and sales concepts at and for IBM (premier) business partners. Andreas Neufert studied business administration at the VWA Stuttgart in Germany.

About Veeam
Veeam Software, an Elite VMware Technology Alliance Partner, develops innovative management software for VMware vSphere. Veeam Backup & Replication is the market-leading solution for virtualization backups. Through the underlying Veeam vPower technology, it offers outstanding data protection. Veeam nworks extends enterprise monitoring to VMware, offering the nworks Management Pack for VMware management via Microsoft System Center and the Veeam nworks Smart Plug-in for VMware management via HP Operations Manager. Veeam ONE combines performance monitoring, configuration and load management of VMware environments in a single solution, and includes Veeam Monitor for VMware monitoring, Veeam Reporter for VMware capacity planning, change management, reporting and chargeback, and Veeam Business View for a business view of VMware environments. Further information about Veeam Software can be found at www.veeam.com.