BACKUP AND RECOVERY OVERVIEW
Version 2.15
Date: 09/16/2014

SECURITY WARNING

The information contained herein is proprietary to the Commonwealth of Pennsylvania and must not be disclosed to un-authorized personnel. The recipient of this document, by its retention and use, agrees to protect the information contained herein. Readers are advised that this document may be subject to the terms of a non-disclosure agreement.

DO NOT DISCLOSE ANY OF THIS INFORMATION WITHOUT OBTAINING PERMISSION FROM THE MANAGEMENT RESPONSIBLE FOR THIS DOCUMENT.

EDC BACKUP & RECOVERY OVERVIEW PAGE 1 OF 12
Version History

Date | Version | Modified By / Approved By | Section(s) | Comment
03/04/2002 | 1.1 | Kannan Arunachalam | All | Initial version
11/09/2005 | 2.0 | Dana Greggs | All | Not available
11/09/2005 | 2.1 | Kannan Arunachalam | All | Not available
09/13/2006 | 2.2 | Dana Greggs | All | Not available
10/19/2007 | 2.3 | C. Reber | All | Updated to new template design. Updated URLs.
08/20/2008, 08/26/2008 | 2.4 | C. Reber | 6.1 (New) | Added new section referencing Recovery Time Objectives and Data Loss Thresholds. Reviewed by D. Greggs. Reviewed and approved by K. Arunachalam and V. Cheong.
11/06/2008 | 2.5 | C. Reber | Cover page | Insert new OA logo onto cover page
07/07/2009 and 12/22/2009 | 2.6 and 2.7 | Dana Greggs, K. Arunachalam, C. Reber | All | Updated product information (change Legato to EMC). Added a link to the ESF SAN Tape Library ROE in 2.1. Added Appendix C Virtual Server Backups. Verbiage updates to sections 4.1 and 5.1. Added sections and diagram on Avamar. Restructured Networker info and added diagram. Minor updates to other sections as required.
3/23/2011 | 2.8 | Chadney Greene | All | Update for 2011 based on ESF SOP
3/23/2011 | 2.9 | Chadney Greene | All | Update for 2011 based on ESF SOP with edits from management
6/22/2011 | 2.10 | Chadney Greene | All | Update for 2011 based on ESF SOP with edits from management
12/12/2011 | 2.11 | C. Reber | All | Change ESF to EDC
12/15/2011 | 2.12 | Chadney Greene | All | Update for 2011 based on EDC SOP with edits from management
2/14/2012 | 2.13 | Chadney Greene / C. Reber | 3.3 | 3.3 Backup Group Schedules update
4/11/2014 | 2.14 | C. Reber | All | Replace Remedy with general term incident and update cover page to OA standard
9/16/2014 | 2.15 | Chadney Greene | All | Update for PAC transfer
Table of Contents

1 EDC BACKUP & RECOVERY INTRODUCTION
  1.1 PURPOSE / BRIEF OVERVIEW
2 BACKUP INFRASTRUCTURE
    2.1.1 EMC Avamar
  2.2 HARDWARE DETAILS
3 BACKUP SCHEDULE AND LEVELS
  3.1 BACKUP SCHEDULE AND LEVELS OVERVIEW
  3.2 BACKUP SCHEDULE AND LEVELS OVERVIEW DATABASE (SQL & ORACLE)
  3.3 BACKUP GROUP SCHEDULES
4 RECOVERY
  4.1 RECOVERY OVERVIEW
  4.2 RECOVERY TIME OBJECTIVE AND DATA LOSS THRESHOLD
  4.3 OFFSITE STORAGE
5 APPENDIX A - EMC NETWORKER
  5.1 EMC NETWORKER ARCHITECTURE
  5.2 EMC LICENSE AND SOFTWARE INFORMATION
    5.2.1 EMC Licensing of Software
    5.2.2 Base Client Connection License
  5.3 EMC APPLICATION MODULES
    5.3.1 Autochanger Software Module
    5.3.2 Archive Software Module
    5.3.3 EMC NetWorker Module for MS SQL
    5.3.4 EMC NetWorker Module for Oracle
6 APPENDIX B - EMC AVAMAR
  6.1 EMC AVAMAR ARCHITECTURE
1 EDC Backup & Recovery Introduction

1.1 PURPOSE / BRIEF OVERVIEW

This document describes the backup system for the Commonwealth of Pennsylvania's Office of Administration's Enterprise Data Center (EDC). It describes the physical configuration of the backup system and details the backup and recovery policies and procedures. The audience for this document is the EDC and the commonwealth agencies that use this service.
2 Backup Infrastructure

The EDC leverages EMC NetWorker software for daily backup and disaster prevention/recovery of servers and application data residing in Managed Services. EMC NetWorker operates in a client/server model: the Backup Server is the server, and the clients are the servers backed up nightly in both our Staging and Production environments.

The Commonwealth of Pennsylvania Enterprise Data Center (EDC) Storage Area Network (SAN) Tape Library consists of the following technologies from IBM and Cisco:

- IBM 3584 Ultra Scalable Tape Library
- Cisco MDS Director 9509/9513 fabric switches

There is a single Tape Library on the EDC SAN at CTC, and a single Tape Library on the EDC SAN at the Interim Alternate Site (IAS) on Cameron St. Each site offers a similar architecture for tape backup operations.

Each IBM 3584 Tape Library available at the EDC houses up to 12 LTO3/LTO4/LTO5/JAG2 tape drives and over 200 tapes, with the capability to expand to a 16-frame Library supporting up to 192 tape drives and over 6000 tapes.

Cisco MDS fabric switches are deployed by EDC SAN as the backbone of its architecture to support the Tape Library at both sites.

One Virtual Tape Library is created for each Backup Server. Each Virtual Tape Library is assigned 64 slots and a minimum of two LTO drives.

Please refer to EDC SAN Tape Library Rules of Engagement for additional details. Please refer to Appendix A - EMC NetWorker for additional details.

2.1.1 EMC Avamar

EMC Avamar has been added as an option to the backup services available to agencies. EMC Avamar performs source-based (client-based) data de-duplication: during the initial full backup, de-duplication hashes are created on the backup client. On subsequent backups, data whose hashes already exist in the backup location is not sent to the backup device again. Instead, the backup device creates internal links that reference the duplicated data.
This type of de-duplication reduces the amount of data sent across the network. EMC NetWorker can be used with or without the Avamar option. Please refer to Appendix B - EMC Avamar for additional details.

2.2 HARDWARE DETAILS

The hardware configuration of the Backup Servers is detailed below:

Servers:                IBM x3650-7979
FC Card:                QLogic 4 Gb PCIe HBA
Tape Library:           IBM 3584 Tape Library
Tape Storage Capacity:  51 TB (terabytes) native and 102 TB compressed
Tape Drives:            IBM 3588 LTO Tape Drive
Backup Media:           IBM LTO data cartridges
Tape Capacity:          800 GB native and 1.6 TB compressed
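
The capacity figures above are consistent with one 64-slot Virtual Tape Library (section 2) filled with 800 GB cartridges. The following is a quick arithmetic check only. It assumes decimal units (1 TB = 1000 GB) and that library capacity is simply slots multiplied by cartridge capacity; this document does not state how the 51 TB figure was derived.

```python
# Assumption: capacity = number of slots x cartridge capacity, in decimal units.
SLOTS_PER_VTL = 64            # slots assigned to each Virtual Tape Library
NATIVE_GB_PER_TAPE = 800      # native cartridge capacity
COMPRESSED_GB_PER_TAPE = 1600 # compressed cartridge capacity (1.6 TB)


def library_capacity_tb(slots, cartridge_gb):
    """Return total capacity in terabytes for a fully populated library."""
    return slots * cartridge_gb / 1000


native_tb = library_capacity_tb(SLOTS_PER_VTL, NATIVE_GB_PER_TAPE)
compressed_tb = library_capacity_tb(SLOTS_PER_VTL, COMPRESSED_GB_PER_TAPE)
print(f"native: {native_tb} TB, compressed: {compressed_tb} TB")
```

Rounded down, the results match the 51 TB native and 102 TB compressed values listed in the hardware details.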
3 Backup Schedule and Levels

3.1 BACKUP SCHEDULE AND LEVELS OVERVIEW

EDC uses several methods of backing up data while maintaining the basic schedule outlined below: backup to disk and then to tape, and data de-duplication backups to a data grid. With all of these methods the retention time remains the same, but the data may reside on some other type of highly available storage rather than on tape.

A Full backup is performed on all clients twice a week. A Full backup is a copy of all the files on a hard disk.

An Incremental backup is performed on all clients five days a week. An Incremental backup is a backup of all files that have changed since the last Full backup.

Daily (Son) - A daily Incremental backup is performed Sunday through Thursday. These tapes are sent off-site using a 7-day off-site/on-site process.

Sunday  Monday  Tuesday  Wednesday  Thursday  Friday        Saturday
Incr.   Incr.   Incr.    Incr.      Incr.     Weekly* Full  Weekly* Full

Weekly (Father)* - A Full backup is performed every week on Friday and Saturday. Each Weekly tape is sent off-site and held until the Monthly tape is sent off-site. Weekly tapes are therefore held off-site for up to 5 weeks. Weekly tapes are recycled every month (roughly 30 days).

Monday  Tuesday  Wednesday  Thursday  Friday
Daily   Daily    Daily      Daily     Weekly*

Monthly (Grandfather)** - A Full backup is performed on the last Friday / Saturday of every month. Each month these tapes are sent off-site and are held for one year. After one year the tape hold time expires and the tape is reused. (Example: the October 2010 tapes are recalled at the end of October 2011 and reused.)

Monthly Tapes
January  February  March  April  May  June  July  August  September  October  November  December (**Yearly)

Yearly (Archive Tape)*** - A Full backup is performed on the last Friday / Saturday of the year and is sent off-site. In December one set of tapes is generated on the last Friday / Saturday of the month and serves as both the Monthly and the Yearly. Retention of the Yearly overrides the Monthly. These tapes are held off-site for 7 years and are not put back into rotation.
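
The Son / Father / Grandfather rotation described above can be sketched as a small classifier. This is an illustration, not the NetWorker schedule configuration. It assumes that "last Friday / Saturday of the month" means a Friday or Saturday with no later occurrence of the same weekday in that month, and the names `backup_tier` and `RETENTION` are hypothetical.

```python
from datetime import date, timedelta

# Retention per tier, as stated in section 3.1.
RETENTION = {
    "daily": "7-day off-site/on-site process",
    "weekly": "recycled roughly every 30 days (off-site up to 5 weeks)",
    "monthly": "1 year",
    "yearly": "7 years, not returned to rotation",
}


def backup_tier(day):
    """Classify a calendar date into the rotation tier of its backup."""
    weekday = day.weekday()  # Monday = 0 ... Sunday = 6
    if weekday not in (4, 5):
        # Sunday through Thursday: daily Incremental (Son).
        return "daily"
    # Assumption: "last" means the same weekday next week is in another month.
    is_last_in_month = (day + timedelta(days=7)).month != day.month
    if not is_last_in_month:
        return "weekly"  # Father
    # The December set serves as both Monthly and Yearly; Yearly overrides.
    return "yearly" if day.month == 12 else "monthly"


for d in (date(2014, 9, 16), date(2014, 9, 19), date(2014, 9, 26), date(2014, 12, 26)):
    tier = backup_tier(d)
    print(d.isoformat(), tier, "-", RETENTION[tier])
```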
3.2 BACKUP SCHEDULE AND LEVELS OVERVIEW DATABASE (SQL & ORACLE)

Database-level backups differ from OS-level backups in that they focus on just the data necessary to recover the database. Once the database is recovered, it may be placed on any server with the necessary software installed. EDC uses EMC NetWorker Database modules for backup.

Production: A Full database backup is performed on all clients daily, Sunday through Saturday. Transactional logs are backed up at 9:00 AM (9:00), 1:00 PM (13:00), and 5:00 PM (17:00), Monday through Friday.

Staging, Development, and Test: A Full database backup is performed on all clients daily, Sunday through Saturday. Transactional logs are not backed up.

3.3 BACKUP GROUP SCHEDULES

To avoid causing unnecessary network latency, backups are conducted during non-critical production times. Servers are backed up in defined Groups, and these Groups make up the backup jobs. The backup of the Groups is set to begin after 5:00 PM (17:00) Monday through Friday and after 10:00 AM (10:00) Saturday and Sunday. Groups are scheduled to start in 15- to 30-minute increments until all Groups have finished.

Some applications and agencies require exceptions based on special needs such as the criticality, size, and/or amount of data being backed up.
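
The staggered Group start times in section 3.3 can be sketched as follows. This is a minimal illustration under stated assumptions: the first Group starts exactly at the opening of the window (17:00 on weekdays, 10:00 on weekends), every Group uses the same increment, and the Group names are hypothetical. Actual schedules, including agency exceptions, are defined on the Backup Server.

```python
from datetime import date, datetime, timedelta


def group_start_times(day, groups, gap_minutes=15):
    """Return a start time for each backup Group on the given day."""
    if not 15 <= gap_minutes <= 30:
        raise ValueError("increment between groups must be 15 to 30 minutes")
    # Weekdays (Monday=0 .. Friday=4) open at 17:00, weekends at 10:00.
    first_hour = 17 if day.weekday() < 5 else 10
    window_open = datetime(day.year, day.month, day.day, first_hour)
    return {
        name: window_open + timedelta(minutes=index * gap_minutes)
        for index, name in enumerate(groups)
    }


# Hypothetical Group names, for illustration only.
schedule = group_start_times(date(2014, 9, 16), ["GroupA", "GroupB", "GroupC"], 30)
for name, start in schedule.items():
    print(name, start.strftime("%a %H:%M"))
```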
4 Recovery

4.1 RECOVERY OVERVIEW

There are multiple scenarios requiring a restore of data from backup, including data corruption, hardware failure, and accidental deletion of data. Irrespective of the cause, at some point requests are made for some level of data restoration.

To start the process, the user creates an incident ticket specifying the files and/or folders that require restoration and the date, time, and version (if known) at which the restoration is to begin. This ticket is assigned to the backup coordinator in the ESF-TOT group. Based on this information, the operator uses the NetWorker Administrator tool to review the index on the Backup Server and determine which tapes, if any, are needed.

The operator obtains the tapes from the onsite safe or an off-site location. If the tapes are offsite, they can be brought back from storage on the next scheduled pickup at no additional cost to the Commonwealth of PA. If the restoration of data is critical and the tapes are stored offsite, they can be returned within four hours. However, since this is an out-of-schedule return, there is an additional cost to the Commonwealth of PA. This procedure has to meet the level of Disaster Recovery defined by the Commonwealth's agreement and contract with the offsite vendor. The details are not contained in this document.

The length of time needed for a restore can vary from minutes to hours or days, depending on the quantity of data to be restored and the current activity on the Backup Server.

4.2 RECOVERY TIME OBJECTIVE AND DATA LOSS THRESHOLD

The Recovery Time Objective (RTO) and Data Loss Threshold are based on the current backup and storage procedures defined in the previous sections.

The RTO is the expected time it takes to recover lost data. The base RTO is 1 (one) day for server data and 4 business hours for database server transactional logs. This time varies depending on the amount of data being restored. Depending on the size of the server data store, the RTO is more specifically defined as the length of time required for the restore + 1 (one) day (the time required to retrieve the tape). For example, if a server has a large data store that normally takes 8 hours to back up, the RTO would be 32 hours. The same logic applies to database server transactional logs, in 4-business-hour intervals.

The Data Loss Threshold is the expectation of how much lost data can be recovered. The maximum loss of data is the data written between the last backup (of the previous evening) and the time on the next day when the data loss occurs. For example, if the data loss occurred Tuesday at noon, all data from Monday will be recovered; however, any data from Tuesday cannot be recovered from the server. The maximum loss of data for database server transactional logs is 4 business hours of data in any given period.

Note: Transactional logs are backed up at 9:00 AM, 1:00 PM, and 5:00 PM, Monday through Friday. This applies to Production systems only.

4.3 OFFSITE STORAGE

The EDC uses a third-party company for its offsite storage requirements. The storage facilities are located outside Pennsylvania. All shipments are held in a locked container from pickup until their return to the CTC Enterprise Server Farm.

Through Avamar de-duplication grid replication, our disaster recovery server farm receives a complete copy of all de-duplicated server data nightly, with our standard retention of 30 days.
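
The RTO arithmetic and the transactional log schedule in section 4.2 can be worked through in a few lines. This is a sketch under stated assumptions: the "+ 1 (one) day" tape retrieval allowance is taken as 24 hours, restore time is approximated by the normal backup duration as in the document's 8-hour example, and the function names are hypothetical.

```python
from datetime import datetime, time, timedelta

TAPE_RETRIEVAL_HOURS = 24  # the "+ 1 (one) day" allowance, assumed to be 24 hours
LOG_BACKUP_TIMES = (time(9, 0), time(13, 0), time(17, 0))  # Monday through Friday


def server_rto_hours(restore_hours):
    """RTO for server data: restore time plus one day to retrieve the tape."""
    return restore_hours + TAPE_RETRIEVAL_HOURS


def last_log_backup(failure):
    """Most recent scheduled transactional log backup at or before a failure."""
    day = failure.date()
    for _ in range(8):  # looking back 8 days always reaches a weekday slot
        if day.weekday() < 5:  # Monday=0 .. Friday=4
            for slot in reversed(LOG_BACKUP_TIMES):
                candidate = datetime.combine(day, slot)
                if candidate <= failure:
                    return candidate
        day -= timedelta(days=1)
    raise RuntimeError("no scheduled log backup found")


print(server_rto_hours(8))  # the document's example: 8 + 24
failure = datetime(2014, 9, 16, 12, 0)  # a Tuesday at noon
print(last_log_backup(failure), failure - last_log_backup(failure))
```

For a Production database failing Tuesday at noon, the most recent log backup is the 9:00 AM run that day, so the exposure is 3 hours, within the 4-business-hour threshold.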
5 Appendix A - EMC NetWorker

5.1 EMC NETWORKER ARCHITECTURE

The EDC leverages EMC NetWorker software for daily backup and disaster prevention/recovery of servers and application data residing in Managed Services. EMC NetWorker operates in a client/server model: the Backup Server is the server, and the clients are the servers backed up nightly in both our Staging and Production environments.

The Commonwealth of Pennsylvania Enterprise Data Center (EDC) Storage Area Network (SAN) Tape Library consists of the following technologies from IBM and Cisco:

- IBM 3584 Ultra Scalable Tape Library
- Cisco MDS Director 9509/9513 fabric switches

There is a single Tape Library on the EDC SAN at CTC, and a single Tape Library on the EDC SAN at the Interim Alternate Site (IAS) on Cameron St. Each site offers a similar architecture for tape backup operations.

Each IBM 3584 Tape Library available at the EDC houses up to 12 LTO3/LTO4/JAG2 tape drives and over 200 tapes, with the capability to expand to a 16-frame Library supporting up to 192 tape drives and over 6000 tapes.

Cisco MDS fabric switches are deployed by EDC SAN as the backbone of its architecture to support the Tape Library at both sites.

One Virtual Tape Library is created for each Backup Server. Each Virtual Tape Library is assigned 64 slots and a minimum of two LTO drives.

Please refer to EDC SAN Tape Library Rules of Engagement for additional details.

[Figure: ESF Backup Diagram. Labels in the diagram: Managed Services Database Zone (Staging and Production Clusters); Managed Services External DMZ Zone (Staging and Production Servers, Application and Web Servers); Managed Services Internal DMZ Zone (Application and Database Servers); Managed Services Lite (Application and Database Servers); JNET (Application and Database Servers); ESF SAN; Enterprise Tape Library; Storage.]
5.2 EMC LICENSE AND SOFTWARE INFORMATION

This section provides a brief description of the EMC NetWorker software and the licensing model for the NetWorker editions, modules, and features that the EDC leverages. Detailed information can be found in documentation provided by EMC Corporation; this section is not intended to cover all the EMC software or feature sets available from the vendor.

5.2.1 EMC Licensing of Software

In EMC terms, the licensing of software means the registration of license codes with EMC in order to receive authorization enablers, which are then entered on the server used as the NetWorker Backup Server. The authorization codes permanently enable or unlock the software. Without these codes, the software or added features will not run beyond the evaluation period (usually 30-45 days).

Each installation of EMC NetWorker Network Edition Server software must be licensed with a base enabler, which "turns on" (or enables) the software and allows you to use a particular set of features, such as backing up X number of clients and devices.

All licensing takes place on the NetWorker Server. The licenses are entered and stored on the server, and the server enforces the licensing. All other licenses are dependent on the NetWorker Server license, which is referred to as the base license.

5.2.2 Base Client Connection License

Every server that is to be backed up by a NetWorker Server requires a client connection license, including the Backup Server itself. The client connection license may be one of the licenses supplied with the base enabler or may be purchased separately.

5.3 EMC APPLICATION MODULES

5.3.1 Autochanger Software Module

The EMC Autochanger license enables the NetWorker Server to manage a wide variety of autochangers automatically. Library devices are licensed according to the number of slots supported by the robotic device. The module provides automated media inventory management, including media handling, cartridge cleaning, electronic labeling, barcodes, cartridge access ports (CAP or Mail Slot), and media verification. EMC NetWorker provides native library sharing and optional dynamic drive sharing, so that the tape devices in a Library can be shared among several backup servers.

5.3.2 Archive Software Module

The archive process makes point-in-time snapshots of files and directories as they exist at a specific time and writes the data to special archive storage volumes that are not normally recycled, enabling more efficient use of primary storage.

Unlike standard data backup tapes, which are subject to retention and expiration policies, archived data backup tapes can be stored indefinitely for future access. Archived data backups are not subject to automatic recycling, which prevents them from being accidentally overwritten.

Archived data backup sets can also be restored at the file level. Standard data backups are volume based after the retention period has expired; the entire backup set for a client would therefore need to be restored, rather than individual files and directories.

Data restores are faster from an Archive data tape because the data is written to the tape sequentially (in order), unlike a Standard data backup. With a Standard data backup, the NetWorker Server simultaneously backs up multiple clients to multiple drives. A specific client's data may therefore be interspersed with another client's data and may be written to multiple tapes, so Standard data restores may require multiple non-sequential tapes.

5.3.3 EMC NetWorker Module for MS SQL

The NetWorker Module for Microsoft SQL Server is an application module that integrates data protection procedures for Microsoft SQL Server databases with the NetWorker Server software. Traditionally, Microsoft SQL Server supports database, file, filegroup, and transaction log backups.
The NetWorker Module provides the mechanism that integrates the Microsoft SQL database backup technology with the EMC NetWorker software. It provides the following features:

- Backup and restore of Microsoft SQL Server databases and transaction logs
- Capability to integrate database and file system backups, helping to relieve the database administrator of the burden of backup while still allowing the administrator to retain control of the restore process
- Automatic database storage management through automated scheduling, autochanger support, electronic tape labeling, and tracking

5.3.4 EMC NetWorker Module for Oracle

The NetWorker Module for Oracle provides benefits similar to those mentioned for the Microsoft SQL Server Module. The NetWorker Module for Oracle and EMC's NetWorker Server and client software work in conjunction with the standard Oracle backup and recovery system to create an efficient Oracle data-storage management system. Any implementation of an Oracle backup and recovery strategy requires knowledge of NetWorker together with the Oracle components.

The regular Oracle backup and recovery system consists of:

- Oracle Server
- RMAN
- Recovery Catalog (optional)
- Oracle Enterprise Manager (OEM) Backup Management Tools (optional)

The NetWorker software consists of the following components:

- NetWorker Server
- NetWorker Client
- NetWorker Module for Oracle
6 Appendix B - EMC Avamar

6.1 EMC AVAMAR ARCHITECTURE

The EDC uses EMC Avamar for its data de-duplication capabilities as well as its disk-based backup function. Avamar can be used in conjunction with EMC NetWorker or by itself as a standalone backup solution. From a server client perspective, the Avamar hardware can act as an end-point backup server, or it can be used with an existing EMC NetWorker backup server.

The EDC Avamar architecture consists of 2 storage grids, at the CTC and CAM locations within EDC. Each storage grid consists of 8 nodes, which serve 3 purposes:

- 1 utility node, for overall grid communication and administration
- 1 spare node, for replacement in the event of a node failure
- 6 storage nodes, for data storage

[Figure: EDC Avamar Environment. The diagram shows the EDC Legato Networker backup servers (System x3650) with a TS3500 tape library, the CTC Avamar Grid and the CAM Avamar Grid (each with storage nodes, a spare node, a utility node, and an Allied Telesyn Layer 3 Gigabit Switch), and App Servers and File Servers at each site. Annotations in the diagram:
- Legato Networker communication: EDC servers being backed up by Legato Networker to tape media.
- Fiber Channel communication to tape drive.
- Avagent process communication with MCS service on utility node.
- Avtar process communication with gsan service on storage nodes.
- EDC servers being scheduled for backup by Legato Networker: de-duped data is backed up through the Legato client to the CTC storage grid.
- EDC servers being backed up by a standalone Avamar client, then de-duped to the CTC storage grid.
- CTC to CAM replication traffic: CTC storage nodes directly communicate with CAM storage nodes for replication of data on a daily basis (occurs at 6 PM).
- CTC utility node communicates with CAM utility node to coordinate the replication process.]
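
The source-based de-duplication that Avamar provides (hashes computed on the client, with only previously unseen data sent to the grid) can be illustrated with a toy model. This is not Avamar's implementation: the fixed 4-byte chunk size, the use of SHA-256, and the `Grid`, `backup`, and `restore` names are all assumptions chosen to keep the sketch short.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, so the example data stays readable


class Grid:
    """Hypothetical stand-in for the backup device: chunks keyed by hash."""

    def __init__(self):
        self.store = {}

    def has(self, digest):
        return digest in self.store

    def put(self, digest, chunk):
        self.store[digest] = chunk


def backup(data, grid):
    """Hash chunks on the client; send only chunks the grid does not hold.

    Returns the list of hashes needed to rebuild the data (the internal
    links) and the number of bytes actually sent across the network.
    """
    recipe, bytes_sent = [], 0
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if not grid.has(digest):
            grid.put(digest, chunk)
            bytes_sent += len(chunk)
        recipe.append(digest)
    return recipe, bytes_sent


def restore(recipe, grid):
    """Rebuild the original data by following the stored hash references."""
    return b"".join(grid.store[digest] for digest in recipe)


grid = Grid()
first_recipe, first_sent = backup(b"AAAABBBBCCCC", grid)    # initial full backup
second_recipe, second_sent = backup(b"AAAABBBBDDDD", grid)  # one chunk changed
print(first_sent, second_sent)
```

In this model the second backup sends only the changed chunk, yet both backups remain fully restorable from the grid.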