SYMMETRY WHITE PAPER
Business Continuity & High Availability Options
Adam Shane
Introduction

Today, more than ever, security is a mission-critical business component. This is true not only for our customers who are responsible for critical national infrastructure, but also for our corporate customers who have shareholders' interests to protect. They need a security management system that will meet their needs when they need it most.

Background

A degree of redundancy already exists in our system architecture because of the distributed intelligence model it employs. If the server/communication client loses communications with the controllers (due to a network failure or a computer hardware or software failure), the system still behaves as usual from a user perspective. There are some limitations: real-time alarms are not routed while communications are down, and the controllers have only a limited amount of memory for queuing transaction history. When communications are restored, the field panels automatically transfer their queued information to the server, and any changes made at the server are synchronized with the controllers.

While such a distributed intelligence model meets the needs of the majority of customers, more sophisticated server-based solutions are required to meet mission-critical availability goals. AMAG Technology provides a growing number of options to address our customers' business continuity (in the face of a disaster) and high availability concerns. The purpose of this document is to summarize these options and to compare and contrast them.

A Definition of Terms

As with any new topic, some terms may not be familiar to the reader. This document covers two categories: the first is high availability, and the second is business continuity, or disaster recovery.

High Availability refers to the likelihood that a system will be available at any given time. The opposite of availability is downtime.
A system that experiences roughly one hour of unscheduled downtime per year (about 53 minutes, to be precise) meets 99.99% availability. This is often referred to as Four 9s. A system that sees only about 5 minutes of unscheduled downtime per year meets 99.999% availability, or Five 9s. These levels are often benchmarks for mission-critical business systems.

Redundancy is a more common term, and is used here just as it is in the common vernacular. A redundant system is one that has two or more units running in tandem, or one unit available as a backup to the other, to provide system availability. As described below, redundant systems can provide either high availability or disaster recovery, depending on how they are configured.

Fault Tolerance is a version of redundancy. In a redundant system, if the primary computer becomes unavailable to the resources that require it, a backup computer takes over the task. There is a necessary transition period while the backup machine starts services or makes other required configuration changes. In a fault tolerant solution, two nodes (effectively separate computers) run simultaneously. The FT control system virtualizes these nodes so that the outside world sees only a single unit (single MAC, single IP, single computer name, etc.), but internally both nodes process the same information at the same time. If one node goes bad (disk or other hardware failure), the other is already performing the same operations and the task continues without interruption.

Business Continuity is synonymous with disaster recovery. It refers to a system's ability to continue operations in the face of a significant event. The most common example is the case in which the location of a computer system is left inoperable due to either damage or communications failures.
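The availability percentages in the definitions above translate into yearly downtime budgets by simple arithmetic. The following is a minimal sketch, assuming a 365-day year; the function names are illustrative and are not part of any AMAG or NEC product.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly unscheduled-downtime budget, in minutes."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def availability_from_downtime(downtime_minutes: float) -> float:
    """Return the availability percentage implied by yearly downtime."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


# Print the downtime budget for the two benchmark levels.
for label, target in (("Four 9s", 99.99), ("Five 9s", 99.999)):
    minutes = allowed_downtime_minutes(target)
    print(f"{label} ({target}%): {minutes:.2f} minutes per year")
```

Under this assumption the budgets work out to roughly 52.6 minutes per year for Four 9s and roughly 5.3 minutes per year for Five 9s, so "one hour" and "5 minutes" are best read as round figures.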
Redundancy and High Availability Options

AMAG Technology recognizes these various customer needs and has devised hardware and software solutions to meet them. Four configurations are available to meet either high availability or business continuity requirements. These configurations can also be combined to meet both requirements simultaneously.

Historically, high availability (HA) was limited to the realm of very high-end computing systems. One of the first affordable options for Windows-based HA computing systems was clustering software that monitors the operation of services on the primary server. Microsoft clustering is capable of starting services on the backup server if there is a problem. This software also controls the IP addresses of the overall system so that other network devices that rely on the primary server are not affected by the switchover.

Traditionally, Microsoft Cluster Service (MSCS), provided with Windows 2000 Advanced Server (and more recently with Windows Server 2003 Enterprise Edition), has been an option. However, MSCS-based solutions are more costly and more limited because they require MS SQL Server Enterprise Edition and are only compatible with Symmetry Enterprise and Global. Because of these limitations, AMAG Technology has teamed with NEC to offer ExpressCluster high availability software solutions that support the standard and enterprise editions of Windows and MS SQL Server (including MSDE), enabling significantly lower-cost high availability solutions. In addition, NEC ExpressCluster offers product editions with disk mirroring capability, which further reduces total solution cost by eliminating the need for the external shared storage systems required by MSCS.

AMAG Technology can also provide the NEC Express5800 Fault Tolerant server in support of mission-critical systems.
The Express5800 Fault Tolerant server can be configured to meet the requirements of the computing environment, but it is typically composed of fully redundant nodes (a node can be thought of as a complete PC) running in lock-step. In essence, two computers run at the same time, doing the same thing. The hardware and software constantly evaluate system operation, and if a module (hard drive, memory, CPU, motherboard, or fan) shows an anomaly or fails, the system recovers automatically in real time with no user disruption. Because the redundant unit is already running when a component fails, there is no noticeable change in operation to the user. This system is rated at Five 9s uptime (only about 5 minutes per year of unscheduled downtime). This solution comes with single-processor or dual-processor nodes (two nodes in each FT server). The dual-processor (per node) solution requires a two-processor license if using the standard version of MS SQL Server, but it supports MSDE as well, so it is compatible with all Symmetry versions (Business, Professional, Enterprise and Global security management software).
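The clustering behavior described earlier in this section (monitor the primary, start services on the backup when there is a problem, and keep the shared IP address pointing at whichever server is active) can be illustrated with a small model. This is a simplified sketch only: the class names, fields and return values are hypothetical and do not represent the MSCS or NEC ExpressCluster interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One server in a two-node cluster (hypothetical model)."""
    name: str
    healthy: bool = True
    services_running: bool = False


@dataclass
class Cluster:
    """Two nodes that share one virtual IP address."""
    primary: Node
    backup: Node
    virtual_ip: str
    active: Node = field(init=False)

    def __post_init__(self) -> None:
        # The primary starts out owning the services and the virtual IP.
        self.active = self.primary
        self.primary.services_running = True

    @property
    def virtual_ip_owner(self) -> str:
        """Name of the node currently answering on the virtual IP."""
        return self.active.name

    def check(self) -> str:
        """Run one monitoring pass and fail over if the active node is unhealthy."""
        if self.active.healthy:
            return "ok"
        standby = self.backup if self.active is self.primary else self.primary
        if not standby.healthy:
            return "down"  # no healthy node is left to take over
        # Transition period: stop services on the failed node, start them on the standby.
        self.active.services_running = False
        standby.services_running = True
        self.active = standby
        return "failover"
```

In this model, clients always connect to `virtual_ip`, so a failover changes `virtual_ip_owner` without requiring any change on other network devices. A fault tolerant server avoids the transition step altogether because both nodes are already running the same operations.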
Business Continuity Solutions

One limitation that affects both MSCS and the Fault Tolerant server is that the primary and backup systems are located in close proximity (typically in the same rack). If the failure (computer system failure or communications failure) is caused by physical damage at the server facility, then remote facilities that rely on the server will not be serviced. One solution is remote redundancy, in which a backup server is located in a well-connected remote facility so that physical damage to one facility will not necessarily affect the other.

One basic option is Double-Take software, which provides only asynchronous data replication and recovery against site or whole-server failures. The more advanced option is NEC ExpressCluster, which provides either synchronous or asynchronous data replication over LAN, WAN, and SAN, along with granular application service and resource monitoring, system recovery, and virtual server identity support for simpler and faster migration of application and data workloads to and from the backup server.

Synchronous data replication ensures that the data in all the storage systems is identical at every point in time. In asynchronous mode, data is written to the primary storage system first and then sent to one or more remote storage systems at some later point in time. NEC ExpressCluster supports both modes with easy manageability and offers recovery within minutes or seconds.

ExpressCluster continuously synchronizes the primary and backup data, including databases and files, and monitors application services, virtual server identity (host name and IP addresses) and other resources. When a problem is detected, the backup server takes over the virtual server identity and functionality of the primary so that overall system operations can resume almost immediately. When the cause of the failover is rectified, the system quickly re-synchronizes and fails back to normal operations. ExpressCluster does not require a specific edition of MS SQL Server, so it is compatible with all Symmetry security management system versions.
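The difference between synchronous and asynchronous replication described in this section can be shown with a toy model. This is a sketch under stated assumptions: the `ReplicatedStore` class and its methods are hypothetical and are not the ExpressCluster or Double-Take API. Real products replicate disk blocks or files over a network rather than in-memory dictionaries.

```python
from collections import deque


class ReplicatedStore:
    """Toy primary/backup storage pair (hypothetical illustration only)."""

    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.primary = {}
        self.backup = {}
        self.pending = deque()  # writes accepted but not yet sent to the backup

    def write(self, key: str, value: str) -> None:
        """Store a value on the primary and replicate it per the configured mode."""
        self.primary[key] = value
        if self.synchronous:
            # Synchronous: the write is complete only once the backup has it too.
            self.backup[key] = value
        else:
            # Asynchronous: acknowledge now, ship to the backup at a later time.
            self.pending.append((key, value))

    def replicate_pending(self) -> int:
        """Send queued writes to the backup; return how many were applied."""
        shipped = 0
        while self.pending:
            key, value = self.pending.popleft()
            self.backup[key] = value
            shipped += 1
        return shipped

    def in_sync(self) -> bool:
        """True when primary and backup hold identical data."""
        return self.primary == self.backup
```

In synchronous mode `in_sync()` is true after every write. In asynchronous mode it is false until `replicate_pending()` runs, and anything still in `pending` when the primary site is lost would be missing from the backup. That window is the trade-off for not making each write wait on the remote link.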
System Component Descriptions

Each of the systems described above has different requirements for hardware and software components. The following tables provide guidelines for assembling redundant and/or high availability systems.

Option 1 (NEC ExpressCluster):

Qty | Component | Notes
2 | Servers | These can be workstation or server class machines.
2 | MS Windows 2003 Server Standard or Enterprise Edition | Only standard edition is required.
2 | MSDE or MS SQL Server Standard or Enterprise Edition | Any version of MS SQL Server (including MSDE) is supported.
2 | Symmetry Professional, Enterprise or Global | Because this solution does not require special versions of software, any of the AMAG products can be used (Symmetry Professional requires MSDE version of MS SQL Server). Furthermore, you are not required to use the Unrestricted license level.
1 | NEC ExpressCluster | One unit of ExpressCluster X LAN (covers 2 servers) is recommended for local high availability with synchronous disk mirroring over LAN connection. One unit of ExpressCluster X WAN (covers 2 servers) is recommended for remote disaster recovery with synchronous or asynchronous disk mirroring over WAN connection.
2 | Symmetry Options | Since there are two versions of Symmetry software running, the same options must be purchased for both systems.
5 | Professional Services | AMAG recommends 5 days of Professional Services for configuring the NEC software.*
1 | AMAG Disaster Recovery Option | Disaster recovery fee for matched license keys.*
1 | Software Support | AMAG recommends that customers maintain their software support license.

Option 2 (NEC Fault Tolerant Server):

Qty | Component | Notes
1 | NEC Express5800 | This server offers two nodes (each configured as dual or quad (320 or 340) processors).
1 | MS Windows 2003 Server Enterprise, & NEC Express Builder for Windows | ExpressBuilder software contains the proper drivers and performs installation of the OS on the FT server system utilizing the Windows 2003 Enterprise Edition media.
1 | MSDE, MS SQL Server Standard or Enterprise Editions |
1 | Symmetry Professional, Enterprise or Global | Since the NEC Express5800 looks like a single computer to all the software, no special edition is required. This solution does not require special versions of SQL Server, therefore any of the AMAG products can be used (Symmetry Professional requires MSDE). Furthermore, you are not required to use the Symmetry Enterprise Unrestricted license level.
5 | Professional Services | AMAG Professional Services for configuring the NEC system.*
1 | Software Support | AMAG recommends that customers keep current on their software support agreement.

Option 3 (HP Clustered Server):

Qty | Component | Notes
1 | HP DL380 | This clustered server offers single processor, dual or quad processor options. Requires 2 nodes and shared storage.
2 | MS Windows 2003 Server Enterprise | The Enterprise Server version is required to get the Cluster Service tools.
1 | MS SQL Server Enterprise Edition | The Enterprise Edition is required for cluster awareness.
1 | Symmetry Enterprise/UNR-CA, Global-CA | Required to be Enterprise or Global because of need to use MS SQL Server Enterprise Edition. Must be Unrestricted, Cluster Aware version.
5 | Professional Services | AMAG recommends 5 days of Professional Services be included with each cluster or disaster recovery system.
1 | Software Support | AMAG recommends that customers keep current on their software support agreement.

Option 4 (Double-Take Software Remote Redundancy):

Qty | Component | Notes
2 | Servers | These can be workstation or server class machines.
2 | MS Windows Server 2003 | The standard edition of the OS is acceptable, but Microsoft licenses each machine separately. Server version of OS is required even for Symmetry Professional.
2 | MSDE or MS SQL Server Standard Edition | The standard edition is acceptable, but two licenses are required.
2 | Symmetry Professional, Enterprise or Global | Because this solution does not require special versions of software, any of the AMAG products can be used (Symmetry Professional requires MSDE, not MS SQL Server). Furthermore, you are not required to use the Unrestricted license level.*
2 | Symmetry Options | Since there are two versions of Symmetry software running, the same options must be purchased for both systems.
1 | AMAG Disaster Recovery Option | Disaster recovery fee for matched license keys.*
2 | Double-Take | Two copies of the Double-Take software are required to implement the Disaster Recovery model.
5 | Professional Services | AMAG recommends 5 days of Professional Services be included with each cluster or disaster recovery system.
1 | Software Support | AMAG recommends that customers maintain their software support license.
*Note: When quoting an NEC ExpressCluster or Double-Take solution, use the less expensive of either (1) two copies of Symmetry software plus the DR Option fee, or (2) the unrestricted, cluster-aware version of the software (Symmetry Enterprise or Global). For example, a customer interested in Symmetry PRO/128 would have to purchase two copies of this software, 2 servers (each with Windows Server 2003), 2 sets of Symmetry options, and the DR option. However, for a customer interested in Symmetry ENT/512, it is more cost effective to upgrade to the ENT/UNR-CA version, which includes the DR option and both sets of license keys.

Combination Systems
In addition to the system configurations described above, high availability (HA) and business continuity attributes can be combined in the following ways:

Option 1 (NEC ExpressCluster): NEC Express5800 Fault Tolerant servers can be deployed in two distinct locations to provide a 99.999% hardware high availability solution that also offers remote disaster recovery. If 99.999% hardware high availability is not required, any PC-compatible servers can be used instead. The solution requires two servers plus the ExpressCluster software from NEC, which monitors the health of the servers as well as the software services and communications. Such a solution requires the Disaster Recovery option from AMAG in support of the remote unit.

Option 2 (Clustered Double-Take): Similar to the above solution, Microsoft Cluster Service can be used on appropriate hardware in the two remote locations together with the Double-Take software (which provides the synchronization and failover logic) to deliver a combination of high availability and disaster recovery. This solution requires MS SQL Server Enterprise Edition, the Symmetry Enterprise or Global Cluster Aware software, and the AMAG Disaster Recovery option.

Summary

The security management system from AMAG Technology offers companies one of the most flexible system architectures available. The software can be configured to run on a single platform, but it also scales to run on clustered hardware or fault tolerant hardware, and it can be implemented in a remote redundancy configuration to provide business continuity in the face of a catastrophic situation. A variety of system configuration options have been presented here, but this document is intended only as a guide. Your sales manager or applications engineer can further assist you in defining system configurations to meet your specific requirements.
Revision History

03/22/2005 Initial release, Version 1.0
05/04/2005 Version 1.1, added additional details in configuration tables.
08/22/2005 Version 1.2, clarified DoubleTake configuration and use of DR option.
01/09/2006 Version 1.3, clarified Symmetry software versions supported.
05/11/2007 Version 1.4, updated descriptions.
07/17/2008 Version 1.5, updated with NEC ExpressCluster product features.
06/08/2008 Version 1.6, updated recommendation of Professional Services support to 5 days.
04/27/2010 Version 1.7, reformatted.