High Availability Option for Windows Clusters
Detailed Design Specification
2008 Ingres Corporation

Project Name: Ingres Enterprise Relational Database Version 3
Component Name: Automatic Cluster Failover for Windows
Author: Steve Wonderly
Last Saved Date: October 20, 2008
Revision: 0.7
Change History:
Revision Date | Last Revision By | Reason for Change
27-Jan-04 | Steve Wonderly | Initial draft [Revision 0.1]
26-Mar-04 | Steve Wonderly | Added cluster administrator commands [0.2]
6-Apr-04 | Steve Wonderly | Added step-by-step cluster administration [0.3]
13-Jul-04 | Steve Wonderly | Add to Project 360 site [0.4]
14-Jul-04 | Steve Wonderly | Minor corrections [0.5]
04-Aug-04 | Steve Wonderly | Add details to the Internal Specification [0.6]
20-Oct-08 | Steve Wonderly | Put into Ingres format to publish on Wiki [0.7]
TABLE OF CONTENTS
1 INTRODUCTION
  1.1 OVERVIEW
  1.2 SCOPE
  1.3 DEFINITIONS, ACRONYMS AND ABBREVIATIONS
  1.4 REFERENCES
  1.5 NOTEWORTHY ISSUES
2 ARCHITECTURAL OVERVIEW
  2.1 CLUSTER SERVICE OVERVIEW
  2.2 MODULE DESCRIPTIONS
  2.3 SAMPLE FLOW/EXECUTION DIAGRAM
  2.4 DESIGN LIMITATION AND ASSUMPTIONS
  2.5 PLATFORM SPECIFIC ISSUES
  2.6 PATENT INFORMATION
3 EXTERNAL SPECIFICATION
  3.1 USER PERSPECTIVE
  3.2 ADMINISTRATION PERSPECTIVE
  3.3 MIGRATION ISSUES
  3.4 SECURITY IMPACT
  3.5 CHANGES INITIATED BY REVIEWS/INSPECTIONS/WALKTHROUGHS
4 INTERNAL SPECIFICATION
  4.1 ESTIMATED EFFORT
  4.2 PROGRAMMING
  4.3 RESOURCES AND FILES
  4.4 INTERFACE
  4.5 REFERENCES
5 IMPACT SUMMARY
  5.1 PRODUCT IMPACTS
  5.2 DOCUMENTATION
6 QUALITY ISSUES
  6.1 UNIT TESTING SUMMARY
  6.2 TESTING RECOMMENDATIONS
  6.3 REGRESSION RISK ASSESSMENT
7 PACKAGING AND INSTALLATION IMPACT
  7.1 VERIFY THE WINDOWS CLUSTER SOFTWARE
  7.2 CONFIGURATION GUIDELINES FOR THE INGRES SYSTEM AND DATA FILES
  7.3 INSTALL PATH AND PACKAGE
  7.4 REGISTERING THE INGRES HIGH AVAILABILITY OPTION
  7.5 UPGRADE
  7.6 LICENSING
8 SUPPORT IMPACT
PREFACE

Intended Audience
This document is intended for experienced developers with extensive knowledge of Microsoft Windows 2000 Server software and PC-compatible hardware. It assumes knowledge of the Windows 2000 Advanced Server or Datacenter Server[1] operating system and the Microsoft Windows 2000 Cluster Service environment. It is oriented toward developers who are very familiar with Windows 2000 clustering concepts such as server clusters, virtual servers, resource groups, and failover.

Introduction to Windows 2000 Cluster Service[2]
The Windows 2000 Cluster service is a separate, isolated set of components that work together with the operating system to provide:
- Improved availability, by enabling services and applications in the server cluster to continue providing service during a hardware or software component failure or during planned maintenance.
- Increased scalability, by supporting servers that can be expanded with additional processors.
- Improved manageability, by enabling administrators to manage devices and resources within the entire cluster as if they were managing a single computer.

The Microsoft Cluster service is based on a shared-nothing model of cluster architecture. Each server owns and manages its local devices, and devices common to the cluster, such as a common disk array, are selectively owned and managed by a single server in the cluster at any given time. This model does not require any specialized hardware or applications and enables the Cluster service to support standard Windows 2000-based applications and disk resources.

[1] Windows 2000 Datacenter Server has not been tested but is expected to work like Advanced Server.
[2] Windows 2000 Clustering Technologies: Cluster Service Architecture white paper, Microsoft Corporation.
1 INTRODUCTION

1.1 OVERVIEW
The Ingres High Availability Option (HAO) for Microsoft Windows Clusters will provide fast error detection, fast software switch-over, and infrastructure restarts. If a failover occurs, clients may see a brief interruption in service and, in most cases, will need to reconnect after the failover has finished. Once they reconnect, however, which physical server supplies their applications and data is transparent to them.

1.2 SCOPE
Cluster service is one of two complementary Windows clustering technologies provided as extensions to the base Windows 2000 operating system. The other, Network Load Balancing, complements Cluster service by supporting highly available and scalable clusters for front-end applications and services such as Internet or intranet sites, Web-based applications, media streaming, and Microsoft Terminal Services.
The Ingres High Availability Option for Windows 2000 Cluster service will provide a failover data service. In a failover data service, application instances run on only one node at a time. If the Windows Cluster service detects an error, it attempts either to restart the instance on the same node or to fail it over by starting the instance on another node, depending on how the Cluster service has been configured.
The Ingres HAO for Windows 2000 Cluster service will not provide Network Load Balancing (scalable clustering), which can run active instances on multiple nodes.

1.3 DEFINITIONS, ACRONYMS AND ABBREVIATIONS
Ingres or Ingres Server: An instance of an Ingres installation, including all executable, configuration, and database files.
Ingres Service or Ingres High Availability Option: The process by which the Windows 2000 Cluster service will monitor and control the Ingres server.
High Availability (HA): A system that provides near-continuous access to data and applications through a combination of hardware and software.
Fault-tolerant: A hardware system that provides constant access to data and applications, but at a higher cost than HA because of specialized hardware. Additionally, fault-tolerant systems usually do not account for software failures.
Failover: The process by which the cluster automatically relocates a service from a failed primary node to a designated secondary node.
Scalable services: Scalability provides consistent response time or throughput as load increases. A scalable service uses multiple nodes in a cluster to run an application concurrently, thus providing increased performance.
Cluster service: The Windows 2000 Cluster service is intended primarily to provide failover support for applications such as databases, messaging systems, and file/print services. Cluster service supports 2-node failover clusters in Windows 2000 Advanced Server and 4-node clusters in Datacenter Server.
Server cluster: A server cluster is a group of independent computer systems, known as nodes, working together as a single system to ensure that mission-critical applications and resources remain available to clients.
Cluster node: A cluster node is a machine running both the Windows 2000 Advanced/Datacenter Server operating environment and Microsoft Cluster Service (MSCS). By definition, a node is always a member of a server cluster.
Quorum resource: In every cluster, a single resource is designated as the quorum resource. This resource maintains the configuration data necessary for recovery of the cluster.
Resource types: Cluster resources are categorized by type. Windows 2000 defines several types of resources and provides resource DLLs to manage the types.
Standard resource types: Windows 2000 comes with numerous resource types for managing cluster resources, including Physical Disk, DHCP and WINS, Print Spooler, File Share, Internet Protocol, Network Name, Generic Application, and Generic Service.
Resource: A cluster resource is any physical or logical component that has the following characteristics:
- It can be brought online and taken offline.
- It can be managed in a server cluster.
- It can be hosted (owned) by only one node at a time.
Resource Group: A group of resources that can be managed as a unit.
Virtual server: A virtual server is a group that contains:
- A Network Name resource.
- An IP Address resource.
- The applications to be accessed by the clients of the virtual server.
Cluster Administrator: A cluster is administered using the Cluster Administrator, a graphical administrator's tool for maintenance, monitoring, and failover administration.
1.4 REFERENCES
- Ingres Enterprise Relational Database Version 3.0 PRS
- Ingres Enterprise Relational Database Release 3.0 Software Requirements Specification
- Windows 2000 Clustering Technologies: Cluster Service Architecture white paper, http://technet.microsoft.com/en-us/library/bb727117.aspx, Microsoft Corporation.
- Windows 2000 Server: Step-by-Step Guide to Installing Cluster Service, http://technet.microsoft.com/en-us/library/bb727114.aspx, Microsoft Corporation.
- Introducing Windows 2000 Advanced Server, http://www.microsoft.com/windows2000/advancedserver/evaluation/business/overview/advanced.asp, Microsoft Corporation. [Superseded by Windows 2000 Server, http://technet.microsoft.com/en-us/library/bb727159.aspx.]
- Windows 2000 Clustering Technologies, http://www.microsoft.com/windows2000/technologies/clustering/, Microsoft Corporation. [Superseded by Windows Clustering Technologies - An Overview, http://technet.microsoft.com/en-us/library/bb727116.aspx.]

1.5 NOTEWORTHY ISSUES
A few questions remain:
1. Because the Ingres server (or client-only Net server) is installed on only one node of the cluster and then enabled on all other nodes, is it safe to assume that Ingres can be uninstalled only from that same first node? (This is how it currently works.)
2. Because Ingres is installed on only one node of the cluster, the program shortcuts for the Ingres programs and utilities (Ingres Configuration Manager, Ingres Network Utility, Ingres Visual DBA, and so on) are installed on only that node. The other nodes of the cluster do not have the shortcuts.
2 ARCHITECTURAL OVERVIEW

2.1 CLUSTER SERVICE OVERVIEW
The following figure shows a sample cluster configuration. For a detailed explanation, refer to Introducing Windows 2000 Advanced Server and Windows 2000 Clustering Technologies: Cluster Service Architecture.
[Figure: Two-node server cluster running Windows 2000 Advanced Server]
Windows 2000 Datacenter Server supports four-node clusters and requires device connections using Fibre Channel, as shown in the following illustration of the components of a four-node cluster.
[Figure: Four-node server cluster running Windows 2000 Datacenter Server]

2.2 MODULE DESCRIPTIONS

2.2.1 Ingres Service
The Ingres service for Windows is enhanced to monitor the various Ingres servers that it starts (for example, the DBMS server and the Net server). This functionality provides the high availability service required in this project: the Ingres service monitors the servers and shuts down if one of them fails. When the Ingres service shuts down or fails, the Windows Cluster service detects and reports the failure, then either restarts Ingres on the same cluster node or fails it over to another node, depending on the settings defined by the administrator.

2.2.2 Cluster Administrator
A cluster can be administered using the Cluster Administrator, a graphical administrator's tool for maintenance, monitoring, and failover administration. Additionally, Cluster service provides an automation interface that can be used to create custom scripting tools for administering cluster resources, nodes, and the cluster itself. Applications and administration tools, such as the Cluster Administrator, can access this interface using remote procedure calls (RPC), regardless of whether the tool is running on a node in the cluster or on an external computer.

2.2.3 The Cluster Commands
You can use cluster commands to administer server clusters from the Windows 2000 command prompt. You can also call the program cluster.exe from command scripts to automate many cluster administration tasks. Cluster.exe is provided on all Windows 2000 computers. The cluster commands used for Ingres Windows 2000 High Availability are shown below. For more information regarding the cluster commands, see the Cluster commands overview Windows 2000 Advanced Server help topic.

2.3 SAMPLE FLOW/EXECUTION DIAGRAM
The Windows Cluster Service starts the Ingres service when requested by the Cluster Administrator or a cluster command. When the Ingres service starts, it starts the Name server, the Net server, and the other servers as configured. The Ingres service then monitors the servers by polling them every 30 seconds, for as long as they continue to respond. If one of the critical servers does not respond, the Ingres service shuts down all of its servers and terminates. The Windows Cluster Service detects the shutdown of the Ingres service as a failure and attempts to restart Ingres either on the same node or on another cluster node, depending on the configuration set in the Cluster Administrator.
When requested by the Cluster Administrator or a cluster command, the Cluster Service sends a shutdown message to the Ingres service. The Ingres service, in turn, shuts down all of its servers and then terminates.
In all other respects, the Ingres service and the servers of the installation behave as they do in a normal, non-clustered installation.

2.4 DESIGN LIMITATION AND ASSUMPTIONS
The Ingres High Availability Option for Windows 2000 Cluster service runs on Windows 2000 Advanced Server and Windows 2000 Datacenter Server. Ingres HAO for Windows supports both server (DBMS) and client-only (Ingres Net) installations.
2.5 PLATFORM SPECIFIC ISSUES
See also the section DESIGN LIMITATION AND ASSUMPTIONS.
Supported Platforms:
- Windows 2000 Advanced Server
- Windows 2000 Datacenter Server[3]

[3] Windows 2000 Datacenter Server has not been tested but is expected to work like Advanced Server.
2.6 PATENT INFORMATION
None.
3 EXTERNAL SPECIFICATION

3.1 USER PERSPECTIVE
Remote clients should access the Ingres installation through the cluster-defined virtual server. The actual server name and IP address where Ingres is running are transparent to the end user.

3.2 ADMINISTRATION PERSPECTIVE
The Ingres user will not be able to start or shut down Ingres directly. All starting and stopping must be done through the Cluster service, using either the Cluster Administrator GUI tool or the cluster commands. If Ingres or one of its servers is shut down while under the control of the Cluster service, the Cluster service treats this as a failure and tries to restart Ingres on the same node or to fail the service over to the failover node.

3.3 MIGRATION ISSUES
This is the first version of the Ingres DBMS High Availability Option for Windows 2000 Cluster service, so there are no migration issues.

3.4 SECURITY IMPACT
Installation of the High Availability Option must be done by a Windows administrator. Once installed, the High Availability Option can be controlled only by an administrator with access to the Windows cluster commands or the Cluster Administrator. Once Ingres is started, its security is identical to that of a non-clustered Ingres installation.

3.5 CHANGES INITIATED BY REVIEWS/INSPECTIONS/WALKTHROUGHS
Use the table below to document changes for a component or a design entity.
Date | Type | No | Change Description
     |      | 1  |
     |      | 2  |

3.5.1 Change Description 1
In this section, describe each change and its impact. Create a sub-section for each change identified in the table above.
4 INTERNAL SPECIFICATION

4.1 ESTIMATED EFFORT
The estimated engineering effort to code and test the Ingres DBMS HAO for Windows 2000 Cluster service, ready for handover to QA and Technical Writers, is approximately 45 man-days of development (including the time spent studying Windows clustering and preparing the DDS).

4.2 PROGRAMMING
Setting up the development environment for defining a Cluster service
Before beginning Ingres HAO development, you must install Windows 2000 Advanced Server and the Cluster service. Development is done with Microsoft Visual Studio.NET 2003.

4.2.1 Modify the Ingres service module (servproc.exe)
The Ingres service module (servproc.exe) starts and stops the various servers that make up an Ingres server installation or client-only (Net server) installation. Automatic cluster failover for Windows (the High Availability Option) requires adding code to detect the failure of any server process started by the Ingres service and, upon such a failure, to stop all servers and terminate the service.

4.2.1.1 Detailed Implementation Description
Previously, after starting the configured servers, the servproc module waited for a single event: service shutdown. The High Availability Option adds code to wait for either the shutdown event or a timeout period (30 seconds). If the timeout expires, servproc checks whether all of the started servers are still running. If they are, servproc waits again. If a server is no longer running or responding, servproc stops all servers and then shuts down. The new code is used ONLY IF the parameter HAScluster is passed to servproc when the service starts.

4.2.1.2 External Interfaces to the Other Modules
A new parameter, HAScluster, is added to the servproc module to support the High Availability Option. It is passed when the Ingres service is started to enable the new functionality.
The servproc module invokes ingstop with a new parameter, -check, to check the status of the started servers. ingstop returns zero if all servers are OK; otherwise, it returns -1.

4.2.2 Modify the ingstop command/module (ingstop.exe)
The ingstop command stops Ingres servers according to the parameters passed to it. The changes for the High Availability Option add a new option, -check, which checks the status of the started servers and returns a value based on the result of the check.
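The following C sketch illustrates the -check comparison detailed in 4.2.2.1: total the configured startup count for each server type, then compare those totals with the number of servers actually found running. It is a hypothetical stand-in, not the ingstop.c code. Two assumptions matter: the real build_server_list() reads config.dat through the PM routines, whereas this sketch parses raw lines it assumes have the form ii.<host>.ingstart.<config>.<type>: <count>; and the running counts, which ingstop obtains through get_procs() and find_procs() from iinamu, are simply passed in as an array.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Server types named in the design, in a fixed order. */
static const char *const SERVER_TYPES[] = { "dbms", "gcc", "gcb", "jdbc", "gcd", "star" };
#define N_TYPES (sizeof SERVER_TYPES / sizeof SERVER_TYPES[0])

/* Map a server type name to its index in SERVER_TYPES, or -1 if unknown. */
static int type_index(const char *type)
{
    for (size_t i = 0; i < N_TYPES; i++)
        if (strcmp(type, SERVER_TYPES[i]) == 0)
            return (int)i;
    return -1;
}

/* Sketch of build_server_list(): sum the configured startup count per type.
   ASSUMPTION: entries look like "ii.<host>.ingstart.<config>.<type>: <n>";
   the real code reads config.dat through the PM routines instead. */
void build_server_list(const char *const *lines, size_t n_lines, int expected[N_TYPES])
{
    memset(expected, 0, N_TYPES * sizeof expected[0]);
    for (size_t i = 0; i < n_lines; i++) {
        char host[64], config[64], type[16];
        int count;
        if (sscanf(lines[i], "ii.%63[^.].ingstart.%63[^.].%15[^:]: %d",
                   host, config, type, &count) != 4)
            continue; /* not a startup-count entry */
        int t = type_index(type);
        if (t >= 0 && count > 0)
            expected[t] += count;
    }
}

/* Sketch of check_server_list(): running[] holds the per-type counts that
   get_procs()/find_procs() would derive from iinamu. Returns 0 when every
   configured server is running and -1 otherwise, matching ingstop -check.
   Extra running servers are not treated as a failure (sketch assumption). */
int check_server_list(const int expected[N_TYPES], const int running[N_TYPES])
{
    for (size_t t = 0; t < N_TYPES; t++)
        if (running[t] < expected[t])
            return -1;
    return 0;
}
```

Treating extra running servers as acceptable is a choice made only for this sketch; the design states just that ingstop -check returns zero when all configured servers are running.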
4.2.2.1 Detailed Implementation Description
When -check is passed, ingstop builds a list of the servers that should have been started and then checks whether each server in that list is running. If all configured servers are running, ingstop returns zero; otherwise, it returns an error code (-1).
Two new functions are added to ingstop.c: build_server_list() and check_server_list(). The build_server_list() function uses the PM routines of the CL to read the config.dat file and determine which servers should be started and running. It builds a list of the configured DBMS, GCC, GCB, JDBC, GCD, and STAR servers, with the name and number of each. The check_server_list() function compares the list built by build_server_list() with the actual counts of running servers found by the existing functions get_procs() and find_procs(). The get_procs() function calls the iinamu command to determine which servers are running.

4.2.2.2 External Interfaces to the Other Modules
A new parameter, -check, is added to ingstop to support the new functionality. ingstop uses the PM CL routines to read the server configuration from the config.dat file; the PM interfaces are new to the ingstop module. In addition, the changes to ingstop.c use an existing external interface, the iinamu command, to determine which servers are running.

4.2.3 Create a High Availability Option setup wizard (wincluster.exe)
A new setup wizard lets the user easily set up the High Availability Option for Windows clusters. The wizard uses a standard Windows property sheet control to display a series of dialogs in which the user specifies the details for implementing the High Availability Option.

4.2.3.1 Detailed Implementation Description
The wincluster wizard source is located in front\st\wincluster and consists of a solution/project pair of files and several C++ source and header files:
- wincluster.sln: the Visual Studio.NET solution file
- wincluster.vcproj: the Visual Studio.NET project file
- 256bmp.cpp/.h: the main bitmap in the wizard
- clusterpage.cpp/.h: the cluster information page of the wizard; displays the name of the cluster and processes cluster information for adding the Ingres service to the cluster
- configdat.cpp/.h: modifies the config.dat file for failover cluster operation
- dependencies.cpp/.h: the dependencies page of the wizard; displays and processes the resource dependencies needed for the Ingres service
- final.cpp/.h: the final page of the wizard; prompts the user to finish the installation or cancel it
- installcode.cpp/.h: displays the existing Ingres installations and lets the user select one on which to set up the High Availability Option
- presetup.cpp/.h: contains the information about the Ingres installation obtained during pre-setup
- propsheet.cpp/.h: the property sheet that hosts the property pages of the wizard
- resource.h: contains symbolic resource references
- setenv.cpp/.h: sets an Ingres variable to a new value (like ingsetenv)
- splash.cpp/.h: displays the splash screen
- StdAfx.cpp/.h: contains the headers used to create pre-compiled headers
- welcome.cpp/.h: the first page of the wizard; displays opening comments and an explanation of the wizard
- wincluster.cpp/.h: the main module; starts the wizard and contains various helper functions
- res\block01.bmp, ingsplash.bmp: bitmaps
- res\wincuster.rc2: resources for wincluster

4.2.3.2 External Interfaces to the Other Modules
The external interface is a Windows property sheet and property pages that implement a wizard user interface.
External interfaces used by the High Availability Option setup wizard (wincluster.exe) include:
- The PM Compatibility Library (CL) routines, which read, write, and scan config.dat
- Windows clustering APIs from the Platform SDK:
  - OpenCluster: opens a connection to a cluster and returns a handle
  - CloseCluster: closes a cluster handle
  - GetClusterInformation: retrieves a cluster's name and version
  - ClusterOpenEnum: opens an enumerator for iterating through the cluster objects in a cluster
  - ClusterEnum: enumerates the cluster objects in a cluster, returning the name of one object with each call
  - OpenClusterResource: opens a resource and returns a handle to it
  - CloseClusterResource: closes a resource handle
  - ClusterResourceControl: initiates an operation affecting a resource; the operation performed depends on the control code passed
  - ResUtilFindSzProperty: utility function that locates a string property in a property list

4.3 RESOURCES AND FILES
Resources needed for developing the Ingres DBMS High Availability Option for Microsoft Windows 2000 are:
Name | Version | New | Description of module or entity / Description of change
Microsoft Windows 2000 | Advanced Server[4] | | O/S
Microsoft Visual Studio.NET 2003 | 7.1 or above | | C++ development
Microsoft Platform SDK | July 2002 or later | | C++ runtime
MKS Toolkit for Windows | 5.2 or above | | Development tools
Piccolo client | 2.1.21 or above | | Source control

4.4 INTERFACE
How do other components that are external to the design interact with this component? Describe the methods and rules of interaction:
- Communication protocols
- Data formats and acceptable values
- A description of the input range, the meaning of inputs and outputs, and the type and format of inputs and outputs
Describe limitations and boundary conditions. Document error conditions, error codes, and messages.

4.5 REFERENCES
- Step-by-Step Guide to Installing Cluster Service, http://www.microsoft.com/windows2000/techinfo/planning/server/clustersteps.asp, Microsoft Corporation.
- Server Clusters, Windows 2000 Advanced Server help topic, Microsoft Corporation.
- Cluster commands overview, Windows 2000 Advanced Server help topic, Microsoft Corporation.

[4] As of August 6, 2004, Windows 2000 Advanced Server is not on the list of approved operating systems in the Computer Associates Principal Tech Stack.
5 IMPACT SUMMARY
The estimates in this section are approximate and are intended to give other groups, such as Tech Writing, Services (Support, Education, CA Technology Services), QA, and Localization, an idea of the impact this change will have.

5.1 PRODUCT IMPACTS
5.1.1 Entities
List the menus, screens/panels, commands, reports, and messages that are impacted by the development of the module/function. Use the table below to summarize these changes.
Entity | New | Modified | Comments
Menus | | |
Lists | | |
Screen/Panels | | |
Commands | | |
Messages | | |
Help | | |
Modules | | |
Reports | | |

5.2 DOCUMENTATION
Manual | Impact
Getting Started Guide | A new section for installing and configuring Ingres HAO clusters will be needed.
System Administration Guide | A new section for Ingres HAO clusters will be needed.
6 QUALITY ISSUES

6.1 UNIT TESTING SUMMARY
6.1.1 Unit Testing Description
List the external functions and procedures that will be unit tested.

6.2 TESTING RECOMMENDATIONS
The Ingres HAO needs to be tested on both a Windows 2000 Advanced Server cluster and a Datacenter Server cluster. Test that the Ingres HAO behaves properly in all cases where a resource group is moved between physical hosts, including system crashes and use of the cluster /fail command. Test that client machines continue to get service after these events.

6.3 REGRESSION RISK ASSESSMENT
The Ingres HAO for Windows 2000 Cluster service is a new component with no direct regression risks.
6.3.1 Backward Compatibility Issues
The Ingres DBMS HAO for Windows 2000 Cluster service is a new component with no direct backward compatibility issues.
7 PACKAGING AND INSTALLATION IMPACT

7.1 VERIFY THE WINDOWS CLUSTER SOFTWARE
The Windows Cluster service must be installed and configured on the cluster nodes before the Ingres HAO for Windows 2000 Clusters can be installed. To display the cluster version and release, use the cluster [[/cluster:]cluster name] /version command:

    C:\>cluster /ver
    Cluster Name: CLUSTER2000
    Cluster Version: 5.0 (Build 2195: Service Pack 4)
    Cluster Vendor: Microsoft(R) Cluster service

7.2 CONFIGURATION GUIDELINES FOR THE INGRES SYSTEM AND DATA FILES
Access to the Ingres data does not depend on the type of clustered file system. The clustered file systems supported by Windows 2000 Cluster service are multi-initiator SCSI or SCSI over Fibre Channel to RAID disk sets. The Ingres system files can reside either on the local disks of each cluster node or on the shared cluster file system. The Ingres data and related files must reside on the shared cluster file system. It is recommended that all system, data, and data-related files be stored on the shared clustered file system.

7.3 INSTALL PATH AND PACKAGE
The new HAO setup wizard, wincluster.exe, is put into the \ingres\bin directory at the time of the Ingres installation. The servproc.exe module is also in \ingres\bin, and ingstop.exe is in \ingres\utility.

7.4 REGISTERING THE INGRES HIGH AVAILABILITY OPTION
These steps must be performed after Ingres is installed.
Preparing to install the Ingres HAO
The Windows Cluster Service must be installed and configured on the cluster nodes before installing the Ingres High Availability Option. The Cluster Service is a component of Windows Clustering that creates a server cluster and controls all aspects of its operation.
How to set up and configure the Ingres HAO
The instructions for setting up the High Availability Option are given in Chapter 12, "Installing the High Availability Option for Windows," in the System Administrator Guide.
7.5 UPGRADE
This is the first version of the Ingres HAO for Windows 2000 Cluster service, so there are no upgrade issues.
7.6 LICENSING
There are no plans to license the Ingres HAO separately. Licensing will be performed at the DBMS level.
8 SUPPORT IMPACT
Detail any diagnostics or trace facilities built into the component, or facilities built into other components, that would be used to gather information about what is happening. Note anything that may make the component:
- Difficult to diagnose (for example, no tracing facility)
- Difficult to service
- Unreliable (for example, external risks)
Workarounds