White Paper
VERITAS Storage Foundation™ for Windows

Booting Windows from a Storage Area Network (SAN) with VERITAS Storage Foundation™ for Windows

9/14/2004
Introduction
Advantages of Booting from the SAN
Disadvantages of Booting from the SAN
Requirements
Recommendations
Limitations
The Windows Boot Process
Pre-Boot
Booting Windows
Boot Types
Local Boot
Network Boot
Boot from SAN
Booting from the SAN
Booting in a Cluster Environment
Dynamic MultiPathing (DMP)
Potential Issues & Resolutions
Summary
Resources
INTRODUCTION

Until the release of Service Pack 1 for VERITAS Volume Manager 3.1 for Windows 2000, Cluster Disk Groups could not be created on the same bus as the disk that contains the system or boot partition/volume. The restriction was intended to keep the bus resets that occur during a cluster failover from interrupting the operating system or its access to the pagefile. Users of VERITAS Volume Manager asked for this limitation to be removed from the product so that they could boot from a SAN.

Microsoft supports booting from a Storage Area Network (SAN) with Windows 2000 and Windows Server 2003 in the configurations specified in its Knowledge Base article 305547. With VERITAS Volume Manager 3.1 for Windows 2000 + Service Pack 1 and VERITAS Storage Foundation for Windows, VERITAS simplifies the process of allowing the system/boot disk and cluster disk resources to share the same bus when booting Windows from a SAN.

ADVANTAGES OF BOOTING FROM THE SAN

While complicated to deploy, booting from the SAN offers several advantages:

Server Consolidation - Booting from an OS image located on the SAN opens up the opportunity to use less expensive, thin diskless servers, which take up less space, require less power and have fewer hardware components.
Centralized Management - Storing OS images on SAN disks facilitates centralized management of upgrades and fixes.
Recovery from Server Failures Simplified - When a system that boots from the SAN fails, a replacement system can easily boot from the SAN and access the data stored there, making for a quick return to production.
Fast Recovery from Disaster - Mirroring or replicating the boot information and production data on the SAN to a remote SAN allows the remote site to take over production quickly in the event of a disaster at the primary site.
Temporary Server Loads Rapidly Deployed - In environments that experience temporary periods of high production workload, the ability to clone the boot image using SAN technologies and distribute it to multiple servers for rapid deployment provides a distinct advantage. These servers can be placed into production for as long as needed and then removed, providing a very cost-effective solution.

DISADVANTAGES OF BOOTING FROM THE SAN

While there are many advantages to booting from the SAN, it should only be undertaken by those familiar with the many complexities of SAN deployment. Some of the disadvantages of booting from the SAN are:

Complex Hardware Deployment - All hardware components must be configured correctly, especially in heterogeneous environments. Improper configurations can cause SAN boot to fail.
The Boot Process is Complex - An understanding of the boot process is necessary for effective troubleshooting.

REQUIREMENTS

To successfully boot from a Storage Area Network with VERITAS Storage Foundation for Windows, the following configuration requirements from the Knowledge Base article referenced above must be met:

1. The SAN must be configured in a switched environment, or each host must be directly attached to the storage subsystem's Fibre Channel ports.
2. The host must have exclusive access to the disk that it boots from, as Microsoft Windows requires a unique, dedicated disk for booting. LUN masking, which grants a host exclusive access to a disk, array or LUN partition via the unique World Wide Name (WWN) of its HBA, is therefore required when booting Windows from the SAN.
3. With Windows 2000, Microsoft requires that servers clustered with MSCS have their boot disks on storage paths separate from those used by the cluster's shared storage. When booting from the SAN, separate dedicated paths are therefore required for booting and for shared storage.
Note: With the introduction of the Storport driver in Windows Server 2003, the boot disk and cluster disks can be hosted on the same bus. See Microsoft's white paper Storport in Windows Server 2003: Improving Manageability and Performance in Hardware RAID and Storage Area Networks for more information on the Microsoft Storport driver.
4. The HBA must support booting from the SAN, the correct HBA BIOS firmware and HBA driver versions must be used, and the HBA settings must be configured correctly.
5. The boot BIOS must be enabled on only one adapter per server.

RECOMMENDATIONS

In addition to the requirements above, the following recommendations provide additional flexibility and redundancy when booting from the SAN:

Mirror or replicate the boot volume to another local or remote LUN. In the event of a disk or array failure, the server can easily be mapped to the mirror, making for a quick recovery.
Use multipathing to configure redundant paths to SAN storage. This helps avoid single points of failure.
Use Microsoft's Storport driver, included in Windows Server 2003, along with the appropriate miniport drivers from the HBA vendor, for more flexibility when configuring the SAN. Storport allows the boot disk and cluster disks to share the same Fibre Channel connection.
Place the pagefile on a local disk (see Potential Issues & Resolutions later in this paper).
LIMITATIONS

Current limitations of booting Windows from a SAN include the inability to share a boot image (each server must have its own boot image) and the lack of an easy way to deploy multiple boot images.

THE WINDOWS BOOT PROCESS

Before delving further into booting from a SAN, it is necessary to explain the boot process. The boot process, also known as bootstrapping, is the process of loading the operating system code from its storage device into a computer's memory. It can occur from a direct attached disk, from the LAN, or from a SAN. Regardless of the boot type, the process is essentially the same. The following sections discuss the boot process for the 32-bit architecture. See Microsoft's paper Boot from SAN in Windows Server 2003 and Windows 2000 Server for an explanation of how the boot process differs on the Intel IA-64 architecture.

PRE-BOOT

When a computer is first powered on, its BIOS (Basic Input/Output System), the computer's most basic code, is loaded and performs a power-on self-test (POST), which checks the hardware for problems. If no problems are detected, the CPU proceeds with its operations. The BIOS then locates and initializes all bootable devices (including NICs and HBAs) and sets the boot device, which is either the first bootable device found (the default) or the one chosen by the user when multiple boot devices are present. The boot sector is then loaded from the boot device into memory. Intel-based systems require that the first sector of the primary disk contain the MBR (Master Boot Record), whose partition table identifies the system partition. The system partition holds the code and configuration files (Ntldr, boot.ini and Ntdetect.com) that are necessary to boot Windows, and it must be marked as active (bootable).

BOOTING WINDOWS

After the boot sector is loaded, the rest of the boot process can proceed. In Windows, Ntldr controls most of the boot process. The boot sector loads Ntldr, which begins loading the Windows operating system in phases.
The first of these phases is the initial boot loader phase, in which Ntldr switches the processor into protected mode so that the system can access all of its physical memory. Before this, the system runs in real mode, in which only the first 1 MB of memory is accessible. Paging is also enabled in this phase.

In the next phase, Ntldr reads boot.ini, which contains information about where the operating system kernel, registry and device drivers are located. Boot.ini uses either an ARC (Advanced RISC Computing) path or a disk signature (a unique 32-bit number) to locate the boot files. Ntldr then loads Ntdetect.com, which uses the BIOS to query the system for additional hardware information (machine ID, bus/adapter type, number and size of disk drives, and ports); this information is later recorded in the registry.

Ntldr then loads the files responsible for kernel initialization from the boot partition: Ntoskrnl.exe (the kernel), HAL.DLL (the Hardware Abstraction Layer), and the file system and device drivers necessary to boot the system. Ntoskrnl.exe then takes control and locates the boot disk so that the registry can be updated with driver changes.

With the successful completion of these phases, the remaining system driver files are loaded, and the Session Manager Subsystem (SMSS) is loaded and in turn loads the files necessary to create the user mode interface. The user can then log on. Figure 1 outlines the boot process.

Figure 1. The Windows Boot Process
Pre-boot:
  POST (power-on self-test) checks for hardware problems.
  The BIOS locates and initializes all bootable devices (including NICs, HBAs, and hard and floppy drives).
  The BIOS sets the boot device (by default, the first bootable device found).
  The boot sector is loaded (the first sector of the primary disk contains the MBR, which identifies the system partition holding Ntldr, boot.ini and Ntdetect.com).
Boot:
  Initial boot loader phase: the boot sector loads Ntldr. Ntldr enables the system to access all physical memory (protected mode) and enables paging; before this, only the first 1 MB of memory is available (real mode). Ntldr reads boot.ini, which locates the boot files (OS kernel, registry and device drivers) using either an ARC path or a disk signature. Ntldr loads Ntdetect.com, which performs hardware detection.
  Kernel initialization: Ntldr loads Ntoskrnl.exe, HAL.DLL, and the file system and device drivers necessary to boot the system. Control passes to Ntoskrnl.exe, which locates the boot disk and updates the registry with driver changes.
  User mode: the Session Manager Subsystem (SMSS) is loaded and loads the files that create the user mode interface. The user can log on.
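The MBR and boot.ini details above can be made concrete with a short Python sketch. It is an illustration only, not Ntldr's logic and not anything shipped with VERITAS Storage Foundation. It assumes the standard PC MBR layout (a 32-bit little-endian disk signature at offset 440, four 16-byte partition entries at offset 446, and the 0x55AA marker at offset 510) and the multi(), scsi() and signature() forms of the boot.ini ARC path, where signature() carries the disk signature as eight hex digits. All class and function names are hypothetical.

```python
import re
import struct
from dataclasses import dataclass
from typing import List, Optional, Tuple

# --- MBR, standard PC layout (assumed) ---
DISK_SIGNATURE_OFFSET = 440          # 32-bit Windows disk signature, little-endian
PARTITION_TABLE_OFFSET = 446         # four 16-byte partition entries
ENTRY = struct.Struct("<B3xB3xII")   # boot flag, (CHS skipped), type, (CHS skipped), start LBA, sector count

@dataclass
class Partition:
    slot: int        # 0-3, position in the partition table
    active: bool     # boot flag 0x80 marks the active (bootable) partition
    type_id: int
    start_lba: int
    sectors: int

def parse_mbr(sector: bytes) -> Tuple[int, List[Partition]]:
    """Return (disk signature, used primary partition entries) from a 512-byte MBR."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR (missing 0x55AA signature)")
    (signature,) = struct.unpack_from("<I", sector, DISK_SIGNATURE_OFFSET)
    parts = []
    for slot in range(4):
        flag, type_id, start, count = ENTRY.unpack_from(sector, PARTITION_TABLE_OFFSET + slot * ENTRY.size)
        if type_id != 0:  # type 0 marks an unused slot
            parts.append(Partition(slot, flag == 0x80, type_id, start, count))
    return signature, parts

# --- boot.ini ARC paths ---
ARC_RE = re.compile(
    r"(?P<kind>multi|scsi|signature)\((?P<id>[0-9a-f]+)\)"
    r"disk\((?P<disk>\d+)\)rdisk\((?P<rdisk>\d+)\)partition\((?P<part>\d+)\)"
    r"(?P<path>\\\S*)?",
    re.IGNORECASE,
)

@dataclass
class ArcPath:
    kind: str                  # "multi", "scsi" or "signature"
    adapter: Optional[int]     # adapter ordinal for multi()/scsi()
    signature: Optional[int]   # disk signature for signature()
    disk: int
    rdisk: int
    partition: int             # ARC partition numbers start at 1
    path: str                  # Windows directory, e.g. \WINDOWS

def parse_arc_path(text: str) -> ArcPath:
    m = ARC_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not an ARC path: {text!r}")
    kind = m["kind"].lower()
    is_sig = kind == "signature"
    return ArcPath(
        kind=kind,
        adapter=None if is_sig else int(m["id"]),
        signature=int(m["id"], 16) if is_sig else None,
        disk=int(m["disk"]), rdisk=int(m["rdisk"]), partition=int(m["part"]),
        path=m["path"] or "",
    )

def parse_os_entry(line: str) -> Tuple[ArcPath, str]:
    """Split a boot.ini [operating systems] line into its ARC path and the rest."""
    arc_text, _, rest = line.partition("=")
    return parse_arc_path(arc_text), rest.strip()

def entry_targets_disk(arc: ArcPath, mbr_signature: int) -> bool:
    """A signature() entry names its disk by the MBR disk signature."""
    return arc.kind == "signature" and arc.signature == mbr_signature
```

For example, parsing the line multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003" /fastdetect yields a multi() path on partition 1 with the path \WINDOWS, while a signature(8b467c12)... entry matches only the disk whose MBR carries that signature.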
BOOT TYPES

LOCAL BOOT

Booting from a direct attached disk is the most common boot type. The SCSI BIOS contains the instructions the server uses to determine the boot disk.

NETWORK BOOT

In a network boot, a system is booted from a remote boot server over a local area network (LAN). The network adapter (NIC) contains the instructions necessary for booting. While booting over the network has advantages, it introduces considerable security risk.

BOOT FROM SAN

When booting from a storage area network (SAN), the boot disk is stored on the SAN. The server communicates with the SAN via its host bus adapter (HBA), and the HBA BIOS contains the instructions the server needs to locate the boot disk on the SAN.

BOOTING FROM THE SAN

Now that we have covered the boot process and boot types, we can delve into booting Windows from the SAN. A boot from SAN configuration can be as simple as two servers connected to a switch that connects to a single port on a Fibre Channel array, or as complex as a multipathing configuration in a cluster environment, where multiple systems connect to the SAN via multiple paths (HBAs, switches, array controllers).

Figure 2 shows a basic SAN with two servers booting from the same array via the same port, which represents LUN 0. LUN 1 and LUN 2 are the boot disks for the respective servers. Masking would be used to ensure that each server accesses only its own boot LUN; for example, only Server 1 can access LUN 1.

Figure 2. Basic SAN. Components: Server 1 and Server 2, HBA A and HBA B, a switch, and a storage array controller; the array's physical disks present LUN 1 (boot), LUN 2 (boot) and LUN 3 (data).
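The masking rule behind Figure 2 (requirement 2 above: each host must have exclusive access to its boot LUN) can be checked mechanically. The Python sketch below models the array's masking table as a map from LUN to the WWNs allowed to see it, using Figure 2's layout. The WWNs are hypothetical, placing HBA A in Server 1 and HBA B in Server 2 is an assumption, and masking data LUN 3 to Server 1 only is an arbitrary choice for illustration. It is not a VERITAS or array-vendor tool.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical WWNs for the two HBAs in Figure 2.
HBA_OWNER: Dict[str, str] = {
    "10:00:00:00:00:00:00:0a": "Server 1",  # HBA A (assumed to be in Server 1)
    "10:00:00:00:00:00:00:0b": "Server 2",  # HBA B (assumed to be in Server 2)
}
BOOT_LUN: Dict[str, int] = {"Server 1": 1, "Server 2": 2}

# Masking table on the array: LUN -> WWNs allowed to access it.
MASKING: Dict[int, Set[str]] = {
    1: {"10:00:00:00:00:00:00:0a"},
    2: {"10:00:00:00:00:00:00:0b"},
    3: {"10:00:00:00:00:00:00:0a"},  # data LUN; owner chosen arbitrarily for this example
}

def server_views(masking: Dict[int, Set[str]], hba_owner: Dict[str, str]) -> Dict[str, Set[int]]:
    """Return the set of LUNs each server can see through its HBAs."""
    views: Dict[str, Set[int]] = defaultdict(set)
    for lun, wwns in masking.items():
        for wwn in wwns:
            views[hba_owner.get(wwn, f"unknown WWN {wwn}")].add(lun)
    return dict(views)

def boot_lun_problems(masking: Dict[int, Set[str]], hba_owner: Dict[str, str],
                      boot_lun: Dict[str, int]) -> List[str]:
    """List violations of 'each host has exclusive access to its boot LUN'."""
    problems = []
    for server, lun in boot_lun.items():
        owners = {hba_owner.get(w, f"unknown WWN {w}") for w in masking.get(lun, set())}
        if server not in owners:
            problems.append(f"{server} cannot see its boot LUN {lun}")
        intruders = sorted(owners - {server})
        if intruders:
            problems.append(f"boot LUN {lun} of {server} is also visible to {intruders}")
    return problems
```

With the table as given, boot_lun_problems returns an empty list; adding HBA B's WWN to LUN 1 produces a single complaint that Server 1's boot LUN is visible to Server 2.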
BOOTING IN A CLUSTER ENVIRONMENT

Microsoft supports booting from the SAN in Windows 2000 (updated versions) and Windows Server 2003. However, in Windows 2000, storage devices must be on a different bus than the boot, system and pagefile disks to be eligible as cluster-managed devices. With the introduction of the Storport driver in Windows Server 2003, the boot/system/pagefile disk can share the same bus with any other disks, including cluster disks. This makes for a less expensive and more flexible solution, as it reduces the hardware required when booting from the SAN in a cluster.

With the SCSIport driver used in Windows 2000, a cluster requires at least two HBAs per cluster node so that the boot and shared cluster disks do not share the same bus. The shared cluster disks must also be masked/zoned away from the boot disks so that other cluster nodes can access the shared disks but cannot access another server's boot disk. If multipathing is implemented, the minimum number of HBAs per system rises to four: two for the system/boot disk and two for the cluster disks. This is not only expensive; the solution is also constrained by the number of available HBA slots in a server. The Storport driver removes these limitations and simplifies the configuration.

The Windows Server 2003 Cluster service includes a switch that allows any disk, including cluster disks, to be on the same bus as the system disk, boot disk or pagefile disk. To ensure that users who do not fully understand the implications of this configuration don't enable it accidentally, support must be enabled explicitly by setting the registry value ManageDisksOnSystemBuses under HKLM\SYSTEM\CurrentControlSet\Services\ClusSvc\Parameters.
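For reference, the Microsoft registry switch named above can be expressed as a .reg file. The Python sketch below only generates the file text; it is not how vxclus works internally, which this paper does not describe. It assumes that ManageDisksOnSystemBuses is a REG_DWORD and that a value of 1 enables the behavior, and it disables the switch by deleting the value (the .reg "=-" syntax) so that the default applies; verify these assumptions against current Microsoft guidance before importing such a file on a cluster node. The function name is hypothetical.

```python
from typing import List

PARAMETERS_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ClusSvc\Parameters"
VALUE_NAME = "ManageDisksOnSystemBuses"

def system_bus_reg_file(enable: bool) -> str:
    """Return .reg text that sets (enable=True) or deletes (enable=False) the value."""
    # Assumption: REG_DWORD 1 lets cluster disks share the system bus;
    # deleting the value ("=-" in .reg syntax) restores the default behavior.
    data = "dword:00000001" if enable else "-"
    lines: List[str] = [
        "Windows Registry Editor Version 5.00",  # .reg header used by Regedit on Windows 2000 and later
        "",
        f"[{PARAMETERS_KEY}]",
        f'"{VALUE_NAME}"={data}',
        "",
    ]
    return "\r\n".join(lines)  # Windows .reg files use CRLF line endings
```

Keeping the switch behind an explicit file, rather than a one-line toggle, mirrors the intent described above: the setting should only be applied deliberately.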
VERITAS makes it easy to enable this support in both Windows 2000 and Windows Server 2003 through a command line utility, vxclus, which modifies registry settings so that the system/boot disk and the disks in a cluster disk group can share the same bus. Note that Windows 2000 requires that the correct HBA driver be used. Note also that, even though Microsoft does not support having the boot disk and cluster disks on the same bus in Windows 2000, VERITAS removed this limitation in response to customer requests.

VERITAS Storage Foundation for Windows enables/disables support for booting from the SAN from the command line as follows:

vxclus UseSystemBus ON
Enables support for cluster disks and the system/boot disk sharing the same bus.

vxclus UseSystemBus OFF
Disables support for cluster disks and the system/boot disk sharing the same bus.

Volume Manager 3.1 with Hotfix03 enables/disables support for booting from the SAN from the command line as follows:

vxclus SANBOOT enable
Enables support for cluster disks and the system/boot disk sharing the same bus.

vxclus SANBOOT disable
Disables support for cluster disks and the system/boot disk sharing the same bus.

DYNAMIC MULTIPATHING (DMP)

In its article on booting from SAN (Q305547), Microsoft says the following about multipathing: "Multi-path software and multiple HBAs improve your chances of recovery from a path failure. The purpose of having multiple HBAs in a single host is to have redundancy and (possibly) increased throughput."
VERITAS Storage Foundation for Windows provides multipathing capability via the Dynamic MultiPathing (DMP) option* (available as a feature of Volume Manager). With DMP, the path to the boot disk on the SAN can be made redundant, and throughput can be increased, by introducing a secondary path (redundant HBAs, cabling, switches, array controllers, and ports on each controller).

In multipath configurations, appropriate masking must be implemented to ensure that each server sees only its own boot disk on the SAN through each HBA. Only one HBA per server should have its BIOS enabled, as only one LUN can be the boot LUN. If the primary HBA fails, a manual BIOS reset is required to boot from the redundant HBA. In Figure 3, HBA A is the active adapter in Server 1, while HBA B is the standby. If HBA A fails, the BIOS on HBA B must be enabled so that it can find the boot LUN and the system can boot from it.

* Note: Support for the Storport driver with DMP will be provided with the release of VERITAS Storage Foundation 4.2 for Windows.

In multipath boot from SAN configurations, there are issues with crash dump file creation, because the crash dump stack, which is created at boot and precedes the creation of the crash dump file, is specific to the HBA path from which the system was booted. See Microsoft's white paper Boot from SAN in Windows Server 2003 and Windows 2000 Server for more information on the crash dump file in a multipath environment.

Figure 3. Boot from SAN Multipath Configuration. Components: Server 1 and Server 2, each with HBA A and HBA B; Switch 1 and Switch 2; four numbered paths (1 to 4) into Controller 1 and Controller 2 of the storage array; the array's physical disks present LUN 1 (boot), LUN 2 (boot) and LUN 3 (data).
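The one-boot-BIOS rule and the manual failover step for Server 1 in Figure 3 can be modeled in a few lines. This is a hypothetical Python model for reasoning about the procedure, not a tool that changes HBA settings (in practice the boot BIOS is typically toggled through the HBA vendor's BIOS utility); the class and function names are invented for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

@dataclass(frozen=True)
class Hba:
    name: str
    boot_bios: bool        # is this HBA's boot BIOS enabled?
    healthy: bool = True   # False once the adapter or its path has failed

def boot_adapter(hbas: List[Hba]) -> Optional[Hba]:
    """Return the HBA the server will boot through, or None if it cannot boot.

    Enforces the rule that the boot BIOS is enabled on exactly one adapter.
    """
    enabled = [h for h in hbas if h.boot_bios]
    if len(enabled) != 1:
        raise ValueError(f"boot BIOS must be enabled on exactly one HBA, found {len(enabled)}")
    return enabled[0] if enabled[0].healthy else None

def switch_boot_bios(hbas: List[Hba], failed: str) -> List[Hba]:
    """Model the manual reset: disable the failed HBA's boot BIOS, enable a healthy standby's."""
    standby = next((h for h in hbas if h.name != failed and h.healthy), None)
    if standby is None:
        raise RuntimeError("no healthy standby HBA to boot from")
    return [replace(h, boot_bios=(h.name == standby.name)) for h in hbas]
```

With HBA A failed, boot_adapter returns None until switch_boot_bios moves the boot BIOS to HBA B; DMP can keep data paths available at runtime, but this boot-time step stays manual.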
POTENTIAL ISSUES & RESOLUTIONS

A potential issue with booting Windows from the SAN is pagefile access. The OS requires fast, unrestricted access to the pagefile. In SAN environments where multiple systems are simultaneously paging or booting through the same storage port, system hangs or slow performance may result. Placing the pagefile on storage local to the host resolves this, but note that if the pagefile is on a different partition/volume than the boot partition/volume, a memory dump file (Memory.dmp) will not be created when Windows experiences a STOP error. Memory dumps are used to troubleshoot Windows STOP errors. In its article Disaster Recovery with the PowerVault 530F, Dell recommends installing a SCSI drive in each server that boots from the SAN and configuring it to hold only the pagefile.

Upgrades to the storage array (firmware, etc.) may require the array to be taken down, which brings down the host rather than just the applications hosted on that array. This can be mitigated by mirroring the boot volume to another array.

Another potential issue is that isolating problems may require troubleshooting not only the host and OS but also the SAN.

SUMMARY

Booting from the SAN, while complex to set up, provides several advantages, including hardware consolidation, centralized management, simplified recovery from server failures and quick recovery from disasters. Using VERITAS Storage Foundation for Windows (Volume Manager) simplifies the SAN boot configuration process in a cluster by allowing the boot disk and cluster disks to share the same bus. This reduces hardware requirements and the complexity of setting up the solution. While this configuration is not supported by Microsoft for Windows 2000, it is supported with the Storport driver in Windows Server 2003.
Replicating/mirroring the boot disk to another array provides redundancy for the boot disk in the event of an array failure, and Dynamic MultiPathing (DMP) adds an additional layer of protection against failures along the storage path.
RESOURCES

The following resources were used in the preparation of this document:

Boot from SAN in Windows Server 2003 and Windows 2000 Server, Microsoft Corporation
Server Clusters: Storage Area Networks, Windows 2000 and Windows Server 2003, Microsoft Corporation
Storport in Windows Server 2003: Improving Manageability and Performance in Hardware RAID and Storage Area Networks, Microsoft Corporation
Microsoft Knowledge Base Article 305547: Support for Booting from a Storage Area Network, Microsoft Corporation
Booting Windows from the SAN, Rick Cook
Disaster Recovery with the PowerVault 530F, Mike Kosacek and Juan Montuno (Dell)
VERITAS Software Corporation
Corporate Headquarters
350 Ellis Street
Mountain View, CA 94043
650-527-8000 or 866-837-4827

For additional information about VERITAS Software, its products, or the location of an office near you, please call our corporate headquarters or visit our Web site at www.veritas.com.