N_Port ID Virtualization (NPIV): A Detailed Review

Abstract

This white paper provides a consolidated study of the N_Port ID Virtualization (NPIV) feature, its usage on different platforms, and NPIV integration with EMC PowerPath on the AIX platform.

February 2010
Copyright © 2010 EMC Corporation. All rights reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS." EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. All other trademarks used herein are the property of their respective owners.

Part Number h6894
Table of Contents

Executive summary
Introduction
  Audience
Current challenges
  From the server perspective
  From the SAN fabric perspective
N_Port ID Virtualization
N_Port Virtualizer
NPIV-based LUN access
NPIV and QoS (VMware-specific implementation)
PowerPath changes for NPIV (AIX-specific)
NPIV and performance
Conclusion
References
Executive summary

Fibre Channel is a flexible standard based on a networking architecture that can be used as a transport mechanism for a number of upper-level protocols, the most common being SCSI and TCP/IP. Fibre Channel is a serial, full-duplex protocol with sophisticated flow control that allows it to be extended over long distances.

One of the most remarkable Fibre Channel evolutions is the storage area network (SAN). In SANs, Fibre Channel has become the industry's de facto fast-switching standard for connecting client computers and servers to highly scalable volumes of data. It also provides improved management and control, better viewing and reporting, fault tolerance, reduced downtime, and better efficiency to data centers.

N_Port ID Virtualization (NPIV) is a technical capability to dynamically increase Fibre Channel HBA port virtualization. This technology is gaining importance in the storage virtualization domain as data center administrators recognize the value of NPIV-based solutions in deployment scenarios such as:

- VMware environments, where the number of virtual machines can grow with business requirements and host resources
- Virtual machines running on blade servers
- Environments with increased fabric port requirements

Introduction

This white paper provides technical insights into NPIV-based solutions for the deployment challenges mentioned above, including the solution implemented in EMC PowerPath for the AIX platform.

Audience

This white paper is intended for technology professionals, data center system administrators, EMC and non-EMC technical staff, and EMC customers, and provides a consolidated study of NPIV and its features.

Current challenges

From the server perspective

The current trend in data center design is server virtualization, or the use of virtual machine (VM) technology to prevent the proliferation of physical servers.
All virtual machines running on a physical server share the same physical I/O connections. The virtual machine monitor, or hypervisor, blends individual VM disk I/Os before sending them to the SAN, which introduces potential bandwidth contention problems and quality-of-service issues for applications running in individual virtual machines. Also, the current set of tools used by storage administrators to monitor, troubleshoot, and secure the SAN loses application-level visibility, since all I/Os originate from the same physical HBA.

In a non-virtual environment, a typical SAN practice is to create a zone when assigning a storage logical unit number (LUN) to a server. A zone permits only one particular server to access that LUN. This is done by assigning the World Wide Name (WWN) of the server's SAN host bus adapter (HBA) to that LUN. Since each HBA has its own unique identifier, or WWN, this allows secure access to that LUN as well as a customizable quality of service (QoS) for the application.

This best practice was broken by server virtualization. As mentioned, each zone is assigned to a WWN, but each virtualization host may support multiple virtual machines. Each virtual machine shares access to the server's HBA through the hypervisor and, as a result, presents the same WWN identification to the LUN. Without a mechanism to identify the individual virtual machines to the SAN, there is no way to track their use of SAN resources or to make sure they do not conflict over those resources.
Another challenge that server virtualization brings to SAN storage is live migration: the ability to move a virtual machine from one virtualized server to another. Administrators need to remember to include the second host's WWN in the zoning scheme; otherwise, after migration to the second host, the virtual machine cannot see its storage because the SAN fabric will block access from an HBA with an unauthorized WWN.

One way to solve these issues is to dedicate physical HBAs to each virtual machine, rather than having the hypervisor manage virtual HBAs. But dedicating HBAs to each virtual machine is expensive and does not deliver much additional value for the investment. The inclusion of multiple physical HBAs for VMs would also require more physical ports (N_Ports) in the SAN fabric, resulting in a larger SAN fabric requirement.

From the SAN fabric perspective

With the increasing use of blade servers in SAN environments, the deployment of aggregation switches is becoming more widespread. One major concern when designing and building Fibre Channel based SANs is the total number of switches, or domains, that can exist in a physical fabric. As the edge switch population grows, the number of domain IDs becomes a concern. The domain is the address of a physical switch or logical virtual fabric; the domain ID is the most significant byte of an endpoint Fibre Channel ID (FCID) (Figure 1).

Figure 1. Fibre Channel ID (FCID)

The switch uses the FCID to route frames from a given source (initiator) to any destination (target) in the SAN fabric. The domain byte allows up to 256 possible addresses, of which the Fibre Channel standard permits a total of 239 to be used as domain IDs. However, having more and more domain IDs adds complexity to managing the fabric and impacts performance because of the large number of inter-switch connections. Another design concern is interoperability with third-party switches.
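The three-byte FCID layout described above can be sketched in a few lines. This is an illustrative helper, not vendor code; the field names follow the common Domain/Area/Port convention.

```python
# Illustrative sketch: decompose a 24-bit Fibre Channel ID (FCID) into
# its three bytes. The most significant byte is the switch domain ID,
# which is why the domain ID count limits fabric size.

def parse_fcid(fcid: int) -> dict:
    """Split a 24-bit FCID into domain, area, and port fields."""
    if not 0 <= fcid <= 0xFFFFFF:
        raise ValueError("FCID must fit in 24 bits")
    return {
        "domain": (fcid >> 16) & 0xFF,  # most significant byte: switch domain
        "area":   (fcid >> 8) & 0xFF,   # typically identifies a port group
        "port":   fcid & 0xFF,          # device (or NPIV-assigned) address
    }

# Example: FCID 0x6F0A01 belongs to domain 0x6F (111 decimal).
print(parse_fcid(0x6F0A01))  # {'domain': 111, 'area': 10, 'port': 1}
```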
Different SAN fabric vendors interpret the Fibre Channel addressing standard differently. Also, some vendor-specific attributes used for switch-to-switch connectivity (expansion port, or E_Port, connectivity) made connections among different vendors' switches challenging, leading customers to implement edge switch technology that matched the core director type in the fabric.

To address these concerns, two features were developed: N_Port ID Virtualization and N_Port Virtualizer.

N_Port ID Virtualization

N_Port ID Virtualization (NPIV) is an ANSI T11 standard that describes how a single Fibre Channel HBA port (a single N_Port with a single FCID) can register several World Wide Port Names (WWPNs), or multiple N_Port IDs, with the SAN fabric. This allows a fabric-attached N_Port to claim multiple fabric addresses, each of which appears as a unique entity on the Fibre Channel fabric. In other words, NPIV-capable HBAs can provide multiple WWPNs rather than registering a single WWPN in the fabric. This is beneficial in two ways:

- In a virtual machine environment, each VM can have separate WWPNs, so the hypervisor is relieved of the I/O blending operation.
- In a virtual machine environment where many guest operating systems or applications run on a physical host, each virtual machine can now be managed independently from zoning, aliasing, and security perspectives.

Also, no extra physical ports need to be connected in the SAN fabric, so the addition of more edge switches is not required. Figure 2 shows an example of an NPIV-aware host connection. In the figure, the NPIV-capable SAN is a combination of NPIV-capable HBAs and NPIV-capable switches.
Figure 2. NPIV-aware server host connection

An HBA that supports the NPIV feature follows the standard login process. The initial connection and login to the fabric are performed through the standard fabric login (FLOGI) process. All subsequent logins, whether for virtual machines or for logical partitions (LPARs) on a mainframe, are transformed into Fabric Discovery (FDISC) login commands. The FDISC logins follow the same standard process and acquire additional addresses. Figure 3 steps through the login process of an NPIV uplink and the local logins to the NPIV-enabled adapter.

Figure 3. NPIV login process
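The login sequence above can be sketched as a small simulation: the first login on a physical port is a FLOGI, and every additional virtual N_Port ID is acquired with an FDISC. The class and WWPN values here are illustrative, not a real fabric API.

```python
# Conceptual sketch of the NPIV login sequence: a FLOGI for the physical
# port, then FDISC logins that acquire additional N_Port IDs on the same
# physical link. Names and WWPNs are illustrative.

class FabricPort:
    """Simulated NPIV-capable fabric F_Port handing out N_Port IDs."""
    def __init__(self, domain: int, area: int):
        self.domain, self.area = domain, area
        self.next_port = 0
        self.logins = []  # list of (command, wwpn, fcid)

    def login(self, wwpn: str) -> int:
        command = "FLOGI" if not self.logins else "FDISC"
        fcid = (self.domain << 16) | (self.area << 8) | self.next_port
        self.next_port += 1
        self.logins.append((command, wwpn, fcid))
        return fcid

fport = FabricPort(domain=0x6F, area=0x0A)
fport.login("50:01:43:80:05:6c:22:ae")      # physical HBA -> FLOGI
fport.login("c0:50:76:00:0a:fe:00:12")      # VM 1 virtual WWPN -> FDISC
fport.login("c0:50:76:00:0a:fe:00:14")      # VM 2 virtual WWPN -> FDISC
print([cmd for cmd, _, _ in fport.logins])  # ['FLOGI', 'FDISC', 'FDISC']
```

Each login receives its own FCID in the same domain, which is exactly what lets the fabric zone and monitor each virtual WWPN independently.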
N_Port Virtualizer

An extension of NPIV is the N_Port Virtualizer (NPV) feature, which allows an edge switch or end fabric device to behave like an NPIV-based HBA toward the core Fibre Channel director (Figure 4). The device aggregates the locally connected host ports, or N_Ports, into one or more uplinks (pseudo inter-switch links) to the core switches. The login process for the N_Port uplink is the same as for an NPIV-enabled HBA; the only requirement is that the core director support the NPIV feature.

As end devices log in to the NPV-enabled edge switch, the FCID addresses they are assigned use the domain of the core director. Because the connection is treated as an N_Port rather than an E_Port by the core director, the edge switch shares the domain ID of the core switch as FCIDs are allocated. The NPV-enabled edge switch no longer requires a separate domain ID to gain connectivity to the fabric; therefore, the consumption of additional domain IDs by edge switches can be eliminated using NPV.

Figure 4. An N_Port Virtualizer-enabled edge switch behaves like an HBA to the core switch

NPIV-based LUN access

NPIV enables a single FC HBA port to register several unique WWNs with the fabric, each of which can be assigned to an individual virtual machine. When a virtual machine has a WWN assigned to it, the virtual machine's configuration is updated to include a WWN pair, consisting of a World Wide Port Name (WWPN) and a World Wide Node Name (WWNN). When that virtual machine is powered on, the VMkernel instantiates a Virtual Adapter Port (VPORT) on the physical HBA that is used to access the LUN. The VPORT is a virtual HBA that appears to the FC fabric as a physical HBA; that is, it has its own unique identifier, the WWN pair that was assigned to the virtual machine. Each VPORT is specific to its virtual machine; when the virtual machine is powered off, the VPORT is destroyed on the host and no longer appears to the FC fabric.
If NPIV is enabled, four WWN pairs (WWPN and WWNN) are specified for each virtual machine at creation time. When a virtual machine using NPIV is powered on, it uses each of these WWN pairs in sequence to try to discover an access path to the storage. The number of VPORTs that are instantiated equals the number of physical HBAs present on the host, up to a maximum of four. A VPORT is created on each physical HBA on which a physical path is found, and each physical path is used to determine the virtual path that will be used to access the LUN. Note that HBAs that are not NPIV-aware are skipped in this discovery process, because VPORTs cannot be instantiated on them.

In Figure 5, two IBM mainframe LPARs share a single physical FCP port. Each instance registers with the name server; the NPIV WWPN is supported in the FDISC process.
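The VPORT discovery rule above can be sketched directly: consume the VM's WWN pairs in order, create a VPORT on each NPIV-capable HBA, skip HBAs that are not NPIV-aware, and cap the total at four. Function and field names are assumptions for illustration, not the VMkernel's API.

```python
# Illustrative sketch of VPORT instantiation: one VPORT per NPIV-capable
# physical HBA, pairing HBAs with the VM's WWN pairs in order, up to
# four. Non-NPIV-aware HBAs are skipped during discovery.

def instantiate_vports(wwn_pairs, hbas):
    """Return (hba_name, wwn_pair) assignments for a powering-on VM."""
    npiv_hbas = [h for h in hbas if h["npiv_capable"]]
    # zip stops at the shorter list, so a host with fewer NPIV-capable
    # HBAs than WWN pairs simply instantiates fewer VPORTs.
    return [(hba["name"], pair)
            for hba, pair in zip(npiv_hbas[:4], wwn_pairs[:4])]

wwn_pairs = [("wwpn%d" % i, "wwnn%d" % i) for i in range(4)]
hbas = [
    {"name": "fcs0", "npiv_capable": True},
    {"name": "fcs1", "npiv_capable": False},  # skipped: not NPIV-aware
    {"name": "fcs2", "npiv_capable": True},
]
print(instantiate_vports(wwn_pairs, hbas))
# Only fcs0 and fcs2 receive a VPORT; fcs1 is skipped.
```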
Figure 5. NPIV provides unique WWPNs to servers sharing an FCP port in a z/VM mainframe

In a z/VM mainframe, during Power On Reset (POR) or dynamic I/O activation, each FCP subchannel is assigned a WWPN by the Support Element (SE), regardless of whether the LPAR is NPIV-enabled. If the LPAR is not enabled for NPIV, the microcode does not use the NPIV WWPNs. The SE retains the assigned WWPN information on its hard drive, to prevent the data from being lost if the system is shut down or the FCP adapter is replaced. Each LPAR receives a different N_Port ID, which allows multiple LPARs or VM guests to read and write to the same LUN using the same physical port. Without NPIV, writing to the same LUN over a shared port is not allowed. The Virtual FC adapter feature makes use of NPIV.

NPIV and QoS (VMware-specific implementation)

NPIV becomes truly valuable when it is used in conjunction with storage QoS capabilities like those that Brocade and other vendors provide in an end-to-end configuration. NPIV support in VMware extends the benefits of Brocade Adaptive Networking Services to each individual VM rather than to the physical server running the VM. Cisco also plays a significant role in providing NPIV-based solutions for the SAN fabric, such as N_Port Virtualizer, developing solutions like Fabric Port (F_Port) Trunking, and integrating NPV with Cisco VSAN-based environments.

Using NPIV to optimize server virtualization gives an administrator another layer of control, allowing system administrators to more completely understand and provide QoS to the application by specifying the QoS with NPIV.
Figure 6. NPIV and QoS in a Brocade-based implementation

PowerPath changes for NPIV (AIX-specific)

PowerPath is host-based software that provides path management. PowerPath operates with several storage systems, on several operating systems, with Fibre Channel and iSCSI data channels. PowerPath supports multiple paths to a LUN, enabling it to provide:

- Automatic failover in the event of a hardware failure. PowerPath automatically detects path failure and redirects I/O to the available path(s). PowerPath also performs periodic path health checks and automatically restores a path when it recovers.
- Dynamic multipath load balancing. PowerPath distributes I/O requests for a logical device across all available paths, improving I/O performance and reducing management time and downtime by eliminating the need to configure paths statically across logical LUNs.

Typically, the AIX disk driver establishes a reserve on a LUN and manages that reserve, based on user-settable attributes for a physical volume, when the volume is opened. EMC PowerPath manages the reserve through its own proprietary commands. The AIX disk driver does not inspect these commands, nor does it need to understand the semantics of the vendor-specific reserve command; the PowerPath reserve does not change the state of the device from the perspective of the AIX disk driver. This completely decouples the EMC PowerPath management of reserves on logical units from the AIX disk driver.

In an NPIV LPAR Partition Mobility solution, the client I/O stack manages the reserve during the migration. This means that, depending on the type of SCSI RESERVE command issued, a specific action needs to be taken to break and/or re-establish the reserve on the LUN as part of the migration. The AIX disk driver does this if it is managing the reserves on behalf of the initiator; in the case of PowerPath, the AIX disk driver does not manage the reserves.
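The failover and load-balancing behaviors listed at the start of this section can be sketched conceptually: pick the least-loaded live path for each I/O, and redirect around paths that a health check has marked dead. This is a minimal model of the idea, not PowerPath code; all names are illustrative.

```python
# Conceptual sketch (not PowerPath code) of multipath behavior:
# least-loaded path selection for load balancing, plus failover that
# redirects I/O when a periodic health check marks a path dead.

class MultipathDevice:
    def __init__(self, paths):
        self.outstanding = {p: 0 for p in paths}  # in-flight I/Os per path
        self.alive = {p: True for p in paths}

    def pick_path(self) -> str:
        live = [p for p in self.outstanding if self.alive[p]]
        if not live:
            raise IOError("all paths to the LUN are dead")
        # Load balancing: choose the path with the fewest in-flight I/Os.
        return min(live, key=lambda p: self.outstanding[p])

    def fail_path(self, path):      # health check detects a failure...
        self.alive[path] = False

    def restore_path(self, path):   # ...and restores the path on recovery
        self.alive[path] = True

dev = MultipathDevice(["path0", "path1"])
dev.fail_path("path0")
print(dev.pick_path())  # I/O is redirected to the surviving path
```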
The AIX kernel provides a kernel service that allows vendor kernel extensions to act before and after a Partition Mobility migration. The kernel extension calls reconfig_register_ext to register a function with the AIX kernel, which the kernel then calls on specific events. The kernel calls back into the kernel extension synchronously with respect to the registered events, so a particular stage of the LPAR migration cannot proceed until the registered function completes.
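The registration pattern above can be illustrated with a conceptual analogue: handlers are registered against migration phases and invoked synchronously, so a phase blocks until every handler returns. This is not the AIX reconfig_register_ext interface itself; the class, event names, and reserve actions are illustrative.

```python
# Conceptual Python analogue of the synchronous event-registration
# pattern described above: a phase of the migration cannot proceed
# until every registered handler has completed. Names are illustrative.

class MigrationEvents:
    def __init__(self):
        self.handlers = {"pre_migrate": [], "post_migrate": []}

    def register(self, event, handler):
        self.handlers[event].append(handler)

    def fire(self, event):
        # Synchronous dispatch: the caller blocks until all handlers return.
        for handler in self.handlers[event]:
            handler()

log = []
events = MigrationEvents()
events.register("pre_migrate", lambda: log.append("release SCSI reserve"))
events.register("post_migrate", lambda: log.append("re-establish reserve"))
events.fire("pre_migrate")   # migration phase begins on the source host
events.fire("post_migrate")  # ...and completes on the target host
print(log)  # ['release SCSI reserve', 're-establish reserve']
```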
NPIV and performance

Because NPIV associates multiple independent data channels with a single physical port, it is worth discussing how NPIV handles performance. NPIV-capable HBAs optimize performance through their ability to interleave Fibre Channel data transfers at the frame level. To illustrate why frame-level multiplexing has such an impact, let's start with the basics of a Fibre Channel communication exchange.

An I/O transaction in Fibre Channel is called an exchange. Exchanges contain one or more sequences, which in turn contain one or more frames, as shown in Figure 7. Frames can be 512, 1,024, or 2,048 bytes in length, but 2,048 is used almost universally. Think of the frame as a word, the sequence as a phrase, and the exchange as an entire conversation.

Figure 7. Fibre Channel I/O exchange

Frame interleaving allows the frames of one sequence to be inserted between the frames of another instead of having to wait for the end of the conversation. The difference between exchange-level and frame-level multiplexing is illustrated in Figure 8. A data transfer conversation begins on the far left (Exchange 0) and is broken into three frames (frames 0, 1, and 2). A second conversation (Exchange 1) begins shortly after the first. With the traditional exchange interleaving method, the first frame of the second conversation cannot be transferred until the first conversation (Exchange 0) is complete. With frame-level interleaving, the second conversation (Exchange 1) begins earlier and is interleaved with the first conversation. As a result, the second conversation begins transferring data and completes sooner, which translates into more efficient, more reliable data transfer and improved performance.

Figure 8. Exchange vs. frame interleaving
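The timing difference in Figure 8 can be captured with a toy model: if each frame occupies one time slot, Exchange 0 has three frames, and Exchange 1 arrives at slot 1, exchange-level interleaving delays Exchange 1 until Exchange 0 finishes, while frame-level interleaving lets it start immediately. The function and slot model are illustrative assumptions.

```python
# Toy timing model of Figure 8: when can Exchange 1's first frame start?
# One frame per time slot; Exchange 0 occupies slots 0..n-1.

def first_start(exchange0_frames: int, arrival: int, frame_level: bool) -> int:
    if frame_level:
        # Frame interleaving: frames may be inserted between Exchange 0's
        # frames, so Exchange 1 starts as soon as it arrives.
        return arrival
    # Exchange interleaving: wait for the whole first conversation.
    return max(arrival, exchange0_frames)

print(first_start(3, 1, frame_level=False))  # 3: waits for Exchange 0
print(first_start(3, 1, frame_level=True))   # 1: interleaved immediately
```

The earlier start is exactly why the second conversation also completes sooner in the frame-level case.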
Conclusion

Server virtualization technology has matured in recent years and is being adopted by a growing number of IT managers looking to reduce hardware and management costs through server consolidation. NPIV increases the security of virtual servers by enabling secure access to shared Fibre Channel storage using the zoning and LUN masking techniques familiar to SAN administrators. NPIV also reduces cost and complexity.

Recapping the benefits of NPIV:

- LUN optimization through VM-to-LUN assignment
- Fabric QoS and prioritization at the VM level
- NPIV-capable initiator zoning at the VM level, relieving the hypervisor of the I/O blending operation
- Array-level LUN masking to control LUN access on a per-VM basis
- Accelerated VM migration
- VSAN integration and routing

Further enhancements to NPIV are in progress; for example, Cisco is developing solutions for F_Port Trunking and F_Port Channeling.

References

- NPIV entry on Wikipedia: http://en.wikipedia.org/wiki/npiv
- NPIV Functionality Protocol: ftp://ftp.t11.org/t11/member/fc/da/02-340v1.pdf
- T11 draft standards page: http://www.t11.org/t11/docreg.nsf/draftlr
- NPIV in the Data Center white paper: http://www.ciscosystems.net.mu/en/us/prod/collateral/ps4159/ps6409/ps5989/ps9898/white_paper_c11-459263.html
- PowerVM Virtualization on IBM System p: Introduction and Configuration: www.redbooks.ibm.com
- Deployment Guide: Emulex Virtual HBA Solutions and VMware vSphere 4: www.vmware.com/technology
- Storage Networking Industry Association website: http://www.snia.org
- ESG Lab Review: Emulex Optimized Server Virtualization: http://www.emulex.com/artifacts/69020029-a482-4ca9-bc1e-d612315cacdd/esg_server_virt.pdf