iSCSI: Accelerating the Transition to Network Storage
David Dale
April 2003
TR-3241
WHITE PAPER

Network Appliance technology and expertise solve a wide range of data storage challenges for organizations, adding business value and enabling them to create and sustain a competitive advantage.

Table of Contents

INTRODUCTION
STORAGE TODAY
    DIRECT-ATTACHED STORAGE (DAS)
    Storage Area Networks (SANs)
    NETWORK-ATTACHED STORAGE (NAS)
THE NEED FOR IP STORAGE
WHAT IS GIGABIT iSCSI?
BENEFITS
WHO CAN USE IT?
DEPLOYMENT EXAMPLES
    Windows Storage Consolidation via iSCSI
    Cost-Effective Storage Consolidation for Linux
    Centralized SAN/NAS Data Management for Divisional Workgroups
CONCLUSION
iSCSI PRODUCTS
ABOUT NETWORK APPLIANCE
FOR MORE INFORMATION

Introduction

Over the past decade, the world has seen enormous increases in the amount of data storage required, driven by a pervasive global economy, e-commerce, e-mail, and the digitization of media and information. Researchers estimate that there are now several million terabytes of online information, an amount that continues to grow exponentially with every e-mail message sent, every Web and MP3 file posted, and every online transaction executed. This growth places tremendous pressure on enterprises to store, protect, distribute, and derive value from all that data. Thus, IT executives are increasingly looking to new data storage solutions to manage their growing storage needs.

Storage Today

Today's technology market offers two broad options for storing data: direct-attached storage (DAS) and networked storage. Networked storage in turn consists of two suboptions: storage area networks (SANs) and network-attached storage (NAS). Today, more than one-third of storage sold is networked storage, a fraction that analysts expect to grow to two-thirds over the next few years. Networked storage architectures separate servers from storage and can offload much of the data management from the server. To understand the anticipated shift from DAS to networked storage, it is necessary to first discuss the different architectures in more detail.

Direct-Attached Storage

In its simplest form, direct-attached storage consists of a disk drive attached directly to a server. Data is transferred using SCSI (Small Computer System Interface) commands, the most common means of I/O communication between a computer and a hard drive. In most enterprise implementations, however, the storage is a shelf of disks, a disk subsystem with integrated RAID protection, or a higher-end storage array providing additional data protection. All applications work transparently with direct-attached storage.
However, to insulate applications from the complexity of dealing directly with the disk drives, the server operating system usually provides some level of virtualization and incurs the overhead of all the data management. Virtualization is typically provided through a volume manager, and the data management is usually provided through a local file system.

Direct-attached storage works well in environments with a single server or a limited number of servers, but if there are dozens of servers and significant data growth, the situation rapidly becomes unmanageable. The storage for each server has to be managed separately, and it can't be shared. Performance is often limited, scalability is limited, and the overall efficiency of storage resource allocation tends to be low. The data management needs of today's enterprise IT environments are typically much better served by a networked storage approach.

Storage Area Networks

In a storage area network, a number of servers have shared access to storage. The servers are connected via host bus adapters (HBAs) to a Fibre Channel switch, which in turn is connected to the Fibre Channel storage system(s). The servers and storage communicate via the Fibre Channel protocol suite, which allows SCSI commands to be transmitted via serial connections. The protocol allows for high throughput, transmitting data at 700 to 800 Mb/sec in first-generation products and approximately twice that in the second-generation products shipping today.
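
To put the Fibre Channel throughput figures above in perspective, the following back-of-the-envelope sketch converts them to megabytes per second and estimates how long a bulk transfer would take. It assumes that "Mb" means megabits, that 1 MB is 10^6 bytes, and that the link sustains its rated throughput with no protocol or disk overhead. The 1 TB data set, the 1600 Mb/sec second-generation figure (simply twice the upper first-generation figure), and the function names are illustrative assumptions, not values from any product specification.

```python
def mbit_to_mbyte_per_sec(mbit_per_sec):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_sec / 8.0


def transfer_hours(data_bytes, mbit_per_sec):
    """Idealized hours needed to move data_bytes at a sustained link rate."""
    bytes_per_sec = mbit_to_mbyte_per_sec(mbit_per_sec) * 1_000_000
    return data_bytes / bytes_per_sec / 3600.0


ONE_TERABYTE = 10**12  # illustrative data set size (assumption)

if __name__ == "__main__":
    # First-generation range quoted in the text, plus an assumed doubling
    rates = [
        ("first generation, low", 700),
        ("first generation, high", 800),
        ("second generation, assumed", 1600),
    ]
    for label, rate in rates:
        print(f"{label}: {mbit_to_mbyte_per_sec(rate):.1f} MB/sec, "
              f"{transfer_hours(ONE_TERABYTE, rate):.2f} hours per TB")
```

Under these assumptions, the first-generation range works out to 87.5 to 100 MB/sec. Real transfers would be slower once protocol overhead and disk performance are taken into account.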

Since the storage system needs to allow multiple servers to share access to the disk pool, a set of virtual disks, identified by logical unit numbers (LUNs), is presented to each server. Consequently, the storage systems in this environment are smarter and more complex than a simple shelf of disk drives with a RAID controller. However, as in the case of direct-attached storage, each server has to incur the overhead of running a volume manager and file system to provide data management services to its applications, and these are often different for each server architecture. Consequently, although a SAN provides users with resource sharing, it does not provide shared data: each server's file system only knows about its own LUNs.

SAN storage has considerable advantages over direct-attached storage through improved scalability, reliability, availability, and performance. However, cost and complexity issues have limited SAN deployment to mission-critical and high-performance applications, typically in glass-house data center environments.

Total Cost of Ownership (TCO)

The total cost of ownership (TCO) for operating a Fibre Channel SAN, while lower than the DAS model, is still relatively high. Although SANs benefit from the efficiencies of centralized management (which reduces TCO), the deployment cost for Fibre Channel is high. In particular, a shortage of specialized operations staff has historically been a problem.

Complexity and SAN Islands

Another issue with SANs stems from Fibre Channel being a serial interconnect, not a network. Fibre Channel has no built-in capabilities for routing or node failover, and it has only primitive address management capabilities. In a Fibre Channel SAN, these capabilities have to be operator-configured and host-managed. The result is a complex environment. This complexity has led to interoperability issues in multivendor Fibre Channel deployments.
While interoperability issues have diminished, most SANs today are still single-vendor storage environments.
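
The distinction drawn in this section between resource sharing and data sharing can be illustrated with a toy model. The sketch below is purely hypothetical: the class, method, and server names and the sizes are invented for illustration and do not correspond to any vendor's management interface. It shows a shared disk pool carved into LUNs, each presented to exactly one server, so that the servers share capacity but cannot see each other's data.

```python
class DiskPool:
    """Toy model of a SAN storage system that carves LUNs from a shared pool."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.luns = {}  # LUN id -> (owning server, size in GB)

    def free_gb(self):
        """Capacity not yet allocated to any LUN."""
        return self.capacity_gb - sum(size for _, size in self.luns.values())

    def create_lun(self, lun_id, server, size_gb):
        """Allocate a LUN from the pool and present it to a single server."""
        if lun_id in self.luns:
            raise ValueError(f"LUN {lun_id} already exists")
        if size_gb > self.free_gb():
            raise ValueError("not enough free capacity in the pool")
        self.luns[lun_id] = (server, size_gb)

    def visible_luns(self, server):
        """Return the LUN ids that a given server is allowed to see."""
        return sorted(lun for lun, (owner, _) in self.luns.items()
                      if owner == server)


if __name__ == "__main__":
    pool = DiskPool(capacity_gb=1000)          # hypothetical 1 TB pool
    pool.create_lun(0, "mail-server", 400)     # hypothetical server names
    pool.create_lun(1, "db-server", 300)
    print(pool.visible_luns("mail-server"))    # only its own LUN
    print(pool.visible_luns("db-server"))      # only its own LUN
    print(pool.free_gb())                      # capacity left for either server
```

Both servers draw on the same pool (resource sharing), but each one's volume manager and file system see only the LUNs presented to it, which is why a SAN by itself does not provide shared data.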

Network-Attached Storage

Network-attached storage, the most mature networked storage solution, was originally designed for data sharing in a LAN environment. This is accomplished by incorporating volume manager and file system capabilities into the storage device. In a NAS environment, the servers are connected to the storage by a standard Ethernet network, and they use file access protocols like NFS and CIFS to make storage requests. Local file system calls from the clients are redirected to the NAS device, providing shared file storage for all clients. If the clients are desktop systems, the NAS device provides serverless file serving. If the clients are server systems, the NAS device offloads the data management overhead from the servers.

NAS devices deliver the lowest total cost of ownership of any storage approach, but historically, they have been viewed as low capacity and low performance when compared with SAN storage. More recently, however, the advent of Gigabit Ethernet and enterprise-capable NAS appliances has greatly accelerated the deployment of NAS solutions for enterprise applications in data center environments, particularly when ease of use is an important issue.

However, since some enterprise applications are architected to view storage as a local disk, NAS is not suitable for all applications. In a typical corporate IT environment, you'll find applications that require SAN storage and applications that require NAS storage. The industry is starting to respond to this with unified storage systems, which can support both SAN and NAS protocols, and Fibre Channel and Ethernet transports. The Network Appliance FAS900 series was the first of this new breed of storage systems to be shipped.

The Need for IP Storage

Networked storage is a mature and well-understood technology.
Many analysts believe that the main issues slowing the transition from direct-attached storage solutions to networked storage solutions are the cost and complexity associated with Fibre Channel. However, a rapidly emerging new technology promises to address these issues: Internet SCSI (iSCSI) or SCSI over IP.

What Is Gigabit iSCSI?

Internet SCSI (iSCSI) is a new IETF standard protocol for encapsulating SCSI commands into TCP/IP packets and enabling block data transport over IP networks. iSCSI can be used to build SANs using standard Gigabit Ethernet infrastructure. An iSCSI HBA, or storage NIC, connects storage resources over Ethernet. As a result, core transport layers can be managed using existing network management applications. High-level management activities of the iSCSI protocol such as permissions, device information, and configuration are layered over or built into these applications. For this reason, the deployment of robust, interoperable, enterprise data management solutions for iSCSI devices is expected to occur quickly.

First-generation iSCSI HBA performance is well suited for the workgroup or departmental storage requirements of medium- and large-sized businesses. The availability of TCP/IP offload engines further improves the performance of iSCSI at 1 Gigabit Ethernet and will allow vendors to scale to 10 Gigabit Ethernet iSCSI in 2004 or 2005.
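
The encapsulation described in this section can be sketched in a few lines. The example below builds a SCSI READ(10) command descriptor block (CDB) and wraps it in a 48-byte iSCSI header for a SCSI Command PDU. It is a simplified illustration, not a working initiator: the field layout is based on the IETF iSCSI specification (RFC 3720) and should be checked against that document before being relied on, the function names are invented here, and the 512-byte block size, LUN 0, and sequence numbers are assumptions. Login, digests, and error handling are omitted entirely.

```python
import struct

ISCSI_OP_SCSI_COMMAND = 0x01  # iSCSI opcode for a SCSI Command PDU
SCSI_OP_READ_10 = 0x28        # SCSI READ(10) operation code
FLAG_FINAL = 0x80             # F bit: last PDU of this command
FLAG_READ = 0x40              # R bit: data flows from target to initiator
BLOCK_SIZE = 512              # assumed logical block size in bytes


def build_read10_cdb(lba, num_blocks):
    """Build a 10-byte SCSI READ(10) command descriptor block."""
    # opcode, flags, 32-bit LBA, group number, 16-bit block count, control
    return struct.pack(">BBIBHB", SCSI_OP_READ_10, 0, lba, 0, num_blocks, 0)


def build_scsi_command_header(cdb, lun, task_tag, expected_bytes,
                              cmd_sn, exp_stat_sn):
    """Wrap a CDB in a 48-byte iSCSI header for a read command."""
    if len(cdb) > 16:
        raise ValueError("CDBs over 16 bytes need an additional header segment")
    if not 0 <= lun < 256:
        raise ValueError("this sketch only encodes single-level LUNs below 256")
    lun_field = bytes([0, lun]) + bytes(6)   # simplified 8-byte LUN encoding
    return struct.pack(
        ">BBHB3s8sIIII16s",
        ISCSI_OP_SCSI_COMMAND,
        FLAG_FINAL | FLAG_READ,
        0,                       # reserved
        0,                       # TotalAHSLength: no additional headers
        b"\x00\x00\x00",         # DataSegmentLength: a read carries no data out
        lun_field,
        task_tag,                # initiator task tag
        expected_bytes,          # expected data transfer length
        cmd_sn,                  # command sequence number
        exp_stat_sn,             # expected status sequence number
        cdb.ljust(16, b"\x00"),  # CDB padded to 16 bytes
    )


if __name__ == "__main__":
    cdb = build_read10_cdb(lba=2048, num_blocks=8)
    header = build_scsi_command_header(
        cdb, lun=0, task_tag=1, expected_bytes=8 * BLOCK_SIZE,
        cmd_sn=1, exp_stat_sn=1)
    print(len(header), header.hex())
```

In a real deployment these bytes would travel over an established TCP connection to the storage system, so the underlying Ethernet and IP layers need no storage-specific equipment.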

Benefits

By combining SCSI, Ethernet, and TCP/IP, Gigabit iSCSI delivers these key advantages:

- Builds on stable and familiar standards: many IT staffs are already familiar with the component technologies
- Creates a SAN with a reduced TCO: installation and maintenance costs are low
- Provides a high degree of interoperability: reduces disparate networks and cabling, and uses regular Ethernet switches instead of special Fibre Channel switches
- Delivers a solution with no practical distance limitations, since IP datagrams can travel over the global IP network

Who Can Use It?

iSCSI SANs will initially be most attractive to organizations with the following characteristics:

- Distributed IT environment
- Significant data growth over the past five years
- Proliferation of Intel architecture servers in divisional, departmental, and workgroup environments
- Business requirement to consolidate data storage and management for these environments, to improve operational efficiency, data availability, and storage resource management
- Budget and staffing limitations, which preclude a Fibre Channel SAN deployment

Deployment Examples

Windows Storage Consolidation via iSCSI

Server proliferation is causing storage management in many distributed enterprise environments to become complex and expensive, particularly for applications experiencing significant data growth, such as Microsoft Exchange. The configuration in Figure 2 deploys centralized network storage for the Windows servers to deliver significant savings in TCO, and greatly improved data availability and recoverability. By adding iSCSI HBAs to the servers, the existing network infrastructure can be used to connect them to the native iSCSI enterprise storage system.

Figure 2

Cost-Effective Storage Consolidation for Linux Servers

Many organizations are considering compute farms consisting of dozens of rack-mounted servers running Linux to significantly reduce the cost of running their analytical or compute-intensive applications. Direct-attached storage in this environment often makes data management prohibitively complex, and traditional SANs are too costly. However, an iSCSI-based SAN, as shown in Figure 3, in combination with iSCSI-enabled enterprise storage, provides a high-performance solution that solves the complexity problem at an affordable price.

Figure 3

Centralized SAN/NAS Data Management for Divisional Workgroups

Many companies need to support their departmental and regional data centers with minimal staff. However, the data management needs of the servers often make this impossible. Figure 4 shows how iSCSI can solve this problem: centralize the storage with an easy-to-use storage appliance, and use standard Ethernet infrastructure to connect the servers to the storage. The storage offloads data management complexity from the servers using the existing network infrastructure.

Figure 4

Conclusion

Organizations with server proliferation and data growth problems in departmental and workgroup data centers will be the first to benefit from the introduction of IP storage and iSCSI, as they replace direct-attached storage and accelerate the transition to networked storage. As the technology matures and performance increases, iSCSI-based storage solutions will gradually expand to displace direct-attached storage in high-end data centers and mission-critical environments, making iSCSI SANs ubiquitous.

iSCSI Products

The iSCSI protocol standard was ratified by the IETF in February 2003, and iSCSI products are now available from a number of vendors, including Network Appliance and Intel. This, and the availability of iSCSI software from the leading software vendors, will rapidly drive iSCSI to become a mainstream, broadly deployed solution.

About Network Appliance

Network Appliance is the industry leader in Ethernet-connected storage solutions. In October 2002, NetApp released the FAS900 series storage systems, which can support both SAN and NAS environments. In February 2003, NetApp announced iSCSI support for its F800 and its FAS900 series filers, becoming the first vendor to converge NAS and iSCSI. NetApp's iSCSI solutions use Intel's PRO/1000 T IP Storage Adapter in server-attach kits for its new native iSCSI storage systems, and both companies intend to ensure broad interoperability through participation in the Storage Networking Industry Association (SNIA) and interoperability plugfests.

For More Information

Network Appliance Storage Solutions: http://www.netapp.com
Intel PRO/1000 T IP Storage Adapter: http://www.intel.com/network/connectivity/products/iscsi/
The Storage Networking Industry Association (SNIA): http://www.snia.org
The Internet Engineering Task Force (IETF): http://www.ietf.org

Network Appliance, Inc. Proprietary. © 2003 Network Appliance, Inc. All rights reserved.
NetApp and the Network Appliance logo are registered trademarks and Network Appliance and the evolution of storage are trademarks of Network Appliance, Inc., and/or its affiliates in the U.S. and certain other countries. Microsoft and Windows are registered trademarks of Microsoft Corporation. Linux is a registered trademark of Linus Torvalds. Intel is a registered trademark of Intel Corporation. All other brands, names, or trademarks mentioned in this document or Web site are the property of their respective owners. TR3241 Rev. 3/03