SANBlaze Technology, Inc.

Reference Architectures: Flash Storage and VDI Best Practice
Multi-vendor RTM Solutions

SANBlaze Flash Storage Rear Transition Modules (RTMs): RTM436, RTM441, RTM451

Inside contents
Overview
RTM Flash Storage
Architecture Summary
Hypervisor Boot
Key VDI concepts
SANBlaze RTM family
Summary

Overview

ATCA enjoys a healthy, multi-vendor selection of CPU blades that feature Intel's latest processor technology (Westmere and Sandy Bridge). These blades rightly boast about SMP core count, sophisticated NUMA architectures, and DDR memory capacities in the 100-300 GB range. For all its benefits, this concentration of processing power brings unique challenges for application software: high core counts and hyper-threading make it difficult for many applications to extract a blade's full computing potential. The world's largest software vendors (VMware, Microsoft, Citrix and Oracle) recognize this, and each offers hypervisor software that seeks to improve CPU utilization by allowing multiple OS instances to run side by side, while also adding fault tolerance and automated load balancing.

Unfortunately, software alone may not solve the CPU utilization problem. Once we start giving the CPUs more to do, we see a large increase in random disk I/O, which leads to storage congestion. This is where flash storage solutions can help.

This paper examines a family of RTMs that incorporate flash storage and are compatible with the industry's leading ATCA processor blades. We'll discuss how these RTMs are used to create native ATCA reference architectures that can increase processor utilization in virtualized environments. We'll specifically examine their effect in a VMware Virtual Desktop Infrastructure (VDI), but the conclusions are applicable to any hypervisor.

RTM Flash storage details

The native ATCA form factor means that RTM storage connects to the front blade with no cables.
All communication occurs over x8 PCI Express links that route directly to the Intel processor. The bridge between PCI Express and NAND flash is made with the LSISAS2008 controller. This controller affords some important architectural advantages not found in many competing PCI Express flash products.

Foremost is the ability to choose the NAND flash type and supplier in the form of commercial off-the-shelf (COTS) solid-state disks (SSDs). The majority of the world's NAND flash (88%) comes from three sources: Samsung, Toshiba and Intel/Micron. The fiercely competitive market keeps pricing favorable, but can also cause occasional product availability issues as suppliers adjust output to meet cyclic NAND demand. By integrating NAND as SSDs, users can quickly and easily repopulate an RTM with drives from the suppliers offering the best prices and lead times. SSDs also offer greater flexibility to match usage models with the appropriate flash technology.
Each NAND cell within an SSD can be charged (written) with a voltage to represent bits (0 or 1). Once charged, the cell holds its voltage for years, even decades, with no external power source. SSDs incorporate special controllers that deposit voltage charges in a NAND cell; these can be designed to store a single bit (SLC) or 2 bits (MLC) per cell. Disks using MLC controllers can store twice as much data as those using an SLC controller. TLC controllers store three bits per cell and so triple the capacity. As bit density increases, the controllers require ever more precise voltage charging and detection circuits.

Unfortunately, a flash cell's ability to accurately retain a specific voltage value diminishes quickly as it is reused by rewriting. For example, a flash cell used with an SLC controller is rated for approximately 100,000 write cycles, while the same cell used with an MLC controller might remain reliable for only 10,000 write cycles, a 90% reduction in endurance. This is why it's very important to match application storage profiles with the appropriate flash controller technology. But don't be too alarmed: many applications (laptop, server, and multimedia data) can safely operate out of MLC flash for 7-10 years. In fact, the vast majority of consumer electronics (cameras, smart phones and most commercial SSDs) use MLC along with sophisticated wear-leveling algorithms that distribute write data across the entire pool of available flash cells. The ability to match SSD controller technology (SLC, MLC) with the application write profile is a key advantage of the LSI controller.

Another advantage of the LSISAS2008 controller is its ability to perform both RAID0 and RAID1 operations. RAID0 is used to combine all SSDs into a single unified storage volume, with capacity and performance that scale linearly with each additional member. For example, a system with 4 SSDs offers not only 4x the capacity of a single drive, but also 4x the performance.
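As a minimal sketch of the RAID0 scaling described above, the following Python snippet computes the idealized capacity and write performance of a striped volume, alongside the usable capacity of a two-drive RAID1 mirror. The 400 GB drive size is the per-drive capacity implied by the RTM family table (2.4 TB across six SSDs). The 50,000 IOPS per-drive figure and all function and field names are hypothetical placeholders chosen for illustration; they are not measured values and not part of any LSI tool. Treat the result as an upper bound, since real arrays may scale less than perfectly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    capacity_gb: int        # usable capacity presented to the OS
    ideal_write_iops: int   # upper bound, assuming perfectly linear scaling


def raid0_volume(drive_count: int, drive_gb: int, drive_iops: int) -> Volume:
    """Idealized RAID0 stripe: capacity and performance scale with member count."""
    if drive_count < 1:
        raise ValueError("RAID0 needs at least one drive")
    return Volume(drive_count * drive_gb, drive_count * drive_iops)


def raid1_volume(drive_gb: int, drive_iops: int) -> Volume:
    """Two-drive mirror: every write lands on both members, so usable
    capacity and write performance are those of a single drive."""
    return Volume(drive_gb, drive_iops)


# Hypothetical per-drive figures, for illustration only.
DRIVE_GB = 400
DRIVE_IOPS = 50_000

striped = raid0_volume(4, DRIVE_GB, DRIVE_IOPS)
mirrored = raid1_volume(DRIVE_GB, DRIVE_IOPS)
print(striped)
print(mirrored)
```

With four members the stripe presents 4x the capacity and 4x the idealized performance of one drive, while the mirror trades half of its raw capacity for redundancy.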
RAID1 is another option: it mirrors data across drives, a redundancy capability that can be especially important for critical application deployments.

The last point worth highlighting is driver support. LSI commands 80% of worldwide SAS controller market share, and its PCI Express storage drivers are bundled by every major OS vendor and distribution (Microsoft, Red Hat, SuSE, VMware, Solaris, Debian, FreeBSD), with source published on kernel.org. Because the drivers ship with most commercial OS distributions, there is no separate download and install task to get things going. The controllers also support BIOS and EFI boot, which is not yet common among competing PCI Express flash storage solutions.

Best Practice: Not all SSDs are created equal, so don't be swayed by what's selling for the lowest price online. Regardless of whether you're buying MLC or SLC technology, be sure to choose an SSD from a top-tier supplier and ask about its wear-leveling and over-provisioning policies. These may seem esoteric until you fill the flash to capacity and its performance suddenly falls by 50% or more. Also ask specifically about the supplier's approach to preventing data loss due to unexpected power failures. Some SSDs have unprotected write caches, while others include a super-capacitor to ensure committed writes are safely flushed to flash upon power loss.

Architecture Summary

Construct a virtual environment using three ATCA blade products, purpose-built for compute, primary storage and networking. Distribute virtual desktops using a combination of bulk SAN storage and local flash storage provided by the RTM.

[Figure: architecture diagram. An iSCSI/NAS storage blade (hybrid SSD/HDD) hosts full-clones on SAN storage (10% of virtual desktops); an Intel compute blade (Emerson, Oracle, RadiSys or SANBlaze) with RTM hosts linked-clones on flash storage (90% of virtual desktops); a 10/40 Gb switch blade connects them over the ATCA backplane.]
Compute Blade: Users may select the compute blade that best meets their performance and budget needs from any of the ATCA vendors (RadiSys, Emerson or SANBlaze). The selection is paired with a SANBlaze Flash RTM, which, depending on model, can provide between 1 and 3.2 TB of storage. The flash is presented as local storage, with very low latency. We'll use this storage to host the majority of user desktops as linked-clones (100 instances), a load representative of a virtualized environment.

Storage Blade: A dedicated storage blade provides 5-11 TB of bulk primary storage and 0.8 TB of flash cache. The flash cache keeps large amounts of recently used data in solid-state storage, improving access speed by as much as 600%. If users continue to access a piece of data, it is marked as hot and remains in flash. As data ages, it is automatically moved to the cheaper rotating storage. This hybrid approach (rotating and flash disk) provides a good balance of performance and cost. We'll use this bulk storage for 100% of user-generated data (home directories), along with a small percentage of full-clone user desktops in our VDI configuration.

Switch Blade: Those familiar with ATCA backplanes know that a switch blade provides two networks for inter-blade communication. Most data movement occurs on the Ethernet Fabric network, routed on the ATCA backplane using XAUI 10G or 40G connections. Control and management data is usually relegated to a secondary Ethernet connection known as the Base network, which runs at 1G. For better reliability, ATCA backplanes usually support two switch slots that together provide a total of four network paths between all compute and storage blades.

Best Practice:
(1) Plan to deploy no more than 10 desktops per core; thus, a blade with two 6-core processors should be provisioned for no more than 120 desktops.
(2) Plan to allocate 2 GB of DDR per desktop.
(3) You may thin-provision your storage, but expect each instance to consume ~4 GB of actual capacity. You should therefore expect 100 virtual desktops to consume ~0.4 TB of actual storage.

Solution components

Primary Storage Blade: SANBlaze ATCA2000 (with flash cache); NFS, Samba, iSCSI, LSI MegaRAID 5/6 with CacheCade, encryption, 5 TB storage, 0.5 TB flash cache
Compute Blade (each tested separately): SANBlaze ATCA7300, RadiSys 4600, Emerson 7470
VMware vSphere Infrastructure (minimum revision):
  ESX hosts: VMware ESXi 5.1.0
  vCenter Server: 5.1.0
  vCenter database: SQL Server 2005
  View Planner: 1.1
Network/Backplane interconnect: SANBlaze ATCA1936 (20-port 10Gb, 24-port 1Gb)
Mating Flash Storage RTM:
  SB-RTM436, 0.5 TB Samsung 21nm NAND flash, MLC
  SB-RTM441, 3.2 TB Samsung 21nm NAND flash, MLC
  SB-RTM451, 2.4 TB Samsung 21nm NAND flash, MLC
  SB-RTM436, 0.5 TB Samsung 21nm NAND flash, MLC

Hypervisor installation and boot

Although we sometimes see small deployments using USB boot media, we do not recommend this approach because it does not scale to enterprise deployments. Imagine the difficulty of upgrading or patching 100 or more physical servers from ESXi 5.0 to 5.1.

Best Practice: All of the ATCA compute blades cited in this paper support PXE (Preboot Execution Environment), which facilitates installing hypervisors on the flash media from a network location. The mechanism requires a dedicated server that can provide both FTP and TFTP protocols. When ATCA compute blades are first powered on (booted), they are set to fetch the appropriate hypervisor and install it on the flash storage RTM. After installation, the blade can subsequently boot the hypervisor from the flash components on the RTM.
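The sizing rules from the Architecture Summary best practice (no more than 10 desktops per core, 2 GB of DDR per desktop, and roughly 4 GB of actual flash per desktop) can be turned into a quick planning check. This is a sketch only: the function and field names are our own illustration, and the constants are simply the rules of thumb quoted in this paper, not limits enforced by any hypervisor.

```python
from dataclasses import dataclass
from typing import Optional

# Rules of thumb quoted in the best practice (planning values, not hard limits).
DESKTOPS_PER_CORE = 10
DDR_GB_PER_DESKTOP = 2
FLASH_GB_PER_DESKTOP = 4   # approximate actual consumption when thin-provisioned


@dataclass(frozen=True)
class BladePlan:
    max_desktops: int   # ceiling implied by the core count
    ddr_gb: int         # DDR needed for the planned desktops
    flash_gb: int       # flash capacity needed for the planned desktops


def plan_blade(sockets: int, cores_per_socket: int,
               desktops: Optional[int] = None) -> BladePlan:
    """Size one compute blade; defaults to the maximum desktop count."""
    max_desktops = sockets * cores_per_socket * DESKTOPS_PER_CORE
    if desktops is None:
        desktops = max_desktops
    if desktops > max_desktops:
        raise ValueError(
            f"{desktops} desktops exceeds the {max_desktops} suggested for this blade"
        )
    return BladePlan(
        max_desktops=max_desktops,
        ddr_gb=desktops * DDR_GB_PER_DESKTOP,
        flash_gb=desktops * FLASH_GB_PER_DESKTOP,
    )


# Example: a blade with two 6-core processors hosting 100 desktops.
plan = plan_blade(sockets=2, cores_per_socket=6, desktops=100)
print(plan)
```

For the example blade the ceiling is 120 desktops, and 100 desktops call for 200 GB of DDR and 400 GB (~0.4 TB) of flash, in line with the figures given above.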
Key VDI concepts

After creating a golden reference desktop, administrators must decide whether to define additional user desktops as full-clones or linked-clones. Fundamentally, the two differ in terms of user persistence. Full-clones provide users with a desktop that remembers customizations made within each login session. If users install software or put documents on the local desktop, those changes are preserved when they log out. By contrast, a linked-clone may allow users to install software, but the desktop is wiped clean once they log out.

From an administration standpoint, it's much easier to maintain 100 linked-clones than 100 full-clones customized for each user. If users need a new program, administrators simply add it to the golden desktop and redeploy. If linked-clone users need to store or share documents, they are assigned allotments of NAS or SAN disk that are mapped persistently and are accessible upon login.

Best Practice: Deploy the majority of desktops as linked-clones on local RTM flash storage, and separately dedicate external NAS/SAN storage for user data and documents. Use external storage for a small number of full-clones, perhaps 10%, for a subset of users such as executives or users with special software needs. Use the Flash Storage RTM for the remaining 90% of users, who can operate with linked-clones. Linked-clones themselves contain no user data, so there are no data loss concerns when running from the local flash storage.

[Figure: test configuration. A compute blade and RTM pairing acts as the VDI host, connected through dual 10Gb NICs and a 10Gb switch blade to iSCSI storage; a vCenter VM and a View Planner VM simulate user I/O.]
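The placement guidance above can be sketched in a few lines of Python. The snippet splits a desktop population into full-clones on external SAN/NAS storage and linked-clones on the RTM flash, then checks whether the linked-clones fit on a given RTM. The 10% share and the ~4 GB per desktop come from the best practices in this paper; the function names, the round-up policy for fractional desktops, and the example RTM capacity are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    full_clones_on_san: int       # persistent desktops on external storage
    linked_clones_on_flash: int   # non-persistent desktops on the RTM
    flash_gb_needed: int          # RTM capacity consumed by the linked-clones


def place_desktops(total_desktops: int,
                   full_clone_percent: int = 10,
                   flash_gb_per_desktop: int = 4) -> Placement:
    """Split desktops between SAN full-clones and RTM linked-clones."""
    if not 0 <= full_clone_percent <= 100:
        raise ValueError("full_clone_percent must be between 0 and 100")
    # Assumption: round up so every special-needs user gets a full-clone.
    full = math.ceil(total_desktops * full_clone_percent / 100)
    linked = total_desktops - full
    return Placement(full, linked, linked * flash_gb_per_desktop)


def fits_on_rtm(placement: Placement, rtm_capacity_gb: int) -> bool:
    """True if the linked-clones fit within the RTM's flash capacity."""
    return placement.flash_gb_needed <= rtm_capacity_gb


# Example: 100 desktops against a 2.4 TB RTM (capacity chosen for illustration).
placement = place_desktops(100)
print(placement, fits_on_rtm(placement, rtm_capacity_gb=2400))
```

For 100 desktops this yields 10 full-clones on external storage and 90 linked-clones needing about 360 GB of RTM flash.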
SANBlaze flash-based RTM family

SB-RTM436
  SSD capacity: 1.6 TB
  Key features: 6Gb SAS expander; two SAS, SATA or SSD disks; two mini-SAS HD ports (x4 each); pass-thru LAN management; pass-thru serial console
  Compatible blades: SANBlaze ATCA2000

SB-RTM441
  SSD capacity: 1.6 TB
  Key features: 6Gb SAS RAID controller; two SAS, SATA or SSD disks; two mini-SAS HD ports (x4 channel); two 1GbE SFP ports (build option)
  Compatible blades: RadiSys Promentum ATCA-4500

SB-RTM451
  SSD capacity: 2.4 TB
  Key features: 6Gb SAS controller with RAID0, 1, 10; six 1.8-inch SSDs, user serviceable; 10Gb Ethernet controller, 2 ports; two 10Gb SFP+ ports
  Compatible blades: Emerson ATCA-7370 and ATCA-7470

SB-RTM468
  SSD capacity: 3.2 TB
  Key features: PCI Express based RTM storage; eight 1.8-inch SSD drives; JBOD, RAID0 or RAID1
  Compatible blades: RadiSys ATCA-XE80
SANBlaze Technology, Inc.
One Monarch Drive, Suite 204
Littleton, MA 01460
Phone: (978) 679-1400
Fax: (978) 897-3171
E-mail: info@sanblaze.com
Web: www.sanblaze.com

Summary

ATCA platforms feature powerful multi-core compute blades, which are well suited for virtualization. However, a common problem is that virtualization can raise utilization so high that it exceeds the I/O capability of storage solutions built on traditional hard disks. This is where flash storage like that featured on SANBlaze RTMs can help.

SANBlaze engineers have developed a family of Flash Storage RTMs interoperable with the industry's leading ATCA vendors. These RTMs provide terabyte-size NAND flash pools close to the CPU to minimize latency and maximize data bandwidth. This keeps CPU utilization high, maximizing performance of the entire ATCA platform.

With the SANBlaze RTM family, there is no need to change your existing ATCA server environment, and no need to install drivers or configure the RTM itself. It's simply plug-and-play. Once installed, volume configuration and management can be carried out via the BIOS or natively within most operating systems using command shells. The flash RTM solutions are also unusual in their ability to support OS boot.

Finally, these RTMs offer a highly competitive cost of ownership, whether measured in $/GB or IOPS/GB. Because RTM customers choose the NAND flash technology, they have full control to match application need and cost.

For more information, please visit the SANBlaze web site at www.sanblaze.com or send email to info@sanblaze.com.

SANBlaze Technology, Inc. is a leading provider of storage, networking and multifunction solutions for embedded systems. SANBlaze embedded computing products include a complete line of ATCA storage and compute blades, multifunction RTMs for ATCA blades, and AMC storage and networking controllers and modules. Additionally, the company provides fully configured and integrated ATCA systems and services.
Copyright 2013 SANBlaze Technology Inc. All rights reserved. Referenced products are trademarks or registered trademarks of their respective owners.
UC 128 r04