WHITE PAPER
Permabit Albireo Data Optimization Software
Benefits of Albireo for Virtual Servers
January 2012

Permabit Technology Corporation
Ten Canal Park
Cambridge, MA 02141 USA
Phone: 617.252.9600
FAX: 617.252.9977
info@permabit.com
www.permabit.com
Contents
Introduction
VMware Storage Background
Managing VMware Storage
VMware Storage Sprawl
Data Optimization Software
Permabit Albireo Data Optimization Software
Albireo Architecture
Albireo Performance
Benefits of Albireo for VMware
Conclusion
About Permabit
Find Out More

"The Albireo technology from Permabit will save an OEM 18-24 months getting to market, if they can do it at all. This stuff is so far ahead in its capabilities and performance I can't see why you would want to do it yourself, unless you already have it baked."
Steve Duplessie, Founder & Sr. Analyst, Enterprise Strategy Group
Virtualization has significantly reduced data center footprints, and significantly increased storage costs.

Introduction

Server virtualization, as popularized by VMware, Microsoft, and others, is a widely used tactic for reducing data center costs. By reducing the number of physical servers, data centers have significantly reduced their footprints and the costs of server acquisition, energy, management, and more. While server-related costs have decreased, the corresponding storage costs have not; in fact, storage costs are rising rapidly as a result of server virtualization. At first glance it is not obvious why storage costs have increased, given that the cost per GB has consistently fallen. The answer lies in how virtualized servers are managed.

This paper reviews the popular VMware vSphere Hypervisor and how its management affects total storage consumption and cost. It then introduces Permabit Albireo Data Optimization Software as a means of reducing virtual server storage requirements.

VMware Storage Background

A VMware virtual machine uses a virtual disk (VMDK) to store its operating system, program files, and other data associated with its activities (Figure 1). The VMDK is a large physical file, or set of files, that can be moved, deleted, and copied as easily as any other file. To store and manage virtual disks, VMware vSphere uses its own special storage space, called a VMFS datastore, which is similar to a file system on a logical volume. A VMFS datastore can be created on a wide variety of physical storage devices, including internal, external, and networked storage devices.

Figure 1: Typical VMware Storage Layout (ESX Servers A, B, and C run Virtual Machines 1, 2, and 3, whose virtual disk files reside on a VMFS volume).

Managing VMware Storage

Creating VMFS datastores requires careful planning.
For example, configuring fewer, larger VMFS datastores allows more virtual machine capacity per datastore and reduces the likelihood that additional space will need to be allocated. Larger VMFS datastores also allow more flexibility for resizing virtual disks and reduce the number of datastores to manage. Alternatively, configuring more, smaller VMFS datastores can improve virtual disk performance (because fewer virtual machines contend for the same VMFS locks and SCSI reservations), reduce wasted storage space, and support applications such as Microsoft Cluster Service that require each cluster disk resource to have its own LUN.
VMware Storage Sprawl

The reason VMware storage management causes such a sharp increase in storage demand can be demonstrated with a simple example. A virtual server runs applications A, B, C, and D, each of which requires Windows Server and a Microsoft SQL Server database. IT best practice is to keep three complete copies of each server installed and running: one for production, one for production standby, and one for QA testing (Table 1). This simple example requires twelve virtual machines, each containing a full copy of the operating system, the application software, and its own copy of the application data.

Table 1: Virtual Servers Running Applications A, B, C, D

Operating System     Database     Application  Intended Use
Windows Server 2008  MS SQL 2008  A            Production
Windows Server 2008  MS SQL 2008  A            Production standby
Windows Server 2008  MS SQL 2008  A            QA Test
Windows Server 2008  MS SQL 2008  B            Production
Windows Server 2008  MS SQL 2008  B            Production standby
Windows Server 2008  MS SQL 2008  B            QA Test
Windows Server 2008  MS SQL 2008  C            Production
Windows Server 2008  MS SQL 2008  C            Production standby
Windows Server 2008  MS SQL 2008  C            QA Test
Windows Server 2008  MS SQL 2008  D            Production
Windows Server 2008  MS SQL 2008  D            Production standby
Windows Server 2008  MS SQL 2008  D            QA Test

The storage challenge created by VMware is not caused simply by the scenario in Table 1; after all, running three copies of each production application is normal practice. Rather, storage sprawl is a direct result of the ease with which virtual machine clones can be created. Because the need to deploy additional physical servers is significantly reduced, VMware administrators now create additional virtual machines for patches, bug fixes, operating system service packs, and other internal departments (e.g., engineering, support, and marketing).
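The twelve-virtual-machine example can be made concrete with a short Python calculation. All component sizes below are hypothetical placeholders, not measurements from this paper; the sketch only shows that identical operating system, database, and application software is multiplied by the number of virtual machines, whereas a store that kept identical software only once would not grow that way.

```python
# All sizes are hypothetical placeholders in GB, not figures from this paper.
OS_GB = 20.0           # Windows Server 2008 installation
DB_GB = 5.0            # MS SQL 2008 installation
APP_SOFTWARE_GB = 2.0  # application binaries, per application
APP_DATA_GB = 10.0     # application data, per copy

APPLICATIONS = ["A", "B", "C", "D"]
USES = ["Production", "Production standby", "QA Test"]


def vm_count() -> int:
    """One virtual machine per (application, intended use) pair, as in Table 1."""
    return len(APPLICATIONS) * len(USES)


def provisioned_gb() -> float:
    """Every virtual machine stores a full private copy of everything."""
    per_vm = OS_GB + DB_GB + APP_SOFTWARE_GB + APP_DATA_GB
    return vm_count() * per_vm


def shared_gb() -> float:
    """Identical OS, database, and application binaries stored only once.

    Application data is still counted once per copy, a conservative
    assumption because the copies may diverge over time.
    """
    software = OS_GB + DB_GB + len(APPLICATIONS) * APP_SOFTWARE_GB
    return software + vm_count() * APP_DATA_GB


if __name__ == "__main__":
    print(vm_count(), provisioned_gb(), shared_gb())  # 12 444.0 153.0
```

With these assumed sizes, the twelve virtual machines provision 444 GB, while keeping the shared software only once would need 153 GB, before any duplicate application data is even considered.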
The resulting storage growth is at the administrator's discretion, but these legitimate administrative use cases easily produce dozens of virtual machine clones, each consuming the same amount of disk space as the original. Each virtual machine also requires space for snapshots, swap files, log files, ISO images, and diagnostic partitions. Depending on the storage management practices applied to VMware virtual servers, the disk space needed for all virtual machines and their support files can quickly become staggering and costly.

Data Optimization Software

Implementing data optimization software, such as data deduplication, is the answer to VMware storage sprawl. Data deduplication software identifies duplicate chunks of data so that each unique chunk is stored only once. Applied to VMFS datastores, deduplication can significantly reduce the disk space needed to store virtual machines. Virtual machines are ideal candidates for data reduction because they so often contain identical operating systems and applications. As a result, the actual differences between their virtual disks are quite small and lend themselves to significant reduction through deduplication.

Figures 2 and 3 illustrate this basic storage reduction technique. Figure 2 shows three virtual machines with the same operating system and application software, each with its own virtual disk storage. When data optimization is applied, the three virtual disks are reduced to one (Figure 3). Virtual server storage clearly benefits from storage optimization.

Storage optimization also benefits virtual machine performance: virtual machine memory blocks can be deduplicated in cache so that each virtual machine runs faster with less disk access.
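The core idea, storing each unique chunk only once, can be illustrated with a minimal Python sketch. It assumes fixed 4 KiB chunks and SHA-256 fingerprints, and the `DedupStore` class is a hypothetical name; this is generic deduplication for illustration, not Albireo's implementation, which this paper describes as using variable-sized, content-aware segmentation.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4096  # assumed fixed chunk size in bytes


class DedupStore:
    """Hypothetical content-addressed store: each unique chunk is kept once."""

    def __init__(self) -> None:
        self.chunks: Dict[bytes, bytes] = {}      # fingerprint -> chunk data
        self.files: Dict[str, List[bytes]] = {}   # file name -> ordered fingerprints

    def write(self, name: str, data: bytes) -> None:
        refs = []
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).digest()
            self.chunks.setdefault(fp, chunk)  # store only if not seen before
            refs.append(fp)
        self.files[name] = refs

    def read(self, name: str) -> bytes:
        return b"".join(self.chunks[fp] for fp in self.files[name])

    def logical_bytes(self) -> int:
        """Bytes the virtual disks appear to occupy."""
        return sum(len(self.chunks[fp]) for refs in self.files.values() for fp in refs)

    def physical_bytes(self) -> int:
        """Bytes actually stored after deduplication."""
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    store = DedupStore()
    os_image = b"A" * 4096 + b"B" * 4096  # stand-in for a shared OS image
    for i, tag in enumerate([b"x", b"y", b"z"], start=1):
        store.write(f"vm{i}.vmdk", os_image + tag * 4096)
    print(store.logical_bytes(), store.physical_bytes())  # 36864 20480
```

Three virtual disks that share an 8 KiB "OS image" occupy 36,864 logical bytes but only 20,480 physical bytes, because the two OS chunks are stored once.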
Figure 2: Virtual Machines Before Deduplication (Virtual Machines 1, 2, and 3 in an ESX cluster each store their own OS and APP data on a datastore backed by traditional RAID storage).

Figure 3: Virtual Machines After Deduplication (Virtual Machines 1, 2, and 3 in an ESX cluster share a single copy of the OS and APP data on a deduplicated volume).

Permabit Albireo Data Optimization Software

Used with VMFS datastores, Permabit Albireo Data Optimization Software reduces storage demands by up to 97%. Albireo is embedded within the storage device connected to the virtual server host, where it identifies duplicate blocks of data and advises the storage device so that it can update its block pointers and avoid writing the same block of data to disk more than once. This saves disk space and reduces the load on downstream storage activities such as replication, snapshots, and backup.

Albireo massively improves the performance and efficiency of data creation, transmission, and storage. It integrates at any point (parallel, inline, or post-process) at a sub-file level, enabling deduplication to optimize both primary storage and downstream replication processes.

Albireo is delivered to OEMs as a Software Development Kit (SDK). The SDK contains the Albireo software library, full API documentation, code samples, and application notes for integration.
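The advisory model, in which the advisor only reports earlier locations and the storage device updates its own block pointers, can be sketched in Python under several stated assumptions. `DedupAdvisor` and `BlockDevice` are hypothetical classes, not the Albireo SDK API. For brevity the device consults the advisor synchronously (an inline integration), whereas this paper also describes parallel and post-process integration with asynchronous advice. The byte-for-byte check before remapping is an added safeguard, not something this paper describes.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


class DedupAdvisor:
    """Hypothetical advisory index: never stores or writes user data.

    It remembers where each fingerprint was first written and reports
    that location when identical content appears again.
    """

    def __init__(self) -> None:
        self._first_seen: Dict[bytes, int] = {}  # SHA-256 fingerprint -> physical block

    def advise(self, block: bytes, proposed: int) -> Optional[int]:
        fp = hashlib.sha256(block).digest()
        previous = self._first_seen.get(fp)
        if previous is None:
            self._first_seen[fp] = proposed  # first sighting: note where it will live
        return previous


class BlockDevice:
    """Toy storage device that keeps full control of what is written."""

    def __init__(self, advisor: Optional[DedupAdvisor]) -> None:
        self.advisor = advisor
        self.physical: List[bytes] = []                 # blocks actually on "disk"
        self.pointers: Dict[Tuple[str, int], int] = {}  # (LUN, logical block) -> physical block

    def write(self, lun: str, lba: int, block: bytes) -> None:
        proposed = len(self.physical)
        previous = None
        if self.advisor is not None:
            previous = self.advisor.advise(block, proposed)
        if previous is not None and self.physical[previous] == block:
            self.pointers[(lun, lba)] = previous  # verified duplicate: update pointer only
        else:
            self.physical.append(block)           # unique, or no advice: write the block
            self.pointers[(lun, lba)] = proposed
        # Overwritten physical blocks are never reclaimed in this sketch; a real
        # device would track reference counts and free unreferenced blocks.

    def read(self, lun: str, lba: int) -> bytes:
        # Reads only follow pointers; the advisor is never consulted,
        # so no rehydration step is needed.
        return self.physical[self.pointers[(lun, lba)]]


if __name__ == "__main__":
    dev = BlockDevice(DedupAdvisor())
    for lun in ["vm1", "vm2", "vm3"]:
        dev.write(lun, 0, b"O" * 4096)           # shared OS block
        dev.write(lun, 1, b"P" * 4096)           # shared APP block
        dev.write(lun, 2, lun.encode() * 1024)   # per-VM data
    dev.advisor = None                           # advisor disabled
    print(len(dev.physical), dev.read("vm2", 1) == b"P" * 4096)  # 5 True
```

Three virtual machines writing the same OS and application blocks end up with five physical blocks instead of nine, and detaching the advisor leaves every block readable because reads depend only on the device's own pointer table.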
Albireo Architecture

Albireo's architecture combines the Albireo High Performance Index Engine with Albireo content segmentation technologies, and is easily implemented via the Albireo SDK. As shown in Figure 4, Albireo operates as an advisory service outside the storage application software's data path. This ensures that data integrity is never at risk and that there is zero performance impact, an important requirement for successful VMware deployments.

Figure 4: Albireo Architecture. Data arrives from iSCSI, FC, NFS, and CIFS sources, and then:
1. OEM software pushes new data and internal placement information (e.g., filename, inode, offset or LUN, block) to Albireo.
2. Content-aware segmentation breaks larger objects into variable-sized chunks.
3. Unique content fingerprints are computed.
4. Patented indexing technologies determine whether the chunk has been seen before.
5. Previous placement information is pushed asynchronously to the OEM software for file, block, or extent unification.

Albireo's High Performance Index Engine can identify duplicate data in a matter of microseconds, orders of magnitude faster than other deduplication solutions. In the case of VMware, incoming data from any VMware-supported source (e.g., FC, iSCSI, SCSI, or NFS) is managed by the existing VMFS datastore. In a parallel integration scenario, once data is received, a copy is made and delivered via the Albireo API with its corresponding metadata (e.g., file name, offset, block, LUN). Using a hash algorithm (SHA-256 or MurmurHash3), unique content fingerprints are computed and compared to existing hash keys using patented high-speed indexing technologies. Information on whether a data chunk is a duplicate is pushed asynchronously to the storage application software via the Albireo API for file, block, or extent unification. If the data chunk is unique, no action is required.
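This paper does not disclose Albireo's segmentation or indexing algorithms. As a hedged illustration of steps 2 through 5 in Figure 4, the Python sketch below substitutes a well-known technique, content-defined chunking with a gear-style rolling hash, followed by SHA-256 fingerprinting and a lookup in a plain dictionary that stands in for a deduplication index. The chunk-size parameters, the `GEAR` table, and the function names are all hypothetical.

```python
import hashlib
import random
from typing import Dict, List, Tuple

# Hypothetical parameters chosen for illustration only.
MIN_CHUNK = 2048      # never cut a chunk shorter than this
MAX_CHUNK = 16384     # always cut a chunk at this length
BOUNDARY_BITS = 12    # cut when the top 12 hash bits are zero (about 1 in 4096 positions)
MASK64 = (1 << 64) - 1

# Fixed table of 256 pseudo-random 64-bit values (seeded, so cut points are reproducible).
_rng = random.Random(2012)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def chunk_boundaries(data: bytes) -> List[int]:
    """Step 2: return the end offset of each variable-sized, content-defined chunk.

    Each byte shifts the hash left and adds a table value, so the hash depends
    only on the most recent 64 bytes. Cut points therefore follow the content,
    not fixed offsets.
    """
    ends: List[int] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64
        length = i - start + 1
        at_content_boundary = length >= MIN_CHUNK and (h >> (64 - BOUNDARY_BITS)) == 0
        if at_content_boundary or length >= MAX_CHUNK:
            ends.append(i + 1)
            start = i + 1
            h = 0
    if start < len(data):
        ends.append(len(data))  # final partial chunk
    return ends


def find_duplicates(
    name: str, data: bytes, index: Dict[bytes, Tuple[str, int]]
) -> List[Tuple[int, Tuple[str, int]]]:
    """Steps 3 to 5: fingerprint each chunk and report earlier placements.

    `index` maps a SHA-256 fingerprint to the (name, offset) where the chunk
    first appeared. Returns (offset, earlier placement) for each duplicate.
    """
    advice = []
    start = 0
    for end in chunk_boundaries(data):
        fp = hashlib.sha256(data[start:end]).digest()
        if fp in index:
            advice.append((start, index[fp]))  # duplicate: point back to first copy
        else:
            index[fp] = (name, start)          # unique: remember its placement
        start = end
    return advice
```

Because cut points depend on content, inserting a few bytes at the front of a virtual disk only disturbs the first chunk or two; later boundaries realign and those chunks are reported as duplicates. A fixed-size chunker would see every block shift by the same few bytes and find no matches at all.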
If the data chunk is a duplicate, the storage application software modifies its storage tables (e.g., the inode block data structure on UNIX systems) so that the new reference points to the existing copy.

The advantage of the Albireo architecture is that it operates outside the storage data flow and avoids any performance penalty. There is no risk to data integrity, because Albireo itself does not write data to disk, and data can always be read even if Albireo becomes disabled. Further, because duplicates are resolved by updating block references rather than by transforming the stored data, reads require no data rehydration, a performance penalty and ease-of-use issue common with other data optimization technologies.

Albireo Performance

Albireo performance tests were performed by the Enterprise Strategy Group (ESG) in August 2011. To test how well Albireo could perform deduplication, a test environment was constructed using a modified open-source file system, and a set of four VMware images totaling 157 GB was used. The ESG results confirmed that Permabit Albireo deduplication advisory services can be used to reduce the capacity required to store VMware images. ESG Lab recorded an outstanding deduplication rate of 97% (36.2:1) for the four VMware virtual server images.

Table 2: VMware Deduplication Results with Albireo

Data Type      Before (GB)  After (GB)  Deduplication Rate  Deduplication Ratio
VMware Images  157          4.3         97%                 36.2:1
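The rate and ratio columns in Table 2 are related by rate = 1 - 1/ratio, so a 36.2:1 ratio corresponds to roughly a 97.2% reduction. The short Python check below reproduces the table's figures from the reported ratio, using only numbers from the table; the function names are illustrative.

```python
def dedup_rate(ratio: float) -> float:
    """Fraction of capacity saved by a deduplication ratio of `ratio`:1."""
    return 1.0 - 1.0 / ratio


def size_after(before_gb: float, ratio: float) -> float:
    """Stored capacity after deduplication at `ratio`:1."""
    return before_gb / ratio


if __name__ == "__main__":
    print(f"{dedup_rate(36.2):.1%}")          # 97.2%
    print(f"{size_after(157, 36.2):.2f} GB")  # 4.34 GB
    print(f"{157 / 4.3:.1f}:1")               # 36.5:1 from the rounded table value
```

The published "After" value of 4.3 GB appears to be rounded: dividing 157 GB by 4.3 GB gives about 36.5:1, while the reported 36.2:1 ratio corresponds to roughly 4.34 GB.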
Benefits of Albireo for VMware

As shown in the ESG Lab results, Albireo can reduce the total disk capacity necessary to store VMFS datastores by over 97%. This is a huge storage saving that comes without compromising disk I/O performance or data integrity. Permabit Albireo is the only primary storage data optimization software that operates outside the data read path and therefore does not affect disk read performance. Even if Albireo were disabled or removed for any reason, the data would remain accessible. Albireo is an advisory service to the storage system and never modifies the data written to disk; the storage device always retains full control of the data being written. This protects data integrity and eliminates the need to decompress data on read, an expensive process that compression-based data optimization technologies require.

Permabit is working closely with primary storage vendors to integrate Albireo into existing and planned storage devices for virtual environments.

Albireo (al-beer-ee-oh) appears to the naked eye to be a single star but can be resolved with a telescope into a double star, consisting of a brighter yellow star and a fainter blue star.

Conclusion

The broader adoption of virtual servers is hindered by the huge amounts of disk storage consumed by hundreds to thousands of virtual machines. With each virtual machine requiring its own independent disk storage, even a reasonably sized virtual server deployment can require a considerable amount of disk space. Permabit Albireo Data Optimization Software has been shown in independent tests to reduce VMware storage requirements by over 97%. Because Albireo avoids consuming additional storage space for each separate VMDK, VMware administrators can deploy more virtual machines as needed without sacrificing storage capacity. Deploying Albireo substantially reduces direct storage costs and, equally important, reduces the associated energy, space, and cooling costs.
Albireo operates completely outside the data read path, so virtual server and storage performance is maintained and data integrity is never compromised. Permabit's Albireo is truly a breakthrough for the full utilization of virtual servers.

About Permabit

Permabit is a recognized leader in data efficiency technology. We enable OEMs to leverage their R&D investment, increase margin, accelerate time to market, and achieve competitive advantage. Permabit Albireo software massively improves the performance and efficiency of data creation, transmission, and storage. Solutions built with Albireo are being delivered by leading hardware, software, and service providers.

Find Out More

To learn more about the Permabit Albireo technology, or to license our products, visit our website at www.permabit.com or call us directly at 617.252.9600.

© 2012 Permabit Technology Corporation. All Rights Reserved. Permabit is a registered trademark and the Permabit logo, Albireo logo, Permabit Enterprise Archive, and Scalable Data Reduction are trademarks of the Permabit Technology Corporation. All other products or services mentioned may be covered by registered trademarks, trademarks, service marks, or product names as designated by the companies who market those products.