Storage Virtualization in Cloud

Cloud Strategy Partners, LLC
Sponsored by: IEEE Educational Activities and IEEE Cloud Computing

Course Presenter's Biography

This IEEE Cloud Computing tutorial has been developed by Cloud Strategy Partners, LLC. Cloud Strategy Partners, LLC is an expert consultancy firm that specializes in technology and strategy relating to Cloud Computing.

IEEE elearning Library Storage Virtualization in Cloud Transcript pg. 2 / 16

Course Summary

In this tutorial, we will review the various ways that storage is virtualized and implemented for large-scale distributed systems, including Cloud. We will also discuss which storage primitive is virtualized, and see that some systems concentrate on virtualizing files while others concentrate on virtualizing blocks. Next we will review how the virtualization function can run in a variety of places in the architecture: in the host, in the network, or all the way back where the drives are. We will also discuss how virtualization can be placed in band with the storage operations and why, for scale, it is usually placed out of band. Finally, we will look at several of the new file systems optimized for managing heterogeneous cloud storage farms.

Outline

As the slide shows, we will cover several areas relating to storage virtualization in Cloud IaaS:
- Storage virtualization techniques
- Storage virtualization layers
- Approaches to storage virtualization
- Storage types in cloud
- Virtualized file systems for cloud

Storage Virtualization

This slide illustrates virtualization techniques in Cloud IaaS. At the top, the general virtualization technique is illustrated. Underneath, one can see how the general technique is applied to several virtualization problems. For server virtualization, hypervisors are used extensively. For storage virtualization, many software techniques are applied, including volume management, file systems, and replication. For network virtualization, there are many different features, including link aggregation and VPN; firewall, switching, routing, application filtering, and load balancing also have virtual capability in many cloud implementations today.

Outline

As we have seen from previous lessons, virtualization of any kind of resource follows a common blueprint. The physical hardware is put under a special kind of software control which abstracts the physical layer and presents to the user what looks like the actual resource but is actually a virtual instance of that resource. Storage follows this blueprint. The hardware and the hardware interfaces are virtualized by a software layer, as shown in the illustration, which presents virtual storage primitives (disks, file systems, etc.) to the consumer.

Common Storage Architectures

In clouds, there are many options for implementing physical storage, because the virtual storage interfaces can be kept to a small set, with the software providing common interfaces. The illustration shows three kinds of physical storage:
- DAS: Direct Attached Storage
- NAS: Network Attached Storage
- SAN: Storage Area Network

One can see from the diagram that the underlying architectures of these schemes are quite different. The (legacy, non-virtual) application interfaces all look basically the same: an application accesses a file system. Under the hood, the connectivity of the components, and where the file system code actually runs, can be at any number of locations, as shown.

The software layer that implements the virtualized storage can also enhance the storage model beyond what physical storage alone can accomplish:
- Manageability: virtualized storage resources are easier to configure and manage.
- Scalability: virtualization simplifies scaling storage resources.
- Availability: virtualization simplifies protecting against storage hardware failures and overloading. Storage redundancy, backup, and load balancing are part of distributed cloud storage.
- Security: virtualized storage instances provide additional security by isolating storage segments.

The illustration on this slide goes into more depth as to how the virtualization layers for storage work, and what kind of storage model they present.

Properties of Storage Virtualization

The software layer for storage runs in kernel space in the operating system, where it can intercept the disk and file system primitives and insert the capability of using external, networked storage, as well as storage built from replicated, distributed drives. The most common storage models are the file system and the block device. While the names imply the capability of the models, the Cloud OS may not provide exactly the same capabilities as a standard (let's say Linux) file system or block device. We will discuss more on this later.
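
The difference between the two storage models named above can be sketched in a few lines of Python. This is an illustrative toy, not how any Cloud OS implements storage: the class names BlockDevice and TinyFileSystem, the 512-byte block size, and the simple "next free block" allocation are all assumptions made for the example. The block device only reads and writes whole numbered blocks, while the file layer on top maps a name to a list of those blocks.

```python
class BlockDevice:
    """A toy block device: fixed-size blocks addressed by number (hypothetical)."""

    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]

    def write_block(self, lba, data):
        if len(data) != self.block_size:
            raise ValueError("a block device only accepts whole blocks")
        self.blocks[lba] = bytes(data)

    def read_block(self, lba):
        return self.blocks[lba]


class TinyFileSystem:
    """A toy file layer: maps a file name to the blocks that hold its data."""

    def __init__(self, device):
        self.device = device
        self.next_free = 0  # naive allocator: blocks are handed out in order
        self.table = {}     # name -> (list of block numbers, length in bytes)

    def write_file(self, name, data):
        size = self.device.block_size
        block_numbers = []
        for start in range(0, len(data), size):
            # Pad the final chunk so that a whole block is always written.
            chunk = data[start:start + size].ljust(size, b"\0")
            self.device.write_block(self.next_free, chunk)
            block_numbers.append(self.next_free)
            self.next_free += 1
        self.table[name] = (block_numbers, len(data))

    def read_file(self, name):
        block_numbers, length = self.table[name]
        raw = b"".join(self.device.read_block(n) for n in block_numbers)
        return raw[:length]  # drop the padding added on write


disk = BlockDevice(num_blocks=16)
fs = TinyFileSystem(disk)
fs.write_file("notes.txt", b"x" * 1000)  # 1000 bytes occupy two 512-byte blocks
```

A consumer of the block model calls write_block and read_block directly and must bring its own file system; a consumer of the file model only ever sees names and byte strings.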

File System Level Virtualization

What is a file system?
- A file system is a software layer responsible for organizing and policing the creation, modification, and deletion of files.
- File systems provide a hierarchical organization of files into directories and subdirectories.
- The B-tree algorithm facilitates more rapid search and retrieval of files by name.
- File system integrity is maintained through duplication of master tables, change logs, and immediate writes of file changes.

Different file systems
- In Unix, the superblock contains information on the current state of the file system and its resources.
- In Windows NTFS, the master file table contains information on all file entries and status.

File Metadata

The control information for file management is known as metadata.
- File metadata includes file attributes and pointers to the location of file data content.
- File metadata may be segregated from a file's data content.
- Metadata on file ownership and permissions is used in file access.
- File timestamp metadata facilitates automated processes such as backup and life cycle management.

Different file systems
- In Unix systems, file metadata is contained in the i-node structure.
- In Windows systems, file metadata is contained in records of file attributes.

Block Device Level Virtualization

Block device level virtualization is a low-level technique which creates a volume pool from a collection of drives. It presents virtualized storage primitives: a LUN (Logical Unit Number) and an offset within that LUN, which is known as a Logical Block Address (LBA). This is illustrated in the slide.

Block Device Level: Logical Unit and Logical Volume

Block level data: the file system block
- The atomic unit of file system management is the file system block.
- A file's data may span multiple file system blocks.

- A file system block is composed of a consecutive range of disk block addresses.

Data on disk
- Disk drives read and write data to media through cylinder, head, and sector geometry.
- Microcode on a disk translates between disk block numbers and cylinder/head/sector locations.
- This translation is an elementary form of virtualization.

Block device level interface: SCSI (Small Computer System Interface)
- The exchange of data blocks between the host system and storage is governed by the SCSI protocol.

Storage Interconnection

Drives are not always local to the server, and therefore a storage interconnection is used. This illustration shows that the path to storage includes multiple layers of physical and logical data transformation.
- The storage interconnection provides the data path between servers and storage.
- The storage interconnection is composed of both hardware and software components.

Approaches to Storage Virtualization

Abstracting physical storage: physical to virtual
- The cylinder, head, and sector geometry of individual disks is virtualized into logical block addresses (LBAs).
- For storage networks, the physical storage system is identified by a network address / LUN pair.
- Combining RAID and JBOD assets to create a virtualized mirror must accommodate performance differences.

Metadata integrity
- Storage metadata integrity requires redundancy for failover or load balancing.
- Virtualization intelligence may need to interface with upper-layer applications to ensure data consistency.

Host-based Virtualization

Important issues

Storage metadata servers
- Storage metadata may be shared by multiple servers.
- Shared metadata enables a SAN file system view for multiple servers.
- The metadata server provides virtual-to-real logical block address mapping for the client.
- A distributed SAN file system requires file locking mechanisms to preserve data integrity.

Host-based storage APIs
- These may be implemented by the operating system to provide a common interface to disparate virtualized resources.
- Microsoft's Virtual Disk Service (VDS) provides a management interface for dynamic generation of virtualized storage.

Host-based Virtualization: Example

An additional layer of abstraction and control, called the Logical Volume Manager (LVM), can be run on each host. This code runs on the host and front-ends all kinds of back-end storage resources. The use cases and the architecture are shown on the slide.

Host-based Virtualization: Pros and Cons

Host-based storage virtualization has become very popular in server machines because it:
- Has no additional hardware or infrastructure requirements
- Is simple to design and implement
- Improves storage utilization

However:
- Storage utilization is optimized only on a per-host basis
- The software implementation depends on each operating system
- Virtualization consumes CPU cycles

NFS, which is very popular, is a form of host-based storage virtualization.

Network-based Virtualization

The animation illustrates network-based virtualization. The fabric switch should provide:
- Connectivity for all storage transactions
- Interoperability between disparate servers, operating systems, and target devices

FAIS (Fabric Application Interface Standard)
- Defines a set of standard APIs to integrate applications and switches.
- FAIS separates control information and data paths.
- The control path processor (CPP) supports the FAIS APIs and the upper-layer storage virtualization application.
- The data path controller (DPC) executes the virtualized SCSI I/Os under the management of one or more CPPs.

Network-based Virtualization: Pros and Cons

Network-based virtualization comes with plus and minus factors as well. On the plus side:
- True heterogeneous storage virtualization
- No need to modify the host or the storage system
- Multi-path techniques improve access performance

However:
- Complex interoperability matrices, limited by vendor support
- Fast metadata updates are difficult to implement in a switch device
- It usually requires building specific network equipment (e.g., Fibre Channel)

IBM SVC (SAN Volume Controller) is an example.

Storage-based Virtualization

This animation illustrates how the different layers in a storage system perform their function. In the first part of the animation, one can see a mode where the underlying virtualized storage provides a virtual file system interface; the connected operating systems send files there to be saved. The virtualized storage takes care of replicating the file across actual drives in the cloud for high durability.

In the second part of the animation, one can see the mode where the underlying virtualized storage presents a block-level interface. Here, the application is running on an OS, which presents a local file system interface. The operating system breaks the save down, through the file system code, into a series of blocks which need to be written. In this case the blocks go to the virtualization layer, which stores and replicates at the block level.

Storage-based Virtualization: Pros and Cons

Storage-based virtualization is extremely useful. On the one hand, it:
- Provides most of the benefits of storage virtualization
- Reduces the additional latency added to individual I/O operations

However:
- Storage utilization is optimized only across the connected controllers
- Replication and data migration are only possible across the connected controllers and devices from the same vendor

In-band Virtualization

Storage virtualization can be implemented in a number of ways. The animation in this slide shows what is called in-band virtualization, also known as symmetric virtualization. The virtualization devices actually sit in the data path between the host and storage. Hosts perform I/O to the virtualized device and never interact with the actual storage device. While this approach is easy to implement, it scales poorly and the virtualization device becomes a bottleneck.
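
The in-band arrangement can be sketched as follows. This is a minimal illustration under stated assumptions: the class InBandVirtualizer, the use of plain dictionaries as stand-ins for storage arrays, and the "concatenate the arrays end to end" mapping are all invented for the example and do not describe any particular product. The point it shows is that every read and write passes through the appliance, which is also where the virtual-to-physical mapping lives.

```python
class InBandVirtualizer:
    """Toy in-band appliance: every I/O passes through it (hypothetical design)."""

    def __init__(self, backends, blocks_per_backend):
        self.backends = backends                  # stand-ins for real storage arrays
        self.blocks_per_backend = blocks_per_backend
        self.io_count = 0                         # all traffic is visible here

    def _locate(self, virtual_lba):
        # Assumed mapping: the arrays are concatenated into one address space.
        index, physical_lba = divmod(virtual_lba, self.blocks_per_backend)
        return self.backends[index], physical_lba

    def write(self, virtual_lba, data):
        self.io_count += 1
        backend, physical_lba = self._locate(virtual_lba)
        backend[physical_lba] = data

    def read(self, virtual_lba):
        self.io_count += 1
        backend, physical_lba = self._locate(virtual_lba)
        return backend[physical_lba]


array_a, array_b = {}, {}
appliance = InBandVirtualizer([array_a, array_b], blocks_per_backend=100)
appliance.write(5, b"first")     # lands on array_a, block 5
appliance.write(150, b"second")  # lands on array_b, block 50
```

Because the host only ever talks to the appliance, the design is simple, but the io_count counter hints at the scalability concern: the appliance handles the data of every request, not just the mapping.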

Out-of-band Virtualization

The animation in this illustration shows out-of-band virtualization, also known as asymmetric virtualization. The virtualization devices are sometimes called metadata servers. This approach requires additional software in the host, which first requests the location of the actual data. While it is a good architecture for scalability and performance, it is hard to implement.

Storage Types in Cloud (1)

Block storage

Block storage is a type of data storage where data is stored in blocks, also referred to as volumes. Each block is treated as an individual disk drive and can contain multiple files. In this way, block storage provides a good abstraction for physical storage devices and is well suited to most file systems. In cloud, a VM instance is often provisioned with attached block storage of the configured or requested size.

Object storage

Object storage is a storage architecture that manages data as objects. Each object contains data and metadata, and is accessed via a globally unique identifier, typically in the form of a URI or URL. Object storage systems use a namespace that is consistent across multiple physical devices. They usually include additional services such as data replication and distribution, and may also support application-specific access protocols and data management. As an example, object storage infrastructure is used by Dropbox for storing files and by Facebook for storing photos.

Storage Types in Cloud (2)

Bucket storage

Bucket storage is a storage organisation where data objects are stored in basic containers under a single global namespace, and where data can be accessed with their own methods. The bucket storage type is used by Amazon S3 and Google.

Blob storage

Blob storage represents a generic key-value data store, often designed for storing large data objects. A blob (short for binary large object) is a collection of binary data stored as a single entity in a database management system. Blob storage is used in the Microsoft Azure cloud.
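
The object and bucket ideas described above can be sketched as a small in-memory store. This is an illustrative toy under stated assumptions: the ObjectStore class, its method names, the "/bucket/key" identifier format, and the size and sha256 metadata fields are all invented for the example and are not the API of Amazon S3, Azure, or any other service.

```python
import hashlib


class ObjectStore:
    """Toy object store: buckets hold objects, each with data plus metadata."""

    def __init__(self):
        self.buckets = {}  # bucket name -> {key -> (data, metadata)}

    def create_bucket(self, bucket):
        self.buckets.setdefault(bucket, {})

    def put(self, bucket, key, data, **user_metadata):
        # Each object carries its data together with descriptive metadata.
        metadata = dict(user_metadata)
        metadata["size"] = len(data)
        metadata["sha256"] = hashlib.sha256(data).hexdigest()
        self.buckets[bucket][key] = (bytes(data), metadata)
        # The object is addressed by an identifier, not by a block address.
        return f"/{bucket}/{key}"

    def get(self, bucket, key):
        return self.buckets[bucket][key]


store = ObjectStore()
store.create_bucket("photos")
uri = store.put("photos", "2024/cat.jpg", b"\xff\xd8 fake image bytes",
                content_type="image/jpeg")
data, metadata = store.get("photos", "2024/cat.jpg")
```

Note that the key "2024/cat.jpg" only looks like a path; in this sketch the namespace inside a bucket is flat, and the slash is just part of the key.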

Virtualized File Systems for Cloud

We will look closer at some popular file systems for cloud.

Each of them has a number of benefits when implemented in a specific cloud environment.

Most applications need file systems because servers have file systems. Most file systems need block storage upon which to mount. It is therefore no surprise that file systems with underlying block storage are popular cloud storage options. A simple approach is to extend server-based or NAS-based technologies directly to the cloud for the block file systems; that is, tie a number of drives together, and perhaps access them across the network. As the slide lists, LVM, RAID, and NFS are all examples of virtualized file systems on block storage commonly found in smaller clouds.

Larger clouds use distributed file systems, which have a high degree of redundancy across the cloud they are serving. As the slide shows, they can be file based or object based, and there are many examples of both that are popular in cloud implementations.

HDFS (Hadoop Distributed File System) is specifically designed for large-scale data processing on massively parallel clusters. It can be used in cloud for high-performance data input/output, in particular for a CDN (Content Distribution Network).

Logical Volume Management (LVM)

Logical Volume Manager is very popular because it is commonly found in Linux.
- It implements the block-level, host-based virtualization approach.
- It allows disks to be added or replaced without downtime or service disruption.
- It supports file system extension and dynamic resizing, data backup, and the creation and dynamic resizing of logical volumes.
- It is suitable for managing large disk farms.

Logical Volume Management Architecture

This slide illustrates the Logical Volume Management architecture.
- Tools and utilities are in user space.
- The device mapper framework implements a Linux kernel driver for different mappings.

Logical Volume Management Implementation

As the slide presents it, the LVM project is implemented in two components:
- In user space, based on FUSE (Filesystem in Userspace)
- In kernel space, which implements the device mapper framework
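
The way a volume manager lets a logical volume grow across disks can be sketched with an extent table. This is a simplified model, not the LVM implementation: the classes VolumeGroup and LogicalVolume and their methods are hypothetical, the 4 MiB extent size is an assumed value chosen for the example, and allocation is a plain first-free policy. It shows a logical volume as an ordered list of fixed-size extents, each of which may live on a different physical disk.

```python
EXTENT_SIZE = 4 * 1024 * 1024  # assumed extent size of 4 MiB for this sketch


class VolumeGroup:
    """Pool of free extents contributed by physical volumes (toy model)."""

    def __init__(self):
        self.free_extents = []  # list of (physical volume name, extent index)

    def add_physical_volume(self, pv_name, size_bytes):
        count = size_bytes // EXTENT_SIZE
        self.free_extents.extend((pv_name, i) for i in range(count))


class LogicalVolume:
    """Virtual disk built from extents taken out of a volume group."""

    def __init__(self, vg):
        self.vg = vg
        self.extents = []  # position in this list = logical extent number

    def extend(self, size_bytes):
        needed = -(-size_bytes // EXTENT_SIZE)  # round up to whole extents
        if needed > len(self.vg.free_extents):
            raise ValueError("not enough free extents in the volume group")
        for _ in range(needed):
            self.extents.append(self.vg.free_extents.pop(0))

    def locate(self, logical_offset):
        """Translate a logical byte offset to (physical volume, byte offset)."""
        index, within = divmod(logical_offset, EXTENT_SIZE)
        pv_name, physical_extent = self.extents[index]
        return pv_name, physical_extent * EXTENT_SIZE + within


vg = VolumeGroup()
vg.add_physical_volume("disk_a", 8 * EXTENT_SIZE)
lv = LogicalVolume(vg)
lv.extend(8 * EXTENT_SIZE)                         # uses all of disk_a
vg.add_physical_volume("disk_b", 8 * EXTENT_SIZE)  # a disk is added later
lv.extend(2 * EXTENT_SIZE)                         # the volume grows onto disk_b
```

The consumer of the logical volume sees one contiguous address space throughout; only the extent table changes when a disk is added, which is why resizing does not require downtime in this model.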

LVM implementation using FUSE

This slide diagrams the Logical Volume Manager using Filesystem in Userspace. As the diagram shows, the key is that the loadable kernel module FUSE provides a bridge to the actual kernel interfaces.

LVM Implementation in Kernel Space

For performance reasons, an implementation of LVM in kernel space is also available. This slide describes the system calls involved when LVM is implemented this way.

Redundant Array of Independent Disks (RAID)

Another common scheme for virtualizing storage is RAID (Redundant Array of Independent Disks). RAID is a software layer which groups disks together and implements various levels of replication and distribution of data across the drives. RAID schemes provide different balances between the key goals of reliability, availability, performance, and capacity. The slide describes the differences between the common RAID schemes RAID0, RAID1, RAID1+0, RAID5, and RAID5+0.

Network File System (NFS)

File systems have characteristics which application developers have come to depend on. For example, when the write call returns from the kernel to the user application, the application assumes that the data is actually written, or at least that a subsequent read of the same data will return what was just written. These behavioral assumptions are part of the POSIX specification. When clusters of disks are used, and when file systems are exported across the network, it becomes challenging to live up to all of the POSIX file system requirements.

Network file systems which had correct behavior became very popular. The Sun Network File System (NFS) was one of the first and most reliable POSIX-compliant distributed file systems. As the slide lists, NFS is specified by a number of RFCs, and the protocols used to implement NFS are well known and understood. Over the years NFS has become a go-to network file system.
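
The read-after-write assumption described above can be written down as a small check. This is only a sketch of the application's expectation, run against whatever local file system holds the temporary directory; the function name write_then_read and the file name example.dat are invented for the example, and the explicit flush and fsync calls are an assumption about how a careful application would ask for the data to reach stable storage.

```python
import os
import tempfile


def write_then_read(path, payload):
    """Write payload, force it to storage, then read it back from the file."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()             # push Python's buffer to the operating system
        os.fsync(f.fileno())  # ask the OS to commit the data to the device
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "example.dat")
    payload = b"written by the application"
    result = write_then_read(path, payload)
```

On a local disk this expectation is easy to meet. The challenge the text describes arises when the same two steps run on different clients of a network or clustered file system, where caches on each machine must be kept consistent for the second step to see the first.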

Notably, NFS has been extended to support clustered server deployments, including scalable parallel access to files distributed among multiple servers (the pNFS extension). This technology is a key part of, for example, the IBM private cloud implementation.

Lustre

Lustre is a type of parallel distributed file system used for large amounts of data, and it can work with large computer clusters. The name Lustre is derived from two words, "Linux" and "cluster". Lustre is often used as a file system for supercomputers and multi-site computer clusters. A Lustre storage cluster may contain thousands of nodes and petabytes of storage. The Lustre architecture includes three main components: metadata servers that store file system information (files and directories) as well as access rights, object storage servers, and clients that access and use data. Lustre uses a unified namespace compatible with POSIX semantics.

Lustre File System Architecture Components

The main file system components inside Lustre are described in this slide. Note that one of the most significant elements of the design is the notion of many Object Storage Servers. This contributes to the scalability of the design.

Lustre Cluster at Scale

This slide illustrates the scalability design introduced in the previous slide. In general, high availability and high scalability concepts are both used in large deployments. Here a Lustre deployment at scale is illustrated, showing multiple networks between the clients and the Lustre cluster, and also the number of I/O servers (paired as failover groups).

Lustre File System in HPC: Examples

The high performance computing community has been working on pushing the limits of performance and scalability, and Lustre has achieved popularity within that community. Lustre has significant momentum in the HPC community and is actually the leading distributed file system for those systems, as the slide details. You can see the impressive scale-out and high performance numbers achieved.
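
One reason many object storage servers help with scale is that a single file can be spread across several of them. The sketch below assumes a simple round-robin striping layout; the function name locate_stripe and the example values for stripe_size and stripe_count are choices made for illustration, and the transcript itself does not spell out the layout, so treat the arithmetic as an assumption rather than a description of Lustre internals.

```python
def locate_stripe(offset, stripe_size, stripe_count):
    """Map a byte offset in a file to (storage target index, offset in its object).

    Assumes round-robin striping: stripe 0 goes to target 0, stripe 1 to
    target 1, and so on, wrapping around after stripe_count targets.
    """
    stripe_number = offset // stripe_size
    target_index = stripe_number % stripe_count
    # Each target holds every stripe_count-th stripe, packed back to back.
    object_offset = (stripe_number // stripe_count) * stripe_size + offset % stripe_size
    return target_index, object_offset


MIB = 1024 * 1024
first = locate_stripe(0, stripe_size=MIB, stripe_count=4)
second = locate_stripe(1 * MIB, stripe_size=MIB, stripe_count=4)
wrapped = locate_stripe(5 * MIB + 100, stripe_size=MIB, stripe_count=4)
```

With four targets, a client reading a large file sequentially touches all four servers in turn, so the aggregate bandwidth can grow with the number of servers instead of being limited by one of them.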

Ceph

Ceph is an example of a fully distributed storage architecture (with no central management) and a file system designed to integrate object, block, and file storage servers from a single distributed computer cluster. Ceph distributed object storage is built around the Reliable Autonomic Distributed Object Store (RADOS), which supports data replication. Ceph block storage can be directly mounted to a VM and provides automatic data replication across the storage cluster. The Ceph file system runs on top of the object or block storage and maps file names and directories across the RADOS cluster.

Ceph Architecture and Design

Ceph has three components:
- Clients: near-POSIX file system interface
- Cluster of OSDs: stores all data and metadata
- Metadata server (MDS) cluster: manages the namespace (file names)

It is designed for high availability and scalability using key design patterns:
- Separating data and metadata
- Dynamic distributed metadata management
- Reliable Autonomic Distributed Object Storage

Ceph Architecture

The illustration in this slide shows the Ceph architecture.
- Ceph separates data and metadata operations.
- A data/file request includes a request to the MDS to obtain the location of the file components/inodes and the metadata.

Ceph Operation on Request

Ceph uses an effective client synchronization model. The client makes a request to the Metadata Server, which translates the file name into an inode (inode number, file owner, mode, size, and so on). Then the CRUSH (Controlled Replication Under Scalable Hashing) module goes to work. CRUSH is a scalable pseudo-random data distribution function designed for distributed object-based storage systems. It maps objects to placement groups (PGs) using a simple hash function. With the inode number returned, the file data is mapped into objects. The client then accesses the Object Storage Device, as can be seen in the illustration.
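
The two-step placement described above (object to placement group, then placement group to storage devices) can be sketched as follows. This is explicitly not the CRUSH algorithm: the second step here uses a simple highest-hash ranking as a stand-in for CRUSH, and the function names, the object name "inode1234.chunk0", the placement group count of 64, and the replica count of 3 are all assumptions made for the example. What it shares with the real design is that any client can compute the placement on its own, without asking a central lookup table.

```python
import hashlib


def stable_hash(text):
    """Deterministic hash that gives the same value on every client."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def object_to_pg(object_name, pg_count):
    # Step 1: a simple hash maps the object name to a placement group.
    return stable_hash(object_name) % pg_count


def pg_to_osds(pg_id, osds, replicas):
    # Step 2 (stand-in for CRUSH): rank the devices by a per-PG hash and
    # keep the top ones, so the choice is pseudo-random but repeatable.
    ranked = sorted(osds, key=lambda osd: stable_hash(f"{pg_id}:{osd}"), reverse=True)
    return ranked[:replicas]


def locate(object_name, pg_count, osds, replicas=3):
    pg_id = object_to_pg(object_name, pg_count)
    return pg_id, pg_to_osds(pg_id, osds, replicas)


osds = [f"osd.{i}" for i in range(6)]
pg_id, targets = locate("inode1234.chunk0", pg_count=64, osds=osds)
```

Grouping objects into placement groups first means the cluster tracks placement for a fixed number of groups rather than for every individual object.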

Gluster

The Gluster storage and file system is an open source platform for scale-out public and private cloud storage. Similar to Lustre, the Gluster name is derived from two words, "GNU" and "cluster". Gluster aggregates heterogeneous storage servers connected over an Ethernet or InfiniBand network. The Gluster file system provides simple functionality and leaves all file management functionality to the clients.

Gluster Architecture

This slide illustrates the Gluster architecture. Gluster accesses a variety of physical storage devices, from Direct Attached, to JBOD, to SAN, and creates a global namespace across them. It also puts a virtualization layer across them, providing a variety of file system models, such as NFS, CIFS, or WebDAV, up to the clients.

Gluster has been widely used in many cloud distributions. The slide lists many of them.
- Gluster is the standard storage system used in Red Hat's OpenStack distribution.
- One can also use Gluster easily in Amazon with the available AMI.

Hadoop Distributed File System (HDFS)

The Hadoop Distributed File System (HDFS) is a very different sort of file system, optimized for a specific class of applications: MapReduce and similar Big Data systems such as NoSQL databases. HDFS is:
- A scalable distributed file system for large-scale data analysis
- A part of the open source Apache Hadoop suite
- The primary storage used by Hadoop MapReduce applications
- Able to run on commodity hardware while providing high fault tolerance

HDFS Architecture

An HDFS cluster consists of a single master node/server that runs the NameNode, and multiple DataNodes, usually one per physical node in the Hadoop cluster. User data is stored in files, which are exposed externally through a namespace managed by the NameNode. To access a file, a user client needs to request the file location or metadata from the NameNode, and after that it can send read or write requests to the DataNode directly. DataNodes create data blocks and perform replication based on instructions from the NameNode.
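
The access pattern just described, where the client asks the NameNode for locations and then exchanges data with DataNodes directly, can be sketched in memory. This is a toy model and not the Hadoop API: the classes, the functions client_write and client_read, the 4-byte block size, the replication factor of 2, and the rotating placement policy are all assumptions chosen to keep the example small.

```python
import itertools


class DataNode:
    """Toy DataNode: stores blocks and forwards copies along a pipeline."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def store(self, block_id, chunk, pipeline):
        self.blocks[block_id] = chunk
        if pipeline:  # replicate to the remaining nodes chosen by the NameNode
            pipeline[0].store(block_id, chunk, pipeline[1:])


class NameNode:
    """Toy NameNode: holds only metadata, never file contents."""

    def __init__(self, datanodes, block_size=4, replication=2):
        self.datanodes = datanodes
        self.block_size = block_size
        self.replication = replication
        self.namespace = {}  # path -> list of (block id, [DataNode, ...])
        self._ids = itertools.count()
        self._next = 0

    def allocate(self, path, length):
        layout = []
        for _ in range(0, length, self.block_size):
            # Assumed policy: rotate through the DataNodes for each new block.
            chosen = [self.datanodes[(self._next + r) % len(self.datanodes)]
                      for r in range(self.replication)]
            self._next += 1
            layout.append((next(self._ids), chosen))
        self.namespace[path] = layout
        return layout

    def lookup(self, path):
        return self.namespace[path]


def client_write(namenode, path, data):
    size = namenode.block_size
    for i, (block_id, nodes) in enumerate(namenode.allocate(path, len(data))):
        nodes[0].store(block_id, data[i * size:(i + 1) * size], nodes[1:])


def client_read(namenode, path):
    # Metadata comes from the NameNode; the bytes come straight from DataNodes.
    return b"".join(nodes[0].blocks[block_id]
                    for block_id, nodes in namenode.lookup(path))


nodes = [DataNode(f"dn{i}") for i in range(3)]
namenode = NameNode(nodes)
client_write(namenode, "/logs/a.txt", b"0123456789")
```

The NameNode in this sketch never touches file contents, which mirrors the out-of-band arrangement discussed earlier in the tutorial: the metadata path and the data path are separate.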

Summary and Take Away

This tutorial has explored the various ways that storage is virtualized and implemented for large-scale distributed systems, including Cloud. We explored which storage primitive is virtualized, and saw that some systems concentrate on virtualizing files while others concentrate on virtualizing blocks. We saw that the virtualization function can run in a variety of places in the architecture: in the host, in the network, or all the way back where the drives are. We saw that virtualization can be placed in band with the storage operations and that, for scale, it is usually placed out of band.

There are many types of storage primitives which, after all the virtualization has occurred, end up being exposed to applications: objects (buckets, blobs), blocks, or file systems. There are many ways to layer the file system to take advantage of the virtualization and replication in a cluster. The capability can be close to the operating system, as with LVM, or across the network, as with NFS. Finally, we took a close look at several of the new file systems optimized for managing heterogeneous cloud storage farms.