Software Defined Storage (SDs) and its Application to an OpenStack Software Defined Infrastructure (SDi) Implementation

Delivering Operational Efficiency to the Cloud

This paper discusses how data centers offering a cloud computing service can deal with the complexities of scaling their cloud services platform to many thousands of end users while retaining the ability to offer each end user a customised, operationally efficient service.
Contents

Executive Summary
Background
Datacentre Scale
Self Service Semantics
The End User Vision of the Cloud
Self Service Use Cases
Rapid Elasticity
Resource Pooling & Multitenancy
Metered Service
SDi Controllers
Control API
Resource Allocator
Device Manager Implementation
Virtual Device Managers
Cloud Software Defined Infrastructure (SDi)
SDi and OpenStack
SDi Life Cycle
Orkestra: MPSTOR's SDi Implementation for OpenStack
About the Author
Executive Summary

This paper discusses how data centers offering a cloud computing service can deal with the complexities of scaling their cloud services platform to many thousands of end users while retaining the ability to offer each end user a customised, operationally efficient service. The answer lies in providing the end user of the service with the ability to self-serve and right-size his IT resources through a dashboard which automates the provisioning of those resources. The challenge, however, is increased if the targeted end users for the service vary from non-technical office workers, who need simple dashboard semantics, to the more technically competent IT administrators, who need to configure more complex IT networks, remote from the data center. The answer lies in a new paradigm, called Software Defined Infrastructure (SDi), which abstracts the user interface from the physical infrastructure and which automatically translates the end user requests into virtual machines customised to the end user requirements.

The three main components of SDi are software defined compute (SDc), networking (SDn) and storage (SDs), each responsible for abstracting the control of its own specific resource from the physical device that provides it. SDc, previously called server virtualization, has been used by data centers for many years to increase server densities and to reduce capex and power consumption costs; however, SDc on its own cannot easily scale if the networking and storage cannot be virtualized. A number of SDn solutions have recently been introduced which promise to greatly simplify network scaling, so that today storage virtualization, or SDs, is the only significant remaining challenge to solve to make SDi a reality.

MPSTOR, with over sixty man-years of storage array design experience, has been developing Software Defined Storage (SDs) for the past three years under the banner of self-organizing storage, a concept it designed and patented for its open-platform, open-API storage array management software. MPSTOR has adapted its SDs to solve the complexities of provisioning storage, from multiple storage tiers, across real and virtual networks, to real and virtual servers. In order to demonstrate the scalability, flexibility and operational efficiency of its SDs, MPSTOR has integrated it with OpenStack's SDc (Nova) and SDn (Quantum) to provide an automated SDi solution, called Orkestra™.
Background

The single biggest challenge cloud computing service providers face today is how to scale their cloud services platform while maintaining the flexibility and operational efficiency to deliver on-demand IT solutions: solutions comprised of servers, implemented as virtual or real machines, providing compute, storage and networking services customized to each individual end user's needs. Cloud computing consumers now demand the ability to right-size their IT solution by purchasing only the exact amounts of IT resources, at defined SLAs, required to run their business. Users expect the service provider infrastructure to deliver these resources elastically in response to their changing needs. There is no cloud services infrastructure available today capable of simultaneously delivering scalability, flexibility and high operational efficiency. The infrastructure required to deliver this capability needs to be fully automated and software configurable. This new paradigm is now referred to as Software Defined Infrastructure (SDi), and it manages the three main categories of IT resources in cloud infrastructure that make up an IT solution: compute, network and storage. The drivers for this new paradigm are data center scale and the increasing need to emulate more complex IT infrastructures in the cloud.

Datacentre Scale

Before the introduction of server virtualization, the limits to scaling IT infrastructure were dictated by physical constraints such as space and power availability. Server virtualization, along with the introduction of multi-core processors, has facilitated a 100-fold increase in server densities, to the point where it is now possible to have more than 5,000 virtual servers in a single 19-inch rack. The ability to quickly create, deploy and manage many thousands of virtual servers, referred to as Software Defined Compute (SDc), has been the key enabler of the growth in cloud computing to date. Virtualizing only the compute resource and not the network or storage resources, however, becomes increasingly complex as the cloud infrastructure scales, and it ultimately becomes a bottleneck to further scalability. Only through the virtualization of all three resources can a scalable software defined cloud infrastructure be created.

As the software defined infrastructure grows, its value to the user moves from the virtual resources into the infrastructure management software that manages the automated provisioning of resources for the end user applications. The automation prevents the datacenter administrator becoming the bottleneck to growth, and it ensures that the cloud service can be delivered quickly, flexibly and efficiently, increasing responsiveness, asset utilization and operational gross margins. Nicira (recently acquired by VMware) has proposed a Software Defined Networking (SDn) solution that is seen as a way of managing the complexities of networking and network switching through network virtualization. SDi (Fig. 1) is an evolving paradigm, and many cloud stacks today fail to provide the Software Defined Storage (SDs) component, which is needed to manage
the virtualization, automation and provisioning of storage, from multiple storage tiers to virtual servers, across virtual networks. Without SDi, the growing number of devices under management (inventory, logical and physical) becomes impossible for a classical IT manager to administer, and this drives the vision of an end-user-managed solution.

The End User Vision of the Cloud

On-demand Self Service
Rapid Elasticity
Ubiquitous Network Access
Resource Pooling & Multitenancy
Metered Service

Fig. 1

Self Service Semantics

As IT organizations migrate to the cloud, the issue of on-demand self service will become increasingly important. On-demand self service means different things to the different users of the service, and hence the semantics required to meet the demands of these different users will vary. Fig. 2 shows the use cases that need to be covered.

Self Service Use Cases

1) User Dashboard Semantics, driven by user needs and requirements
2) Advanced Dashboard Semantics, which allow an IT user to specify policy management for the SDi device managers (e.g. snapshot frequency, QoS, media tier management)
3) Device Manager Semantics, such as a Virtual Storage Array or network switch

Fig. 2
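As a concrete illustration of how requests at these three semantic levels might differ, the minimal Python sketch below shows what each could look like in practice. The field names, values and payload shapes are illustrative assumptions only, not the semantics of any particular dashboard or product.

```python
# Illustrative sketch only: hypothetical payloads for the three self-service
# semantic levels described above. Field names are assumptions, not a real API.

# 1) User dashboard semantics: the user states requirements, not devices.
user_requirement_request = {
    "vm_name": "web-frontend",
    "cpu_cores": 2,
    "memory_gb": 4,
    "storage_gb": 100,
    "storage_sla": "MEDIUM",      # translated by SDi into tier/BW/IO settings
    "network": "internet_facing",
}

# 2) Advanced dashboard semantics: an IT user sets policies for the SDi
#    device managers rather than describing a single VM.
group_policy_request = {
    "policy_group": "finance-dept",
    "snapshot_frequency": "DAILY",
    "qos_io_limit": 500,          # IO/sec
    "media_tier": "FAST",
}

# 3) Device manager semantics: direct management of a virtual device,
#    e.g. creating a volume on a Virtual Storage Array.
device_manager_request = {
    "device": "virtual_storage_array",
    "operation": "create_volume",
    "volume_name": "db-data-01",
    "size_gb": 200,
    "export_protocol": "iSCSI",
}

if __name__ == "__main__":
    for level, req in (("User dashboard", user_requirement_request),
                       ("Advanced dashboard", group_policy_request),
                       ("Device manager", device_manager_request)):
        print(f"{level} request: {req}")
```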
We can consider two types of users of a cloud service: standard users and IT admin users. Standard users interface with their IT solution providers using requirement-type semantics. These semantics drive the provisioning, creation and configuration of the devices needed by the user, and allow the user to right-size the CPU, network and storage dimensions of his server, as shown in Fig. 3.

Fig. 3

Requirement semantics are important in that they allow the number of cloud users to scale without the user being an IT expert or IT administrator. The traditional role of the IT engineer was to administer hardware devices through the use of configuration files and management tools with low-level access to the managed devices. As the scale of datacenter infrastructure has increased to include thousands of managed CPU cores and many petabytes of data storage, spread across multiple disk tiers and all linked together by complex networks, the scale of the associated management task has grown exponentially. A new way of managing this scaled-out architecture is required, one that is driven not by IT administrators but by intelligent software. This new paradigm of Software Defined Infrastructure (SDi) interprets the user requirement semantics and implements the device provisioning, creation and configuration without the need for an IT administrator.

The IT administrator, remote from the data center, is still required when more complex IT configurations are needed which the user dashboard semantics cannot fulfil. The advanced semantics give the IT administrator very similar functionality to what he had with his own physical, on-premises hardware. The essential difference is that the physical device allocation is determined by the datacenter SDi layers and not by the user or IT administrator. Two classes of semantics are available to the IT administrator: i) group policy management and ii) storage/compute/network device management. Group policy management allows the IT user to set the policies used in each of the SDi controllers. For storage this could include:

Thin Provisioning Strategy (ALWAYS, NEVER)
Media Tier (SLOW, MEDIUM, FAST)
BW throttle (0-100 MB/sec)
IO throttle (0-1000 IO/sec)
Data protection level (LOW, MEDIUM, HIGH)
Controller redundancy level (LOW, MEDIUM, HIGH)
Snapshot Frequency (DAILY, WEEKLY, MONTHLY, NEVER)
Replication Frequency (DAILY, WEEKLY, MONTHLY, NEVER)
Default Export Protocol (iSCSI, CIFS, NFS)
Default Fabric (GIGE, 10GIGE, SAS, FIBRE CHANNEL)
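A minimal sketch of how such a storage group policy could be represented in software is shown below. The enumerations mirror the values listed above, while the class and field names are illustrative assumptions rather than any product's actual API.

```python
# Illustrative sketch of a storage group policy using the value ranges listed
# above. Names are assumptions for illustration, not a real SDs controller API.
from dataclasses import dataclass
from enum import Enum


class ThinProvisioning(Enum):
    ALWAYS = "ALWAYS"
    NEVER = "NEVER"


class MediaTier(Enum):
    SLOW = "SLOW"
    MEDIUM = "MEDIUM"
    FAST = "FAST"


class Level(Enum):            # used for data protection and controller redundancy
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"


class Frequency(Enum):        # used for snapshot and replication schedules
    DAILY = "DAILY"
    WEEKLY = "WEEKLY"
    MONTHLY = "MONTHLY"
    NEVER = "NEVER"


@dataclass
class StorageGroupPolicy:
    thin_provisioning: ThinProvisioning = ThinProvisioning.ALWAYS
    media_tier: MediaTier = MediaTier.MEDIUM
    bw_throttle_mb_s: int = 100       # 0-100 MB/sec
    io_throttle_io_s: int = 1000      # 0-1000 IO/sec
    data_protection: Level = Level.MEDIUM
    controller_redundancy: Level = Level.MEDIUM
    snapshot_frequency: Frequency = Frequency.WEEKLY
    replication_frequency: Frequency = Frequency.NEVER
    default_export_protocol: str = "iSCSI"   # iSCSI, CIFS or NFS
    default_fabric: str = "10GIGE"           # GIGE, 10GIGE, SAS, FIBRE CHANNEL

    def __post_init__(self):
        # Enforce the throttle ranges given in the policy list above.
        if not 0 <= self.bw_throttle_mb_s <= 100:
            raise ValueError("BW throttle must be 0-100 MB/sec")
        if not 0 <= self.io_throttle_io_s <= 1000:
            raise ValueError("IO throttle must be 0-1000 IO/sec")


if __name__ == "__main__":
    policy = StorageGroupPolicy(media_tier=MediaTier.FAST,
                                snapshot_frequency=Frequency.DAILY)
    print(policy)
```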
If policy management does not allow enough control, then the IT user should be able to manage his SDi controllers through a device manager such as a Virtual Storage Array or Virtual Network Switch.

Rapid Elasticity

SDi is implemented in software; it therefore fulfils the requirement of rapid elasticity, as it obviates the need for an IT administrator and can provision the user's requirements in real time.

Resource Pooling & Multitenancy

SDi is a datacenter-wide framework that understands device inventory, utilization and topology. SDi uses this knowledge to manage the resource pooling and multitenancy requirements of the cloud. A common management database (CMDB) captures the static and dynamic blueprint of the datacenter and allows the SDi layers to choose the appropriate devices from the device pools to meet the user SLAs.

Metered Service

Since the SDi framework manages the device levels, it can monitor, and in many cases throttle, the traffic speeds and resource feeds used by each tenant. This data is stored in the CMDB and can be visualized by the user or datacenter administrator for each resource, or exported to a billing engine.

SDi Controllers

On-demand self service requires semantics and APIs so that the IT user and standard user can define their IT resource requirements and manage their IT resources. The need for rapid elasticity requires that these APIs drive automated management layers to implement the user requests. Ubiquitous network access requires complex networks to be easily set up, managed and torn down. Multitenancy requires that the automated management layers understand and control how multiple users share the same resource. Sharing a resource should not come at the cost of losing performance, so the management layers must understand how to implement QoS (Quality of Service) and SLAs (Service Level Agreements).

Virtual machines can be viewed as one of the products produced by the data center factory. The efficient provisioning of customized virtual machines, with the user's required levels of IT resources, at scale, is one of the most challenging processes which must be run in this factory. By efficient provisioning we mean the execution of software processes that translate the VM user's instructions to right-size and customize his VM with the required amount of resources.
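To make the idea of efficient provisioning concrete, the sketch below shows, under assumed names, how an SDi layer might translate a single right-sizing request into calls on the three software defined controllers. The controller classes and methods are hypothetical placeholders, not the interfaces of any particular cloud stack.

```python
# Hypothetical sketch: translating a user's right-sizing request into calls on
# the three SD controllers. Class and method names are illustrative only.
from dataclasses import dataclass


@dataclass
class VMRequest:
    name: str
    cpu_cores: int
    memory_gb: int
    storage_gb: int
    storage_sla: str      # e.g. "FAST", "MEDIUM", "SLOW"
    network: str          # e.g. a tenant network name


class ComputeController:          # SDc stand-in
    def create_instance(self, name, cpu_cores, memory_gb):
        print(f"[SDc] instance {name}: {cpu_cores} cores, {memory_gb} GB RAM")
        return {"instance": name}


class NetworkController:          # SDn stand-in
    def attach_to_network(self, instance, network):
        print(f"[SDn] attach {instance['instance']} to network {network}")


class StorageController:          # SDs stand-in
    def create_volume(self, size_gb, sla):
        print(f"[SDs] volume of {size_gb} GB at SLA {sla}")
        return {"volume_gb": size_gb}

    def attach_volume(self, instance, volume):
        print(f"[SDs] soft-wire volume to {instance['instance']}")


def provision_vm(req: VMRequest, sdc, sdn, sds):
    """Automated provisioning flow: no IT administrator in the loop."""
    instance = sdc.create_instance(req.name, req.cpu_cores, req.memory_gb)
    sdn.attach_to_network(instance, req.network)
    volume = sds.create_volume(req.storage_gb, req.storage_sla)
    sds.attach_volume(instance, volume)
    return instance


if __name__ == "__main__":
    request = VMRequest("web-01", cpu_cores=2, memory_gb=4,
                        storage_gb=100, storage_sla="MEDIUM", network="tenant-a")
    provision_vm(request, ComputeController(), NetworkController(), StorageController())
```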
A virtual machine is an aggregate of a number of devices, namely CPU/memory, network and storage, running an OS/application suite. Devices are managed by device managers (DMs): for example, a storage array is a disk device manager, a hypervisor such as KVM is a CPU/memory device manager, and a switch is a networking device manager. Creating, building and launching a VM from a user request requires additional software layers that manage these device managers. This layer is called the Software Defined Infrastructure (SDi) layer. The SDi layer has three main controllers, which are connected together by soft wiring (Fig. 4):

Software Defined Compute (SDc)
Software Defined Networking (SDn)
Software Defined Storage (SDs)

Each software defined controller has three components:

Control API
Resource Allocator
Device Manager Implementation (of storage, compute and networking devices)

Fig. 4

Control API

The control API provides the interface that allows a user to specify the characteristics of the devices used in building his VM.

Resource Allocator

The resource allocator has a global view of the dynamic and static behavior of the class of devices it manages. In this sense the resource allocator is fully distributed over all the devices that it manages. The static and dynamic views of the datacenter devices are centralized in the CMDB, allowing resource requests to be made based on appropriate information.

Device Manager Implementation

The device manager implementation provisions from the device pools and connects the VM devices together using soft wiring through the available datacenter networks and fabrics. These devices are typically linked together across high-speed fabrics, and as the number of devices increases, the complexity of linking them together becomes the dominant technological challenge. Additional properties for each device type usually need to be configured, e.g. storage size and allocated bandwidth (in IO/sec or MB/sec) for a volume, as well as the various management policies relating to snapshots, replication, thin provisioning, etc.

SDi in a cloud IaaS environment is achieved using the services of the resource controllers, which have APIs that allow resources to be provisioned from resource pools. Resource pools
group together ICT devices such as CPU, memory, storage and networking. Resource controllers allow the properties of a provisioned resource to be managed by the user of the resource without needing to understand the details of the devices managed by the controller. For example, storage resource controller properties include volume size, QoS, resiliency level and BW & IO limits; a user who wants to set the BW limit can do so without needing to understand how that is implemented at the device level. The resource controller API allows an application to request a resource from a resource pool and to specify the properties required of that resource. Devices with the correct properties can be selected, provisioned and routed to the resource requester.

This ability of the resource controller to set the properties of compute, network and storage allows virtual machines to be right-sized for the application they are being used for. Right-sizing means setting the correct amount of memory and CPU cores, and specifying not just the amount of storage required but also the BW/IO SLA and the level of resiliency at the disk and storage array. It is the responsibility of the resource controller to understand the underlying device layer and to correctly provision and configure the resource requested by the user.

Fig. 5 shows the example of an SD storage controller. The control API allows the user dashboard to request a resource; the resource allocator has a datacenter-wide view of the devices it manages, stored in the CMDB, and understands the current static and dynamic properties of those devices. Using this CMDB data and the policies that have been set up by the datacenter, it can select the most appropriate resource that satisfies the constraints of the CMDB data and the user request.

Fig. 5

Virtual Device Managers

We have seen that the standard user semantics may not be flexible enough for IT users to implement their IT configuration. In this case a set of Virtualised Device Managers (VDMs) is required. A VDM for storage would be a Virtual Storage Array (VSA), which would allow an IT user to manage his storage using a standard storage array paradigm. In a VSA configuration, storage is allocated using the SDs controller, either as volumes from the disk pool or as a set of single disks from the storage pool. This storage is then managed by the VSA and exported over the tenant network to all the virtual machines or services using the storage. The VSA volumes can be exported over CIFS, NFS, iSCSI or other network protocols.
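Returning to the storage controller of Fig. 5, the sketch below illustrates the kind of constraint resolution the resource allocator performs: it filters the CMDB's device inventory against the group policy and the user request and picks the best remaining candidate. The CMDB schema and the selection rule are simplified assumptions for illustration, not the behaviour of any specific allocator.

```python
# Illustrative sketch of the Fig. 5 flow: the resource allocator selects a
# storage device from the CMDB that satisfies both policy and user request.
# The CMDB schema and scoring rule are simplified assumptions.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StorageDevice:            # one CMDB inventory record per device/pool
    name: str
    media_tier: str             # SLOW, MEDIUM, FAST
    free_gb: int
    free_io_s: int              # unreserved IO/sec headroom (dynamic data)
    fabric: str                 # GIGE, 10GIGE, SAS, FIBRE CHANNEL


def allocate(cmdb: List[StorageDevice], size_gb: int, io_s: int,
             policy_tier: str, policy_fabric: str) -> Optional[StorageDevice]:
    """Return a device that satisfies all constraints, preferring the tightest fit."""
    candidates = [d for d in cmdb
                  if d.media_tier == policy_tier
                  and d.fabric == policy_fabric
                  and d.free_gb >= size_gb
                  and d.free_io_s >= io_s]
    if not candidates:
        return None
    # Simple placement heuristic: choose the tightest fit to preserve
    # large contiguous capacity for later requests.
    return min(candidates, key=lambda d: d.free_gb)


if __name__ == "__main__":
    cmdb = [
        StorageDevice("array-ssd-1", "FAST", free_gb=800, free_io_s=20000, fabric="10GIGE"),
        StorageDevice("array-ssd-2", "FAST", free_gb=300, free_io_s=5000, fabric="10GIGE"),
        StorageDevice("array-sata-1", "SLOW", free_gb=5000, free_io_s=2000, fabric="GIGE"),
    ]
    chosen = allocate(cmdb, size_gb=200, io_s=1000,
                      policy_tier="FAST", policy_fabric="10GIGE")
    print("allocated from:", chosen.name if chosen else "no device satisfies constraints")
```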
Cloud Software Defined Infrastructure (SDi)

SDi is the emerging paradigm for implementing the scale-out datacentre infrastructure used in cloud Infrastructure as a Service (IaaS). IaaS typically provides a secure managed tenant space, within which each tenant can create and manage their IT resources: typically compute, data storage and networking devices. A tenant space is a logical space using devices from multiple hardware pools from any location within the datacentre. The ability to provision a device and provide routed access to that device is a key aspect of SDi. Devices used by a tenant do not require permanent fixed mappings to physical devices; when an on-demand resource is instantiated, the allocation and mapping are created from the available pool of devices. An example of this is a virtual machine, which may be instantiated on one machine but re-started or migrated for load balancing purposes to another server in the datacenter. A tenant space groups all the allocated resources into a single managed, metered and monitored network.

SDi and OpenStack

OpenStack is a cloud operating system, which means that it is the software that manages the computing resources in a cloud data center. To achieve this goal, OpenStack promotes a disaggregated resource model of three independent device controllers (Fig. 6):

Network (Quantum)
Storage (Cinder)
Compute (Nova)

Fig. 6

Each device controller is composed of three parts:

The Control API
Resource Allocator
Device Manager
ISVs implement the resource allocators and device managers in which they are specialised. A storage array expert, for example, may implement a device manager for storage arrays, where the storage resource controller would understand how to implement the following in software:

Storage volume provisioning.
Have a datacenter-wide view of the storage devices, their topology and their static and dynamic usage.
Have a policy engine that can provision storage based on the CMDB values and the storage policies that have been set by the datacenter administrator.
Manage the storage arrays, create RAIDs and collect data to be stored in the CMDB.
Create, delete and edit storage volumes and export those volumes over the required fabric protocol, providing access to the required host OS on that fabric (automated SAN management).
Provide access from the device to the guest OS within the tenant space (which is the user of the resource).

The high-level request to the resource controller from the user dashboard needs additional qualifiers to correctly manage the devices in the datacenter. For example, a set of storage devices managing multiple storage tiers needs policies to decide which tier to allocate from, over which fabric, and what bandwidth to allocate to each volume. By setting policies for each storage group, the resource allocator can resolve the datacenter, device pool and user request constraints. The storage policy allows the datacenter to offer the user policies which control the management of the storage:

Size of the Volume
Thin Provisioning Strategy (ALWAYS, NEVER)
o Initial reservation (GB)
o Reprovision threshold (%)
o Reprovision size (GB)
Media Tier (SLOW, MEDIUM, FAST)
BW throttle (0-100 MB/sec)
IO throttle (0-1000 IO/sec)
Data protection level (LOW, MEDIUM, HIGH)
Controller redundancy level (LOW, MEDIUM, HIGH)
Snapshot Frequency (DAILY, WEEKLY, MONTHLY, NEVER)
o Retention strategy (keep 1, 2 or 3 snapshots)
Replication Frequency (DAILY, WEEKLY, MONTHLY, NEVER)
o Replication properties TBD
Default Export Protocol (Block, CIFS, NFS)
Default Fabric (GIGE, 10GIGE, SAS, FIBRE CHANNEL)
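In an OpenStack deployment, one plausible way to surface such per-group storage policies is through Cinder volume types and their extra specs, which the scheduler and backend drivers can consult. The mapping below is only a sketch: the extra-spec keys are invented for illustration (they are not standard Cinder keys), and attaching them to a real volume type would be done through the Cinder API or client.

```python
# Sketch: flatten a storage policy (as listed above) into key/value pairs of
# the kind that could be attached to a Cinder volume type as extra specs.
# The keys below are illustrative assumptions, not standard Cinder spec names.

def policy_to_extra_specs(policy: dict) -> dict:
    """Map a storage policy dict onto flat string key/value pairs."""
    return {
        "mpstor:thin_provisioning": policy["thin_provisioning"],     # ALWAYS / NEVER
        "mpstor:media_tier": policy["media_tier"],                   # SLOW / MEDIUM / FAST
        "mpstor:bw_throttle_mb_s": str(policy["bw_throttle_mb_s"]),
        "mpstor:io_throttle_io_s": str(policy["io_throttle_io_s"]),
        "mpstor:snapshot_frequency": policy["snapshot_frequency"],
        "mpstor:replication_frequency": policy["replication_frequency"],
        "mpstor:export_protocol": policy["export_protocol"],         # Block / CIFS / NFS
        "mpstor:fabric": policy["fabric"],
    }


if __name__ == "__main__":
    gold_policy = {
        "thin_provisioning": "ALWAYS",
        "media_tier": "FAST",
        "bw_throttle_mb_s": 100,
        "io_throttle_io_s": 1000,
        "snapshot_frequency": "DAILY",
        "replication_frequency": "WEEKLY",
        "export_protocol": "Block",
        "fabric": "10GIGE",
    }
    for key, value in policy_to_extra_specs(gold_policy).items():
        print(f"{key}={value}")
    # In a live deployment these pairs could be set on a volume type
    # (e.g. "gold") so that volumes created with that type carry the
    # policy down to the storage device manager.
```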
SDi Life Cycle

The SDi life cycle (Fig. 7) has five phases:

Start of Day: automatic discovery of new nodes and their role assignment and configuration.
Steady State: the use of dashboards and virtual DMs by the cloud users.
Perturbation: node loss or node overload requiring a reconfiguration of the cloud.
Re-configuration: reconfiguration of the remaining nodes to provide continued service.
Re-balance: adjustment of the policies of the cloud services to provide continued user SLAs.

Fig. 7

Orkestra: MPSTOR's SDi Implementation for OpenStack

MPSTOR's distribution of OpenStack (Fig. 8), branded Orkestra™, is a bootable SDi platform based on Linux. It integrates all the components required for OpenStack to function, as well as all the tools necessary to run an SDi platform. Orkestra is downloaded from the MPSTOR website as an ISO image and burned to bootable media, such as a USB key or DOM device, to create a plug-and-boot SDi platform. Once booted, the user can continue to run the SDi platform from the USB key or install it to disk. Orkestra™ integrates MPSTOR's core IP in automated SDs (branded Provizo™) and in block storage management software (branded MPStackware™) with OpenStack to provide the full suite of SDi capability described earlier in this white paper. In the Orkestra SDi implementation, SDc and SDn functionality is provided by OpenStack Nova and Quantum respectively.

Fig. 8

About the Author

William Oppermann, founder of MPSTOR, has extensive experience in the ICT industry, in particular the enterprise data storage industry, and has worked in new product development for over 20 years. William holds a B.E. and an M.Eng.Sc. in Engineering, as well as a degree in International Sales from DIT.