Introduction to Cloud: Cloud and Cloud Storage
Lecture 2
Dr. Dalit Naor, IBM Haifa Research, Storage Systems
Advanced Topics in Storage Systems for Big Data - Spring 2014, Tel-Aviv University
http://www.eng.tau.ac.il/semcom

Content
- What is the Cloud paradigm
- Cloud principles and virtualization
- Cloud Storage and cost models
- How is it done?
  - Cloud-based file systems
  - Cloud object stores

What is a cloud and why is it interesting?
"Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction."
Source: US National Institute of Standards and Technology, Information Technology Laboratory
Key features of cloud:
- On-demand
- Shared
- Automated
- Network access
Benefits of cloud:
- Speed and agility
- Cost savings
  - Economies of scale, utilization improvement and standardization
  - Pay-as-you-go for usage

Common Themes in all Definitions
- Infrastructure as-a-service
- Pay per use model: utility computing
- Scale/Elasticity
  - Scale up and scale down (!!)
- Ease of use, management
  - Highly automated management of resource pools
- Lower cost through economies of scale

Public and Private Clouds: Business and Operational Models
Public Cloud
- Owned and operated by companies that offer computing resources to others
- Used as pay-as-you-go
- No need to own hardware or software -> OPEX vs CAPEX
- Examples: Amazon Web Services, IBM SoftLayer, Microsoft Azure, Google AppEngine
Private Cloud
- Owned and operated by a single company for its internal use
- Internal datacenters
- Takes advantage of the cloud's efficiencies, such as elasticity, virtualization, cost, ...
Hybrid Cloud - the reality!
- Uses a private cloud foundation combined with public cloud services
- Uses public cloud for some types of IT services, and standard legacy IT for mission-critical applications
- Supports an evolution model, legacy

Infrastructure, Platform and Software as a Service
Source: R. Paul Singh's Blog

Virtualization
- Abstraction of the physical layers and resources
- Widely exists in computer systems
  - Memory, Storage, Compute, Networking
- Virtual machines
  - Date back to IBM's mainframes and IBM AIX/Power systems
  - Revolution: x86 virtualization (VMware, Linux KVM, Xen)
- Virtual machine technology is the enabler of cloud computing
- Note: There are also bare-metal clouds, so the cloud model holds even without virtual machine technology

Different cloud workloads need different classes of storage
- High-performance, co-located storage for XaaS
  - Blocks/files to support compute
  - E.g. Amazon EBS, OpenStack Nova
- General purpose data center NAS extension
  - Files
- Fixed content depot
  - Objects
  - E.g. Amazon S3, OpenStack Swift

Cloud (Online) Storage
- Networked online storage
- Data is stored in virtualized pools of storage that may span multiple data centers
- Typically hosted by a third party
- Customers use it to store files or data objects
Cloud Object Storage protocols
- WAN (Cloud)
- Web-based HTTP protocol
- Put/Get operations, for fixed content
- Enables new extensions: integrity, dedup
(Figure labels: Doctor / Patient)

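The put/get model and the integrity and dedup extensions listed above can be illustrated with a toy in-memory store. This is only a sketch under stated assumptions: the class `ToyObjectStore`, its method names, and the choice of SHA-256 are hypothetical and do not correspond to any provider's API.

```python
import hashlib


class ToyObjectStore:
    """In-memory put/get store for fixed content (hypothetical, for illustration)."""

    def __init__(self):
        self._blobs = {}  # content digest -> bytes (each distinct content kept once)
        self._names = {}  # object name -> content digest

    def put(self, name, data):
        """Store data under name and return its SHA-256 digest."""
        digest = hashlib.sha256(data).hexdigest()
        # Dedup: identical content is stored only once, whatever its name.
        self._blobs.setdefault(digest, data)
        self._names[name] = digest
        return digest

    def get(self, name):
        """Return the data for name after re-checking it against its digest."""
        digest = self._names[name]
        data = self._blobs[digest]
        # Integrity: detect silent corruption of the stored bytes.
        if hashlib.sha256(data).hexdigest() != digest:
            raise IOError("integrity check failed for " + name)
        return data

    def stored_bytes(self):
        """Bytes actually kept, after dedup."""
        return sum(len(blob) for blob in self._blobs.values())


store = ToyObjectStore()
store.put("scan-1", b"x-ray image bytes")
store.put("scan-1-copy", b"x-ray image bytes")  # same content, second name
```

Because objects are fixed content (written once, never updated in place), a content digest can serve both as the integrity check and as the dedup key.
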
Source: http://aws.amazon.com/s3/

Costs
Cost is typically a combination of
- Used capacity
- Network data transfer
- Number of requests
E.g. Storage pricing for Amazon S3

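The three cost components above add up to a monthly bill. The sketch below shows the shape of that calculation. All unit prices are placeholder assumptions chosen for illustration, not actual Amazon S3 prices, and the function name `monthly_cost` is hypothetical.

```python
def monthly_cost(stored_gb, transfer_out_gb, requests,
                 price_per_gb=0.03,              # placeholder, USD per GB-month
                 price_per_gb_out=0.10,          # placeholder, USD per GB transferred out
                 price_per_1000_requests=0.005): # placeholder, USD per 1,000 requests
    """Monthly bill as capacity + network transfer + requests."""
    capacity = stored_gb * price_per_gb
    transfer = transfer_out_gb * price_per_gb_out
    request_cost = requests / 1000 * price_per_1000_requests
    return {
        "capacity": capacity,
        "transfer": transfer,
        "requests": request_cost,
        "total": capacity + transfer + request_cost,
    }


# Example workload: 2 TB stored, 100 GB read out, one million requests.
bill = monthly_cost(stored_gb=2048, transfer_out_gb=100, requests=1_000_000)
for item, usd in bill.items():
    print(f"{item}: ${usd:.2f}")
```

Real price lists are usually tiered by volume, so a fuller model would replace each flat rate with a per-tier lookup.
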
Storage pricing for Amazon S3 - http://aws.amazon.com/s3/

Collect, Store, Organize, Analyze Data
TCO - Total Cost of Ownership
- Usable vs raw capacity
- Redundancy and durability levels, e.g. AWS regions and availability zones
- Fixed costs: administrators, power, floor space
- Optimization: tiering, backups
- Security
"if we have a 2 Terabyte model it would cost $155.65 per month in US-West and US-East standard Reduced Redundancy Storage on Amazon S3 storage. At that point, you may as well treat yourself to the standard storage option which would run $194.56 per month for the same 2 Terabytes. Over three years, that is over $7,000 to keep 2 Terabytes in the public storage cloud. Most on-premise storage systems would cost less, but in the disaster recovery use case the abstraction that cloud storage brings is priceless. But how much power, cooling, and operational expense would be avoided?"
Source: How to determine if cloud storage is a cost savings, TechRepublic, March 4, 2013

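The arithmetic behind the quoted figures can be checked directly. The sketch below uses only the two monthly prices from the quote. The helper names are hypothetical, and the three-replica figure in the last line is an assumption used to illustrate usable vs raw capacity, not a number from the article.

```python
MONTHS = 36  # three years

# Monthly prices for 2 TB, as quoted from the TechRepublic article (USD).
MONTHLY_USD = {"reduced_redundancy": 155.65, "standard": 194.56}
CAPACITY_GB = 2 * 1024


def three_year_cost(monthly_usd, months=MONTHS):
    """Total paid over the period at a flat monthly price."""
    return monthly_usd * months


def price_per_gb_month(monthly_usd, capacity_gb=CAPACITY_GB):
    """Implied unit price for the stored capacity."""
    return monthly_usd / capacity_gb


def raw_capacity_tb(usable_tb, replicas):
    """Raw capacity needed when every byte is stored `replicas` times."""
    return usable_tb * replicas


for tier, usd in MONTHLY_USD.items():
    print(f"{tier}: ${three_year_cost(usd):,.2f} over {MONTHS} months, "
          f"${price_per_gb_month(usd):.3f} per GB-month")

# Assumption: an on-premise system that keeps 3 copies of every byte.
print(raw_capacity_tb(2, replicas=3), "TB raw for 2 TB usable")
```

The standard tier comes to just over $7,000 for three years, which matches the quote. A fair on-premise comparison has to price raw capacity, not usable capacity, plus the fixed costs listed above.
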
Today, storage on the cloud is prevalent.
Example: Five best storage cloud providers, June 2013
"Free cloud storage is easy to come by these days - anyone can give it out, and anyone can give out lots of it. However, the best cloud storage providers give you more than just storage."
http://lifehacker.com/five-best-cloud-storage-providers-614393607

But: Limitations
- Data lock-in
- Security
  - Multi-tenancy
  - Secure delete
  - Data confidentiality and auditability
  - How vulnerable is the cloud infrastructure?
- Service Level Agreements (SLAs)
- Cost - is it REALLY cheaper?

How is it done? The internals
- Cloud File Systems
- Cloud Object Stores

Scalable File Systems
Different design points than traditional file systems
- New architecture
- New, relaxed protocols and system operations (I/O and management)
- New solutions for resiliency and high availability, based on replication rather than RAID
- Support for computation
- Designed for new workloads: large streaming, sequential writes, or analytics
Assumptions
- Based on commodity hardware
- Components always fail: need self-monitoring to detect, tolerate, and recover from failures
- Optimized for large files
Results
- No POSIX API
- Each chunk is replicated d times (typically d = 3)
- Smart placement of chunks
Scribed from: Clouddbms2011.pdf

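Chunking, d-way replication, and placement across failure domains can be sketched as follows. This is a simplified round-robin policy written for illustration; it is not the placement algorithm of HDFS or GFS. The function names, the rack layout, and the tiny chunk size are all assumptions.

```python
def split_into_chunks(data, chunk_size):
    """Cut a byte string into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_replicas(chunk_index, nodes_by_rack, d=3):
    """Pick d nodes for one chunk, each on a different rack.

    Racks are rotated by chunk index so load spreads over the cluster,
    and no two replicas of a chunk share a rack (a failure domain).
    """
    racks = sorted(nodes_by_rack)
    if d > len(racks):
        raise ValueError("need at least d racks to keep replicas apart")
    placement = []
    for r in range(d):
        rack = racks[(chunk_index + r) % len(racks)]
        nodes = nodes_by_rack[rack]
        placement.append(nodes[chunk_index % len(nodes)])
    return placement


# Hypothetical cluster: three racks with two commodity nodes each.
cluster = {
    "rack-a": ["a1", "a2"],
    "rack-b": ["b1", "b2"],
    "rack-c": ["c1", "c2"],
}

chunks = split_into_chunks(b"x" * 10, chunk_size=4)  # tiny sizes, for illustration
layout = {i: place_replicas(i, cluster) for i in range(len(chunks))}
for index, nodes in layout.items():
    print(f"chunk {index} ({len(chunks[index])} bytes) -> {nodes}")
```

Production systems use much larger chunks (tens of megabytes) and weigh network topology, free space, and load when choosing nodes; the sketch only shows why a single node or rack failure leaves every chunk readable.
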
Examples: Hadoop Distributed File System (HDFS, Yahoo), 2009
Source: The Hadoop Distributed File System: Architecture and Design, by Dhruba Borthakur

HDFS Architecture
Source: NextGen Infrastructure for Big Data, IMEX Research http://imexresearch.com/big_data_infrastructure.pdf

Examples: Google File System (GFS), 2002
Source: http://en.wikipedia.org/wiki/file:googlefilesystemgfs.svg

Cloud Object Storage: OpenStack Swift
- RESTful APIs
- Swift storage URL: http://swift.company.com/v1/account/container/object
- Get/Put/Delete
Source: Swiftstack documentation http://swiftstack.com/openstack-swift/architecture/

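The URL layout and the three verbs above map directly onto HTTP requests. The sketch below only builds the requests with Python's standard library and does not send them. The endpoint is the slide's example host, the token value is a made-up placeholder, and the helper names `object_url` and `build_request` are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request


def object_url(endpoint, account, container, obj):
    """Build <endpoint>/v1/<account>/<container>/<object>."""
    return "/".join([
        endpoint.rstrip("/"),
        "v1",
        quote(account, safe=""),
        quote(container, safe=""),
        quote(obj),  # object names may contain "/" to mimic folders
    ])


def build_request(method, url, token, body=None):
    """Prepare (but do not send) an authenticated request."""
    return Request(url, data=body, method=method,
                   headers={"X-Auth-Token": token})


url = object_url("http://swift.company.com", "account", "container", "object")
token = "example-token"  # placeholder; a real token comes from the auth service

put_req = build_request("PUT", url, token, body=b"hello")  # create or overwrite
get_req = build_request("GET", url, token)                 # read
delete_req = build_request("DELETE", url, token)           # remove

for req in (put_req, get_req, delete_req):
    print(req.get_method(), req.full_url)
```

Passing one of these objects to `urllib.request.urlopen` would send it. The whole object is written or read in a single request, which is why this interface suits fixed content.
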
OpenStack Swift architecture/installation
Source: http://docs.openstack.org/grizzly/openstack-compute/install/apt/content/example-object-storage-installationarchitecture.html

Cloud Object Storage: OpenStack Swift
Building Blocks
- Proxy Servers: handle all incoming API requests
- Rings: map logical names of data to locations on particular disks
- Zones: failure domains
- Storage Nodes
Source: Swiftstack documentation http://swiftstack.com/openstack-swift/architecture/

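How a ring maps a logical name to disks can be sketched with a hash and a lookup table. This is a deliberately simplified model and not Swift's actual ring implementation, which also handles device weights, rebalancing, and a per-cluster hash secret. The device names, zone names, and the 16-partition size are assumptions.

```python
import hashlib
import struct

PART_POWER = 4  # 2**4 = 16 partitions (tiny, for illustration)


def partition_for(account, container, obj, part_power=PART_POWER):
    """Hash the object's logical path and keep the top `part_power` bits."""
    path = f"/{account}/{container}/{obj}".encode("utf-8")
    digest = hashlib.md5(path).digest()
    # First 4 bytes as a big-endian unsigned int, shifted down to the top bits.
    return struct.unpack(">I", digest[:4])[0] >> (32 - part_power)


def build_ring(devices, replicas=3, part_power=PART_POWER):
    """Return a table: partition number -> list of devices, one per replica.

    `devices` is a list of (device_name, zone) pairs. The replicas of a
    partition always land in different zones (failure domains).
    """
    zones = sorted({zone for _, zone in devices})
    if replicas > len(zones):
        raise ValueError("need at least one zone per replica")
    by_zone = {z: [d for d, dz in devices if dz == z] for z in zones}
    ring = []
    for part in range(2 ** part_power):
        assigned = []
        for r in range(replicas):
            zone_devices = by_zone[zones[(part + r) % len(zones)]]
            assigned.append(zone_devices[part % len(zone_devices)])
        ring.append(assigned)
    return ring


# Hypothetical cluster: three zones with two disks each.
devices = [("z1-d1", "z1"), ("z1-d2", "z1"),
           ("z2-d1", "z2"), ("z2-d2", "z2"),
           ("z3-d1", "z3"), ("z3-d2", "z3")]

ring = build_ring(devices)
part = partition_for("account", "container", "object")
print("partition", part, "->", ring[part])
```

A proxy server holding this table can route any request without consulting a central metadata server: it hashes the name, looks up the partition, and contacts the listed disks.
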
Cloud Object Storage: OpenStack Swift
Data model
- Accounts: tenants
- Containers: sets of objects
- Objects: the data itself, mapped to files on the local file system
- Partitions/Containers: manage locations where data lives in the cluster
Replication
- Everything is stored three times (by default)
- Upon a disk failure, the data is replicated to other zones, ensuring three copies
Source: Swiftstack documentation http://swiftstack.com/openstack-swift/architecture/

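Restoring three copies after a disk failure can be sketched as a small repair pass. This is a toy model written for illustration, not Swift's replicator: the function name `repair`, the device names, and the rule "prefer a zone that holds no copy yet" are assumptions.

```python
def repair(placement, failed, devices_by_zone, replicas=3):
    """Return a new placement in which every object has `replicas` live copies.

    placement:        dict of object name -> list of devices holding a copy
    failed:           name of the device that has failed
    devices_by_zone:  dict of zone name -> list of devices in that zone
    """
    zone_of = {d: z for z, devs in devices_by_zone.items() for d in devs}
    repaired = {}
    for name, devs in placement.items():
        live = [d for d in devs if d != failed]
        while len(live) < replicas:
            used_zones = {zone_of[d] for d in live}
            candidates = [d for d in zone_of if d != failed and d not in live]
            if not candidates:
                break  # not enough healthy devices left
            # Prefer a device in a zone that does not hold a copy yet.
            candidates.sort(key=lambda d: (zone_of[d] in used_zones, d))
            live.append(candidates[0])
        repaired[name] = live
    return repaired


# Hypothetical cluster and initial placement, one copy per zone.
devices_by_zone = {
    "z1": ["z1-d1", "z1-d2"],
    "z2": ["z2-d1", "z2-d2"],
    "z3": ["z3-d1", "z3-d2"],
}
placement = {
    "photo.jpg": ["z1-d1", "z2-d1", "z3-d1"],
    "log.txt": ["z1-d2", "z2-d2", "z3-d2"],
}

after = repair(placement, failed="z2-d1", devices_by_zone=devices_by_zone)
for name, devs in after.items():
    print(name, "->", devs)
```

Only objects that had a copy on the failed disk are touched, and the new copy is read from one of the two surviving replicas.
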
Summary
- Cloud is a paradigm shift
- Cloud Storage is prevalent
- Cloud Storage requires new storage architectures, e.g.
  - Cloud file systems
  - Cloud object stores