A Survey on Cloud Storage Systems

Team: Xiaoming, Xiaogang, Adarsh, Abhijeet, Pranav
Motivations

- No taxonomy
- Detailed survey for users
- Starting point for researchers
Taxonomy

| Category | Definition | Example |
|---|---|---|
| Instance storage | Storage coming with virtual machine images. | Amazon EC2 instance |
| Object storage | Storage of binary objects provided in the form of Web services. An object can be any type of file. | Amazon Simple Storage Service (S3) |
| Block storage | Virtual block devices that can be attached to VM instances and used like local disks. | Amazon Elastic Block Store (EBS) |
| Semi-structured data | Database service for storing semi-structured data with high availability, high scalability, and high performance. | Amazon SimpleDB |
| Relational database | Relational database servers on VM instances in clouds. | Amazon Relational Database Service (RDS) |
| Distributed file system | Distributed storage provided through file system interfaces with high availability and high scalability. | Google File System |
| Online drive/folder service | Storage space provided in the form of a virtual drive or folder on the Internet. | Microsoft SkyDrive |
Commercial Cloud Providers

| Vendor | Instance | Object | Block | Semi-structured data | Relational database | Distributed file system | Online folder/drive |
|---|---|---|---|---|---|---|---|
| Amazon | EC2 | S3 | EBS | SimpleDB | RDS | N/A | |
| Microsoft | Azure VM | Azure Blob | Azure Drive | Azure Table | SQL Azure | N/A | SkyDrive/Mesh |
| Google | N/A | Google Storage for Developers | N/A | BigTable | N/A | Google File System | |
Commercial Cloud Providers

Windows Azure Blob
- Distributed storage for large items. Each item can have a maximum size of 50 GB.
- Azure Blob can be viewed as a set of containers. Each container consists of blobs, and each blob is made of blocks.
- All access to Azure Blob is through an HTTP REST interface.

Windows SQL Azure
- SQL Azure provides web-facing database functionality as a utility service.
- TDS is the protocol used to connect to a cloud-based database.
- Queries are formulated in the Transact-SQL language.
- Applications and tools already in use with other existing relational databases work seamlessly with SQL Azure.

Windows Azure Table
- Provides structured storage for maintaining service state.
- Structured storage is provided in the form of tables, which contain a set of entities. Each entity is made up of a set of named properties.
- Provides support for LINQ, ADO.NET Data Services, and REST.
- An Azure Table can be thought of as a fancy spreadsheet: the state of an entity is stored in the columns of the spreadsheet.
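The container/blob/block hierarchy described for Azure Blob can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the names `Container`, `Blob`, `put_block`, and `get_or_create` are hypothetical and do not correspond to the actual Azure REST interface, and the 50 GB limit is simply the figure quoted on this slide, interpreted here as 50 x 2^30 bytes.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Maximum blob size as quoted on the slide (assumed to mean 50 * 2**30 bytes).
MAX_BLOB_BYTES = 50 * 1024**3


@dataclass
class Blob:
    """Hypothetical model: a blob is an ordered list of blocks."""
    name: str
    blocks: List[bytes] = field(default_factory=list)

    def size(self) -> int:
        return sum(len(block) for block in self.blocks)

    def put_block(self, data: bytes) -> None:
        # Reject a block that would push the blob past the size limit.
        if self.size() + len(data) > MAX_BLOB_BYTES:
            raise ValueError("blob would exceed the maximum size")
        self.blocks.append(data)

    def read(self) -> bytes:
        # The blob's content is the concatenation of its blocks, in order.
        return b"".join(self.blocks)


@dataclass
class Container:
    """Hypothetical model: a container groups blobs by name."""
    name: str
    blobs: Dict[str, Blob] = field(default_factory=dict)

    def get_or_create(self, blob_name: str) -> Blob:
        return self.blobs.setdefault(blob_name, Blob(blob_name))


container = Container("videos")
blob = container.get_or_create("intro.mp4")
blob.put_block(b"first-")
blob.put_block(b"second")
```

In the real service each of these steps would be an HTTP request; the model only illustrates how a blob's content is assembled from its blocks inside a container.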
Commercial Cloud Providers

Amazon Elastic Block Store (EBS)
- Off-instance storage that persists independently of the life of an instance.
- Storage volumes behave like raw, unformatted block devices.
- Volumes can store from 1 GB to 1 TB and can be mounted on EC2 instances.

Amazon S3
- Object storage designed to make web-scale computing easier for developers.
- Users can store persistent data organized in buckets and objects.
- Uses standards-based REST and SOAP interfaces designed to work with any Internet-development toolkit.
- An unlimited number of objects can be stored, each containing 1 byte to 5 GB of data.

Amazon Relational Database Service (RDS)
- Provides cost-effective and resizable capacity.
- Applications and tools in use with existing MySQL databases work seamlessly with Amazon RDS.

Amazon SimpleDB
- Non-relational database that offloads the work of database administration.
- Users can focus on application development without worrying about infrastructure provisioning, high availability, or software maintenance.
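The bucket/object organization and the per-object size range quoted for Amazon S3 can be illustrated with a toy in-memory store. This is a sketch, not the S3 API: `BucketStore`, `create_bucket`, `put_object`, and `get_object` are hypothetical names, the returned path only mimics the general shape of a REST resource, and "5 GB" is assumed to mean 5 x 2^30 bytes.

```python
# Per-object size range as quoted on the slide (GB assumed to mean 2**30 bytes).
MIN_OBJECT_BYTES = 1
MAX_OBJECT_BYTES = 5 * 1024**3


class BucketStore:
    """Toy in-memory stand-in for a bucket/object store (not the S3 API)."""

    def __init__(self) -> None:
        self._buckets = {}  # bucket name -> {object key -> bytes}

    def create_bucket(self, bucket: str) -> None:
        self._buckets.setdefault(bucket, {})

    def put_object(self, bucket: str, key: str, data: bytes) -> str:
        # Enforce the size range before storing anything.
        if not MIN_OBJECT_BYTES <= len(data) <= MAX_OBJECT_BYTES:
            raise ValueError("object size outside the allowed range")
        self._buckets[bucket][key] = data
        # REST-style resource path: the bucket, then the object key.
        return f"/{bucket}/{key}"

    def get_object(self, bucket: str, key: str) -> bytes:
        return self._buckets[bucket][key]


store = BucketStore()
store.create_bucket("reports")
path = store.put_object("reports", "2011/q1.csv", b"a,b\n1,2\n")
```

Keys are flat strings; the slash in `2011/q1.csv` is part of the key rather than a directory, which is one way an object store differs from a file system.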
Commercial Cloud Providers - Use Cases

- Creating a web application with relational data: SQL Azure or Amazon RDS can be used.
- Creating parallel processing applications, storage for data analysis, backup and recovery (examples: financial modeling at a bank, new drug development at a pharmaceutical company): Azure Blob or Amazon S3 can be used to store intermediate data.
- Creating scalable web applications, gaming applications, metadata indexing (examples: online ticket system, news video site): Azure Table or Amazon SimpleDB can be used.
- Applications that require a database, file system, or access to raw block-level storage: Amazon EBS or Azure Drive can be used.
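The use cases above amount to a lookup from workload type to candidate services, which can be written down as a small table in code. This is a sketch only: the workload labels, the `USE_CASES` dictionary, and the `candidates` helper are hypothetical names chosen for illustration, and the service lists are transcribed from this slide rather than from any vendor guidance.

```python
# Workload category -> candidate services, transcribed from the use cases above.
USE_CASES = {
    "relational web application": ["SQL Azure", "Amazon RDS"],
    "parallel processing / data analysis / backup": ["Azure Blob", "Amazon S3"],
    "scalable web app / gaming / metadata indexing": ["Azure Table", "Amazon SimpleDB"],
    "raw block-level access": ["Amazon EBS", "Azure Drive"],
}


def candidates(workload: str, vendor: str = "") -> list:
    """Return candidate services for a workload.

    If `vendor` is given, keep only services whose name contains it
    (case-insensitive). Unknown workloads yield an empty list.
    """
    services = USE_CASES.get(workload, [])
    return [s for s in services if vendor.lower() in s.lower()]


block_on_amazon = candidates("raw block-level access", vendor="amazon")
relational_any = candidates("relational web application")
```

Filtering by a substring of the service name is a simplification that happens to work for the names listed here; a fuller version would store the vendor explicitly.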
Academic Cloud Systems

| System | Instance | Object | Block | Semi-structured data | Distributed file system |
|---|---|---|---|---|---|
| Eucalyptus | VM | S3 | EBS | N/A | N/A |
| Nimbus | VM | Cumulus | N/A | N/A | N/A |
| OpenNebula | VM | N/A | N/A | N/A | N/A |
| OpenStack | VM | OpenStack Object Storage | N/A | N/A | N/A |
| Hadoop | N/A | N/A | N/A | HBase | Hadoop Distributed File System (HDFS) |
Academic Cloud Systems

Eucalyptus
[Figure: SOAP/REST-based tools; Walrus; a Storage Controller in Cluster A and a Storage Controller in Cluster B]
- S3 is mainly used for VM image storage
- A typical configuration contains one storage server per cluster
Academic Cloud Systems

Nimbus
- The Cumulus storage service is used for VM image storage
- Cumulus can be configured to use various storage backends

OpenNebula
- Two ways to manage VM images: shared (NFS) and non-shared (SSH)
Academic Cloud Systems

OpenStack
- OpenStack Object Storage is used for VM image management
- Uses disk blocks directly instead of file systems

Hadoop
- The HDFS interface is not fully compatible with the POSIX standard, nor is the system optimized for file I/O
- HBase is built on top of HDFS
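One commonly cited way in which HDFS departs from POSIX is that files are written sequentially and extended by appending, rather than overwritten at arbitrary offsets. The toy class below illustrates that restriction only; it is an assumption about which incompatibility the slide has in mind, and `AppendOnlyFile`, `append`, `write_at`, and `read` are hypothetical names, not the HDFS client API.

```python
class AppendOnlyFile:
    """Toy file with append-only write semantics (not the HDFS API)."""

    def __init__(self) -> None:
        self._data = bytearray()

    def append(self, chunk: bytes) -> int:
        # New bytes can only be added at the end; returns the new length.
        self._data.extend(chunk)
        return len(self._data)

    def write_at(self, offset: int, chunk: bytes) -> None:
        # A POSIX file would allow a positional write here; this model rejects it.
        raise OSError("in-place writes are not supported")

    def read(self, offset: int = 0, length: int = -1) -> bytes:
        # Reads at arbitrary offsets remain possible.
        end = len(self._data) if length < 0 else offset + length
        return bytes(self._data[offset:end])


log = AppendOnlyFile()
log.append(b"abc")
log.append(b"de")
```

An application that relies on rewriting parts of a file in place would need to be restructured (for example, to write a new file) before it could run on a store with these semantics.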
Conclusions and Future Work

Conclusions
- Virtualized I/O performance of cloud services is not yet comparable to local disk
- Academic cloud systems do not provide a rich set of services so far

Future work
- Performance tests for commercial services
- More investigation of design and implementation details
- Include emerging services from other providers