Masaryk University Faculty of Informatics. Master Thesis. Database management as a cloud based service for small and medium organizations




Masaryk University
Faculty of Informatics

Master Thesis

Database management as a cloud based service for small and medium organizations

Dime Dimovski
Brno, 2013


Statement

I declare that I have worked on this thesis independently, using only the sources listed in the bibliography. All resources, sources, and literature that I used or drew upon in preparing this thesis are properly quoted, with full references to their sources.

Dime Dimovski

Resume

The goal of this thesis is to explore cloud computing, mainly focusing on database management systems as a cloud service, and to propose a general scope breakdown structure for a project migrating a company's database to a cloud-based solution. It focuses on explaining the key deliverables of a migration to a database in the cloud and illustrates the tasks required to fulfill each part of the project. The potential challenges and risks that must be taken into consideration are discussed, and a comparison of some of the currently available SQL and NoSQL database management systems offered as a cloud service is provided, considering the advantages and disadvantages of cloud computing in general and the common considerations.

Keywords

Cloud computing, SaaS, PaaS, Database management, SQL, NoSQL, DBaaS, Work breakdown structure, WBS, Scope breakdown structure.

Contents

Statement
Resume
Keywords
1. Introduction
2. Introduction to Cloud Computing
2.1 Cloud computing definition
2.2 Cloud Types
2.2.1 NIST model
2.3 Cloud computing architecture
2.3.1 Infrastructure
2.3.2 Platform
2.3.3 Application
3. Scalability
4. Elasticity
5. Database Management Systems in the cloud (Database as a service)
6. Scope Breakdown Structure of a project for migration of a database to a cloud based solution
7. SBS Deliverables, Challenges and Risks
8. Will cloud computing reduce the budget?
9. Conclusion
10. Appendix
    Some of the currently available RDBM DBaaS, comparison and common considerations
    Understand various available storage options
    NOSQL options data models
    Amazon DynamoDB Data Model
    Amazon SimpleDB
    Document oriented database
List of Abbreviations
References

1. Introduction

The boom of cloud computing over the past few years has made it a source of many innovations and new technologies. It has become common for enterprises and individuals alike to acknowledge that cloud computing is a big deal and to use the services offered in the cloud, even when it is not clear to them why that is so. Even the phrase "in the cloud" has entered our colloquial language. Many developers around the world are currently working on cloud-related products. The cloud has thus become an amorphous entity that is supposed to represent the future of modern computing [1].

In an attempt to gain a competitive edge, businesses are looking for new, innovative ways to cut costs while maximizing value. They recognize the need to grow, but at the same time they are under pressure to save money. The cloud gives businesses this opportunity, allowing them to focus on their core business by offering hardware and software solutions they do not have to develop on their own.

In this thesis I give an overview of what cloud computing is. I describe its main concepts and architecture, take a look at the XaaS (anything as a service) paradigm, and survey the currently available options in the cloud, mostly focusing on databases in the cloud, or Database as a Service. Good planning and preparation are among the most important steps before a migration to the cloud. As part of this thesis I propose a general Scope Breakdown Structure for a project moving a company's databases to a cloud-based solution, describing the key deliverables and the potential challenges and risks connected with them, taking into consideration some of the currently available options.

I also take a closer look at how cloud computing in general, and Database as a Service in particular, can be used by small and medium enterprises, what the main benefits are, and whether it will really help businesses to reduce their budget and focus on their core business.

2. Introduction to Cloud Computing

In reality, the cloud is something we have been using for a long time: it is the Internet, with all the standards and protocols that provide Web services to us. The Internet is usually drawn as a cloud; this abstraction is one of the essential characteristics of cloud computing. Cloud computing is distinguished by its concept of virtual resources that appear to be limitless, with the details of the physical systems on which software runs abstracted away from the user. Cloud computing refers to services and applications that run on a distributed network using virtualized resources that are accessed by common Internet protocols and networking standards. [1]

The advancements of the past few years in connectivity and wireless network speed are among the main things that make cloud computing practical, or even possible. In some ways, cloud computing is an eventuality. The boom of mobile devices (smartphones, tablets, etc.) is pushing cloud computing forward even faster. This represents a major breakthrough not only in computing but also in communication. The popularization of the Internet and the growing number of large service companies enabled cloud computing systems of massive scale. Cloud computing brings a real paradigm shift in the way systems are deployed. It can be compared to a standard utility company: cloud computing makes the dream of utility computing possible with a universally available, pay-as-you-go, infinitely scalable system. In other words, everything comes from one central location, and things are just turned on and off. This gives more people access to a much larger pool of resources at a highly reduced cost. The ability of cloud computing to offer users access to off-site hardware and software is one of its biggest benefits. All the networks, processors, hardware, and software combined give individuals much more computing power.

By keeping things light and simple, individual access devices are going to last a lot longer, and losing or breaking a device is no longer of any particular concern: devices can be replaced, and there is no danger of losing your files or information, as they are in the cloud [1]. With cloud computing, you can start very small and become big very fast. That is why cloud computing is revolutionary, even if the technology it is built on is evolutionary. [3]

2.1 Cloud computing definition

The use of the word "cloud" makes reference to two essential concepts:

Abstraction - Cloud computing abstracts the details of the system implementation from users and developers. Applications run on physical systems that aren't specified, data is stored in locations that are unknown, administration of systems is outsourced to others, and access by users is ubiquitous. [1]

Virtualization - Cloud computing virtualizes systems by pooling and sharing resources. Systems and storage can be provisioned on demand from a centralized infrastructure, the resources are scalable, multi-tenancy is enabled, and costs are assessed on a metered basis.

Cloud computing is an abstraction based on the idea of pooling physical resources and presenting them as virtual resources. It represents a model for provisioning resources and for platform-independent user access to services and applications. There are many different types of clouds, and it is important to define what kind of cloud we are working with. The applications and services that run on a cloud are not necessarily delivered by a cloud service provider.

2.2 Cloud Types

Cloud computing is usually separated into two distinct sets of models:

Deployment models - refer to the location and management of the cloud's infrastructure.

Service models - particular types of services that can be accessed on a cloud computing platform.

2.2.1 NIST model

The NIST model is a set of working definitions published by the U.S. National Institute of Standards and Technology. The following section presents part of the NIST definition of cloud computing; its content is taken verbatim from The NIST Definition of Cloud Computing, published by NIST. This cloud model is composed of five essential characteristics, three service models, and four deployment models. [2]

Essential Characteristics:

On-demand self-service - A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider.

Broad network access - Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, tablets, laptops, and workstations).

Resource pooling - The provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. There is a sense of location independence in that the customer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter). Examples of resources include storage, processing, memory, and network bandwidth.

Rapid elasticity - Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.

Measured service - Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.

Service Models:

Software as a Service (SaaS) - The capability provided to the consumer is to use the provider's applications running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g., web-based email), or a program interface. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.

Platform as a Service (PaaS) - The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment.

Infrastructure as a Service (IaaS) - The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (e.g., host firewalls).

Deployment Models:

Private cloud - The cloud infrastructure is provisioned for exclusive use by a single organization comprising multiple consumers (e.g., business units). It may be owned, managed, and operated by the organization, a third party, or some combination of them, and it may exist on or off premises.

Community cloud - The cloud infrastructure is provisioned for exclusive use by a specific community of consumers from organizations that have shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be owned, managed, and operated by one or more of the organizations in the community, a third party, or some combination of them, and it may exist on or off premises.

Public cloud - The cloud infrastructure is provisioned for open use by the general public; it is usually an open system available to the public via the Internet. It may be owned, managed, and operated by a business, academic, or government organization, or some combination of them. It exists on the premises of the cloud provider. Examples of public clouds: Google App Engine, Amazon Elastic Compute Cloud, Microsoft Azure.

Hybrid cloud - The cloud infrastructure is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds). [2]

2.3 Cloud computing architecture

Cloud computing architecture is essentially a series of levels that function together in various ways to create a system. The cloud itself creates a system where resources can be pooled and distributed as needed. A cloud architecture can combine software running in multiple locations on virtualized hardware in order to provide an on-demand service to user-facing hardware and software. A cloud can be created within an organization's own infrastructure, or it can be outsourced to another datacenter. Because virtual resources are easier to optimize and modify, the resources in the cloud are mostly virtualized. A compute cloud requires virtualized storage to support the staging and storage of data. From a user's perspective, it is important that the resources appear to be infinitely scalable, that the service be measurable, and that the pricing be metered. [1]

Figure 1: Cloud computing stack

Applications in the cloud are usually composable systems; this means that they use standard components to assemble services tailored for a specific purpose. A composable component must be:

Modular: a self-contained and independent unit that is cooperative, reusable, and replaceable. It can be deployed independently.

Stateless: a transaction is executed independently, without regard to other transactions or requests.

In general, cloud computing does not require hardware and software to be composable, but it is a highly desirable characteristic. Composable systems are much easier to implement, and the solutions are more portable and interoperable. Some of the benefits of a composable system are [1]:

- Easier to assemble systems
- Cheaper system development
- More reliable operation
- A larger pool of qualified developers
- A logical design methodology

The trend toward designing composable systems in cloud computing shows in the widespread adoption of what is called Service Oriented Architecture (SOA). The essence of SOA is designing applications from services or components, building the application in a modular fashion. The services are constructed from modules that use standard communications and service interfaces and that collectively provide the complete functionality of a large software application. A widely used set of XML-based standards describes the services themselves in terms of:

Web Services Description Language (WSDL) - describes the web service, how to invoke it, and what exactly it does.

Simple Object Access Protocol (SOAP) - describes the communication between the services: the message format.

Universal Description, Discovery, and Integration (UDDI) - a directory of web services that are available to be used.

There are, of course, alternative sets of standards. The nature of the module itself is not specified, and it can be developed in any programming language. From the standpoint of the system, the module is a black box, and only the interface is well specified. This independence from how the internals of the module or component work means it can easily be replaced with a different module, relocated, or replaced at will, provided that the interface specification remains unchanged.
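This black-box replaceability can be sketched in a few lines of Python. The `TaxService` interface and its two implementations below are hypothetical examples invented for illustration, not part of any SOA standard: the point is only that the caller sees the interface and nothing else.

```python
from abc import ABC, abstractmethod

# Only the interface is specified; the module behind it is a black box.
class TaxService(ABC):
    @abstractmethod
    def tax(self, amount: int) -> int:
        """Return the tax due on an amount given in cents."""

class FlatTax(TaxService):
    def tax(self, amount: int) -> int:
        return amount // 5          # flat 20%

# A replacement module: different internals, identical interface.
class ProgressiveTax(TaxService):
    def tax(self, amount: int) -> int:
        return amount // 10 if amount < 100 else amount * 3 // 10

def invoice_total(net: int, service: TaxService) -> int:
    # The caller depends only on the TaxService interface,
    # so either implementation can be plugged in at will.
    return net + service.tax(net)
```

Swapping `FlatTax` for `ProgressiveTax` changes nothing for `invoice_total`; this is exactly the replace-at-will property that a well-specified interface buys.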

Essentially there are three tiers in a basic cloud computing architecture:

- Infrastructure
- Platform
- Application

If we further break down the standard cloud computing architecture, there are really two areas to deal with: the front end and the back end. [1]

Front End - The front end includes all client (user) devices and hardware, in addition to their computer network and the application they actually use to make a connection with the cloud.

Back End - The back end is populated with the various servers, data storage devices, and hardware that facilitate the functionality of a cloud computing network.

2.3.1 Infrastructure

The infrastructure of a cloud computing architecture is basically all the hardware, including virtualized hardware, data storage devices, networking equipment, applications, and software that drives the cloud. Most Infrastructure as a Service (IaaS) providers use virtual machines to deliver servers that run applications. Virtual machine images, or instances, are containers that have specific resources assigned (number of CPU cycles, memory, network bandwidth, etc.). Figure 2 shows the cloud computing stack that is defined as the server. The Virtual Machine Monitor, also called a hypervisor, is the low-level software or hardware that allows different guest operating systems to run in their own memory space and manages I/O for the virtual machines. [1]

Figure 2: "Server" stack

2.3.2 Platform

A cloud computing platform is the software layer that is used to create higher-level services. It is the programming code and the implemented systems of interfacing that allow users and applications to connect to and use the available hardware and software resources of the cloud. A cloud computing platform is generally divided between the front end and the back end of a network. Its job is to provide a communication and access portal for the client, so that they may effectively utilize the resources of the cloud network. The platform may only be a set of directions, but it is in actuality the most integral part of a cloud computing network; without it, cloud computing would not be possible. [3]

There are many different Platform as a Service (PaaS) providers; to mention some of them:

- Salesforce.com's Force.com and Database.com platforms
- Windows Azure Platform
- Google Apps and Google App Engine
- Amazon Web Services

All platform services offer the hosted hardware and software needed to build and deploy Web applications or other custom services. Many operating system vendors already provide their development environments in the cloud, using the same technologies that have been successfully used to create Web applications. [1] Thus, you might find a platform based on an Oracle xVM hypervisor virtual machine that includes the NetBeans IDE and supports the Oracle GlassFish Web stack, programmable using Perl or Ruby. For Windows, Microsoft's Azure cloud provides a platform that allows Windows users to run on a Hyper-V VM, use the ASP.NET application framework, work with SQL Server or other enterprise applications, and program within Visual Studio. With this approach, developers can develop a program in the cloud that can be used by many others.

Platforms usually come with tools and utilities to support application design and deployment. Depending on the vendor, these can include tools for team collaboration, testing tools, versioning tools, database and web service integration, and storage tools. Platform providers begin by creating a developer community to support the work being done in the environment. The platform is exposed to users through an API; an application built in the cloud using a platform service would likewise encapsulate the service through its own API. An API can control data flow, communications, and other important features of the cloud application. So far there is no standard API, and each cloud vendor has their own.

2.3.3 Application

This area is composed of the client hardware and the interface used to connect to the cloud. Crucial problems arise from the design of Internet protocols, which treat each request to a server as an independent transaction (a stateless service). The standard HTTP commands are all atomic in nature.
While stateless servers are easier to architect and stateless transactions are more resilient and can survive outages, much of the useful work that computer systems need to accomplish is stateful [1]. Transaction servers, message queuing servers, and other similar middleware are meant to bridge this problem. Standard methods that are part of Service Oriented Architecture, help to solve this issue, and are used in cloud computing are:

- Orchestration: a process flow can be choreographed as a service
- Use of a service bus that controls cloud components

There are many ways clients can connect to a cloud service. The most common are:

- Web browser
- Proprietary application

These applications can run on a number of different devices: PCs, servers, smartphones, and tablets. They all need a secure way to communicate with the cloud. Some of the basic methods to secure the connection are:

- A secure protocol such as SSL/TLS (HTTPS), FTPS, IPsec, or SSH
- A virtual connection using a virtual private network (VPN)
- Remote data transfer protocols such as Microsoft RDP or Citrix ICA, which use a tunneling mechanism
- Data encryption
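As a minimal sketch of the first item, Python's standard `ssl` module builds a client-side TLS context that, by default, verifies both the server's certificate chain and its hostname; the `example.com` endpoint in the comment is a placeholder, not a real service used by this thesis.

```python
import ssl

# A TLS (HTTPS) client should verify the server's certificate chain and
# hostname before sending any data; create_default_context() enables both.
context = ssl.create_default_context()

assert context.verify_mode == ssl.CERT_REQUIRED   # certificate is checked
assert context.check_hostname                     # name must match the cert

# The context is then used to wrap an ordinary TCP socket, e.g.:
#   with socket.create_connection(("example.com", 443)) as sock:
#       with context.wrap_socket(sock, server_hostname="example.com") as tls:
#           tls.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
```

Without certificate and hostname verification, an encrypted channel still offers no protection against a man-in-the-middle, which is why the secure defaults matter more than the encryption itself.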

3. Scalability

Scalability is the ability of a system to handle a growing amount of work in a capable manner, or its ability to improve when additional resources are added. The scalability requirement arises from the constant load fluctuations that are common in the context of Web-based services. These load fluctuations occur at varying frequencies: daily, weekly, and over longer periods. The other source of load variation is unpredictable growth (or decline) in usage. Scalable design ensures that system capacity can be augmented by adding hardware resources whenever warranted by load fluctuations. Thus, scalability has emerged both as a critical requirement and as a fundamental challenge in the context of cloud computing. [1][4]

Typically there are two ways to increase scalability:

Vertical scalability (scaling up) - adding hardware resources to an existing node, usually CPU, memory, etc. This enables virtualization technologies to be used more effectively by providing more resources for the hosted operating systems and applications to share.

Horizontal scalability (scaling out) - adding more nodes to a system, such as adding a new node to a distributed software application or adding more access points within the current system. Hundreds of small computers may be configured in a cluster to obtain aggregate computing power. The scale-out model also increases the demand for shared data storage with high I/O performance, especially where processing of large amounts of data is required. In general, the scale-out paradigm has served as the fundamental design paradigm for the large-scale datacenters of today.

Integrating multiple load balancers into the system is probably one of the best solutions for dealing with scalability issues. There are many different forms of load balancers to choose from: server farms, software, and even hardware designed to handle and distribute increased traffic.

Items that interfere with scalability:

- Too much software clutter (no organization) within the hardware stack(s)
- Overuse of third-party scaling
- Reliance on synchronous calls
- Not enough caching
- The database not being used properly

Creating a cloud network that offers the maximum level of scalability is entirely possible if we apply a more diagonal solution. By incorporating the best of both vertical and horizontal scaling, it is possible to reap the benefits of both models. [3] In order to keep a consistent architecture when adding new components, once the servers reach their limits (no possibility of further growth), we simply start cloning them. Most scalability problems arise from a lack of resources, not from the inherent architecture of the cloud itself. A more diagonal approach should help a business deal with the current and growing demands it faces.
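The distribution policy of a simple load balancer can be sketched in a few lines of Python; the node names are hypothetical, and a production balancer would of course also track node health and pool membership.

```python
import itertools

class RoundRobinBalancer:
    """Distributes incoming requests evenly across a pool of servers."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def route(self, request):
        # Each request is sent to the next server in the rotation.
        return next(self._cycle)

balancer = RoundRobinBalancer(["node-1", "node-2", "node-3"])
targets = [balancer.route(f"req-{i}") for i in range(6)]
# Six requests are spread evenly: each node handles exactly two.
```

When the pool is grown by cloning servers, as described above, a new balancer over the larger pool spreads the same traffic across more nodes, which is the essence of horizontal scaling.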

4. Elasticity

One of the most important attributes of cloud computing is certainly its elasticity: the ability to upgrade resources and capacities on the fly, at a moment's notice, instantly. Cloud computing creates the illusion of virtually infinite computing resources available on demand. Scalability of applications and storage are all elastic in the cloud.

It must be noted that there is a subtle difference between elasticity and scalability when used to describe a system's behavior. Scalability is a static property that specifies the behavior of the system in a static configuration. For example, a system design might scale to hundreds or even thousands of nodes. Elasticity, on the other hand, is a dynamic property that allows the system to scale up or down on demand in a live system, without service disruption, while the system is operational. For example, a system design is elastic if it can scale from 5 servers to 10 servers (or vice versa) on demand. A system can have any combination of these two properties.

The real-time infrastructure that actively responds to user requests for resources is the most remarkable thing about cloud computing. It is this elastic ability that allows service providers to offer their users access to cloud computing services at highly reduced cost. The pay-for-what-you-use model allows users to save money. In a traditional computing network, for example, users have their own hardware setup, of which most rarely use more than 50% of the capacity. What cloud computing offers is the possibility for users to keep their expectations and current standards while still leaving the opportunity for expansion open for when they need it. This also makes computing more energy efficient while still providing the same experience, with the added benefit of virtually limitless resources. In other words, elasticity allows both user and provider to do more with less.
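The 5-to-10-server example can be expressed as a toy scaling rule; the thresholds and the doubling/halving policy below are illustrative assumptions, not any provider's actual algorithm.

```python
def scale(servers: int, utilization: float,
          low: float = 0.3, high: float = 0.7, floor: int = 1) -> int:
    """Elastic scaling decision driven by live utilization metrics."""
    if utilization > high:
        return servers * 2                  # scale out under heavy load
    if utilization < low and servers > floor:
        return max(floor, servers // 2)     # scale in and stop paying
    return servers                          # load is in the normal band

# An elastic system can go from 5 to 10 servers on demand...
assert scale(5, 0.9) == 10
# ...and back down again when the load subsides (pay for what you use).
assert scale(10, 0.2) == 5
```

The decision is made while the system is running, which is what separates elasticity (a dynamic property) from scalability (a static one).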

5. Database Management Systems in the cloud (Database as a Service)
Data and database management are an integral part of a wide variety of applications. Relational DBMSs in particular have been used massively due to the many features they offer:
- Functionality: an intuitive and relatively simple model for modeling different types of applications;
- Consistency: dealing with concurrent workloads without worrying about the data getting out of sync;
- Performance: low latency and high throughput, combined with many years of engineering and development;
- Reliability: persistence of data in the presence of different types of failures, ensuring safety.
The main concern is that DBMSs, and RDBMSs in particular, are not cloud-friendly, because they are not as scalable as web servers and application servers, which can scale from a few machines to hundreds. Traditional DBMSs were not designed to run on top of a shared-nothing architecture (where a set of independent machines accomplish a task with minimal resource overlap) and do not provide the tools needed to scale out from a few to a large number of machines. Technology leaders such as Google, Amazon, and Microsoft have demonstrated that data centers comprising thousands to hundreds of thousands of compute nodes provide unprecedented economies of scale, since multiple applications can share a common infrastructure. All three companies provide frameworks, namely Amazon's AWS, Google's AppEngine and Microsoft Azure, for hosting third-party applications in their clouds (data-center infrastructures).
Because the RDBMSs, or transactional data management databases, that back banking, airline reservation, online e-commerce, and supply chain management applications typically rely on the ACID (Atomicity, Consistency, Isolation, Durability) guarantees that databases provide, and it is hard to maintain ACID guarantees in the face of data replication over large geographic distances (1), these companies have even developed proprietary data management technologies referred to as key-value stores, informally called NoSQL database management systems. [6] The need for web-based applications to support a virtually unlimited number of users and to respond to sudden load fluctuations raises the requirement to make them scalable on cloud computing platforms, and such scalability has to be provisioned dynamically without causing any interruption in the service. Key-value stores and other NoSQL database solutions, such as the Google Datastore offered with Google AppEngine, Amazon SimpleDB and DynamoDB, MongoDB and others, have been designed so that they are elastic and can be dynamically provisioned in the presence of load fluctuations. We will explain some of these systems in more detail later on. As we move to the cloud computing arena, which typically comprises data centers with thousands of servers, the manual approach to database administration is no longer feasible. Instead, there is a growing need to make the underlying data management layer autonomic, or self-managing, especially when it comes to load redistribution, scalability, and elasticity. [7]

Figure 3: Traditional vs. cloud data services

This issue becomes especially acute in the context of pay-per-use cloud computing platforms hosting multi-tenant applications. In this model, the service provider is interested in minimizing its operational cost by consolidating multiple tenants on as few machines as possible during periods of low activity and distributing these tenants over a larger number of servers during peak usage. [7] Due to the above desirable properties of key-value stores in the context of cloud computing and large-scale data centers, they are being widely used as the data management tier for cloud-enabled Web applications. Although it is claimed that atomicity at a single key is adequate in the context of many Web-oriented applications, evidence is emerging that in many application scenarios this is not enough. In such cases, the responsibility for ensuring atomicity and consistency of multiple data entities falls on the application developers, which results in multi-entity synchronization mechanisms being duplicated many times in the application software. In addition, since it is widely recognized that concurrent programs are highly vulnerable to subtle bugs and errors, this approach adversely impacts application reliability. The need to provide atomicity beyond single entities is widely discussed in developer blogs, and this problem has recently also been recognized by senior architects from Amazon and Google, leading to systems such as MegaStore [10] that provide transactional guarantees on top of key-value stores. In the next part I will offer a general work breakdown structure of a project for migrating a database to a cloud-based solution. The key project activities will be explained, the potential challenges encountered in each activity will be discussed, and the currently available solutions, covering both RDBMS and NoSQL DBMS offerings in the cloud, will be compared and explained in considerable detail, including how they work and how they are provisioned.

(1) The CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: consistency (all nodes see the same data at the same time), availability (a guarantee that every request receives a response about whether it was successful or failed), and partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system). According to the theorem, a distributed system can satisfy any two of these guarantees at the same time, but not all three.
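Why single-key atomicity is often not enough can be shown with a minimal sketch: in a store that is atomic per key only, an operation spanning two keys is two separate writes, and a failure between them leaves the data inconsistent. The in-memory dictionary below is a deliberately simplified stand-in for a key-value store.

```python
# Minimal sketch: a key-value store that is atomic per key only.
# Transferring "credits" between two keys is two separate writes, so a
# crash between them leaves the data inconsistent; the multi-entity
# atomicity burden falls on the application developer.
store = {"alice": 100, "bob": 50}

def transfer(src, dst, amount, crash_between=False):
    store[src] -= amount              # atomic write #1
    if crash_between:
        raise RuntimeError("crash")   # simulated failure mid-transfer
    store[dst] += amount              # atomic write #2

transfer("alice", "bob", 30)          # ok: alice 70, bob 80
try:
    transfer("alice", "bob", 30, crash_between=True)
except RuntimeError:
    pass                              # alice 40, bob 80: 30 credits lost
```

Systems such as MegaStore address exactly this gap by layering transactional guarantees over the key-value tier, so the application no longer has to implement such synchronization itself.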

6. Scope Breakdown Structure of a project for migration of a database to a cloud-based solution
A work breakdown structure is a project management tool which organizes the project deliverables in a hierarchical structure. According to the IPMA (International Project Management Association) definition, the work breakdown structure is a document containing a hierarchical breakdown of the project goal into individual deliverables, and further into individual products and sub-products, down to the level of the individual work packages to be delivered in the course of project implementation. It defines 100% of the overall scope of the project, and subsequent levels list increasingly detailed definitions of the project products. Because some methodologies describe this term as a hierarchical breakdown of activities or tasks, sub-tasks and phases, the term Scope Breakdown Structure will be used instead, as recommended by RNDr. Zdenko Stanicek, both in this thesis and in the next release, version four, of the IPMA Competency Baseline (ICB). The scope represents the total content included in the project; the project should deliver all that is described within its scope. The scope definition is expressed in a scope breakdown structure (a tree structure) where each node of the tree represents a deliverable or sub-deliverable on a particular level. [citation Stanicek] Each deliverable or sub-deliverable that is part of the SBS should be absolutely clear without any reference to how it should be delivered: the definition of Manage Scope encompasses the state which has to be achieved by the project, without any reference to a possible way it can be achieved. What is more, the scope includes the totality of the goals and deliverables of the project. The scope is refined as the project develops; this continuous refinement of the scope definition is visualized through documents that define the deliverables and sub-deliverables in a step-by-step improved manner, i.e. with growing detail and precision, as the knowledge of the solved problem progresses. [citation Stanicek] Because a deliverable is any product of a project, it is not concerned with the tasks and activities that lead to its completion. The scope embraces the totality of the goals and deliverables of the project and defines the boundaries, i.e. what is included in the project and, moreover, what is not included in the project. [citation Stanicek] Although almost identical and very often confused in terms of terminology, it is of crucial importance to properly distinguish between the WBS and the scope; in order to avoid further confusion and misunderstanding, it is suggested to use SBS instead of WBS, and therefore the SBS terminology will be applied here. This section describes the Scope Breakdown Structure (SBS) of a project for the migration of databases to a cloud-based solution. In Figure 4 the SBS is presented to the first level; as the deliverables are described in more detail, the lower levels will be presented.

Figure 4: Scope Breakdown Structure
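The first level of the SBS, as presented in Figure 4 and elaborated in the following section, can be represented as a simple tree of deliverables; note that every node names a state to be achieved, not an activity.

```python
# The SBS as a tree: the root is the project goal, the children are the
# first-level deliverables; deeper levels would nest further lists.
sbs = {
    "Database migrated to a cloud-based solution": [  # project goal (root)
        "Management and support provided",
        "Planning completed",
        "Proof of concept created",
        "Data migration completed",
        "Cloud tested and leveraged",
        "Documentation created",
        "Training provided",
        "Operation switched over to the cloud",
        "Monitoring set and optimization complete",
    ]
}
```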

7. SBS Deliverables, Challenges and Risks
In this section the SBS project deliverables are described. The first- and second-level deliverables are the focus of this section, as well as the potential challenges that could emerge during the delivery of each of them.

Management and support provided
This deliverable should provide the project preparation and all the initial planning. The steps taken here will help us to identify the primary focus areas: scope, objectives, plan, definition of the project team, and risk management. The project charter and the project strategy are defined here. Some of the challenges and risks that might present themselves are:
- Project goals that are not clearly defined, or a lack of agreement on the project goals,
- Lack of senior management involvement,
- No effective project management methodology defined.

Planning completed
The point of this deliverable is to provide a preliminary cloud assessment: evaluate the offerings from the different cloud providers on the market, get a clear understanding of the separation of responsibilities between the cloud provider and the client, agree on it, and define a responsible team. The legacy applications should also be evaluated at this point to get the overall picture, early enough, of whether a cloud-based solution should be considered for them. Cloud computing represents a shared responsibility between the provider and the client. The demarcation line that separates the responsibilities of the client from those of the provider varies according to the area and according to the delivery model that is being evaluated. This demarcation line is sometimes referred to as the trust boundary: for those areas that fall under the cloud provider's responsibility, the client must trust the provider's execution and implementation. Most cloud providers' service agreements limit the provider's responsibility, typically to a refund of fees, even if an availability failure of the application causes financial loss to the client, or a failure to comply with important compliance requirements; these agreements thus shift the primary risk to the client. With this in mind, it is of utmost importance to clearly understand where this trust boundary lies. Table 1, presented below, can help in better understanding the trust boundary and provides some guidance regarding the security responsibilities under the various delivery models; the zigzag line indicates where the trust boundary lies for each delivery model. Creating a comparison table of the available solutions can also be useful and helpful: in the Appendix section an example of such a table is presented, Table 2, with a comparison based on the common considerations of some of the RDBMSs currently available as a service on the market.

Table 1: Trust boundary in cloud security

Layer           | IaaS                                       | PaaS                                       | SaaS
Application     | User: apply best practices and certification | User: apply best practices and certification | Provider: evaluation and certification
OS/Middleware   | User: apply best practices and certification | Provider: evaluation and certification      | Provider: evaluation and certification
Infrastructure  | Provider: evaluation and certification      | Provider: evaluation and certification      | Provider: evaluation and certification

Potential challenges and risks are:
- Providing an accurate cost analysis. Weighing the costs of owning and operating a data center against going to a cloud provider, and choosing the option that meets your requirements, calls for careful and detailed analysis. Businesses have to take multiple options into consideration in order to get a valid comparison between the alternatives. Most of the large cloud providers, such as Amazon and Microsoft, have published whitepapers that can help in the process of gathering data for an appropriate comparison, and they have also implemented cost calculators that can help decision makers with the analysis.
- Failure to involve the company's security advisors early in the cloud security assessment. Some organizations require specific IT security policies and compliance, and it is very important to include the company's security advisors in the cloud assessment process to help with the decision. First the information needs to be classified: the organization's data has to be evaluated, its value has to be understood, along with the risks if the data is compromised. Key challenges and risks here are:

o Identification and correct classification of the data,
o Where the data currently resides,
o Whether there is any obligation to store the data in a specific jurisdiction. For example, Microsoft Azure, Amazon and Google allow users to designate the region in which the data is stored.
o Clarifying the options to retrieve all the data from the cloud provider and to move it to a different provider. This also covers interoperability (being able to communicate and work with multiple cloud service providers) and portability (being able to move the system to a different cloud provider, so as not to be dependent on a single one). It must be taken into consideration that some cloud database offerings, for example Google Cloud SQL, are only accessible through their platform, Google AppEngine.
Data security can be a big issue, but if it is properly understood, analyzed and classified, with a proper understanding of the risks and threats, this can help to identify which databases can be moved to the cloud and which ones should be kept in house.
- Technical architecture assessment. An application dependency tree can help to identify which applications are suitable to be moved into the cloud. The main considerations here should be:
o Will the cloud provide the entire infrastructure that we require?
o Is it possible to reuse the management and configuration tools that we have?
o Will it allow us to cancel the support contracts for software, network and hardware?
Creating a dependency tree, based on a detailed examination of the construction of the enterprise applications, will help to classify applications according to their dependencies. The dependency tree should highlight all the different parts of the applications and identify their upstream and downstream dependencies on other applications; the resulting diagram should be an accurate snapshot of the enterprise application assets.
In order to identify good candidates for the cloud we should look for applications with under-utilized assets, applications that have an immediate business need to scale and are running out of capacity, applications that have architectural flexibility, and applications that use traditional tape drives to back up data. Avoid applications that require specialized hardware to function (for example, a mainframe or specialized encryption hardware). Another important point during this activity is to evaluate the possibility of migrating licensed products. For example, Amazon offers the possibility to bring your own license: if the organization has purchased a license in the traditional way, or already owns one, it can be applied to the products that are available as pre-configured Amazon Machine Images. [42] Similarly, Microsoft offers Azure VMs with an included license for the operating system, as well as VMs with an included license for SQL Server.
- Not dedicating a team, i.e. expecting that the IT staff will be able to do their business-as-usual job while moving to the cloud. A dedicated team should be created that will focus on the challenges to come, overcome them and succeed.
- Inability to move or link legacy applications. Focus on the applications that provide the maximum benefit for the minimum cost/risk. Assess legacy application compatibility and how much re-work is needed for their migration, and prioritize which applications to migrate to the cloud and in which order.
- Not understanding the SLA. Small business owners usually do not have much experience with these types of agreements, and failing to review them fully might open up big problems in the future. The business impact of the SLA must be carefully considered and analyzed. Close attention should be paid to the availability guarantees and penalty clauses:
o Does the availability fit in with the organization's business model?
o What do you need to do to receive the credits when the hosting provider fails to achieve the guaranteed service levels?
o Are the credits processed automatically, or do you need to ask for them in writing?
Usually the cloud providers have one SLA for all users and do not offer customization of the SLA. All of the above considerations must be evaluated carefully before moving to a cloud-based solution in order to mitigate the risks and be confident of choosing the right cloud services, ones that will support and ensure the growth of the business.
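The dependency-tree classification described earlier in this deliverable can be sketched in a few lines. The application inventory below is purely illustrative; the rule applied is the one from the text: exclude any application that depends, directly or transitively, on something that cannot be migrated (e.g. specialized hardware).

```python
# Hedged sketch: classify applications by their transitive dependencies,
# using an illustrative adjacency map as the application inventory.
deps = {
    "web-frontend": ["orders-api"],
    "orders-api": ["inventory-db", "billing"],
    "billing": ["mainframe-gateway"],   # depends on specialized hardware
    "inventory-db": [],
    "reporting": ["inventory-db"],
}

AVOID = {"mainframe-gateway"}  # requires hardware that cannot move

def downstream(app, seen=None):
    """All transitive downstream dependencies of an application."""
    seen = set() if seen is None else seen
    for d in deps.get(app, []):
        if d not in seen:
            seen.add(d)
            downstream(d, seen)
    return seen

# Good cloud candidates: no dependency on hardware we cannot migrate.
candidates = [a for a in deps if not downstream(a) & AVOID]
```

On this inventory only the self-contained database and the reporting application qualify; everything reaching the mainframe gateway is excluded, which is exactly the kind of snapshot the dependency diagram is meant to provide.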

Proof of concept created
Once the cloud assessment is complete and the possible candidates are identified, it is time to test the cloud solution with a small proof of concept. The main goal here is to learn the chosen cloud provider/solution and to test whether the assumptions regarding suitability for migration to the cloud are feasible and accurate. At a minimum you should get familiar with the APIs, tools, SDKs and plugins that the cloud provider offers. It is a good idea at this time to deploy some small application as a test and, in the process, get really involved with the cloud; this can be easily achieved, as most of the cloud providers offer some limited free account or free trial period. The proof of concept should represent the real application in miniature; it should test its critical functionalities in the cloud environment. You should start with a small database, and users should not be afraid to play around with the offered possibilities, for example launching and terminating instances, etc. In order to gather all the necessary benchmarks, stress testing of the cloud system should be included here too. During the building of the proof of concept a lot can be learned about the capabilities and applicability of the chosen cloud solution, and it can quickly broaden the set of applications that can be migrated. The proof of concept should raise awareness of the power of the cloud within the organization, and it can help to set expectations, validate the technology and perform the necessary benchmarks. It provides hands-on experience with the new cloud environment and will give more insight into the challenges you might face and need to overcome in order to move ahead with the migration. Possible challenges and risks might be:
- Unclear and misunderstood requirements,
- Lack of an effective methodology to build the correct proof of concept. Failure to build an appropriate proof of concept, one that is missing some of the key functionalities

and/or uses data that is too simple and does not correspond to the real production data, might be one of the biggest risks for the project.
- Poor estimation and failure to perform all the needed activities,
- Not documenting the lessons learned. Capturing the lessons learned in the form of a whitepaper or a presentation and sharing it within the company is one of the most important deliverables.

Data migration completed
First and most important, the different available storage options should be carefully evaluated. There are several points that have to be considered to make sure the solution will meet the need for easily scaling the applications: cost, query ability, relational SQL, size of the objects, update frequency, read vs. write ratio, and consistency (strict vs. eventual); all these points have to be taken into consideration and the right tradeoffs have to be made. Creating a table of the available storage options, with examples of what each can be used for, can be very beneficial; an example of such a table, Table 3, is presented in the Appendix section. At this point it should be decided how the data will be migrated: whether it will be migrated to a cloud-native solution (for example, a MySQL implementation can be migrated to Amazon RDS or Google Cloud SQL, MS SQL can be migrated to SQL Azure, etc.) or to a VM preloaded with the desired product (Oracle, MS SQL, DB2, etc.). In the case where a cloud-native solution is chosen, there might be a need to develop a new database architecture specific to that cloud solution (for example, migrating to Database.com will require a complete re-engineering of the database). In the case of a migration of a large amount of data (multiple terabytes), options such as the import/export services that some cloud providers offer should be considered. For example, Amazon, with the AWS Import/Export service, offers the ability to load the data on USB 2.0 or eSATA storage devices and ship them via a carrier to AWS.
AWS then uploads the data into the designated buckets in Amazon S3. [41]
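A back-of-envelope calculation makes the case for such import/export services concrete: shipping storage devices wins whenever the network transfer would take longer than the courier. The shipping time and link speeds below are illustrative assumptions, not provider figures.

```python
# Rough arithmetic only: physical Import/Export vs. uploading over the wire.
def network_days(data_tb, uplink_mbps):
    """Days needed to push data_tb terabytes over an uplink_mbps link."""
    bits = data_tb * 1e12 * 8
    return bits / (uplink_mbps * 1e6) / 86400

def prefer_shipping(data_tb, uplink_mbps, shipping_days=7):
    """True if couriering storage devices beats the network transfer."""
    return network_days(data_tb, uplink_mbps) > shipping_days

prefer_shipping(10, 100)   # 10 TB over 100 Mbit/s: over 9 days -> ship it
prefer_shipping(0.5, 100)  # 0.5 TB: well under a day -> just upload
```

This is the same tradeoff the bandwidth-cost risk below points at: for multi-terabyte migrations, the wire is often the slowest and most expensive path.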

Another very important point at this time is to set up the backups and the retention period for the already migrated data, and also to consider the possibility of moving backups that are currently done on tape to cloud-based storage. Data migration challenges and risks include:
- Failure to identify the right storage option,
- Failure to implement a good migration strategy. The time needed to move an existing workload to the cloud is usually underestimated and overlooked, as are the bandwidth cost of moving a large amount of data to the cloud provider, the time taken to transfer the data, and the business processes involved in the migration.

Cloud tested and leveraged
After the data migration is complete, the data is successfully set up and working in the cloud, and tests have been run confirming that everything works, it is necessary to invest some time and resources in determining how to draw additional benefits from the cloud. The following questions can be asked: What needs to be changed in order to leverage and implement the scalability and elasticity that the cloud offers? What processes can be automated for easier management and maintenance? What steps should be taken to secure the organization in the event of failure? Even though the data is migrated to the cloud, you still have the responsibility for securing it, and security best practices should always be implemented:
- Passwords should be changed on a regular basis;
- Users should have restricted access to the resources;
- Users and groups with different access privileges should be created;
- It is advisable to encrypt the data, no matter whether it is at rest (AES) or in transfer (SSL).

With the move into the cloud arena it is necessary to revise the software development lifecycle and the upgrade process already in place. With the possibility of requesting the infrastructure minutes before it is needed, and with a scriptable environment, the software deployment process can be fully automated. The development, testing, staging and production environments can be managed by creating re-usable configuration tools and launching specific VMs for each environment on demand. The upgrade process can also be automated and simplified: with the cloud at hand there is no need to upgrade the software version on the old machines; instead, new pre-configured instances can be launched and the old ones thrown away. This also gives the opportunity for a quick rollback, minimizing the downtime in case of upgrade problems. It is highly advisable to create a business continuity plan as a part of this deliverable. The business continuity plan should include:
- A data replication strategy for the databases,
- A data backup and retention policy,
- Using VMs with the latest patches applied,
- A disaster recovery plan in the cloud,
- A disaster recovery plan for failing back to the in-house or corporate data center.
Smaller organizations usually have no disaster recovery plan in place, because it is prohibitively costly to maintain separate hardware or a datacenter for disaster recovery. With the use of virtualization and data snapshots, the cloud makes the implementation of a disaster recovery plan noticeably less expensive and much simpler: the process of launching cloud resources, with which the entire cloud environment can be brought up within a couple of minutes, can be fully automated. Potential challenges and risks here may include:
- Discovering that a substantial amount of refactoring and decomposing of the application needs to be done to make it more scalable,
- Lack of understanding of the cloud,
- Failure to realize the importance of creating a business continuity plan.
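The "launch new instances, discard the old ones" upgrade flow with quick rollback can be sketched as follows. The provider operations (`launch`, `route_traffic`, `terminate`, `healthy`) are hypothetical stand-ins passed in as functions, not a real cloud API.

```python
# Sketch of a blue-green style upgrade: bring up a pre-configured new
# fleet, cut traffic over only if it is healthy, otherwise roll back by
# discarding it while the old fleet keeps serving.
def upgrade(fleet, new_image, launch, route_traffic, terminate, healthy):
    """Return the fleet that is serving traffic after the attempt."""
    new_fleet = [launch(new_image) for _ in fleet]  # pre-configured instances
    if all(healthy(i) for i in new_fleet):
        route_traffic(new_fleet)      # cut over, minimal downtime
        for old in fleet:
            terminate(old)            # old machines are discarded, not patched
        return new_fleet
    for i in new_fleet:
        terminate(i)                  # rollback: old fleet was never touched
    return fleet
```

Because the old instances are never modified, a failed upgrade costs only the short-lived new instances, which is what makes the rollback quick.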

Documentation created
In order to achieve efficient planning, operation and reporting, every solution requires reliable and understandable documentation. The goal is to create central documentation for the implemented cloud solution. One of the main challenges is to preserve the knowledge gathered during the project and to keep it up to date; the information stored in the documentation is needed for the operation, continuous improvement and optimization of the application. The documentation can include on-line help, user guides, whitepapers and quick reference guides. The usual challenges and risks during this activity are:
- Failure to understand the importance of clear and complete documentation,
- Lack of user commitment and willingness to write documentation.

Training provided
Training involves imparting knowledge of the implemented solution to the users before the system goes into live operation. This activity entails defining business processes for the respective roles and defining business scenarios to suit these processes; these scenarios enable the users to understand the system functionality better. Training should also be provided for the application and database administrators, who will need to learn the tools and the automation possibilities that the specific cloud provider offers. As mentioned earlier, the cloud changes a lot of things from the point of view of management and maintenance of the system, and in order to fully exploit all the possibilities that the cloud promises, the IT support staff should be properly educated.

Potential challenges and risks can be:
- Ineffective communication,
- Lack of user commitment,
- Lack of understanding of the cloud and its benefits,
- Failure to provide proper training for the decision makers. Decision makers must be able to understand the cloud and the functional and management changes that it brings in order to buy into the solution.
- Resistance to change,
- Conflicts between departments.

Operation switched over/cutover to the cloud
This means going live, from the in-house solution to the new cloud-based one. With this, the cloud database and application become operational in the live environment. This includes the final migration of the live data from the old system to the new cloud-based system. Typical challenges and risks that might present themselves:
- Underestimating the time needed to move an existing workload to the cloud; the time taken to transfer the data and the business processes involved in the migration may result in prolonged downtime,
- Lack of testing with the whole of the production data transferred to the cloud,
- Lack of business readiness,
- Failure to educate the IT support staff in time.

Monitoring set and optimization complete
Proper optimization of the cloud-based solution can bring an immediate, visible improvement in cost savings. With the pay-for-what-you-use model in place, you should always strive to optimize the system in whatever way possible; a small optimization can result in a huge amount of savings on next month's bill. [42] To achieve this you should:
- Understand your usage patterns. With the cloud's ability to create an automated elastic environment and a good understanding of the usage patterns, you can easily scale down your infrastructure during inactive time periods and reduce costs. For example, with proper monitoring and log inspection you can easily identify under-utilized instances and eliminate them, or scale them down to a smaller and cheaper VM instance instead.
- Improve efficiency and reduce waste during deployment. As all cloud providers charge based on traffic, compressing the data before transmitting it could result in significant cost savings.
- Evaluate whether you have all the cloud-aware system administration tools required for the management and maintenance of the database and application.
- Implement advanced monitoring. Proper monitoring gives the must-have visibility for business-critical applications and services. It is important to keep in mind that the end-user response time of the databases and applications in the cloud does not depend only on the cloud infrastructure; various factors such as internet connectivity, browsers and third-party services, to name a few, can have a significant impact. Measuring and monitoring the performance of your cloud applications can help you identify performance issues and diagnose their root causes, so that appropriate actions can be taken.
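The under-utilized-instance check mentioned above is easy to sketch: flag any instance whose average CPU utilization stays below a threshold so it can be eliminated or downsized. The metric samples below are illustrative values, not output from any real monitoring API.

```python
# Sketch of the metric-inspection step: find instances whose average CPU
# utilisation is below a threshold (candidates for a smaller, cheaper VM).
def underutilized(cpu_samples, threshold=20.0):
    """Return sorted instance IDs with mean CPU (%) below threshold."""
    return sorted(
        iid for iid, samples in cpu_samples.items()
        if sum(samples) / len(samples) < threshold
    )

metrics = {                 # illustrative utilisation samples, in percent
    "db-1": [72, 80, 65],
    "web-3": [5, 9, 4],     # clearly over-provisioned
    "batch-7": [15, 22, 18],
}
underutilized(metrics)      # -> ['batch-7', 'web-3']
```

In practice the samples would come from the provider's monitoring service and the threshold would be tuned per workload; the logic stays the same.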

- Ask what other cloud-based services you can use to reduce the cost.
- Ask what needs to be done to optimize the database and application to be more elastic. Most of the databases and applications developed in the past were not built for the cloud, and in order to have a highly scalable application some parts will need re-engineering to be optimized for the cloud environment. Some questions that you should ask:
o Can you deploy the application to a VM in the cloud?
o Can you run multiple instances of the application on multiple VMs?
o Can you divide the application into components and run them on separate VM instances? For example, a big, complex web application can be divided by layer, into Web, App and DB, with a separate VM instance for each layer.
o Can you decompose your relational database? Most traditional enterprise applications use a relational database system. Traditional relational database systems are hard to scale, and much time is wasted migrating to a bigger box with more computing power. Database administrators often start with a DB schema based on the instructions from developers [42], and developers and database architects may fail to communicate with each other about what type of data is being served, which makes it extremely difficult to scale that relational database. The movement to the cloud provides an opportunity to analyze the current RDBMS and make it more scalable as part of the migration. Some techniques that might be used are:
- Moving large blob objects and media files to cloud storage and storing just a pointer in the existing database,
- Moving associated metadata or catalogs to NoSQL key-value stores, such as DynamoDB or SimpleDB, or to document-oriented databases such as CouchDB or MongoDB (in the Appendix section there is a more detailed explanation of their data models and a comparison that can be useful when considering which one to use),
- Keeping only the absolutely necessary relational data (joins) in the relational database,

- Moving all the relational data to a cloud-native relational database system, such as Amazon RDS, Salesforce's Database.com or Microsoft's SQL Azure, which provides the flexibility to scale the database with a single API call, only when needed,
- Creating multiple read replicas to offload the read load.
The main challenge and risk at this point will be changing the mindset so that all future development is cloud-friendly. There are a lot of whitepapers available highlighting best practices that will help not just in creating highly scalable applications for the cloud but also in creating more secure and elastic applications.
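The first decomposition technique above, moving blobs out of the relational database, amounts to storing the object in cloud storage and keeping only a pointer column in the existing table. The in-memory dictionary below is a deliberately simplified stand-in for an object store such as S3 or Blob Storage; the bucket, key and row names are invented for illustration.

```python
# Sketch: large media goes to (simulated) object storage; the relational
# row keeps metadata plus a small pointer instead of the blob itself.
object_store = {}   # stand-in for a cloud object store: key -> bytes

def store_blob(bucket, key, data):
    """Upload the blob and return the pointer to keep in the RDBMS."""
    object_store[f"{bucket}/{key}"] = data
    return f"{bucket}/{key}"

# The relational row now holds a pointer, not the media file.
row = {
    "product_id": 42,
    "name": "brochure",
    "media_ptr": store_blob("media-bucket", "brochure.pdf", b"%PDF..."),
}
```

Keeping rows small this way reduces the relational working set, which is exactly what makes the remaining joins easier to scale.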

8. Will cloud computing reduce the budget? A small business which owns and manages its own IT equipment sometimes fails to recognize that this equipment and its components will deteriorate over time, which can cause the system to crash or experience latency. Usually an entrepreneur will invest in upgrading this equipment and add extra redundancy without much thought, and additional IT support personnel may be hired. This can truly become a vicious circle, as the new equipment will depreciate and break down after a couple of years and new investment will be needed. In general, IT uses a big part of the company's budget, not only because of the expensive equipment but also because of the added cost of maintenance and upgrades. Upgrades, security threats, and unexpected system crashes often cost a lot of money. [3] With cloud computing and its deployment-on-demand model, all of these IT expenses and capital investments are transferred to the third-party supplier. The business owner just has to budget for the cloud provider's monthly subscription fees. Because cloud computing can be deployed on demand, only when needed, there is no need to invest in IT in expectation of future demand. For better forecasting of the IT budget, an entrepreneur can choose a cloud computing service. Cloud computing simplifies budgeting: the business owner does not need to worry about complex expansion or merging projects because he is only paying for the resources his company uses at the moment. Also, when the number of users is reduced, the cloud computing costs are reduced as well. The traditional IT process of purchase, installation, management, protection, and support of an in-house system contradicts the company's goal of reducing recurring expenses and can be a vicious circle.
Cloud computing resources and services are used only when needed, which greatly reduces recurring expenses and helps the company adapt easily to fast and ever-evolving market conditions. With cloud computing, an entrepreneur can better manage risks and uncertainties. Because of growing demand, many companies overinvest in information technology, which eventually increases the expenses and uncertainties of maintenance and IT management, thus exposing the company to a higher risk. [42] Cloud computing providers reduce the company's dependency on on-premises IT systems, thereby assuming the costs and connected risks of IT support, hardware, backups and security. The business owner, therefore, no longer has responsibility for the purchase, management, and upgrade of IT equipment. Growth opportunities can then be pursued without having to bear the uncertainties of large capital investments. [42] One of the benefits most overlooked by the small business owner is the reduction in energy costs that cloud computing delivers. The fact that the company has less IT equipment to maintain and power greatly reduces the energy bill. When owners decide to use cloud computing services, expensive IT equipment is moved to a safe, monitored, and disaster-proofed data center, and energy bills are reduced. [42] Cloud computing infrastructure can usually be accessed through the Internet, which allows employees to do their work anywhere and anytime. They can work from home or any remote location that has an Internet connection. This also improves employee morale, and travel time and costs are significantly reduced. Each employee that has access to the software can use the cloud computing provider's support team for problems which might arise while using the system. Most cloud providers offer management consoles that allow management to remotely monitor each employee's activity.

9. Conclusion The database management system has, for a long time, been an integral part of computing. As the whole IT world moves to the cloud, whether you are assembling, managing or developing on a cloud computing platform, you need a cloud-compatible database. In this work I gave a general scope breakdown structure of a project for moving a database, and an application in general, to the cloud. The presented SBS should be a good starting point for any type of migration to the cloud. The appendix also presents comparisons between several currently available companies that offer database as a service in the cloud, the available storage options, and a short explanation of the different NOSQL data models. Although cloud-native solutions differ from the most widely used traditional relational database systems, and most of them might require revision and recoding of existing applications, it is obvious that they bring a lot of benefits, especially with the offer of fully managed and automated database administration, tuning and optimization. Cloud database systems are built to use the power of the cloud; they are extremely scalable and elastic, giving the opportunity to start small and expand as needed, mitigating the risks and uncertainties of investing in IT equipment and professional IT support. Cloud computing in general, with its flexible pricing models and different plans, presents one of the best solutions for startups and small companies that are developing new products and do not have the financial power to risk investing in uncertain projects. The cloud database solution is ideal for web and mobile applications. The fact that most DBaaS offerings are tightly integrated with other PaaS services gives the organization the opportunity to fully focus on developing its products without wasting any resources on administration of the platform.
Despite the benefits offered by cloud-based DBMSs and the cloud in general, many people still have apprehensions about them. This is most likely due to the various security issues that have yet to be dealt with. Storing critical business data in the cloud and entrusting its security to a third party, where the data will be spread over multiple hardware stacks and across multiple data centers, can be a big security issue. In my opinion, the cloud may not yet be ready for moving critical enterprise applications which store highly sensitive data, but it is definitely ready to be used for testing and development of new projects. With proper preparation and planning, the process of migration to the cloud can be easy and beneficial, not just from a financial point of view: it might also trigger the long-needed revision and re-engineering of some legacy applications, making them more elastic.

Many companies, including some huge multinational corporations, have already moved to cloud computing because it is less expensive, more efficient, and more agile than onsite IT systems. Small and medium-scale enterprises should therefore follow suit: if cloud computing is proven to work for these big enterprises, it will surely work for small and medium enterprises.

10. Appendix Some of the currently available RDBMS DBaaS offerings, a comparison and common considerations

The offering for relational Database as a Service (DBaaS) is currently found in the public marketplace in two broad forms: online general-purpose relational databases, and the ability to operate virtual machine images loaded with common databases such as MySQL, Oracle or similar commercial databases. Database.com offers a relational multitenant database specially built for the cloud using a metadata-driven architecture. Microsoft SQL Azure offers a SQL Server-like relational database management system and controls many of the database configuration details, allowing users to focus on the schema, data and application layers. Amazon RDS provides an implementation of MySQL or Oracle on a virtual machine built and tuned for that purpose, and Google has its Cloud SQL, providing MySQL for its AppEngine PaaS. While all the presented RDBMS DBaaS offerings provide an opportunity to reduce cost, there are many considerations to take into account before moving the data to a cloud-based solution. Table 2 presents the main considerations comparison.

Data Sizing - All of the RDBMS DBaaS offerings presented have limits on the size of the data set that can be stored on their systems.

Portability - Portability and adherence to standards is a critical issue for ensuring continuity of operations and for mitigating business risk (e.g., a provider going out of business or raising rates). The ability to instantiate a replicated version of the data off-cloud or in another cloud offering can provide business owners with an extra level of assurance that they will not suffer a loss of data. This can be facilitated by standards, such as the use of a standard database query language (SQL).

Transaction Capabilities - Transaction capabilities are an essential feature for databases that need to provide guaranteed reads and writes (ACID).
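The portability point above, keeping application code on standard SQL so the same logic can run locally or against a cloud-hosted database, can be illustrated with Python's DB-API, which most relational drivers implement. The sketch below uses the built-in sqlite3 module purely as a stand-in; with a MySQL driver the same statements could target an Amazon RDS or Google Cloud SQL instance (the table and column names are invented, and drivers differ in parameter placeholder style: sqlite3 uses "?", MySQL drivers typically use "%s").

```python
import sqlite3

# Standard SQL keeps the application portable: the same schema and queries
# can run against a local database (sqlite3 here) or, by swapping the DB-API
# driver and connection string, against a cloud-hosted MySQL instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                 [(123, "Bob Smith"), (456, "James Johnson")])

rows = conn.execute("SELECT name FROM customers ORDER BY id").fetchall()
print(rows)  # [('Bob Smith',), ('James Johnson',)]
```

Code written this way avoids vendor-specific extensions, which is exactly the property that makes an off-cloud replica or a provider switch feasible.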

Table 2: Main considerations comparison

Maximum amount of data that can be stored:
- Salesforce Database.com: limited by the number of records per database; up to 22,300,000 records
- Microsoft SQL Azure: 5 GB with the Web edition database, up to 150 GB with the Business edition database
- Amazon RDS (MySQL or Oracle): 1 terabyte per database instance
- Google Cloud SQL: 100 GB per database instance

Ease of software portability with a similar locally hosted capability:
- Salesforce Database.com: Low. Requires the database to be specially built and tested by Salesforce before deployment.
- Microsoft SQL Azure: High. Most SQL Server features are available in SQL Azure.
- Amazon RDS: High. The MySQL/Oracle instantiation in the cloud is very similar to the locally instantiated version.
- Google Cloud SQL: Medium. The MySQL instance in the cloud is very similar to the local instance, but accessible only through Google App Engine.

Transaction capabilities:
- Salesforce Database.com: Yes; Microsoft SQL Azure: Yes; Amazon RDS: Yes; Google Cloud SQL: Yes

Configurability and ability to tune databases:
- Salesforce Database.com: Low. It creates indexes automatically and keeps a record of the most recently accessed records, but does not allow control over this, nor over memory allocation and similar resources.
- Microsoft SQL Azure: Medium. Can create indexes and stored procedures, but no control over memory allocation or similar resources.
- Amazon RDS: High. MySQL/Oracle instantiation in the cloud on a virtual machine.
- Google Cloud SQL: Low. Automatically tuned.

Database accessible as a stand-alone offering:
- Salesforce Database.com: Yes; Microsoft SQL Azure: Yes; Amazon RDS: Yes; Google Cloud SQL: No. Requires a Google App Engine application layer.

Possibility to designate where the data is stored (e.g., region or data center):
- Salesforce Database.com: No; Microsoft SQL Azure: Yes; Amazon RDS: Yes; Google Cloud SQL: Yes

Replication:
- Salesforce Database.com: No; Microsoft SQL Azure: Yes; Amazon RDS: Yes; Google Cloud SQL: Yes

Configurability - DBaaS offerings may reduce the number of configuration options available to database administrators. For some applications, if more configuration options are managed by the platform owner rather than by the customer's database administrator, this can be a benefit, as it can reduce the amount of effort expended to maintain the database. For others, the inability to tune and control all aspects of the database, such as memory management, can be a limiting constraint in obtaining performance.

Database Accessibility - Most DBaaS offerings provide a predefined set of connectivity mechanisms that will directly impact adoption and use. There are three general approaches. First, most RDBMS offerings are typically accessible through industry-standard database drivers such as Java Database Connectivity (JDBC) or Open Database Connectivity (ODBC). These drivers allow applications external to the service to access the database through a standard connection, facilitating interoperability. Second, services typically provide interfaces that use standards-based, Service-Oriented Architecture (SOA) protocols, such as SOAP or REST, with the Hypertext Transfer Protocol (HTTP) and a vendor-specific API definition. These services may provide software development kits in common source-code languages to facilitate adoption. Third, some databases may be restricted to accessing data through software running in the vendor's ecosystem. This approach may increase security, but it also significantly limits portability and interoperability.

Availability and Replication - The ability to ensure that data is available and not lost is a key consideration. Ensuring access to data can come through enforcement of service-level agreement (SLA) metrics such as uptime, replication across a cloud provider's regions, and replication or movement of the data across cloud providers or to the consuming organization's data center.
Replication across a cloud provider's hardware within a region may ameliorate the effects of a localized hardware or software failure. Replication across a cloud provider's geographic regions may ameliorate the effects of a network outage, natural disaster, or other regional event. Replication across multiple cloud providers, or back to the consuming organization's IT infrastructure, may provide the greatest continuity-of-operations benefit through full geographic and IT stack independence. Many providers, such as Microsoft and Amazon, offer replication of the data across hardware within a specific region as part of a packaged service. Within a given vendor, replication across geographies is usually more expensive and may result in significant data transfer fees.

Understand the various available storage options. Creating a table similar to the one below can help in understanding the various available storage options and their use.

Table 3: Data storage options

Good for:
- Amazon S3 and Microsoft BLOB: storing large write-once, read-many types of objects; static content distribution
- Amazon SimpleDB: query-able lightweight attribute data
- Amazon RDS and SQL Azure: storing and querying structured relational and referential data
- MongoDB, CouchDB: query-able document-oriented databases with schemaless JSON-style object data storage

Good examples:
- Amazon S3 and Microsoft BLOB: media files (audio, video, images), backups, archives, versioning
- Amazon SimpleDB: querying, indexing, mapping, tagging, click-stream logs, metadata, configuration, catalogs
- Amazon RDS and SQL Azure: web apps, complex transactional systems, inventory management and order fulfillment systems
- MongoDB, CouchDB: querying, indexing, mapping, document retrieval based on document contents

Not recommended for:
- Amazon S3 and Microsoft BLOB: querying, searching
- Amazon SimpleDB: complex joins or transactions, BLOBs, relational typed data
- Amazon RDS and SQL Azure: clusters
- MongoDB, CouchDB: complex joins or transactions, BLOBs, relational typed data

Not recommended examples:
- Amazon S3 and Microsoft BLOB: databases, file systems
- Amazon SimpleDB: OLTP, DW cube rollups
- Amazon RDS and SQL Azure: clustered DBs, simple lookups
- MongoDB, CouchDB: OLTP, DW cube rollups

NOSQL options: data models

In this section the data models of some of the available NOSQL solutions are presented. The intention of this section is to help in deciding which solution is most suitable during relational data decomposition and re-engineering.

Amazon DynamoDB data model 2

Amazon DynamoDB organizes data into tables containing items, and each item has one or more attributes.

Attributes - An attribute is a name-value pair. The name must be a string, but the value can be a string, number, string set, or number set. The following are all examples of attributes: "ImageID" = 1; "Title" = "flower"; "Tags" = "flower", "jasmine", "white"; "Ratings" = 3, 4, 2.

Items - A collection of attributes forms an item, and the item is identified by its primary key. An item's attributes are a collection of name-value pairs, in any order. The item attributes can be sparse, unrelated to the attributes of another item in the same table, and are optional (except for the primary key attribute). The table has no schema other than its reliance on the primary key. Items are stored in a table, and the primary key uniquely identifies an item in a DynamoDB table. In the following diagram, Figure 5, ImageID is the attribute designated as the primary key:

2 This section and its examples are taken from the official product documentation, http://aws.amazon.com/dynamodb/

Figure 5 Diagram of the DynamoDB Data Model [18]

Notice that the table has a name, "my table", but the item does not have a name. The primary key defines the item: the item with primary key "ImageID" = 1. [18]

Tables - Tables contain items and organize information into discrete areas. All items in a table have the same primary key scheme. The attribute name (or names) to be used for the primary key is designated when a table is created, and the table requires each item to have a unique primary key value. The first step in writing data to DynamoDB is to create a table and designate a table name with a primary key. The following is a larger table that also uses ImageID as the primary key to identify items. DynamoDB also allows specifying a composite primary key, which enables designating two attributes in a table that collectively form a unique primary index. All items in the table must have both attributes. One serves as a hash partition attribute and the other as a range attribute. For example, there might be a Status Updates table with a composite primary key composed of UserID (the hash attribute, used to partition the workload across multiple servers) and Time (the range attribute). A query can then be executed to fetch either: 1) a particular item uniquely identified by the combination of UserID and Time values; 2) all of the items for a particular hash bucket, in this case a UserID; or 3) all of the items for a particular UserID within a particular time range. Range queries against Time are only supported when the UserID hash bucket is specified. [18]

Table: My Images

ImageID = 1 (primary key)
  ImageLocation = https://s3.amazonaws.com/bucket/img_1.jpg
  Date = 1260653179
  Title = flower
  Tags = Flower, Jasmine
  Width = 1024
  Depth = 768

ImageID = 2 (primary key)
  ImageLocation = https://s3.amazonaws.com/bucket/img_2.jpg
  Date = 1252617979
  Rated = 3, 4, 2
  Tags = Work, Seattle, Office
  Width = 1024
  Depth = 768

ImageID = 3 (primary key)
  ImageLocation = https://s3.amazonaws.com/bucket/img_3.jpg
  Date = 1285277179
  Price = 10.25
  Tags = Seattle, Grocery, Store
  Author = you
  Camera = phone

ImageID = 4 (primary key)
  ImageLocation = https://s3.amazonaws.com/bucket/img_4.jpg
  Date = 1282598779
  Title = Hawaii
  Author = Joe
  Colors = orange, blue, yellow
  Tags = beach, blanket, ball

Figure 6 DynamoDB Table
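The three composite-key query patterns described above can be mimicked in a few lines of Python. The dictionary-of-lists below stands in for DynamoDB's hash/range partitioning; in the real service these would be Query calls through an AWS SDK, and all item data here is invented for illustration.

```python
from collections import defaultdict

# Stand-in for a DynamoDB table with a composite primary key:
# hash attribute = UserID (selects the partition), range attribute = Time.
status_updates = defaultdict(list)

def put_item(user_id, time, text):
    status_updates[user_id].append({"UserID": user_id, "Time": time, "Text": text})
    status_updates[user_id].sort(key=lambda item: item["Time"])  # range key kept ordered

put_item("alice", 100, "first post")
put_item("alice", 200, "second post")
put_item("bob", 150, "hello")

# 1) A particular item identified by the UserID + Time combination.
item = next(i for i in status_updates["alice"] if i["Time"] == 200)

# 2) All items in one hash bucket (one UserID).
all_alice = status_updates["alice"]

# 3) A range query against Time, only within one UserID bucket.
recent = [i for i in status_updates["alice"] if 150 <= i["Time"] <= 250]
```

Note how every query first names the hash bucket; this mirrors DynamoDB's restriction that range queries against Time require the UserID to be specified.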

Amazon SimpleDB 3

SimpleDB is another NOSQL DBaaS offered by Amazon. The data model used by Amazon SimpleDB makes it easy to store, manage and query structured data. Data sets are organized into domains, and queries can be run across all of the data stored in a particular domain. Domains are collections of items that are described by attribute-value pairs. As an example, consider how the customer management database shown in the table below would be represented in Amazon SimpleDB. The whole table would be a domain named "customers". Individual customers would be rows in the table, or items in the domain. The contact information would be described by column headers (attributes), and the values would be in the individual cells.

CustomerID | First name | Last name | Street address | City | State | Zip | Telephone
123 | Bob | Smith | 123 Main St | Springfield | MO | 65801 | 222-333-4444
456 | James | Johnson | 456 Front St | Seattle | WA | 98104 | 333-444-5555

Amazon SimpleDB differs from the tables of traditional databases in important ways. It offers the flexibility to easily go back later and add new attributes that apply only to certain records. For example, to add customers' email addresses in order to enable real-time alerts on order status, it is possible to add the new records, and any additional attributes, to the existing "customers" domain. The resulting domain might look something like this:

CustomerID | First name | Last name | Street address | City | State | Zip | Telephone | Email
123 | Bob | Smith | 123 Main St | Springfield | MO | 65801 | 222-333-4444 |
456 | James | Johnson | 456 Front St | Seattle | WA | 98104 | 333-444-5555 |
789 | Deborah | Thomas | 789 Garfield | New York | NY | 10001 | 444-555-6666 | dthomas@xyz.com

3 This section and its examples are taken from the official product documentation, http://aws.amazon.com/simpledb/
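The sparse, schemaless domain above maps naturally onto a dictionary of attribute-value pairs per item. The sketch below is plain Python, with the outer dict standing in for a SimpleDB domain (in the real service these would be PutAttributes/Select calls through the AWS API); it shows how a new attribute can be introduced for one item without touching the others.

```python
# Stand-in for a SimpleDB domain: item name -> attribute-value pairs.
customers = {
    "123": {"First name": "Bob", "Last name": "Smith", "City": "Springfield"},
    "456": {"First name": "James", "Last name": "Johnson", "City": "Seattle"},
}

# Adding a new item with an extra attribute requires no schema change:
# the Email attribute exists only on this one item.
customers["789"] = {"First name": "Deborah", "Last name": "Thomas",
                    "City": "New York", "Email": "dthomas@xyz.com"}

# A query across the domain simply skips items that lack the attribute,
# rather than reading empty cells as a relational table would.
with_email = [name for name, attrs in customers.items() if "Email" in attrs]
print(with_email)  # ['789']
```

This is the flexibility the text describes: per-item attributes instead of a fixed column set, at the cost of the joins and typing a relational schema would enforce.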

Document-oriented database

A document-oriented database, or data store, does not use tables for storing data. It stores each record as a document with certain characteristics. Documents inside a document-oriented database are similar, in some ways, to records or rows in relational databases, but they are less rigid [43]. They are not required to adhere to a standard schema, nor will they all have the same sections, slots, parts, keys, or the like [24][25]. For example, here is a document: FirstName:"Bob", Address:"5 Oak St.", Hobby:"sailing". Another document could be: FirstName:"Jonathan", Address:"15 Wanamassa Point Road", Children:[{Name:"Michael", Age:10}, {Name:"Jennifer", Age:8}, {Name:"Samantha", Age:5}, {Name:"Elena", Age:2}]. Both documents have some information in common and some that differs. Unlike a relational database, where each record would have the same set of fields and unused fields might be kept empty, there are no empty fields in either document (record) in this case. This system allows new information to be added, and it does not require explicitly stating that other pieces of information are left out [43]. The benefit is that when a document-oriented database is used to store a large number of records, a change in the number or type of fields does not require an ALTER on a table. All that is needed is to insert new documents with the new structure, and they are automatically added to the current data store. Documents in the database are addressed via a unique key. This key is a simple string; it can be a URI or a path. Regardless, the document can be retrieved from the database by this key. Typically, the database maintains an index on the key so that document retrieval is fast.
Another defining characteristic of a document-oriented database is that, beyond the simple key-document (or key-value) lookup that can be used to retrieve a document, the database offers an API or query language that allows document retrieval based on document contents. [43] For example, you may want a query that returns all the documents with a certain field set to a certain value. The set of query APIs or query language features available, as well as the expected performance of the queries, varies significantly from one implementation to the next.
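Both access paths described above, key-document lookup and query by content, can be sketched in a few lines of Python. The in-memory dict stands in for the document store (in MongoDB or CouchDB these would be driver or HTTP calls), and the documents are the examples from the text, with invented keys.

```python
# Stand-in for a document store: unique key -> schemaless document.
store = {
    "/people/bob": {"FirstName": "Bob", "Address": "5 Oak St.",
                    "Hobby": "sailing"},
    "/people/jonathan": {"FirstName": "Jonathan",
                         "Address": "15 Wanamassa Point Road",
                         "Children": [{"Name": "Michael", "Age": 10},
                                      {"Name": "Jennifer", "Age": 8}]},
}

# 1) Key-document lookup: fetch a document directly by its unique key.
doc = store["/people/bob"]

# 2) Query by content: find every document where a field has a given value.
#    Documents lacking the field are simply skipped (no empty fields exist).
def find(field, value):
    return [d for d in store.values() if d.get(field) == value]

sailors = find("Hobby", "sailing")
```

A real implementation would back the content query with secondary indexes or precomputed views rather than a linear scan, which is where the per-implementation performance differences mentioned above come from.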

Implementations offer a variety of ways of organizing documents, including notions of collections, tags, non-visible metadata, and directory hierarchies. MongoDB and CouchDB are document-oriented databases with schemaless JSON-style object data storage. The table below shows a comparison between these two databases.

Data Model:
- MongoDB: Document-oriented (BSON)
- CouchDB: Document-oriented (JSON)

Interface:
- MongoDB: Native drivers; REST
- CouchDB: HTTP/REST

Large Objects (Files):
- MongoDB: Yes (GridFS)
- CouchDB: Yes (attachments)

Horizontal Partitioning Scheme:
- MongoDB: Auto-sharding
- CouchDB: BigCouch, CouchDB Lounge, Pillow

Object Storage:
- MongoDB: Database contains collections; collections contain documents
- CouchDB: Database contains documents

Query Method:
- MongoDB: Object-based query language; Map/Reduce (JavaScript) creating collections
- CouchDB: Map/Reduce (JavaScript and others) creating views, plus range queries

Replication:
- MongoDB: Master-slave
- CouchDB: Master-master with a custom conflict-resolution function

Concurrency:
- MongoDB: Update in place
- CouchDB: MVCC (Multi-Version Concurrency Control)

Distributed Consistency:
- MongoDB: Strong consistency; eventually consistent reads from secondary replicas
- CouchDB: Eventually consistent

Written in:
- MongoDB: C++
- CouchDB: Erlang
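The map/reduce query style noted in the comparison above can be illustrated without either database: a map function emits key-value pairs per document and a reduce function folds the values per key. This mirrors the shape of a CouchDB view or a MongoDB map/reduce job, but the code below is plain Python and all document data is invented.

```python
from itertools import groupby

docs = [
    {"type": "order", "customer": "bob", "total": 10},
    {"type": "order", "customer": "alice", "total": 25},
    {"type": "order", "customer": "bob", "total": 5},
]

def map_fn(doc):
    # Emit one (key, value) pair per matching document,
    # as a map function in a CouchDB view would.
    if doc["type"] == "order":
        yield (doc["customer"], doc["total"])

def reduce_fn(values):
    # Fold the emitted values for one key, as a reduce function would.
    return sum(values)

emitted = sorted(pair for doc in docs for pair in map_fn(doc))
view = {key: reduce_fn([v for _, v in group])
        for key, group in groupby(emitted, key=lambda kv: kv[0])}
print(view)  # {'alice': 25, 'bob': 15}
```

In CouchDB such a view is materialized and incrementally maintained, which is why range queries over the emitted keys are cheap; the one-off computation here is only a model of the logic.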


List of Abbreviations

IaaS - Infrastructure as a Service
PaaS - Platform as a Service
SaaS - Software as a Service
DBaaS - Database as a Service
RDBMS - Relational database management system
WBS - Work breakdown structure
SBS - Scope breakdown structure
VM - Virtual machine
SOA - Service-oriented architecture
WSDL - Web Services Description Language
SOAP - Simple Object Access Protocol
UDDI - Universal Description, Discovery, and Integration
AWS - Amazon Web Services
Amazon EC2 - Amazon Elastic Compute Cloud
VPN - Virtual private network
IPMA - International Project Management Association
ICB - IPMA Competence Baseline
API - Application programming interface
SDK - Software development kit
SLA - Service-level agreement
JSON - JavaScript Object Notation
BSON - Binary JSON

References

3. Cloud Computing Bible - Barrie Sosinsky, January 2012. ISBN: 978-0-470-90356-8
4. http://csrc.nist.gov/publications/nistpubs/800-145/sp800-145.pdf
5. Introduction to Cloud Computing - Ivanka Menken, Emereo Publishing, 2011
6. Understanding PaaS - Michael P. McGrath, O'Reilly Media, January 2012
7. Data Management Challenges in Cloud Computing Infrastructures - Divyakant Agrawal et al., University of California, Santa Barbara
8. Database Scalability, Elasticity, and Autonomy in the Cloud - Divyakant Agrawal et al., Department of Computer Science, University of California at Santa Barbara
9. Cloud Computing: Principles, Systems and Applications - Gillam, N. A., Springer, 2010
10. http://relationalcloud.com/index.php?title=database_as_a_service
11. The Multitenant, Metadata-Driven Architecture of Database.com - Database.com Getting Started Series White Paper
12. Megastore: Providing Scalable, Highly Available Storage for Interactive Services - Jason Baker et al. http://pdos.csail.mit.edu/6.824-2012/papers/jbaker-megastore.pdf
13. Inside SQL Azure - Microsoft TechNet. http://social.technet.microsoft.com/wiki/contents/articles/1695.inside-windowsazure-sql-database.aspx
14. https://www.windowsazure.com/en-us/home/features/data-management/
15. https://www.windowsazure.com/en-us/pricing/details/#storage
16. http://aws.amazon.com/rds/
17. https://developers.google.com/appengine/docs
18. http://en.wikipedia.org/wiki/paxos_algorithm
19. Werner Vogels' weblog on building scalable and robust distributed systems. http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
20. http://aws.amazon.com/dynamodb/

21. http://www.databasejournal.com/features/mssql/article.php/3823471/Cloud-Computing-with-Google-DataStore.htm
22. Google AppEngine documentation - https://developers.google.com/appengine/docs/java/overview - Product page
23. Google AppEngine documentation - https://developers.google.com/appengine/docs/python/overview - Product page
24. Google AppEngine documentation - https://developers.google.com/appengine/docs/python/datastore/gqlreference
25. MongoDB - http://www.mongodb.org/ - Product page
26. MongoDB blog - http://blog.mongodb.org - Product blog
27. Cloudant blog - http://blog.cloudant.com/cloudant-bigcouch-is-open-source - Product blog
28. http://bsonspec.org/
29. http://wiki.apache.org/couchdb/ - Product wiki
30. http://www.mongolab.com - Product page
31. Technical Overview: Anatomy of the Cloudant Data Layer Service - Cloudant, Inc., 2012
32. http://bigcouch.cloudant.com/
33. Building Scalable Database Solutions with SQL Azure - Introducing Federation in SQL Azure. http://blogs.msdn.com
34. White paper - Top Ten Data Management Trends - Scalability Experts - Raj Gill, Y. B.
35. http://nosql.mypopescu.com/post/1669537044/sql-and-nosql-in-the-cloud
36. White paper - NOSQL for the Enterprise - Neo Technology, 2011
37. White paper - Database as a Cloud Service - Scalability Experts - Wolter, R., 2011
38. http://www.netmba.com/operations/project/wbs/
39. http://aws.amazon.com/importexport/
40. Amazon Web Services - Migrating Your Existing Applications to the AWS Cloud, October 2010

41. http://en.wikipedia.org/wiki/document-oriented_database
42. http://www.cloudtweaks.com/2012/03/benefits-of-cloud-computing-to-growing-small-companies/