Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344



Similar documents
References. Introduction to Database Systems CSE 444. Motivation. Basic Features. Outline: Database in the Cloud. Outline

Introduction to Database Systems CSE 444

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing WHAT IS CLOUD COMPUTING? 2

Cloud computing - Architecting in the cloud

Scalable Architecture on Amazon AWS Cloud

A Survey on Cloud Storage Systems

A Comparison of Clouds: Amazon Web Services, Windows Azure, Google Cloud Platform, VMWare and Others (Fall 2012)

Cloud Computing: Meet the Players. Performance Analysis of Cloud Providers

Users VM A A A. Application. Compute/Storage/Network. VM Virtual Machine. On-Premises Data Center

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Shadi Khalifa Database Systems Laboratory (DSL)

A programming model in Cloud: MapReduce

APP DEVELOPMENT ON THE CLOUD MADE EASY WITH PAAS

Storing and Processing Sensor Networks Data in Public Clouds

Scaling in the Cloud with AWS. By: Eli White (CTO & mojolive) eliw.com - mojolive.com

Cloud Courses Description

Platforms in the Cloud

WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS

Using Cloud Services for Test Environments A case study of the use of Amazon EC2

Implement Hadoop jobs to extract business value from large and varied data sets

CUMULUX WHICH CLOUD PLATFORM IS RIGHT FOR YOU? COMPARING CLOUD PLATFORMS. Review Business and Technology Series

Oracle Database Cloud Service Rick Greenwald, Director, Product Management, Database Cloud

Cloud Computing. Chapter 1 Introducing Cloud Computing

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14

Amazon Web Services Student Tutorial

CLOUD DATABASE DATABASE AS A SERVICE

Cloud Computing. Technologies and Types

Cloud Courses Description

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Cloud Computing Now and the Future Development of the IaaS

Cloud Computing Summary and Preparation for Examination

Cloud Computing. Chapter 1 Introducing Cloud Computing

Preparing Your Data For Cloud

2.1.5 Storing your application s structured data in a cloud database

Cloud Computing. Chapter 1 Introducing Cloud Computing

Cloud Computing. Adam Barker

WINDOWS AZURE DATA MANAGEMENT

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

What is Cloud Computing? Tackling the Challenges of Big Data. Tackling The Challenges of Big Data. Matei Zaharia. Matei Zaharia. Big Data Collection

Can the Elephants Handle the NoSQL Onslaught?

Oracle Cloud Update November 2, Eric Frank Oracle Sales Consultant. Copyright 2014 Oracle and/or its affiliates. All rights reserved.

Jeffrey D. Ullman slides. MapReduce for data intensive computing

CSE 344 Introduction to Data Management. Section 9: AWS, Hadoop, Pig Latin TA: Yi-Shu Wei

Cloud Computing an introduction

Data Management in the Cloud

Cloud Platforms, Challenges & Hadoop. Aditee Rele Karpagam Venkataraman Janani Ravi

Cloud Computing Trends

Harnessing the Power of the Microsoft Cloud for Deep Data Analytics

Step by Step: Big Data Technology. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 25 August 2015

Microsoft Azure Data Technologies: An Overview

Cloud Computing. Lecture 24 Cloud Platform Comparison

Introduction to Azure: Microsoft s Cloud OS

Assignment # 1 (Cloud Computing Security)

Databases in the Cloud

DLT Solutions and Amazon Web Services

Last time. Today. IaaS Providers. Amazon Web Services, overview

Data processing goes big

Introduction to NoSQL Databases. Tore Risch Information Technology Uppsala University

Cloud Models and Platforms

NoSQL Database Options

Cloud Computing and Amazon Web Services. CJUG March, 2009 Tom Malaher

How To Choose Between A Relational Database Service From Aws.Com

Cloud Computing For Bioinformatics

Technical Writing - Definition of Cloud A Rational Perspective

How AWS Pricing Works

Web Application Hosting in the AWS Cloud Best Practices

Research Paper Available online at: A COMPARATIVE STUDY OF CLOUD COMPUTING SERVICE PROVIDERS

NoSQL and Hadoop Technologies On Oracle Cloud

Public Cloud Offerings and Private Cloud Options. Week 2 Lecture 4. M. Ali Babar

ur skills.com

Cloud Computing: Making the right choices

A Web Base Information System Using Cloud Computing

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect

The Cloud at Crawford. Evaluating the pros and cons of cloud computing and its use in claims management

CloudFTP: A free Storage Cloud

Real Time Big Data Processing


Virtualization and Cloud Computing

Lecture Data Warehouse Systems

The Cloud to the rescue!

Cloud Service Models. Seminar Cloud Computing and Web Services. Eeva Savolainen

Large-Scale Web Applications

Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu

A SHORT INTRODUCTION TO CLOUD PLATFORMS

Lecture 02a Cloud Computing I

Amazon Relational Database Service (RDS)

Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework

How AWS Pricing Works May 2015

Oracle Applications and Cloud Computing - Future Direction

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Cloud DBMS: An Overview. Shan-Hung Wu, NetDB CS, NTHU Spring, 2015

CLOUD COMPUTING. When It's smarter to rent than to buy

Open Source Technologies on Microsoft Azure

Transcription:

Where We Are Introduction to Data Management CSE 344 Lecture 25: DBMS-as-a-service and NoSQL We learned quite a bit about data management see course calendar Three topics left: DBMS-as-a-service and NoSQL Data integration Data cleaning Magda Balazinska - CSE 344 Fall 2011 1 Strongly encouraged to take 444 to learn more! Magda Balazinska - CSE 344 Fall 2011 2 References Amazon SimpleDB, RDS, Elastic MapReduce Websites Part of Amazon Web services Google App Engine Datastore Website Part of the Google App Engine Microsoft SQL Azure Part of Windows Azure Very dynamic space! Need to check docs regularly! Magda Balazinska - CSE 344 Fall 2011 3 Cloud Computing A definition Style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet Basic idea Developer focuses on application logic Infrastructure, software, and data hosted by someone else in their cloud Hence all operations tasks handled by cloud service provider 4 Cloud Computing History Computation may someday be organized as a public utility (John McCarthy 1960) Late 1990 s: Infrastructure as a Service (i.e., rent machines) Late 1990s : Software as a service (e.g., Hotmail, Salesforce) Early 2000s: Web services 2006: Amazon Web Services And now it s a craze! Levels of Service Infrastructure as a Service (IaaS) Example Amazon EC2 Platform as a Service (PaaS) Example Microsoft Azure, Google App Engine Software as a Service (SaaS) Example Google Docs 5 Magda Balazinska - CSE 344 Fall 2011 6 1

How About Data Management as a Service? Running a DBMS is challenging Need to hire a skilled database administrator (DBA) Need to provision machines (hardware, software, configuration) If business picks up, may need to scale quickly Workload varies over time Solution: Use a DBMS service All machines are hosted in service provider s data centers Data resides in those data centers Pay-per-use policy Elastic scalability No administration! Magda Balazinska - CSE 344 Fall 2011 7 Basic Features for Data Management as a Service Data storage and query capabilities Operations and administration tasks handled by provider Include high availability, upgrades, etc. Elastic scalability: Clients pay exactly for the resources they consume; consumption can grow/shrink dynamically No capital expenditures and fast provisioning Magda Balazinska - CSE 344 Fall 2011 8 Types of Data Management as a Service Three different types exist at the moment Relational data management systems (e.g., SQL Azure) Simplified data mgmt systems (e.g., Amazon SimpleDB) Also called NoSQL systems. We will see why in a few slides Analysis services such as Amazon Elastic MapReduce Outline Overview of three systems Amazon Web Services with SimpleDB RDS, and Elastic MapReduce Google App Engine with the Google App Engine Datastore Microsoft Azure platform with Azure SQL Discussion Technical challenges behind databases as a service Broader impacts of databases as a service Magda Balazinska - CSE 344 Fall 2011 9 Magda Balazinska - CSE 344 Fall 2011 10 Amazon Web Services Since 2006 Infrastructure web services platform in the cloud Amazon Elastic Compute Cloud (Amazon EC2 ) Amazon Simple Storage Service (Amazon S3 ) Amazon SimpleDB Amazon Elastic MapReduce And more And growing Amazon EC2 Amazon Elastic Compute Cloud (Amazon EC2 ) Rent compute power on demand ( server instances ) Select required capacity: small, large, or extra large instance Share resources with other users (multitenant): Virtual machines Variety of operating systems Includes: Amazon Elastic Block Store Off-instance storage that persists independent from life of instance Highly available and highly reliable Magda Balazinska - CSE 344 Fall 2011 11 Magda Balazinska - CSE 344 Fall 2011 12 2

Amazon S3 Amazon RDS Amazon Simple Storage Service (Amazon S3 ) Storage for the Internet Web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. Some key features Write, read, and delete uniquely identified objects containing from 1 byte to 5 TB of data each Objects are stored in buckets. User chooses geographic area A bucket can be accessed from anywhere Authentication Reliability Amazon Relational DB Service (Amazon RDS TM ) Web service that facilitates set up, operations, and scaling of a relational database in the cloud Full capabilities of a familiar MySQL or Oracle DBMS Some key features Automated patches of DBMS Automated backups for user-defined retention period Elastic scalability but can only scale-up Make your instance more powerful (CPU and memory) Attach more storage to your instance Can scale-out only by adding read replicas Magda Balazinska - CSE 344 Fall 2011 13 Magda Balazinska - CSE 344 Fall 2011 14 NoSQL Motivation NoSQL Systems Scaling a relational DBMS is hard We saw how to scale queries with parallel DBMSs Much more difficult to scale transactions Need to partition the database across multiple machines If a transaction touches one machine, life is good If a transaction touches multiple machines, ACID becomes extremely expensive! Need what is called two-phase commit Replication Replication can also help to increase throughput Create multiple copies of each database partition Spread queries across these replicas Easy for reads but writes, once again, become expensive! Magda Balazinska - CSE 344 Fall 2011 15 Goal: elastic and highly scalable data management Basic data storage, basic querying, and atomic updates More flexible than a relational DBMS: no fixed schema! Highly scalable! But to scale-out, give up on complex queries No joins (or limited joins) Gives up on ACID: instead eventually consistent No transactions! Or limited transactions Caveat: Hard to build apps without ACID guarantees Today: Many NoSQL systems provide choice between strong consistency and eventual consistency Magda Balazinska - CSE 344 Fall 2011 16 Amazon SimpleDB An example of a NoSQL data management system Partitioning Data partitioned into domains: queries run within domain Domains seem to be unit of replication. Limit 10GB Can use domains to manually create parallelism Schema No fixed schema Objects are defined with attribute-value pairs Amazon SimpleDB (2/3) Indexing Automatically indexes all attributes Support for writing PUT and DELETE items in a domain Support for querying GET by key Selection + sort A simple form of aggregation: count select output_list from domain_name [where expression] [sort_instructions] [limit limit] Query is limited to 5s and 1MB output (but can continue) Magda Balazinska - CSE 344 Fall 2011 17 Magda Balazinska - CSE 344 Fall 2011 18 3

Amazon SimpleDB (3/3) Amazon Elastic MapReduce Availability and consistency Fully indexed data is stored redundantly across multiple servers and data centers Takes time for the update to propagate to all storage locations. The data will eventually be consistent, but an immediate read might not show the change Today, can choose between consistent or eventually consistent read Integration with other services Developers can run their applications in Amazon EC2 and store their data objects in Amazon S3. Amazon SimpleDB can then be used to query the object metadata from within the application in Amazon EC2 and return pointers to the objects stored in Amazon S3. Magda Balazinska - CSE 344 Fall 2011 19 Web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data Hosted Hadoop framework on top of EC2 and S3 Support for Hive and Pig User specifies Data location in S3 Query Number of machines System sets-up the cluster, runs query, and shuts down Magda Balazinska - CSE 344 Fall 2011 20 Google App Engine Run your web applications on Google's infrastructure Limitation: app must be written in Python or Java Key features (examples for Java) A complete development stack that uses familiar technologies to build and host web applications Includes: Java 6 JVM, a Java Servlets interface, and support for standard interfaces to the App Engine scalable datastore and services, such as JDO, JPA, JavaMail, and Jcache JVM runs in a secured "sandbox" environment to isolate your application for service and security (some ops not allowed) Magda Balazinska - CSE 344 Fall 2011 21 Google App Engine Datastore (1/3) Distributed data storage service that features a query engine and transactions Partitioning Data partitioned into entity groups Entities of the same group are stored together for efficient execution of transactions Schema Each entity has a key and properties that can be either Named values of one of several supported data types (includes list) References to other entities Flexible schema: different entities can have different properties Magda Balazinska - CSE 344 Fall 2011 22 Google App Engine Datastore (2/3) Indexing Applications define indexes: must have one index per query type Support for writing PUT and DELETE entities (for Java, hidden behind JDO) Support for querying GET an entity using its key Execute a query: selection + sort Language bindings: invoke methods or write SQL-like queries Lazy query evaluation: query executes when user accesses results Google App Engine Datastore (3/3) Availability and consistency Every datastore write operation (put/delete) is atomic Outside of transactions, get READ_COMMITTED isolation Support transactions (many ops on many objects) Single-group transactions Cross-group transactions with up to 5 groups Transactions use snapshot isolation Interesting details on transaction implementation: see 444 Magda Balazinska - CSE 344 Fall 2011 23 Magda Balazinska - CSE 344 Fall 2011 24 4

Microsoft Azure Platform Internet-scale cloud computing and services platform Provides an operating system and a set of developer services that can be used individually or together SQL Azure Cloud-based relational database service built on SQL Server technologies Key features Highly available, scalable, multitenant database service Includes authentication and authorization No administration Full-featured DBMS Key limitation Only 50 GB at the moment Magda Balazinska - CSE 344 Fall 2011 25 Magda Balazinska - CSE 344 Fall 2011 26 Outline Overview of three systems Amazon Web Services with SimpleDB RDS, and Elastic MapReduce Google App Engine with the Google App Engine Datastore Microsoft Azure platform with Azure SQL Discussion Technical challenges behind databases as a service Broader impacts of databases as a service Challenges of DBMS as a Service Scalability requirements Large data volumes and large numbers of clients Variable and heavy workloads High performance requirements: interactive web services Consistency and high availability guarantees Service Level Agreements Security Magda Balazinska - CSE 344 Fall 2011 27 Magda Balazinska - CSE 344 Fall 2011 28 Broader Impacts Cost-effective solution for building web services Content providers focus only on their application logic Service providers take care of administration Service providers take care of operations Security/privacy concerns: all data stored in data centers Magda Balazinska - CSE 344 Fall 2011 29 5