Lecture 6 Cloud Application Development, using Google App Engine as an example



Similar documents
A programming model in Cloud: MapReduce

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing WHAT IS CLOUD COMPUTING? 2

Efficient Network Marketing - Fabien Hermenier A.M.a.a.a.C.

Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu

Amazon Web Services Primer. William Strickland COP 6938 Fall 2012 University of Central Florida

Introduction to Azure: Microsoft s Cloud OS

The Cloud to the rescue!

ZingMe Practice For Building Scalable PHP Website. By Chau Nguyen Nhat Thanh ZingMe Technical Manager Web Technical - VNG

JBoss & Infinispan open source data grids for the cloud era

Large-Scale Web Applications

Cloud Computing at Google. Architecture

Cloud Computing. Up until now

Introduction to Database Systems CSE 444. Lecture 24: Databases as a Service

Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344

This presentation covers virtual application shared services supplied with IBM Workload Deployer version 3.1.

References. Introduction to Database Systems CSE 444. Motivation. Basic Features. Outline: Database in the Cloud. Outline

Introduction to Database Systems CSE 444

HDMQ :Towards In-Order and Exactly-Once Delivery using Hierarchical Distributed Message Queues. Dharmit Patel Faraj Khasib Shiva Srivastava

Cloud Computing. Up until now

Cloud Computing Is In Your Future

24/11/14. During this course. Internet is everywhere. Frequency barrier hit. Management costs increase. Advanced Distributed Systems Cloud Computing

appscale: open-source platform-level cloud computing

Hypertable Architecture Overview

Drupal in the Cloud. Scaling with Drupal and Amazon Web Services. Northern Virginia Drupal Meetup

Last time. Today. IaaS Providers. Amazon Web Services, overview

A Comparison of Clouds: Amazon Web Services, Windows Azure, Google Cloud Platform, VMWare and Others (Fall 2012)

Windows Azure and private cloud

Google Cloud Platform The basics

Datacenter Operating Systems

Cloud Computing. Lecture 24 Cloud Platform Comparison

THE WINDOWS AZURE PROGRAMMING MODEL

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful.

Cloud Computing Technology

Cloud Computing with Microsoft Azure

E-Business Technology

CUMULUX WHICH CLOUD PLATFORM IS RIGHT FOR YOU? COMPARING CLOUD PLATFORMS. Review Business and Technology Series

Cloud Computing mit mathematischen Anwendungen

Eclipse Exam Tutorial - Pros and Cons

Distributed storage for structured data

Research Paper Available online at: A COMPARATIVE STUDY OF CLOUD COMPUTING SERVICE PROVIDERS

Accelerating Wordpress for Pagerank and Profit

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

Data Management in the Cloud

Building Scalable Applications Using Microsoft Technologies

SOA, case Google. Faculty of technology management Information Technology Service Oriented Communications CT30A8901.

Cloud Computing an introduction

Big Data Storage, Management and challenges. Ahmed Ali-Eldin

Architecting For Failure Why Cloud Architecture is Different! Michael Stiefel

Platforms in the Cloud

Cloud computing - Architecting in the cloud

APP DEVELOPMENT ON THE CLOUD MADE EASY WITH PAAS

Future Prospects of Scalable Cloud Computing

Assignment # 1 (Cloud Computing Security)

Cloud Computing Summary and Preparation for Examination

Cloud Computing Trends

Cloud Computing Now and the Future Development of the IaaS

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

PaaS - Platform as a Service Google App Engine

Cloud Service Models. Seminar Cloud Computing and Web Services. Eeva Savolainen

Security Guide. BlackBerry Enterprise Service 12. for ios, Android, and Windows Phone. Version 12.0

The full setup includes the server itself, the server control panel, Firebird Database Server, and three sample applications with source code.

Cloud Computing. Chapter 3 Platform as a Service (PaaS)

NoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases

Evolution into PaaS Hardware-centric vs. API-centric Platform as a Service (PaaS) is higher level

Oracle Database Cloud Service Rick Greenwald, Director, Product Management, Database Cloud

Scaling Progress OpenEdge Appservers. Syed Irfan Pasha Principal QA Engineer Progress Software

Web Technologies: RAMCloud and Fiz. John Ousterhout Stanford University

SQL Server 2014 New Features/In- Memory Store. Juergen Thomas Microsoft Corporation

HYBRID CLOUD SUPPORT FOR LARGE SCALE ANALYTICS AND WEB PROCESSING. Navraj Chohan, Anand Gupta, Chris Bunch, Kowshik Prakasam, and Chandra Krintz

In Memory Accelerator for MongoDB

Introduction to Cloud Computing

Kentico CMS 6.0 Performance Test Report. Kentico CMS 6.0. Performance Test Report February 2012 ANOTHER SUBTITLE


Web 2.0 Technology Overview. Lecture 8 GSL Peru 2014

Learning Management Redefined. Acadox Infrastructure & Architecture

Outline. Failure Types

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Eucalyptus User Console Guide

Public Cloud Offerings and Private Cloud Options. Week 2 Lecture 4. M. Ali Babar

CLOUD COMPUTING. When It's smarter to rent than to buy

Preparing Your Data For Cloud

Cloudy with a chance of 0-day

Getting Started with AWS. Hosting a Static Website

Cloud Design and Implementation. Cheng Li MPI-SWS Nov 9 th, 2010

Google App Engine. Guido van Rossum Stanford EE380 Colloquium, Nov 5, 2008

Cloud Computing. Technologies and Types

Chapter 1 - Web Server Management and Cluster Topology

Ground up Introduction to In-Memory Data (Grids)

Version 1.0 January Xerox Phaser 3635MFP Extensible Interface Platform

Decision Support System Software Asset Management (SAM)

Cloud data store services and NoSQL databases. Ricardo Vilaça Universidade do Minho Portugal

Cloud Computing: Making the right choices

Cloud Hosting. QCLUG presentation - Aaron Johnson. Amazon AWS Heroku OpenShift

Networks and Services

Cloud Computing: Meet the Players. Performance Analysis of Cloud Providers

Amazon Cloud Storage Options

Cassandra A Decentralized, Structured Storage System

Apache Traffic Server Extensible Host Resolution

Cloud computing taxonomy

Transcription:

Lecture 6 Cloud Application Development, using Google App Engine as an example 922EU3870 Cloud Computing and Mobile Platforms, Autumn 2009 (2009/10/19) http://code.google.com/appengine/ Ping Yeh ( 葉 平 ), Google, Inc. pingyeh@google.com

Numbers real world engineers should know L1 cache reference 0.5 ns Branch mispredict 5 ns L2 cache reference 7 ns Mutex lock/unlock 100 ns Main memory reference 100 ns Compress 1 KB with Zippy 10,000 ns Send 2 KB through 1 Gbps network 20,000 ns Read 1 MB sequentially from memory 250,000 ns Round trip within the same data center 500,000 ns Disk seek 10,000,000 ns Read 1 MB sequentially from network 10,000,000 ns Read 1 MB sequentially from disk 30,000,000 ns Round trip between California and Netherlands 150,000,000 ns 2

Web Applications and APIs A web application = a software that input: taken from HTTP and URL output: contained in HTML, sent via HTTP contained stuff: HTML, javascript, CSS, flash, etc A web API = a software that input: taken from HTTP and URL output: data in a format like XML or JSON sent via HTTP Not a clear cut 3

Software as a Service: The instance on a hardware, may or may not have elasticity Hosting service: allocation of hardware or virtual machines, no elasticity. Infrastructure as a Service (IaaS): hardware + Virtual Machine + Frontend/Routing + Management, with elasticity Platform as a Service: IaaS + OS + Libraries + Reverse Proxy Layers of a Web Site Frontend / Routing HTTP, URL Reverse Proxy Instance Application Software Libraries HTTP, HTML Operating System (optional) Virtual Machine Elasticity Management Provisioning, accounting, monitoring, start/stop servers,... Hardware 4

How to to know user's behavior? Logs Fast analysis Designing a Web Application What does it it do? Your ideas here Function Analysis & Admin How to to manage the app? Admin UI UI Data {im ex}port Internal process How long do do I I need to to wait? Caching Data organization Static content etc Performance Web Application User Experience Cloud service can help Security Availability How to to use it? it? Flow Simplicity Interactivity etc Is Is it it available when I I need it? it? Scalability DDOS defense Fault tolerance Routing etc Is Is my data safe? Encryption Password strength Data replication XSRF Buffer overflow Privacy measures etc 5

Cloud Platforms IaaS: Amazon EC2 / S3 / SimpleDB, GoGrid, Sun, RackSpace Cloud, and others. Infrastructure software: Eucalyptus PaaS Engine Yard and Heroku: Ruby on Rails on EC2 Force.com: for applications that run on salesforce.com Google App Engine: Python and Java on GFS / BigTable / Google's frontend Microsoft Azure:.NET on Microsoft's infrastructure Platform software: AppScale, Vertebra 6

Google App Engine http://code.google.com/appengine/ 7

Google App Engine at a Glance 8

Design Motivations Build on Existing Google Technology Frontend, BigTable, GFS Provide an Integrated Environment Encourage Small Per-Request Footprints Encourage Fast, Efficient Requests Maintain Isolation Between Applications Encourage Statelessness and Specialization Require Partitioned Data Model (from BigTable) 9

Life of an App Engine Request 10

Routing Routed to the nearest Google datacenter Travels over Google's network Same infrastructure other Google products use Lots of advantages for free 11

Request for Static Content Routing at the Front End Static Content Servers Google App Engine Front Ends Load balancing Routing Front Ends route static requests to specialized serving infrastructure Google Static Content Serving Built on shared Google Infrastructure Static files are physically separate from code files 12

Request For Static Content Response to the user Back to the Front End and out to the user Front End handles connection to the user Frees up Static Content server Specialized infrastructure App Server runtimes don't serve static content Flexible cache expiration settings 13

Request for Static Content Defining static content Java Runtime: appengine-web.xml... <static> <include path="/**.png" /> <exclude path="/data/**.png /> </static>... Python Runtime: app.yaml... - url: /images static_dir: static/images OR - url: /images/(.*) static_files: static/images/\1 upload: static/images/(.*)... 14

Front Ends Request for Dynamic Content Route dynamic requests to App Servers App Servers Serve dynamic requests Where your code runs App Master Schedules applications Informs Front Ends 15

Request for Dynamic Content Checks for cached runtime If it exists, no initialization Execute request Cache the runtime Consequences / Opportunities Slow first request, faster subsequent requests Optimistically cache data in your runtime! 16

App Servers - What do they do? Many applications Many concurrent requests Smaller footprint + faster requests = more apps Enforce Isolation Keeps apps safe from each other Enforce statelessness Allows for scheduling flexibility Service API requests Provides access to other services 17

Requests accessing APIs App Servers 1. App issues API call 2. App Server accepts 3. App Server blocks runtime 4. App Server issues call 5. Returns the response Use APIs to do things you don't want to do in your runtime, such as... 18

APIs 19

Memcacheg Distributed in-memory cache Very fast memcacheg Like memcached Also written by Brad Fitzpatrick adds: set_multi, get_multi, add_multi Optimistic caching Very stable, robust and specialized A more persistent in-memory cache 20

The App Engine Datastore Persistent storage http://labs.google.com/papers/bigtable.html 21

The App Engine Datastore Your data is already partitioned on day one Use Entity Groups Explicit Indexes make for fast reads But slower writes Replicated and fault tolerant On commit: 3 machines Geographically distributed (shortly thereafter) Bonus: Keep globally unique IDs for free Persistent storage (more about it later) 22

Other APIs GMail Gadget API Google Accounts Task Queue Picasaweb Google Talk 23

The App Engine Datastore 24 24

The Datastore Is... Transactional Natively Partitioned Hierarchical Schema-less Based on Bigtable Not a relational database Not a SQL engine 25 25

Simplifying Storage Simplify development of apps Simplify management of apps App Engine services build on Google s strengths Scale always matters Request volume Data volume 26 26

Datastore Storage Model Basic unit of storage is an Entity consisting of Kind (compare: table) Key (compare: primary key) Entity Group (compare: partition) 0..N typed Properties (compare: columns) class Person(db.Model): Age = db.integerproperty() BestFriend = db.referenceproperty() sally = Person() sally.put() ethel = Person(Age=30, BestFriend=Sally) ethel.put() Kind Person Entity Group /Person:Ethel Key /Person:Ethel Age Int64: 30 Best Friend Key:/Person:Sally 27 27

Noteworthy Datastore Features Kind Ancestor Heterogenous property types Multi-value properties Variable properties Entity Group Key Person /Person:Ethel /Person:Ethel Age Int64: 30 Best Friend Key:/Person:Sally Same entity group class Person(db.Expando): BestFriend = db.listproperty(db.key) ethel = Person(BestFriend=[sally]) ethel.age = 30 ethel.put() jane = Person(BestFriend=[eloise,patty], parent=ethel) jane.age = 8.5 jane.grade = 3 jane.put() Kind Entity Group Key Person /Person:Ethel Age Double: 8.5 Best Friend Grade Int64: 3 /Person:Ethel/Person:Jane Key:/Person:Eloise, Key:/Person:Patty 28 28

Datastore Transactions Transactions apply to a single Entity Group Watch out for contention! Global transactions are feasible get(), put(), delete() are transactional Queries cannot participate in transactions (yet) /Person:Ethel Transaction /Person:Ethel/Person:Jane /Person:Max 29 29

Transparent Entity Group Management Entity Group layout is important (* probably the most crucial design decision of your application *) Write throughput (1 10 writes/second in an entity group) Atomicity of updates Object relationships can be described as owned or unowned owned: an entity doesn't make sense without another entity, e.g., chapter and book. unowned: otherwise, e.g., car and person. We let ownership imply co-location within an Entity Group 30 30

Under the covers of the App Engine Datastore Slides are at http://snarfed.org/space/datastore_talk.html 31 31

Porting: check your Transactions Identify roots in your data model User is often a good choice for online services Identify operations that transact on multiple roots Analyze the impact of partial success and then either refactor disable the transaction disable the transaction and write compensating logic 32 32

Porting: check your queries Shift processing from reads to writes Identify joins Denormalize or rewrite as multiple queries select * from PERSON p, ADDRESS a where a.person_id = p.id and p.age > 25 and a.country = US Relational Database select from com.example.person where age > 25 and country = US App Engine Datastore Identify unsupported filter operations (distinct, toupper()) Rewrite as multiple queries Filter in-memory 33 33

Open For Questions The White House's "Open For Questions" application accepted 100K questions and 3.6M votes in under 48 hours 34

Homework #2 Implement the paxos algorithm Assume that disk never fails, but processes may die (TA kills it) and restart a random time later, and communications may randomly be duplicated (TA sends it again) or lost (OS glitch). Start: $ paxos -p port config Config file lists ports of all processes (find N from it). Interfaces: MASTER : master location query, return MASTER port or UNKONWN. WRITE name value : to master: make value the consensus for name, return WRITE OK, WRITE FAILED, or TRY AGAIN. to non-master: return MASTER port QUERY name : return VALUE name value or MASTER port if the process died and missed the round for name. 35 35

Functional specs Homework #2 (cont.) Elect a master among processes before the master can take external values to propose. An acceptor waits randomly 0.5-1 second before replying to a message. Message timeout: configurable with config file, default 10s. Every process keeps at least 2 log files for crash recovery: proposer. {port}.log, acceptor.{port}.log, and a value.{port}.log for consensus of names and values. 36 36