Cloud Computing
Lecture 20: Cloud Platform Comparison & Load (2011-2012)

Up until now:
- Introduction, definition of Cloud Computing
- Pre-Cloud large-scale computing: Grid Computing, Content Distribution Networks, Cycle Sharing, Distributed Scheduling
- Cloud: MapReduce, Storage, Execution, Monitoring, Programming
Outline:
- Comparison of cloud platforms:
  - Google / Google App Engine / Hadoop
  - Amazon Web Services / Eucalyptus
  - Microsoft Azure
- Load (flash crowds)
Three visions for Cloud Computing: who will win?

              | Amazon Web Services    | Microsoft Azure                       | Google App Engine
Computation   | x86                    | CLR (VM)                              | Application framework (Python, Java)
Storage       | Disk blocks            | SQL Server API                        | BigTable
Network       | Blocks of IP addresses | Declarative but automatic (endpoints) | 3-level application topology

Each is presented as the ideal model, but in practice the overlap is much larger!

Comparison: Storage

               | AWS / Eucalyptus           | Microsoft Azure | Google / Hadoop
SQL            | RDS                        | SQL Azure       | X (none)
Tables         | SimpleDB                   | Tables          | Datastore [BigTable] / HBase
Objects/Blocks | S3                         | Blobs           | GFS / HDFS
Queues         | Simple Queue Service (SQS) | Queues          | Task Queue
Comparison: Storage
There are two general complaints:
- Performance (latency).
- Strict coherency models do not scale.
The bottom line is that the storage scalability problem is not solved. There are no reliable metrics available; the market is still too dynamic. Google services are not accessible remotely, but it is always possible to build an intermediary bridge service.

Comparison: Programming Model
Programming languages:
- Amazon: language not relevant; the program is a VM.
- Google: Java and Python.
- Azure: any .NET language: C#, J#, VB.NET, etc.
Google (servlet/JSP) has the most restrictive model. It is the simplest choice and will tend to be the first one used until its limitations are found.
Comparison: Remote Interaction Model
There are few differences. All systems are based on Web Services, and most services support both REST and SOAP protocols. In most cases, applications/machines/services/stores have their own DNS names. Stored objects are identified by typeless strings.

Comparison: Integration
- The Amazon VM model permits normal interactions between servers.
- Google requires that other servers be accessible via Web Services.
- Azure supports richer integration mechanisms with external servers: AppFabric, Access Control and Queues. DryadLINQ transparently integrates local and remote applications.
Comparison: Price

Resource             | Unit          | Amazon                         | Google | Microsoft
Bandwidth (outgoing) | GB            | $0.03 - $0.085                 | $0.12  | $0.15
Bandwidth (incoming) | GB            | $0.10                          | $0.10  | $0.10
Computation          | Instance-hour | $0.10 - $1.20                  | $0.10  | $0.12
Storage              | GB per month  | $0.05 (>5 PB) to $0.14 (<1 TB) | $0.15  | $0.15
Storage calls        | Per 10k calls | $0.01 (GET), $0.10 (others)    | n/a    | $0.01

Prices are very similar. AWS, because it uses system VMs, has a larger granularity.

Platform/application match by scenario:

1. Application ported to the cloud (monolithic application in Java or .NET):
   - Amazon: normal EC2 instance; system configuration needed.
   - Google: may require porting and requires data and logic refactoring.
   - Microsoft: if .NET, refactor data; otherwise more complex.
2. Web application (web app with load balancer, logic layer and database):
   - Amazon: normal EC2 instance + RDS; requires system configuration and AutoScale; if RDS does not scale, requires a port to S3.
   - Google: very good match with Google App Engine; automatic scalability; requires a DB rewrite; no support for larger-scale applications.
   - Microsoft: well adapted to the Web Role model.
3. Parallel processing (long-lasting computations without a GUI):
   - Amazon: many pre-built instances with infrastructure, e.g. MPI; MapReduce instances may be used.
   - Google: no direct support.
   - Microsoft: worker roles + blobs and queues provide adequate support.
4. Mixed application (cloud application integrated with external servers):
   - Amazon: an EC2 instance may access external servers.
   - Google: some integration possible using a bridge app to the Datastore.
   - Microsoft: AppFabric ServiceBus supports integration with external applications.
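As a worked example of reading the price table, the sketch below estimates a monthly bill for a small deployment on each provider. All rates are the lecture's 2011-era figures, not current pricing, and the workload numbers are invented for illustration:

```python
# Illustrative cost estimate using the (2011-era) prices from the table above.
# Rates are the lecture's listed figures, not current pricing.

PRICES = {
    "amazon":    {"compute_hour": 0.10, "storage_gb_month": 0.14, "egress_gb": 0.085},
    "google":    {"compute_hour": 0.10, "storage_gb_month": 0.15, "egress_gb": 0.12},
    "microsoft": {"compute_hour": 0.12, "storage_gb_month": 0.15, "egress_gb": 0.15},
}

def monthly_cost(provider, instances, storage_gb, egress_gb, hours=730):
    """Rough monthly bill: compute + storage + outgoing bandwidth."""
    p = PRICES[provider]
    return (instances * hours * p["compute_hour"]
            + storage_gb * p["storage_gb_month"]
            + egress_gb * p["egress_gb"])

# Hypothetical workload: 2 instances, 100 GB stored, 50 GB out per month.
for name in PRICES:
    print(f"{name}: ${monthly_cost(name, instances=2, storage_gb=100, egress_gb=50):.2f}")
```

As the slide notes, the totals come out very close; the differences are dominated by the compute rate and bandwidth, which is why granularity (whole VMs on AWS) matters more than the headline prices.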
Hurdles to Cloud Computing on the 3 Main Platforms
1. Availability: depends on the SLA and the provider's track record.
2. Lock-in: stronger with Google App Engine, then Azure; weaker with AWS.
3. Confidentiality and auditing: in general confidentiality is guaranteed, but no open auditing is available. Regarding applications, EC2 provides higher isolation.
4. Data transfer costs: similar prices. AWS now has bulk transfer services (you can send them your disks). The cost/benefit is application dependent and must be analyzed.
5. Reliable performance: for general applications the situation is similar: there are recovery and retry mechanisms for most services. In the case of MapReduce there is a skipping mode to recover tasks.
6. Scalable storage.
7. Large-scale software errors.
8. Speed of scale-up: clearer feedback with EC2 instances.
9. Reputation propagation: similar situation on all 3 major platforms; not solved. Less relevant for Google App Engine.
10. Compatible licensing: only relevant at AWS (solved!).
Conclusions
The main difference between the main providers is the application model: Google has the most restrictive one. The cost of an easy-to-program system is more lock-in rather than lack of functionality. I can do whatever I want on EC2, but a scalable application will require distributed scalable services.

Scalability: what is the best approach for cloud computing clients?
"Handling Flash Crowds from your Garage", USENIX '08
Flash Crowds!
We have seen several examples of scalability in a cloud platform. What about the clients? What if we have a server running an application and need to scale quickly? How do I adapt the front-ends?
Three main requirements:
- The system must scale to a very large size.
- The system must scale quickly.
- Off-peak operation must be cheap.

Available Tools (i)
Data storage services:
- Pros: they are cheap and they scale transparently for the user.
- Cons: they only solve the problem of static content.
Virtual servers:
- Before the cloud it was already possible to rent virtual servers at ISPs (even at different geographical locations).
- Cons: this only solves the bandwidth problem. Mostly, the computation of the distributed application doesn't really scale.
Available Tools (ii)
- Cloud computing services.
- External DNS services: prevent the service from facing a bottleneck on DNS requests.
- MISSING: a scalable relational database service. As we have seen, it's not trivial to scale a classical relational database. There are many similar services, but they always sacrifice some aspect: the transactional model, features of the query language, or scalability.

Scalable Architectures (i)
What is the best approach to matching a large set of clients with a multi-server service?
Hyp. 1: Use only a storage service. Good for servers with a large percentage of static content.
Scalable Architectures (ii)
Hyp. 2: Cluster with DNS load balancing. Rent several machines (e.g. on EC2) and add them to the DNS record. By default, addresses are used in round-robin fashion. This causes delays for clients who cached the DNS record, but in general the issue is a large number of clients, not a large number of requests from the same client. There are commercial implementations (e.g. RightScale).

Scalable Architectures (iii)
Hyp. 3: Redirection. A server redirects the initial client request to one of a set of back-end servers; subsequent requests don't go through the redirection. Example: Amazon Elastic Load Balancing.
Hyp. 4: L4 or L7 rerouting. A front-end server analyzes the request source (OSI level 4, e.g. TCP) or its content (OSI level 7) and reroutes the request to the corresponding back-end server. This requires a high-performance server or switch, but the client does not see the redirection. There are commercial implementations (e.g. Flexiscale).
Hyp. 5: Hybrids of the 4 previous hypotheses.
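Hyp. 3 can be sketched in a few lines: a front-end that answers only the first request of each session with an HTTP 302 redirect, choosing back-ends round-robin (as DNS load balancing would do by default in Hyp. 2). The back-end hostnames are made up for illustration:

```python
# Sketch of Hyp. 3 (redirection): a minimal front-end that answers the
# initial request with a 302 redirect to a back-end chosen round-robin.
# Subsequent requests go straight to the chosen back-end, so the front-end
# only pays the cost of one tiny response per session.

import itertools

# Hypothetical back-end pool.
BACKENDS = ["http://be1.example.com", "http://be2.example.com", "http://be3.example.com"]

class RoundRobinRedirector:
    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def redirect(self, path):
        """Return the HTTP status and headers for the redirect response."""
        target = next(self._cycle)
        return 302, {"Location": target + path}

r = RoundRobinRedirector(BACKENDS)
status, headers = r.redirect("/index.html")
```

A real deployment would replace the round-robin choice with a least-loaded choice fed by back-end status reports; as the comparison below notes, even that remains very cheap because only session-initial requests pass through the front-end.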
Comparison of the Architectures
[Slide-table residue: the original slides build up, over several animation steps, a table comparing the architectures (static content on a storage service, DNS load balancing over m replicated front-ends, redirection, L4/L7 rerouting) along these dimensions: scaling limit (from "scales very well"/"unlimited" down to limits set by the client arrival rate or the request arrival rate), coherence, scale-up speed (immediate, or immediate + DNS TTL), scale-down time (session duration, session duration + DNS TTL, or days), and the effect of front-end and back-end faults (from "has no effect" and "rare effect" through "1/m sessions fail" or "long delay for 1/m (or 1/n) sessions" up to "significant fault" and "total failure"). The exact cell assignments are not recoverable from the extraction.]

Notes accompanying the table:
- Redirecting clients (especially if done only when a session begins) is very cheap, even if the front-end server is receiving back-end status reports and running a load-balancing algorithm.
- A UDP-based DNS response holds only 512 bytes (up to about 25 back-end servers). Most ISPs complete the request over TCP if there are more than 25; however, some DNS clients only use the first reply.
- In the case of L4 rerouting there are growing hurdles to success: NAT, proxies, ...
- It is difficult to identify when sessions finish (e.g. webmail). Some DNS clients ignore DNS record TTLs and take days to invalidate their caches.
- Scale-up speed is bounded by front-end VM start-up time; with a storage service there is no web server to start.
- If the redirection servers are themselves load balanced, a front-end failure means the client has to wait for a timeout and try another server. This should take at most 2.5 s, but in some Linux implementations it takes up to 3 min!
- Back-end faults are often user-recoverable: e.g., in S3, 1% of first write attempts fail, but immediate retries succeed.
- DNS front-end faults are especially significant when using low TTLs (which are otherwise good for scaling).
Example: MapCruncher
A map conversion site, loaded with 25 GB of interactive demo maps. It suffered a flash crowd when Microsoft publicized it. The server had the theoretical capacity to handle the traffic (100 images/sec.), but the lack of reference locality (each client looking at different parts of the maps) caused unbearable thrashing. All the static content was moved to S3: they pay $4/month if there is no traffic.
Example 2: Asirra
A CAPTCHA web service based on distinguishing cats from dogs: EC2 servers + 100 GB of images placed on S3. Database of image metadata: SQL Server was slow, so a key-indexed, read-only structure of image metadata is transferred nightly to each of the application servers.
How can the session state be maintained?
- Hyp. 1: inside S3. It's slow.
- Hyp. 2: on the application servers' disks. Since DNS load balancing is used, it is not guaranteed that the question and the answer to the captcha go to the same server.
Solution: forward all session requests to the same server, with the server id stored in the session id. It's very cheap because it requires no disk accesses and only 10% of clients change servers between request and response.
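The session-affinity trick described above can be sketched as follows: embed the id of the server that created a session inside the session id, so any server that receives a later request can recognize and forward a misrouted one. The id format and names are illustrative, not Asirra's actual scheme:

```python
# Sketch of session affinity via the session id, as in the Asirra example:
# the session id carries the id of the server that owns the session, so no
# disk access or shared store is needed to route a request correctly.

import secrets

SERVER_ID = 7  # this server's id (would be assigned at deployment)

def new_session_id(server_id):
    """Session id = owning server id + random token."""
    return f"{server_id:03d}-{secrets.token_hex(8)}"

def owner_of(session_id):
    return int(session_id.split("-", 1)[0])

def handle(session_id, my_id=SERVER_ID):
    # DNS round-robin may deliver the request to the wrong server
    # (about 10% of the time in the example above); forward it if so.
    if owner_of(session_id) != my_id:
        return ("forward", owner_of(session_id))
    return ("handle-locally", my_id)

sid = new_session_id(SERVER_ID)
```

The design choice is the same one the slide highlights: routing information lives entirely in the id the client already sends back, so the common case costs one string comparison.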
Example 2: Asirra (cont.)
Again, a flash crowd after a trade fair appearance: 75,000 requests in 24h. Two interesting observations:
- 30,000 requests came from a DoS attack.
- Using more instances was cheap. The attacker gave up, but it would have been cheap to keep the extra instances running until a filter was set up.

Example 3: InkBlotPassword.com
A website for associating mnemonic images (Rorschach inkblots) with passwords. After the two previous experiences, the authors simplified the development process. Is it worth optimizing code? If optimizations are only needed for peak periods, it's better to pay for more machines.
The website was mentioned on Slashdot (a tech news site) without the authors knowing. They detected a flash crowd (request queue = 130!) and started 12 new nodes; 20 min. later, the website was stable. Three days later they were back to only 3 servers. Total cost of the flash crowd: $150.
Next Time... Cloud Data Centers