Cloud Computing on Amazon's EC2



Similar documents
Amazon Elastic Compute Cloud Getting Started Guide. My experience

Amazon EC2 XenApp Scalability Analysis

Technology and Cost Considerations for Cloud Deployment: Amazon Elastic Compute Cloud (EC2) Case Study

Building a Private Cloud Cloud Infrastructure Using Opensource

Cloud Computing and Amazon Web Services

Running R from Amazon's Elastic Compute Cloud

Hey, You, Get Off of My Cloud! Exploring Information Leakage in Third-Party Clouds. Thomas Ristenpart, Eran Tromer, Hovav Shacham, Stefan Savage

Cloud Computing For Bioinformatics

Cloud computing is a marketing term that means different things to different people. In this presentation, we look at the pros and cons of using

benchmarking Amazon EC2 for high-performance scientific computing

Cloud Computing. Adam Barker

Network performance in virtual infrastructures

Cloud security CS642: Computer Security Professor Ristenpart h9p:// rist at cs dot wisc dot edu University of Wisconsin CS 642

MySQL and Virtualization Guide

Amazon EC2 Product Details Page 1 of 5

High Performance Computing in CST STUDIO SUITE

A Scalable Network Monitoring and Bandwidth Throttling System for Cloud Computing

What is Cloud Computing? Why call it Cloud Computing?

StACC: St Andrews Cloud Computing Co laboratory. A Performance Comparison of Clouds. Amazon EC2 and Ubuntu Enterprise Cloud

THE EUCALYPTUS OPEN-SOURCE PRIVATE CLOUD

Cloud Performance Benchmark Series

Introduction to Cloud Computing

Muse Server Sizing. 18 June Document Version Muse

Performance Analysis of IPv4 v/s IPv6 in Virtual Environment Using UBUNTU

Cloud n Service Presentation. NTT Communications Corporation Cloud Services

LabStats 5 System Requirements

How To Choose Between A Relational Database Service From Aws.Com

Cloud Performance Benchmark Series

Kashif Iqbal - PhD Kashif.iqbal@ichec.ie

This computer will be on independent from the computer you access it from (and also cost money as long as it s on )

Shadi Khalifa Database Systems Laboratory (DSL)

Moving Drupal to the Cloud: A step-by-step guide and reference document for hosting a Drupal web site on Amazon Web Services

Performance test report

When Does Colocation Become Competitive With The Public Cloud? WHITE PAPER SEPTEMBER 2014

THE DEFINITIVE GUIDE FOR AWS CLOUD EC2 FAMILIES

Scaling in the Cloud with AWS. By: Eli White (CTO & mojolive) eliw.com - mojolive.com

PARALLELS CLOUD SERVER

When Does Colocation Become Competitive With The Public Cloud?

Estimating the Cost of a GIS in the Amazon Cloud. An Esri White Paper August 2012

Cluster Computing at HRI

Dedicated Hosting. The best of all worlds. Build your server to deliver just what you want. For more information visit: imcloudservices.com.

Last time. Today. IaaS Providers. Amazon Web Services, overview

Hosting Requirements Smarter Balanced Assessment Consortium Contract 11 Test Delivery System. American Institutes for Research

PARALLELS SERVER BARE METAL 5.0 README

Sage 200 Online. System Requirements and Prerequisites

LCMON Network Traffic Analysis

Planning the Installation and Installing SQL Server

How To Set Up A Firewall Enterprise, Multi Firewall Edition And Virtual Firewall

Eucalyptus Tutorial HPC and Cloud Computing Workshop

Getting Started with Amazon EC2 Management in Eclipse

Cornell University Center for Advanced Computing

Performance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build

wu.cloud: Insights Gained from Operating a Private Cloud System

An Introduction to Cloud Computing Concepts

Benchmarking Large Scale Cloud Computing in Asia Pacific

Web Application Deployment in the Cloud Using Amazon Web Services From Infancy to Maturity

Exploiting Private and Hybrid Clouds for Compute Intensive Web Applications

PostgreSQL Performance Characteristics on Joyent and Amazon EC2

Performance Analysis: Benchmarking Public Clouds

How To Test For Performance And Scalability On A Server With A Multi-Core Computer (For A Large Server)

System Requirements. SuccessMaker 5

Case Study: Load Testing and Tuning to Improve SharePoint Website Performance

Cornell University Center for Advanced Computing

GeoCloud Project Report GEOSS Clearinghouse

Liferay Portal Performance. Benchmark Study of Liferay Portal Enterprise Edition

Evaluation Report: Supporting Microsoft Exchange on the Lenovo S3200 Hybrid Array

System requirements for A+

Ignify ecommerce. Item Requirements Notes

Zend Server Amazon AMI Quick Start Guide

An Esri White Paper January 2011 Estimating the Cost of a GIS in the Amazon Cloud

ISPS & WEBHOSTS SETUP REQUIREMENTS & SIGNUP FORM LOCAL CLOUD

Onboarding VMs to Cisco OpenStack Private Cloud

HPC performance applications on Virtual Clusters

Multi-Threading Performance on Commodity Multi-Core Processors

Performance Characteristics of VMFS and RDM VMware ESX Server 3.0.1

Cloud Computing: Amazon Web Services

Description of Application

Cloud computing for fire engineering. Chris Salter Hoare Lea, London, United Kingdom,

MATLAB Distributed Computing Server Cloud Center User s Guide

A Middleware Strategy to Survive Compute Peak Loads in Cloud

The Performance and Cost Variability of Amazon EC2

Merge CADstream. For IT Professionals. Merge CADstream,

Affinity Aware VM Colocation Mechanism for Cloud

Parallel Computing with MATLAB

Computing in High- Energy-Physics: How Virtualization meets the Grid

Performance measurement of a private Cloud in the OpenCirrus Testbed

Introduction to Cloud computing. Viet Tran

Using the Windows Cluster

Transcription:

Technical Report Number CSSE10-04 1. Introduction to Amazon s EC2 Brandon K Maharrey maharbk@auburn.edu COMP 6330 Parallel and Distributed Computing Spring 2009 Final Project Technical Report Cloud Computing on Amazon's EC2 As computer components have become cheaper and more widely available, there is an increasing trend for corporations and scientific research / academic institutions to off-load their data-processing to a high performance computing (HPC) center. These centers, specifically Amazon's EC2, offer a truly virtual computing environment for use of processing power at a premium. Instead of these corporations and institutions purchasing equipment for peak traffic volume or a short-lived scientific experiment, they can instead purchase as much processing power on EC2 as they need and only pay for what they use. This virtual environment is created by the user in the form of an Amazon Machine Image (AMI). An AMI contains the application(s) the user wishes to run along with any libraries, data and/or configuration settings. It is essentially an all-inclusive operating system. This AMI is bundled up and the user uploads or otherwise transfers the AMI to another of Amazon's products, Amazon Simple Storage Solution (S3). This is the location from which Amazon loads the user's AMI to run on a processor or series of processors, depending on the type and quality of service the user requires. After an instance is instantiated using a command-line tool or web interface provided by Amazon, this instance has its own machine name, ip address, firewall properties, ethernet interfaces, etc., just like it is its own standalone machine. The instance can then be accessed remotely just as any other machine can be accessed, through the use of ssh and scp protocols. A user can choose among several different instance types to fit her specific needs. Following is a snippet from Amazon's website: Standard Instances o Instances of this family are well-suited for most applications o Small Instance (Default) 1.7GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit), 160GB of instance storage, 32-bit platform o Large Instance 7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of instance storage, 64-bit platform o Extra Large Instance 15 GB of memory, 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each), 1690 GB of instance storage, 64-bit platform High-CPU Instances o Instances of this family have proportionally more CPU resources than memory (RAM) and are well-suited for compute-intensive applications o High-CPU Medium Instance 1.7 GB of memory, 5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each), 350 GB of instance storage, 32-bit platform

Maharrey 2 o High-CPU Extra Large Instance 7 GB of memory, 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each), 1690 GB of instance storage, 64-bit platform Note: EC2 Compute Unit (ECU) One ECU provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. EC2 together with S3 provide a high-capacity computing resource worthy of use by virtual parallel computing clusters. Here, it is important to note that when an instance is destroyed, any data generated by that instance is lost. This is where S3 comes into play. It is used as a persistent storage device for instances. S3 holds users files in what Amazon calls buckets. Amazon provides an API to access user files in buckets and the necessary security requirements of these buckets (i.e. bucket permissions, access, etc.) so that another user cannot purposefully or unwittingly destroy, overwrite, etc. files in another user s bucket. Since my experiments on Amazon s EC2 do not use this persistent storage provided by Amazon except to store the AMIs I created for experimentation purposes, I will skip further content for brevity and urge the curious reader to pursue this further on Amazon s website. 2. Experiment 1 The first experiment I chose to help evaluate the performance of Amazon s EC2 computing environment is a distributed prime search. The intensive processing nature of finding all prime numbers in a certain range is well-suited for parallel processing. First, I started with 1 instance (compute node) and increased this number, one-by-one, to 15, recording the length of time it took each instance to finish its calculations. I repeated the same experiment three times for each number of instances. I then repeated the same experiment with two different instance types (m1.small and c1.medium) and the Linux cluster in Shop 304. (see the accompanying Microsoft Excel document) One important factor is how the range is divided and given to each instance. On each instance, I ran a java program as follows: java prime <id> <number of instances> <beginning number> <ending number>. One way to partition the range is as follows (<number of instances> = 2, <beginning number> = 3, <ending number> = 10): Number to Test 3 4 5 6 7 8 9 10 Assigned to Instance <id> 0 1 However, partitioning the range in this way ensures that instances with lower <id> always finish before instances with higher <id> (i.e. it takes longer to determine if a number is prime if it is large). Instead, the way I chose to partition the range of numbers is as follows (<number of instances> = 2, <beginning number> = 3, <ending number> = 10): Number to Test 3 4 5 6 7 8 9 10 Assigned to Instance <id> 0 1 0 1 First, notice that even numbers are not checked at all. This speeds up the processing a little, so the program can focus on real primes. This spreads out the larger possible primes over the instances, giving each instance a more roughly similar number of possible primes to test. 3. Experiment 2

milliseconds milliseconds Maharrey 3 In my second experiment, I wanted to find some way to test networking between these instances. In this experiment, I decided to use my implementation of a distributed version of Conway s Game of Life implemented in libsynk. I tested the performance of an m1.small instance and c1.medium instance against the Shop 304 Linux machines. The data I gathered show the performance of the m1.small instance is eclipsed by both the c1.medium instance and the Linux cluster in Shop 304. The c1.medium instance performed best. 4. Conclusions As the reader can see from the accompanying Microsoft Excel document and Figure 1 below, the truly parallel Linux cluster at first out-performs Amazon s c1.medium instance, but soon they level off to provide about the same performance. No significant gain is achieved by adding more instances. 0.25 0.2 0.15 0.1 0.05 0 Time per Prime 1 3 5 7 9 11 13 15 m1.small c1.medium Shop 304 Figure 1 Number of Compute Nodes Figure 2 shows the relative performance of the Game of Life experiment using libsynk. Average Time to Completion 8000 6045 6000 4000 2000 0 1200 2066 m1.small c1.medium Shop 304 Linux Cluster Figure 2 Figure 1 shows that having your own cluster is quite comparable to the c1.medium Amazon image. At 20 an hour, that s not a bad deal, especially if the computing you re trying to perform will complete in the short term. The m1.small instance, while not entirely a bad deal at 10 an hour, it doesn t match in performance to the c1.medium instance nor the shop 304 Linux cluster. Figure 2 surprised me a little. I would have expected the Linux cluster to out-perform even the c1.medium instance. My reasoning was thus: Since Amazon charges for bandwidth usage as well, surely

Maharrey 4 my 100Mbps Linux lab Ethernet connection will be faster than Amazon s due to the fact that Amazon has some sort of in-house overhead in keeping up with my bandwidth usage. My reasoning is thus corrected. Amazon apparently has little to no overhead in keeping up with my bandwidth usage from instance-toinstance, or they have a much faster connection than 100Mbps. All-in-all, the c1.medium instance outperformed both the m1.small and shop 304 Linux cluster. Computing surely has come a long way, and Amazon s EC2 is doing a lot to help improve on the cloud-computing front (no pun intended). With the most powerful instance Amazon has to offer only costing 80 per hour, cloud computing will one day no longer be a vision for the future, but a realization that all business owners and researchers alike can use to help them with their short-term computing needs. Eventually, this technology will come around and perhaps Amazon will one day, like Microsoft, Cisco, A+, etc. [insert techical certification programs here] provide certification programs for computer professionals to help qualify them for jobs, etc. A business looking for an outside hire could advertise they are looking for computer scientists with Amazon EC2 certifications to work for them. This would ease the burden on companies having to train their employees in Amazon s cloud computing environment. Instead, the employee would already have the certification and be ready to go to work. Dr Gu, you already have the source code for the distributed Game of Life simulation from homework 2. All I did to modify it to run on the instances was to take out all the code concerning the viewer and recompile it. Following is my source code for the prime search program:

Maharrey 5 /* * Brandon Maharrey * COMP 6330 * Spring 2009 * Final Project * Filename: prime.java */ import java.math.biginteger; import java.lang.integer; class prime { public static void main(string[] args) { int id = 0, numinstances = 0; BigInteger begnum, endnum, numtocheck; if (args.length == 4) { try { id = Integer.parseInt(args[0]); catch (NumberFormatException e) { System.err.println("argument 1 must be an integer\nusage: java prime <id> <number of instances> <beginning number> <ending number>\n"); System.exit(1); else { try { numinstances = Integer.parseInt(args[1]); catch (NumberFormatException e) { System.err.println("argument 2 must be an integer\nusage: java prime <id> <number of instances> <beginning number> <ending number>\n"); System.exit(1); System.out.println("\nusage: java prime <id> <number of instances> <beginning number> <ending number>\n"); begnum = new BigInteger(args[2]); endnum = new BigInteger(args[3]); if (begnum.compareto(new BigInteger("3")) < 0) { System.out.println("\nerror: <beginning number> must be greater than or equal to 3\n"); if (begnum.compareto(endnum) >= 0) { System.out.println("\nerror: <beginning number> must be strictly less than <ending number>\n"); if (id >= numinstances id < 0) { System.out.println("\nerror: <id> must be on [0, <number of instances> - 1]\n"); BigInteger two = new BigInteger("2"); BigInteger zero = new BigInteger("0"); BigInteger one = new BigInteger("1"); BigInteger remainder = begnum.mod(two); int num; if (remainder.compareto(zero) == 0) { num = (2 * id) + 1; numtocheck = begnum.add(new BigInteger(new Integer(num).toString())); else { num = 2 * id; numtocheck = begnum.add(new BigInteger(new Integer(num).toString())); int numprimes = 0; num = numinstances * 2; BigInteger interval = new BigInteger(new Integer(num).toString()); long begtime = System.currentTimeMillis(); while (numtocheck.compareto(endnum) <= 0) { if (numtocheck.isprobableprime(10)) { // we're 99.9% sure numtocheck is a prime number numprimes++; numtocheck = numtocheck.add(interval); long endtime = System.currentTimeMillis(); long timetocompute = (endtime - begtime); System.out.println("\nnode id = " + id + "\nprogram executed for " + timetocompute + " milliseconds\nfound " + numprimes + " primes");