Cloud Computing: AWS, a practical example
May 2012
Hugo Pérez, UPC
Index
- Introduction
- Infrastructure
- Development and Results
- Conclusions
Introduction
To learn more about AWS services, the MapReduce process, the public data available from Twitter, and how to interact with it, I developed a small example using:
AWS Infrastructure:
- Elastic Compute Cloud (EC2)
- Elastic Block Store (EBS)
- Elastic IP
- Simple Storage Service (S3)
AWS Tools:
- Management Console
- CloudWatch
- Elastic MapReduce (EMR)
Twitter Search API
Creating AWS Account
- Go to http://aws.amazon.com
- Sign in as a new user
- Enter name, email and password
- Enter contact details
- Enter payment data
- Confirm a PIN through a phone call
- Wait for the confirmation
- Wait a few minutes until the account is active (less than 10 minutes in this case)
Creating EC2
- Go to AWS Management Console -> EC2 Dashboard
- Create a new instance
- Choose the AMI (Amazon Machine Image) to install: Ubuntu Server 12.04
- Define the number and type of instances, in this case 1 Micro instance. Characteristics: disk: 8 GB (EBS), RAM: 600 MB, CPU: Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
- Define instance details, such as shutdown behavior and user data
- Define tags: user-friendly names to manage the resources
- Create a key pair to connect securely to the instance
- Configure the firewall
- Review
- You can check the details from the Management Console
- You can also monitor the instance, create alarms, and configure detailed monitoring
Creating Elastic IP
- You can now access the instance over ssh using this name: ec2-23-23-187119.compute-1.amazonaws.com
- To simplify access, you can create an Elastic IP address
- Once the Elastic IP is created, associate it with the instance
Creating S3
- Define the bucket name and region. The region should be the same as the EC2 instance's to minimize latency. AWS gives 5 GB free.
- Set permissions to grant Authenticated Users access to list the S3 bucket.
Creating Billing Alarm
- First you have to enable this function.
- Define the parameters: recipients and threshold
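The same alarm could also be created programmatically. The following is only a sketch, assuming the boto3 library (not used in the original exercise), a hypothetical alarm name, and a hypothetical SNS topic ARN that delivers the notification to the recipients. It targets the `EstimatedCharges` metric in the `AWS/Billing` namespace; the period and statistic chosen here are illustrative values, not settings taken from the slides.

```python
def billing_alarm_params(threshold_usd, sns_topic_arn, name="billing-alarm"):
    """Build keyword arguments for a CloudWatch PutMetricAlarm request.

    `name` and `sns_topic_arn` are hypothetical values chosen by the caller.
    """
    return {
        "AlarmName": name,
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # six hours, in seconds (illustrative choice)
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # the recipients subscribe to this topic
    }


def create_billing_alarm(threshold_usd, sns_topic_arn):
    """Send the request; needs boto3 and valid AWS credentials."""
    import boto3  # imported here so the helper above works without boto3

    # Billing metrics are published in the us-east-1 region.
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    cloudwatch.put_metric_alarm(**billing_alarm_params(threshold_usd, sns_topic_arn))


# Inspect the request without contacting AWS (the ARN is a made-up example).
params = billing_alarm_params(1, "arn:aws:sns:us-east-1:123456789012:billing")
print(params["MetricName"], params["Threshold"])
```

Keeping the parameter builder separate from the call makes it possible to review the threshold and recipients before anything is sent.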
CloudWatch
- Besides the alarm, you can check the estimated charges through CloudWatch
- Through CloudWatch you can query different kinds of metrics
Installing EMR CLI
Connect to the server:
    $ ssh -i awskey.pem ubuntu@23.21.252.15
Install the Amazon Elastic MapReduce Ruby Client:
    $ mkdir elastic-mapreduce-cli
    $ cd elastic-mapreduce-cli
    $ wget http://elasticmapreduce.s3.amazonaws.com/elastic-mapreduce-ruby.zip
    $ unzip elastic-mapreduce-ruby.zip
Installing EMR CLI
Configuring credentials:
    $ vi credentials.json
    {
      "access_id": "[Your AWS Access Key ID]",
      "private_key": "[Your AWS Secret Access Key]",
      "keypair": "[Your key pair name]",
      "key-pair-file": "[The path and name of your PEM file]",
      "log_uri": "[A path to a bucket you own on Amazon S3, such as, s3n://mylog-uri/]",
      "region": "[The Region of your job flow, either us-east-1, us-west-2, us-west-1, eu-west-1, ap-northeast-1, ap-southeast-1, or sa-east-1]"
    }
Installing EMR CLI
- You can get the AWS Access Key ID and the AWS Secret Access Key by logging in to your account at http://aws.amazon.com and opening the Access Credentials section.
- It is recommended to create a new key pair for the exercise. I created it from the Management Console and copied it to the EC2 instance.
Installing EMR CLI
I saved all the parameters in the file:
    ubuntu@ip-10-195-195-175:~/elastic-mapreduce-cli$ more credentials.json
    {
      "access_id": "HPVAJFNULSZULY5NWHPV",
      "private_key": "65xBzYVzV7THPVYWW2LcYN0roVwK1I+nxJ+BNHPV",
      "keypair": "mapreduce",
      "key-pair-file": "/home/ubuntu/mapreduce.pem",
      "log_uri": "s3n://mylog-uri-hpv/",
      "region": "us-east-1"
    }
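Before running the CLI it can help to confirm that no field of credentials.json was left empty or still holds its bracketed placeholder. The sketch below is not part of the EMR client; the function name `check_credentials` is hypothetical, and the field names and region list are the ones shown in the template above. It is written for Python 3.

```python
import json

# Field names and regions as listed in the credentials.json template.
REQUIRED_KEYS = ("access_id", "private_key", "keypair",
                 "key-pair-file", "log_uri", "region")
KNOWN_REGIONS = ("us-east-1", "us-west-2", "us-west-1", "eu-west-1",
                 "ap-northeast-1", "ap-southeast-1", "sa-east-1")


def check_credentials(text):
    """Return a list of problems found in the text of a credentials.json file."""
    config = json.loads(text)
    problems = []
    for key in REQUIRED_KEYS:
        value = str(config.get(key, ""))
        # Template placeholders look like "[Your AWS Access Key ID]".
        if not value or value.startswith("["):
            problems.append("missing or placeholder value: " + key)
    region = str(config.get("region", ""))
    if region and not region.startswith("[") and region not in KNOWN_REGIONS:
        problems.append("unknown region: " + region)
    return problems


# Example with one placeholder left in and a mistyped region.
sample = """{
  "access_id": "[Your AWS Access Key ID]",
  "private_key": "example-secret",
  "keypair": "mapreduce",
  "key-pair-file": "/home/ubuntu/mapreduce.pem",
  "log_uri": "s3n://mylog-uri-hpv/",
  "region": "us-east1"
}"""
for problem in check_credentials(sample):
    print(problem)
```

In practice the text would be read from the credentials.json file in the elastic-mapreduce-cli directory.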
Basics EMR CLI
Basic commands of EMR CLI:
    $ ./elastic-mapreduce --help
    $ ./elastic-mapreduce --create
    $ ./elastic-mapreduce --list
    $ ./elastic-mapreduce --describe --jobflow [JobFlowID]
    $ ./elastic-mapreduce -j JobFlowID --stream
    $ ./elastic-mapreduce --terminate JobFlowID
Mapper
The mapper script, the classic word counter:
    #!/usr/bin/python
    import sys
    import re

    def main(argv):
        # A word starts with a letter, followed by letters or digits
        pattern = re.compile("[a-zA-Z][a-zA-Z0-9]*")
        for line in sys.stdin:
            for word in pattern.findall(line):
                # The "LongValueSum:" prefix asks the aggregate reducer to sum the values per word
                print("LongValueSum:" + word.lower() + "\t" + "1")

    if __name__ == "__main__":
        main(sys.argv)
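To see what the job computes without starting a cluster, the mapper can be paired with a small local stand-in for the aggregate reducer. This is a sketch for Python 3 under stated assumptions: `map_lines` and `aggregate` are hypothetical helper names, and `aggregate` only imitates the summing behaviour for `LongValueSum` records, it is not the Hadoop implementation.

```python
import re
from collections import defaultdict

# Same pattern as the mapper: a letter followed by letters or digits.
WORD = re.compile("[a-zA-Z][a-zA-Z0-9]*")


def map_lines(lines):
    """Yield one tab-separated record per word, as the mapper prints them."""
    for line in lines:
        for word in WORD.findall(line):
            yield "LongValueSum:" + word.lower() + "\t" + "1"


def aggregate(records):
    """Sum the values of LongValueSum records per word (local stand-in)."""
    totals = defaultdict(int)
    for record in records:
        key, value = record.split("\t", 1)
        prefix, _, word = key.partition(":")
        if prefix == "LongValueSum":
            totals[word] += int(value)
    return dict(totals)


# Two made-up input lines, in place of the Twitter search result.
counts = aggregate(map_lines(["Cloud computing on AWS", "cloud storage on S3"]))
for word in sorted(counts):
    print(word + "\t" + str(counts[word]))
```

Each output line holds a word and its total count, separated by a tab.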
Using Twitter API
To generate the input data, run a simple query against Twitter:
Using Twitter API
Query:
    http://search.twitter.com/search.json?q=cloud%20computing&rpp=5&include_entities=true&result_type=mixed
Parameters:
- q: the search pattern, here "cloud computing"
- rpp: results per page, here 5
- include_entities: if true, the result includes urls, media and hashtags
- result_type:
  - mixed: include both popular and real-time results in the response
  - recent: return only the most recent results in the response
  - popular: return only the most popular results in the response
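The query string above can be assembled from its parameters instead of being typed by hand. A minimal sketch for Python 3, assuming a hypothetical helper name `build_search_url`; it only builds the URL and sends no request, since this 2012-era endpoint may no longer respond.

```python
from urllib.parse import quote, urlencode

SEARCH_ENDPOINT = "http://search.twitter.com/search.json"


def build_search_url(query, rpp=5, include_entities=True, result_type="mixed"):
    """Build the search URL from the parameters described on the slide."""
    params = [
        ("q", query),
        ("rpp", rpp),
        ("include_entities", "true" if include_entities else "false"),
        ("result_type", result_type),
    ]
    # quote_via=quote encodes spaces as %20 rather than "+".
    return SEARCH_ENDPOINT + "?" + urlencode(params, quote_via=quote)


url = build_search_url("cloud computing")
print(url)
```

Changing `result_type` to "recent" or "popular" selects the other two modes listed above.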
Using Twitter API
Transfer the result to S3:
    $ s3curl.pl --id=personal --put=cloudcomputing -- http://s3.amazonaws.com/mylog-uri-hpv/entradas/cloudcomputing
Exec EMR
    $ ./elastic-mapreduce --create --stream \
        --mapper s3://elasticmapreduce/samples/wordcount/wordsplitter.py \
        --input s3://mylog-uri-hpv/entradas/cloudcomputing \
        --output s3://mylog-uri-hpv/salidas/cloudcomputing \
        --reducer aggregate
    $ ./elastic-mapreduce --list --active
    j-3ebj6mt4fbm80 STARTING Development Job Flow
       PENDING Example Streaming Step
    $ ./elastic-mapreduce --list --active
    j-3ebj6mt4fbm80 RUNNING ec2-23-20-6-34.compute-1.amazonaws.com Development Job Flow
       RUNNING Example Streaming Step
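When the listing is checked repeatedly, it can be convenient to turn its text into structured data. The sketch below is for Python 3 and rests on an assumption about the layout, inferred only from the output shown above: a job flow line starts with an ID beginning with "j-", followed by the state, an optional master host name, and the name, while the following lines describe its steps. The function name `parse_listing` is hypothetical.

```python
def parse_listing(text):
    """Parse `elastic-mapreduce --list` output into a list of job flow dicts."""
    flows = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].startswith("j-") and len(tokens) >= 2:
            rest = tokens[2:]
            master_dns = None
            # The master host name only appears once the cluster is running.
            if rest and rest[0].endswith(".amazonaws.com"):
                master_dns = rest[0]
                rest = rest[1:]
            flows.append({
                "id": tokens[0],
                "state": tokens[1],
                "master_dns": master_dns,
                "name": " ".join(rest),
                "steps": [],
            })
        elif flows:
            # Any other line is treated as a step of the latest job flow.
            flows[-1]["steps"].append({
                "state": tokens[0],
                "name": " ".join(tokens[1:]),
            })
    return flows


sample = """j-3ebj6mt4fbm80 RUNNING ec2-23-20-6-34.compute-1.amazonaws.com Development Job Flow
   RUNNING Example Streaming Step"""
flows = parse_listing(sample)
print(flows[0]["id"], flows[0]["state"], flows[0]["steps"][0]["name"])
```

A polling script could call the CLI, pass its output to this function, and stop once the job flow leaves the STARTING and RUNNING states.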
Exec EMR
- Monitoring from Management Console
- Provisioning on demand
- Monitoring Graphs
Results EMR
- Results on S3
Conclusions
- The software development model is completely new: the purchase process is eliminated, installation is becoming easier, the role of the system administrator (sysadmin, DBA, etc.) is disappearing, and the developer can focus on business logic.
- AWS provides not only the infrastructure but also the development platform.
- The Twitter API is well documented and easy to use.
- This model is available to a company of any size.
- The free tier covers all the infrastructure components used in this exercise (EC2, EBS, Elastic IP, S3) except for one small EC2 instance that is used on demand during the MapReduce process.
- The total charge for the development of this exercise was USD 0.45.
Conclusions
Charges
Thanks