Instance Types
Standard Instances (1 EC2 Compute Unit (EC2CU) = the equivalent of a 1.0-1.2 GHz 2007 AMD Opteron or 2007 Intel Xeon processor):
- Small: 1.7 GB memory, 1 EC2CU, 160 GB local instance storage (LIS), 32/64-bit
- Medium: 3.75 GB memory, 2 EC2CU, 410 GB LIS, 32/64-bit
- Large: 7.5 GB memory, 4 EC2CU, 850 GB LIS, 64-bit
- Extra Large: 15 GB memory, 8 EC2CU, 1,690 GB LIS, 64-bit
Other families:
- Micro Instances: 613 MB memory, up to 2 EC2CUs, EBS storage only
- High-Memory Instances: 17.1, 34.2, or 68.4 GB memory
- High-CPU Instances: 5 or 20 EC2CU
- Cluster GPU Instances: 22 GB memory, 33.5 EC2CU, 2x NVIDIA Tesla "Fermi" M2050 GPUs, 1,690 GB LIS, 10 Gigabit Ethernet
Instance vs. VM
- An instance = a VM + the hardware it runs on (the instance type)
- AMI (Amazon Machine Image) = the VM image; a VM image = OS + software
- Users specify both the VM image (AMI) and the hardware (i.e., instance type) when setting up an instance
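The relationship above can be sketched as a simple data model. This is illustrative only, not the real EC2 API; the class and field names are invented for clarity:

```python
# Illustrative model only (NOT the real EC2 API): an instance pairs a
# VM image (AMI = OS + software) with a hardware profile (instance type).
from dataclasses import dataclass, field

@dataclass
class AMI:
    """The VM image: operating system plus preinstalled software."""
    os: str
    software: list = field(default_factory=list)

@dataclass
class Instance:
    """An instance = VM image + hardware (the instance type)."""
    ami: AMI
    instance_type: str   # e.g. "m1.small" -> fixed memory/EC2CU/storage profile

img = AMI(os="Linux", software=["hadoop"])
vm = Instance(ami=img, instance_type="m1.small")
print(vm.instance_type, vm.ami.os)
```

When launching for real, the same two choices (AMI ID and instance type) are the required parameters of the launch request.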
OS and Software
Amazon Machine Images (AMIs) are preconfigured with an ever-growing list of operating systems (e.g., Windows Server 2008, with the OS license included in the price!)
Pricing: On-Demand Instances
Data Transfer Charges
AWS's Free Usage Tier
Amazon S3 (Simple Storage Service) Basics
- Data is stored as objects (files) in buckets
- The key to a file is its path: an object is identified by <bucket> + <path>
- No real directories, just path segments
- Great as persistent storage for data
- Reliable: designed for 99.999999999% durability
- Scalable: up to petabytes of data
- Fast: handles highly parallel requests
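The flat addressing scheme can be sketched as follows; the bucket and key names below are hypothetical:

```python
# S3 identifies an object by <bucket> + <key>; "folders" are just key prefixes.
def s3_url(bucket: str, key: str) -> str:
    """Build the virtual-hosted-style URL for an S3 object."""
    return f"https://{bucket}.s3.amazonaws.com/{key}"

# The key "logs/2011/01/access.log" looks like a path with segments,
# but S3 stores it as one flat string -- there is no real directory tree.
url = s3_url("my-course-bucket", "logs/2011/01/access.log")
print(url)  # https://my-course-bucket.s3.amazonaws.com/logs/2011/01/access.log
```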
S3 Access
- Via your web browser
- Via various command-line tools, e.g., s3cmd
- Or via the HTTP REST interface: Create (PUT/POST), Read (GET), Delete (DELETE)
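The REST verb-to-operation mapping can be illustrated with a toy in-memory stand-in for a bucket (a mock, not real AWS calls; in practice these verbs travel over signed HTTP requests):

```python
# Toy in-memory stand-in for an S3 bucket, showing how the REST verbs
# map onto bucket operations. This is a mock -- no network, no AWS.
class FakeBucket:
    def __init__(self):
        self.objects = {}              # key -> bytes

    def put(self, key, data):          # HTTP PUT: create or overwrite an object
        self.objects[key] = data

    def get(self, key):                # HTTP GET: read an object (None if missing)
        return self.objects.get(key)

    def delete(self, key):             # HTTP DELETE: remove an object
        self.objects.pop(key, None)

b = FakeBucket()
b.put("data/input.txt", b"hello")
print(b.get("data/input.txt"))         # b'hello'
b.delete("data/input.txt")
print(b.get("data/input.txt"))         # None
```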
S3 Limitations
- Objects can't be modified in place (no random writes or appends)
- Max object size of 5 TB (5 GB per single upload request; larger objects require multipart upload)
S3 Pricing
- Varies by region
- Data in is (currently) free
- Data out is also free within the same region; otherwise starts at $0.12/GB
- Storage cost is per GB-month: starts at $0.140/GB, drops with volume
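Using the list prices above, a rough monthly bill can be computed as follows. This is a flat-rate sketch: real pricing is tiered by volume and differs by region:

```python
# Rough S3 monthly cost using the first-tier list prices quoted above.
# Real pricing is tiered and region-specific; this is a flat-rate sketch.
STORAGE_PER_GB_MONTH = 0.140   # storage, first tier
TRANSFER_OUT_PER_GB = 0.12     # data out across regions / to the internet

def monthly_cost(stored_gb, out_gb, out_same_region_gb=0.0):
    """Storage plus egress; same-region transfer out is free."""
    storage = stored_gb * STORAGE_PER_GB_MONTH
    transfer = out_gb * TRANSFER_OUT_PER_GB
    return storage + transfer

# 100 GB stored, 50 GB served across regions, 200 GB within the region:
print(round(monthly_cost(100, 50, 200), 2))  # 20.0
```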
S3 Access Control List (ACL)
- Read/Write permissions on a per-bucket basis
  - Read = listing the objects in the bucket
  - Write = creating/overwriting/deleting objects in the bucket
- Read/Write permissions on a per-object (file) basis
  - Read = reading the object's data & metadata
S3 API
The Amazon Web Services S3 API supports the ability to:
- Find buckets and objects (JAR files, data files, etc.)
- Discover their metadata
- Create new buckets
- Upload new objects
- Delete existing buckets and objects
- Copy data from S3 to HDFS for computation (distcp/s3distcp)
Amazon EMR (Elastic MapReduce)
- A web service that allows cost-effective processing of large data sets
- Runs Hadoop (HDFS + MapReduce) over EC2 and S3
- EMR is mostly used for data-intensive tasks
- Examples: web indexing, data mining, log analysis, data warehousing, machine learning, financial analysis, scientific simulation, bioinformatics
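The MapReduce model that EMR executes can be sketched with the classic word count. On EMR the mapper and reducer would run as separate distributed processes over HDFS/S3 data; here the shuffle/sort phase is simulated in-process:

```python
# Word count in the MapReduce style that EMR/Hadoop executes.
# On a cluster, mapper and reducer run as separate processes and the
# framework shuffles/sorts between them; we simulate that locally.
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Map phase: emit (word, 1) for every word in a line of input."""
    for word in line.split():
        yield (word.lower(), 1)

def reduce_counts(pairs):
    """Reduce phase: after the shuffle (sort by key), sum counts per word."""
    pairs = sorted(pairs, key=itemgetter(0))
    return {word: sum(c for _, c in group)
            for word, group in groupby(pairs, key=itemgetter(0))}

lines = ["the quick brown fox", "the lazy dog"]
pairs = [kv for line in lines for kv in mapper(line)]
print(reduce_counts(pairs))
# {'brown': 1, 'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```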
Apache Hadoop Stack for Data Analytics
- Storage: HDFS
- Resource management & workflow: YARN, ZooKeeper
- Processing: MapReduce
- Higher-level frameworks: HBase, Pig, Hive, Mahout
- Data ingestion: Sqoop, Flume
Why Use Elastic MapReduce?
- Reduce hardware & IT personnel costs
  - Pay for what you actually use
  - Don't pay for people you don't need
  - Don't pay for capacity you don't need
- More agility, less wait time for hardware
  - Don't waste time buying/racking/configuring servers
  - Many server classes to choose from (micro to massive)
- Less time doing Hadoop deployment & version management
  - Optimized Hadoop comes pre-installed
Amazon Mechanical Turk
- A web service that exposes an on-demand global workforce ready to complete small tasks in exchange for micro-payments
- Frictionless: outsourcing per se becomes irrelevant
- Accessed through a web services API
- Examples?
Example: Identify Road Markings
How It Works (www.mturk.com)
- A Requester (developer) posts Human Intelligence Tasks (HITs)
- Workers with the required qualifications complete the HITs ("artificial artificial intelligence": people standing in for software)
- Completed HITs are returned to the Requester
Example Application: a podcast transcription service provider, which transcribes audio into high-quality text
- Amazon S3: stores the podcasts and related files
- Amazon Mechanical Turk + EMR: voice-recognition algorithms and human workers transcribe the podcasts
- Amazon EMR: indexes the text within a search engine
Learn More About AWS
- AWS: http://aws.amazon.com
- EC2 Resources: http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/
- Amazon EMR: http://aws.amazon.com/elasticmapreduce/
Homework (Last Friday)
- Set up an AWS account
- Watch the videos on AWS EMR:
  - Getting Started (11:04): signing up for an AWS account, generating a keypair, and setting up an S3 bucket
  - Running Jobs (14:47): creating, monitoring, and getting results from your EMR Job Flow
  - Clusters of Servers (10:50): EC2 instance types, pricing, and Hadoop cluster configuration
  - Dealing with Data (18:54): S3 architectures, pricing, and access control
Homework (Cont.)
- AWS Hands-on Lab 0: follow the instructions in the tutorial and repeat the tasks, including creating an account, working with S3, creating a cluster and running a job, and setting up instances
- Compile the JAR file using the source code posted on the course website
Summary
- Cloud Computing
- AWS: EC2 and S3
- EMR and AMT
- Hands-on Lab 0 (warming up)