Even though you need a computer to access your instance, you are running on a machine with different capabilities, great or small.
This computer runs independently of the computer you access it from (and it also costs money for as long as it's on).
You do have the power to start & stop these instances on the fly, changing computing power as you need it.
You do not have the ability to touch the power switch, rest your drink on top of the server, or call IT to manually restart.
Remember this?
EC2: The microprocessor, or CPU/GPU, of the cloud computer
EBS / S3: The hard drive of the cloud computer
Each has pros/cons; we'll discuss those later.
Let's talk Anatomy (not like they do on Grey's Anatomy)
EC2 & ECUs
You are running on a virtual machine: a bunch of software that provides one environment with certain characteristics and that can run on any number of machines.
All the concrete stuff (RAM, CPU, GHz, disks): it's all pretend.
CPU is given in ECUs (EC2 Compute Units).
New EC2 offerings have larger ECUs (in case this wasn't confusing enough)
Your machine is actually a template which runs inside an environment which dictates all that pretend stuff
Henry Ford would be proud
AMI (Amazon Machine Image)
AMI is your template.
AMI dictates operating system, kernel, RAM disks
There are many AMIs
  Our quick launch made use of an Amazon-provided Ubuntu AMI
  Community-provided AMIs
  Bioinformatics Core Galaxy AMI: ami-9e1c31db
You too will have your own AMI! (Promise!)
AMIs are created from 2 sources
  Bundled images using CLI tools (Not in workshop, though ask us!)
  Based on a previous AMI (bootstrappin'!)
So if AMIs are just data, where are they?
Storage
There are two services that hold file-type data: S3 and EBS
Pros and cons of S3 vs. EBS, compared on: durability of data, price per GB, disk I/O operations, file sizes, speed, ease of use
Price per GB
  S3: More expensive per GB, but pay only for what you use
  EBS: 1/3 cheaper, but pay for what is allocated, not just used
File sizes
  S3: Unlimited files, but each capped at 5TB
  EBS: No cap, but size limited to volume size on creation
S3 is amazing
Data retention rate approaching ludicrous: 99.999999999%
If you store 10,000 files with Amazon, they'll eat one, on average, every 10 million years.
S3 also has a cheaper offering called Reduced Redundancy Storage, in case outlasting humankind isn't in your budget.
RRS is designed to provide 99.99% durability and to sustain the loss of data in a single facility. Good enough for most folks.
Still waiting to eat his 2nd meal
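The "one file every 10 million years" figure follows from the durability number by simple arithmetic. Here is a minimal sketch, assuming the quoted percentage is a per-object, per-year durability (the function name is made up for this example):

```python
from fractions import Fraction

def years_per_lost_object(durability_percent, object_count):
    """Average years between losses, treating durability as per object per year."""
    # Assumption: the percentage is the chance one object survives one year.
    annual_loss_probability = 1 - Fraction(durability_percent) / 100
    expected_losses_per_year = object_count * annual_loss_probability
    return 1 / expected_losses_per_year

# Standard S3: 99.999999999% durability, 10,000 stored files.
print(years_per_lost_object("99.999999999", 10_000))  # 10000000

# Reduced Redundancy Storage: 99.99% durability, same 10,000 files.
print(years_per_lost_object("99.99", 10_000))  # 1
```

Under the same assumption, RRS would eat about one of those 10,000 files per year, which is the trade-off behind the lower price.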
And here's why bioinformaticians should NOT use S3 for day-to-day work:
It's made to be a RESTful tool, so the true interface is as a service (URLs, request/response, CLI)
Crappy support with uploaders & file managers
  Web interface: best interface. To be fair, it's improved by leaps & bounds.
  S3Fox: Firefox addon from a developer who vocally rejects updating the software
  S3browser: Not bad, but Windows-only.
Speed concerns (150 MB file = 4 minutes over S3, 20 seconds over mounted EBS)
Here's the kicker: Most disk operations require transferring data to EBS regardless!
That said, EMR uses S3 exclusively, so sometimes you can't avoid the headaches.
S3 can be correct once you've got your final final final data and want to publish
EBS to the rescue!
Create drives from 1GB to 1TB
Cheaper (most of the time) than S3
Mount multiple drives to one instance
To EC2, data transfer rate and latency behave like disk I/O, at a speed S3 cannot match
Your AMIs are already using EBS whether you used an S3 snapshot or not
Finally, EBS drives will remember state without you explicitly saving them
What EBS is NOT
Not as permanent as S3. Final data sets, data for publication, and submissions to public data sets are best put on S3 for permanence.
Not pay for what you use: If you create a 10GB drive and use 1MB of it, you pay for 10GB monthly.
Not flexible in size. You create a drive of X size, and X size it shall forever be. Need more space on your drive? Too bad. (OK, OK, there are ways, but it's complicated.)
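To see why EBS is only cheaper "most of the time", here is a rough billing sketch. The S3 price is a hypothetical number chosen for illustration, not an Amazon quote; the only inputs taken from the slides are "EBS is 1/3 cheaper per GB" and "EBS bills what is allocated, S3 bills what is used". The function names are made up as well.

```python
from fractions import Fraction

# Hypothetical price in dollars per GB-month, for illustration only.
S3_PRICE_PER_GB_MONTH = Fraction(15, 100)
# From the comparison slide: EBS is 1/3 cheaper per GB.
EBS_PRICE_PER_GB_MONTH = S3_PRICE_PER_GB_MONTH * Fraction(2, 3)

def s3_monthly_cost(used_gb):
    """S3 bills only the data actually stored."""
    return Fraction(used_gb) * S3_PRICE_PER_GB_MONTH

def ebs_monthly_cost(allocated_gb):
    """EBS bills the whole volume, no matter how much of it is used."""
    return Fraction(allocated_gb) * EBS_PRICE_PER_GB_MONTH

allocated_gb = 10
used_gb = Fraction(1, 1000)  # roughly 1MB, counting 1GB as 1000MB

print(float(ebs_monthly_cost(allocated_gb)))  # bill for the 10GB drive
print(float(s3_monthly_cost(used_gb)))        # bill for 1MB kept in S3

# EBS only wins once more than 2/3 of the volume is actually in use.
break_even_used_gb = allocated_gb * Fraction(2, 3)
print(float(break_even_used_gb))
```

Whatever the real per-GB price, the 2/3 break-even depends only on the "1/3 cheaper" ratio, so a mostly empty drive costs more on EBS than the same data would on S3.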
Snapshots
Snapshots are point-in-time states of data persisted to S3
Remember our question from the boot slide? Your AMIs are essentially snapshots.
EBS Volumes can also be stored as snapshots.
Snapshots, then, are the copy of the data you want and of the machine you want to run it on.
Didn't you say S3 becomes EBS anyway? Yep, and here is when it happens.
Summary
Your cloud computer is:
  Software running on one or more machines in Amazon data centers (Availability Zones!)
  That software is a virtual machine whose capabilities are pre-set by Amazon
  On that virtual machine is a machine image (AMI) which defines the operating system and parameters for your machine, like a template
  Inside that machine image are your data, programs, and everything else you want to run.
Layers, outermost to innermost: Amazon Infrastructure -> EC2 Instance type (virtual machine) -> Amazon Machine Image (AMI) -> Your Data & Software
But that's not all (bwahahaha!)
Security and Credentials (a.k.a. the Labyrinth)
Credentials serve to identify your local machine to Amazon's services
The credentials you use vary with the type of API you're accessing
To get to the Credentials page, go to http://aws.amazon.com/account/, and click on Security Credentials
Three different types of credentials for accessing APIs
  EC2 Key pairs: Created on-the-fly from EC2 instance start or from EC2 console under Key pairs
  Access keys: Created when account is created; can also be created from Credentials page
  X.509 Certificates: Created from Credentials page
Other authentication methods for other services
  Username/password
Account Identifiers
  AWS Account ID: If you want to share an AWS resource (like an Amazon EC2 AMI, Amazon EBS snapshot, Amazon SQS queue, etc.), you use the AWS Account ID to specify the account.
  Canonical User ID: If you want to share Amazon S3 resources (objects and buckets) with another AWS account, you use this ID to specify the account. The ID is a long string.
More info at: http://docs.amazonwebservices.com/awssecuritycredentials/1.0/aboutawscredentials.html
Ready for the maze?
If you want to -> Use this credential
Make a REST or Query API request to an AWS product, access to S3 storage -> Access Keys
Make a SOAP API request to an AWS product -> X.509 certificates (except for S3 and Amazon Mechanical Turk, which require Access Keys)
Access secure pages on the AWS web site or AWS Management Console -> Username/Password with optional Multi-Factor Authentication
Use the Amazon EC2 command line tools -> X.509 certificates
Launch or connect to an Amazon EC2 instance -> EC2 Key pairs
Bundle an Amazon EC2 AMI -> Linux/UNIX AMIs: X.509 & Amazon Account ID to bundle the AMI, and Access Keys to upload to S3. For Windows AMIs: Access Keys for both bundling and uploading.
Share an Amazon EC2 AMI or Amazon EBS snapshot -> Amazon Account ID of the account you want to share with (without hyphens)
Create a signed URL to access Amazon CloudFront private content -> CloudFront Key Pairs
Access AWS Discussion Forums or AWS Premium Support site -> Username/Password
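If the maze is easier to navigate as a lookup, the same table can be written down as a small dictionary. This is only a transcription of the slide: the short task labels and the function name are hypothetical, and the bundling row is left out because it depends on the AMI's operating system.

```python
# Task labels are invented for this sketch; the credential names come from the table.
CREDENTIAL_FOR_TASK = {
    "rest_or_query_api": "Access Keys",
    "soap_api": "X.509 certificates",
    "aws_web_console": "Username/Password with optional Multi-Factor Authentication",
    "ec2_command_line_tools": "X.509 certificates",
    "launch_or_connect_instance": "EC2 Key pairs",
    "share_ami_or_snapshot": "Amazon Account ID (without hyphens)",
    "cloudfront_signed_url": "CloudFront Key Pairs",
    "forums_or_premium_support": "Username/Password",
}

# Products that need Access Keys even for SOAP requests, per the table.
SOAP_EXCEPTIONS = {"S3", "Amazon Mechanical Turk"}

def credential_for(task, product=None):
    """Return the credential the table lists for a task label."""
    if task == "soap_api" and product in SOAP_EXCEPTIONS:
        return "Access Keys"
    return CREDENTIAL_FOR_TASK[task]

print(credential_for("launch_or_connect_instance"))
print(credential_for("soap_api", product="S3"))
```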
AWS Account Identifiers
Mainly used for permissions & sharing between accounts
Multiple accounts can share data (personal vs lab-wide)
S3 uses different credentials for sharing
Our ID is: 3249-7882-2037, share with us!
Get your account identifier, we'll use it later
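Sharing an AMI or snapshot wants the Account ID without hyphens, while the console displays it with them. A small sketch of the cleanup, relying on AWS Account IDs being 12 digits long; the function name is made up for this example:

```python
def normalize_account_id(raw_id):
    """Strip hyphens and spaces from an AWS Account ID as displayed."""
    digits = "".join(ch for ch in raw_id if ch.isdigit())
    if len(digits) != 12:
        raise ValueError(f"expected 12 digits, got {len(digits)}: {raw_id!r}")
    return digits

print(normalize_account_id("3249-7882-2037"))  # 324978822037
```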
X.509 Certificates
Same certificates used for SSL
Can be created either on your own or by Amazon
To create on your own, use openssl & CA.pl, but this is beyond our scope
Can only have two X.509 certificates at any time
Should be rotated every 90 days
There are too many Credentials, they make my head spin!
Remember these three:
  AWS user name and password: Shopping at Amazon, logging into your AWS web console, and account management.
  SSH Key pairs: Logging into and using your cloud machine.
  X.509 Certificate and Key: Maximize your EC2 cloud machine with command line tools.
Security Groups
Security groups allow you to set up firewalls for each of your instances
Let you save setups for various services, such as web server, or mail server
Default security group does not allow SSH
Some services are pre-defined for you. Learn these as you may want to access your instance in ways you haven't thought of (FTP, Telnet, MySQL, Galaxy, etc)
Creating new Security Group
Let's create one that gives us SSH & Web access
Go to Console -> EC2 -> Security Groups and click Create Security Group
Give it a name & description
From the bottom pane, you can add connection methods (click on the Inbound tab)
Protocol is either TCP (connection-oriented stream) or UDP (connectionless datagrams)
Define a port range & source IP or group
  0.0.0.0/0: any IP
  Localhost: only the instance
  Your computer's IP: only your machine can access the instance
For our example, let's add SSH & HTTP and click Apply Rule Changes
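To get a feel for how protocol, port range, and source fit together, here is a small local model of inbound rules. It is a sketch for intuition only, not the AWS API: the class and function names are made up, the IP addresses are documentation examples, and SSH is narrowed to a single machine to show the "your computer's IP" option. Ports 22 (SSH) and 80 (HTTP) are the standard ones.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class InboundRule:
    protocol: str      # "tcp" or "udp"
    from_port: int     # first port in the range
    to_port: int       # last port in the range
    source_cidr: str   # e.g. "0.0.0.0/0" for any IP

def is_allowed(rules, protocol, port, client_ip):
    """True if any rule matches; traffic matching no rule is refused."""
    address = ipaddress.ip_address(client_ip)
    for rule in rules:
        if (rule.protocol == protocol
                and rule.from_port <= port <= rule.to_port
                and address in ipaddress.ip_network(rule.source_cidr)):
            return True
    return False

rules = [
    InboundRule("tcp", 22, 22, "203.0.113.7/32"),  # SSH from one machine only
    InboundRule("tcp", 80, 80, "0.0.0.0/0"),       # HTTP from any IP
]

print(is_allowed(rules, "tcp", 80, "198.51.100.20"))  # True
print(is_allowed(rules, "tcp", 22, "198.51.100.20"))  # False
print(is_allowed(rules, "tcp", 22, "203.0.113.7"))    # True
```

A source of 0.0.0.0/0 on the SSH rule would let anyone attempt a login, which is why restricting it to your own IP is worth considering.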
Questions?