Cloud Computing for Bioinformatics: EC2 and AMIs
Cloud Computing: Quick-starting an EC2 instance (let's get our feet wet!)
Cloud Computing: EC2 instance - Quick Start
- On the EC2 console, click Launch Instance.
- This gets us up and going quickly.
Cloud Computing: EC2 instance - Availability Zone
- First things first: choose the region, and the availability zone within it, that you would like to work with.
- Remember that usage prices vary by region.
- Your AMIs and S3 buckets live in one region, and each EBS volume lives in one availability zone. They will not show up if your instance runs somewhere other than where you've saved your data.
- We will use US-West for this tutorial.
Cloud Computing: EC2 instance - Choosing an AMI
- Quick Start: AMIs pre-configured for popular operating systems & software packages.
- My AMIs: where you choose your own custom-made images. You have access to any of your AMIs in the region you've chosen.
Cloud Computing: EC2 instance - Community AMIs
- AMIs created by users & made public. Use at your own risk.
- Many community AMIs are instance-store backed, so check the root device type before you launch.
- Two types of root devices:
  - EBS (OS on EBS)
  - Instance-store (OS on ephemeral storage)
- When possible, use EBS.
- For this example, we'll get one of Canonical's official Ubuntu releases. For our purposes, we'll search for ami-d197c694.
- Official Ubuntu 10.04 releases: http://uec-images.ubuntu.com/releases/10.04/release/
Cloud Computing: EC2 instance - Instance Details
- Choose the type of instance you want to run. Remember, different sizes have different prices!
- We're using a 64-bit AMI, so the smallest choice available is Large.
- You can request spot instances here, but we want instant gratification at this point.
Cloud Computing: EC2 instance - Kernel ID & RAM Disk ID
- Kernel ID & RAM Disk ID identify the kernel image & RAM disk image your instance boots with.
- Super-fiddly, and 99.99% of the time you'll go with the defaults.
- CloudWatch monitoring tracks your instance's usage & provides metrics based on it.
  - Costs extra.
  - Useful if you're interested in the crunchy bits of what the instance is doing; otherwise you can skip it.
Cloud Computing: EC2 instance - Key Pairs
- Key pairs are part of the security protocol for AWS.
- Only a user with the appropriate key pair will be allowed to log onto an instance as root.
- Key pairs are specific to machines. You'll create one for each machine you'd like to have root access to your instance.
- Security is not perfect: if your machine has root access, anyone using it has root access.
- Name your key pair, then download it. Remember where you save it; you will need it to log in later.
Cloud Computing: EC2 instance - Security Groups
- Think of this as an instance-specific firewall setting.
- The default is useless, as it has no SSH access, so we're making our own.
- Create Custom Group:
  - Click on New Security Group.
  - Give it a name and description.
  - From the dropdown at the bottom, choose SSH and add the rule.
Cloud Computing: EC2 instance - Launch!
- The instance will spin up, and you'll be ready to log in!
- Once the status says running, clicking on the instance will show details about it.
- Note that this is an EBS-backed instance. If your image is instance-store, you will not be able to create snapshots of your server!
- Pay attention to the Public DNS: this is what you'll use to SSH into your machine.
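
If your own machine runs Linux or Mac OS X rather than Windows, the key pair and Public DNS can be used with a plain ssh client. The sketch below only prints the command it would run; the key path and host name are hypothetical placeholders, and the "ubuntu" login assumes one of the official Ubuntu AMIs.

```shell
#!/bin/sh
# Hypothetical placeholders: substitute your own key file and Public DNS.
KEY_FILE="${KEY_FILE:-$HOME/.ssh/my-ec2-key.pem}"
PUBLIC_DNS="${PUBLIC_DNS:-ec2-203-0-113-10.us-west-1.compute.amazonaws.com}"

# Print the ssh command for a given key file, user and host.
ssh_command() {
    printf 'ssh -i %s %s@%s\n' "$1" "$2" "$3"
}

# ssh refuses private keys that other users can read, so restrict the key first.
if [ -f "$KEY_FILE" ]; then
    chmod 400 "$KEY_FILE"
fi

# Official Ubuntu AMIs log in as "ubuntu".
ssh_command "$KEY_FILE" ubuntu "$PUBLIC_DNS"
```

Copy the printed line into a terminal once the instance status says running.
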
Cloud Computing: Logging into your Instance
Cloud Computing: Logging in - From Windows
- This one takes a little more doing.
- Download PuTTY and PuTTYgen: http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
- PuTTY doesn't support the .pem file Amazon provides, so we'll need to convert it to PuTTY's .ppk format:
  - Launch PuTTYgen.
  - Load the private key created during instance launch (told you to remember where you saved it!).
  - Save the private key.
Cloud Computing: Logging in - From Windows
- Launch PuTTY.
- Copy/paste the Public DNS from the AWS Management Console into Host Name or IP.
- Go to Connection -> SSH -> Auth and load the key we saved from PuTTYgen.
- Go to Connection -> Data and put ubuntu in the Auto-login username (or root for other unix instances).
- Optionally save this configuration (to avoid doing all this in the future).
- You're good to go!
Cloud Computing: Now that we're on the cloud, let's take a few minutes and enjoy the view (I can see my house from here!)
[Diagram: You / Your server]
Cloud Computing
- Even though you need a computer to access your instance, the instance itself is a separate machine with great capabilities.
- It stays on independently of the computer you access it from (and it costs money for as long as it's on).
- Let's go over a few general topics & terms.
Cloud Computing: Security and Credentials (a.k.a. the Labyrinth)
Cloud Computing: Credentials
- Credentials serve to identify your machine to Amazon's services.
- The credentials you use vary with the type of API you're accessing.
- To get to the Credentials page, go to http://aws.amazon.com/account/ and click on Security Credentials (I forget where this is all the time).
Cloud Computing: Credentials
- Three different types of credentials for accessing APIs:
  - EC2 Key pairs: created on-the-fly at EC2 instance start, or from the EC2 console under Key pairs
  - Access keys: created when the account is created; can also be created from the Credentials page
  - X.509 Certificates: created from the Credentials page
- Other authentication methods for other services:
  - Username/password
- Account Identifiers:
  - AWS Account ID: if you want to let someone else use one of your AWS resources (like an Amazon EC2 AMI, Amazon EBS snapshot, Amazon SQS queue, etc.), you use the AWS Account ID to specify the account.
  - Canonical User ID: if you want to share Amazon S3 resources (objects and buckets) with another AWS account, you use this ID to specify the account. The ID is a long string.
- More info at: http://docs.amazonwebservices.com/awssecuritycredentials/1.0/aboutawscredentials.html
Cloud Computing: Credentials
- If you want to: make a REST or Query API request to an AWS product
  Use this credential: Access Keys
- If you want to: make a SOAP API request to an AWS product
  Use this credential: X.509 (except for S3 and Amazon Mechanical Turk, which require Access Keys)
- If you want to: access secure pages on the AWS web site or AWS Management Console
  Use this credential: Username/Password with optional Multi-Factor Authentication
- If you want to: use the Amazon EC2 command line tools
  Use this credential: X.509
- If you want to: launch or connect to an Amazon EC2 instance
  Use this credential: EC2 Key pairs
- If you want to: bundle an Amazon EC2 AMI
  Use this credential: for Linux/UNIX AMIs, X.509 & Amazon Account ID to bundle the AMI, and Access Keys to upload to S3; for Windows AMIs, Access Keys for both bundling and uploading
- If you want to: share an Amazon EC2 AMI or Amazon EBS snapshot
  Use this credential: Amazon Account ID of the account you want to share with (without hyphens)
- If you want to: create a signed URL to access Amazon CloudFront private content
  Use this credential: CloudFront Key Pairs
- If you want to: access AWS Discussion Forums or AWS Premium Support site
  Use this credential: Username/Password
Cloud Computing: Credentials - AWS Account Identifiers
- Mainly used for permissions & sharing between accounts.
- Multiple accounts can share data (personal vs lab-wide).
- S3 uses different credentials for sharing.
- Our ID is 3249-7882-2037; share with us!
- Get your account identifier; we'll use it later.
Cloud Computing: Credentials - X.509 Certificates
- The same certificates used for SSL.
- Can be created either on your own or by Amazon.
- To create one on your own, use openssl & CA.pl, but this is beyond our scope.
- You can only have two X.509 certificates at any time.
- They should be rotated every 90 days.
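
The EC2 command line tools find your X.509 certificate and private key through the EC2_CERT and EC2_PRIVATE_KEY environment variables. Here is a minimal sketch of setting them up, assuming (hypothetically) that you saved the downloaded pk-*.pem and cert-*.pem files in ~/.ec2.

```shell
#!/bin/sh
# Hypothetical location for the files downloaded from the Security Credentials page.
EC2_DIR="${EC2_DIR:-$HOME/.ec2}"

# Return 0 if the directory holds both a pk-*.pem and a cert-*.pem file.
have_x509_pair() {
    ls "$1"/pk-*.pem >/dev/null 2>&1 && ls "$1"/cert-*.pem >/dev/null 2>&1
}

if have_x509_pair "$EC2_DIR"; then
    # Environment variables read by the EC2 command line tools.
    EC2_PRIVATE_KEY=$(ls "$EC2_DIR"/pk-*.pem | head -n 1)
    EC2_CERT=$(ls "$EC2_DIR"/cert-*.pem | head -n 1)
    export EC2_PRIVATE_KEY EC2_CERT
    # Keep the private key unreadable to other users.
    chmod 600 "$EC2_PRIVATE_KEY"
    echo "Using certificate $EC2_CERT"
else
    echo "No pk-*.pem / cert-*.pem pair found in $EC2_DIR"
fi
```

Putting these lines in your shell profile saves retyping them for every session.
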
Cloud Computing: Credentials
- There are too many credentials; they make my head spin! Remember these three stooges:
  - AWS user name and password: shopping at Amazon, logging in to your AWS web console, and account management.
  - SSH key pairs: log in to and use your cloud machine.
  - X.509 certificate and key: maximize your EC2 cloud machine.
Cloud Computing: Security Groups
Cloud Computing: Security Groups
- Security groups allow you to set up firewalls for each of your instances.
- They let you save setups for various services, such as a web server or mail server.
- The default security group is rather useless (no SSH).
- Some services are pre-defined for you. Learn these, as you may want to access your instance in ways you haven't thought of (FTP, Telnet, MySQL, etc).
Cloud Computing: Security Groups - Creating a new Security Group
- Let's create one that gives us SSH access.
- Go to Console -> EC2 -> Security Groups and click Create Security Group.
- Give it a name & description.
- From the bottom pane, you can add connection methods:
  - Protocol is either TCP (connection-oriented streams) or UDP (connectionless datagrams).
  - Define a port range & a source IP or group:
    - 0.0.0.0/0: any IP
    - Localhost: only the instance
    - Your computer's IP: only your machine can access the instance
- For our example, let's add SSH and click save.
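
The same group can be created from the EC2 command line tools. This is a sketch rather than a tested recipe: the group name is hypothetical, and by default the script only prints the commands (set DRY_RUN=0 to execute them, which requires the tools and your X.509 credentials to be set up).

```shell
#!/bin/sh
DRY_RUN="${DRY_RUN:-1}"

# Print a command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

GROUP="ssh-only"   # hypothetical group name

# Create the group, then allow TCP port 22 (SSH) from any IP address.
run ec2-add-group "$GROUP" -d "SSH access only"
run ec2-authorize "$GROUP" -P tcp -p 22 -s 0.0.0.0/0
```

To limit access to your own machine, replace 0.0.0.0/0 with your IP address followed by /32.
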
Cloud Computing: And now we go back to your regularly scheduled program(ming)
Cloud Computing: Installing Software onto your AMI
Cloud Computing: Software
- We're starting with a bare instance, so we want to update the software:
    sudo apt-get update
    sudo apt-get upgrade
- Let's install some bioinformatics packages:
    sudo apt-get install bioperl
    sudo apt-get install python-biopython
- Biolinux anyone? First edit /etc/apt/sources.list and add biolinux to it:
    deb http://nebc.nerc.ac.uk/bio-linux/ unstable bio-linux
- Then refresh the package lists and install the repository keyring:
    sudo apt-get update
    sudo apt-get install bio-linux-keyring
    sudo apt-get update
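
If you script this setup, it helps to add the repository line only when it is missing, so re-running the script does not pile up duplicates. A minimal sketch follows; the helper name add_line_once is made up for illustration, and the demo writes to a scratch file rather than the real /etc/apt/sources.list.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
add_line_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Demonstrated on a scratch file. On the instance the target would be
# /etc/apt/sources.list, and the script would have to run as root.
SOURCES="${SOURCES:-$(mktemp)}"
BIOLINUX="deb http://nebc.nerc.ac.uk/bio-linux/ unstable bio-linux"

add_line_once "$SOURCES" "$BIOLINUX"
add_line_once "$SOURCES" "$BIOLINUX"   # second call changes nothing

cat "$SOURCES"
```
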
Cloud Computing: Software
- Now we can install some biolinux goodness:
    sudo apt-get install blast2 bio-linux-blast+ bio-linux-blast ncbi-tools-bin bio-linux-big-blast libncbi6
- Let's add R to our instance. Add this line to sources.list & get the GPG key:
    deb http://cran.cnr.berkeley.edu/bin/linux/ubuntu lucid/
    sudo gpg --keyserver subkeys.pgp.net --recv-key E2A11821
    sudo gpg -a --export E2A11821 | sudo apt-key add -
    sudo apt-get update
- Now we can install R and some R packages:
    sudo apt-get install r-base
    sudo apt-get install r-cran-boot r-cran-class r-cran-cluster r-cran-codetools r-cran-foreign r-cran-kernsmooth r-cran-lattice r-cran-mass r-cran-matrix r-cran-mgcv r-cran-nlme r-cran-nnet r-cran-rpart r-cran-spatial r-cran-survival r-cran-vr r-cran-rodbc
Cloud Computing: Saving your AMI
Cloud Computing: Saving & Sharing
- We've done all this work, but if we were to terminate this instance, it's all gone.
- To avoid this, we need to save our AMI for future use.
- Once we have our own AMI updated and loaded with our tools, any number of users can launch an instance with our setup.
- Think of it as saving your progress in a video game (or your kid's video game).
Cloud Computing: Saving & Sharing - A Note: Stop vs Terminate
- All instances have Terminate under the instance actions menu.
  - This will kill the instance.
  - This is how you lose all your changes.
- All instances also have Reboot.
  - Rebooting is data-safe: you'll keep your data on reboot.
- EBS-backed instances have an additional option: Stop.
  - This stops the instance so you won't get charged for running it, but keeps everything you've worked on in the instance's EBS volume.
  - If you go on to Terminate this instance, all work is still lost!
- So now we know many ways to lose all our work. How do we actually save it?
Cloud Computing: Saving & Sharing - Creating an Image
- In the Console, click on your instance, then under Instance Actions, click Create Image (EBS AMI).
- You'll probably notice you've been kicked out of your session. Amazon makes instances unavailable while it takes the snapshot.
- Do not create a snapshot while jobs are running, or you'll lose your work.
- Once the AMI is created, you may terminate your current instance and launch a new instance from your AMI. All the work you've done will be there.
Cloud Computing: Saving & Sharing - AMI Permissions
- AMIs are, by default, private. This means only your account has access to them. But you can change this.
- In the Console, go to AMIs, choose the AMI you just created, and click Permissions.
- You can share this AMI with other AWS account numbers (remember the Security and Credentials slides? They come into play here).
- You can also set this AMI to public, allowing anyone who knows the AMI ID to find it and run it.
- They cannot modify your AMI, but can create snapshots from it.
- We'll be sharing data later on.
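
Sharing can also be done from the EC2 command line tools, which expect the account ID without hyphens. The sketch below strips the hyphens and prints the command that would grant launch permission; the AMI ID is a hypothetical placeholder, and nothing is actually sent to Amazon.

```shell
#!/bin/sh
# Strip the hyphens from an AWS account ID as shown on the account page.
normalize_account_id() {
    printf '%s\n' "$1" | tr -d '-'
}

AMI_ID="ami-00000000"                                 # hypothetical: your AMI's ID
ACCOUNT_ID=$(normalize_account_id "3249-7882-2037")   # the ID from the Credentials slides

# Print the command that adds launch permission (-l -a) for that account.
echo "ec2-modify-image-attribute $AMI_ID -l -a $ACCOUNT_ID"
```

Using all in place of the account ID would make the AMI public.
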
Cloud Computing: Saving & Sharing - Speaking of Sharing: Public Data Sets
- Amazon hosts many public data sets that can be used in the cloud.
- The list of data sets is at http://bit.ly/amazonpublicdata (http://developer.amazonwebservices.com/connect/kbcategory.jspa?categoryid=243)
- Public data sets are hosted by Amazon, and are not billed to their creators.
  - Great way to unhook your big data from charging you money.
  - This is public as in public public; you give up access control rights to the data.
- To make a data set public, fill out the form at: http://developer.amazonwebservices.com/connect/entrycreate!default.jspa?categoryid=244&entrytypeid=14 (now you see why I like to shorten URLs)
Cloud Computing: Loading Data into the Clouds (with rockets)
Cloud Computing: Loading Data
- There are two services that hold file-type data: S3 and EBS.
- Pros and cons, compared on durability of data, price per GB, disk I/O operations, file sizes, speed, and ease of use:
  - Price per GB
    - S3: more expensive per GB, but you pay only for what you use
    - EBS: 1/3 cheaper, but you pay for what is allocated, not just what is used
  - File sizes
    - S3: unlimited files, but each capped at 5 GB
    - EBS: no cap, but limited to the volume size set on creation
Cloud Computing: Loading Data - Why bioinformaticists should NOT use S3
- Files are capped at 5 GB, even if you can make unlimited numbers of them.
- Crappy support from uploaders & file managers:
  - Web interface: the best interface
  - S3Fox: Firefox addon from a developer who vocally rejects updating the software
  - S3browser: not bad, but Windows-only
- Speed concerns (a 150 MB file takes 4 minutes over S3, 20 seconds over mounted EBS).
- Here's the kicker: most disk operations require transferring data to EBS regardless!
- That said, EMR uses S3 exclusively, so sometimes you can't avoid the headaches.
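
Before committing a data directory to S3, it is worth checking whether any file runs into the cap. A minimal sketch, assuming the 5 GB on the slide means 5 * 1024^3 bytes; the function name and the DATA_DIR variable are made up for illustration.

```shell
#!/bin/sh
S3_CAP_BYTES=5368709120   # assumed: 5 GB taken as 5 * 1024^3 bytes

# List regular files under a directory that are larger than a byte limit.
files_over_limit() {
    find "$1" -type f -size +"$2"c
}

# Hypothetical data directory; defaults to the current directory.
DATA_DIR="${DATA_DIR:-.}"
files_over_limit "$DATA_DIR" "$S3_CAP_BYTES"
```

An empty listing means every file fits under the cap.
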
Cloud Computing: Loading Data - Loading Data the Easy Way
- Since your instance is EBS-backed, refer to Creating an Image earlier in the talk, and do that. Your data is saved!
- A few problems with this:
  - Data is needlessly tied to the system it runs on.
  - Sharing issues: sharing this image means sharing your file system AND data.
  - If you need to restore your image to an earlier version, you revert your data as well.
Cloud Computing: Loading Data into the Clouds (Slightly Less Easy Way)
Cloud Computing: Loading Data - Loading Data the Slightly Less Easy Way
- To separate the concerns of data vs operating system, we'll need to create a separate EBS volume.
- This volume will allow you to share ONLY your data, and to update the data independently of the system you run it on.
- It allows other systems to mount your data (assuming the file system format is compatible).
Cloud Computing: Loading Data - Loading Data the Slightly Less Easy Way
- Find your instance ID: go to the Console, click on your instance, then in the detail pane copy the ID after EC2 Instance (it should begin with i-).
- Create an EBS volume in the same availability zone as your instance, or get the volume ID of one already made. This prints out the volume ID:
    ec2-create-volume --size 10 -z us-west-1a
- Use the volume ID to attach the volume to your instance as a device:
    ec2-attach-volume <volume ID> -i <instance ID> -d <device: /dev/sdd>
- Log into your instance and check that the drive is available:
    sudo fdisk -l /dev/sdd
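
When scripting these steps, the volume ID can be picked out of the ec2-create-volume output instead of being copied by hand. The sketch below assumes the output is a line starting with VOLUME followed by the volume ID; the sample line and the instance ID are illustrative, not captured from a real run.

```shell
#!/bin/sh
# Read ec2-create-volume style output on stdin and print the volume ID.
volume_id_from_output() {
    awk '$1 == "VOLUME" { print $2; exit }'
}

# Illustrative sample line (assumed format, hypothetical IDs).
SAMPLE_OUTPUT='VOLUME vol-1a2b3c4d 10 us-west-1a creating 2010-01-01T00:00:00+0000'
VOLUME_ID=$(printf '%s\n' "$SAMPLE_OUTPUT" | volume_id_from_output)
INSTANCE_ID="i-00000000"   # hypothetical: your instance ID

# Print the attach command that would follow.
echo "ec2-attach-volume $VOLUME_ID -i $INSTANCE_ID -d /dev/sdd"
```

In real use, pipe the output of ec2-create-volume into volume_id_from_output in place of the sample line.
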
Cloud Computing: Loading Data - Loading Data the Slightly Less Easy Way
- If your EBS volume is new, you'll have to create a file system on it:
    sudo mkfs.ext3 /dev/sdd
- Mount and use the new file system (pick whatever directory you like; we use /data):
    sudo mkdir /data
    sudo mount /dev/sdd /data
- And now give yourself permissions on that new directory:
    sudo chown ubuntu:ubuntu /data
- To unmount the drive from your instance:
    sudo umount /data
- To detach the volume, from your local computer (unmount it first):
    ec2-detach-volume <volume ID> -i <instance ID>
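
One easy mistake is copying data into /data when the volume is not actually mounted, in which case the files quietly land on the root volume. A minimal check, assuming a Linux instance where /proc/mounts lists the mounted file systems; the function name is made up for illustration.

```shell
#!/bin/sh
# Return 0 if a directory is listed as a mount point in a mounts table.
# The table defaults to /proc/mounts, where field 2 is the mount point.
is_mounted() {
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

if is_mounted /data; then
    echo "/data is on its own volume"
else
    echo "/data is not mounted; files written there land on the root volume"
fi
```
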
Cloud Computing: Loading Data - Loading Data the Slightly Less Easy Way
- Best ways to get data:
  - WGET: allows you to pull data from a URL
  - SCP: secure file transfer over the SSH protocol; can transfer between two machines you have permissions on
  - RSYNC: auto-resumable file transfer between two machines
  - FTP: good old FTP. Emphasis on old
- Any of these is still faster than loading data over S3.
- Don't forget public data sets! You can mount those EBS snapshots as well.
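
For reference, here is roughly what these transfers look like on the command line. The script only prints the commands; the URL, file names, key file and host name are all hypothetical placeholders, and the /data target assumes the mount point used earlier.

```shell
#!/bin/sh
# Print example transfer commands for a given key file and instance host name.
transfer_commands() {
    key=$1
    host=$2
    cat <<EOF
# On the instance: pull a file from a URL (-c resumes a partial download)
wget -c http://example.org/genome.fa.gz
# From your computer: copy one file to the instance over SSH
scp -i ${key} reads.fastq ubuntu@${host}:/data/
# From your computer: resumable sync of a whole directory
rsync -avP -e "ssh -i ${key}" results/ ubuntu@${host}:/data/results/
EOF
}

transfer_commands "my-ec2-key.pem" "ec2-203-0-113-10.us-west-1.compute.amazonaws.com"
```
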