FREE computing using Amazon EC2
Seong-Hwan Jun
Department of Statistics, Univ. of British Columbia
Nov 1st, 2012 / Student seminar
Outline
- Basics of servers
- Amazon EC2
- Setup R on an EC2 instance
- Stat department servers
What is a server?
- Basically a computer just like the one you have, but with no monitor
- Usually there are a lot of them:
  - Stacked on a rack
  - Connected in a network for fast transfer of data between the computers
  - Sharing hard drive storage and, in many cases, memory (RAM) as well
- Its only purpose is to process something, like crunching numbers; that's why there is no need for a monitor
- Usually a variant of Unix is installed, because it is generally considered more stable than Windows
- You connect to it via SSH
When should you run jobs on the servers?
- Jobs that take a long time to finish
- When you have multiple jobs that can run in parallel
- Pretty much all the time, because you need your computer to do other things like... Facebook
- Why not? These machines do nothing except crunch numbers and process jobs
When NOT to run jobs on the servers
- During the development stage of your code, run small jobs on your own computer to get quick results and verify the correctness of your code as you develop
Cloud computing
- Cloud computing is the concept of not having to know where your servers are located
- These computers are out there somewhere in the clouds of servers...
- When you launch a job into the cloud, one of the available computers will get the job and run it
- You won't know exactly which computer is running your job
- But you know how to access it to retrieve your results
Amazon Elastic Compute Cloud
- EC2 for short
- An individual computer on Amazon's cloud of computers is referred to as an instance
- There are many types of instances: micro, small, medium, large, extra large, and so on
- Only the micro instances are free...
- But the other instances are quite cheap if you ever need fast computers
How to use EC2 instances
1. Sign up for an account
2. You need to provide your credit card information; make sure you read the rules carefully so that you don't get charged
3. Once you sign up, you get 750 free hours of computing per month!
4. You can use these hours any way you want: for example, you can get 10 EC2 instances at once and run 10 jobs (1 job per instance) simultaneously for 75 hours, or get one instance and run a job on it for 750 hours
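The arithmetic behind the 10-instance example can be checked with a few lines of shell. This is only a sketch: the 750-hour budget is the figure quoted on the slide, and the function name hours_per_instance is made up for illustration.

```shell
#!/bin/sh
# Free monthly budget quoted on the slide (hours of micro-instance time).
FREE_HOURS=750

# hours_per_instance N: hours each of N simultaneously running instances
# can use before the shared monthly budget is spent (integer division).
hours_per_instance() {
    echo $(( FREE_HOURS / $1 ))
}

hours_per_instance 10   # prints 75
hours_per_instance 1    # prints 750
```

The point is that the free hours are a single pool: running more instances at once uses it up proportionally faster.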
Free tier usage
Other Amazon services
- Amazon offers a wide variety of services under the brand name Amazon Web Services (details: http://aws.amazon.com/)
- The most useful services for us are EBS and S3: storage for very large and frequently used data (GBs or even more)
- These data are easily accessible from the EC2 instances
- EBS is free up to 30 GB
- S3 is not free but quite cheap
Here are some of the things you can do with AWS:
- MapReduce for natural language processing (e.g., counting n-grams)
- Any machine learning problem where the data does not fit on your personal computer
- Scientific computing: R, MATLAB, Python, Java, C++, etc.
- Storing genome sequences (human and other species) on EBS or S3 and processing them using EC2 instances
- Amazon has many large datasets publicly available: http://aws.amazon.com/datasets?_encoding=UTF8&jiveRedirect=1
Creating a free instance
- The instructions are well described here: http://www.r-bloggers.com/automating-r-scripts-on-amazon-ec2/
- You can also Google keywords such as "Amazon EC2 R", or another combination of relevant keywords, for step-by-step instructions.
Key pair
- Logging in to our department server requires a username and password.
- To log in to an EC2 instance, you use something called a key pair.
- The private key is a file that you download once, when you create the key pair, and keep on your computer
- You provide this file when you log in to your instance
Logging in: Public DNS
Logging in: Commands
Now you can log in using the key-pair file and the public DNS:
1. chmod 400 key.pem
2. ssh -i key.pem ubuntu@public-dns
Example:
ssh -i key.pem ubuntu@ec2-184-169-246-55.us-west-1.compute.amazonaws.com
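To avoid retyping the long public DNS, the login command can be assembled from a few variables. The sketch below only prints the commands, so it is safe to run anywhere. The function names are hypothetical, the host is the example from the slide, and the scp line for copying a script to the instance is an addition that the slides do not cover.

```shell
#!/bin/sh
# Replace these with your own key file and your instance's public DNS.
KEY="key.pem"
HOST="ec2-184-169-246-55.us-west-1.compute.amazonaws.com"
USER_NAME="ubuntu"   # login name used in the slide's example

# Print the login command so it can be inspected before running it.
login_command() {
    echo "ssh -i $KEY $USER_NAME@$HOST"
}

# Print a command that would copy a local file ($1) to the home
# directory on the instance (assumption: scp is available locally).
upload_command() {
    echo "scp -i $KEY $1 $USER_NAME@$HOST:~/"
}

login_command
upload_command job.R   # job.R is a hypothetical script name
```

Copy a printed line into the terminal to actually run it; remember to run chmod 400 on the key file first, as in step 1.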
Installing R
Type the following commands:
1. sudo apt-get update
2. sudo apt-get -y install r-base
3. Type R at the command prompt to check that the installation was successful
The second command is the one that installs R. It may take up to a few minutes.
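Step 3 can also be done without opening an interactive R session. The following is a minimal sketch, assuming a POSIX shell; has_command is a hypothetical helper name.

```shell
#!/bin/sh
# has_command NAME: succeeds if NAME is found on the PATH.
has_command() {
    command -v "$1" >/dev/null 2>&1
}

if has_command R; then
    # The first line of the output reports the installed R version.
    R --version | head -n 1
else
    echo "R not found; run: sudo apt-get -y install r-base"
fi
```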
Running R jobs
- Refer to Song Cai's slides or search Google yourself (most of you know how to do it already).
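For completeness, here is one common way to run a long R job so that it survives logging out of the instance. This is only a sketch and not necessarily the method in Song Cai's slides; run_job and job.R are hypothetical names.

```shell
#!/bin/sh
# run_job SCRIPT: start an R script in the background so that it keeps
# running after you log out; output and errors go to SCRIPT.log.
run_job() {
    nohup Rscript "$1" > "$1.log" 2>&1 &
    echo "started $1 with PID $!"
}

# job.R is a hypothetical script name; nothing is started if it is absent.
if [ -f job.R ]; then
    run_job job.R
fi
```

While the job runs, tail -f job.R.log shows its output as it is written.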
Things that you can do on an EC2 instance
- Run Java, C, C++, Fortran, and other jobs
- Host a web server, so you can get your results via your own private website
- One example usage:
  1. Use C++, Java, or R to connect to your stock broker's trading platform (API)
  2. Run your trading algorithm on multiple EC2 instances
  3. Process the results at night using R on EC2
  4. View the results through your website on your phone, on the bus to school or during a boring morning class
- Or you can just run your R code with different inputs on different EC2 instances
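The last idea, one input per instance, can be sketched as a loop that prints one ssh command per host. Everything specific here is an assumption for illustration: the host names, the input files, the script name job.R, and the helper remote_job_command. The commands are printed rather than executed.

```shell
#!/bin/sh
KEY="key.pem"

# remote_job_command HOST INPUT: print an ssh command that would start
# job.R on HOST with INPUT as its argument, in the background.
remote_job_command() {
    echo "ssh -i $KEY ubuntu@$1 'nohup Rscript job.R $2 > $2.log 2>&1 < /dev/null &'"
}

# One "host input" pair per line (hypothetical names).
while read -r host input; do
    remote_job_command "$host" "$input"
done <<EOF
ec2-host-1.example.com input1.csv
ec2-host-2.example.com input2.csv
EOF
```

Each instance needs R installed and a copy of job.R and its input file before the printed commands would work.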
How to use stat department servers...
- PhD student Song Cai gave a talk on it last year, and he asked me to give one on it this year.
- His slides are very good (concise), so we will just go over them together.