Introduction to Cloud Computing
Qloud Demonstration, Spring rd Lecture, Jan 19th
Suhail Rehman
Time to check out the Qloud!
Enough talk! Time for some action! Finally you can have your own Cloud (Virtual Machines)! Get your own Cloud from Qloud!
User's Qloud Perspective
Important Qloud servers and interfaces
Hadoop Server (hadoop.qatar.cmu.edu): your user workspace (Hadoop/Eclipse), with AFS access and login.
Cloud Gateway Server (cloud qatar.cmu.edu): gives you access to the virtualized resources of the cloud, and will be the SOCKS proxy for all your Cloud and Hadoop tasks.
Qloud Web Interface: an easy web interface to request your Cloud. Once it is provisioned, you can check out the vital stats of your cloud.
Steps to get your own Cloud
1. Set the Cloud Gateway Server as the SOCKS proxy in your browser.
2. Log on to the Qloud Web Interface and request your Cloud.
3. Wait for our uber geek (aka Brian) to approve the request.
4. Once Brian approves it, you'll have your cloud in 2 hours.
The entire process should take less than 24 hours, so you cannot request a cloud at 2am and expect it to be ready at 4am.
Qloud Web Interface
It is time for Hadoop
The Hadoop infrastructure allows you to run MapReduce jobs distributed over your virtual machines. In Hadoop MapReduce, one node is designated as the Master Node and the rest are slaves. HDFS requires one Namenode and several Datanodes. In our setup, the Master Node and the Namenode are the same machine, and the slaves double as the Datanodes.
(Figure: your cloud, showing the MapReduce Master Node and slaves alongside the HDFS Namenode and Datanodes.)
Hadoop on Your Cloud
(Figure: one machine acts as the Master Node and Namenode; the remaining machines act as Slaves and Datanodes.)
Where to go from here?
Log on to your Master Node: ssh to cloud qatar.cmu.edu, and from there ssh to your master node.
Set up Hadoop: fortunately, your VMs automatically have the correct configuration files for Hadoop the moment they are provisioned (thanks to Brian!). All you need to do is format HDFS and start the Hadoop services.
Let's try running some sample code on Hadoop.
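Sample MapReduce Code: Estimate π
Estimating π by random sampling: imagine a square dart board with a circle drawn inside it that just touches all four sides. If darts land uniformly at random on the board, π is approximately the ratio of darts that land inside the circle to the total number of darts thrown, times 4.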
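Where the factor of 4 comes from, assuming the circle has radius r so that the square around it has side 2r (r is the radius used in the inside-the-circle check on the next slide): a uniformly random dart lands inside the circle with probability equal to the ratio of the two areas.

```latex
P(\text{inside}) = \frac{\text{area of circle}}{\text{area of square}}
= \frac{\pi r^2}{(2r)^2} = \frac{\pi}{4}
\quad\Longrightarrow\quad
\pi = 4\,P(\text{inside}) \approx 4 \cdot \frac{\text{darts inside the circle}}{\text{darts thrown}}
```

The radius cancels, so the size of the board does not matter; only the fraction of darts inside the circle does.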
Writing this as a Serial Program
Throw N darts on the board. Each dart lands at a random position (x,y) on the board.
Note whether each dart landed inside the circle or not: with the circle centered at the origin, check if $x^2 + y^2 < r^2$.
Take the total number of darts that landed in the circle as S. Then $\pi \approx 4\,\frac{S}{N}$.
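A minimal Java sketch of the serial program, assuming r = 1 and a board that spans [-1, 1] in both x and y. The class name SerialPi, the method estimatePi and the fixed seed are illustrative choices, not code from the course.

```java
import java.util.Random;

// Serial Monte Carlo estimate of pi, following the steps on the slide.
// Assumption: the board is the square [-1, 1] x [-1, 1] and the circle has radius r = 1.
class SerialPi {
    static double estimatePi(long n, long seed) {
        Random rng = new Random(seed);
        long s = 0; // S: darts that landed inside the circle
        for (long i = 0; i < n; i++) {
            // nextDouble() is uniform in [0, 1); rescale it to [-1, 1)
            double x = 2 * rng.nextDouble() - 1;
            double y = 2 * rng.nextDouble() - 1;
            if (x * x + y * y < 1.0) { // x^2 + y^2 < r^2 with r = 1
                s++;
            }
        }
        return 4.0 * s / n; // pi is approximately 4 * S / N
    }

    public static void main(String[] args) {
        System.out.println(estimatePi(1_000_000, 42));
    }
}
```

One person throws every dart in a single loop, so the running time grows linearly with N; that is the bottleneck the next slides attack.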
But I have Millions of Darts!
If you want an accurate estimate of Pi, you need a large number of random samples. Notice that each dart can be thrown at any time and its position can be evaluated independently. With one person throwing all the darts, it will take a long time to finish. If we had N people throwing a dart each, this would be much faster!
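Because every dart is independent, the work can be split among many workers. On a single multi-core machine, a Java parallel stream is a rough stand-in for "N people throwing a dart each": the JVM splits the darts into chunks and evaluates them on several threads. This is a sketch under the same unit-circle assumption as the serial version; ParallelDarts is a hypothetical name, and this is not how Hadoop distributes work.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.LongStream;

// Sketch: each dart is thrown and checked independently, so the darts can be
// spread across the available CPU cores with a parallel stream.
// Assumption: board [-1, 1] x [-1, 1], circle of radius 1 centred at the origin.
class ParallelDarts {
    static double estimatePi(long n) {
        long inside = LongStream.range(0, n)
                .parallel()
                .filter(i -> {
                    // ThreadLocalRandom gives each worker thread its own generator,
                    // so the threads do not contend on one shared Random
                    ThreadLocalRandom rng = ThreadLocalRandom.current();
                    double x = rng.nextDouble(-1.0, 1.0);
                    double y = rng.nextDouble(-1.0, 1.0);
                    return x * x + y * y < 1.0;
                })
                .count(); // S: darts inside the circle
        return 4.0 * inside / n;
    }

    public static void main(String[] args) {
        System.out.println(estimatePi(10_000_000L));
    }
}
```

No dart needs to know about any other dart until the final count, which is exactly the property MapReduce exploits.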
How do you do this in Parallel?
Let (x,y) be a random position of the dart inside the square. Each (x,y) pair can be evaluated independently.
Let us map each (x,y) pair to a result: whether it is inside the circle (1) or not (0). For example, (x1,y1) maps to 1, and the pairs (x2,y2) through (x5,y5) are evaluated the same way.
The Map function
A Map function takes input values and produces an output for each input value, in parallel.
Input      Result
(x1,y1)    1
(x2,y2)    0
(x3,y3)    1
(x4,y4)    0
(x5,y5)    1
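A minimal Java sketch of the Map step. The five concrete points are made up for illustration so that their results reproduce the 1, 0, 1, 0, 1 pattern in the table; the circle is assumed to have radius 1 and be centred at the origin, and MapStep, Point, map and mapAll are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the Map step: each (x, y) pair is mapped, independently of the others,
// to 1 if it lies inside the circle and 0 otherwise.
// Assumption: unit circle centred at the origin; the sample points are illustrative.
class MapStep {
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    // The "map function": one input pair in, one result out.
    static int map(Point p) {
        return p.x * p.x + p.y * p.y < 1.0 ? 1 : 0;
    }

    static List<Integer> mapAll(List<Point> points) {
        // parallelStream() may evaluate the pairs on several threads;
        // the collected list still keeps the input order.
        return points.parallelStream().map(MapStep::map).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Point> points = List.of(
                new Point(0.1, 0.2), new Point(0.9, 0.8), new Point(-0.5, 0.5),
                new Point(-0.7, -0.9), new Point(0.3, -0.6));
        System.out.println(mapAll(points)); // [1, 0, 1, 0, 1]
    }
}
```

Because map looks only at its own input, it does not matter which thread (or, in Hadoop, which node) evaluates each pair.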
...and then?
So we have a result for each (x,y) pair, and lots of them. We need the number of points inside the circle, S, so we need to sum up the results.
The Reduce Function
A Reduce function takes the output values of the Map functions and combines them into a single output using a user-defined operation. In this case, addition is the reduce operation: summing all the Results gives S.
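A minimal Java sketch of a Reduce step that takes the user-defined operation as a parameter. ReduceStep and reduce are hypothetical names; with addition as the operation, the Map results from the table fold into S = 3.

```java
import java.util.List;
import java.util.function.BinaryOperator;

// Sketch of the Reduce step: fold all map results into one value using a
// user-defined operation. For the pi example the operation is addition,
// which turns the list of 0/1 results into S, the number of darts inside the circle.
class ReduceStep {
    static int reduce(List<Integer> results, int identity, BinaryOperator<Integer> op) {
        int acc = identity;
        for (int r : results) {
            acc = op.apply(acc, r); // combine the running value with the next result
        }
        return acc;
    }

    public static void main(String[] args) {
        List<Integer> results = List.of(1, 0, 1, 0, 1); // the Map output from the table
        int s = reduce(results, 0, Integer::sum);
        System.out.println("S = " + s); // S = 3
    }
}
```

Swapping Integer::sum for another operation (for example Integer::max) changes what is computed without touching the loop; the standard-library equivalent is results.stream().reduce(0, Integer::sum).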
What about Pi?
Now that we have the total number of points inside the circle, S, and the total number of points we've sampled, N:
$\pi \approx 4\,\frac{S}{N}$*
*Subject to terms and conditions:
1. N should be large.
2. Points should be chosen uniformly at random.
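Why condition 1 matters: if the points are uniform (condition 2), each dart is an independent trial that lands inside the circle with probability p = π/4, so S is binomially distributed and the typical error (standard deviation) of the estimate 4S/N is:

```latex
\sigma\!\left(4\,\frac{S}{N}\right) = 4\sqrt{\frac{p\,(1-p)}{N}} \approx \frac{1.64}{\sqrt{N}},
\qquad p = \frac{\pi}{4}
```

For N = 10^6 darts this is about 0.0016, so the estimate is typically good to two or three decimal places. Because the error shrinks like 1/√N, each additional correct digit needs about 100 times more darts.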
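Running PI MapReduce Code
The MapReduce code creates random (x,y) pair values.
It gives each map task a number of (x,y) pairs, and each task evaluates whether its pairs are inside the circle or not (MAP).
Then a reduce task collects the results of these samples, computes the fraction inside the circle, and calculates π (REDUCE).
Running the Hadoop example:
hadoop jar hadoop-examples.jar pi <#maps> <#samples per map>
hadoop jar: run a jar file
hadoop-examples.jar: the jar file
pi: the name of the example program, which the examples jar maps to the Java class that implements it
<#maps>: the number of map tasks
<#samples per map>: the number of (x,y) samples each map task evaluates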
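To see how the two numbers on the command line split the work, here is a single-machine Java sketch: a thread pool stands in for the cluster, numMaps tasks each test samplesPerMap points (MAP), and the per-task counts are summed (REDUCE). This is not the Hadoop example's own code; PiJobSketch, run, and the per-task seeds are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Single-machine sketch of "pi <#maps> <#samples per map>": numMaps independent map
// tasks each test samplesPerMap points, and one reduce step adds up their counts.
// Threads stand in for cluster nodes; this is NOT the Hadoop example's own code.
class PiJobSketch {
    static double run(int numMaps, long samplesPerMap) {
        ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<Future<Long>> mapOutputs = new ArrayList<>();
            for (int m = 0; m < numMaps; m++) {
                final long seed = m; // each map task gets its own seed, so tasks draw different points
                mapOutputs.add(pool.submit(() -> {
                    Random rng = new Random(seed);
                    long inside = 0;
                    for (long i = 0; i < samplesPerMap; i++) {
                        double x = 2 * rng.nextDouble() - 1;
                        double y = 2 * rng.nextDouble() - 1;
                        if (x * x + y * y < 1.0) {
                            inside++;
                        }
                    }
                    return inside; // MAP output: this task's darts inside the circle
                }));
            }
            long s = 0;
            for (Future<Long> f : mapOutputs) {
                try {
                    s += f.get(); // REDUCE: add up the per-task counts
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for a map task", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("map task failed", e.getCause());
                }
            }
            long n = (long) numMaps * samplesPerMap; // N: total number of darts
            return 4.0 * s / n;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Same shape of job as running the example with 10 maps and 100000 samples per map
        System.out.println(run(10, 100_000));
    }
}
```

The total sample count is #maps times #samples per map; giving each task a batch of points, rather than one point each, keeps the per-task overhead small relative to the useful work.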
Working with Files in Hadoop
Notice that the Pi example generates its own random input; it does not require any user files. Hadoop is mainly used to work with large data, and large data almost always lives in files. HDFS to the rescue!
HDFS Basics
HDFS is the Hadoop Distributed File System. Files are split into blocks that are distributed over all four nodes, and by default each block is triple-replicated so that your data survives a node failure.
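A toy Java model of "distributed and triple-replicated": split a file into fixed-size blocks and put each block on three distinct nodes out of four. The 64 MB block size and the round-robin placement are assumptions for illustration only; real HDFS chooses Datanodes with its own rack-aware placement policy, and BlockPlacementToy and place are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of how a file is spread over the cloud: split it into fixed-size
// blocks and store each block on `replication` distinct nodes. Real HDFS picks
// Datanodes with a rack-aware policy; round-robin here is only for illustration.
class BlockPlacementToy {
    static final long BLOCK_SIZE = 64L * 1024 * 1024; // assumed 64 MB block size

    // Returns, for each block, the indices of the nodes holding a replica of it.
    static List<List<Integer>> place(long fileSize, int numNodes, int replication) {
        if (replication > numNodes) {
            throw new IllegalArgumentException("cannot place more replicas than nodes");
        }
        long numBlocks = (fileSize + BLOCK_SIZE - 1) / BLOCK_SIZE; // ceiling division
        List<List<Integer>> placement = new ArrayList<>();
        for (long b = 0; b < numBlocks; b++) {
            List<Integer> nodes = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                // consecutive node indices (mod numNodes) are distinct while replication <= numNodes
                nodes.add((int) ((b + r) % numNodes));
            }
            placement.add(nodes);
        }
        return placement;
    }

    public static void main(String[] args) {
        long fileSize = 200L * 1024 * 1024; // a 200 MB file
        List<List<Integer>> placement = place(fileSize, 4, 3);
        for (int b = 0; b < placement.size(); b++) {
            System.out.println("block " + b + " -> nodes " + placement.get(b));
        }
    }
}
```

For the 200 MB file this prints four blocks placed on nodes [0, 1, 2], [1, 2, 3], [2, 3, 0] and [3, 0, 1]: losing any single node still leaves two copies of every block, at the price of 3 × 200 MB = 600 MB of raw disk.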
HDFS Commands
All commands begin with hadoop dfs, followed by the UNIX command name as an option with a leading dash (note that rm -r becomes -rmr).
UNIX command         Hadoop HDFS command
ls /                 hadoop dfs -ls /
cat /dir/filename    hadoop dfs -cat /dir/filename
mkdir /dir1          hadoop dfs -mkdir /dir1
rm /dir/filename     hadoop dfs -rm /dir/filename
rm -r /dir           hadoop dfs -rmr /dir
Handling Files in HDFS
To add files to HDFS:
hadoop dfs -put localfilename /hdfs_dir/remotefilename
To copy files from HDFS to the local filesystem:
hadoop dfs -get /hdfs_dir/remotefilename localfilename
To copy files inside the HDFS filesystem:
hadoop dfs -cp /hdfs_dir/file1 /hdfs_dir/file2
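If you want to script these HDFS commands from a Java program rather than type them, one option is to launch the same hadoop dfs command lines with the standard ProcessBuilder. This is a hypothetical helper (DfsShell, command and run are made-up names) that assumes the hadoop launcher is on the PATH of the machine it runs on, for example your Master Node; it does not use Hadoop's Java API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds "hadoop dfs -<cmd> <args...>" command lines like the
// ones above and runs them with ProcessBuilder.
// Assumption: the `hadoop` launcher is on the PATH of the machine running this code.
class DfsShell {
    static List<String> command(String dfsCmd, String... args) {
        List<String> cmd = new ArrayList<>(Arrays.asList("hadoop", "dfs", "-" + dfsCmd));
        cmd.addAll(Arrays.asList(args));
        return cmd;
    }

    // Runs the command with its output shown on this program's console,
    // and returns the exit code (0 means success).
    static int run(String dfsCmd, String... args) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command(dfsCmd, args)).inheritIO().start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        run("put", "localfilename", "/hdfs_dir/remotefilename");
        run("ls", "/hdfs_dir");
        run("get", "/hdfs_dir/remotefilename", "localfilename");
    }
}
```

Passing the arguments as a list (rather than one string) avoids shell quoting problems with file names; checking the returned exit code tells you whether each HDFS operation succeeded.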
Keeping track of your Hadoop & HDFS
Hadoop MapReduce has a JobTracker web interface, which keeps track of the submitted jobs, time taken, errors, logs, etc.
The HDFS Namenode also maintains a web interface, where you can browse your HDFS files and see how much disk space you have remaining in your HDFS.
Setting up Eclipse
It might be easier to work with an IDE when developing large applications in Hadoop. Eclipse is available on hadoop.qatar.cmu.edu with the MapReduce plugin.
Setup and run:
Run Eclipse on hadoop.qatar.cmu.edu (use X-Win32 on Windows machines to run Eclipse remotely).
Configure Eclipse to use your cloud.
Start developing MapReduce applications.