Introduction to Cloud Computing



Similar documents
Apache Hadoop new way for the company to store and analyze big data

TP1: Getting Started with Hadoop

Introduction to HDFS. Prasanth Kothuri, CERN

Setup Hadoop On Ubuntu Linux. ---Multi-Node Cluster

CS380 Final Project Evaluating the Scalability of Hadoop in a Real and Virtual Environment

How To Use Hadoop

map/reduce connected components

Single Node Setup. Table of contents

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components

Extreme computing lab exercises Session one

Hadoop Lab Notes. Nicola Tonellotto November 15, 2010

MapReduce. Tushar B. Kute,

Introduction to HDFS. Prasanth Kothuri, CERN

Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc

Apache Hadoop. Alexandru Costan

Extreme computing lab exercises Session one

Hadoop (pseudo-distributed) installation and configuration

Hadoop Architecture. Part 1

The objective of this lab is to learn how to set up an environment for running distributed Hadoop applications.

Introduction to MapReduce and Hadoop

HSearch Installation

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA

How To Install Hadoop From Apa Hadoop To (Hadoop)

Hadoop Training Hands On Exercise

HDFS Installation and Shell

Running Hadoop on Windows CCNP Server

Distributed Filesystems

Hadoop. Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware.

Hadoop MultiNode Cluster Setup

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)

CS242 PROJECT. Presented by Moloud Shahbazi Spring 2015

Single Node Hadoop Cluster Setup

Hadoop Setup. 1 Cluster

Hadoop Hands-On Exercises

From Relational to Hadoop Part 1: Introduction to Hadoop. Gwen Shapira, Cloudera and Danil Zburivsky, Pythian

MapReduce, Hadoop and Amazon AWS

Weekly Report. Hadoop Introduction. submitted By Anurag Sharma. Department of Computer Science and Engineering. Indian Institute of Technology Bombay

Research Laboratory. Java Web Crawler & Hadoop MapReduce Anri Morchiladze && Bachana Dolidze Supervisor Nodar Momtselidze

L1: Introduction to Hadoop

Installing Hadoop. You need a *nix system (Linux, Mac OS X, ) with a working installation of Java 1.7, either OpenJDK or the Oracle JDK. See, e.g.

研 發 專 案 原 始 程 式 碼 安 裝 及 操 作 手 冊. Version 0.1

Lecture 2 (08/31, 09/02, 09/09): Hadoop. Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015

Apache Hadoop 2.0 Installation and Single Node Cluster Configuration on Ubuntu A guide to install and setup Single-Node Apache Hadoop 2.

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee

Hadoop Distributed File System. Dhruba Borthakur June, 2007

How To Write A Mapreduce Program On An Ipad Or Ipad (For Free)

Chapter 7. Using Hadoop Cluster and MapReduce

HADOOP MOCK TEST HADOOP MOCK TEST I

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee June 3 rd, 2008

PaRFR : Parallel Random Forest Regression on Hadoop for Multivariate Quantitative Trait Loci Mapping. Version 1.0, Oct 2012

Hadoop Basics with InfoSphere BigInsights

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Recommended Literature for this Lecture

Yahoo! Grid Services Where Grid Computing at Yahoo! is Today

CSE 344 Introduction to Data Management. Section 9: AWS, Hadoop, Pig Latin TA: Yi-Shu Wei

Installation Guide Setting Up and Testing Hadoop on Mac By Ryan Tabora, Think Big Analytics

Prepared By : Manoj Kumar Joshi & Vikas Sawhney

A very short Intro to Hadoop

5 HDFS - Hadoop Distributed System

Big Business, Big Data, Industrialized Workload

What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea

Introduction to Cloud Computing

CactoScale Guide User Guide. Athanasios Tsitsipas (UULM), Papazachos Zafeirios (QUB), Sakil Barbhuiya (QUB)

Introduc)on to Map- Reduce. Vincent Leroy

Mrs: MapReduce for Scientific Computing in Python

Hadoop Installation MapReduce Examples Jake Karnes

How to install Apache Hadoop in Ubuntu (Multi node setup)

Using BAC Hadoop Cluster

Introduction to Running Hadoop on the High Performance Clusters at the Center for Computational Research

HDFS. Hadoop Distributed File System

A Cost-Evaluation of MapReduce Applications in the Cloud

E6893 Big Data Analytics: Demo Session for HW I. Ruichi Yu, Shuguan Yang, Jen-Chieh Huang Meng-Yi Hsu, Weizhen Wang, Lin Haung.

Sriram Krishnan, Ph.D.

RDMA for Apache Hadoop User Guide

Big Data Analytics(Hadoop) Prepared By : Manoj Kumar Joshi & Vikas Sawhney

Hadoop Distributed File System. T Seminar On Multimedia Eero Kurkela

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Hadoop Multi-node Cluster Installation on Centos6.6

IBM Software Hadoop Fundamentals

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Tutorial: Using HortonWorks Sandbox 2.3 on Amazon Web Services

Hadoop Tutorial. General Instructions

Integrating SAP BusinessObjects with Hadoop. Using a multi-node Hadoop Cluster

Hadoop Data Warehouse Manual

Performance and Energy Efficiency of. Hadoop deployment models

Big Data 2012 Hadoop Tutorial

Running Kmeans Mapreduce code on Amazon AWS

Hadoop 只 支 援 用 Java 開 發 嘛? Is Hadoop only support Java? 總 不 能 全 部 都 重 新 設 計 吧? 如 何 與 舊 系 統 相 容? Can Hadoop work with existing software?

How to properly misuse Hadoop. Marcel Huntemann NERSC tutorial session 2/12/13

Getting to know Apache Hadoop

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. Big Data Management and Analytics

Set JAVA PATH in Linux Environment. Edit.bashrc and add below 2 lines $vi.bashrc export JAVA_HOME=/usr/lib/jvm/java-7-oracle/

Extreme Computing. Hadoop MapReduce in more detail.

Developing a MapReduce Application

2.1 Hadoop a. Hadoop Installation & Configuration

Transcription:

Introduction to Cloud Computing Qloud Demonstration 15 319, spring 2010 3 rd Lecture, Jan 19 th Suhail Rehman

Time to check out the Qloud! Enough Talk! Time for some Action! Finally you can have your own Cloud (Virtual Machines)! Get your own Cloud from Qloud! 2

Time to check out the Qloud! Enough Talk! Time for some Action! Finally you can have your own Cloud (Virtual Machines)! Get your own Cloud from Qloud! http://www.lgloop.com/images/2183 3

User s Qloud Perspective

Important Qloud servers and interfaces Hadoop Server hadoop.qatar.cmu.edu User workspace (Hadoop/Eclipse) AFS access and login Cloud Gateway Server cloud 01 14.qatar.cmu.edu Gives you access the virtualized resources of the cloud Will be a SOCKS proxy for all your Cloud and Hadoop tasks Qloud Web Interface http://10.160.0.100:9080/cloud/ Easy web interface to request your Cloud Once provisioned, you can checkout the vital stats of your cloud

Steps to get your own Cloud Set the Cloud Gateway Server as the SOCKS proxy in your Browser Log on to the Qloud Web Interface and request your Cloud Wait for our uber geek (aka Brian) to approve Once Brian approves it, you ll have your cloud in 2 hours The entire process should take less than 24 hours You cannot request a cloud at 2am and expect it to be ready at 4am

Qloud Web Interface

It is time for Hadoop The Hadoop infrastructure allows you to run map reduce jobs distributed over your virtual machines In Hadoop MapReduce, one node is designated as the Master Node, and the rest are slaves. HDFS requires one Namenode and several Datanodes. In our setup, the Master Node and Namenode are the same machine. MapReduce Your Cloud Master Node HDFS Slaves Namenode Master Node Datanodes Slaves Namenode Datanodes

Hadoop on Your Cloud Slaves and Datanodes Master Node and Namenode

Where to go from here? Logon to your Master Node ssh to cloud 01 14.qatar.cmu.edu and then ssh to your master node Setup Hadoop Fortunately, your VM s automatically have the correct configuration files for Hadoop the moment they are provisioned (Thanks to Brian!) All you need to do is format HDFS and start the Hadoop services. Lets try running some sample code on Hadoop

Sample MapReduce Code Estimate π Estimating π by random sampling Imagine you have a dart board like so: π is simply the (ratio of darts that land inside the circle to the total number of darts thrown) times 4

Writing this as a Serial Program Throw N darts on the board. Each dart lands at a random position (x,y) on the board. Note if each dart landed inside the circle or not Check if x 2 +y 2 <r Take the total number of darts that landed in the circle as S 4 ( ) = π S N

But I have Millions of Darts! If you want to get an accurate estimate of Pi, you need a large number of random samples. Notice that each dart can be thrown at any time and it s position can be evaluated independently With one person throwing all the darts, it will take a long time to finish If we had N people throwing a dart each, this would be much faster!

But I have Millions of Darts!

How do you do this in Parallel? Let (x,y) be a random position of the dart inside the square. Each (x,y) pair can be evaluated independently. Let us map each (x,y) pair to a result the result being whether it is inside the circle (1) or not (0). Input Result (x1,y1) 1 (x2,y2) (x3,y3) (x4,y4) (x5,y5) 0 1 0 1

The Map function A Map function takes input values and produces an output for each input value in parallel. Input Result (x1,y1) 1 (x2,y2) 0 (x3,y3) 1 (x4,y4) 0 (x5,y5) 1 Map Function

..and then? So we have results of each (x,y) pair lots of them We need to find the number of points inside the circle. We need to sum up the values Result 1 0 1 0 1 SUM S

The Reduce Function A Reduce function takes input values from the Map functions and produces output using a user defined operation. Result 1 0 1 0 1 REDUCE S In this case, addition is the reduce operation.

What about Pi? Now that we have the total number of points inside the circle, S and the total number of points N we ve sampled 4 ( S ) = π * N *Subject to Terms and Conditions 1. N should be large 2. Points should be chosen uniformly at random

Running PI MapReduce Code The MapReduce code creates random (x,y) pair values It gives each node a number of (x,y) pairs and evaluates if it s in the circle or not (MAP) Then some nodes will collect the results of these samples, evaluate the percentage and calculate π (REDUCE) Running the hadoop example: hadoop jar hadoop-0.20.1-examples.jar pi 10 100 Run a jar file The Jar file Name of the java class #maps #samples per map

Working with Files in Hadoop Notice that the Pi example randomly generates input, it does not require any user files. Hadoop is mainly used to work with large data, and large data is always in a file. HDFS to the rescue!

HDFS Basics HDFS is the Hadoop Distributed File System. Files are distributed over all four nodes and are triplereplicated, by default, to tolerate failure.

HDFS Commands All commands begin with hadoop dfs UNIX command Hadoop HDFS Command ls / hadoop dfs ls / cat /dir/filename hadoop dfs cat /dir/filename mkdir dir1 rm /dir/filename hadoop dfs mkdir /dir1 hadoop dfs rm /dir/filename rm r /dir hadoop dfs rmr /dir

Handling Files in HDFS To add files to HDFS: hadoop dfs put localfilename /hdfs_dir/remotefilename To copy files from HDFS to local filesystem hadoop dfs get /hdfs_dir/remotefilename localfilename To copy files inside HDFS filesystem hadoop dfs cp /hdfs_dir/file1 /hdfs_dir/file2

Keeping track of your Hadoop & HDFS Hadoop MapReduce has a JobTracker web interface Keeps Track of the submitted jobs, time taken, errors, logs etc. http://master_node_ip:50030 The HDFS Namenode also maintains a web interface Browse your HDFS files See how much disk space you have remaining in your HDFS. http://name_node_ip:50070

Setting up Eclipse Might be easier to work with an IDE when developing large applications in Hadoop. Eclipse is available on hadoop.qatar.cmu.edu with the MapReduce plugin Setup and Run eclipse @ hadoop.qatar.cmu.edu Use xwin32 on windows machines to run eclipse remotely Configure Eclipse to use your cloud Start developing MapReduce applications