What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea



Similar documents
CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)

Jeffrey D. Ullman slides. MapReduce for data intensive computing

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 15

Hadoop 只 支 援 用 Java 開 發 嘛? Is Hadoop only support Java? 總 不 能 全 部 都 重 新 設 計 吧? 如 何 與 舊 系 統 相 容? Can Hadoop work with existing software?

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14

Processing of massive data: MapReduce. 2. Hadoop. New Trends In Distributed Systems MSc Software and Systems

Prepared By : Manoj Kumar Joshi & Vikas Sawhney

DATA MINING WITH HADOOP AND HIVE Introduction to Architecture

Application Development. A Paradigm Shift

Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Map Reduce & Hadoop Recommended Text:

Qsoft Inc

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Open source Google-style large scale data analysis with Hadoop

Big Data With Hadoop

Big Data and Apache Hadoop s MapReduce

Open Cloud System. (Integration of Eucalyptus, Hadoop and AppScale into deployment of University Private Cloud)

!"#$%&' ( )%#*'+,'-#.//"0( !"#$"%&'()*$+()',!-+.'/', 4(5,67,!-+!"89,:*$;'0+$.<.,&0$'09,&)"/=+,!()<>'0, 3, Processing LARGE data sets

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS

How To Use Hadoop

Big Data: Using ArcGIS with Apache Hadoop. Erik Hoel and Mike Park

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. Big Data Management and Analytics

Comparative analysis of mapreduce job by keeping data constant and varying cluster size technique

Chase Wu New Jersey Ins0tute of Technology

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA

BBM467 Data Intensive ApplicaAons

Sriram Krishnan, Ph.D.

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware

Hadoop 101. Lars George. NoSQL- Ma4ers, Cologne April 26, 2013

Hadoop implementation of MapReduce computational model. Ján Vaňo

MapReduce. Introduction and Hadoop Overview. 13 June Lab Course: Databases & Cloud Computing SS 2012

Deploying Hadoop with Manager

Big Data Management and NoSQL Databases

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

Introduction to MapReduce and Hadoop

CSE-E5430 Scalable Cloud Computing Lecture 2

A bit about Hadoop. Luca Pireddu. March 9, CRS4Distributed Computing Group. (CRS4) Luca Pireddu March 9, / 18

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop

How To Scale Out Of A Nosql Database

Cloudera Manager Training: Hands-On Exercises

Big Data Introduction

MapReduce. Tushar B. Kute,

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE

BIG DATA HADOOP TRAINING

Apache Hadoop. Alexandru Costan

Using BAC Hadoop Cluster

Implement Hadoop jobs to extract business value from large and varied data sets

Big Data : Experiments with Apache Hadoop and JBoss Community projects

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN

Introduction to Hadoop

H2O on Hadoop. September 30,

File S1: Supplementary Information of CloudDOE

Accelerating and Simplifying Apache

Performance and Energy Efficiency of. Hadoop deployment models

Mr. Apichon Witayangkurn Department of Civil Engineering The University of Tokyo

Open source large scale distributed data management with Google s MapReduce and Bigtable

A Cost-Evaluation of MapReduce Applications in the Cloud

How To Install Hadoop From Apa Hadoop To (Hadoop)

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Integrating SAP BusinessObjects with Hadoop. Using a multi-node Hadoop Cluster

Weekly Report. Hadoop Introduction. submitted By Anurag Sharma. Department of Computer Science and Engineering. Indian Institute of Technology Bombay

Hadoop. Sunday, November 25, 12

ISSN: (Online) Volume 3, Issue 4, April 2015 International Journal of Advance Research in Computer Science and Management Studies

Introduction to Hadoop

Apache Hadoop new way for the company to store and analyze big data

Hadoop MapReduce and Spark. Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015

Case Study : 3 different hadoop cluster deployments

Data-Intensive Computing with Map-Reduce and Hadoop

Hadoop Distributed File System. T Seminar On Multimedia Eero Kurkela

Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis

CSE-E5430 Scalable Cloud Computing. Lecture 4

MapReduce with Apache Hadoop Analysing Big Data

Verification and Validation of MapReduce Program model for Parallel K-Means algorithm on Hadoop Cluster

Spring,2015. Apache Hive BY NATIA MAMAIASHVILI, LASHA AMASHUKELI & ALEKO CHAKHVASHVILI SUPERVAIZOR: PROF. NODAR MOMTSELIDZE

Cloudera Distributed Hadoop (CDH) Installation and Configuration on Virtual Box

Apriori-Map/Reduce Algorithm

COURSE CONTENT Big Data and Hadoop Training

Big Data Storage Options for Hadoop Sam Fineberg, HP Storage

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee June 3 rd, 2008

From Relational to Hadoop Part 1: Introduction to Hadoop. Gwen Shapira, Cloudera and Danil Zburivsky, Pythian

ITG Software Engineering

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee

Introduction to Cloud Computing

Anumol Johnson et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 6 (1), 2015,

HYBRID CLOUD SUPPORT FOR LARGE SCALE ANALYTICS AND WEB PROCESSING. Navraj Chohan, Anand Gupta, Chris Bunch, Kowshik Prakasam, and Chandra Krintz

Open Source Technologies on Microsoft Azure

A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS

How to properly misuse Hadoop. Marcel Huntemann NERSC tutorial session 2/12/13

The Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing Platform

Yahoo! Grid Services Where Grid Computing at Yahoo! is Today

Transcription:

What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea

Overview Riding Google App Engine Taming Hadoop Summary

Riding Google App Engine

Riding Google App Engine Google App Engine in a nutshell: A platform to create and deploy web applications on Google s infrastructure An example of PaaS Free and scalable (to certain extent) Apps should be written in Java or Python For first-timers: http://appengine.google.com

Riding Google App Engine (cont d) Quickstart: 1. Create a Google account (if you don t have one) 2. Login to appengine 3. Verify your account by SMS

Riding Google App Engine (cont d) Quickstart (cont d): 4. Pick a name for your application

Riding Google App Engine (cont d) Quickstart (cont d): 5. Read the documentation to grab more info about how-to Url: http://code.google.com/appengine/docs/ Also check about Java compatibility in GAE Url: http://groups.google.com/group/google-appenginejava/web/will-it-play-in-app-engine?pli=1 6. We are good to go. Let s start our first Google App Engine-based application. We will use Java and Eclipse IDE in this showcase

Riding Google App Engine (cont d) Development stage: Step 1a: Download and install JDK 1.6 from Sun s website (if you haven t done so) Step 1b: Download and install Eclipse (if you haven t done so) Eclipse 3.5 Galileo is available at WISE FTP server (ftp://wise.ajou.ac.kr)

Riding Google App Engine (cont d) Development stage: Step 2: Read this tutorial about developing in Java for GAE Part 1: http://www.ibm.com/developerworks/java/library/jgaej1/ Part 2: http://www.ibm.com/developerworks/java/library/jgaej2/ Part 3: http://www.ibm.com/developerworks/java/library/jgaej3.html

Riding Google App Engine (cont d) Development stage: Step 3: Download GAE plugin for Eclipse Click Help > Install New Software in Eclipse Input url: http://dl.google.com/eclipse/plugin/3.5 Check the option to download the SDK After the plugin is downloaded, you will see these icons in the menu bar Create a new web application project Compile project Deploy project on Google infra

Riding Google App Engine (cont d) Development stage: Step 4a: Create a GAE project

Riding Google App Engine (cont d) Development stage: Step 4b: Observe the packages and files created in the project tree

Riding Google App Engine (cont d) Development stage: Step 5: Modify the default code and deploy on Google

Riding Google App Engine (cont d) Development stage: Step 6: Take a look at the deployed web application by visiting the application url

Riding Google App Engine (cont d) Development stage: Step 7: Try to deploy more advanced examples provided in the tutorial Step 8: Plan the application you will build for your project: It should exploit the features associated with a cloud application service Scalability High availability Economies of scale (if you go with the Google App Engine for Business) Recommendation: Exploit Google data storage functionality using JDO (Java Data Objects) or JPA (Java Persistence API)

Taming Hadoop

Taming Hadoop Hadoop in brief: Platform consisting of a collection of open-source software for reliable, scalable, distributed computing Initially an Apache project (http://hadoop.apache.org) but several forks or distros are now available (e.g.: Cloudera Hadoop) Hadoop!= MapReduce but MapReduce Hadoop Ì

Taming Hadoop (cont d) Subprojects (or subplatforms) in Hadoop: Hadoop common: common utilities that support other subplatforms Chukwa: data collection system for managing large distributed systems HBase: scalable, distributed database supporting big tables HDFS: primary storage system used by Hadoop applications Hive: data warehouse infrastructure that provides data summarization and SQL-like ad hoc querying MapReduce: implementation of Google s MapReduce programming model Pig: high-level data-flow language and execution framework for parallel computation ZooKeeper: high-performance coordination service for distributed applications

Taming Hadoop (cont d) MapReduce programming model*: (*)J. Dean, S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Proc. of 6 th OSDI, 2004

Taming Hadoop (cont d) Two phases in MapReduce: Map: user provides a map function that processes a key/value pairs and output intermediate key/value pairs Reduce: a (user-supplied) function merges all intermediate values associated with the same intermediate key

Taming Hadoop (cont d) Input Map Shuffle/sort Map Map Map Reduce Reduce Output Output Map Map

Taming Hadoop (cont d) How Hadoop adopts MapReduce model: Image by Ricky Ho

Taming Hadoop (cont d) Some terms in Hadoop MapReduce (MR): Job: all jar files (or classes) which contains a MR program JobTracker: a Hadoop service that controls the distribution of MapReduce tasks to nodes in the cluster Task: program that executes individual map and reduce TaskTracker: a Hadoop service that starts and tracks MR task in a networked environment. It communicates with JobTracker for task assignment and reporting results HDFS: Hadoop Distributed File System NameNode: directory namespace manager and inode table for HDFS DataNode: a class (and program) in a cluster node that stores a set of blocks for DFS deployment

Taming Hadoop (cont d) Hadoop MapReduce sample applications: WordCount LogAnalyzer Protein sequence analyzer

Taming Hadoop (cont d) WordCount application: We provide input files and Hadoop will output words found along with its recurrence We use two samples: WordCount with MapReduce java library WordCount without MapReduce java library We specify the Mapper and Reducer functions Execute the job with HadoopStreaming

Taming Hadoop (cont d) Step 1: copy input files to HDFS

Taming Hadoop (cont d) Step 2: Hadoop MapReduce with native library

Taming Hadoop (cont d) Step 3: Hadoop MapReduce with user-supplied Mapper and Reducer functions

Summary It is necessary to plan your project well Problem to attack Project complexity Resource availability (knowledge, hands-on experience, time, infrastructure, etc) You can build your project by using infrastructure and platform on WISE or other public ones Eucalyptus Community Cloud Amazon Web Services Google App Engine Github etc

THANK YOU