Sriram Krishnan, Ph.D.

Size: px
Start display at page:

Download "Sriram Krishnan, Ph.D. sriram@sdsc.edu"

Transcription

1 Sriram Krishnan, Ph.D.

2 (Re-)Introduction to cloud computing Introduction to the MapReduce and Hadoop Distributed File System Programming model Examples of MapReduce Where/how to run MapReduce code On the cloud (AWS) Traditional HPC systems (myhadoop)

3 Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction

4 A consumer can unilaterally provision computing capabilities, such as server time and network storage As needed Automatically Without requiring human interaction from service provider

5 Capabilities are available over the network Accessed through standard mechanisms Heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and PDAs). Typically via light-weight Web service APIs

6 The provider s computing resources are pooled to serve all consumers using a multi-tenant model Different physical and virtual resources dynamically assigned and reassigned according to consumer demand The customer generally has no control or knowledge over the exact location of the provided resources But may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter) Examples of resources include storage, processing, memory, network bandwidth, and virtual machines

7 Capabilities can be rapidly and elastically provisioned to quickly scale up And rapidly released to quickly scale down To the consumer, the capabilities available for provisioning often appear to be infinite Can be purchased in any quantity at any time

8 Automatic control and optimization of resource use by leveraging a metering capability At the level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts) Resource usage can be monitored, controlled, and reported providing transparency for both the provider and consumer of the utilized service

9 Enterprises incur no infrastructure capital costs, just operational costs on a pay-per-use basis Elimination of up-front costs Architecture specifics are abstracted Capacity can be scaled up or down dynamically, and immediately Illusion of infinite resources, on-demand The underlying hardware can be anywhere geographically

10 * Source: McKinsey & Co

11 Infrastructure as a Service (IaaS) Platform as a Service (PaaS) Software as a Service (SaaS)

12 The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources Consumer is able to deploy and run arbitrary software, which can include operating systems and applications The consumer does not manage or control the underlying cloud infrastructure But has control over operating systems, storage, deployed applications, and possibly select networking components (e.g., firewalls, load balancers) Examples: Amazon EC2 (compute), S3 (storage)

13 The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created applications using programming languages and tools supported by the provider (e.g., Java, Python,.Net) The consumer does not manage or control the underlying cloud infrastructure, network, servers, operating systems, or storage But the consumer has control over the deployed applications and possibly application hosting environment configurations Examples: Google AppEngine, Amazon Elastic MapReduce (EMR), Microsoft Azure

14 The capability provided to the consumer is to use the provider s applications running on a cloud infrastructure Accessible from various client devices through a thin client interface such as a Web browser (e.g., web-based ) The consumer does not manage or control the underlying cloud infrastructure With the possible exception of limited user-specific application configuration settings Examples: Google Maps, Facebook, Animoto

15 Cloud computing necessitates a different approach for large scale data processing Use of non-specialized commodity hardware and low latency interconnect between nodes Not suitable for implementing traditional databases or HPC systems which require specialized hardware configurations Traditional databases also not known to scale very well to thousands of nodes which may be available on the cloud

16 Programming environment for very large scale data processing on commodity resources Originally developed by Google Managing task executions and data transfers in a shared-nothing environment Distributed data repository ( file system ): Google File System (GFS) Round-robin partitioning of data More info: MapReduce: Infrastructure to support data scatter / gather Pushing the computation to the data More info:

17 Open Source Implementation of Google s MapReduce ecosystem Hadoop Distributed File System (HDFS): Based on the Google File System Hadoop MapReduce: Based on Google s MapReduce

18 Shared-nothing (MapReduce-style) Architectures ETHERNET COMPUTE/DATA CLUSTER WITH LOCAL STOARGE

19 *Source:

20 The computation takes a set of input key/ value pairs, and Produces a set of output key/value pairs. The MapReduce library expresses the computation as two functions Map and Reduce All data resides in a distributed file system

21 map (k1,v1) list(k2,v2) Map function transforms input (key, value) pairs into a list of intermediate (key, value) pairs reduce (k2,list(v2)) list(v2) Reduce function transforms a key and a set of its values to a final list of values (outputs)

22 map(string key, String value): // key: document name // value: document contents for each word w in value: EmitIntermediate(w, "1"); reduce(string key, Iterator values): // key: a word // values: a list of counts int result = 0; for each v in values: result += ParseInt(v); Emit(AsString(result))

23 Distributed grep Pattern matching in parallel Count of URL access frequency (URL, total count) from web logs Reverse Web-Link Graph (target, list(source)) for URLs

24 *Source:

25 Scalable and conducive to data-intensive and data-parallel applications Fault-tolerant by design workers can be restarted on failures Ability to run on non-specialized commodity hardware And cloud resources such as AWS (EC2 MapReduce service)

26 Easy to use complexity is hidden by the library A large variety of problems are easily expressible in this model Caveat: Not all algorithms lend themselves very well to be implemented using this model Implementation scales to 1000s of machines Restricting the programming model makes it easy to parallelize and deal with failures

27

28 *http://bowtie-bio.sourceforge.net/crossbow/index.shtml Credit: "Cloud computing and the DNA data race", M. C. Schatz, B. Langmead, S. L Salzberg. Nature Biotechnology 28, (2010)

29 Full-featured DEM Bare earth DEM

30

31 Dedicated Hadoop Cluster On the cloud (e.g. Amazon Elastic MapReduce) On HPC resources

32 Dedicated Hadoop cluster Set of commodity nodes configured to run Hadoop in distributed mode

33 4-node Hadoop cluster: Masters: cat $HADOOP_HOME/conf/masters Slaves: cat $HADOOP_HOME/conf/slaves Other configs: ls al $HADOOP_HOME/conf Starting up Hadoop: $HADOOP_HOME/bin/start-all.sh Shutting down Hadoop: $HADOOP_HOME/bin/stop-all.sh

34 Job Tracker: Task Tracker:

35 HDFS commands: Copy data into HDFS: hadoop dfs -copyfromlocal Data input-data HDFS directory listing: hadoop dfs -ls HDFS file cat : hadoop dfs -cat input-data/gutenberg/ txt

36 Running MapReduce jobs: hadoop jar /opt/hadoop/hadoop / hadoop examples.jar wordcount inputdata/gutenberg output Results: List: hadoop dfs ls output View: hadoop dfs cat output/part-r-00000

37 On the cloud Variable number of nodes uses Amazon s Elastic MapReduce

38

39

40

41

42 On traditional HPC systems Need scripts to configure Hadoop on-demand via traditional batch systems HOD: myhadoop:

43 An open source tool for running Hadoop jobs on HPC resources Easy to configure and use for the end-user Play nice with existing batch systems on HPC resources Why do we need such a tool? End-users: I already have Hadoop code and I only have access to regular HPC-style resources Computer Scientists: I want to study the implications of using Hadoop on HPC resources And I don t have root access to these resources

44 Shared-nothing (Hadoop) versus HPCstyle architectures In terms of philosophies and implementation Control and co-existence of Hadoop and HPC batch systems Typically both Hadoop and HPC batch systems (viz., SGE, PBS) need completely control over the resources for scheduling purposes

45 BATCH PROCESSING SYSTEM (PBS, SGE) [1] [2, 3] COMPUTE NODES PERSISTENT MODE [4(b)] HADOOP DAEMONS NON PERSISTENT MODE [4(a)] PARALLEL FILE SYSTEM

46 BOOTSTRAP TEARDOWN

47 Write a PBS/SGE script to request N nodes from the scheduler

48 Configure Hadoop on allocated nodes via myhadoop

49 Run your Hadoop job then clean up after execution

50 myhadoop an open source tool for running Hadoop jobs on HPC resources Without need for root-level access Co-exists with traditional batch systems Allows persistent and non-persistent modes to save HDFS state across runs Tested on SDSC Triton, TeraGrid and UC Grid resources More information Software: https://sourceforge.net/projects/myhadoop/ SDSC Tech Report:

51 Hadoop: MapReduce Tutorial: HDFS Users Guide: Hadoop Streaming: Use of custom scripts as mappers and reducers No Java code required Amazon Elastic MapReduce: Disco (MapReduce in Python):

52 Feel free to me

53

54 Private Cloud Operated solely for an organization May be managed by the organization or a third party and may exist on premise or off premise Community Cloud Shared by several organizations and supports a specific community that has shared concerns May be managed by the organizations or a third party and may exist on premise or off premise

55 Public Cloud Made available to the general public or a large industry group Owned by an organization selling cloud services Hybrid Cloud A composition of two or more clouds (private, community, or public) Bound together by standardized or proprietary technology that enables data and application portability

56 Access to HPC resources is typically via batch systems viz. PBS, SGE, Condor, etc These systems have complete control over the compute resources Users typically can t log in directly to the compute nodes (via ssh) to start various daemons Hadoop manages its resources using its own set of daemons NameNode & DataNode for Hadoop Distributed File System (HDFS) JobTracker & TaskTracker for MapReduce jobs Hadoop daemons and batch systems can t co-exist seamlessly Will interfere with each other s scheduling algorithms

57 1. Enabling execution of Hadoop jobs on shared HPC resources via traditional batch systems a) Working with a variety of batch systems (PBS, SGE, etc) 2. Allowing users to run Hadoop jobs without needing rootlevel access 3. Enabling multiple users to simultaneously execute Hadoop jobs on the shared resource 4. Allowing users to either run a fresh Hadoop instance each time (a), or store HDFS state for future runs (b)

58

Viswanath Nandigam Sriram Krishnan Chaitan Baru

Viswanath Nandigam Sriram Krishnan Chaitan Baru Viswanath Nandigam Sriram Krishnan Chaitan Baru Traditional Database Implementations for large-scale spatial data Data Partitioning Spatial Extensions Pros and Cons Cloud Computing Introduction Relevance

More information

Strategies for Managing Large-Scale Data: Database Systems, Cloud Computing, and Hadoop. Sriram Krishnan Viswanath Nandigam

Strategies for Managing Large-Scale Data: Database Systems, Cloud Computing, and Hadoop. Sriram Krishnan Viswanath Nandigam Strategies for Managing Large-Scale Data: Database Systems, Cloud Computing, and Hadoop Sriram Krishnan Viswanath Nandigam Outline Traditional Database Implementations for largescale spatial data Hybrid

More information

Cloud Computing. Course: Designing and Implementing Service Oriented Business Processes

Cloud Computing. Course: Designing and Implementing Service Oriented Business Processes Cloud Computing Supplementary slides Course: Designing and Implementing Service Oriented Business Processes 1 Introduction Cloud computing represents a new way, in some cases a more cost effective way,

More information

yvette@yvetteagostini.it yvette@yvetteagostini.it

yvette@yvetteagostini.it yvette@yvetteagostini.it 1 The following is merely a collection of notes taken during works, study and just-for-fun activities No copyright infringements intended: all sources are duly listed at the end of the document This work

More information

IS PRIVATE CLOUD A UNICORN?

IS PRIVATE CLOUD A UNICORN? IS PRIVATE CLOUD A UNICORN? With all of the discussion, adoption, and expansion of cloud offerings there is a constant debate that continues to rear its head: Public vs. Private or more bluntly Is there

More information

The NIST Definition of Cloud Computing (Draft)

The NIST Definition of Cloud Computing (Draft) Special Publication 800-145 (Draft) The NIST Definition of Cloud Computing (Draft) Recommendations of the National Institute of Standards and Technology Peter Mell Timothy Grance NIST Special Publication

More information

White Paper on CLOUD COMPUTING

White Paper on CLOUD COMPUTING White Paper on CLOUD COMPUTING INDEX 1. Introduction 2. Features of Cloud Computing 3. Benefits of Cloud computing 4. Service models of Cloud Computing 5. Deployment models of Cloud Computing 6. Examples

More information

Introduction to Hadoop

Introduction to Hadoop Introduction to Hadoop 1 What is Hadoop? the big data revolution extracting value from data cloud computing 2 Understanding MapReduce the word count problem more examples MCS 572 Lecture 24 Introduction

More information

Cloud Computing Summary and Preparation for Examination

Cloud Computing Summary and Preparation for Examination Basics of Cloud Computing Lecture 8 Cloud Computing Summary and Preparation for Examination Satish Srirama Outline Quick recap of what we have learnt as part of this course How to prepare for the examination

More information

Introduc)on to the MapReduce Paradigm and Apache Hadoop. Sriram Krishnan sriram@sdsc.edu

Introduc)on to the MapReduce Paradigm and Apache Hadoop. Sriram Krishnan sriram@sdsc.edu Introduc)on to the MapReduce Paradigm and Apache Hadoop Sriram Krishnan sriram@sdsc.edu Programming Model The computa)on takes a set of input key/ value pairs, and Produces a set of output key/value pairs.

More information

Introduction to Hadoop

Introduction to Hadoop 1 What is Hadoop? Introduction to Hadoop We are living in an era where large volumes of data are available and the problem is to extract meaning from the data avalanche. The goal of the software tools

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Discovery 2015: Cloud Computing Workshop June 20-24, 2011 Berkeley, CA Introduction to Cloud Computing Keith R. Jackson Lawrence Berkeley National Lab What is it? NIST Definition Cloud computing is a model

More information

Cloud Computing. Adam Barker

Cloud Computing. Adam Barker Cloud Computing Adam Barker 1 Overview Introduction to Cloud computing Enabling technologies Different types of cloud: IaaS, PaaS and SaaS Cloud terminology Interacting with a cloud: management consoles

More information

See Appendix A for the complete definition which includes the five essential characteristics, three service models, and four deployment models.

See Appendix A for the complete definition which includes the five essential characteristics, three service models, and four deployment models. Cloud Strategy Information Systems and Technology Bruce Campbell What is the Cloud? From http://csrc.nist.gov/publications/nistpubs/800-145/sp800-145.pdf Cloud computing is a model for enabling ubiquitous,

More information

Cloud Computing 159.735. Submitted By : Fahim Ilyas (08497461) Submitted To : Martin Johnson Submitted On: 31 st May, 2009

Cloud Computing 159.735. Submitted By : Fahim Ilyas (08497461) Submitted To : Martin Johnson Submitted On: 31 st May, 2009 Cloud Computing 159.735 Submitted By : Fahim Ilyas (08497461) Submitted To : Martin Johnson Submitted On: 31 st May, 2009 Table of Contents Introduction... 3 What is Cloud Computing?... 3 Key Characteristics...

More information

Business Intelligence (BI) Cloud. Prepared By: Pavan Inabathini

Business Intelligence (BI) Cloud. Prepared By: Pavan Inabathini Business Intelligence (BI) Cloud Prepared By: Pavan Inabathini Summary Federal Agencies currently maintain Business Intelligence (BI) solutions across numerous departments around the enterprise with individual

More information

Apache Hadoop new way for the company to store and analyze big data

Apache Hadoop new way for the company to store and analyze big data Apache Hadoop new way for the company to store and analyze big data Reyna Ulaque Software Engineer Agenda What is Big Data? What is Hadoop? Who uses Hadoop? Hadoop Architecture Hadoop Distributed File

More information

The NIST Definition of Cloud Computing

The NIST Definition of Cloud Computing Special Publication 800-145 The NIST Definition of Cloud Computing Recommendations of the National Institute of Standards and Technology Peter Mell Timothy Grance NIST Special Publication 800-145 The NIST

More information

Apache Hadoop. Alexandru Costan

Apache Hadoop. Alexandru Costan 1 Apache Hadoop Alexandru Costan Big Data Landscape No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard, except Hadoop 2 Outline What is Hadoop? Who uses it? Architecture HDFS MapReduce Open

More information

The MapReduce Framework

The MapReduce Framework The MapReduce Framework Luke Tierney Department of Statistics & Actuarial Science University of Iowa November 8, 2007 Luke Tierney (U. of Iowa) The MapReduce Framework November 8, 2007 1 / 16 Background

More information

Capability Paper. Today, aerospace and defense (A&D) companies find

Capability Paper. Today, aerospace and defense (A&D) companies find Today, aerospace and defense (A&D) companies find Today, aerospace and defense (A&D) companies find themselves at potentially perplexing crossroads. On one hand, shrinking defense budgets, an increasingly

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Introduction to Cloud Computing Cloud Computing I (intro) 15 319, spring 2010 2 nd Lecture, Jan 14 th Majd F. Sakr Lecture Motivation General overview on cloud computing What is cloud computing Services

More information

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA http://kzhang6.people.uic.edu/tutorial/amcis2014.html August 7, 2014 Schedule I. Introduction to big data

More information

Cloud Models and Platforms

Cloud Models and Platforms Cloud Models and Platforms Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF A Working Definition of Cloud Computing Cloud computing is a model

More information

What Is It? Business Architecture Research Challenges Bibliography. Cloud Computing. Research Challenges Overview. Carlos Eduardo Moreira dos Santos

What Is It? Business Architecture Research Challenges Bibliography. Cloud Computing. Research Challenges Overview. Carlos Eduardo Moreira dos Santos Research Challenges Overview May 3, 2010 Table of Contents I 1 What Is It? Related Technologies Grid Computing Virtualization Utility Computing Autonomic Computing Is It New? Definition 2 Business Business

More information

Chapter 2 Cloud Computing

Chapter 2 Cloud Computing Chapter 2 Cloud Computing Cloud computing technology represents a new paradigm for the provisioning of computing resources. This paradigm shifts the location of resources to the network to reduce the costs

More information

Cloud computing - Architecting in the cloud

Cloud computing - Architecting in the cloud Cloud computing - Architecting in the cloud anna.ruokonen@tut.fi 1 Outline Cloud computing What is? Levels of cloud computing: IaaS, PaaS, SaaS Moving to the cloud? Architecting in the cloud Best practices

More information

Weekly Report. Hadoop Introduction. submitted By Anurag Sharma. Department of Computer Science and Engineering. Indian Institute of Technology Bombay

Weekly Report. Hadoop Introduction. submitted By Anurag Sharma. Department of Computer Science and Engineering. Indian Institute of Technology Bombay Weekly Report Hadoop Introduction submitted By Anurag Sharma Department of Computer Science and Engineering Indian Institute of Technology Bombay Chapter 1 What is Hadoop? Apache Hadoop (High-availability

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Cloud definitions you've been pretending to understand. Jack Daniel, Reluctant CISSP, MVP Community Development Manager, Astaro

Cloud definitions you've been pretending to understand. Jack Daniel, Reluctant CISSP, MVP Community Development Manager, Astaro Cloud definitions you've been pretending to understand Jack Daniel, Reluctant CISSP, MVP Community Development Manager, Astaro You keep using that word cloud. I do not think it means what you think it

More information

Comparative analysis of mapreduce job by keeping data constant and varying cluster size technique

Comparative analysis of mapreduce job by keeping data constant and varying cluster size technique Comparative analysis of mapreduce job by keeping data constant and varying cluster size technique Mahesh Maurya a, Sunita Mahajan b * a Research Scholar, JJT University, MPSTME, Mumbai, India,maheshkmaurya@yahoo.co.in

More information

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after

More information

An Introduction to Cloud Computing Concepts

An Introduction to Cloud Computing Concepts Software Engineering Competence Center TUTORIAL An Introduction to Cloud Computing Concepts Practical Steps for Using Amazon EC2 IaaS Technology Ahmed Mohamed Gamaleldin Senior R&D Engineer-SECC ahmed.gamal.eldin@itida.gov.eg

More information

INTRODUCTION TO CLOUD COMPUTING CEN483 PARALLEL AND DISTRIBUTED SYSTEMS

INTRODUCTION TO CLOUD COMPUTING CEN483 PARALLEL AND DISTRIBUTED SYSTEMS INTRODUCTION TO CLOUD COMPUTING CEN483 PARALLEL AND DISTRIBUTED SYSTEMS CLOUD COMPUTING Cloud computing is a model for enabling convenient, ondemand network access to a shared pool of configurable computing

More information

Load Rebalancing for File System in Public Cloud Roopa R.L 1, Jyothi Patil 2

Load Rebalancing for File System in Public Cloud Roopa R.L 1, Jyothi Patil 2 Load Rebalancing for File System in Public Cloud Roopa R.L 1, Jyothi Patil 2 1 PDA College of Engineering, Gulbarga, Karnataka, India rlrooparl@gmail.com 2 PDA College of Engineering, Gulbarga, Karnataka,

More information

MapReduce (in the cloud)

MapReduce (in the cloud) MapReduce (in the cloud) How to painlessly process terabytes of data by Irina Gordei MapReduce Presentation Outline What is MapReduce? Example How it works MapReduce in the cloud Conclusion Demo Motivation:

More information

Jeffrey D. Ullman slides. MapReduce for data intensive computing

Jeffrey D. Ullman slides. MapReduce for data intensive computing Jeffrey D. Ullman slides MapReduce for data intensive computing Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very

More information

Contents. 1. Introduction

Contents. 1. Introduction Summary Cloud computing has become one of the key words in the IT industry. The cloud represents the internet or an infrastructure for the communication between all components, providing and receiving

More information

Tamanna Roy Rayat & Bahra Institute of Engineering & Technology, Punjab, India talk2tamanna@gmail.com

Tamanna Roy Rayat & Bahra Institute of Engineering & Technology, Punjab, India talk2tamanna@gmail.com IJCSIT, Volume 1, Issue 5 (October, 2014) e-issn: 1694-2329 p-issn: 1694-2345 A STUDY OF CLOUD COMPUTING MODELS AND ITS FUTURE Tamanna Roy Rayat & Bahra Institute of Engineering & Technology, Punjab, India

More information

MapReduce, Hadoop and Amazon AWS

MapReduce, Hadoop and Amazon AWS MapReduce, Hadoop and Amazon AWS Yasser Ganjisaffar http://www.ics.uci.edu/~yganjisa February 2011 What is Hadoop? A software framework that supports data-intensive distributed applications. It enables

More information

DATA MINING WITH HADOOP AND HIVE Introduction to Architecture

DATA MINING WITH HADOOP AND HIVE Introduction to Architecture DATA MINING WITH HADOOP AND HIVE Introduction to Architecture Dr. Wlodek Zadrozny (Most slides come from Prof. Akella s class in 2014) 2015-2025. Reproduction or usage prohibited without permission of

More information

Hadoop Architecture. Part 1

Hadoop Architecture. Part 1 Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,

More information

Prepared By : Manoj Kumar Joshi & Vikas Sawhney

Prepared By : Manoj Kumar Joshi & Vikas Sawhney Prepared By : Manoj Kumar Joshi & Vikas Sawhney General Agenda Introduction to Hadoop Architecture Acknowledgement Thanks to all the authors who left their selfexplanatory images on the internet. Thanks

More information

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model

More information

Cloud Computing & Hosting Solutions

Cloud Computing & Hosting Solutions Cloud Computing & Hosting Solutions SANTA FE COLLEGE CTS2356: NETWORK ADMIN DANIEL EAKINS 4/15/2012 1 Cloud Computing & Hosting Solutions ABSTRACT For this week s topic we will discuss about Cloud computing

More information

MapReduce. Tushar B. Kute, http://tusharkute.com

MapReduce. Tushar B. Kute, http://tusharkute.com MapReduce Tushar B. Kute, http://tusharkute.com What is MapReduce? MapReduce is a framework using which we can write applications to process huge amounts of data, in parallel, on large clusters of commodity

More information

What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea

What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea Overview Riding Google App Engine Taming Hadoop Summary Riding

More information

A CLOUD-BASED FRAMEWORK FOR ONLINE MANAGEMENT OF MASSIVE BIMS USING HADOOP AND WEBGL

A CLOUD-BASED FRAMEWORK FOR ONLINE MANAGEMENT OF MASSIVE BIMS USING HADOOP AND WEBGL A CLOUD-BASED FRAMEWORK FOR ONLINE MANAGEMENT OF MASSIVE BIMS USING HADOOP AND WEBGL *Hung-Ming Chen, Chuan-Chien Hou, and Tsung-Hsi Lin Department of Construction Engineering National Taiwan University

More information

Kent State University s Cloud Strategy

Kent State University s Cloud Strategy Kent State University s Cloud Strategy Table of Contents Item Page 1. From the CIO 3 2. Strategic Direction for Cloud Computing at Kent State 4 3. Cloud Computing at Kent State University 5 4. Methodology

More information

Analysis and Strategy for the Performance Testing in Cloud Computing

Analysis and Strategy for the Performance Testing in Cloud Computing Global Journal of Computer Science and Technology Cloud & Distributed Volume 12 Issue 10 Version 1.0 July 2012 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals

More information

Cloud Computing: The Next Computing Paradigm

Cloud Computing: The Next Computing Paradigm Cloud Computing: The Next Computing Paradigm Ronnie D. Caytiles 1, Sunguk Lee and Byungjoo Park 1 * 1 Department of Multimedia Engineering, Hannam University 133 Ojeongdong, Daeduk-gu, Daejeon, Korea rdcaytiles@gmail.com,

More information

Cloud Computing Service Models, Types of Clouds and their Architectures, Challenges.

Cloud Computing Service Models, Types of Clouds and their Architectures, Challenges. Cloud Computing Service Models, Types of Clouds and their Architectures, Challenges. B.Kezia Rani 1, Dr.B.Padmaja Rani 2, Dr.A.Vinaya Babu 3 1 Research Scholar,Dept of Computer Science, JNTU, Hyderabad,Telangana

More information

Big Data and Apache Hadoop s MapReduce

Big Data and Apache Hadoop s MapReduce Big Data and Apache Hadoop s MapReduce Michael Hahsler Computer Science and Engineering Southern Methodist University January 23, 2012 Michael Hahsler (SMU/CSE) Hadoop/MapReduce January 23, 2012 1 / 23

More information

MapReduce Job Processing

MapReduce Job Processing April 17, 2012 Background: Hadoop Distributed File System (HDFS) Hadoop requires a Distributed File System (DFS), we utilize the Hadoop Distributed File System (HDFS). Background: Hadoop Distributed File

More information

myhadoop - Hadoop-on-Demand on Traditional HPC Resources

myhadoop - Hadoop-on-Demand on Traditional HPC Resources myhadoop - Hadoop-on-Demand on Traditional HPC Resources Sriram Krishnan San Diego Supercomputer Center 9500 Gilman Dr La Jolla, CA 92093-0505, USA sriram@sdsc.edu Mahidhar Tatineni San Diego Supercomputer

More information

24/11/14. During this course. Internet is everywhere. Frequency barrier hit. Management costs increase. Advanced Distributed Systems Cloud Computing

24/11/14. During this course. Internet is everywhere. Frequency barrier hit. Management costs increase. Advanced Distributed Systems Cloud Computing Advanced Distributed Systems Cristian Klein Department of Computing Science Umeå University During this course Treads in IT Towards a new data center What is Cloud computing? Types of Clouds Making applications

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Institute of Informatics - UFRGS September 2013 Outline Virtualization References Mell, P., & Grance, T. (2011). The NIST denition of cloud computing (draft).nist special publication, 800, 145. Bojanova,

More information

MapReduce. Course NDBI040: Big Data Management and NoSQL Databases. Practice 01: Martin Svoboda

MapReduce. Course NDBI040: Big Data Management and NoSQL Databases. Practice 01: Martin Svoboda Course NDBI040: Big Data Management and NoSQL Databases Practice 01: MapReduce Martin Svoboda Faculty of Mathematics and Physics, Charles University in Prague MapReduce: Overview MapReduce Programming

More information

Building Out Your Cloud-Ready Solutions. Clark D. Richey, Jr., Principal Technologist, DoD

Building Out Your Cloud-Ready Solutions. Clark D. Richey, Jr., Principal Technologist, DoD Building Out Your Cloud-Ready Solutions Clark D. Richey, Jr., Principal Technologist, DoD Slide 1 Agenda Define the problem Explore important aspects of Cloud deployments Wrap up and questions Slide 2

More information

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets

More information

Cloud Computing An Elephant In The Dark

Cloud Computing An Elephant In The Dark Cloud Computing An Elephant In The Dark Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) Cloud Computing 1394/2/7 1 / 60 Amir

More information

Essential Characteristics of Cloud Computing: On-Demand Self-Service Rapid Elasticity Location Independence Resource Pooling Measured Service

Essential Characteristics of Cloud Computing: On-Demand Self-Service Rapid Elasticity Location Independence Resource Pooling Measured Service Cloud Computing Although cloud computing is quite a recent term, elements of the concept have been around for years. It is the maturation of Internet. Cloud Computing is the fine end result of a long chain;

More information

CLOUD COMPUTING. When It's smarter to rent than to buy

CLOUD COMPUTING. When It's smarter to rent than to buy CLOUD COMPUTING When It's smarter to rent than to buy Is it new concept? Nothing new In 1990 s, WWW itself Grid Technologies- Scientific applications Online banking websites More convenience Not to visit

More information

Open Cloud System. (Integration of Eucalyptus, Hadoop and AppScale into deployment of University Private Cloud)

Open Cloud System. (Integration of Eucalyptus, Hadoop and AppScale into deployment of University Private Cloud) Open Cloud System (Integration of Eucalyptus, Hadoop and into deployment of University Private Cloud) Thinn Thu Naing University of Computer Studies, Yangon 25 th October 2011 Open Cloud System University

More information

Certified Cloud Computing Professional Sample Material

Certified Cloud Computing Professional Sample Material Certified Cloud Computing Professional Sample Material 1. INTRODUCTION Let us get flashback of few years back. Suppose you have some important files in a system at home but, you are away from your home.

More information

A programming model in Cloud: MapReduce

A programming model in Cloud: MapReduce A programming model in Cloud: MapReduce Programming model and implementation developed by Google for processing large data sets Users specify a map function to generate a set of intermediate key/value

More information

Sistemi Operativi e Reti. Cloud Computing

Sistemi Operativi e Reti. Cloud Computing 1 Sistemi Operativi e Reti Cloud Computing Facoltà di Scienze Matematiche Fisiche e Naturali Corso di Laurea Magistrale in Informatica Osvaldo Gervasi ogervasi@computer.org 2 Introduction Technologies

More information

myhadoop - Hadoop-on-Demand on Traditional HPC Resources

myhadoop - Hadoop-on-Demand on Traditional HPC Resources myhadoop - Hadoop-on-Demand on Traditional HPC Resources Sriram Krishnan San Diego Supercomputer Center 9500 Gilman Dr MC0505 La Jolla, CA 92093-0505, USA sriram@sdsc.edu Mahidhar Tatineni San Diego Supercomputer

More information

Cloud Platforms, Challenges & Hadoop. Aditee Rele Karpagam Venkataraman Janani Ravi

Cloud Platforms, Challenges & Hadoop. Aditee Rele Karpagam Venkataraman Janani Ravi Cloud Platforms, Challenges & Hadoop Aditee Rele Karpagam Venkataraman Janani Ravi Cloud Platform Models Aditee Rele Microsoft Corporation Dec 8, 2010 IT CAPACITY Provisioning IT Capacity Under-supply

More information

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Kanchan A. Khedikar Department of Computer Science & Engineering Walchand Institute of Technoloy, Solapur, Maharashtra,

More information

Big Data With Hadoop

Big Data With Hadoop With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials

More information

Hadoop Parallel Data Processing

Hadoop Parallel Data Processing MapReduce and Implementation Hadoop Parallel Data Processing Kai Shen A programming interface (two stage Map and Reduce) and system support such that: the interface is easy to program, and suitable for

More information

What is Cloud Computing? First, a little history. Demystifying Cloud Computing. Mainframe Era (1944-1978) Workstation Era (1968-1985) Xerox Star 1981!

What is Cloud Computing? First, a little history. Demystifying Cloud Computing. Mainframe Era (1944-1978) Workstation Era (1968-1985) Xerox Star 1981! Demystifying Cloud Computing What is Cloud Computing? First, a little history. Tim Horgan Head of Cloud Computing Centre of Excellence http://cloud.cit.ie 1" 2" Mainframe Era (1944-1978) Workstation Era

More information

Cloud Computing Trends

Cloud Computing Trends UT DALLAS Erik Jonsson School of Engineering & Computer Science Cloud Computing Trends What is cloud computing? Cloud computing refers to the apps and services delivered over the internet. Software delivered

More information

Hadoop. Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware.

Hadoop. Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware. Hadoop Source Alessandro Rezzani, Big Data - Architettura, tecnologie e metodi per l utilizzo di grandi basi di dati, Apogeo Education, ottobre 2013 wikipedia Hadoop Apache Hadoop is an open-source software

More information

Cloud Computing Service and Legal Issues

Cloud Computing Service and Legal Issues Cloud Computing Service and Legal Issues Takato Natsui Professor of Law, Meiji University, Tokyo, Japan 1. Introduction Many IT businesses have indicated that cloud computing is a very promising emerging

More information

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social

More information

Enhancing Operational Capacities and Capabilities through Cloud Technologies

Enhancing Operational Capacities and Capabilities through Cloud Technologies Enhancing Operational Capacities and Capabilities through Cloud Technologies How freight forwarders and other logistics stakeholders can benefit from cloud-based solutions 2013 vcargo Cloud Pte Ltd All

More information

2.1 Hadoop a. Hadoop Installation & Configuration

2.1 Hadoop a. Hadoop Installation & Configuration 2. Implementation 2.1 Hadoop a. Hadoop Installation & Configuration First of all, we need to install Java Sun 6, and it is preferred to be version 6 not 7 for running Hadoop. Type the following commands

More information

Cloud Computing Now and the Future Development of the IaaS

Cloud Computing Now and the Future Development of the IaaS 2010 Cloud Computing Now and the Future Development of the IaaS Quanta Computer Division: CCASD Title: Project Manager Name: Chad Lin Agenda: What is Cloud Computing? Public, Private and Hybrid Cloud.

More information

Cloud Computing Training

Cloud Computing Training Cloud Computing Training TechAge Labs Pvt. Ltd. Address : C-46, GF, Sector 2, Noida Phone 1 : 0120-4540894 Phone 2 : 0120-6495333 TechAge Labs 2014 version 1.0 Cloud Computing Training Cloud Computing

More information

Parallel Data Mining and Assurance Service Model Using Hadoop in Cloud

Parallel Data Mining and Assurance Service Model Using Hadoop in Cloud Parallel Data Mining and Assurance Service Model Using Hadoop in Cloud Aditya Jadhav, Mahesh Kukreja E-mail: aditya.jadhav27@gmail.com & mr_mahesh_in@yahoo.co.in Abstract : In the information industry,

More information

Introduction to Engineering Using Robotics Experiments Lecture 18 Cloud Computing

Introduction to Engineering Using Robotics Experiments Lecture 18 Cloud Computing Introduction to Engineering Using Robotics Experiments Lecture 18 Cloud Computing Yinong Chen 2 Big Data Big Data Technologies Cloud Computing Service and Web-Based Computing Applications Industry Control

More information

Keywords Cloud computing, Cloud platforms, Eucalyptus, Amazon, OpenStack.

Keywords Cloud computing, Cloud platforms, Eucalyptus, Amazon, OpenStack. Volume 3, Issue 11, November 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Cloud Platforms

More information

CUMULUX WHICH CLOUD PLATFORM IS RIGHT FOR YOU? COMPARING CLOUD PLATFORMS. Review Business and Technology Series www.cumulux.com

CUMULUX WHICH CLOUD PLATFORM IS RIGHT FOR YOU? COMPARING CLOUD PLATFORMS. Review Business and Technology Series www.cumulux.com ` CUMULUX WHICH CLOUD PLATFORM IS RIGHT FOR YOU? COMPARING CLOUD PLATFORMS Review Business and Technology Series www.cumulux.com Table of Contents Cloud Computing Model...2 Impact on IT Management and

More information

The Maui High Performance Computing Center Department of Defense Supercomputing Resource Center (MHPCC DSRC) Hadoop Implementation on Riptide - -

The Maui High Performance Computing Center Department of Defense Supercomputing Resource Center (MHPCC DSRC) Hadoop Implementation on Riptide - - The Maui High Performance Computing Center Department of Defense Supercomputing Resource Center (MHPCC DSRC) Hadoop Implementation on Riptide - - Hadoop Implementation on Riptide 2 Table of Contents Executive

More information

CSO Cloud Computing Study. January 2012

CSO Cloud Computing Study. January 2012 CSO Cloud Computing Study January 2012 Purpose and Methodology Survey Sample Survey Method Fielded Dec 20, 2011-Jan 8, 2012 Total Respondents Margin of Error +/- 7.3% Audience Base Survey Goal 178 security

More information

A Cost-Evaluation of MapReduce Applications in the Cloud

A Cost-Evaluation of MapReduce Applications in the Cloud 1/23 A Cost-Evaluation of MapReduce Applications in the Cloud Diana Moise, Alexandra Carpen-Amarie Gabriel Antoniu, Luc Bougé KerData team 2/23 1 MapReduce applications - case study 2 3 4 5 3/23 MapReduce

More information

Soft Computing Models for Cloud Service Optimization

Soft Computing Models for Cloud Service Optimization Soft Computing Models for Cloud Service Optimization G. Albeanu, Spiru Haret University & Fl. Popentiu-Vladicescu UNESCO Department, University of Oradea Abstract The cloud computing paradigm has already

More information

Cloud Computing; What is it, How long has it been here, and Where is it going?

Cloud Computing; What is it, How long has it been here, and Where is it going? Cloud Computing; What is it, How long has it been here, and Where is it going? David Losacco, CPA, CIA, CISA Principal January 10, 2013 Agenda The Cloud WHAT IS THE CLOUD? How long has it been here? Where

More information

Hadoop vs Apache Spark

Hadoop vs Apache Spark Innovate, Integrate, Transform Hadoop vs Apache Spark www.altencalsoftlabs.com Introduction Any sufficiently advanced technology is indistinguishable from magic. said Arthur C. Clark. Big data technologies

More information

CLOUD COMPUTING USING HADOOP TECHNOLOGY

CLOUD COMPUTING USING HADOOP TECHNOLOGY CLOUD COMPUTING USING HADOOP TECHNOLOGY DHIRAJLAL GANDHI COLLEGE OF TECHNOLOGY SALEM B.NARENDRA PRASATH S.PRAVEEN KUMAR 3 rd year CSE Department, 3 rd year CSE Department, Email:narendren.jbk@gmail.com

More information

Open source Google-style large scale data analysis with Hadoop

Open source Google-style large scale data analysis with Hadoop Open source Google-style large scale data analysis with Hadoop Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory School of Electrical

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Managing Cloud Computing Risk

Managing Cloud Computing Risk Managing Cloud Computing Risk Presented By: Dan Desko; Manager, Internal IT Audit & Risk Advisory Services Schneider Downs & Co. Inc. ddesko@schneiderdowns.com Learning Objectives Understand how to identify

More information

ETH Zurich Department of Computer Science Networked Information Systems - Spring Tutorial #1: Hadoop and MapReduce.

ETH Zurich Department of Computer Science Networked Information Systems - Spring Tutorial #1: Hadoop and MapReduce. ETH Zurich Department of Computer Science Networked Information Systems - Spring 2008 Tutorial #1: Hadoop and MapReduce March 17, 2008 1 Introduction Hadoop 1 is an open-source Java-based software platform

More information

The Hybrid Cloud: Bringing Cloud-Based IT Services to State Government

The Hybrid Cloud: Bringing Cloud-Based IT Services to State Government The Hybrid Cloud: Bringing Cloud-Based IT Services to State Government October 4, 2009 Prepared By: Robert Woolley and David Fletcher Introduction Provisioning Information Technology (IT) services to enterprises

More information

CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing. University of Florida, CISE Department Prof.

CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing. University of Florida, CISE Department Prof. CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing University of Florida, CISE Department Prof. Daisy Zhe Wang Cloud Computing and Amazon Web Services Cloud Computing Amazon

More information

High Performance Computing with Hadoop WV HPC Summer Institute 2014

High Performance Computing with Hadoop WV HPC Summer Institute 2014 High Performance Computing with Hadoop WV HPC Summer Institute 2014 E. James Harner Director of Data Science Department of Statistics West Virginia University June 18, 2014 Outline Introduction Hadoop

More information