CPET 581 Cloud Computing: Technologies and Enterprise IT Strategies




Lecture 8: Cloud Programming & Software Environments, Part 1 of 2
Spring 2013
A Specialty Course for Purdue University's M.S. in Technology Graduate Program: IT/Advanced Computer App Track
Paul I-Hai Lin, Professor
Dept. of Computer, Electrical and Information Technology
Purdue University Fort Wayne Campus

References
1. Chapter 6, "Cloud Programming and Software Environments," in Distributed and Cloud Computing, by Kai Hwang, Geoffrey C. Fox, and Jack J. Dongarra, published by Morgan Kaufmann/Elsevier Inc.

Features of Cloud and Grid Platforms: Important Cloud Platform Capabilities
- Physical or virtual computing platform
- Massive data storage service, distributed file system
- Massive database storage service
- Massive data processing method and programming model
- Workflow and data query language support
- Programming interface and service deployment (Web interface, special API: J2EE, PHP, ASP, Rails)
- Runtime support
- Support services (MapReduce)

Features of Cloud and Grid Platforms: Infrastructure Cloud Features
- Accounting
- Appliances (VM, Message Passing Interface (MPI))
- Authentication and authorization
- Data transport
- Operating systems: Apple, Android, Linux, Windows
- Program library
- Registry
- Security
- Scheduling
- Gang scheduling (multiple data-parallel tasks in a scalable fashion; provided automatically by MapReduce)
- Software as a Service (SaaS)
- Virtualization

Features of Cloud and Grid Platforms: Cloud Capabilities and Platform Features
- Azure's platform features: Azure Table, Queues, Blobs (Binary Large Objects: images, audio, multimedia objects), SQL Database, Web and Worker roles
- Amazon's platform features: IaaS, SimpleDB, queues, notification, monitoring, content delivery network, relational database, MapReduce (Hadoop)
- Google's platform features: Google App Engine (GAE)

Features of Cloud and Grid Platforms: Workflow
- Workflow links multiple cloud and noncloud services in real applications on demand.
- Examples: Pegasus, Taverna, Kepler
- Commercial systems: Pipeline Pilot, AVS, LIMS environments

MapReduce Framework
A software framework that supports parallel and distributed computing on large data sets. It provides users with two interfaces in the form of two functions:
- Map()
- Reduce()

[Figure: Control Flow Implementation of MapReduce]
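To make the two-function interface concrete, here is a minimal sequential sketch in Python. It is not the framework's actual API: `run_mapreduce`, `map_fn`, `reduce_fn`, and the sensor-reading records are all hypothetical names and data chosen for illustration. The driver simply applies the user's Map function to every input record, groups the intermediate values by key, and hands each group to the user's Reduce function.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Sequential stand-in for the framework (illustrative only).

    records   : iterable of (key, value) input pairs
    map_fn    : user function yielding intermediate (key, value) pairs
    reduce_fn : user function turning (key, list_of_values) into one result
    """
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    # Process the key groups in key order, one Reduce call per key.
    return {k: reduce_fn(k, vs) for k, vs in sorted(groups.items())}


# Hypothetical user code: find the maximum reading per sensor.
def map_fn(line_no, line):
    sensor, reading = line.split(",")
    yield sensor, float(reading)


def reduce_fn(sensor, readings):
    return max(readings)


records = [(0, "a,1.5"), (1, "b,0.5"), (2, "a,3.0")]
result = run_mapreduce(records, map_fn, reduce_fn)
print(result)
```

The point of the design is that the user writes only `map_fn` and `reduce_fn`; everything inside `run_mapreduce` is what a real framework would distribute across workers.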


[Table 6.5: Comparison of MapReduce Type Systems]

Figure 6.2 Logical Data Flow in 5 Processing Steps in MapReduce Processing Stages
(Key, Value) pairs are generated by the Map function over multiple available Map Workers (VM instances). These pairs are then sorted and grouped based on key ordering. Different key-groups are then processed by multiple Reduce Workers in parallel.
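The sort-and-group stage and the parallel Reduce stage described for Figure 6.2 can be sketched as follows. This is only an illustration on one machine: the helper names `sort_and_group` and `reduce_group` are hypothetical, the sample pairs are made up, and a thread pool stands in for the separate Reduce Workers (VM instances).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter


def sort_and_group(pairs):
    """Sort intermediate pairs by key, then collect the values of each key."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (key, [value for _, value in group])
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


def reduce_group(group):
    """Reduce one key group; summing is just an example aggregation."""
    key, values = group
    return key, sum(values)


# Intermediate pairs as they might arrive from several Map Workers.
map_output = [("b", 2), ("a", 1), ("b", 3), ("c", 4), ("a", 5)]

groups = sort_and_group(map_output)

# Each key group is independent, so the groups can be reduced in parallel.
with ThreadPoolExecutor(max_workers=3) as pool:
    reduced = list(pool.map(reduce_group, groups))

print(reduced)
```

Because no two groups share a key, the Reduce calls need no coordination with each other, which is what allows the stage to scale across workers.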

Figure 6.3 A Word Counting Example on <Key, Count> Distribution

MapReduce Actual Data and Control Flow
1. Data Partitioning
2. Computation Partitioning
3. Determining the Master and Workers
4. Reading the Input Data (data distribution)
5. Map function
6. Combiner function
7. Partitioning function
8. Synchronization
9. Communication
10. Sorting and Grouping
11. Reduce function
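A word count in the spirit of Figure 6.3 can illustrate a few of the steps listed above (data partitioning, Map, Combiner, sorting and grouping, Reduce). The sketch below runs in a single process, so master selection, synchronization, and communication are not modeled. All function names and the two sample lines are hypothetical.

```python
from collections import Counter, defaultdict


def partition_input(lines, num_splits):
    """Step 1: divide the input lines into num_splits chunks."""
    return [lines[i::num_splits] for i in range(num_splits)]


def map_words(line):
    """Step 5: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def combine(pairs):
    """Step 6: pre-aggregate counts locally to shrink the data sent onward."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def reduce_counts(word, counts):
    """Step 11: add up the partial counts for one word."""
    return word, sum(counts)


def word_count(lines, num_splits=2):
    grouped = defaultdict(list)
    for split in partition_input(lines, num_splits):
        pairs = [pair for line in split for pair in map_words(line)]
        for word, count in combine(pairs):
            grouped[word].append(count)
    # Step 10: visit the keys in sorted order before reducing.
    return dict(reduce_counts(w, c) for w, c in sorted(grouped.items()))


lines = ["the cloud runs the job", "the job runs"]
counts = word_count(lines)
print(counts)
```

The combiner is optional for correctness here because addition is associative and commutative; its purpose is to reduce the volume of intermediate pairs before they reach the Reduce step.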

Figure 6.4 Use of MapReduce partitioning function to link the Map and Reduce workers

(Courtesy of Jeffrey Dean, Google, 2008)
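The slide does not spell out the partitioning function itself. A common default, and the one assumed in this sketch, is a hash of the key taken modulo the number of Reduce workers R. The names `partition` and `route` are hypothetical, and `zlib.crc32` is used only because it gives the same value on every run (Python's built-in `hash` is randomized for strings between processes).

```python
import zlib


def partition(key: str, num_reducers: int) -> int:
    """Assumed default: stable hash of the key, modulo the number of reducers."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def route(pairs, num_reducers):
    """Place each intermediate pair in the region read by its Reduce worker."""
    regions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        regions[partition(key, num_reducers)].append((key, value))
    return regions


# Intermediate pairs produced by one Map worker (made-up sample).
pairs = [("the", 1), ("job", 1), ("the", 1), ("cloud", 1)]
regions = route(pairs, num_reducers=3)

for index, region in enumerate(regions):
    print(index, region)
```

Whatever function is chosen, the property that matters is that every Map worker sends a given key to the same Reduce worker, so that all values for that key meet in one place.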

Hadoop
A software platform originally developed by Yahoo to enable users to write and run applications over vast distributed data.

Attractive Features in Hadoop:
- Scalable
- Economical: an open-source MapReduce
- Efficient
- Reliable

Apache Hadoop Architecture

Figure 6.11 HDFS (Hadoop DFS) and MapReduce Architecture

Figure 6.12 Data flow in running a MapReduce Architecture

MapReduce and Extensions

Conclusion