S06: Open-Source Stack for Cloud Computing


Milind Bhandarkar (Yahoo!), Richard Gass (Intel), Michael Kozuch (Intel), Michael Ryan (Intel)

Agenda

Sessions:
- (A) Introduction: 8.30-9.00
- (B) Hadoop: 9.00-10.00
- Break: 10.00-10.15
- Hadoop: 10.15-11.30
- Lunch: 11.30-12.30
- (C) Pig: 12.30-1.30
- Break: 1.30-1.45
- (D) Tashi: 1.45-3.30
- Break: 3.30-3.45
- (E) PRS: 3.45-5.00

I. Speaker intros
II. Motivation
III. Open Cirrus
IV. Open Cirrus software stack
V. Getting involved

Session A: Introduction

Michael Kozuch (Intro)
- Michael Kozuch is a Principal Engineer with Intel Labs Pittsburgh and manager of the ILP Systems Research and Engineering group
- Manages the Intel Open Cirrus cluster and is the PI for the Tashi research project
- Michael is a 12-year veteran of Intel and contributed to the development of Intel's VT and TXT technologies
- He has published 25+ scientific papers and 20+ patents

Milind Bhandarkar (Hadoop)
- Lead of the Yahoo! Grid Solutions Team since June 2005
- Contributor to Hadoop since January 2006
- Trained 1000+ Hadoop users at Yahoo! & elsewhere
- 20+ years of experience in Parallel Programming

Michael Ryan (Tashi)
- Michael is currently a research engineer with Intel Labs Pittsburgh
- Lead developer for Tashi
- Serves as sysadmin for the Intel Open Cirrus site
- Coordinates the Global Monitoring service for Open Cirrus

Richard Gass (PRS)
- Richard is currently a research engineer with Intel Labs Pittsburgh
- Lead developer for PRS
- Serves as sysadmin for the Intel Open Cirrus site
- Richard has published 9+ scientific papers and is also an (imminent) PhD candidate with University Pierre and Marie Curie, LIP6, in Paris

Motivation

Why Open and Cloud makes sense
- Cloud Computing is a new, critical technology
  - Efficiency: Admin costs aggregated
  - Scalability: From 1 to 1000 servers in 10 sec. flat
  - Empowerment: Anyone can buy a cluster
- Open Communities enable rapid innovation
  - Exchange of ideas: Knowledge grows
  - Constructive Darwinism: Best tools survive/evolve
  - Empowerment: Anyone can build a LAMP stack
- Rapidly developing and deploying innovative computing technologies

Research Interest: Big Data
- Interesting applications are data hungry
- The data grows over time
- The data is immobile
  - 100 TB @ 1 Gbps ~= 10 days
- Compute comes to the data
- Big Data clusters are the new libraries (Data-Rich Computing theme proposal. J. Campbell, et al., 2007)
- The value of a cluster is its data
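The "100 TB @ 1 Gbps ~= 10 days" figure can be checked with a short calculation. The sketch below is only a back-of-the-envelope check, not something from the slides: it assumes decimal terabytes (10^12 bytes), a link that runs at its full nominal rate, and no protocol overhead. The function name `transfer_days` is made up for illustration.

```python
SECONDS_PER_DAY = 86_400
TB = 10**12  # assumption: decimal terabyte, in bytes


def transfer_days(size_bytes: float, link_bits_per_sec: float) -> float:
    """Idealized time, in days, to push size_bytes over a link.

    Assumes the link is fully utilized and ignores protocol overhead,
    so real transfers would take longer.
    """
    seconds = size_bytes * 8 / link_bits_per_sec
    return seconds / SECONDS_PER_DAY


days = transfer_days(100 * TB, 1e9)  # 100 TB over 1 Gbps
print(f"{days:.1f} days")  # prints "9.3 days"
```

Under these idealized assumptions the transfer takes a little over nine days, which matches the slide's rounded figure and is the reason compute is moved to the data rather than the reverse.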

Open Cirrus

Open Cirrus Cloud Computing Testbed
- Collaboration between industry and academia, sharing
  - hardware infrastructure
  - software infrastructure
  - research
  - applications and data sets
- [Site map labels: UIUC*, KIT*, ISPRAS*, ETRI*, IDA*, MIMOS*]
- Sponsored by HP, Intel, and Yahoo! (with additional support from NSF)
- 9 sites currently, target of around 20 in the next two years

Open Cirrus Objectives
- Foster systems research around cloud computing
- Vendor-neutral open-source stacks and APIs for the cloud
- Expose research community to enterprise level requirements
- Provide realistic traces of cloud workloads

How are we unique?
- Support for systems research and applications research
- Federation of heterogeneous datacenters
- Collection of interesting data sets

Independently-managed sites providing a cooperative research testbed

User Access to Open Cirrus
- User access is organized around Research Projects
  - Led by Principal Investigator (PI)
- Project PIs apply to each site separately
  - Identifying additional team members
- Contact information for applications to each site is available on the Open Cirrus Web site (http://opencirrus.org)
- Each Open Cirrus site decides which users and projects get access to its site.

Open Cirrus* Research Projects
- Example research areas of interest
  - Datacenter federation
  - Datacenter management
  - Web services
  - Data-intensive systems
- Projects typically not of interest
  - Traditional HPC app development
  - Production apps looking for free cycles
  - Closed-source system development

Software Stack

Open Cirrus* Software Components
- Global Services
  - Single Sign-On
  - Global Monitoring
  - Global User Directories
- Site Services
  - Application Services (Hadoop)
  - Virtual Machine Allocation (AWS* Compatible, e.g. Tashi or Eucalyptus)
  - Data Location
  - Resource Telemetry
  - Billing/Accounting
- Compute Node Services
  - Cluster Storage (HDFS)
  - Physical Machine Allocation (PRS)

Physical Machine Allocation: PRS
- PRS dynamically divides compute nodes into isolated subdomains
- Provides each project with a mini-datacenter
- Isolation of experiments
- [Diagram labels: Open service research; Tashi development; Production storage service; Proprietary service research; Apps running in a VM mgmt infrastructure (e.g., Tashi, Eucalyptus); Open workload monitoring and trace collection]

Cluster Storage: HDFS
- Storage system aggregating standard devices
- High-performance, parallel access
- High data reliability through replication
- Exposing location information enables intelligent placement of computation
- [Diagram: a storage service spanning six nodes]
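The last point, that exposing block locations lets a scheduler place computation next to the data, can be illustrated with a toy placement function. This is a minimal sketch and not the HDFS or Hadoop API: the function `place_task`, the block ID, and the node names are all hypothetical, and a real scheduler would also weigh rack topology and load.

```python
from typing import Dict, List, Optional, Set


def place_task(block: str,
               replicas: Dict[str, List[str]],
               free_nodes: Set[str]) -> Optional[str]:
    """Pick a node for a task that reads `block` (toy model, not HDFS code).

    Prefer a free node that already stores a replica of the block, so the
    read is local. Otherwise fall back to any free node (remote read).
    Returns None when no node is free.
    """
    for node in replicas.get(block, []):
        if node in free_nodes:
            return node  # data-local placement
    # No replica holder is free: choose deterministically among the rest.
    return min(free_nodes) if free_nodes else None


# Hypothetical replica map: one block stored on three nodes.
replicas = {"blk_1": ["node2", "node4", "node5"]}

local = place_task("blk_1", replicas, {"node1", "node4"})   # "node4"
remote = place_task("blk_1", replicas, {"node1", "node3"})  # "node1"
```

Replication helps here as well as with reliability: the more nodes hold a copy of a block, the more likely it is that one of them is free when a task needs that block.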

Virtual Machine Allocation: Tashi
- An open source Apache Software Foundation incubator project
- Infrastructure for cloud computing on Big Data
  - http://incubator.apache.org/projects/tashi
- Support for AWS* interface
- OS, FS, and VMM agnostic
- Research focus:
  - Location-aware co-scheduling of compute, storage, and power
  - Seamless physical/virtual migration

Application Service: Hadoop
- An open-source Apache Software Foundation project sponsored by Yahoo!
  - http://hadoop.apache.org
- Provides a scalable, parallel programming model (MapReduce) and the associated runtime
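For readers new to the MapReduce programming model, the sketch below runs a word count through the three classic phases (map, shuffle, reduce) in a single Python process. It only illustrates the model and does not use Hadoop: the function names are invented for this example, and Hadoop's runtime would distribute the same phases across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce phase: combine the values for one key into a single total."""
    return word, sum(counts)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce sequentially in one process."""
    pairs = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(w, c) for w, c in shuffle(pairs).items())


result = run_job(["the data grows", "the data is immobile"])
```

Because each map call sees one line and each reduce call sees one key, both phases can run in parallel on separate machines, which is what makes the model scale.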

Getting Involved

Summary
- Open Communities can shape the development of Cloud Computing
- Open Cirrus* is a multi-partner test bed for research in Cloud Computing
- The Open Cirrus software stack provides a good starting point for open-source cloud computing software development

Getting Involved
- http://opencirrus.org
- Contact Open Cirrus* with research proposals
- Contribute to the Open Cirrus software stack
  - PRS, Tashi, Hadoop
  - Apache Software Foundation*