Cloud Computing for Science



Similar documents
How To Build A Cloud Computing System With Nimbus

Infrastructure-as-a-Service Cloud Computing for Science

Nimbus: Cloud Computing with Science

Cloud Computing with Nimbus

Cloud Computing with Nimbus

OSG All Hands Meeting

a Cloud Computing for Science

Science Clouds: Early Experiences in Cloud Computing for Scientific Applications Kate Keahey and Tim Freeman

SC12 Cloud Compu,ng for Science Tutorial: Introduc,on to Infrastructure Clouds

The idea of using remote resources. Sky Computing

Deploying Business Virtual Appliances on Open Source Cloud Computing

Amazon EC2 Product Details Page 1 of 5

Comparative Study of Eucalyptus, Open Stack and Nimbus

Evalua&ng Streaming Strategies for Event Processing across Infrastructure Clouds (joint work)

Proactively Secure Your Cloud Computing Platform

Pierre Riteau University of Chicago

Infrastructure Clouds for Science and Education: Platform Tools

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect

Clouds for research computing

Cloud Infrastructure Pattern

OpenNebula Leading Innovation in Cloud Computing Management

Open Cloud System. (Integration of Eucalyptus, Hadoop and AppScale into deployment of University Private Cloud)

The OpenNebula Standard-based Open -source Toolkit to Build Cloud Infrastructures

Comparison of Several Cloud Computing Platforms

Infrastructure Outsourcing in Mul2- Cloud Environment

Towards a New Model for the Infrastructure Grid

Cloud Computing: Computing as a Service. Prof. Daivashala Deshmukh Maharashtra Institute of Technology, Aurangabad

OpenNebula An Innovative Open Source Toolkit for Building Cloud Solutions

salsadpi: a dynamic provisioning interface for IaaS cloud

Cloud 101. Mike Gangl, Caltech/JPL, 2015 California Institute of Technology. Government sponsorship acknowledged

Elastic Site. Using Clouds to Elastically Extend Site Resources

Behind the scenes of IaaS implementations

Efficient Cloud Management for Parallel Data Processing In Private Cloud

Plug-and-play Virtual Appliance Clusters Running Hadoop. Dr. Renato Figueiredo ACIS Lab - University of Florida

Design and Building of IaaS Clouds

SC12 Cloud Compu,ng for Science Tutorial: Cloud Challenges

An Open Source Solution for Virtual Infrastructure Management in Private and Hybrid Clouds

CHAPTER 8 CLOUD COMPUTING

An Experimental Study of Load Balancing of OpenNebula Open-Source Cloud Computing Platform

Integrating Clouds and Cyberinfrastructure: Research Challenges

Advanced File Sharing Using Cloud

Li Sheng. Nowadays, with the booming development of network-based computing, more and more

VM Management for Green Data Centres with the OpenNebula Virtual Infrastructure Engine

Comparison and Evaluation of Open-source Cloud Management Software

Introduction to Cloud computing. Viet Tran

OpenNebula Cloud Innovation and Case Studies for Telecom

Cloud Computing Overview

Cloud Computing and Amazon Web Services

Capacity Leasing in Cloud Systems using the OpenNebula Engine

Building a Volunteer Cloud

Elastic Management of Cluster based Services in the Cloud

Virtual Machine Management with OpenNebula in the RESERVOIR project

DEVELOPMENT OF TWO MODELS OF DYNAMIC INTEGRATION OF CLOUD COMPUTING RESOURCES WITH ALIEN GRID

globus online Cloud Based Services for Science Steve Tuecke Computation Institute University of Chicago and Argonne National Laboratory

Scheduler in Cloud Computing using Open Source Technologies

Efficient Data Management Support for Virtualized Service Providers

Private Distributed Cloud Deployment in a Limited Networking Environment

Amazon Web Services. Elastic Compute Cloud (EC2) and more...

SERVER 101 COMPUTE MEMORY DISK NETWORK

Infrastructure for Cloud Computing

PoS(EGICF12-EMITC2)005

Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000

Comparison of Multiple Cloud Frameworks

Criteria for Evaluation of Open Source Cloud Computing Solutions

Private Clouds with Open Source

Key Research Challenges in Cloud Computing

Supporting Cloud Computing with the Virtual Block Store System

Dynamic Extension of a Virtualized Cluster by using Cloud Resources CHEP 2012

Dutch HPC Cloud: flexible HPC for high productivity in science & business

A Comparison of Clouds: Amazon Web Services, Windows Azure, Google Cloud Platform, VMWare and Others (Fall 2012)

Chapter 19 Cloud Computing for Multimedia Services

Scientific Cloud Computing: Early Definition and Experience

RemoteApp Publishing on AWS

Assignment # 1 (Cloud Computing Security)

Cloud Computing from an Institutional Perspective

Cloud Computing Architecture with OpenNebula HPC Cloud Use Cases

14 Published by the IEEE Computer Society /09/$ IEEE IEEE INTERNET COMPUTING

Cloud Computing For Bioinformatics

Cloud Computing: Making the right choices

Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon and Lavanya Ramakrishnan Cray XE6 Training February 8, 2011

Eucalyptus: An Open-source Infrastructure for Cloud Computing. Rich Wolski Eucalyptus Systems Inc.

Enabling Technologies for Cloud Computing

A STUDY ON OPEN SOURCE CLOUD COMPUTING PLATFORMS

Network Infrastructure Services CS848 Project

XtreemOS : des grilles aux nuages informatiques

Enabling multi-cloud resources at CERN within the Helix Nebula project. D. Giordano (CERN IT-SDC) HEPiX Spring 2014 Workshop 23 May 2014

A Very Brief Introduction To Cloud Computing. Jens Vöckler, Gideon Juve, Ewa Deelman, G. Bruce Berriman

Experimental Study of Bidding Strategies for Scientific Workflows using AWS Spot Instances

AC : EXPLORING THE USE OF VIRTUAL MACHINES AND VIRTUAL CLUSTERS FOR HIGH PERFORMANCE COMPUTING EDU- CATION.

Développement logiciel pour le Cloud (TLC)

OpenNebula Cloud Case Studies for Research and Industry

Contents. 1 Introduction 2

Last time. Today. IaaS Providers. Amazon Web Services, overview

Contents. What is Cloud Computing? Why Cloud computing? Cloud Anatomy Cloud computing technology Cloud computing products and market

Cloud Computing & Hosting Solutions

Eucalyptus: An Open-source Infrastructure for Cloud Computing. Rich Wolski Eucalyptus Systems Inc.

Cloud Models and Platforms

Open Cirrus: Towards an Open Source Cloud Stack

Lecture on Cloud Computing. Course 10. Academic cloud systems

Solution for private cloud computing

Transcription:

Cloud Computing for Science June 2009 21st International Conference on Scientific and Statistical Database Management Kate Keahey keahey@mcs.anl.gov Nimbus project lead University of Chicago Argonne National Laboratory

Cloud Computing is in the news is it good news for Science?

Cloud Computing for Science Complex codes Need for control

Grid Computing Assumption: control over the manner in which resources are used stays with the site Site A R R R Site B R R R VO-A Site-specific environment and mode of access Site-driven prioritization But: site control -> rapid adoption

Cloud Computing Change of assumption: control over the resource is turned over to the user Site A R R R Site B R R R VO-A Enabling factors: virtualization and isolation Challenges our notion of a site Lends itself to more explicit service level negotiation But: slow adoption

Grids to Clouds: a Personal Perspective First STAR production run on EC2 Xen released EC2 goes online Nimbus Cloud comes online 2003 2006 2009 A Case for Grid Computing on VMs In-Vigo, VIOLIN, DVEs, Dynamic accounts Policy-driven negotiation First WSRF Workspace Service release EC2 gateway available Context Broker release Support for EC2 interfaces

Benefits to Consumers Eliminate expense of acquiring, managing and operating hardware capital expense 6/5/09 Elastic computing Pay-as-you-go model operational expense The Nimbus Toolkit: http//workspace.globus.org

Benefits to Providers Economies of scale to amortize the costs of buying and operating resources Avoid cost and complexity of managing multiple customer-specific environments and applications Streamline and specialize

Unclouding the Cloud Software-as-a-Service (SaaS) Community-specific applications and portals Platform-as-a-Service (PaaS) Infrastructure-as-a-Service (IaaS) 6/5/09 The Nimbus Toolkit: http//workspace.globus.org

IaaS Infrastructure

Nimbus: Cloud Computing Software Nimbus goals: Allow providers to build clouds Private clouds (privacy, expense considerations) Workspace Service: open source EC2 implementation Allow users to use cloud computing Do whatever it takes to enable scientists to use IaaS Context Broker: turnkey virtual clusters Allow developers to experiment with Nimbus For research or usability/performance improvements Community extensions and contributions

The Workspace Service VWS Service

The Workspace Service The workspace service publishes information about each workspace Users can find out information about their workspace (e.g. what IP the workspace was bound to) VWS Service Users can interact directly with their workspaces the same way the would with a physical machine.

Cloud Computing Ecosystem Appliance Providers Marketplaces, commercial providers, Virtual Organizations Appliance management software Deployment Orchestrator VMM/DataCenter/IaaS User Environments VMM/DataCenter/IaaS User Environments

Turnkey Virtual Clusters IP1 HK1 IP2 HK2 IP3 HK3 IP1 HK1 IP1 HK1 IP1 HK1 IP2 HK2 IP2 MPI HK2 IP2 HK2 IP3 HK3 IP3 HK3 IP3 HK3 Context Broker Turnkey, tightly-coupled cluster Shared trust/security context Shared configuration/context information

What s in the Box? context broker EC2 WSRF storage service workspace service IaaS gateway workspace resource manager workspace pilot workspace control EC2 potentially other providers context client cloud client workspace client

Open Source IaaS Implementations OpenNebula Open source datacenter implementation University of Madrid, I. Llorente & team, 03/2008 Eucalyptus Open source implementation of EC2 UCSB, R. Wolski & team, 06/2008 Cloud-enabled Nimrod-G Open source implementation of EC2 Monash University, MeSsAGE Lab, 01/2009 Industry efforts openqrm, Enomalism

Nimbus: Extensions Nimbus core: Tim Freeman & David LaBissoniere (UC/ANL team) Nimbus monitoring: Ian Gable & team (UVIC ATLAS) Cumulus: Raj Kettimuthu and John Bresnahan (ANL) EBS: Marlon Pierce, Xiaoming Gao, Mike Lowe (IU) Others: OpenNebula project (University of Madrid) Descher et al (Technical U of Vienna): privacy extensions

Scientific Cloud Resources and Applications

Science Clouds Goals Enable experimentation with IaaS Evolve software in response to user needs Exploration of cloud interoperability issues Participants University of Chicago (since 03/08, 16 s), University of Florida (05/08, 16-32 s, access via VPN), Masaryk University, Brno, Czech Republic (08/08), Wispy @ Purdue (09/08) In progress: Grid5K, IU, Vrije Using EC2 for large runs Science Clouds Marketplace: OSG cluster, Hadoop, etc. Come and run: http://workspace.globus.org/clouds

Who Runs on Nimbus? 100+ DNs projects ranging across Science, CS, education, build&test

STAR experiment STAR: a nuclear physics experiment at Brookhaven National Laboratory Studies fundamental properties of nuclear matter Problem: computations require complex and consistently configured environments that are hard to find in existing grids

STAR Virtual Clusters Virtual resources A virtual OSG STAR cluster: OSG head (gridmapfiles, host certificates, NFS, Torque), worker s: SL4 + STAR One-click virtual cluster deployment via Nimbus Context Broker From Science Clouds to EC2 runs Running production codes since 2007 Work by Jerome Lauret, Leve Hajdu, Lidia Didenko (BNL), Doug Olson (LBNL) The Quark Matter run: producing just-in-time results for a conference: http://www.isgtw.org/?pid=1001735

STAR Quark Matter Run Gateway/ Context Broker Infrastructure-as-a-Service

STAR Quark Matter Run (2) Application stats: Processed 1.2 M events Moved ~1TB of data over duration (small I/O needs) Run facts: 300+ s over ~10 days Instances, 32-bit, 1.7 GB memory: Cost: EC2 default: 1 EC2 CPU unit High-CPU Medium Instances: 5 EC2 CPU units (2 cores) Comp: ~ $6,000: ~ $1,7 K (default) + ~ $3,9K (medium) Data: ~ $150

A Large Ion Collider Experiment (ALICE) Heavy ion simulations at CERN Problem: integrate elastic computing into current infrastructure Collaboration with CernVM project With Artem Harutyunyan and Predrag Buncic

Elastic Provisioning for ALICE HEP ALICE queue queue sensor AliEn Context Broker Infrastructure-as-a-Service

Alice HEP Experiment at CERN CHEP09 paper, Harutyunyan et al. Can we elastically extend a local scheduler?

Sky Computing Change of assumption: we can now trust remote resources Site A R R R Site B R R R VO-A Enabling factors: cloud computing and virtual networks Network leases would help Instead of a bunch of disonnected domains, one domain overlapping the Internet

Sky Computing Environment U of Chicago U of Florida ViNE router ViNE router ViNE router Purdue Work by A. Matsunaga, M. Tsugawa, University of Florida

Hadoop in the Science Clouds U of Chicago U of Florida Hadoop cloud Purdue Papers: Sky Computing, by K. Keahey, A. Matsunaga, M. Tsugawa, J. Fortes. Submitted to IEEE Internet Computing. CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications by A. Matsunaga, M. Tsugawa and J. Fortes. escience 2008.

Cloud Computing for Science: Issues and Challenges

Building the Ecosystem Configuring and maintaining appliances Not just VMs, a variety of formats CernVM, rbuilder (rpath) Licenses Still vendor-specific approaches Getting used to dynamic sites Host certificates and keys, community visibility, failure processing, etc. Infrastructure and leveraging

Security and Privacy Issues Leaks in isolation, improper use, new technology issues Lack of features Fine-grained authorization Paper: Palankar et al., Amazon S3 for Science Grids: a Viable Solution? Data privacy Paper: Descher et al., Retaining Data Control in Infrastructure Clouds, ARES (the International Dependability Conference), 2009.

Performance Difficult to track in a virtualized environment I/O can be an issue Tradeoffs between CPU power and throughput New networking solutions Several studies of cloud performance One paper: Walker, Benchmarking Amazon EC2 for high-performance scientific computing Low bandwidth from existing providers: 2-5 MB/sec, 17/21 MB/sec, 30MB/sec Generally speaking, the existing cloud providers do not offer a very high-end computer

Price Price for what? Experimenting with business models Estimating the cost is hard Price of Base Services for AWS: Computation / EC2 On-demand: starting at $0.1 per hour Reserved: starting at $325 per year for $0.03 per hour Data / S3 Storage: $0.15 per GB/month, Transfer: $0.17 per GB AWS import/export for bulk Hosting Scientific datasets for free Free on AWS for frequently used datasets

Service Levels Service levels Computation: immediate, advance reservations, best-effort Data: durability, high/low availability, access performance Cross-cutting concern: security and privacy Different price points for different availability

Parting Thoughts IaaS cloud computing is science-driven Scientific applications are successfully using the existing infrastructure for production runs Many more could be using it, but challenges exist Project for the next few years: solve them!