Volunteer Computing and Cloud Computing: Opportunities for Synergy

Similar documents

Cost-Benefit Analysis of Cloud Computing versus Desktop Grids

Volunteer Computing, Grid Computing and Cloud Computing: Opportunities for Synergy. Derrick Kondo INRIA, France

Cost-Benefit Analysis of Cloud Computing versus Desktop Grids. By : Paritosh Heera( MT )

How To Compare Amazon Ec2 To A Supercomputer For Scientific Applications

The Lattice Project: A Multi-Model Grid Computing System. Center for Bioinformatics and Computational Biology University of Maryland

Cloud Computing with Red Hat Solutions. Sivaram Shunmugam Red Hat Asia Pacific Pte Ltd.

Building a Volunteer Cloud

Grid Computing vs Cloud

Manjrasoft Market Oriented Cloud Computing Platform

IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud

CloudFTP: A free Storage Cloud

<Insert Picture Here> Considerations for Enterprise Cloud Computing

Grid Computing Perspectives for IBM

CUMULUX WHICH CLOUD PLATFORM IS RIGHT FOR YOU? COMPARING CLOUD PLATFORMS. Review Business and Technology Series

How To Understand Cloud Computing

System Models for Distributed and Cloud Computing

Manjrasoft Market Oriented Cloud Computing Platform

Chapter 4 Cloud Computing Applications and Paradigms. Cloud Computing: Theory and Practice. 1

How Liferay Is Improving Quality Using Hundreds of Jenkins Servers

Network Infrastructure Services CS848 Project

Cloud Computing For Bioinformatics

9/26/2011. What is Virtualization? What are the different types of virtualization.

Part V Applications. What is cloud computing? SaaS has been around for awhile. Cloud Computing: General concepts

Mobile Cloud Computing: Paradigms and Challenges 移动云计算 : 模式与挑战

Scientific and Technical Applications as a Service in the Cloud

Large-Scale Data Processing

Using GPUs in the Cloud for Scalable HPC in Engineering and Manufacturing March 26, 2014

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging

Solution for private cloud computing

Elastic Cloud Computing in the Open Cirrus Testbed implemented via Eucalyptus

Cloud Computing. Adam Barker

Analysis and Research of Cloud Computing System to Comparison of Several Cloud Computing Platforms

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist

CLOUD GAMING WITH NVIDIA GRID TECHNOLOGIES Franck DIARD, Ph.D., SW Chief Software Architect GDC 2014

A Gentle Introduction to Cloud Computing

WORKFLOW ENGINE FOR CLOUDS

Network & HEP Computing in China. Gongxing SUN CJK Workshop & CFI

Open Cloud Computing A Case for HPC CRO NGI Day Zagreb, Oct, 26th

Code and Process Migration! Motivation!

HYPER-CONVERGED INFRASTRUCTURE STRATEGIES

Preview of Oracle Database 12c In-Memory Option. Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Private Clouds with Open Source

Amazon Web Services 100 Success Secrets

Virtual Client Solution: Desktop Virtualization

Cloud Computing. Alex Crawford Ben Johnstone

Deep Learning Meets Heterogeneous Computing. Dr. Ren Wu Distinguished Scientist, IDL, Baidu

Cloud Computing and Amazon Web Services. CJUG March, 2009 Tom Malaher

Cloud Computing Now and the Future Development of the IaaS

Scalable Application. Mikalai Alimenkou

Using Proxies to Accelerate Cloud Applications

Cloud Design and Implementation. Cheng Li MPI-SWS Nov 9 th, 2010

Cluster, Grid, Cloud Concepts

Chapter 19 Cloud Computing for Multimedia Services

Cloud Computing Architecture with OpenNebula HPC Cloud Use Cases

SURFsara HPC Cloud Workshop

GTC Presentation March 19, Copyright 2012 Penguin Computing, Inc. All rights reserved

Big Data Database Revenue and Market Forecast,

SLA Driven Load Balancing For Web Applications in Cloud Computing Environment

High Performance Computing in CST STUDIO SUITE

SURFsara HPC Cloud Workshop

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

Amazon EC2 Product Details Page 1 of 5

Data Centers and Cloud Computing. Data Centers. MGHPCC Data Center. Inside a Data Center

Data Semantics Aware Cloud for High Performance Analytics

Deploying Business Virtual Appliances on Open Source Cloud Computing

An Introduction to Virtualization and Cloud Technologies to Support Grid Computing

Using WebSphere Application Server on Amazon EC2. Speaker(s): Ed McCabe, Arthur Meloy

Stream Processing on GPUs Using Distributed Multimedia Middleware

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering

Data Centers and Cloud Computing. Data Centers

Cloud Computing: Making the right choices

LSKA 2010 Survey Report Job Scheduler

This presentation covers virtual application shared services supplied with IBM Workload Deployer version 3.1.

HPC technology and future architecture

SAP HANA SAP s In-Memory Database. Dr. Martin Kittel, SAP HANA Development January 16, 2013

Big Data Analytics - Accelerated. stream-horizon.com

Transcription:

Volunteer Computing and Cloud Computing: Opportunities for Synergy Derrick Kondo INRIA, France

Performance vs. Reliability vs. Costs high Cost Reliability high low low low Performance high

Performance vs. Reliability vs. Costs high Supercomputers Grids Clusters Cost Reliability high low low low Performance high

Performance vs. Reliability vs. Costs high Supercomputers Grids Clusters Cost Reliability high Clouds low low low Performance high

Performance vs. Reliability vs. Costs high Supercomputers Grids Clusters Cost Reliability high Clouds low low low Performance Volunteer Computing high

Performance vs. Reliability vs. Costs high Supercomputers Grids Clusters Cost Reliability high Clouds Clouds + Volunteer Computing low low low Performance Volunteer Computing high

Cost of 1 TFLOPS-year Cluster: $145K Computing hardware; power/ac infrastructure; network hardware; storage; power; sysadmin Cloud: $438K Volunteer: $1K - $10K Server hardware; sysadmin; web development

Performance Current 500K people, 1M computers 6.5 PetaFLOPS (3 from GPUs, 1.4 from PS3s) Potential 1 billion PCs today, 2 billion in 2015 GPU: approaching 1 TFLOPS How to get 1 ExaFLOPS: 4M GPUs * 0.25 availability How to get 1 Exabyte: 10M PC disks * 100 GB

History of volunteer computing 1995 2000 2005 now distributed.net, GIMPS s Middleware SETI@home, Folding@home Academic: Bayanihan, Javelin,... Climateprediction.net Predictor@home IBM World Community Grid Einstein@home Rosetta@home... Commercial: Entropia, United Devices,... BOINC

The BOINC computing volunteers ecosystem projects CPDN LHC@home attachments WCG Directed and developed by David Anderson, UC Berkeley Projects compete for volunteers Volunteers make their contributions count Optimal equilibrium

What apps work well? Bags of tasks parameter sweeps simulations with perturbed initial conditions compute-intensive data analysis Job granularity: minutes to months Native, legacy, Java, GPU soon: VM-based

Example projects Climateprediction.net Rosetta@home IBM World Community Grid Primegrid

Creating a volunteer computing project Server Set up a server Develop software for job submission and result handling System, DB admin (Linux, MySQL) Client Port applications, develop graphics Marketing Develop web site Publicity, volunteer communication

How many CPUs will you get? Depends on: PR efforts and success public appeal availability of internal resources 12 projects have > 10,000 active hosts 3 projects have > 100,000 active hosts

Organizational issues Creating a volunteer computing project has startup costs and requires diverse skills This limits its use by individual scientists and research groups Better model: umbrella projects Institutional Lattice Corporate IBM World Community Grid Community AlmereGrid

Summary Volunteer computing is an important paradigm for high-throughput computing price/performance performance potential Low technical barriers to entry (due to BOINC) Organizational structure is critical

Performance vs. Costs vs. Reliability high Supercomputers Grids Clusters Cost Reliability high Clouds Clouds + Volunteer Computing low low low Performance Volunteer Computing high

Cloud Background Vision Hide complexity of hardware and software management from a user by offering computing as a service Benefits Pay as you go Scale up or down dynamically No hardware management, less software management

Outline Performance tradeoffs Monetary tradeoffs Client hosting Server hosting

Apples to Apples Loosely-coupled, high-throughput, compute-intensive applications Tightly-coupled, data-intensive real-time applications low complexity high complexity VC s Clouds Comparison assuming embarrassingly parallel, compute-intensive applications

Method Use real performance measurements Exported BOINC project data Use real costs Large/small BOINC projects (SETI@home / XtremLab) Amazon Elastic Computing Cloud (EC2)

Stages of Project & Platform Construction Deployment Execution Completion

Platform Construction Deployment Execution Completion How long before I get X TeraFLOPS?

Platform Construction Deployment Execution Completion How long before I get X TeraFLOPS? 10 6 10 3 Number of cloud nodes 10 5 10 4 10 2 10 1 TeraFLOPS 10 3 0 5 10 15 20 25 30 Months for registration

Platform Construction Deployment Execution Completion How long before I get X TeraFLOPS? Number of cloud nodes 10 6 10 5 10 4 Can get over 20 TeraFLOPS within 6 months 10 3 10 2 10 1 TeraFLOPS 10 3 0 5 10 15 20 25 30 Months for registration

Platform Construction Deployment Execution Completion How long before I get X TeraFLOPS? Strategy: Add to BOINC project list Press releases Forum Announcements Google Ad Sense Respond to users (leverage volunteers) Number of cloud nodes 10 6 10 5 10 4 Can get over 20 TeraFLOPS within 6 months 10 3 0 5 10 15 20 25 30 Months for registration 10 3 10 2 10 1 TeraFLOPS

Platform Construction Deployment Execution Completion How long to deploy my batch of tasks needing faster response time?

Platform Construction Deployment Execution Completion How long to deploy my batch of tasks needing faster response time? 7,82/9*,34.4:*,.;*:3)4,6<!" '!" &!"!!" ".!"!!.!"".4=6>6!""".4=6>6!"""".4=6>6!" #!" $!" % ()*+,-./0.1/2)34,,-.3/5,6

Platform Construction Deployment Execution Completion How long to deploy my batch of tasks needing faster response time? 7,82/9*,34.4:*,.;*:3)4,6<!" '!" &!"!!" " For 1000 tasks, ~10 minutes with 10 5 hosts.!"!!.!"".4=6>6!""".4=6>6!"""".4=6>6!" #!" $!" % ()*+,-./0.1/2)34,,-.3/5,6

Platform Construction Deployment Execution Completion How long to deploy my batch of tasks needing faster response time? Strategy: Specify lower latency bounds [Heien et al.] 7,82/9*,34.4:*,.;*:3)4,6<!" '!" &!"!!" "!"!!. For 1000 tasks, ~10 minutes with 10 5 hosts!"".4=6>6!""".4=6>6!"""".4=6>6!" #!" $!" % ()*+,-./0.1/2)34,,-.3/5,6.

Platform Construction Deployment Execution Completion

Platform Construction Deployment Execution Completion How many volunteer nodes are equivalent to 1 cloud node?

Platform Construction Deployment Execution Completion How many volunteer nodes are equivalent to 1 cloud node?

Platform Construction Deployment Execution Completion How many volunteer nodes are equivalent to 1 cloud node? 2.8 active volunteer hosts per 1 cloud node. (Total performance still orders of magnitude better)

Platform Construction Deployment Execution Completion How many volunteer nodes are equivalent to 1 cloud node? Strategy: Use statistical prediction of availability 2.8 active volunteer hosts per 1 cloud node. (Total performance still orders of magnitude better)

Platform Construction Deployment Execution Completion

Platform Construction Deployment Execution Completion How long should I wait for task completion?

Platform Construction Deployment Execution Completion How long should I wait for task completion?

Platform Construction Deployment Execution Completion How long should I wait for task completion? Median project latency bound: 9 days for 3.7 hour work unit (on 3GHz host). Ratio of lat. bound / exec time > 5. Good success rates: 96.1% of WCG tasks met out of 227,000 tasks

Platform Construction Deployment Execution Completion How long should I wait for task completion? Strategy: See BOINC Catalog for typical deadlines and compute/comm/mem ratios. Median project latency bound: 9 days for 3.7 hour work unit (on 3GHz host). Ratio of lat. bound / exec time > 5. Good success rates: 96.1% of WCG tasks met out of 227,000 tasks

Monetary Tradeoffs Client hosting on cloud Possible solution Not worth it and never will Server hosting on the cloud More details in HCW, 2009 paper...

Performance vs. Costs vs. Reliability high Supercomputers Grids Clusters Cost Reliability high Clouds Clouds + Volunteer Computing low low low Performance Volunteer Computing high

Motivation Non-dedicated resources are cheap, but unreliable VC are cheaper by at least order of magnitude Amazon EC2 Spot instances at least 50% compared to dedicated ones Clouds are more expensive but highly reliable Amazon SLA: 99.95% availability s have different levels of performance, cost, and reliability requirements

Goal & Approach Given reliability requirements, what is the cheapest combination of dedicated and undedicated resources Improve reliability of VC resources with prediction and reliability ranking Compute optimal mixture that satisfies reliability requirement while minimizing cost

Resource Provisioning using Predictions User specifies availability guarantee (ag) and # of desired hosts (n). System determines # of given hosts (N) and # of dedicated hosts (d) N = # given hosts n = # desired hosts d = # dedicated hosts working set for intervali selection of non-d. hosts initial data migration host replacement and data migration!!!!!!!!!!!!!!!!!! non-dedicated hosts dedicated hosts i-th prediction interval (length pil in hours) selection of replacements for failed non-d. hosts

Pareto Optimal for Availability >=.90 for 50 requested hosts 30 undedicated hosts out of 55 given minimizes monetary costs

Summary Prediction: use of ranking to predict availability of a collection of hosts Operational model: identify cost-optimal mixture of undedicated and dedicated hosts given reliability requirements

Overall Summary Volunteer Computing Good for high-throughput applications at low cost Cloud Computing At least order of magnitude more expensive, but higher reliability Volunteer and Cloud Computing Can support applications with diverse reliability requirements while minimizing costs

Thank you Acknowledgement: David P. Anderson s material on BOINC