Cloud Computing for Science
Keith R. Jackson, krjackson@lbl.gov
Computational Research Division

Why Clouds for Science?
- On-demand access to computing and cost associativity
- Parallel programming models for data-intensive science, e.g., BLAST parametric runs
- Customized and controlled environments, e.g., Supernova Factory codes are sensitive to OS/compiler versions
- Overflow capacity to supplement existing systems, e.g., the Berkeley Water Center has analysis needs that far exceed the capacity of desktops

Cloud Early Evaluations
- How do DOE/LBL workloads perform in cloud environments?
- What is the impact on performance from virtual environments?
- What do cloud programming models like Hadoop offer to scientific experimentation?
- What does it take for scientific applications to run in cloud environments such as Amazon EC2?
- Do clouds provide an alternative for data-intensive interactive science?

Scientific Workloads at LBL
- High-performance computing codes supported by NERSC and other supercomputing centers
- Mid-range computing workloads that are serviced by LBL/IT Services and other local cluster environments
- Interactive data-intensive processing, usually run on scientists' desktops

NERSC-6 Benchmarking
- Subset of the NERSC-6 application benchmarks run on EC2 with smaller input sizes
  - represent the requirements of the NERSC workload
  - rigorous process for selection of codes; workload and algorithm/science-area coverage
- Run on EC2 High-CPU XL (64-bit) nodes
  - Intel C/Fortran compilers
  - OpenMPI patched for cross-subnet communication
  - $0.80/hour

Experiments on Amazon EC2 (slowdown and SSP reduction factor are relative to Franklin):
- CAM, Climate (BER): Navier-Stokes CFD; 200 processors, standard IPCC D mesh resolution; slowdown 3.05; SSP reduction factor 0.33. Could not complete the 240-processor run due to transient node failures; some I/O and small messages.
- MILC, Lattice Gauge Physics (NP): conjugate gradient, sparse matrix, FFT; weak scaled: 14^4 lattice on 8, 32, 64, 128, and 256 processors; slowdown 2.83; SSP reduction factor 0.35. Erratic execution time.
- IMPACT-T, Accelerator Physics (HEP): PIC, FFT component; 64 processors, 64x128x128 grid and 4M particles; slowdown 4.55; SSP reduction factor 0.22. PIC portion performs well, but the 3D FFT is poor due to small message sizes.
- MAESTRO, Astrophysics (HEP): low Mach hydro, block-structured-grid multiphysics; 128 processors for a 128^3 computational mesh; slowdown 5.75; SSP reduction factor 0.17. Small messages and allreduce for the implicit solve.

Mid-range codes on Amazon EC2
- Lawrencium cluster: 64-bit, dual sockets per node, 8 cores per node, 16 GB memory, InfiniBand interconnect
- EC2: 64-bit, 2 cores per node, 7.5 GB, 15 GB, and 7 GB memory
Code (description): slowdown factor
- FMMSpeed (fast multipole method; Pthreads parallel code with ~0.5 GB of I/O): 1.3 to 2.1
- GASBOR (genetic-algorithm ab initio reconstruction; serial workload, minimal I/O, KBs): 1.12 to 3.67
- ABINIT (DFT code that calculates the energy, charge density, and electronic structure of molecules and periodic solids; parallel MPI, minimal I/O): 1.11 to 2.43
- HPCC (HPC Challenge benchmark): 2.8 to 8.8
- VASP (simulates properties of systems at the atomic scale; MPI parallel application): 14.2 to 22.4
- IMB (Intel MPI Benchmarks, formerly Pallas; Alltoall among all MPI tasks): 12.7 to 15.79

Performance Observations
- Set up to look like a conventional GigE cluster, EC2 achieves 0.26 of the SSP (Sustained System Performance) of Franklin per CPU, but one must also evaluate throughput per dollar
- Performance characteristics:
  - good TCP performance for large messages
  - non-uniform execution times (VMMs have lots of noise/jitter)
  - high overhead for small messages
  - no OS bypass (it's a VMM), so no efficient one-sided messaging
  - poor shared-disk I/O (good local I/O)
- Needed: bare-metal access to hardware and a more robust (InfiniBand) interconnect
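As a sanity check of my own (not from the slides): NERSC's SSP-style aggregation uses a geometric mean across applications, and the geometric mean of the per-code SSP reduction factors listed in the EC2 experiments table above works out to roughly 0.26, consistent with the figure quoted here.

```python
# Sanity check (my arithmetic, not from the slides): the aggregate 0.26 figure
# is consistent with the geometric mean of the per-application SSP reduction
# factors from the EC2 experiments table.
from math import prod

factors = [0.33, 0.35, 0.22, 0.17]          # CAM, MILC, IMPACT-T, MAESTRO
geo_mean = prod(factors) ** (1.0 / len(factors))
print(f"geometric mean = {geo_mean:.2f}")   # prints 0.26
```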

What codes work well?
- Minimal synchronization, modest I/O requirements
- Large messages or very little communication
- Low core counts (non-uniform execution and limited scaling)
- Generally, applications that would do well on mid-range clusters, which today mostly run on LBL/IT and local cluster resources

Integrated Microbial Genomes (IMG)
- Goal: improving the overall quality of microbial genome data
  - supporting the comparative analysis of metagenomes and genomes in IMG, together with all available GEBA genomes
- Large amount of sequencing of microbial genomes and metagenome samples using BLAST
  - the computation, scheduled within a certain time range, takes about 3 weeks on a modest-sized Linux cluster and is projected to exceed current computing resources
- What can we do to help such applications? Do cloud computing and tools such as Hadoop help manage the task farming?

Hardware Platforms
- Franklin: traditional HPC system
  - 40k-core, 360 TFLOP Cray XT4 system at NERSC
  - Lustre parallel filesystem
- Planck: traditional mid-range cluster
  - 32-node Linux/x86/InfiniBand cluster at NERSC
  - GPFS global filesystem and Hadoop on Demand (HOD)
- Amazon EC2: commercial Infrastructure-as-a-Service cloud
  - configure and boot customized virtual machines in the cloud
  - Elastic MapReduce/Hadoop images and S3 for the parallel filesystem
- Yahoo! M45: shared research Platform-as-a-Service cloud
  - 400 nodes, 8 cores per node, Intel Xeon E5320, 6 GB per compute node, 910.95 TB
  - Hadoop/MapReduce service: HDFS and a shared file system

Software Platforms
- NCBI BLAST (2.2.22)
  - reference IMG genomes: 6.5 million genes (~3 GB in size)
  - full input set: 12.5 million metagenome genes run against the reference
- BLAST task-farming implementation (a minimal illustrative sketch follows this list)
  - server reads inputs and manages the tasks
  - client runs BLAST, copies the database to local disk or ramdisk once on startup, and pushes results back
  - advantages: fault resilient and allows incremental expansion as resources become available
- Hadoop/MapReduce implementation of BLAST
  - Hadoop is an open-source implementation of MapReduce, a software framework for processing huge datasets
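For illustration only (this is not the IMG production code), here is a minimal sketch of the task-farming pattern described above, using Python's multiprocessing managers. The queue names, chunk-file layout, database path, and blastall command line are hypothetical placeholders.

```python
# Minimal task-farming sketch (illustrative only, not the IMG production code).
# A server holds a queue of query-chunk files; clients pull chunks, run BLAST
# against a locally cached database, and push result paths back.
import glob
import subprocess
import sys
from multiprocessing.managers import BaseManager
from queue import Queue

class FarmManager(BaseManager):
    """Manager that serves shared task/result queues over the network."""

def run_server(host="", port=50000, authkey=b"blastfarm"):
    tasks, results = Queue(), Queue()
    FarmManager.register("get_tasks", callable=lambda: tasks)
    FarmManager.register("get_results", callable=lambda: results)
    for chunk in sorted(glob.glob("queries/chunk_*.fasta")):  # hypothetical layout
        tasks.put(chunk)
    server = FarmManager(address=(host, port), authkey=authkey).get_server()
    server.serve_forever()  # serve until killed; results drain via get_results

def run_client(server_host, port=50000, authkey=b"blastfarm"):
    FarmManager.register("get_tasks")
    FarmManager.register("get_results")
    mgr = FarmManager(address=(server_host, port), authkey=authkey)
    mgr.connect()
    tasks, results = mgr.get_tasks(), mgr.get_results()
    # Copy the reference database to ramdisk once on startup (as on the slide).
    subprocess.run(["cp", "refdb/img_genes.fasta", "/dev/shm/"], check=True)
    while True:
        try:
            chunk = tasks.get_nowait()
        except Exception:  # queue drained
            break
        out = chunk + ".blast.out"
        # Hypothetical legacy blastall invocation; adjust to the real setup.
        subprocess.run(["blastall", "-p", "blastp", "-d", "/dev/shm/img_genes.fasta",
                        "-i", chunk, "-o", out], check=True)
        results.put(out)

if __name__ == "__main__":
    if sys.argv[1] == "server":
        run_server()
    else:
        run_client(sys.argv[2])
```

Because clients only pull work from the shared queue, a failed client simply stops taking tasks, and new clients can join at any time, which is the fault resilience and incremental expansion noted above.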

Hadoop Processing Model
- Advantages of Hadoop
  - broadly supported on cloud platforms
  - transparent data replication
  - data-locality-aware scheduling
  - fault-tolerance capabilities
  - dynamic resource management for growing resource pools
- Implementation details (an illustrative mapper sketch follows this list)
  - use streaming to launch a script that calls the BLAST executable
  - HDFS for input; need a shared file system for the binary and the database
  - each sequence needs to be on a single line to use the standard input-format reader
  - custom input-format reader that can understand BLAST sequences
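A minimal sketch of what such a streaming mapper might look like (not the authors' actual script). It assumes each input line has been flattened to "id<TAB>sequence" as described above, and that the blastall binary and reference database live on a file system visible to every node; the paths and options are hypothetical.

```python
#!/usr/bin/env python
# Illustrative Hadoop streaming mapper (not the production script).
# Assumes one query per input line, formatted as "<sequence_id>\t<sequence>",
# and a shared-filesystem path to the BLAST database.
import subprocess
import sys
import tempfile

DB = "/global/shared/img_genes.fasta"  # hypothetical shared-filesystem path

def main():
    for line in sys.stdin:
        line = line.rstrip("\n")
        if not line:
            continue
        seq_id, seq = line.split("\t", 1)
        # Write the single query to a temporary FASTA file for blastall.
        with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as f:
            f.write(f">{seq_id}\n{seq}\n")
            query = f.name
        # Hypothetical legacy blastall invocation; -m 8 requests tabular output.
        proc = subprocess.run(
            ["blastall", "-p", "blastp", "-d", DB, "-i", query, "-m", "8"],
            capture_output=True, text=True)
        # Emit "query_id <TAB> hit_line" key/value pairs for Hadoop to collect.
        for hit in proc.stdout.splitlines():
            print(f"{seq_id}\t{hit}")

if __name__ == "__main__":
    main()
```

Such a script would be launched with the standard Hadoop streaming jar (roughly: hadoop jar hadoop-streaming.jar -input <hdfs queries> -output <hdfs results> -mapper blast_mapper.py), with the custom input-format reader substituted when sequences cannot be forced onto single lines.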

Performance Comparison
- Evaluated a small-scale problem (2,500 sequences) on multiple platforms (limited by access and costs)
- Similar per-core performance across platforms
[Chart: BLAST run time in seconds versus number of cores (32, 64, 128) for EC2 Hadoop, Planck Hadoop on Demand (HOD), Planck task farming, and Franklin task farming.]

Supernova Factory
- Tools to measure the expansion of the universe and dark energy
  - image-matching algorithms
  - data pipeline, task-parallel workflow
  - large data volume for the supernova search
- Using Amazon EC2
  - stable 32-bit Linux computing environment
- Data requirements
  - about 0.5 TB of existing data
  - about 1 TB of storage for 12 months and 1 TB of transfer out of the cloud
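To make the storage and transfer numbers above concrete, here is a rough cost sketch; the per-GB rates are placeholder assumptions (not actual AWS prices from the period), so only the shape of the calculation is meaningful.

```python
# Rough cost sketch for the Supernova Factory storage/transfer requirements.
# The rates below are PLACEHOLDER assumptions, not real AWS pricing; substitute
# current S3/EC2 rates to produce an actual estimate.
S3_PER_GB_MONTH = 0.15   # assumed storage price, $/GB-month
XFER_OUT_PER_GB = 0.15   # assumed data-transfer-out price, $/GB

storage_gb, months = 1000, 12   # ~1 TB stored for 12 months (from the slide)
transfer_gb = 1000              # ~1 TB transferred out of the cloud

storage_cost = storage_gb * months * S3_PER_GB_MONTH
transfer_cost = transfer_gb * XFER_OUT_PER_GB
print(f"storage ~ ${storage_cost:,.0f}, transfer ~ ${transfer_cost:,.0f}")
```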

Berkeley Water Center
- Studying global-scale environmental processes
  - integration of local, regional, and global spatial scales
  - integration across disciplines (e.g., climatology, hydrology, forestry) and methodologies
- Common eco-science data infrastructure
  - address quality, heterogeneity, and scale
  - interfaces and services for accessing and processing data

MODerate resolution Imaging Spectroradiometer (MODIS)
- Two MODIS satellites in near-polar orbits; global coverage every one to two days
- Data integration challenges
  - ~35 science data products, including atmospheric and land products
  - products are in different projections, resolutions (spatial and temporal), and different times
  - data volume and processing requirements exceed desktop capacity

Windows Azure Cloud Solution
- Lower resource entry barriers
- Hide the complexities of data collection, reprojection, and management from domain scientists
- A generic Reduction Service lets scientists upload arbitrary executables to perform scientific analysis on reprojected data
- 90x improvement over a scientist's desktop
[Pipeline: MODIS source data -> Windows Azure cloud computing platform data-processing pipeline -> scientific results]

An Enabling Service for Scientists
- Gives scientists the ability to do analyses that were not possible before
- Programming model: future experimentation with Dryad/MapReduce frameworks
- Interactive cases: need to refine intermediate data products and upload executables to perform scientific analysis on the data

DOE Cloud Research: Magellan Project
- DOE Advanced Scientific Computing Research (ASCR)
- $32.8M project at NERSC and Argonne (ALCF)
- ~100 TF/s compute cloud testbed (across sites)
- Petabyte-scale storage cloud testbed
Mission
- Deploy a testbed cloud to serve the needs of mid-range scientific computing
- Evaluate the effectiveness of this system for a wide spectrum of DOE/SC applications in comparison with other platform models
- Determine the appropriate role for commercial and/or private cloud computing for DOE/SC mid-range workloads

NERSC Magellan Cluster
- 720 nodes, 5,760 cores in 9 Scalable Units (SUs); 61.9 teraflops
- SU = IBM iDataPlex rack with 640 Intel Nehalem cores
- 18 login/network nodes
[System diagram: the 9 SUs connect through login, network, and I/O nodes over 10G Ethernet and 8G Fibre Channel to an Internet-facing load balancer, a 100G router to ANI, the NERSC Global Filesystem, HPSS (15 PB), and 1 petabyte of GPFS storage.]

NERSC Magellan Research Questions
- What are the unique needs and features of a science cloud?
- What applications can efficiently run on a cloud?
- Are cloud computing APIs such as Hadoop effective for scientific applications?
- Can scientific applications use a DaaS or SaaS model?
- Is it practical to deploy cloud services across multiple DOE sites?
- What are the security implications of user-controlled cloud images?
- What is the cost and energy efficiency of clouds?

Summary
- Cloud environments impact performance; ongoing work to improve these environments for scientific applications
- Cloud tools require customizations to be suitable for scientific data processing
- Rethinking the service model: support for interactive applications and dynamic software environments

Acknowledgements
- NERSC benchmarks: Harvey Wasserman, John Shalf
- IT benchmarks: Greg Bell, Keith Greg Kurtzer, Krishna Muriki, John White
- BLAST on Hadoop: Victor Markowitz, John Shalf, Shane Canon, Lavanya Ramakrishnan, Shreyas Cholia, Nick Wright
- Supernova Factory on EC2: Rollin Thomas, Greg Aldering, Lavanya Ramakrishnan
- Berkeley Water Center: Deb Agarwal, Catharine van Ingen (MSR), Jie Li (UVa), Youngryel Ryu (UCB), Marty Humphrey (UVa), Windows Azure team