Cloud Computing for Science
Keith R. Jackson
krjackson@lbl.gov
Computational Research Division
Why Clouds for Science?
- On-demand access to computing and cost associativity
- Parallel programming models for data-intensive science, e.g., BLAST parametric runs
- Customized and controlled environments, e.g., Supernova Factory codes have sensitivity to OS/compiler versions
- Overflow capacity to supplement existing systems, e.g., Berkeley Water Center has analysis that far exceeds the capacity of desktops
Cloud Early Evaluations
- How do DOE/LBL workloads perform in cloud environments?
- What is the impact on performance from virtual environments?
- What do cloud programming models like Hadoop offer to scientific experimentation?
- What does it take for scientific applications to run in cloud environments such as Amazon EC2?
- Do clouds provide an alternative for data-intensive interactive science?
Scientific Workloads at LBL
- High-performance computing codes supported by NERSC and other supercomputing centers
- Mid-range computing workloads serviced by LBL/IT Services and other local cluster environments
- Interactive data-intensive processing, usually run on scientists' desktops
NERSC-6 Benchmarking
- Subset of the NERSC-6 application benchmarks, run on EC2 with smaller input sizes
  - represent the requirements of the NERSC workload
  - rigorous process for selection of codes: workload and algorithm/science-area coverage
- Run on EC2 High-CPU XL (64-bit) nodes
  - Intel C/Fortran compilers
  - OpenMPI patched for cross-subnet communication
  - $0.80/hour
Experiments on Amazon EC2

| Code | Science Area | Algorithm Space | Configuration | Slowdown Relative to Franklin | Reduction Factor (SSP) | Notes |
|---|---|---|---|---|---|---|
| CAM | Climate (BER) | Navier-Stokes CFD | 200 processors, standard IPCC5 D mesh resolution | 3.05 | 0.33 | Could not complete 240-processor run due to transient node failures; some I/O and small messages |
| MILC | Lattice Gauge Physics (NP) | Conjugate gradient, sparse matrix; FFT | Weak scaled: 14^4 lattice on 8, 32, 64, 128, and 256 processors | 2.83 | 0.35 | Erratic execution time |
| IMPACT-T | Accelerator Physics (HEP) | PIC, FFT component | 64 processors, 64x128x128 grid and 4M particles | 4.55 | 0.22 | PIC portion performs well, but 3D FFT poor due to small message size |
| MAESTRO | Astrophysics (HEP) | Low-Mach hydro; block-structured-grid multiphysics | 128 processors for 128^3 computational mesh | 5.75 | 0.17 | Small messages and allreduce for implicit solve |
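As a sanity check on the numbers above, the reported SSP reduction factor for each code is approximately the reciprocal of its measured slowdown relative to Franklin. A minimal sketch (slowdown values taken from the table):

```python
# SSP reduction factor ~= 1 / slowdown relative to Franklin.
# Slowdown values are copied from the EC2 experiments table.
slowdowns = {"CAM": 3.05, "MILC": 2.83, "IMPACT-T": 4.55, "MAESTRO": 5.75}
for code, s in slowdowns.items():
    print(f"{code}: slowdown {s:.2f}x -> SSP reduction factor ~{1 / s:.2f}")
```

Running this reproduces the table's reduction factors (0.33, 0.35, 0.22, 0.17) to two decimal places.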
Mid-range Codes on Amazon EC2
- Lawrencium cluster: 64-bit, dual sockets per node, 8 cores per node, 16 GB memory, InfiniBand interconnect
- EC2: 64-bit, 2 cores per node; 7.5 GB, 15 GB, and 7 GB memory

| Code | Description | Slowdown Factor |
|---|---|---|
| FMM-Speed | Fast Multipole Method; Pthread-parallel code with ~0.5 GB of I/O | 1.3 to 2.1 |
| GASBOR | Genetic-algorithm ab initio reconstruction algorithm; serial workload, minimal I/O (KB) | 1.12 to 3.67 |
| ABINIT | DFT code that calculates the energy, charge density, and electronic structure of molecules and periodic solids; parallel MPI, minimal I/O | 1.11 to 2.43 |
| HPCC | HPC Challenge benchmark | 2.8 to 8.8 |
| VASP | Simulates properties of systems at the atomic scale; MPI-parallel application | 14.2 to 22.4 |
| IMB | Intel (formerly Pallas) MPI Benchmarks; Alltoall among all MPI ranks | 12.7 to 15.79 |
Performance Observations
- Setup to look like a conventional GigE cluster achieves 0.26 x the SSP (Sustained System Performance) of Franklin per CPU, but must evaluate throughput per dollar
- Performance characteristics
  - Good TCP performance for large messages
  - Non-uniform execution times (VMMs have lots of noise/jitter)
  - High overhead for small messages
  - No OS bypass (it's a VMM), so no efficient one-sided messaging
  - Poor shared-disk I/O (good local I/O)
- Needs
  - Bare-metal access to hardware
  - More robust (InfiniBand) interconnect
What Codes Work Well?
- Minimal synchronization, modest I/O requirements
- Large messages or very little communication
- Low core counts (non-uniform execution and limited scaling)
- Generally, applications that would do well on mid-range clusters, which mostly run on LBL/IT and local cluster resources today
Integrated Microbial Genomes (IMG)
- Goal: improving the overall quality of microbial genome data
  - supporting the comparative analysis of metagenomes and genomes in IMG, together with all available GEBA genomes
- Large amount of sequencing of microbial genomes and metagenome samples using BLAST
  - the computation scheduled within a certain time range takes about 3 weeks on a modest-sized Linux cluster
  - projected to exceed current computing resources
- What can we do to help such applications? Do cloud computing and tools such as Hadoop help manage the task farming?
Hardware Platforms
- Franklin: traditional HPC system
  - 40k-core, 360 TFLOP Cray XT4 system at NERSC
  - Lustre parallel filesystem
- Planck: traditional mid-range cluster
  - 32-node Linux/x86/InfiniBand cluster at NERSC
  - GPFS global filesystem and Hadoop on Demand (HOD)
- Amazon EC2: commercial Infrastructure-as-a-Service cloud
  - Configure and boot customized virtual machines in the cloud
  - Elastic MapReduce/Hadoop images and S3 for the parallel filesystem
- Yahoo! M45: shared research Platform-as-a-Service cloud
  - 400 nodes, 8 cores per node, Intel Xeon E5320, 6 GB per compute node, 910.95 TB
  - Hadoop/MapReduce service: HDFS and shared filesystem
Software Platforms
- NCBI BLAST (2.2.22)
  - Reference IMG genomes of 6.5 million genes (~3 GB in size)
  - Full input set: 12.5 million metagenome genes against the reference
- BLAST task-farming implementation
  - Server reads inputs and manages the tasks
  - Client runs BLAST, copies the database to local disk or ramdisk once on startup, and pushes back results
  - Advantages: fault-resilient, and allows incremental expansion as resources become available
- Hadoop/MapReduce implementation of BLAST
  - Hadoop is an open-source implementation of MapReduce, a software framework for processing huge datasets
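The fault-resilience property of the task-farming design above can be sketched as a simple retry loop over a work queue. This is a hypothetical illustration, not the actual IMG implementation: the function and file names are invented, the real BLAST invocation is stubbed out, and in the real farm many clients pull chunks from the server concurrently over the network.

```python
# Sketch of a fault-resilient task farm: the server enqueues query chunks,
# clients pull a chunk, run BLAST on it, and push back the result file.
# A chunk whose run fails is simply re-enqueued, so transient node
# failures are tolerated and new clients can join at any time.
import queue
import subprocess

def run_blast(chunk):
    # Placeholder for the real BLAST invocation, e.g. something like:
    # subprocess.run(["blastall", "-p", "blastp", "-d", "/local/refdb",
    #                 "-i", chunk, "-o", chunk + ".out"], check=True)
    return chunk + ".out"  # pretend the result file was produced

def task_farm(chunks):
    tasks = queue.Queue()
    for c in chunks:
        tasks.put(c)
    results = []
    while not tasks.empty():
        chunk = tasks.get()
        try:
            results.append(run_blast(chunk))
        except subprocess.CalledProcessError:
            tasks.put(chunk)  # fault resilience: retry the failed chunk
    return results

print(task_farm(["q00.fa", "q01.fa"]))
```

Because failed chunks go back on the queue rather than aborting the run, the farm degrades gracefully on the transient node failures observed in the EC2 experiments.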
Hadoop Processing Model
- Advantages of Hadoop
  - Broadly supported on cloud platforms
  - Transparent data replication
  - Data-locality-aware scheduling
  - Fault-tolerance capabilities
  - Dynamic resource management as the resource pool grows
- Implementation details
  - Use Hadoop streaming to launch a script that calls the BLAST executable
  - HDFS for input; need a shared filesystem for the binary and database
  - Each sequence needs to be on a single line to use the standard input-format reader
  - Custom input-format reader that can understand BLAST sequences
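The one-sequence-per-line requirement mentioned above can be met by flattening multi-line FASTA records before loading them into HDFS. A minimal sketch (the helper name and tab-separated format are assumptions, not the project's actual preprocessing code):

```python
# Flatten multi-line FASTA records so that each sequence occupies a
# single tab-separated line, which Hadoop streaming's default
# line-oriented input reader can then split record-by-record.
def fasta_to_lines(fasta_text):
    records, header, seq = [], None, []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append(header + "\t" + "".join(seq))
            header, seq = line[1:].strip(), []
        else:
            seq.append(line.strip())
    if header is not None:
        records.append(header + "\t" + "".join(seq))
    return records

print(fasta_to_lines(">g1\nMKV\nLLA\n>g2\nGGT"))
# -> ['g1\tMKVLLA', 'g2\tGGT']
```

With input in this shape, the standard reader suffices; the custom input-format reader noted above avoids the preprocessing step by parsing FASTA records directly.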
Performance Comparison
- Evaluated a small-scale problem (2,500 sequences) on multiple platforms (limited by access and costs)
- Similar per-core performance across platforms
- [Figure: time (seconds, 0-3000) vs. number of cores (32, 64, 128) for EC2 Hadoop, Planck Hadoop on Demand (HOD), Planck Task Farming, and Franklin Task Farming]
Supernova Factory
- Tools to measure the expansion of the universe and dark energy
  - image-matching algorithms
  - data pipeline, task-parallel workflow
  - large data volume for supernova search
- Using Amazon EC2
  - Stable 32-bit Linux computing environment
  - Data requirements: about 0.5 TB of existing data; about 1 TB of storage for 12 months and 1 TB of transfer from the cloud
Berkeley Water Center
- Studying global-scale environmental processes
  - integration of local, regional, and global spatial scales
  - integration across disciplines (e.g., climatology, hydrology, forestry) and methodologies
- Common Eco-Science Data Infrastructure
  - address quality, heterogeneity, and scale
  - interfaces and services for accessing and processing data
MODerate-resolution Imaging Spectroradiometer (MODIS)
- Two MODIS satellites
  - near-polar orbits
  - global coverage every one to two days
- Data-integration challenges
  - ~35 science data products, including atmospheric and land products
  - products are in different projections, resolutions (spatial and temporal), and different times
  - data volume and processing requirements exceed desktop capacity
Windows Azure Cloud Solution
- Lower resource entry barriers
- Hide the complexities of data collection, reprojection, and management from domain scientists
- A generic Reduction Service for scientists to upload arbitrary executables to perform scientific analysis on reprojected data
- 90x improvement over the scientist's desktop
- [Diagram: MODIS source data -> Windows Azure cloud computing platform (data processing pipeline) -> scientific results]
An Enabling Service for Scientists
- Gives scientists the ability to do analyses that were not possible before
- Programming model: future experimentation with Dryad/MapReduce frameworks
- Interactive cases
  - need to refine intermediate data products
  - upload executables to perform scientific analysis on data
DOE Cloud Research: Magellan Project
- DOE Advanced Scientific Computing Research (ASCR) $32.8M project at NERSC and Argonne (ALCF)
  - ~100 TF/s compute cloud testbed (across sites)
  - Petabyte-scale storage cloud testbed
- Mission
  - Deploy a testbed cloud to serve the needs of mid-range scientific computing
  - Evaluate the effectiveness of this system for a wide spectrum of DOE/SC applications in comparison with other platform models
  - Determine the appropriate role for commercial and/or private cloud computing for DOE/SC mid-range workloads
NERSC Magellan Cluster
- 720 nodes, 5,760 cores in 9 Scalable Units (SUs); 61.9 teraflops
- SU = IBM iDataPlex rack with 640 Intel Nehalem cores
- 18 login/network nodes
- [Diagram: 9 SUs on a 10G Ethernet network; 8G FC I/O; Internet-facing load balancer; 100G router to ANI; NERSC Global Filesystem; HPSS (15 PB); 1 petabyte with GPFS]
NERSC Magellan Research Questions
- What are the unique needs and features of a science cloud?
- What applications can run efficiently on a cloud?
- Are cloud computing APIs such as Hadoop effective for scientific applications?
- Can scientific applications use a DaaS or SaaS model?
- Is it practical to deploy cloud services across multiple DOE sites?
- What are the security implications of user-controlled cloud images?
- What is the cost and energy efficiency of clouds?
Summary
- Cloud environments impact performance; ongoing work to improve these environments for scientific applications
- Cloud tools require customizations suitable for scientific data processing
- Rethinking the service model: support for interactive applications and dynamic software environments
Acknowledgements
- NERSC benchmarks: Harvey Wasserman, John Shalf
- IT benchmarks: Greg Bell, Keith, Greg Kurtzer, Krishna Muriki, John White
- BLAST on Hadoop: Victor Markowitz, John Shalf, Shane Canon, Lavanya Ramakrishnan, Shreyas Cholia, Nick Wright
- Supernova Factory on EC2: Rollin Thomas, Greg Aldering, Lavanya Ramakrishnan
- Berkeley Water Center: Deb Agarwal, Catharine van Ingen (MSR), Jie Li (UVa), Youngryel Ryu (UCB), Marty Humphrey (UVa), Windows Azure team