Cloud Computing for Science. Keith R. Jackson (krjackson@lbl.gov), Computational Research Division


1 Cloud Computing for Science
Keith R. Jackson
Computational Research Division

2 Why Clouds for Science?
- On-demand access to computing and cost associativity
- Parallel programming models for data-intensive science, e.g., BLAST parametric runs
- Customized and controlled environments, e.g., Supernova Factory codes are sensitive to OS/compiler versions
- Overflow capacity to supplement existing systems, e.g., Berkeley Water Center has analysis that far exceeds the capacity of desktops

3 Cloud Early Evaluations
- How do DOE/LBL workloads perform in Cloud environments?
- What is the impact on performance from virtual environments?
- What do Cloud programming models like Hadoop offer to scientific experimentation?
- What does it take for scientific applications to run in Cloud environments such as Amazon EC2?
- Do Clouds provide an alternative for data-intensive interactive science?

4 Scientific Workloads at LBL
- High-performance computing codes supported by NERSC and other supercomputing centers
- Mid-range computing workloads that are serviced by LBL/IT Services and other local cluster environments
- Interactive data-intensive processing, usually run on scientists' desktops

5 NERSC 6 Benchmarking
- Subset of NERSC 6 application benchmarks for EC2, with smaller input sizes
  - represent the requirements of the NERSC workload
  - rigorous process for selection of codes
  - workload and algorithm/science-area coverage
- Run on EC2 high-CPU XL (64-bit) nodes
  - Intel C/Fortran compilers
  - OpenMPI patched for cross-subnet communication
  - $0.80/hour

6 Experiments on Amazon EC2
Columns: Code; Science Area; Algorithm Space; Configuration; Slowdown; Reduction factor (SSP), relative to Franklin

CAM
- Science area: Climate (BER)
- Algorithm space: Navier-Stokes CFD
- Configuration: 200 processors, standard IPCC5 D-mesh resolution
- Could not complete 240-processor run due to transient node failures. Some I/O and small messages.

MILC
- Science area: Lattice Gauge Physics (NP)
- Algorithm space: Conjugate gradient, sparse matrix; FFT
- Configuration: Weak scaled, 14^4 lattice on 8, 32, 64, 128, and 256 processors
- Erratic execution time.

IMPACT-T
- Science area: Accelerator Physics (HEP)
- Algorithm space: PIC, FFT component
- Configuration: 64 processors, 64x128x128 grid and 4M particles
- PIC portion performs well, but 3D FFT is poor due to small message size.

MAESTRO
- Science area: Astrophysics (HEP)
- Algorithm space: Low Mach hydro; block-structured-grid multiphysics
- Configuration: 128 processors for 128^3 computational mesh
- Small messages and allreduce for implicit solve.

7 Mid-range codes on Amazon EC2
Lawrencium cluster: 64-bit, dual sockets per node, 8 cores per node, 16 GB memory, InfiniBand interconnect
EC2: 64-bit, 2 cores per node; 75 GB, 15 GB and 7 GB memory
Code and slowdown factor:
- FMMSpeed. Fast Multipole Method. Pthread-parallel code with ½ GB I/O. Slowdown: 1.3 to 2.1
- GASBOR. A genetic-algorithm ab initio reconstruction algorithm. Serial workload, minimal I/O (KB)
- ABINIT. DFT code that calculates the energy, charge density and electronic structure for molecules and periodic solids. Parallel MPI, minimal I/O. Slowdown: up to 2.43
- HPCC. HPC Challenge Benchmark. Slowdown: 2.8 to 8.8
- VASP. Simulates properties of systems at the atomic scale. MPI-parallel application
- IMB. Intel MPI Benchmarks (formerly Pallas). Alltoall among all MPI processes. Slowdown: up to 15.79

8 Performance Observations
- Set up to look like a conventional GigE cluster
- Achieves 0.26 * SSP (Sustained System Performance) of Franklin per CPU, but must evaluate throughput/$
Performance characteristics
- Good TCP performance for large messages
- Nonuniform execution times (VMMs have lots of noise/jitter)
- Bare-metal access to hardware
- High overhead for small messages
- No OS bypass (it's a VMM), so no efficient one-sided messaging
- Poor shared disk I/O (good local I/O)
- Need more robust (InfiniBand) interconnect
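The throughput/$ question raised above can be framed as a small calculation. The sketch below is only illustrative: it takes the $0.80/hour node price and the 0.26 relative SSP figure from these slides, and it assumes 8 cores per EC2 high-CPU XL node, which the slides do not state. The function name is hypothetical.

```python
# Back-of-the-envelope cost of one "Franklin-equivalent" core-hour on EC2.
# From the slides: $0.80/hour per node, 0.26 * Franklin SSP per CPU.
# Assumption (not in the slides): 8 cores per high-CPU XL node.

def cost_per_equivalent_core_hour(node_price_per_hour, cores_per_node, relative_perf):
    """Dollars paid for the work one reference core does in one hour."""
    if cores_per_node <= 0 or relative_perf <= 0:
        raise ValueError("cores_per_node and relative_perf must be positive")
    raw_core_hour = node_price_per_hour / cores_per_node  # price of one EC2 core-hour
    return raw_core_hour / relative_perf                   # scale by relative speed

ec2_cost = cost_per_equivalent_core_hour(0.80, 8, 0.26)
print(f"EC2: ${ec2_cost:.3f} per Franklin-equivalent core-hour")
```

A fair comparison would set this figure against the fully burdened cost of a core-hour on the reference system, which these slides do not give.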

9 What codes work well?
- Minimal synchronization, modest I/O requirements
- Large messages or very little communication
- Low core counts (non-uniform execution and limited scaling)
- Generally, applications that would do well on midrange clusters; these mostly run on LBL/IT and local cluster resources today

10 Integrated Microbial Genomes (IMG)
- Goal: improving overall quality of microbial genome data
  - supporting the comparative analysis of metagenomes and genomes in IMG together with all available GEBA genomes
- Large amount of sequencing of microbial genomes and metagenome samples using BLAST
  - the computation, scheduled within a certain time range, takes about 3 weeks on a modest-sized Linux cluster
  - projected to exceed current computing resources
- What can we do to help such applications?
- Do cloud computing and tools such as Hadoop help manage the task farming?

11 Hardware Platforms
- Franklin: Traditional HPC System
  - 40k-core, 360 TFLOP Cray XT4 system at NERSC
  - Lustre parallel filesystem
- Planck: Traditional Midrange Cluster
  - 32-node Linux/x86/InfiniBand cluster at NERSC
  - GPFS Global and Hadoop on Demand (HOD)
- Amazon EC2: Commercial Infrastructure as a Service Cloud
  - Configure and boot customized virtual machines in the Cloud
  - Elastic MapReduce/Hadoop images and S3 for parallel filesystem
- Yahoo M45: Shared Research Platform as a Service Cloud
  - 400 nodes, 8 cores per node, Intel Xeon E5320, 6 GB per compute node, TB
  - Hadoop/MapReduce service: HDFS and shared file system

12 Software Platforms
- NCBI BLAST (2.2.22)
  - Reference IMG genomes of 6.5 million genes (~3 GB in size)
  - Full input set: 12.5 million metagenome genes against the reference
- BLAST Task Farming Implementation
  - Server reads inputs and manages the tasks
  - Client runs BLAST, copies the database to local disk or ramdisk once on startup, and pushes back results
  - Advantages: fault resilient and allows incremental expansion as resources become available
- Hadoop/MapReduce implementation of BLAST
  - Hadoop is an open-source implementation of MapReduce
  - Software framework for processing huge datasets
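The task-farming pattern described above (a server that hands out tasks, clients that pull work and push back results, and resilience to failed tasks) can be sketched in a few lines. This is a minimal single-process illustration using threads, not the implementation used in the study: `run_task_farm` and `fake_blast` are hypothetical names, and the worker function is a stand-in that does not call BLAST.

```python
import queue
import threading

def run_task_farm(tasks, worker_fn, num_workers=4, max_attempts=3):
    """Hand out tasks one at a time; requeue a task when a worker fails on it."""
    pending = queue.Queue()
    for task in tasks:
        pending.put((task, 1))  # (task, attempt number)
    results, failed = {}, {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task, attempt = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to pull
            try:
                output = worker_fn(task)
                with lock:
                    results[task] = output  # "push back" the result
            except Exception as exc:
                if attempt < max_attempts:
                    pending.put((task, attempt + 1))  # retry later
                else:
                    with lock:
                        failed[task] = repr(exc)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, failed

def fake_blast(chunk_name):
    # Stand-in for running BLAST on one chunk of query sequences.
    return f"hits for {chunk_name}"

results, failed = run_task_farm([f"chunk{i}" for i in range(10)], fake_blast)
print(len(results), "chunks done,", len(failed), "failed")
```

Because workers pull tasks rather than being assigned a fixed share, adding workers later only means starting more pullers, which mirrors the incremental-expansion advantage listed on the slide.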

13 Hadoop Processing Model
- Advantages of Hadoop
  - Broadly supported on cloud platforms
  - Transparent data replication
  - Data-locality-aware scheduling
  - Fault-tolerance capabilities
  - Dynamic resource management for growing
- Implementation details
  - Use streaming to launch a script that calls the executable
  - HDFS for input; need shared file system for binary and database
  - Each sequence needs to be on a single line to use the standard input format reader
  - Custom input format reader that can understand BLAST sequences
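The single-line requirement noted above arises because a line-oriented input reader treats each line as one record, while a FASTA record spans several lines. One possible workaround, sketched here under assumptions, is to flatten each record to `header<TAB>sequence` before loading it into HDFS and to rebuild the FASTA text inside the streaming script. The function names and the tab-separated layout are illustrative choices, not the format used in the study, and the sketch assumes headers contain no tab characters.

```python
import io

def fasta_to_lines(handle):
    """Yield one 'header<TAB>sequence' line per FASTA record."""
    header, parts = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield f"{header}\t{''.join(parts)}"
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)  # sequence may wrap over several lines
    if header is not None:
        yield f"{header}\t{''.join(parts)}"

def line_to_fasta(line):
    """Inverse step a streaming script could run before calling the executable."""
    header, sequence = line.rstrip("\n").split("\t", 1)
    return f">{header}\n{sequence}\n"

sample = io.StringIO(">gene1 example\nMKT\nAYI\n>gene2\nGGS\n")
flat = list(fasta_to_lines(sample))
print(flat)
print(line_to_fasta(flat[0]), end="")
```

The alternative listed on the slide, a custom input format reader, avoids the preprocessing pass at the cost of writing and deploying Hadoop-side code.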

14 Performance Comparison
- Evaluated a small-scale problem (2500 sequences) on multiple platforms (limited by access and costs)
- Similar per-core performance across platforms
Figure: time (seconds) versus number of cores for EC2 Hadoop, Planck Hadoop on Demand (HOD), Planck task farming, and Franklin task farming.

15 Supernova Factory
- Tools to measure expansion of universe and energy
  - image-matching algorithms
  - data pipeline, task-parallel workflow
  - large data volume for supernova search
- Using Amazon EC2
  - Stable 32-bit Linux computing environment
  - Data requirements: about 0.5 TB of existing data; about 1 TB of storage for 12 months and 1 TB of transfer from the cloud

16 Berkeley Water Center
- Studying global-scale environmental processes
  - integration of local, regional, and global spatial scales
  - integration across disciplines (e.g., climatology, hydrology, forestry) and methodologies
- Common Eco-Science Data Infrastructure
  - address quality, heterogeneity and scale
  - interfaces and services for accessing and processing data

17 MODerate-resolution Imaging Spectroradiometer (MODIS)
- Two MODIS satellites in near-polar orbits
  - global coverage every one to two days
- Data integration challenges
  - ~35 science data products, including atmospheric and land products
  - products are in different projections, resolutions (spatial and temporal), and times
  - data volume and processing requirements exceed desktop capacity

18 Windows Azure Cloud Solution
- Lower resource entry barriers
- Hide the complexities of data collection, reprojection and management from domain scientists
- A generic Reduction Service for scientists to upload arbitrary executables to perform scientific analysis on reprojected data
- 90X improvement over scientist desktop
Figure labels: MODIS source data; Windows Azure cloud computing platform; data processing pipeline; scientific results.

19 An Enabling Service for Scientists
- Gives scientists the ability to do analysis that was not possible before
- Programming model
  - future experimentation with Dryad/MapReduce frameworks
- Interactive cases need refined intermediate data products
  - upload executables to perform scientific analysis on data

20 DOE Cloud Research: Magellan Project
- DOE Advanced Scientific Computing Research (ASCR)
  - $32.8M project at NERSC and Argonne (ALCF)
  - ~100 TF/s compute cloud testbed (across sites)
  - Petabyte-scale storage cloud testbed
- Mission
  - Deploy a test bed cloud to serve the needs of midrange scientific computing.
  - Evaluate the effectiveness of this system for a wide spectrum of DOE/SC applications in comparison with other platform models.
  - Determine the appropriate role for commercial and/or private cloud computing for DOE/SC midrange workloads.

21 NERSC Magellan Cluster
- 720 nodes, 5760 cores in 9 Scalable Units (SUs); 61.9 teraflops
- SU = IBM iDataPlex rack with 640 Intel Nehalem cores
Figure (cluster diagram) labels: 9 SUs; 18 login/network nodes; login; network; I/O; 10G Ethernet; 8G FC; Internet; load balancer; 100 G router; ANI; NERSC Global Filesystem; HPSS (15 PB); 1 petabyte with GPFS.
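The cluster figures quoted above can be cross-checked with simple arithmetic. The sketch below uses only the numbers on the slide; treating 61.9 teraflops as the aggregate peak for all 5760 cores is an assumption, since the slide does not say whether the figure is peak or sustained.

```python
# Figures taken from the slide.
nodes, cores, scalable_units = 720, 5760, 9
cores_per_su = 640
total_teraflops = 61.9  # assumed to be aggregate peak (not stated on the slide)

# The per-SU core count should multiply out to the stated total.
assert scalable_units * cores_per_su == cores

cores_per_node = cores / nodes
nodes_per_su = nodes / scalable_units
gflops_per_core = total_teraflops * 1000 / cores

print(f"{cores_per_node:.0f} cores per node")
print(f"{nodes_per_su:.0f} nodes per SU")
print(f"{gflops_per_core:.2f} GFLOP/s per core")
```

The stated totals are mutually consistent: nine SUs of 640 cores give exactly 5760 cores, which works out to 8 cores per node and 80 nodes per SU.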

22 NERSC Magellan Research Questions
- What are the unique needs and features of a science cloud?
- What applications can efficiently run on a cloud?
- Are cloud computing APIs such as Hadoop effective for scientific applications?
- Can scientific applications use a DaaS or SaaS model?
- Is it practical to deploy cloud services across multiple DOE sites?
- What are the security implications of user-controlled cloud images?
- What is the cost and energy efficiency of clouds?

23 Summary
- Cloud environments impact performance
  - ongoing work to improve these environments for scientific applications
- Cloud tools require customizations suitable for scientific data processing
- Rethinking service model
  - support for interactive applications and dynamic software environments

24 Acknowledgements
- NERSC Benchmarks: Harvey Wasserman, John Shalf
- IT Benchmarks: Greg Bell, Keith Greg Kurtzer, Krishna Muriki, John White
- BLAST on Hadoop: Victor Markowitz, John Shalf, Shane Canon, Lavanya Ramakrishnan, Shreyas Cholia, Nick Wright
- Supernova Factory on EC2: Rollin Thomas, Greg Aldering, Lavanya Ramakrishnan
- Berkeley Water Center: Deb Agarwal, Catharine van Ingen (MSR), Jie Li (UVa), Youngryel Ryu (UCB), Marty Humphrey (UVa), Windows Azure team

Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon and Lavanya Ramakrishnan Cray XE6 Training February 8, 2011

Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon and Lavanya Ramakrishnan Cray XE6 Training February 8, 2011 Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon and Lavanya Ramakrishnan Cray XE6 Training February 8, 2011 Magellan Exploring Cloud Computing Co-located at two DOE-SC Facilities

More information

Science in the Cloud Exploring Cloud Computing for Science Shane Canon. Moab Con May 11, 2011

Science in the Cloud Exploring Cloud Computing for Science Shane Canon. Moab Con May 11, 2011 Science in the Cloud Exploring Cloud Computing for Science Shane Canon Moab Con May 11, 2011 Outline Definitions The Magellan Project Experience and Lessons Learned Cloud Misconceptions Closing remarks

More information

Performance of HPC Applications on the Amazon Web Services Cloud

Performance of HPC Applications on the Amazon Web Services Cloud Cloudcom 2010 November 1, 2010 Indianapolis, IN Performance of HPC Applications on the Amazon Web Services Cloud Keith R. Jackson, Lavanya Ramakrishnan, Krishna Muriki, Shane Canon, Shreyas Cholia, Harvey

More information

Cloud-Based Computation and Collaboration: the Challenges for IT Infrastructure. Greg Bell Driving e-research Across the Pacific `09 Sydney: 11/12/09

Cloud-Based Computation and Collaboration: the Challenges for IT Infrastructure. Greg Bell Driving e-research Across the Pacific `09 Sydney: 11/12/09 Cloud-Based Computation and Collaboration: the Challenges for IT Infrastructure Greg Bell Driving e-research Across the Pacific `09 Sydney: 11/12/09 Current State of Cloud Utilization beginnings of adoption

More information

How To Compare Amazon Ec2 To A Supercomputer For Scientific Applications

How To Compare Amazon Ec2 To A Supercomputer For Scientific Applications Amazon Cloud Performance Compared David Adams Amazon EC2 performance comparison How does EC2 compare to traditional supercomputer for scientific applications? "Performance Analysis of High Performance

More information

Data Management in the Cloud: Limitations and Opportunities. Annies Ductan

Data Management in the Cloud: Limitations and Opportunities. Annies Ductan Data Management in the Cloud: Limitations and Opportunities Annies Ductan Discussion Outline: Introduc)on Overview Vision of Cloud Compu8ng Managing Data in The Cloud Cloud Characteris8cs Data Management

More information

MAGELLAN 54 S CIDAC REVIEW S PRING 2010 WWW. SCIDACREVIEW. ORG

MAGELLAN 54 S CIDAC REVIEW S PRING 2010 WWW. SCIDACREVIEW. ORG MAGELLAN Exploring CLOUD Computing for DOE s Scientific Mission Cloud computing is gaining traction in the commercial world, with companies like Amazon, Google, and Yahoo offering pay-to-play cycles to

More information

Kashif Iqbal - PhD Kashif.iqbal@ichec.ie

Kashif Iqbal - PhD Kashif.iqbal@ichec.ie HPC/HTC vs. Cloud Benchmarking An empirical evalua.on of the performance and cost implica.ons Kashif Iqbal - PhD Kashif.iqbal@ichec.ie ICHEC, NUI Galway, Ireland With acknowledgment to Michele MicheloDo

More information

Debunking some Common Misconceptions of Science in the Cloud

Debunking some Common Misconceptions of Science in the Cloud Debunking some Common Misconceptions of Science in the Cloud Shane Canon Lawrence Berkeley National Lab ScienceCloud 2011 San Jose, CA The Push towards Clouds A survey of 102 large, multinational companies

More information

High Performance Computing (HPC)

High Performance Computing (HPC) High Performance Computing (HPC) High Performance Computing (HPC) White Paper Attn: Name, Title Phone: xxx.xxx.xxxx Fax: xxx.xxx.xxxx 1.0 OVERVIEW When heterogeneous enterprise environments are involved,

More information

Enabling Science in the Cloud: A Remote Sensing Data Processing Service for Environmental Science Analysis

Enabling Science in the Cloud: A Remote Sensing Data Processing Service for Environmental Science Analysis Enabling Science in the Cloud: A Remote Sensing Data Processing Service for Environmental Science Analysis Catharine van Ingen 1, Jie Li 2, Youngryel Ryu 3, Marty Humphrey 2, Deb Agarwal 4, Keith Jackson

More information

Big Data and Clouds: Challenges and Opportuni5es

Big Data and Clouds: Challenges and Opportuni5es Big Data and Clouds: Challenges and Opportuni5es NIST January 15 2013 Geoffrey Fox gcf@indiana.edu h"p://www.infomall.org h"p://www.futuregrid.org School of Informa;cs and Compu;ng Digital Science Center

More information

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it Informa(on & Communica(on Technology Sec(on (ICTS) Interna(onal Centre for Theore(cal Physics (ICTP) Mul(ple Socket

More information

NERSC Data Efforts Update Prabhat Data and Analytics Group Lead February 23, 2015

NERSC Data Efforts Update Prabhat Data and Analytics Group Lead February 23, 2015 NERSC Data Efforts Update Prabhat Data and Analytics Group Lead February 23, 2015-1 - A little bit about myself Computer Scien.st Brown, IIT Delhi Real- 3me Graphics, Virtual Reality, HCI Computa3onal

More information

Big Data Research at DKRZ

Big Data Research at DKRZ Big Data Research at DKRZ Michael Lautenschlager and Colleagues from DKRZ and Scien:fic Compu:ng Research Group Symposium Big Data in Science Karlsruhe October 7th, 2014 Big Data in Climate Research Big

More information

Data Center Evolu.on and the Cloud. Paul A. Strassmann George Mason University November 5, 2008, 7:20 to 10:00 PM

Data Center Evolu.on and the Cloud. Paul A. Strassmann George Mason University November 5, 2008, 7:20 to 10:00 PM Data Center Evolu.on and the Cloud Paul A. Strassmann George Mason University November 5, 2008, 7:20 to 10:00 PM 1 Hardware Evolu.on 2 Where is hardware going? x86 con(nues to move upstream Massive compute

More information

HPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect a.jackson@epcc.ed.ac.uk

HPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect a.jackson@epcc.ed.ac.uk HPC and Big Data EPCC The University of Edinburgh Adrian Jackson Technical Architect a.jackson@epcc.ed.ac.uk EPCC Facilities Technology Transfer European Projects HPC Research Visitor Programmes Training

More information

How to Build a Data Center?

How to Build a Data Center? Next up Cloud Compu-ng Warehouse scale computers How to build/program data centers Google so?ware stack GFS BigTable Sawzall Chubby Map/reduce What is cloud compu-ng Illusion of infinite compu-ng resources

More information

MapReduce and Hadoop. Aaron Birkland Cornell Center for Advanced Computing. January 2012

MapReduce and Hadoop. Aaron Birkland Cornell Center for Advanced Computing. January 2012 MapReduce and Hadoop Aaron Birkland Cornell Center for Advanced Computing January 2012 Motivation Simple programming model for Big Data Distributed, parallel but hides this Established success at petabyte

More information

Sun Constellation System: The Open Petascale Computing Architecture

Sun Constellation System: The Open Petascale Computing Architecture CAS2K7 13 September, 2007 Sun Constellation System: The Open Petascale Computing Architecture John Fragalla Senior HPC Technical Specialist Global Systems Practice Sun Microsystems, Inc. 25 Years of Technical

More information

Behind the scene III Cloud computing

Behind the scene III Cloud computing Behind the scene III Cloud computing Athens, 15.11.2014 M. Dolenc / R. Klinc Why we do it? Engineering in the cloud is a combina3on of cloud based services and rich interac3ve applica3ons allowing engineers

More information

IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud

IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud February 25, 2014 1 Agenda v Mapping clients needs to cloud technologies v Addressing your pain

More information

Big Data and Scientific Discovery

Big Data and Scientific Discovery Big Data and Scientific Discovery Bill Harrod Office of Science William.Harrod@science.doe.gov! February 26, 2014! Big Data and Scien*fic Discovery Next genera*on scien*fic breakthroughs require: Major

More information

Department of Computer Sciences University of Salzburg. HPC In The Cloud? Seminar aus Informatik SS 2011/2012. July 16, 2012

Department of Computer Sciences University of Salzburg. HPC In The Cloud? Seminar aus Informatik SS 2011/2012. July 16, 2012 Department of Computer Sciences University of Salzburg HPC In The Cloud? Seminar aus Informatik SS 2011/2012 July 16, 2012 Michael Kleber, mkleber@cosy.sbg.ac.at Contents 1 Introduction...................................

More information

Linux Clusters Ins.tute: Turning HPC cluster into a Big Data Cluster. A Partnership for an Advanced Compu@ng Environment (PACE) OIT/ART, Georgia Tech

Linux Clusters Ins.tute: Turning HPC cluster into a Big Data Cluster. A Partnership for an Advanced Compu@ng Environment (PACE) OIT/ART, Georgia Tech Linux Clusters Ins.tute: Turning HPC cluster into a Big Data Cluster Fang (Cherry) Liu, PhD fang.liu@oit.gatech.edu A Partnership for an Advanced Compu@ng Environment (PACE) OIT/ART, Georgia Tech Targets

More information

Agenda. HPC Software Stack. HPC Post-Processing Visualization. Case Study National Scientific Center. European HPC Benchmark Center Montpellier PSSC

Agenda. HPC Software Stack. HPC Post-Processing Visualization. Case Study National Scientific Center. European HPC Benchmark Center Montpellier PSSC HPC Architecture End to End Alexandre Chauvin Agenda HPC Software Stack Visualization National Scientific Center 2 Agenda HPC Software Stack Alexandre Chauvin Typical HPC Software Stack Externes LAN Typical

More information

Open Cirrus: Towards an Open Source Cloud Stack

Open Cirrus: Towards an Open Source Cloud Stack Open Cirrus: Towards an Open Source Cloud Stack Karlsruhe Institute of Technology (KIT) HPC2010, Cetraro, June 2010 Marcel Kunze KIT University of the State of Baden-Württemberg and National Laboratory

More information

Clusters in the Cloud

Clusters in the Cloud Clusters in the Cloud Dr. Paul Coddington, Deputy Director Dr. Shunde Zhang, Compu:ng Specialist eresearch SA October 2014 Use Cases Make the cloud easier to use for compute jobs Par:cularly for users

More information

I/O Performance of Virtualized Cloud Environments

I/O Performance of Virtualized Cloud Environments I/O Performance of Virtualized Cloud Environments Devarshi Ghoshal Indiana University Bloomington, IN 4745 dghoshal@cs.indiana.edu R. Shane Canon Lawrence Berkeley National Lab Berkeley, CA 9472 scanon@lbl.gov

More information

Cloud Computing for Research Roger Barga Cloud Computing Futures, Microsoft Research

Cloud Computing for Research Roger Barga Cloud Computing Futures, Microsoft Research Cloud Computing for Research Roger Barga Cloud Computing Futures, Microsoft Research Trends: Data on an Exponential Scale Scientific data doubles every year Combination of inexpensive sensors + exponentially

More information

SR-IOV: Performance Benefits for Virtualized Interconnects!

SR-IOV: Performance Benefits for Virtualized Interconnects! SR-IOV: Performance Benefits for Virtualized Interconnects! Glenn K. Lockwood! Mahidhar Tatineni! Rick Wagner!! July 15, XSEDE14, Atlanta! Background! High Performance Computing (HPC) reaching beyond traditional

More information

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance 11 th International LS-DYNA Users Conference Session # LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton 3, Onur Celebioglu

More information

Cloud Computing for Science

Cloud Computing for Science The Magellan Report on Cloud Computing for Science U.S. Department of Energy Office of Advanced Scientific Computing Research (ASCR) December, 2011 CSO 23179 The Magellan Report on Cloud Computing for

More information

Dennis Gannon Cloud Computing Futures extreme Computing Group Microsoft Research

Dennis Gannon Cloud Computing Futures extreme Computing Group Microsoft Research Dennis Gannon Cloud Computing Futures extreme Computing Group Microsoft Research 2 Cloud Concepts Data Center Architecture The cloud flavors: IaaS, PaaS, SaaS Our world of client devices plus the cloud

More information

Lawrence Berkeley National Laboratory Lawrence Berkeley National Laboratory

Lawrence Berkeley National Laboratory Lawrence Berkeley National Laboratory Lawrence Berkeley National Laboratory Lawrence Berkeley National Laboratory Title: I/O Performance of Virtualized Cloud Environments Author: Ghoshal, Devarshi Publication Date: 02-12-2013 Permalink: http://escholarship.org/uc/item/67z2q3qc

More information

Cloud Computing through Virtualization and HPC technologies

Cloud Computing through Virtualization and HPC technologies Cloud Computing through Virtualization and HPC technologies William Lu, Ph.D. 1 Agenda Cloud Computing & HPC A Case of HPC Implementation Application Performance in VM Summary 2 Cloud Computing & HPC HPC

More information

Map- reduce, Hadoop and The communica3on bo5leneck. Yoav Freund UCSD / Computer Science and Engineering

Map- reduce, Hadoop and The communica3on bo5leneck. Yoav Freund UCSD / Computer Science and Engineering Map- reduce, Hadoop and The communica3on bo5leneck Yoav Freund UCSD / Computer Science and Engineering Plan of the talk Why is Hadoop so popular? HDFS Map Reduce Word Count example using Hadoop streaming

More information

Clusters: Mainstream Technology for CAE

Clusters: Mainstream Technology for CAE Clusters: Mainstream Technology for CAE Alanna Dwyer HPC Division, HP Linux and Clusters Sparked a Revolution in High Performance Computing! Supercomputing performance now affordable and accessible Linux

More information

Harnessing the High Performance Capabili5es of Cloud over the Internet

Harnessing the High Performance Capabili5es of Cloud over the Internet Harnessing the High Performance Capabili5es of Cloud over the Internet Jaison Paul Mulerikkal, PhD HPC Knowledge Portal Meeting 2015 Barcelona, Spain About Me Jaison Paul Mulerikkal B Tech Mahatma Gandhi

More information

A Very Brief Introduction To Cloud Computing. Jens Vöckler, Gideon Juve, Ewa Deelman, G. Bruce Berriman

A Very Brief Introduction To Cloud Computing. Jens Vöckler, Gideon Juve, Ewa Deelman, G. Bruce Berriman A Very Brief Introduction To Cloud Computing Jens Vöckler, Gideon Juve, Ewa Deelman, G. Bruce Berriman What is The Cloud Cloud computing refers to logical computational resources accessible via a computer

More information

SURFsara HPC Cloud Workshop

SURFsara HPC Cloud Workshop SURFsara HPC Cloud Workshop www.cloud.sara.nl Tutorial 2014-06-11 UvA HPC and Big Data Course June 2014 Anatoli Danezi, Markus van Dijk cloud-support@surfsara.nl Agenda Introduction and Overview (current

More information

A PERFORMANCE COMPARISON USING HPC BENCHMARKS: WINDOWS HPC SERVER 2008 AND RED HAT ENTERPRISE LINUX 5

A PERFORMANCE COMPARISON USING HPC BENCHMARKS: WINDOWS HPC SERVER 2008 AND RED HAT ENTERPRISE LINUX 5 A PERFORMANCE COMPARISON USING HPC BENCHMARKS: WINDOWS HPC SERVER 2008 AND RED HAT ENTERPRISE LINUX 5 R. Henschel, S. Teige, H. Li, J. Doleschal, M. S. Mueller October 2010 Contents HPC at Indiana University

More information

Hunk & Elas=c MapReduce: Big Data Analy=cs on AWS

Hunk & Elas=c MapReduce: Big Data Analy=cs on AWS Copyright 2014 Splunk Inc. Hunk & Elas=c MapReduce: Big Data Analy=cs on AWS Dritan Bi=ncka BD Solu=ons Architecture Disclaimer During the course of this presenta=on, we may make forward looking statements

More information

Computational infrastructure for NGS data analysis. José Carbonell Caballero Pablo Escobar

Computational infrastructure for NGS data analysis. José Carbonell Caballero Pablo Escobar Computational infrastructure for NGS data analysis José Carbonell Caballero Pablo Escobar Computational infrastructure for NGS Cluster definition: A computer cluster is a group of linked computers, working

More information

Cloud Computing. Alex Crawford Ben Johnstone

Cloud Computing. Alex Crawford Ben Johnstone Cloud Computing Alex Crawford Ben Johnstone Overview What is cloud computing? Amazon EC2 Performance Conclusions What is the Cloud? A large cluster of machines o Economies of scale [1] Customers use a

More information

HPCHadoop: MapReduce on Cray X-series

HPCHadoop: MapReduce on Cray X-series HPCHadoop: MapReduce on Cray X-series Scott Michael Research Analytics Indiana University Cray User Group Meeting May 7, 2014 1 Outline Motivation & Design of HPCHadoop HPCHadoop demo Benchmarking Methodology

More information

Portable, Scalable, and High-Performance I/O Forwarding on Massively Parallel Systems. Jason Cope copej@mcs.anl.gov

Portable, Scalable, and High-Performance I/O Forwarding on Massively Parallel Systems. Jason Cope copej@mcs.anl.gov Portable, Scalable, and High-Performance I/O Forwarding on Massively Parallel Systems Jason Cope copej@mcs.anl.gov Computation and I/O Performance Imbalance Leadership class computa:onal scale: >100,000

More information

NextGen Infrastructure for Big DATA Analytics.

NextGen Infrastructure for Big DATA Analytics. NextGen Infrastructure for Big DATA Analytics. So What is Big Data? Data that exceeds the processing capacity of conven4onal database systems. The data is too big, moves too fast, or doesn t fit the structures

More information

SURFsara HPC Cloud Workshop

SURFsara HPC Cloud Workshop SURFsara HPC Cloud Workshop doc.hpccloud.surfsara.nl UvA workshop 2016-01-25 UvA HPC Course Jan 2016 Anatoli Danezi, Markus van Dijk cloud-support@surfsara.nl Agenda Introduction and Overview (current

More information

How To Make A Cloud Based Computer Power Available To A Computer (For Free)

How To Make A Cloud Based Computer Power Available To A Computer (For Free) Cloud Compu)ng Adam Belloum Ins)tute of Informa)cs University of Amsterdam a.s.z.belloum@uva.nl High Performance compu)ng Curriculum, Jan 2015 hgp://www.hpc.uva.nl/ UvA- SURFsara What is Cloud Compu)ng?

More information

Hadoop on the Gordon Data Intensive Cluster

Hadoop on the Gordon Data Intensive Cluster Hadoop on the Gordon Data Intensive Cluster Amit Majumdar, Scientific Computing Applications Mahidhar Tatineni, HPC User Services San Diego Supercomputer Center University of California San Diego Dec 18,

More information

Data Requirements from NERSC Requirements Reviews

Data Requirements from NERSC Requirements Reviews Data Requirements from NERSC Requirements Reviews Richard Gerber and Katherine Yelick Lawrence Berkeley National Laboratory Summary Department of Energy Scientists represented by the NERSC user community

More information

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University http://www.mmds.org

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University http://www.mmds.org Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit

More information

Parallel Programming Survey

Parallel Programming Survey Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory

More information

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging

More information

Appro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes. Anthony Kenisky, VP of North America Sales

Appro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes. Anthony Kenisky, VP of North America Sales Appro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes Anthony Kenisky, VP of North America Sales About Appro Over 20 Years of Experience 1991 2000 OEM Server Manufacturer 2001-2007

More information

Parallel Large-Scale Visualization

Parallel Large-Scale Visualization Parallel Large-Scale Visualization Aaron Birkland Cornell Center for Advanced Computing Data Analysis on Ranger January 2012 Parallel Visualization Why? Performance Processing may be too slow on one CPU

More information
