Cloud Computing for Science. Keith R. Jackson, Computational Research Division
Transcription
1 Cloud Computing for Science
Keith R. Jackson, Computational Research Division
2 Why Clouds for Science?
- On-demand access to computing and cost associativity
- Parallel programming models for data-intensive science
  - e.g., BLAST parametric runs
- Customized and controlled environments
  - e.g., Supernova Factory codes have sensitivity to OS/compiler versions
- Overflow capacity to supplement existing systems
  - e.g., Berkeley Water Center has analysis that far exceeds capacity of desktops
3 Cloud Early Evaluations
- How do DOE/LBL workloads perform in Cloud environments?
  - What is the impact on performance from virtual environments?
  - What do Cloud programming models like Hadoop offer to scientific experimentation?
  - What does it take for scientific applications to run in Cloud environments such as Amazon EC2?
- Do Clouds provide an alternative for data-intensive interactive science?
4 Scientific Workloads at LBL
- High-performance computing codes
  - supported by NERSC and other supercomputing centers
- Mid-range computing workloads
  - serviced by LBL/IT Services and other local cluster environments
- Interactive data-intensive processing
  - usually run on scientists' desktops
5 NERSC-6 Benchmarking
- Subset of NERSC-6 application benchmarks for EC2 with smaller input sizes
  - represent the requirements of the NERSC workload
  - rigorous process for selection of codes
  - workload and algorithm/science-area coverage
- Run on EC2 high-CPU XL (64-bit) nodes
  - Intel C/Fortran compilers
  - OpenMPI patched for cross-subnet communication
  - $0.80/hour
6 Experiments on Amazon EC2
Columns: Code; Science Area; Algorithm Space; Configuration; Slowdown; Reduction factor (SSP) (relative to Franklin)
- CAM: Climate (BER); Navier-Stokes CFD; 200 processors, standard IPCC5 D-mesh resolution. Could not complete 240-processor run due to transient node failures. Some I/O and small messages.
- MILC: Lattice Gauge Physics (NP); conjugate gradient, sparse matrix, FFT; weak scaled, 14^4 lattice on 8, 32, 64, 128, and 256 processors. Erratic execution time.
- IMPACT-T: Accelerator Physics (HEP); PIC, FFT component; 64 processors, 64x128x128 grid and 4M particles. PIC portion performs well, but 3D FFT poor due to small message size.
- MAESTRO: Astrophysics (HEP); low Mach hydro, block-structured-grid multiphysics; 128 processors for 128^3 computational mesh. Small messages and allreduce for implicit solve.
7 Mid-range codes on Amazon EC2
- Lawrencium cluster: 64-bit, dual sockets per node, 8 cores per node, 16 GB memory, InfiniBand interconnect
- EC2: 64-bit, 2 cores per node; 7.5 GB, 15 GB and 7 GB memory
Code and slowdown factor:
- FMMSpeed: Fast Multipole Method. Pthread parallel code with ½ GB I/O. Slowdown 1.3 to 2.1
- GASBOR: a genetic-algorithm ab initio reconstruction code. Serial workload, minimal I/O (KB)
- ABINIT: DFT code that calculates the energy, charge density and electronic structure for molecules and periodic solids. Parallel MPI, minimal I/O. Slowdown up to 2.43
- HPCC: HPC Challenge Benchmark. Slowdown 2.8 to 8.8
- VASP: simulates properties of systems at the atomic scale. MPI parallel application
- IMB: Intel MPI Benchmarks (formerly Pallas). Alltoall among all MPI processes. Slowdown up to 15.79
8 Performance Observations
- Setup to look like a conventional GigE cluster
  - achieves 0.26 * SSP (Sustained System Performance) of Franklin per CPU
  - but must evaluate throughput/$
- Performance characteristics
  - Good TCP performance for large messages
  - Nonuniform execution times (VMMs have lots of noise/jitter)
  - Bare metal access to hardware
  - High overhead for small messages
  - No OS bypass (it's a VMM), so no efficient one-sided messaging
  - Poor shared disk I/O (good local I/O)
  - Need more robust (InfiniBand) interconnect
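The throughput/$ point above can be made concrete with a small calculation. The following is a minimal sketch, not the authors' method: it assumes the 0.26 figure can be read as per-core performance relative to Franklin, takes the $0.80/hour node price from the benchmarking slide, and assumes 8 cores per high-CPU XL node (a figure not stated in the slides). The function name is hypothetical.

```python
def cost_per_franklin_core_hour(price_per_node_hour, cores_per_node, relative_perf):
    """Dollars of cloud time needed to do the work of one Franklin core-hour.

    relative_perf is the cloud core's performance as a fraction of a
    Franklin core (assumed reading of the 0.26 * SSP figure).
    """
    if price_per_node_hour < 0:
        raise ValueError("price must be non-negative")
    if cores_per_node <= 0 or relative_perf <= 0:
        raise ValueError("cores_per_node and relative_perf must be positive")
    price_per_core_hour = price_per_node_hour / cores_per_node
    # A slower core needs 1 / relative_perf hours to match one Franklin core-hour.
    return price_per_core_hour / relative_perf


if __name__ == "__main__":
    # Assumed inputs: $0.80 per node-hour, 8 cores per node, 0.26 relative performance.
    cost = cost_per_franklin_core_hour(0.80, 8, 0.26)
    print(f"Cloud cost per Franklin-equivalent core-hour: ${cost:.3f}")
```

Whether that figure is favorable depends on the fully burdened cost of a Franklin core-hour, which the slides do not give.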
9 What codes work well?
- Minimal synchronization, modest I/O requirements
- Large messages or very little communication
- Low core counts (non-uniform execution and limited scaling)
- Generally, applications that would do well on midrange clusters; these mostly run on LBL/IT and local cluster resources today
10 Integrated Microbial Genomes (IMG)
- Goal: improving the overall quality of microbial genome data
  - supporting the comparative analysis of metagenomes and genomes in IMG together with all available GEBA genomes
- Large amount of sequencing of microbial genomes and metagenome samples using BLAST
  - computation scheduled within a certain time range takes about 3 weeks on a modest-sized Linux cluster
  - projected to exceed current computing resources
- What can we do to help such applications?
  - Do cloud computing and tools such as Hadoop help manage the task farming?
11 Hardware Platforms
- Franklin: Traditional HPC System
  - 40k-core, 360 TFLOP Cray XT4 system at NERSC
  - Lustre parallel filesystem
- Planck: Traditional Midrange Cluster
  - 32-node Linux/x86/InfiniBand cluster at NERSC
  - GPFS Global and Hadoop on Demand (HOD)
- Amazon EC2: Commercial Infrastructure-as-a-Service Cloud
  - Configure and boot customized virtual machines in the cloud
  - Elastic MapReduce/Hadoop images and S3 for parallel filesystem
- Yahoo M45: Shared Research Platform-as-a-Service Cloud
  - 400 nodes, 8 cores per node, Intel Xeon E5320, 6 GB per compute node, TB
  - Hadoop/MapReduce service: HDFS and shared file system
12 Software Platforms
- NCBI BLAST (2.2.22)
  - Reference IMG genomes of 6.5 million genes (~3Gb in size)
  - Full input set: 12.5 million metagenome genes against reference
- BLAST task farming implementation
  - Server reads inputs and manages the tasks
  - Client runs BLAST, copies database to local disk or ramdisk once on startup, pushes back results
  - Advantages: fault resilient and allows incremental expansion as resources become available
- Hadoop/MapReduce implementation of BLAST
  - Hadoop is an open-source implementation of MapReduce
  - Software framework for processing huge datasets
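The task-farming pattern described above (a server that hands out tasks, clients that pull work and push back results, and resilience to failed tasks) can be sketched in a few lines. This is an illustrative single-process sketch using threads and a shared queue, not the implementation used in the talk; `run_task_farm`, `work_fn`, and the retry policy are hypothetical, and the work function stands in for an actual BLAST invocation.

```python
import queue
import threading


def run_task_farm(tasks, work_fn, num_workers=4, max_attempts=3):
    """Farm tasks out to workers; requeue a failed task up to max_attempts times.

    Returns (results, failures), both keyed by the task's index in `tasks`.
    """
    pending = queue.Queue()
    for task_id, payload in enumerate(tasks):
        pending.put((task_id, payload, 1))  # third field: attempt number

    results, failures = {}, {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task_id, payload, attempt = pending.get_nowait()
            except queue.Empty:
                return  # nothing left for this worker to pull
            try:
                outcome = work_fn(payload)  # stand-in for running BLAST on a chunk
            except Exception as exc:
                if attempt < max_attempts:
                    # Fault resilience: put the task back for another attempt.
                    pending.put((task_id, payload, attempt + 1))
                else:
                    with lock:
                        failures[task_id] = repr(exc)
            else:
                with lock:
                    results[task_id] = outcome  # "push back results"

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, failures


if __name__ == "__main__":
    chunks = ["MKVLAA", "GGT", "ACDEFGHIK"]
    done, failed = run_task_farm(chunks, len, num_workers=2)
    print(done, failed)
```

Because workers pull tasks rather than being assigned them up front, adding workers later only increases the drain rate, which is one way to read the "incremental expansion" advantage listed above.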
13 Hadoop Processing Model
- Advantages of Hadoop
  - Broadly supported on cloud platforms
  - Transparent data replication
  - Data-locality-aware scheduling
  - Fault-tolerance capabilities
  - Dynamic resource management for growing
- Implementation details
  - Use streaming to launch a script that calls the executable
  - HDFS for input; need shared file system for binary and database
  - Each sequence needs to be on a single line to use the standard input format reader
  - Custom input format reader that can understand BLAST sequences
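The single-line requirement above exists because a line-oriented input reader treats each line as one record, while a FASTA sequence normally spans several lines. A minimal sketch of one way to bridge that gap follows: flatten each FASTA record to `header<TAB>sequence` before loading it, and rebuild the FASTA text inside the streaming script before handing it to the executable. The function names and the tab delimiter are assumptions for illustration, not details from the talk.

```python
def flatten_fasta(lines, sep="\t"):
    """Yield one 'header<sep>sequence' string per FASTA record."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header + sep + "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        yield header + sep + "".join(chunks)


def unflatten_record(record, sep="\t", width=60):
    """Rebuild FASTA text from one flattened record (e.g. inside a mapper script)."""
    # Split on the last separator: sequences never contain it, headers might.
    header, _, seq = record.rstrip("\n").rpartition(sep)
    body = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([">" + header] + body) + "\n"


if __name__ == "__main__":
    fasta = [">seq1 sample\n", "MKV\n", "LAA\n", ">seq2\n", "GGT\n"]
    for rec in flatten_fasta(fasta):
        print(rec)
        print(unflatten_record(rec), end="")
```

The alternative listed on the slide, a custom input format reader, avoids the preprocessing pass at the cost of writing and maintaining reader code for the framework.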
14 Performance Comparison
- Evaluated small-scale problem (2500 sequences) on multiple platforms (limited by access and costs)
- Similar per-core performance across platforms
[Figure: time (seconds) versus number of cores for EC2 Hadoop, Planck Hadoop on Demand (HOD), Planck task farming, and Franklin task farming]
15 Supernova Factory
- Tools to measure expansion of universe and energy
  - image-matching algorithms
  - data pipeline, task-parallel workflow
  - large data volume for supernova search
- Using Amazon EC2
  - stable 32-bit Linux computing environment
- Data requirements
  - about 0.5 TB existing data
  - about 1 TB of storage for 12 months and 1 TB of transfer from the cloud
16 Berkeley Water Center
- Studying global-scale environmental processes
  - integration of local, regional, and global spatial scales
  - integration across disciplines (e.g., climatology, hydrology, forestry) and methodologies
- Common Eco-Science Data Infrastructure
  - address quality, heterogeneity and scale
  - interfaces and services for accessing and processing data
17 MODerate-resolution Imaging Spectroradiometer (MODIS)
- Two MODIS satellites, near-polar orbits
  - global coverage every one to two days
- Data integration challenges
  - ~35 science data products, including atmospheric and land products
  - products are in different projections, resolutions (spatial and temporal), and times
  - data volume and processing requirements exceed desktop capacity
18 Windows Azure Cloud Solution
- Lower resource entry barriers
- Hide the complexities in data collection, reprojection and management from domain scientists
- A generic Reduction Service for scientists to upload arbitrary executables to perform scientific analysis on reprojected data
- 90X improvement over scientist desktop
[Figure labels: MODIS source data; Windows Azure Cloud Computing Platform; data processing pipeline; scientific results]
19 An Enabling Service for Scientists
- Gives scientists the ability to do analysis that was not possible before
- Programming model
  - future experimentation with Dryad/MapReduce frameworks
- Interactive cases
  - need to refine intermediate data products
  - upload executables to perform scientific analysis on data
20 DOE Cloud Research
- Magellan Project
  - DOE Advanced Scientific Computing Research (ASCR)
  - $32.8M project at NERSC and Argonne (ALCF)
  - ~100 TF/s compute cloud testbed (across sites)
  - Petabyte-scale storage cloud testbed
- Mission
  - Deploy a testbed cloud to serve the needs of midrange scientific computing.
  - Evaluate the effectiveness of this system for a wide spectrum of DOE/SC applications in comparison with other platform models.
  - Determine the appropriate role for commercial and/or private cloud computing for DOE/SC midrange workloads.
21 NERSC Magellan Cluster
- 720 nodes, 5760 cores in 9 Scalable Units (SUs); 61.9 teraflops
- SU = IBM iDataPlex rack with 640 Intel Nehalem cores
[Figure labels: 9 SUs; 18 login/network nodes; login, network and I/O nodes; 10G Ethernet; 8G FC; load balancer; Internet; 100G router; ANI; NERSC Global Filesystem; HPSS (15 PB); 1 petabyte with GPFS]
22 NERSC Magellan Research Questions
- What are the unique needs and features of a science cloud?
- What applications can efficiently run on a cloud?
- Are cloud computing APIs such as Hadoop effective for scientific applications?
- Can scientific applications use a DaaS or SaaS model?
- Is it practical to deploy cloud services across multiple DOE sites?
- What are the security implications of user-controlled cloud images?
- What is the cost and energy efficiency of clouds?
23 Summary
- Cloud environments impact performance
  - ongoing work to improve these environments for scientific applications
- Cloud tools require customizations suitable for scientific data processing
- Rethinking service model
  - support for interactive applications and dynamic software environments
24 Acknowledgements
- NERSC Benchmarks: Harvey Wasserman, John Shalf
- IT Benchmarks: Greg Bell, Keith Greg Kurtzer, Krishna Muriki, John White
- BLAST on Hadoop: Victor Markowitz, John Shalf, Shane Canon, Lavanya Ramakrishnan, Shreyas Cholia, Nick Wright
- Supernova Factory on EC2: Rollin Thomas, Greg Aldering, Lavanya Ramakrishnan
- Berkeley Water Center: Deb Agarwal, Catharine van Ingen (MSR), Jie Li (UVa), Youngryel Ryu (UCB), Marty Humphrey (UVa), Windows Azure team
Scientific Computing Meets Big Data Technology: An Astronomy Use Case Zhao Zhang AMPLab and BIDS UC Berkeley zhaozhang@cs.berkeley.edu In collaboration with Kyle Barbary, Frank Nothaft, Evan Sparks, Oliver
More informationBulk Synchronous Programmers and Design
Linux Kernel Co-Scheduling For Bulk Synchronous Parallel Applications ROSS 2011 Tucson, AZ Terry Jones Oak Ridge National Laboratory 1 Managed by UT-Battelle Outline Motivation Approach & Research Design
More informationAn Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database
An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct
More informationComputing in clouds: Where we come from, Where we are, What we can, Where we go
Computing in clouds: Where we come from, Where we are, What we can, Where we go Luc Bougé ENS Cachan/Rennes, IRISA, INRIA Biogenouest With help from many colleagues: Gabriel Antoniu, Guillaume Pierre,
More informationPerformance Evaluation of Amazon EC2 for NASA HPC Applications!
National Aeronautics and Space Administration Performance Evaluation of Amazon EC2 for NASA HPC Applications! Piyush Mehrotra!! J. Djomehri, S. Heistand, R. Hood, H. Jin, A. Lazanoff,! S. Saini, R. Biswas!
More informationEvaluating MapReduce and Hadoop for Science
Evaluating MapReduce and Hadoop for Science Lavanya Ramakrishnan LRamakrishnan@lbl.gov Lawrence Berkeley National Lab Computation and Data are critical parts of the scientific process Three Pillars of
More informationBig Data on Microsoft Platform
Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4
More informationHadoop Distributed File System. T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela
Hadoop Distributed File System T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela Agenda Introduction Flesh and bones of HDFS Architecture Accessing data Data replication strategy Fault tolerance
More informationAnalysis and Optimization of Massive Data Processing on High Performance Computing Architecture
Analysis and Optimization of Massive Data Processing on High Performance Computing Architecture He Huang, Shanshan Li, Xiaodong Yi, Feng Zhang, Xiangke Liao and Pan Dong School of Computer Science National
More informationRecognization of Satellite Images of Large Scale Data Based On Map- Reduce Framework
Recognization of Satellite Images of Large Scale Data Based On Map- Reduce Framework Vidya Dhondiba Jadhav, Harshada Jayant Nazirkar, Sneha Manik Idekar Dept. of Information Technology, JSPM s BSIOTR (W),
More informationEnergy Efficient MapReduce
Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing
More informationViswanath Nandigam Sriram Krishnan Chaitan Baru
Viswanath Nandigam Sriram Krishnan Chaitan Baru Traditional Database Implementations for large-scale spatial data Data Partitioning Spatial Extensions Pros and Cons Cloud Computing Introduction Relevance
More informationA Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks
A Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks Xiaoyi Lu, Md. Wasi- ur- Rahman, Nusrat Islam, and Dhabaleswar K. (DK) Panda Network- Based Compu2ng Laboratory Department
More informationHadoop & its Usage at Facebook
Hadoop & its Usage at Facebook Dhruba Borthakur Project Lead, Hadoop Distributed File System dhruba@apache.org Presented at the The Israeli Association of Grid Technologies July 15, 2009 Outline Architecture
More informationPerformance Across the Generations: Processor and Interconnect Technologies
WHITE Paper Performance Across the Generations: Processor and Interconnect Technologies HPC Performance Results ANSYS CFD 12 Executive Summary Today s engineering, research, and development applications
More informationRecommended hardware system configurations for ANSYS users
Recommended hardware system configurations for ANSYS users The purpose of this document is to recommend system configurations that will deliver high performance for ANSYS users across the entire range
More informationSo#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell
So#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell R&D Manager, Scalable System So#ware Department Sandia National Laboratories is a multi-program laboratory managed and
More informationBENCHMARKING V ISUALIZATION TOOL
Copyright 2014 Splunk Inc. BENCHMARKING V ISUALIZATION TOOL J. Green Computer Scien
More informationEfficient Parallel Execution of Sequence Similarity Analysis Via Dynamic Load Balancing
Efficient Parallel Execution of Sequence Similarity Analysis Via Dynamic Load Balancing James D. Jackson Philip J. Hatcher Department of Computer Science Kingsbury Hall University of New Hampshire Durham,
More information