benchmarking Amazon EC2 for high-performance scientific computing
Edward Walker

Edward Walker is a Research Scientist with the Texas Advanced Computing Center at the University of Texas at Austin. He received his PhD from the University of York (UK) in 1994, and his research interests include designing fault-tolerant distributed systems, HPC programming languages, and user-centric operating/run-time systems.

[email protected]

Benchmark results can be downloaded from login/ /benchmark_results.tgz.

How effective are commercial cloud computers for high-performance scientific computing compared to currently available alternatives? I aim to answer a specific instance of this question by examining the performance of Amazon EC2 for high-performance scientific applications. I used macro and micro benchmarks to study the performance of a cluster composed of EC2 high-cpu compute nodes and compared it against the performance of a cluster composed of equivalent processors available to the open scientific research community. My results show a significant performance gap between the examined clusters that system builders, computational scientists, and commercial cloud computing vendors need to be aware of.

The computer industry is on the cusp of an important breakthrough in high-performance computing (HPC) services. Commercial vendors such as IBM, Google, Sun, and Amazon have discovered the monetizing potential of leasing compute time on nodes managed by their global datacenters to customers on the Internet. In particular, since August 2006, Amazon has allowed anyone with a credit card to lease CPUs with its Elastic Compute Cloud (EC2) service. Amazon provides the user with a suite of Web-services tools to request, monitor, and manage any number of virtual machine instances running on physical compute nodes in its datacenters. The leased virtual machine instances provide the user with a highly customizable Linux operating system environment, allowing applications such as Web hosting, distributed data analysis, and scientific simulations to be run. Recently, some large physics experiments such as STAR [1] have also experimented with building virtual-machine-based clusters on Amazon EC2 for scientific computation. However, there is a significant absence of quantitative studies on the suitability of these cloud computers for HPC applications.

It is important to note what this article is not about. This is not an article on the benefits of virtualization or a measurement of its overhead, as this is extensively covered elsewhere [2]. This is also not an article evaluating the counterpart online storage service Amazon S3, although a quantitative study of this is also critical.
Finally, this is not an article examining the cost benefits of using cloud computing in IT organizations, as this is amplified elsewhere by its more eloquent advocates [3]. Instead, this article describes my results in using macro and micro benchmarks to examine the delta between clusters composed of currently available state-of-the-art CPUs from Amazon EC2 and equivalent clusters available to the HPC scientific community circa 2008. My results were obtained by using the NAS Parallel Benchmarks to measure the performance of these clusters for frequently occurring scientific calculations. Also, since the Message-Passing Interface (MPI) library is an important programming tool used widely in scientific computing, my results demonstrate the MPI performance of these clusters by using the mpptest micro benchmark.

The article provides a measurement-based yardstick to complement the often hand-waving nature of expositions concerning cloud computing. As such, I hope it will be of value to system builders and computational scientists across a broad range of disciplines to guide their computational choices, as well as to commercial cloud computing vendors to guide future upgrade opportunities.

Hardware Specifications

In our performance evaluation, we compare the performance of a cluster composed of EC2 compute nodes against an HPC cluster at the National Center for Supercomputing Applications (NCSA) called Abe. For this benchmark study we use the high-cpu extra large instances provided by the EC2 service. A comparison of the hardware specifications of the high-cpu extra large instances and the NCSA cluster used in this study is shown in Table 1. We verified from information in /proc/cpuinfo in the Linux kernel on both clusters that the same processor chip sets were used in our comparison study: dual-socket, quad-core 2.33-GHz Intel Xeon processors.

EC2 High-CPU Cluster
  Compute node: 7 GB memory, 4 CPU cores per processor (2.33-GHz Xeon), 8 CPUs per node, 64 bits, 1690 GB storage
  Network interconnect: high I/O performance (specific interconnect technology unknown)

NCSA Cluster
  Compute node: 8 GB memory, 4 CPU cores per processor (2.33-GHz Xeon), 8 CPUs per node, 64 bits, 73 GB storage
  Network interconnect: Infiniband switch

Table 1. Hardware Specifications of EC2 High-CPU Instances and NCSA Abe Cluster
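As a concrete illustration of the /proc/cpuinfo check described above, the short C sketch below (my own illustration, not part of the original benchmarking toolchain) prints the processor model string and counts the logical CPUs the Linux kernel reports; it assumes only the standard "model name" field of /proc/cpuinfo.

/* cpucheck.c: a minimal sketch of the /proc/cpuinfo verification described
 * above (illustrative only, not part of the benchmark suite).
 * Build: gcc -O2 -o cpucheck cpucheck.c   Run it on each compute node. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *fp = fopen("/proc/cpuinfo", "r");
    char line[256];
    int cpus = 0;

    if (fp == NULL) {
        perror("/proc/cpuinfo");
        return 1;
    }
    while (fgets(line, sizeof(line), fp) != NULL) {
        if (strncmp(line, "model name", 10) == 0) {
            if (cpus == 0)
                fputs(line, stdout);   /* processor model string reported by the kernel */
            cpus++;
        }
    }
    fclose(fp);
    printf("logical CPUs reported by the kernel: %d\n", cpus);
    return 0;
}

On both clusters used in this study, a check of this kind should report eight logical CPUs per node and the same 2.33-GHz Xeon model string, consistent with Table 1.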
NAS Parallel Benchmark

The NAS Parallel Benchmarks (NPB) [4] comprise a widely used set of programs designed to evaluate the performance of HPC systems. The core benchmark consists of eight programs: five parallel kernels and three simulated applications. In aggregate, the benchmark suite mimics the critical computation and data movement involved in computational fluid dynamics and other typical scientific computation. A summary of the characteristics of the programs for the Class B version of NPB used in this study is shown in Table 2.

The benchmark suite comes in a variety of versions, each using a different parallelizing technology: OpenMP, MPI, HPF, and Java. In this study we use the OpenMP [5] version to measure the performance of a single eight-CPU compute node. We also use the MPI [6] version to characterize the distributed-memory performance of our clusters.

Program  Description
EP       Embarrassingly parallel Monte Carlo kernel to compute the solution of an integral.
MG       Multigrid kernel to compute the solution of the 3-D Poisson equation.
CG       Kernel to compute the smallest eigenvalue of a symmetric positive definite matrix.
FT       Kernel to solve a 3-D partial differential equation using an FFT-based method.
IS       Parallel sort kernel based on bucket sort.
LU       Computational fluid dynamics application using symmetric successive over-relaxation (SSOR).
SP       Computational fluid dynamics application using the Beam-Warming approximate factorization method.
BT       Computational fluid dynamics application using an implicit solution method.

Table 2. NPB Class B Program Characteristics

npb-omp version

We ran the OpenMP version of NPB (NPB3.3-OMP) Class B on a high-cpu extra large instance and on a compute node of the NCSA cluster. Each compute node provides eight CPU cores (from the dual sockets), so we allowed the benchmark to schedule up to eight parallel threads for each benchmark program. On both the NCSA cluster and EC2, we compiled the benchmarks using the Intel compiler with the option flags -openmp -O3.

Figure 1 shows the runtimes of each of the programs in the benchmark. In general we see a performance degradation of approximately 7% to 21% for the programs running on the EC2 node compared to running them on the NCSA cluster compute node. This percentage degradation is shown in the overlaid line chart in Figure 1. This is a surprising result; we expected the performance of the compute nodes to be equivalent.

Figure 1. NPB-OMP (Class B) Runtimes on 8 CPUs on EC2 and NCSA Cluster Compute Nodes. Overlaid is the percentage performance degradation in the EC2 runs.
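For readers unfamiliar with the shared-memory setup, the following minimal OpenMP sketch (my own illustration, not code taken from NPB3.3-OMP) shows the programming model the OpenMP benchmarks rely on: a single process whose loop iterations are divided among threads on one node. The build line assumes the same Intel compiler flags used in the study (-openmp -O3), and the thread count is selected through OMP_NUM_THREADS.

/* omp_sketch.c: illustrative only; not part of NPB3.3-OMP.
 * Build (assumed, matching the flags used in the study):
 *   icc -openmp -O3 omp_sketch.c -o omp_sketch
 * Run on one eight-core node:  OMP_NUM_THREADS=8 ./omp_sketch */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N 50000000L

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double sum = 0.0, t0, t1;
    long i;

    if (a == NULL)
        return 1;
    for (i = 0; i < N; i++)
        a[i] = (double)i / N;

    t0 = omp_get_wtime();
    /* the work is split across threads within a single shared address space */
#pragma omp parallel for reduction(+:sum)
    for (i = 0; i < N; i++)
        sum += a[i] * a[i];
    t1 = omp_get_wtime();

    printf("threads=%d  sum=%.4f  time=%.3f s\n",
           omp_get_max_threads(), sum, t1 - t0);
    free(a);
    return 0;
}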
npb-mpi version

We ran the MPI version of NPB (NPB3.3-MPI) Class B on multiple compute nodes on the EC2-provisioned cluster and on the NCSA cluster. For the EC2-provisioned cluster, we requested 4 high-cpu extra large instances, of 8 CPUs each, for each run. On both the EC2 and NCSA cluster compute nodes, the benchmarks were compiled with the Intel compiler with the option flag -O3. For the EC2 MPI runs we used the MPICH2 MPI library (1.0.7), and for the NCSA MPI runs we used the MVAPICH2 MPI library (0.9.8p2). All the programs were run with 32 CPUs, except BT and SP, which were run with 16 CPUs.

Figure 2 shows the run times of the benchmark programs. From the results, we see approximately 40% to 1000% performance degradation in the EC2 runs compared to the NCSA runs. Greater than 200% performance degradation is seen in the programs CG, FT, IS, LU, and MG. Surprisingly, even EP (embarrassingly parallel), where no message-passing communication is performed during the computation and only a global reduction is performed at the end, exhibits approximately 50% performance degradation in the EC2 run.

Figure 2. NPB-MPI (Class B) Runtimes on 32 CPUs on the NCSA and EC2 Clusters. BT and SP were run with 16 CPUs only. Overlaid is the percentage degradation in the EC2 runs.
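The EP result deserves a closer look, because its communication pattern is minimal. The sketch below (my own illustration, not the NPB EP kernel itself) reproduces that pattern in MPI: every rank computes independently, and the only message-passing is a single MPI_Reduce at the end, so in principle the interconnect should have little influence on its runtime.

/* ep_pattern.c: illustrative only; mimics the communication pattern of the
 * EP kernel (independent local work, one global reduction) rather than its
 * actual numerics.  Build: mpicc -O3 ep_pattern.c -o ep_pattern
 * Run (assumed launcher): mpiexec -n 32 ./ep_pattern */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    long i, n = 20000000;
    double local = 0.0, global = 0.0, t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    srand48(rank + 1);                 /* an independent pseudo-random stream per rank */
    t0 = MPI_Wtime();
    for (i = 0; i < n; i++)            /* embarrassingly parallel phase: no communication */
        local += drand48() * drand48();

    /* the only communication in the whole program: one global reduction */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("ranks=%d  result=%.4f  elapsed=%.3f s\n", size, global, t1 - t0);
    MPI_Finalize();
    return 0;
}

Because a program with this shape spends essentially all of its time in local computation, the roughly 50% EC2 slowdown observed for EP suggests that factors beyond message-passing performance also contribute to the gap, consistent with the single-node OpenMP results above.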
mpi performance benchmarks

We hypothesize that the Infiniband switch fabric in the NCSA cluster is enabling much higher performance for NPB-MPI. However, we want to quantitatively understand the message-passing performance difference between using a scientific cluster with a high-performance networking fabric and a cluster simply composed of Amazon EC2 compute nodes. The following results use the mpptest benchmark [7] to characterize the message-passing performance of the two clusters.

The representative results shown in this article are from the bisection test. In the bisection test, the complete system is divided into two subsystems, and the aggregate latency and bandwidth are measured for different message sizes sent between the two subsystems. In the cases shown, we conducted the bisection test using 32-CPU MPI jobs.

Figures 3 and 4 show the bisection bandwidth and latency, respectively, for MPI message sizes from 0 to 1024 bytes. It is clearly seen that message-passing latency and bandwidth between EC2 compute nodes are an order of magnitude worse than between compute nodes on the NCSA cluster. Consequently, substantial improvements can be provided to the HPC scientific community if a high-performance network provisioning solution can be devised for this problem.

Figure 3. MPI Bandwidth Performance in the mpptest Benchmark on the NCSA and EC2 Clusters

Figure 4. MPI Latency Performance in the mpptest Benchmark on the NCSA and EC2 Clusters
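mpptest itself should be used for any real measurement, but the idea behind the bisection test can be sketched in a few lines of MPI. In the hypothetical sketch below (my own illustration), each rank in the lower half of MPI_COMM_WORLD repeatedly exchanges a fixed-size message with a partner in the upper half, and rank 0 reports a round-trip time and an approximate one-way bandwidth for its pair; the message size, repetition count, and the assumption of an even rank count are all choices of this illustration, not parameters of the study.

/* bisect_pingpong.c: a simplified, illustrative sketch in the spirit of the
 * mpptest bisection test; use mpptest for real measurements.  Each rank in
 * the lower half pairs with one in the upper half.  Assumes an even number
 * of ranks.  Build: mpicc -O3 bisect_pingpong.c   Run: mpiexec -n 32 ./a.out */
#include <stdio.h>
#include <string.h>
#include <mpi.h>

#define MSG_BYTES 1024
#define REPS      1000

int main(int argc, char **argv)
{
    int rank, size, partner, i;
    char buf[MSG_BYTES];
    double t0, t1;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    memset(buf, 0, sizeof(buf));

    /* split the job into two halves and pair ranks across the bisection */
    partner = (rank < size / 2) ? rank + size / 2 : rank - size / 2;

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < REPS; i++) {
        if (rank < size / 2) {
            MPI_Send(buf, MSG_BYTES, MPI_CHAR, partner, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_BYTES, MPI_CHAR, partner, 0, MPI_COMM_WORLD, &st);
        } else {
            MPI_Recv(buf, MSG_BYTES, MPI_CHAR, partner, 0, MPI_COMM_WORLD, &st);
            MPI_Send(buf, MSG_BYTES, MPI_CHAR, partner, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("%d-byte messages: %.1f us per round trip, ~%.1f MB/s one way\n",
               MSG_BYTES, 1e6 * (t1 - t0) / REPS,
               2.0 * MSG_BYTES * REPS / (t1 - t0) / 1e6);
    MPI_Finalize();
    return 0;
}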
Conclusion

The opportunity of using commercial cloud computing services for HPC is compelling. It unburdens the large majority of computational scientists from maintaining permanent cluster fixtures, and it encourages free open-market competition, allowing researchers to pick the best service based on the price they are willing to pay. However, the delivery of HPC performance with commercial cloud computing services such as Amazon EC2 is not yet mature. This article has shown that a performance gap exists between performing HPC computations on a traditional scientific cluster and on an EC2-provisioned scientific cluster. This performance gap is seen not only in the MPI performance of distributed-memory parallel programs but also in the single compute node OpenMP performance of shared-memory parallel programs. For cloud computing to be a viable alternative for the computational science community, vendors will need to upgrade their service offerings, especially in the area of high-performance network provisioning, to cater to this unique class of users.

references

[1] The STAR experiment:

[2] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, "Xen and the Art of Virtualization," ACM Symposium on Operating Systems Principles, 2003.

[3] Simson Garfinkel, "Commodity Grid Computing with Amazon's S3 and EC2," ;login:, February 2007.

[4] NAS Parallel Benchmarks: npb.html.

[5] OpenMP specification:

[6] Message-Passing Interface (MPI) specification:

[7] mpptest, Measuring MPI Performance: mpi/mpptest/.