A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS
SUDHAKARAN.G, APCF, AERO, VSSC, ISRO (g_suhakaran@vssc.gov.in)
THOMAS.C.BABU, APCF, AERO, VSSC, ISRO (thomas_babu@vssc.gov.in)
ASHOK.V, ADCG, AERO, VSSC, ISRO (v_ashok@vssc.gov.in)

ABSTRACT

In this paper, we describe a GPU based supercomputer, SAGA (Supercomputer for Aerospace with GPU Architecture), developed at VSSC, and the challenges involved in developing a Computational Fluid Dynamics code, PARAS-3D, which runs on SAGA. This GPU facility, together with PARAS-3D, has helped to solve CFD problems in a very cost-effective manner with a considerable reduction in solution time. The SAGA supercomputer is extensively used for the aerodynamic design and analysis of launch vehicles.

Categories and Subject Descriptors

[GPU Facility]: nvidia Tesla C2070 and M2090 GPUs. [CFD Application]: PARAS-3D, CUDA, MPI, SIMD architecture.

General Terms

SAGA supercomputer; NVIDIA GPUs and GPU architecture; Linux operating system; resource manager and job scheduler; PARAS-3D CFD code; CUDA programming.

Keywords

SAGA, GPU, SIMD, CFD, PARAS-3D, GPGPU, CUDA

1. INTRODUCTION

A GPU based supercomputer, SAGA (Supercomputer for Aerospace with GPU Architecture), has been developed at VSSC using Intel Xeon processors and nvidia GPUs. This supercomputer has a theoretical peak performance of 448 TFLOPS (DP). The GPU version of the Linpack benchmark code is used to evaluate sustained performance, and full benchmarking of the facility is in progress. We will be submitting the benchmark results to include SAGA in top500.org during the June update. A photograph of the SAGA supercomputer is shown in Figure 1. A short introduction to SAGA can be found in Reference [1]. A GPU based Computational Fluid Dynamics code, PARAS-3D, has also been developed at VSSC. PARAS-3D, written for GPUs using the CUDA programming model, is the major application software running on SAGA.
This paper describes the development of the SAGA supercomputer and the challenges involved in developing a GPU based application. The SAGA supercomputer, together with the PARAS-3D GPU application, is extensively used for the aerodynamic design and analysis of launch vehicles. Section 2 describes the development of the SAGA supercomputer, and Section 3 describes the development of the GPU application PARAS-3D along with some of the analyses carried out on SAGA, followed by the conclusion and references.

2. SAGA SUPERCOMPUTER
A GPU based supercomputer, SAGA, has been developed at VSSC using Intel processors and nvidia GPUs. The supercomputer consists of 736 Intel Xeon processors and 736 nvidia Tesla GPUs. The individual nodes of the supercomputer are configured with two CPUs and two GPUs to maintain a 1:1 ratio. The nodes are disk-less machines using a compressed RAM file system. A pair of brain servers is used for network booting of the nodes, job scheduling, and power and resource management. Another five pairs of redundant servers provide the Network File System (NFS). The Linux operating system for SAGA is configured using open source components. The application software PARAS-3D, the job scheduler, and the resource and power manager were developed at VSSC. The cluster and infrastructure design, the electrical and communication networks, and the design of the precision air-conditioning systems were also done in-house. SAGA has a theoretical peak performance of 448 TFLOPS (DP). The following sub-sections give the details of each of these activities.

2.1 Operating System

SAGA uses an in-house configured Linux operating system. The 64-bit Linux operating system (OS) for SAGA is configured using LFS (Linux From Scratch) with support for GPUs and InfiniBand. The servers and front-end systems use this OS with NFS support. A tiny 64-bit compute-node Linux OS has also been developed for the nodes and is stored on the brain server. The nodes are disk-less machines, which are network booted from the brain server with the tiny OS. The Linux kernel is recompiled and updated whenever stable kernels become available.

2.2 Resource manager and job scheduler

An automated resource manager and job scheduler has been developed for SAGA to efficiently manage the operation of the entire supercomputer. The job scheduler queues the jobs and executes them when a sufficient number of nodes are available.
The resource manager monitors the status of the UPS systems, the room temperature, the state of the nodes, etc. It also switches off the nodes when there is no job in the queue for execution, and switches them on as per demand or when new jobs arrive in the queue. In this way, SAGA minimizes electrical power consumption.

2.3 Graphics Processing Units (GPUs)

SAGA uses two types of nvidia Tesla GPUs, namely, C2070 and M2090. The C2070 is a first generation double precision GPU, which has 448 cores and is capable of delivering a double precision (DP) floating point performance of 515 GFLOPS per GPU. Each M2090 GPU has 512 cores and a double precision (DP) floating point performance of 665 GFLOPS. There are 436 C2070 GPUs and 300 M2090 GPUs in SAGA, giving a total performance of 414 TeraFlops, in addition to the CPU power of about 34 TeraFlops. The features of the C2070 and M2090 are summarized in Table-1.

Table-1. Features of C2070 and M2090 GPUs

                                              nvidia Tesla C2070 (FERMI)     nvidia Tesla M2090
No. of Cores                                  448                            512
Built-in Memory                               6 GB                           6 GB
Double Precision Floating point Performance   515 GigaFLOPS                  665 GigaFLOPS
Single Precision Floating point Performance   1030 GigaFLOPS                 1331 GigaFLOPS
Power Consumption                             190 W (Typical), 225 W (Max.)  190 W (Typical), 225 W (Max.)

More about nvidia GPUs can be found in [2].

2.4 Network and Topology

SAGA has three types of network interconnect, namely, InfiniBand, Gigabit Ethernet and an IPMI network. A 40 Gbps QDR InfiniBand network is used for inter-process communication between the nodes. It is configured in fully non-blocking mode using 44 switches, each having 36 QDR ports; 28 switches are connected in layer-1 and 16 switches are connected in layer-2. Gigabit Ethernet is used for network booting of the nodes and for the user interface. The third, IPMI network, a 10 Mbps network, is used for platform management such as resetting the nodes, interfacing with the hardware, etc.
The SAGA network layout is shown in Figure 2.

2.5 Storage and Brain Servers

SAGA has a pair of brain servers used for network booting of the nodes and for system management. These servers are configured using DRBD and Heartbeat for fail-safe operation. Queuing and scheduling of jobs, and node, power and resource management are done by these servers. The brain servers also monitor the status of the UPS, the room temperature, etc., and switch off the facility in the event of a long power failure or an air-conditioning system failure. SAGA also has 5 pairs of NFS servers, which are likewise configured using DRBD and Heartbeat for fail-safe operation. The NFS servers provide the storage file system for all users, including system file storage. The primary servers provide the file system under normal operation. In the event of failure of a primary server, the corresponding secondary server changes to primary and provides the service without any intervention. If any secondary server fails, the primary continues the service. The failed server can be rectified and connected back, upon which it resumes its function automatically.

2.6 Linpack Benchmarking

The GPU version of HPL is used for benchmarking our machines. We obtained 58 % of the peak performance on a machine having two C2070 GPUs and two quad-core Xeon processors. As described in later sections, the Linpack code should be tuned based on the CUDA programming guidelines to obtain maximum performance. We are working on this, and we
expect that the full benchmarking of SAGA can be completed by the middle of May. We are trying to include SAGA in Top500.org [3] during the website update in May-June.

3. APPLICATION SOFTWARE - PARAS-3D

A Cartesian grid based Computational Fluid Dynamics code, PARAS-3D, has been developed for SAGA. The PARAS-3D code was written for GPUs using the CUDA programming model provided by nvidia. PARAS-3D has about 2.5 lakh (250,000) lines of C code and is one of the most complex applications running on GPUs. The code is extensively used by ISRO and other aerospace organizations in the country. The advantages of PARAS-3D include fully automatic grid generation, the ability to handle complex geometries, an interface for CAD geometries, adaptive grid refinement, etc. The opening window of PARAS-3D is shown in Figure 3. More about PARAS can be found in [4]. The GPU version of PARAS-3D was developed from its parallel multithreaded MPI version. Parallelisation of the Navier-Stokes code on clusters of machines [5] started at VSSC in the 1990s using PC clusters. Subsequently we moved to a cluster of DEC-Alpha machines and then to AMD clusters. The present GPU facility is based on Xeon processors and nvidia GPUs, with PARAS ported to the CPU-GPU hybrid computing environment.

3.1 CUDA Development Tools [6]

nvidia provides a programming environment known as CUDA, which is specialized for their GPUs. OpenCL could also be used, but we prefer CUDA since our GPUs are manufactured by nvidia. CUDA provides the ability to use high-level languages such as C to develop applications that can take advantage of the high level of performance and scalability that the GPU architecture offers. PARAS-3D is written based on the CUDA programming tools, which are available at the nvidia website [6]. A number of CUDA programming guides, such as CUDA Getting Started for Linux, the NVIDIA CUDA C Programming Guide [7], the CUDA C Best Practices Guide and CUDA for Developers [8], are also available at the nvidia website. CUDA manuals and binaries can also be downloaded from the nvidia website [9].
3.2 Challenges in developing a GPU application

The challenges involved in developing a good GPU application are discussed in this section. GPUs inherit their architecture from traditional graphics processors, which are SIMD (Single Instruction Multiple Data) processors employing data parallelism. Accordingly, to extract good performance from GPUs, the algorithm must be designed in a data parallel fashion. The underlying application must be re-written to exploit this data parallel behavior of GPUs, assigning the serial portions of the code to the CPUs. Memory management aspects were found to be very important for getting good performance from GPUs. The copy processes between the memories of the CPU and the GPU have to be optimized for better performance. It is to be noted that GPUs have only a limited set of registers and cache, and the application should be able to work within them. Moreover, the number of local variables also has to be optimized to keep them within the cache, to the extent possible. The programmer should follow the CUDA guidelines to obtain maximum performance from GPUs. Threads should be run in groups of 32 (the warp size) and above for best performance. Each processing unit on the GPU contains local memory that improves data manipulation and reduces fetch time. PARAS-3D, with its adaptive Cartesian grids and Oct-tree data structure, is not inherently data parallel. With suitable algorithms, it was made more data parallel by arranging cells into different groups having varying levels of data parallelism. In this process, the most complex sets of computations are assigned to the CPUs. The update operations between the CPU and the GPU were programmed as a single update, to minimize the copy processes between the memories of the CPU and the GPU. Other aspects of GPU programming include the removal of recursion and function calls from the code, as CUDA does not support these features.
At the start of application execution, CUDA-compiled code runs like any other application, with its primary execution happening on the CPU. When a kernel call is made, the application continues the execution of non-kernel functions on the CPU; at the same time, the kernel function executes on the GPU. In this manner we obtain parallel processing between the CPU and the GPU.

3.3 GPU version of PARAS-3D

The PARAS-3D code was written based on the CUDA programming tools and the guidelines given in Section 3.2. A single code is used for both CPUs and GPUs, which is capable of identifying the CPUs and GPUs in a machine and the number of cores in each CPU and GPU. Users need not be aware of the number of CPUs and GPUs in the machines where they run PARAS; PARAS identifies the CPU and GPU cores in the machine and configures itself automatically. Presently, users have the freedom of choosing the number of machines on which to run their application, based on the number of grid cells and previous run history.

3.4 Performance improvements

With the above modifications, PARAS gives very good performance on GPU systems. The software uses three technologies for high performance computing, namely, distributed computing, shared memory computing and GPU accelerators. The speed-up obtained for CFD problems on a single GPU node consisting of 2 quad-core Xeon processors and 2 GPUs is 4.5 to 7 times that of a single CPU node having 2 quad-core Xeon processors. The speed-up depends on the complexity of the geometry, the level of grid adaptation and the size of the problem under consideration. Figure 4 shows the speed-up obtained for PARAS when 46, 36 and 63 million cells are used. With suitable tuning of the parameters, we could achieve up to 90 % efficiency for PARAS when 40 nodes are used. In general, about 1 million cells per GPU gives very good performance (> 90 %) up to 40 nodes.

3.5 Potential Users

PARAS-3D is extensively used by scientists and engineers of ISRO and other government organizations such as DRDL, ADA and ADE.
We have also distributed PARAS to some of the research institutions in the country, such as IISc and IIST. Presently, the PARAS license is limited to government organizations.

3.6 Real-World Applications

PARAS-3D is extensively used for the aerodynamic design and analysis of launch vehicles in ISRO and for aircraft design in ADA. PARAS has its own pre-processor to generate the geometry and grids of the problem under consideration. It also has the capability of importing geometry generated by other CAD software packages. A typical CFD problem for a launch vehicle with 93 million grid cells is shown in Figure 5. PARAS is capable of using fine grids near to
the body under consideration and coarse grids away from the body, as shown in the figure. Figure 6 shows the pressure distribution obtained using PARAS after 50,000 iterations. Some of the published CFD simulations carried out using PARAS-3D can be found in [10, 11, 12]. The list of publications in national journals/seminars/workshops is not included in this paper.

Figure -2 SAGA Network Layout

4. CONCLUSION

In this paper we have provided the details of a GPU based supercomputer, SAGA, developed by VSSC. The cluster and infrastructure design was carried out at VSSC. The operating system, job scheduler, and automated resource and power manager were also developed in-house. A GPU based application, PARAS-3D, was also developed at VSSC. The challenges involved in developing a GPU based application are discussed in this paper. With suitable algorithms and tuning of parameters, PARAS gave up to 90 % efficiency for 40 nodes. In general, about 1 million cells per GPU gives very good performance (> 90 %) up to 40 nodes.

5. ACKNOWLEDGMENTS

We would like to express our sincere thanks to the Chairman, ISRO for providing the necessary approval and funds for building a GPU based supercomputing facility at VSSC. We also express our sincere thanks to the Director, VSSC for the technical and logistic support for building the facility. We would like to thank DD, AERO, who took the initiative for establishing a GPU facility at VSSC and provided the necessary support for building it. We extend our thanks to the construction and maintenance wing of VSSC, which gave support for designing and establishing the facility. We would like to thank all our engineers, who helped in developing the facility and the PARAS-3D CFD code. We have used a good amount of open source software components to build the SAGA supercomputer and the GPU version of PARAS-3D; the developers of the open source community are gratefully acknowledged.
Finally, we would like to thank our valuable users, without whom the need for the facility and the PARAS code would not have arisen at all.

Figure -3 Opening Window of PARAS-3D
Figure -4 Speed up obtained for PARAS
Figure -1 SAGA Supercomputer
Figure -6 Pressure Contours for a Typical Problem solved using PARAS-3D
Figure -5 Geometry and Grids for a Typical Problem

6. REFERENCES

[1] The SAGA Supercomputer, Technology Review India, Vol. 3, No. 6, June.
[2] Tesla Product Literature, nvidia.
[3] Top500.org.
[4] PARAS-3D Users Manual Ver. 4.1.0, VSSC/ARD/GN/01/2011, December.
[5] Parallelisation of Navier-Stokes code on a cluster of workstations, Ashok. V and Thomas C Babu, Lecture Notes in Computer Science 1745, Springer Verlag edition, pp.
[6] CUDA toolkit, nvidia.
[7] NVIDIA CUDA C Programming Guide, Ver. 4.1, Nov 2011.
[8] CUDA for Developers, nvidia.
[9] CUDA manuals and binaries, nvidia.
[10] Effect of connected pipe test conditions on scramjet engine modules for flight testing, Gnanasekhar. S, Dipankar Das, Ashok. V and Lazar T Chitilappilly, Int. J. of Aerospace Innovations, Vol. 1, No. 4, Dec.
[11] CFD simulation of flow field over a large protuberance on a flat plate at high supersonic Mach number, K. Manokaran, G. Vidya and V. K. Goyal, AIAA.
[12] Numerical simulation of single and twin jet impingement on a typical jet deflector, Navin Kumar Kessop, Dipankar Das, K. J. Devasia, P. Jeyajothiraj, 11th Asian Symposium on Visualization, Niigata, Japan, 2011.
More informationHPC-related R&D in 863 Program
HPC-related R&D in 863 Program Depei Qian Sino-German Joint Software Institute (JSI) Beihang University Aug. 27, 2010 Outline The 863 key project on HPC and Grid Status and Next 5 years 863 efforts on
More informationFLOW-3D Performance Benchmark and Profiling. September 2012
FLOW-3D Performance Benchmark and Profiling September 2012 Note The following research was performed under the HPC Advisory Council activities Participating vendors: FLOW-3D, Dell, Intel, Mellanox Compute
More informationREPORT DOCUMENTATION PAGE
REPORT DOCUMENTATION PAGE Form Approved OMB No. 0704-0188 Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions,
More informationEnhancing Cloud-based Servers by GPU/CPU Virtualization Management
Enhancing Cloud-based Servers by GPU/CPU Virtualiz Management Tin-Yu Wu 1, Wei-Tsong Lee 2, Chien-Yu Duan 2 Department of Computer Science and Inform Engineering, Nal Ilan University, Taiwan, ROC 1 Department
More informationPurchase of High Performance Computing (HPC) Central Compute Resources by Northwestern Researchers
Information Technology Purchase of High Performance Computing (HPC) Central Compute Resources by Northwestern Researchers Effective for FY2016 Purpose This document summarizes High Performance Computing
More informationGPU File System Encryption Kartik Kulkarni and Eugene Linkov
GPU File System Encryption Kartik Kulkarni and Eugene Linkov 5/10/2012 SUMMARY. We implemented a file system that encrypts and decrypts files. The implementation uses the AES algorithm computed through
More informationWrite a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical
Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or
More informationGTC Presentation March 19, 2013. Copyright 2012 Penguin Computing, Inc. All rights reserved
GTC Presentation March 19, 2013 Copyright 2012 Penguin Computing, Inc. All rights reserved Session S3552 Room 113 S3552 - Using Tesla GPUs, Reality Server and Penguin Computing's Cloud for Visualizing
More information~ Greetings from WSU CAPPLab ~
~ Greetings from WSU CAPPLab ~ Multicore with SMT/GPGPU provides the ultimate performance; at WSU CAPPLab, we can help! Dr. Abu Asaduzzaman, Assistant Professor and Director Wichita State University (WSU)
More informationOpenMP Programming on ScaleMP
OpenMP Programming on ScaleMP Dirk Schmidl schmidl@rz.rwth-aachen.de Rechen- und Kommunikationszentrum (RZ) MPI vs. OpenMP MPI distributed address space explicit message passing typically code redesign
More informationDepartment of Computer Sciences University of Salzburg. HPC In The Cloud? Seminar aus Informatik SS 2011/2012. July 16, 2012
Department of Computer Sciences University of Salzburg HPC In The Cloud? Seminar aus Informatik SS 2011/2012 July 16, 2012 Michael Kleber, mkleber@cosy.sbg.ac.at Contents 1 Introduction...................................
More informationLBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR
LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR Frédéric Kuznik, frederic.kuznik@insa lyon.fr 1 Framework Introduction Hardware architecture CUDA overview Implementation details A simple case:
More informationSimulation Platform Overview
Simulation Platform Overview Build, compute, and analyze simulations on demand www.rescale.com CASE STUDIES Companies in the aerospace and automotive industries use Rescale to run faster simulations Aerospace
More informationToward a practical HPC Cloud : Performance tuning of a virtualized HPC cluster
Toward a practical HPC Cloud : Performance tuning of a virtualized HPC cluster Ryousei Takano Information Technology Research Institute, National Institute of Advanced Industrial Science and Technology
More informationPARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN
1 PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN Introduction What is cluster computing? Classification of Cluster Computing Technologies: Beowulf cluster Construction
More informationThe Green Index: A Metric for Evaluating System-Wide Energy Efficiency in HPC Systems
202 IEEE 202 26th IEEE International 26th International Parallel Parallel and Distributed and Distributed Processing Processing Symposium Symposium Workshops Workshops & PhD Forum The Green Index: A Metric
More informationHPC with Multicore and GPUs
HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville CS 594 Lecture Notes March 4, 2015 1/18 Outline! Introduction - Hardware
More informationPerformance Comparison of ISV Simulation Codes on Microsoft Windows HPC Server 2008 and SUSE Linux Enterprise Server 10.2
Fraunhofer Institute for Algorithms and Scientific Computing SCAI Performance Comparison of ISV Simulation Codes on Microsoft HPC Server 28 and SUSE Enterprise Server 1.2 Karsten Reineck und Horst Schwichtenberg
More informationMississippi State University High Performance Computing Collaboratory Brief Overview. Trey Breckenridge Director, HPC
Mississippi State University High Performance Computing Collaboratory Brief Overview Trey Breckenridge Director, HPC Mississippi State University Public university (Land Grant) founded in 1878 Traditional
More informationDell High-Performance Computing Clusters and Reservoir Simulation Research at UT Austin. http://www.dell.com/clustering
Dell High-Performance Computing Clusters and Reservoir Simulation Research at UT Austin Reza Rooholamini, Ph.D. Director Enterprise Solutions Dell Computer Corp. Reza_Rooholamini@dell.com http://www.dell.com/clustering
More informationA Flexible Cluster Infrastructure for Systems Research and Software Development
Award Number: CNS-551555 Title: CRI: Acquisition of an InfiniBand Cluster with SMP Nodes Institution: Florida State University PIs: Xin Yuan, Robert van Engelen, Kartik Gopalan A Flexible Cluster Infrastructure
More informationANALYSIS OF SUPERCOMPUTER DESIGN
ANALYSIS OF SUPERCOMPUTER DESIGN CS/ECE 566 Parallel Processing Fall 2011 1 Anh Huy Bui Nilesh Malpekar Vishnu Gajendran AGENDA Brief introduction of supercomputer Supercomputer design concerns and analysis
More informationHPC Software Requirements to Support an HPC Cluster Supercomputer
HPC Software Requirements to Support an HPC Cluster Supercomputer Susan Kraus, Cray Cluster Solutions Software Product Manager Maria McLaughlin, Cray Cluster Solutions Product Marketing Cray Inc. WP-CCS-Software01-0417
More informationCluster Implementation and Management; Scheduling
Cluster Implementation and Management; Scheduling CPS343 Parallel and High Performance Computing Spring 2013 CPS343 (Parallel and HPC) Cluster Implementation and Management; Scheduling Spring 2013 1 /
More informationOn-Demand Supercomputing Multiplies the Possibilities
Microsoft Windows Compute Cluster Server 2003 Partner Solution Brief Image courtesy of Wolfram Research, Inc. On-Demand Supercomputing Multiplies the Possibilities Microsoft Windows Compute Cluster Server
More informationScaling from Workstation to Cluster for Compute-Intensive Applications
Cluster Transition Guide: Scaling from Workstation to Cluster for Compute-Intensive Applications IN THIS GUIDE: The Why: Proven Performance Gains On Cluster Vs. Workstation The What: Recommended Reference
More informationTrends in High-Performance Computing for Power Grid Applications
Trends in High-Performance Computing for Power Grid Applications Franz Franchetti ECE, Carnegie Mellon University www.spiral.net Co-Founder, SpiralGen www.spiralgen.com This talk presents my personal views
More informationThe Asterope compute cluster
The Asterope compute cluster ÅA has a small cluster named asterope.abo.fi with 8 compute nodes Each node has 2 Intel Xeon X5650 processors (6-core) with a total of 24 GB RAM 2 NVIDIA Tesla M2050 GPGPU
More informationACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU
Computer Science 14 (2) 2013 http://dx.doi.org/10.7494/csci.2013.14.2.243 Marcin Pietroń Pawe l Russek Kazimierz Wiatr ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU Abstract This paper presents
More informationultra fast SOM using CUDA
ultra fast SOM using CUDA SOM (Self-Organizing Map) is one of the most popular artificial neural network algorithms in the unsupervised learning category. Sijo Mathew Preetha Joy Sibi Rajendra Manoj A
More informationProgram Grid and HPC5+ workshop
Program Grid and HPC5+ workshop 24-30, Bahman 1391 Tuesday Wednesday 9.00-9.45 9.45-10.30 Break 11.00-11.45 11.45-12.30 Lunch 14.00-17.00 Workshop Rouhani Karimi MosalmanTabar Karimi G+MMT+K Opening IPM_Grid
More informationUnleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers
Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers Haohuan Fu haohuan@tsinghua.edu.cn High Performance Geo-Computing (HPGC) Group Center for Earth System Science Tsinghua University
More informationEvoluzione dell Infrastruttura di Calcolo e Data Analytics per la ricerca
Evoluzione dell Infrastruttura di Calcolo e Data Analytics per la ricerca Carlo Cavazzoni CINECA Supercomputing Application & Innovation www.cineca.it 21 Aprile 2015 FERMI Name: Fermi Architecture: BlueGene/Q
More informationPCIe Over Cable Provides Greater Performance for Less Cost for High Performance Computing (HPC) Clusters. from One Stop Systems (OSS)
PCIe Over Cable Provides Greater Performance for Less Cost for High Performance Computing (HPC) Clusters from One Stop Systems (OSS) PCIe Over Cable PCIe provides greater performance 8 7 6 5 GBytes/s 4
More informationPyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms
PyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms P. E. Vincent! Department of Aeronautics Imperial College London! 25 th March 2014 Overview Motivation Flux Reconstruction Many-Core
More informationSun Constellation System: The Open Petascale Computing Architecture
CAS2K7 13 September, 2007 Sun Constellation System: The Open Petascale Computing Architecture John Fragalla Senior HPC Technical Specialist Global Systems Practice Sun Microsystems, Inc. 25 Years of Technical
More informationABAQUS High Performance Computing Environment at Nokia
ABAQUS High Performance Computing Environment at Nokia Juha M. Korpela Nokia Corporation Abstract: The new commodity high performance computing (HPC) hardware together with the recent ABAQUS performance
More informationEnabling Technologies for Distributed Computing
Enabling Technologies for Distributed Computing Dr. Sanjay P. Ahuja, Ph.D. Fidelity National Financial Distinguished Professor of CIS School of Computing, UNF Multi-core CPUs and Multithreading Technologies
More informationINDIAN INSTITUTE OF TECHNOLOGY KANPUR Department of Mechanical Engineering
INDIAN INSTITUTE OF TECHNOLOGY KANPUR Department of Mechanical Engineering Enquiry No: Enq/IITK/ME/JB/02 Enquiry Date: 14/12/15 Last Date of Submission: 21/12/15 Formal quotations are invited for HPC cluster.
More informationHow To Compare Amazon Ec2 To A Supercomputer For Scientific Applications
Amazon Cloud Performance Compared David Adams Amazon EC2 performance comparison How does EC2 compare to traditional supercomputer for scientific applications? "Performance Analysis of High Performance
More informationClusters: Mainstream Technology for CAE
Clusters: Mainstream Technology for CAE Alanna Dwyer HPC Division, HP Linux and Clusters Sparked a Revolution in High Performance Computing! Supercomputing performance now affordable and accessible Linux
More informationGPU Computing. The GPU Advantage. To ExaScale and Beyond. The GPU is the Computer
GU Computing 1 2 3 The GU Advantage To ExaScale and Beyond The GU is the Computer The GU Advantage The GU Advantage A Tale of Two Machines Tianhe-1A at NSC Tianjin Tianhe-1A at NSC Tianjin The World s
More informationInterconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003
Interconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003 Josef Pelikán Charles University in Prague, KSVI Department, Josef.Pelikan@mff.cuni.cz Abstract 1 Interconnect quality
More information