A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS

SUDHAKARAN.G, APCF, AERO, VSSC, ISRO
THOMAS.C.BABU, APCF, AERO, VSSC, ISRO
ASHOK.V, ADCG, AERO, VSSC, ISRO

ABSTRACT

In this paper, we describe a GPU-based supercomputer, SAGA (Supercomputer for Aerospace with GPU Architecture), developed at VSSC, and the challenges involved in developing a Computational Fluid Dynamics (CFD) code, PARAS-3D, which runs on SAGA. Together, the GPU facility and PARAS-3D have made it possible to solve CFD problems cost-effectively, with a considerable reduction in solution time. SAGA is used extensively for the aerodynamic design and analysis of launch vehicles.

Categories and Subject Descriptors
[GPU Facility]: NVIDIA Tesla GPUs C2070, M2090; [CFD Application]: PARAS-3D, CUDA, MPI, SIMD Architecture

General Terms
SAGA Supercomputer, NVIDIA GPUs & GPU Architecture, Linux Operating System, Resource Manager and Job Scheduler, PARAS-3D CFD Code, CUDA Programming

Keywords
SAGA, GPU, SIMD, CFD, PARAS-3D, GPGPU, CUDA

1. INTRODUCTION

A GPU-based supercomputer, SAGA (Supercomputer for Aerospace with GPU Architecture), has been developed at VSSC using Intel Xeon processors and NVIDIA GPUs. It has a theoretical peak performance of 448 TFLOPS in double precision (DP). A GPU version of the Linpack benchmark code is being used to evaluate sustained performance, and full benchmarking of the facility is in progress. We will submit the benchmark results for inclusion of SAGA in the top500.org list during the June update. A photograph of SAGA is shown in Figure 1, and a short introduction to SAGA can be found in [1].

A GPU-based CFD code, PARAS-3D, has also been developed at VSSC. PARAS-3D, the major application running on SAGA, is written for GPUs using the CUDA programming model. This paper describes the development of SAGA and the challenges involved in developing a GPU-based application. SAGA, together with the GPU version of PARAS-3D, is used extensively for the aerodynamic design and analysis of launch vehicles.

Section 2 describes the development of the SAGA supercomputer and Section 3 the development of the GPU application PARAS-3D. Section 3.6 presents some of the analyses carried out on SAGA, followed by the conclusion and references.

2. SAGA SUPERCOMPUTER

A GPU-based supercomputer, SAGA, has been developed at VSSC using Intel processors and NVIDIA GPUs. It consists of 736 Intel Xeon processors and 736 NVIDIA Tesla GPUs. Each node is configured with two CPUs and two GPUs, giving a 1:1 CPU-to-GPU ratio. The nodes are diskless machines that use a compressed RAM file system. A pair of brain servers handles network booting of the nodes, job scheduling, and power and resource management. Another five pairs of redundant servers provide the Network File System (NFS). The Linux operating system for SAGA is configured from open source components. The application software PARAS-3D, the job scheduler, and the resource and power manager were developed at VSSC. The cluster and infrastructure design, the electrical and communication networks, and the design of the precision air-conditioning systems were also done in-house. SAGA has a theoretical peak performance of 448 TFLOPS (DP). The following sub-sections give the details of each of these activities.

2.1 Operating System

SAGA uses a Linux operating system configured in-house. The 64-bit Linux operating system (OS) is built using LFS (Linux From Scratch) with support for GPUs and InfiniBand. The servers and front-end systems use this OS with NFS support. A tiny 64-bit compute-node Linux OS has also been developed for the nodes and is stored on the brain server. The nodes are diskless machines, network booted from the brain server with this tiny OS. The Linux kernel is recompiled and updated whenever a stable kernel becomes available.

2.2 Resource manager and job scheduler

An automated resource manager and job scheduler has been developed for SAGA to manage the operation of the entire supercomputer efficiently. The job scheduler queues jobs and executes them when a sufficient number of nodes are available.
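The queue-then-run behaviour described for the job scheduler can be illustrated with a minimal sketch. The paper does not describe the scheduling policy of the in-house scheduler, so the strict first-come, first-served rule below is an assumption, and the names `Job` and `FifoScheduler` are hypothetical, chosen only for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes_needed: int


class FifoScheduler:
    """Hypothetical first-come, first-served scheduler (not SAGA's code)."""

    def __init__(self, total_nodes):
        self.free_nodes = total_nodes
        self.queue = deque()   # jobs waiting for nodes
        self.running = {}      # job name -> nodes held

    def submit(self, job):
        """Queue a job and return the names of jobs started as a result."""
        self.queue.append(job)
        return self._dispatch()

    def finish(self, name):
        """Release a finished job's nodes and start any jobs that now fit."""
        self.free_nodes += self.running.pop(name)
        return self._dispatch()

    def _dispatch(self):
        started = []
        # Strict FIFO (assumed): stop at the first job that does not fit.
        while self.queue and self.queue[0].nodes_needed <= self.free_nodes:
            job = self.queue.popleft()
            self.free_nodes -= job.nodes_needed
            self.running[job.name] = job.nodes_needed
            started.append(job.name)
        return started
```

With four free nodes, a 3-node job starts immediately, a following 2-node job waits in the queue, and it is started as soon as the first job finishes and releases its nodes.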
The resource manager monitors the status of the UPS systems, the room temperature, the state of the nodes and so on. It also switches nodes off when no job is waiting in the queue and switches them on again on demand, when new jobs arrive. In this way, SAGA minimizes electrical power consumption.

2.3 Graphics Processing Units (GPUs)

SAGA uses two types of NVIDIA Tesla GPUs, the C2070 and the M2090. The C2070 is a first-generation double-precision GPU with 448 cores, capable of delivering 515 GFLOPS of double-precision (DP) floating-point performance per GPU. Each M2090 GPU has 512 cores and delivers 665 GFLOPS (DP). SAGA has 436 C2070 GPUs and 300 M2090 GPUs, giving a total GPU performance of 414 TFLOPS, in addition to about 34 TFLOPS from the CPUs. The features of the C2070 and M2090 are summarized in Table-1.

Table-1. Features of C2070 and M2090 GPUs

Brand Name                                    NVIDIA Tesla C2070 (FERMI)       NVIDIA Tesla M2090
No. of Cores                                  448                              512
Built-in Memory                               6 GB                             6 GB
Double Precision Floating Point Performance   515 GigaFLOPS                    665 GigaFLOPS
Single Precision Floating Point Performance   1030 GigaFLOPS                   1331 GigaFLOPS
Power Consumption                             190 W (Typical), 225 W (Max.)    190 W (Typical), 225 W (Max.)

More about NVIDIA GPUs can be found in [2].

2.4 Network and Topology

SAGA has three types of network interconnect: InfiniBand, Gigabit Ethernet and IPMI. A 40 Gbps QDR InfiniBand network is used for inter-process communication between the nodes. It is configured in fully non-blocking mode using 44 switches, each with 36 QDR ports: 28 switches are connected in layer 1 and 16 switches in layer 2. Gigabit Ethernet is used for network booting of the nodes and for the user interface. The third network, IPMI, is a 10 Mbps network used for platform management, such as resetting the nodes and interfacing with the hardware.
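The switch counts quoted for the InfiniBand fabric can be cross-checked with a short port-budget calculation. This is a sketch under stated assumptions: the text gives the number of switches, ports and layers, but not how each layer-1 switch divides its ports, so the even split between node links and uplinks (the usual arrangement for a non-blocking two-level fabric) is assumed here. The function name `port_budget` is hypothetical.

```python
# Port-budget check for a two-level InfiniBand fabric.
# Switch, port and GPU counts come from the text; the even split of
# layer-1 ports between nodes and uplinks is an assumption.

PORTS_PER_SWITCH = 36
LAYER1_SWITCHES = 28
LAYER2_SWITCHES = 16
TOTAL_GPUS = 736
GPUS_PER_NODE = 2


def port_budget():
    compute_nodes = TOTAL_GPUS // GPUS_PER_NODE
    downlinks_per_leaf = PORTS_PER_SWITCH // 2            # assumed 1:1 split
    uplinks_per_leaf = PORTS_PER_SWITCH - downlinks_per_leaf
    host_ports = LAYER1_SWITCHES * downlinks_per_leaf     # ports facing nodes
    uplinks = LAYER1_SWITCHES * uplinks_per_leaf          # links to layer 2
    spine_ports = LAYER2_SWITCHES * PORTS_PER_SWITCH
    return {
        "compute_nodes": compute_nodes,
        "host_ports": host_ports,
        "uplinks": uplinks,
        "spine_ports": spine_ports,
        # Non-blocking needs as much uplink as downlink capacity, and
        # enough layer-2 ports to terminate every uplink.
        "non_blocking": uplinks >= host_ports and spine_ports >= uplinks,
        "nodes_fit": compute_nodes <= host_ports,
    }


if __name__ == "__main__":
    for key, value in port_budget().items():
        print(f"{key}: {value}")
```

Under these assumptions the 28 layer-1 switches offer 504 node-facing ports for 368 compute nodes, and the 16 layer-2 switches provide 576 ports for 504 uplinks, which is consistent with the non-blocking configuration described.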
The SAGA network layout is shown in Figure 2.

2.5 Storage and Brain Servers

SAGA has a pair of brain servers used for network booting of the nodes and for system management. The servers are configured with DRBD and Heartbeat for fail-safe operation. Queuing and scheduling of jobs, and node, power and resource management, are handled by these servers. The brain servers also monitor the status of the UPS, the room temperature and so on, and switch off the facility in the event of a prolonged power failure or a failure of the air-conditioning system.

SAGA also has five pairs of NFS servers, likewise configured with DRBD and Heartbeat for fail-safe operation. The NFS servers provide the storage file system for all users, including system file storage. Under normal operation the primary servers provide the file system. If a primary server fails, the corresponding secondary server becomes primary and continues the service without any manual intervention. If a secondary server fails, the primary continues the service. A failed server can be repaired and reconnected, after which it resumes its function automatically.

2.6 Linpack Benchmarking

A GPU version of HPL is used for benchmarking our machines. We obtained an efficiency of 58 % on a machine with two C2070 GPUs and two quad-core Xeon processors. As described in later sections, the Linpack code should be tuned according to the CUDA programming guidelines to obtain maximum performance. We are working on this and expect that full benchmarking of SAGA can be completed by the middle of May. We are trying to include SAGA in Top500.org [3] during the website update in May-June.

3. APPLICATION SOFTWARE - PARAS-3D

A Cartesian-grid-based CFD code, PARAS-3D, has been developed for SAGA. The code is written for GPUs using the CUDA programming model provided by NVIDIA. PARAS-3D has about 2.5 lakh (250,000) lines of C code and is one of the most complex applications running on GPUs. The code is used extensively by ISRO and other aerospace organizations in the country. Its advantages include fully automatic grid generation, the ability to handle complex geometries, an interface for CAD geometries and adaptive grid refinement. The opening window of PARAS-3D is shown in Figure 3. More about PARAS can be found in [4].

The GPU version of PARAS-3D was developed from its parallel multithreaded MPI version. Parallelisation of the Navier-Stokes code on a cluster of machines [5] started at VSSC in the 1990s using PC clusters. Subsequently we moved to a cluster of DEC Alpha machines and then to AMD clusters. The present GPU facility is based on Xeon processors and NVIDIA GPUs, with PARAS ported to a hybrid CPU-GPU computing environment.

3.1 CUDA Development Tools [6]

NVIDIA provides a programming environment known as CUDA, which is specific to its GPUs. OpenCL could also be used, but we prefer CUDA since our GPUs are manufactured by NVIDIA. CUDA makes it possible to use high-level languages such as C to develop applications that take advantage of the performance and scalability that the GPU architecture offers. PARAS-3D is written using the CUDA programming tools, which are available on the NVIDIA website [6]. A number of CUDA programming guides, such as CUDA Getting Started Linux, the NVIDIA CUDA C Programming Guide [7], the CUDA C Best Practices Guide and CUDA for Developers [8], are also available there. CUDA manuals and binaries can likewise be downloaded from the NVIDIA website [9].
3.2 Challenges in developing GPU application

This section discusses the challenges involved in developing a good GPU application. GPUs inherit their architecture from traditional graphics processors, which are SIMD (Single Instruction Multiple Data) processors employing data parallelism. Accordingly, to extract good performance from GPUs, the algorithm must be designed in a data-parallel fashion. The underlying application must be rewritten to exploit this data-parallel behavior of GPUs, while the serial portions of the code are assigned to the CPUs.

Memory management was found to be very important for getting good performance from GPUs. Copying between CPU memory and GPU memory must be optimized. GPUs have only a limited set of registers and a limited amount of cache, and the application should be written to work within these limits. In particular, the number of local variables should be kept small enough to stay within the cache, to the extent possible.

The programmer should follow the CUDA guidelines to obtain maximum performance from GPUs. Threads should be run in groups of 32 or more for best performance. Each processing unit on the GPU contains local memory that improves data manipulation and reduces fetch time.

PARAS-3D, with its adaptive Cartesian grids and oct-tree data structure, is not inherently data parallel. With suitable algorithms, it was made more data parallel by arranging cells into different groups having varying levels of data parallelism. In this process the most complex set of computations is assigned to the CPUs. The update operations between CPU and GPU were programmed as a single update, to minimize copying between CPU and GPU memory. Other aspects of GPU programming include the removal of recursion and function calls from the code, as CUDA device code offers limited or no support for these features, depending on the GPU generation.
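The removal of recursion mentioned above can be illustrated on a tree traversal. The sketch below is not PARAS-3D code: the `Cell` class is a hypothetical stand-in for an oct-tree cell, and Python is used only for brevity. It shows the general technique of replacing a recursive walk with an explicit stack, and of flattening the leaf cells into a contiguous list, which is the kind of layout a data-parallel kernel can process one element per thread.

```python
class Cell:
    """Hypothetical oct-tree cell; a cell with no children is a leaf."""

    def __init__(self, value=0.0, children=None):
        self.value = value
        self.children = children or []


def leaves_recursive(cell):
    """Recursive traversal, shown for comparison only."""
    if not cell.children:
        return [cell.value]
    out = []
    for child in cell.children:
        out.extend(leaves_recursive(child))
    return out


def leaves_iterative(root):
    """Same traversal with an explicit stack instead of recursion."""
    out = []
    stack = [root]
    while stack:
        cell = stack.pop()
        if cell.children:
            # Push in reverse so children are visited left to right.
            stack.extend(reversed(cell.children))
        else:
            out.append(cell.value)
    return out


def example_tree():
    """Small tree: two leaves and one refined cell with two leaves."""
    refined = Cell(children=[Cell(2.0), Cell(3.0)])
    return Cell(children=[Cell(1.0), refined, Cell(4.0)])
```

Both functions return the leaf values in the same left-to-right order, so the iterative version can stand in for the recursive one wherever recursion is unavailable or costly.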
At the start of execution, a compiled CUDA application runs like any other application, with its primary execution on the CPU. When a kernel call is made, the application continues executing non-kernel functions on the CPU while the kernel function executes on the GPU. In this manner we obtain parallel processing between the CPU and the GPU.

3.3 GPU version of PARAS-3D

PARAS-3D was written using the CUDA programming tools and the guidelines given in Section 3.2. A single code is used for both CPUs and GPUs; it identifies the CPUs and GPUs in a machine and the number of cores in each. Users need not know the number of CPUs and GPUs in the machines on which they run PARAS, since PARAS detects the CPU and GPU cores and configures itself automatically. At present users are free to choose the number of machines on which to run their application, based on the number of grids and on previous run history.

3.4 Performance improvements

With the above modifications, PARAS gives very good performance on GPU systems. The software uses three technologies for high performance computing: distributed computing, shared memory computing and GPU accelerators. The speed-up obtained for CFD problems on a single GPU node with 2 quad-core Xeon processors and 2 GPUs is 4.5 to 7 times, compared to a single CPU node with 2 quad-core Xeon processors. The speed-up depends on the complexity of the geometry, the level of grid adaptation and the size of the problem under consideration. Figure 4 shows the speed-up obtained for PARAS when 46, 36 and 63 million cells are used. With suitable tuning of the parameters, we could achieve up to 90 % efficiency for PARAS when 40 nodes are used. In general, about 1 million cells per GPU gives very good performance (> 90 %) up to 40 nodes.

3.5 Potential Users

PARAS-3D is used extensively by scientists and engineers of ISRO and of other government organizations: DRDL, ADA and ADE.
We have also distributed PARAS to some research institutions in the country, such as IISc and IIST. At present the PARAS license is limited to government organizations.

3.6 Real-World Applications

PARAS-3D is used extensively for the aerodynamic design and analysis of launch vehicles at ISRO and for aircraft design at ADA. PARAS has its own pre-processor to generate the geometry and grids of the problem under consideration. It can also import geometry generated by other CAD software. A typical CFD problem for a launch vehicle with 93 million grids is shown in Figure 5. PARAS can use fine grids near the body under consideration and coarse grids away from it, as shown in the figure. Figure 6 shows the pressure distribution obtained using PARAS after 50,000 iterations. Some of the published CFD simulations carried out using PARAS-3D can be found in [10, 11, 12]. Publications in national journals, seminars and workshops are not listed in this paper.

4. CONCLUSION

In this paper we have described a GPU-based supercomputer, SAGA, developed by VSSC. The cluster and infrastructure design was carried out at VSSC. The operating system, job scheduler, and automated resource and power manager were also developed in-house, as was the GPU-based application PARAS-3D. The challenges involved in developing a GPU-based application have been discussed. With suitable algorithms and tuning of parameters, PARAS gave up to 90 % efficiency on 40 nodes. In general, about 1 million cells per GPU gives very good performance (> 90 %) up to 40 nodes.

5. ACKNOWLEDGMENTS

We would like to express our sincere thanks to the Chairman, ISRO, for providing the necessary approval and funds for building a GPU-based supercomputing facility at VSSC. We also express our sincere thanks to the Director, VSSC, for the technical and logistic support for building the facility. We would like to thank DD, AERO, who took the initiative for establishing a GPU facility at VSSC and provided the necessary support for building it. We extend our thanks to the construction and maintenance wing of VSSC, which supported the design and establishment of the facility. We would like to thank all our engineers, who helped in developing the facility and the PARAS-3D CFD code. We have used a good number of open source software components to build the SAGA supercomputer and the GPU version of PARAS-3D, and the developers of the open source community are gratefully acknowledged. Finally, we would like to thank our valued users, without whom the need for the facility and the PARAS code would not have arisen at all.

Figure 1. SAGA Supercomputer
Figure 2. SAGA Network Layout
Figure 3. Opening Window of PARAS-3D
Figure 4. Speed-up obtained for PARAS

Figure 5. Geometry and Grids for a Typical Problem
Figure 6. Pressure Contours for a Typical Problem solved using PARAS-3D

6. REFERENCES

[1] The SAGA Supercomputer, Technology Review India, Vol. 3, No. 6, June.
[2] Tesla Product Literature.
[3] 500.org.
[4] PARAS-3D Users Manual Ver. 4.1.0, VSSC/ARD/GN/01/2011, December.
[5] Parallelisation of Navier-Stokes code on a cluster of workstations, Ashok.V and Thomas C Babu, Lecture Notes in Computer Science 1745, Springer Verlag edition.
[6] CUDA toolkit 4.1.
[7] NVIDIA CUDA C Programming Guide, Ver. 4.1, Nov 2011.
[8] CUDA for Developers, NVIDIA.
[9] Download CUDA manual and binaries.
[10] Effect of connected pipe test conditions on scramjet engine modules for flight testing, Gnanasekhar.S, Dipankar Das, Ashok.V and Lazar T Chitilappilly, Int. J. of Aerospace Innovations, Vol. 1, No. 4, Dec.
[11] CFD Simulation of Flow field over a large protuberance on a flat plate at high supersonic Mach number, K.Manokaran, G.Vidya and V.K.Goyal, AIAA.
[12] Numerical Simulation of single and twin jet impingement on a typical jet deflector, Navin Kumar Kessop, Dipankar Das, K. J. Devasia, P. Jeyajothiraj, 11th Asian Symposium on Visualization, Niigata, Japan, 2011.

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

GPU System Architecture. Alan Gray EPCC The University of Edinburgh GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems

More information

Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing

Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing Innovation Intelligence Devin Jensen August 2012 Altair Knows HPC Altair is the only company that: makes HPC tools

More information

Accelerating CFD using OpenFOAM with GPUs

Accelerating CFD using OpenFOAM with GPUs Accelerating CFD using OpenFOAM with GPUs Authors: Saeed Iqbal and Kevin Tubbs The OpenFOAM CFD Toolbox is a free, open source CFD software package produced by OpenCFD Ltd. Its user base represents a wide

More information

Turbomachinery CFD on many-core platforms experiences and strategies

Turbomachinery CFD on many-core platforms experiences and strategies Turbomachinery CFD on many-core platforms experiences and strategies Graham Pullan Whittle Laboratory, Department of Engineering, University of Cambridge MUSAF Colloquium, CERFACS, Toulouse September 27-29

More information

High Performance Computing in CST STUDIO SUITE

High Performance Computing in CST STUDIO SUITE High Performance Computing in CST STUDIO SUITE Felix Wolfheimer GPU Computing Performance Speedup 18 16 14 12 10 8 6 4 2 0 Promo offer for EUC participants: 25% discount for K40 cards Speedup of Solver

More information

Cluster Computing at HRI

Cluster Computing at HRI Cluster Computing at HRI J.S.Bagla Harish-Chandra Research Institute, Chhatnag Road, Jhunsi, Allahabad 211019. E-mail: jasjeet@mri.ernet.in 1 Introduction and some local history High performance computing

More information

Optimizing GPU-based application performance for the HP for the HP ProLiant SL390s G7 server

Optimizing GPU-based application performance for the HP for the HP ProLiant SL390s G7 server Optimizing GPU-based application performance for the HP for the HP ProLiant SL390s G7 server Technology brief Introduction... 2 GPU-based computing... 2 ProLiant SL390s GPU-enabled architecture... 2 Optimizing

More information

Introduction to GPU hardware and to CUDA

Introduction to GPU hardware and to CUDA Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 37 Course outline Introduction to GPU hardware

More information

Building a Top500-class Supercomputing Cluster at LNS-BUAP

Building a Top500-class Supercomputing Cluster at LNS-BUAP Building a Top500-class Supercomputing Cluster at LNS-BUAP Dr. José Luis Ricardo Chávez Dr. Humberto Salazar Ibargüen Dr. Enrique Varela Carlos Laboratorio Nacional de Supercómputo Benemérita Universidad

More information

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it Informa(on & Communica(on Technology Sec(on (ICTS) Interna(onal Centre for Theore(cal Physics (ICTP) Mul(ple Socket

More information

RWTH GPU Cluster. Sandra Wienke wienke@rz.rwth-aachen.de November 2012. Rechen- und Kommunikationszentrum (RZ) Fotos: Christian Iwainsky

RWTH GPU Cluster. Sandra Wienke wienke@rz.rwth-aachen.de November 2012. Rechen- und Kommunikationszentrum (RZ) Fotos: Christian Iwainsky RWTH GPU Cluster Fotos: Christian Iwainsky Sandra Wienke wienke@rz.rwth-aachen.de November 2012 Rechen- und Kommunikationszentrum (RZ) The RWTH GPU Cluster GPU Cluster: 57 Nvidia Quadro 6000 (Fermi) innovative

More information

Evaluation of CUDA Fortran for the CFD code Strukti

Evaluation of CUDA Fortran for the CFD code Strukti Evaluation of CUDA Fortran for the CFD code Strukti Practical term report from Stephan Soller High performance computing center Stuttgart 1 Stuttgart Media University 2 High performance computing center

More information

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011 Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis

More information

1 DCSC/AU: HUGE. DeIC Sekretariat 2013-03-12/RB. Bilag 1. DeIC (DCSC) Scientific Computing Installations

1 DCSC/AU: HUGE. DeIC Sekretariat 2013-03-12/RB. Bilag 1. DeIC (DCSC) Scientific Computing Installations Bilag 1 2013-03-12/RB DeIC (DCSC) Scientific Computing Installations DeIC, previously DCSC, currently has a number of scientific computing installations, distributed at five regional operating centres.

More information

Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer

Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer Stan Posey, MSc and Bill Loewe, PhD Panasas Inc., Fremont, CA, USA Paul Calleja, PhD University of Cambridge,

More information

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip. Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide

More information

A-CLASS The rack-level supercomputer platform with hot-water cooling

A-CLASS The rack-level supercomputer platform with hot-water cooling A-CLASS The rack-level supercomputer platform with hot-water cooling INTRODUCTORY PRESENTATION JUNE 2014 Rev 1 ENG COMPUTE PRODUCT SEGMENTATION 3 rd party board T-MINI P (PRODUCTION): Minicluster/WS systems

More information

Accelerating CST MWS Performance with GPU and MPI Computing. CST workshop series

Accelerating CST MWS Performance with GPU and MPI Computing.  CST workshop series Accelerating CST MWS Performance with GPU and MPI Computing www.cst.com CST workshop series 2010 1 Hardware Based Acceleration Techniques - Overview - Multithreading GPU Computing Distributed Computing

More information

Overview of HPC systems and software available within

Overview of HPC systems and software available within Overview of HPC systems and software available within Overview Available HPC Systems Ba Cy-Tera Available Visualization Facilities Software Environments HPC System at Bibliotheca Alexandrina SUN cluster

More information

Shattering the 1U Server Performance Record. Figure 1: Supermicro Product and Market Opportunity Growth

Shattering the 1U Server Performance Record. Figure 1: Supermicro Product and Market Opportunity Growth Shattering the 1U Server Performance Record Supermicro and NVIDIA recently announced a new class of servers that combines massively parallel GPUs with multi-core CPUs in a single server system. This unique

More information

Parallel Computing with MATLAB

Parallel Computing with MATLAB Parallel Computing with MATLAB Scott Benway Senior Account Manager Jiro Doke, Ph.D. Senior Application Engineer 2013 The MathWorks, Inc. 1 Acceleration Strategies Applied in MATLAB Approach Options Best

More information

HPC Wales Skills Academy Course Catalogue 2015

HPC Wales Skills Academy Course Catalogue 2015 HPC Wales Skills Academy Course Catalogue 2015 Overview The HPC Wales Skills Academy provides a variety of courses and workshops aimed at building skills in High Performance Computing (HPC). Our courses

More information

Parallel Programming Survey

Parallel Programming Survey Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory

More information

Retargeting PLAPACK to Clusters with Hardware Accelerators

Retargeting PLAPACK to Clusters with Hardware Accelerators Retargeting PLAPACK to Clusters with Hardware Accelerators Manuel Fogué 1 Francisco Igual 1 Enrique S. Quintana-Ortí 1 Robert van de Geijn 2 1 Departamento de Ingeniería y Ciencia de los Computadores.

More information

HP ProLiant SL270s Gen8 Server. Evaluation Report

HP ProLiant SL270s Gen8 Server. Evaluation Report HP ProLiant SL270s Gen8 Server Evaluation Report Thomas Schoenemeyer, Hussein Harake and Daniel Peter Swiss National Supercomputing Centre (CSCS), Lugano Institute of Geophysics, ETH Zürich schoenemeyer@cscs.ch

More information

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/2015. 2015 CAE Associates

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/2015. 2015 CAE Associates High Performance Computing (HPC) CAEA elearning Series Jonathan G. Dudley, Ph.D. 06/09/2015 2015 CAE Associates Agenda Introduction HPC Background Why HPC SMP vs. DMP Licensing HPC Terminology Types of

More information

Case Study on Productivity and Performance of GPGPUs

Case Study on Productivity and Performance of GPGPUs Case Study on Productivity and Performance of GPGPUs Sandra Wienke wienke@rz.rwth-aachen.de ZKI Arbeitskreis Supercomputing April 2012 Rechen- und Kommunikationszentrum (RZ) RWTH GPU-Cluster 56 Nvidia

More information

GPGPU accelerated Computational Fluid Dynamics

GPGPU accelerated Computational Fluid Dynamics t e c h n i s c h e u n i v e r s i t ä t b r a u n s c h w e i g Carl-Friedrich Gauß Faculty GPGPU accelerated Computational Fluid Dynamics 5th GACM Colloquium on Computational Mechanics Hamburg Institute

More information

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 Introduction to GP-GPUs Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 GPU Architectures: How do we reach here? NVIDIA Fermi, 512 Processing Elements (PEs) 2 What Can It Do?

More information

Performance Characteristics of a Cost-Effective Medium-Sized Beowulf Cluster Supercomputer

Performance Characteristics of a Cost-Effective Medium-Sized Beowulf Cluster Supercomputer Res. Lett. Inf. Math. Sci., 2003, Vol.5, pp 1-10 Available online at http://iims.massey.ac.nz/research/letters/ 1 Performance Characteristics of a Cost-Effective Medium-Sized Beowulf Cluster Supercomputer

More information

A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures

A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures 11 th International LS-DYNA Users Conference Computing Technology A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures Yih-Yih Lin Hewlett-Packard Company Abstract In this paper, the

More information

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK Steve Oberlin CTO, Accelerated Computing US to Build Two Flagship Supercomputers SUMMIT SIERRA Partnership for Science 100-300 PFLOPS Peak Performance

More information

Parallel Computing. Introduction

Parallel Computing. Introduction Parallel Computing Introduction Thorsten Grahs, 14. April 2014 Administration Lecturer Dr. Thorsten Grahs (that s me) t.grahs@tu-bs.de Institute of Scientific Computing Room RZ 120 Lecture Monday 11:30-13:00

More information

GPU Parallel Computing Architecture and CUDA Programming Model

GPU Parallel Computing Architecture and CUDA Programming Model GPU Parallel Computing Architecture and CUDA Programming Model John Nickolls Outline Why GPU Computing? GPU Computing Architecture Multithreading and Arrays Data Parallel Problem Decomposition Parallel

More information

Resource Scheduling Best Practice in Hybrid Clusters
