HPC and Parallel efficiency


1 HPC and Parallel efficiency. Martin Hilgeman, EMEA product technologist, HPC.

2 What is parallel scaling?

3 Parallel scaling. Parallel scaling is the reduction in application execution time when more than one core is used. [Table: wall clock time (seconds) and speedup factor against the number of cores.] 7x faster on 8 cores does not seem to be that bad, but...

4 Amdahl's Law

5 Amdahl's Law. Gene Amdahl (1967): "Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities", AFIPS Conference Proceedings (30): "The effort expended on achieving high parallel processing rates is wasted unless it is accompanied by achievements in sequential processing rates of very nearly the same magnitude." Speedup: a = n / (p + n(1 - p)), where a is the speedup, n the number of processors, and p the parallel fraction.
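As an illustration (not taken from the slides), a minimal C sketch of this formula; the 80% parallel fraction and 8 cores are example values matching the model on the next slide:

#include <stdio.h>

/* Amdahl's Law: speedup a on n processors for parallel fraction p. */
static double amdahl_speedup(double p, int n)
{
    return (double)n / (p + (double)n * (1.0 - p));
}

int main(void)
{
    double p = 0.80;   /* example: an 80% parallel code */
    int n = 8;         /* example: run on 8 cores       */
    printf("p = %.2f, n = %d -> speedup = %.2fx\n", p, n, amdahl_speedup(p, n));
    /* Prints roughly 3.33x: the serial 20% dominates the run time. */
    return 0;
}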

6 Amdahl's Law: model of an 80% parallel application. Parallel scaling is hindered by the domination of the serial parts in the application. [Chart: wall clock time, split into parallel and serial portions, against the number of processors.]

7 Amdahl's Law limits the maximal speedup. With an infinite number of processors the speedup tends to a = 1 / (1 - p), where a is the speedup and p the parallel fraction. [Chart: parallel speedup against the Amdahl's Law percentage (97.0%, 99.0%, 99.5%, ...).]

8 Amdahl's Law curves: a = n / (p + n(1 - p)), where a is the speedup, n the number of processors, and p the parallel fraction. Multiprocessing can exceed 99%; most parallel codes are between 95% and 99%. [Chart: parallel speedup against the number of processor cores for parallel fractions of 95.0%, 97.0%, 99.0%, 99.5%, 99.9%, and 100.0%.]

9 Amdahl's Law and Efficiency. Diminishing returns: parallel efficiency is e = a / n = 1 / (p + n(1 - p)), so there is a tension between the desire to use more processors and the associated cost. [Chart/table: Amdahl's Law percentage (86%-100%) against the number of processor cores for several fixed efficiency levels.]
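A small C sketch (illustrative only) that evaluates both formulas together with the infinite-processor limit a = 1/(1 - p) from slide 7, using the 95% and 99% fractions the previous slide calls typical:

#include <stdio.h>

/* Speedup and efficiency under Amdahl's Law for parallel fraction p. */
static double speedup(double p, int n)    { return (double)n / (p + (double)n * (1.0 - p)); }
static double efficiency(double p, int n) { return speedup(p, n) / (double)n; }

int main(void)
{
    /* Example parallel fractions only; the slides quote 95%-99% as typical. */
    const double fractions[] = { 0.95, 0.99 };
    for (int f = 0; f < 2; f++) {
        double p = fractions[f];
        printf("p = %.0f%%, asymptotic speedup limit = %.0fx\n", p * 100.0, 1.0 / (1.0 - p));
        for (int n = 8; n <= 512; n *= 2)
            printf("  n = %3d: speedup = %6.1fx, efficiency = %5.1f%%\n",
                   n, speedup(p, n), 100.0 * efficiency(p, n));
    }
    return 0;
}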

10 Best practices for efficient computing

11 Taking care of process placement

12 Beware of memory topology. In the SMP days, the mapping of processors to memory was straightforward. [Diagram: processors attached to a single memory controller and I/O hub.]

13 NUMA for Intel Xeon 56xx. Intel Nehalem and AMD Opteron have their memory controller on the processor, so memory access becomes non-uniform (NUMA). [Diagram: Intel Westmere-EP sockets connected by QPI links to each other and to the I/O hub.]

14 NUMA for Intel Xeon E5. Intel Sandy Bridge also has the PCIe controller on the chip, so PCI device access becomes non-uniform as well (NUPA?). [Diagram: Intel Sandy Bridge EN/EP sockets with a QPI link between them and an I/O hub attached to each socket.]

15 NUMA for Intel Xeon E7 processors. 4-socket Intel systems (PowerEdge R810, R910, M910) are connected with QPI links. [Diagram: four Intel Westmere-EX sockets interconnected with QPI links.]

16 NUMA for AMD. 4-socket AMD systems (PowerEdge R815, M915, C6145) are connected with HyperTransport 3 links. [Diagram: four AMD Bulldozer sockets interconnected with HT3 links.]

17 Use tools for memory placement. Programming tools: sched_setaffinity() sets the process CPU affinity mask. Operating system tools: numactl (control NUMA policy for processes or shared memory) and taskset (set a process's CPU affinity). Most MPI libraries ship with placement tools: Intel MPI (I_MPI_PIN_* environment variables), Open MPI (--bind-to-core, --bind-to-socket), MVAPICH2 (MV2_ENABLE_AFFINITY environment variable). Some set affinity by default and do the right thing, but be careful.
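A minimal sketch of the programming-tool route, assuming a Linux system with glibc; it pins the calling process to one example core with sched_setaffinity() and is not the dell_affinity tool described later:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    /* Example: pin the calling process to core 2 (roughly what
       "taskset -c 2" or "numactl --physcpubind=2" do from the shell). */
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(2, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pinned to core 2\n");
    return 0;
}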

18 Case study: gysela5d plasma physics application. Claimed 92% efficiency on 8,192 cores; uses hybrid MPI/OpenMP parallelism. Run on 64 cores of a Dell PowerEdge R815 with AMD processors, 8 MPI ranks x 8 OpenMP threads, MVAPICH2 1.6.

19 PowerEdge R815 CPU core layout. The PowerEdge R815 has a non-standard dual-plane core mapping, which makes the default placement fail.

20 No placement: MPI ranks and OpenMP threads are scattered across the sockets, and sleeping processes (idle OpenMP threads) can start to roam. Wall clock time: 1296 seconds.

21 Naïve (default) placement: starting from core 0 and counting onwards, MPI ranks on every 8th core, OpenMP threads in between. Wall clock time: 249 seconds.

22 Optimal placement: use a placement script which calculates the right mapping for numactl. Wall clock time: 191 seconds.

23 PowerEdge R910 CPU core layout PowerEdge R910 has the same non-linear core mapping as the R815!

24 No placement: MPI ranks and OpenMP threads are scattered across the sockets, and sleeping processes (idle OpenMP threads) can start to roam. Wall clock time: 152 seconds.

25 Naïve (default) placement: starting from core 0 and counting onwards, MPI ranks on every 8th core, OpenMP threads in between. Wall clock time: 150 seconds.

26 Optimal placement: use a placement script which calculates the right mapping for numactl. Wall clock time: 131 seconds.

27 Placement program: written in C99 (~2,000 lines of code). Works on all major distributions (RHEL 5.x and open variants, SLES 11 SP2). Supports all major MPI libraries: MVAPICH, MVAPICH2, Open MPI, Platform MPI/HP MPI, Intel MPI. Tested with >5,000-core runs. Supports hybrid MPI/OpenMP runs too!

28 Placement program: supports all machines and processor vendor models (Intel Nehalem-EP, Nehalem-EX, Westmere-EP, Westmere-EX, Sandy Bridge-EP; AMD Magny-Cours, AMD Interlagos). Knows which cores are sharing L3 caches and understands the AMD Bulldozer module concept, to make maximal use of the available resources where possible.
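The slides do not show how the tool detects the topology; a rough sketch of one way to do it on Linux is to read the per-CPU sysfs topology files (socket ids below; L3 sharing can be read from cache/index3/shared_cpu_list in the same way):

#include <stdio.h>

/* Walk the logical CPUs and print which socket each one belongs to. */
int main(void)
{
    for (int cpu = 0; ; cpu++) {
        char path[128];
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/topology/physical_package_id", cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            break;              /* no more logical CPUs */
        int socket = -1;
        if (fscanf(f, "%d", &socket) == 1)
            printf("cpu %d -> socket %d\n", cpu, socket);
        fclose(f);
    }
    return 0;
}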

29 Works only on Dell systems! Try to run on a whitebox system:
$ mpirun -np 1 ./dell_affinity.exe hello_world.exe
This is not a Dell system. Exiting.

30 Program examples: usage.
open64_acml]$ mpirun -np 1 dell_affinity.exe -h
Invalid option: -h
Usage: dell_affinity.exe -n <# local MPI ranks> -t <# OpenMP threads per rank>
TACC single node, 6 cores, MPI:
login2$ mpirun -np 6 dell_affinity.exe ./mpi.exe
dell_affinity.exe: Using MVAPICH2.
dell_affinity.exe: PPN: 6 OMP_NUM_THREADS: 1.
dell_affinity.exe: Intel Westmere processor detected.
dell_affinity.exe: Placing MPI rank 0 on host login2 local rank 0 cpulist 1 memlist 0
dell_affinity.exe: Placing MPI rank 1 on host login2 local rank 1 cpulist 3 memlist 0
dell_affinity.exe: Placing MPI rank 2 on host login2 local rank 2 cpulist 5 memlist 0
dell_affinity.exe: Placing MPI rank 3 on host login2 local rank 3 cpulist 7 memlist 0
dell_affinity.exe: Placing MPI rank 4 on host login2 local rank 4 cpulist 9 memlist 0
dell_affinity.exe: Placing MPI rank 5 on host login2 local rank 5 cpulist 11 memlist 0

31 Program examples: TACC single node, 2 MPI ranks, 6 OpenMP threads per rank:
login1$ mpirun -np 4 ./dell_affinity.exe -n 2 -t 6 -v /bin/true
./dell_affinity.exe: Using Open MPI.
./dell_affinity.exe: PPN = 2
./dell_affinity.exe: OMP_NUM_THREADS = 6
./dell_affinity.exe: Intel Westmere EP processor detected.
./dell_affinity.exe: node 1: cpulist:
./dell_affinity.exe: node 0: cpulist:
./dell_affinity.exe: Placing MPI rank 0 on host login1.ls4.tacc.utexas.edu local rank 0 cpulist 0,2,4,6,8,10 memlist 0
./dell_affinity.exe: Placing MPI rank 3 on host login1.ls4.tacc.utexas.edu local rank 1 cpulist 1,3,5,7,9,11 memlist 1
./dell_affinity.exe: Placing MPI rank 1 on host login1.ls4.tacc.utexas.edu local rank 1 cpulist 1,3,5,7,9,11 memlist 1
./dell_affinity.exe: Placing MPI rank 2 on host login1.ls4.tacc.utexas.edu local rank 0 cpulist 0,2,4,6,8,10 memlist 0

32 Program examples: TACC, two nodes, 4 MPI ranks per node, 3 OpenMP threads per rank:
login2$ mpirun_rsh -ssh -hostfile ./hosts -np 8 dell_affinity.exe -n 4 -t 3 ./mpi.exe
dell_affinity.exe: Using MVAPICH2.
dell_affinity.exe: PPN: 4 OMP_NUM_THREADS: 3.
dell_affinity.exe: Intel Westmere processor detected.
dell_affinity.exe: Placing MPI rank 0 on host login1 local rank 0 cpulist 1,3,5 memlist 0
dell_affinity.exe: Placing MPI rank 1 on host login1 local rank 0 cpulist 7,9,11 memlist 0
dell_affinity.exe: Placing MPI rank 2 on host login1 local rank 2 cpulist 0,2,4 memlist 1
dell_affinity.exe: Placing MPI rank 3 on host login1 local rank 3 cpulist 6,8,10 memlist 1
dell_affinity.exe: Placing MPI rank 4 on host login2 local rank 0 cpulist 1,3,5 memlist 0
dell_affinity.exe: Placing MPI rank 5 on host login2 local rank 1 cpulist 7,9,11 memlist 0
dell_affinity.exe: Placing MPI rank 6 on host login2 local rank 2 cpulist 0,2,4 memlist 1
dell_affinity.exe: Placing MPI rank 7 on host login2 local rank 3 cpulist 6,8,10 memlist 1

33 Program examples: Cambridge. Single C6145 with AMD Interlagos, 16 MPI ranks:
open64_acml]$ mpirun -np 16 dell_affinity.exe ~/martinh/bin/mpi.exe
dell_affinity.exe: Using MVAPICH2.
dell_affinity.exe: PPN: 16 OMP_NUM_THREADS: 1.
dell_affinity.exe: AMD Interlagos processor detected.
dell_affinity.exe: Placing OMP threads on separate modules.
dell_affinity.exe: Placing MPI rank 0 on host bench local rank 0 cpulist 0 memlist 0
dell_affinity.exe: Placing MPI rank 1 on host bench local rank 1 cpulist 4 memlist 0
dell_affinity.exe: Placing MPI rank 2 on host bench local rank 2 cpulist 8 memlist 1
dell_affinity.exe: Placing MPI rank 3 on host bench local rank 3 cpulist 12 memlist 1
dell_affinity.exe: Placing MPI rank 4 on host bench local rank 4 cpulist 16 memlist 2
dell_affinity.exe: Placing MPI rank 5 on host bench local rank 5 cpulist 20 memlist 2
dell_affinity.exe: Placing MPI rank 6 on host bench local rank 6 cpulist 24 memlist 3
dell_affinity.exe: Placing MPI rank 7 on host bench local rank 7 cpulist 28 memlist 3
dell_affinity.exe: Placing MPI rank 8 on host bench local rank 8 cpulist 32 memlist 4
dell_affinity.exe: Placing MPI rank 9 on host bench local rank 9 cpulist 36 memlist 4
dell_affinity.exe: Placing MPI rank 10 on host bench local rank 10 cpulist 40 memlist 5
dell_affinity.exe: Placing MPI rank 11 on host bench local rank 11 cpulist 44 memlist 5
dell_affinity.exe: Placing MPI rank 12 on host bench local rank 12 cpulist 48 memlist 6
dell_affinity.exe: Placing MPI rank 13 on host bench local rank 13 cpulist 52 memlist 6
dell_affinity.exe: Placing MPI rank 14 on host bench local rank 14 cpulist 56 memlist 7
dell_affinity.exe: Placing MPI rank 15 on host bench local rank 15 cpulist 60 memlist 7

34 Program examples: Cambridge. Single C6145 with AMD Interlagos, 8 MPI ranks, 8 OpenMP threads per rank:
[dell-guest@bench open64_acml]$ mpirun -np 8 dell_affinity.exe -t 8 ~/martinh/bin/mpi.exe
dell_affinity.exe: Using MVAPICH2.
dell_affinity.exe: PPN: 8 OMP_NUM_THREADS: 8.
dell_affinity.exe: AMD Interlagos processor detected.
dell_affinity.exe: Placing MPI rank 0 on host bench local rank 0 cpulist 0,1,2,3,4,5,6,7 memlist 0
dell_affinity.exe: Placing MPI rank 1 on host bench local rank 1 cpulist 8,9,10,11,12,13,14,15 memlist 1
dell_affinity.exe: Placing MPI rank 2 on host bench local rank 2 cpulist 16,17,18,19,20,21,22,23 memlist 2
dell_affinity.exe: Placing MPI rank 3 on host bench local rank 3 cpulist 24,25,26,27,28,29,30,31 memlist 3
dell_affinity.exe: Placing MPI rank 4 on host bench local rank 4 cpulist 32,33,34,35,36,37,38,39 memlist 4
dell_affinity.exe: Placing MPI rank 5 on host bench local rank 5 cpulist 40,41,42,43,44,45,46,47 memlist 5
dell_affinity.exe: Placing MPI rank 6 on host bench local rank 6 cpulist 48,49,50,51,52,53,54,55 memlist 6
dell_affinity.exe: Placing MPI rank 7 on host bench local rank 7 cpulist 56,57,58,59,60,61,62,63 memlist 7

35 LS-DYNA benchmark: neon_refined. LS-DYNA mpp971 v5.1.1, Platform MPI, ran on PowerEdge R-series servers. Architecture knowledge is key! [Tables: wall clock time (s) for the as-is, Platform MPI pinning, and dell_affinity modes at two MPI rank counts.]

36 Parallel optimization

37 Parallel optimization. A lot of attention is being paid to InfiniBand networking buzzwords (fat tree, multi-rail, QDR/FDR/EDR, non-blocking) and MPI library features (shared memory optimization, collective offloading, single-sided messaging, message buffering). Better to start at the root of the parallel performance.

38 Do these programs run efficiently? LS-DYNA explicit, PARATEC.

39 Case study: PARATEC load balancing. PARATEC (PARAllel Total Energy Code) was developed at NERSC for ab initio electronic structure calculations in materials science. It uses Density Functional Theory (DFT) to describe the electronic structure of a material (solid, crystal, metal); knowing the electronic structure of a material tells you everything about its properties. The electronic structure is described by wave functions, which (unfortunately) cannot be solved mathematically. Approach: expand the wave functions in plane waves (in Fourier space) and describe the nucleus of an atom with a pseudopotential. 3D parallel Fourier transforms are needed to convert to real (Cartesian) space, and these are *very* expensive!

40 Benchmark setup: Si (silicon) in diamond structure, 686 atoms, 7x7x7 cell, 1372 electronic bands. Jobs ran at the Texas Advanced Computing Center on the Dell Linux cluster Lonestar: 1,888 Dell PowerEdge M610 blades, 22,656 Intel Xeon cores, Mellanox QDR InfiniBand, 1 PB Lustre parallel storage. Used 196 cores for the calculations.

41 Default g-vector distribution. [Chart: per-rank computational time and MPI time (wall clock, seconds) against MPI rank; uneven load.] Computation time: 648 seconds. Communication time: 276 seconds. Communication %: 29.9%. Load imbalance: 21.2%.

42 Optimized g-vector distribution. Speedup: 14.3%. [Chart: per-rank computational time and MPI time (wall clock, seconds) against MPI rank; even load.] Computation time: 638 seconds. Communication time: 154 seconds. Communication %: 19.5%. Load imbalance: 5.8%.
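The slides do not define these metrics; a common convention is communication % = MPI time / (computation time + MPI time) and load imbalance = (max - mean) / max over the per-rank computation times. A rough MPI sketch under those assumed definitions, with placeholders where the application's compute and communication phases would go:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double t0 = MPI_Wtime();
    /* ... computation phase of the real application goes here ... */
    double t_comp = MPI_Wtime() - t0;

    t0 = MPI_Wtime();
    /* ... communication phase (halo exchange, 3D FFT transposes, ...) ... */
    double t_comm = MPI_Wtime() - t0;

    double comp_max = 0.0, comp_sum = 0.0, comm_sum = 0.0;
    MPI_Reduce(&t_comp, &comp_max, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    MPI_Reduce(&t_comp, &comp_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Reduce(&t_comm, &comm_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        double comp_avg = comp_sum / size, comm_avg = comm_sum / size;
        double total = comp_avg + comm_avg;
        printf("communication %%: %.1f\n",
               total > 0.0 ? 100.0 * comm_avg / total : 0.0);
        printf("load imbalance %%: %.1f\n",
               comp_max > 0.0 ? 100.0 * (comp_max - comp_avg) / comp_max : 0.0);
    }
    MPI_Finalize();
    return 0;
}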

43 Conclusion

44 Conclusion. Architecture knowledge is key to obtaining good scalability. People concentrate on MPI optimization work but often forget load-balancing issues. Use system tools and profilers as standard practice!

45 Questions?
