High Performance Linux Cluster and Multicore Nehalem Processors


Zhang Xinhuai (High Performance Computing, Computer Centre)

The fast and constant improvement of microprocessors continues to enhance High Performance Computing (HPC) systems. The launch of the Intel Nehalem processors last year marked a new era for microprocessors, substantially advancing the efficiency and computing capability of HPC servers. A new Linux cluster acquired by the Computer Centre last year is equipped with Nehalem processors. It has a theoretical peak performance of 8.17 TFLOPS and a total of 4,608 GB of memory. The benchmark results show that, on average, the cluster with Nehalem X5550 processors at 2.66GHz (Atlas5) is 25% faster than the cluster with Intel dual-core X5160 processors at 3.0GHz (Atlas3) and 19% faster than the cluster with Intel quad-core E5430 processors at 2.66GHz (Atlas4).

1. Beowulf Linux Cluster

Since the first Beowulf cluster was set up in the mid-1990s, commodity-based Linux clusters, which are built from identical, commercially available computers and a high speed network, have gradually replaced the large shared-memory Symmetric Multiprocessing (SMP) computers and Non-Uniform Memory Access (NUMA) architecture computers that dominated high performance parallel computing ten years ago. The world's TOP500 supercomputer list of November 2009 gives a rough idea of how far this trend has come: 417 of the 500 systems (83%) are clusters, while 10 years ago there were only seven clusters (1.4%) in the TOP500.

The driving forces for this trend, I think, are:

1) Continued and fast increases in microprocessor capacity have rendered other forms of computers almost obsolete.
2) The cost of building a cluster is much lower than that of a traditional SMP or NUMA system.
3) The speed of the cluster interconnect has increased, and its latency has decreased, dramatically.

2. Linux cluster at HPC, Computer Centre

At HPC, Computer Centre, we acquired our first Atlas-series Beowulf Linux cluster for high performance parallel computing (Atlas1, launch time 2003 in Table 1). In the years after, the Linux cluster gradually became the main workhorse for HPC. The table below shows the configurations of the clusters. From the table we can see how the compute power increases dramatically with the development of new microprocessors, especially with the introduction of multicore processor systems.

Table 1: System configuration of the HPC Linux clusters at Computer Centre

The columns for total number of processors/cores, total memory (GB) and theoretical peak performance (GFLOPS) are missing from this transcription, as are some model numbers and node counts. A processor clock of 2.8GHz is listed for the Atlas1/Atlas2 rows.

Atlas1 (2003)
  Model of systems: 6 nodes 2-way Dell PE 2650 and 12 nodes 2-way Dell PE (model number missing)
  Cluster interconnect: Myrinet

Atlas2 (2005)
  Model of systems: 2-way IBM HS20 Blade nodes (node count missing)
  Cluster interconnect: Infiniband

Atlas3 (2007)
  Model of systems: HP Blade 460c, 2-way Quadcore and 2-way Dualcore
  Processors: X5355, 2.66GHz and X5160, 3.0GHz
  Cluster interconnect: Infiniband

Atlas4 (2008)
  Model of systems: HP Blade BL460c, 2-way Quadcore E5430 and 4-way Quadcore E7330
  Processors: E5430, 2.66GHz and E7330, 2.40GHz
  Cluster interconnect: Infiniband

Atlas5 (2010)
  Model of systems: HP BL460c, 2-way Quadcore Nehalem X5550
  Processors: X5550, 2.66GHz
  Cluster interconnect: Infiniband

3. Multicore Processors

About five years ago, microprocessor designers began to acknowledge that they could no longer rely on higher clock rates and increasing Instruction-Level Parallelism (ILP) for performance increases. The power consumption that comes with a higher clock rate proved to be a key challenge for microarchitecture designers: it was found that a 13% increase in clock rate raised power consumption by about 73%. Other issues, such as memory bandwidth, also limited performance improvement.

Multicore processors, on the other hand, are able to improve performance while addressing these issues of single high speed processors. A multicore processor combines two or more processing elements (called cores) in a single package, on a single die or on multiple dies. The cores share the interconnects to the rest of the system and often share on-chip cache memory. The currently available dual-core and quad-core processors allow performance gains approaching two times or four times without dramatically changing the programming model or requiring special-purpose tools and expert knowledge. Hence, multicore became the way forward.

Multicore processor systems also open opportunities for large scale parallel computing. If hundreds of cores were available in one processor, a two-processor personal computer would be a powerful supercomputer able to run large parallel codes with hundreds of threads. This may happen in the not too distant future.

However, multicore systems also have issues that can impair performance. For example, the earlier quad-core processors are composed of two separate dies, which means some cached data has to travel outside the processor to get from core to core. This motivated the new architectural design of the Nehalem processors.

4. Nehalem Processors

In March 2009, Intel announced the release of Nehalem processors. The Nehalem processor is a single-die, 64-bit architecture with 8MB of fully shared L3 cache available to each of the four processor cores. The result is fast access to cached data and greater application performance.

In previous Intel processors, system memory is connected to the processor through a separate I/O controller. Each Intel Nehalem processor instead features an integrated memory controller. The integrated memory controller, along with fast 1066MHz DDR3 ECC SDRAM, allows for enhanced system performance.

Turbo Mode allows the processor to opportunistically improve performance by raising its frequency when there is thermal and electrical headroom. The fewer cores are active, the larger the frequency increase.
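The power argument in Section 3, and the headroom idea behind Turbo Mode, can be illustrated with a rough calculation. The sketch below assumes the common first-order model for dynamic power, P proportional to C * V^2 * f, and further assumes that supply voltage has to rise in proportion to frequency, which gives P proportional to f^3. Neither assumption comes from the article, and the function names are purely illustrative. The second function asks which exponent would reproduce the figures quoted above (13% more clock, 73% more power).

```python
import math


def power_increase(clock_increase: float, exponent: float = 3.0) -> float:
    """Relative power increase for a given relative clock increase.

    Assumes power scales as frequency ** exponent. The default of 3 is the
    textbook approximation (P ~ C * V^2 * f with V proportional to f); it is
    an assumption of this sketch, not a figure from the article.
    """
    return (1.0 + clock_increase) ** exponent - 1.0


def implied_exponent(clock_increase: float, observed_power_increase: float) -> float:
    """Exponent n such that (1 + clock_increase) ** n == 1 + observed_power_increase."""
    return math.log(1.0 + observed_power_increase) / math.log(1.0 + clock_increase)


if __name__ == "__main__":
    cubic = power_increase(0.13)
    n = implied_exponent(0.13, 0.73)
    print(f"cubic model: +13% clock -> +{cubic:.0%} power")
    print(f"exponent implied by +13% clock and +73% power: {n:.2f}")
```

Under the cubic assumption a 13% clock increase costs about 44% more power. The quoted 73% corresponds to an exponent of roughly 4.5, so the measurement behind the article's figure was steeper than this simple model predicts. Either way, the cost of extra clock rate grows much faster than the clock rate itself, which is the case for adding cores instead.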
In addition to new microarchitecture performance features, Nehalem also radically changes the way memory and I/O are accessed. The Nehalem architecture eliminates the front side bus for accessing memory and I/O. Instead, each CPU package includes an integrated memory controller and one or more high speed serial links, known as QuickPath Interconnect (QPI), to access other CPU packages and I/O. The result is a large improvement in memory and I/O bandwidth.
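The size of the memory bandwidth improvement can be estimated with simple arithmetic. The sketch below is only an illustration under stated assumptions: the 1066 MT/s DDR3 rate is taken from the article, while the three memory channels per socket, the 8-byte channel width and the 1333 MT/s front side bus used for comparison are assumptions about the hardware generations involved and are not stated in the article. These are theoretical peaks, not measured bandwidths.

```python
def peak_bandwidth_gb_s(transfers_per_s: float, bus_bytes: int = 8, channels: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return transfers_per_s * bus_bytes * channels / 1e9


# DDR3 "1066MHz" memory from the article, read as 1066 million transfers/s.
# Assumption: three 64-bit (8-byte) channels on the integrated controller.
nehalem_per_socket = peak_bandwidth_gb_s(1066e6, bus_bytes=8, channels=3)

# Assumption: previous-generation socket behind one 64-bit, 1333 MT/s front side bus.
fsb_per_socket = peak_bandwidth_gb_s(1333e6, bus_bytes=8, channels=1)

if __name__ == "__main__":
    print(f"integrated controller: {nehalem_per_socket:.1f} GB/s per socket")
    print(f"front side bus:        {fsb_per_socket:.1f} GB/s per socket")
    print(f"ratio:                 {nehalem_per_socket / fsb_per_socket:.1f}x")
```

With these assumptions the integrated controller offers about 25.6 GB/s per socket against about 10.7 GB/s for a front side bus, and the front side bus also had to carry I/O and cache-coherency traffic.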

The processor cache is also revamped. The new cache hierarchy is known as Smart Cache. In the Smart Cache implementation on Nehalem, each processor core includes two small caches (L1 and L2) to improve performance and scalability. A shared third-level (L3) cache is located in the un-core. This allows the cache size to vary with the number of cores and to be increased easily in future implementations. The L3 cache has an inclusive policy: everything in each core's L1/L2 cache must also be present in the L3 cache. This keeps snoop traffic constant as the core count increases, and minimises the effective cache latency by eliminating cross-core snoops in the common case.

5. Atlas5 Cluster Performance

In February 2010, we launched the new Linux cluster with Nehalem processors, the Atlas5 cluster. The Atlas5 cluster has the following improvements over the previous Atlas clusters:

1) New Nehalem processors at 2.66GHz with 8 MB Smart Cache.
2) A 36-port Infiniband switch interconnect with a 4:1 blocking factor for communication between nodes, which enables faster message passing between nodes for MPI parallel applications.
3) 48 GB of memory on each node, which provides sufficient memory for the most memory-intensive applications.
4) A 40 TB high performance parallel file system used as the working space for faster disk I/O access.

The benchmark results that follow show the performance improvement for some commercial and open source applications.
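The quoted theoretical peak of 8.17 TFLOPS can be cross-checked from the figures above. This is a sketch under assumptions rather than the author's own calculation: the node count is inferred by dividing the total memory (4,608 GB) by the memory per node (48 GB), and the value of 4 double-precision floating-point operations per core per cycle is assumed for this processor generation (one 128-bit SSE add and one 128-bit SSE multiply per cycle). Neither number is stated in the article.

```python
def theoretical_peak_gflops(nodes: int, sockets_per_node: int, cores_per_socket: int,
                            clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak = total cores * clock rate (GHz) * flops per core per cycle."""
    return nodes * sockets_per_node * cores_per_socket * clock_ghz * flops_per_cycle


TOTAL_MEMORY_GB = 4608      # from the article
MEMORY_PER_NODE_GB = 48     # from the article
nodes = TOTAL_MEMORY_GB // MEMORY_PER_NODE_GB   # inferred node count (assumption)

peak = theoretical_peak_gflops(
    nodes=nodes,
    sockets_per_node=2,     # 2-way nodes (Table 1)
    cores_per_socket=4,     # quad-core X5550
    clock_ghz=2.66,
    flops_per_cycle=4,      # assumption: 2-wide SSE add + 2-wide SSE multiply
)

if __name__ == "__main__":
    print(f"{nodes} nodes, {nodes * 2 * 4} cores")
    print(f"theoretical peak: {peak / 1000:.2f} TFLOPS")
```

With these inputs the sketch gives 96 nodes, 768 cores and 8.17 TFLOPS, which agrees with the figure quoted in the article. Turbo Mode is ignored, as is usual for theoretical peak numbers.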

Table 2: Performance comparison between three Linux clusters

The measured CPU times and turnaround times (in seconds) on Atlas3, Atlas4 and Atlas5, and the per-row speedups of Atlas5 over Atlas4, are missing from this transcription. The surviving columns are:

Code                    No. of procs   Atlas5 speedup over Atlas3
Amber                   16             32.55%
Gromacs                 16             16.92%
Gaussian 09, serial     1              11.08%
Gaussian 09, parallel   4              26.01%
Gaussian 09, Linda      4               4.15%
Gaussian 09, Linda      8              37.48%
Gaussian 09, Linda      16             50.81%

Average speedup of Atlas5 over Atlas4: 19.01%
Average speedup of Atlas5 over Atlas3: 25.57%

The parallel Amber, Gromacs and Gaussian 09 jobs all run noticeably faster on the Atlas5 cluster than on the Atlas3 and Atlas4 clusters. On average, Atlas5 is 25% faster than Atlas3 and 19% faster than Atlas4.

Let us also look at the scalability of the codes on the Atlas5 cluster. Figure 1 and Figure 2 show the benchmark results for Amber, Linda and Gromacs jobs using different numbers of CPUs/cores. Figure 3 shows the speedup of the three codes on the Atlas5 cluster. With 32 CPUs, the speedup reaches 19.2 for Amber and 24.5 for Gromacs. For Gaussian 09 Linda, the speedup is about 8.5 for a job with 16 Linda processors.
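The speedup figures above are easier to compare once they are converted to parallel efficiency, that is, speedup divided by the number of processors used. The sketch below uses only numbers quoted in this section. The function name is illustrative, and the last step assumes that the Average row of the table is a plain arithmetic mean of the seven per-row speedups over Atlas3.

```python
from statistics import mean


def parallel_efficiency(speedup: float, n_procs: int) -> float:
    """Fraction of ideal (linear) speedup achieved on n_procs processors."""
    return speedup / n_procs


# (speedup, processors) as reported for the Atlas5 cluster
reported = {
    "Amber": (19.2, 32),
    "Gromacs": (24.5, 32),
    "Gaussian 09 Linda": (8.5, 16),
}

# Per-row speedups of Atlas5 over Atlas3 from the table, in percent
speedup_over_atlas3 = [32.55, 16.92, 11.08, 26.01, 4.15, 37.48, 50.81]

if __name__ == "__main__":
    for code, (speedup, procs) in reported.items():
        eff = parallel_efficiency(speedup, procs)
        print(f"{code}: speedup {speedup} on {procs} procs -> efficiency {eff:.0%}")
    print(f"mean speedup over Atlas3: {mean(speedup_over_atlas3):.2f}%")
```

The efficiencies come out at about 60% for Amber, 77% for Gromacs and 53% for Gaussian 09 Linda, so Gromacs scales best of the three at these job sizes. The mean of the seven rows is 25.57%, which matches the Average row of the table.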

Figure 1: Benchmark results of Amber10 and Linda code using different numbers of CPUs/cores

Figure 2: Benchmark results of Gromacs code using different numbers of CPUs/cores

Figure 3: Speedup for different applications (Amber, Gromacs, Linda) on the Atlas5 cluster

6. Conclusion

The development of microprocessors has ushered in a new era in high performance scientific computing. As more and more powerful computers are built, problems that were once difficult and challenging have become much more tractable. Multicore processors enable users to run large parallel jobs, which reduces computing time and increases job efficiency.


More information

CUTTING-EDGE SOLUTIONS FOR TODAY AND TOMORROW. Dell PowerEdge M-Series Blade Servers

CUTTING-EDGE SOLUTIONS FOR TODAY AND TOMORROW. Dell PowerEdge M-Series Blade Servers CUTTING-EDGE SOLUTIONS FOR TODAY AND TOMORROW Dell PowerEdge M-Series Blade Servers Simplifying IT The Dell PowerEdge M-Series blade servers address the challenges of an evolving IT environment by delivering

More information

Michael Kagan. michael@mellanox.com

Michael Kagan. michael@mellanox.com Virtualization in Data Center The Network Perspective Michael Kagan CTO, Mellanox Technologies michael@mellanox.com Outline Data Center Transition Servers S as a Service Network as a Service IO as a Service

More information

Multicore Parallel Computing with OpenMP

Multicore Parallel Computing with OpenMP Multicore Parallel Computing with OpenMP Tan Chee Chiang (SVU/Academic Computing, Computer Centre) 1. OpenMP Programming The death of OpenMP was anticipated when cluster systems rapidly replaced large

More information

How System Settings Impact PCIe SSD Performance

How System Settings Impact PCIe SSD Performance How System Settings Impact PCIe SSD Performance Suzanne Ferreira R&D Engineer Micron Technology, Inc. July, 2012 As solid state drives (SSDs) continue to gain ground in the enterprise server and storage

More information

Multi-core and Linux* Kernel

Multi-core and Linux* Kernel Multi-core and Linux* Kernel Suresh Siddha Intel Open Source Technology Center Abstract Semiconductor technological advances in the recent years have led to the inclusion of multiple CPU execution cores

More information

Intel Xeon Processor E5-2600

Intel Xeon Processor E5-2600 Intel Xeon Processor E5-2600 Best combination of performance, power efficiency, and cost. Platform Microarchitecture Processor Socket Chipset Intel Xeon E5 Series Processors and the Intel C600 Chipset

More information

How To Write An Article On An Hp Appsystem For Spera Hana

How To Write An Article On An Hp Appsystem For Spera Hana Technical white paper HP AppSystem for SAP HANA Distributed architecture with 3PAR StoreServ 7400 storage Table of contents Executive summary... 2 Introduction... 2 Appliance components... 3 3PAR StoreServ

More information

How To Build A Supermicro Computer With A 32 Core Power Core (Powerpc) And A 32-Core (Powerpc) (Powerpowerpter) (I386) (Amd) (Microcore) (Supermicro) (

How To Build A Supermicro Computer With A 32 Core Power Core (Powerpc) And A 32-Core (Powerpc) (Powerpowerpter) (I386) (Amd) (Microcore) (Supermicro) ( TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 7 th CALL (Tier-0) Contributing sites and the corresponding computer systems for this call are: GCS@Jülich, Germany IBM Blue Gene/Q GENCI@CEA, France Bull Bullx

More information

Overview of HPC Resources at Vanderbilt

Overview of HPC Resources at Vanderbilt Overview of HPC Resources at Vanderbilt Will French Senior Application Developer and Research Computing Liaison Advanced Computing Center for Research and Education June 10, 2015 2 Computing Resources

More information

Parallel Algorithm Engineering

Parallel Algorithm Engineering Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis

More information

QUADRICS IN LINUX CLUSTERS

QUADRICS IN LINUX CLUSTERS QUADRICS IN LINUX CLUSTERS John Taylor Motivation QLC 21/11/00 Quadrics Cluster Products Performance Case Studies Development Activities Super-Cluster Performance Landscape CPLANT ~600 GF? 128 64 32 16

More information

HP ProLiant BL460c achieves #1 performance spot on Siebel CRM Release 8.0 Benchmark Industry Applications running Microsoft, Oracle

HP ProLiant BL460c achieves #1 performance spot on Siebel CRM Release 8.0 Benchmark Industry Applications running Microsoft, Oracle HP ProLiant BL460c achieves #1 performance spot on Siebel CRM Release 8.0 Benchmark Industry Applications running Microsoft, Oracle HP ProLiant BL685c takes #2 spot HP Leadership» The HP ProLiant BL460c

More information

Improving Grid Processing Efficiency through Compute-Data Confluence

Improving Grid Processing Efficiency through Compute-Data Confluence Solution Brief GemFire* Symphony* Intel Xeon processor Improving Grid Processing Efficiency through Compute-Data Confluence A benchmark report featuring GemStone Systems, Intel Corporation and Platform

More information

Introduction History Design Blue Gene/Q Job Scheduler Filesystem Power usage Performance Summary Sequoia is a petascale Blue Gene/Q supercomputer Being constructed by IBM for the National Nuclear Security

More information

AMD PhenomII. Architecture for Multimedia System -2010. Prof. Cristina Silvano. Group Member: Nazanin Vahabi 750234 Kosar Tayebani 734923

AMD PhenomII. Architecture for Multimedia System -2010. Prof. Cristina Silvano. Group Member: Nazanin Vahabi 750234 Kosar Tayebani 734923 AMD PhenomII Architecture for Multimedia System -2010 Prof. Cristina Silvano Group Member: Nazanin Vahabi 750234 Kosar Tayebani 734923 Outline Introduction Features Key architectures References AMD Phenom

More information

Oracle Database Reliability, Performance and scalability on Intel Xeon platforms Mitch Shults, Intel Corporation October 2011

Oracle Database Reliability, Performance and scalability on Intel Xeon platforms Mitch Shults, Intel Corporation October 2011 Oracle Database Reliability, Performance and scalability on Intel platforms Mitch Shults, Intel Corporation October 2011 1 Intel Processor E7-8800/4800/2800 Product Families Up to 10 s and 20 Threads 30MB

More information

SERVER CLUSTERING TECHNOLOGY & CONCEPT

SERVER CLUSTERING TECHNOLOGY & CONCEPT SERVER CLUSTERING TECHNOLOGY & CONCEPT M00383937, Computer Network, Middlesex University, E mail: vaibhav.mathur2007@gmail.com Abstract Server Cluster is one of the clustering technologies; it is use for

More information

Workshop on Parallel and Distributed Scientific and Engineering Computing, Shanghai, 25 May 2012

Workshop on Parallel and Distributed Scientific and Engineering Computing, Shanghai, 25 May 2012 Scientific Application Performance on HPC, Private and Public Cloud Resources: A Case Study Using Climate, Cardiac Model Codes and the NPB Benchmark Suite Peter Strazdins (Research School of Computer Science),

More information

How To Build A Cloud Computer

How To Build A Cloud Computer Introducing the Singlechip Cloud Computer Exploring the Future of Many-core Processors White Paper Intel Labs Jim Held Intel Fellow, Intel Labs Director, Tera-scale Computing Research Sean Koehl Technology

More information

PCIe Over Cable Provides Greater Performance for Less Cost for High Performance Computing (HPC) Clusters. from One Stop Systems (OSS)

PCIe Over Cable Provides Greater Performance for Less Cost for High Performance Computing (HPC) Clusters. from One Stop Systems (OSS) PCIe Over Cable Provides Greater Performance for Less Cost for High Performance Computing (HPC) Clusters from One Stop Systems (OSS) PCIe Over Cable PCIe provides greater performance 8 7 6 5 GBytes/s 4

More information

Intel 965 Express Chipset Family Memory Technology and Configuration Guide

Intel 965 Express Chipset Family Memory Technology and Configuration Guide Intel 965 Express Chipset Family Memory Technology and Configuration Guide White Paper - For the Intel 82Q965, 82Q963, 82G965 Graphics and Memory Controller Hub (GMCH) and Intel 82P965 Memory Controller

More information

HPC Update: Engagement Model

HPC Update: Engagement Model HPC Update: Engagement Model MIKE VILDIBILL Director, Strategic Engagements Sun Microsystems mikev@sun.com Our Strategy Building a Comprehensive HPC Portfolio that Delivers Differentiated Customer Value

More information

Cooling and thermal efficiently in

Cooling and thermal efficiently in Cooling and thermal efficiently in the datacentre George Brown HPC Systems Engineer Viglen Overview Viglen Overview Products and Technologies Looking forward Company Profile IT hardware manufacture, reseller

More information

White Paper Solarflare High-Performance Computing (HPC) Applications

White Paper Solarflare High-Performance Computing (HPC) Applications Solarflare High-Performance Computing (HPC) Applications 10G Ethernet: Now Ready for Low-Latency HPC Applications Solarflare extends the benefits of its low-latency, high-bandwidth 10GbE server adapters

More information

Performance Guide. 275 Technology Drive ANSYS, Inc. is Canonsburg, PA 15317. http://www.ansys.com (T) 724-746-3304 (F) 724-514-9494

Performance Guide. 275 Technology Drive ANSYS, Inc. is Canonsburg, PA 15317. http://www.ansys.com (T) 724-746-3304 (F) 724-514-9494 Performance Guide ANSYS, Inc. Release 12.1 Southpointe November 2009 275 Technology Drive ANSYS, Inc. is Canonsburg, PA 15317 certified to ISO ansysinfo@ansys.com 9001:2008. http://www.ansys.com (T) 724-746-3304

More information

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage White Paper Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage A Benchmark Report August 211 Background Objectivity/DB uses a powerful distributed processing architecture to manage

More information

EVALUATING NEW ARCHITECTURAL FEATURES OF THE INTEL(R) XEON(R) 7500 PROCESSOR FOR HPC WORKLOADS

EVALUATING NEW ARCHITECTURAL FEATURES OF THE INTEL(R) XEON(R) 7500 PROCESSOR FOR HPC WORKLOADS Computer Science Vol. 12 2011 Paweł Gepner, David L. Fraser, Michał F. Kowalik, Kazimierz Waćkowski EVALUATING NEW ARCHITECTURAL FEATURES OF THE INTEL(R) XEON(R) 7500 PROCESSOR FOR HPC WORKLOADS In this

More information

benchmarking Amazon EC2 for high-performance scientific computing

benchmarking Amazon EC2 for high-performance scientific computing Edward Walker benchmarking Amazon EC2 for high-performance scientific computing Edward Walker is a Research Scientist with the Texas Advanced Computing Center at the University of Texas at Austin. He received

More information

WHITE PAPER FUJITSU PRIMERGY SERVERS PERFORMANCE REPORT PRIMERGY BX620 S6

WHITE PAPER FUJITSU PRIMERGY SERVERS PERFORMANCE REPORT PRIMERGY BX620 S6 WHITE PAPER PERFORMANCE REPORT PRIMERGY BX620 S6 WHITE PAPER FUJITSU PRIMERGY SERVERS PERFORMANCE REPORT PRIMERGY BX620 S6 This document contains a summary of the benchmarks executed for the PRIMERGY BX620

More information

Cray XT3 Supercomputer Scalable by Design CRAY XT3 DATASHEET

Cray XT3 Supercomputer Scalable by Design CRAY XT3 DATASHEET CRAY XT3 DATASHEET Cray XT3 Supercomputer Scalable by Design The Cray XT3 system offers a new level of scalable computing where: a single powerful computing system handles the most complex problems every

More information

SR-IOV In High Performance Computing

SR-IOV In High Performance Computing SR-IOV In High Performance Computing Hoot Thompson & Dan Duffy NASA Goddard Space Flight Center Greenbelt, MD 20771 hoot@ptpnow.com daniel.q.duffy@nasa.gov www.nccs.nasa.gov Focus on the research side

More information

Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer

Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer Stan Posey, MSc and Bill Loewe, PhD Panasas Inc., Fremont, CA, USA Paul Calleja, PhD University of Cambridge,

More information