A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures
Yih-Yih Lin
Hewlett-Packard Company

Abstract

In this paper, the effects of differences in problem size, number of cores per processor, and interconnect on Hybrid LS-DYNA's performance are studied. The results show that the combination of these three factors determines when Hybrid LS-DYNA has a performance advantage over MPP LS-DYNA.

Introduction

The newly developed Hybrid LS-DYNA is a hybridization of SMP LS-DYNA and MPP LS-DYNA. (From now on in this paper, we will call Hybrid LS-DYNA the Hybrid Method and MPP LS-DYNA the MPP Method.) Both methods decompose the problem into subdomains, each of which is solved by a single Message Passing Interface (MPI) process. These MPI processes are commonly called ranks. In the MPP Method, ranks are not threaded, while the ranks of the Hybrid Method are thread-parallel. The following relationship applies to both methods:

    Number of ranks = Number of subdomains

For the MPP Method, the number of cores used is the number of ranks:

    Number of cores = Number of ranks

while the Hybrid Method uses more cores than ranks:

    Number of cores = Number of threads per rank × Number of ranks

Given a number of cores and a given problem, two factors that affect the performance of the two methods can be identified:

- Communication cost: The communication cost is in general proportional to the number of ranks. For a given number of cores, the Hybrid Method uses far fewer ranks than the MPP Method, so the communication cost of the Hybrid Method should be much lower than that of the MPP Method. The communication cost is also affected by the speed of the interconnect and by the problem size.

- Computation cost on each subdomain: The scalability of each subdomain's solution should behave similarly to that of SMP LS-DYNA, which also uses thread parallelism. SMP LS-DYNA is well known to scale sub-linearly with the number of cores and does not scale well beyond 8 cores. It is also commonly known that the scalability of a server with multiple processors drops substantially across processors: modern processor chips include integrated memory controllers, so threads spread over multiple chips do not share caches and incur slower memory access. Therefore, for optimal performance, the number of threads per rank in the Hybrid Method should equal the number of cores per processor for a given architecture.
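To make the rank/thread structure concrete, the following is a minimal sketch of a hybrid MPI+OpenMP program of the kind described above. It is illustrative only and is not LS-DYNA code; it simply prints the relation Number of cores = Number of threads per rank × Number of ranks and shows where each rank's threaded subdomain work would go.

/* Minimal sketch of the hybrid execution model: each MPI rank owns one
 * subdomain and spawns a team of OpenMP threads to work on it.
 * Illustrative only; this is not LS-DYNA code. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, nranks, rank;

    /* Request thread support because ranks are internally threaded. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int threads_per_rank = omp_get_max_threads();

    if (rank == 0)
        printf("cores = threads per rank x ranks = %d x %d = %d\n",
               threads_per_rank, nranks, threads_per_rank * nranks);

    /* Each rank's subdomain work is parallelized across its thread team. */
    #pragma omp parallel
    {
        /* ... element processing for this rank's subdomain ... */
    }

    MPI_Finalize();
    return 0;
}

Launched with, for example, 64 ranks and 6 OpenMP threads per rank, such a program would occupy 384 cores, which matches the Hybrid Method layout used on the BL465c cluster described below.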
This paper investigates these factors experimentally with two problem sizes, two interconnects with different speeds, and two architectures with different numbers of cores per processor.

Scalability versus Problem Sizes, Architectures, and Interconnects

Two car crash problems from the National Crash Analysis Center, described below, are used in this study:

    Problem              | Number of Elements | Termination Time | Description
    Three-car Collision  | 0.8 M              | 150 ms           | Impact of 3 Caravans at 35 mph
    Car-to-car Collision | 2.4 M              | 120 ms           | Impact of 2 Caravans at 35 mph

Benchmarks are run on two clusters: one comprised of HP BL280c nodes containing two Intel Nehalem Xeon processors, and the other of HP BL465c nodes containing two AMD Istanbul processors. A summary of the hardware configuration of the two clusters is shown below:

    Node type                 | HP BL280c                                      | HP BL465c
    Architecture              | Intel Xeon X5560 (Nehalem), quad-core, 2.8 GHz | AMD Opteron 2435 (Istanbul), six-core, 2.6 GHz
    Processors/cores per node | 2 processors / 8 cores                         | 2 processors / 12 cores
    Highest-level cache       | One 8 MB L3 cache shared among 4 cores         | One 6 MB L3 cache shared among 6 cores
    Memory per node           | 24 GB                                          | 24 GB
    Interconnect              | InfiniBand QDR or GigE                         | InfiniBand DDR or GigE
    O/S                       | SLES 11.0                                      | SLES 11.0

The number of threads per rank for the Hybrid Method, used in all cases in this section, is equal to the number of cores per processor: 4 for the BL280c cluster and 6 for the BL465c cluster.

Scalability versus Problem Size

Figure 1 shows the elapsed times measured for the Car-to-car Collision with both methods, as well as their comparison, versus the number of cores on the BL280c cluster with InfiniBand (IB) QDR; Figure 2 shows the performance of the Three-car Collision run on the same cluster. Figures 3 and 4 show the performances on the BL465c cluster with InfiniBand DDR: Figure 3 is for the Car-to-car Collision, and Figure 4 is for the Three-car Collision.
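Before turning to the measurements, the short sketch below shows how this layout translates into rank counts, assuming one Hybrid rank per processor (socket) with one processor's worth of threads. The 384-core job size is just an example drawn from the measurements discussed later; the sketch is not part of the benchmark setup.

/* A sketch of the rank/thread layout assumed in this section: one Hybrid
 * rank per processor, with threads per rank equal to the number of cores
 * per processor.  Illustrative only. */
#include <stdio.h>

int main(void)
{
    struct cluster { const char *name; int cores_per_node; int cores_per_proc; };
    struct cluster clusters[] = {
        { "BL280c (Nehalem)",   8, 4 },
        { "BL465c (Istanbul)", 12, 6 },
    };
    const int total_cores = 384;   /* example job size */

    for (int i = 0; i < 2; i++) {
        int threads_per_rank = clusters[i].cores_per_proc;
        int nodes            = total_cores / clusters[i].cores_per_node;
        int ranks_per_node   = clusters[i].cores_per_node / threads_per_rank;
        int hybrid_ranks     = nodes * ranks_per_node;

        printf("%s at %d cores: MPP uses %d ranks; Hybrid uses %d ranks x %d threads\n",
               clusters[i].name, total_cores, total_cores,
               hybrid_ranks, threads_per_rank);
    }
    return 0;
}

Under these assumptions, a 384-core Hybrid run uses 96 ranks of 4 threads on the BL280c cluster and 64 ranks of 6 threads on the BL465c cluster, that is, 4 and 6 times fewer ranks than the corresponding MPP run.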
Figure 1. Elapsed times of the Car-to-car Collision with the MPP Method and the Hybrid Method, and their comparison, versus number of cores on the BL280c cluster with IB QDR.

Figure 2. Elapsed times of the Three-car Collision with the MPP Method and the Hybrid Method, and their comparison, versus number of cores on the BL280c cluster with IB QDR.
Figure 3. Elapsed times of the Car-to-car Collision with the MPP Method and the Hybrid Method, and their comparison, versus number of cores on the BL465c cluster with IB DDR.

Figure 4. Elapsed times of the Three-car Collision with the MPP Method and the Hybrid Method, and their comparison, versus number of cores on the BL465c cluster with IB DDR.
The following observations can be drawn from the four figures, which are for clusters equipped with the fast interconnect InfiniBand:

- Regardless of architecture, the Hybrid Method has an advantage over the MPP Method only if the problem size and the number of cores are large enough.
- For the larger problem, the Car-to-car Collision, the Hybrid Method starts to have a performance advantage over the MPP Method at about 192 cores.
- For the smaller problem, the Three-car Collision, the Hybrid Method performs worse than the MPP Method regardless of the number of cores, so there is no reason to use the Hybrid Method.

Scalability versus Architecture

For the Car-to-car Collision at 384 cores, the Hybrid Method outperforms the MPP Method by 4 percent on the BL280c cluster (Figure 1), whereas it outperforms the MPP Method by 6 percent on the BL465c cluster (Figure 3). As explained in an earlier paper [1], the performance difference can be attributed to the difference in the number of cores per processor (which equals the number of threads per rank in the Hybrid Method) between the two architectures: 4 on the BL280c cluster versus 6 on the BL465c cluster.

Scalability versus Interconnects

Figure 5 shows the elapsed times measured for the Car-to-car Collision with both methods, as well as their comparison, versus the number of cores on the BL280c cluster with Gigabit Ethernet (GigE); Figure 6 shows the performance of the Three-car Collision run on the same cluster. Figures 7 and 8 show the performances on the BL465c cluster: Figure 7 is for the Car-to-car Collision, and Figure 8 is for the Three-car Collision.

The following observations can be drawn from these four figures, which are for clusters equipped with the slow interconnect GigE:

- Regardless of architecture and problem size, the MPP Method stalls at about 144 cores. In contrast, the Hybrid Method still scales at about 144 cores.
- For the smaller problem, the Three-car Collision, the Hybrid Method has a performance advantage over the MPP Method with GigE, unlike with IB. Notably, Figure 6 shows that the Hybrid Method at 144 cores is 1.4 times faster than the MPP Method at 48 cores on the BL280c cluster, and Figure 8 shows that it is 1.6 times faster on the BL465c cluster.

Discussion

The performance behavior of the Hybrid Method relative to the MPP Method can be explained by the difference in message pattern. Shown in Figure 9 are the message patterns for the MPP Method, the Hybrid Method with two threads per rank, and the Hybrid Method with four threads per rank.
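Before the measured patterns, a toy model gives a rough sense of why fewer, larger ranks reduce communication. The sketch below assumes a cubic domain decomposition in which each rank exchanges one message per face with roughly six neighbors; under that assumption the message count shrinks in proportion to the rank count, and the aggregate boundary data also shrinks because fewer, larger subdomains expose less total surface. The neighbor count and the surface-to-volume model are assumptions for illustration, not measurements from this study.

/* A toy surface-to-volume model of the communication argument above
 * (assumed for illustration; not measured data).  Each rank is treated as
 * a cube of elements exchanging one message per face with ~6 neighbours. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double elements   = 2.4e6;  /* Car-to-car Collision model size */
    const int    cores      = 256;    /* core count shown in Figure 9 */
    const int    neighbours = 6;      /* assumed faces per cubic subdomain */
    const int    threads_per_rank[] = { 1, 2, 4 };

    for (int i = 0; i < 3; i++) {
        int    t        = threads_per_rank[i];
        int    ranks    = cores / t;
        double per_rank = elements / ranks;                   /* elements per subdomain */
        double face     = pow(per_rank, 2.0 / 3.0);           /* ~elements on one face  */
        double messages = (double)ranks * neighbours;         /* ~messages per exchange */
        double boundary = messages * face;                    /* ~total boundary data   */

        printf("%d thread(s)/rank: %4d ranks, ~%4.0f messages, "
               "~%.2e boundary elements in total\n",
               t, ranks, messages, boundary);
    }
    return 0;
}

In this toy model, going from 1 to 4 threads per rank at 256 cores cuts the message count by a factor of 4 and reduces the total exchanged boundary data, which is the same qualitative trend the measured message patterns in Figure 9 exhibit.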
Figure 5. Elapsed times of the Car-to-car Collision with the MPP Method and the Hybrid Method, and their comparison, versus number of cores on the BL280c cluster with GigE. (Arrow indicates a value beyond the axis scale.)

Figure 6. Elapsed times of the Three-car Collision with the MPP Method and the Hybrid Method, and their comparison, versus number of cores on the BL280c cluster with GigE. (Arrow indicates a value beyond the axis scale.)
Figure 7. Elapsed times of the Car-to-car Collision with the MPP Method and the Hybrid Method, and their comparison, versus number of cores on the BL465c cluster with GigE. (Arrow indicates a value beyond the axis scale.)

Figure 8. Elapsed times of the Three-car Collision with the MPP Method and the Hybrid Method, and their comparison, versus number of cores on the BL465c cluster with GigE. (Arrow indicates a value beyond the axis scale.)
Figure 9. The Car-to-car Collision's 256-core message patterns for the MPP Method, the Hybrid Method with 2 threads per rank, and the Hybrid Method with 4 threads per rank. The message pattern of the MPP Method is identical to that of the Hybrid Method with one thread per rank.

Figure 9 shows the dramatic reduction in the number of messages and in message sizes, which reduces the communication cost, as the number of threads per rank is increased. The effectiveness of the Hybrid Method can be attributed to this ability to reduce the communication cost.

Conclusions

To summarize, this paper has established the following from actual measurements of elapsed times:

- The relative performance of the MPP Method and the Hybrid Method depends on the problem size, the number of threads per rank (and hence the number of cores per processor in an architecture), and the interconnect.
- For a fast interconnect, such as InfiniBand, the larger the problem size, the better the scalability of the Hybrid Method relative to the MPP Method.
- For a slow interconnect, such as GigE, the Hybrid Method offers good scalability beyond 96 cores for a problem size of 0.8 million elements.

References

1. Lin, Y. and Wang, J. "Performance of the Hybrid LS-DYNA on Crash Simulation with the Multicore Architecture." 7th European LS-DYNA Conference, Salzburg, May.