High Performance Computing (HPC). CAEA eLearning Series. Jonathan G. Dudley, Ph.D. 06/09/2015. 2015 CAE Associates




Agenda
- Introduction
- HPC Background: Why HPC, SMP vs. DMP, Licensing
- HPC Terminology
- Types of HPC: HPC Cluster & Workstation
- HPC Hardware Components: CPU vs. Cores, GPU vs. Phi, HDD vs. SSD, Interconnects
- GPU Acceleration

CAE Associates Inc. An engineering consulting firm in Middlebury, CT, specializing in FEA and CFD analysis. An ANSYS Channel Partner since 1985, providing sales of the ANSYS products, training, and technical support.

e-Learning Webinar Series. This presentation is part of a series of e-learning webinars offered by CAE Associates. You can view many of our previous e-learning sessions either on our website or on the CAE Associates YouTube channel. If you are a New Jersey or New York resident, you can earn continuing education credit by attending the full webinar and completing a survey, which will be emailed to you after the presentation.

CAEA Resource Library. Our Resource Library contains over 250 items, including:
- Consulting case studies
- Conference and seminar presentations
- Software demonstrations
- Useful macros and scripts
The content is searchable, and you can download copies of the material to review at your convenience.

CAEA Engineering Advantage Blog. Our Engineering Advantage Blog offers weekly insights from our experienced technical staff.

CAEA ANSYS Training. Classes can be held at our Training Center at CAE Associates or on-site at your location. CAE Associates is offering online training classes in 2015! Registration is available on our website.

Agenda
- Introduction
- HPC Background: Why HPC, Licensing, SMP vs. DMP
- HPC Terminology
- Types of HPC: HPC Cluster & Workstation
- HPC Hardware Components: CPU vs. Cores, GPU vs. Phi, HDD vs. SSD, Interconnects
- GPU Acceleration

Why High Performance Computing (HPC)? Remove computing limitations from engineers in all phases of design, analysis, and testing:
- Impact product design: faster simulation, more efficient parametric studies
- Larger models: more accuracy (turbulence modeling, particle tracking), more refined models
- Design optimization: more runs for a fixed hardware configuration

Why HPC? Using today's multicore computers well is key for companies to remain competitive. The ANSYS HPC product suite allows scalability to whatever computational level is required, from single-user or small-user-group options at entry level up to virtually unlimited parallel capacity or large-user-group options at enterprise level.
- Reduce turnaround time
- Examine more design variants faster
- Simulate larger or more complex models

4 Main Product Licenses
- HPC: per-process parallel license.
- HPC Pack: HPC product rewarding volume parallel processing for high-fidelity simulations. Each simulation consumes one or more Packs; the parallel capacity enabled increases quickly with added Packs (see the table and sketch below).
- HPC Workgroup: HPC product rewarding volume parallel processing for increased simulation throughput, shared among engineers throughout a single location or the world. 16 to 32768 parallel processes shared across any number of simulations, served from a single license server.
- HPC Parametric Pack: enables simultaneous execution of multiple design points while consuming just one set of licenses.

Cores enabled per HPC Packs consumed per simulation:
HPC Packs per simulation:  1    2    3    4     5     6     7
Cores enabled:             8    32   128  512   2048  8192  32768
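
The core counts in the table follow a simple geometric pattern: each additional Pack quadruples the enabled capacity. A minimal sketch in plain C (our own illustration, not part of any ANSYS tooling) that reproduces the table:

    /* Sketch: HPC Pack scaling pattern from the table above.
       cores(n) = 2 * 4^n, giving 8, 32, 128, ... 32768 for n = 1..7. */
    #include <stdio.h>

    int main(void)
    {
        long cores = 2;                      /* 2 * 4^0; one pack enables 2*4 = 8 */
        for (int packs = 1; packs <= 7; packs++) {
            cores *= 4;                      /* each added pack quadruples capacity */
            printf("%d pack(s) -> %ld cores\n", packs, cores);
        }
        return 0;
    }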

Poll #01

Shared and Distributed Memory
- Shared memory (single machine): Shared Memory Parallel (SMP) systems share a single global memory image that may be distributed physically across multiple cores but is globally addressable. OpenMP is the industry standard.
- Distributed memory: Distributed Memory Parallel (DMP) processing assumes that the physical memory for each process is separate from all other processes. Requires message-passing software to communicate between cores. MPI is the industry standard.
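
A minimal sketch contrasting the two models named above: OpenMP threads share one address space (SMP), while MPI ranks each own private memory and combine results by message passing (DMP). This is a generic hybrid MPI+OpenMP illustration, not ANSYS code; the build line is a typical one and may differ on your system.

    /* Build (typical): mpicc -fopenmp hybrid.c -o hybrid && mpirun -np 4 ./hybrid */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);              /* DMP: one process per rank */
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double local = 0.0;
        /* SMP: threads within this rank share 'local' via an OpenMP reduction */
        #pragma omp parallel for reduction(+:local)
        for (int i = 0; i < 1000000; i++)
            local += 1.0;

        double total = 0.0;                  /* DMP: combine ranks by message passing */
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("global sum = %.0f\n", total);

        MPI_Finalize();
        return 0;
    }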

Distributed ANSYS Architecture
Domain decomposition approach (see the sketch after this list):
- Break the problem into N pieces
- Solve the global problem independently within each domain
- Communicate information across the boundaries as necessary
The Sparse, PCG, and LANPCG solvers all support distributed solution. DMP runs on a single node or a cluster; SMP is for a single node only. The entire SOLVE phase is parallel: more computations are performed in parallel, with faster solution time.
Benefits:
- Better speed-ups than SMP: can achieve >4x speed-up on 8 cores (try getting that with SMP!)
- Can be used for jobs running on hundreds of cores
- Can take advantage of resources on multiple machines
- Memory usage and bandwidth scale
- Disk (I/O) usage scales (i.e., parallel I/O)
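
To make the decomposition idea concrete, here is a sketch that splits a 1-D field into one piece per MPI rank, works independently inside each piece, and exchanges only the boundary (halo) values with neighbors. It illustrates the concept only; it is not Distributed ANSYS internals.

    #include <mpi.h>
    #include <stdio.h>

    #define LOCAL 4                          /* interior cells per rank */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double u[LOCAL + 2];                 /* +2 ghost cells for neighbor data */
        u[0] = u[LOCAL + 1] = -1.0;          /* sentinel at physical boundaries */
        for (int i = 1; i <= LOCAL; i++)
            u[i] = rank;                     /* each subdomain computed independently */

        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* communicate across subdomain boundaries, as the slide describes */
        MPI_Sendrecv(&u[LOCAL], 1, MPI_DOUBLE, right, 0,
                     &u[0],     1, MPI_DOUBLE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[1],         1, MPI_DOUBLE, left,  1,
                     &u[LOCAL + 1], 1, MPI_DOUBLE, right, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("rank %d: left ghost = %g, right ghost = %g\n",
               rank, u[0], u[LOCAL + 1]);
        MPI_Finalize();
        return 0;
    }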

ANSYS Mechanical Scaling [benchmark chart]: 6M degrees of freedom; plasticity, contact, bolt pretension; 4 load steps; v15.

Parallel Settings: ANSYS APDL [launcher screenshots]
- SMP with GPU acceleration settings
- DMP: for multi-core or multi-node processing
- For GPU acceleration using DMP: on the Customization/Preferences tab, under Additional Parameters, add the command-line argument -acc nvidia

Parallel Settings: ANSYS CFX/Fluent [screenshots]: CFX parallel settings options; Fluent parallel settings for multi-core processing and GPU acceleration.

2 Common Types of HPC
- HPC Cluster: communication via a series of switches and interconnects (Infiniband; Gigabit Ethernet at 1 Gb/s or 10 Gb/s; fiber). Scalable: a DOE supercomputer reaches 1.6M cores.
- HPC Workstation: a single desktop; more than 2 cores, commonly 8 or more. Current quad-socket builds: Xeon E5-4600 with up to 48 cores and up to 1 TB of DDR3-1866 RAM.

Poll #02

PC Components

Central Processing Unit and Cores
- Intel Xeon E5 processor series: 4-18 cores per CPU; frequency 1.8-3.5 GHz; L3 cache up to 2.5 MB/core; bus: 6.4-9 GT/s QPI
- Intel Xeon E7 processor series: 4-18 cores per CPU; frequency 1.9-3.2 GHz; L3 cache up to 2.5 MB/core; bus: 6.4-9 GT/s QPI (quad-socket motherboards)
- RAM: DDR4 supports 2000-4000 MT/s (10^6 transfers/s); DDR3 supports 800-2000 MT/s
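
The transfer rates above translate directly into peak memory bandwidth. A minimal sketch, assuming a 64-bit (8-byte) bus per channel and a 4-channel configuration typical of Xeon E5 platforms (the channel count is our assumption, not stated on the slide); the MT/s values are standard speeds within the slide's ranges:

    #include <stdio.h>

    int main(void)
    {
        const double mts[] = {1866, 2133, 2400};    /* example DDR3/DDR4 rates */
        const int channels = 4;                     /* assumed channel count */
        const int bytes_per_transfer = 8;           /* 64-bit channel width */

        for (int i = 0; i < 3; i++) {
            double gbs = mts[i] * 1e6 * bytes_per_transfer * channels / 1e9;
            printf("%4.0f MT/s x %d channels -> %.1f GB/s peak\n",
                   mts[i], channels, gbs);
        }
        return 0;
    }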

Graphics and Co-Processing Units
GPU-accelerated computing is the use of a graphics processing unit (GPU) together with a CPU to accelerate scientific, analytics, engineering, consumer, and enterprise applications. A co-processor is a processor (PCI card) used to supplement the functions of the primary processor: floating-point arithmetic, signal processing.
Supported GPU cards (Mechanical and Fluent only; 64-bit Windows or Linux x64):
- Tesla K10 and K20 series
- Quadro 6000
- Quadro K5000 and K6000
Supported co-processors:
- Xeon Phi 3000, 5000, 7000 series (ANSYS Mechanical only)

Improved Parallel Performance & Scaling: ANSYS Fluent [benchmark chart].

GPU Acceleration
ANSYS Mechanical:
- For models with solid elements and >500k DOF; DMP is preferred
- For >5M DOF, add another card or use a single card with 12 GB (K40, K6000)
- PCG/JCG solver: MSAVE off; models with a lower Lev_Diff are better suited
ANSYS Fluent:
- Cases with a higher AMG solver workload are ideal for GPU acceleration; coupled problems benefit from GPUs
- The whole problem must fit on the GPU: 1e6 cells require ~4 GB of GPU RAM (sizing sketch below)
- Better performance with lower CPU core counts: 3 to 4 CPU cores per GPU
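
The Fluent sizing rule lends itself to quick arithmetic. A minimal sketch applying the slide's ~4 GB per million cells figure against the public 12 GB capacity of a Tesla K40; the cards_needed helper is our own illustration, not an ANSYS or NVIDIA API:

    /* Build (typical): cc gpu_sizing.c -lm -o gpu_sizing */
    #include <math.h>
    #include <stdio.h>

    static int cards_needed(double million_cells, double gb_per_card)
    {
        double gb_needed = 4.0 * million_cells;     /* ~4 GB per 1e6 cells */
        return (int)ceil(gb_needed / gb_per_card);
    }

    int main(void)
    {
        printf("2M-cell case on 12 GB K40s: %d card(s)\n", cards_needed(2.0, 12.0));
        printf("5M-cell case on 12 GB K40s: %d card(s)\n", cards_needed(5.0, 12.0));
        return 0;
    }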

GPU/Co-Processing Licensing
Licensing options: HPC Packs for quick scale-up; HPC Workgroup for flexibility. GPUs are treated the same as CPU cores in the licensing model. As you scale up, license cost per core decreases.

Poll #03

Hard Disk: SAS & SATA
- Conventional SATA (7200 RPM and 10k RPM): ideal for volume storage; cheapest
- Serial Attached SCSI (SAS) 15k RPM drives (RAID 0): ideal scratch-space drives
- Solid State Drives (2.5" SSD): fastest read/write operations; lower power, cooler, quieter; no mechanical parts; ideal for the OS drive; highest cost per GB

Interconnects
Internal (controlled by the motherboard):
- Intel QuickPath Interconnect (QPI) (40+ GB/s)
- PCIe: 2.0 x8 = 32 Gb/s; 3.0 x8 = 63 Gb/s; 4.0 x8 = 125 Gb/s (see the sketch below)
External:
- Gigabit Ethernet over CAT5e (1 Gb/s)
- Ethernet RDMA (40 Gb/s)
- Fibre Channel (16 Gb/s)
- Infiniband (56 Gb/s)
Mechanical/APDL requires at least a 10 Gb/s interconnect to scale past one node. Prefer Infiniband QDR/FDR; FDR for large clusters.
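
The PCIe figures above follow from first principles: per-lane signaling rate times lane count times line-code efficiency. The rates and encodings (PCIe 2.0: 5 GT/s with 8b/10b; 3.0: 8 GT/s and 4.0: 16 GT/s with 128b/130b) are public PCIe specifications; note PCIe 4.0 x8 computes to ~126 Gb/s, consistent with the slide's ~125.

    #include <stdio.h>

    int main(void)
    {
        struct { const char *gen; double gts; double eff; } pcie[] = {
            {"2.0",  5.0, 8.0 / 10.0},      /* 8b/10b encoding */
            {"3.0",  8.0, 128.0 / 130.0},   /* 128b/130b encoding */
            {"4.0", 16.0, 128.0 / 130.0},
        };
        const int lanes = 8;
        for (int i = 0; i < 3; i++) {
            double gbps = pcie[i].gts * lanes * pcie[i].eff;   /* Gb/s */
            printf("PCIe %s x%d = %.0f Gb/s\n", pcie[i].gen, lanes, gbps);
        }
        return 0;
    }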

Basic Guidelines
- Faster cores = faster solution
- Faster RAM = faster solution, but be aware of memory bandwidth
- Faster HD = faster solution, especially for intensive I/O: RAID 0 across multiple disks; SSD or SAS 15k drives; parallel file systems
- 4 GB RAM/core for ANSYS CFD
- Hyper-Threading: off
- Turbo Boost: only for low core counts
Faster is better! More is better. But you must balance budget against performance.

Poll #04

HPC Revolution. Every computer today is a parallel computer. Every simulation in ANSYS can benefit from parallel processing.

Questions? 2015 CAE Associates