David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems




About me (03/11/2010)
- David Rioja Redondo, Telecommunication Engineer (Universidad de Alcalá)
- >2 years building and managing clusters:
  - UPM, School of Aeronautics and Space Engineering (280 cores)
  - Universidade de Coimbra (56 cores)
  - Universidad Carlos III de Madrid (176 cores)
  - Universidad de Castilla-La Mancha (96 cores)
- Network and storage projects
- Microsoft Certified Technology Specialist

About Englobe
- Origins in a global-reference CFD research group
- Building clusters since 2002 (Aeolos)
- HPC Department: computation, high-performance networks, storage systems, service and support
- Microsoft HPC Partner

Contents
- Basics on supercomputing (knowing the problem): the power issue, parallelization, performance
- HPC hardware (alternatives and what they mean for us): architectures, types of parallel machine, high-performance networks, storage
- HPC management (how to control the machine): manager's wishlist, data center strategies, examples & solutions

BASICS ON SUPERCOMPUTING (knowing the problem)

The power issue (or «why parallel?»)
- Old limit: size (MOS capacitance effect). New limit: heat dissipation.
- «Let's increase the clock frequency» → convective cooling obeys dq/dt = h·A_s·(T_s - T_e), with h = F(fluid, surface). Air-conditioning power is limited. Oops!
- «Put in smaller components» → conduction obeys dq/dt = -k·A·(dT/dx)

Parallelization: multicore processors
- Profit from smaller devices by adding more circuits: multiple cores per CPU
- Multiple sockets: several processors in one server, sharing resources (memory, I/O) through the chipset
[diagram: multi-socket server with per-core caches and a shared chipset]

Parallelization: concurrent programming
- Fork-join paradigm: a big problem is forked into threads that run concurrently, then joined into the solution
- Development: OpenMP libraries
- Processes and threads access the same memory space, so mutexes and semaphores are needed
- Most applications exploit multicore systems, but this is usually not enough

Parallelization: distributed memory systems
- Put N servers to work together over a network
- More alternatives in the next block

Parallelization: Message Passing Interface (MPI)
- Each process owns its memory and «asks» for data belonging to others
- Multiple implementations; MS-MPI is based on MPICH2
- A big problem is partitioned among the processes launched by mpirun, and the partial results are merged into the solution

Performance
- Speedup: S_u = T_s / T_p
- Speedup of «N»: what does N mean? Cores? $? Watts?
  - Cores: the developer's choice, but not valid for comparing different architectures
  - $: not only acquisition cost (TCO, development cost, waiting time while running)
  - Power: related to TCO, but a compromise is needed (example: Atom vs. Xeon)
- Performance also depends on the application

Performance: causes of performance loss
- Sequential: S_u = 1
- Embarrassingly parallel: S_u ≈ N
- Process messaging: S_u < N
- Unbalanced system: S_u = ?

Example 1: a performance problem
- Agilent ADS, Momentum Solver: RF circuit simulation
- Calculating 70 frequency points, taking 8:48 minutes on an 8-core server
- Using 1 process with threads
- Strange behaviour: 8 threads taking longer than 4

  Threads   User       Wall       User [s]   Wall [s]
  1         00:14:41   00:14:43   881        883
  2         00:16:43   00:09:29   1003       569
  4         00:20:26   00:08:11   1226       491
  8         00:24:25   00:08:48   1465       528

How large is the parallel part? Model each run as a serial part s_N plus a parallel part p_N, so that user = s_N + p_N and wall = s_N + p_N/N:
- N = 1: s_1 + p_1 = 881 (one equation only; cannot be split)
- N = 2: s_2 + p_2 = 1003 and s_2 + p_2/2 = 569, so s_2 = 135, p_2 = 868
- N = 4: s_4 + p_4 = 1226 and s_4 + p_4/4 = 491, so s_4 = 246, p_4 = 980
- N = 8: s_8 + p_8 = 1465 and s_8 + p_8/8 = 528, so s_8 ≈ 394, p_8 ≈ 1071

Example 1: the solution
- Problem: cores were not being exploited 100%; a single iteration is too short
- Not one problem of 9 minutes but 70 little problems of ~8 seconds each
- Analysing the process: iterations were independent (no need for data from previous frequencies)
- Perfect for a parametric sweep: run the points as concurrent tasks, turning 70 short runs at low core utilisation into ~9 fully utilised waves

Example 1: the results
- Momentum interacting with a queue system
- Hybrid solution: jobs used as many cores as available; tried up to 8 parametric tasks
- This mode allows using more than one node
- Before: >8.5 min. After: <2 min
- Bigger problems scale better, though

HPC HARDWARE (alternatives and what they mean for us)

Architectures
- Vector processors (SIMD): data has to be aligned in memory; one operation is performed on a long array of data
- GPU accelerators: high memory bandwidth and performance, but a bottleneck in communications with the CPU
- General-purpose CPUs: Power, x86_64

Parallel machines: distributed memory systems
- Put N servers to work together, connected through a network

Parallel machines: shared memory systems
- Lots of sockets accessing the same memory and resources
- This is expensive and not so scalable

Parallel machines: virtual shared memory
- Resources are «exported» through a network
- A virtualization layer on each node, connected by the backplane (network), presents the hardware of all nodes to a single operating system as one virtual system

HPC networks
- Target: reduce wait times in MPI programs
- High bandwidth, low latency and scalability are needed
- Approaches:
  - Avoid congestion by using switched fabrics
  - Direct memory access for lower latency
  - Reduce protocol overheads
  - Intelligent NICs to offload communication processing
- Examples: Gigabit Ethernet, 10-Gigabit Ethernet, InfiniBand, Myrinet (Myricom)

HPC networks: what's being used?
- Using top500.org as an indicator

Storage
- HPC systems are continuously generating output data
- Data has to be stored for post-processing
- A critical component: it has to be reliable and accessible
- Example: wrapping + redundancy + replication with a RAID controller

Storage: parallel file systems
- Clients access several storage servers through the LAN in parallel, while a meta-data server tracks where each piece of data lives

HPC MANAGEMENT (how to control the machine)

Manager's wishlist: what do I want as an HPC system manager?
- Work from home, or at least do things from home if needed
- Automatic deployment (WDS)
- Remote management:
  - If the system is up: RDP, cluster console, event viewer
  - If the system is down: IPMI or other KVM redirection
- Infrastructure integration: AD/DS, user and group policies
- Quick troubleshooting: diagnostics and events; reimaging can save time
- HPC nodes don't need much customization: WDS can perform all operations needed to have a freshly installed node

Data center strategies: a basic data center

Data center strategies
- Redundancy on critical components (power supplies, coolers, some disks): reduces downtime
- Enhanced monitoring: detailed information (temperature, power), failure and warning notification
- Failure protection: auto power-off on failure (temperature, long power cut)
- Remote operation: remote desktop and console or KVM; hardware control (IPMI)

Example: data center airflow
- Energy efficiency: good practices can drastically reduce TCO
- Study of airflow through the equipment
- Cold/hot corridor (aisle) approach