SOSCIP Platforms




SOSCIP HPC Platforms
- Blue Gene/Q
- Cloud Analytics
- Agile
- Large Memory System

SOSCIP Platforms: Blue Gene/Q Platform

Top 10 systems on the Top500 list (top500.org)

Rank | Site | System | Cores | Rmax (TFlop/s) | Rpeak (TFlop/s) | Power (kW)
1 | National Super Computer Center in Guangzhou, China | Tianhe-2 (MilkyWay-2): TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P, NUDT | 3,120,000 | 33,862.7 | 54,902.4 | 17,808
2 | DOE/SC/Oak Ridge National Laboratory, United States | Titan: Cray XK7, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x, Cray Inc. | 560,640 | 17,590.0 | 27,112.5 | 8,209
3 | DOE/NNSA/LLNL, United States | Sequoia: BlueGene/Q, Power BQC 16C 1.60 GHz, Custom, IBM | 1,572,864 | 17,173.2 | 20,132.7 | 7,890
4 | RIKEN Advanced Institute for Computational Science (AICS), Japan | K computer, SPARC64 VIIIfx 2.0GHz, Tofu interconnect, Fujitsu | 705,024 | 10,510.0 | 11,280.4 | 12,660
5 | DOE/SC/Argonne National Laboratory, United States | Mira: BlueGene/Q, Power BQC 16C 1.60GHz, Custom, IBM | 786,432 | 8,586.6 | 10,066.3 | 3,945
6 | DOE/NNSA/LANL/SNL, United States | Trinity: Cray XC40, Xeon E5-2698v3 16C 2.3GHz, Aries interconnect, Cray Inc. | 301,056 | 8,100.9 | 11,078.9 | n/a
7 | Swiss National Supercomputing Centre (CSCS), Switzerland | Piz Daint: Cray XC30, Xeon E5-2670 8C 2.600GHz, Aries interconnect, NVIDIA K20x, Cray Inc. | 115,984 | 6,271.0 | 7,788.9 | 2,325
8 | HLRS - Höchstleistungsrechenzentrum Stuttgart, Germany | Hazel Hen: Cray XC40, Xeon E5-2680v3 12C 2.5GHz, Aries interconnect, Cray Inc. | 185,088 | 5,640.2 | 7,403.5 | n/a
9 | King Abdullah University of Science and Technology, Saudi Arabia | Shaheen II: Cray XC40, Xeon E5-2698v3 16C 2.3GHz, Aries interconnect, Cray Inc. | 196,608 | 5,537.0 | 7,235.2 | 2,834
10 | Texas Advanced Computing Center/Univ. of Texas, United States | Stampede: PowerEdge C8220, Xeon E5-2680 8C 2.700GHz, Infiniband FDR, Intel Xeon Phi SE10P, Dell | 462,462 | 5,168.1 | 8,520.1 | 4,510

SOSCIP's Blue Gene/Q Specifications
- 64k-core cluster
- 1.6 GHz cores
- 64 TB RAM
- 5-D torus interconnect
- Measured performance: 716 Tflops
- Efficiency: 2.1 Gflops/W
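These figures can be cross-checked with peak-performance arithmetic. A Blue Gene/Q core can complete up to 8 double-precision floating-point operations per cycle (a 4-wide fused multiply-add), and that factor reproduces Mira's Rpeak in the Top500 table exactly. Assuming "64k" means 65,536 cores and that the 716 Tflops figure is an HPL-style measured result, the estimates work out as:

```latex
\begin{align*}
R_\text{peak} &= N_\text{cores} \times f_\text{clock} \times \text{flops per cycle} \\
\text{Mira check:}\quad 786{,}432 \times 1.6\ \text{GHz} \times 8 &= 10{,}066.3\ \text{TFlop/s} \\
\text{SOSCIP:}\quad 65{,}536 \times 1.6\ \text{GHz} \times 8 &\approx 838.9\ \text{TFlop/s} \\
\text{HPL efficiency} &\approx \frac{716}{838.9} \approx 0.85 \\
P &\approx \frac{716\ \text{TFlop/s}}{2.1\ \text{GFlop/s per W}} \approx 341\ \text{kW}
\end{align*}
```

So the machine would reach roughly 85% of its theoretical peak while drawing on the order of 340 kW; both are estimates derived from the slide's rounded figures, not published measurements.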

Typical Applications
- Disaster planning and mitigation
- Molecular modeling
- Protein folding
- Drug discovery
- Computational fluid dynamics
- Nuclear fusion
- Genomics
- Brain modeling
- Climate and weather
- Complex infrastructure simulation

Suitable Applications
- Large-scale, massively parallel and distributed
- 1,024 cores or more
- Need low-latency, high-bandwidth communication
- Written in C/C++, Fortran, or Python
- Custom and open-source software
- Use MPI and OpenMP

Ocean Mixing Simulation: H. Salehipour & W.R. Peltier (University of Toronto)

SOSCIP Platforms: Cloud Analytics Platform

Big Data Challenges
- Three V's: Volume, Velocity, Variety
- Processing power
- Storage and data locality
- Analytics frameworks

Cloud Platform Hardware
- Powerful x86 and IBM Power Systems servers: x86 NeXtScale, x86 iDataPlex, POWER servers, and ps- & hs-series blades
- GPFS storage system (1 petabyte)
- InfiniBand and 10 GbE networks

Software Available
- IBM InfoSphere Streams
- ILOG CPLEX
- IBM InfoSphere BigInsights
- Plus almost anything from the IBM Academic Initiative catalog!

Typical Applications
- Real-time Medical Data Collection and Analytics
- Text Analytics
- Cybersecurity
- Document Filtering
- Energy Systems Data Analytics
- Image Processing
- Machine Learning
- Social Media Analytics
- Medical Records Analytics

Suitable Applications
- Require IBM analytics software
- Require other commercial software
- Small number of cores (< 100)
- Small clusters
- Big data storage (1-100 TB)

Analyzing Geospatial Patterns: Neil Banerjee (Western University)
- Multidirectional edge detection algorithm for mineral exploration and mining applications
- Detected edges correlate highly with known mineral deposits
- IBM BigInsights used for a high level of automation
Figure panels: Known mineral locations; Hill shading; Feature detection

SOSCIP Platforms: Large Memory System

Big Data and Memory
- Disk storage: ~5-8 GB/s to the CPU, latency > 1 ms
  Pros: inexpensive, large capacity. Cons: slow!
- System memory (DRAM): 50-200 GB/s to the CPU, latency < 100 ns
  Pros: fast! Cons: cost per GB, limited capacity per server

LMS Specification
- 3 nodes acting as a single system via vSMP
- 64 x86 cores
- 4.5 TB RAM (1.5 TB per node)
- Shared memory programming

Suitable Applications
- Need to keep all data in RAM for speed
- Generate large amounts of intermediate data
- Need > 128 GB of RAM
- Require a medium number of cores (< 64)
- Need the shared memory programming paradigm
- Use commercial software

SOSCIP Platforms: Agile Computing Platform

The Case for FPGAs
Figure panels: CPU scaling; ICT power consumption
FPGA acceleration offers:
- Algorithms in reconfigurable circuitry
- High-performance parallelism
- High power efficiency

Agile Computing Platform
- Development environment: fast x86 servers with tools to simulate, debug, and build, plus a development kit
- Runtime environment: x86 and POWER8 servers with Stratix V FPGAs, and a 10 GbE FPGA network on POWER8

Suitable Applications
- Real-time or time-critical processing
- Compute intensive
- Exploit parallelism in depth and/or width
- Wide vectorization
- Big data
- Non-traditional data types

Typical Applications
- Health/Medical systems
- Image/Video Processing
- Machine Learning
- Signal Processing
- Data Security
- Big Data Analytics

Real-time fMRI Brain Analytics: Mark Daley (Western University)
- The problem: brain activity scans take days to analyze
- The solution: an FPGA-accelerated real-time analytics engine
- The FPGA replaces 48 x86 cores and implements a superior motion correction algorithm
- IBM InfoSphere Streams on POWER constructs graphs of brain networks 40x faster than a single process on x86
- Graph updates every 0.6-0.8 s
- Results in seconds instead of days!

When am I ready for SOSCIP?
- Scaling computing power
- Scaling storage size
- Unique technology needs
- Software needs

Platform Summary

Platform | CPU | Operating Systems | Commercial Software | Languages | Support
Blue Gene/Q | PowerPC | Linux | No | C, C++, Python, Fortran | SciNet, IBM specialist
Large Memory System | x86 | Linux | Yes | All | HPCVL
Cloud Analytics | x86, POWER | Linux, Windows, AIX | Yes | All | SHARCNET, IBM specialist
Agile | x86, POWER8 | Linux | Yes | All + OpenCL, Verilog/VHDL | SHARCNET, IBM specialist

Notices: POWER8, Power Systems logo, InfoSphere, InfoSphere Streams logo, InfoSphere BigInsights, InfoSphere BigInsights logo, SPSS, Cognos, Rational, and the IBM logo are trademarks or registered trademarks of International Business Machines Corp. Altera, Quartus II, and Stratix are trademarks of Altera Corp. ModelSim is a trademark of Mentor Graphics. OpenCL and the OpenCL logo are trademarks of Apple Inc. Matlab and the Matlab logo are trademarks of Mathworks Inc. Python and the Python logo are trademarks of the Python Software Foundation. Nallatech is a trademark of Interconnect Systems Inc.

Backup

What are FPGAs?
- FPGA = Field Programmable Gate Array
- FPGA chips have 100,000s of configurable logic circuits (logic elements), linked by interconnect and backed by onboard RAM
- Configure groups of logic elements to construct function blocks such as Load, Multiply, Add, and Store
- Connect several blocks into a data pipeline
- Simultaneously work on different elements of a data stream at each stage on every clock cycle

OpenCL™ kernel code, compiled by the OpenCL compiler into an FPGA programming file:

    __attribute__((num_simd_work_items(4)))
    __attribute__((reqd_work_group_size(64,1,1)))
    __attribute__((num_compute_units(2)))
    __kernel void vectoradd(__global const int *x,
                            __global const int *y,
                            __global int *restrict z)
    {
        /* Each work-item adds one pair of elements. */
        int index = get_global_id(0);
        z[index] = x[index] + y[index];
    }

Host code that launches the kernel:

    clEnqueueNDRangeKernel(queue, kernel, dim, offset, size, local_size, ...);

CAPI: Coherent Accelerator Processor Interface
Diagram: the POWER8 processor (SMP links, CAPP, RAM) connects over PCIe to the PSL on the accelerator.
CAPI protocol features:
- Shared virtual address space
- Hardware-managed cache coherence
CAPP: Coherent Accelerator Processor Proxy; PSL: POWER Service Layer