Awareness of MPI Virtual Process Topologies on the Single-Chip Cloud Computer
Steffen Christgau, Bettina Schnor
Potsdam University, Institute of Computer Science, Operating Systems and Distributed Systems
21 May 2012
Outline
1. The Single-Chip Cloud Computer (SCC)
2. RCKMPI: MPI on the SCC
3. Topology Awareness for RCKMPI: The Concept
4. Topology Awareness for RCKMPI: Evaluation
5. Summary and Future Work
Bettina Schnor (Potsdam University) MPI Topology Awareness on the SCC Frame 2 of 19
1. The Single-Chip Cloud Computer (SCC)
Many-core Applications Research Community (MARC)
- established by Intel
- research of universities together with Intel
- world-wide community (dominated by European institutions)
- website, regular symposia (every 6 months, 5 up to now)
Our group is a MARC member
- Focus: application scalability
- Experience with parallel ASP and climate simulation
Single-Chip Cloud Computer
[Figure: SCC tile and chip layout. Labels: Core0 and Core1, each with a 256 KB L2 cache; router; 16 KB MPB; memory controllers MC 0 to MC 3; VRC; SIF]
- 24 tiles, 48 P54C cores, connected via Network-on-Chip, no cache coherence
- fast 16 KB SRAM on each tile: the Message Passing Buffer (MPB)
2. RCKMPI: MPI on the SCC
fork of MPICH2
[Figure: MPICH2 software stack. Labels: Application; Message Passing Interface; MPICH2 (MPI implementation); ROMIO (MPI-IO implementation) with ADIO; Process Management Interface; Abstract Device Interface (ADI3); devices Channel-3, BG, Cray; channels Nemesis, Sock, SCCMPB, SCCSHM, SCCMulti]
SCCMPB uses the fast Message Passing Buffer of each tile as shared memory and divides it into n equal-size Exclusive Write Sections (EWS): remote write, local read.
[Figure: MPI_COMM_WORLD with MPI ranks 0 to n on cores 0 to n; the MPB of core/rank 0 holds one section per rank (rank 1, ..., rank n-1, rank n)]
Comparison of different CH3 devices at maximum Manhattan distance
[Plot: bandwidth / MByte/s over message size / Byte; series: RCKMPI sccmulti CH3 device, RCKMPI sccmpb CH3 device, RCKMPI sccshm CH3 device]
Bandwidths for Manhattan distance 0, 5 and 8 (two processes started)
[Plot: bandwidth / MByte/s over message size / Byte; series: Core 00 and 01, Core 00 and 10, Core 00 and [label truncated]]
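The distances quoted for these core pairs can be reproduced with a small helper. This is a sketch under assumptions the slides do not state: a 6 x 4 tile mesh, two cores per tile, cores numbered tile by tile and tiles numbered row by row. The name `manhattan_distance` is hypothetical and not part of RCKMPI.

```c
#include <stdlib.h>

/* Assumed SCC layout: 6 x 4 tile mesh, two cores per tile,
 * cores numbered tile by tile, tiles numbered row by row. */
enum { MESH_COLS = 6, CORES_PER_TILE = 2 };

/* Number of mesh hops between the tiles that host two cores. */
static int manhattan_distance(int core_a, int core_b)
{
    int tile_a = core_a / CORES_PER_TILE;
    int tile_b = core_b / CORES_PER_TILE;
    int dx = abs(tile_a % MESH_COLS - tile_b % MESH_COLS);
    int dy = abs(tile_a / MESH_COLS - tile_b / MESH_COLS);
    return dx + dy;
}
```

With this numbering, `manhattan_distance(0, 1)` is 0 (same tile), `manhattan_distance(0, 10)` is 5, and the two opposite corners, cores 0 and 47, are 8 hops apart, the maximum on such a mesh.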
Bandwidths for maximum Manhattan distance 8 and varied number of MPI processes
[Plot: bandwidth / MByte/s over message size / Byte; series: one per number of started MPI processes, including 12, 24 and 48 MPI processes]
Remember: SCCMPB uses the fast Message Passing Buffer of each tile as shared memory (total 384 KB).
[Figure: MPI_COMM_WORLD with MPI ranks 0 to n on cores 0 to n; the MPB of core/rank 0 holds one section per rank]
The MPB is divided equally into n sections, so the section size depends on the number of started MPI processes.
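How quickly the sections shrink can be estimated with a few lines of C. This is only a sketch: it assumes that each core owns half of its tile's 16 KB MPB (8 KB) and that sections are rounded down to whole 32-byte cache lines. Neither figure is stated on the slides, and `ews_bytes` is a hypothetical helper, not an RCKMPI function.

```c
#include <stddef.h>

/* Assumptions: 8 KB of MPB per core, 32-byte cache lines,
 * sections rounded down to whole cache lines. */
enum { MPB_BYTES_PER_CORE = 8192, CACHE_LINE_BYTES = 32 };

/* Bytes available to each of the nprocs writers in one core's MPB. */
static size_t ews_bytes(int nprocs)
{
    size_t total_lines = MPB_BYTES_PER_CORE / CACHE_LINE_BYTES;
    size_t lines_per_section = total_lines / (size_t)nprocs;
    return lines_per_section * CACHE_LINE_BYTES;
}
```

Under these assumptions `ews_bytes(2)` gives 4096 bytes per section, while `ews_bytes(48)` leaves only 160 bytes, that is, five cache lines per communication partner.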
3. Topology Awareness for RCKMPI: The Concept
The bandwidth between two RCKMPI processes depends on the number of started MPI processes, since a fully connected network between all MPI processes is managed.
Application Behaviour
[Figure: task interaction graph of process 1, process 2, ..., process n; labels Tv, qv, Tu, qu, uu, vv, uv]
Goal: The bandwidth between communicating processes, the so-called neighbors in the Task Interaction Graph, should be increased.
Requirements:
1. An improved MPB layout must consider both communication neighbors and group communication (barriers, broadcasts, gather/scatter, ...).
2. Each MPI process has to know its new offset within all remote MPBs.
Putting things together...
[Figure: three MPB layouts.
Original layout: one write section per process (proc. 1, proc. 2, ..., proc. n-1, proc. n), each holding a channel header and message payload.
New layout without topology information: channel headers p 1, p 2, ..., p n, followed by a payload area with sections for proc. 1, proc. 2, ..., proc. n-1, proc. n.
New layout with topology information: channel headers p 1, p 2, ..., p n, followed by a payload area with sections labeled p1, proc. 2, pn.]
Internal barrier for the recalculation phase of new MPB addresses.
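One way to picture the offset recalculation is the following sketch. It is not the RCKMPI implementation: the layout rule (all channel headers first, then a payload area split evenly among the neighbors only), the 8 KB per-core MPB size, the 32-byte cache line and all names are assumptions made for illustration. It also assumes that the headers leave room for a payload area, which holds for 48 processes with headers of a few cache lines.

```c
#include <stddef.h>

/* Assumptions: 8 KB of MPB per core, 32-byte cache lines. */
enum { MPB_BYTES_PER_CORE = 8192, CACHE_LINE_BYTES = 32 };

struct mpb_slot {
    size_t header_offset;   /* start of the writer's channel header */
    size_t payload_offset;  /* start of its payload section, if any */
    size_t payload_bytes;   /* 0 when the writer is not a neighbor */
};

/* Hypothetical layout rule: nprocs headers of header_lines cache lines
 * each come first; the remaining space is split evenly among nneighbors
 * writers. neighbor_index is the writer's position among the receiver's
 * neighbors, or -1 if the writer is not a neighbor. */
static struct mpb_slot mpb_slot_for(int writer, int nprocs, int header_lines,
                                    int neighbor_index, int nneighbors)
{
    size_t header_bytes = (size_t)header_lines * CACHE_LINE_BYTES;
    size_t payload_base = (size_t)nprocs * header_bytes;
    struct mpb_slot slot = { (size_t)writer * header_bytes, 0, 0 };

    if (neighbor_index >= 0 && nneighbors > 0) {
        size_t lines = (MPB_BYTES_PER_CORE - payload_base)
                       / CACHE_LINE_BYTES / (size_t)nneighbors;
        slot.payload_bytes = lines * CACHE_LINE_BYTES;
        slot.payload_offset = payload_base
                              + (size_t)neighbor_index * slot.payload_bytes;
    }
    return slot;
}
```

For 48 processes, headers of two cache lines and two neighbors, the headers occupy 3072 bytes and each neighbor receives a payload section of 2560 bytes, compared with roughly 170 bytes per writer when 8 KB is split evenly among 48 processes. Every process would have to evaluate the same rule for every remote MPB, which is why the recalculation is a collective step.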
MPI offers an API to specify a virtual process topology:

#define NUM_DIMS    /* number of grid dimensions; the value is missing in the source */

int grid_dims[NUM_DIMS];
int grid_periods[NUM_DIMS];
MPI_Comm comm_topo;

/* for a grid, set all items of grid_periods to 0;
   zeroed grid_dims entries let MPI_Dims_create choose the extents */
for (int i = 0; i < NUM_DIMS; i++) {
    grid_periods[i] = 0;
    grid_dims[i] = 0;
}

MPI_Dims_create(numprocs, NUM_DIMS, grid_dims);
/* reorder = 1 allows the MPI library to renumber the ranks */
MPI_Cart_create(MPI_COMM_WORLD, NUM_DIMS, grid_dims,
                grid_periods, 1, &comm_topo);
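Which ranks end up as neighbors in such a Cartesian topology can be worked out without an MPI installation, because MPI numbers the processes of a Cartesian communicator in row-major order. The sketch below mimics what `MPI_Cart_shift` reports for one direction. The function `cart_neighbor` is a hypothetical stand-in written for illustration, and it returns -1 where MPI would return `MPI_PROC_NULL`.

```c
/* Rank of the neighbor `disp` steps away along dimension `dim` in a
 * row-major Cartesian grid, or -1 where a non-periodic grid ends. */
static int cart_neighbor(int rank, int ndims, const int dims[],
                         const int periods[], int dim, int disp)
{
    int stride = 1;  /* rank difference of one step along dim */
    for (int d = ndims - 1; d > dim; d--)
        stride *= dims[d];

    int coord = (rank / stride) % dims[dim];
    int moved = coord + disp;

    if (periods[dim])
        moved = ((moved % dims[dim]) + dims[dim]) % dims[dim];  /* wrap around */
    else if (moved < 0 || moved >= dims[dim])
        return -1;  /* stepped off the edge of the grid */

    return rank + (moved - coord) * stride;
}
```

In a non-periodic 8 x 6 grid of 48 processes, rank 0 has the neighbors 6 (dimension 0) and 1 (dimension 1) and no neighbor in the negative directions. In a periodic 1D topology of 48 processes, rank 0 has the neighbors 47 and 1, so only two of the 47 possible partners need large payload sections.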
4. Topology Awareness for RCKMPI: Evaluation
[Plot: bandwidth / MByte/s over message size / Byte; series: enhanced RCKMPI with 1D topology (48 procs, 2 cache lines), enhanced RCKMPI with 1D topology (48 procs, 3 cache lines), enhanced RCKMPI without topology (48 procs)]
Results for 2D CFD application with ring topology:
[Figure: ring of process 1, process 2, ..., process n]
[Plot: speedup over number of processes; series: enhanced RCKMPI with topology information (2 CL), original RCKMPI]
Summary and Future Work
- The SCC is equipped with a fast NoC.
- The Message Passing Buffer on each tile is beneficial for fast communication.
- MPI applications gain bandwidth by using MPI's virtual process topologies and rearranging the MPB layout.
Current/Future Work:
- Comparison with I. C. Ureña and M. Gerndt: "Improved RCKMPI's SCCMPB Channel: Scaling and Dynamic Processes Support", ARCS 2012
- Fixed the one-sided communication in RCKMPI, which allows support of applications based on Global Arrays
More informationQoSIP: A QoS Aware IP Routing Protocol for Multimedia Data
QoSIP: A QoS Aware IP Routing Protocol for Multimedia Data Md. Golam Shagadul Amin Talukder and Al-Mukaddim Khan Pathan* Department of Computer Science and Engineering, Metropolitan University, Sylhet,
More informationPerformance Evaluation of 2D-Mesh, Ring, and Crossbar Interconnects for Chip Multi- Processors. NoCArc 09
Performance Evaluation of 2D-Mesh, Ring, and Crossbar Interconnects for Chip Multi- Processors NoCArc 09 Jesús Camacho Villanueva, José Flich, José Duato Universidad Politécnica de Valencia December 12,
More informationIn-Memory Computing for Iterative CPU-intensive Calculations in Financial Industry In-Memory Computing Summit 2015
In-Memory Computing for Iterative CPU-intensive Calculations in Financial Industry In-Memory Computing Summit 2015 June 29-30, 2015 Contacts Alexandre Boudnik Senior Solution Architect, EPAM Systems Alexandre_Boudnik@epam.com
More informationAchieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging
Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.
More informationPRIMERGY server-based High Performance Computing solutions
PRIMERGY server-based High Performance Computing solutions PreSales - May 2010 - HPC Revenue OS & Processor Type Increasing standardization with shift in HPC to x86 with 70% in 2008.. HPC revenue by operating
More informationSAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011
SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications Jürgen Primsch, SAP AG July 2011 Why In-Memory? Information at the Speed of Thought Imagine access to business data,
More informationInterconnection Networks. Interconnection Networks. Interconnection networks are used everywhere!
Interconnection Networks Interconnection Networks Interconnection networks are used everywhere! Supercomputers connecting the processors Routers connecting the ports can consider a router as a parallel
More informationOperating System Support for Multiprocessor Systems-on-Chip
Operating System Support for Multiprocessor Systems-on-Chip Dr. Gabriel marchesan almeida Agenda. Introduction. Adaptive System + Shop Architecture. Preliminary Results. Perspectives & Conclusions Dr.
More informationUsing IPv6 and 6LoWPAN for Home Automation Networks
Using IPv6 and 6LoWPAN for Home Automation Networks Thomas Scheffler / Bernd Dörge ICCE-Berlin Berlin, 06.09.2011 Overview IPv6 and 6LoWPAN for Home Automation Networks 6LoWPAN Application & Network Architecture
More informationCOSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters
COSC 6374 Parallel I/O (I) I/O basics Fall 2012 Concept of a clusters Processor 1 local disks Compute node message passing network administrative network Memory Processor 2 Network card 1 Network card
More informationPerformance Evaluation of the XDEM framework on the OpenStack Cloud Computing Middleware
Performance Evaluation of the XDEM framework on the OpenStack Cloud Computing Middleware 1 / 17 Performance Evaluation of the XDEM framework on the OpenStack Cloud Computing Middleware X. Besseron 1 V.
More informationMicrosoft SMB 2.2 - Running Over RDMA in Windows Server 8
Microsoft SMB 2.2 - Running Over RDMA in Windows Server 8 Tom Talpey, Architect Microsoft March 27, 2012 1 SMB2 Background The primary Windows filesharing protocol Initially shipped in Vista and Server
More informationLoad Imbalance Analysis
With CrayPat Load Imbalance Analysis Imbalance time is a metric based on execution time and is dependent on the type of activity: User functions Imbalance time = Maximum time Average time Synchronization
More informationInterconnection Network
Interconnection Network Recap: Generic Parallel Architecture A generic modern multiprocessor Network Mem Communication assist (CA) $ P Node: processor(s), memory system, plus communication assist Network
More informationPerformance analysis with Periscope
Performance analysis with Periscope M. Gerndt, V. Petkov, Y. Oleynik, S. Benedict Technische Universität München September 2010 Outline Motivation Periscope architecture Periscope performance analysis
More informationD1.2 Network Load Balancing
D1. Network Load Balancing Ronald van der Pol, Freek Dijkstra, Igor Idziejczak, and Mark Meijerink SARA Computing and Networking Services, Science Park 11, 9 XG Amsterdam, The Netherlands June ronald.vanderpol@sara.nl,freek.dijkstra@sara.nl,
More informationArchitecture of distributed network processors: specifics of application in information security systems
Architecture of distributed network processors: specifics of application in information security systems V.Zaborovsky, Politechnical University, Sait-Petersburg, Russia vlad@neva.ru 1. Introduction Modern
More informationWorkshop on Parallel and Distributed Scientific and Engineering Computing, Shanghai, 25 May 2012
Scientific Application Performance on HPC, Private and Public Cloud Resources: A Case Study Using Climate, Cardiac Model Codes and the NPB Benchmark Suite Peter Strazdins (Research School of Computer Science),
More information