Introduction to HPC. Lecture 17


1 Introduction to HPC Lecture 17 Dept of Computer Science Clusters 1

2 Clusters
Recall: bus-connected SMPs (UMAs). [Diagram: four processors, each with a cache, sharing a single bus to memory and I/O.]
- Caches are used to reduce latency and to lower bus traffic
- Hardware must be provided for cache coherence and process synchronization
- Bus traffic and bandwidth limit scalability (to roughly 36 processors)

3 Network Connected Multiprocessors
[Diagram: processors, each with a cache and a memory, connected through an interconnection network (IN).]
Either a single address space (NUMA and ccNUMA) with implicit processor communication via loads and stores, or multiple private memories with message-passing communication via sends and receives. The interconnection network supports interprocessor communication.
Networks (adapted; see the references). Facets people talk a lot about: direct (point-to-point) vs. indirect (multi-hop); topology (e.g., bus, ring, DAG); routing algorithms; switching (aka multiplexing); wiring (e.g., choice of media: copper, coax, fiber). What really matters: latency, bandwidth, cost, reliability.

4 Interconnections (Networks)
Examples:
- MPP and clusters: 100s to 10s of 1000s of nodes; about 100 meters per link
- Local area networks (LAN): 100s to 1000s of nodes; a few 1000 meters
- Wide area networks (WAN): 1000s of nodes; up to 5,000,000 meters
MPP = Massively Parallel Processor
Three cultures for three classes of networks:
- MPP and clusters: latency and bandwidth
- LAN: workstations, cost
- WAN: telecommunications, revenue

5 Network Performance Measures
Universal performance metrics, from sender to receiver:
[Timing diagram: sender overhead (processor busy), transmission time (size / bandwidth), time of flight, transport delay, receiver overhead (processor busy), total delay.]
Total Delay = Sender Overhead + Time of Flight + Message Size / BW + Receiver Overhead
The header and trailer are included in the bandwidth term.

6 Simplified Latency Model
Total Delay = Latency + Message Size / BW
Latency = Sender Overhead + Time of Flight + Receiver Overhead
Example: plot effective bandwidth (Mbit/s) vs. message size as latency and link bandwidth vary:
- Latency (overhead): 1, 25, 500 µs
- BW: 10, 100, 1000 Mbit/s (factors of 10)
- Message size: 16 bytes to 4 MB (factors of 4)
Question: if the overhead is 500 µs, how big must a message be to exceed 10 Mbit/s of effective bandwidth?
[Plot: effective bandwidth vs. message size for the nine overhead/bandwidth combinations, from o1/bw1000 down to o500/bw10.]

Example performance measures:
  Interconnect           MPP            LAN           WAN
  Example                CM-5           Ethernet      ATM
  Bisection BW           N x 5 MB/s                   N x 10 MB/s
  Int./Link BW           20 MB/s                      10 MB/s
  Transport latency      5 µs           15 µs         50 to 10,000 µs
  HW overhead to/from    0.5/0.5 µs     6/6 µs        6/6 µs
  SW overhead to/from    1.6/12.4 µs    200/241 µs    207/360 µs   (TCP/IP on LAN/WAN)
Software overhead dominates in LAN and WAN.
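To make the simplified latency model above concrete, the following small C program (not part of the original lecture) sweeps the same parameter ranges as the slide's plot and prints the total delay and effective bandwidth for each case; running it shows, for example, how large a message must be before a 500 µs overhead stops dominating.

```c
/* Sketch of the slide's model:
 *   total_delay = latency + message_size / bandwidth
 *   effective_bw = message_size / total_delay
 */
#include <stdio.h>

int main(void) {
    const double latencies_us[]    = {1.0, 25.0, 500.0};     /* overhead + flight */
    const double bandwidths_mbps[] = {10.0, 100.0, 1000.0};  /* link rate */

    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            double lat_s  = latencies_us[i] * 1e-6;    /* seconds      */
            double bw_bps = bandwidths_mbps[j] * 1e6;  /* bits/second  */
            printf("latency %5.0f us, link BW %5.0f Mbit/s\n",
                   latencies_us[i], bandwidths_mbps[j]);
            /* message sizes 16 B .. 4 MB in factors of 4, as on the slide */
            for (long size = 16; size <= 4L * 1024 * 1024; size *= 4) {
                double bits        = 8.0 * (double)size;
                double total_delay = lat_s + bits / bw_bps;       /* seconds */
                double eff_bw      = bits / total_delay / 1e6;    /* Mbit/s  */
                printf("  %8ld B  delay %12.1f us  effective BW %8.2f Mbit/s\n",
                       size, total_delay * 1e6, eff_bw);
            }
        }
    }
    return 0;
}
```

Inspecting the output for the 500 µs rows answers the slide's question directly: with that much overhead, only messages in the kilobyte range and above reach an effective bandwidth beyond 10 Mbit/s.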

7 HW Interface Issues (source: Mike Levine, PSC, DEISA Symposium, May 2005)
Where should the network connect to the computer?
- Cache consistent, to avoid flushes? (=> memory bus)
- Latency and bandwidth? (=> memory bus)
- Standard interface card? (=> I/O bus)
MPP => memory bus; clusters, LAN, WAN => I/O bus.
[Diagram: a network I/O controller attached directly to the memory bus (next to the CPU and L2 cache) vs. one attached to the I/O bus through a bus adaptor. Ideal: high bandwidth, low latency, standard interface.]

8 Interconnect, First Level: Chip/Board
- AMD: HyperTransport on chip and HyperTransport on the board
- Intel: QuickPath Interconnect (QPI) on the board; ring interconnect (MIC, Sandy Bridge); mesh (Polaris, SCC)
References: SCC /SCC_Sympossium_Feb212010_FINAL-A.pdf; MIC reference/isc_2010_skaugen_keynote.pdf

9 Internode Connection: Where to Connect?
- Internal bus/network (MPP): CM-1, CM-2, CM-5; IBM Blue Gene, Power7; Cray; SGI Ultra Violet
- I/O bus (clusters): typically the PCI bus
Interconnect examples for MPP (proprietary interconnection technology) follow.

10 IBM Blue Gene/P
Per node: 3.4 GF/s (DP) per core, 13.6 GF/s per node; memory BW/F = 1 B/F; comm BW/F = (6*3.4*2/8)/13.6 ≈ 0.375 B/F
3-dimensional torus:
- Interconnects all compute nodes; communications backbone for computations
- Adaptive cut-through hardware routing
- 3.4 Gb/s on all 12 node links (5.1 GB/s per node)
- 0.5 µs latency between nearest neighbors, 5 µs to the farthest (roughly 0.5 µs x number of hops); MPI: 3 µs latency for one hop, 10 µs to the farthest
- 0.7/2.6 TB/s bisection bandwidth, 188 TB/s total bandwidth (72k-node machine)
Collective network:
- Interconnects all compute and I/O nodes (1152)
- One-to-all broadcast and reduction-operation functionality
- 6.8 Gb/s of bandwidth per link
- Latency of a one-way tree traversal: 2 µs (MPI: 5 µs)
- ~62 TB/s total binary-tree bandwidth (72k machine)
Low-latency global barrier and interrupt:
- Latency of one way to reach all 72K nodes: 0.65 µs (MPI: 1.6 µs)
Other networks:
- 10 Gb functional Ethernet (I/O nodes only)
- 1 Gb private control Ethernet: provides JTAG access to the hardware; accessible only from the service-node system
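As a sanity check on the torus numbers above, this short C sketch (mine, not from the slides) reproduces the per-node link bandwidth and the communication bytes-per-flop ratio from the figures quoted on the slide (12 links at 3.4 Gb/s, i.e. 6 links per direction, and 13.6 GF/s peak per node).

```c
/* Back-of-the-envelope check of the BG/P per-node figures. */
#include <stdio.h>

int main(void) {
    const double link_gbps   = 3.4;   /* per torus link, Gb/s          */
    const int    links       = 12;    /* 6 outgoing + 6 incoming       */
    const double peak_gflops = 13.6;  /* per node, double precision    */

    double node_bw_GBps     = links * link_gbps / 8.0;              /* 5.1 GB/s  */
    double comm_bw_per_flop = (6 * link_gbps * 2 / 8.0) / peak_gflops;

    printf("aggregate link BW per node : %.2f GB/s\n", node_bw_GBps);
    printf("comm bytes per flop        : %.3f B/F\n", comm_bw_per_flop);
    return 0;
}
```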

11 IBM BG/P Ping-Pong
MPI ping-pong latency: avg 4.7 µs
MPI ping-pong bandwidth: 0.38 GB/s
(147,456 cores)
Measured: ~650 MF/s out of 13.6 GF/s (~5% of peak)
Estimate based on memory BW: 13.6/24*2*0.85 ≈ 0.96 GF/s
Estimate based on measured BW: 9.4/24*2*0.85 = 0.665 GF/s
(BW measured by STREAM and reported as part of HPCC)
Blue Gene/Q
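The two performance estimates on the BG/P ping-pong slide above follow a simple bandwidth-limited (roofline-style) model. The sketch below reproduces that arithmetic; the 24 bytes of memory traffic per pair of flops and the 0.85 efficiency factor are taken from the expressions as written on the slide, not derived here.

```c
/* Bandwidth-limited performance estimate: GF/s ~= (mem BW GB/s / 24 B) * 2 flops * 0.85 */
#include <stdio.h>

static double estimate_gflops(double mem_bw_GBps) {
    return mem_bw_GBps / 24.0 * 2.0 * 0.85;
}

int main(void) {
    printf("estimate from peak memory BW    (13.6 GB/s): %.3f GF/s\n",
           estimate_gflops(13.6));
    printf("estimate from measured STREAM BW (9.4 GB/s): %.3f GF/s\n",
           estimate_gflops(9.4));   /* ~0.665 GF/s, matching the slide */
    return 0;
}
```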

12 BG/Q 5-D Torus Network
BG/Q networks:
- 5-D torus connecting the compute nodes; 2 GB/s bidirectional bandwidth on all (10+1) links; 5-D nearest-neighbor exchange measured at ~1.75 GB/s per link
- Both the collective and barrier networks are embedded in this 5-D torus network
- Virtual cut-through (VCT) routing
- Floating-point addition support in the collective network
Compute-rack-to-compute-rack bisection BW (46x BG/L, 19x BG/P):
- 20.1 PF system: bisection is 2 x 16 x 16 x 12 x 2 (bidirectional) x 2 (torus, not mesh) x 2 GB/s link bandwidth ≈ 49 TB/s
- 26.8 PF system: bisection is 2 x 16 x 16 x 16 x 4 x 2 GB/s ≈ 65 TB/s
- BG/L at LLNL is 0.7 TB/s
I/O:
- Network to/from a compute rack: 2 links (4 GB/s in, 4 GB/s out) feed an I/O PCIe port (4 GB/s in, 4 GB/s out)
- Every Q32 node card has up to 8 I/O links, or 4 ports; every rack has up to 32 x 8 = 256 links, or 128 ports
- I/O rack: 8 I/O nodes per drawer; each node has 2 links from the compute rack and 1 PCIe port to the outside world; 12 drawers per rack gives 96 I/O nodes, or 96 x 4 (PCIe) = 384 GB/s ≈ 3 Tb/s
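The bisection figures above can be checked directly from the torus dimensions and the 2 GB/s per-link rate quoted on the slide; a minimal sketch (using 1 TB/s = 1000 GB/s):

```c
/* Reproducing the BG/Q bisection-bandwidth arithmetic quoted above. */
#include <stdio.h>

int main(void) {
    const double link_GBps = 2.0;   /* per link, each direction */

    /* 20.1 PF system: cut of 2 x 16 x 16 x 12 links, x2 bidirectional,
     * x2 because a torus (not a mesh) has wrap-around links. */
    double bisection_20PF = 2.0 * 16 * 16 * 12 * 2 * 2 * link_GBps / 1000.0;

    /* 26.8 PF system: 2 x 16 x 16 x 16 x 4 links at 2 GB/s. */
    double bisection_26PF = 2.0 * 16 * 16 * 16 * 4 * link_GBps / 1000.0;

    printf("20.1 PF bisection: %.1f TB/s\n", bisection_20PF);   /* ~49 TB/s */
    printf("26.8 PF bisection: %.1f TB/s\n", bisection_26PF);   /* ~65 TB/s */
    return 0;
}
```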

13 BG/Q Network
- All-to-all: 97% of peak
- Bisection: > 93% of peak
- Nearest-neighbor: 98% of peak
- Collective: floating-point reductions at 94.6% of peak
No performance problems identified in the network logic.
Cray XE6 Gemini Network: Day_2_Session_2_all.pdf

14 Cray XE6 Gemini Network
MPI ping-pong latency: 6-9 µs, avg 7.5 µs (note: measured on SeaStar, not Gemini)
MPI ping-pong bandwidth: 1.6 GB/s (note: measured on SeaStar, not Gemini)
(224,256 cores)
Cray XE6 Gemini Network: Presentations/Courses_Ws_2011/Multi-Threaded_Course_Feb11/Day_2_Session_2_all.pdf

15 SGI Ultra Violet (UV)
- 1 rack: 16 nodes, 32 sockets; max 3 hops; external NUMAlink 5 routers with 16 ports
- UV hub: two QPI interfaces (2 x 25 GB/s) and four NUMAlink 5 links (4 x 10 GB/s)
- SGI UV, 8 racks: 128 nodes, 256 sockets, fat tree (1/4 shown in the figure); 16 TB shared memory (4 racks with 16 GB DIMMs)
- MPI ping-pong latency: avg 1.6 µs; MPI ping-pong bandwidth: avg 3 GB/s (64 cores)
- 512 racks: 8192 nodes, 16,384 sockets, 8 x 8 torus of 128-node fat trees; each torus link consists of 2 NUMAlink 5 bidirectional links
- Maximum estimated latency for a 1024-rack system: < 2 µs

16 IBM Power7 Hub
- 61 mm x 96 mm glass-ceramic LGA module; 56 12X optical modules LGA-attach onto the substrate
- Over 1 TB/s of interconnect bandwidth
- 45 nm lithography, Cu, SOI, 13 levels of metal; 440M transistors
- 582 mm^2 die (26.7 mm x 21.8 mm); 3707 signal I/O; 11,328 total I/O

17 IBM Power7 Integrated Switch Router (ISR)
- Two-tier, full-graph network; 3.0 GHz internal; 56x56 crossbar switch; 8 HFI, 7 LL, 24 LR, 16 D, and SRV ports
- Virtual channels for deadlock prevention; input/output buffering; 2 KB maximum packet size; 128 B FLIT size
- Link reliability: CRC-based link-level retry; lane steering for failed links
- IP multicast support: multicast route tables per ISR for replicating and forwarding multicast packets
- Global counter support: the ISR compensates for link latencies as counter information is propagated; HW synchronization with network-management setup and maintenance
Routing characteristics:
- 3-hop L-D-L longest direct route; 5-hop L-D-L-D-L longest indirect route
- Cut-through wormhole routing
- Full hardware routing using distributed route tables across the ISRs: source route tables for packets injected by the HFI; port route tables for packets at each hop in the network; separate tables for inter-supernode and intra-supernode routes
- FLITs of a packet arrive in order; packets of a message can arrive out of order
Routing modes:
- Hardware single direct routing
- Hardware multiple direct routing (for less-than-full-up systems where more than one direct path exists)
- Hardware indirect routing for data striping and failover (round-robin, random)
- Software-controlled indirect routing through hardware route tables
I/O bus technologies (clusters) follow.

18 Peripheral Component Interconnect (PCI)
- PCI V1.0 (1992): 32-bit, 33 MHz
- PCI-X V1.0 (1998): 64-bit, 66 MHz, 100 MHz, 133 MHz; V2.0 (2003): 64-bit, 266 MHz, 533 MHz
- PCI Express (PCIe) V1.0 (2003): 256 MiB/s per lane (16 lanes = 4 GiB/s); V2.0 (2007): 512 MiB/s per lane (16 lanes = 8 GiB/s); V3.0 (2010): 1024 MiB/s per lane (16 lanes = 16 GiB/s)
- PCIe is defined for 1, 2, 4, 8, 16, and 32 lanes
[Figure: PCIe x1, x4, and x16 slots alongside a 32-bit PCI slot.]
Cluster interconnect technologies:
- Ethernet: 1 GigE (1995), 10 GigE (2001), 40 GigE (2010), 100 GigE (2010)
- InfiniBand:
  - 2001: Single Data Rate (SDR), 2.5 Gb/s per lane; x4 (2003): 10 Gb/s; 8b/10b encoding (net data rate 2 Gb/s per lane, x4: 8 Gb/s)
  - 2005: Double Data Rate (DDR), 5 Gb/s per lane, x4: 20 Gb/s; 8b/10b encoding (net 4 Gb/s per lane, x4: 16 Gb/s)
  - 2007: Quad Data Rate (QDR), 10 Gb/s per lane, x4: 40 Gb/s; 8b/10b encoding (net 8 Gb/s per lane, x4: 32 Gb/s)
  - 2011: Fourteen Data Rate (FDR), ~14 Gb/s per lane, x4: ~56 Gb/s; 64b/66b encoding (net ~13.6 Gb/s per lane, x4: ~54.5 Gb/s)
  - 2013: Enhanced Data Rate (EDR), ~26 Gb/s per lane, x4: ~103 Gb/s; 64b/66b encoding (net data rate 25 Gb/s per lane, x4: 100 Gb/s)
  - Switch latency: SDR 200 ns, DDR 140 ns, QDR 100 ns. Mellanox's current switch chip has 1.4 billion transistors, a throughput of 4 Tb/s over 36 ports, and a port-to-port latency of 165 ns.
- Myrinet: 0.64 Gb/s (1994), 1.28 Gb/s (1996), 2 Gb/s (2000), 10 Gb/s (2006)
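The per-lane rates and encodings above determine the quoted net data rates. The following sketch (not from the lecture; it uses the commonly published FDR and EDR signalling rates of 14.0625 and 25.78125 Gb/s, which are assumptions where the transcription lost the exact numbers) computes PCIe x16 totals and InfiniBand 4x net rates; real adapters lose additional bandwidth to protocol overhead.

```c
/* PCIe per-lane scaling and InfiniBand encoding arithmetic. */
#include <stdio.h>

int main(void) {
    /* PCIe: per-lane throughput roughly doubles per generation (Gen1..Gen3). */
    const double pcie_lane_MiBps[] = {256.0, 512.0, 1024.0};
    for (int gen = 1; gen <= 3; gen++)
        printf("PCIe %d.0 x16: %4.0f MiB/s per lane -> %4.0f GiB/s total\n",
               gen, pcie_lane_MiBps[gen - 1],
               16.0 * pcie_lane_MiBps[gen - 1] / 1024.0);

    /* InfiniBand: signalling rate per lane times encoding efficiency,
     * usually deployed as 4x links. FDR/EDR rates are assumed standard values. */
    struct { const char *name; double gbps; double encoding; } ib[] = {
        {"SDR", 2.5,      8.0 / 10.0},
        {"DDR", 5.0,      8.0 / 10.0},
        {"QDR", 10.0,     8.0 / 10.0},
        {"FDR", 14.0625,  64.0 / 66.0},
        {"EDR", 25.78125, 64.0 / 66.0},
    };
    for (int i = 0; i < 5; i++) {
        double net_lane = ib[i].gbps * ib[i].encoding;
        printf("IB %s: %.2f Gb/s net per lane, %.1f Gb/s for a 4x link\n",
               ib[i].name, net_lane, 4.0 * net_lane);
    }
    return 0;
}
```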

19 InfiniBand Roadmap
SDR: Single Data Rate; DDR: Double Data Rate; QDR: Quad Data Rate; FDR: Fourteen Data Rate; EDR: Enhanced Data Rate; HDR: High Data Rate; NDR: Next Data Rate
Typical InfiniBand network: HCA = Host Channel Adapter, TCA = Target Channel Adapter

20 Interconnect Technology Properties
Adapters compared: Mellanox ConnectX (IB 40 Gb/s, PCIe x8), QLogic InfiniPath (IB 20 Gb/s, PCIe x8), Myrinet 10G (PCIe x8), Quadrics QSNetII, Chelsio T210-CX 10GigE (PCIe x8), and GigE. Metrics: application latency (µs) and peak unidirectional bandwidth (MB/s) on PCIe Gen1 and PCIe Gen2 (N/A where no Gen2 part exists).
IPoIB bandwidth (Mellanox ConnectX):
- IB 10 Gb/s, PCIe Gen1: 939 MB/s
- IB 20 Gb/s, PCIe Gen1: 1410 MB/s
- IB 20 Gb/s, PCIe Gen2: 1880 MB/s
- IB 40 Gb/s, PCIe Gen2: 2950 MB/s
MPI ping-pong measurements (latency / bandwidth):
- Mellanox QDR: avg 3.6 µs / avg 1.8 GB/s (4320 cores)
- QLogic InfiniPath QDR: avg 1.6 µs / avg 2.5 GB/s (192 cores)
- IBM BG/P: avg 4.7 µs / 0.38 GB/s (147,456 cores)
- Cray SeaStar: 6-9 µs, avg 7.5 µs / 1.6 GB/s (224,256 cores)
- SGI UV: avg 1.6 µs / avg 3 GB/s (64 cores)
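All of the latency and bandwidth figures labeled "MPI ping-pong" in these slides come from benchmarks of the following shape. This is a minimal sketch of such a ping-pong (not the actual code behind the numbers above): rank 0 sends a message to rank 1 and waits for it to come back; half the round-trip time is the one-way latency, and message size divided by that time is the bandwidth.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int reps = 1000;
    for (long size = 1; size <= 4L * 1024 * 1024; size *= 4) {
        char *buf = malloc(size);
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(buf, (int)size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, (int)size, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, (int)size, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, (int)size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        /* one-way time = half the average round-trip time */
        double one_way = (MPI_Wtime() - t0) / (2.0 * reps);
        if (rank == 0)
            printf("%8ld bytes: latency %8.2f us, bandwidth %8.2f MB/s\n",
                   size, one_way * 1e6, size / one_way / 1e6);
        free(buf);
    }
    MPI_Finalize();
    return 0;
}
```

Run with at least two ranks, e.g. mpirun -np 2 ./pingpong; small messages expose the latency term, large ones the bandwidth term of the latency model from slide 6.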

21 Cray XC30 Interconnection Network
[Figure: XC30 network hierarchy; several chassis (Chassis 1, Chassis 2, ...) are combined into a group.]

22 Cray XC30 Interconnection Network
Aries chip: 40 nm technology; 16.6 x 18.9 mm; 217M gates; 184 lanes of SerDes (30 optical lanes, 90 electrical lanes, 64 PCIe 3.0 lanes)
Cray XC30 network overview

23 Interconnection Networks: References
- CSE 431, Computer Architecture, Fall 2005, Lecture 27: Network Connected Multiprocessors, Mary Jane Irwin
- Lecture 21: Networks & Interconnect Introduction, Dave A. Patterson, Jan Rabaey, CS 252, Spring 2000
- Technology Trends in High Performance Computing, Mike Levine, DEISA Symposium, May 9-10, 2005
- Single-chip Cloud Computer: An Experimental Many-core Processor from Intel Labs, Jim Held
- Petascale to Exascale: Extending Intel's HPC Commitment, Kirk Skaugen
- Blue Gene: A Next Generation Supercomputer (BlueGene/P), Alan Gara
- Blue Gene/P Architecture: Application Performance and Data Analytics, Vitali Morozov
- HPC Challenge
- Multi-Threaded Course, February 15-17, 2011, Day 2: Introduction to Cray MPP Systems with Multi-core Processors; Multi-threaded Programming, Tuning and Optimization on Multi-core MPP Platforms, 011/Multi-Threaded_Course_Feb11/Day_2_Session_2_all.pdf
- Gemini Description, MPI, Jason Beech-Brandt
- Technical Advances in the SGI UV Architecture
- SGI UV: Solving the World's Most Data Intensive Problems

24 References (cont'd)
- The IBM POWER7 HUB Module: A Terabyte Interconnect Switch for High-Performance Computer Systems, Baba Arimilli, Steve Baumgartner, Scott Clark, Dan Dreps, Dave Siljenberg, Andrew Mak, Hot Chips 22, August 2010
- Intel Core i7 I/O Hub and I/O Controller Hub
- PCI Express, InfiniBand and 10-Gigabit Ethernet for Dummies, DK Panda, Pavan Balaji, Matthew Koop, tutorial at Supercomputing 09
- InfiniBand Roadmap; InfiniBand Performance
- A Complexity Theory for VLSI, Clark David Thompson, doctoral thesis, ACM
- Microprocessors, Exploring Chip Layers; Cray T3E
- Complexity Issues in VLSI, Frank Thomson Leighton, MIT Press, 1983
- The Tree Machine: An Evaluation of Strategies for Reducing Program Loading Time, Pey-yun Peggy Li and Lennart Johnsson
- DADO: A Tree-Structured Architecture for Artificial Intelligence Computation, S. J. Stolfo and D. P. Miranker, Annual Review of Computer Science, vol. 1, pp. 1-18, June 1986
- Architecture and Applications of DADO: A Large-Scale Parallel Computer for Artificial Intelligence, Salvatore J. Stolfo, Daniel Miranker, David Elliot Shaw
- Introduction to Algorithms, Charles E. Leiserson, September 15, 2004
- UC Berkeley, CS 252, Spring 2000, Dave Patterson
- Interconnection Networks, Computer Architecture: A Quantitative Approach, 4th edition, Appendix E, Timothy Mark Pinkston (USC) and Jose Duato (Universidad Politecnica de Valencia)
- Access and Alignment of Data in an Array Processor, D. H. Lawrie, IEEE Trans. Computers, C-24, no. 12, December 1975
- SP2 System Architecture, T. Agerwala, J. L. Martin, J. H. Mirza, D. C. Sadler, D. M. Dias, and M. Snir, IBM J. Res. Dev., v. 34, no. 2, 1995
- Inside the TC2000, BBN Advanced Computer Inc., preliminary version, 1989
- A Study of Non-Blocking Switching Networks, Charles Clos, Bell System Technical Journal, vol. 32, 1953
- On Rearrangeable Three-Stage Connecting Networks, V. E. "Vic" Benes, BSTJ, vol. XLI, no. 5, Sep. 1962
- GF11, M. Kumar, IBM J. Res. Dev., v. 36, no. 6
- Fat-Trees: Universal Networks for Hardware-Efficient Supercomputing, Charles E. Leiserson, IEEE Trans. Computers 34(10), 1985
- Cray XC30 Series Network, B. Alverson, E. Froese, L. Kaplan, D. Roweth; Cray High Speed Networking, Hot Interconnects, August 2012
- Cost-Efficient Dragonfly Topology for Large-Scale Systems, J. Kim, W. Dally, S. Scott, D. Abts, IEEE Micro, vol. 29, no. 1, pp. 33-40, Jan/Feb 2009
- Technology-Driven, Highly-Scalable Dragonfly Topology, J. Kim, W. Dally, S. Scott, D. Abts, 35th International Symposium on Computer Architecture (ISCA), 2008

25 References (cont'd)
- Microarchitecture of a High-Radix Router, J. Kim, W. J. Dally, B. Towles, A. K. Gupta
- The BlackWidow High-Radix Clos Network, S. Scott, D. Abts, J. Kim, W. Dally, 33rd International Symposium on Computer Architecture (ISCA), 2006
- Flattened Butterfly: A Cost-Efficient Topology for High-Radix Networks, J. Kim, W. J. Dally, D. Abts, 34th International Symposium on Computer Architecture (ISCA), 2007
- Flattened Butterfly Topology for On-Chip Networks, J. Kim, J. Balfour, W. J. Dally, IEEE Computer Architecture Letters, vol. 6, no. 2, Jul-Dec 2007
- Flattened Butterfly Topology for On-Chip Networks, J. Kim, J. Balfour, W. J. Dally, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2007
- From Hypercubes to Dragonflies: A Short History of Interconnect, W. J. Dally
