Cray Gemini Interconnect. Technical University of Munich Parallel Programming Class of SS14 Denys Sobchyshak
|
|
|
- Donna Hill
- 10 years ago
- Views:
Transcription
1 Cray Gemini Interconnect Technical University of Munich Parallel Programming Class of SS14 Denys Sobchyshak
2 Outline 1. Introduction 2. Overview 3. Architecture 4. Gemini Blocks 5. FMA & BTA 6. Fault tolerance 7. Software stack 8. Interesting to know 9. References Denys Sobchyshak / TUM 2
3 Introduction Torus interconnect is a network topology for connecting processing nodes in parallel computer system. Gemini is a network for Cray's supercomputer systems first used in 2010 Successor of Gemini is an Aries interconnect network first used around 2013 Denys Sobchyshak / TUM 3
4 Introduction Most known example of Gemini interconnect based supercomputer currently occupying 2 nd place in TOP500: Titan - Cray XK7, Opteron C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x Denys Sobchyshak / TUM 4
5 Overview Develops the highly scalable Seastar design used to deliver the 225,000 core Jaguar system, improving network functionality, latency and issue rate Novel system-on-chip (SoC) design to construct direct 3D torus networks that can scale to in excess of 100,000 multi-core nodes Capable of tens of millions of MPI messages per second Each hybrid compute node is interfaced to the Gemini interconnect through HyperTransport 3.0 technology. This direct connect architecture bypasses the PCI bottlenecks inherent in commodity networks and provides a peak of over 20 GB/s of injection bandwidth per node The proven 3-D torus topology provides powerful bisection and global bandwidth characteristics as well as support for dynamic routing of messages Denys Sobchyshak / TUM 5
6 Architecture Cray XT 3D Torus Network Each SeaStar router provides one node of 3D Torus, Gemini provides two. Denys Sobchyshak / TUM 6
7 Gemini Blocks Each Gemini ASIC provides two network interface controllers (NICs), and a 48-port Yarc router. Each of the NICs has its own HyperTransport 3 host interface, enabling Gemini to connect two Opteron nodes to the network. This 2 node building block provides 10 torus links, 4 each in two of the dimension ( x and z ) and 2 in the third dimension ( y ), as shown in previous slide. Traffic between the two nodes connected to a single Gemini is routed internally. The router uses a tiled design, with 8 tiles dedicated to the NICs and 40 (10 groups of 4) dedicated to the network. Denys Sobchyshak / TUM 7
8 FMA & BTA Fast Memory Access (FMA) is a mechanism whereby user processes generate network transactions, puts, gets and atomic memory operations (AMO), by storing directly to the NIC. The FMA block translates stores by the processor into fully qualified network requests. FMA provides both low latency and high issue rate on small transfers. The Block Transfer Engine (BTE) supports asynchronous transfer between local and remote memory. Software writes block transfer descriptors to a queue and the Gemini hardware performs the transfers asynchronously. The BTE supports memory operations (put/get) where the user specifies a local address, a network address and a transfer size. To put it simply: FMA is used for small transfers and BTE for large. FMA transfers are lower latency. BTE transfers take longer to start, but once running can transfer large amounts of data (upto 4GB) without CPU involvement. Denys Sobchyshak / TUM 8
9 Fault tolerance Gemini is designed for large systems in which failures are to be expected and applications must run on in the presence of errors. Each torus link comprises 4 groups of 3 lanes. Packet CRCs are checked by each device with automatic link level retry on error. In the event of the failure of a link, the router will select an alternate path of adaptively routed traffic. Gemini uses ECC to protect major memories and data paths within the device. Error-correcting code memory (Error Checking & Correction, ECC memory) is a type of computer data storage that can detect and correct the most common kinds of internal data corruption. A cyclic redundancy check (CRC) is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to raw data. Blocks of data entering these systems get a short check value attached, based on the remainder of a polynomial division of their contents; on retrieval the calculation is repeated, and corrective action can be taken against presumed data corruption if the check values do not match. Note: ECC memory is a common thing even in less complex systems, for example a simple rack DIMM (Dual In-line Memory Module, think of it as a set of RAM slots) might work with ECC memory circuits. Denys Sobchyshak / TUM 9
10 Software stack Gemini supports both kernel communication, through a Linux device driver and direct user space communication, where the driver is used to establish communication domains and handle errors, but can be bypassed for data transfer. Parallel applications typically use a library such as Cray MPI or SHMEM in which the programmer makes explicit communication calls. Denys Sobchyshak / TUM 10
11 Interesting to know Performance metrics: on a quiet network, the end-point latency is 1.0μs or less for a remote put, 1.5μs or less for a small MPI message. The per hop latency is 105ns on a quiet network. Updating SeaStar to Gemini: The Gemini mezzanine card is pin compatible with the SeaStar network card used on Cray XT5 and XT6 systems allowing them to be upgraded to Gemini. The upgrade is straightforward, each blade is removed and its network card is replaced. There are no changes to the chassis, cabinets or cabling. Future of Cray Interconnects: Cray sold patterns (10% of whole pattern portfolio), technology itself (starting from Aries interconnect) and key people to Intel in 2012.The agreement excludes Intel from reselling standalone Aries chips to anyone but Cray till It s been clear that the important technology trends are starting to intersect, enabling more powerful and energy-efficient systems interconnects by more closely integrating this technology with the processor. (Cray CEO Peter Ungaro) What will be changing in 2017, and beyond, is that we won t continue to build system interconnect hardware. (Cray CEO Peter Ungaro) Denys Sobchyshak / TUM 11
12 References Denys Sobchyshak / TUM 12
Cray XT3 Supercomputer Scalable by Design CRAY XT3 DATASHEET
CRAY XT3 DATASHEET Cray XT3 Supercomputer Scalable by Design The Cray XT3 system offers a new level of scalable computing where: a single powerful computing system handles the most complex problems every
The Gemini Network. Rev 1.1. Cray Inc.
The Gemini Network Rev 1.1 Cray Inc. 2010 Cray Inc. All Rights Reserved. Unpublished Proprietary Information. This unpublished work is protected by trade secret, copyright and other laws. Except as permitted
Cray XC Series Network. Bob Alverson, Edwin Froese, Larry Kaplan and Duncan Roweth Cray Inc.
Cray XC Series Network Bob Alverson, Edwin Froese, Larry Kaplan and Duncan Roweth Cray Inc. WP-Aries01-1112 www.cray.com Table of Contents Introduction... 3 Cost-Effective, High-Bandwidth Networks... 4
Router Architectures
Router Architectures An overview of router architectures. Introduction What is a Packet Switch? Basic Architectural Components Some Example Packet Switches The Evolution of IP Routers 2 1 Router Components
Introduction to Infiniband. Hussein N. Harake, Performance U! Winter School
Introduction to Infiniband Hussein N. Harake, Performance U! Winter School Agenda Definition of Infiniband Features Hardware Facts Layers OFED Stack OpenSM Tools and Utilities Topologies Infiniband Roadmap
GPU System Architecture. Alan Gray EPCC The University of Edinburgh
GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems
On-Chip Interconnection Networks Low-Power Interconnect
On-Chip Interconnection Networks Low-Power Interconnect William J. Dally Computer Systems Laboratory Stanford University ISLPED August 27, 2007 ISLPED: 1 Aug 27, 2007 Outline Demand for On-Chip Networks
Intel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family
Intel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family White Paper June, 2008 Legal INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL
PCI Express and Storage. Ron Emerick, Sun Microsystems
Ron Emerick, Sun Microsystems SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies and individuals may use this material in presentations and literature
Advanced Computer Networks. High Performance Networking I
Advanced Computer Networks 263 3501 00 High Performance Networking I Patrick Stuedi Spring Semester 2014 1 Oriana Riva, Department of Computer Science ETH Zürich Outline Last week: Wireless TCP Today:
EDUCATION. PCI Express, InfiniBand and Storage Ron Emerick, Sun Microsystems Paul Millard, Xyratex Corporation
PCI Express, InfiniBand and Storage Ron Emerick, Sun Microsystems Paul Millard, Xyratex Corporation SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies
Scalability and Classifications
Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static
Sun Constellation System: The Open Petascale Computing Architecture
CAS2K7 13 September, 2007 Sun Constellation System: The Open Petascale Computing Architecture John Fragalla Senior HPC Technical Specialist Global Systems Practice Sun Microsystems, Inc. 25 Years of Technical
PCI Express* Ethernet Networking
White Paper Intel PRO Network Adapters Network Performance Network Connectivity Express* Ethernet Networking Express*, a new third-generation input/output (I/O) standard, allows enhanced Ethernet network
SeaMicro SM10000-64 Server
SeaMicro SM10000-64 Server Building Datacenter Servers Using Cell Phone Chips Ashutosh Dhodapkar, Gary Lauterbach, Sean Lie, Ashutosh Dhodapkar, Gary Lauterbach, Sean Lie, Dhiraj Mallick, Jim Bauman, Sundar
Network Contention and Congestion Control: Lustre FineGrained Routing
Network Contention and Congestion Control: Lustre FineGrained Routing Matt Ezell HPC Systems Administrator Oak Ridge Leadership Computing Facility Oak Ridge National Laboratory International Workshop on
PCI Express Impact on Storage Architectures and Future Data Centers. Ron Emerick, Oracle Corporation
PCI Express Impact on Storage Architectures and Future Data Centers Ron Emerick, Oracle Corporation SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies
Certification: HP ATA Servers & Storage
HP ExpertONE Competency Model Certification: HP ATA Servers & Storage Overview Achieving an HP certification provides relevant skills that can lead to a fulfilling career in Information Technology. HP
Application Server Platform Architecture. IEI Application Server Platform for Communication Appliance
IEI Architecture IEI Benefits Architecture Traditional Architecture Direct in-and-out air flow - direction air flow prevents raiser cards or expansion cards from blocking the air circulation High-bandwidth
Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks
WHITE PAPER July 2014 Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks Contents Executive Summary...2 Background...3 InfiniteGraph...3 High Performance
Copyright 2013, Oracle and/or its affiliates. All rights reserved.
1 Oracle SPARC Server for Enterprise Computing Dr. Heiner Bauch Senior Account Architect 19. April 2013 2 The following is intended to outline our general product direction. It is intended for information
Lecture 23: Interconnection Networks. Topics: communication latency, centralized and decentralized switches (Appendix E)
Lecture 23: Interconnection Networks Topics: communication latency, centralized and decentralized switches (Appendix E) 1 Topologies Internet topologies are not very regular they grew incrementally Supercomputers
Performance of Software Switching
Performance of Software Switching Based on papers in IEEE HPSR 2011 and IFIP/ACM Performance 2011 Nuutti Varis, Jukka Manner Department of Communications and Networking (COMNET) Agenda Motivation Performance
Logically a Linux cluster looks something like the following: Compute Nodes. user Head node. network
A typical Linux cluster consists of a group of compute nodes for executing parallel jobs and a head node to which users connect to build and launch their jobs. Often the compute nodes are connected to
CUTTING-EDGE SOLUTIONS FOR TODAY AND TOMORROW. Dell PowerEdge M-Series Blade Servers
CUTTING-EDGE SOLUTIONS FOR TODAY AND TOMORROW Dell PowerEdge M-Series Blade Servers Simplifying IT The Dell PowerEdge M-Series blade servers address the challenges of an evolving IT environment by delivering
From Hypercubes to Dragonflies a short history of interconnect
From Hypercubes to Dragonflies a short history of interconnect William J. Dally Computer Science Department Stanford University IAA Workshop July 21, 2008 IAA: # Outline The low-radix era High-radix routers
Building Clusters for Gromacs and other HPC applications
Building Clusters for Gromacs and other HPC applications Erik Lindahl [email protected] CBR Outline: Clusters Clusters vs. small networks of machines Why do YOU need a cluster? Computer hardware Network
Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck
Sockets vs. RDMA Interface over 1-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck Pavan Balaji Hemal V. Shah D. K. Panda Network Based Computing Lab Computer Science and Engineering
Unified Computing Systems
Unified Computing Systems Cisco Unified Computing Systems simplify your data center architecture; reduce the number of devices to purchase, deploy, and maintain; and improve speed and agility. Cisco Unified
Motherboard- based Servers versus ATCA- based Servers
Motherboard- based Servers versus ATCA- based Servers Summary: A comparison of costs, features and applicability for telecom application hosting After many years of struggling for market acceptance, it
Networking Virtualization Using FPGAs
Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Massachusetts,
Cray DVS: Data Virtualization Service
Cray : Data Virtualization Service Stephen Sugiyama and David Wallace, Cray Inc. ABSTRACT: Cray, the Cray Data Virtualization Service, is a new capability being added to the XT software environment with
GPU Hardware and Programming Models. Jeremy Appleyard, September 2015
GPU Hardware and Programming Models Jeremy Appleyard, September 2015 A brief history of GPUs In this talk Hardware Overview Programming Models Ask questions at any point! 2 A Brief History of GPUs 3 Once
OpenSPARC T1 Processor
OpenSPARC T1 Processor The OpenSPARC T1 processor is the first chip multiprocessor that fully implements the Sun Throughput Computing Initiative. Each of the eight SPARC processor cores has full hardware
Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors
2011 International Symposium on Computer Networks and Distributed Systems (CNDS), February 23-24, 2011 Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors Atefeh Khosravi,
Service Partition Specialized Linux nodes. Compute PE Login PE Network PE System PE I/O PE
2 Service Partition Specialized Linux nodes Compute PE Login PE Network PE System PE I/O PE Microkernel on Compute PEs, full featured Linux on Service PEs. Service PEs specialize by function Software Architecture
Lustre Networking BY PETER J. BRAAM
Lustre Networking BY PETER J. BRAAM A WHITE PAPER FROM CLUSTER FILE SYSTEMS, INC. APRIL 2007 Audience Architects of HPC clusters Abstract This paper provides architects of HPC clusters with information
Cisco Prime Home 5.0 Minimum System Requirements (Standalone and High Availability)
White Paper Cisco Prime Home 5.0 Minimum System Requirements (Standalone and High Availability) White Paper July, 2012 2012 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public
Low Latency 10 GbE Switching for Data Center, Cluster and Storage Interconnect
White PAPER Low Latency 10 GbE Switching for Data Center, Cluster and Storage Interconnect Introduction: High Performance Data Centers As the data center continues to evolve to meet rapidly escalating
Cisco MCS 7825-H3 Unified Communications Manager Appliance
Cisco MCS 7825-H3 Unified Communications Manager Appliance Cisco Unified Communications is a comprehensive IP communications system of voice, video, data, and mobility products and applications. It enables
HPC Update: Engagement Model
HPC Update: Engagement Model MIKE VILDIBILL Director, Strategic Engagements Sun Microsystems [email protected] Our Strategy Building a Comprehensive HPC Portfolio that Delivers Differentiated Customer Value
Introduction History Design Blue Gene/Q Job Scheduler Filesystem Power usage Performance Summary Sequoia is a petascale Blue Gene/Q supercomputer Being constructed by IBM for the National Nuclear Security
ALPS Supercomputing System A Scalable Supercomputer with Flexible Services
ALPS Supercomputing System A Scalable Supercomputer with Flexible Services 1 Abstract Supercomputing is moving from the realm of abstract to mainstream with more and more applications and research being
Parallel Programming Survey
Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory
Lecture 2 Parallel Programming Platforms
Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple
HP ProLiant SL270s Gen8 Server. Evaluation Report
HP ProLiant SL270s Gen8 Server Evaluation Report Thomas Schoenemeyer, Hussein Harake and Daniel Peter Swiss National Supercomputing Centre (CSCS), Lugano Institute of Geophysics, ETH Zürich [email protected]
How To Build A Supermicro Computer With A 32 Core Power Core (Powerpc) And A 32-Core (Powerpc) (Powerpowerpter) (I386) (Amd) (Microcore) (Supermicro) (
TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 7 th CALL (Tier-0) Contributing sites and the corresponding computer systems for this call are: GCS@Jülich, Germany IBM Blue Gene/Q GENCI@CEA, France Bull Bullx
Mellanox Academy Online Training (E-learning)
Mellanox Academy Online Training (E-learning) 2013-2014 30 P age Mellanox offers a variety of training methods and learning solutions for instructor-led training classes and remote online learning (e-learning),
Solving I/O Bottlenecks to Enable Superior Cloud Efficiency
WHITE PAPER Solving I/O Bottlenecks to Enable Superior Cloud Efficiency Overview...1 Mellanox I/O Virtualization Features and Benefits...2 Summary...6 Overview We already have 8 or even 16 cores on one
RouteBricks: A Fast, Software- Based, Distributed IP Router
outebricks: A Fast, Software- Based, Distributed IP outer Brad Karp UCL Computer Science (with thanks to Katerina Argyraki of EPFL for slides) CS GZ03 / M030 18 th November, 2009 One-Day oom Change! On
Appro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes. Anthony Kenisky, VP of North America Sales
Appro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes Anthony Kenisky, VP of North America Sales About Appro Over 20 Years of Experience 1991 2000 OEM Server Manufacturer 2001-2007
Bivio 7000 Series Network Appliance Platforms
W H I T E P A P E R Bivio 7000 Series Network Appliance Platforms Uncompromising performance. Unmatched flexibility. Uncompromising performance. Unmatched flexibility. The Bivio 7000 Series Programmable
10GBASE T for Broad 10_Gigabit Adoption in the Data Center
10GBASE T for Broad 10_Gigabit Adoption in the Data Center Contributors Carl G. Hansen, Intel Carrie Higbie, Siemon Yinglin (Frank) Yang, Commscope, Inc 1 Table of Contents 10Gigabit Ethernet: Drivers
Virtualised MikroTik
Virtualised MikroTik MikroTik in a Virtualised Hardware Environment Speaker: Tom Smyth CTO Wireless Connect Ltd. Event: MUM Krackow Feb 2008 http://wirelessconnect.eu/ Copyright 2008 1 Objectives Understand
Why 25GE is the Best Choice for Data Centers
Why 25GE is the Best Choice for Data Centers Gilles Garcia Xilinx Director Wired Communication Business Santa Clara, CA USA April 2015 1 Outline - update Data center drivers Why 25GE The need for 25GE
Petascale Software Challenges. Piyush Chaudhary [email protected] High Performance Computing
Petascale Software Challenges Piyush Chaudhary [email protected] High Performance Computing Fundamental Observations Applications are struggling to realize growth in sustained performance at scale Reasons
COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook)
COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook) Vivek Sarkar Department of Computer Science Rice University [email protected] COMP
INDIAN INSTITUTE OF TECHNOLOGY KANPUR Department of Mechanical Engineering
INDIAN INSTITUTE OF TECHNOLOGY KANPUR Department of Mechanical Engineering Enquiry No: Enq/IITK/ME/JB/02 Enquiry Date: 14/12/15 Last Date of Submission: 21/12/15 Formal quotations are invited for HPC cluster.
HP ProLiant BL660c Gen9 and Microsoft SQL Server 2014 technical brief
Technical white paper HP ProLiant BL660c Gen9 and Microsoft SQL Server 2014 technical brief Scale-up your Microsoft SQL Server environment to new heights Table of contents Executive summary... 2 Introduction...
SMB Direct for SQL Server and Private Cloud
SMB Direct for SQL Server and Private Cloud Increased Performance, Higher Scalability and Extreme Resiliency June, 2014 Mellanox Overview Ticker: MLNX Leading provider of high-throughput, low-latency server
PCI Express Overview. And, by the way, they need to do it in less time.
PCI Express Overview Introduction This paper is intended to introduce design engineers, system architects and business managers to the PCI Express protocol and how this interconnect technology fits into
InfiniBand Clustering
White Paper InfiniBand Clustering Delivering Better Price/Performance than Ethernet 1.0 Introduction High performance computing clusters typically utilize Clos networks, more commonly known as Fat Tree
ECLIPSE Performance Benchmarks and Profiling. January 2009
ECLIPSE Performance Benchmarks and Profiling January 2009 Note The following research was performed under the HPC Advisory Council activities AMD, Dell, Mellanox, Schlumberger HPC Advisory Council Cluster
LS DYNA Performance Benchmarks and Profiling. January 2009
LS DYNA Performance Benchmarks and Profiling January 2009 Note The following research was performed under the HPC Advisory Council activities AMD, Dell, Mellanox HPC Advisory Council Cluster Center The
Silviu Panica, Marian Neagul, Daniela Zaharie and Dana Petcu (Romania)
Silviu Panica, Marian Neagul, Daniela Zaharie and Dana Petcu (Romania) Outline Introduction EO challenges; EO and classical/cloud computing; EO Services The computing platform Cluster -> Grid -> Cloud
A Platform Built for Server Virtualization: Cisco Unified Computing System
A Platform Built for Server Virtualization: Cisco Unified Computing System What You Will Learn This document discusses how the core features of the Cisco Unified Computing System contribute to the ease
Supercomputing Clusters with RapidIO Interconnect Fabric
Supercomputing Clusters with RapidIO Interconnect Fabric Devashish Paul, Director Strategic Marketing, Systems Solutions [email protected] Ethernet Summit 2015 April 14-16, 2015 Santa Clara, CA Integrated
50. DFN Betriebstagung
50. DFN Betriebstagung IPS Serial Clustering in 10GbE Environment Tuukka Helander, Stonesoft Germany GmbH Frank Brüggemann, RWTH Aachen Slide 1 Agenda Introduction Stonesoft clustering Firewall parallel
LinuxWorld Conference & Expo Server Farms and XML Web Services
LinuxWorld Conference & Expo Server Farms and XML Web Services Jorgen Thelin, CapeConnect Chief Architect PJ Murray, Product Manager Cape Clear Software Objectives What aspects must a developer be aware
Switch Fabric Implementation Using Shared Memory
Order this document by /D Switch Fabric Implementation Using Shared Memory Prepared by: Lakshmi Mandyam and B. Kinney INTRODUCTION Whether it be for the World Wide Web or for an intra office network, today
Multi-Threading Performance on Commodity Multi-Core Processors
Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction
InfiniBand Software and Protocols Enable Seamless Off-the-shelf Applications Deployment
December 2007 InfiniBand Software and Protocols Enable Seamless Off-the-shelf Deployment 1.0 Introduction InfiniBand architecture defines a high-bandwidth, low-latency clustering interconnect that is used
Software and the Concurrency Revolution
Software and the Concurrency Revolution A: The world s fastest supercomputer, with up to 4 processors, 128MB RAM, 942 MFLOPS (peak). 2 Q: What is a 1984 Cray X-MP? (Or a fractional 2005 vintage Xbox )
Upgrading Small Business Client and Server Infrastructure E-LEET Solutions. E-LEET Solutions is an information technology consulting firm
Thank you for considering E-LEET Solutions! E-LEET Solutions is an information technology consulting firm that specializes in low-cost high-performance computing solutions. This document was written as
Application and Micro-benchmark Performance using MVAPICH2-X on SDSC Gordon Cluster
Application and Micro-benchmark Performance using MVAPICH2-X on SDSC Gordon Cluster Mahidhar Tatineni ([email protected]) MVAPICH User Group Meeting August 27, 2014 NSF grants: OCI #0910847 Gordon: A Data
UCS M-Series Modular Servers
UCS M-Series Modular Servers The Next Wave of UCS Innovation Marian Klas Cisco Systems June 2015 Cisco UCS - Powering Applications at Every Scale Edge-Scale Computing Cloud-Scale Computing Seamlessly Extend
Cloud Computing through Virtualization and HPC technologies
Cloud Computing through Virtualization and HPC technologies William Lu, Ph.D. 1 Agenda Cloud Computing & HPC A Case of HPC Implementation Application Performance in VM Summary 2 Cloud Computing & HPC HPC
Emerging storage and HPC technologies to accelerate big data analytics Jerome Gaysse JG Consulting
Emerging storage and HPC technologies to accelerate big data analytics Jerome Gaysse JG Consulting Introduction Big Data Analytics needs: Low latency data access Fast computing Power efficiency Latest
I3: Maximizing Packet Capture Performance. Andrew Brown
I3: Maximizing Packet Capture Performance Andrew Brown Agenda Why do captures drop packets, how can you tell? Software considerations Hardware considerations Potential hardware improvements Test configurations/parameters
SAN Conceptual and Design Basics
TECHNICAL NOTE VMware Infrastructure 3 SAN Conceptual and Design Basics VMware ESX Server can be used in conjunction with a SAN (storage area network), a specialized high speed network that connects computer
Topological Properties
Advanced Computer Architecture Topological Properties Routing Distance: Number of links on route Node degree: Number of channels per node Network diameter: Longest minimum routing distance between any
FPGA-based MapReduce Framework for Machine Learning
FPGA-based MapReduce Framework for Machine Learning Bo WANG 1, Yi SHAN 1, Jing YAN 2, Yu WANG 1, Ningyi XU 2, Huangzhong YANG 1 1 Department of Electronic Engineering Tsinghua University, Beijing, China
Hadoop on the Gordon Data Intensive Cluster
Hadoop on the Gordon Data Intensive Cluster Amit Majumdar, Scientific Computing Applications Mahidhar Tatineni, HPC User Services San Diego Supercomputer Center University of California San Diego Dec 18,
HPC Software Requirements to Support an HPC Cluster Supercomputer
HPC Software Requirements to Support an HPC Cluster Supercomputer Susan Kraus, Cray Cluster Solutions Software Product Manager Maria McLaughlin, Cray Cluster Solutions Product Marketing Cray Inc. WP-CCS-Software01-0417
How To Build A Cisco Ukcsob420 M3 Blade Server
Data Sheet Cisco UCS B420 M3 Blade Server Product Overview The Cisco Unified Computing System (Cisco UCS ) combines Cisco UCS B-Series Blade Servers and C-Series Rack Servers with networking and storage
Kriterien für ein PetaFlop System
Kriterien für ein PetaFlop System Rainer Keller, HLRS :: :: :: Context: Organizational HLRS is one of the three national supercomputing centers in Germany. The national supercomputing centers are working
OpenSoC Fabric: On-Chip Network Generator
OpenSoC Fabric: On-Chip Network Generator Using Chisel to Generate a Parameterizable On-Chip Interconnect Fabric Farzad Fatollahi-Fard, David Donofrio, George Michelogiannakis, John Shalf MODSIM 2014 Presentation
Fault Tolerance & Reliability CDA 5140. Chapter 3 RAID & Sample Commercial FT Systems
Fault Tolerance & Reliability CDA 5140 Chapter 3 RAID & Sample Commercial FT Systems - basic concept in these, as with codes, is redundancy to allow system to continue operation even if some components
Achieving Performance Isolation with Lightweight Co-Kernels
Achieving Performance Isolation with Lightweight Co-Kernels Jiannan Ouyang, Brian Kocoloski, John Lange The Prognostic Lab @ University of Pittsburgh Kevin Pedretti Sandia National Laboratories HPDC 2015
What is a System on a Chip?
What is a System on a Chip? Integration of a complete system, that until recently consisted of multiple ICs, onto a single IC. CPU PCI DSP SRAM ROM MPEG SoC DRAM System Chips Why? Characteristics: Complex
