FPGA-Accelerated Heterogeneous Hyperscale Server Architecture for Next-Generation Compute Clusters
- Harvey Wright
- 9 years ago
Transcription
1 FPGA-Accelerated Heterogeneous Hyperscale Server Architecture for Next-Generation Clusters
Rene Griessl, Peykanu Meysam, Jens Hagemeyer, Mario Porrmann (Bielefeld University, Germany)
Stefan Krupop, Micha vor dem Berge (Christmann, Germany)
Lars Kosmann, Patrick Knocke (OFFIS, Germany)
Michał Kierzynka, Ariel Oleksiak (Poznan Supercomputing and Networking Center, Poland)
2 FiPS Motivation
- Rising demand for computing power
  o Complexity of calculations
  o Increasing number of users
- Energy consumption is rising
  o 15 % of electrical energy consumption by computers
  o CO2 emissions
  o Systems have to be cooled
  o Complexity/cost rising
- FiPS (Field Programmable Servers)
  o Funded by EC
  o Integrate FPGA / ARM
3 RECS Box Concept
- One single rack with up to 100 TFLOPS
  o Up to 2,400 microservers
  o Up to 200 accelerators (e.g. GPGPU, Xeon Phi, FPGA)
  o Up to 2.5 PB storage
  o Up to 70 TB of RAM
- Main components
  o RCU: RECS Box Unit. Contains CPU nodes, networking, management
  o RPU: RECS Box Power Unit. Intelligent power supply for RCUs; can deliver power for multiple RCUs
4 RECS Box System Overview
- Modular Microserver Architecture
  o Arbitrary combinations of up to 72 microservers in a single 1 RU server
[Figure: 1 RU server with three backplanes and front panel]

5 RECS Box System Overview
- Modular Microserver Architecture
  o Arbitrary combinations of up to 72 microservers in a single 1 RU server
- Communication Backplane
  o Scalable from 6 to 18 compute boards
  o Flexible communication using dedicated Net-s s
- Modular Architecture
  o Integrate microservers based on traditional CPUs and mobile CPUs
- Hardware Accelerators
  o Integrated in specialized microserver
  o Attached via PCIe
[Figure: 1 RU server with three backplanes and front panel]
6 RECS Box Communication
Three levels of interconnect:
- Monitoring and Control
  o Distributed monitoring environment
  o Continuous sensing of voltage, current, temperature, ...
  o Integrated KVM
- Management
  o Gb Ethernet, switched on backplane
- High Throughput
  o 10 GbE
  o Infiniband
[Figure: backplane block diagram. RECS compute boards with PCIe/GbE network links; 2 x 1/10 GbE per network module; 18 x 1/10 GbE to external switch; GbE switch (4 x GbE) for management GbE; KVM/SM-Bus; front panel with KVM, monitoring and control, USB, HDMI, Ethernet]
7 RECS Box Microserver Overview
- Apalis-based microservers
  o ARM CPUs ranging from Cortex-A9 to Cortex-A15
  o Xilinx Zynq-based module
- COM Express-based microservers
  o x86 modules range from Atom single core to i7 quad core
  o FPGA-based COM Express module developed, integrating Xilinx Zynq SoC
8 RECS Box Microserver: Zynq COM Express Module
- Zynq-7000, ARM A9 dual core, 1 GHz
- Tightly integrated Programmable Logic (PL)
  o Used to extend the Processing System (PS)
  o High-performance ARM AXI interfaces
  o IP cores on PL
- Memory interfaces
  o 1 GByte of DDR3 (32-bit) PS memory
  o Up to 4 GByte DDR3 SO-DIMM module (64-bit wide) PL memory
  o eMMC memory (16 GByte)
  o Secure Digital (SD) card slot
- Management network: Gigabit Ethernet (PS)
- High-speed serial links (PCIe 2.1, 5 Gb/s)
  o PCIe-based high-throughput network can be implemented
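As a rough sanity check on the 5 Gb/s PCIe links listed for this module, the sketch below converts the Gen2 line rate into usable bytes per second. It relies on the fact that PCIe Gen2 uses 8b/10b line encoding, so 8 of every 10 transmitted bits carry data. The function name and the lane counts tried are illustrative assumptions, and the result is a per-direction upper bound that ignores packet (TLP) and protocol overhead.

```python
# Sketch: theoretical PCIe Gen2 bandwidth per direction, before protocol overhead.
# The function name is a hypothetical helper, not part of any library.

GEN2_LINE_RATE = 5_000_000_000  # 5 GT/s per lane (the "5 Gb/s" on the slide)


def pcie_gen2_bytes_per_second(lanes: int) -> int:
    """Upper bound on bytes/s for a Gen2 link with the given number of lanes."""
    if lanes < 1:
        raise ValueError("lanes must be >= 1")
    # 8b/10b encoding: only 8 of every 10 line bits are data bits.
    data_bits_per_second = lanes * GEN2_LINE_RATE * 8 // 10
    return data_bits_per_second // 8  # bits -> bytes


if __name__ == "__main__":
    for lanes in (1, 4, 8):
        gbytes = pcie_gen2_bytes_per_second(lanes) / 1e9
        print(f"Gen2 x{lanes}: {gbytes:.1f} GB/s per direction")
```

Real throughput on such a link will be lower once packet headers and flow control are taken into account.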
9 Zynq-7000 Overview
Zynq-7000 AP SoC devices: Z-7010, Z-7015, Z-7020, Z-7030, Z-7035, Z-7045, Z-7100

Processing System
- Processor core: Dual ARM Cortex-A9 MPCore
- Processor extensions: NEON & single / double precision floating point
- Max frequency: 866 MHz (-3) / 766 MHz (-2) or 1 GHz (-3) / 800 MHz (-2), depending on device
- Memory: L1 cache 32KB I / D, L2 cache 512KB, on-chip memory 256KB
- External memory support: DDR3, DDR2, LPDDR2, 2x QSPI, NAND, NOR
- Peripherals: 2x USB 2.0 (OTG), 2x tri-mode Gigabit Ethernet, 2x SD/SDIO, 2x UART, 2x CAN 2.0B, 2x I2C, 2x SPI, 4x 32b GPIO

Programmable Logic
Device   Approx. ASIC gates   Block RAM   Peak DSP (symmetric FIR)   Multi-gigabit transceivers
Z-7010   ~430K (30k LC)       240KB       100 GMACS                  -
Z-7015   ~1.1M (74k LC)       380KB       160 GMACS                  4 (6.25 Gbit/s)
Z-7020   ~1.3M (85k LC)       560KB       276 GMACS                  -
Z-7030   ~1.9M (125k LC)      1,060KB     593 GMACS                  4 (12.5 Gbit/s)
Z-7035   ~4.1M (275k LC)      2,000KB     1,334 GMACS                8 (12.5 Gbit/s)
Z-7045   ~5.2M (350k LC)      2,180KB     1,334 GMACS                8 (12.5 Gbit/s)
Z-7100   ~6.6M (444k LC)      3,020KB     2,622 GMACS                16 (12.5 Gbit/s)
- PCI Express (Root Complex or Endpoint): -, Gen2 x4, Gen2 x8, Gen2 x8, Gen2 x8
- Agile Mixed Signal (XADC): 2x 12bit 1Msps A/D converter

I/O
- Processor System IO: 130
- Multi Standards 3.3V IO
- Multi Standards High Performance 1.8V IO

[Xilinx, Zynq-7000 All Programmable SoC Overview, 2014]
10 Zynq-7000 COM Express Mechanical Overview
[Figure: annotated view of the module. Labeled components: SO-DIMM module, PS-DDR3 RAM, Zynq-7000, crosspoint switch array, microcontroller, COM Express connector, piggyback connector, two HS connectors]
11 RECS Box Communication
Fourth level of interconnect:
- Direct low latency
  o 8x 12.5 Gb/s
  o 300 ns
Existing levels: High Throughput (10 GbE, Infiniband), Management (Gb Ethernet), Monitoring and Control
[Figure: backplane block diagram. RECS compute boards with PCIe/GbE network links; 2 x 1/10 GbE per network module; 18 x 1/10 GbE to external switch; GbE switch (4 x GbE) for management GbE; KVM/SM-Bus; front panel with KVM, monitoring and control, USB, HDMI, Ethernet]
12 Benchmark DNA sequencing: GPU implementation
- Needleman-Wunsch and dynamic programming (DP)
  o Data dependencies: left, upper and diagonal elements are needed
  $$H_{i,j} = \max\{\, H_{i-1,j} - G_{\mathrm{penalty}},\; H_{i,j-1} - G_{\mathrm{penalty}},\; H_{i-1,j-1} + SM(s_1[i], s_2[j]) \,\}$$
- GPU implementation
  o The whole matrix is processed by a single GPU thread; thousands of threads work in parallel
  o The MxN matrix is divided into sub-matrices of KxK (K is the unroll factor)
  o Up to 256 cells computed from a single data fetch
  o Highly optimised for the NVIDIA Fermi architecture using CUDA
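The recurrence on this slide can be sketched in plain Python as follows. This is a minimal reference version for illustration, not the optimised CUDA kernel described in the talk. The match/mismatch scores stand in for the substitution matrix SM, and the boundary initialisation (cumulative gap penalties along the first row and column) is the usual textbook choice; both are assumptions, since the slide does not state them.

```python
# Sketch: Needleman-Wunsch global alignment score with a linear gap penalty.
# match, mismatch and gap_penalty values are illustrative assumptions.

def needleman_wunsch_score(s1: str, s2: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap_penalty: int = 2) -> int:
    """Return H[m][n], the best global alignment score of s1 and s2."""
    m, n = len(s1), len(s2)
    # H[i][j] holds the best score for aligning s1[:i] with s2[:j].
    H = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        H[i][0] = -i * gap_penalty  # s1[:i] aligned against gaps only
    for j in range(1, n + 1):
        H[0][j] = -j * gap_penalty  # s2[:j] aligned against gaps only

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # SM(s1[i], s2[j]) reduced to a simple match/mismatch score.
            sm = match if s1[i - 1] == s2[j - 1] else mismatch
            H[i][j] = max(
                H[i - 1][j] - gap_penalty,   # upper neighbour
                H[i][j - 1] - gap_penalty,   # left neighbour
                H[i - 1][j - 1] + sm,        # diagonal neighbour
            )
    return H[m][n]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```

Each cell depends only on its upper, left and diagonal neighbours, which is why the matrix can be cut into KxK tiles that are processed independently once their top and left borders are known.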
13 Benchmark DNA sequencing: FPGA implementation
- Based on Vivado HLS
  o Starting from a basic C Needleman-Wunsch implementation
- Each PE calculates one submatrix of KxK nucleotides
- Systolic array style pipeline structure
- 11 PEs fit in Zynq-7045
- Zynq PS initializes and manages data flow
- Direct low latency links can be used for multi-FPGA architecture
[Figure: 16-character sequences (ACAGTATAGATTACAT), each labeled "32-bit unsigned int", fed through a pipeline of PE 1, PE 2, PE 3; table: total FPGA resources (11 PE)]
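The figure on this slide pairs a 16-character sequence with the label "32-bit unsigned int", which suggests that each nucleotide is stored in 2 bits (16 x 2 = 32). The sketch below shows one way such packing could work. The specific bit assignment (A=0, C=1, G=2, T=3), the little-endian placement of the first base, and the function names are all assumptions; the slides do not specify the encoding actually used.

```python
# Sketch: pack 16 nucleotides into one 32-bit word, 2 bits per base.
# Encoding and bit order are assumptions made for illustration only.

ENCODING = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODING = "ACGT"
BASES_PER_WORD = 16


def pack16(seq: str) -> int:
    """Pack exactly 16 bases into an unsigned 32-bit integer."""
    if len(seq) != BASES_PER_WORD:
        raise ValueError("expected exactly 16 bases")
    word = 0
    for k, base in enumerate(seq):
        word |= ENCODING[base] << (2 * k)  # base k occupies bits 2k and 2k+1
    return word


def unpack16(word: int) -> str:
    """Inverse of pack16: recover the 16-base string from a 32-bit word."""
    return "".join(DECODING[(word >> (2 * k)) & 0b11]
                   for k in range(BASES_PER_WORD))


if __name__ == "__main__":
    w = pack16("ACAGTATAGATTACAT")
    print(f"0x{w:08X}", unpack16(w))
```

Packing the input this way lets a processing element fetch 16 bases in a single 32-bit transfer, which would fit the slide's emphasis on computing many cells from one data fetch.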
14 Benchmark DNA sequencing: Results
- Fastest implementation on Tesla GPU
- Most energy efficient implementation on FPGA
- Live Demo available
  o Visit Booth
15 Summary and Outlook
- RECS: Resource Efficient Cluster Server
  o Integration of CPUs, embedded CPUs, GPGPUs and FPGAs
- Evaluation using DNA sequencing
  o HLS FPGA implementation provides maximum energy efficiency
- Outlook: EC Horizon 2020 project M2DC (Modular Microserver Data Center)
  o Architectural improvements focusing on communication
  o New microservers (64-bit ARM, Zynq Ultrascale+, Hybrid Memory Cube)
  o Large scale testbeds / applications
16 Many Thanks!
Visit RECS at booth 2303
René Griessl
Cognitronics and Sensor Systems
Center of Excellence Cognitive Interaction Technology
Bielefeld University, Germany
17 Zynq-7000 COM Express System Architecture
[Figure: block diagram of the module (ZYNQ-7000 PS and PL, linked via AXI and EMIO) on a COM Express carrier, attached through the COM Express connector]
Labeled components:
- DDR3 1 GByte / 32 bit
- DDR3 SO-DIMM 8 GByte / 64 bit
- GbE PHY (88E1116R)
- 2x USB 2.0 PHY (USB3320)
- USB hub (USB2514B)
- USB SATA (TUSB9261)
- SD card
- QSPI flash (S25FL256S)
- eMMC flash (KLMxGxxE2x)
- DisplayPort (ANX9804), I²C
- I²C, UART, CAN
Compatible with COM Express Type 6 standards
