How To Build A Heterogeneous Processor With An NVIDIA GPU And An ARM CPU




Transcription:

Project Denver Processor to Usher in a New Era of Computing
Bill Dally, January 5, 2011
http://blogs.nvidia.com/2011/01/project-denver-processor-to-usher-in-new-era-of-computing/

Project Denver Announced
On January 5, 2011, NVIDIA announced that it is developing high-performance ARM-based CPUs designed to power future products ranging from personal computers to servers and supercomputers. Known under the internal codename Project Denver, this initiative features an NVIDIA CPU running the ARM instruction set, which will be fully integrated on the same chip as the NVIDIA GPU. This initiative is extremely important for NVIDIA and the computing industry for several reasons.

Heterogeneous Computing Platform
NVIDIA's Project Denver will usher in a new era for computing by extending the performance range of the ARM instruction-set architecture, enabling the ARM architecture to cover a larger portion of the computing space. Coupled with an NVIDIA GPU, it will provide the heterogeneous computing platform of the future by combining a standard architecture with awesome performance and energy efficiency.

Project Denver: High-End ARM CPU
ARM is already the standard architecture for mobile devices. Project Denver extends the range of ARM systems upward to PCs, data-center servers, and supercomputers. ARM's modern architecture, open business model, and vibrant ecosystem have led to its pervasiveness in cell phones, tablets, and other embedded devices. Denver is the catalyst that will enable these same factors to propel ARM to become pervasive in higher-end systems.

Freedom
Denver frees PCs, workstations, and servers from the hegemony and inefficiency of the x86 architecture. For several years, makers of high-end computing platforms have had no choice of instruction-set architecture. The only option was the x86 instruction set, with variable-length instructions, a small register set, and other features that interfered with modern compiler optimizations, required a larger area for instruction decoding, and substantially reduced energy efficiency.

Choice
Denver provides a choice. System builders can now choose a high-performance processor based on a RISC instruction set with modern features such as fixed-width instructions, predication, and a large general register file. These features enable advanced compiler techniques and simplify implementation, ultimately leading to higher performance and a more energy-efficient processor.
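
To make the predication point concrete, here is a minimal, illustrative snippet (not from the original post; the function name is hypothetical). On an ARM-style RISC ISA, a compiler can if-convert the data-dependent branch below into a branch-free conditional select (for example, AArch64 CSEL), avoiding branch mispredictions on unpredictable data.

    #include <cstdio>

    // A data-dependent choice that a predicating compiler can turn into
    // straight-line code: no branch, just a conditional select.
    static int clamp_nonnegative(int x) {
        return (x < 0) ? 0 : x;   // candidate for if-conversion / predicated select
    }

    int main() {
        printf("%d %d\n", clamp_nonnegative(-5), clamp_nonnegative(7));  // prints "0 7"
        return 0;
    }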

Windows on ARM
Microsoft's announcement that it is bringing Windows to ultra-low-power processors like ARM-based CPUs provides the final ingredient needed to enable ARM-based PCs based on Denver. Along with software stacks based on Android, Symbian, and iOS, Windows for ultra-low-power processors demonstrates the huge momentum behind low-power solutions that will ultimately propel the ARM architecture to dominance.

CPU+GPU
An ARM processor coupled with an NVIDIA GPU represents the computing platform of the future. A high-performance CPU with a standard instruction set will run the serial parts of applications and provide compatibility, while a highly parallel, highly efficient GPU will run the parallel portions of programs.
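
A minimal CUDA sketch of that division of labor (illustrative only, not NVIDIA's code; the kernel name, problem size, and use of managed memory are assumptions to keep the example short): the CPU handles serial setup and the epilogue, while the GPU runs the data-parallel loop.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Data-parallel part: one GPU thread per element.
    __global__ void saxpy(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int N = 1 << 20;
        float *x, *y;
        cudaMallocManaged(&x, N * sizeof(float));   // unified memory keeps the sketch short
        cudaMallocManaged(&y, N * sizeof(float));
        for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }   // serial setup on the CPU

        saxpy<<<(N + 255) / 256, 256>>>(N, 3.0f, x, y);             // parallel work on the GPU
        cudaDeviceSynchronize();

        printf("y[0] = %f\n", y[0]);                                // serial epilogue on the CPU
        cudaFree(x); cudaFree(y);
        return 0;
    }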

Project Denver Result
The result is that future systems, from the thinnest laptops to the biggest data centers and everything in between, will deliver an outstanding combination of performance and power efficiency. Their processors will provide the best of both worlds, while enabling increased battery life for mobile solutions.

FPGAs in a Denver World
Mike Butts, FPGA 2011 Pre-Conference Workshop, 2/27/11: The Role of FPGAs in a Converged Future with Heterogeneous Programmable Processors

Scalable Silicon = GPUs and FPGAs
Only massively parallel silicon scales with Moore's Law. GPUs and FPGAs have complementary strengths:
GPUs: processor parallelism; software programming; max TFLOPS, TF/W, TF/$; max memory bandwidth; max PCIe bandwidth.
FPGAs: Boolean parallelism; hardware programming; non-arithmetic datatypes; irregular fine-grain control; 1000 programmable pins.

Role of FPGA + (GPU+CPU)?
[Slide diagram: sensors (SPI-4, XAUI, etc.) and wireless baseband feed an FPGA; the FPGA connects over PCIe to the GPU and CPU with their memory; PCIe and GigE links reach other nodes, storage, and networks.]
PCI Express is the system connection for GPUs and CPUs; PCIe Gen 3 has latency optimizations for accelerators.
FPGA as sensor or comm front-end: 1000 programmable pins. Medical ultrasound example: 64-256 ADCs at 12-16 bits and 50-100 MHz, roughly 400 Gb/s; an FPGA integer beamforming front-end feeds GPU compute software.
FPGA as accelerator: app-specific datatypes, fine irregular control. Get good at low latency with PCIe Gen 3.
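
The ~400 Gb/s figure follows from the upper end of the quoted ranges, and it also motivates the FPGA front-end: a quick back-of-the-envelope check (a sketch assuming 256 channels, 16-bit samples, and 100 MHz sampling, taken from the slide's upper bounds) gives about 410 Gb/s of raw ADC data, well above the roughly 126 Gb/s a single PCIe Gen 3 x16 link can carry, so the FPGA's integer beamforming reduces the stream before the GPU sees it.

    #include <cstdio>

    int main() {
        // Upper bounds from the slide: 256 ADC channels, 16 bits/sample, 100 MHz sampling.
        const double channels        = 256;
        const double bits_per_sample = 16;
        const double sample_rate_hz  = 100e6;

        double raw_gbps = channels * bits_per_sample * sample_rate_hz / 1e9;   // ~409.6 Gb/s

        // For comparison (standard PCIe 3.0 numbers): x16 link, 8 GT/s per lane,
        // 128b/130b encoding, i.e. roughly 126 Gb/s of usable bandwidth.
        double pcie3_x16_gbps = 16 * 8.0 * (128.0 / 130.0);

        printf("raw ADC data rate : %.1f Gb/s\n", raw_gbps);
        printf("PCIe Gen3 x16     : %.1f Gb/s\n", pcie3_x16_gbps);
        return 0;
    }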