Design flow for parallel applications on a dynamically reconfigurable platform

Clément Foucher, Fabrice Muller and Alain Giulieri
Université de Nice-Sophia Antipolis (UNS), LEAT/CNRS
{Clement.Foucher ; Fabrice.Muller ; Alain.Giulieri}@unice.fr

Plan
- Parallel and reconfigurable computing today
  - Context
  - Systems limitations
- Our proposal: the SPORE system
- Implementing SPORE
  - Application design
  - The hardware platform and its implementations
- Conclusion & future work

Parallel systems
- Personal computers: small number of software cores
- Manycore systems: still rising
- High performance computers: massively parallel software systems

Reconfigurable systems
- Generic hardware
  - Arrays of reconfigurable elements linked by a configurable network
  - Configurable into particular systems
- Evolution: partial dynamic reconfiguration
  - Change only a part of the device on the fly
[Figures: a blank FPGA and a routed FPGA; an FPGA with dynamically reconfigurable areas hosting implementation 1 or implementation 2]

Reconfigurable parallel systems
- High Performance Reconfigurable Computers (HPRC)
  - Introduce generic hardware into the standard HPC structure (nodes connected by a network)
  - Accelerate specific portions of code ("application kernels") by offloading computations to hardware accelerators
- Two kinds [1]
  - Nonuniform Node, Uniform System (NNUS): every node combines hardware and software resources
  - Uniform Node, Nonuniform System (UNNS): the system mixes hardware nodes and software nodes

Software nature of applications
- Historically, systems were software only
  - Over time, hardware resources were added
  - Hardware is static while software is flexible
  - Applications are still mainly software, even if some particular computations are hardware accelerated
- Reconfigurable systems
  - Add more flexibility to hardware elements
  - But application design did not change: still software-based, with some computations offloaded to hardware

Applications linked to execution platform
- HPRCs
  - The ability to use hardware resources depends on the ratio of hardware to software resources
    - UNNSs are more flexible than NNUSs
  - Communication performance between software and hardware depends on the underlying buses
    - A platform change can cause performance to collapse
- Applications are thus deeply tied to the underlying hardware
  - Changing the execution platform of a legacy application can force a partial rewrite of the application
    - To maintain performance
    - Or even just to make the application compatible

Our proposal: applications
- Build applications as sets of kernels
- Kernels are linked by data flows
- A kernel defines what to do, without specifying how to do it
[Figure: data in (MPEG flow) -> Kernel 1 (MPEG2 decoder) -> Kernel 2 (encoder: audio encoder OR H264 encoder) -> data out flow]
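The idea of an application as kernels linked by data flows can be sketched as a small directed graph. This is only an illustration: the `Application` class, its method names and the kernel names are assumptions made for the example, not part of SPORE.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Application:
    """An application seen as named kernels linked by data flows (hypothetical model)."""
    kernels: set = field(default_factory=set)
    flows: list = field(default_factory=list)  # (producer, consumer) pairs

    def add_flow(self, producer, consumer):
        # Declaring a flow implicitly declares both kernels.
        self.kernels.update((producer, consumer))
        self.flows.append((producer, consumer))

    def execution_order(self):
        # Map each kernel to the set of kernels whose output it consumes.
        deps = {name: set() for name in self.kernels}
        for producer, consumer in self.flows:
            deps[consumer].add(producer)
        return list(TopologicalSorter(deps).static_order())


# The application names only *what* each kernel does; nothing here says
# whether the decoder or the encoder runs in hardware or in software.
app = Application()
app.add_flow("mpeg2_decoder", "encoder")
order = app.execution_order()
```

The application description stays free of any implementation detail, which is what allows the platform to choose an implementation later.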

Our proposal: kernels
- Kernel implementation is handled independently from the application
- Each kernel can have several implementations, in hardware and/or software
[Figure: a kernel with Implementation 1 and Implementation 2, each consisting of a bitstream, an initial context and accessors]
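A kernel with several interchangeable implementations could be modelled roughly as follows. The class names, the `kind` field, the bitstream file name and the "first match wins" selection rule are all assumptions made for this sketch; the slides do not specify how an implementation is chosen.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Implementation:
    name: str
    kind: str                        # "hardware" or "software"
    bitstream: Optional[str] = None  # hypothetical file name, hardware only


@dataclass
class Kernel:
    name: str
    implementations: list

    def pick(self, available_kinds):
        """Return the first implementation the platform can host, or None."""
        for impl in self.implementations:
            if impl.kind in available_kinds:
                return impl
        return None


hw = Implementation("impl1", "hardware", bitstream="encoder.bit")
sw = Implementation("impl2", "software")
encoder = Kernel("encoder", [hw, sw])

# On a software-only platform the same kernel falls back to its software version.
chosen = encoder.pick({"software"})
```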

Our proposal: execution platform
- The platform: several nodes connected through a network
- The nodes
  - A host cell, in charge of inter-node communication
  - Several computing cells
- The computing cells: reconfigurable
- This is the Simple Parallel platform for Reconfigurable Environment (SPORE)

Application design
[Figure: an application made of Kernel 1, Kernel 2 and Kernel 3; the kernels have implementations (Impl. 1, Impl. 2, Impl. 3), and an implementation comprises descriptors, a bitstream and accessors]
- All these elements are described using XML files
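To give an idea of what an XML description of these elements might look like, here is a minimal sketch. The slides do not show the actual SPORE schema, so every tag name, attribute name and file name below is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptor: tag and attribute names are invented for illustration.
DESCRIPTOR = """
<application name="transcoder">
  <kernel name="decoder">
    <implementation name="impl1" type="hardware" bitstream="decoder_v1.bit"/>
    <implementation name="impl2" type="software"/>
  </kernel>
  <kernel name="encoder">
    <implementation name="impl3" type="hardware" bitstream="encoder_v1.bit"/>
  </kernel>
</application>
"""


def load_application(xml_text):
    """Return the application name and a {kernel name: [implementation attributes]} map."""
    root = ET.fromstring(xml_text)
    kernels = {}
    for kernel in root.findall("kernel"):
        kernels[kernel.get("name")] = [
            dict(impl.attrib) for impl in kernel.findall("implementation")
        ]
    return root.get("name"), kernels


app_name, kernels = load_application(DESCRIPTOR)
```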

Accessors
- Kernels need different actions
  - Context
  - Passing input data
  - Retrieving results
- Same kernel, various implementations
  - The same elements are needed
  - But the way to provide them can differ
  - E.g. to start a computation
    - Implementation 1 requires writing the value 0x00000001 to register 2
    - Implementation 2 requires writing the value 0x10000000 to register 4
- Accessors
  - Sets of specific interactions to execute in order to perform a particular action
  - Read / Write
  - Registers / Memory range / FIFO
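The "start a computation" example above can be sketched as a table of accessors, one per implementation. The register numbers and values come from the slide; the `FakeCell` class, the `RegisterWrite` type and the `run_action` helper are hypothetical stand-ins, since the real cells are accessed through a Linux driver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisterWrite:
    register: int
    value: int


class FakeCell:
    """Stand-in for a computing cell: just a bank of 32-bit registers."""

    def __init__(self, n_registers=8):
        self.registers = [0] * n_registers

    def write_register(self, index, value):
        self.registers[index] = value & 0xFFFFFFFF


# Same action ("start"), different interactions depending on the implementation.
ACCESSORS = {
    "implementation_1": {"start": [RegisterWrite(2, 0x00000001)]},
    "implementation_2": {"start": [RegisterWrite(4, 0x10000000)]},
}


def run_action(cell, implementation, action):
    """Replay every interaction listed for this action on the given cell."""
    for step in ACCESSORS[implementation][action]:
        cell.write_register(step.register, step.value)


cell = FakeCell()
run_action(cell, "implementation_2", "start")
```

The caller only names the action; the accessor table hides which register and value a given implementation expects.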

SPORE: 1st implementation [2]
- Purpose
  - Propose an HPC-like platform that can evolve toward reconfigurable architectures
  - Proof of concept on an actual implementation
- Architecture based on HPCs
  - Globally distributed, locally shared memory architecture
  - MPI implemented for communication
  - Software-only execution units
- Based on the Xilinx ML507 board
  - Virtex-5 FX70T FPGA, which contains a PowerPC 440
  - 256 MiB DDR2
  - Ethernet interface
  - CompactFlash reader

SPORE: 1st implementation [2]
[Figure: node architecture, running Linux]

SPORE: 1st implementation results [2]
[Figure: communication time versus number of jobs per board]

SPORE: 2nd implementation
- Proof of concept for the application development flow
  - Includes reconfigurable hardware
  - Dynamic scheduling of kernels
- Bus-based communication
  - Attempts to reduce memory issues
  - No MPI communication
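Dynamic scheduling of kernels onto computing cells can be illustrated with a deliberately naive policy. The slides do not describe a scheduling algorithm, so the first-come, first-served rule below, the `schedule` function and the kernel and cell names are all assumptions made for the sketch.

```python
from collections import deque


def schedule(ready_kernels, free_cells):
    """Assign ready kernels to free cells in first-come, first-served order.

    Returns the {cell: kernel} assignments and the kernels left waiting.
    """
    queue = deque(ready_kernels)
    assignments = {}
    for cell in free_cells:
        if not queue:
            break
        assignments[cell] = queue.popleft()
    return assignments, list(queue)


# Three kernels are ready but only two cells are free: one kernel has to wait
# until a cell is released and can be reconfigured for it.
assignments, waiting = schedule(["aes_encrypt", "aes_decrypt", "h264"], ["cell0", "cell1"])
```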

SPORE: 2nd implementation
[Figure: system architecture. A data server and a global scheduler reach the nodes over an Ethernet network. Each node has a host cell containing the OS, a local scheduler, a storage manager with local storage, a cell manager and the reconfiguration manager FARM (a superset of Xilinx's ICAP [3]). A bus links the host cell to kernel controllers, which front the computing cells: kernel hosts and threads.]

SPORE: 2nd implementation, control
- Linux-based control
  - Scheduler
  - Data server communication
- Dynamic reconfiguration
  - Linux driver for FARM
- Cells management (accessors)
  - Linux driver for cells
  - XML parsing

SPORE: 2nd implementation results
- Only basic tests performed so far
  - Simple AES encrypt-then-decrypt application
  - 2 channels in parallel
- Fully functional for this basic test
  - XML-based application description
  - Hardware kernel reconfiguration
  - Kernel configuration and other accessors

Conclusion
- SPORE is a platform virtualization tool
  - Can be adapted to any underlying reconfigurable hardware
- Preliminary SPORE implementations are working
  - General application flow validated
- A complete SPORE implementation is still needed
  - Flow and HPC
  - Software and hardware

Future work
- 2nd platform
  - Perform further tests
  - Application containing more kernels
  - Video (H264)
- 3rd platform
  - Include elements from both previous platforms
  - Software AND reconfigurable hardware
  - MPI communication
- Improvements
  - NoC-based communication
  - Scheduling: no real algorithm for now

And why not... Xilinx's Zynq?
[Figure: a Zynq device running Linux alongside several cells]

Questions.

Bibliography
[1] Tarek El-Ghazawi, Esam El-Araby, Miaoqing Huang, Kris Gaj, Volodymyr Kindratenko, and Duncan Buell. The promise of High-Performance Reconfigurable Computing. Computer, 41:69-76, February 2008.
[2] Clément Foucher, Fabrice Muller, and Alain Giulieri. Exploring FPGAs capability to host a HPC design. 28th Norchip Conference (Norchip 2010), pages 1-4, Tampere, Finland, November 2010.
[3] François Duhem, Fabrice Muller, and Philippe Lorenzini. FaRM: Fast reconfiguration manager for reducing reconfiguration time overhead on FPGA. 7th International Symposium on Applied Reconfigurable Computing (ARC 2011), Belfast, United Kingdom, March 2011.