Institut d'Electronique et des Télécommunications de Rennes (IETR). Image Team


1 Institut d'Electronique et des Télécommunications de Rennes (IETR), Image Team. March

2 IETR Image Team
The team: 10 teacher-researchers, ~15 PhD students & post-docs
Expertise:
- Image: analysis, compression
- Architecture: multi-core, embedded systems
Research themes: image analysis for semantic indexation and embedded vision; 2D/3D image and video coding; cryptography; architecture

3 IETR Image: Architecture theme

4 Objectives
Dataflow-based methods and tools for:
- signal processing applications
- distributed and embedded platforms
Optimizing: throughput, latency, energy, memory, programming time

5 Target Applications
- MPEG decoders: MPEG4 Part2, AVC, SVC, HEVC, SHVC; MPEG participation
- Computer vision and 3D processing: stereo vision, SLAM
- Cryptography: chaotic-based cryptography
- Telecommunications: 3GPP LTE eNodeB

6 Target Platforms
- Texas Instruments Keystone I and II
- Zboard with Xilinx Zynq
- Odroid with Samsung Exynos 5
- Kalray MPPA

7 Methods
Optimizing: throughput, latency, energy, memory, programming time
- Dataflow programming
- SIMD & parallelism
- Data representation
- Energy-aware processing

8 Software
- Open SVC Decoder (C code, x86 ASM)
- Open HEVC Decoder (C code, x86 & ARM ASM)
- FFmpeg
- Orcc Compiler (Java, Xtend)
- PREESM Rapid Prototyping Tool (Java, Xtend)

9 Academic Partners

10 Industrial Partners

12 Introduction: Motivations
Software Productivity Gap (log scale):
- Lines of code/chip: x2 every 10 months
- Transistors/chip: x2 every 18 months
- Lines of code/day: x2 every 5 years
Source: ITRS & Hardware-dependent Software, Ecker et al., Springer
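
As a rough illustration of this gap (a sketch, not taken from the slides), the three doubling periods can be compounded over a common horizon. The 5-year horizon and the helper name `growth_factor` are assumptions chosen for the example.

```python
def growth_factor(months_elapsed, doubling_period_months):
    """Multiplicative growth of a quantity that doubles every
    `doubling_period_months`, after `months_elapsed` months."""
    return 2.0 ** (months_elapsed / doubling_period_months)


HORIZON_MONTHS = 60  # assumed horizon: 5 years

code_per_chip = growth_factor(HORIZON_MONTHS, 10)         # lines of code/chip
transistors_per_chip = growth_factor(HORIZON_MONTHS, 18)  # transistors/chip
code_per_day = growth_factor(HORIZON_MONTHS, 60)          # lines of code/day

# How much faster software demand grows than developer productivity
productivity_gap = code_per_chip / code_per_day

print(f"lines of code/chip:  x{code_per_chip:.0f}")
print(f"transistors/chip:    x{transistors_per_chip:.1f}")
print(f"lines of code/day:   x{code_per_day:.0f}")
print(f"productivity gap:    x{productivity_gap:.0f}")
```

With these doubling periods, the code carried by a chip grows 64-fold over five years, while the output of a developer only doubles.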

13 Introduction: Hardware Complexity
[Plot: Nb of PE per SoC (axis tick: 5000)]
Source: ITRS System Drivers 2011

14 What is PREESM?
[Diagram labels: Algorithm, Architecture, PREESM, C compiler, Simulator + Debugger + Profiler, Multicore Runtime; platform with PE and DSP cores, Peripherals, Main Memory]

15 What is PREESM? (Parallel Real-time Embedded Executives Scheduling Method)
- A rapid prototyping framework
- An open-source project
- A set of Eclipse plugins
PREESM tool and applications available on GitHub

16 PREESM
Using PREESM to design an embedded system:
- Design of parallel algorithms
- To provide metrics: throughput/latency evaluation, predictable memory footprints
- To build a working prototype: code generation for multicore architectures, guaranteed deadlock-freeness, inter-core communications
- For design-space exploration: seamless porting to a new architecture, legacy code reusability

17 Inputs
[Diagram labels: Algorithm, Architecture, PREESM, C compiler, Simulator + Debugger + Profiler, Multicore Runtime; platform with PE and DSP cores, Peripherals, Main Memory]

18 PREESM Inputs
Algorithm descriptions using Dataflow Graphs
Synchronous Dataflow (SDF):
- Actors and data ports
- FIFO queues
[Graph: actors A, B, C, D]
E. Lee and D. Messerschmitt, Synchronous data flow, Proceedings of the IEEE.

19 PREESM Inputs
Algorithm descriptions using Dataflow Graphs
Data-driven execution: an actor is fired when its input FIFOs contain enough data-tokens.
[Graph: actors A, B, C, D with rates 1 and 2; Core 1 executes A B C C D]
E. Lee and D. Messerschmitt, Synchronous data flow, Proceedings of the IEEE.
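
The firing rule on this slide can be sketched in a few lines of Python. This is only an illustration: the production and consumption rates below are assumptions (the slide residue does not give them unambiguously), and `Edge`, `can_fire`, `fire` and `run_iteration` are hypothetical names, not part of PREESM.

```python
from collections import namedtuple

# An edge is a FIFO: `prod` tokens are written per firing of `src`,
# `cons` tokens are read per firing of `dst`.
Edge = namedtuple("Edge", "src dst prod cons")


def can_fire(actor, edges, tokens):
    """Firing rule: every input FIFO holds at least `cons` tokens."""
    return all(tokens[e] >= e.cons for e in edges if e.dst == actor)


def fire(actor, edges, tokens):
    """Consume from the input FIFOs and produce on the output FIFOs."""
    for e in edges:
        if e.dst == actor:
            tokens[e] -= e.cons
        if e.src == actor:
            tokens[e] += e.prod


def run_iteration(actors, edges, repetitions):
    """Data-driven execution of one graph iteration on a single core."""
    tokens = {e: 0 for e in edges}
    remaining = dict(repetitions)
    schedule = []
    progress = True
    while progress:
        progress = False
        for actor in actors:
            if remaining[actor] > 0 and can_fire(actor, edges, tokens):
                fire(actor, edges, tokens)
                remaining[actor] -= 1
                schedule.append(actor)
                progress = True
    return schedule, tokens


# Assumed rates: B produces 2 tokens per firing, C handles 1 at a time,
# D needs 2 tokens to fire.
edges = [
    Edge("A", "B", 1, 1),
    Edge("B", "C", 2, 1),
    Edge("C", "D", 1, 2),
]
schedule, tokens = run_iteration(
    ["A", "B", "C", "D"], edges, {"A": 1, "B": 1, "C": 2, "D": 1}
)
print(" ".join(schedule))  # A B C C D
```

With these assumed rates, C fires twice before D has enough tokens, and every FIFO is empty again at the end of the iteration.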

20 PREESM Inputs
Algorithm descriptions using Dataflow Graphs
Expression of parallelisms: Task / Data / Pipeline / Internal
[Graph: actors A, B, C, D with rates 1 and 2, C fired x2, mapped on Core 1, Core 2 and Core 3; annotations: task, data, pipeline and internal parallelism]
E. Lee and D. Messerschmitt, Synchronous data flow, Proceedings of the IEEE.
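
Data parallelism in an SDF graph shows up as actors that fire several times per graph iteration. A minimal sketch of how such firing counts (the repetition vector) can be derived from the rates follows. The rates are assumptions made for illustration, and `repetition_vector` is a hypothetical helper, not a PREESM function.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, edges):
    """Smallest positive integer firing counts q such that, for every edge
    (src, dst, prod, cons), q[src] * prod == q[dst] * cons."""
    rate = {actors[0]: Fraction(1)}
    pending = list(edges)
    while pending:
        progressed = False
        for edge in list(pending):
            src, dst, prod, cons = edge
            if src in rate and dst in rate:
                if rate[src] * prod != rate[dst] * cons:
                    raise ValueError("inconsistent rates: no valid schedule")
            elif src in rate:
                rate[dst] = rate[src] * prod / cons
            elif dst in rate:
                rate[src] = rate[dst] * cons / prod
            else:
                continue  # neither end reached yet, retry on a later pass
            pending.remove(edge)
            progressed = True
        if not progressed:
            raise ValueError("graph is not connected")
    # Scale the fractional rates to the smallest integer solution
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    counts = {a: int(r * scale) for a, r in rate.items()}
    common = reduce(gcd, counts.values())
    return {a: c // common for a, c in counts.items()}


# Assumed rates: (source, destination, produced, consumed)
edges = [("A", "B", 1, 1), ("B", "C", 2, 1), ("C", "D", 1, 2)]
q = repetition_vector(["A", "B", "C", "D"], edges)
print(q)  # {'A': 1, 'B': 1, 'C': 2, 'D': 1}
```

Here C fires twice per iteration, so its two firings are candidates for execution on two different cores.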

21 PREESM Inputs
PiSDF (Parameterized and Interfaced Synchronous Dataflow)
[Graph: actors Read Header, Read, Filter, Send exchanging an image with rates Size (Size = 4); inside Filter: in/out interfaces, SetNbSlices (N = 2) and Kernel with rates Size/N]

22 PREESM Inputs
PiSDF (Parameterized and Interfaced Synchronous Dataflow)
PiSDF is:
- Hierarchical & compositional
- Statically parameterizable
- Dynamically reconfigurable
- Lightweight runtime overhead
PiSDF fosters:
- Predictability
- Parallelism
- Developer-friendliness
K. Desnos, M. Pelcat, J.-F. Nezan, S. S. Bhattacharyya, S. Aridhi, PiMM: Parameterized and Interfaced Dataflow Meta-Model for MPSoCs Runtime Reconfiguration, SAMOS XIII
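
To make the idea of parameterized rates concrete, here is a small sketch in which the rates of a `Kernel` actor are expressions of two parameters, `Size` and `N`, in the spirit of the example on the previous slide. The values 4 and 2 are read from the slide residue and their association with `Size` and `N` is an assumption; `resolve_rates` and `kernel_firings` are hypothetical names.

```python
def resolve_rates(rate_exprs, params):
    """Evaluate each symbolic rate with the current parameter values."""
    return {port: expr(params) for port, expr in rate_exprs.items()}


# Hypothetical rates of a Kernel actor nested in a hierarchical Filter actor:
# each firing processes one slice of Size/N tokens.
kernel_rates = {
    "in": lambda p: p["Size"] // p["N"],
    "out": lambda p: p["Size"] // p["N"],
}


def kernel_firings(params):
    """Number of Kernel firings needed to process the Size tokens
    delivered through the input interface of Filter."""
    if params["Size"] % params["N"] != 0:
        raise ValueError("Size must be a multiple of N")
    rates = resolve_rates(kernel_rates, params)
    return params["Size"] // rates["in"]


# Static configuration, then a reconfiguration that changes N
firings_before = kernel_firings({"Size": 4, "N": 2})
firings_after = kernel_firings({"Size": 4, "N": 4})
print(firings_before, firings_after)  # 2 4
```

Changing `N` changes both the rates and the number of parallel `Kernel` firings without touching the graph structure, which is the kind of reconfiguration the slide refers to.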

23 PREESM Inputs
S-LAM (System-Level Architecture Model)
- Processing Element: Operator
- Communication Nodes: Parallel Node, Contention Node
- Communication Enablers: RAM, DMA
- Links: Directed Data Link, Undirected Data Link, Set-up Link
M. Pelcat, J.-F. Nezan, J. Piat, J. Croizer and S. Aridhi, A System-Level Architecture Model for Rapid Prototyping of Heterogeneous Multicore Embedded Systems, DASIP 2009

24 PREESM Inputs
S-LAM (System-Level Architecture Model)
[Example: core1, core2, core3, DMA and RAM connected through a communication node (CN) at 1 Gbit/s]
M. Pelcat, J.-F. Nezan, J. Piat, J. Croizer and S. Aridhi, A System-Level Architecture Model for Rapid Prototyping of Heterogeneous Multicore Embedded Systems, DASIP 2009
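
A model of this kind can be pictured as a small graph of operators and communication nodes. The sketch below is a deliberately simplified assumption (it ignores the DMA, the RAM and contention), and the `Architecture` class is hypothetical, not the S-LAM file format. It only shows how a link speed such as 1 Gbit/s turns into a transfer-time estimate.

```python
from dataclasses import dataclass, field


@dataclass
class Architecture:
    operators: set = field(default_factory=set)
    comm_nodes: dict = field(default_factory=dict)  # node name -> speed (bit/s)
    links: set = field(default_factory=set)         # frozenset({component, node})

    def connect(self, component, node):
        """Add an undirected data link between a component and a node."""
        self.links.add(frozenset((component, node)))

    def transfer_time(self, src, dst, n_bytes):
        """Seconds needed to move n_bytes from src to dst through a
        communication node that both are linked to."""
        for node, speed in self.comm_nodes.items():
            if (frozenset((src, node)) in self.links
                    and frozenset((dst, node)) in self.links):
                return n_bytes * 8 / speed
        raise ValueError(f"no communication node connects {src} and {dst}")


arch = Architecture()
arch.operators.update({"core1", "core2", "core3"})
arch.comm_nodes["CN"] = 1_000_000_000  # 1 Gbit/s, as on the slide
for core in sorted(arch.operators):
    arch.connect(core, "CN")

t = arch.transfer_time("core1", "core3", 125_000)
print(f"{t * 1e3:.1f} ms")  # 1.0 ms
```

A mapping or scheduling step can then query such a model for communication costs without knowing anything about the application.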

25 PREESM Inputs
S-LAM (System-Level Architecture Model)
[Example: two DSPs (DSP 1, DSP 2), each with core1, core2, core3, TCP2 and VCP2 coprocessors, a DMA and an SCR at 2 GB/s, connected through RIO at 1 Gb/s]

26 PREESM Inputs
Algorithm/Architecture independence:
- PiSDF graphs are architecture-independent
- S-LAM graphs are application-independent
Scenario: defines information/constraints for the deployment of a specific algorithm on a specific architecture
- Mapping constraints
- Heterogeneous timing constraints

27 Deployment
[Diagram labels: Algorithm, Architecture, PREESM, C compiler, Simulator + Debugger + Profiler, Multicore Runtime; platform with PE and DSP cores, Peripherals, Main Memory]

28 PREESM Deployment
Mapping/Scheduling for static graphs
- State-of-the-art algorithms (FAST, List, ...)
- Latency and load balancing optimization
- Customizable accuracy (w.r.t. communications)
[Gantt chart: core1, core2, core3, core4]
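
As an illustration of the list-scheduling family mentioned here, the sketch below greedily places each task on the allowed core that finishes it earliest. It is a generic textbook variant under stated assumptions (communication costs are ignored, tasks arrive in a valid topological order), not the algorithm implemented in PREESM. All names and timings are hypothetical.

```python
def list_schedule(order, preds, timings, allowed, cores):
    """Greedy list scheduling.

    order   -- tasks in a valid topological order
    preds   -- task -> iterable of predecessor tasks
    timings -- task -> {core: execution time} (heterogeneous timings)
    allowed -- task -> cores the task may run on (mapping constraints)
    """
    core_free = {c: 0 for c in cores}
    finish = {}
    placement = {}
    for task in order:
        ready = max((finish[p] for p in preds.get(task, ())), default=0)
        best = None
        for core in cores:
            if core not in allowed.get(task, cores):
                continue
            start = max(ready, core_free[core])
            end = start + timings[task][core]
            if best is None or end < best[0]:
                best = (end, start, core)
        if best is None:
            raise ValueError(f"no core allowed for task {task}")
        end, start, core = best
        placement[task] = (core, start, end)
        finish[task] = end
        core_free[core] = end
    return placement, max(finish.values())


cores = ["core1", "core2"]
preds = {"B1": ["A"], "B2": ["A"], "C": ["B1", "B2"]}
timings = {t: {"core1": d, "core2": d}
           for t, d in {"A": 10, "B1": 20, "B2": 20, "C": 5}.items()}
allowed = {"A": ["core1"]}  # mapping constraint: A must run on core1

placement, makespan = list_schedule(
    ["A", "B1", "B2", "C"], preds, timings, allowed, cores
)
print(placement)
print("latency:", makespan)  # latency: 35
```

The two firings B1 and B2 end up on different cores, which shortens the latency from 55 (fully sequential) to 35 time units in this example.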

29 PREESM Deployment
Mapping/Scheduling for reconfigurable PiSDF
SPIDER: Synchronous Parameterized and Interfaced Dataflow Embedded Runtime
Master tasks:
- Run jobs
- Map & Schedule
- Manage graphs
- Monitor & Trace
Slave task:
- Run jobs
[Diagram labels: Master, Slave, Jobs, Params, Timings, Data, Pool of data FIFOs]
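
The master/slave split can be pictured with a toy model built on Python threads and queues. This is only a sketch of the general pattern, assuming a round-robin mapping and a master that dispatches but does not run jobs itself; it does not reproduce the SPIDER implementation, and `slave` and `master` are hypothetical names.

```python
import queue
import threading


def slave(job_queue, results):
    """Slave task: run jobs until the stop marker (None) is received."""
    while True:
        job = job_queue.get()
        if job is None:
            break
        name, func, arg = job
        results.put((name, func(arg)))


def master(jobs, n_slaves):
    """Master task: map jobs onto slaves, then collect the results."""
    job_queues = [queue.Queue() for _ in range(n_slaves)]
    results = queue.Queue()
    workers = [threading.Thread(target=slave, args=(q, results))
               for q in job_queues]
    for worker in workers:
        worker.start()
    for index, job in enumerate(jobs):
        job_queues[index % n_slaves].put(job)  # assumed round-robin mapping
    for q in job_queues:
        q.put(None)  # stop marker
    for worker in workers:
        worker.join()
    collected = {}
    while not results.empty():
        name, value = results.get()
        collected[name] = value
    return collected


outputs = master([("a", abs, -1), ("b", abs, -2), ("c", abs, -3)], n_slaves=2)
print(sorted(outputs.items()))  # [('a', 1), ('b', 2), ('c', 3)]
```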

30 PREESM Deployment
Memory optimizations for static graphs
Bounding the memory needs of an application graph to:
- Evaluate the memory requirements
- Adjust the size of architecture memory
- Assess the optimality of a memory allocation
[Figure: available-memory axis starting at 0; below the Lower Bound: insufficient memory; between the Lower Bound and the Upper Bound: possible allocated memory; above the Upper Bound: wasted memory]
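
One way to obtain such bounds, assumed here for illustration, is to describe which buffers can never share memory (exclusions). The upper bound is then the sum of all buffer sizes and the lower bound is the heaviest set of mutually exclusive buffers. The brute-force search below is only suitable for tiny examples, and the buffer names, sizes and exclusions are hypothetical.

```python
from itertools import combinations


def memory_bounds(sizes, exclusions):
    """Return (lower, upper) bounds on the memory needed by the buffers.

    sizes      -- buffer name -> size
    exclusions -- pairs of buffers that may not overlap in memory
    """
    upper = sum(sizes.values())  # every buffer gets its own space
    excl = {frozenset(pair) for pair in exclusions}
    lower = 0
    names = list(sizes)
    for k in range(1, len(names) + 1):
        for group in combinations(names, k):
            # A group counts only if all its buffers exclude each other
            if all(frozenset(pair) in excl
                   for pair in combinations(group, 2)):
                lower = max(lower, sum(sizes[b] for b in group))
    return lower, upper


def classify(available, lower, upper):
    """Place an available memory size on the axis of the figure."""
    if available < lower:
        return "insufficient memory"
    if available <= upper:
        return "possible allocated memory"
    return "wasted memory"


sizes = {"AB": 100, "BC": 75, "CD": 50}
exclusions = [("AB", "BC"), ("BC", "CD")]  # AB and CD may share memory
lower, upper = memory_bounds(sizes, exclusions)
print(lower, upper)  # 175 225
print(classify(150, lower, upper))  # insufficient memory
```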

31 PREESM Deployment
Memory optimizations for static graphs
Graph level memory reuse optimization
[Figure: example SDF graph with actors A, B, C, D, its execution order on Core 1 and Core 2, and the buffers between actor firings with their sizes]

32 PREESM Deployment
Memory optimizations for static graphs
Buffer merging technique for SDF graphs
[Figure: actor B with input buffer AB (30) and output buffers BC (20) and BD (10) towards C and D; memory layouts without and with buffer merging]
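
The buffer sizes on this slide allow a small worked example. It assumes, as one plausible reading of the figure, that the outputs BC and BD of actor B can be stored in place inside its input buffer AB. The variable names are hypothetical.

```python
# Buffer sizes read from the slide
buffers = {"AB": 30, "BC": 20, "BD": 10}

# Without merging, every buffer has its own memory space
no_merging = sum(buffers.values())

# Assumption: with merging, BC and BD overlap the memory of AB, so the
# footprint is the larger of the input and the combined outputs
merged = max(buffers["AB"], buffers["BC"] + buffers["BD"])

saving = 1 - merged / no_merging
print(no_merging, merged, f"{saving:.0%}")  # 60 30 50%
```

The 50% figure only describes this three-buffer example under the stated assumption; it is unrelated to measured results.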

33 PREESM Deployment
Memory optimizations for static graphs
- Multiple input/output buffers merge.
- 48% less memory than state-of-the-art techniques.
- Techniques are independent from host language.
- No modification of the SDF MoC/application graphs.

34 PREESM Deployment
Energy optimization: platform
Odroid, Exynos 5 big.LITTLE: 4x A7 + 4x A15

35 PREESM Deployment
Energy optimization setup
[Diagram: image processing application on core1 to core4; QoS expressed as a P-value (P=0, P=0.5, P=1); Linux-based runtime (Åbo Akademi) driving DVFS and DPM; Odroid, Exynos 5 big.LITTLE: 4x A7 + 4x A15]

36 PREESM Deployment
Energy optimization results
20% energy savings on a parallel Sobel + sequential postprocessing w.r.t. the Linux Completely Fair Scheduler and ondemand governor
S. Holmbacka, E. Nogues, M. Pelcat, S. Lafond, and J. Lilius. Energy Efficiency and Performance Management of Parallel Dataflow Applications. DASIP 2014, Madrid

37 Outputs
[Diagram labels: Algorithm, Architecture, PREESM, C compiler, Simulator + Debugger + Profiler, Multicore Runtime; platform with PE and DSP cores, Peripherals, Main Memory]

38 PREESM Outputs
Generation of self-timed multicore code
[Figure: graph with actors A, B, C, D; timeline of the actor firings on operators o1 and o2]
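
In self-timed code, each core runs its actors in a fixed order and simply blocks when the data it needs has not arrived yet, with no global clock. The sketch below mimics this with two Python threads and a blocking FIFO. The actor bodies, the mapping of A and B on one core and C on the other, and all names are assumptions made for illustration; the actual generated code is C.

```python
import queue
import threading


def core1(to_core2, iterations):
    """Fixed order on the first core: A, then B, then send."""
    for i in range(iterations):
        a_out = i            # actor A: produce a token
        b_out = a_out * 2    # actor B: transform it
        to_core2.put(b_out)  # inter-core communication


def core2(from_core1, sink, iterations):
    """Fixed order on the second core: receive, then C."""
    for _ in range(iterations):
        token = from_core1.get()  # blocks until core1 has produced data
        sink.append(token + 1)    # actor C: consume the token


fifo = queue.Queue()
results = []
threads = [
    threading.Thread(target=core1, args=(fifo, 3)),
    threading.Thread(target=core2, args=(fifo, results, 3)),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(results)  # [1, 3, 5]
```

The blocking `get` is the only synchronization: the order of firings on each core is decided at compile time, while their exact start times are decided by data availability at run time.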

39 PREESM Outputs
Code generation for multiple targets
- Multi-C6X DSPs: TMS320C6678 from Texas Instruments. Supports the activation of the DSP caches.
- Multi-x86 and multi-ARM CPUs: Linux and Windows, pthread
- OMAP4 heterogeneous platform: dual-core ARM Cortex-A9, 2 Cortex-M3, and a C64xT DSP.

40 Demo Time
[Diagram labels: Algorithm, Architecture, PREESM, C compiler, Simulator + Debugger + Profiler, Multicore Runtime; platform with PE and DSP cores, Peripherals, Main Memory]

41 Summary
PREESM features:
- Open source tool: available on GitHub
- Research-oriented tool: new models, optimizations, scheduling
- Eclipse-based integrated tool: several plug-ins, metamodels
- Extended web tutorials

42 Questions?

Dataflow-Based Rapid Prototyping for Multicore DSP Systems

Dataflow-Based Rapid Prototyping for Multicore DSP Systems Dataflow-Based Rapid Prototyping for Multicore DSP Systems Technical Report PREESM/2014-05TR01, 2014 Maxime Pelcat, Karol Desnos, Julien Heulot, Clément Guy, Jean-François Nezan, Slaheddine Aridhi 1 Introduction

More information

Throughput constraint for Synchronous Data Flow Graphs

Throughput constraint for Synchronous Data Flow Graphs Throughput constraint for Synchronous Data Flow Graphs *Alessio Bonfietti Michele Lombardi Michela Milano Luca Benini!"#$%&'()*+,-)./&0&20304(5 60,7&-8990,.+:&;/&."!?@A>&"'&=,0B+C. !"#$%&'()* Resource

More information

Performance Monitor Based Power Management for big.little Platforms

Performance Monitor Based Power Management for big.little Platforms Performance Monitor Based Power Management for big.little Platforms Simon Holmbacka, Sébastien Lafond, Johan Lilius Department of Information Technologies, Åbo Akademi University 20530 Turku, Finland firstname.lastname@abo.fi

More information

A Generic Network Interface Architecture for a Networked Processor Array (NePA)

A Generic Network Interface Architecture for a Networked Processor Array (NePA) A Generic Network Interface Architecture for a Networked Processor Array (NePA) Seung Eun Lee, Jun Ho Bahn, Yoon Seok Yang, and Nader Bagherzadeh EECS @ University of California, Irvine Outline Introduction

More information

Going Linux on Massive Multicore

Going Linux on Massive Multicore Embedded Linux Conference Europe 2013 Going Linux on Massive Multicore Marta Rybczyńska 24th October, 2013 Agenda Architecture Linux Port Core Peripherals Debugging Summary and Future Plans 2 Agenda Architecture

More information

OPTIMIZE DMA CONFIGURATION IN ENCRYPTION USE CASE. Guillène Ribière, CEO, System Architect

OPTIMIZE DMA CONFIGURATION IN ENCRYPTION USE CASE. Guillène Ribière, CEO, System Architect OPTIMIZE DMA CONFIGURATION IN ENCRYPTION USE CASE Guillène Ribière, CEO, System Architect Problem Statement Low Performances on Hardware Accelerated Encryption: Max Measured 10MBps Expectations: 90 MBps

More information

GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications

GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications Harris Z. Zebrowitz Lockheed Martin Advanced Technology Laboratories 1 Federal Street Camden, NJ 08102

More information

MPSoC Designs: Driving Memory and Storage Management IP to Critical Importance

MPSoC Designs: Driving Memory and Storage Management IP to Critical Importance MPSoC Designs: Driving Storage Management IP to Critical Importance Design IP has become an essential part of SoC realization it is a powerful resource multiplier that allows SoC design teams to focus

More information

Which ARM Cortex Core Is Right for Your Application: A, R or M?

Which ARM Cortex Core Is Right for Your Application: A, R or M? Which ARM Cortex Core Is Right for Your Application: A, R or M? Introduction The ARM Cortex series of cores encompasses a very wide range of scalable performance options offering designers a great deal

More information

High Performance or Cycle Accuracy?

High Performance or Cycle Accuracy? CHIP DESIGN High Performance or Cycle Accuracy? You can have both! Bill Neifert, Carbon Design Systems Rob Kaye, ARM ATC-100 AGENDA Modelling 101 & Programmer s View (PV) Models Cycle Accurate Models Bringing

More information

Designing and Embodiment of Software that Creates Middle Ware for Resource Management in Embedded System

Designing and Embodiment of Software that Creates Middle Ware for Resource Management in Embedded System , pp.97-108 http://dx.doi.org/10.14257/ijseia.2014.8.6.08 Designing and Embodiment of Software that Creates Middle Ware for Resource Management in Embedded System Suk Hwan Moon and Cheol sick Lee Department

More information

White Paper. Real-time Capabilities for Linux SGI REACT Real-Time for Linux

White Paper. Real-time Capabilities for Linux SGI REACT Real-Time for Linux White Paper Real-time Capabilities for Linux SGI REACT Real-Time for Linux Abstract This white paper describes the real-time capabilities provided by SGI REACT Real-Time for Linux. software. REACT enables

More information

Real-Time Operating Systems for MPSoCs

Real-Time Operating Systems for MPSoCs Real-Time Operating Systems for MPSoCs Hiroyuki Tomiyama Graduate School of Information Science Nagoya University http://member.acm.org/~hiroyuki MPSoC 2009 1 Contributors Hiroaki Takada Director and Professor

More information

Embedded Development Tools

Embedded Development Tools Embedded Development Tools Software Development Tools by ARM ARM tools enable developers to get the best from their ARM technology-based systems. Whether implementing an ARM processor-based SoC, writing

More information

EVALUATING POWER MANAGEMENT CAPABILITIES OF LOW-POWER CLOUD PLATFORMS. Jens Smeds

EVALUATING POWER MANAGEMENT CAPABILITIES OF LOW-POWER CLOUD PLATFORMS. Jens Smeds EVALUATING POWER MANAGEMENT CAPABILITIES OF LOW-POWER CLOUD PLATFORMS Jens Smeds Master of Science Thesis Supervisor: Prof. Johan Lilius Advisor: Dr. Sébastien Lafond Embedded Systems Laboratory Department

More information

System Considerations

System Considerations System Considerations Interfacing Performance Power Size Ease-of Use Programming Interfacing Debugging Cost Device cost System cost Development cost Time to market Integration Peripherals Different Needs?

More information

Resource Utilization of Middleware Components in Embedded Systems

Resource Utilization of Middleware Components in Embedded Systems Resource Utilization of Middleware Components in Embedded Systems 3 Introduction System memory, CPU, and network resources are critical to the operation and performance of any software system. These system

More information

MPSoC Virtual Platforms

MPSoC Virtual Platforms CASTNESS 2007 Workshop MPSoC Virtual Platforms Rainer Leupers Software for Systems on Silicon (SSS) RWTH Aachen University Institute for Integrated Signal Processing Systems Why focus on virtual platforms?

More information

Thèse. Memory Study and Dataflow Representations for Rapid Prototyping of Signal Processing Applications on MPSoCs

Thèse. Memory Study and Dataflow Representations for Rapid Prototyping of Signal Processing Applications on MPSoCs Thèse THESE INSA Rennes sous le sceau de l Université européenne de Bretagne pour obtenir le titre de DOCTEUR DE L INSA DE RENNES Spécialité : Traitement du Signal et des Images présentée par Karol Desnos

More information

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com Best Practises for LabVIEW FPGA Design Flow 1 Agenda Overall Application Design Flow Host, Real-Time and FPGA LabVIEW FPGA Architecture Development FPGA Design Flow Common FPGA Architectures Testing and

More information

Stream Processing on GPUs Using Distributed Multimedia Middleware

Stream Processing on GPUs Using Distributed Multimedia Middleware Stream Processing on GPUs Using Distributed Multimedia Middleware Michael Repplinger 1,2, and Philipp Slusallek 1,2 1 Computer Graphics Lab, Saarland University, Saarbrücken, Germany 2 German Research

More information

NORTHEASTERN UNIVERSITY Graduate School of Engineering. Thesis Title: CRASH: Cognitive Radio Accelerated with Software and Hardware

NORTHEASTERN UNIVERSITY Graduate School of Engineering. Thesis Title: CRASH: Cognitive Radio Accelerated with Software and Hardware NORTHEASTERN UNIVERSITY Graduate School of Engineering Thesis Title: CRASH: Cognitive Radio Accelerated with Software and Hardware Author: Jonathon Pendlum Department: Electrical and Computer Engineering

More information

MAQAO Performance Analysis and Optimization Tool

MAQAO Performance Analysis and Optimization Tool MAQAO Performance Analysis and Optimization Tool Andres S. CHARIF-RUBIAL andres.charif@uvsq.fr Performance Evaluation Team, University of Versailles S-Q-Y http://www.maqao.org VI-HPS 18 th Grenoble 18/22

More information

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association Making Multicore Work and Measuring its Benefits Markus Levy, president EEMBC and Multicore Association Agenda Why Multicore? Standards and issues in the multicore community What is Multicore Association?

More information

Energiatehokas laskenta Ubi-sovelluksissa

Energiatehokas laskenta Ubi-sovelluksissa Energiatehokas laskenta Ubi-sovelluksissa Jarmo Takala Tampereen teknillinen yliopisto Tietokonetekniikan laitos email: jarmo.takala@tut.fi Energy-Efficiency Comparison: VGA 30 frames/s, 512kbit/s Software

More information

Overview. Surveillance Systems. The Smart Camera - Hardware

Overview. Surveillance Systems. The Smart Camera - Hardware Overview A Mobile AgentAgent-based System for Dynamic Task Allocation in Clusters of Embedded Smart Cameras Introduction The Smart Camera Michael Bramberger1,, Bernhard Rinner1, and Helmut Schwabach Surveillance

More information

Virtual Network Provisioning and Fault-Management across Multiple Domains

Virtual Network Provisioning and Fault-Management across Multiple Domains Virtual Network Provisioning and Fault-Management across Multiple Domains Distinguished Speaker Series Democritus University of Thrace, Greece Panagiotis Papadimitriou November 2010 Introduction The Internet

More information

A Tutorial On Network Marketing And Video Transoding

A Tutorial On Network Marketing And Video Transoding SCALABLE DISTRIBUTED VIDEO TRANSCODING ARCHITECTURE Tewodros Deneke Master of Science Thesis Supervisor: Prof. Johan Lilius Advisor: Dr. Sébastien Lafond Embedded Systems Laboratory Department of Information

More information

Hardware Acceleration for Just-In-Time Compilation on Heterogeneous Embedded Systems

Hardware Acceleration for Just-In-Time Compilation on Heterogeneous Embedded Systems Hardware Acceleration for Just-In-Time Compilation on Heterogeneous Embedded Systems A. Carbon, Y. Lhuillier, H.-P. Charles CEA LIST DACLE division Embedded Computing Embedded Software Laboratories France

More information

How To Design An Image Processing System On A Chip

How To Design An Image Processing System On A Chip RAPID PROTOTYPING PLATFORM FOR RECONFIGURABLE IMAGE PROCESSING B.Kovář 1, J. Kloub 1, J. Schier 1, A. Heřmánek 1, P. Zemčík 2, A. Herout 2 (1) Institute of Information Theory and Automation Academy of

More information

Cisco Integrated Services Routers Performance Overview

Cisco Integrated Services Routers Performance Overview Integrated Services Routers Performance Overview What You Will Learn The Integrated Services Routers Generation 2 (ISR G2) provide a robust platform for delivering WAN services, unified communications,

More information

A case study of mobile SoC architecture design based on transaction-level modeling

A case study of mobile SoC architecture design based on transaction-level modeling A case study of mobile SoC architecture design based on transaction-level modeling Eui-Young Chung School of Electrical & Electronic Eng. Yonsei University 1 EUI-YOUNG(EY) CHUNG, EY CHUNG Outline Introduction

More information

7a. System-on-chip design and prototyping platforms

7a. System-on-chip design and prototyping platforms 7a. System-on-chip design and prototyping platforms Labros Bisdounis, Ph.D. Department of Computer and Communication Engineering 1 What is System-on-Chip (SoC)? System-on-chip is an integrated circuit

More information

Architectures and Platforms

Architectures and Platforms Hardware/Software Codesign Arch&Platf. - 1 Architectures and Platforms 1. Architecture Selection: The Basic Trade-Offs 2. General Purpose vs. Application-Specific Processors 3. Processor Specialisation

More information

Inspecting GNU Radio Applications with ControlPort and Performance Counters

Inspecting GNU Radio Applications with ControlPort and Performance Counters Inspecting GNU Radio Applications with ControlPort and Performance Counters Thomas W. Rondeau University of Pennsylvania Philadelphia, PA 19104, USA tom@trondeau.com Timothy O Shea University of Maryland

More information

Operating System Support for Multiprocessor Systems-on-Chip

Operating System Support for Multiprocessor Systems-on-Chip Operating System Support for Multiprocessor Systems-on-Chip Dr. Gabriel marchesan almeida Agenda. Introduction. Adaptive System + Shop Architecture. Preliminary Results. Perspectives & Conclusions Dr.

More information

Partial and Dynamic reconfiguration of FPGAs: a top down design methodology for an automatic implementation

Partial and Dynamic reconfiguration of FPGAs: a top down design methodology for an automatic implementation Partial and Dynamic reconfiguration of FPGAs: a top down design methodology for an automatic implementation Florent Berthelot, Fabienne Nouvel, Dominique Houzet To cite this version: Florent Berthelot,

More information

Linux Performance Optimizations for Big Data Environments

Linux Performance Optimizations for Big Data Environments Linux Performance Optimizations for Big Data Environments Dominique A. Heger Ph.D. DHTechnologies (Performance, Capacity, Scalability) www.dhtusa.com Data Nubes (Big Data, Hadoop, ML) www.datanubes.com

More information

Accelerate Cloud Computing with the Xilinx Zynq SoC

Accelerate Cloud Computing with the Xilinx Zynq SoC X C E L L E N C E I N N E W A P P L I C AT I O N S Accelerate Cloud Computing with the Xilinx Zynq SoC A novel reconfigurable hardware accelerator speeds the processing of applications based on the MapReduce

More information

Low-Overhead Hard Real-time Aware Interconnect Network Router

Low-Overhead Hard Real-time Aware Interconnect Network Router Low-Overhead Hard Real-time Aware Interconnect Network Router Michel A. Kinsy! Department of Computer and Information Science University of Oregon Srinivas Devadas! Department of Electrical Engineering

More information

Kalray MPPA Massively Parallel Processing Array

Kalray MPPA Massively Parallel Processing Array Kalray MPPA Massively Parallel Processing Array Next-Generation Accelerated Computing February 2015 2015 Kalray, Inc. All Rights Reserved February 2015 1 Accelerated Computing 2015 Kalray, Inc. All Rights

More information

Intel CoFluent Methodology for SysML *

Intel CoFluent Methodology for SysML * Intel CoFluent Methodology for SysML * UML* SysML* MARTE* Flow for Intel CoFluent Studio An Intel CoFluent Design White Paper By Thomas Robert and Vincent Perrier www.cofluent.intel.com Acronyms and abbreviations

More information

Runtime Verification for Real-Time Automotive Embedded Software

Runtime Verification for Real-Time Automotive Embedded Software Runtime Verification for Real-Time Automotive Embedded Software S. Cotard, S. Faucou, J.-L. Béchennec, A. Queudet, Y. Trinquet 10th school of Modelling and Verifying Parallel processes (MOVEP) Runtime

More information

Multi-Threading Performance on Commodity Multi-Core Processors

Multi-Threading Performance on Commodity Multi-Core Processors Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction

More information

A Dynamic Resource Management with Energy Saving Mechanism for Supporting Cloud Computing

A Dynamic Resource Management with Energy Saving Mechanism for Supporting Cloud Computing A Dynamic Resource Management with Energy Saving Mechanism for Supporting Cloud Computing Liang-Teh Lee, Kang-Yuan Liu, Hui-Yang Huang and Chia-Ying Tseng Department of Computer Science and Engineering,

More information

On some Potential Research Contributions to the Multi-Core Enterprise

On some Potential Research Contributions to the Multi-Core Enterprise On some Potential Research Contributions to the Multi-Core Enterprise Oded Maler CNRS - VERIMAG Grenoble, France February 2009 Background This presentation is based on observations made in the Athole project

More information

THE most significant value of software-defined radio

THE most significant value of software-defined radio A Fixed-Point DSP Architecture for Software-Defined Radio Wouter Kriegler and Gert-Jan van Rooyen Department of Electrical and Electronic Engineering University of Stellenbosch Abstract Software-defined

More information

Product Development Flow Including Model- Based Design and System-Level Functional Verification

Product Development Flow Including Model- Based Design and System-Level Functional Verification Product Development Flow Including Model- Based Design and System-Level Functional Verification 2006 The MathWorks, Inc. Ascension Vizinho-Coutry, avizinho@mathworks.fr Agenda Introduction to Model-Based-Design

More information

Developing reliable Multi-Core Embedded-Systems with NI Linux Real-Time

Developing reliable Multi-Core Embedded-Systems with NI Linux Real-Time Developing reliable Multi-Core Embedded-Systems with NI Linux Real-Time Oliver Bruder National Instruments Switzerland oliver.bruder@ Embedded Product Design Surveys 66% Product designs complete over budget

More information

Embedded Systems: map to FPGA, GPU, CPU?

Embedded Systems: map to FPGA, GPU, CPU? Embedded Systems: map to FPGA, GPU, CPU? Jos van Eijndhoven jos@vectorfabrics.com Bits&Chips Embedded systems Nov 7, 2013 # of transistors Moore s law versus Amdahl s law Computational Capacity Hardware

More information

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM 1 The ARM architecture processors popular in Mobile phone systems 2 ARM Features ARM has 32-bit architecture but supports 16 bit

More information

Application of Android OS as Real-time Control Platform**

Application of Android OS as Real-time Control Platform** AUTOMATYKA/ AUTOMATICS 2013 Vol. 17 No. 2 http://dx.doi.org/10.7494/automat.2013.17.2.197 Krzysztof Ko³ek* Application of Android OS as Real-time Control Platform** 1. Introduction An android operating

More information

Deeply Embedded Real-Time Hypervisors for the Automotive Domain Dr. Gary Morgan, ETAS/ESC

Deeply Embedded Real-Time Hypervisors for the Automotive Domain Dr. Gary Morgan, ETAS/ESC Deeply Embedded Real-Time Hypervisors for the Automotive Domain Dr. Gary Morgan, ETAS/ESC 1 Public ETAS/ESC 2014-02-20 ETAS GmbH 2014. All rights reserved, also regarding any disposal, exploitation, reproduction,

More information

DIPLODOCUS: An Environment for. the Hardware/Software Partitioning of. Institut Mines-Telecom. Complex Embedded Systems

DIPLODOCUS: An Environment for. the Hardware/Software Partitioning of. Institut Mines-Telecom. Complex Embedded Systems DIPLODOCUS: An Environment for Institut Mines-Telecom the Hardware/Software Partitioning of Complex Embedded Systems Ludovic Apvrille, ludovic.apvrille@telecom-paristech.fr ETR 2013, Toulouse, France Goals

More information

CprE 588 Embedded Computer Systems Homework #1 Assigned: February 5 Due: February 15

CprE 588 Embedded Computer Systems Homework #1 Assigned: February 5 Due: February 15 CprE 588 Embedded Computer Systems Homework #1 Assigned: February 5 Due: February 15 Directions: Please submit this assignment by the due date via WebCT. Submissions should be in the form of 1) a PDF file

More information

Experience with the integration of distribution middleware into partitioned systems

Experience with the integration of distribution middleware into partitioned systems Experience with the integration of distribution middleware into partitioned systems Héctor Pérez Tijero (perezh@unican.es) J. Javier Gutiérrez García (gutierjj@unican.es) Computers and Real-Time Group,

More information

Design a medical application for Android platform using model-driven development approach

Design a medical application for Android platform using model-driven development approach Design a medical application for Android platform using model-driven development approach J. Yepes, L. Cobaleda 2, J. Villa D, J. Aedo ARTICA, Microelectronic and Control Research Group 2 ARTICA, Software

More information

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

Seeking Opportunities for Hardware Acceleration in Big Data Analytics Seeking Opportunities for Hardware Acceleration in Big Data Analytics Paul Chow High-Performance Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Toronto Who

More information

Optimizing Configuration and Application Mapping for MPSoC Architectures

Optimizing Configuration and Application Mapping for MPSoC Architectures Optimizing Configuration and Application Mapping for MPSoC Architectures École Polytechnique de Montréal, Canada Email : Sebastien.Le-Beux@polymtl.ca 1 Multi-Processor Systems on Chip (MPSoC) Design Trends

More information

Real-time Process Network Sonar Beamformer

Real-time Process Network Sonar Beamformer Real-time Process Network Sonar Gregory E. Allen Applied Research Laboratories gallen@arlut.utexas.edu Brian L. Evans Dept. Electrical and Computer Engineering bevans@ece.utexas.edu The University of Texas

More information

STLinux Software development environment

STLinux Software development environment Development environment The STLinux Development Environment is a comprehensive set of tools and packages for developing Linux-based applications on ST's consumer

System Design Issues in Embedded Processing

System Design Issues in Embedded Processing 9/16/10 Jacob Borgeson 1 Agenda What does TI do? From MCU to MPU to DSP: What are some trends? Design Challenges Tools to Help 2 TI - the complete system The

Cloud Based Application Architectures using Smart Computing

Cloud Based Application Architectures using Smart Computing How to Use this Guide Joyent Smart Technology represents a sophisticated evolution in cloud computing infrastructure. Most cloud computing products

The Role of Precise Timing in High-Speed, Low-Latency Trading

The Role of Precise Timing in High-Speed, Low-Latency Trading The race to zero nanoseconds Whether measuring network latency or comparing real-time trading data from different computers on the planet,

Development With ARM DS-5. Mervyn Liu FAE Aug. 2015

Development With ARM DS-5 Mervyn Liu FAE Aug. 2015 1 Support for all Stages of Product Development Single IDE, compiler, debug, trace and performance analysis for all stages in the product development

Application Note: AN00141 xCORE-XA - Application Development

Application Note: AN00141 xCORE-XA - Application Development This application note shows how to create a simple example which targets the XMOS xCORE-XA device and demonstrates how to build and run this

VALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS

VALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS Perhaad Mistry, Yash Ukidave, Dana Schaa, David Kaeli Department of Electrical and Computer Engineering Northeastern University,

OPART: Towards an Open Platform for Abstraction of Real-Time Communication in Cross-Domain Applications

OPART: Towards an Open Platform for Abstraction of Real-Time Communication in Cross-Domain Applications Simplification of Developing Process in Real-time Networked Medical Systems Morteza Hashemi Farzaneh,

Maintaining Non-Stop Services with Multi Layer Monitoring

Maintaining Non-Stop Services with Multi Layer Monitoring Lahav Savir System Architect and CEO of Emind Systems lahavs@emindsys.com www.emindsys.com The approach Non-stop applications can't leave on their

12. Introduction to Virtual Machines

12. Introduction to Virtual Machines Modern Applications Challenges of Virtual Machine Monitors Historical Perspective Classification 332 / 352 12. Introduction to

Automatic CUDA Code Synthesis Framework for Multicore CPU and GPU architectures

Automatic CUDA Code Synthesis Framework for Multicore CPU and GPU architectures 1 Hanwoong Jung, and 2 Youngmin Yi, 1 Soonhoi Ha 1 School of EECS, Seoul National University, Seoul, Korea {jhw7884, sha}@iris.snu.ac.kr

Chapter 11 I/O Management and Disk Scheduling

Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 11 I/O Management and Disk Scheduling Dave Bremer Otago Polytechnic, NZ 2008, Prentice Hall I/O Devices Roadmap Organization

SOFTWARE DEVELOPMENT FOR EMBEDDED SYSTEMS

SOFTWARE DEVELOPMENT FOR EMBEDDED SYSTEMS Trends and Challenges in Developing Software for Embedded Systems Motivation This survey addresses software development in the field of embedded systems. Our goal

Software Synthesis from Dataflow Models for G and LabVIEW

Presented at the Thirty-second Annual Asilomar Conference on Signals, Systems, and Computers. Pacific Grove, California, U.S.A., November 1998 Software Synthesis from Dataflow Models for G and LabVIEW

System Software Integration: An Expansive View. Overview

Software Integration: An Expansive View Steven P. Smith Design of Embedded Systems EE382V Fall, 2009 EE382 SoC Design Software Integration SPS-1 University of Texas at Austin Overview Some Definitions Introduction:

BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA

BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA AGENDA INTRO TO BEAGLEBONE BLACK HARDWARE & SPECS CORTEX-A8 ARMV7 PROCESSOR PROS & CONS VS RASPBERRY PI WHEN TO USE BEAGLEBONE BLACK Single

Better Trace for Better Software

Better Trace for Better Software Introducing the new ARM CoreSight System Trace Macrocell and Trace Memory Controller Roberto Mijat Senior Software Solutions Architect Synopsis The majority of engineering

Using Linux Clusters as VoD Servers

HAC LUCE Using Linux Clusters as VoD Servers Víctor M. Gulías Fernández gulias@lfcia.org Computer Science Department University of A Corunha funded by: Outline Background: The Borg Cluster Video on Demand.

LTE Mobility Enhancements

LTE Mobility Enhancements Qualcomm Incorporated February 2010 Table of Contents [1] Introduction... 1 [2] LTE Release 8 Handover Procedures... 2 2.1 Backward Handover... 2 2.2 RLF Handover... 3 2.3 NAS Recovery... 5 [3] LTE Forward


Echtzeittesten mit MathWorks leicht gemacht Simulink Real-Time Tobias Kuschmider Applikationsingenieur

Echtzeittesten mit MathWorks leicht gemacht Simulink Real-Time Tobias Kuschmider Applikationsingenieur 2015 The MathWorks, Inc. 1 Model-Based Design Continuous Verification and Validation Requirements

Sierraware Overview. Simply Secure

Sierraware Overview Simply Secure Sierraware Software Suite SierraTEE/Micro Kernel TrustZone/GlobalPlatform TEE SierraVisor: Bare Metal Hypervisor Hypervisor for ARM Para-virtualization, TrustZone Virtualization,

High-Level Synthesis for FPGA Designs

High-Level Synthesis for FPGA Designs BRINGING YOU THE NEXT LEVEL IN EMBEDDED DEVELOPMENT Frank de Bont Trainer consultant Cereslaan 10b 5384 VT Heesch

Enabling High performance Big Data platform with RDMA

Enabling High performance Big Data platform with RDMA Tong Liu HPC Advisory Council Oct 7th, 2014 Shortcomings of Hadoop Administration tooling Performance Reliability SQL support Backup and recovery

Software Engineering for LabVIEW Applications. Elijah Kerry LabVIEW Product Manager

Software Engineering for LabVIEW Applications Elijah Kerry LabVIEW Product Manager 1 Ensuring Software Quality and Reliability Goals 1. Deliver a working product 2. Prove it works right 3. Mitigate risk

Automated Software Testing of Memory Performance in Embedded GPUs. Sudipta Chattopadhyay, Petru Eles and Zebo Peng, Linköping University

Automated Software Testing of Memory Performance in Embedded GPUs Sudipta Chattopadhyay, Petru Eles and Zebo Peng, Linköping University 1 State-of-the-art in Detecting Performance Loss Input Program profiling

Practical Performance Understanding the Performance of Your Application

Neil Masson IBM Java Service Technical Lead 25th September 2012 Practical Performance Understanding the Performance of Your Application 1 WebSphere User Group: Practical Performance Understand the Performance

Extending the Power of FPGAs. Salil Raje, Xilinx

Extending the Power of FPGAs Salil Raje, Xilinx Extending the Power of FPGAs The Journey has Begun Salil Raje Xilinx Corporate Vice President Software and IP Products Development Agenda The Evolution of

PERFORMANCE TUNING ORACLE RAC ON LINUX

PERFORMANCE TUNING ORACLE RAC ON LINUX By: Edward Whalen Performance Tuning Corporation INTRODUCTION Performance tuning is an integral part of the maintenance and administration of the Oracle database

Performance of Host Identity Protocol on Nokia Internet Tablet

Performance of Host Identity Protocol on Nokia Internet Tablet Andrey Khurri Helsinki Institute for Information Technology HIP Research Group IETF 68 Prague March 23, 2007

Embedded System Hardware - Processing (Part II)

12 Embedded System Hardware - Processing (Part II) Jian-Jia Chen (Slides are based on Peter Marwedel) Informatik 12 TU Dortmund Germany Springer, 2010 November 11, 2014 These slides use Microsoft clip arts.

Parallel Firewalls on General-Purpose Graphics Processing Units

Parallel Firewalls on General-Purpose Graphics Processing Units Manoj Singh Gaur and Vijay Laxmi Kamal Chandra Reddy, Ankit Tharwani, Ch.Vamshi Krishna, Lakshminarayanan.V Department of Computer Engineering

CHAPTER 4: SOFTWARE PART OF RTOS, THE SCHEDULER

CHAPTER 4: SOFTWARE PART OF RTOS, THE SCHEDULER To provide the transparency of the system the user space is implemented in software as Scheduler. Given the sketch of the architecture, a low overhead scheduler

Feb.2012 Benefits of the big.LITTLE Architecture

Feb.2012 Benefits of the big.LITTLE Architecture Hyun-Duk Cho, Ph. D. Principal Engineer (hd68.cho@samsung.com) Kisuk Chung, Senior Engineer (kiseok.jeong@samsung.com) Taehoon Kim, Vice President (taehoon1@samsung.com)

EMC Documentum Interactive Delivery Services Accelerated Overview

White Paper EMC Documentum Interactive Delivery Services Accelerated A Detailed Review Abstract This white paper presents an overview of EMC Documentum Interactive Delivery Services Accelerated (IDSx).

The Kiel Reactive Processor

The Kiel Reactive Processor Reactive Processing beyond the KEP Claus Traulsen Christian-Albrechts Universität zu Kiel Synchron 2007 29. November 2007 Claus Traulsen The Kiel Reactive Processor Slide 1

Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip

Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip Ms Lavanya Thunuguntla 1, Saritha Sapa 2 1 Associate Professor, Department of ECE, HITAM, Telangana

FPGA-based Multithreading for In-Memory Hash Joins

FPGA-based Multithreading for In-Memory Hash Joins Robert J. Halstead, Ildar Absalyamov, Walid A. Najjar, Vassilis J. Tsotras University of California, Riverside Outline Background What are FPGAs Multithreaded

Applied Micro development platform. ZT Systems (ST based) HP Redstone platform. Mitac Dell Copper platform. ARM in Servers

ZT Systems (ST based) Applied Micro development platform HP Redstone platform Mitac Dell Copper platform ARM in Servers 1 Server Ecosystem Momentum 2009: Internal ARM trials hosting part of website on

E6895 Advanced Big Data Analytics Lecture 14: NVIDIA GPU Examples and GPU on iOS devices

E6895 Advanced Big Data Analytics Lecture 14: NVIDIA GPU Examples and GPU on iOS devices Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist,