SoCLib : Une plate-forme de prototypage virtuel pour systèmes multi-processeurs intégrés sur puce

Similar documents
Virtual Prototyping of NoC-based MPSoC : Fundamentals & Case Studies

Outline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip

7a. System-on-chip design and prototyping platforms

Architectures and Platforms

A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications

Operating System Support for Multiprocessor Systems-on-Chip

A Generic Network Interface Architecture for a Networked Processor Array (NePA)

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai Jens Onno Krah

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001

sept-2002 Computer architecture and software cells for broadband networks Va avec

Chapter 1 Computer System Overview

Introduction to Exploration and Optimization of Multiprocessor Embedded Architectures based on Networks On-Chip

Introduction to System-on-Chip

All Programmable Logic. Hans-Joachim Gelke Institute of Embedded Systems. Zürcher Fachhochschule

Operating Systems 4 th Class

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-17: Memory organisation, and types of memory

Networking Virtualization Using FPGAs

What is a System on a Chip?

Multiprocessor System-on-Chip

Introducción. Diseño de sistemas digitales.1

Chapter 02: Computer Organization. Lesson 04: Functional units and components in a computer organization Part 3 Bus Structures

Introduction to Cloud Computing

Linux A multi-purpose executive support for civil avionics applications?

SOC architecture and design

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

OpenSPARC T1 Processor

System Considerations

Codesign: The World Of Practice

Am186ER/Am188ER AMD Continues 16-bit Innovation

A Scalable VISC Processor Platform for Modern Client and Cloud Workloads

DS1104 R&D Controller Board

Control 2004, University of Bath, UK, September 2004

Networking Remote-Controlled Moving Image Monitoring System

CHAPTER 1: OPERATING SYSTEM FUNDAMENTALS

Hardware Acceleration for Just-In-Time Compilation on Heterogeneous Embedded Systems

OC By Arsene Fansi T. POLIMI

Trampoline OSEK-VDX & AUTOSAR Compliant Open Source Real-Time Operating System

ESE566 REPORT3. Design Methodologies for Core-based System-on-Chip HUA TANG OVIDIU CARNU

Computer System Design. System-on-Chip

A Comparison of Distributed Systems: ChorusOS and Amoeba

Breaking the Interleaving Bottleneck in Communication Applications for Efficient SoC Implementations

Contents. Chapter 1. Introduction

Von der Hardware zur Software in FPGAs mit Embedded Prozessoren. Alexander Hahn Senior Field Application Engineer Lattice Semiconductor

Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit.

Weighted Total Mark. Weighted Exam Mark

Code generation under Control

Applying the Benefits of Network on a Chip Architecture to FPGA System Design

Chapter 18: Database System Architectures. Centralized Systems

Real-Time Operating Systems for MPSoCs

Computer Organization

The Orca Chip... Heart of IBM s RISC System/6000 Value Servers

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

PikeOS: Multi-Core RTOS for IMA. Dr. Sergey Tverdyshev SYSGO AG , Moscow

Centralized Systems. A Centralized Computer System. Chapter 18: Database System Architectures

Network connectivity controllers

Operating Systems for Parallel Processing Assistent Lecturer Alecu Felician Economic Informatics Department Academy of Economic Studies Bucharest

CHAPTER 1 INTRODUCTION

Throughput constraint for Synchronous Data Flow Graphs

Open Source Software

System Design Issues in Embedded Processing

Architekturen und Einsatz von FPGAs mit integrierten Prozessor Kernen. Hans-Joachim Gelke Institute of Embedded Systems Professur für Mikroelektronik

Overview. Surveillance Systems. The Smart Camera - Hardware

High Performance or Cycle Accuracy?

Computer Performance. Topic 3. Contents. Prerequisite knowledge Before studying this topic you should be able to:

GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications

Performance Comparison of RTOS

Chapter 13 Embedded Operating Systems

Master Interface For On-Chip Hardware Accelerator Burst Communications

Design and Implementation of the Heterogeneous Multikernel Operating System

Computer System: User s View. Computer System Components: High Level View. Input. Output. Computer. Computer System: Motherboard Level

Introduction to Virtual Machines

Architecture of Hitachi SR-8000

FPGA-based Multithreading for In-Memory Hash Joins

Software engineering for real-time systems

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

Red Hat Linux Internals

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association

Building Blocks for PRU Development

ELEC 5260/6260/6266 Embedded Computing Systems

FRANCIS XAVIER ENGINEERING COLLEGE

A case study of mobile SoC architecture design based on transaction-level modeling

Module 8. Industrial Embedded and Communication Systems. Version 2 EE IIT, Kharagpur 1

From Bus and Crossbar to Network-On-Chip. Arteris S.A.

Special FEATURE. By Heinrich Munz

Computer Systems Structure Input/Output

Comparative Performance Review of SHA-3 Candidates

Lesson 7: SYSTEM-ON. SoC) AND USE OF VLSI CIRCUIT DESIGN TECHNOLOGY. Chapter-1L07: "Embedded Systems - ", Raj Kamal, Publs.: McGraw-Hill Education

Principles and characteristics of distributed systems and environments

Early Hardware/Software Integration Using SystemC 2.0

CS550. Distributed Operating Systems (Advanced Operating Systems) Instructor: Xian-He Sun

The Temporal Firewall--A Standardized Interface in the Time-Triggered Architecture

Going Linux on Massive Multicore

Designing and Embodiment of Software that Creates Middle Ware for Resource Management in Embedded System

SYSTEM ecos Embedded Configurable Operating System

Transcription:

SoCLib : Une plate-forme de prototypage virtuel pour systèmes multi-processeurs intégrés sur puce FETCH 07

Outline SoCLib goals SystemC modeling principles The Mutek Real-Time Operating System The MWMR communication middleware The DSX hardware/software codesign tool An application example : Stereovision

Example of a multi-processor architecture 1.1 2.1 3.1 4.1 Input Engine Ext. RAM ctrl Ext. RAM ctrl Output Engine VCI VCI/OCP network (DSPIN) VCI VCI/OCP Local Crossbar VCI/OCP Local Crossbar VCI Proc Proc RAM Locks Proc Proc RAM Locks 5.1 5.4 5.1 5.2 9.1 9.4 9.1 9.2 Sub-system 5 Sub-system 9

GOALS Design Space Exploration : fast performance evaluation for the mapping of a multi-threaded software application on a multi-processor hardware architecture. Plat-form based design : provide the system designer one (or several) generic hardware/software multi-processor architecture(s). IP reuse : clear separation between computing ressources and communication ressources : VCI/OCP standard.

SoCLib : Industrial Partners ST Micro-electronics Grenoble (B.Jego) Thales Paris (P.Kajfasz) Thomson SC Rennes (H.Heijnen) Silicomp Grenoble (V.Joloboff) Prosilog Cergy (M.Saussay) TurboConcept Brest (J.Tousch)

SoCLib : Académic Partners LIP6 Paris (A.Greiner) INRIA Orsay (O.Temam) LISIF Paris (P. Garda) CEA-LIST Saclay (F.Derepas) ENST Paris (A.Polti) IRISA Rennes (F.Charot) LESTER Lorient (P.Coussy) IETR Rennes (D. Houzet) CEA-LETI Grenoble (J-R. Lequepeys) TIMA Grenoble (F.Petrot) CITI Lyon (T. Risset)

SoCLib : General Principles The modeling language is SystemC. Two simulation models for each hardware component: CABA (Cycle-Accurate / Bit-Accurate) TLMT (Transaction Level Model with Time) All hardware components respect the same (VCI/OCP) communication protocol. All simulation models will be available as free software. For each SoCLib component, there is a - commercially available - synthezisable RTL model.

SoCLib : Planned Components COMPONENTS ORIGINE General Purpose Processors Digital Signal Processors MIPS R3000 SPARC V8 OpenRISC 1000 ARM 7 ARM 9 PowerPC 450 PowerPC 750 ST200 C6X Texas LIP6 ENST LIP6 LIP6 CEA LRI CEA ST / LRI IRISA System Utilities VCI/OCP Interconnects Interrupt controler Locks engine DMA controler MWMR controler VCI Generic VCI on DSPIN VCI on AMBA VCI on AVALON LIP6 LIP6 IETR LIP6 LIP6 LIP6 PROSILOG IRISA

SoCLib : Planned Components COMPOSANT ORIGINE Memory Controlers Embedded SRAM External SDRAM External DDR/SDRAM LIP6 PROSILOG PROSILOG Dedicated Coprocessors Turbodecodeur TC1000 Turbodecodeur TC3000 LDPC DWT MOD DEM TURBOCONCEPT TURBOCONCEPT ENST LESTER LESTER LESTER I/O Controlers VCI / PCI bridge VCI / USB bridge VCI / CAN bridge VCI / I2C bridge IETR ENST ENST LISIF

Outline SoCLib goals SystemC modeling principles The Mutek Real-Time Operating System The MWMR communication middleware The DSX hardware/software codesign environment An application example : Stereovision

Communicating Synchronous FSM Model T0 T1 T2 CK R0 R1 R2 G0 G1 G2

Component Modeling E TRANSITION method STATE GENMEALY method GENMOORE method S S

The Mealy FSMs issue... The general behaviour of the cache component is a Mealy FSM, because there is a combinational dependency between ADR signal and INST signal PROCESSOR PC IR ADR INST => The PROCESSOR component must be evaluated before the CACHE component! combi CACHE

Mealy Dependency Graph The nodes are the signals The arrows are Mealy dependencies between signals X Y X Y Z Z

Static Scheduling As there is no cycle in the Mealy Dependency Graph, it is possible to define a static scheduling that respects all dependencies: For (cycle = 0 ; cycle < MAX ; cycle++) { MODULE_0::Transition() ;... MODULE_N::Transition() ; MODULE_0::genMoore() MODULE_N::genMoore() ; MODULE_0::genMealy() ; MODULE_N::genMealy() ; } The genmealy methods ordering must respect the dependencies, and can be derived from the sensitivity sets of the methods.

The Sensitivity Lists SC_METHOD(transition); sensitive_pos << CLK; SC_METHOD(genMoore); sensitive_neg << CLK; SC_METHOD(genMealy); sensitive_neg << CLK; sensitive << input0; sensitive << input1; To precisely describe the input -> output dependancies, there is one genmealy method per output signal for each hardware component.

SystemCASS All SocLib models can be efficiently used with the standard OSCI SystemC simulation engine that uses dynamic scheduling technics: All Transition» & «GenMoore» methods will be evaluated only one time per cycle The SoCLib models can be simulated with the SystemCASS simulation engine that uses static sceduling, and provides a simulation speedup of one order of magnitude!

Outline SoCLib goals The SoCLib virtual prototyping platform SystemC modeling principles The Mutek Real-Time Operating System The MWMR communication middleware The DSX hardware/software codesign environment An application example : Stereovision

Mutek : Embedded Operating System MUTEK is an embedded OS for MP-SoC : POSIX compliant : The multi-threaded software application can be executed on any workstation. Distributed scheduler : one scheduler per processor, for true scalability. Real-Time support : provides automatic re-initialisation in case of dead-line violation (soft real-time constraints). Controled Placement : allows the system designer to statically place the tasks on the processors, and the communication buffers on the memory banks. small foot-print : about 25 kbytes.

Mutek : status and evolution Mutek is jointly developped by several french laboratories (LIP6, TIMA, IRISA, CITI) The present version supports only symmetric, homogeneous multi-processor architectures, and has been ported on several micro-processor (MIPS, SPARC, OPENRISC, ARM). Two evolutions are in progress at LIP6 MUTEK/S : Static version (faster, but without POSIX compatibility) MUTEK/H : Heterogeneous version (supporting mixed hardware architectures, with both General Purposes Processors & Digital Signal Processors)

H-EXO layer H-EXO is a very thin hardware abstraction layer (exo-kernel), defined to support heterogeneous multi-processor architectures APPLICATIONS THREADS POSIX MUTEK/S MUTEK/H H-EXO HARDWARE

Outline SoCLib goals The SoCLib virtual prototyping platform SystemC modeling principles The Mutek Real-Time Operating System The MWMR communication middleware The DSX hardware/software codesign environment An application example : Stereovision

Software application modeling A parallel application is described with two types of components: A set of concurrent tasks A set of MWMR communication channels Each communication channel is implemented as a software FIFO. Each FIFO can have several writers and several readers. Software tasks use only two (blocking) communication functions : MWMR_read(channel, Buf_address, item_number) MWMR_write(channel, Buf_address, item_number) Each software task can be implemented as A software thread (without modification) A synthesized hardware coprocessor.

Task & Communication Graph T1 Ti Software task T0 Communication channel IN T2 OUT T1 T3

MWMR communication channel MWMR communication channels are implemented as software FIFOs, in a shared memory segment. Each MWMR FIFO is protected by a system lock. A FIFO can have several «producer» tasks and several «consumer» tasks, and each of those tasks can be implemented as hardware, or software : Both hardware and software implementations use the same protocol to acces the software FIFO. In the MWMR acces protocol, each MWMR acces generates six memory transactions. The software implementation of the MWMR_write() & MWMR_read() primitives is based on the POSIX threads => It can be directly simulated on any POSIX compliant workstation. The hardware implementation of the MWMR_write() & MWMR_read() primitives is based on a generic - VCI compliant - MWMR initiator. This hardware «wrapper» provides the synthesized hardware coprocessor a simple FIFO interface.

MWMR «wrapper» FIFO VCI HARDWARE COPROCESSOR (IP CORE) VCI_MWMR_INITIATOR WRAPPER VCI_RAM (containing the MWMR software FIFO) VCI interconnect The VCI_MWMR_initiator wrapper is a generic hardware component, that implements a variable number of read or write MWMR FIFO channels.

Outline SoCLib goals The SoCLib virtual prototyping platform SystemC modeling principles The Mutek Real-Time Operating System The MWMR communication middleware The DSX hardware/software codesign environment An application example : Stereovision

DSX : Design Space exploration The DSX environment provides 3 services : define the hardware architecture parameters (using several generic, parameterized platforms) define the software application architecture (Tasks & Communication Graph) map the software application on the architecture (memory space segmentation & placement constraints) This tool is based on the Python language. It generates : A SystemC description of the hardware architecture All script files for the binary code generation (GCC) /

DSX : Mapping Multi-thread application The system designer can : choose the hard/soft implementation for each task map the software tasks on the programmable processors and the hardware tasks on synthesized Multi-processor architecture (or existing) coprocessors map the inter-tasks communication buffers on the physical memory banks. map for each task, the private code & data segments on the physical memory banks.

Outline SoCLib goals The SoCLib virtual prototyping platform SystemC modeling principles The Mutek Real-Time Operating System The MWMR communication middleware The DSX hardware/software codesign environment An application example : Stereovision

An example : stéreo-vision The goal is obstacle detection for cars, by stereo-vision technics, using two synchronized cameras. Image size : 256 * 512 pixels (8 bits pixel) Fréquence : 40 couple of images per second The stereo-vision algorithm and the sequential software application have been developed by the LIVIC laboratory. We want to parallelize the algorithm and to map the corresponding multi-threaded software application on a MP-SoC architecture.

Stereo-vision : Genera Principle

Generic Hardware Architecture Sub-system 0 Sub-system 1 AQUIS/L MWMR µ Proc RAM Timer + Locks RAM Timer + Locks µ Proc AQUIS/R MWMR VCI Local Crossbar Local Crossbar VCI micro- network (DSPIN) VCI Local Crossbar Local Crossbar VCI µ Proc µ Proc RAM Timer + Locks µ Proc µ Proc RAM Timer + Locks Sub-system 2 Sub-system n

Stereo-Vision : Résults Hardware 8 clusters : (6 computation clusters, 2 acquisition clusters) 30 general purpose processors (MIPS R3000) 16 embedded memory banks (total memory capacity : 700 Koctets) silicon area (estimated) : 60 mm2 Software : 30 threads exécuting under the MUTEK OS Optimized task & communication channels placement (DSX) Soft real time violation management. Virtual prototyping (Cycle Accurate) : Simulation speed : 15000 cycles/s (on a 3GHz PC workstation) results : 6.8 million cycles per couple of images. => 22.6 ms at 300 MHz

Conclusion La plate-forme SoCLib vise le prototypage virtuel d applications logicielles embarquées complexes sur systèmes multi-processeurs intégrés sur puce. Cette plate-forme, financée par l ANR, sera accessible aux industriels et aux universitaires sous forme de logiciel libre : modèles de simulation des composants matériels, et outils logiciels de base. La première version accessible au public est prévue fin 2007. Cette plate-forme a clairement vocation à s intégrer dans un projet européen, qui reste à définir