Breaking the Interleaving Bottleneck in Communication Applications for Efficient SoC Implementations



Similar documents
Architectures and Platforms

Optimizing Configuration and Application Mapping for MPSoC Architectures

7a. System-on-chip design and prototyping platforms

Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip

All Programmable Logic. Hans-Joachim Gelke Institute of Embedded Systems. Zürcher Fachhochschule

Architectural Level Power Consumption of Network on Chip. Presenter: YUAN Zheng

Operating System Support for Multiprocessor Systems-on-Chip

Design and Implementation of an On-Chip Permutation Network for Multiprocessor System-On-Chip

Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors

A Generic Network Interface Architecture for a Networked Processor Array (NePA)

Multistage Interconnection Network for MPSoC: Performances study and prototyping on FPGA

Semi-Parallel Reconfigurable Architectures for Real-Time LDPC Decoding

A CDMA Based Scalable Hierarchical Architecture for Network- On-Chip

Introduction to the Latest Tensilica Baseband Solutions

System Considerations

Chapter 12: Multiprocessor Architectures. Lesson 04: Interconnect Networks

What is a System on a Chip?

Qsys and IP Core Integration

Lesson 7: SYSTEM-ON. SoC) AND USE OF VLSI CIRCUIT DESIGN TECHNOLOGY. Chapter-1L07: "Embedded Systems - ", Raj Kamal, Publs.: McGraw-Hill Education

MIMO detector algorithms and their implementations for LTE/LTE-A

Introduction to Exploration and Optimization of Multiprocessor Embedded Architectures based on Networks On-Chip

Introduction to System-on-Chip

OpenSPARC T1 Processor

How To Design A Single Chip System Bus (Amba) For A Single Threaded Microprocessor (Mma) (I386) (Mmb) (Microprocessor) (Ai) (Bower) (Dmi) (Dual

Multiprocessor System-on-Chip

OpenSoC Fabric: On-Chip Network Generator

CONSTRAINT RANDOM VERIFICATION OF NETWORK ROUTER FOR SYSTEM ON CHIP APPLICATION

System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1

Architekturen und Einsatz von FPGAs mit integrierten Prozessor Kernen. Hans-Joachim Gelke Institute of Embedded Systems Professur für Mikroelektronik

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai Jens Onno Krah

Hardware/Software Codesign

3GPP Wireless Standard

From Bus and Crossbar to Network-On-Chip. Arteris S.A.

Switched Interconnect for System-on-a-Chip Designs

How To Develop A Nordic Mobile Network (Mms) For A Cell Phone (Norshame) (Networking) (Mobile) (Cell Phone) (Geo) (Gsm) (Smart Antenna) (Network)

STM32 F-2 series High-performance Cortex-M3 MCUs

Photonic components for signal routing in optical networks on chip

Memory Architecture and Management in a NoC Platform

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING NATIONAL INSTITUTE OF TECHNOLOGY ROURKELA EFFICIENT ROUTER DESIGN FOR NETWORK ON CHIP

FPGAs in Next Generation Wireless Networks

MULTISTAGE INTERCONNECTION NETWORKS: A TRANSITION TO OPTICAL

A case study of mobile SoC architecture design based on transaction-level modeling

Chapter 6 Wireless and Mobile Networks

CFD Implementation with In-Socket FPGA Accelerators

Design of LDPC codes

System Design Issues in Embedded Processing

Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV)

FPGA-based Multithreading for In-Memory Hash Joins

EEM870 Embedded System and Experiment Lecture 1: SoC Design Overview

Serial port interface for microcontroller embedded into integrated power meter

On-Chip Communications Network Report

Design and Verification of Nine port Network Router

Von der Hardware zur Software in FPGAs mit Embedded Prozessoren. Alexander Hahn Senior Field Application Engineer Lattice Semiconductor

Reconfigurable System-on-Chip Design

MAJORS: Computer Engineering, Computer Science, Electrical Engineering

760 Veterans Circle, Warminster, PA Technical Proposal. Submitted by: ACT/Technico 760 Veterans Circle Warminster, PA

Reconfigurable Computing. Reconfigurable Architectures. Chapter 3.2

EPL 657 Wireless Networks

Interconnection Networks. Interconnection Networks. Interconnection networks are used everywhere!

ZigBee Technology Overview

Outline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001

Hardware Implementation of Improved Adaptive NoC Router with Flit Flow History based Load Balancing Selection Strategy

Products. CM-i586 Highlights. Página Web 1 de 5. file://c:\documents and Settings\Daniel\Os meus documentos\humanoid\material_o...

Contents. System Development Models and Methods. Design Abstraction and Views. Synthesis. Control/Data-Flow Models. System Synthesis Models

Applying the Benefits of Network on a Chip Architecture to FPGA System Design

A Scalable Large Format Display Based on Zero Client Processor

SPI I2C LIN Ethernet. u Today: Wired embedded networks. u Next lecture: CAN bus u Then: wireless embedded network

System on Chip Platform Based on OpenCores for Telecommunication Applications

Scalability and Classifications

Mobile Communications TCS 455

Lab Experiment 1: The LPC 2148 Education Board

Next Generation of Railways and Metros wireless communication systems IRSE ASPECT 2012 Alain BERTOUT Alcatel-Lucent

Chapter 2 Heterogeneous Multicore Architecture

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey

White Paper Abstract Disclaimer

A Computer Vision System on a Chip: a case study from the automotive domain

Tyrant: A High Performance Storage over IP Switch Engine

SoC IP Interfaces and Infrastructure A Hybrid Approach

The future of mobile networking. David Kessens

Interconnection Networks

HSDPA Mobile Broadband Data A Smarter Approach to UMTS Downlink Data

Going Linux on Massive Multicore

Distributed Elastic Switch Architecture for efficient Networks-on-FPGAs

Hot Issues in Wireless Broadband Networking

Hunting Asynchronous CDC Violations in the Wild

Implementation of Full -Parallelism AES Encryption and Decryption

Mobile. Analyzing, Planning and Optimizing Heterogeneous Mobile Access and Core Networks

International Journal of Scientific & Engineering Research, Volume 4, Issue 6, June ISSN

Multichannel Voice over Internet Protocol Applications on the CARMEL DSP

Transcription:

Microelectronic System Design Research Group University Kaiserslautern www.eit.uni-kl.de/wehn Breaking the Interleaving Bottleneck in Communication Applications for Efficient SoC Implementations Norbert Wehn wehn@eit.uni-kl.de MPSoC 05, 11-15 July 2005 Relais de Margaux Hotel, France Wireless Implementation Challenges DECT 10 MIPS, GSM 100 MIPS, UMTS x 1000 MIPS Can we close this gap? 1

² Example: Channel Coding Turbo-Codes (1993) Revolution in channel coding e.g. UMTS, DVB, WLAN, CCSD LDPC Codes (1996) Renaissance of Gallagers Work (1960) Competitor to turbo-codes e.g. used in DVB-S2, WiMAX STM Greenside Chip: 2.5G/3G Wireless Infrastructure Chip (130nm technology) Supports WCDMA, CDMA2000, EDGE Decodes up to 256 voice channels & up to 8 384Kbps data channels simultaneously ETM ARM926 GreenSIDE Parallel Com. Interfaces UART Timers ITC UART Timers ITC DMA ST140 DSP ST140 DSP VIC Bus Switch Bus Switch Debu g Cross Bar Core Periph 1 x UARTs 32-bit GPIO System Control PLL 2 x 32-bit Timers COPRO DMA Periph Turbo Decoder 2 x Ethernet Engine MAC Convolutional 2x MSP Decoder UTOPIA Engine External Memory Interface GMM 2 MB (SRAM) Design Space MPSoC 04: Trade-offs different implementation styles ASIC, FPGA, ASIP, DSP, Multiprocessor... Architectural efficiency & design time Architecture Implement. Style Application 2

Generic Decoding Structure Subdecoder 1 Subdecoder 2 PU 1 PU N PU 1 PU N 1. Subdecoder 1 processes one complete block 2. After finishing the calculation, data are sent to subdecoder 2 3. Subdecoder 2 starts block processing... 4. Iterate until stopping crtierion fullfilled Information exchange takes place randomly Tanner graph (Parity check matrix)/ Interleaver Quality of randomness influences communcations performance High Throughput/Low Latency achitectures Interconnect centric architectures Solving Interconnect Bottleneck Crossbar functionality with output blocking conflict (MPSoC 04) Conflict free by code design Solves the interconnect problem at the system level Tanner Graph/Interleaver is designed according to a fixed architectural template with regular interconnect topology e.g. barrelshifter Architecture imposes constraints on the code Impact on communications performance Run-time conflict resolution (MPSoC 04) Largest flexibilty, no impact on communications performance NoC approach 3

Algorithm Design Space Turbo-Decoder UMTS compliant, 166MHz, 180nm technology, Throughput 100Mbit NoC approach, large flexibilty 14 parallel units, area = 16.84 mm 2 (14mm 2 PUs, 2.8mm 2 NoC) Conflict free interleaver design, limited flexibility 10 parallel units, area=11.23 mm 2 (10,3mm 2 PUs, 0.9mm 2 Barrelshifter) Implementing LDPC Decoding on Network-On-Chip, T. Theocharides, G. Link, N. Vijaykrishnan, M. J. Irwin, Int. Conference on VLSI Design 2005 1024 Bit block size, 1.2Gb/s (?), R=0.75 NoC: 5x5 2D mesh, dimension-order order routing, large flexibility 160nm CMOS Technology, 1.8V, synthesis, 500 MHz, 110 mm 2, ~30 Watt A. Blanksby, C. Howland, IEEE Journal on Solid-States States Circuits 1024 Bit block size, 1 Gb/s, Rate=0.5 full hardwired interconnect network, no flexibility at all 160nm CMOS Technology, 1.5 V, synthesis, 64 MHz, 52.5mm 2, ~700mW DVB-S2 LDPC implementation Blocksize 64800 Bit, R=1/4..9/10, 0.7 db to Shannon limit Synthesis results for the DVB-S2 LDPC code decoder RAMs Logic Area [mm²] (0.13um, 270MHz) Channel LLR Messages Addresses/Shuffling Functional Nodes Area [mm²] 1.997 9.117 0.075 10.8 Shuffling Network Total Area [mm²] Control logic Throughput[Mbit/s] @30iter 0.2 0.55 22.74 255 Why is this implementation so efficient? Code was defined with implementation complexity in mind IRA codes: subset of LDPC codes Limited flexibility (11 Tannergraphs) Permutation network - Shuffling network with some tricky memory allocation/assignmentsts 4

Algorithmic Flexibility Complexity Flexibility Interconnect / Memory Logic Throughput LDPC Decoder: synthesizable IP block, 130nm, 10000 bits Full flexibility i.e.. supports every LDPC code: 27 mm 2 Limited flexibility (e.g. WiMax-WLAN): 5 mm 2 This graph is independent of implementation style Analyze carefully the flexibility requirements Application/architecture Codesign Conclusion The Gap can be closed Application/Architecture Codesign Match the architecture & application Just enough flexibility Application Implementation Thank you for listening! For further information please visit http://www.eit.uni-kl.de/wehn 5