SoC-Based Microcontroller Bus Design In High Bandwidth Embedded Applications




White Paper

Abstract

32-bit embedded designs increasingly require real-time control of high-bandwidth data streams over a network. At the System-on-Chip (SoC) level, especially, it is crucial to move data and control information in a deterministic and non-contentious manner. It is also important that operations be directly under the programmable control of the system developer, which has not always been the case in bus-based SoC designs.

Designers and chip providers frequently borrow from board- and system-level architectural techniques to create SoC designs with minimal design time and development costs. Since appliances such as printers and PDAs seldom require deterministic, real-time responses, traditional solutions work very well for such applications. But in many of the new network-connected embedded designs, traditional bus architectures simply cannot handle the bus-sharing demands for high bandwidth and intense data flow. This is particularly true of applications such as Human-Machine Interface (HMI) displays, point-of-sale terminals, color printers and copiers, network-enabled projectors, surveillance cameras and others. Devices managing the combination of real-time data, a display and network activity need a deterministic way to share bus bandwidth.

A number of alternative approaches based upon on-chip serial interconnects similar to serial fabrics, crossbar switches and packet-based buses are being researched. But until these new approaches are perfected, time and cost constraints dictate that ways be found to modify the shared-bus architectures borrowed from board design to accommodate the deterministic and real-time requirements of the new embedded 32-bit network-connected designs.

Traditional SoC Bus Advantages and Disadvantages

SoC developers have been reluctant to give up the generic shared bus borrowed from the board-level world because it reduces the specification and validation effort in the design cycle and makes SoC top-level integration almost as simple as plugging expansion cards into a backplane. By using generic buses, developers are free to concentrate on higher-level decisions.

ARM Limited's use of a generic bus in the Advanced Microcontroller Bus Architecture (AMBA) has allowed licensees to focus on the application at hand and move quickly to market. Microprocessors, DMA controllers, memory controllers and other higher-performance blocks are connected via the Advanced High-performance Bus (AHB). Lower-performance blocks such as UARTs, General Purpose Input/Output (GPIO) and timers are suited for connection to the Advanced Peripheral Bus (APB).

But many high-end embedded applications targeted by ARM-based SoCs must meet the deterministic, real-time requirements of the application while also accessing a high-bandwidth network environment. Such applications require an SoC that issues control signals, collects data, and moves data across the network in real time. Depending on the nature of the network and its bandwidth requirements, this can stretch the capabilities of existing SoC bus architectures to their limit.

For example, a high-end network-connected embedded application may be handling video bit streams from a camera or graphics from a server to a printer via an Ethernet connection, and at the same time may be updating a local LCD display with precise requirements as to scan, refresh and update cycles. Working with the external LCD, the controller must know the precise number of bytes to be sent over the bus, the order in which the data is sent, and the specific time slots in which the data must be presented to the display. It is also necessary to constantly feed information to the LCD for updating.
Adding Burst Mode DMA to the Peripheral Bus

The traditional approach to the peripheral bus in AMBA-based designs assumes low-performance applications for ARM core-based embedded devices. But today's devices frequently have one or more applications that must operate at high bandwidth without cutting off lower-bandwidth peripherals from access to bus resources. This issue is particularly problematic in a peripheral-rich design such as the NS9750 or NS9360, which, in addition to providing high-speed I/O for 10/100 Ethernet, LCD, external DMA, and PCI, provides low-speed support for USB, I2C, four multifunctional serial modules (selectable as UART or SPI, capable of up to 11 Mbps in the synchronous mode), 50 individually programmable GPIO pins, an IEEE 1284 peripheral port, and sixteen general-purpose timers or counters, each with its own I/O pin.

In traditional implementations of the APB, the low transfer rates of communications peripherals such as UARTs are more than adequately handled by the inclusion of FIFOs, which allow several bytes to be transferred to the interface before the processor needs to intervene and access the APB. But in many of the high-end embedded applications summarized in this article, one or more of these peripherals may have high bandwidth requirements that require immediate access to the main high-performance bus via the APB/AHB bridge.

One way to allow the peripheral bus to operate in such a burst mode is to simply replace the APB with a burst-mode peripheral bus (Digi's BBUS) that has not one, but four bus masters, each with burst mode support (Figure 3). One bus master is a DMA engine that has 13 channels supporting 12 USB end-points. A second bus master is a DMA engine that has 12 channels supporting the four serial modules with eight channels each as well as the 1284 port. The third bus master, the BBUS-to-AHB bridge, has a DMA engine whose channels provide access to the AHB system bus. The fourth bus master is a USB host module.

Additionally, this DMA engine has two separate dedicated DMA channels to support external peripherals connected to the external memory bus. To facilitate burst mode conditions, each internal DMA channel moves data between the system memory and the BBUS peripherals in a fly-by mode, while both external DMA channels use memory-to-memory style transfers.

The shared bus concept is not sufficient to meet such requirements in an SoC. In a typical AHB design all of the primary resources on that bus are bus masters, which means that when the bus is free, they can requisition the bus for the time necessary to accomplish a task. But in an ARM-based SoC there is no direct control by the programmer over how much of the bus resources they acquire when in control of the bus.

There are a number of ways a shared bus architecture can prioritize these operations: daisy chain arbitration, centralized parallel arbitration, distributed arbitration by self-selection or collision detection, and bus arbitration with multiple bus requesters. But when the designated master takes over the bus, other operations are pushed aside. There is no mechanism by which multiple resources can gain access to the bus to the degree necessary to satisfy the application requirements without impacting the ability of other important operations to deliver deterministic and real-time responses.

Figure 1: NS9xxx AHB-bus bandwidth control system

One common technique used within the AMBA environment to deal with such situations is the use of arbitration channels. If there are six bus masters, the bus is designed with six arbitration channels. But rather than dedicate each channel to a particular master, on-chip arbitration logic assigns the channels based on the number of masters requesting access to the bus. If four masters are requesting the bus, the six channels are divided among the four, ensuring that each master gets equal access to the bus. However, this does not solve the basic problem of how to assign enough bus bandwidth to accomplish a particular task. If one of the operations needs three channels and the other operations cumulatively only require two, each will be assigned an equal amount of the available channel space.
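The mismatch between equal shares and actual demand can be made concrete with a short sketch. The Python below is illustrative only: the master names and the per-master demands are assumptions, chosen so that one master needs three channels and the other three need two between them, as in the scenario just described.

```python
from fractions import Fraction


def equal_share(total_channels, masters):
    """Split the arbitration channels evenly among the requesting masters."""
    share = Fraction(total_channels, len(masters))
    return {m: share for m in masters}


def surplus(shares, needs):
    """Positive values are channels left idle; negative values are unmet demand."""
    return {m: shares[m] - needs[m] for m in shares}


# Hypothetical demands, in channels: the LCD needs three, the rest need two in total.
needs = {
    "lcd": Fraction(3),
    "ethernet": Fraction(1),
    "cpu": Fraction(1, 2),
    "pci": Fraction(1, 2),
}

shares = equal_share(6, list(needs))
delta = surplus(shares, needs)

for master, d in delta.items():
    print(f"{master:9s} share={float(shares[master]):.2f} surplus={float(d):+.2f}")
```

Under these assumed demands, every master receives the same share of the six channels regardless of what it needs, so the heavy master falls short while the light masters hold capacity they cannot use.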
The result: several channels will be underused, others not used at all, and some channels will be overloaded, impacting the SoC's ability to respond to events deterministically and with sufficiently low latency.

The Solution: A Programmable Bus Bandwidth Control System

What is required is a programmable bus bandwidth allocation scheme that gives a particular master the bus allocation it needs at that particular moment and allocates the remaining bus space to the other masters that may also be asking for access to the bus. Because these needs may change over time, some mechanism is needed to reallocate bus resources on a regular basis.

Digi has developed a new bandwidth control system to replace the one used with the AMBA architecture (patent granted, patent #6,826,640 B1). The system is based on the use of a 16-slot rotating priority bus arbiter (Figure 1) that incorporates a set of programmable pseudo-random or rotating priority cache replacement algorithms. In the case of the NetSilicon NS9750 (Figure 2), for example, rather than contending for allocation of six channels on the AHB, the six bus masters share access to a 16-slot bus allocation scheme.

Via dedicated registers in the system control module, the system developer now has three ways to allocate bus resources within the SoC. At the highest level, each request for access from a bus master is serviced in the order in which it was issued, until all six masters have been polled. Based on needed bandwidth, each bus master is then assigned a specific number of slots and has exclusive access to those allocated slots. For example, four slots in the NS9750 might be assigned to the CPU, four slots to Ethernet, four slots to the BBUS Bridge (see sidebar), three slots to the LCD, and three slots to the PCI/Cardbus. System software would re-evaluate this arrangement as needed during the system's operation, on a schedule that can be keyed to the number of AHB bus cycles.

If the situation is unchanged on the next evaluation cycle, the setup remains as before. If it has changed, a new set of bus master slot assignments is negotiated.
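One way to picture the slot scheme is as a 16-entry table that the arbiter walks in rotation. The sketch below is a simplified software model, not Digi's arbiter logic. The `build_slot_table` and `grant_sequence` helpers, the smooth weighted round-robin used to spread each master's slots across the table, and the slot counts (an assumed allocation chosen to total 16) are all illustrative assumptions.

```python
from collections import Counter

NUM_SLOTS = 16


def build_slot_table(allocation, num_slots=NUM_SLOTS):
    """Spread each master's slots across the table (smooth weighted round-robin)."""
    if sum(allocation.values()) != num_slots:
        raise ValueError("slot counts must add up to the table size")
    credit = {master: 0 for master in allocation}
    table = []
    for _ in range(num_slots):
        # Every master earns credit in proportion to its slot count.
        for master, slots in allocation.items():
            credit[master] += slots
        # The master with the most credit takes this slot and pays for it.
        winner = max(credit, key=credit.get)
        credit[winner] -= num_slots
        table.append(winner)
    return table


def grant_sequence(table, requesting, cycles):
    """Walk the table in rotation; a slot grants the bus only if its owner is requesting.

    An idle slot is modeled as None. Real hardware may hand an idle slot
    to another master; that behavior is not modeled here.
    """
    grants = []
    for cycle in range(cycles):
        owner = table[cycle % len(table)]
        grants.append(owner if owner in requesting else None)
    return grants


# Assumed allocation for illustration only.
allocation = {"cpu": 4, "ethernet": 4, "bbus": 3, "lcd": 3, "pci": 2}
table = build_slot_table(allocation)
print(table)
print(Counter(grant_sequence(table, {"lcd", "ethernet"}, 32)))
```

Re-evaluating the allocation then amounts to rebuilding the table with new slot counts; masters that are not requesting simply leave their slots unused in this model.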

For even more precise control of bus resources, this cyclical arbitration scheme provides two additional levels of programmability: the amount of bus bandwidth allocated to the ARM CPU and the degree of bandwidth utilization on each of the 16 slots. Rather than being allowed to take control of all of the bus resources when it is the bus master, the NS9750's ARM926EJ-S core is by default allowed only 50 percent of the total bus bandwidth, or eight of the 16 slots. This ensures that the other five bus masters have at least 50 percent of the bus between them at all times. However, under direct programmer control the CPU can be directed to release some of its allocation to another bus master or to take control of additional slots for just that bus arbitration cycle, or for any number of cycles the programmer determines is required.

The programmer can also select a 100, 75, 50 or 25 percent bandwidth utilization coefficient per slot. This is done by controlling when and in what sequence access to each slot is allocated: for a 25 percent coefficient, that particular slot would be polled only once every four cycles; 50 percent, every other cycle; and 75 percent, three out of four cycles.

Figure 2: NS9750 block diagram

Programming the Rotating Bus Arbiter

The various options can be specified by the programmer via several registers contained in the system control module. The first is a 16-entry Bus Request Configuration (BRC) register. Each entry in this register represents a bus request/grant slot for a master. Each request/grant slot is assigned to only one bus master at a time, but each bus master can be connected to multiple request/grant slots at the same time, depending on the bandwidth requirement of that bus master. When multiple channels are assigned to one master, these channels should be evenly distributed among the 16 channels.
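The per-slot utilization coefficients (100, 75, 50 or 25 percent) can be sketched as a gating pattern that repeats every four passes of the arbiter. The source states only how often a slot is polled at each setting; which passes are skipped is an assumption made here for illustration, as are the names `GATING`, `slot_may_arbitrate` and `grants_over`.

```python
# Hypothetical gating patterns over a repeating window of four arbiter passes.
# True means the slot may arbitrate for the bus on that pass.
GATING = {
    100: (True, True, True, True),
    75: (True, True, True, False),
    50: (True, False, True, False),
    25: (True, False, False, False),
}


def slot_may_arbitrate(percent, rotation):
    """Return True if a slot with this coefficient takes part in the given pass."""
    return GATING[percent][rotation % 4]


def grants_over(percent, rotations):
    """Count how many passes the slot takes part in over a number of rotations."""
    return sum(slot_may_arbitrate(percent, r) for r in range(rotations))


for percent in (100, 75, 50, 25):
    print(percent, grants_over(percent, 8))
```

Setting a coefficient below 100 percent on a master's slots is one way to throttle that master, for example slowing an Ethernet transfer, without removing its slots from the table.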
Each request/grant slot has a two-bit Bandwidth Reduction Field (BRF) that determines how often the slot can arbitrate for the system bus: 100, 75, 50 or 25 percent of the time. The BRC gates the bus request signals going into a second 16-entry Bus Request Register (BRR). As a default, unassigned slots in the BRC block the corresponding BRR entries from being set by any bus request signals. A fourth register is used to store which bus master has data waiting for transfer to the AHB, while a fifth register is used by the programmer to assign weighted values to each bus request and grant slot assigned to a particular bus master.
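The relationship between the BRC and the BRR can be modeled in a few lines. This is a sketch under stated assumptions, not the NS9750 register map: the `BrcEntry` type, the one-bit-per-slot BRR layout, the BRF encoding in `BRF_PERCENT` and the master names are all hypothetical, and the BRF is stored but not applied to the gating here.

```python
from dataclasses import dataclass
from typing import Optional

NUM_SLOTS = 16

# Assumed encoding of the two-bit BRF; the text gives only the four percentages.
BRF_PERCENT = {0b00: 100, 0b01: 75, 0b10: 50, 0b11: 25}


@dataclass(frozen=True)
class BrcEntry:
    """One request/grant slot: the master it serves and its bandwidth reduction field."""
    master: Optional[str] = None  # None models an unassigned slot
    brf: int = 0b00


def bus_request_register(brc, requesting):
    """Return a 16-bit BRR value: bit i is set when slot i's master is requesting.

    Unassigned slots never set their bit, whatever the request signals are.
    """
    if len(brc) != NUM_SLOTS:
        raise ValueError("the BRC must have exactly 16 entries")
    brr = 0
    for slot, entry in enumerate(brc):
        if entry.master is not None and entry.master in requesting:
            brr |= 1 << slot
    return brr


# Two evenly spaced slots for the CPU, one half-rate slot for the LCD.
brc = [BrcEntry() for _ in range(NUM_SLOTS)]
brc[0] = BrcEntry("cpu")
brc[8] = BrcEntry("cpu")
brc[4] = BrcEntry("lcd", brf=0b10)

brr = bus_request_register(brc, {"cpu", "ethernet"})
print(f"BRR = {brr:#06x}")
```

In this model a master with no slot in the BRC, such as the Ethernet request above, cannot set any BRR bit, which mirrors the default blocking behavior described for unassigned slots.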

Cyclic Arbitration in Action

In the example earlier, where the LCD on a particular arbitration reassignment schedule requests additional bus access, the programmer can specify that the LCD should be given priority, based on the nature of the data stream it must handle. If the programmer decides that ten slots need to be assigned specifically to the LCD controller, the six slots left would then be assigned to the other bus masters on the original arbitration schedule. In this situation, the LCD controller would have available ten times the bandwidth it normally would have and ten times the bandwidth of the other masters to handle the load dictated in this particular situation.

This capability would be crucial when the device is streaming data over the Ethernet connection at the same time the LCD screen is being refreshed. The LCD needs to be refreshed in a timely, deterministic manner, uninterrupted by the demands of the Ethernet. In a typical AMBA bus architecture, if the LCD requested the bus it would have to wait until the Ethernet master released it, no matter what the refresh requirements. With the new cyclic programmable arbitration scheme, the programmer can "back off" the Ethernet transfer, allow it to stream data out at a lower but still acceptable rate, guaranteeing that the LCD would get its appropriate level of refresh, so that the screen will not go blank.

Where the requirements of the LCD's timing and bandwidth for guaranteeing an active display are very precise, the Ethernet protocol requirements are much more forgiving of lower rates of transmission. But it is not acceptable for the data stream to be stopped, which would have been the case if the LCD master had control of the bus and only gave it up when the refresh chores were completed.

Advantages of NS9xxx Architecture

In high bandwidth embedded systems, programmers must optimize throughput of multiple parallel data streams.
The patented Digi solution described above gives programmers a powerful tool to meet the specific requirements of their applications. As sub-micron process geometries continue to evolve, system-on-chip integration will incorporate more and more diverse functionality into a given device. This will require that RTOS and DMA architectures provide the systems engineer with flexible control over system resources. As IP networking capability continues its penetration of embedded electronics, Digi is committed to providing the hardware and software services to make this integration successful.

Figure 3: NS9xxx bus architecture

Copyright 2005-2006. Digi, the Digi logo, NetSilicon and NET+Works are trademarks or registered trademarks in the United States and other countries worldwide. ARM and NET+ARM are trademarks or registered trademarks of ARM Limited. All other trademarks are the property of their respective owners.

email: info@digi.com
91001317 B1/306