Advanced Microcontrollers
Grzegorz Budzyń
Lecture 11: Digital Signal Controllers & Digital Signal Processors
Plan
- Digital Signal Controllers
  - Introduction
  - Digital Signal Controllers vs Microcontrollers
  - Digital Signal Controllers vs Digital Signal Processors
  - DSC by Texas Instruments
  - DSC by Freescale
- Digital Signal Processors
Digital Signal Controllers
Introduction
- A Digital Signal Controller (DSC) combines the features of a microcontroller and a Digital Signal Processor (DSP)
Introduction
- Like microcontrollers, DSCs have:
  - Fast interrupt response
  - Control-oriented peripherals (PWM, watchdog, etc.)
  - Usually programmed in the C language (although assembler programming is possible)
Introduction
- Like digital signal processors, DSCs have:
  - Single-cycle multiply-accumulate (MAC) instructions
  - Barrel shifters
  - Large accumulators
Introduction
- Main applications of DSCs:
  - Motor control
  - Power conversion
  - Sensor processing applications
Introduction
- Main DSC vendors:
  - Texas Instruments
  - NXP
  - Microchip
  - Infineon
  - Renesas
DSC by Microchip
dsPIC application areas
Source: [1]
dsPIC architecture
- dsPIC: a family of 16-bit RISC controllers with DSP features
- Two subfamilies:
  - dsPIC30F: smaller, slower
  - dsPIC33F: highest performance
- The only digital signal controller on the market available in QFN-28 packages at prices down to $3!
dsPIC architecture
- Main features:
  - Modified Harvard architecture
  - Optimized for C compilers
  - Two 40-bit accumulators with rounding
  - Memory options as in PIC24
  - Many single-cycle MAC operations
  - Packages from 18 to 110 pins
DSC by Texas Instruments
DSC - TI portfolio
- Four main subfamilies:
  - 24x 16-bit Series
  - 28x Fixed-point Series
  - 28x Piccolo Series
  - 28x Delfino Floating-point Series
DSC - TI portfolio
Source: [2]
Piccolo Series
- TMS320F2802x:
  - fixed-point microcontrollers
  - 40-60 MHz performance
  - up to 64 KB of on-chip flash
  - small 38-pin package options
  - feature-rich peripherals:
    - 150-ps high-resolution enhanced pulse width modulators (ePWMs)
    - high-speed 4.6 MSPS 12-bit ADC
    - high-precision on-chip oscillators, analog comparators
    - support for I2C, SPI, and SCI
Piccolo Series
- TMS320F2803x:
  - fixed-point 32-bit microcontrollers
  - 60 MHz speed
  - up to 128 KB of flash memory
  - 64- or 80-pin packages
  - peripherals and features of the 2802x devices plus:
    - control law accelerator (CLA) for high-efficiency control loops
    - QEP module
    - CAN and LIN interfaces
Delfino Series
- TMS320F2833x:
  - Integrated floating-point unit simplifies development and speeds up control applications by an average of 50%
  - F2833x devices run at up to 150 MHz (300 MFLOPS), with two package offerings that are pin-for-pin compatible across all F2833x and F2823x controllers
  - Features up to 512 KB of on-chip flash and a DMA for high-speed memory access
Delfino Series
- TMS320C2834x:
  - delivers up to 600 MFLOPS of floating-point performance
  - up to 516 KB of single-access RAM
  - PWMs with 65-ps resolution
  - Direct Memory Access and a low-latency core make the C2834x an excellent solution for performance-hungry real-time control applications
28x Fixed-Point Series
- TMS320F2823x:
  - The F2823x generation of controllers is a fixed-point version of the F2833x devices
  - Pin-to-pin compatible with the F2833x series; all of the peripherals and features remain the same except for the floating-point unit
28x Fixed-Point Series
- TMS320F280x:
  - devices offer 60-100 MHz performance and were the first generation to feature:
    - the on-chip 12.5 MSPS 12-bit ADC
    - multiple high-resolution PWM peripherals
    - QEP (quadrature encoder pulse)
  - F280xx devices have up to 256 KB of flash memory
28x Fixed-Point Series
- TMS320F281x:
  - The F281x device generation features:
    - 150 MHz core
    - flexible Event Managers that provide access to timers, compare/PWM units, captures, and quadrature encoder units
C2000 Architecture
Source: [3]
C2000 core features
- Main features:
  - Efficient C engine with hardware that allows a C compiler to generate compact code, resulting in industry-leading code density
  - Single-cycle read-modify-write instructions, single-cycle 32-bit multiply
  - Fast interrupt service time (down to 9 cycles) with automatic zero-cycle context save
  - 96 dedicated interrupt vectors that require no software decision making
C2000 core features
- Main features:
  - 32-bit floating-point unit on Delfino controllers
  - On select Piccolo devices, an independent Control Law Accelerator (CLA) processes floating-point control loops to free the CPU for other purposes
  - Three 32-bit general-purpose CPU timers bring accuracy and flexibility to any application
  - Code Security Module prevents reverse engineering and protects valuable intellectual property
DSC by Freescale
Freescale microcontrollers portfolio
Source: [4]
Freescale DSC portfolio
Source: [4]
56F8000 block diagram
Source: [5]
56F8000 details
56F8000 application
Source: [5]
56F8XXX comparison
Source: [5]
C2000 core features
Source: [5]
MC56F8357
- Main features:
  - On-chip memory includes high-speed volatile and nonvolatile components:
    - 512 KB of Program Flash
    - 4 KB of Program RAM (836X devices)
    - 32 KB of Data RAM
    - 32 KB of Data Flash (836X devices)
    - 32 KB of Boot Flash
  - Access to up to 4 MB of off-chip program memory and 32 MB of data memory
  - Up to 60 MIPS at 60 MHz execution frequency
MC56F8357
- Main features:
  - Four 12-bit Analog-to-Digital Converters
  - Temperature sensor
  - Up to two FlexCAN modules (CAN Version 2.0 B-compliant)
  - Two Serial Communication Interfaces (SCIs)
  - Up to two Serial Peripheral Interfaces (SPIs)
  - Two dedicated external interrupt pins
  - Software-programmable Phase-Locked Loop
MC56F8357
MC56F8357 - Core
- Main features 1/2:
  - Efficient 16-bit 56800E family controller engine with dual Harvard architecture
  - Single-cycle 16 × 16-bit parallel Multiplier-Accumulator (MAC)
  - Four 36-bit accumulators, including extension bits
  - Arithmetic and logic multi-bit shifter
  - Parallel instruction set with unique DSP addressing modes
  - Hardware DO and REP loops
MC56F8357 - Core
- Main features 2/2:
  - Three internal address buses and one external address bus
  - Four internal data buses and one external data bus
  - Instruction set supports both DSP and controller functions
  - Controller-style addressing modes and instructions for compact code
  - Efficient C compiler and local variable support
  - Software subroutine and interrupt stack with depth limited only by memory
MC56F8357 Memory
MC56F8357 Core architecture
MC56F8357 Core pipeline
Digital Signal Processors
DSP Introduction
- Digital Signal Processing: the application of mathematical operations to digitally represented signals
- Signals are represented digitally as sequences of samples
- Digital signals are obtained from physical signals via transducers (e.g., microphones) and analog-to-digital converters (ADC)
DSP Introduction
- Digital signals are converted back to physical signals via digital-to-analog converters (DAC)
- Digital Signal Processor (DSP): an electronic system that processes digital signals
DSP Introduction
- Most DSP tasks require:
  - Repetitive numeric computations
  - Attention to numeric fidelity
  - High memory bandwidth, mostly via array accesses
  - Real-time processing
DSP Introduction
- DSPs must perform these tasks efficiently while minimizing:
  - Cost
  - Power
  - Memory use
  - Development time
Common DSP applications
- Applications:
  - Instrumentation and measurement
  - Communications
  - Audio and video processing
  - Graphics, image enhancement, 3-D rendering
  - Navigation, radar, GPS
  - Control: robotics, machine vision, guidance
Common DSP algorithms
- Algorithms:
  - Frequency-domain filtering: FIR and IIR
  - Frequency-time transformations: FFT
  - Correlation
FIR algorithm
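As a minimal sketch of the FIR algorithm: each output sample is the weighted sum y[n] = h[0]·x[n] + h[1]·x[n-1] + ... + h[N-1]·x[n-N+1]. The function name `fir_sample`, the newest-first ordering of the history array, and the use of `float` are assumptions made for illustration only; on a DSP the loop body would typically map onto single-cycle MAC instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Compute one FIR output sample.
 * h      : filter coefficients h[0..ntaps-1]
 * x_hist : input history; x_hist[0] is the newest sample x[n] and
 *          x_hist[k] is x[n-k] (this ordering is an assumption of the sketch)
 */
float fir_sample(const float *h, const float *x_hist, size_t ntaps)
{
    assert(h != NULL && x_hist != NULL && ntaps > 0);
    float acc = 0.0f;
    for (size_t k = 0; k < ntaps; k++) {
        acc += h[k] * x_hist[k];   /* one multiply-accumulate per tap */
    }
    return acc;
}
```

With four coefficients of 0.25 the filter is a moving average: the history 4, 8, 12, 16 yields 10.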
Common DSP architecture
Requirements <-> Realisations
Fast data access
- Need to transfer data to/from memory or DSP peripherals
- Need to retrieve instructions from memory
- Three main implementations:
  - high-bandwidth memory architectures
  - specialized addressing modes
  - direct memory access
High-bandwidth memory architectures
High-bandwidth memory architectures
- Only the Harvard (b) and Super-Harvard (c) architectures are used in DSPs
- Super-Harvard modification: a small bank of fast memory, called the instruction cache, is added to the DSP core
- Data are also allowed to be stored in the program memory
- The most recently executed program instructions are relocated at run time into the instruction cache
High-bandwidth memory architectures
- A data cache for fast access to data is sometimes also present
High-bandwidth memory architectures
High-bandwidth memory architectures
- Cache drawbacks:
  - Problems are caused by the lack of full predictability of cache hits
  - A cache miss happens when the data or the instructions needed by the DSP are not stored in cache memory; they then have to be fetched from a slower memory, with an execution speed penalty
  - A cache miss can be caused, for instance, by a flow change due to branch instructions
Specialized addressing modes
- Address generator blocks control the address generation for specialized addressing modes such as indexed addressing, circular buffers, and bit-reversal addressing
Specialized addressing modes
- Circular buffers: used, for example, in the implementation of digital filters
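A circular buffer can be sketched in software as below. The names `circ_buf_t`, `cb_push`, `cb_get`, and the length `CB_LEN` are hypothetical and chosen only for this illustration. The explicit modulo operation is what a DSP address generator would perform in hardware, at no cycle cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CB_LEN 4u   /* buffer length (assumed value for the sketch) */

typedef struct {
    int16_t data[CB_LEN];
    size_t  head;          /* index where the next sample will be written */
} circ_buf_t;

/* Store a new sample, overwriting the oldest one once the buffer is full. */
void cb_push(circ_buf_t *cb, int16_t sample)
{
    cb->data[cb->head] = sample;
    cb->head = (cb->head + 1u) % CB_LEN;   /* wrap around at the end */
}

/* Read a sample by age: age 0 is the newest, CB_LEN-1 the oldest. */
int16_t cb_get(const circ_buf_t *cb, size_t age)
{
    assert(age < CB_LEN);
    return cb->data[(cb->head + CB_LEN - 1u - age) % CB_LEN];
}
```

A digital filter can call `cb_push` once per input sample and then read x[n-k] as `cb_get(&cb, k)`, so no samples ever need to be shifted in memory.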
Specialized addressing modes
- Bit-reversal addressing: necessary for the FFT (butterfly)
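The index permutation behind bit-reversal addressing can be sketched in plain C as follows. The function names `bit_reverse` and `bit_reverse_permute` are hypothetical; a DSP address generator produces the reversed index directly, whereas this sketch computes it with a loop.

```c
#include <assert.h>

/* Reverse the lowest nbits bits of idx, e.g. 001 -> 100 for nbits = 3. */
unsigned bit_reverse(unsigned idx, unsigned nbits)
{
    assert(nbits <= 16u);
    unsigned r = 0u;
    for (unsigned i = 0u; i < nbits; i++) {
        r = (r << 1) | (idx & 1u);   /* move the lowest bit of idx into r */
        idx >>= 1;
    }
    return r;
}

/* Reorder an array of 2^nbits samples into bit-reversed order, in place. */
void bit_reverse_permute(float *x, unsigned nbits)
{
    unsigned n = 1u << nbits;
    for (unsigned i = 0u; i < n; i++) {
        unsigned j = bit_reverse(i, nbits);
        if (j > i) {                 /* swap each pair only once */
            float tmp = x[i];
            x[i] = x[j];
            x[j] = tmp;
        }
    }
}
```

For an 8-point FFT (nbits = 3) the sample order 0, 1, 2, 3, 4, 5, 6, 7 becomes 0, 4, 2, 6, 1, 5, 3, 7.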
Direct memory access
- The DMA controller is a second processor working in parallel with the DSP core
- It is dedicated to transferring information between two memory areas or between peripherals and memory
- The DMA controller frees the DSP core for other processing tasks
Direct memory access
Fast computation: MAC-centered
- The MAC operation is used by many digital processing algorithms
- The basic DSP arithmetic processing blocks are:
  a) many registers
  b) one or more multipliers
  c) one or more Arithmetic Logic Units (ALUs)
  d) one or more shifters
Fast computation: MAC-centered
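The role of the MAC operation and of a wide accumulator can be illustrated with a dot product of 16-bit samples. This is only a sketch: the name `mac_dot16` is hypothetical, and `int64_t` stands in for the 36- or 40-bit hardware accumulators mentioned earlier, whose extra guard bits keep the running sum from overflowing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Dot product of two 16-bit vectors.
 * Each loop iteration is one multiply-accumulate (MAC) step.
 * The 64-bit accumulator is an assumption standing in for a wide
 * hardware accumulator with guard bits. */
int64_t mac_dot16(const int16_t *a, const int16_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];   /* 16 x 16 -> 32-bit product */
    }
    return acc;
}
```

Four products of 32767 × 32767 already sum to 4294705156, which exceeds the range of a 32-bit signed accumulator; this is why DSP accumulators are wider than the product.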
Instruction pipelining
- Instruction pipelining consists of:
  - dividing the execution of each instruction into different stages
  - executing different instructions in parallel, each in a different stage
- The net result is an increased instruction throughput
Parallel architectures
- Parallel-enhanced DSP architectures started to appear on the market in the mid-1990s and were based on:
  - instruction-level parallelism (VLIW)
  - data-level parallelism (SIMD)
  - a combination of both
Parallel architectures - VLIW
Parallel architectures - VLIW
- In VLIW, many instructions are issued at the same time and are executed in parallel by multiple execution units
- Characteristics of VLIW architectures include simple and regular instruction sets
- Instruction scheduling is done at compile time and not at run time
- Writing assembly code for a VLIW architecture is very complex, and the optimization is often better left to the compiler
Parallel architectures - SIMD
Parallel architectures - SIMD
- SIMD architectures are based on data-level parallelism
- Only one instruction is issued at a time
- The same operation specified by the instruction is performed on multiple data sets
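The SIMD idea can be imitated in portable C by packing two 16-bit lanes into one 32-bit word. This is a software emulation for illustration, not a real SIMD instruction; the names `add2x16` and `lane16` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Add two pairs of 16-bit lanes packed into 32-bit words.
 * Each lane wraps around independently: a carry out of the low lane
 * must not leak into the high lane. */
uint32_t add2x16(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0x0000FFFFu;              /* low lane, carry dropped */
    uint32_t hi = ((a >> 16) + (b >> 16)) & 0xFFFFu;  /* high lane, carry dropped */
    return (hi << 16) | lo;
}

/* Extract lane 0 (low half) or lane 1 (high half) from a packed word. */
uint16_t lane16(uint32_t v, unsigned lane)
{
    assert(lane < 2u);
    return (uint16_t)(v >> (16u * lane));
}
```

One call to `add2x16` performs the same addition on two data elements at once; SIMD hardware does this in a single instruction, usually across more and wider lanes.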
Numerical fidelity
- It is essential that the numerical fidelity be maximized
- The errors due to the finite number of bits used in the number representation and in the arithmetic operations should be minimized
- Numerical fidelity can be improved by changing the numeric representation or by dedicated hardware features
Numerical fidelity
- DSPs can be categorized into:
  - Fixed point (up to 64-bit, fractional arithmetic)
  - Floating point (32- or 64-bit)
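Fractional fixed-point arithmetic is commonly done in the Q15 format, where a 16-bit integer v represents the value v / 32768 in the range [-1, 1). The sketch below shows a rounded, saturating Q15 multiply; the names `q15_from_double` and `q15_mul` are hypothetical, and the code assumes that right-shifting a negative value is an arithmetic shift, which is implementation-defined in C but the usual behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a real value in [-1, 1) to Q15 (truncating toward zero). */
int16_t q15_from_double(double x)
{
    assert(x >= -1.0 && x < 1.0);
    return (int16_t)(x * 32768.0);
}

/* Multiply two Q15 numbers with rounding and saturation. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;   /* Q15 x Q15 -> Q30 product */
    p = (p + (1 << 14)) >> 15;             /* round, then scale back to Q15 */
    if (p > INT16_MAX) {
        p = INT16_MAX;                     /* saturate: only -1 x -1 gets here */
    }
    return (int16_t)p;
}
```

For example, 0.5 × 0.5 gives 8192, which is 0.25 in Q15, while -1 × -1 saturates to 32767 instead of wrapping around to -32768. Saturation of this kind is one of the dedicated hardware features that protect numerical fidelity.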
Fast execution control
- It is important that the program in the DSP is executed in a deterministic way
- Interrupts have to be serviced with minimal latency
- An important DSP feature is the hardware implementation of looping constructs, referred to as zero-overhead hardware loops, e.g. RPT #2 NOP
Digital Signal Processor example
TI Multicore DSP+ARM KeyStone II System-on-Chip (SoC)
TI 66AK2H12
- Up to 5.6 GHz of ARM and 9.6 GHz of DSP processing (cumulative across four ARM cores at up to 1.4 GHz and eight DSP cores at up to 1.2 GHz), coupled with:
  - security
  - packet processing
  - Ethernet
- The raw computational performance is 38.4 GMACS/core and 19.2 GFLOPS/core (at 1.2 GHz operating frequency)
TI 66AK2H12
- Eight TMS320C66x DSP core subsystems, each with:
  - Up to 1.2 GHz C66x fixed/floating-point DSP cores
    - 38.4 GMACS/core for fixed point at 1.2 GHz
    - 19.2 GFLOPS/core for floating point at 1.2 GHz
  - Memory
    - 32 KB L1P per core
    - 32 KB L1D per core
    - 1024 KB local L2 per core
TI 66AK2H12
- ARM Cortex-A15 MPCore processors
  - Containing four ARM Cortex-A15 cores
  - Up to 1.4 GHz Cortex-A15 processor core speed
  - 4 MB L2 cache memory shared by all ARM cores
  - Full implementation of the ARMv7-A architecture instruction set
  - 32 KB L1 instruction cache and data cache per Cortex-A15 processor core
  - AMBA 4.0 AXI Coherency Extension (ACE) master port, connected to the MSMC (Multicore Shared Memory Controller) for low-latency access to shared MSMC SRAM
TI 66AK2H12
- Network Coprocessor
  - Packet Accelerator enables support for:
    - Transport Plane IPsec, GTP-U, SCTP, PDCP
    - L2 User Plane PDCP (RoHC, Air Ciphering)
    - 1 Gbps wire-speed throughput at 1.5 MPackets per second
  - Security Accelerator Engine enables support for:
    - IPSec, SRTP, 3GPP and WiMAX Air Interface, and SSL/TLS Security
    - ECB, CBC, CTR, F8, A5/3, CCM, GCM, HMAC, CMAC, GMAC, AES, DES, 3DES, Kasumi, SNOW 3G, SHA-1, SHA-2 (256-bit hash), MD5
    - Up to 6.4 Gbps IPSec and 3 Gbps Air Ciphering
  - Ethernet Subsystem
    - Five SGMII Port Switch
TI 66AK2H12
- Peripherals
  - Four lanes of SRIO 2.1
    - 5 Gbps operation per lane
    - Supports Direct I/O, Message Passing
  - Two lanes of PCIe Gen2
    - Supports up to 5 GBaud
  - Two HyperLink interfaces
    - Support connections to other KeyStone architecture devices
    - Support up to 50 GBaud
  - Five Enhanced Direct Memory Access (EDMA) modules
TI 66AK2H12
- Peripherals
  - Two 72-bit DDR3 interfaces with speeds up to 1600 MHz
  - USB 3.0
  - Two UART interfaces
  - Three I2C interfaces
  - 32 GPIO pins
  - Three SPI interfaces
  - Semaphore module
  - Twenty 64-bit timers
  - Five on-chip PLLs
KeyStone architecture
- High-performance structure for integrating RISC and DSP cores with application-specific coprocessors and I/O
- Four main hardware elements:
  - Multicore Navigator
  - TeraNet
  - Multicore Shared Memory Controller
  - HyperLink
KeyStone architecture
- Multicore Navigator:
  - A packet-based manager that controls 16k queues
  - When tasks are allocated to the queues, Multicore Navigator provides hardware-accelerated dispatch that directs tasks to the appropriate available hardware
- TeraNet:
  - Central resource for moving packets, with 2 Tbps capacity
KeyStone architecture
- Multicore Shared Memory Controller:
  - Enables processing cores to access shared memory directly, without drawing from the TeraNet's capacity
  - Packet movement is not blocked by memory access
- HyperLink:
  - Provides a 50-GBaud chip-level interconnect
  - Working with Multicore Navigator, HyperLink dispatches tasks to tandem devices transparently and executes them as if they were running on local resources
Thank you for your attention
Interesting parameters
References
[1] dsPIC family documentation; www.microchip.com
[2] www.ti.com
[3] C2000 family documentation; www.ti.com
[4] www.freescale.com
[5] 56F8000 family documentation; www.freescale.com
[6] http://www.coe.pku.edu.cn/tpic/2010913102418831.pdf
[7] http://www.dspguide.com/ch28.pdf
[8] http://www.cs.berkeley.edu/~pattrsn/252s98/lec08-dsp.pdf
[9] http://www.ti.com/lit/ds/symlink/66ak2h12.pdf