System Considerations
  Interfacing
  Performance
  Power
  Size
  Ease-of-Use: Programming, Interfacing, Debugging
  Cost: Device cost, System cost, Development cost, Time to market
  Integration: Peripherals
Different Needs? Multiple Families!
  C2000 (C20x/24x/28x), C1x, C2x
    Lowest Cost
    Control Systems: Motor Control, Storage, Digital Ctrl Systems
  C5000 (C54x/55x), C5x
    Efficiency: Best MIPS per Watt / Dollar / Size
    Wireless phones, Internet audio players, Digital still cameras, Modems, Telephony, VoIP
  C6000 (C62x/64x/67x), C3x, C4x, C8x
    Max Performance with Best Ease-of-Use
    Multi Channel and Multi Function Apps: Comm Infrastructure, Wireless Base-stations, DSL, Imaging, Multi-media Servers, Video
What Problem Are We Trying To Solve?
  Signal chain: ADC -> x -> DSP -> Y -> DAC
  Digital sampling of an analog signal: [Figure: amplitude (A) versus time (t)]
  Most DSP algorithms can be expressed with MAC:
      $$ Y = \sum_{i=1}^{count} a_i \cdot x_i $$
      for (i = 0; i < count; i++){
          sum += m[i] * n[i];
      }
  What does it take to do this fast and easy?
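As a minimal sketch of how a common DSP algorithm reduces to multiply-accumulates, the function below computes one output sample of an FIR filter as the sum of products given above. The name fir_sample and the parameter names are illustrative and do not come from the slides; the arrays are indexed from 0, as is usual in C, rather than from 1 as in the formula.

```c
#include <stddef.h>

/* One FIR output sample: Y = sum of a[i] * x[i] over `count` taps.
 * a: filter coefficients, x: most recent input samples
 * (hypothetical names chosen to match the formula). */
float fir_sample(const float *a, const float *x, size_t count)
{
    float y = 0.0f;
    for (size_t i = 0; i < count; i++) {
        y += a[i] * x[i];   /* one multiply-accumulate (MAC) per tap */
    }
    return y;
}
```

Each loop iteration is exactly one MAC, so the cost of the filter scales directly with how many MACs the processor can issue per cycle.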
Fast MAC using only C
  Multiply-Accumulate (MAC) in Natural C Code
      for (i = 0; i < count; i++){
          sum += m[i] * n[i];
      }
  Fastest Execution of MACs
    The C6x roadmap: from 200 to 2400 MMACs
  Ease of C Programming
    Even using natural C, the C6000 architecture can perform 2 to 4 MACs per cycle
    Compiler generates 80-100% efficient code
  How does the C6000 achieve such performance from C?
'C6000 Architecture: Built for Speed
  [Figure: CPU block diagram with register files A (A0 ... A15 ... A31) and B (B0 ... B15 ... B31), functional units .D1 .D2 .M1 .M2 .L1 .L2 .S1 .S2, and the Controller/Decoder]
  C6000 Compiler excels at Natural C
    While dual-MAC speeds math-intensive algorithms, the flexibility of 8 independent functional units allows the compiler to quickly perform other types of processing.
    All C6000 instructions are conditional, allowing efficient hardware pipelining
    Instruction set and hardware orthogonality allow the compiler to achieve 80-100% efficiency
Fastest MAC using Natural C
    float mac(float *m, float *n, int count)
    {
        int i;
        float sum = 0;

        for (i = 0; i < count; i++) {
            sum += m[i] * n[i];
        }
        return sum;
    }
  [Figure: CPU block diagram with register files A and B, functional units .D1 .D2 .M1 .M2 .L1 .L2 .S1 .S2, and the Controller/Decoder]
    ;** --------------------------------------------------*
    LOOP:   ; PIPED LOOP KERNEL
            LDDW    .D1    A4++,A7:A6
            LDDW    .D2    B4++,B7:B6
            MPYSP   .M1X   A6,B6,A5
            MPYSP   .M2X   A7,B7,B5
            ADDSP   .L1    A5,A8,A8
            ADDSP   .L2    B5,B8,B8
      [A1]  B       .S2    LOOP
      [A1]  SUB     .S1    A1,1,A1
    ;** --------------------------------------------------*
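The kernel above loads a pair of floats from each array per iteration, multiplies them on .M1 and .M2, and accumulates on both the A side (A8) and the B side (B8). A rough C equivalent is sketched below. It is an illustration of the idea, not the compiler's actual transformation: the name mac_dual is hypothetical, and the odd-count handling and the final combining of the partial sums are assumptions, since the slide shows only the loop kernel.

```c
/* Dot product with two independent partial sums, mirroring the two
 * MPYSP/ADDSP pairs in the kernel. Splitting a float sum in two can
 * change rounding slightly compared with a single accumulator. */
float mac_dual(const float *m, const float *n, int count)
{
    float sum0 = 0.0f;  /* plays the role of A8 */
    float sum1 = 0.0f;  /* plays the role of B8 */
    int i;

    for (i = 0; i + 1 < count; i += 2) {
        sum0 += m[i]     * n[i];
        sum1 += m[i + 1] * n[i + 1];
    }
    if (i < count) {    /* leftover element when count is odd (assumed handling) */
        sum0 += m[i] * n[i];
    }
    return sum0 + sum1;
}
```

Keeping the two sums independent is what lets both multiply units and both add units stay busy in the same cycle.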
'C6000 System Block Diagram
  [Figure: CPU with functional units .D1 .D2 .M1 .M2 .L1 .L2 .S1 .S2, alongside the peripherals]
  Looking at the internal buses...
C6000 internal buses
  Bus                 Width
  Program Addr        x32
  Program Data        x256
  Data Addr - T1      x32
  Data Data - T1      x32/64
  Data Addr - T2      x32
  Data Data - T2      x32/64
  DMA Addr - Read
  DMA Data - Read
  DMA Addr - Write
  DMA Data - Write
  Connected blocks: PC, A regs, B regs, Peripherals, DMA
'C6000 System Block Diagram
  [Figure: CPU with functional units .D1 .D2 .M1 .M2 .L1 .L2 .S1 .S2]
  Next, the internal memory...
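C6711 Memory Map
  Address      Region
  0000_0000    64KB internal: 64K Prog / Data (Level 2)
  0180_0000    On-chip Peripherals
  8000_0000    External space 0, 128MB
  9000_0000    External space 1, 128MB
  A000_0000    External space 2, 128MB
  B000_0000    External space 3, 128MB
  FFFF_FFFF    End of address space
  Level 1: 4K Program Cache and 4K Data Cache, with cache logic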
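To make the memory map concrete, here is a small sketch that classifies a 32-bit address using only the figures listed above: 64KB of internal memory at 0000_0000, and four 128MB spaces numbered 0 to 3 starting at 8000_0000, 9000_0000, A000_0000 and B000_0000. The function and macro names are hypothetical. The peripheral region is deliberately left out because the slide gives only its start address, not its size.

```c
#include <stdint.h>

#define INTERNAL_SIZE  0x00010000u  /* 64KB starting at 0000_0000 */
#define EXT_BASE       0x80000000u  /* start of space 0 */
#define EXT_SPACING    0x10000000u  /* distance between listed start addresses */
#define EXT_SIZE       0x08000000u  /* 128MB per space */

/* Returns 1 if addr lies in the 64KB internal memory, else 0. */
int is_internal(uint32_t addr)
{
    return addr < INTERNAL_SIZE;
}

/* Returns the space number 0..3 that contains addr, or -1 if addr
 * is outside the four 128MB spaces. */
int external_space(uint32_t addr)
{
    if (addr < EXT_BASE) {
        return -1;
    }
    uint32_t offset = addr - EXT_BASE;
    uint32_t space = offset / EXT_SPACING;
    if (space > 3u || (offset % EXT_SPACING) >= EXT_SIZE) {
        return -1;
    }
    return (int)space;
}
```

Note that the listed start addresses are 256MB apart while each space is 128MB, so the upper half of each window is treated here as unmapped.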
'C6000 System Block Diagram
  [Figure: CPU with functional units .D1 .D2 .M1 .M2 .L1 .L2 .S1 .S2, alongside the peripherals]
  Looking at each peripheral...
EMIF
  External Memory Interface (EMIF)
    Memory types shown: Async, SDRAM, SBSRAM
    Glueless access to async/sync memory
    Works with PC100 SDRAM (cheap, fast, and easy!)
    Byte-wide data access
    16, 32, or 64-bit bus widths
HPI / XBUS / PCI
  Parallel Peripheral Interface
    HPI: Dedicated, slave-only, async 16/32-bit bus allows host-μP access to C6000 memory
    XBUS: Similar to HPI but provides:
      Master/slave and sync modes
      Glueless i/f to FIFOs (up to single-cycle xfer rate)
    PCI: Standard 32-bit, 33MHz PCI interface
  These interfaces provide a means to bootstrap the C6000
GPIO
  General Purpose Input/Output (GPIO)
    C64x provides 8 or 16 bits of general purpose bitwise I/O
    Use to observe or control the signal of a single pin
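Bitwise I/O of this kind usually comes down to setting, clearing, and testing individual bits. The sketch below shows those three operations on a 16-bit value. It works on an ordinary variable standing in for a pin register; the type and function names are hypothetical and no real C64x register addresses or names are used. The pin argument is assumed to be in the range 0 to 15.

```c
#include <stdint.h>

/* Hypothetical in-memory image of a 16-bit GPIO value register. */
typedef uint16_t gpio_bits_t;

/* Drive one pin high in the image (pin assumed to be 0..15). */
gpio_bits_t gpio_set(gpio_bits_t reg, unsigned pin)
{
    return (gpio_bits_t)(reg | (1u << pin));
}

/* Drive one pin low in the image. */
gpio_bits_t gpio_clear(gpio_bits_t reg, unsigned pin)
{
    return (gpio_bits_t)(reg & ~(1u << pin));
}

/* Observe one pin: returns 1 if the bit is set, else 0. */
int gpio_read(gpio_bits_t reg, unsigned pin)
{
    return (int)((reg >> pin) & 1u);
}
```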
McBSP and Utopia
  Multi-Channel Buffered Serial Port (McBSP)
    2 (or 3) full-duplex, synchronous serial ports
    Up to 100 Mb/sec performance
    Supports multi-channel operation (T1, E1, MVIP, ...)
  Utopia (C64x)
    ATM connection
    50 MHz wide area network connectivity
DMA / EDMA
  Direct Memory Access (DMA / EDMA)
    Transfers any set of memory locations to another
    4 / 16 / 64 channels (transfer parameter sets)
    Transfers can be triggered by any interrupt (sync)
    Operates independently of the CPU
    On reset, provides bootstrap from memory
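The idea of a transfer parameter set can be sketched in software: a small record describes where to read, where to write, how many elements to move, and how far to step between elements, and the engine then carries out the copy on its own. The struct below is purely illustrative. Its field names and layout are assumptions and do not reflect the actual DMA or EDMA register format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transfer parameter set (not the real hardware layout). */
typedef struct {
    const uint32_t *src;  /* first element to read */
    uint32_t *dst;        /* first element to write */
    size_t count;         /* number of 32-bit elements to move */
    ptrdiff_t src_step;   /* element stride on the source side */
    ptrdiff_t dst_step;   /* element stride on the destination side */
} xfer_params_t;

/* Software model of one transfer; returns the number of elements moved. */
size_t xfer_run(const xfer_params_t *p)
{
    for (size_t i = 0; i < p->count; i++) {
        p->dst[(ptrdiff_t)i * p->dst_step] = p->src[(ptrdiff_t)i * p->src_step];
    }
    return p->count;
}
```

With a source step of 2 and a destination step of 1, for example, the model gathers every other source element into a contiguous destination buffer.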
Timer / Counter
  Two (or three) 32-bit timer/counters
  Can generate interrupts
  Both input and output pins
VCP / TCP: 3G Wireless
  Turbo Coprocessor (TCP)
    Supports 35 data channels at 384 kbps
    3GPP / IS2000 Turbo coder
    Programmable parameters include mode, rate and frame length
  Viterbi Coprocessor (VCP)
    Supports >500 voice channels at 8 kbps
    Programmable decoder parameters include constraint length, code rate, and frame length
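The channel counts and per-channel rates above imply an aggregate throughput for each coprocessor. The helper below, with a hypothetical name, multiplies the two; applied to the slide's figures it gives 35 x 384 = 13440 kbps for the TCP and at least 500 x 8 = 4000 kbps for the VCP.

```c
/* Aggregate bit rate in kbps for a number of equal-rate channels. */
unsigned long aggregate_kbps(unsigned long channels, unsigned long kbps_per_channel)
{
    return channels * kbps_per_channel;
}
```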
PLL
  Clock multiplier
    Reduces EMI and cost
    Pin selectable
  Input: CLKIN
  Output:
    CLKOUT1: output rate of PLL; instruction (MIP) rate
    CLKOUT2: 1/2 rate of CLKOUT1
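The clock relationships above can be written out as a short calculation. This is a sketch under stated assumptions: the struct and function names are hypothetical, and the multiplier is passed in as a plain number because the slide does not list which multiplier values the pins can select.

```c
/* Output clocks derived from CLKIN (names are illustrative). */
typedef struct {
    double clkout1_mhz;  /* PLL output rate, equal to the instruction rate */
    double clkout2_mhz;  /* half the rate of CLKOUT1 */
} pll_clocks_t;

/* CLKOUT1 = CLKIN * multiplier; CLKOUT2 = CLKOUT1 / 2. */
pll_clocks_t pll_outputs(double clkin_mhz, unsigned multiplier)
{
    pll_clocks_t c;
    c.clkout1_mhz = clkin_mhz * multiplier;
    c.clkout2_mhz = c.clkout1_mhz / 2.0;
    return c;
}
```

For instance, an assumed 50 MHz CLKIN with an assumed multiplier of 4 would give a 200 MHz CLKOUT1 and a 100 MHz CLKOUT2.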
'C6000 Peripherals
  XB, PCI, Host Port
  GPIO
  EMIF
  McBSPs
  Utopia
  DMA, EDMA (Boot)
  Timers
  VCP
  TCP
  PLL
  CPU functional units: .D1 .D2 .M1 .M2 .L1 .L2 .S1 .S2
C6000 Roadmap
  [Figure: performance versus time; devices are software compatible]
  Floating Point (C67x): C6701, C6711, C6712
  1st Generation (C62x): C6201, C6202, C6203, C6204, C6205, C6211
  2nd Generation (C64x DSP, 1.1 GHz)
    Devices: C6414, C6415, C6416
    Target areas: General Purpose, Media Gateway, 3G Wireless Infrastructure
  Multi-core