How to Realize High-Performance Compute with Multicore DSPs
C667x Target Applications (Non-Telecom)
- Mission Critical
- Test and Automation
- HPC, Imaging and Medical
- Video Infrastructure
- Infrastructure Audio
- Emerging Broadband
- Others: Emerging Innovations
RF and Communication Applications

Market segments: Military & Defense, Avionics, Govt & Public Safety

Applications
- ISR (Intelligence/Surveillance/Reconnaissance)
  o SIGINT/COMINT/Signal Generators
- Military Communications
  o SDR (JTRS): Manpack/LMR/Fixed
  o Comm. Infra: VoIP/Video Gateways
- Satellite/Avionics Communications
  o Ground Receiver/Repeaters
  o Weather Radar
- FAA Civil Aviation/Govt Comm.
- Conventional PS: TETRA/APCO/E911
  o Wireless Infrastructure
  o Comm. Infra: VoIP/Video Gateways
- Emerging Broadband (OFDM/LTE/WiMAX)
  o Utilities/Transport/Smart Grid

Key Customer Careabouts
- Long Term Partnership
- Financial Stability
- Strong Roadmap and R&D
- Floating Point Performance
- Size, Weight, and Power (SWaP)
- I/O Bandwidth
- Longevity of supply (10+ yrs)
RF and Comm. Product Requirements

End Product Need
- Support multiple waveforms
- Common platform for TDMA/CDMA/OFDMA
- Multi-channel VoIP/video capability
- Support FEC and modulation
- TCP/IP networking support

Requirement Needs
- Raw performance in terms of MIPS/GHz/MMACS
- Floating-point-capable ISA to achieve precision and high GFLOPS
- Large on-chip RAM to reduce accesses to slow external memory
- High-speed external memory interface
- Large addressable memory
- Efficient DMA architecture
- Wireless-specific accelerators and TCP/IP offload
Imaging Product Requirements

End Product Need
- High-BW interface: RF front end and telecom ports
- Connect multiple DSPs on a board, e.g. in an ATCA card
- High-BW backplane and network connectivity
- Reliability in mission critical designs
- Low power design
- Ease of use

Requirement Needs
- Multiple high-speed interfaces: PCIe, Serial RapidIO, OBSAI/CPRI interface, Gigabit Ethernet, etc.
- Memory Error Correction & Checking (ECC)
- Efficient low-power DSPs
- Support for extended temperature ranges, from -40 °C to 105 °C and others
- Dev and debug tools
- Multicore S/W frameworks
- Signal/image processing functions, VoIP library, audio/video codecs
Introducing KeyStone Architecture (C66x)
The Best Combination of Performance (GHz) and Power Consumption in the Industry
16 GFLOPS & 32 GMACS per core @ 1 GHz

Next-Generation C66x Core
- Combines the C64x+ core (fixed point) and the C67x core (floating point) in one fixed- and floating-point core
- NEW multicore C66x fixed- and floating-point core @ 1.25 GHz
- 4x the C64x+ MACs (32 per cycle)
- 4x the C67x floating-point MACs (8 per cycle)
- 16 FLOP/cycle compared to 6 FLOP/cycle
- The 8-core C6678, based on the C66x core, delivers 320 GMACS / 160 GFLOPS @ 1.25 GHz per core (effectively a 10 GHz DSP)
- 100% code compatible with all C64x (fixed) & C67x (floating) devices
- Similar power profiles as the C64x core
- Industry's lowest-power floating-point core
- Lowest power, highest performance
- High precision and wide dynamic range
- KeyStone architecture, supported by the Code Composer Studio IDE
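The headline figures on this slide follow from cores x clock x operations per cycle. The sketch below reproduces that arithmetic using only the per-cycle numbers quoted on the slide (32 MACs and 16 FLOPs per cycle per core); the function name is illustrative, not part of any TI tool, and the results are theoretical peaks rather than measured throughput.

```python
def peak_giga_ops(cores, clock_ghz, ops_per_cycle):
    """Theoretical peak in giga-operations per second (no stalls assumed)."""
    return cores * clock_ghz * ops_per_cycle

MACS_PER_CYCLE = 32    # fixed-point MACs per cycle per C66x core (from the slide)
FLOPS_PER_CYCLE = 16   # floating-point operations per cycle per core (from the slide)

# Single core at 1 GHz: the "per core" figures.
core_gmacs = peak_giga_ops(1, 1.0, MACS_PER_CYCLE)
core_gflops = peak_giga_ops(1, 1.0, FLOPS_PER_CYCLE)

# 8-core C6678 at 1.25 GHz: the device-level figures.
c6678_gmacs = peak_giga_ops(8, 1.25, MACS_PER_CYCLE)
c6678_gflops = peak_giga_ops(8, 1.25, FLOPS_PER_CYCLE)

# "Effectively a 10 GHz DSP" is the sum of the core clocks.
aggregate_ghz = 8 * 1.25

print(core_gmacs, core_gflops, c6678_gmacs, c6678_gflops, aggregate_ghz)
```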
Unmatched Performance: BDTImark2000(TM) Score

[Chart: BDTI score for floating-point processors (axis 0 to 14000). Devices compared: ADI 2116x (SHARC), ADI 2126x (SHARC), ADI 213xx (SHARC), ADI TS201S (TigerSHARC), ADI TS202S/203S (TigerSHARC), Intel Pentium III, Renesas SH77xx (SH-4), TMS320C67x, TMS320C66xx]

[Chart: BDTI score for fixed-point processors (axis 0 to 25000). Devices compared: NEC uPD77050, ADI BF5xx (Blackfin), ADI TS201S (TigerSHARC), ADI TS202S/203S (TigerSHARC), Freescale MSC81xx (SC140), Freescale MSC814x (SC3400), Freescale MSC815x (SC3850), TMS320C64x+, TMS320C66xx]

Benchmark timings (baseline is C67x @ 300 MHz or C64x+ @ 1.2 GHz)

Algorithm                                                Baseline                     C66x @ 1.25 GHz   Gain
Single-precision floating-point FFT, 2048 pt, radix 4    86.84 us (C67x @ 300 MHz)    14.00 us*         ~600%
Fixed-point FFT, 2048 pt, radix 4                        8.23 us (C64x+ @ 1.2 GHz)    4.46 us*          ~200%
FIR filter, 40 samples, 40 taps                          0.69 us                      0.34 us*          ~200%
Matrix multiply, 32 x 32                                 17.92 us                     6.16 us*          ~300%
Matrix inverse, 4 x 4                                    0.53 us                      0.13 us*          ~400%
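The Gain column appears to be the ratio of the baseline time to the C66x time, expressed as a rounded percentage; that reading is an assumption, since the slide does not define it. A minimal sketch that recomputes the ratios from the timings quoted on this slide (dictionary keys are shortened labels chosen here for illustration):

```python
# (baseline time in us, C66x @ 1.25 GHz time in us), copied from the slide
benchmarks = {
    "float_fft_2048": (86.84, 14.00),
    "fixed_fft_2048": (8.23, 4.46),
    "fir_40x40": (0.69, 0.34),
    "matmul_32x32": (17.92, 6.16),
    "matinv_4x4": (0.53, 0.13),
}

def speedup(baseline_us, c66x_us):
    """How many times faster the C66x result is than the baseline."""
    return baseline_us / c66x_us

speedups = {name: speedup(*times) for name, times in benchmarks.items()}

for name, ratio in speedups.items():
    # A ratio of about 6.2 corresponds to the slide's "~600%".
    print(f"{name}: {ratio:.1f}x")
```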
TI Multicore KeyStone Architecture
The first network-on-chip infrastructure to unleash full multicore entitlement

Architecture elements
- Multicore Navigator
- Multicore Shared Memory Controller
- Network on Chip (TeraNet 2)
- C66x DSP and ARM processing cores
- Shared Memory
- System Management (Debug, Clocking, Power)
- Application Accelerators
- High Speed I/O
- HyperLink 50

Benefits
- Highest Integration: Cost & Power
- Common Architecture: Portable Software
- Scalable: Tailored Solutions
- Innovative: Multi-core Floating Point
- Development Time: Tools & Debugging
- R&D Efficiency: Quality Software Solutions & Libraries
Product Highlights: C6670 and C6678

C6670 (Performance Optimized)
Next Generation C66x Core
- 4 C66x cores @ 1 GHz to 1.2 GHz
Memory Architecture
- 4 MB local L2 (1 MB per core)
- 2 MB Multicore Shared Memory
Communication Accelerators
- TCP3e (Turbo Encode): up to 550 Mbps
- TCP3d (Turbo Decode): up to 600 Mbps
- FFTC: 2048-point FFT every 4.6 µs
- VCP2 for voice channel decoding

C6678 (Power Optimized)
Next Generation C66x Core
- Up to 8 C66x cores @ 1 GHz to 1.25 GHz
- Available options: 1, 2, 4, and 8 core devices
Memory Architecture
- 4 MB local L2 (512 KB per core)
- 4 MB Multicore Shared Memory
Power Optimized Core
- <10 W at 1 GHz, nominal temp

[Block diagram, C6670: CorePacs with L1/L2; DDR3-64b; Power Management; Debug; Multicore Navigator; Memory Subsystem with Multicore Shared Memory Controller (MSMC) and 2 MB Shared Memory; System Elements (SysMon, EDMA); TeraNet; HyperLink; Communications CoProcessors (4x VCP2, 2x RAC, 3x FFTC, 3x TCP3d, 1x TAC, BCP); Network CoProcessors (Crypto, Packet Accelerator); Peripherals & IO (SRIO x4, SGMII x2, PCIe x2, I2C, SPI, AIF2 x6, UART)]

[Block diagram, C6678: 8 x CorePac with L1/L2; DDR3-64b; Power Management; Debug; Multicore Navigator; Memory Subsystem with Multicore Shared Memory Controller (MSMC) and 4 MB Shared Memory; System Elements (SysMon, EDMA); TeraNet; HyperLink; Network CoProcessors (Crypto, Packet Accelerator, GbE Switch, SGMII IP interfaces); Peripherals & IO (SRIO x4, TSIP x2, PCIe x2, I2C, SPI, EMIF 16, UART)]
Innovation & Integration via C6678 Highlights

C66x Core
- Next-generation fixed/floating-point core with clock speeds ranging from 1 GHz to 1.25 GHz and up to 8 core options

Multicore Navigator
- Data transfer engine architected to move data between system elements without any CPU overhead, so maximum system efficiency is achieved

Memory Architecture
- 0.5 MB of local memory per core; 4 MB of shared memory
- Enhanced memory architecture through an enhanced Multicore Shared Memory Controller
- Bottleneck-free fast on- and off-chip memory access, including a DDR3-1333MHz (64-bit) interface
- L1/L2/L3 ECC

Improved Debug
- S/W dev and debug support leveraged by CCS

HyperLink
- Ultra high-speed (up to 50 Gbaud), low-latency serial interface that connects to other DSPs and FPGAs in the system

Network Co-Processor and Accelerators
- A cost-effective implementation to off-load the TCP/IP and secure networking functions from the DSP

TeraNet
- Switch fabric with 2 terabits of bandwidth, which allows maximum data transfer between system components to realize full system entitlement

Peripherals and I/O Interfaces
- High-bandwidth peripherals that operate independently (NOT shared), allowing simultaneous data transfer to prevent bottlenecks, featuring:
  o RapidIO v2.1: 4 lanes @ 5 Gbps with 1x, 2x and 4x support
  o PCIe x2: 2 lanes, running independently of RapidIO
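To put the interface figures above in perspective, the sketch below derives raw peak numbers from them. It assumes the usual reading of "DDR3-1333" as 1333 mega-transfers per second, and it reports the SRIO figure as a raw line rate with no encoding or protocol overhead deducted. Function and variable names are illustrative only.

```python
def ddr3_peak_mbytes_per_s(mega_transfers_per_s, bus_width_bits):
    """Peak DDR3 bandwidth in MB/s: transfers per second x bytes per transfer."""
    return mega_transfers_per_s * bus_width_bits // 8

# DDR3-1333 on a 64-bit bus (assumes 1333 MT/s).
ddr3_mbytes_per_s = ddr3_peak_mbytes_per_s(1333, 64)

# RapidIO v2.1: 4 lanes at 5 Gbps each, raw line rate.
srio_lanes = 4
srio_gbps_per_lane = 5
srio_raw_gbps = srio_lanes * srio_gbps_per_lane

print(f"DDR3 peak: {ddr3_mbytes_per_s} MB/s")
print(f"SRIO x4 raw: {srio_raw_gbps} Gbps")
```

These are upper bounds; sustained rates depend on access patterns, refresh, and protocol overhead.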
Competitive Analysis

Value Prop against FPGA
- C66x performance: 320 GMACS / 160 GFLOPS
- Baseband on a chip: handles multiple waveforms supporting OFDM, CDMA, TDM
- L1/L2/L3 processing capability
- Wireless accelerators (VCP/TCP/FFT)
- Software programmability: time to market
- Smaller package (more DSPs per board)
- Lower power: smaller battery, simpler cooling
- Low cost: MIPS/$

Value Prop against other DSPs
- C66x fixed- and floating-point capability @ 1.25 GHz
- Industry's fastest DSP at 10 GHz
- On-chip RAM up to 8 MB
- DDR3 1600 MHz, 64-bit, 8 GB address space
- Multiple independent high-speed I/O: 4x sRIO v2.1, 2x PCIe Gen II, 2x SGMII, 2x TSIP
- High-BW FPGA connectivity: HyperLink @ 50 Gbps
- 1/2/4/8 core options (pin compatible)
- L1/L2/L3 memory ECC: system reliability
- Low power per GFLOPS and GMACS
- Extended temp support: -40 °C to 105 °C
- CCS tools + S/W collateral
- 3rd party network
H/W Development Tools

TMDXEVM6678L EVM
- Single-wide AMC form factor
- C6678 DSP

Code Composer Studio IDE (CCSv5)
- Design, Code and Build, Debug, Analyze, Tune
- Allows designers of all experience levels to move quickly through application development (www.ti.com/ccstudio)
- Time-limited FREE evaluation versions available for download
- Includes C667x Simulator

EVM Kit includes
- BIOS 6.x, BIOS-MCSDK / LINUX-MCSDK 2.0 (NDK, PDK, DSPLIB, etc.)
- Sample programs and out-of-box (OOB) demos, e.g. I/O Benchmark, Image Processing Pipeline and High Performance DSP Utility Application (HUA)
- User guide, starter guide, tech ref guide, app notes, etc.

Ordering options
- TMDXEVM6678L: EVM with XDS100 emulation, $399
- TMDXEVM6678LE: EVM with XDS560V2 emulation, $599
- TMDXEVM6678LXE: EVM with XDS560V2 emulation, encryption enabled, $599
- TMDSEMU560v2STM-UE: XDS560v2 System Trace Emulator with 128Mb system trace buffer and Ethernet/USB support
- Optional PCIe adapter card to connect the C6678 EVM to a standard PCI header of a desktop
TI's Multicore Hardware Ecosystem
- Standardized Boards
- Chassis / System
- PCI Express (with Gen 2)
- Advanced Mezzanine (AMC)
- Custom
- ATCA
- Others
TI's Multicore Software Ecosystem
- Customer Application
- Layer 2+
- Multicore Entitlement
- IP Network Stack
- Layer 1 UMTS
- Layer 1 LTE
- TI Runtime
- TI's Device Entitlement Libraries
- TI Layer 1 Libraries
- TI BIOS, Linux, OSE(ck)
Multicore Tools and Software (MC-SDK)

Tools
- Codegen with OpenMP support
- Emulator/Debugger
- Simulator
- Profiler / DVT
- 3rd party tools

Software
- BIOS/Linux SDK
  o Multicore demonstration
  o 6.x BIOS
  o Platform abstraction
  o Basic networking
  o Inter-core communication
- Application Specific Libraries
  o Audio/video codecs
  o VoIP components
  o WiMAX Toolkit, LTE Toolkit, DSPLIB, others

[Diagram, Host Computer: Eclipse-based Code Composer Studio(TM) with Editor/IDE, Compiler, Linker (Codegen), Profiler, Debugger, Remote Debug, SoC Analyzer; Third Party Plug-Ins (Polycore, ENEA Optima, 3L); emulators XDS 560 V2 and XDS 560 Trace]

[Diagram, Target Board: Customer Application on top of the Multicore Software Development Kit, which comprises demo apps for Multicore BIOS and Linux; DSPLIB, IMGLIB, speech, audio and video codecs; NDK; inter-core communication; operating system with boot loader (DSP BIOS, Linux); and the Platform Development Kit. Labels: Multicore Entitlement, Full Silicon Entitlement]
KeyStone Multicore Software Libraries & Codecs

Libraries
- Digital Signal Processing: FFT, adaptive filtering, filtering and convolution, others. Available free from TI
- MATLAB
- Math operations
- Image Processing: edge detection, boundary, morphology, others. Available free from TI
- Security/Cryptography: AES, SHA1, 3DES
- Voice and Fax: line echo cancellation, voice activity detection, others. Available free from TI
- Vision Analytics: Vision Lib (object only), 50+ royalty-free kernels
  o Background modeling & subtraction
  o Object feature extraction
  o Tracking, recognition
  o Low-level pixel processing

Codecs
- Voice: G.711, G.722, G.723, G.729, CDMA, AMR (NB/WB), EVRC-B, others
- Fax: T.38, Fax Modem
- Video: H.263, H.264, MPEG2, MPEG4, VC1/WMV9 Decode, others
- Audio: MPEG1 Layer2, AAC LC/HE, AC3 2.0/5.1, Sample Rate Conversion
High-Performance and Multicore Processor

High Value
- Low-cost EVM
- High performance at the right power & price
- KeyStone architecture
- Open & affordable tools

Easy to Use
- Product collateral
- Training
- Drivers & example code
- User community

Quick to Market
- Quick-start hardware enabler
- Software benchmarks & functional understanding
- Frameworks & abstraction
- Generic libraries
- Application libraries
Getting Started: More Information/Links

Product Folders
- Informational Wiki Page
- All C6000 Multicore DSPs
- TMS320C6670
- TMS320C6678

EVMs and Software Tools
- TMS320C6678 EVM
- TMS320C6670 EVM
- AMC to PCIe Adapter Card
- Multicore Software Development Kit for BIOS & Linux
- MCSDK Wiki
- CCS v5 Wiki
- C66x Linux Wiki
- Signal Processing Library (DSPLIB)
- Image and Video Processing Library (IMGLIB)
- LTE/WiMAX Toolkit (discuss with BDM)

Technical Support
- TI E2E Community (online support)
- Product Training
Online Video Training http://focus.ti.com/docs/training/catalog/events/event.jhtml?sku=olt110027
Mission Critical Market: What Customers Like about TI

- Undisputed #1 DSP and SoC supplier
  o Strong growth for 8 years in a row, even in 2009
  o Higher R&D spending than revenue of most competitors
  o KeyStone SoC architecture secures future success
- Rich Product Portfolio & Strong Roadmap
  o 2 families with multiple devices and growing: Nyquist (6670), Shannon (6678/4/2)
  o 40nm -> 28nm
  o Tools/software & compilers
  o 3rd party ecosystem
  o Multiple design wins pre-announcement
- Secure Supply
  o No product discontinuation (end of life)
  o History of delivery upon promises (power, GHz, ...)
- Field Experience
  o Completeness of system analysis, architecture, internal switch, ...
- Customer Support Business Model
  o Long-term relationships with key customers
  o Actively seek and incorporate customer feedback in roadmap devices

[Chart: Revenue, 2002 to 2009]
[Diagram: TI SoC Architecture across Macro, Pico and Femto; Layer 1, Radio, Software PHY, IP Network]
Backup Slides: Product Details
C6678 (Shannon) Lightning Half-Length PCIe Card

Feature Set
- TI TMS320C6678 DSP (8-core) x 4
- C66x core frequency: 1.25 GHz
- DDR3 memory data frequency: 1600 MHz; data bus width: 64-bit
- Serial RapidIO Gen-2 interface
- PCIe Gen-2 interface
- 10/100/1000 Mbps Ethernet w/ SGMII
- Hyperlink50 interface
- 1024 MB DDR3-1333 on board
- PLX PEX8624 PCIe Gen-2 switch
- Serial RapidIO daisy-chain
- Ethernet daisy-chain
- Each DSP device is linked to the PCIe switch by x2 lanes
- Dual DSPs linked by Hyperlink50
- Power: max 54 W
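As a rough sizing aid, the card-level peak can be estimated by multiplying the per-device C6678 figures quoted earlier in this deck (320 GMACS / 160 GFLOPS at 1.25 GHz) by the four devices on the card. This is a sketch of theoretical peaks only; the variable names are illustrative, and dividing by the 54 W maximum gives a conservative peak-per-watt figure, not a measured efficiency.

```python
DEVICES_PER_CARD = 4       # TMS320C6678 x 4 (from the feature list)
CORES_PER_DEVICE = 8
DEVICE_PEAK_GFLOPS = 160   # per C6678 @ 1.25 GHz (quoted earlier in the deck)
DEVICE_PEAK_GMACS = 320
CARD_MAX_WATTS = 54

card_cores = DEVICES_PER_CARD * CORES_PER_DEVICE
card_peak_gflops = DEVICES_PER_CARD * DEVICE_PEAK_GFLOPS
card_peak_gmacs = DEVICES_PER_CARD * DEVICE_PEAK_GMACS

# Peak GFLOPS divided by the maximum board power.
gflops_per_watt = card_peak_gflops / CARD_MAX_WATTS

print(f"{card_cores} cores, {card_peak_gflops} GFLOPS peak, "
      f"{card_peak_gmacs} GMACS peak, {gflops_per_watt:.2f} GFLOPS/W")
```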
What is HyperLink?
A high-speed, low-latency, low-pin-count communication interface

- Low pin count (24 pins)
- Point-to-point connection
- Interconnect: DSP-to-DSP, DSP-to-FPGA
- SerDes for data transfer
  o x1 and x4 modes for Tx and Rx
  o 12.5 GBaud/lane
  o Effectively 8b9b encoding
- LVCMOS sideband signals for flow control & power management
- Errors/events/timeouts
- Up to 64 memory-mapped regions, each region up to 256 MB

* Simple packet-based transfer protocol for memory-mapped access
* Read/write to DSP/FPGA local memory
  - Discrete memory access of any byte-aligned width up to 64 bits
  - Burst transfer modes

Transactions
- Write (maximum burst size 256 bytes): Write Request ---> Data Packet --->
- Read (maximum burst size 256 bytes): Read Request ---> Read Response
- Interrupt Request <-->
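The "HyperLink 50" figure and the addressable window size both follow from numbers on this slide. The sketch below works them out, assuming the x4 mode with all lanes at 12.5 GBaud and treating the "effectively 8b9b" encoding as 8 payload bits per 9 line bits; packet headers and flow control are not deducted. Names are illustrative.

```python
LANES = 4                  # x4 mode
GBAUD_PER_LANE = 12.5
PAYLOAD_BITS, LINE_BITS = 8, 9   # "effectively 8b9b" encoding

# Raw line rate per direction across all lanes.
raw_gbaud = LANES * GBAUD_PER_LANE

# Payload rate after encoding overhead.
payload_gbps = raw_gbaud * PAYLOAD_BITS / LINE_BITS

# Remote memory reachable through the memory-mapped regions.
REGIONS = 64
REGION_SIZE_MB = 256
addressable_gb = REGIONS * REGION_SIZE_MB // 1024

print(f"raw {raw_gbaud} GBaud, payload about {payload_gbps:.1f} Gbps, "
      f"window {addressable_gb} GB")
```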
Universal Parallel Port (uPP)

What is it?
- Parallel bus, two independent channels (separate data buses)
- I/O speeds up to 75 MHz with 8-16 bit data width per channel
- 1 or 2 channel parallel interface operating in RX, TX or FD mode
- Supports double data rate mode of operation (bandwidth does not change/increase)

Application
- Each channel can interface cleanly with high-speed ADCs and/or DACs with up to 16-bit data width (per channel)
- Useful as a low-cost interface with FPGAs
- Can run up to 120 MByte/s per channel in single-channel or bi-directional mode (240 MByte/s for both channels in unidirectional mode)
- Can also be used to interface two C6655/57 devices, or to connect C6655/57 with C674x or OMAP-L13x family devices

Other benefits
- Internal DMA leaves CPU EDMA free
- Simple protocol with few control pins (configurable: 2-4 per channel)
- Multiple data packing formats for 9-15 bit data widths
- Interleave mode (single channel only)
- Simple interface: I/O queued by software

Throughput Estimates
Note: Max. clock of 50 MHz in (*) configuration
Thank You