Digital Signal Processors for Mobile Phone Terminals
Katsuhiko Ueda
Matsushita Electric Ind. Co., Ltd.
K. Ueda, '99 VLSI Circuits Short Course

Abstract
- A DSP is one of the key components in a digital cellular phone terminal.
- This talk discusses the role of the DSP in the terminal and how to achieve high performance with low power consumption.
- Mobile phone systems are moving rapidly into the mobile multimedia era. A DSP architecture suitable for this new era is also discussed.

Outline
1. The role of the DSP in a cellular phone terminal.
2. How to achieve high performance with low power consumption.
3. Issues for DSPs in next-generation mobile phone systems.
4. Mobile multimedia DSP.
5. Summary.

Architecture of Portable Phone Terminal
[Figure: terminal block diagram. Blocks: Receiver, Demodulator, Synthesizer, Equalizer, Channel CODEC, Speech CODEC, Speaker, Microphone, Modulator, Transmitter, Microcomputer, Keypad/Display.]
Role of DSP
- Speech CODEC
- Channel CODEC
- Equalizer
- Mod/Demodulator

Relationship between Speech Signal and Tx Signal (PDC)
[Figure: terminal signal path (Receiver, Synthesizer, Demodulator, Channel CODEC, Speech CODEC, Speaker, Microphone, Modulator, Transmitter) with two marked points, A and B.]
Point A
- 8 bits (u-law) every 125 us (8 kHz)
- -> 2,560 bits/40 ms -> 64 kbps (8 bits * 8 kHz)
Point B
- Speech: 138 bits/40 ms -> 3.45 kbps
- FEC: 86 bits/40 ms -> 2.15 kbps
- Total: 5.6 kbps
1 TDMA frame (40 ms), 42 kbps: User 1, User 2, User 3, User 4, User 5, User 6
1 slot: ~6.7 ms, 7 kbps

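The rates on this slide follow from the number of bits carried per 40 ms frame. The short Python check below reproduces that arithmetic; the function name `kbps` and the variable names are illustrative and do not come from the talk.

```python
FRAME_MS = 40   # one PDC TDMA frame, as stated on the slide
USERS = 6       # slots (users) per frame, as drawn on the slide

def kbps(bits_per_frame, frame_ms=FRAME_MS):
    """Bit rate in kbit/s for a number of bits sent once per frame."""
    return bits_per_frame / frame_ms   # bits per millisecond equals kbit/s

# Point A: 8-bit u-law samples at 8 kHz over one 40 ms frame
pcm_bits = 8 * 8000 * FRAME_MS // 1000

rates = {
    "pcm": kbps(pcm_bits),      # Point A
    "speech": kbps(138),        # Point B, coded speech
    "fec": kbps(86),            # Point B, error correction bits
    "total": kbps(138 + 86),    # Point B, speech plus FEC
}
slot_ms = FRAME_MS / USERS      # duration of one user's slot
frame_rate = USERS * 7          # six 7 kbps slots

print(pcm_bits, rates, round(slot_ms, 1), frame_rate)
```

The compression ratio between Point A and the coded speech at Point B is 64 / 3.45, or roughly 18.5 to 1, which is the work the speech CODEC has to do on the DSP.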
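Speech CODEC
[Figure: speech coder block diagram. Blocks: Digitized Input Speech Signal, Spectrum Analysis, Code Book, Gain, Synthesis Filter, Parameter. Caption: minimizing the difference between the input speech signal and the synthesized speech signal.]
[Plot: processing load [MOPS] (axis 0 to 20) versus bit rate [kbps] (axis 0 to 20). Labeled codecs: PSI-CELP, VSELP, 11.2 kbps VSELP, RPE-LTP; a 13 kbps label also appears.]
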
Dedicated DSP for Portable Phone
[Figure: a BASIC DSP extended into a dedicated DSP for portable phones. Labels: POINTER UNIT, ADR CTRL (ex. modulo, bit reverse), POINTER, DATA MEMORY, MEMORY X, MEMORY Y, CONTROL, ADR CTRL (ex. repeat, loop), INST MEMORY, DECODER, I/O, DMA CTRL, SERIAL, PARALLEL, DPU, MUL, ALU, SAT, REG (ACC).]
- Increase performance by adding accelerators
- Reduce power consumption: Pc ∝ f*c*v^2 (f: frequency, c: capacitance, v: voltage)

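Because power scales with f*c*v^2, lowering the supply voltage pays off quadratically, while lowering the clock frequency or the switched capacitance pays off only linearly. A minimal sketch of that relation follows; the function name and the example ratios are hypothetical and are not figures from the talk.

```python
def relative_power(f_ratio, c_ratio, v_ratio):
    """Power relative to a baseline design, using P proportional to f*c*v^2.

    Each argument is new_value / baseline_value for frequency,
    switched capacitance and supply voltage respectively.
    """
    return f_ratio * c_ratio * v_ratio ** 2

# Hypothetical case 1: an accelerator halves the cycles needed, so the
# clock can run at half the frequency for the same workload.
half_clock = relative_power(0.5, 1.0, 1.0)

# Hypothetical case 2: same clock and capacitance, supply voltage halved.
half_voltage = relative_power(1.0, 1.0, 0.5)

print(half_clock, half_voltage)
```

In this sketch halving the clock halves the power, while halving the voltage cuts it to a quarter, which is why accelerators that reduce cycle counts and low-voltage operation are pursued together.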
A DSP Architecture for Portable Phone Terminal
[Figure: DSP block diagram. Labels: DATA MEMORY, Data ROM, Data RAM, Double Access, AMA, AMB, da, PROGRAM CONTROL, IR, INST ROM, DEC, STACK, IP, M BUS 16, A BUS 16, B BUS 16, EXT CLK, PLL, CLK GEN, DSP-CORE, PU, DATA REGS, sp, cc, BSFT, ALU, ACS, SAT, DPU, RB-MAC, ACC, MAC, I/O, DMA CONT, SERIAL, PARALLEL.]
Callouts
- Special memory scheme to realize double speed MAC
- Viterbi accelerator
- Dedicated MAC unit: double speed MAC scheme, redundant binary number system

Double Speed MAC Scheme
[Figure: memory and MAC datapath. MEMORY X (MX) and MEMORY Y (MY) each have a 16-bit EVEN SIDE and a 16-bit ODD SIDE, addressed by POINTER X (PX) and POINTER Y (PY), with a TEMP REG on the odd side. MAC UNIT: MULTIPLIER (0.5 cycle), 32-bit PIPELINE REG, 40-bit ADDER, ACC, BARREL SHIFTER.]
Timing (memory output: 1 machine cycle; bus transfer: 1/2 machine cycle)
- Output of MX: D(2x) and D(2x+1), then D(2x+2) and D(2x+3)
- TEMP REG: D(2x+1), D(2x+3), D(2x+5)
- A-BUS: D(2x), D(2x+1), D(2x+2), D(2x+3), D(2x+4), D(2x+5)
- B-BUS: D(2y), D(2y+1), D(2y+2), D(2y+3), D(2y+4), D(2y+5)
- MULTIPLIER: D(2x)*D(2y), D(2x+1)*D(2y+1), D(2x+2)*D(2y+2), D(2x+3)*D(2y+3), D(2x+4)*D(2y+4), D(2x+5)*D(2y+5)

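The timing above amounts to fetching an even and an odd word from each memory in one machine cycle and feeding the multiplier one pair per half cycle. The Python sketch below models only that counting, under the assumption that pipeline latency can be ignored; the function name is illustrative.

```python
def double_speed_mac(mx, my):
    """Dot product of two equal, even-length lists.

    Returns (accumulator, machine_cycles), counting one memory access
    (even word plus odd word) per machine cycle. Pipeline fill and
    drain are ignored in this simplified model.
    """
    if len(mx) != len(my) or len(mx) % 2:
        raise ValueError("inputs must have the same even length")
    acc = 0
    cycles = 0
    for k in range(0, len(mx), 2):
        x_even, x_odd = mx[k], mx[k + 1]   # one access to MX: D(2x), D(2x+1)
        y_even, y_odd = my[k], my[k + 1]   # one access to MY: D(2y), D(2y+1)
        acc += x_even * y_even             # first half cycle
        acc += x_odd * y_odd               # second half cycle
        cycles += 1
    return acc, cycles

print(double_speed_mac([1, 2, 3, 4], [5, 6, 7, 8]))
```

Four multiply-accumulates complete in two machine cycles here, so the memory runs at the machine-cycle rate while the MAC unit delivers two results per cycle.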
Accelerator for Viterbi Decoding
[Figure: ALU split into upper 8 bits and lower 8 bits, with inputs PM0(t-1), BMa(t), PM1(t-1), BMb(t) and output PM0(t). Other labels: Add, Compare, COMPARATOR, SHIFT REG, Select, REG.]
PM0(t) = min[(PM0(t-1) + BMa(t)), (PM1(t-1) + BMb(t))]
Two Adds, one Compare and one Select -> ACS operation
- Normal operation: the ALU is used as a 16-bit processing unit.
- ACS operation: the ALU is used as two 8-bit adders.

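One way to picture the ACS operation is as a 16-bit add with the carry between the two bytes cut, followed by a compare and a select. The sketch below assumes that the PM0 path uses the upper byte and the PM1 path the lower byte, that metrics fit in 8 bits, and that ties go to the first candidate; none of these details are stated on the slide, and the function names are illustrative.

```python
def packed_add8(a, b):
    """Add two 16-bit words as two independent 8-bit lanes.

    No carry propagates from the lower lane into the upper lane,
    and each lane wraps modulo 256.
    """
    lo = ((a & 0xFF) + (b & 0xFF)) & 0xFF
    hi = (((a >> 8) & 0xFF) + ((b >> 8) & 0xFF)) & 0xFF
    return (hi << 8) | lo

def acs(pm0, pm1, bma, bmb):
    """One add-compare-select step.

    Returns (new PM0, decision bit), where the decision bit is 0 if the
    PM0 + BMa path survived and 1 if the PM1 + BMb path survived.
    """
    sums = packed_add8((pm0 << 8) | pm1, (bma << 8) | bmb)   # two adds at once
    cand0, cand1 = sums >> 8, sums & 0xFF
    if cand0 <= cand1:                                       # compare
        return cand0, 0                                      # select
    return cand1, 1

print(acs(10, 7, 3, 9), acs(10, 7, 9, 3))
```

The decision bit is what a decoder would push into a shift register for later trace back.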
Effect of Accelerators
Comparison of the number of clock cycles needed to realize an 11.2 kbps VSELP CODEC.
[Bar chart: clock cycles [%] (axis 0 to 100) for a DSP without MAC and Viterbi accelerators versus a DSP with MAC and Viterbi accelerators. Legend: Misc, Block Floating, Error Correction, MAC, ALU. Reductions marked on the chart: -9.0%, -4.7%, -8%, -11.4%. Total: -33.1%.]

Dual MAC Scheme
FIR filtering: two outputs in parallel with a delay register
y(0) = c(0)x(0) + c(1)x(-1) + c(2)x(-2) + c(3)x(-3) + ...
y(1) = c(0)x(1) + c(1)x(0) + c(2)x(-1) + c(3)x(-2) + ...
y(2) = c(0)x(2) + c(1)x(1) + c(2)x(0) + c(3)x(-1) + ...
y(3) = c(0)x(3) + c(1)x(2) + c(2)x(1) + c(3)x(0) + ...
[Figure: dual MAC datapath. Labels: c(i), x(n-i), x(n-i+1), REG, MAC0, MAC1, Acc0, Acc1.]
                      Single MAC   Dual MAC   Dual MAC with REG
# of MAC operations   N            N          N
# of memory reads     2N           2N         N
Low power consumption (fewer memory reads with the delay register)

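The saving in memory reads comes from reusing each data sample for two adjacent outputs: the sample read for y(n) at tap i is held in the delay register and used for y(n+1) at tap i+1. The sketch below illustrates this under the assumptions that MAC0 accumulates y(n), MAC1 accumulates y(n+1), and samples before x[0] are zero; the function names are illustrative.

```python
def fir_pair(c, x, n):
    """Compute y(n) and y(n+1) together with a delay register.

    Returns (y_n, y_n1, memory_reads). Samples outside x are taken as zero.
    """
    def sample(k):
        return x[k] if 0 <= k < len(x) else 0

    acc0 = acc1 = 0
    reg = sample(n + 1)        # preload the delay register with x(n+1)
    reads = 1
    for i, ci in enumerate(c):
        xi = sample(n - i)     # single data read per tap
        reads += 2             # one coefficient read plus one data read
        acc0 += ci * xi        # MAC0: c(i) * x(n-i)   contributes to y(n)
        acc1 += ci * reg       # MAC1: c(i) * x(n-i+1) contributes to y(n+1)
        reg = xi               # x(n-i) becomes x(n-(i+1)+1) for the next tap
    return acc0, acc1, reads

def fir_direct(c, x, n):
    """Reference: y(n) = sum over i of c(i) * x(n-i)."""
    return sum(ci * (x[n - i] if 0 <= n - i < len(x) else 0)
               for i, ci in enumerate(c))

c = [1, 2, 3, 4]
x = [1, 2, 3, 4, 5, 6]
print(fir_pair(c, x, 3), fir_direct(c, x, 3), fir_direct(c, x, 4))
```

With N taps the pair costs 2N + 1 reads for two outputs, which is about N reads per output, against 2N reads per output when each output fetches its own coefficients and samples.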
MAC Unit using Redundant Binary Number
[Figure: BW-MAC and RB-MAC datapaths between A-BUS (16 b), B-BUS (16 b) and ACC (40 b). Labels: MUL (0.5 cycle), ACC (0.5 cycle), RBMU (24 b), P-Reg 1, P-Reg 2, RBAU (RBA 1, RBA 2, RBA 3), RTBC, RB->B CNV (0.5 cycle).]
RBMU: Redundant Binary Multiply Unit
RBAU: Redundant Binary Accumulation Unit
RTBC: Redundant Binary Digit to Binary Digit Conversion Unit
[Bar chart: power consumption ratio of the RB-MAC normalized to a BW-MAC (axis 0 to 100 [%]). Components: Preg2, Mmux1, Conv, FA Tree2 (BW), Preg1, RBA Tree2 (RB), FA Tree1 (BW), RBA Tree1 (RB), Encoder, Partial Product Gen.]

A System LSI realizing Base Band Processing & Control
[Figure: terminal block diagram. Blocks: Receiver, Demodulator, Synthesizer, Digital Signal Processor, Audio I/F, Speaker, Microphone, Transmitter, Modulator, Misc, VCO, TDMA Controller, Microcomputer, Keypad/Display.]
Features of the LSI
- Process Tech.: 0.35 um CMOS
- # of Transistors: 2.5 million
- Die Size: 9.26 x 10.0 mm
- Package: 11 x 11 mm CSP
- Integrated IP: 16b MCU MN102L (3 MIPS), 16b DSP MN1930 (40 MIPS), Demodulator, TDMA Controller, VCO, etc.

Goal of the Next Generation Mobile Phone System
- Current System (PDC): 5.6 / 11.2 kbps
- Next Generation System (W-CDMA): 8 kbps ~ 2 Mbps; Video Phone, High Speed Wireless Network
System Requirements
- High bit rate data transfer
  -> MORE cycles for error correction
  -> MORE data input/output to/from DSP
- Video CODEC capability
- Of course, LOW POWER

Increasing Capability of Access Systems
Increase
- Channel capacity
- Data speed
-> System complexity (DSP performance)
Access systems
- FDMA (Frequency Division Multiple Access): analog system. Users A and B are separated by frequency (f1, f2).
- TDMA (Time Division Multiple Access): digital system, ex. PDC, GSM, IS-54, PHS. Users A and B share a frequency and are separated by time slot (f1,s1 and f1,s2).
- CDMA (Code Division Multiple Access): digital system, ex. IS-95, W-CDMA. Users A and B share a frequency and are separated by code (f1,c1 and f1,c2).
[Figure: time versus frequency diagrams for the three access systems.]

Next Generation Mobile Phone Terminal and Issues for LSIs
[Figure: terminal block diagram. RF unit: DUP, Receiver, Transmitter. Base band signal processing unit: A/D, De-spread, Rake, D/A, Spread, Channel CODEC, Speech/Video CODEC. Control unit. Signals: Voice, Video, Data.]
Issues
- Low-power & high-speed A/D
- Low-power & high-speed correlator
- High-speed Viterbi/Turbo decoder
- High-speed & low-power LSI DSP
- Low-power Video/Audio CODEC LSI

DSP Architecture for the Next Generation System
[Figure: DSP block diagram. Labels: CORE, Instruction Memory, Data Memory (Adrs, Data), AMA, AMB, M BUS 16, A, B, PU, PCU, IOU, DPU, ICU, CKU, DPP.]
Callouts
- PU regs: trace back, modulo, bit reverse
- Wide bandwidth to/from DSP: RF, ADC, DAC, spreading/despreading
- High performance processing unit for Viterbi decoding
Abbreviations
- CKU: Clock control Unit
- DPP: Direct Parallel Port
- DPU: Data Processing Unit
- ICU: Interrupt Control Unit
- IOU: data I/O Unit
- PCU: Program flow Control Unit
- PU: Pointer Unit

Dual ACS Operation
2 path metrics are updated in 1 cycle: PM0 and PM1 at T-1 become PM0' and PM1' at T.
[Figure: dual ACS datapath. {PM1,PM0} and {BM1,BM0} are read over 32-bit paths from Data Memory and a PM0/PM1 register. AU1 forms {BM1+PM0} and {BM0+PM1}, compared by COMP. AU0 (ALU) forms {BM0+PM0} and {BM1+PM1}, compared by CMPR. ASR1 and ASR2 feed results to Data Memory.]
[Bar chart: required [MIPS] (axis 20 to 80), Conventional versus This scheme, for data rates 8 kbps (VOICE), 32 kbps (DATA) and 64 kbps (DATA).]

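The two comparisons in the figure form one trellis butterfly, in which both new path metrics are derived from the same pair of old metrics and branch metrics. The sketch below assumes that the AU0 pair produces PM0', that the AU1 pair produces PM1', that the smaller sum survives (as in the single ACS formula earlier in the talk), and that ties go to the first candidate. These assignments are not spelled out on the slide, and the function name is illustrative.

```python
def dual_acs(pm0, pm1, bm0, bm1):
    """Update two path metrics in one step (one trellis butterfly).

    Returns ((PM0', decision0), (PM1', decision1)). A decision bit of 0
    means the path coming from PM0 survived, 1 means the path from PM1.
    """
    a0, b0 = bm0 + pm0, bm1 + pm1   # AU0 pair: {BM0+PM0} versus {BM1+PM1}
    a1, b1 = bm1 + pm0, bm0 + pm1   # AU1 pair: {BM1+PM0} versus {BM0+PM1}
    new0 = (a0, 0) if a0 <= b0 else (b0, 1)
    new1 = (a1, 0) if a1 <= b1 else (b1, 1)
    return new0, new1

print(dual_acs(5, 2, 1, 3))
```

A single ACS unit would need two passes to produce the same pair of results, which is consistent with the lower MIPS requirement the chart attributes to this scheme.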
DSP for Wireless Video Phone
[Figure: video DSP block diagram. Main-Processor and Sub-Processor, each with a DSP Core, Local Memory, Dedicated Engine and Double Buffer; Shared Memory, Shared Register, DMA Controller, SDRAM, Video I/F (Video In with AD, Video Out with DA), Host I/F, CPU.]
Features
- Clock frequency: 67.5 MHz (14.8 nsec)
- Technology: 0.25 um CMOS (4-metal)
- Number of devices: 7,670 KTr.
- Die size: 9.41 x 9.22 mm (= 86.76 mm^2)
- Supply voltage: 1.8 V (internal), 3.3 V (IO)
- Performance: 4 GOPS
- Features: 15 frames/sec (CIF CODEC)

Summary
1. The DSP is one of the key components in a portable phone terminal, and high performance with low power consumption is an essential factor.
2. In the mobile multimedia era, new DSPs with higher performance and increased functionality will be necessary.
3. DSPs for portable phones must keep achieving MORE MIPS with LESS power consumption.

Mobile Phone for Internet & Data Communication
Ex. i-mode system provided by NTT DoCoMo
Applications
- E-mail
- Web browsing
- Banking
- Locating, combined with a car navigation system
- etc.
[Photo: Panasonic P502i]