V. Chandrasekhar. National Instruments, Austin, TX, USA E-mail: Vikram.Chandrasekhar@ni.com




Int. J. Embedded Systems, Vol. 3, No. 3, 2008

Reducing dynamic power consumption in next generation DS-CDMA mobile communication receivers

V. Chandrasekhar
National Instruments, Austin, TX, USA
E-mail: Vikram.Chandrasekhar@ni.com

F. Livingston
Texas Instruments, Burlington, MA, USA
E-mail: Frank-Livingston@ti.com

J.R. Cavallaro*
Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
E-mail: cavallar@rice.edu
*Corresponding author

Abstract: Reduction of the power consumption in portable wireless receivers is important for cellular systems, including UMTS and IMT2000. This paper explores the architectural design-space and methodologies for reducing the dynamic power dissipation in the Direct Sequence Code Division Multiple Access (DS-CDMA) downlink RAKE receiver. At the algorithm level, we investigate the tradeoffs of reduced precision and arithmetic complexity on the receiver performance. We then present and analyse two architectures for implementing the reference and reduced complexity receivers, with respect to dynamic power dissipation. The combined effect of reduced precision and complexity reduction leads to a 37.44% power savings.

Keywords: DS-CDMA RAKE receiver; VLSI architectures; mobile receiver; power reduction.

Reference to this paper should be made as follows: Chandrasekhar, V., Livingston, F. and Cavallaro, J.R. (2008) 'Reducing dynamic power consumption in next generation DS-CDMA mobile communication receivers', Int. J. Embedded Systems, Vol. 3, No. 3, pp.128-140.

Biographical notes: Vikram Chandrasekhar received the MS Degree from Rice University in 2003 and the BTech Degree from the Indian Institute of Technology, Kharagpur in 2000. He is currently with National Instruments.

Frank Livingston received the MSEE Degree in 1995 and the BS Degree in 1992, both from the University of New Mexico. He is currently with Texas Instruments.

Joseph R. Cavallaro received the PhD Degree from Cornell University in 1988, the MS Degree from Princeton University in 1982, and the BS Degree from the University of Pennsylvania in 1981. He joined Rice University where he is currently a Professor in the Department of Electrical and Computer Engineering and Associate Director of the Centre for Multimedia Communication. He served as Program Director in the Prototyping Tools and Methodology program at NSF during 1996-1997, and has been a Visiting Professor at the University of Oulu, Finland during 2005. His research interests include computer arithmetic, VLSI and FPGA design, and VLSI/DSP architectures and algorithms for wireless communication systems.

Copyright 2008 Inderscience Enterprises Ltd.

1 Introduction

Achieving power-efficient architectures will be a major goal in the design of next-generation mobile communication receivers such as laptops, cell phones, PDAs etc. Future portable receivers will need the ability to handle various multimedia data traffic irrespective of mobility, provide guaranteed Quality-of-Service (QoS) requirements, and integrate multiple functionality (GPS, World Wide Web, e-commerce etc.) simultaneously. The high bandwidth required by these applications implies that this functionality would come at the expense of a heavy drain on the available battery power. Shown in Table 1 are the specifications of a next-generation wireless standard (IMT-2000) in Ojanperä and Prasad (2001). The high levels of expected performance, as well as the required data-rates, will call for the implementation of advanced algorithms in the design of such receivers. With rapidly improving Integrated-Circuit (IC) technology as well as the decreasing cost of silicon area, there have been great advances in the ability to integrate the entire receiver chain on a single chip (System-on-chip design). The point that has not been addressed in these designs is the system integration, with power minimisation as a key constraint. The design of such architectures forms a focal point in current research in communication systems.

Table 1 IMT-2000 service requirements

| Operating environment   | Terminal speed (mph) | Peak bit-rate | Target BER             |
|-------------------------|----------------------|---------------|------------------------|
| Rural outdoor           | < 150                | > 144 kbps    | $10^{-3}$ to $10^{-7}$ |
| Urban/suburban outdoor  | < 90                 | > 384 kbps    | $10^{-3}$ to $10^{-7}$ |
| Indoor/low range outdoor| < 6                  | 2 Mbps        | $10^{-3}$ to $10^{-7}$ |

1.1 Motivation

The RAKE receiver unit forms an important constituent of a DS-CDMA mobile receiver for performing single-user detection. The RAKE algorithm is conceptually simple; however, its computational complexity increases linearly with the number of multi-path components being processed. Even though there has been considerable research investigating techniques for improving the performance of DS-CDMA RAKE receivers in fading multi-path channels, there has been comparatively little research on methodologies for minimising the power dissipation of the receiver architectures. A strength reduction technique has been described in Baghaie and Laakso (1991) for reducing the on-line power dissipation in the complex RAKE multipliers by up to 25%. Power reduction techniques for a spread spectrum based correlator have been described in Garrett and Stan (1997) using a modified adder-tree structure and employing bus-invert coding. Low-power correlator architectures have been described in Sriram et al. (1999) that employ a partial correlation approach for reducing on-line power dissipation during code acquisition in WCDMA based systems. To the best of our knowledge, there has been very little work on developing a framework which analyses the performance vs. power dissipation trade-offs in the context of mobile DS-CDMA RAKE receivers.

1.2 Contributions

The work presented in this paper has two principal aims. First, we analyse the impact of reduced precision and arithmetic complexity on the algorithm performance and power dissipation in the DS-CDMA mobile RAKE receiver. Next, we explore the architectural design-space for reducing the on-line power dissipation. Starting with a conventional implementation of the RAKE receiver, we demonstrate design methodologies for achieving power reduction at the algorithm level and the architectural level. This proof of concept architecture has been targeted towards a Xilinx Virtex-II FPGA and achieves the targeted data rate of 384 kbps. The resulting power-performance profiles have been obtained after passing synthesised complex receiver data simulating an urban three path fading channel through the targeted architectures.

Algorithm level. We show that reduction of the sampling rate of the input complex multi-path receiver data to the DS-CDMA RAKE correlator during de-spreading results in favourable trade-offs in power consumption vs. the corresponding receiver performance. Significant power savings are achieved through reduction in arithmetic complexity by decreasing the number of arithmetic operations during the RAKE correlation per symbol demodulation. For a 16 bit data-path, we have observed a 24.65% reduction in dynamic power dissipation in the reduced complexity RAKE receiver compared to the reference RAKE receiver implementation, with an acceptable performance loss of less than two dB.

Architectural level. Starting with a 16 bit data path, and reducing precision down to ten bits, we study the variation in the RAKE receiver performance with decreasing fixed-point precision. Word-length reduction alone results in power reduction of up to 25.6% in the reference RAKE receiver architecture, and 16.96% in the reduced complexity RAKE receiver architecture.

2 System description

We consider a $K$ user DS-CDMA downlink system employing Binary Phase Shift Keying (BPSK) symbol modulation during transmission. The $k$th user's information sequence $b_k \in \{-1, +1\}$ is multiplied by an $N$ chip Pseudo-Noise (PN) sequence whose bit duration equals $T_{bit} = N T_{chip}$. For purposes of estimating the complex channel coefficients (see Fantacci and Galligani (1999), Viterbi (1995) and Rappaport (1986)), a common code-multiplexed pilot signal is broadcast by the base station to all mobile users. The sampled complex receiver data $r(n)$ at the DS-CDMA mobile receiver can be written in vector-matrix notation, as in Latva-aho and Juntti (2000) and Chandrasekhar (2002), as

$$r_i = S H_i A b_i + w_i$$

where $r_i$ is the received sampled data ($S$ samples/chip) corresponding to the $i$th information symbol, represented by

$$r_i = [\, r(iNST_s) \;\; r((iNS+1)T_s) \;\; \cdots \;\; r([(i+2)NS-1]T_s) \,]^T \in \mathbb{C}^{2NS}.$$

$S$ describes the signature matrix for all $K$ active users and the pilot channel, given by

$$S = [\, s_{1,1}, \ldots, s_{1,P}, \ldots, s_{K,1}, \ldots, s_{K,P}, s_{\mathrm{pilot},1}, \ldots, s_{\mathrm{pilot},P} \,] \in \mathbb{R}^{2NS \times (K+1)P}.$$

Each of the columns $s_{k,p}$, $1 \le k \le K+1$, $1 \le p \le P$, represents the appropriately delayed (by $NS\tau_p/T_b$ samples) signature waveform of the $k$th user and $p$th multipath. Therefore,

$$s_{k,p} = [\, 0^T_{NS\tau_p/T_b} \;\; s_k^T \;\; 0^T_{(NS - NS\tau_p/T_b)} \,]^T \in \mathbb{R}^{2NS}$$

and

$$s_k = [\, s_k(T_s) \;\; s_k(2T_s) \;\; \cdots \;\; s_k(NST_s) \,]^T \in \mathbb{R}^{NS},$$

where $s_k(t)$ represents the $k$th user's continuous-time spreading waveform given by the convolution of the user's spreading sequence $\{c_k(n)\}$ and the transmitted chip-waveform $g_T(t)$.

$H_i$ denotes the complex channel impulse response coefficient matrix for the $i$th information symbol, given by

$$H_i = \begin{bmatrix} h_i & 0 & \cdots & 0 \\ 0 & h_i & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & h_i \end{bmatrix} \in \mathbb{C}^{(K+1)P \times (K+1)}$$

and

$$h_i = [\, h_{i,1} \;\; h_{i,2} \;\; \cdots \;\; h_{i,P} \,]^T \in \mathbb{C}^{P}.$$

$A$ is the user/pilot amplitude matrix given by

$$A = \mathrm{diag}\{A_1, A_2, \ldots, A_K, A_{\mathrm{pilot}}\} \in \mathbb{R}^{(K+1) \times (K+1)}.$$

$b_i$ is the symbol vector for all $K$ users and the pilot corresponding to the $i$th transmission, given by

$$b_i = [\, b_{i,1} \;\; b_{i,2} \;\; \cdots \;\; b_{i,K} \;\; b_{i,\mathrm{pilot}} \,]^T \in \{-1,+1\}^{(K+1)}.$$

2.1 DS-CDMA RAKE receiver

The DS-CDMA RAKE receiver attempts to collect the signal energy from all the received signal paths that fall within the delay line and carry the same information, as described in Proakis (1995). Assuming that user one is the user of interest, we define the signature matrix

$$S_1 = [\, s_{1,1} \;\; s_{1,2} \;\; \cdots \;\; s_{1,P} \,] \in \mathbb{R}^{2NS \times P}.$$

The RAKE receiver computes the decision statistic given by

$$\hat{b}_{i,1} = \mathrm{sgn}\big((S_1 \hat{h}_i A_1)^H r_i\big) = \mathrm{sgn}\big((S_1 \hat{h}_i A_1)^H (S H_i A b_i + w_i)\big), \qquad (1)$$

where $\hat{h}_i \in \mathbb{C}^{P}$ is the complex channel coefficient estimate obtained from the output of a channel estimator.

To estimate the complex channel coefficient for performing phase offset correction, a channel estimator is required. An $L$ tap moving average filter performs channel estimation while demodulating the $i$th information symbol. An all-ones pilot symbol sequence (assumed to be known at the mobile receiver) is used for the purpose of channel estimation. For the pilot sequence, define

$$S_{\mathrm{pilot}} = [\, s_{\mathrm{pilot},1} \;\; s_{\mathrm{pilot},2} \;\; \cdots \;\; s_{\mathrm{pilot},P} \,] \in \mathbb{R}^{2NS \times P}$$

as the pilot code signature matrix. Then, the channel estimate $\hat{h}_i$ is given by the expression

$$\hat{h}_i = \frac{1}{L} \sum_{k=i-L+1}^{i} b^*_{k,\mathrm{pilot}} \, S_{\mathrm{pilot}}^T r_k \qquad (2)$$

where $L$ is the length of the averaging filter.

3 Receiver architecture

Figure 1 shows the high-level description of the front-end in a wireless communication receiver. The architectures implemented in this paper are represented by the solid-line blocks (corresponding to the RAKE receiver), while the dashed-line blocks are assumed to feed the sampled wide-band signal and the estimated delays into the receiver. The sampled complex wide-band receiver data is input to the RAKE receiver for performing symbol-level demodulation, and to the delay-tracker block for initial timing acquisition followed by fine synchronisation with a delay-locked loop. For the $i$th symbol interval, complex receiver data $r_i$ is input to the RAKE receiver in a chip-serial fashion.
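As a small illustration of the decision statistic in equation (1), the sketch below weights each finger's despread output by the conjugate of its channel estimate, sums the results, and takes the sign. It is a minimal sketch and not the authors' implementation: the function name `rake_decision` is hypothetical, the positive amplitude $A_1$ is omitted because it does not change the sign, and taking the real part before the sign is an assumption appropriate for BPSK.

```python
def rake_decision(finger_outputs, channel_estimates):
    """Combine despread finger outputs and return a hard BPSK decision.

    finger_outputs:    one complex despread value per finger
    channel_estimates: one complex channel coefficient estimate per finger
    (Both names are illustrative; they do not come from the paper.)
    """
    # Conjugate weighting removes the phase rotation of each path.
    soft = sum(h.conjugate() * x
               for h, x in zip(channel_estimates, finger_outputs))
    # Assumption: the decision uses the real part of the combined output.
    return 1 if soft.real >= 0 else -1


# Noise-free example: a transmitted -1 seen through three paths.
h = [1 + 1j, 0.5 - 0.5j, -1j]
x = [-hp for hp in h]
decision = rake_decision(x, h)
```

With perfect channel estimates the combined value equals the transmitted symbol times the total path energy, so the sign recovers the symbol.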

Figure 1 Front-end description of a wireless communication DS-CDMA receiver

3.1 Front-end circular buffer

The front end circular buffer stores complex sampled receiver data $r_i$ to be used for performing correlation during the channel estimation and detection operations, as shown in Figure 2. Denoting the maximum delay spread of the channel by $D$ (in chips), the PN sequence length by $N$ and the number of samples per chip by $S$, the minimum required buffer size is given by $B = NS \lceil D/N + 1 \rceil$ words. Assuming a maximum delay spread of one symbol, $D = T_b$, a processing gain of $N = 32$ chips and $S = 2$ samples/chip, we obtain $B = 128$ words, requiring an address width of 7 bits. The buffer employs the following modes of operation:

Initialisation mode. For the first $NS = 64$ cycles, the buffer is written into, until there is a symbol-duration worth of receiver data stored in the buffer. During this period, all the read-addresses are set to a value of 128 to ensure that there is no memory access conflict generated by the read and write addresses. The receiver data is written into the address recv_buff_wr_addr, which runs from 0 to $NS - 1 = 63$. When recv_buff_wr_addr $= NS = 64$, the mode changes to the steady-state mode described below.

Steady state mode. At the end of the initialisation mode, there are $NS = 64$ words of receiver data (one symbol worth of information) stored in the buffer. Conditioned on the current value of recv_buff_wr_addr, the read-port address read_address($i$), $i = 0, \ldots, P-1$, is either initialised with the computed path delay Delay($i$) (from the delay-tracker unit), or incremented by one. The read-port addresses are specified by

$$\mathrm{read\_address}(i) = \begin{cases} \mathrm{Delay}(i) & \text{if } \mathrm{recv\_buff\_wr\_addr} = 64 \\ \mathrm{Delay}(i) + 64 & \text{if } \mathrm{recv\_buff\_wr\_addr} = 0 \\ \mathrm{read\_address}(i) + 1 & \text{otherwise.} \end{cases}$$

As the read-addresses get incremented, successive complex receiver data values are read from the buffer and input to the RAKE/PILOT correlator units, where the correlation of the receiver data with the user/pilot codes is carried out.
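The steady-state read-address rule can be sketched in a few lines. This is a minimal sketch under stated assumptions: the function name `next_read_address` is hypothetical, and wrapping every address modulo the 128-word buffer is an assumption that follows from the 7 bit address width rather than something the text states explicitly.

```python
NS = 64              # samples per symbol (N = 32 chips, S = 2 samples/chip)
BUF_WORDS = 2 * NS   # 128-word circular buffer, 7-bit addresses


def next_read_address(read_addr, delay, wr_addr):
    """Return the next read-port address for one RAKE finger.

    read_addr: current read address of this finger
    delay:     path delay in samples reported by the delay tracker
    wr_addr:   current value of recv_buff_wr_addr
    """
    if wr_addr == NS:                      # symbol stored in the lower half
        return delay % BUF_WORDS
    if wr_addr == 0:                       # symbol stored in the upper half
        return (delay + NS) % BUF_WORDS
    # Otherwise step through the symbol (modulo wrap is an assumption).
    return (read_addr + 1) % BUF_WORDS


# Trace one finger with a 5-sample path delay over a full buffer cycle.
addr = 0
trace = []
for wr in list(range(64, 128)) + list(range(0, 64)):
    addr = next_read_address(addr, 5, wr)
    trace.append(addr)
```

In the trace, the finger starts at address 5 when the write pointer reaches 64, restarts at 69 when the write pointer wraps to 0, and wraps around the end of the buffer before the next restart.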

Figure 2 Buffer for storing complex receiver data

For a $P$ finger RAKE receiver, the above description assumes a receiver buffer with $P$ read-ports each for the real and imaginary parts of the complex receiver data. In a practical implementation, however, truly multi-ported buffers are infeasible owing to the high output load capacitance, which would dramatically increase the memory access time and hence decrease the operating frequency of the design. An alternative is to use a serial shift register delay line implementation. For an $n$ bit data-path, there are $2n$ logic transitions (for real and imaginary data storage registers) potentially occurring at every node per clock cycle, due to shifting of data in the shift register unit as shown in Garrett and Stan (1997) and Sriram et al. (1999). For the shift register storage size of $2NS = 128$ words of $n$ bit receiver data, this would amount to an average of $NSn = 64n$ logic transitions per clock cycle, which is clearly power-inefficient. Consequently, a much simpler approach was adopted by instantiating $P$ separate SRAM based dual-ported receiver data buffers to store the real and imaginary components of the input sampled receiver data. This ensures a smaller output load capacitance at the data-bus compared to the register-file based approach. Moreover, the use of the pointer-based approach implies that the switching activity in the data-bus is reduced from $NSn = 64n$ logic transitions to just $Pn/2 = 1.5n$ logic transitions per clock cycle on average.

3.2 User/Pilot PN code circular buffer

The User/Pilot code circular buffers (Figure 3) store the length $NS = 64$ PN sequences of the user ($s_1 \in \mathbb{R}^{64}$) and pilot codes ($s_{\mathrm{pilot}} \in \mathbb{R}^{64}$). We assume that the code coefficients are pre-determined at start-up and stored in the buffer. Since the symbol despreading operation begins when recv_buff_wr_addr $= 1$ or recv_buff_wr_addr $= 65$, the read-address pointer code_read_addr (6 bits wide) for the buffer is directly determined by the current write address recv_buff_wr_addr of the front end circular buffer, by the relation code_read_addr = recv_buff_wr_addr mod 64. While recv_buff_wr_addr counts from 0 to $2NS - 1 = 127$, code_read_addr counts from 0 to $NS - 1 = 63$.

3.3 PILOT/RAKE matched-filtering block

The PILOT and RAKE correlator blocks take in the sampled wide-band signal, perform matched-filtering, and output a narrow-band signal at the symbol rate. The narrow band output and channel estimates are input to the Maximal Ratio Combiner (MRC), which performs coherent demodulation. Figure 4 illustrates the architecture of the correlator network for computing the complex inner products

$$\mathrm{pilot\_soft\_out}(i) = S_{\mathrm{pilot}}^T r_i \quad \text{and} \quad \mathrm{rake\_soft\_out}(i) = S_1^T r_i,$$

where $0 \le i \le P - 1$ corresponds to the $i$th finger and $r_i$ refers to the delayed multi-path data coming from the receiver circular buffer.
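The inner products computed by the correlator network can be sketched as a chip-serial multiply-accumulate over one symbol. This is a minimal sketch under assumptions: the names `correlate` and `code_read_addr` are used here as plain Python functions for illustration, and the samples are assumed to be already aligned to the finger's path delay.

```python
def code_read_addr(recv_buff_wr_addr, ns=64):
    """Code buffer read pointer derived from the receiver write pointer."""
    return recv_buff_wr_addr % ns


def correlate(samples, code):
    """Chip-serial matched filter: accumulate code[n] * samples[n].

    samples: complex receiver samples for one finger (delay-aligned)
    code:    real signature samples of the same length
    Because the code is real, each step needs only two real multiplies,
    one for the real part and one for the imaginary part of the sample.
    """
    acc = 0j
    for r, c in zip(samples, code):
        acc += c * r
    return acc


# Toy example: a 4-sample code and a single path with gain 0.5 - 0.25j.
code = [1, -1, 1, 1]
gain = 0.5 - 0.25j
samples = [gain * c for c in code]
soft_out = correlate(samples, code)
```

In this noise-free toy case the correlator output is the path gain scaled by the code length, which is the quantity the channel estimator and the MRC operate on.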

Figure 3 Circular buffer for storing the user/pilot code coefficients

Figure 4 Structure of the PILOT/RAKE finger network (see online version for colours)

3.4 Channel estimation

The channel estimator block uses a simple moving average filter to estimate the complex multi-path channel coefficients. For the $i$th symbol demodulation, the estimated channel coefficient $\hat{h}_i \in \mathbb{C}^{P}$ is computed by

$$\hat{h}_i = \frac{1}{L} \sum_{k=i-L+1}^{i} b^*_{\mathrm{pilot},k} \, S_{\mathrm{pilot}}^T r_k. \qquad (3)$$

Since the filtering operation (see Figure 5) computes the channel estimates based on the results of the previous $L$ pilot correlations, an $L$ word circular buffer based implementation is employed. In the practical implementation, the pilot sequence is assumed to be an all-ones sequence; therefore, the computation of $\hat{h}_i$ is simplified as shown:

$$\hat{h}_i = \hat{h}_{i-1} + \{ S_{\mathrm{pilot}}^T (r_i - r_{i-L}) \} \gg \lceil \log_2 L \rceil. \qquad (4)$$
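The recursive update in equation (4) can be illustrated with an $L$ word circular buffer that is read before it is written. This is a minimal sketch and not the authors' hardware: the class name `MovingAverageEstimator` is hypothetical, a single finger is modelled, floating-point division stands in for the right shift, and keeping an unscaled running sum (scaled only at the output) is a design choice made here for clarity.

```python
class MovingAverageEstimator:
    """L-tap moving average over pilot correlator outputs (one finger)."""

    def __init__(self, L):
        # A power-of-two length lets hardware replace the divide by a shift.
        assert L > 0 and L & (L - 1) == 0, "L must be a power of two"
        self.L = L
        self.buf = [0j] * L   # last L pilot correlator outputs
        self.acc = 0j         # running sum of the buffer contents
        self.i = 0            # symbol index

    def update(self, pilot_out):
        slot = self.i % self.L
        oldest = self.buf[slot]          # read before write
        self.buf[slot] = pilot_out       # overwrite the oldest entry
        self.acc += pilot_out - oldest   # add newest, drop oldest
        self.i += 1
        return self.acc / self.L         # a right shift in fixed point


est = MovingAverageEstimator(4)
outputs = [est.update(x) for x in (4 + 0j, 8 + 0j, 12 + 0j, 16 + 0j, 20 + 0j)]
```

Because the buffer starts out zero-filled in this sketch, the first $L - 1$ estimates are biased low; only from the $L$th symbol onward does the output equal the average of the last $L$ correlator outputs.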

Figure 5 Moving average based channel estimation

Corresponding to symbol $i$, the despread pilot correlation output $S_{\mathrm{pilot}}^T r_i$ is written into the circular buffer address $i \bmod L$, where it replaces the oldest pilot correlation output $S_{\mathrm{pilot}}^T r_{i-L}$. The difference between these two values is used to compute the $i$th channel estimate as indicated in equation (4). A read-before-write type of circular buffer is chosen for implementation, so that the oldest pilot correlation value is read out before being overwritten by the new pilot correlation output. For performing the scaling, $L$ is chosen to be a power of 2, in order to replace the division operation by a right shift of $\log_2 L$ bits.

Figure 6 Maximal ratio combiner unit

3.5 Maximal ratio combining

The MRC weights the narrow band despread outputs of the RAKE finger network by the corresponding complex conjugated channel coefficient estimates. Figure 6 shows the implementation of the MRC unit. The five stage pipelined multipliers implement the phase rotation operation $\hat{h}_i^H S_1^T r_i$. The $\lceil \log_2 P \rceil$ stage deep adder tree network combines the phase rotated outputs and produces a single soft symbol estimate. Finally, the hard symbol estimate $\hat{b}_{i,1}$, corresponding to the $i$th transmitted symbol of user one, is computed by taking the sign of the MRC output as shown in equation (5).

$$\hat{b}_{i,1} = \mathrm{sgn}\big(\mathrm{Re}(\hat{h}_i^H S_1^T r_i)\big). \qquad (5)$$

4 Power-efficient architectures

Dynamic power dissipation is usually the dominant source of power dissipation in CMOS VLSI circuits. The dynamic power consumption $P_{dyn}$ at any node in a CMOS-based design is a function of the node capacitance $C$, the switching activity $\alpha$ of the node (defined as the average number of node transitions per clock cycle), the clocking frequency $f_{clock}$, and the supply voltage $V_{cc}$ employed in the design, given by equation (6)

$$P_{dyn} = \frac{1}{2} \alpha C V_{cc}^2 f_{clock}. \qquad (6)$$

Since $P_{dyn}$ is quadratically related to $V_{cc}$, voltage reduction yields the biggest savings in power consumption. As voltage reduction results in increased combinational logic delays, as shown in Chandrakasan et al. (1995), techniques such as pipelining and parallelism are employed for maintaining a constant throughput of the design. In addition, optimisations such as reduced algorithmic complexity, re-ordering of arithmetic expressions and word-length reduction can markedly reduce the overall capacitance and node switching activity in the design, thereby reducing the power dissipation (a detailed description is provided in Chandrakasan et al. (1995) and Rabaey and Pedram (1996)). At the circuit level, clock-disabling techniques that turn off idle functional units can be exploited to extract further power savings.

4.1 Reduction in arithmetic complexity

The computationally most intensive operation involved in the RAKE receiver is the correlation operation, where the sampled complex multi-path receiver data is correlated with the spreading waveform vector for the user and pilot channels. For the $p$th finger, the correlation output $X^p_{cor}(i)$ corresponding to the $i$th signalling interval can be represented by

$$X^p_{cor}(i) = \int_{iT+\tau_p}^{(i+1)T+\tau_p} r(t)\, s_1(t - iT - \tau_p)\, dt \qquad (7)$$

where

$$s_k(t) = \sum_{n=0}^{N-1} c_k(n)\, g_T(t - nT_c) = \Big( \sum_{n=0}^{N-1} c_k(n)\, \delta(t - nT_c) \Big) * g_T(t).$$

When implementing the correlation operation as a digital matched filter, the complexity of the correlation operation is governed by the length of the signature waveform vector $N_{corr}$ and the number of active fingers $P$. The signature waveform vector $s_{1,p}$ is represented by the discrete-time convolution of the length $N$ spreading sequence $\{c_1(n)\}$ and the $M$ tap raised cosine filter with impulse response $\{g_T(n)\}$. The square-root raised cosine filter has $M = 2DS + 1$ taps (being linear phase), where $D$ is the group delay of the filter and $S$ is the upsampling rate at the filter input. The length of the convolution output is given by $N_{conv} = M + NS - 1$ samples. Assuming values of $D = 10$ and $S = 2$ samples/chip, we obtain $M = 41$ and $N_{conv} = 2N + 40$ samples, hence the overall correlator length is specified by $N_{corr} = N_{conv}$. For typical values such as a spreading code of length $N = 32$, a $P = 3$ path channel and an $L = 16$ tap channel estimator, the arithmetic complexity of the RAKE receiver with ideal correlation equals $16NP + 318P + 2LP - 1 = 2585$ flops/symbol. We explore the reduction in the correlator length as a means for achieving reduction in arithmetic complexity in Table 2 and Figure 7. We consider the following two schemes:

Sampling at 2 samples/chip. The starting and ending $DS = 20$ samples of the spreading waveform at the convolution output occur due to the group delay of the filter $g_T(n)$. By discarding these $2DS = 40$ samples and retaining the steady state response, the correlator length reduces to $N_{corr} = N_{conv} - 40 = 2N$ samples/symbol, which translates into savings in arithmetic complexity. Thus the number of correlation operations involved in the pilot correlators (for channel estimation) and rake correlators (for despreading and detection) is reduced by $320P = 960$ flops/symbol to 1625 flops/symbol. In the results, the performance of the resulting receiver (with truncated correlation waveform) is shown to be almost identical to that obtained with perfect correlation. We call this receiver the reference RAKE receiver.

Sampling at 1 sample/chip. To achieve a reduction in the arithmetic complexity, we reduce the sampling rate for the despreading operation in the RAKE correlators to one sample/chip, and investigate the resulting complexity vs. performance trade-offs. This halves the length of the correlator for the RAKE de-spreading operation to $N_{corr} = N$ samples/symbol, with a corresponding reduction in the overall flop count by $4NP = 384$ flops/symbol to 1241 flops/symbol. As the performance of detection is heavily influenced by the accuracy of channel estimates, the pilot channel correlation is still performed at two samples/chip. The complexity reduction comes at the cost of reduced correlator output energy owing to the halved correlation length. The results demonstrate a significant power reduction with acceptable detection performance due to this optimisation. We call this receiver the reduced complexity RAKE receiver.

Table 2 Arithmetic complexity per symbol detection in reference and reduced complexity receivers (where two entries are given, they are reference/reduced complexity)

| Operation | Multiplications | Additions |
|-----------|-----------------|-----------|
| $S_1^T r_i \in \mathbb{C}^P$ | $4NP$ / $2NP$ | $2P(2N-1)$ / $2P(N-1)$ |
| $S_{\mathrm{pilot}}^T r_i \in \mathbb{C}^P$ | $4NP$ | $2P(2N-1)$ |
| $\hat{h}_i = \frac{1}{L}\sum_{k=i-L+1}^{i} b^*_k S_{\mathrm{pilot}}^T r_k \in \mathbb{C}^P$ | - | $2P(L-1)$ |
| $\mathrm{Re}(\hat{h}_i^H S_1^T r_i)$ | $2P$ | $2P - 1$ |

RAKE receiver (2 samples/chip): $(16NP + 2LP - 2P - 1)$ flops
RAKE receiver (1 sample/chip): $(12NP + 2LP - 2P - 1)$ flops
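The flop counts quoted for the three receivers can be checked by summing the per-operation counts for a given correlator length. The sketch below does this arithmetic; the function name `flops_per_symbol` and its parameter names are hypothetical, and the count of two real multiplies and two real additions per sample and finger is taken from the per-row totals ($4NP$ multiplications for $2N$ samples).

```python
def flops_per_symbol(P, L, rake_len, pilot_len):
    """Flops per symbol for a P-finger RAKE receiver.

    rake_len / pilot_len: correlator lengths in samples per symbol
    L: length of the moving-average channel estimator
    """
    rake = 2 * P * rake_len + 2 * P * (rake_len - 1)      # mults + adds
    pilot = 2 * P * pilot_len + 2 * P * (pilot_len - 1)   # mults + adds
    chan_est = 2 * P * (L - 1)                            # averaging adds
    mrc = 2 * P + (2 * P - 1)                             # mults + adds
    return rake + pilot + chan_est + mrc


N, P, L = 32, 3, 16
ideal = flops_per_symbol(P, L, 2 * N + 40, 2 * N + 40)   # full convolution
reference = flops_per_symbol(P, L, 2 * N, 2 * N)         # truncated, 2 samples/chip
reduced = flops_per_symbol(P, L, N, 2 * N)               # RAKE at 1 sample/chip
```

For $N = 32$, $P = 3$ and $L = 16$ these evaluate to 2585, 1625 and 1241 flops/symbol, so truncation saves $320P = 960$ flops and halving the RAKE sampling rate saves a further $4NP = 384$.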

36 V. Chandrasekhar et al. Fgure 7 Arthmetc complexty n flops per symbol (see onlne verson for colours) parametersed, so that the precson requrements of the entre desgn could be changed off-lne wth mnmal modfcatons. Table 3 Fxed-pont precson requrements for the RAKE recever Detector varable Descrpton Integer bts 2NS r Complex recever nput data 2NS P S User sgnature matrx 2NS P S plot Plot sgnature matrx P S r Soft rake correlator output 3 P Splotr Soft plot correlator output 3 b, S rk Movng average accumulator 5 P * k plot k= L+ h ˆ = Eb [ S r ] Channel coeffcent estmate 3 * P, plot ( ˆ ) Sh A r Maxmal rato combner output 6 In a practcal mplementaton of the reduced complexty DS-CDMA RAKE recever, the halvng n the nput samplng rate to the RAKE despreadng unt would mply that the RAKE correlaton would complete twce as fast as the PILOT correlaton. Ths means that the RAKE correlator would reman dle for half the symbol duraton, and stll be clocked by the sample-rate clock, resultng n wasteful dsspaton of dle clockng power. Therefore, the clock nput for the RAKE correlaton network s derved from the global clock at half the nput samplng rate. Note that the reduced clockng rate does not reduce the effectve symbol rate of the system. 4.2 Reducton n fxed-pont precson n the DS-CDMA RAKE recever All the DS-CDMA archtectures presented n ths paper are based on a fxed-pont mplementaton. A quantsaton analyss tool developed at the Unversty of Texas, Dallas n Lnebarger et al. (2000) was used for determnng the dynamc range and precson requrements of the RAKE recever. Ths paper assumes that all the fxed-pont varables are quantsed wth a unform wdth and only dffer n ther nteger bt requrements. Table 3 shows the fxed-pont nteger requrements of the ndvdual RAKE recever varables after quantsaton analyss. The correspondng fractonal bt-wdth requrements were determned from the dfference of the overall precson and the number of nteger bts. 
From the obtaned fxed-pont formats, extensve smulatons were carred out usng MATLAB/C wth C++ classes n SystemC provdng the fxed-pont arthmetc support. A mnmum word-length of 0 bts was requred for the RAKE recever to acheve acceptable performance (wthn db) of the equvalent floatng pont verson of the algorthm (ths wll be dscussed further n the next secton hghlghtng the results). The mplementatons of the reference and reduced complexty RAKE recevers (performed on the Vrtex-II FPGA) were made 4.3 Archtecture descrpton To quantfy the effect of the aforementoned methodologes, archtectures ncorporatng the power savng technques were mplemented on a Vrtex-II FPGA. Ths paper descrbes two dstnct archtectures based upon whch the results are reported. They are enumerated below: Reference archtecture. Fgure 8 shows the reference archtecture of the RAKE recever. Ths mplementaton employs a unform nput samplng rate of two samples/chp for both the PILOT and RAKE correlator matched flterng operatons. The external clock s passed through a delay-locked loop to derve the global clock buffer CLK runnng at the nput sample frequency of f samp = 24.576 Mz. Reduced Complexty archtecture. To explore the effects of reduced arthmetc complexty on the resultng power consumpton of the RAKE recever, the wde-band sgnal was nput at the rate of two samples/chp to the PILOT correlator and sample/chp to the RAKE correlator. Fgure 9 shows the archtecture of the resultng reduced complexty RAKE recever wth two separate clockng domans namely CLK (shown by the sold box) and CLKDV (shown by the dashed box) runnng at f samp = 24.576 Mz and fsamp = 2.288 Mz respectvely. Whle the global clock 2 buffer dstrbuton CLK was used to clock the PILOT matched flterng operaton, the second clock buffer CLKDV was used to clock the RAKE matched flterng, channel estmaton and Maxmal Rato Combnng blocks. 
The presence of two independent clocking domains required the use of additional synchronising logic to transfer signals (such as the pilot soft matched filter output) from the CLK domain to the CLKDV domain. Further, separate state machines were encoded in order to describe the control logic for operation of each of these domains.
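The clocking argument above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes each correlator consumes one input sample per clock cycle, and it assumes 32 chips per symbol (the variable `CHIPS_PER_SYMBOL` and the function `busy_fraction` are hypothetical names, not taken from the design).

```python
# Clock-domain arithmetic for the PILOT and RAKE correlators (illustrative sketch).
F_SAMP_HZ = 24_576_000           # CLK: input sample clock quoted in the text
F_CLKDV_HZ = F_SAMP_HZ // 2      # CLKDV: divided clock, 12.288 MHz
CHIP_RATE_HZ = F_SAMP_HZ // 2    # two input samples per chip at F_SAMP_HZ
CHIPS_PER_SYMBOL = 32            # assumption: spreading gain used for illustration


def busy_fraction(samples_per_chip: int, clock_hz: int) -> float:
    """Fraction of clock cycles per symbol in which a correlator has a sample to process.

    Assumes the correlator consumes one input sample per clock cycle.
    """
    samples_per_symbol = samples_per_chip * CHIPS_PER_SYMBOL
    cycles_per_symbol = clock_hz * CHIPS_PER_SYMBOL // CHIP_RATE_HZ
    return samples_per_symbol / cycles_per_symbol


pilot_on_clk = busy_fraction(2, F_SAMP_HZ)     # PILOT correlator clocked by CLK
rake_on_clk = busy_fraction(1, F_SAMP_HZ)      # RAKE correlator if it stayed on CLK
rake_on_clkdv = busy_fraction(1, F_CLKDV_HZ)   # RAKE correlator clocked by CLKDV
symbol_rate_hz = CHIP_RATE_HZ / CHIPS_PER_SYMBOL  # the same in either clock domain

print(pilot_on_clk, rake_on_clk, rake_on_clkdv, symbol_rate_hz)
```

Under these assumptions the RAKE correlator would be busy for only half of the CLK cycles in a symbol, whereas on CLKDV it is busy in every cycle, and the symbol rate is the same in both cases.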

Figure 8 Architecture of the reference DS-CDMA downlink RAKE receiver (see online version for colours)

Figure 9 Architecture of the DS-CDMA downlink RAKE receiver with reduced complexity (see online version for colours)
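The uniform-width fixed-point formats of Section 4.2 can be illustrated with a small quantiser. This is a minimal sketch, not the SystemC fixed-point classes used in the paper: it assumes signed two's-complement values, counts the sign bit among the integer bits, rounds half up and saturates on overflow. The names `quantize` and `lsb` are hypothetical.

```python
import math


def lsb(word_length: int, integer_bits: int) -> float:
    """Quantisation step: the fractional bits are word_length - integer_bits."""
    return 2.0 ** -(word_length - integer_bits)


def quantize(x: float, word_length: int, integer_bits: int) -> float:
    """Quantise x to a signed fixed-point grid (assumed two's complement).

    Assumptions: the sign bit is counted among the integer bits,
    rounding is round-half-up, and out-of-range values saturate.
    """
    frac_bits = word_length - integer_bits
    scale = 1 << frac_bits
    code_min = -(1 << (word_length - 1))
    code_max = (1 << (word_length - 1)) - 1
    code = math.floor(x * scale + 0.5)            # round half up
    code = max(code_min, min(code_max, code))     # saturate at the range limits
    return code / scale


# An 8-bit word with 7 fractional bits covers [-1, 1 - 2**-7].
print(quantize(0.3, 8, 1), quantize(1.0, 8, 1), quantize(-1.0, 8, 1))

# Same number of integer bits, different overall word-lengths.
for w in (10, 12, 14, 16):
    print(w, lsb(w, 3), quantize(1.2345, w, 3))
```

Keeping the integer bits fixed per variable and sweeping only the word-length mirrors the parameterised design: the dynamic range stays the same while the quantisation step shrinks by a factor of four for every two added bits.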

5 Results

To study the impact of precision reduction on the resulting algorithm performance, the mobile receivers were simulated with 10, 12, 14 and 16 bit fixed-point word-lengths and compared with a floating point implementation. For each word-length format, the average received $\mathrm{SNR} = 10\log_{10}(E_b/N_o)$ was varied to study the effect on the bit-error rate performance of the algorithm. In the computer simulations, five equal power users employing length 32 extended Gold sequences were considered. The scenario under consideration was a 5 user, 3 path correlated Rayleigh fading channel based on the Jakes mobility model. For each data-point, 40 random test cases of 5000 transmitted bits were tested. The multi-path delays were fixed for each simulation and varied from one simulation to the next. All the users were assigned unit transmit amplitudes. An additional code-multiplexed pilot channel with 3 dB higher power was employed for channel estimation at the mobile receiver. The over-sampling rate at the transmitter and receiver front end was chosen to be two samples/chip in order to account for fractional multi-path delays. The A/D converter at the receiver front end was chosen to have an 8 bit width (S8Q7 format).

We consider the performance of the following DS-CDMA RAKE receivers:

- Reference RAKE receiver performing truncated correlation sampled at 2 samples/chip (complexity 6NP − 2LP − 2P flops/symbol).

- Reduced arithmetic complexity RAKE receiver performing truncated correlation sampled at 1 sample/chip for detection and 2 samples/chip for channel estimation (complexity = 2NP − 2LP − 2P flops/symbol).

The performance of these receivers was compared against a DS-CDMA RAKE receiver employing perfect correlation (highest complexity of 6NP + 38P + 2LP flops/symbol).

5.1 Multi-user, multi-path fading channel

We describe the performance of the reference and reduced complexity RAKE receivers for a multi-path channel in the presence of interferers. Figure 10 shows the performance of the reference DS-CDMA RAKE receiver for the above scenario.
We notice that the receiver performance in fixed-point is close to the ideal floating point performance, with negligible performance degradation for the 10 bit precision (less than 1 dB loss) up to an SNR of 10 dB.

Figure 10 Error probability: reference RAKE (see online version for colours)

Figure 11 shows the performance of the reduced complexity DS-CDMA RAKE receiver. The reduction in complexity, made to reduce the dynamic power consumption, causes a performance degradation of 2 dB compared to the ideal DS-CDMA RAKE receiver employing ideal correlation (shown by the dashed line in black), owing to the reduced energy at the output of the RAKE correlator. We note that the receiver performance in fixed-point is almost identical to the floating-point performance down to a 10 bit precision.

Figure 11 Error probability: reduced complexity RAKE (see online version for colours)

5.2 Results of FPGA implementation

Two different architectures for the RAKE receiver were targeted for a 2 million gate Virtex-II (XC2V2000 series) FPGA, which employs a supply voltage of $V_{cc} = 1.5$ V. Synthesised complex receiver data for an urban three path Rayleigh multi-path channel was passed through each receiver implementation, and symbol detection was carried out. The results of each simulation were corroborated with the corresponding SystemC/MATLAB simulation to verify correctness of performance.

5.2.1 Timing simulation

To find the dynamic power consumption of the design, the synthesised receiver data was run through the receiver. An external clock running at 50 MHz was produced to clock the receiver. The analysis was carried out following the synthesis, translation, mapping, netlist extraction, and the post-placement and routing phase. Extensive timing simulations were carried out in the Modelsim simulator to model true-device behaviour. All internal node transitions occurring during the course of the simulations were dumped into a .vcd (Value-Change-Dump) file. The .vcd files were then analysed by the power analysis tool XPower in Xilinx, Inc. (2005a), provided for the Xilinx FPGAs described in Xilinx, Inc. (2005b). The analysis generated a power report containing the overall power consumption, as well as a summary of the dominant power consumption among the individual blocks of the design. Finally, the dynamic power consumption was obtained as the difference between the overall design power consumption and the quiescent power (225 mW) of the FPGA.

Table 4 reports the implementation results of the reference and reduced complexity architectures for the DS-CDMA downlink RAKE receiver. The area shown in the table is given in FPGA slices as well as the percentage occupancy of the FPGA, with the available area being 10752 slices in a Virtex-II FPGA. Considering only the effect of reduced precision, the reference architecture shows a power reduction of 25.6% for the 10 bit data-path compared to the 16 bit data-path.
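The power bookkeeping described above reduces to two small formulas: dynamic power is total power minus quiescent power, and savings are measured relative to a baseline. The sketch below applies them to the $P_{dyn}$ values as read from Table 4. The function names are hypothetical, and the 334.5 mW total in the first example is a made-up input chosen only to show the subtraction.

```python
QUIESCENT_MW = 225.0  # quiescent power of the FPGA quoted in the text

# Dynamic power in mW as read from Table 4, keyed by (architecture, word-length).
P_DYN_MW = {
    ("reference", 16): 109.5, ("reference", 14): 97.5,
    ("reference", 12): 93.0, ("reference", 10): 81.5,
    ("reduced", 16): 82.5, ("reduced", 14): 73.0,
    ("reduced", 12): 68.5, ("reduced", 10): 68.5,
}


def dynamic_power_mw(total_mw: float) -> float:
    """Dynamic power = overall design power minus quiescent power."""
    return total_mw - QUIESCENT_MW


def savings_percent(baseline_mw: float, p_mw: float) -> float:
    """Relative reduction in dynamic power with respect to a baseline."""
    return 100.0 * (baseline_mw - p_mw) / baseline_mw


# Hypothetical total of 334.5 mW, used only to illustrate the subtraction.
print(dynamic_power_mw(334.5))

# Savings of every configuration relative to the 16 bit reference data-path.
baseline = P_DYN_MW[("reference", 16)]
for key, p in P_DYN_MW.items():
    print(key, round(savings_percent(baseline, p), 2))

# Precision reduction alone within the reduced complexity architecture.
print(round(savings_percent(P_DYN_MW[("reduced", 16)], P_DYN_MW[("reduced", 10)]), 2))
```

The computed percentages agree with the savings column of Table 4 to within rounding in the second decimal place.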
For the reduced complexity architecture, we observe power savings of 16.96% for the 10 bit data-path. These power savings are significant, considering that the 10 bit data-path achieves close to the equivalent floating point performance for both the reference and reduced complexity receivers (performance loss being less than 1 dB). Next, we consider the effect of complexity reduction on the resulting power savings. The 16 bit reduced complexity RAKE receiver achieves a power saving of 24.65% compared to the 16 bit reference RAKE receiver implementation. The combined effect of reduced precision and arithmetic complexity results in a 37.4% reduction in dynamic power consumption for the 10 bit RAKE receiver, with a 3 dB degradation in performance (Figure 11). The tradeoff between dynamic baseband power consumption and receiver performance is important for battery operated mobile wireless terminals. In scenarios where there is a strong received signal, adaptive methods to reduce the dynamic digital baseband processing, as proposed in this paper, will greatly increase battery life.

Table 4 FPGA implementation complexity

| Type | Bits | Area (slices) | $P_{dyn}$ (mW) | Savings (%) |
| Reference architecture | 16 | 3572 (33%) | 109.5 | |
| Reference architecture | 14 | 3000 (28%) | 97.5 | 10.95 |
| Reference architecture | 12 | 2341 (22%) | 93 | 15.06 |
| Reference architecture | 10 | 1844 (17%) | 81.5 | 25.6 |
| Reduced complexity architecture | 16 | 3724 (35%) | 82.5 | 24.65 |
| Reduced complexity architecture | 14 | 3134 (29%) | 73 | 33.33 |
| Reduced complexity architecture | 12 | 2457 (23%) | 68.5 | 37.44 |
| Reduced complexity architecture | 10 | 1942 (18%) | 68.5 | 37.44 |

6 Conclusion

We have examined design methodologies and performance trade-offs for reducing the online power dissipation in a DS-CDMA mobile RAKE receiver. At the algorithm level, reduction in arithmetic complexity has been investigated for obtaining savings in the dynamic power dissipation. At the architectural level, precision reduction and activity rate reduction have been exploited for additional savings. Reduction in precision shows that a 10 bit data-path achieves near floating point performance with minimal performance degradation for the reference RAKE receiver.

Power-efficient architectures based on a Xilinx Virtex-II FPGA have been proposed for implementing both the conventional and reduced complexity DS-CDMA RAKE receivers. For a 16 bit data-path, we have observed a 24.65% reduction in dynamic power dissipation in the reduced complexity RAKE receiver compared to the reference RAKE receiver implementation, with a performance loss of less than 2 dB. The combined effect of reduced precision and complexity reduction leads to a 37.44% saving in digital baseband power consumption, which will extend the operation of mobile wireless terminals.

Acknowledgement

This work was supported in part by Nokia Corporation, Texas Instruments Inc., and by NSF under grants ANI-9979465, EIA-0224458, and EIA-0321266, and was done at Rice University. An earlier version of this paper appeared in the Proceedings of the 14th IEEE International Conference on Application-Specific Systems, Architectures, and Processors, June 2003.

References

Baghaie, R. and Laakso, T. (1991) Implementation of low power CDMA RAKE receivers using strength reduction transformation, Proceedings of IEEE Vehicular Technology Conference, Saint Louis, MO, pp.543–548.

Chandrakasan, A., Potkonjak, M., Mehra, R., Rabaey, J. and Brodersen, R. (1995) Optimizing power using transformations, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 14, No. 1, January, pp.12–31.

Chandrasekhar, V. (2002) Reducing Dynamic Power Consumption in Next Generation DS-CDMA Mobile Communication Receivers, Master's Thesis, Rice University, Available from www.ece.rice.edu/~cvikram

Fantacci, R. and Galligani, A. (1999) An efficient RAKE receiver architecture with pilot signal cancellation for downlink communications in DS-CDMA indoor wireless networks, IEEE Transactions on Communications, Vol. 47, No. 6, pp.823–827.

Garrett, D. and Stan, M. (1997) Power reduction techniques for a spread spectrum based correlator, Proceedings of IEEE International Symposium on Low Power Electronics and Design, Monterey, CA, 18–20 August, pp.225–230.

Latva-aho, M. and Juntti, M. (2000) LMMSE detection for DS-CDMA systems in fading channels, IEEE Transactions on Communications, Vol. 48, No. 2, pp.194–199.

Linebarger, D., Zed, F.A. and Shrivastava, A. (2000) Dynamic Range Tool, Signal Processing Lab, Engineering and Computer Science Department, University of Texas, Dallas.

Ojanperä, T. and Prasad, R. (2001) WCDMA: Towards IP Mobility and Mobile Internet, Artech House Publications, Boston, MA.

Proakis, J. (1995) Digital Communications, McGraw-Hill, New York.

Rabaey, J. and Pedram, M. (1996) Low Power Design Methodology, Kluwer Academic Publishers, Boston, MA.

Rappaport, T. (1986) Wireless Communications, McGraw-Hill, New York.

Sriram, S., Brown, K. and Dabak, A. (1999) Low-power correlator architectures for wideband CDMA code acquisition, Proceedings of IEEE 33rd Asilomar Conference Signals, Systems and Computers, Pacific Grove, CA, Vol. 1, 24–27 October, pp.125–129.

Viterbi, A. (1995) CDMA Principles of Spread Spectrum Communication, Addison Wesley, Reading, MA.

Xilinx, Inc. (2005a) FPGA Xpower Tutorial, Available from http://support.xilinx.com

Xilinx, Inc. (2005b) Xilinx FPGA Products, Available from http://www.xilinx.com

Note

The quiescent power (Q-Power) of an FPGA is fixed by the FPGA area and internal operating voltage, and is independent of the size of the design. The Virtex-II FPGA has a Q-Power specification of 225 mW at an operating voltage of $V_{ccint} = 1.5$ V.