EURASIP Journal on Applied Signal Processing 2005:16, 2694-2700
© 2005 Hindawi Publishing Corporation

Multimedia Terminal System-on-Chip Design and Simulation

Ivano Barbieri
Department of Biophysical and Electronic Engineering, University of Genova, Via Opera Pia 11A, 16146 Genova, Italy
Email: ivano@dibe.unige.it

Massimo Bariani
Department of Biophysical and Electronic Engineering, University of Genova, Via Opera Pia 11A, 16146 Genova, Italy
Email: bariani@dibe.unige.it

Alessandro Scotto
Department of Biophysical and Electronic Engineering, University of Genova, Via Opera Pia 11A, 16146 Genova, Italy
Email: scotto@dibe.unige.it

Marco Raggio
Department of Biophysical and Electronic Engineering, University of Genova, Via Opera Pia 11A, 16146 Genova, Italy
Email: raggio@dibe.unige.it

Received 30 January 2004; Revised 30 March 2005

This paper proposes a design approach based on integrated architectural and system-on-chip (SoC) simulations. The main idea is to have an efficient framework for the design and the evaluation of multimedia terminals, allowing a fast system simulation with a definable degree of accuracy. The design approach includes the simulation of very long instruction word (VLIW) digital signal processors (DSPs), the utilization of a device multiplexing the media streams, and the emulation of the real-time media acquisition. This methodology allows the evaluation of both the multimedia algorithm implementations and the hardware platform, giving feedback on the complete SoC including the interaction between modules and conflicts in accessing either the bus or shared resources. An instruction set architecture (ISA) simulator and an SoC simulation environment compose the integrated framework. In order to validate this approach, the evaluation of an audio-video multiprocessor terminal is presented, and the complete simulation test results are reported.

Keywords and phrases: system-on-chip, multimedia, HW-SW codesign, DSP, simulation, VLIW.

1. INTRODUCTION

Architecture achievements of the last years followed the HW-SW codesign approach, transferring functionalities from hardware to software implementation [5] and moving developers toward system-on-chip (SoC) programmable devices. On the other hand, SoC application-driven design seems to be the answer to fulfill the requirements of multimedia applications [5]. Moreover, thanks to the VLIW [6] architecture approach, it is now possible to design DSP-oriented chips that are standalone processors [7] reaching high degrees of parallelism [8]. Even though a number of general-purpose processors are suitable for DSP tasks, native DSP processors outperform general-purpose devices in terms of their cost-performance ratio and power consumption [9, 10].

In the following, a system-on-chip design based on a dual-core DSP architecture will be described. The approach is based on HW-SW codesign, simulating at the same time the instruction set architecture (ISA) and the complete SoC, and taking into account single-device performance together with run-time interactions between cores and peripheral devices.

2. MAIN ISSUE AND RELATED WORK

An HW-SW codesign environment is an application-driven tool chain able to evaluate and to modify both SW and HW at the same time. The software developer's challenge is often to optimize a standard-based multimedia application [11, 12] to get real-time performance on a given architecture. The best result, following a given design constraints
list, is achievable by tuning both the algorithm implementation and the architectural features.

Instruction simulators are nowadays widely used in developing application-driven architecture designs. Architecture simulation incorporates several simulation techniques, ranging from interpretive simulation to application-architecture-specific compiled simulation techniques. Even though the compiled simulation approach allows better peak performance in terms of simulated instructions per second, interpretive simulation appears to be more flexible [5].

Most of the state-of-the-art multimedia and DSP device solutions are based on SoC. The development environment should allow simulating run-time interactions between core and peripheral devices in an SoC. In this scenario, the interpretive approach best matches the simulation of heterogeneous devices, especially regarding interdevice run-time interactions, which are not easily predictable a priori.

An example of an SoC simulation environment using an instruction set simulator (ISS) integration approach is introduced in [13]. The proposed solution uses cycle-accurate ISSs integrated with the architecture design environment. The work in [13] also analyses tradeoffs between ISS and communication interfaces in SoC simulation environments and highlights the importance of architectural visibility and of the debug interface. The work in [14] presents an innovative approach for the design of application-specific multiprocessor SoC. Using a generic architecture model as the initial template, the available resources are divided into software, hardware, and communication components. In the SoC validation process, CPUs are replaced by cycle-accurate ISSs. However, cycle accuracy often leads to low performance, whereas a vertical optimisation for improving simulation speed will have a negative effect on flexibility. Usually simulation environments try to achieve the best compromise between speed and accuracy, depending on the scope of the development environment.

The methodology introduced in this paper is based on both architecture and SoC simulation.
The presented approach employs an interpretive ISS integrated in an SoC design tool for fast simulation, in order to obtain an effective architectural exploration and to investigate new SoC solutions. Compared to other existing hardware design tools, this methodology makes it possible to obtain efficient system simulations in a short time with parametric accuracy and good observability. An efficient HW-SW codesign development environment should also allow continuous access to register and bus values during the simulation, providing statistics about the use of the different resources [11]. The capability to collect statistics on both system devices (for example, internal resource utilization) and interdevice communications allows a software developer to better focus the work either on a particular application or on a portion of the selected code.

In this paper we demonstrate the effectiveness of using integrated architectural and SoC design environments, reducing the time needed to identify major design constraints and bottlenecks starting from the software application development. The described methodology does not limit hardware accuracy, so the subsequent steps of the system design can still be performed.

The proposed SoC solution is mainly based on two VLIW cores working in parallel, a multiplexer, and a bus arbiter. The multiplexer component implements the H.223 standard [15]. H.223 has a flexible mapping scheme that is suitable for a variety of media and can handle variable frame lengths. This solution is independent of the streaming applications, allowing any composition of different media channels (audio/video/data communication control), and implements different protocols, namely H.223 annexes A and B [15], depending on the type of multimedia terminal, such as wired PSTN-ISDN [16] and wireless 3GPP [17]. The H.223 flexibility and the utilization of programmable DSPs, together with the definition of a generic communication interface that controls the device behaviour and the interactions among the SoC modules, allow the design and evaluation of multimedia terminals using different streaming standards.
The chosen SoC scheme, modules, and HW-SW partition, including two general-purpose cores, one MUX-specific device, and memories, have been selected to address the streaming process for a given family of multimedia terminals consisting of ITU-T H.32x and 3GPP standards. SoC scheme parameters are, for example, memory size, bus width, and multiplexer configuration. From the signal processing point of view, the scheme supports several compression algorithms such as H.263, MPEG4, G.723, and AMR using, as described above, different parameters.

The organization of the paper is as follows. In Section 3, the SoC modelling the multimedia terminal is described. This section details the most important devices composing the overall system; attention is focused on the main features and the communication interface. The proposed approach has been validated on an audio-video multiprocessor system reported in Section 4. Simulation test results for this case study are also reported in Section 4. Finally, conclusions are drawn in Section 5.

3. SYSTEM-ON-CHIP DESCRIPTION

The SoC includes two processors implementing audio and video compression. Each processor is connected to three external memories: a program memory, a data memory, and a memory containing the audio or video data to process. The third memory is loaded at the reset phase with the proper data to process and is utilized, together with a timer, in order to simulate the real-time acquisition (audio and video) behaviour. Audio and video memories are loaded at the simulation reset phase starting from a specified address. The core processors load audio/video data according to the real-time acquisition timers and process the input data, producing compressed streams to be sent to the multiplexer (MUX). The audio coder algorithm synchronizes the production of the multimedia terminal output stream. Therefore the audio processor generates an interrupt at the end of each audio-frame processing, signaling to the multiplexer that the data are available to compose a new MUX-frame.
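The timer-driven emulation of real-time acquisition described above can be illustrated with a minimal Python sketch. It is not the MaxSim timer component: the function name `simulate_acquisition`, its parameters, and the assumption that each frame takes a constant number of cycles to compress are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def simulate_acquisition(period: int, frames: int, work_cycles: int) -> Tuple[int, int]:
    """Return (total cycles, stalled cycles) for a core fed by an acquisition timer.

    period      -- cycles between two frame releases (hypothetical timer setting)
    frames      -- number of frames preloaded into the media memory at reset
    work_cycles -- cycles the coder needs to compress one frame (assumed constant)
    """
    cycle = 0
    stalled = 0
    for index in range(frames):
        release = index * period          # the timer makes frame `index` available here
        if cycle < release:               # the core finished early and must wait
            stalled += release - cycle
            cycle = release
        cycle += work_cycles              # compress the frame
        # An audio core would raise its end-of-frame interrupt at this point.
    return cycle, stalled
```

With a period of 100 cycles, three frames, and 40 cycles of work per frame, the call `simulate_acquisition(100, 3, 40)` returns `(240, 120)`: a coder that is faster than the acquisition period spends the difference waiting for the next frame.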
The multiplexer puts together the available data to form an audio-video packet according to the H.223 standard syntax and writes the produced stream to an output file. The communication between processors and memories occurs through a bus arbiter, which also regulates the accesses to the shared-resource multiplexer.

The SoC architecture has been designed and simulated using the MaxSim environment. MaxSim is a simulation environment for easy modelling and fast simulation of integrated SoCs with multiple cores, peripherals, and memories. The cycle-based scheduler and the transaction-based component interfaces enable high simulation speed while retaining full accuracy. The MaxSim system can be used as a standalone simulation module or integrated into HW simulation or HW/SW cosimulation tools [3]. The whole system simulation occurs without any I/O operations except for the storing of the MUX stream to the output file. Figure 1 shows devices and connections in the simulation. The modules composing the proposed SoC are detailed in the rest of this section.

Figure 1: Audio-video system. (Blocks: VLIW-SIM0 audio processor and VLIW-SIM1 video processor as slave0 and slave1 of the arbiter, program memories 0 and 1, data memories mem0 to mem3 with mem0 reserved for slave0 and mem1 reserved for slave1, start-audio and start-video signals, audio interrupt, and the audio-video multiplexer on the arbitrated port, 0xA200000 to 0xA2FFFFF.)

3.1. Bus arbiter

This module regulates all the communications between master devices (the core processors) and slave devices (e.g., memories, MUX) in the SoC. The bus arbiter receives the read-from/write-to memory requests from the two masters and sends them to the appropriate slave device depending on the address range. It also resolves simultaneous access requests to shared resources. The bus arbiter interface includes two ports connected to master devices and six ports connected to slave devices. The two data memories datamem0 and datamem1 are reserved for the master devices 0 and 1, while the other memory devices (datamem2, datamem3) can be used without constraints.
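The address-range routing performed by the bus arbiter can be sketched as follows. This is a minimal Python illustration and not the MaxSim component: only the multiplexer range 0xA200000 to 0xA2FFFFF is taken from Figure 1, while the memory ranges, the names `Slave`, `ADDRESS_MAP`, and `route`, and the way the reservation rule is encoded are assumptions made for the example.

```python
from typing import NamedTuple, Optional, Tuple


class Slave(NamedTuple):
    name: str
    start: int               # first address of the range (inclusive)
    end: int                 # last address of the range (inclusive)
    owner: Optional[int]     # master the slave is reserved for, None if unrestricted


# Hypothetical address map: only the MUX range is taken from Figure 1.
ADDRESS_MAP = [
    Slave("datamem0", 0x0000000, 0x00FFFFF, owner=0),
    Slave("datamem1", 0x0100000, 0x01FFFFF, owner=1),
    Slave("datamem2", 0x0200000, 0x02FFFFF, owner=None),
    Slave("datamem3", 0x0300000, 0x03FFFFF, owner=None),
    Slave("mux",      0xA200000, 0xA2FFFFF, owner=None),
]


def route(master: int, address: int) -> Tuple[str, int]:
    """Return (slave name, displacement inside the slave) for a request."""
    for slave in ADDRESS_MAP:
        if slave.start <= address <= slave.end:
            if slave.owner is not None and slave.owner != master:
                raise PermissionError(
                    f"{slave.name} is reserved for master {slave.owner}")
            return slave.name, address - slave.start
    raise ValueError(f"address {address:#x} is not mapped")
```

In this sketch, `route(1, 0xA200004)` selects the multiplexer with displacement 4, while a request from master 1 to an address inside `datamem0` is rejected because that memory is reserved for master 0.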
The port named arbitratedport is reserved for connecting the shared device. In case of simultaneous access to shared devices, the bus arbiter grants the access to the higher-priority master device. In this project, the higher-priority master is the processor device simulating the audio compression (device 0). Data in memories are stored starting from the address 0x00000000; therefore the bus arbiter uses an offset part of a data address to identify the memory device and the displacement to find the internal memory address. During the simulation phase, the bus arbiter collects statistics on both read and write accesses from master devices; moreover, it measures the number of simultaneous shared-resource requests (conflicts).

3.2. Multiplexer

The multiplexer receives audio and video data and puts them together in a MUX-frame packet following an H.223-like syntax. Each MUX-frame consists of a header followed by audio and video data. The header is composed of the following.
(i) Start of frame: two bytes usable as CRC, and so forth.
(ii) MUX table index: indicates the MUX-frame composition (audio and/or video).
(iii) MUX-frame size: indicates the total size in bytes of the MUX-frame.

The number of audio bytes in a MUX-frame is a multiplexer parameter that depends on the audio compression algorithm; it is fixed for each MUX-frame. The number of video bytes in a MUX-frame is a variable value influencing the MUX-frame total size; it can vary among MUX-frames up to the maximum MUX-frame size and is therefore limited by the physical channel bandwidth. The audio processor synchronizes the multiplexer by generating an interrupt at the end of each audio processing, signaling that the data are ready for composing a new MUX-frame packet. In order to continuously process data, the multiplexer uses two ping-pong buffers. When an audio interrupt is received, the ping-pong buffers are swapped, allowing the multiplexer to
assemble the data from header, audio, and video components while the audio and video processors keep sending data. Since the ping-pong swapping of the buffers is controlled by the audio interrupt, a delay in audio processing would result in an overflow of the video buffers. In order to avoid the video buffer overflow, the ping-pong swapping is also activated whenever the video buffer size limit is reached.

3.3. The VLIW-SIM component

The VLIW DSP cores are simulated using the VLIW-SIM ISA simulator. VLIW-SIM is an interpretive, reconfigurable ISA simulation environment supporting both application design and architectural exploration. A dynamic library version of the simulator has been developed in order to utilize VLIW-SIM not only as a stand-alone core simulator but also as a component in a more complex HW-SW codesign environment. The ISS has been developed in pure C in order to reach high simulation speed [1]. The VLIW-SIM component offers all the functionalities of the stand-alone version, such as resource observability and application profiling. The simulated target processor can be changed while keeping the same component interface, without affecting the other modules of the SoC.

The interface between VLIW-SIM and the other devices is implemented through communication ports. The read/write operation on a data memory port is divided into two phases: access request and check for grant. The VLIW-SIM component is able either to stall, for example, when the bus controller does not grant the access for a read or write operation, or to generate an interrupt to signal a particular event to other SoC modules.

The data to be processed by the coding applications are stored in data memory starting from a specific address depending on the media type. The coders process one frame and store the processed data to the output buffer at the proper MUX-field starting address.
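The ping-pong buffering of Section 3.2, into which the coders write their output, can be sketched in a few lines of Python. The class `PingPongMux`, its method names, and the per-buffer video capacity `video_limit` are hypothetical; the sketch only reproduces the two swap conditions stated in the text (audio interrupt, or video buffer limit reached) and leaves out the MUX-frame header.

```python
class PingPongMux:
    """Hypothetical model of the two ping-pong buffers of the multiplexer."""

    def __init__(self, video_limit: int):
        self.video_limit = video_limit      # assumed video capacity per buffer, in bytes
        self.buffers = [{"audio": b"", "video": b""} for _ in range(2)]
        self.fill = 0                       # index of the buffer currently being written
        self.frames = []                    # (audio, video) payloads handed to assembly

    def _swap(self) -> None:
        ready = self.buffers[self.fill]
        self.frames.append((ready["audio"], ready["video"]))
        self.fill ^= 1                      # the processors now write the other buffer
        self.buffers[self.fill] = {"audio": b"", "video": b""}

    def write_audio(self, data: bytes) -> None:
        self.buffers[self.fill]["audio"] += data

    def write_video(self, data: bytes) -> None:
        buf = self.buffers[self.fill]
        room = self.video_limit - len(buf["video"])
        buf["video"] += data[:room]
        if len(buf["video"]) >= self.video_limit:
            self._swap()                    # early swap: avoids a video buffer overflow
            if data[room:]:
                self.write_video(data[room:])   # the remainder goes to the new buffer

    def audio_interrupt(self) -> None:
        self._swap()                        # normal case: an audio frame is complete
```

In this sketch an early swap produces a payload without audio, which is one possible reading of the MUX table index signalling audio and/or video content; the actual multiplexer may handle that case differently.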
As explained in Section 3.2, the multiplexer waits for the audio interrupt signal to generate the MUX-frame packet and to swap the ping-pong buffers for receiving the successive compressed frames. To better simulate the real-time acquisition, a timer is used in order to start the compression of each frame.

4. A CASE STUDY

The presented design approach is here utilized to design and evaluate a multiprocessor system for audio-video compression. The simulated multimedia terminal is based on two ST210 [18] processors working in parallel, one dedicated to the compression of a video stream following the ITU-T H.263 [19] standard protocol, and the other one executing the ITU-T G.723 [20] speech compression. Data generated by the two processors are multiplexed together in an H.223-like [15] format. The selected applications belong to a set of multimedia algorithms defined in advance as relevant applications to verify SoC performance and code optimization. The reported tests have been performed on optimized implementations of the ITU-T standard algorithms specifically implemented for the addressed target architecture. The VLIW-SIM simulator has been configured to simulate the following ST210 architecture.
(i) Clock rate: 250 MHz.
(ii) I-cache: 32 KB, direct mapped.
(iii) D-cache: 32 KB, 4-way associative (round-robin block replacement).

The real-time acquisition constraints are taken into account by timers, since the G.723 standard expects to process an audio frame every 30 milliseconds. The produced output stream has been decoded using a standard A/V terminal decoder in order to check stream compatibility.

Figure 2 shows the MaxSim project for the described system. It is composed of two ST210 cores and their corresponding program memories, four data memories (two for audio processing and two for video processing), two devices for the initialization of the audio and video memories (called memloaders), a bus arbiter, a multiplexer, and two real-time acquisition timers. The bus arbiter measures audio and video processor accesses for read and write operations, also taking into account simultaneous multiplexer accesses and stalls.
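The real-time constraint can be made concrete with a short calculation. Using the 250 MHz clock and the 30 ms G.723 frame period given above, the Python sketch below computes the cycle budget between two audio frame acquisitions; the constant and function names are hypothetical.

```python
CLOCK_HZ = 250_000_000      # ST210 clock rate configured in VLIW-SIM
FRAME_PERIOD_MS = 30        # G.723 audio frame period


def cycles_per_frame(clock_hz: int, period_ms: int) -> int:
    """Clock cycles available between two consecutive frame acquisitions."""
    return clock_hz * period_ms // 1000


budget = cycles_per_frame(CLOCK_HZ, FRAME_PERIOD_MS)
print(budget)          # 7500000
print(28 * budget)     # 210000000
```

For the 28 audio frames processed in the test, this gives 210 000 000 cycles, which is close to the cycle counts of about 210 million reported for both cores in Table 1.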
Table 1 shows the collected statistics about the processors and the bus arbiter. The tests described in this document have been performed using the following platform.
(i) OS: Win 2000, service pack 3.
(ii) CPU: Pentium IV 2.0 GHz.
(iii) RAM: 256 MB.

The Cycles entry in Table 1 represents the number of clock cycles without stalls due to cache misses, which for VLIW architectures is the number of executed long instructions. Operations is the number of elementary instructions executed in the simulated application except for NOP instructions; a number of instructions ranging from one to the long instruction size are executed in parallel in each cycle. D-cache misses is the number of write and read misses in the data cache, while I-cache misses represents the number of read misses in the instruction cache. Data mem accesses is the total number of accesses to data memory, including both D-cache hits and misses, while the total number of accesses to program memory, comprising both I-cache hits and misses, is denoted by Instr mem accesses. Bus accesses on write is the total number of accesses from master devices to the bus for write operations. Bus accesses denied is the total number of conflicts in accessing the shared device (MUX). Finally, Simulation time is the time elapsed to complete the application simulation. The latter measures the ISA simulator performance and allows, together with operations, the calculation of the number of simulated cycles per second.

In our test, processing 28 audio frames and 20 video frames, multiplexer access conflicts do not occur. This is mainly due to the difference in the read/write operation frequency of the two algorithms, which makes the probability of a simultaneous access to the multiplexer close to zero. The H.263 and G.723 encoders used in this test are ST210-optimized implementations, allowing a frame compression
time considerably below the real-time temporal window. Therefore, the ST210s are stalling for most of the processing time (video: 74.61%, audio: 92.51%), waiting for real-time data acquisition. This result indicates, for example, the possibility of decreasing the ST210 clock rates in order to satisfy power consumption constraints, or of selecting a different architecture for the media processing. Alternatively, keeping the same HW configuration, decoders and other terminal communication modules, such as H.245 control [21], a speech recognition module, or video acquisition processing modules such as RGB to YCbCr conversion, can be implemented on the ST210 cores. A multichannel system can also be developed by executing multiple coder instances on each core.

Figure 2: Audio-video system (MaxSim project).

Table 1: Simulation statistics.

                          VLIW-SIM 1       VLIW-SIM 0
                          (Video)          (Audio)
Cycles                    210 002 493      210 002 870
Stalls                    156 695 902      194 281 104
  (% total cycles)        (74.61%)         (92.51%)
Operations                132 141 925       43 936 322
D-cache misses                310 074            1 771
I-cache misses                199 401           28 479
Data mem accesses          25 535 279        8 122 811
Instr mem accesses        144 536 244       46 094 608
Bus accesses on read       18 023 276        6 865 049
Bus accesses on write       7 512 444        1 258 055
Bus accesses denied                 0                0
Simulation time (s)               296              296
Simulated cycle/s                    709 310

5. CONCLUSIONS

In this paper we have presented a design methodology based on an ISA simulator integrated into an SoC design environment.
Unlike other existing HW design tools, this methodology provides an effective system simulation in a short time with parametric degrees of accuracy and observability. Moreover, the described environment allows system profiling, both detecting bottlenecks in the communication between system components and evaluating the degree of optimization of the simulated applications using system criteria instead of single-processor criteria. The simulation speed and the short implementation time for each block, which comprises the creation of MaxSim components and the reconfiguration of VLIW-SIM, allow us to develop the SW application algorithms together with the HW system architectures. In order to validate this methodology, the case study of a multiprocessor system for audio-video compression and multiplexing has been presented. In this SoC design evaluation each block is reusable and allows internal architecture exploration to model the different HW behaviors. Future work will focus on power estimation models connected with a core-simulation-oriented description, in order to take into account power efficiency in both hardware design and application development.
REFERENCES

[1] I. Barbieri, M. Bariani, A. Cabitto, and M. Raggio, "Multimedia-application-driven instruction set architecture simulation," in Proc. IEEE International Conference on Multimedia and Expo (ICME '02), vol. 2, pp. 169-172, Lausanne, Switzerland, August 2002.
[2] I. Barbieri, M. Bariani, A. Cabitto, and M. Raggio, "Efficient DSP simulation for multimedia applications on VLIW architectures," in Mathematics and Simulation with Biological Economical and Musicoacoustical Applications, pp. 83-86, Malta, September 2001.
[3] AXYS Design Automation Inc., http://www.axysdesign.com/.
[4] AXYS Design Automation Inc., MaxSim Developer Suite User's Guide Version 3.0, Document Version 1.04, September 9th, 2002.
[5] A. Hoffmann, T. Kogel, A. Nohl, et al., "A novel methodology for the design of application-specific instruction-set processors (ASIPs) using a machine description language," IEEE Trans. Computer-Aided Design, vol. 20, no. 11, pp. 1338-1354, 2001.
[6] J. A. Fisher, "Very long instruction word architectures and the ELI-512," in Proc. 10th Annual International Symposium on Computer Architecture (ISCA '83), pp. 140-150, Stockholm, Sweden, June 1983.
[7] P. Faraboschi, G. Desoli, and J. A. Fisher, "The latest word in digital and media processing," IEEE Signal Processing Mag., vol. 15, no. 2, pp. 59-85, 1998.
[8] B. R. Rau and J. A. Fisher, "Instruction-level parallel processing: history, overview and perspective," Journal of Supercomputing, vol. 7, no. 1-2, pp. 9-50, 1993, Special issue on instruction-level parallelism.
[9] P. Lapsley, J. Bier, A. Shoham, and E. A. Lee, DSP Processor Fundamentals: Architectures and Features, IEEE Press, Piscataway, NJ, USA, 1996.
[10] J. Bier, "VLIW architectures for DSP: a two-part lecture outline," in Proc. International Conference on Signal Processing Application and Techniques (ICSPAT '99), pp. 1290-1301, Orlando, Fla, USA, November 1999.
[11] A. Jemai, P. Kission, and A. A. Jerraya, "Architectural simulation in the context of behavioral synthesis," in Proc. Conference on Design, Automation and Test in Europe (DATE '98), pp.
590-595, Paris, France, February 1998.
[12] A. Jemai, P. Kission, and A. A. Jerraya, "Embedded architectural simulation within behavioral synthesis environment," in Proc. Asia and South Pacific Design Automation Conference (ASP-DAC '97), pp. 227-232, Makuhari Messe, Chiba, Japan, January 1997.
[13] L. Guerra, J. Fitzner, D. Talukdar, C. Schläger, B. Tabbara, and V. Zivojnovic, "Cycle and phase accurate DSP modeling and integration for HW/SW co-verification," in Proc. 36th Design Automation Conference (DAC '99), vol. 1, pp. 964-969, New Orleans, La, USA, June 1999.
[14] A. Baghdadi, D. Lyonnard, N. E. Zergainoh, and A. A. Jerraya, "An efficient architecture model for systematic design of application-specific multiprocessor SoC," in Proc. Conference on Design, Automation and Test in Europe (DATE '01), pp. 55-62, Munich, Germany, March 2001.
[15] ITU-T Recommendation H.223, "Multiplexing protocol for low bitrate multimedia communication," May 1995.
[16] ITU-T Recommendation H.324, "Terminal for low bit-rate multimedia communication," March 2002.
[17] 3GPP specification TS 26.111, "Technical Specification Group Services and System Aspects; Codec for circuit switched multimedia telephony service; Modifications to H.324," June 2003.
[18] ST200 Core Architecture Manual, STMicroelectronics, 2000.
[19] ITU-T Recommendation H.263, "Video coding for low bitrate communication," February 1998.
[20] ITU-T Recommendation G.723.1, "Dual rate speech coder for multimedia communication transmitting at 5.3 and 6.3 kbit/s," March 1998.
[21] ITU-T Recommendation H.245, "Control protocol for multimedia communication," July 2003.

Ivano Barbieri was born in Genova, Italy, in 1969. He obtained his M.S. degree in electronic engineering from Genoa University (thesis on "Research on image quality evaluation alternative methods to MSE (mean square error) in image coding systems for the subjective redundancy reduction") and his Ph.D. degree (thesis on "Efficient methodologies for multimedia communication terminal design and testing"). Since 1995, he has been employed in the Department of Biophysical and Electronic Engineering (DIBE), Genoa University.
His research areas are innovative approaches to image quality evaluation, architectural research on systems for real-time efficient implementation of video coding algorithms exploring both embedded and single-chip solutions, real-time multimedia systems (platforms, multiplexing, and control issues), DSP architecture and development environments, architecture modelling for media processing, and embedded systems for mobile (low-power) applications.

Massimo Bariani was born in Genova, Italy, in 1970. He obtained his M.S. degree in electronic engineering from Genoa University (thesis on "Development and implementation of a multipoint control unit for multimedia videoconferences") and his Ph.D. degree (thesis on "Modelling and simulation of VLIW architectures for HW-SW codesign of embedded systems"). He is currently employed as a researcher in the Department of Biophysical and Electronic Engineering (DIBE), Genoa University. In the electronic system field, his interests include hardware design and simulation, multimedia algorithm implementation for VLIW architectures, architectural exploration, and real-time multimedia communication based on standard protocols.

Alessandro Scotto was born in Genova, Italy, in 1976. He obtained his M.S. degree in electronic engineering from Genoa University in 2001 (thesis on "Multichannel system for real-time voice coding on DSP architecture"). Since 2002, he has been a Ph.D. student in the Department of Biophysical and Electronic Engineering (DIBE), Genoa University. His research areas are architectural research on systems for real-time efficient implementation of video-voice coding algorithms exploring both embedded and single-chip solutions, DSP architecture and development environments, architecture modelling for media processing, and embedded systems for mobile (low-power) applications.

Marco Raggio was born in Chiavari, Genova, Italy, in 1964. He obtained his M.S. degree in electronic engineering from Genoa University (thesis on "Development and real-time test of video compression algorithms") and his Ph.D.
degree (thesis on "Implementation and simulation of real-time multimedia embedded system for videotelephony application and advanced DSP architecture"). Since 1995, he has been employed as a Research Project Manager/Officer in the Department of Biophysical and Electronic Engineering (DIBE), Genoa University. In the electronic system field,
his interests involve hardware design and simulation, and interactive real-time multimedia architecture design, that is, for mobile terminal and surveillance systems. His activities also involve field trials setup, audit, and dissemination. In the networking field, he has expertise in LAN design, configuration, maintenance, and security. He participates in university seminars with discussions on video coding, standards for multimedia and streaming, DSP, industrial field bus, and embedded systems.