An Architecture for a Next Generation VoIP Transmission Systems

Christian Hoene (1), Kai Clüver (2), Jan Weil (2)

(1) Wilhelm-Schickard-Institute, University of Tübingen, Germany
(2) Fachgebiet Nachrichtenübertragung, Technical University of Berlin, Germany

1 Abstract

Packetized speech transmission systems implemented with Voice over IP (VoIP) are gaining momentum against traditional circuit-switched systems, even though packet-switched VoIP is two to three times less efficient than its circuit-switched counterpart. At the same time, it supports only toll quality, which is rather poor. We believe that it is time for a new architecture developed from scratch: an architecture that includes an Internet-enabled speech codec and its transport system. This architecture manages the perceptual service quality while making the best use of the available transmission resources. The transmission of speech is managed and controlled with respect to its speech quality, mouth-to-ear delay, bit rate, frame rate, and loss robustness. Besides the architecture, we describe the requirements for the Internet speech codec and its transport protocol and present the description of an interface between speech codec and transport protocol.

2 Introduction

Internet telephony is a mature technology that has gained increasing popularity against traditional PSTN systems. Voice over IP (VoIP) is replacing the PSTN service on broadband access networks such as cable modems and DSL, as it is more cost-efficient to use IP broadband access for Internet telephony as well. In addition, future wireless broadband access networks such as 3GPP's Long Term Evolution (LTE) radio technology will support telephone services only via VoIP [1].

Despite its success, Internet telephony has a fundamental drawback: it is much less bandwidth-efficient than its classic circuit-switched counterpart. VoIP requires two to three times more physical gross bandwidth than a modern circuit-switched speech transmission in DECT, GSM, or UMTS networks.
If more bandwidth is required, other performance parameters are sacrificed, too: the typical talk time or the transmission range of a mobile, portable VoIP telephone is shorter because more energy is required to support the transmission of packetized voice. Comparing modern commercial mobile and cordless phones, one can see that a DECT telephone using circuit-switched technology has a talk time at least three times longer than a WLAN cordless phone, assuming similar battery capacities. Also, using circuit-switched GSM technology, the transmission range is 10 to 100 times larger than with VoIP-WLAN technology if the telephones have the same battery capacities and talk times (see footnote 1). Taking these facts into consideration, one can say that a circuit-switched telephone call is far more efficient than its VoIP-WLAN counterpart. Because other portable VoIP-based phones have similar operational specifications, we believe that the lack of efficient transmission in the current VoIP architecture is fundamental and holds regardless of implementation details and product models.

Traditionally, VoIP uses speech compression schemes that have been designed with circuit-switched telephone systems such as ISDN or GSM in mind and that have a static frame rate and packet loss robustness. In the Internet, many more transmission parameters need to and can be controlled and managed. Besides the bit rate of the speech coder, these include the frame and packet rate, the loss robustness, and the algorithmic delays. We believe that it is necessary to develop both a speech codec and a transport protocol that are optimized for the path characteristics of the Internet. They shall be aware of the current transmission resources and the perceptual quality of the ongoing telephone call in order to adapt their transmission parameters autonomously.

Footnote 1: These statements are based on a comparison of the specifications of commercial phones. As an exemplary DECT-based cordless phone, we have chosen the Siemens Gigaset S44, which comes with a 750 mAh battery, has a talk time of 10 h, and has a transmission range of up to 300 m. As an example for both GSM and VoIP-WLAN we take the Nokia E70, which has a battery capacity of 970 mAh. In GSM mode, it has a talk time between 3.3 and 6.4 hours and a transmission range of up to 35 km. In VoIP/WLAN mode using IEEE 802.11g, it has a talk time between 3 and 3.2 hours and a transmission range similar to that of the DECT phone.

Recent research results, to which we will refer in the following sections, have shown that current VoIP systems can indeed be significantly enhanced, both in terms of efficiency and quality. To gain efficiency, we cannot remain backward compatible or support classic speech coders and transport protocols such as ITU G.729 or IETF RTP. Instead, we need to break with the past and make a new start. If one took the freedom to design a new VoIP system from scratch, what would it look like?

In the following section we propose a new architecture for an efficient speech transmission system, including a speech coding framework and a transport protocol. Referring to previous research results, we also describe the motivation behind our design decisions. In section 4 we go into detail and describe an interface between speech codec and transport, explaining which parameters are exchanged. Finally, we give an outlook on the upcoming design and implementation of the new speech codec optimized for the Internet and its corresponding transport protocol.

3 Architecture

The next generation VoIP architecture shall consist of a speech codec, optimized for the Internet, and a corresponding transport protocol. The transmission shall be bidirectional, as telephone calls are bidirectional as well. Figure 1 gives a first overview of the components of the architecture. It displays just one side of the transmission; the other side shall be built similarly. In the following, we describe the components individually.

3.1 Quality of the Telephone Call

In order to optimize the transmission of the telephone call, perceptual quality models, which simulate the human rating of the quality of telephone calls, shall be applied. The foremost quality model to mention is the ITU's E-model, which is intended as a planning instrument for telephone systems [5]. It considers most of the parameters that have an effect on the transmission quality, such as the loudness of speech, the noise levels, the loudness of echoes, the speech quality, and the acoustic mouth-to-ear (M2E) delay.
It calculates an overall quality rating called the R factor that ranges from 0 (worst) to 100 (very good). Besides its primary purpose of planning transmission systems, it can also be applied in real time to control a transmission and set the various transmission parameters [6].

In the novel VoIP architecture, a quality model similar to the E-model is of the utmost importance, as it gives an overview of which parameters need to be optimized to achieve a high transmission quality. A trade-off between speech quality and delay also becomes possible. We can also derive the first building blocks of the architecture, namely the control of loudness with an adaptive gain control (AGC), the cancellation of echoes by an acoustic echo cancellation (AEC), and the determination of the intrinsic delay of a telephone, which is the sum of all delays that the telephone adds to the overall mouth-to-ear delay. In order to properly approximate the mouth-to-ear delay, the telephone shall determine the intrinsic latency of the speech. For example, the AEC can be used to determine this delay.

3.2 Speech Codec and Concealment

In recent years many speech codecs, comprising speech encoder, speech decoder, and loss concealment algorithms, have been developed and are applied in PSTN, cellular networks, and VoIP networks. These speech codecs include ITU G.711, ITU G.729, ETSI GSM-EFR, 3GPP AMR, 3GPP AMR-WB, 3GPP2 VMR-WB, and IETF iLBC. They have been optimized to provide superior speech quality, a low algorithmic delay, a low computational complexity, and a high packet loss robustness. At the same time, they require a low transmission bit rate. If this is the case, why should we consider the development of new speech coders when the existing ones are perfect? Three arguments, based on recent research results, have given us the insight that the current speech codecs might not be perfectly matched to the requirements of the Internet.
The first is based on the observation that losses of speech frames can have quite different impacts on the speech quality and that many low-rate speech codecs still allow a high loss rate without an audible degradation of the speech quality. The second is based on the observation that a low bit rate is not the only transmission parameter that is of importance in a packetized network. The third argument simply accounts for the observation that telephones are not only used for human-to-human conversation but increasingly also for listening to and exchanging music.

[Figure 1: Architecture for a Next Generation VoIP transmission system. Blocks shown: analog AD/DA with adaptive gain control, echo cancellation, and noise reduction; encoder with adaptive bit and frame rate; decoder with loss and time concealment; transport protocol including rate control, multipath, multihoming, and NAT traversal; IP backbone (IPv4 or IPv6). Signals exchanged: TX event, RX event, capacity report, loss report, remote delay, acoustic delay, local delay, MTU and packet overhead.]

3.2.1 The unequal impact of losing speech frames

For a long time it has been known that the impact of speech frame losses can differ widely. Some losses, even during voice activity, are hardly audible. Others have a notable negative impact on the speech quality. Only recently has one of the authors investigated this effect systematically [7]. A measurement procedure has been developed to quantify the impact of single packet or speech frame losses. This measurement procedure has been verified by formal listening-only tests to ensure its precision. A metric was also developed that describes the impact of losses on speech quality quantitatively.

Using the importance of speech frames, simulations and listening tests show that many speech frames can be dropped during active voice because the receiver-side loss concealment works so well that the losses are hardly noticeable [7]. These studies were conducted for G.711, G.729, and AMR encoded voice, and loss rates of up to one third (during voice activity) still allow understandable speech transmission. Thus, knowing the importance of speech frames, significant performance gains can be achieved if only important packets are transmitted.

As a result of these research studies, one can say that the speech coders under study still contain a high level of redundancy because many speech frames need not be transmitted (or can be dropped intentionally). Also, the information about the speech is unequally distributed among the speech frames, as some frames are important and others are not. Would it not be better if all speech frames had the same importance and all speech frames contained the same amount of information?
Then, each packet loss would have a similar impact on the degradation of speech quality. This can only be achieved if the size of the speech frames is variable (such as in the 3GPP2 VMR-WB speech codec [3]) or if the rate of frames varies over time. Then, if the current speech contains a lot of new information, the encoder would produce larger or more speech frames; otherwise the encoder would produce smaller or fewer speech frames (see footnote 2). We assume that future speech codecs, optimized for the Internet, will generate speech frames of similar importance. These speech codecs will have variable frame sizes and/or variable frame rates.

3.2.2 Bit rate and frame rate

Many speech codecs of today support multiple bit rates. For example, the AMR codec supports eight compression rates ranging from 4.75 to 12.2 kbps. Others, like the Speex codec, support bit rates ranging from 2.15 to 44.2 kbps. If a VoIP system has to achieve a highly efficient transmission because, for example, bandwidth or energy is scarce, then often the lowest bit rate is chosen. A low bit rate has low bandwidth requirements, and fewer bits per second need less transmission energy.

Again, recent research results have shown that the bit rate is not the only factor that influences the transmission efficiency: the packetization can be of equal importance. Packetization describes how many speech frames, produced by the speech encoder, are put into a VoIP packet before the packet is transmitted. Many speech coders produce a speech frame every 10, 20, or 30 ms. Many VoIP telephones transmit those frames in VoIP packets every 20, 40, or 60 ms. Thus, one VoIP packet contains one or multiple speech frames. If more speech frames are put into one VoIP packet, a longer time has to pass before the VoIP packet can be transmitted. Thus, the algorithmic delay of the packetization increases. On the other hand, if fewer VoIP packets are transmitted per second, the gross bandwidth is reduced because fewer protocol headers such as IP, UDP, and RTP need to be transmitted.
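The trade-off between header overhead and packetization delay can be illustrated with a small calculation. The following is a minimal sketch, not part of the proposed architecture: it assumes the minimum header sizes of IPv4 (20 bytes), UDP (8 bytes), and RTP (12 bytes), ignores link-layer and MAC overhead, and uses AMR at 12.2 kbps with 20 ms frames as an illustrative input. The class and method names are hypothetical.

```java
// Sketch only: gross bit rate versus packetization for a fixed codec rate.
// Assumed header sizes: IPv4 (20) + UDP (8) + RTP (12) bytes without options;
// link-layer and MAC overhead are deliberately ignored here.
final class PacketizationSketch {
    static final int HEADER_BYTES = 20 + 8 + 12;

    /** Gross bit rate at the IP layer in bit/s. */
    static double grossBitRate(double codecBitsPerSecond, double frameMs, int framesPerPacket) {
        double packetsPerSecond = 1000.0 / (frameMs * framesPerPacket);
        return codecBitsPerSecond + packetsPerSecond * HEADER_BYTES * 8;
    }

    /** Time needed to collect all frames of one packet, in ms. */
    static double packetizationDelayMs(double frameMs, int framesPerPacket) {
        return frameMs * framesPerPacket;
    }

    public static void main(String[] args) {
        double codecRate = 12200.0; // AMR at 12.2 kbps (illustrative input)
        double frameMs = 20.0;      // one frame every 20 ms
        for (int k = 1; k <= 3; k++) {
            System.out.printf("%d frame(s) per packet: %.0f bit/s gross, %.0f ms packetization delay%n",
                    k, grossBitRate(codecRate, frameMs, k), packetizationDelayMs(frameMs, k));
        }
    }
}
```

Under these assumptions, going from one to two frames per packet removes 8000 bit/s of header traffic while adding 20 ms of packetization delay.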
If the bandwidth is limited, shall the coding rate be reduced or the packetization increased in order to save bandwidth? Simulations have been conducted in [7] to answer this question. The results showed that the answer depends on the underlying technology. On a traditional circuit-switched connection, which does not transmit packet headers, reducing the bit rate achieves the best quality. On switched Ethernet links using an AMR codec, both bit and packet rate shall be adapted. And, finally, on an IEEE 802.11b wireless LAN using an AMR codec, it is sufficient to decrease only the packet rate to save a significant share of the bandwidth.

Footnote 2: Indeed, the AMR's discontinuous transmission (DTX) algorithm produces smaller and less frequent speech frames during silence. However, during voice activity the frames all have a constant size and are produced every 20 ms.

The results show that speech optimized for the Internet shall not only support a low and variable bit rate. The frame rate is of similar importance. This means that a speech coder shall not produce frames at a constant rate but shall reduce the packet rate whenever this is possible without sacrificing the perceptual service quality. We believe that speech coders optimized for the Internet shall be able to produce speech frames at any point in time. For example, speech frames can be generated whenever a change in the speech characteristics requires it. An Internet speech codec need not follow the strict rule of a constant time interval.

3.2.3 Limitations of the frequency band

Quite frequently it can be seen that mobile phones are not only used for human-to-human communication but for many other purposes, such as listening to music, exchanging ring tones, or listening to the radio. We assume that in the future telephones will also be required to transmit musical content besides speech. Current speech codecs are intended for the transmission of human speech (and background noise). Recently, enhancements such as 3GPP's AMR-WB+, AAC Low Delay, and Fraunhofer's Ultra Low Delay (ULD) codec have added support for the transmission of music in real time. However, current VoIP telephones use codecs that support a narrow frequency band of up to 3700 Hz or a wideband frequency band of up to 7000 Hz. In contrast to the traditional PSTN or cellular systems, VoIP has no technical constraints that limit the frequency spectrum. Instead, an Internet speech codec shall encode speech and music at the highest quality that the current transmission path can support.

3.2.4 Loss and time concealment

Packet loss concealment algorithms are placed at the receiving end of a speech transmission and limit the negative effect of packet losses on the speech quality [9]. They extrapolate the last speech if the current speech frame has not been received. Nowadays, they are often part of a speech codec's standardization document and part of the decoder.
Time concealment tries to cope with the effect of transmission jitter by slowing down or speeding up the current speech [10]. Time concealment algorithms have a positive effect on the service quality, but they come at the cost of additional algorithmic delay. Also, if a speech frame has not been received on time, the decoder cannot decide whether to slow down the speech output or whether to conduct loss concealment. At this moment, the decoder cannot know whether the packet will still arrive or whether it has been lost. On the other hand, if the decoder closely followed the delay process of the transmission path, the overall mouth-to-ear delay could be reduced significantly. The buffering of speech frames in a playout buffer, nowadays included in nearly all VoIP phones, could be omitted. Thus, we suggest including the loss concealment, the time concealment, and the playout buffer in the decoder. The decoder shall then decide to play back the speech frames as they arrive and conceal, slow down, or speed up the speech if required.

3.3 Transport protocol

The Internet-optimized speech codec shall not operate on the traditional RTP/UDP protocol. Instead, it requires a transport protocol that informs it about the current state and quality of the transmission path. Only if the speech codec knows the current properties of the transmission path can it adapt its coding bit rate and packet rate to achieve a high perceptual transmission quality.

Forward Error Correction (FEC) shall not be a functionality provided by the transport protocol. It can be implemented more easily at the encoder. But then the transport protocol shall inform the encoder about the loss process in the network, and the encoder shall change its loss robustness accordingly.

The transport protocol shall take advantage of the bidirectional nature of a telephone call and shall transmit speech frames bidirectionally. This has the advantage that control information, nowadays transmitted in signaling packets like RTCP, can be piggybacked on the data stream. Thus, the packet rate is reduced further.
Also, the transport protocol can implement feedback loops to implement rate and congestion control more easily. Optionally, the transport protocol can support other mechanisms such as multihoming, mobility, multipath, or NAT traversal in order to increase the reliability and quality of the transmission.

4 Interface Description

After the description of the architecture, this chapter describes an interface between the speech codec optimized for the Internet and its corresponding transport protocol. This interface description is required if speech codec and transport protocol are to be developed separately or if codecs or transport protocols shall be exchangeable. In this publication we concentrate on the ongoing transmission of speech. State changes are notified by events. Events carry parameters and data between the codec and the transport protocol. To describe the parameters that are exchanged between both entities, we use a Java-like pseudocode notation.

4.1 Coding to Transport: Transmit Event

The speech coder notifies the transport layer every time a new frame has been generated. Besides the frame data, its length and time stamp are required. The length and time stamp can both vary dynamically because the speech coder might have a variable speech and coding rate (such as the proprietary codec iSAC from Global IP Sound and 3GPP2's VMR-WB).

    class TransmitEvent {
        byte data[];  // speech frame and its length
        int ts;       // time stamp defining when the speech has been produced (local clock)
    }

The time stamp is a novel feature but an important one, because one cannot assume that speech frames are produced at regular intervals. The time stamp shall be taken at the point in time at which the speech has been spoken or produced. Given this information, the transport layer can calculate the current bit and frame rates generated by the encoder. Given a set of transmit events called te[1] to te[n], all time stamps shall be increasing. That means, for all 1 ≤ i < n, te[i].ts < te[i+1].ts. Then, the bit rate and the packet rate are calculated as

$$\mathrm{bitrate} = \frac{8 \sum_{i=1}^{n} \mathrm{te}[i].\mathrm{data.length}}{\mathrm{te}[n].\mathrm{ts} - \mathrm{te}[1].\mathrm{ts}} \qquad (1)$$

$$\mathrm{packetrate} = \frac{n}{\mathrm{te}[n].\mathrm{ts} - \mathrm{te}[1].\mathrm{ts}} \qquad (2)$$

The main task of the transport layer is to transmit the frame data, its length, the time stamp, and its increasing index. These parameters shall be transmitted to one (or multiple) destinations. How the transport layer opens and tears down its connection and whether the transport layer uses multiple destinations to support multicast, multiple paths, or any kind of error correction is beyond the scope of this publication.

A second task is to estimate the variability of the flow of speech frames. Depending on the current situation of the conversation, the variability of speech on the one side and the interactivity of the other side can vary significantly. Thus, the rate and size of speech frames can differ substantially. The transport protocol requires an estimate of the variability of transmission rates in order to calculate a safety margin regarding the transmission capacity.
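The rate calculations of equations (1) and (2) can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not a reference implementation: the paper does not fix the unit of the time stamp, so the sketch assumes milliseconds, and it assumes that data.length counts bytes. The class name TransmitStats and the helper periodSeconds are hypothetical.

```java
// Sketch of equations (1) and (2). Assumption: ts is in milliseconds and
// data.length is in bytes; the paper does not fix the time stamp unit.
final class TransmitStats {
    /** Minimal stand-in for the TransmitEvent of section 4.1. */
    static final class TransmitEvent {
        final byte[] data;
        final int ts;
        TransmitEvent(byte[] data, int ts) { this.data = data; this.ts = ts; }
    }

    /** Observation period te[n].ts - te[1].ts in seconds; checks that time stamps increase. */
    static double periodSeconds(TransmitEvent[] te) {
        if (te.length < 2) throw new IllegalArgumentException("need at least two events");
        for (int i = 0; i + 1 < te.length; i++) {
            if (te[i].ts >= te[i + 1].ts) throw new IllegalArgumentException("time stamps must increase");
        }
        return (te[te.length - 1].ts - te[0].ts) / 1000.0;
    }

    /** Equation (1): bits per second. */
    static double bitRate(TransmitEvent[] te) {
        double period = periodSeconds(te);
        long bytes = 0;
        for (TransmitEvent e : te) bytes += e.data.length;
        return 8.0 * bytes / period;
    }

    /** Equation (2): packets per second. */
    static double packetRate(TransmitEvent[] te) {
        return te.length / periodSeconds(te);
    }

    public static void main(String[] args) {
        TransmitEvent[] te = {
            new TransmitEvent(new byte[32], 0),
            new TransmitEvent(new byte[32], 20),
            new TransmitEvent(new byte[32], 40)
        };
        System.out.println("bit rate:    " + bitRate(te) + " bit/s");
        System.out.println("packet rate: " + packetRate(te) + " packets/s");
    }
}
```

As in the equations, n events are divided by the time spanned by n - 1 intervals, so the result slightly overestimates the rate for short observation windows.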
4.2 Transport to Decoding: Receive Event

Similarly, in the opposite direction, the transport protocol hands over speech frames to the decoder as soon as it receives them. It shall not buffer the speech frames. The event includes the following parameters:

    class ReceiveEvent {
        byte data[];  // speech frame and its length
        int ts;       // time stamp defining when the speech has been spoken (remote clock)
        int jitter;   // time offset as compared to the mean remote round trip time described in section 4.3
        short index;  // increasing index number of the speech frame
    }

The receiver calculates the loss rate using a set of receive events called re[1] to re[n] with, for all 1 ≤ i < n, re[i].ts < re[i+1].ts and re[i].index < re[i+1].index:

$$\mathrm{packet\ loss\ rate} = 1 - \frac{n}{\mathrm{re}[n].\mathrm{index} - \mathrm{re}[1].\mathrm{index} + 1} \qquad (3)$$

Also, using the time stamps, the decoder can calculate the transmission delay variations. This enables it to obtain statistics about the distribution of the transmission delays in order to adapt the playout of the speech frames accordingly.

4.3 Transport and Codec: Round Trip Time Delays

The classic RTP Control Protocol (RTCP) is a signaling protocol that provides feedback on the quality of the transport of multimedia data. The feedback is provided by the RTCP sender and receiver reports, which at regular intervals report information about time stamps, byte and packet counts, loss rates, the smoothed mean deviation of interarrival times (jitter), and the round trip times [2]. Recently, the Extended Reports (XR) have been added to RTCP to report more detailed statistics on the network characteristics or for quality monitoring [8]. The data provided includes which packets have been lost and received, which packets have been received multiple times, and when the packets have been received. Also, it provides the means to gather the network round trip time and the end system delay in order to calculate the acoustic round trip time.

As mentioned above, the mouth-to-ear delay is an important quality metric that influences the service quality of a telephone call and needs to be optimized. More precisely, the metric under optimization is the acoustic round trip time, which is the sum of the mouth-to-ear delays of both transmission directions. Humans cannot distinguish which direction of the transmission contributes to the delay; thus the one-way delay need not be known. The round trip time can be used by both the codec and the concealment. For example, if the RTT is below 150 ms, the codec can increase its algorithmic delay to cope better with packet loss or with delay variations.

Both the codec and the transport protocol inform each other if the mean acoustic delay of their side has changed. The following event format is applied:

    class RTTchange {
        int delay;  // acoustic round trip time on the local or remote side
    }

The sum of both values, from the codec and the transport protocol, is the overall acoustic round trip time and twice the mean mouth-to-ear delay. The events are only triggered if the delay has changed significantly, e.g. by more than 10 ms, to avoid an unnecessarily high number of updates.

4.4 Transmission Capacity

The transport protocol determines the rate at which the coder is allowed to produce data and informs the codec about this rate. As in TCP, the rate is given in bits per round trip time, which means that the coder is allowed to send at most the given number of bits within the next round trip time. The coder is free to choose when it sends the data: at the beginning, continuously during, or at the end of the RTT period. The capacity of the path can change highly dynamically. Thus, an update regarding the transmission rate can occur at any time.
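One way a coder might keep track of such a per-RTT allowance is sketched below. This is an illustration only: the class RttBudget and its methods are hypothetical and not part of the interface described in this paper, and the sketch makes the simplifying assumption that every rate update starts a fresh accounting period.

```java
// Sketch: a per-RTT bit budget as described above. The class and method names
// are hypothetical and not part of the interface defined in this paper.
final class RttBudget {
    private long allowedBits; // bits the coder may send in the current RTT period
    private long usedBits;    // bits already sent in the current RTT period

    RttBudget(long allowedBits) {
        this.allowedBits = allowedBits;
    }

    /** Called when the transport protocol announces a new rate (assumed to start a new period). */
    void update(long newAllowedBits) {
        allowedBits = newAllowedBits;
        usedBits = 0;
    }

    /** Returns true and books the frame if it still fits into the budget. */
    boolean trySend(int frameBytes) {
        long bits = 8L * frameBytes;
        if (usedBits + bits > allowedBits) {
            return false;
        }
        usedBits += bits;
        return true;
    }

    long remainingBits() {
        return allowedBits - usedBits;
    }

    public static void main(String[] args) {
        RttBudget budget = new RttBudget(1000);   // 1000 bits for this RTT period
        System.out.println(budget.trySend(100));  // 800 bits fit into the budget
        System.out.println(budget.trySend(30));   // 240 more bits would exceed it
        System.out.println(budget.remainingBits());
    }
}
```

Because the coder may send at the beginning, during, or at the end of the period, the sketch only counts bits and does not prescribe any timing.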
Depending on the volatility of the coder's rate and the volatility of the network bandwidth, the transport protocol is free to reduce the transmission rate to add a safety margin or to increase the transmission rate in order to achieve a statistical multiplexing gain at the cost of a higher packet loss rate.

TCP sends packets of the size of the maximum transfer unit (MTU) in order to achieve the highest throughput. A service that requires a low transmission delay does not benefit from sending large packets containing a long speech segment but from short packets containing short speech segments. Usually, however, the cost of sending many small packets is much higher than that of sending one larger packet, because each packet has additional packet headers on multiple layers. In addition, the medium access control requires additional resources to transmit a packet. An example given in [7] studied the transmission over IEEE 802.11b at 11 Mbps in DCF mode. Besides the headers of PLCP, MAC, link layer, IP, UDP, and RTP, the cost of the contention period, collisions, and the immediate acknowledgements contributes significantly to the bandwidth requirements of a packet. In total, when transmitting one packet, the physical medium of IEEE 802.11b is busy for about one millisecond in addition to the actual data transmission. Thus, the cost of one packet, regardless of its size, corresponds to about 1000 µs × 11 Mbps = 11,000 bits (1375 bytes), i.e. on the order of 1500 bytes, in the IEEE 802.11b mode.

Packet headers can easily be compressed to a few bytes by using the IETF's IP header compression algorithms. But header compression cannot reduce the overhead of the MAC protocol. In [4], the notion of packet overhead is introduced to determine the amount of overhead required to transmit a packet. It is defined via the gross bandwidth that is required to transmit a packet:

$$t_{overall} = t_{overhead} + \frac{ps_{pdu}}{rate} \quad\Longleftrightarrow\quad t_{overall} \cdot rate = t_{overhead} \cdot rate + ps_{pdu} \qquad (4)$$

Defining $p_{overhead} = t_{overhead} \cdot rate$, the packet overhead is the amount of data that each packet costs in addition to its payload. It measures the gross number of bits on the physical medium.
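Equation (4) and the IEEE 802.11b example can be checked with a short calculation. The sketch below is an illustration under assumptions: it takes the per-packet busy time of about 1000 µs and the 11 Mbps rate from the text, and it uses a hypothetical 32-byte speech frame as payload. The class and method names are not taken from [4].

```java
// Sketch of equation (4): packet overhead expressed in bits on the medium.
// The 1000 microseconds and 11 Mbit/s come from the 802.11b example in the text;
// the 32-byte payload is a hypothetical frame size chosen for illustration.
final class PacketOverheadSketch {
    /** p_overhead = t_overhead * rate, in bits. */
    static double overheadBits(double tOverheadSeconds, double rateBitsPerSecond) {
        return tOverheadSeconds * rateBitsPerSecond;
    }

    /** t_overall = t_overhead + ps_pdu / rate, in seconds (ps_pdu given in bits). */
    static double overallSeconds(double tOverheadSeconds, double psPduBits, double rateBitsPerSecond) {
        return tOverheadSeconds + psPduBits / rateBitsPerSecond;
    }

    public static void main(String[] args) {
        double tOverhead = 1000e-6; // about one millisecond of per-packet busy time
        double rate = 11e6;         // IEEE 802.11b at 11 Mbit/s
        double payloadBits = 32 * 8;

        double pOverhead = overheadBits(tOverhead, rate);
        System.out.printf("packet overhead: %.0f bits (%.0f bytes)%n", pOverhead, pOverhead / 8);
        System.out.printf("medium busy per packet: %.1f microseconds%n",
                overallSeconds(tOverhead, payloadBits, rate) * 1e6);
    }
}
```

With these inputs the per-packet overhead exceeds the payload by more than an order of magnitude, which is why reducing the packet rate pays off on this medium.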
Of course, this value can change with the physical medium, the transmission rate, and many other parameters. If the packet overhead is not precisely known, the transport protocol can estimate it by averaging the packet overhead of various typical and commonly used transmission technologies. For this interface description, we apply the notion of packet overhead: the transport protocol signals the coder the current transmission requirements as

    class Capacity {
        int bps;       // mean bits per second the coder is allowed to produce at most during the next round trip time
        int mtu;       // the maximum transfer unit, the largest packet size a coder is allowed to produce
        int overhead;  // cost of a single packet in bits
    }

Thus, for the transmit events i ∈ {1, ..., n} within a period of t_rtt, the following conditions must hold:

$$\mathrm{te}[i].\mathrm{data.length} \le \mathrm{capacity.mtu} \qquad (5)$$

$$\sum_{i=1}^{n} \left( \mathrm{te}[i].\mathrm{data.length} + \mathrm{capacity.overhead} \right) \le \frac{\mathrm{capacity.bps}}{8} \cdot t_{rtt} \qquad (6)$$

4.5 Transport to coder: Packet losses

In the Internet, packet losses occur during times of congestion. Also, on wireless links transmission errors might cause packet losses. Following the solution given in the RTCP XR receiver reports [8], we report packet losses and packet receptions with a bit vector.

    class PacketLossReport {
        short begin_index;  // the first index number that this event reports on
        short end_index;    // the last index number that this event reports on, plus one
        int vector[];       // the array of integers is read from left to right, in order of increasing index number
                            // (with the appropriate allowance for a wraparound)
    }

The coder requires the report about packet losses because it can then adapt its loss robustness and change the amount of redundancy. If many losses occur, the amount of redundancy shall be increased to help the packet loss concealment algorithm. But if the losses persist for a long time and are bursty, redundancy cannot help and losses will inevitably be audible.

5 Summary and Outlook

We followed these tenets in our architectural redesign of a VoIP transmission system:

1. Develop a speech codec that has a variable bit rate and a variable frame rate.
2. Closely couple the speech codec and the transport protocol to achieve the benefits of a cross-layer optimization strategy. They shall be aware of the current quality of the call in order to manage and control their transmission parameters.
3. Include Forward Error Correction in the encoder.
4. Combine decoding, loss and time concealment, and the playout buffer into a single Internet-enabled speech decoder.
5. Do not stick to a narrow or wide frequency band because, besides speech, music transmission will also be required.

This publication shall help researchers to design and implement a new architecture for the next generation of VoIP transmission systems.
But only once this system has been designed, implemented, and tested can we see to what extent the new architecture enhances the transmission efficiency and perceptual quality as compared to the classic VoIP system.

6 References

[1] 3GPP TR, "Feasibility study for evolved Universal Terrestrial Radio Access (UTRA) and Universal Terrestrial Radio Access Network (UTRAN)," version 7.1.0, Oct. 2006.
[2] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications," IETF RFC 3550, Jul. 2003.
[3] 3GPP2 C.S0052-A v1.0, "Source-Controlled Variable-Rate Multimode Wideband Speech Codec (VMR-WB), Service Options 62 and 63 for Spread Spectrum Systems," 3GPP2 Technical Specification, Apr. 2005.
[4] C. Mahlo, C. Hoene, A. Rostami, A. Wolisz, "Adaptive Coding and Packet Rates for TCP-Friendly VoIP Flows," Proc. 3rd Int. Symp. on Telecommunications (IST2005), Shiraz, Iran, Sep. 2005.
[5] ITU G.107, "The E-model, a computational model for use in transmission planning," Mar. 2005.
[6] C. Hoene, H. Karl, and A. Wolisz, "A perceptual quality model intended for adaptive VoIP applications," International Journal of Communication Systems, Wiley, Aug. 2005.
[7] C. Hoene, "Internet Telephony over Wireless Links," PhD thesis, Technical University of Berlin, TKN, Dec. 2005.
[8] T. Friedman, R. Caceres, A. Clark, "RTP Control Protocol Extended Reports (RTCP XR)," IETF RFC 3611, Nov. 2003.
[9] K. Clüver and P. Noll, "Reconstruction of missing speech frames using sub-band excitation," in IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis, 1996.
[10] Y. J. Liang, N. Färber, and B. Girod, "Adaptive playout scheduling and loss concealment for voice communication over IP networks," IEEE Transactions on Multimedia, vol. 5, no. 4, pp. 532-543, Dec. 2003.