VoIP Playout Buffer Adjustment using Adaptive Estimation of Network Delays

Miroslaw Narbutt and Liam Murphy*
Department of Computer Science, University College Dublin, Belfield, Dublin, IRELAND

Abstract

The poor quality of Voice over IP can be improved by adaptive playout buffering at the receiver. This technique dynamically adapts the playout deadline to network conditions, thus minimizing both late packet loss and buffering time. A standard playout buffer strategy uses an estimate (Exponentially Weighted Moving Average) of the mean and variance of network delay to set the playout deadline. This estimation is characterized by a fixed, constant weighting factor. We show that tuning this parameter so that the strategy works well for all network conditions is not feasible. We therefore propose to extend this standard buffer strategy by replacing the fixed weighting factor with a dynamic one. In our solution, the weighting factor is dynamically adjusted according to the observed delay variations. When these variations are high (which implies that the network conditions are changing), the parameter is set low, and vice versa. This allows rapid adaptation to network variations and reduces the frequency of late packets (or buffering time). Simulations and experimental results show that with our strategy, the trade-off between buffering delay and late packet loss at the receiver is improved significantly.

1. INTRODUCTION

A typical VoIP application buffers incoming packets and delays their playout in order to compensate for variable network delays (jitter). This allows the slowest packets to arrive in time to be played out. Fluctuating end-to-end network delays may cause playout times to increase to a level that is irritating to users (when the buffer is too big) or may cause packet losses due to late arrivals (when the buffer is too small). The two conflicting goals of minimizing buffering time and minimizing late packet loss have engendered various playout algorithms.
The need for adaptive buffering arises when the end-to-end delay is high (close to or above the interactivity constraint of 100-150 ms) and when the delay is unknown, so that the receiver does not know how to select appropriate playout times [1]. An adaptive playout mechanism makes it possible to balance the length of the buffer, a major addition to end-to-end delay, against the possibility of packet loss. Generally, a good playout algorithm should achieve the best possible trade-off between loss and delay. In this paper we present a new playout buffer algorithm that significantly improves this trade-off.

* mireknarbutt@yahoo.com, liam.murphy@ucd.ie

In section two the motivation for our work is demonstrated and the basic idea of the proposed algorithm is outlined. In section three the new algorithm is described and potential improvements are outlined. Its effectiveness is then evaluated through simulations using a network emulator (section four) and through experiments on a real network (section five). In section six the effects of the new buffering scheme on subjective quality are addressed. Finally, in section seven conclusions are drawn.

2. MOTIVATION

Most of the adaptive playout algorithms described in the literature perform continuous estimation of the network delay and its variation to dynamically adjust the talkspurt playout time. The standard adaptive playout algorithm [2] is based on Jacobson's work on TCP round-trip time estimation [3]. The algorithm estimates two statistics, the delay itself and its variance, and uses them to calculate the playout time. Both estimates are of the form:

$$\hat{d}_i = \alpha \hat{d}_{i-1} + (1-\alpha)\, n_i, \qquad \hat{v}_i = \alpha \hat{v}_{i-1} + (1-\alpha)\, |\hat{d}_i - n_i|$$

where $\hat{d}_i$ and $\hat{v}_i$ are the i-th estimates of the delay and its variance respectively, while $n_i$ is the i-th packet delay. The parameter α has a critical impact on the rate of convergence of this estimation. Following the claim made in [2], and in accordance with NeVoT [4], the weighting factor α is fixed and chosen to be high (α = 0.998002) to limit the sensitivity of the estimation to short-term packet jitter. By experimenting with different values of α we observed that such a high value of α is good only when network conditions are stable (delay and jitter are constant). When network conditions change rapidly (sudden increases or decreases in delay), smaller values of α (0.7, 0.8, 0.9) were more appropriate. Figure 1 illustrates that as α decreases, the calculated playout times (solid lines) track variations of network delays (dots) more closely. As a result fewer packets arrive too late (from 3.5% down to 1%) and the average buffering time is smaller (from 7.ms to 7.ms).

Fig. 1. Calculated playout times for two different α
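The fixed-weight estimator described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `ewma_delay_estimates` and the initialisation (first delay as the initial mean, zero initial variation) are assumptions, since the text does not specify them.

```python
def ewma_delay_estimates(delays, alpha):
    """Run the fixed-alpha EWMA over per-packet network delays n_i.

    Returns two lists holding the running delay estimate d_i and the
    running variation estimate v_i after each packet.
    """
    d = delays[0]   # assumed initialisation: first observed delay
    v = 0.0         # assumed initialisation: no variation seen yet
    d_hist, v_hist = [], []
    for n in delays:
        d = alpha * d + (1.0 - alpha) * n            # delay estimate
        v = alpha * v + (1.0 - alpha) * abs(d - n)   # variation estimate
        d_hist.append(d)
        v_hist.append(v)
    return d_hist, v_hist
```

Feeding a step change in delay to this function with a small and a large `alpha` shows the behaviour discussed in this section: the smaller weight follows the step within a few packets, while the larger one barely moves.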

Unfortunately, finding a single setting of the parameter α that works well for all network conditions is not an easy (or even a feasible) problem. Figures 2 and 3 show that there is no optimal fixed value of α when network conditions vary in time.

Fig. 2, 3. Calculated playout times for various values of α

When jitter is small and fluctuations in the end-to-end delays are large (Fig. 2), the best results are achieved when α is small. In this case both the packet loss ratio and the average buffering time are relatively small (3.7% of lost packets and 3ms of buffering time). When α is set to 0.998, the packet loss ratio is high (11.7%), and the buffering time is much larger than necessary (3.ms). On the other hand, when jitter is large but the average network delay is constant (Fig. 3), the best results are achieved when α = 0.998. In this case, the packet loss ratio is below 1%. When α is small, the algorithm is too sensitive to short-term delay jitter, which causes larger late packet loss (.7%). Since there is no optimal fixed value of α that works well for all network conditions, we claim that the accuracy of the estimates can be greatly improved by choosing the value of α dynamically.

3. PLAYOUT BUFFER ALGORITHM WITH ADAPTIVE α

The idea behind our algorithm is to adaptively adjust the value of α depending on the variation in the network delays (α is set high when end-to-end delay variations are small, and vice versa). This new, dynamic parameter α (recomputed with each incoming packet) can be used to perform continuous estimation of the network delay and its variation in the same way as before. Let $\alpha_i$ be a dynamic parameter based on new estimates of the variance $\hat{v}_i$ of the end-to-end delays between source and destination:

$$\alpha_i = f(\hat{v}_i),$$

where the function $f(\hat{v}_i)$ was chosen experimentally to maximize the performance of our algorithm over a large set of network traces. The dynamic version of the parameter α is now used to maintain adaptive estimates of the average delay and its variation:

$$\hat{d}_i = \alpha_i \hat{d}_{i-1} + (1-\alpha_i)\, n_i, \qquad \hat{v}_i = \alpha_i \hat{v}_{i-1} + (1-\alpha_i)\, |\hat{d}_i - n_i|$$

Finally, the playout time $p_i$ at which the i-th packet, assumed to be the first packet in a talkspurt, is played at the destination is calculated as follows:

$$p_i = t_i + \hat{d}_i + \beta \hat{v}_i$$

The parameter β controls the delay/packet loss ratio. The larger the coefficient, the more packets are played out, at the expense of longer delays. Any subsequent packets of that talkspurt are played out at a rate equal to the generation rate at the sender, that is,

$$p_j = p_i + t_j - t_i$$

[Figure 4: timeline of a packet at the SENDER (sending times $t_i$, $t_j$), the RECEIVER (reception time) and the SPEAKER (playout times $p_i$, $p_j$), showing the network delay $n_i$, the buffering delay and the playout delay]

Fig. 4. Playout time estimation.

This mechanism uses the same playout delay throughout a given talkspurt but permits different playout delays for different talkspurts. The variation of the playout delay introduces artificially elongated or shortened silence periods between successive talkspurts.
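The dynamic-weight scheme and the talkspurt playout rule can be sketched as below. The text only says that f was chosen experimentally, so `hypothetical_f`, its constants, the short-term variation tracker `v_hat` with weight `gamma`, and the default `beta` are all assumptions made purely for illustration; only the update equations and the two playout formulas come from the text.

```python
def hypothetical_f(v_hat, alpha_min=0.7, alpha_max=0.998, scale=10.0):
    """Illustrative decreasing map from delay variation to alpha.

    NOT the experimentally chosen f of the paper: it only reproduces the
    stated behaviour (high variation -> low alpha, and vice versa).
    """
    return alpha_min + (alpha_max - alpha_min) / (1.0 + v_hat / scale)


class DynamicAlphaEstimator:
    def __init__(self, first_delay, beta=4.0, gamma=0.9):
        self.d = first_delay      # delay estimate
        self.v = 0.0              # variation estimate
        self.v_hat = 0.0          # assumed short-term variation tracker
        self.prev = first_delay   # previous packet delay
        self.beta = beta          # assumed safety factor
        self.gamma = gamma        # assumed weight of the tracker

    def update(self, n):
        """Process one packet delay n_i and return the alpha_i that was used."""
        self.v_hat = self.gamma * self.v_hat + (1.0 - self.gamma) * abs(n - self.prev)
        self.prev = n
        alpha = hypothetical_f(self.v_hat)
        self.d = alpha * self.d + (1.0 - alpha) * n
        self.v = alpha * self.v + (1.0 - alpha) * abs(self.d - n)
        return alpha

    def playout_time(self, t_first):
        """p_i = t_i + d_i + beta * v_i for the first packet of a talkspurt."""
        return t_first + self.d + self.beta * self.v


def talkspurt_playout_times(p_first, send_times):
    """p_j = p_i + t_j - t_i for every packet of one talkspurt."""
    t_first = send_times[0]
    return [p_first + (t - t_first) for t in send_times]
```

In use, `update` would be called for every received packet, while `playout_time` would be evaluated only at the start of a talkspurt, so that the playout delay stays constant within the talkspurt.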

4. BUFFERING PERFORMANCE TESTS THROUGH NETWORK EMULATIONS

We tested the performance of the new algorithm through network emulations. For the tests we chose the NISTNET.1. network emulation software [5] and modeled various delay patterns (Fig. 5, 6, 7, 8) using its default Pareto distribution.

Fig. 5. First delay pattern - delay and jitter are constant (delay = 1ms, jitter = 5 ms).
Fig. 6. Second delay pattern - delay is constant and jitter varies in time (delay = 1ms, jitter jumps between, 1,, 3,, 5 ms every minute).
Fig. 7. Third delay pattern - delay varies in time, jitter is constant (delay jumps between 1, 15 and ms every minute, jitter = 3ms).
Fig. 8. Fourth delay pattern - delay and jitter vary in time (delay jumps between 5, 1 and 15ms, jitter jumps between, 1,, 3,, 5 ms every 1 seconds).

During the experiments we used two voice sources (with and without hangover time). Following ITU-T recommendation P.59 [6], human speech was modeled as a process that alternates between talkbursts and silence periods that follow exponential distributions (Fig. 9, 10) with means of 7 and 59ms without hangover time, or 1 and 157ms with hangover time, respectively. In our model voice packets were generated every 3ms. No packets were generated during silence periods. The total duration of each simulation was 1 hour.
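The on/off voice source described above can be approximated with a short generator. This is a sketch under stated assumptions: the function name, the rule of sending at least one packet per talkburst, and every parameter value passed in are hypothetical; the means and packet interval should be taken from the experiment being reproduced.

```python
import random


def generate_send_times(duration_ms, mean_talk_ms, mean_gap_ms, packet_ms, seed=0):
    """Return a list of talkspurts, each a list of packet sending times (ms).

    Talkburst and silence lengths are drawn from exponential distributions;
    packets are emitted every packet_ms during talkbursts only.
    """
    rng = random.Random(seed)
    t = 0.0
    spurts = []
    while t < duration_ms:
        talk = rng.expovariate(1.0 / mean_talk_ms)
        n_packets = max(1, int(talk // packet_ms))   # assumed: at least one packet
        spurt = [t + k * packet_ms for k in range(n_packets)]
        spurt = [s for s in spurt if s < duration_ms]
        if spurt:
            spurts.append(spurt)
        # Silence period: no packets are generated.
        t += talk + rng.expovariate(1.0 / mean_gap_ms)
    return spurts
```

The first sending time of each inner list marks a talkspurt boundary, which is where an adaptive playout algorithm is allowed to change its playout delay.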

[Figures 9 and 10: histograms of the talkburst distribution and the gap distribution for each voice source; axes: number of talkbursts / number of gaps vs. duration [ms]]

Fig. 9. Talkbursts and gaps generated by the voice source without hangover time.
Fig. 10. Talkbursts and gaps generated by the voice source with hangover time.

In order to compare the performance of the new playout algorithm with the basic one, we recorded network delays at the receiver and processed that data with a program that simulated the behaviour of the two algorithms. The delay/packet loss ratio was controlled by different values of the β factor (<β<). The figures below show the delay/loss trade-off of both algorithms for different network conditions and the two voice sources. The solid lines represent the performance of the standard algorithm (four different fixed values of α: 0.7, 0.8, 0.9, 0.998), while the lines with circles represent the new algorithm with dynamic α.

[Figures 11 to 14: late packet loss rate [%] vs. average buffering delay [ms]; one curve per fixed α and one for dynamic α]

Fig. 11, 12. Algorithms performance comparison - delay and jitter constant (voice source w. and w/o hangover time).
Fig. 13, 14. Algorithms performance comparison - average delay is constant but jitter varies in time (voice source w. and w/o hangover time).
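One way to obtain a single point of such a delay/loss curve from a recorded trace is sketched below. The helper names are hypothetical, and two simplifications are assumed: buffering delay is averaged over packets that arrive in time, and `sweep_beta` applies a playout delay per packet rather than per talkspurt.

```python
def tradeoff_point(network_delays, playout_delays):
    """Return (late loss in percent, average buffering delay).

    A packet is late when its network delay exceeds the playout delay
    applied to it; otherwise it waits (playout - network) in the buffer.
    """
    late = 0
    buffering = []
    for n, p in zip(network_delays, playout_delays):
        if n > p:
            late += 1
        else:
            buffering.append(p - n)
    loss_pct = 100.0 * late / len(network_delays)
    avg_buffering = sum(buffering) / len(buffering) if buffering else 0.0
    return loss_pct, avg_buffering


def sweep_beta(network_delays, d_est, v_est, betas):
    """Trace the trade-off curve by varying beta in d + beta * v."""
    curve = []
    for beta in betas:
        playout = [d + beta * v for d, v in zip(d_est, v_est)]
        curve.append((beta,) + tradeoff_point(network_delays, playout))
    return curve
```

Plotting the loss percentage against the average buffering delay for each `beta` gives one curve; repeating this for each estimator (fixed or dynamic α) gives a family of curves that can be compared directly.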

[Figures 15 to 18: late packet loss rate [%] vs. average buffering delay [ms]; one curve per fixed α (0.7, 0.8, 0.9, 0.998) and one for dynamic α]

Fig. 15, 16. Algorithms performance comparison - delay varies in time and jitter is constant (voice source w. and w/o hangover time).
Fig. 17, 18. Algorithms performance comparison - delay and jitter vary in time (voice source w. and w/o hangover time).

From the figures above it can be seen that when network conditions were stable (jitter and delay were constant) the proposed algorithm performed at least as well as the algorithm with fixed α. When network conditions were changing (jitter and delay varied in time), our algorithm performed better for all delay patterns.

5. Experimental measurements and algorithm comparison

To examine the performance of the new playout algorithm, two packet audio terminals were built based on the OpenH323 source code [7]. One terminal was set up at the Performance Engineering Laboratory in Dublin (IRELAND), and the other at the Computer Center of the Lodz University of Technology - LODMAN (POLAND). The distance between sender and receiver was 1 hops and the interconnecting links had a bandwidth of between and 155 Mbits per second. The clocks of the terminals were synchronized using NTP software, which is sufficiently precise for our purposes. For the experiments the simplest encoding scheme, G.711 A-law (PCM), was chosen. The terminal encoder sent one frame of audio ( bytes) every 3 ms. As an input signal a sequence of alternating audio signals and silence periods was used (following ITU-T recommendation P.59, without hangover time) and no audio packets were generated during

silence periods. During one hour of transmission all experimental data (arrival times, timestamps, sequence numbers, and marker bits) were collected at the receiving host. Fig. 19 shows the delays experienced by audio packets during the one-hour experiment, together with a histogram of these delays. The delay/loss trade-off of the two algorithms is shown in Fig. 20.

Fig. 19. Delays experienced by audio packets and a histogram of these delays.
Fig. 20. Algorithm performance comparison. [Curves for fixed α (0.7, 0.8, 0.9, 0.998) and dynamic α; horizontal axis: average buffering delay [ms]]

The comparison of calculated playout times for the whole network trace is shown in Fig. 21 and for the 5 seconds of transmission in Fig. 22.

Fig. 21, 22. Calculated playout times for fixed and dynamic α and dynamic α vs. time.

6. Effects of the new buffering scheme on subjective quality

To estimate the subjective quality of packet voice for various α, the E-Model (ITU-T Recommendation G.107) [8] was used. The E-Model combines individual impairments (loss, delay, echo, codec type, noise, etc.) due to both the signal's properties and the network characteristics into a single R-rating that ranges from 0 to 100. Everything below 50 is clearly unacceptable and everything above 94.15 is unobtainable in narrowband telephony. The R-rating is a linear combination of the individual impairments and is given by the following formula:

$$R = (R_o - I_s) - I_d - I_e + A$$

From our point of view the delay impairment $I_d$ (which captures the effect of delay) and the equipment impairment $I_e$ (which captures the effect of information loss due to the encoding scheme and packet loss) are the most interesting. The other terms, namely the loud connection and quantization impairment $I_s$, the basic signal-to-noise ratio $R_o$, and the advantage factor $A$ (zero in the fixed Internet), do not depend on the transmission parameters. Therefore, we can write the R-rating (for undistorted G.711 audio) as:

$$R = 94.15 - I_d - I_e$$

The figures below show, for several encoders and different levels of echo cancellation, how the call quality decreases with one-way delay (Fig. 23) and how the equipment impairment increases with increasing packet loss ratio (Fig. 24).

Fig. 23. Transmission rating factor R as a function of the one-way delay [9]. [R vs. one-way delay [ms]; one curve per TELR value]
Fig. 24. Equipment impairment $I_e$ as a function of the packet loss [10]. [$I_e$ vs. packet loss [%]; curves for G.711 w/o PLC, G.723.1, GSM, G.729A, G.711 with PLC under bursty loss, and G.711 with PLC under random loss]

Based on the R-rating, we assessed transmission quality and subjective user satisfaction over a one-hour period. First we calculated average playout delays and average packet loss for 1 seconds periods. Assuming G.711 encoding with PLC and echo cancellation implemented (TELR = 55, 5), we calculated the delay impairments $I_d$ and equipment impairments $I_e$ and finally found the time-varying quality of the call.

[Figures 25 and 26: charts of the share of time spent in each user satisfaction category (not recommended, almost all users dissatisfied, many users dissatisfied, some users dissatisfied, satisfied, very satisfied) for fixed and dynamic α]

Fig. 25. User satisfaction for various α when TELR=5
Fig. 26. User satisfaction for various α when TELR=55
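The reduced rating and its mapping to the satisfaction categories named in the figure legends can be sketched as follows. The function names are hypothetical, the default `r_max` of 94.15 is taken to be the narrowband maximum, and the band thresholds (90, 80, 70, 60, 50) are an assumption based on the usual ITU-T G.109 quality categories rather than something stated in this paper. The impairments $I_d$ and $I_e$ are inputs here; deriving them from delay, TELR and packet loss requires the curves of Figs. 23 and 24.

```python
def r_rating(i_d, i_e, r_max=94.15):
    """Reduced E-Model rating: R = r_max - I_d - I_e."""
    return r_max - i_d - i_e


def satisfaction(r):
    """Map an R value to a user satisfaction category (assumed G.109 bands)."""
    bands = [
        (90.0, "very satisfied"),
        (80.0, "satisfied"),
        (70.0, "some users dissatisfied"),
        (60.0, "many users dissatisfied"),
        (50.0, "almost all users dissatisfied"),
    ]
    for threshold, label in bands:
        if r >= threshold:
            return label
    return "not recommended"
```

Applying `satisfaction(r_rating(i_d, i_e))` to each averaging period and counting the labels gives the share of time spent in each category, which is the quantity the user satisfaction charts report.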

Figures 25 and 26 show user satisfaction levels (based on calculated R values) for two levels of echo cancellation (TELR=55, 5) and for various values of α (0.8, 0.9, 0.998, dynamic α). As we can see, when TELR=5 the adaptive buffering scheme with dynamic α was the best at maximizing R values and thus user satisfaction (7% of time with very good results). Second best was adaptive buffering with fixed α=0.8 (7% of time with very good results). When the echo cancellation level was TELR=5dB, adaptive buffering with dynamic α was again the best at maximizing user satisfaction (79% of time with good results), while with fixed α=0.8 good results were achieved only during 5% of time.

7. CONCLUSIONS

The proposed playout buffer algorithm predicts and follows network delays more efficiently than the basic algorithm with fixed α. We compared the two algorithms through simulations and experiments on real networks using realistic voice sources and various delay patterns. The results show that with dynamic α one can achieve a better delay/loss trade-off and thus better call quality and user satisfaction.

ACKNOWLEDGMENT

The support of the Research Innovation Fund of Enterprise Ireland is gratefully acknowledged.

REFERENCES

1. A. P. Markopoulou, F. A. Tobagi, and M. J. Karam, Assessment of VoIP Quality over Internet Backbones, in Proceedings of the IEEE Infocom, 2002.
2. Ramachandran Ramjee, Jim Kurose, Don Towsley, and Henning Schulzrinne, Adaptive playout mechanisms for packetized audio applications in wide-area networks, in Proceedings of the Conference on Computer Communications (IEEE Infocom), Toronto, Canada, 1994.
3. V. Jacobson, Congestion avoidance and control, in Proceedings of ACM SIGCOMM Conference, Stanford, 1988.
4. H. Schulzrinne, Voice Communication Across the Internet: a Network Voice Terminal, Technical Report, Dept. of Computer Science, U. Massachusetts, Amherst MA, July 1992.
5. Source code available from: www.antd.nist.gov
6. ITU-T Recommendation P.59, Telephone transmission quality objective measuring apparatus: Artificial conversational speech, Geneva, March 1993.
7. Source code available from www.openh323.org
8. ITU-T Recommendation G.107, The E-model, A Computational Model for Use in Transmission Planning, 1998.
9. Telecommunications Industry Association, Voice Quality Recommendations for IP Telephony, TIA/EIA/TSB116, 2001.
10. ITU-T Recommendation G.113, "General Characteristics of General Telephone Connections and Telephone Circuits - Transmission Impairments", February 1996.