New Models for Perceived Voice Quality Prediction and their Applications in Playout Buffer Optimization for VoIP Networks

Dr. Lingfen Sun
Prof Emmanuel Ifeachor
University of Plymouth, United Kingdom
{L.Sun; E.Ifeachor}@plymouth.ac.uk

Outline

- Background
  o Speech quality for VoIP networks
  o Current status
  o Aims of the project
- Main Contributions
  o Novel non-intrusive voice quality prediction models
  o Novel perceptual-based speech quality optimization (e.g. jitter buffer optimization) mechanism
- Conclusions and Future Work

ICC 2004, Paris France, 20-24 June 2004

Background
Speech Quality for VoIP Networks

[Figure: end-to-end path SCN, gateway, IP network, gateway, SCN. Labels: reference speech, degraded speech, intrusive measurement (MOS), non-intrusive measurement (MOS), end-to-end perceived speech quality]

SCN: Switched Comm. Networks (PSTN, ISDN, GSM)

- VoIP speech quality: end-user perceived quality (MOS), an important metric. Affected by IP network impairments and other impairments.
- Voice quality measurement: subjective (MOS) or objective (intrusive or non-intrusive)

Current Status and Problems

- Lack of an efficient non-intrusive speech quality measurement method
  o E-model (a complicated computational model)
  o Based on subjective tests to derive models/parameters, which is time-consuming and expensive
  o Only limited models exist
- Lack of perceptual optimization control methods
  o Existing methods are based only on individual network parameters for buffer optimization and QoS control purposes
  o They are not perceptual-based optimization control

Aims of the Project

[Figure: sender (voice source, encoder, packetizer), IP network, receiver (depacketizer, jitter buffer, decoder, voice receiver). Labels: end-to-end perceived voice quality (MOS), non-intrusive measurement (MOS)]

- To develop novel and efficient methods/models for non-intrusive quality prediction
- To apply the models to perceptual-based optimization control (e.g. buffer optimization or adaptive sender-bit-rate QoS control)

Novel Non-intrusive Voice Quality Prediction

[Figure: Intrusive method: reference speech and degraded speech (after the VoIP network: packet loss, delay, codec) feed PESQ; MOS(PESQ) and delay feed the E-model to give Measured MOSc. Non-intrusive method: a new model (regression or ANN models) gives Predicted MOSc]

- Uses intrusive quality measurement (e.g. PESQ) to predict voice quality non-intrusively, which avoids subjective tests.
- A generic method which can be applied to audio, image and video.

New Structure to Obtain MOSc

[Figure: reference speech and degraded speech feed PESQ; MOS (PESQ) is converted MOS to R to I_e; end-to-end delay feeds the delay model to give I_d; I_e and I_d feed the E-model to give MOSc]

- PESQ can only predict one-way listening speech quality (expressed as MOS).
- With a new combined PESQ/E-model structure, a conversational speech quality (MOSc) can be obtained and used as the Measured MOSc.

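The MOS to R to I_e step in this structure can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the ITU-T G.107 mapping between the rating factor R and MOS, a default basic rating of 93.2 with all other impairments at their defaults, and zero delay impairment for the listening-only PESQ score. None of these values are stated on the slide, and the function names are hypothetical.

```python
R0 = 93.2  # assumed default basic rating (not stated on the slide)


def mos_from_r(r: float) -> float:
    """ITU-T G.107 mapping from rating factor R to MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


def r_from_mos(mos: float, tol: float = 1e-6) -> float:
    """Invert mos_from_r by bisection on a range where it is increasing."""
    lo, hi = 6.5, 100.0  # mos_from_r rises monotonically on this interval
    mos = min(max(mos, mos_from_r(lo)), mos_from_r(hi))  # clamp to reachable MOS
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mos_from_r(mid) < mos:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ie_from_pesq_mos(pesq_mos: float) -> float:
    """I_e implied by a listening-only PESQ MOS (delay impairment taken as 0)."""
    return R0 - r_from_mos(pesq_mos)
```

Bisection is used here only because the cubic mapping has no convenient closed-form inverse; any root finder restricted to the increasing range would serve equally well.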
Regression based Models (1)

[Figure (a): codec and packet loss feed the I_e model, delay (d) feeds the I_d model; I_e and I_d feed the E-model to give MOSc]
[Figure (b): speech database, encoder, loss model and decoder produce degraded speech; PESQ/PESQ-LQ compares it with reference speech; MOS (PESQ) is converted MOS to R to measured I_e; a nonlinear regression model (I_e model) gives predicted I_e]

- Nonlinear regression models are derived for I_e based on PESQ/PESQ-LQ.
- I_e is then combined with I_d to obtain MOSc.

Regression based Models (2)

I_e can be modelled by a logarithmic fitting function of the form

$I_e = a \ln(1 + b\rho) + c$

where ρ is the packet loss rate (%).

Parameters for different codecs (PESQ)

Parameter   AMR(H)   AMR(L)   G.729   G.723.1   iLBC
a           16.68    30.86    21.14   20.06     12.59
b*100       30.11     4.26    12.73   10.24      9.45
c           14.96    31.66    22.45   25.63     20.42

Regression Models for AMR (12.2 kb/s)

e.g. for AMR (12.2 kb/s),

$I_e = 16.68 \ln(1 + 0.3011\rho) + 14.96$

The goodness of fit is: SSE = 2.83 and $R^2 = 0.998$

[Figure: MOS vs. packet loss and delay]

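The regression model and its use in predicting MOSc from packet loss and delay can be sketched as below. The (a, b, c) values come from the parameter table for the PESQ-based models, and the delay impairment is the I_d expression used later in the deck's impairment function. The combination R = 93.2 - I_d - I_e and the R to MOS mapping are assumptions taken from the ITU-T G.107 defaults, not from the slides, and the function names are hypothetical.

```python
import math

# (a, b, c) per codec from the PESQ-based table; b is the tabulated b*100 / 100.
IE_PARAMS = {
    "AMR(H)": (16.68, 0.3011, 14.96),
    "AMR(L)": (30.86, 0.0426, 31.66),
    "G.729": (21.14, 0.1273, 22.45),
    "G.723.1": (20.06, 0.1024, 25.63),
    "iLBC": (12.59, 0.0945, 20.42),
}


def ie_model(codec: str, loss_pct: float) -> float:
    """I_e = a*ln(1 + b*rho) + c, with rho the packet loss rate in percent."""
    a, b, c = IE_PARAMS[codec]
    return a * math.log(1.0 + b * loss_pct) + c


def id_model(delay_ms: float) -> float:
    """I_d = 0.024*d + 0.11*(d - 177.3)*H(d - 177.3), with d in milliseconds."""
    extra = 0.11 * (delay_ms - 177.3) if delay_ms >= 177.3 else 0.0
    return 0.024 * delay_ms + extra


def predicted_mosc(codec: str, loss_pct: float, delay_ms: float, r0: float = 93.2) -> float:
    """Predicted conversational MOS; r0 and the R-to-MOS mapping are assumed G.107 defaults."""
    r = r0 - id_model(delay_ms) - ie_model(codec, loss_pct)
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6
```

For example, `predicted_mosc("AMR(H)", 2.0, 150.0)` evaluates the AMR (12.2 kb/s) model at 2% loss and 150 ms delay; sweeping both arguments gives a surface of MOS against packet loss and delay.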
Perceptual-based Buffer Optimization

Motivation:
- Existing buffer algorithms are based only on individual network parameters (e.g. delay or loss), targeting only minimum average delay or minimum late-arrival loss, not maximum MOS.
- There is a need to design a buffer algorithm that achieves optimum perceived speech quality.

Contribution:
- A perceptual-based optimization jitter buffer algorithm
  o Use regression based models for buffer optimization
  o Use a minimum impairment criterion instead of the traditional maximum MOS score
  o A Weibull delay distribution based on trace analysis
  o A perceptual-based optimization of playout buffer algorithm

Impairment Function I_m

Define: impairment function I_m

$$I_m = f(d, \rho) = I_d + I_e = 0.024\,d + 0.11\,(d - 177.3)\,H(d - 177.3) + a \ln(1 + b\rho)$$

where a and b are codec related parameters, and $H(x) = 0$ if $x < 0$, $H(x) = 1$ if $x \ge 0$.

$$\rho = \rho_n + \rho_b = \rho_n + (100 - \rho_n)\,P(X > d) = \rho_n + (100 - \rho_n)\,e^{-((d - \mu)/\alpha)^{\gamma}}$$

[Figure: Weibull delay distribution, with playout delay d and buffer loss ρ_b marked]

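A direct transcription of the impairment function into code may help make the two loss terms concrete. This is a sketch only: the function names are hypothetical, delays are assumed to be in milliseconds and losses in percent, and the tail probability is set to 1 when d is at or below the Weibull location µ (every packet arrives later than d), which is an assumption about the boundary case.

```python
import math


def late_fraction(d: float, mu: float, alpha: float, gamma: float) -> float:
    """Weibull tail P(X > d): share of arriving packets later than playout delay d."""
    if d <= mu:
        return 1.0  # assumed boundary case: nothing arrives before mu
    return math.exp(-(((d - mu) / alpha) ** gamma))


def impairment(d: float, rho_n: float, a: float, b: float,
               mu: float, alpha: float, gamma: float) -> float:
    """I_m = I_d + I_e for playout delay d and network loss rho_n (percent)."""
    # Total loss = network loss + buffer (late-arrival) loss among received packets.
    rho = rho_n + (100.0 - rho_n) * late_fraction(d, mu, alpha, gamma)
    i_d = 0.024 * d + (0.11 * (d - 177.3) if d >= 177.3 else 0.0)
    i_e = a * math.log(1.0 + b * rho)
    return i_d + i_e
```

Raising d shrinks the buffer loss term but grows the delay term, which is the trade-off the impairment function is meant to expose.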
Minimum Impairment Criterion

Define: minimum impairment criterion
Given: network delay d_n, network loss ρ_n and codec type
Estimate: an optimized playout delay d_opt
Such that: I_m reaches its minimum.

[Figure: I_m for candidate playout delays d_1, d_2, d_3, d_4, with the minimum I_m marked]

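One simple way to apply the criterion is an exhaustive search over candidate playout delays. The sketch below assumes a fixed grid between the Weibull location µ and an upper bound; the slides do not say how the search is carried out, so the grid, the bound `d_max`, the step size and the function names are all hypothetical.

```python
import math


def impairment(d, rho_n, a, b, mu, alpha, gamma):
    """I_m = I_d + I_e, with buffer loss taken from the Weibull tail beyond d."""
    tail = 1.0 if d <= mu else math.exp(-(((d - mu) / alpha) ** gamma))
    rho = rho_n + (100.0 - rho_n) * tail
    i_d = 0.024 * d + (0.11 * (d - 177.3) if d >= 177.3 else 0.0)
    return i_d + a * math.log(1.0 + b * rho)


def optimal_playout_delay(rho_n, a, b, mu, alpha, gamma, d_max=400.0, step=1.0):
    """Grid search for the playout delay that minimizes I_m; returns (d_opt, I_m)."""
    best_d, best_im = mu, math.inf
    steps = int((d_max - mu) / step)
    for k in range(steps + 1):
        d = mu + k * step
        im = impairment(d, rho_n, a, b, mu, alpha, gamma)
        if im < best_im:
            best_d, best_im = d, im
    return best_d, best_im


# Hypothetical inputs: AMR (12.2 kb/s) parameters with made-up Weibull values.
d_opt, im_min = optimal_playout_delay(rho_n=1.0, a=16.68, b=0.3011,
                                      mu=40.0, alpha=10.0, gamma=1.2)
```

A grid is the bluntest option; because I_m typically falls steeply and then rises slowly with d, a one-dimensional minimizer could replace it if the search cost matters.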
Perceptual-based Optimization Buffer Algorithm

For every packet i received, calculate network delay n_i
    if mode == SPIKE then
        if n_i ≤ tail*old_d then mode = NORMAL
    elseif n_i > head*d_i then
        mode = SPIKE; old_d = d_i
    else
        - update delay records for the past W packets
    endif

At the beginning of a talkspurt
    if mode == SPIKE then
        d_i = n_i
    else
        - obtain (µ, α, γ) for Weibull distribution for the past W packets
        - search playout d which meets minimum I_m criterion
    endif

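The control flow of the algorithm can be sketched as a small state machine. This is an illustration under stated assumptions rather than the authors' implementation: the class and method names are hypothetical, the `head`, `tail` and window defaults are placeholder values not given on the slide, and the Weibull fit plus minimum-I_m search is abstracted into a caller-supplied `choose_delay` function.

```python
from collections import deque


class PlayoutBuffer:
    """Spike-aware playout delay control (sketch; names and defaults are assumed)."""

    def __init__(self, choose_delay, head=4.0, tail=2.0, window=100):
        # choose_delay: callable taking the recent delays and returning a playout
        # delay; it stands in for the Weibull fit and minimum-I_m search.
        self.choose_delay = choose_delay
        self.head, self.tail = head, tail
        self.delays = deque(maxlen=window)  # delay records for the past W packets
        self.mode = "NORMAL"
        self.old_d = 0.0
        self.d = 0.0  # current playout delay d_i

    def on_packet(self, n_i, talkspurt_start=False):
        """Process one packet with network delay n_i; return the playout delay."""
        if self.mode == "SPIKE":
            if n_i <= self.tail * self.old_d:
                self.mode = "NORMAL"  # spike has ended
        elif self.d > 0 and n_i > self.head * self.d:
            # The d > 0 guard is an addition so the first packet is not a spike.
            self.mode = "SPIKE"
            self.old_d = self.d
        else:
            self.delays.append(n_i)

        if talkspurt_start:
            if self.mode == "SPIKE" or not self.delays:
                self.d = n_i  # follow the spike directly
            else:
                self.d = self.choose_delay(list(self.delays))
        return self.d
```

Passing the delay selection in as a function keeps the spike logic separate from the perceptual optimization, so a simpler rule (for instance `max`) can be swapped in when testing the state machine on its own.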
Performance Analysis and Comparison (1)

Trace   Delay (ms)   Jitter (ms)   Loss (%)
1       153          16.2           1.1
2        46           0.8           0.3
3       186          19.5          14.3
4        16           0.7           4.4
5       150           0.2           0.2

- Five traces were selected, from UoP to CU (USA), DUT (Germany), BUPT (China), and NC (China).
- Traces 1 and 3 have high delay variation; traces 2, 4 and 5 have low delay variation.

Performance Analysis and Comparison (2)

[Figure: Performance comparison for buffer algorithms. MOS (y-axis) for traces 1 to 5 (x-axis); algorithms: exp-avg, fast-exp, min-delay, spk-delay, adaptive, p-optimum]

- The p-optimum algorithm achieves the optimum voice quality for all traces.
- The adaptive algorithm achieves sub-optimum quality with low complexity.

Conclusions and Future Work

Conclusions
- Developed a new methodology and regression models to predict voice quality non-intrusively.
- Demonstrated the application of the new non-intrusive voice quality prediction models to perceptual-based optimization of playout buffer algorithms.

Future Work
- To consider buffer adaptation during a talkspurt in order to achieve the best trade-off between delay, loss and end-to-end jitter.
- To extend the work to improve the performance of multimedia services (e.g. audio/image/video) over IP networks.

Contact Details

http://www.tech.plymouth.ac.uk/spmc

Dr. Lingfen Sun, L.Sun@plymouth.ac.uk
Prof Emmanuel Ifeachor, E.Ifeachor@plymouth.ac.uk

Any questions? Thank you!