Subjective SNR measure for quality assessment of speech coders: A cross language study


Mamoru Nakatsui and Hideki Noda
Communications Research Laboratory, Ministry of Posts and Telecommunications, 4-2-1, Nukuikita, Koganei, 184 Japan
(Received 7 January 1994)

The subjective speech-to-noise ratio (SNR), derived from the forced-choice pair-comparison test using the psychometric analysis procedure, has been shown to represent the overall speech quality of speech coders well in a single dimension. No significant speaker or listener variation was found for a wide range of waveform coders in tests conducted in two separate sessions 14 months apart using different groups of English speakers and listeners. The purpose of this study is to investigate the reproducibility of the measure by conducting a test in the same framework using Japanese speakers and listeners. The test result shows that the subjective SNR measure gives quite reliable scores when evaluated in different laboratories with different language backgrounds.

Keywords: Subjective quality, Speech coder, Pair-comparison test, Cross language study
PACS number: 43.71.Gv

1. INTRODUCTION

The ultimate performance measure for evaluating voice communication systems is the subjective quality of the received speech. Modern digital speech-coding techniques achieve high intelligibility. A high level of speech intelligibility is a necessary but insufficient condition for user acceptance of the systems; quality must also meet acceptability criteria.

Two general surveys of the methods of speech quality assessment have been given by Munson and Karlin1) and Hecker and Guttman,2) each using a different classification. Munson and Karlin have classified the subjective assessment methods into two broad classes depending upon whether listeners are exposed only to the new system to be evaluated or to both the test and reference systems.
They report that in the former case, called "indirect comparison," listeners' responses depend upon their prior experience with other systems to make the desired decisions or categorizations, whereas in the latter case, called "direct comparison," listeners' responses depend upon their overall preference judgments alone, and there is no requirement for categorizing the reasons for their preference.

Hecker and Guttman have divided the methods for measuring overall quality into two types of approaches depending upon whether or not the method investigates the underlying psychological factors that govern the overall judgments obtained. The former approach, called "analytic," attempts to explore the psychological components of speech quality and to discover the acoustical correlates of these components; the latter approach, called "utilitarian," attempts to obtain a unidimensional measure of speech quality. They conclude that, even though the unidimensional scaling of speech quality is limited in the accuracy of the measure obtained, this utilitarian approach is of primary interest from an engineering point of view.

Among the measures obtained through the methods taking the utilitarian approach, the most widely used are categorical ratings depending upon indirect comparison.3) Comparability of ratings obtained with these methods on different occasions is, in general, poor since there is considerable variability due to speakers and listeners. Another widely used measure is the relative preference rating, derived from preference judgments on paired test-reference systems.4) This relative rating may be less influenced by speaker and listener variability than the absolute rating scale obtained through the categorical judgments, but direct comparability of ratings obtained in different studies is difficult to ensure.

A subjective assessment method expected to provide a practical engineering criterion on overall speech quality should satisfy the following requirements of engineering practice: (1) test administration and data reduction are simple enough to be carried out in most speech-communication laboratories, without the assistance of specialists in psychological experiments or of trained or professional listeners, (2) the measure provides an adequate single absolute scale which is intuitively understandable by most communication engineers, and (3) reproducibility of the results across studies is high enough to enable one to compare directly the measures obtained on different occasions or at different laboratories.

The subjective speech-to-noise ratio (SNR) has been proposed as one such measure, and experimental results of subjective tests with the measure have been reported.5) The test results show that (1) the subjective SNR measure provides an adequate single absolute scale for a wide range of speech waveform coders, (2) reliable scores are available using a reasonably small number of listeners and speakers, and (3) no significant speaker or listener variation is found in the scores of two separate test sessions 14 months apart using different groups of English speakers and listeners.
The purpose of this study is to investigate whether the reproducibility of the results across tests is high enough to enable one to compare directly the measures obtained at different laboratories with different nationalities and language backgrounds.

2. SUBJECTIVE SNR MEASURE

The concept of a subjective SNR originates in the iso-preference method introduced by Munson and Karlin.1) The subjective SNR is derived from the forced-choice pair-comparison test using the psychometric analysis procedure commonly used in the method of constants. A speech signal degraded by varying amounts of multiplicative white noise6) is selected as the reference system in our tests.

2.1 Reference System

The sampled reference signal r(i), corrupted by multiplicative white noise n(i), is defined by

\[ r(i) = s(i) + n(i), \tag{1a} \]
\[ n(i) = k \, s(i) \, e(i), \tag{1b} \]

where s(i) is the original speech signal, which also serves as the input signal to the speech coders evaluated, k is a coefficient for SNR control, and e(i) is a random variable taking on values of +1 and -1 with equal probability and independently of s(i) for each sample. Since this reference signal is identical to one of the speech signals processed by the modulated-noise reference unit (MNRU) of ITU-T (CCITT)7) and the SNR of the speech signal degraded by the MNRU is called Q, our subjective SNR measure can be called the Equivalent Q measure.

2.2 Subjective Test Format

Each test signal to be evaluated is paired with five or six reference signals selected so that preferences ranging from 0% to 100% result. During the test, the listeners are presented with repeated signal pairs in the order ABAB, as shown in Fig. 1. The listeners are asked to mark the member of each pair that they would prefer as a source of information.

Fig. 1 Time pattern of the test pair sequence.

2.3 Test Data Analysis

The proportion of listeners preferring the test signal, p(i), is converted to the unit normal deviate z(i) using the equation

\[ p(i) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z(i)} e^{-x^{2}/2} \, dx. \tag{2} \]

Applying Müller-Urban weighting to the converted data, a weighted least-squares algorithm8) is used to fit a straight line, Eq. (3), to the data points, where S represents the SNR of the reference signal. The subjective SNR (SNRsubj) and its standard deviation s are then given by Eqs. (4a) and (4b). Figure 2 shows an example of the least-squares fit of the preference data. The 95% confidence interval of SNRsubj as an estimate of the population mean μ is defined by Eq. (5), where t(ν, α) is the t distribution with ν degrees of freedom and α is the significance level (5%).

Fig. 2 An example of the least-squares fit of the preference data: black circles denote the rates of preference for the test signal and the solid curve denotes the fitted normal ogive.

3. EXPERIMENTAL PROCEDURE

Table 1 summarizes the experimental setups of the current test (Test III) and the previous tests (Tests I and II).5) Five coder configurations were simulated and evaluated in the current test: 40 and 56 kb/s μ-255 companded pulse-code modulations (PCMs), 16 and 32 kb/s dual-adaptive delta modulators (DADMs),9) and a 16 kb/s adaptive delta modulator with one-bit memory (ADM).10) The two PCM configurations and the ADM configuration served as anchor points connecting the current tests in Japan with the previous tests in Canada.5)

Table 1 Experimental frameworks.

Four short sentences spoken by two male and two female speakers were bandlimited from 200 to 3,400 Hz and digitized to 12 bits at sampling frequencies of 8, 16 and 32 kHz. These digitized speech samples served as inputs to the two PCM, 32 kb/s DADM, and 16 kb/s ADM configurations to produce the test signals to be evaluated. Speech samples with a non-standard bandlimitation from 200 to 2,400 Hz served as inputs to the 16 kb/s DADM. The reference signals were also produced from these speech samples using Eq. (1).
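The reference signal of Section 2.1 and the data reduction of Section 2.3 can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several points are assumptions made only for illustration: the function names make_reference and fit_subjective_snr are hypothetical; k is mapped to the reference SNR by SNR = -20 log10(k); the fitted line is taken as z = a + bS, with the subjective SNR read off at z = 0 and the spread taken as 1/|b|; the Müller-Urban weight is taken as the squared normal ordinate divided by p(1 - p); and points at exactly 0% or 100% preference are dropped because their deviate is infinite. The preference data in the example are synthetic, not measured.

```python
import math
import random
from statistics import NormalDist


def make_reference(speech, snr_db, rng):
    """Reference signal r(i) = s(i) + k*s(i)*e(i), with e(i) = +1 or -1."""
    # Assumed mapping: noise power is k**2 times speech power, so SNR = -20*log10(k).
    k = 10.0 ** (-snr_db / 20.0)
    return [s + k * s * rng.choice((-1.0, 1.0)) for s in speech]


def fit_subjective_snr(ref_snr_db, p_prefer_test):
    """Weighted straight-line fit of z against reference SNR (assumed form z = a + b*S)."""
    std_normal = NormalDist()
    points = []
    for snr, p in zip(ref_snr_db, p_prefer_test):
        if not 0.0 < p < 1.0:
            continue  # z is infinite at 0% or 100%; dropped here (an assumption)
        z = std_normal.inv_cdf(p)                     # unit normal deviate
        w = std_normal.pdf(z) ** 2 / (p * (1.0 - p))  # assumed Mueller-Urban weight
        points.append((snr, z, w))
    total_w = sum(w for _, _, w in points)
    mean_s = sum(w * s for s, _, w in points) / total_w
    mean_z = sum(w * z for _, z, w in points) / total_w
    slope = (sum(w * (s - mean_s) * (z - mean_z) for s, z, w in points)
             / sum(w * (s - mean_s) ** 2 for s, _, w in points))
    intercept = mean_z - slope * mean_s
    snr_subj = -intercept / slope  # reference SNR at which preference is 50% (z = 0)
    spread = 1.0 / abs(slope)      # standard deviation of the fitted ogive, in dB
    return snr_subj, spread


# Reference signal at a nominal 20 dB, built from a synthetic stand-in for speech.
rng = random.Random(0)
speech = [rng.gauss(0.0, 1.0) for _ in range(1000)]
reference = make_reference(speech, 20.0, rng)
noise_power = sum((r - s) ** 2 for r, s in zip(reference, speech))
measured_snr_db = 10.0 * math.log10(sum(s * s for s in speech) / noise_power)

# Synthetic preference rates generated from an ogive centred at 20 dB with a 5 dB spread.
ref_levels = [10.0, 15.0, 20.0, 25.0, 30.0]
prefs = [NormalDist().cdf((20.0 - s) / 5.0) for s in ref_levels]
snr_subj, spread = fit_subjective_snr(ref_levels, prefs)
```

Because e(i) only flips the sign of the added term, the noise power is exactly k squared times the speech power, so the measured SNR of the reference signal matches the nominal value regardless of the speech content. In the fit, preference for the test signal falls as the reference SNR rises, so the slope is negative and the 50% crossing gives the subjective SNR.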
Two separate sessions of pair-comparison tests, one using speech samples of standard bandwidth and the other using those of non-standard (narrow) bandwidth, were conducted with eleven untrained listeners. Participants in both sessions were native Japanese speakers.

4. RESULT AND DISCUSSION

The subjective SNR of each test signal and its 95% confidence interval were estimated from the test data pooled over the four utterances and all the listeners, and are shown in Fig. 3 together with the results of the previous tests.5) Estimates obtained from the speech data having non-standard (narrow) bandwidth are indicated by an arrow in the figure (please refer to the previous report5) concerning the coder configurations of APC, ADPCM-V and ADPCM-F in the figure). No statistically significant difference between the subjective SNR estimates at the 5% level is found for the following test signal pairs: (1) 40 kb/s PCMs in Tests I, II, and III, (2) 56 kb/s PCMs in Tests I and III, and (3) 16 kb/s ADMs in Tests I and III. The subjective SNR measure thus gives quite reliable scores when evaluated in different laboratories with different language backgrounds. No significant variation due to the factors of speaker and listener is found for the test signals of the current test, as has also been shown in the previous tests for the waveform coders.5)

Fig. 3 Subjective SNR and its 95% confidence interval of the test signals evaluated in three tests.

The mean opinion score (MOS), which is the most widely used subjective measure of overall speech quality and is an absolute scale derived from absolute (categorical) judgements, has shown remarkable variations in test results obtained in different countries for the same speech transmission system.3) A MOS-equivalent Q measure,11) which aims to interpret a large set of pooled MOS data on a subjective SNR scale, has shown saturation (non-linearity) in the high-quality range (56 to 64 kb/s PCMs), reflecting the inadequacy of the MOS measure in that range.

5. CONCLUSION

The subjective SNR measure, which is an absolute scale derived from relative judgements, represents a wide range of overall speech quality well in a single dimension. The reproducibility of the results across tests is high enough to enable one to compare directly the measures obtained at different laboratories with different language backgrounds. A limitation of the measure is that it may not extend to low-bit-rate coders, such as Code Excited Linear Prediction (CELP),12) whose distortions differ significantly from those of the reference signal in our test. To overcome this limitation, a reference signal with colored noise that reflects the distortions of the coder to be evaluated13) should be introduced in the test.

ACKNOWLEDGMENTS

The authors are grateful to Dr. Paul Mermelstein, Bell-Northern Research, Canada, for his cooperation and suggestions in the initial phase of this series of tests, and express their sincere thanks to all the subjects involved in the tests.

REFERENCES

1) W.A. Munson and J.E. Karlin, "Isopreference method for evaluating speech-transmission circuits," J. Acoust. Soc. Am. 34, 762-774 (1962).
2) M.H.L. Hecker and N. Guttman, "Survey of methods for measuring speech quality," J. Audio Eng. Soc. 15, 400-403 (1967).
3) D.J. Goodman and R.D. Nash, "Subjective quality of the same transmission conditions in seven different countries," IEEE Trans. Commun. COM-30, 642-654 (1982).
4) P. Mermelstein, "Evaluation of a segmental SNR measure as an indicator of the quality of ADPCM coded speech," J. Acoust. Soc. Am. 66, 1664-1667 (1979).
5) M. Nakatsui and P. Mermelstein, "Subjective speech-to-noise ratio as a measure of speech quality for digital waveform coders," J. Acoust. Soc. Am. 72, 1136-1144 (1982).
6) M.R. Schroeder, "Reference signal for signal quality studies," J. Acoust. Soc. Am. 44, 1735-1736 (1968).
7) ITU-T (CCITT) Recommendation P.81, Vol. V, Blue Book (1988).
8) J.P. Guilford, Psychometric Methods (McGraw-Hill, New York, 1954).
9) M. Nakatsui and K. Nakata, "Dual adaptive delta modulation for mobile voice channel and its DSP implementation," Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 1684-1687 (1985).
10) N.S. Jayant, "Adaptive delta modulation with a one-bit memory," Bell Syst. Tech. J. 49, 321-342 (1970).
11) T. Watanabe, K. Itoh and N. Kitawaki, "Comparison of performance on voiceband codecs by several speech quality measures," Tech. Rep. Speech, Acoust. Soc. Jpn. S82-48 (1982) (in Japanese).
12) M.R. Schroeder and B.S. Atal, "Code-excited linear prediction (CELP): high-quality speech at very low bit rates," Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 937-941 (1985).

13) H. Nagabuchi, "A study on reference signals in subjective quality evaluation of low-bit-rate speech codings," Trans. Inst. Electr. Inf. Commun. Eng. J69-A, 896-903 (1986) (in Japanese).

Mamoru Nakatsui received the B. Eng. degree in 1963 from the University of Electro-Communications and the Dr. Eng. degree in 1976 from Tohoku University. In 1963 he joined the Radio Research Laboratory (now called the Communications Res. Lab.) of MPT, where he is now the Associate Director General. From 1978 to 1980 he was an invited professor at INRS-Telecommunications, University of Quebec, Canada. His main interests lie in speech communications.

Hideki Noda received the B.E. and M.E. degrees in electronics engineering from Kyushu University in 1973 and 1975, respectively. He received the Dr. Eng. degree in electrical engineering from Kyushu Institute of Technology in 1993. He worked at Daini-Seikosha Ltd. from 1975 to 1978 and at the National Research Institute of Police Science, Japan National Police Agency, from 1978 to 1989. From 1984 to 1985, he was a visiting scientist at the National Research Council of Canada. In 1989, he joined the Communications Research Laboratory, Japan Ministry of Posts and Telecommunications, where he is now the chief of the Auditory and Visual Informatics Section. His research interests include speech and image processing.