Speaker Recognition from Coded Speech and the Effects of Score Normalization
R.B. Dunn, T.F. Quatieri, D.A. Reynolds, J.P. Campbell
MIT Lincoln Laboratory, Lexington, MA

Abstract

We investigate the effect of speech coding on automatic speaker recognition when training and testing conditions are matched and mismatched. Experiments used standard speech coding algorithms (GSM, G.729, G.723, MELP) and a speaker recognition system based on Gaussian mixture models adapted from a universal background model. There is little loss in recognition performance for toll quality speech coders and slightly more loss when lower quality speech coders are used. Speaker recognition from coded speech using handset dependent score normalization and test score normalization is examined. Both types of score normalization significantly improve performance, and can eliminate the performance loss that occurs when there is a mismatch between training and testing conditions.

1. Introduction

With the increase in availability and use of digital cellular and VoIP telephony, there has been increased interest in the effects of speech compression algorithms on speaker recognition systems. In this paper we investigate the effects of four commonly used speech coding algorithms on automatic speaker recognition for conversational telephone speech. We also examine the effects of a mismatch between the training and testing phases of the speaker recognition system, where, for example, the speaker model is trained from uncoded speech and the speech is coded in the recognition phase.

The speaker recognition experiments in this paper are performed using a Gaussian mixture model universal background model (GMM-UBM) speaker recognition system [1]. This type of speaker recognition system has consistently had excellent performance in the annual NIST Speaker Recognition Evaluations [1, 2].
This work was supported by the Department of Defense under Air Force Contract F C-2. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.

In the experiments, coded speech is generated by first encoding and then decoding speech segments from the NIST Speaker Recognition Benchmarks [3] (SRB). This simulates the condition where speech originating from a land line is encoded digitally during transmission. This occurs, for example, when a user on a land line is connected to a digital cellular user.

In past work [4], a relatively small data set, with a limited set of target speakers per gender, was used to examine the effects of speech coding algorithms on the above speaker recognition system. Those experiments demonstrated that there was a slight loss in speaker recognition performance for speech compressed with toll quality coders and that this loss increased slightly for lower quality speech coders. In particular, this performance loss increased when the training and testing conditions were mismatched. Those experiments used electret handsets and required that the training and testing speech use the same handset. We show that these results hold for a larger and more challenging data set where carbon button handsets are included and where the handset is allowed to vary between training and testing. These added conditions are known to pose a significantly increased challenge to automatic speaker recognition systems [2].

We also examine the effect of score normalization, which is commonly used to boost system performance. We find that handset dependent score normalization (HNORM) and handset dependent test score normalization (HTNORM) are about as effective at improving system performance for coded speech as they are for speech that has not been coded.
We also find that in most cases score normalization removes the performance loss associated with a mismatch between training and testing conditions.

2. Background

We chose four standard speech coding algorithms to cover a range of speech quality and bit rates. The coders are: GSM (12.2 kb/s), G.729 (8 kb/s), G.723 (5.3 kb/s) and MELP (2.4 kb/s). All of the coders use a source/filter representation of speech, with the primary differences being the fidelity of the transmitted parameters and the manner of coding and regenerating the excitation. The GSM coder is the ETSI Pan-European standard, a fixed-point enhanced GSM coder at 12.2 kb/s, and is based on a regular multi-pulse residual coding scheme [5]. The G.729 coder is a fixed-point coder at 8 kb/s standardized by ITU-T for personal communication and satellite systems, and is based on a conjugate-structure algebraic CELP residual coding scheme [6]. These two coders produce toll quality speech. The G.723 coder is the CELP-based ITU-T multimedia standard coder at 5.3 kb/s [7]. Finally, the Mixed Excitation Linear Prediction (MELP) coder [8], which is the new U.S. Federal Standard for speech coding at 2.4 kb/s, uses a synthetic excitation (harmonics plus noise). This coder is used for narrowband radio and satellite communications.

2.1. Speaker Recognition System

The basic speaker detector is a likelihood ratio detector with target and alternative probability distributions modeled by Gaussian mixture models (GMMs), as shown in Figure 1. A Universal Background Model (UBM) GMM is used as the alternative hypothesis model, and from this, target models are derived using Bayesian adaptation (also known as Maximum A-Posteriori (MAP) training) [1]. The scores are normalized such that a single speaker-independent threshold can be used for detection.

Figure 1: GMM-UBM likelihood ratio detector.

The front end processing for the system is as follows. A 19-dimensional mel-cepstral vector is extracted from the speech signal every 10 ms using a 20 ms window. The mel-cepstral vector is computed using a simulated triangular filterbank on the DFT spectrum. Bandlimiting is then performed by retaining only the filterbank outputs within a restricted frequency range. Cepstral vectors are processed first with cepstral mean subtraction and then with RASTA filtering to mitigate linear channel bias effects. Delta cepstra are then computed over a 5 frame span and appended to the cepstra vector, producing a 38-dimensional feature vector.
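As a rough illustration of two of the front-end steps above, the following Python sketch applies cepstral mean subtraction and appends delta cepstra to a matrix of 19-dimensional cepstral vectors. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the delta is computed as a least-squares slope over a 5-frame window (the regression form and the window length are assumptions), and the mel filterbank analysis, RASTA filtering and speech detection are omitted.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract the per-utterance mean of each cepstral coefficient.

    cepstra: array of shape (num_frames, num_coeffs).
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)

def delta_features(cepstra, span=5):
    """Least-squares slope of each coefficient over a centered window.

    `span` is the window length in frames (assumed to be 5 here).
    Edge frames are handled by repeating the first and last frame.
    """
    half = span // 2
    offsets = np.arange(-half, half + 1)
    padded = np.pad(cepstra, ((half, half), (0, 0)), mode="edge")
    num_frames = cepstra.shape[0]
    deltas = np.zeros_like(cepstra, dtype=float)
    for k in offsets:
        # Frame t of this slice is frame t + k of the original sequence.
        deltas += k * padded[half + k : half + k + num_frames]
    return deltas / np.sum(offsets ** 2)

def build_features(cepstra):
    """Return mean-normalized cepstra with delta cepstra appended."""
    normalized = cepstral_mean_subtraction(cepstra)
    return np.hstack([normalized, delta_features(normalized)])

# Synthetic stand-in for 19-dimensional mel-cepstra (not real speech).
rng = np.random.default_rng(0)
example_cepstra = rng.normal(size=(50, 19))
example_features = build_features(example_cepstra)
```

With 19 input coefficients, `build_features` returns 38 values per frame, which matches the feature dimension stated in the text.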
Lastly, the feature vector stream is processed through an adaptive, energy-based speech detector to discard low-energy vectors.

The UBM is a 2048 mixture gender-independent, handset-independent GMM trained using about 9 hours of data selected from the 1999 NIST SRB to be approximately evenly divided between sex and handset type. Target models are derived by Bayesian adaptation (a.k.a. MAP estimation) of the UBM parameters using the two minutes of training data. Only the mean vectors are adapted, as this has been observed to provide better performance. The amount of adaptation of each mixture mean is data dependent. Details of the adapted GMM-UBM system can be found in [1].

2.2. Score Normalization

In past work [1, 9], the use of score normalization has significantly improved the performance of speaker recognition systems. In this work we undertake to determine if score normalization, in particular handset dependent score normalization (HNORM) and handset dependent test-score normalization (HTNORM), can be used to improve performance when the speech has been coded. In HNORM, scores from a handset-dependent collection of fixed non-target (imposter) speech samples are used to normalize a speaker model, while in HTNORM, scores from a handset-dependent collection of fixed non-target (imposter) speaker models are used to normalize a speech test segment.

In the application of HNORM, we first compute the log-likelihood ratio scores for a target speaker with a set of imposter test segments coming from both carbon-button (CARB) and electret (ELEC) handsets. We assume these scores have a Gaussian distribution and we estimate the handset-dependent means and standard deviations for these scores. To avoid bimodal distributions, the non-speaker data is of the same gender as the target speaker.
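The parameter estimation just described, together with the handset dependent score normalization (HNORM) it feeds, can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function names, the dictionary layout and the synthetic imposter scores are assumptions, and the handset label is taken as given rather than produced by a handset detector.

```python
import numpy as np

def estimate_norm_params(imposter_scores_by_handset):
    """Estimate a (mean, standard deviation) pair per handset type.

    imposter_scores_by_handset maps a handset label such as "CARB" or
    "ELEC" to the log-likelihood ratio scores obtained by scoring
    imposter segments of that handset type against one speaker model.
    """
    return {
        handset: (float(np.mean(scores)), float(np.std(scores)))
        for handset, scores in imposter_scores_by_handset.items()
    }

def normalize_score(raw_score, handset, params):
    """Shift and scale a raw score with the parameters for its handset."""
    mean, std = params[handset]
    return (raw_score - mean) / std

# Synthetic imposter scores for one hypothetical speaker model.
rng = np.random.default_rng(0)
imposter_scores = {
    "CARB": rng.normal(loc=-0.4, scale=0.30, size=200),
    "ELEC": rng.normal(loc=0.1, scale=0.15, size=200),
}
params = estimate_norm_params(imposter_scores)

# The same raw score maps to different normalized scores depending on
# the handset label assigned to the test segment.
raw_score = 0.25
score_if_carb = normalize_score(raw_score, "CARB", params)
score_if_elec = normalize_score(raw_score, "ELEC", params)
```

For HNORM the imposter scores come from scoring fixed imposter segments against one speaker model, so the parameters are stored per model. The same two functions would serve for handset dependent test-score normalization (HTNORM) if the scores instead came from scoring one test segment against a fixed set of imposter models.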
The target speaker now has two sets of parameters describing his/her model's response to CARB and ELEC type speech:

$$\{\mu(\mathrm{CARB}), \sigma(\mathrm{CARB})\}, \quad \{\mu(\mathrm{ELEC}), \sigma(\mathrm{ELEC})\}$$

In this paper we used 200 30-second speech segments per handset type, per gender, derived from the 1999 NIST SRB test corpus. In general, the duration of the speech segments used to estimate HNORM parameters should match the expected duration of the test speech segments.

During recognition, a handset detector is used to supply the handset type of the test segment. This detector is a simple maximum likelihood detector with handset types represented by 256 mixture GMMs [10]. Then for each test segment, $X$, HNORM is applied to the log-likelihood ratio score $S(X)$ as

$$S_{\mathrm{HNORM}}(X) = \frac{S(X) - \mu(HS(X))}{\sigma(HS(X))} \qquad (1)$$

where $HS(X)$ is the handset label for $X$.

The desired effect of HNORM is illustrated in Figure 2. This figure shows Log-Likelihood Ratio (LLR) score distributions for two speakers before (left column) and after (right column) HNORM has been applied. The effect of removing the handset dependent biases and scales is to normalize the non-target speaker score distributions such that they have zero mean and unit standard deviation for speech from both handset types. This results in better performance when using a single threshold for detection.

Figure 2: Pictorial example of HNORM compensation. This picture shows Log-Likelihood Ratio (LLR) score distributions for two speakers before (left column) and after (right column) HNORM has been applied. After HNORM, the non-speaker score distribution for each handset type has been normalized to zero mean and unit standard deviation.

In addition to removing handset bias and scales, HNORM also helps normalize log-likelihood scores across different speaker models, again resulting in better performance when using speaker-independent thresholds as in the NIST Speaker Recognition Evaluations [2, 11]. HNORM is in effect estimating speaker and handset specific thresholds and mapping them into the log-likelihood score domain rather than using them directly.

HTNORM is similar to HNORM but with the following difference: in HNORM we compute normalization parameters (means and variances) for each speaker model by scoring a fixed set of imposter test segments with that particular speaker model, while in HTNORM the normalization parameters are computed for each test message by scoring that particular test message with a fixed set of imposter speaker models.

3. Experiments and Results

The data used for this paper are derived from the NIST Speaker Recognition Benchmarks [3] (SRB), which are made up of conversational telephone speech. Coded speech is generated by encoding and then decoding the speech segments with each of the four speech coders, simulating the condition where speech originating from a land line is encoded digitally during transmission. The data for training the UBM and for computing HNORM and HTNORM parameters was selected from the 1999 NIST SRB. The test data used in this paper are that of the single speaker detection task in the 2000 NIST SRB. This is a much larger and more demanding data set than was used in [4], where the handset was not allowed to vary between training and testing and only electret handsets were used. In contrast, the 2000 NIST SRB includes tests where the training and testing handsets are different, and it includes both electret and carbon-button handsets. In addition there are over 6 test utterances compared with the 62 test utterances used in [4]. The results reported below reflect a pooling of all single-speaker test segments from both male and female speakers. Although both genders are included in the results, there are no cross-gender tests, as such tests significantly reduce the difficulty of the problem.

The application of HNORM and HTNORM to coded speech requires that the handset type (CARB or ELEC) be determined from the coded speech. In the experiments below, the handset type for coded speech was detected using GMMs trained from uncoded CARB and ELEC speech with a detection threshold adjusted for the coder as in [12].

Figure 3: Equal-Error-Rates for condition A (matched case where background model, training and testing speech are all coded).

Figure 4: Equal-Error-Rates for condition B (partially mismatched case where the background model and training speech are coded and the testing speech is not coded).

Figure 5: Equal-Error-Rates for condition C (partially mismatched case where only the test speech is coded and the background model and training speech are not coded).

3.1. Test Conditions

There are three places in the system where coded speech may be encountered: the test data, the target speaker training data, and the background model training data. In the matched case all of the speech is coded, while in the mismatched cases some of the speech is coded and some is not. For each speech coder, there are four conditions that we tested and compared to a baseline condition in which no coding was performed. We describe the conditions in order of decreasing degree of matching between training and testing.

Condition A: This is the fully matched case where background and target models are derived from coded speech and the test data is also coded. In this case training and testing speech are coded and a matching coded UBM is available.

Condition B: This is a partially mismatched case where the UBM and target model are derived from coded speech and the test data is from uncoded speech. Since the uncoded test messages are scored against two coded models, we expect performance to decrease relative to condition A. In this case training and testing speech are mismatched but a UBM matching the training speech is available.

Condition C: This is a partially mismatched case where the UBM and target model are derived from uncoded speech and the test data is from coded speech. Since the coded test messages are scored against two uncoded models, we expect performance similar to condition B. As in condition B, the training and testing speech are mismatched but a UBM matching the training speech is available.

Condition D: This is the fully mismatched case where the background model is derived from uncoded speech and the target models and test data are from coded speech.
The test data is thus scored against one model derived from coded speech (the target speaker model) and one model derived from uncoded speech (the background model); as such, we expect the worst performance for this case. In this condition the training and testing speech are matched but a matching coded UBM is not available.

In each condition, the imposter test segments used to compute HNORM parameters were matched to the test speech segments, and the imposter models used for HTNORM were trained in the same manner as the target speaker models. For example, in condition D, the test segments are coded and therefore the imposter test segments used to compute HNORM parameters are also coded. Likewise, in condition D the target models are trained using coded speech and a clean speech UBM, so the imposter models for HTNORM are also trained from coded speech and a clean speech UBM.

3.2. Matched Condition with Score Normalization

Speaker detection performance for the various coders in the fully matched condition is shown in Figure 3. This is the condition where all training and testing speech are coded and a matching coded UBM is used. Performance is shown in terms of the Equal-Error-Rate (EER), which is the operating point where the probabilities of miss and false alarm are equal. Speaker detection performance for the GSM coder is nearly identical to the performance on uncoded telephone speech, and there is a slight increase in the EER as the coder bit rate (and speech quality) decreases. The figure shows that both HNORM and HTNORM are as effective for coded speech as they are for uncoded speech, decreasing the EER about 2-3%. This is true even for G.723 and MELP, where the handset identification error is roughly double the error for uncoded speech [12].

3.3. Mismatched Conditions

Speaker detection performance in the various mismatched conditions is shown in Figures 4, 5, and 6. The trends seen here are similar to the matched condition, where both HNORM and HTNORM significantly improve system performance. In the partially mismatched conditions (B and C), HTNORM appears to be as effective as HNORM is for the high-rate coders GSM and G.729, but for the low-rate coder MELP, HTNORM is not as effective as HNORM. In the fully mismatched condition (D), the use of score normalization (either HNORM or HTNORM) yields greater performance gains than it does in the matched (A) or partially mismatched (B and C) conditions. It is also notable that in the fully mismatched condition HTNORM has significantly better performance than HNORM.

Figure 6: Equal-Error-Rates for condition D (fully mismatched case where training and testing speech are coded but the background model speech was not coded).

Figure 7: Equal-Error-Rates for all four conditions (A, B, C, and D) with no score normalization.

All of the conditions (A, B, C, and D) are directly compared without score normalization in Figure 7. The figure shows that in conditions A and B the performance is identical (except for one coder), in condition C there is a slight performance drop, and in condition D there is a large performance drop.
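Since every result in this section is reported as an Equal-Error-Rate, a small Python sketch of how an EER can be estimated from detection scores may help. It is a minimal illustration, not the evaluation software used for the paper: the function name is hypothetical, a trial is assumed to be accepted when its score is at or above the threshold, and the EER is approximated at the candidate threshold where the miss and false alarm rates are closest.

```python
import numpy as np

def equal_error_rate(target_scores, imposter_scores):
    """Approximate the EER from target and imposter trial scores.

    Every observed score is tried as a threshold. The result is the
    average of the miss and false alarm rates at the threshold where
    the two rates are closest to each other.
    """
    target_scores = np.asarray(target_scores, dtype=float)
    imposter_scores = np.asarray(imposter_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, imposter_scores]))

    # Miss: a target trial scores below the threshold and is rejected.
    miss = np.array([(target_scores < t).mean() for t in thresholds])
    # False alarm: an imposter trial scores at or above the threshold.
    false_alarm = np.array([(imposter_scores >= t).mean() for t in thresholds])

    closest = np.argmin(np.abs(miss - false_alarm))
    return 0.5 * (miss[closest] + false_alarm[closest])

# Synthetic scores: targets tend to score higher than imposters.
rng = np.random.default_rng(0)
example_eer = equal_error_rate(
    rng.normal(loc=1.0, scale=1.0, size=500),
    rng.normal(loc=-1.0, scale=1.0, size=5000),
)
```

Applying the function to scores before and after normalization would give the kind of comparison shown in the figures, though the numbers from synthetic scores say nothing about the paper's results.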
When HNORM is applied, as shown in Figure 8, conditions A, B, and C have similar performance but condition D still shows a performance drop. When HTNORM is used, as shown in Figure 9, there is little to no difference between the four conditions, except for the MELP coder in conditions B and C.

Figure 8: Equal-Error-Rates for all four conditions (A, B, C, and D) with HNORM.

Figure 9: Equal-Error-Rates for all four conditions (A, B, C, and D) with HTNORM.

4. Summary

In this paper, we demonstrated that the adapted GMM-UBM speaker recognition system can be effectively used for text-independent speaker detection when telephone speech has been compressed using common speech coding algorithms. There is only a slight increase in the EER for toll quality speech coders (GSM and G.729) as compared with the baseline uncoded speech, and the performance loss for lower rate speech coders (G.723 and MELP) is only slightly greater. It was shown that score normalization techniques such as HNORM and HTNORM can be applied to coded speech and that both score normalization techniques are as effective at improving system performance for coded speech as they are for uncoded speech. Overall, the effect of speech coding on the adapted GMM-UBM speaker recognition system is relatively benign under most conditions. Although there is a significant performance loss if the speech used to train the background model was not processed through the same speech coder as the speech used to train speaker models (condition D), this performance loss can be eliminated if HTNORM is used.

5. References

[1] D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, Speaker Verification Using Adapted Gaussian Mixture Models, Digital Signal Processing, Vol. 10, No. 1-3, pp. 19-41, January/April/July 2000.
[2] A. Martin and M. Przybocki, The NIST 1999 Speaker Recognition Evaluation: An Overview, Digital Signal Processing, Vol. 10, No. 1-3, pp. 1-18, January/April/July 2000.
[3] Linguistic Data Consortium, NIST Speaker Recognition Benchmarks.
[4] T.F. Quatieri, E. Singer, R.B. Dunn, D.A. Reynolds, and J.P. Campbell, Speaker and Language Recognition using Speech Codec Parameters, Proc. Eurospeech '99, Vol. 2, September 1999.
[5] European Telecommunication Standards Institute, European digital telecommunications system (Phase 2); Full rate speech processing functions ( 6.1), ETSI.
[6] ITU-T Recommendation G.729, Coding of speech at 8 kb/s using conjugate-structure algebraic-code-excited linear prediction, June 1995.
[7] ITU-T Recommendation G.723.1, Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kb/s, March 1996.
[8] A.V. McCree, K.K. Truong, E.B. George, T. Barnwell, and V.R. Viswanathan, A 2.4 kbit/s MELP Coder Candidate for the New U.S. Federal Standard, Proc. ICASSP '96, Vol. 1, pp. 200-204, May 1996.
[9] D. A. Reynolds, Comparison of background normalization methods for text-independent speaker verification, Proc. Eurospeech '97, September 1997.
[10] D. A. Reynolds, HTIMIT and LLHDB: Speech corpora for the study of handset transducer effects, Proc. ICASSP '97, April 1997.
[11] National Institute of Standards and Technology, NIST Coordinated Speaker Recognition Evaluations.
[12] R.B. Dunn, T.F. Quatieri, D.A. Reynolds, and J.P. Campbell, Speaker Recognition from Coded Speech in Matched and Mismatched Conditions, Proc. Speaker Recognition Workshop '01, Crete, Greece, pp. 1-12, June 2001.
More informationBroadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29.
Broadband Networks Prof. Dr. Abhay Karandikar Electrical Engineering Department Indian Institute of Technology, Bombay Lecture - 29 Voice over IP So, today we will discuss about voice over IP and internet
More informationDiscriminative Multimodal Biometric. Authentication Based on Quality Measures
Discriminative Multimodal Biometric Authentication Based on Quality Measures Julian Fierrez-Aguilar a,, Javier Ortega-Garcia a, Joaquin Gonzalez-Rodriguez a, Josef Bigun b a Escuela Politecnica Superior,
More informationWhite Paper. PESQ: An Introduction. Prepared by: Psytechnics Limited. 23 Museum Street Ipswich, Suffolk United Kingdom IP1 1HN
PESQ: An Introduction White Paper Prepared by: Psytechnics Limited 23 Museum Street Ipswich, Suffolk United Kingdom IP1 1HN t: +44 (0) 1473 261 800 f: +44 (0) 1473 261 880 e: info@psytechnics.com September
More informationSignal Detection. Outline. Detection Theory. Example Applications of Detection Theory
Outline Signal Detection M. Sami Fadali Professor of lectrical ngineering University of Nevada, Reno Hypothesis testing. Neyman-Pearson (NP) detector for a known signal in white Gaussian noise (WGN). Matched
More informationAppendix C GSM System and Modulation Description
C1 Appendix C GSM System and Modulation Description C1. Parameters included in the modelling In the modelling the number of mobiles and their positioning with respect to the wired device needs to be taken
More informationDTS Enhance : Smart EQ and Bandwidth Extension Brings Audio to Life
DTS Enhance : Smart EQ and Bandwidth Extension Brings Audio to Life White Paper Document No. 9302K05100 Revision A Effective Date: May 2011 DTS, Inc. 5220 Las Virgenes Road Calabasas, CA 91302 USA www.dts.com
More informationBasic principles of Voice over IP
Basic principles of Voice over IP Dr. Peter Počta {pocta@fel.uniza.sk} Department of Telecommunications and Multimedia Faculty of Electrical Engineering University of Žilina, Slovakia Outline VoIP Transmission
More informationSpot me if you can: Uncovering spoken phrases in encrypted VoIP conversations
Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations C. Wright, L. Ballard, S. Coull, F. Monrose, G. Masson Talk held by Goran Doychev Selected Topics in Information Security and
More informationGSM speech coding. Wolfgang Leister Forelesning INF 5080 Vårsemester 2004. Norsk Regnesentral
GSM speech coding Forelesning INF 5080 Vårsemester 2004 Sources This part contains material from: Web pages Universität Bremen, Arbeitsbereich Nachrichtentechnik (ANT): Prof.K.D. Kammeyer, Jörg Bitzer,
More informationVoice---is analog in character and moves in the form of waves. 3-important wave-characteristics:
Voice Transmission --Basic Concepts-- Voice---is analog in character and moves in the form of waves. 3-important wave-characteristics: Amplitude Frequency Phase Voice Digitization in the POTS Traditional
More informationPART 5D TECHNICAL AND OPERATING CHARACTERISTICS OF MOBILE-SATELLITE SERVICES RECOMMENDATION ITU-R M.1188
Rec. ITU-R M.1188 1 PART 5D TECHNICAL AND OPERATING CHARACTERISTICS OF MOBILE-SATELLITE SERVICES Rec. ITU-R M.1188 RECOMMENDATION ITU-R M.1188 IMPACT OF PROPAGATION ON THE DESIGN OF NON-GSO MOBILE-SATELLITE
More informationPCM Encoding and Decoding:
PCM Encoding and Decoding: Aim: Introduction to PCM encoding and decoding. Introduction: PCM Encoding: The input to the PCM ENCODER module is an analog message. This must be constrained to a defined bandwidth
More informationStatistical Measurement Approach for On-line Audio Quality Assessment
Statistical Measurement Approach for On-line Audio Quality Assessment Lopamudra Roychoudhuri, Ehab Al-Shaer and Raffaella Settimi School of Computer Science, Telecommunications and Information Systems,
More informationHD VoIP Sounds Better. Brief Introduction. March 2009
HD VoIP Sounds Better Brief Introduction March 2009 Table of Contents 1. Introduction 3 2. Technology Overview 4 3. Business Environment 5 4. Wideband Applications for Diverse Industries 6 5. AudioCodes
More informationThe Optimization of Parameters Configuration for AMR Codec in Mobile Networks
01 8th International Conference on Communications and Networking in China (CHINACOM) The Optimization of Parameters Configuration for AMR Codec in Mobile Networks Nan Ha,JingWang, Zesong Fei, Wenzhi Li,
More informationSimulative Investigation of QoS parameters for VoIP over WiMAX networks
www.ijcsi.org 288 Simulative Investigation of QoS parameters for VoIP over WiMAX networks Priyanka 1, Jyoteesh Malhotra 2, Kuldeep Sharma 3 1,3 Department of Electronics, Ramgarhia Institue of Engineering
More informationVoice Activity Detection in the Tiger Platform. Hampus Thorell
Voice Activity Detection in the Tiger Platform Examensarbete utfört i Reglerteknik av Hampus Thorell LiTH-ISY-EX--06/3817--SE Linköping 2006 Voice Activity Detection in the Tiger Platform Examensarbete
More informationAudio Engineering Society. Convention Paper. Presented at the 129th Convention 2010 November 4 7 San Francisco, CA, USA
Audio Engineering Society Convention Paper Presented at the 129th Convention 2010 November 4 7 San Francisco, CA, USA The papers at this Convention have been selected on the basis of a submitted abstract
More informationTranSegId: A System for Concurrent Speech Transcription, Speaker Segmentation and Speaker Identification
TranSegId: A System for Concurrent Speech Transcription, Speaker Segmentation and Speaker Identification Mahesh Viswanathan, Homayoon S.M. Beigi, Alain Tritschler IBM Thomas J. Watson Research Labs Research
More informationIncorporating Window-Based Passage-Level Evidence in Document Retrieval
Incorporating -Based Passage-Level Evidence in Document Retrieval Wensi Xi, Richard Xu-Rong, Christopher S.G. Khoo Center for Advanced Information Systems School of Applied Science Nanyang Technological
More informationKhalid Sayood and Martin C. Rost Department of Electrical Engineering University of Nebraska
PROBLEM STATEMENT A ROBUST COMPRESSION SYSTEM FOR LOW BIT RATE TELEMETRY - TEST RESULTS WITH LUNAR DATA Khalid Sayood and Martin C. Rost Department of Electrical Engineering University of Nebraska The
More informationICTTEN6043A Undertake network traffic management
ICTTEN6043A Undertake network traffic management Release: 1 ICTTEN6043A Undertake network traffic management Modification History Not Applicable Unit Descriptor Unit descriptor This unit describes the
More informationThis document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore.
This document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore. Title Transcription of polyphonic signals using fast filter bank( Accepted version ) Author(s) Foo, Say Wei;
More informationMULTI-STREAM VOICE OVER IP USING PACKET PATH DIVERSITY
MULTI-STREAM VOICE OVER IP USING PACKET PATH DIVERSITY Yi J. Liang, Eckehard G. Steinbach, and Bernd Girod Information Systems Laboratory, Department of Electrical Engineering Stanford University, Stanford,
More informationSimulation Based Analysis of VOIP over MANET
Simulation Based Analysis of VOIP over MANET Neeru Mehta 1, leena 2 M-Tech Student 1, Assit. Prof. 2 &Department of CSE & NGF College of Engineering &Technology Palwal, Haryana, India Abstract In the last
More informationAudio processing and ALC in the FT-897D
Audio processing and ALC in the FT-897D I recently bought an FT-897D, and after a period of operation noticed problems with what I perceived to be a low average level of output power and reports of muffled
More informationCurso de Telefonía IP para el MTC. Sesión 2 Requerimientos principales. Mg. Antonio Ocampo Zúñiga
Curso de Telefonía IP para el MTC Sesión 2 Requerimientos principales Mg. Antonio Ocampo Zúñiga Factors Affecting Audio Clarity Fidelity: Audio accuracy or quality Echo: Usually due to impedance mismatch
More informationInformation Paper. FDMA and TDMA Narrowband Digital Systems
Information Paper FDMA and TDMA Narrowband Digital Systems Disclaimer Icom Inc. intends the information presented here to be for clarification and/or information purposes only, and care has been taken
More informationVoice Encoding Methods for Digital Wireless Communications Systems
SOUTHERN METHODIST UNIVERSITY Voice Encoding Methods for Digital Wireless Communications Systems BY Bryan Douglas Street address city state, zip e-mail address Student ID xxx-xx-xxxx EE6302 Section 324,
More informationHardware Implementation of Probabilistic State Machine for Word Recognition
IJECT Vo l. 4, Is s u e Sp l - 5, Ju l y - Se p t 2013 ISSN : 2230-7109 (Online) ISSN : 2230-9543 (Print) Hardware Implementation of Probabilistic State Machine for Word Recognition 1 Soorya Asokan, 2
More informationChannel-dependent GMM and Multi-class Logistic Regression models for language recognition
Channel-dependent GMM and Multi-class Logistic Regression models for language recognition David A. van Leeuwen TNO Human Factors Soesterberg, the Netherlands david.vanleeuwen@tno.nl Niko Brümmer Spescom
More informationReal Time Analysis of VoIP System under Pervasive Environment through Spectral Parameters
Real Time Analysis of VoIP System under Pervasive Environment through Spectral Parameters Harjit Pal Singh Department of Physics Dr.B.R.Ambedkar National Institute of Technology Jalandhar, India Sarabjeet
More informationHISO 10049.1 Videoconferencing Interoperability Standard
HISO 10049.1 Videoconferencing Interoperability Standard Document information HISO 10049.1 Videoconferencing Interoperability Standard is a standard for the New Zealand health and disability sector. Published
More informationTECHNICAL SPECIFICATION FOR CORDLESS TELEPHONE SYSTEMS
TECHNICAL SPECIFICATION FOR CORDLESS TELEPHONE SYSTEMS Suruhanjaya Komunikasi dan Multimedia Malaysia Off Pesiaran Multimedia, 63000 Cyberjaya, Selangor Darul Ehsan, Malaysia Copyright of SKMM, 2007 FOREWORD
More informationImage Compression through DCT and Huffman Coding Technique
International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347 5161 2015 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Rahul
More informationIntroduction to Packet Voice Technologies and VoIP
Introduction to Packet Voice Technologies and VoIP Cisco Networking Academy Program Halmstad University Olga Torstensson 035-167575 olga.torstensson@ide.hh.se IP Telephony 1 Traditional Telephony 2 Basic
More informationSpeech Performance Solutions
Malden Electronics Speech Performance Solutions The REFERENCE for Speech Performance ssessment Speech Performance Solutions Issue 1.0 Malden Electronics Ltd. 2005 1 Product Overview Overview Malden Electronics
More informationVoice Encryption over GSM:
End-to to-end Voice Encryption over GSM: A Different Approach Wesley Tanner Nick Lane-Smith www. Keith Lareau About Us: Wesley Tanner - Systems Engineer for a Software-Defined Radio (SDRF) company - B.S.
More informationApplication Note. Using PESQ to Test a VoIP Network. Prepared by: Psytechnics Limited. 23 Museum Street Ipswich, Suffolk United Kingdom IP1 1HN
Using PESQ to Test a VoIP Network Application Note Prepared by: Psytechnics Limited 23 Museum Street Ipswich, Suffolk United Kingdom IP1 1HN t: +44 (0) 1473 261 800 f: +44 (0) 1473 261 880 e: info@psytechnics.com
More informationChapter 6 Bandwidth Utilization: Multiplexing and Spreading 6.1
Chapter 6 Bandwidth Utilization: Multiplexing and Spreading 6.1 Copyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Note Bandwidth utilization is the wise use of
More informationThe Effect of Network Cabling on Bit Error Rate Performance. By Paul Kish NORDX/CDT
The Effect of Network Cabling on Bit Error Rate Performance By Paul Kish NORDX/CDT Table of Contents Introduction... 2 Probability of Causing Errors... 3 Noise Sources Contributing to Errors... 4 Bit Error
More informationRevision of Lecture Eighteen
Revision of Lecture Eighteen Previous lecture has discussed equalisation using Viterbi algorithm: Note similarity with channel decoding using maximum likelihood sequence estimation principle It also discusses
More informationAN1200.04. Application Note: FCC Regulations for ISM Band Devices: 902-928 MHz. FCC Regulations for ISM Band Devices: 902-928 MHz
AN1200.04 Application Note: FCC Regulations for ISM Band Devices: Copyright Semtech 2006 1 of 15 www.semtech.com 1 Table of Contents 1 Table of Contents...2 1.1 Index of Figures...2 1.2 Index of Tables...2
More informationNetwork administrators must be aware that delay exists, and then design their network to bring end-to-end delay within acceptable limits.
Delay Need for a Delay Budget The end-to-end delay in a VoIP network is known as the delay budget. Network administrators must design a network to operate within an acceptable delay budget. This topic
More informationVEHICLE TRACKING USING ACOUSTIC AND VIDEO SENSORS
VEHICLE TRACKING USING ACOUSTIC AND VIDEO SENSORS Aswin C Sankaranayanan, Qinfen Zheng, Rama Chellappa University of Maryland College Park, MD - 277 {aswch, qinfen, rama}@cfar.umd.edu Volkan Cevher, James
More informationSecure-Access System via Fixed and Mobile Telephone Networks using Voice Biometrics
Secure-Access System via Fixed and Mobile Telephone Networks using Voice Biometrics Anastasis Kounoudes 1, Anixi Antonakoudi 1, Vasilis Kekatos 2 1 The Philips College, Computing and Information Systems
More informationAN-007 APPLICATION NOTE MEASURING MAXIMUM SUBWOOFER OUTPUT ACCORDING ANSI/CEA-2010 STANDARD INTRODUCTION CEA-2010 (ANSI) TEST PROCEDURE
AUDIOMATICA AN-007 APPLICATION NOTE MEASURING MAXIMUM SUBWOOFER OUTPUT ACCORDING ANSI/CEA-2010 STANDARD by Daniele Ponteggia - dp@audiomatica.com INTRODUCTION The Consumer Electronics Association (CEA),
More information8. Cellular Systems. 1. Bell System Technical Journal, Vol. 58, no. 1, Jan 1979. 2. R. Steele, Mobile Communications, Pentech House, 1992.
8. Cellular Systems References 1. Bell System Technical Journal, Vol. 58, no. 1, Jan 1979. 2. R. Steele, Mobile Communications, Pentech House, 1992. 3. G. Calhoun, Digital Cellular Radio, Artech House,
More informationActive Monitoring of Voice over IP Services with Malden
Active Monitoring of Voice over IP Services with Malden Introduction Active Monitoring describes the process of evaluating telecommunications system performance with intrusive tests. It differs from passive
More informationADVANCED APPLICATIONS OF ELECTRICAL ENGINEERING
Development of a Software Tool for Performance Evaluation of MIMO OFDM Alamouti using a didactical Approach as a Educational and Research support in Wireless Communications JOSE CORDOVA, REBECA ESTRADA
More informationProbability and Random Variables. Generation of random variables (r.v.)
Probability and Random Variables Method for generating random variables with a specified probability distribution function. Gaussian And Markov Processes Characterization of Stationary Random Process Linearly
More informationADAPTIVE AND ONLINE SPEAKER DIARIZATION FOR MEETING DATA. Multimedia Communications Department, EURECOM, Sophia Antipolis, France 2
3rd European ignal Processing Conference (EUIPCO) ADAPTIVE AND ONLINE PEAKER DIARIZATION FOR MEETING DATA Giovanni oldi, Christophe Beaugeant and Nicholas Evans Multimedia Communications Department, EURECOM,
More informationWhite Paper. D-Link International Tel: (65) 6774 6233, Fax: (65) 6774 6322. E-mail: info@dlink.com.sg; Web: http://www.dlink-intl.
Introduction to Voice over Wireless LAN (VoWLAN) White Paper D-Link International Tel: (65) 6774 6233, Fax: (65) 6774 6322. Introduction Voice over Wireless LAN (VoWLAN) is a technology involving the use
More information