A TOOL FOR TEACHING LINEAR PREDICTIVE CODING



Similar documents
Simple Voice over IP (VoIP) Implementation

A Comparison of Speech Coding Algorithms ADPCM vs CELP. Shannon Wichman

Broadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29.

Digital Speech Coding

Linear Predictive Coding

VoIP Technologies Lecturer : Dr. Ala Khalifeh Lecture 4 : Voice codecs (Cont.)

Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations

Voice over IP Protocols And Compression Algorithms

Packet Loss Concealment Algorithm for VoIP Transmission in Unreliable Networks

MPEG Unified Speech and Audio Coding Enabling Efficient Coding of both Speech and Music

STUDY OF MUTUAL INFORMATION IN PERCEPTUAL CODING WITH APPLICATION FOR LOW BIT-RATE COMPRESSION

Voice Encoding Methods for Digital Wireless Communications Systems

From Concept to Production in Secure Voice Communications

Network administrators must be aware that delay exists, and then design their network to bring end-to-end delay within acceptable limits.

AC : MULTIMEDIA SYSTEMS EDUCATION INNOVATIONS I: SPEECH

Speech Compression. 2.1 Introduction

Thirukkural - A Text-to-Speech Synthesis System

Basic principles of Voice over IP

Introduction and Comparison of Common Videoconferencing Audio Protocols I. Digital Audio Principles

LOW COST HARDWARE IMPLEMENTATION FOR DIGITAL HEARING AID USING

DSAP - Digital Speech and Audio Processing

The Optimization of Parameters Configuration for AMR Codec in Mobile Networks

Advanced Speech-Audio Processing in Mobile Phones and Hearing Aids

Perceived Speech Quality Prediction for Voice over IP-based Networks

Log-Likelihood Ratio-based Relay Selection Algorithm in Wireless Network

How To Recognize Voice Over Ip On Pc Or Mac Or Ip On A Pc Or Ip (Ip) On A Microsoft Computer Or Ip Computer On A Mac Or Mac (Ip Or Ip) On An Ip Computer Or Mac Computer On An Mp3

Performance Analysis of Interleaving Scheme in Wideband VoIP System under Different Strategic Conditions

How To Understand The Quality Of A Wireless Voice Communication

L9: Cepstral analysis

AUDIO CODING: BASICS AND STATE OF THE ART

Research Report. By An T. Le Oct 2005 Supervisor: Prof. Ravi Sankar, Ph.D. Oct 28, 2005 An T. Le -USF- ICONS group - SUS ans VoIP 1

Emotion Detection from Speech

Lecture 1-6: Noise and Filters

DeNoiser Plug-In. for USER S MANUAL

An Analysis of Error Handling Techniques in Voice over IP

APPLYING MFCC-BASED AUTOMATIC SPEAKER RECOGNITION TO GSM AND FORENSIC DATA

Tutorial about the VQR (Voice Quality Restoration) technology

Appendix C GSM System and Modulation Description

FPGA implementation of Voice-over IP

Doppler Effect Plug-in in Music Production and Engineering

IBM Research Report. CSR: Speaker Recognition from Compressed VoIP Packet Stream

School Class Monitoring System Based on Audio Signal Processing

Web-Conferencing System SAViiMeeting

Goal We want to know. Introduction. What is VoIP? Carrier Grade VoIP. What is Meant by Carrier-Grade? What is Meant by VoIP? Why VoIP?

An Arabic Text-To-Speech System Based on Artificial Neural Networks

MUSICAL INSTRUMENT FAMILY CLASSIFICATION

Audio Coding Algorithm for One-Segment Broadcasting

Quarterly Progress and Status Report. Measuring inharmonicity through pitch extraction

Department of Electrical and Computer Engineering Ben-Gurion University of the Negev. LAB 1 - Introduction to USRP

SUMMARY TABLE VOLUNTARY PRODUCT ACCESSIBILITY TEMPLATE

Simulative Investigation of QoS parameters for VoIP over WiMAX networks

Figure1. Acoustic feedback in packet based video conferencing system

ANALYSIS OF VOICE OVER IP DURING VERTICAL HANDOVERS IN HETEROGENEOUS WIRELESS AND MOBILE NETWORKS

ARIB STD-T V Codec for Enhanced Voice Services (EVS); Voice Activity Detection (VAD) (Release 12)

The effect of mismatched recording conditions on human and automatic speaker recognition in forensic applications

Ericsson T18s Voice Dialing Simulator

QoS issues in Voice over IP

Subjective SNR measure for quality assessment of. speech coders \A cross language study

RF Network Analyzer Basics

ANALYSIS OF LONG DISTANCE 3-WAY CONFERENCE CALLING WITH VOIP

Convention Paper Presented at the 112th Convention 2002 May Munich, Germany

Synchronization of sampling in distributed signal processing systems

Lecture 1-10: Spectrograms

A WEB BASED TRAINING MODULE FOR TEACHING DIGITAL COMMUNICATIONS

Audio Engineering Society. Convention Paper. Presented at the 129th Convention 2010 November 4 7 San Francisco, CA, USA

ETSI TS V1.1.1 ( )

GSM speech coding. Wolfgang Leister Forelesning INF 5080 Vårsemester Norsk Regnesentral

Implementing an In-Service, Non- Intrusive Measurement Device in Telecommunication Networks Using the TMS320C31

Speech Signal Processing: An Overview

A Sound Analysis and Synthesis System for Generating an Instrumental Piri Song

Case Study: Real-Time Video Quality Monitoring Explored

Monitoring VoIP Call Quality Using Improved Simplified E-model

Automatic Detection of Emergency Vehicles for Hearing Impaired Drivers

GiLBCSteg: VoIP Steganography Utilizing the Internet Low Bit-Rate Codec

Understanding CIC Compensation Filters

Adaptive Equalization of binary encoded signals Using LMS Algorithm

Challenges and Solutions in VoIP

Study and Implementation of Video Compression Standards (H.264/AVC and Dirac)

Performance Evaluation of VoIP in Different Settings

Objective Speech Quality Measures for Internet Telephony

Final Year Project Progress Report. Frequency-Domain Adaptive Filtering. Myles Friel. Supervisor: Dr.Edward Jones

Speech: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction

SIMPLIFIED PERFORMANCE MODEL FOR HYBRID WIND DIESEL SYSTEMS. J. F. MANWELL, J. G. McGOWAN and U. ABDULWAHID

Image Compression through DCT and Huffman Coding Technique

Analog-to-Digital Voice Encoding

Solutions to Exam in Speech Signal Processing EN2300

JOURNAL OF TELECOMMUNICATIONS, VOLUME 2, ISSUE 2, MAY Simulink based VoIP Analysis. Hardeep Singh Dalhio, Jasvir Singh, M.

Basics of Digital Recording

Dr. Abdel Aziz Hussein Lecturer of Physiology Mansoura Faculty of Medicine

Transcription:

A TOOL FOR TEACHING LINEAR PREDICTIVE CODING Branislav Gerazov 1, Venceslav Kafedziski 2, Goce Shutinoski 1 1) Department of Electronics, 2) Department of Telecommunications Faculty of Electrical Engineering and Information Technologies, University of Ss. Cyril and Methodius, Karpos II B.B., 1000 Skopje, Republic of Macedonia, phone: +389 2 3062 224, E-mail: {gerazov ; kafedzi; sugo}@feit.ukim.edu.mk We introduce an educational tool for teaching undergraduate students Linear Predictive Coding (LPC). The tool is a fully functional coder-decoder with an intuitive graphical user interface. Students can easily change the parameters involved in LPC and hear the results immediately. Plots of some of the parameters are also given to further illustrate the encoding/decoding process. The inclusion of the tool in laboratory classes resulted in an increased interest for the subject of LPC, providing students with hands-on experience by tuning the LPC codec. 1. INTRODUCTION Keywords: Linear Predictive Coding, Audio, Education, Laboratory Linear Predictive Coding (LPC) is one of the most important concepts in the coding, analysis and synthesis of digital speech. The technology offers high quality low bit-rate encoding and is in use in many current audio transmission / storage systems, most notably the GSM (Groupe Spécial Mobile) system. As such it has a special place in teaching digital speech and audio processing, [1-4]. It is unfortunate that the interest in electrical engineering education is slowly declining. It is necessary for the courses to adapt to this new situation and use new teaching methodologies, [5]. The students nowadays are largely computer literate, graphically oriented, possess refined hand-to-eye coordination, and, most importantly, are used to immediate feedback or results, [6]. This is why, in order to capture their imagination, modern courses must be accompanied with more attractive educational tools. This trend has led to creation of many web based DSP materials such as tutorials, assignments and web-based tools for courses in electrical engineering, [7-9]. In this paper we present an educational tool in the form of a codec, developed for teaching LPC to undergraduate and graduate students in courses dealing with the processing of digital audio and multimedia signals. Similar work has been carried out earlier as presented in [10], where most LPC-based codecs were implemented in Matlab for educational purposes. We focused our work on a single LPC-10 based codec tool with an intuitive graphical user interface (GUI) that was especially adapted for students at the undergraduate level. With the developed GUI, students can easily modify various parameters of the LPC process, and identify their influence on the speech quality. In this way students can get a hands-on experience and undertsand the significance of each parameter used in LPC. 175

The developed tool was used in laboratory exercises for the course Digital Audio Systems and the course Processing and Transmission of Multimedia Signals, and was well received by the students. It was of great value in illustrating the concepts and the particularities of LPC. In Section 2 we give a short description of linear predictive coding. In Section 3, we introduce our codec tool and in Section 4 we describe its use in a typical laboratory exercise. We present student feedback in Section 5 and conclude the paper in Section 6. 2. LPC THE BASICS LPC is based on the source-filter model of human speech production. In this model the generation of speech is represented with a system consisting of a source and a filter which correspond to the vocal cords and the vocal tract, respectively. The source operates in two modes. In the first mode an impulse train is generated, mimicking the pulse excitation from the vocal cords. In the second mode white Gaussian noise is generated, mimicking the noise-like excitation used by humans to generate unvoiced speech. The filter models the transfer function (impulse response) of the vocal tract. Its role is to shape the signal generated by the source and give it phonetic character. An autoregressive model is used for this modeling. The filter parameters are extracted from the speech input by the use of linear predictive analysis, [1]. This gives the coding scheme its name linear predictive coding. The speech production model is shown in Fig. 1. Fig. 1 The speech production model used in LPC Using the source-filter model, speech can be transmitted/stored without the use of audio samples, but instead by using the parameters of the model computed on a block (frame) basis. This way, a close approximation of the speech signal can be generated at the decoder using the transmitted/stored parameters. The input speech is analyzed in frames. For each frame the following set of parameters is extracted: Voicing: to discriminate whether the speech signal is voiced or unvoiced Gain: the energy level of the signal in the frame Filter coefficients: refer to the transfer function of the vocal tract Pitch period: base period of voiced frames 176

The main disadvantage of using this simple model of human speech production is the inappropriate representation of phones and inter-phone transitions that cannot be characterized as purely voiced or unvoiced, [2]. This disadvantage has been overcome by more sophisticated models, namely CELP (Code-Exited Linear Prediction) which is used in GSM telephony. The CELP model uses an excitation codebook in place of the simple voiced/unvoiced source model. 3. THE LPC-10 CODEC TOOL At the core of the tool is a modified version of the original Federal Standard (FS) 1015 coder [11]. It is a low bit-rate 2.4 kb/s LP-based codec developed originally by the US military and subsequently by NATO. The output of the coder is notably synthetic, however its intelligibility is excellent. Because of the use of a filter of order 10, the codec is also known as LPC-10. The codec has been adapted for use by students during laboratory classes. The parameters of the model can be easily changed and experimented with. This can be done on two levels. The first level is to change them in the developed GUI. The second level is to change them in the source code itself. For laboratory exercises only the first level is used, while the second level is more suitable to projects or graduate level courses. The GUI is presented in Fig. 2. There are three main fields in the GUI. The first contains text boxes in which the student can input the parameters of the codec. The second field contains switches which control the algorithms used in the coding process. The third field contains the load, execute, plot and play buttons. II I III Fig. 2 The Graphical User Interface of the LPC-10 177

4. TYPICAL LABORATORY CLASS The laboratory class was organized as follows. First, students start Matlab and the LPC-10 codec tool. Then, they load one of the audio files and perform the encoding / decoding. They listen to the decoded output and compare it with the original, trying to identify the artifacts. They also view the plots of the original and synthesized speech (Fig.3), comparing the amplitude envelopes of the two signals. Fig. 3 Plots of the original and the synthesized speech signal with parameters of the LP analysis 4.1 Changing the filter order In the next stage students change the filter order. From the default value of 4, they increase the filter order of the unvoiced parts to 10, and try to discern improvement in the output quality. After this they decrease the filter order to 1 to figure out if there is any impact. Increasing filter order bears little improvement in quality, while decreasing it to 1 results in noticeable degradation. For the voiced filter order they use the following values: 1, 2, 4, 6, the default value of 10, and 20. The first three values introduce intelligibility slowly in the resulting speech. 4.2 Changing frame and analysis window size In this part students increase the frame size from 22.5 ms to 100 ms and listen how the speech becomes smudged and phones merge with each other. The decrease of frame size to 5 ms, on the other hand, results in an increased quality, joined by an increase in bit-rate. Increasing the analysis window size by choosing the overlap to be 0.9, instead of the default 0.5, has similar effects on the speech signal as increasing the frame size (phone merging). 4.3 Adjusting voicing determination The most illustrative part for the students is to see how the voicing determinaton works. Two algorithms can be used for this. The first one uses the ratio of the pitch autocorrelation peak with the power of the speech signal over the current frame. The 178

second algorithm uses the value of the first PARCOR (Partial Correlation) coefficient k 1. Each algorithm compares uses a decision threshold. Figure 4 shows the effects of threshold variation for the first algorithm in the range from 0.3 to 0.5 for the utterance makedonski. a) b) c) Fig. 4 Influence of the threshold on the voicing decision when using the autocorrelation algorithm: a) 0.3; b) 0.4; c) 0.5. It can be seen that when the threshold is 0.3 there is a large error around 0.8 s into the signal, where the phone s is determined as voiced. The problem is eliminated when using a threshold of 0.4. Still, the second phone k ( makedonski ) is determined as voiced. Increase of the threshold to 0.5 will eliminate this problem as well. Extremely low and extremely high thresholds are also tried, as well as the use of different voice types: female, child, elderly, and a music passage. A key point is to hear that the whole LPC-10 concept breaks down when trying to encode music. 5. STUDENT FEEDBACK Students were enthusiastic during the laboratory classes. What was most entertaining to them was decreasing the voicing detection threshold to an extreme value, e.g. to 0.01. This forces the speech to be processed and synthesized as entirely voiceless, i.e. with a whisper. Another interesting effect was doing the opposite, forcing the entire speech to be synthesized as voiced. It was obvious that they wanted to continue to experiment on their own and they were encouraged to take the source code home and try it with their own recordings. 179

The increase in students interest for the field was evident as a result of using our tool. This is a definite pointer that such tools should be included in the exercises in the areas of digital audio and multimedia signal processing. 6. CONCLUSION LPC is a very important and powerful concept in the analysis and synthesis of digital speech. It is an important part of digital audio processing and multimedia signal processing courses for undergraduate students. The introduction of the LPC-10 codec in the laboratory classes has helped the students grasp the particularities of LPC in an intuitive way. It has also increased their interest in this area to the extent that they start experimenting on their own. The LPC-10 codec can be further improved. A graduate level task might be to implement the codec according to the exact specifications of the FS-1015 standard, and, also, implement a CELP coder at a following stage. Further development of similar tools for other areas in digital audio processing and multimedia signal processing is envisioned. 7. REFERENCES [1] Rabiner L.R., R.W. Schafer, Digital Processing of Speech Signals, Prentice Hall, 1978 [2] Chu W., Speech Coding Algorithms - foundation and evolution of of standardized coders, Wiley, 2003 [3] Spanias A., T. Painter, V. Atti, Audio Signal Processing and Coding, John Wiley & Sons, 2007 [4] Sayood К., Introduction to Data Compression, Second Edition, Morgan Kaufmann, 2000 [5] Taskovski D., V. Kitanovski, S. Bogdanov, Developing Collaborative Learning Model in DSP Courses, 6th EURASIP Conference focused on Speech and Image Processing, Multimedia Communications and Services, June 2007 [6] Morrow M., Welch T., Wright C., An Inexpensive Software Tool For Teaching Real-Time DSP, First Signal Processing Education Workshop, Hunt, Texas, 2000 [7] Taskovski D., V. Kitanovski, S. Bogdanova, Z. Cucej, New web-based tools for DSP education, Int. Conf. on Interactive Computer Aided Blended Learning, Brazil, May, 2007 [8] Marín S., F. García, R. Torres, S. Vázquez, A. Moreno, Implementation of a Web-Based Educational Tool for DSP Teaching Using thetechnological Acceptance Model, IEEE Transactions on Education, 2005 [9] Rahkila M., M. Karjalainen, Considerations of computer based education in acoustics and signalprocessing, Frontiers in Education Conference, 1998 [10] Painter E., A. Spanias, A Matlab Software Tool For The Introduction Of Speech Coding Fundamentals In A DSP Course, Frontiers in Education Conference '96, Nov. 1996 [11] Tremain T. E., The Government Standard Linear Predictive Coding Algorithm,LPC-10, Speech Technology, April. 1982, pp. 40 49 180