International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT) - 2016

A Novel Approach for Voice Quality Enhancement for VoIP Applications Using DSP Processor

T. Madhavi, Asst. Prof., ECE Dept., CMREC, tmadhavi2@yahoo.com
Dr. B. Hari Krishna, Prof., ECE Dept., CMREC, harikrishna7@gmail.com
L. Phani Kanth, Dy. Manager, Hindustan Aeronautics Ltd, Hyderabad, phanikanth2@yahoo.com

Abstract

Voice quality enhancement is a major concern in telecommunications. In this paper a novel approach is proposed for background noise reduction (BGNR) and acoustic echo cancellation (AEC) on real-time speech for VoIP applications. The aim is to improve speech intelligibility and reduce listening effort. Voice quality enhancement is achieved by removing unwanted echo and noise from the signal. BGNR is achieved with an adaptive noise cancellation method based on the LMS algorithm. Acoustic echo results from poor decoupling between the microphone and speaker in handset and hands-free devices; it is cancelled by adaptive AEC (AAEC) using the normalized LMS (NLMS) algorithm, implemented on a dedicated ADSP BF533 EZ-KIT processor after satisfactory results in MATLAB simulation. We achieved a 21.52 dB reduction in background noise and a 10.57 dB reduction in acoustic echo within the permissible speech delay of 2 ms defined by ITU-T standards. The approach is inexpensive and therefore accessible to small entrepreneurs.

Key words: BGNR, AEC, adaptive filter, LMS, NLMS

I. INTRODUCTION

Voice quality enhancement (VQE) techniques are used in telecommunication system design to improve speech intelligibility and reduce listening effort, and so deliver much better speech quality. VQE is achieved by removing unwanted background noise (BGN) and acoustic echo from the signal. Besides enhancing speech quality, the same VQE building blocks can be applied to other speech-related applications, for example speech recording and recognition.
VQE is becoming increasingly important because the telecommunication landscape continues to change and grow as the customer base looks for new ways to communicate. Much to the annoyance of some, droves of people now carry on conversations in trains, planes and automobiles; while walking on city streets and eating at restaurants; stretched out on the beach and sprawled in public parks. Social implications aside, those environments are not ideal for quiet and clear communication. At the same time, the telecommunication infrastructure itself is evolving: more voice traffic is moving from traditional circuit-switched networks to packet-switched networks. Although technology has opened up many new communication options, in many ways it must improve to meet the new demands.

As telephone users, we are accustomed to excellent speech quality on the traditional telephone network. On the other hand, we have had to accept far lower speech quality when using some of the newer technologies such as cell phones. VQE is the umbrella term for the technology intended to improve speech quality in both the newer and the traditional telecommunication systems.

The main impairments are noise and acoustic echo. Ideally, you want to listen to a speaker without any background noise, a condition that seems unlikely in an age of rising cell phone use in noisy environments. In an office, noise comes from office equipment, forced-air heating, air conditioning and other conversations. At home, the main causes are appliances, running water, audio/video gear and air conditioning. Acoustic echo is caused by a mismatch between the speaker and the microphone in a telephone handset, conference phone or hands-free phone.

An embedded kit that suppresses background noise and acoustic echo would be a practical solution. The aim of this work is to enhance voice quality by cancelling background noise and acoustic echo using DSP processors. The paper is organized as follows.
Section II describes the overall implementation of the method. Section III describes the implementation using a DSP processor, followed by results and discussions.

II. IMPLEMENTATION APPROACH

To enhance speech quality, the speech must be processed before it is sent for transmission as well as on reception. In this process the following major problems need to be controlled.

A. Problems

1) Background noise: Background noise is wide in scope, but in closed-room environments it is quite confined. In call centers and many office environments, soft-phone speech suffers from air-conditioner and fan noise (the major and most common background noises). These noises may not be strong in energy compared with speech, which is picked up close to the speaker and is therefore strong. In general, background noise is most prominent when speech is not present. As far as quality of service is concerned, it is very much required to eliminate this background noise. A hardware modification such as DSP-based headphones can be a good solution. The noise reduction should be applied to the speech before it is sent to the CODEC.

2) Acoustic echo: Acoustic echo is created by the loudspeaker in a phone. The sound comes out of it, bounces off the walls, ceiling and other objects in the room, and returns to the phone's microphone. The same thing happens not only in buildings but also in cars, and anywhere else the sound from the loudspeaker can be reflected to the microphone. This includes the phone's case, because the sound usually travels from the speaker to the microphone inside a hands-free phone. Similarly, if there is bad acoustic decoupling between the microphone and the earpiece in the handset, acoustic echo will exist in the handset, whether it is a regular or a cellular phone [2].

The implementation approach is shown in Fig. 1.

Fig. 1. Block diagram of implementation approach (labels: near-end speech with background noise, A/D converter, adaptive BGNR block, acoustic echo cancellation, D/A converter, Tx; far-end speech, A/D converter, D/A converter)

978-1-4673-9939-5/16/$31.00 ©2016 IEEE
B. Solution Approach

Because a DSP environment is effective and hardware specific, it is feasible to implement BGN and acoustic echo cancellation on the ADSP BF533 EZ-KIT. The BGN and AEC cancellation systems proposed in this paper [2] are designed and implemented on the ADSP BF533 EZ-KIT [5][7]. First, the speech is converted into digital format by an A/D converter. This digital information is passed to the DSP processor, where our algorithms run efficiently, to attain good speech quality.

The BGN and AEC cancellation system described in this paper is designed for use within video and teleconferencing systems in moderate-size corporate conference rooms. Accordingly, an 8 kHz sampling rate was selected in order to comply with contemporary telephony standards. With a sampling rate of 8 kHz, it was determined that the adaptive filter must have at least 4 filter coefficients to effectively cancel the direct-field echo.

1) ABGNR (Adaptive Background Noise Reduction) block: For background noise cancellation we consider the adaptive noise cancellation method [4], which is one of the best methods of background noise cancellation provided a dedicated processor is available to carry out the many calculations the algorithm requires. With this method, background noise during speech can be eliminated effectively. For this purpose a normal FIR filter with M weights is used [2], [3], [4]. These filter weights are updated until the error settles at its minimum, which is treated as convergence. Noise cancellation using an adaptive filter is shown in Fig. 2.

In this method a noise reference signal is also taken so that the adaptive filter can learn from it. First, an FIR filter with M taps is defined, which takes the background noise reference signal as its learning input. The adaptive filter starts learning and generates estimated noise signals. The estimated filter output is subtracted from the desired signal (speech with background noise), which yields the error signal e[n].
This e[n] signal is given as input to the adaptive algorithm, as shown in Fig. 2. The adaptive algorithm generates fresh filter weights, which are a function of e[n]. After some iterations, these adaptively updated weights produce a good estimate of the noise signal.

In essence, an FIR filter is no more than a tapped delay line. As a result, the target DSP can implement an NLMS adaptive filter [1], [6], [9] by allocating a large array to store previous values of the input signal in the delay line (buffering). Instead of requiring the target DSP to shift, or copy, every value stored in the delay line each time a new value is inserted, two pointers are used to track the position of data in the delay line. On every processing iteration the oldest member of the array is replaced by the current value of the input signal, and the pointers are updated to match the latest configuration of the array [4], [8]. When the pointer reaches the end of the large array, all the numbers in the filter delay line are shifted to the start of the array. The estimated background noise, which is close to the real-time noise, is subtracted from the original signal, so that the resulting signal is free of background noise.

2) AAEC (Adaptive Acoustic Echo Cancellation) block: AAEC is realized with an adaptive, or learning, filter [4] which learns from the given microphone and speaker (far-end) signals. After learning, the filter can calculate an estimated microphone signal from the loudspeaker signal. This estimated microphone signal is subtracted from the real microphone signal. The difference signal no longer contains the loudspeaker signal, so the feedback loop is broken, as shown in Fig. 3. The adaptive filter takes the far-end speech signal as its training input. The filter weights are updated using the adaptive algorithm, which is a function of the error signal and the input far-end speech values [9]. The estimated far-end speech is then subtracted from the echo-affected near-end speech (NES).
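The adaptive filter described above, which serves both the ABGNR and AAEC blocks, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' BF533 code: the class name NlmsFilter, the step size of 0.1, the regularization term eps, the 8-tap length and the synthetic noise path are hypothetical choices made for the example, and a simple wrap-around index stands in for the two-pointer delay line described in the text.

```python
import math
import random


class NlmsFilter:
    """FIR adaptive filter with NLMS weight update (illustrative sketch)."""

    def __init__(self, num_taps, step_size=0.1, eps=1e-6):
        self.w = [0.0] * num_taps      # adaptive filter weights
        self.buf = [0.0] * num_taps    # delay line, used as a circular buffer
        self.pos = 0                   # slot that holds the oldest sample
        self.mu = step_size            # assumed step size
        self.eps = eps                 # avoids division by zero

    def process(self, x, d):
        """x: reference sample, d: desired (primary) sample. Returns e[n]."""
        m = len(self.w)
        self.buf[self.pos] = x         # overwrite the oldest sample
        y = 0.0                        # filter output, the estimate of the noise
        energy = 0.0                   # input power used for normalization
        idx = self.pos
        for k in range(m):             # walk from newest to oldest sample
            xv = self.buf[idx]
            y += self.w[k] * xv
            energy += xv * xv
            idx = idx - 1 if idx > 0 else m - 1
        e = d - y                      # error signal e[n]
        gain = self.mu * e / (self.eps + energy)
        idx = self.pos
        for k in range(m):             # NLMS weight update
            self.w[k] += gain * self.buf[idx]
            idx = idx - 1 if idx > 0 else m - 1
        self.pos = (self.pos + 1) % m  # advance to the next oldest slot
        return e


def run_noise_cancel_demo(num_samples=4000, seed=0):
    """Cancel synthetic noise from a 440 Hz tone sampled at 8 kHz."""
    rng = random.Random(seed)
    noise_path = [0.6, -0.3, 0.1]      # hypothetical path from reference to microphone
    filt = NlmsFilter(num_taps=8)
    history = [0.0] * len(noise_path)
    tail = 1000
    squared_error = 0.0
    for n in range(num_samples):
        x = rng.gauss(0.0, 1.0)        # noise reference
        history = [x] + history[:-1]
        noise = sum(h * v for h, v in zip(noise_path, history))
        s = 0.1 * math.sin(2 * math.pi * 440 * n / 8000)
        e = filt.process(x, s + noise)
        if n >= num_samples - tail:    # measure after convergence
            squared_error += (e - s) ** 2
    return squared_error / tail, filt.w


if __name__ == "__main__":
    residual, weights = run_noise_cancel_demo()
    print("residual noise power:", residual)
    print("first three weights:", [round(v, 3) for v in weights[:3]])
```

For echo cancellation the same process() call would take the far-end sample as x and the microphone sample as d, so that the returned error is the echo-reduced near-end signal.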
Fig. 2. Block diagram of ABGNR method (labels: near-end speech with background noise, background noise reference signal, adaptive filter, adaptive algorithm, speech without background noise)

Fig. 3. Block diagram of AAEC method (labels: far-end speech, adaptive filter output y(n), NLMS algorithm, near-end speech plus echo d(n), echo-free speech e(n))

III. IMPLEMENTATION OF ABGNR AND AAEC APPROACHES USING DSP PROCESSOR

The convergence of the adaptive filter depends on the efficiency of the algorithm being used. For this purpose a suitable high-speed ADSP BF533 (Blackfin) processor is chosen, shown in Fig. 4.

Fig. 4. ADSP Blackfin 533 EZ-KIT
While the implementation of the adaptive filter is quite simple, the realization of real-time input and output (I/O) on the target DSP is significantly more complex. The ADSP BF533 DSP supports one audio line-in channel and one audio line-out channel. In general, the target DSP accesses these channels through the on-board multichannel buffered serial port (McBSP). The serial ports of the DSP are connected to the audio line-in and line-out channels through the target A/D and D/A converters, respectively.

The I/O routine initializes two buffers for input data within the McBSP. However, the McBSP is instructed to fill only one buffer with input data at a time. When one input buffer is filled, the McBSP issues a software interrupt. This interrupt triggers an interrupt handler that reads a word from the McBSP register. While the software handles the interrupt, the second input buffer in the McBSP is instructed to begin collecting input data. The DSP BIOS and pipe library use a similar routine to write output data to the McBSP output registers. In effect, the high-level pipe library uses direct memory access (DMA) functionality and interrupt handlers to form an interface with the low-level I/O hardware.

One difficulty encountered in implementing the echo cancellation system on the target DSP is that the EZ-KIT supports only one input channel and one output channel. However, because the I/O channels are 32-bit stereo channels, it is possible to split the input channel into two 16-bit mono channels and to split the output channel into two 16-bit output channels. During I/O, the pipe library returns an array of 32-bit words. The low 16 bits of each word constitute one mono channel, while the high 16 bits constitute the second channel. Therefore, each input buffer is processed to separate the two input channels and store them in floating-point arrays for processing.
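The channel handling described above can be illustrated as follows. This is a sketch only: the function names, the signed 16-bit interpretation of each half-word and the scaling by 32768 to reach floating point are assumptions made for illustration, and none of them is taken from the BF533 I/O library.

```python
def to_signed16(value):
    """Interpret a 16-bit field as a two's-complement signed integer."""
    return value - 0x10000 if value & 0x8000 else value


def split_stereo_word(word):
    """Split one 32-bit stereo word into (low channel, high channel)."""
    low = word & 0xFFFF            # low 16 bits: first mono channel
    high = (word >> 16) & 0xFFFF   # high 16 bits: second mono channel
    return to_signed16(low), to_signed16(high)


def pack_stereo_word(low, high):
    """Recombine two signed 16-bit samples into one 32-bit word."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


def to_float(sample):
    """Scale a signed 16-bit sample to the range [-1.0, 1.0)."""
    return sample / 32768.0


def from_float(value):
    """Convert a float back to a signed 16-bit sample, clamping on overflow."""
    scaled = int(round(value * 32768.0))
    return max(-32768, min(32767, scaled))


def split_buffer(words):
    """Separate an input buffer into two floating-point channel arrays."""
    low_channel, high_channel = [], []
    for word in words:
        low, high = split_stereo_word(word)
        low_channel.append(to_float(low))
        high_channel.append(to_float(high))
    return low_channel, high_channel


if __name__ == "__main__":
    reference, primary = split_buffer([0x40008000, 0x0000FFFF])
    print(reference, primary)
```

Keeping the two channels in one word means the reference signal and the primary signal arrive in the same buffer and stay sample-aligned, which is what the adaptive filter requires.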
After processing is complete, the outputs are recombined into one 32-bit word, stored in an array, and eventually sent to the output pipe.

1) Implementation algorithm for ABGNR: The implementation algorithm flow diagram is shown in Fig. 5.

Fig. 5. Flow diagram of background noise cancellation using ADSP BF533 (steps: noisy speech and noise reference; analog-to-digital conversion; buffering of speech samples; updating of filter weights of adaptive filter with NLMS algorithm; subtraction of estimated noise from speech; digital-to-analog conversion; noise-free speech)

For testing and demonstration purposes, it is necessary to interface the ADSP BF533 with the host computer. Analog Devices supplies the Real-Time Data Exchange (RTDX) utility, as shown in Figs. 4 and 5, to facilitate communication between the host computer and the target board. RTDX provides developers with the ability to transfer data samples from the target device to a host computer and vice versa without interfering with the target application. Data transferred from the target device can then be analyzed on the host computer using any Component Object Model (COM) client.

2) Implementation algorithm for AAEC: The implementation algorithm flow diagram for AAEC is shown in Fig. 6.

Fig. 6. Flow diagram of acoustic echo cancellation using ADSP BF533 (steps: far-end speech and speech plus echo; analog-to-digital conversion; speech-plus-echo samples; updating of filter weights of adaptive filter with NLMS algorithm; double-talk detector; subtraction of estimated noise from speech-plus-echo samples; digital-to-analog conversion; echo-cancelled speech)

The ABGNR and AAEC approaches are carried out on the ADSP BF533 processor after satisfactory results in simulation using MATLAB.

IV. RESULTS AND DISCUSSIONS

A. FINDINGS

The background noise reduction algorithm presented here is evaluated in the presence of various types of background noise. The results obtained are as follows.

1) Presence of humming sounds from fan: The humming sounds produced by a fan serve as the source of background noise during conversation over soft phones.

Fig. 7. Speech with fan in background
Fig. 7 shows the speech samples affected by noise generated by the fan in the background. The figure also shows an extended view of the samples affected by noise. The results obtained after processing the speech samples through the background noise reduction algorithm are presented in Fig. 8, with an expanded view of the speech samples showing the effectiveness of the algorithm in reducing background noise. Fig. 9 shows the comparison between the amplitudes of the speech samples with noise and after noise reduction. The overall reduction in noise is quantified in Table 1.

Fig. 8. Speech with fan in background

Fig. 9. Amplitude comparison of samples for fan (axes: sample amplitude versus samples; series: sample with fan, sample without fan)

Table 1. Reduction in noise level from fan in background
  Average signal level with noise (dB):     -59.78
  Average signal level without noise (dB):  -66.66
  Reduction in noise level (dB):              6.88

The noise level is reduced by 6.88 dB.

2) Presence of talking people in room: The interfering voices generated by several people talking among themselves are one of the most common sources of background noise, so the algorithm must be evaluated under these conditions. Fig. 10 shows the speech samples affected by noise generated by people talking in the background. The figure also shows an extended view of the samples affected by noise. The results obtained after processing the speech samples through the background noise reduction algorithm are presented in Fig. 11, with an expanded view of the speech samples showing the effectiveness of the algorithm in reducing background noise. Fig. 12 shows the comparison between the amplitudes of the speech samples with noise and after noise reduction. The overall reduction in noise is quantified in Table 2.

Fig. 10. Speech with talking people in background

Fig. 11. Speech with talking people in background eliminated

Fig. 12. Amplitude comparison of samples for talking people (axes: amplitude versus samples; series: samples with talking people, samples without talking people)

Table 2. Reduction in noise level from talking people in background
  Average signal level with noise (dB):     -77.26
  Average signal level without noise (dB):  -98.78
  Reduction in noise level (dB):             21.52

The noise level is reduced by 21.52 dB.
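The reduction figures in the tables are simply the difference between the two average levels in dB. The sketch below reproduces that arithmetic for the fan and talking-people cases. The helper average_level_db, which takes the level as 20·log10 of the RMS amplitude relative to full scale, is an assumption, since the paper does not state how the average levels were measured.

```python
import math


def average_level_db(samples):
    """Average level in dB relative to full scale (assumed: 20*log10 of RMS)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return float("-inf")       # silence has no finite level in dB
    return 20.0 * math.log10(rms)


def reduction_db(level_with_noise, level_without_noise):
    """Reduction in level: how far the processed signal sits below the noisy one."""
    return level_with_noise - level_without_noise


if __name__ == "__main__":
    # Values taken from Table 1 (fan) and Table 2 (talking people).
    print(round(reduction_db(-59.78, -66.66), 2))   # 6.88
    print(round(reduction_db(-77.26, -98.78), 2))   # 21.52
```

Because the levels are logarithmic, a reduction of 6.88 dB corresponds to an amplitude ratio of 10^(6.88/20), roughly 2.2, and 21.52 dB to a ratio of roughly 11.9.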
3) Presence of air conditioner: The sounds produced by an air conditioner are another source of background noise during conversation over soft phones.

Fig. 13 shows the speech samples affected by noise generated by the air conditioner in the background. The figure also shows an extended view of the samples affected by noise. The results obtained after processing the speech samples through the background noise reduction algorithm are presented in Fig. 14, with an expanded view of the speech samples showing the effectiveness of the algorithm in reducing background noise. Fig. 15 shows the comparison between the amplitudes of the speech samples with noise and after noise reduction. The overall reduction in noise is quantified in Table 3.

Fig. 13. Speech with air conditioner in background

Fig. 14. Speech with air conditioner in background eliminated

Fig. 15. Amplitude comparison of samples for air conditioner (axes: amplitude versus samples; series: with air conditioner, without air conditioner)

Table 3. Reduction in noise level from air conditioner in background
  Average signal level with noise (dB):     -47.82
  Average signal level without noise (dB):  -47.93
  Reduction in noise level (dB):              0.11

The noise level is reduced by 0.11 dB.

4) Acoustic echo: Next, the echo cancellation algorithm is evaluated by processing speech samples accompanied by echo. The result obtained for one speech sample is shown in Fig. 16. The figure also shows an extended view of the samples affected by echo. The results obtained after processing the speech samples through the echo reduction algorithm are presented in Fig. 17, with an expanded view of the speech samples showing the effectiveness of the algorithm in reducing echo. Fig. 18 shows the comparison between the amplitudes of the speech samples with the associated echo and after echo reduction. The overall reduction in echo is quantified in Table 4.

Fig. 16. Speech with echo

Fig. 17. Speech with echo eliminated
Fig. 18. Amplitude comparison of samples for echo cancellation (axes: amplitude versus samples; series: samples with echo, samples without echo)

Table 4. Reduction in echo level
  Average signal level with echo (dB):     -49.18
  Average signal level without echo (dB):  -59.76
  Reduction in echo level (dB):             10.57

The echo level is reduced by 10.57 dB.

V. CONCLUSIONS

The implementation of background noise reduction and acoustic echo cancellation for VoIP applications is explained in this paper. Out of several methods, adaptive noise cancellation using the NLMS algorithm is found to be suitable because it is efficient at eliminating different kinds of background noise, provided a dedicated DSP processor is available to carry out the many calculations the algorithm requires. We achieved a reduction in acoustic echo level of 10.57 dB, and the background noise present during speech was eliminated effectively, by up to 21.52 dB for noise from talking people and 6.88 dB for fan noise. Further development of the present approaches can give good results for changing background noise and for noise-dominated environments such as an avionics cockpit. For voice recognition methods, the present approach to background noise reduction is a promising solution.

References

[1] Adil Benyassine, Eyal Shlomot, Huan-Yu Su, Dominique Massaloux, Claude Lamblin, Jean-Pierre Petit, "ITU-T Recommendation G.729 Annex B: A Silence Compression Scheme for Use with G.729 Optimized for V.70 Digital Simultaneous Voice and Data Applications," IEEE Communications Magazine, vol. 35, no. 9, September 1997, pp. 64-73.
[2] L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Pearson Education, 1993.
[3] http://www.readytechnology.co.uk/open/ipp-codecsg729-g723.1/
[4] Simon Haykin, Adaptive Filter Theory, 4th Edition, Prentice Hall India, 2005.
[5] www.spiritdsp.com/papers/
[6] Rabiner et al., "Applications of a Nonlinear Smoothing Algorithm to Speech Processing," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-23, no. 6, Dec. 1975, pp. 552-557.
[7] http://www.analog.com
[8] http://www.mathworks.com
[9] [OpenH323] Acoustic echo cancellation NLMS-pw algorithm