LPC ANALYSIS AND SYNTHESIS


Chapter 3

LPC ANALYSIS AND SYNTHESIS

3.1 INTRODUCTION

Speech signals are analysed to obtain their spectral information. Analysis of the speech signal is employed in a variety of systems, such as voice recognition systems and digital speech coding systems. Accepted methods of analysing speech signals make use of linear predictive coding (LPC). Linear prediction is a good tool for the analysis of speech signals. In linear prediction the human vocal tract is modeled as an infinite impulse response system producing the speech signal. Voiced regions of speech have a resonant structure and a high degree of similarity for time shifts that are multiples of the pitch period; for this type of speech, LPC modeling produces an efficient representation. In LPC the current sample of a speech signal is estimated by a linear combination of weighted past samples of the signal. The series of weights, the LPC coefficients, are used as filter coefficients in the encoding and decoding processes during coding. Today, many voice recognition and speech coding systems use LPC analysis techniques to generate the required spectral information of the speech signal. Voice recognition systems use LPC techniques to produce observation vectors (LPC coefficients); these observation vectors are used to recognize the spoken utterances. Voice recognition systems have applications in

various industries, such as the telephone industry and consumer electronics. For example, voice recognition is used in mobile telephony for hands-free or voice dialing. LPC analysis is usually conducted at the transmitting end for each frame of the speech signal to find information such as the voiced/unvoiced decision for the frame, the pitch of the frame and the parameters needed to build a filter for the current frame. This information has to be transmitted to the receiving end, where the receiver performs LPC synthesis using the received information. In LPC analysis the input speech signal, sampled at 8000 samples per second, is divided into frames of 160 samples, i.e., each frame represents 20 msec of the input speech signal. The reason for framing is that speech is a non-stationary signal whose properties change with time, which makes the direct use of Discrete Fourier Transform (DFT) or autocorrelation techniques over the whole signal impossible. But for most phonemes the properties of the speech signal remain invariant over a short period of time (5-100 msec), and hence traditional signal processing methods can be applied successfully over such segments. Most speech processing is done in this manner. This short segment of the signal is called a frame, and the frame length taken here is 20 msec. Due to framing, the dependency between samples in adjacent frames gets lost. To avoid this loss of dependency, adjacent frames are overlapped, and the overlap is taken as 50% on both sides. Framing in turn introduces signal discontinuities at the beginning and at the end of

each frame. To reduce these discontinuities each frame is multiplied by a window [25-30].

3.2 WINDOWING

A window is a function that is zero everywhere except over the region of interest. The purpose of windowing is to smooth the estimated power spectrum and to avoid abrupt transitions in the frequency response between adjacent frames. Windowing a speech signal involves multiplying the signal, frame by frame, by a window whose length equals the frame length. The effect of multiplying a frame by a window of finite length is equivalent to convolving the power spectrum of the frame with the frequency response of the window. This causes the side-lobes in the frequency response of the window to have an averaging effect on the power spectrum of the frame. The windows commonly used are the Rectangular, Hamming, Hanning and Blackman windows; the most widely used in speech analysis are the Hamming and Hanning windows. The rectangular window has high frequency resolution because of its narrow main lobe, but it also has the largest spectral leakage. This high leakage is due to its large side lobes, which make the estimated spectrum noisier and tend to offset the benefit of high frequency resolution. Hence the rectangular window is not widely used in speech analysis. Windows such as the Hamming, Hanning and Blackman windows have lower

frequency resolution but less spectral leakage, so they are widely used in speech analysis. These windows are smoother at the ends and close to one at the middle; the smooth ends and broad middle section produce less distortion in the signal. In this thesis a Hamming window of 160 samples, equal to the frame length, is used. The window length is another important parameter that affects smoothing. A large window length may give better frequency resolution, but the spectral properties of the speech signal change over long durations. So the frame size must be short enough that the speech signal can be considered stationary over it, and the window duration must be correspondingly short. Making the window short has some disadvantages [22, 30-31]: the frame rate increases, which means more information is processed than necessary, increasing the computational complexity; the spectral estimates become less reliable because of the stochastic nature of the speech signal; and, since the pitch frequency typically lies between 80 and 500 Hz, a pitch pulse occurs every 2 to 12 msec, so if the window is small compared to the pitch period a pitch pulse will sometimes be present in the window and sometimes not.
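The framing and windowing steps described above can be sketched in a few lines. This is a minimal illustration, assuming an 8000 samples/s input, 160-sample (20 msec) frames with 50% overlap, and a Hamming window of the frame length; the function names are my own, not from the thesis.

```python
import math

def frame_signal(x, frame_len=160, hop=80):
    """Split a signal into overlapping frames.

    frame_len=160 is 20 msec at 8000 samples/s; hop=80 gives the
    50% overlap between adjacent frames described in the text.
    """
    n_frames = 1 + (len(x) - frame_len) // hop
    return [x[i * hop:i * hop + frame_len] for i in range(n_frames)]

def hamming(N):
    """Hamming window: w(n) = 0.54 - 0.46 cos(2*pi*n/(N-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def window_frame(frame):
    """Taper one frame to reduce the discontinuities at its edges."""
    w = hamming(len(frame))
    return [s * c for s, c in zip(frame, w)]

# One second of signal at 8 kHz -> 99 overlapping 20 msec frames
x = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(8000)]
frames = [window_frame(f) for f in frame_signal(x)]
print(len(frames), len(frames[0]))  # 99 160
```

Note how the window is near 0.08 at both ends and close to one at the middle, matching the "smooth ends, broad middle" behaviour described above.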

3.3 CHOOSING THE ORDER OF THE FILTER

Linear predictive coding is a time domain technique that models the speech signal as a linear combination of weighted, delayed past speech samples. The LPC order is an important parameter in linear prediction that affects the quality of the synthesized speech, as the order determines how many weighted past samples are used to estimate the current speech sample. In this thesis an LPC order of 10 is chosen, which means the past 10 speech samples are used to estimate the current sample; the model is therefore called a 10th order LPC model. In linear prediction the number of prediction coefficients required to suitably model the speech signal depends on the spectral content of the source. Each formant or peak in the spectrum requires two poles to represent, and each pole requires one linear predictive coefficient. Generally in human speech one formant is observed per 1000 Hz of bandwidth, so the best LPC order depends on the bandwidth of the sampled speech signal. In narrowband speech coding the speech signal is band limited to around 4 KHz using a low pass filter, so there are about four formants in its spectrum; to model these four formants eight complex poles are required, so the filter order must be at least eight, but in practice two additional poles are taken to minimize the residual energy. A total of ten poles is therefore used to represent the four formants of a narrowband speech signal, and the LPC order is chosen as 10. For an LPC order 10, the

number of LPC coefficients is 11, as the first term of the 10th order polynomial is always assumed to be 1, which is a very important assumption in LPC analysis [31-32].

3.4 LINEAR PREDICTIVE MODELING OF SPEECH SIGNALS

Linear prediction analysis is the most powerful speech analysis method. In it the short term correlation that exists between the samples of a speech signal (the formants) is modeled and removed using a short order filter.

Source Filter Model of Speech Production

The source filter model of speech production is used as a means for the analysis of speech signals. The block diagram of the source filter model [31] is shown in Fig 3.1.

[Figure: source filter model — a pitch-period-driven impulse train generator (voiced) and a random noise generator (unvoiced) feed a voiced/unvoiced switch; the selected excitation x(n), scaled by the gain G, drives a time varying filter built from the LPC coefficients to produce the output speech]

Fig 3.1 Source filter model of speech production

The excitation signal used in this model is a train of impulses for voiced segments of speech and random noise for unvoiced segments of speech.

The combined spectral contributions of the glottal flow, the vocal tract and the radiation at the lips are represented by a time varying filter with a steady-state system function given by

    H(z) = S(z)/X(z) = G (1 + Σ_{j=1}^{M} b_j z^{-j}) / (1 − Σ_{i=1}^{N} a_i z^{-i})    (3.1)

Equation (3.1) represents the transfer function of a filter consisting of both poles and zeros, where S(z) is the Z-transform of the vocal tract output and X(z) is the Z-transform of the vocal tract input. If the order of the denominator is high enough, H(z) is approximated by an all-pole model given by

    H(z) = G / (1 − Σ_{j=1}^{p} a_j z^{-j}) = G / A(z)    (3.2)

where p is the order of the filter and

    A(z) = 1 − Σ_{j=1}^{p} a_j z^{-j}    (3.3)

When equation (3.2) is transformed into the sampled time domain it becomes

    s(n) = G x(n) + Σ_{j=1}^{p} a_j s(n − j)    (3.4)

Equation (3.4) is the LPC difference equation. It states that the value of the present speech sample s(n) is obtained by summing the present input G x(n) and a weighted sum of the past speech

sample values. If α_j represents the estimate of a_j, then the error signal is the difference between the actual and predicted speech samples and is given by equation (3.5)

    e(n) = s(n) − ŝ(n) = s(n) − Σ_{j=1}^{p} α_j s(n − j)    (3.5)

The estimates are determined by minimizing the mean squared error given by equation (3.6)

    E[e²(n)] = E[ (s(n) − Σ_{j=1}^{p} α_j s(n − j))² ]    (3.6)

Setting the partial derivative of equation (3.6) with respect to α_j to zero for j = 1, ..., p gives

    E[ (s(n) − Σ_{j=1}^{p} α_j s(n − j)) s(n − i) ] = 0,  for i = 1, ..., p    (3.7)

That is, e(n) is orthogonal to s(n − i) for i = 1, ..., p. Equation (3.7) is rearranged as

    Σ_{j=1}^{p} α_j φ_n(i, j) = φ_n(i, 0)    (3.8)

where

    φ_n(i, j) = E[ s(n − i) s(n − j) ]    (3.9)

Solution to LPC Analysis

The speech signal is a time varying signal that varies slowly with time. To model this time varying nature, the analysis is restricted to short segments of the speech signal called frames. This is achieved by replacing the expectation in equation (3.8) by a summation over finite limits, given by equation (3.10)

    φ_n(i, j) = E[ s(n − i) s(n − j) ] = Σ_m s_n(m − i) s_n(m − j),  for i = 1, ..., p, j = 0, ..., p    (3.10)

The solution to equation (3.10) is obtained using one of two methods, namely the autocorrelation method and the covariance method.

Determination of LPC Coefficients

In this thesis the LPC coefficients are determined using the autocorrelation method [31, 33].

Autocorrelation Method

In this method the speech signal is considered stationary over a short period of time and is assumed to be zero outside the interval 0 ≤ m ≤ N − 1, where N is the length of the sample sequence. With this limit equation (3.10) is expressed as

    φ_n(i, j) = Σ_{m=0}^{N+p−1} s_n(m − i) s_n(m − j),  1 ≤ i ≤ p, 0 ≤ j ≤ p    (3.11)

Equation (3.11) can also be expressed as

    φ_n(i, j) = Σ_{m=0}^{N−1−(i−j)} s_n(m) s_n(m + i − j),  1 ≤ i ≤ p, 0 ≤ j ≤ p    (3.12)

From equation (3.12) it is observed that φ_n(i, j) is similar to the short-time autocorrelation function, so equation (3.12) reduces to

    φ_n(i, j) = R_n(i − j),  for i = 1, ..., p and j = 0, ..., p    (3.13)

where

    R_n(j) = Σ_{m=0}^{N−1−j} s_n(m) s_n(m + j)    (3.14)

Using the autocorrelation method, equation (3.8) is expressed as

    Σ_{j=1}^{p} α_j R_n(|i − j|) = R_n(i),  1 ≤ i ≤ p    (3.15)

or in matrix form

    | R_n(0)     R_n(1)     ...  R_n(p−1) | | α_1 |   | R_n(1) |
    | R_n(1)     R_n(0)     ...  R_n(p−2) | | α_2 | = | R_n(2) |
    |   ...        ...      ...     ...   | | ... |   |  ...   |
    | R_n(p−1)   R_n(p−2)   ...  R_n(0)   | | α_p |   | R_n(p) |

The above matrix is symmetric and all the elements along each diagonal are equal, i.e., the matrix is a Toeplitz matrix. Equation (3.15) could be solved by inverting the p × p matrix, but this results in computational errors. Instead, the solution exploits the Toeplitz structure through efficient recursive procedures. The most widely used recursive procedure is Durbin's recursive algorithm, which is as follows

    E^(0) = R_n(0)    (3.16)

    k_i = [ R_n(i) − Σ_{j=1}^{i−1} α_j^(i−1) R_n(i − j) ] / E^(i−1),  1 ≤ i ≤ p    (3.17)

    α_i^(i) = k_i    (3.18)

    α_j^(i) = α_j^(i−1) − k_i α_{i−j}^(i−1),  1 ≤ j ≤ i − 1    (3.19)

    E^(i) = (1 − k_i²) E^(i−1)    (3.20)

After solving equations (3.17) to (3.20) recursively for i = 1, 2, ..., p, where p is the prediction order, the prediction parameters are obtained as

    α_j = α_j^(p),  1 ≤ j ≤ p    (3.21)

3.5 VOICED AND UNVOICED DETERMINATION

According to the LPC-10 standard, before making the voiced/unvoiced decision each frame is passed through a low pass filter of 1 KHz bandwidth to avoid aliasing problems. The voiced/unvoiced decision for a frame is important because of the difference between the waveforms of voiced and unvoiced speech. This difference creates the need for two different excitation signals as inputs to the LPC filter during synthesis or decoding: one excitation signal for voiced speech and the other for unvoiced speech.
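The analysis chain of Section 3.4 — short-time autocorrelation (equation (3.14)) followed by Durbin's recursion (equations (3.16) to (3.21)) — can be sketched as follows. This is a hedged illustration rather than the thesis implementation; the helper names and the AR(1) test signal are my own.

```python
def autocorr(s, p):
    """Short-time autocorrelation R(0)..R(p) of one windowed frame (eq. 3.14)."""
    N = len(s)
    return [sum(s[m] * s[m + j] for m in range(N - j)) for j in range(p + 1)]

def levinson_durbin(R, p):
    """Solve the Toeplitz normal equations (3.15) by Durbin's recursion.

    Returns the prediction coefficients a[1..p] for the predictor
    s(n) ~ sum_j a[j] s(n-j) (a[0] is fixed at 1, matching the
    10th-order polynomial assumption in the text) and the final
    prediction error E^(p).
    """
    E = R[0]                                                         # eq. (3.16)
    a = [0.0] * (p + 1)
    for i in range(1, p + 1):
        k = (R[i] - sum(a[j] * R[i - j] for j in range(1, i))) / E   # eq. (3.17)
        new_a = a[:]
        new_a[i] = k                                                 # eq. (3.18)
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]                           # eq. (3.19)
        a = new_a
        E *= (1.0 - k * k)                                           # eq. (3.20)
    return a, E

# Sanity check on a first-order autoregressive signal s(n) = 0.9 s(n-1) + x(n):
# the recursion should recover a[1] close to 0.9 and a[2] close to 0.
s = [0.9 ** n for n in range(200)]
R = autocorr(s, 2)
a, E = levinson_durbin(R, 2)
print(round(a[1], 3))  # 0.9
```

Each iteration costs O(i) operations, so the whole recursion is O(p²) — far cheaper and better conditioned than the O(p³) matrix inversion the text warns against.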

Voiced speech has distinct resonant or formant frequencies. The voiced, unvoiced and silence portions of an utterance ("telephone banking") are shown in Fig 3.2. The voiced portion of the utterance has large amplitude and low frequencies, while the unvoiced portions have smaller amplitudes (less energy) and higher frequencies than voiced speech, as observed from Fig 3.2. To decide whether a frame is voiced or unvoiced, one examines the energy of the frame and the number of zero-crossings encountered in it. The zero-crossing rate is an important consideration in this decision. Voiced speech is produced by the excitation of the vocal tract by a periodic flow of air through the vocal cords, hence voiced speech has a low zero-crossing rate, whereas unvoiced speech is produced by the turbulent flow of air through the vocal tract, resulting in a high zero-crossing rate.

[Figure: waveform of the utterance "telephone banking" with voiced, unvoiced and silence regions marked; amplitude (dB) versus time (msec)]

Fig 3.2 Voiced, Unvoiced and Silence representations for an utterance Telephone Banking

For voiced speech most of the energy is concentrated at low frequencies, while for unvoiced speech most of the energy is present at higher frequencies. High frequencies imply a high zero-crossing rate and low frequencies a low zero-crossing rate, so a strong relationship exists between the zero-crossing rate and the distribution of energy with frequency. A reasonable generalization is that if the zero-crossing rate is high and the energy is low, the frame is considered unvoiced; if the zero-crossing rate is low and the energy is high, the frame is considered voiced. The assessment of a frame as voiced or unvoiced is shown in Fig 3.3 [34-36].

[Figure: block diagram — the speech signal is processed frame by frame, each frame is Hamming-windowed, the short-time energy (E) and short-time average zero-crossing rate (ZCR) are computed, and the frame is labelled voiced if ZCR is small and E is high, unvoiced otherwise]

Fig 3.3 Voiced and Unvoiced decision of a frame

The ideal world categorization of the speech signal into voiced, unvoiced and silence is shown in Table 3.1

Table 3.1 Ideal world categorization scheme

    Short-Time Energy    Zero-Crossings    Label
    High                 Approx. 12        Voiced
    Low                  Approx. 5         Unvoiced
    0                    0                 Silence

In practice the categorization of sounds into voiced, unvoiced and silence is shown in Table 3.2.

Table 3.2 Real world categorization scheme

    Short-Time Energy    Zero-Crossings    Label
    Approx. 0            Approx. 0         Silence
    Low                  High              Unvoiced
    High                 Low               Voiced
    High                 Approx. 0         Voiced
    High                 High              Voiced
    Low                  Low               Voiced
    Low                  Approx. 0         Unvoiced
    Approx. 0            High              Silence
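The decision rule of Fig 3.3 can be sketched as follows. The thresholds here are illustrative placeholders (real systems tune them to the recording conditions and normalize the energy), and the function names are my own.

```python
import math

def short_time_energy(frame):
    """E: sum of squared samples in the frame."""
    return sum(s * s for s in frame)

def zero_crossing_rate(frame):
    """ZCR: number of sign changes between successive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

def classify_frame(frame, e_thresh=1.0, zcr_thresh=40):
    """Label a frame following Fig 3.3: low ZCR and high energy -> voiced.

    e_thresh and zcr_thresh are illustrative, not from the thesis.
    """
    e = short_time_energy(frame)
    if e < 1e-6:
        return "silence"
    if zero_crossing_rate(frame) < zcr_thresh and e >= e_thresh:
        return "voiced"
    return "unvoiced"

# A 100 Hz sinusoid at 8 kHz: only ~4 sign changes per 20 msec frame,
# and high energy -> classified as voiced
voiced = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(160)]
print(classify_frame(voiced))  # voiced
```

A low-amplitude, rapidly alternating frame (high ZCR, low energy) falls through to "unvoiced", matching the second row of Table 3.2.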

In practice speech signals are not free of noise; they contain some amount of background noise. Apart from the background noise, it is not easy to detect the silent portions of the speech signal, because the short-time energy of a breath can easily be confused with the short-time energy of a fricative sound [37].

3.6 PITCH DETECTION

Introduction

The process of estimating the pitch period or fundamental frequency of a periodic signal such as speech is referred to as pitch detection. During pitch period estimation, voiced speech is considered as being produced by passing quasi-periodic pulses of a signal through the LPC filter. The interval between the pulses in the excitation signal is called the pitch period, represented by T_0. The estimation of the pitch period greatly influences the quality of the reconstructed speech signal, as an incorrect estimate greatly degrades it. Pitch detection algorithms are classified into two types: frequency-domain and time-domain algorithms. Frequency-domain algorithms estimate the pitch period from windowed segments of the speech signal after converting them from the time domain to the frequency domain using the Fast Fourier Transform (FFT). Methods of this type are the Cepstrum method, the Maximum Likelihood method and the Harmonic Product Spectrum method. In time-domain methods the pitch period is estimated by finding the Glottal Closure Instants (GCI) and measuring the time between successive events. Time-domain methods include the Average Magnitude Difference Function (AMDF) method, the Average Squared Mean Difference Function (ASMDF) method and the autocorrelation method. Traditionally, autocorrelation based methods are widely used in various speech coders. The vibration of the vocal cords produces voiced speech, and the rate of vibration gives the pitch of the voiced speech. During the production of unvoiced speech the vocal cords do not vibrate; they remain open and carry no pitch information. The pitch period estimate and the voiced/unvoiced decision of a frame greatly influence the quality of the reconstructed speech signal. If a voiced frame is classified as unvoiced, the reconstructed speech is less intelligible and sounds rough. On the other hand, if an unvoiced frame is classified as voiced, the reconstructed speech sounds annoyingly metallic or robotic [38-45].

Pitch Detection Algorithm

The excitation mechanism used in the source filter model depends greatly on precise estimation of the pitch parameters, as incorrect pitch estimation reduces the quality and the intelligibility of the reconstructed speech signal by introducing artifacts into it. Intelligibility conveys whether the speech signal is clearly understood

or not. Therefore, the pitch estimation algorithm chosen greatly influences the quality of the reconstructed speech signal. The pitch period, the interval between two voiced excitations, varies from one cycle to the next, evolves slowly and can therefore be estimated. Estimating the pitch period is easy for highly periodic speech signals, but some segments of speech do not exhibit this periodicity. Some speech segments contain both voiced and unvoiced information, and pitch estimation becomes inaccurate for such segments. The presence of formants also creates problems: with strong formants the speech becomes highly resonant, which makes the pitch estimate inaccurate. Large amounts of background noise in the speech signal also degrade the pitch estimate. In this thesis the pitch period is estimated using the time-domain autocorrelation method.

Autocorrelation Method of Pitch Detection

The autocorrelation method is frequently used for pitch period estimation. The autocorrelation measures how well the input signal matches a time-shifted version of itself; the maxima of the autocorrelation function occur at intervals of the pitch period. The autocorrelation method involves a large amount of computation, i.e., multiplications and additions, but it is easy to implement in real time digital signal processing systems due to its regular form of computation. Another advantage of the autocorrelation pitch determination algorithm is that it is insensitive to phase. Hence it

performs well in estimating the pitch of a speech signal even in the presence of some phase distortion. One of the major limitations of the autocorrelation function is that it retains too much information in the speech signal [22, 31, 44-45]. Direct distance measurement is the most popular way to measure the similarity between two signals, and is expressed as

    E(τ) = (1/N) Σ_{n=0}^{N−1} [ s(n) − s(n − τ) ]²    (3.22)

where s(n) represents the speech samples from n = 0 to N − 1. Equation (3.22) assumes that the average signal level is fixed, but this is not true at signal onsets and offsets. A distance measure that takes the nonstationary effects of the speech signal into account is

    E(τ, β) = (1/N) Σ_{n=0}^{N−1} [ s(n) − β s(n − τ) ]²    (3.23)

where β is the scaling factor or pitch gain, which accounts for changes in the signal level. When the speech signal is assumed stationary, the error of equation (3.22) can be written as

    E(τ) = 2 [ R(0) − R(τ) ]    (3.24)

where

    R(τ) = Σ_{n=0}^{N−1} s(n) s(n − τ)    (3.25)

Minimizing the error E(τ) of equation (3.22) is therefore equivalent to maximizing the autocorrelation R(τ), where τ denotes the lag or delay; the maximizing lag is equal to the pitch period. The

pitch gain is obtained by setting ∂E(τ, β)/∂β = 0 in equation (3.23), giving

    β = Σ_{n=0}^{N−1} s(n) s(n − τ) / Σ_{n=0}^{N−1} s²(n − τ)    (3.26)

By substituting the pitch gain into the error function of equation (3.23), the pitch is estimated by minimizing

    E(τ, β) = (1/N) [ Σ_{n=0}^{N−1} s²(n) − ( Σ_{n=0}^{N−1} s(n) s(n − τ) )² / Σ_{n=0}^{N−1} s²(n − τ) ]    (3.27)

This is equivalent to maximizing the second term on the right hand side

    R²_n(τ) = ( Σ_{n=0}^{N−1} s(n) s(n − τ) )² / Σ_{n=0}^{N−1} s²(n − τ)    (3.28)

Direct use of equation (3.28) may produce errors, because squaring the autocorrelation can yield maxima even when the correlation is negative, resulting in ineffective pitch estimates. To overcome this problem the square root of equation (3.28) is taken. This removes the square of the autocorrelation and prevents lags with negative correlation from being

selected as the pitch. The final normalized autocorrelation function is then given by equation (3.29)

    R_n(τ) = Σ_{n=0}^{N−1} s(n) s(n − τ) / sqrt( Σ_{n=0}^{N−1} s²(n − τ) )    (3.29)

Autocorrelation of Center-Clipped Speech

Speech is not a purely periodic signal, and vocal tract resonances produce additional maxima in the autocorrelation. Estimating the pitch period directly with the autocorrelation method therefore yields multiple maxima, and it is difficult to determine which maximum corresponds to the true pitch period. To suppress these local maxima a method called center-clipping is used. The center-clipped speech is obtained by the transformation [46]

    y(n) = C[ s(n) ]    (3.30)

where C is the center-clipping function shown in Fig 3.4.

[Figure: center-clipping function C(x) — zero for inputs whose magnitude is below the threshold level C_L, linear beyond it]

Fig 3.4 Center-Clipping function
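A sketch of the center-clipper of equation (3.30) together with the normalized autocorrelation of equation (3.29), assuming 8000 samples/s and a clipping level of half the maximum amplitude; the lag range below corresponds to the 80-500 Hz pitch range quoted earlier, and the function names are my own.

```python
import math

def center_clip(s, cl):
    """Eq. (3.30): zero samples inside [-cl, cl]; shift the rest toward zero."""
    out = []
    for x in s:
        if x > cl:
            out.append(x - cl)
        elif x < -cl:
            out.append(x + cl)
        else:
            out.append(0.0)
    return out

def norm_autocorr_pitch(s, min_lag=16, max_lag=100):
    """Pick the lag maximizing the normalized autocorrelation of eq. (3.29).

    min_lag=16 and max_lag=100 correspond to 500 Hz and 80 Hz
    at 8000 samples/s.
    """
    N = len(s)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        num = sum(s[n] * s[n - lag] for n in range(lag, N))
        den = math.sqrt(sum(s[n - lag] ** 2 for n in range(lag, N))) or 1.0
        r = num / den
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag

# A synthetic 100 Hz voiced segment at 8 kHz -> expected pitch lag of 80 samples
s = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(400)]
clipped = center_clip(s, 0.5 * max(abs(x) for x in s))
print(norm_autocorr_pitch(clipped))  # 80
```

On real speech the clipping step matters more than on this clean sinusoid: it flattens the formant-induced side peaks so that the lag search locks onto the pitch maximum.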

For samples with amplitude above C_L the output of the center-clipper is equal to the input minus the clipping level; for samples with amplitude below the clipping level the output of the center-clipper is zero. Fig 3.5 shows the peaks in a segment of speech when the autocorrelation method is used for pitch extraction, before and after applying the center-clipping function. The duration of the speech segment taken is 960 msec. The blue waveform represents the peaks in the autocorrelation when center-clipping is not used, and the green waveform the peaks when center-clipping is used.

[Figure: autocorrelation peaks of a speech segment before and after center-clipping, plotted against time (ms)]

Fig 3.5 Peaks in the autocorrelation of a speech signal before and after center-clipping

Fig 3.5 shows that when center-clipping is used, the autocorrelation method produces more prominent peaks at the pitch period by reducing the peaks due to local maxima. The peaks in the autocorrelation of the center-clipped speech are much more distinguishable than those in the autocorrelation of the original speech without center-clipping. The center-clipping operation thus improves the reliability of the pitch estimate. But in some cases, when the speech signal contains noise or is only mildly periodic, the center-clipping operation removes beneficial information from the speech signal, and the estimate degrades. For speech signals with rapidly changing energy, setting an appropriate clipping level is difficult. Here the clipping level is taken as half the maximum amplitude of the speech signal.

3.7 LPC SYNTHESIS

Introduction

Speech is primarily a means of communication between humans, conveying an almost infinite range of thoughts and concepts. Continuous speech is a complicated audio signal, which makes producing it artificially difficult. Speech synthesis is the artificial production of speech. The quality of the speech produced by a synthesis system is judged by two characteristics: intelligibility and naturalness. Intelligibility conveys whether the output of a speech synthesizer is easily understood or not. Naturalness conveys whether the output of a

speech synthesizer sounds like the speech of a real person or not. An ideal speech synthesizer is both intelligible and natural, and every synthesis technique tries to maximize both characteristics. In modern systems much attention has been devoted to producing high quality, naturally sounding speech; at present naturalness is often of greater concern than intelligibility alone. Some speech synthesis systems are better at naturalness and some at intelligibility, and the goal of the synthesis determines which system is used. There are three main methods for generating speech waveforms synthetically: concatenative synthesis, formant synthesis and articulatory synthesis [32, 47-49]. In this thesis, the formant synthesis technique is used to generate speech artificially. The three methods are briefly explained below.

Concatenative Synthesis: This method is based on the concatenation of segments of recorded speech and generates the most natural speech. But automated methods for segmenting the speech waveforms, together with natural variations in speech, occasionally produce audible glitches in the speech output, which detract from its naturalness.

Formant Synthesis: Formant synthesis produces synthesized speech using an acoustic model, without using any human speech samples. In this method parameters such as voiced/unvoiced information, pitch and noise levels are varied frame-wise to produce the speech artificially. Many systems based on formant synthesis generate

artificial, robotic-sounding speech, as utmost naturalness is not always the target of a speech synthesis system. Systems based on formant synthesis have some advantages over systems using concatenative synthesis: formant synthesis produces speech that is highly intelligible, without the audible glitches that are more common in concatenative systems; and formant synthesizers do not use a database of speech samples, so formant synthesis programs are often smaller than concatenative ones, which must carry such a database.

Articulatory Synthesis: Articulatory synthesis is one of the widely studied synthesis techniques in recent years. It is based on the articulation process occurring in the vocal tract and on computational models of the human vocal tract. Few of these models are computationally efficient and advanced enough for use in commercial speech synthesis systems.

Linear Predictive Coding Synthesis

Linear predictive coding (LPC) synthesis is an efficient technique for generating speech artificially, in which the vocal tract parameters are represented by a set of LPC coefficients. LPC techniques are based on the frequency domain representation of the speech signal. In LPC a time-domain speech

signal is transformed into the frequency domain in order to extract the parameters of the speech signal using a suitable model. The LPC synthesis technique has several advantages over other speech synthesis techniques: LPC requires lower data rates, so more speech can be stored in a given capacity and the signal is well suited to narrowband transmission; LPC methods are therefore used in telecommunications, teaching aids and consumer products. An LPC synthesizer produces high quality speech at bit-rates around 2.4 Kbps. In this thesis, good quality output speech is produced using the LPC synthesizer at bit-rates from 1.2 down to 1 Kbps. The structure of the linear predictive synthesizer is shown in Fig 3.6 [22].

[Figure: linear predictive synthesizer — an impulse generator driven by the pitch period (voiced) and a white noise generator (unvoiced) feed a voiced/unvoiced switch; the selected excitation u(n), scaled by the gain G, drives the all-pole LPC filter (coefficients α_1, ..., α_p in a direct form structure of unit delays z⁻¹) to produce the synthetic speech ŝ(n)]

Fig 3.6 Linear Predictive Synthesizer
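The synthesizer of Fig 3.6 can be sketched as a direct-form all-pole filter driven by an impulse train or white noise, i.e., the difference equation ŝ(n) = Σ_j α_j ŝ(n − j) + G u(n). This is a hedged toy illustration: the two coefficients and the pitch period used in the demo are made up for the example, not taken from the thesis.

```python
import random

def excitation(n_samples, voiced, pitch_period):
    """Impulse train for voiced frames, white noise for unvoiced frames."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(0)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

def lpc_synthesize(a, gain, u, state=None):
    """Direct-form all-pole filter: s(n) = sum_j a[j-1] s(n-j) + G u(n).

    Costs p multiplications and p additions per output sample,
    as noted in the text.
    """
    p = len(a)
    mem = list(state) if state else [0.0] * p      # s(n-1) ... s(n-p)
    out = []
    for x in u:
        s = gain * x + sum(a[j] * mem[j] for j in range(p))
        out.append(s)
        mem = [s] + mem[:-1]                       # shift the delay line
    return out

# Toy stable 2nd-order filter excited by an 80-sample pitch period impulse train
a = [1.2, -0.7]                    # illustrative coefficients, not from the thesis
u = excitation(160, voiced=True, pitch_period=80)
s = lpc_synthesize(a, gain=1.0, u=u)
print(len(s))  # 160
```

Passing the final delay-line contents back in as `state` for the next frame is what keeps the output continuous across frame boundaries; the recursive structure is also why coefficient precision matters, as the text notes below.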

The time varying parameters needed by the synthesizer are the pitch period, the voiced/unvoiced information, the gain and the linear predictive coefficients. The speech signal consists of both voiced and unvoiced segments. In the LPC synthesizer an impulse generator produces a train of unit-amplitude impulses, one at the beginning of each pitch period, and this impulse train is used as the excitation signal for voiced speech, while a random noise generator produces uncorrelated, uniformly distributed random samples with zero mean and unit standard deviation, used as the excitation signal for unvoiced sounds. The selection between the two sources is made by the voiced/unvoiced switch. The gain control G determines the amplitude of the excitation signal. The synthetic speech samples are determined using equation (3.31)

    ŝ(n) = Σ_{j=1}^{p} α_j ŝ(n − j) + G u(n)    (3.31)

In Fig 3.6 the LPC synthesis filter is excited by an impulse train or by random noise according to the voiced/unvoiced decision. The interval between the pulses of the impulse train is equal to the pitch period. The gain G represents the loudness, so it multiplies the excitation signal to give it the proper intensity. The filter network in Fig 3.6 is a direct form filter, which gives a simple and straightforward method for obtaining synthetic speech from the prediction parameters. A total of p multiplications

and p additions are required to generate each sample of the output, where p is the order of the filter. In this model the synthesis parameters vary with time: during voiced portions of speech they are estimated at regular intervals and changed at the start of each pitch period, whereas for unvoiced speech they are simply changed once per frame. Updating the parameters at the beginning of each pitch period is called Pitch Synchronous Synthesis and is found to be more effective than Asynchronous Synthesis, where the parameters are updated once per frame. The quality of the reconstructed speech signal depends on the accuracy of the extracted parameters. The main advantage of LPC is its simplicity and ease of implementation. Its main drawback is that it requires significant computational precision to synthesize the speech, because the direct form recursive filter structure tends to be quite sensitive to changes in the coefficients; the extracted speech parameters must therefore be highly accurate. The LPC coefficients shown in Fig 3.6 are the all-pole filter coefficients used in modeling the speech signal. In practice it is not possible to model the speech signal exactly from its delayed past sample values; there is a discrepancy, or error, in the reconstructed speech signal. So the goal of LPC analysis is to find the set of LPC coefficients that minimizes the mean square error E[e²(n)]. When this happens the spectrum of the error signal becomes flat, i.e., there is no variation of the signal with frequency. There are only two

types of time signals that can have a flat spectrum: an impulse train and random noise (i.e., a signal generated from random numbers). For this reason, in LPC synthesis the filter is excited by an impulse train for voiced sounds and by random noise for unvoiced sounds. The linear predictive coefficients used in the synthesis filter represent the combined spectral contribution of the vocal tract, the glottal flow and the radiation at the lips. The waveform of a typical input speech signal is shown in Fig 3.7.

Fig 3.7 Input speech signal

In the waveform plots the X-axis is calibrated to time measured in milliseconds and the Y-axis to amplitude measured in decibels (dB). The amplitude of a speech signal is measured on a decibel scale because it correlates best with perceived loudness.
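As a concrete sketch of the analysis step, the fragment below estimates the coefficients that minimize the mean square error e²(n) using the Levinson-Durbin recursion on the frame autocorrelation. This is an illustration only, not the code used in this work; the AR(2) test signal, the sample count and the function names are assumptions made for the example.

```python
import numpy as np

def autocorr(s, order):
    """Biased autocorrelation estimates r[0..order] of a frame."""
    n = len(s)
    return np.array([np.dot(s[:n - k], s[k:]) for k in range(order + 1)]) / n

def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p,
    the inverse filter that minimizes the mean square prediction error.
    Returns the coefficients a[0..order] (with a[0] = 1) and the final
    per-sample error."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient for stage i
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err

# Demo on a synthetic AR(2) signal s[n] = 0.9 s[n-1] - 0.5 s[n-2] + w[n],
# whose true A(z) coefficients are a1 = -0.9, a2 = 0.5.
rng = np.random.default_rng(0)
w = rng.standard_normal(20000)
s = np.zeros_like(w)
for n in range(len(w)):
    s[n] = w[n]
    if n >= 1:
        s[n] += 0.9 * s[n - 1]
    if n >= 2:
        s[n] -= 0.5 * s[n - 2]
a, err = levinson_durbin(autocorr(s, 2), 2)
```

Note the sign convention: the recursion returns the coefficients of the inverse filter A(z), so the predictor coefficients appearing in equation (3.31) are their negatives.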

Fig 3.8 gives the speech signal reconstructed using speech parameters such as pitch, gain and the linear predictive coefficients. The total number of frames in the speech signal is 166, with 160 samples per frame. The number of linear predictive coefficients used in the reconstruction is 11 per frame, as the order taken is 10. Pitch and gain are calculated frame-wise and used accordingly in the reconstruction.

Fig 3.8 Speech signal reconstructed using speech parameters

From Figs 3.7 and 3.8 it is observed that the reconstructed speech signal does not have the same shape as the input speech signal, but it sounds like a synthetic version of the input. This is because parametric estimation of the speech signal does not lead to accurate reconstruction of the speech waveform. The encoded speech signal, called the residue, is shown in Fig 3.9, and the speech signal reconstructed using the residue is shown in Fig

From Fig 3.10, the speech signal reconstructed using the residue has the same shape as the input speech signal and sounds the same as the input. This is because waveform approximation methods give perfect reconstruction of the speech signal without any loss of quality.

Fig 3.9 Residue speech signal

Fig 3.10 Speech signal reconstructed using residue
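The two excitation sources used by the synthesizer in this section can be sketched as follows. This is a minimal illustration under an assumed frame length of 160 samples and an assumed pitch period; the function names are not from this work.

```python
import numpy as np

def voiced_excitation(n_samples, pitch_period):
    """Unit-amplitude impulse train with one impulse at the start of
    each pitch period (excitation for voiced speech)."""
    u = np.zeros(n_samples)
    u[::pitch_period] = 1.0
    return u

def unvoiced_excitation(n_samples, rng):
    """Uniformly distributed random samples, normalized to zero mean
    and unit standard deviation (excitation for unvoiced speech)."""
    x = rng.uniform(-1.0, 1.0, n_samples)
    return (x - x.mean()) / x.std()

# A single impulse has a perfectly flat magnitude spectrum, which is
# why impulse-like excitation leaves the spectral envelope entirely
# to the all-pole synthesis filter.
delta = np.zeros(64)
delta[0] = 1.0
flat = np.abs(np.fft.fft(delta))  # every bin has magnitude 1
```

Random noise likewise has a flat spectrum on average, so both sources satisfy the flat-spectrum requirement on the error signal discussed earlier.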

From Figs 3.8 and 3.10, it can be seen that the signal in Fig 3.10 looks the same as the input speech signal, with no loss in the quality of the reconstructed speech, whereas in Fig 3.8 there is a loss of quality and the signal looks different from the input. It can therefore be concluded that with parametric methods (speech reconstructed using speech parameters) the bit-rate of the speech signal can be reduced greatly, but at a loss of quality, whereas with waveform approximation methods (speech reconstructed using the residue) the bit-rate cannot be reduced greatly, but the quality of the reconstructed speech can be maintained the same as the input. So to achieve low bit-rates one has to use parametric methods; if quality is to be retained, one has to use waveform approximation methods.
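The trade-off summarized above can be demonstrated on a toy frame: driving the synthesis filter with the residue reproduces the waveform exactly, while an impulse-train (parametric) excitation of the same filter yields a different waveform from far fewer parameters. The filter coefficients, frame length and pitch period below are arbitrary illustrative choices, not values from this work.

```python
import numpy as np

def lpc_residual(s, a):
    """Inverse filter A(z) = 1 + a1 z^-1 + ... applied to s:
    e[n] = s[n] + sum_j a[j] * s[n-j] (the residue)."""
    p = len(a) - 1
    e = np.zeros_like(s)
    for n in range(len(s)):
        e[n] = s[n]
        for j in range(1, min(p, n) + 1):
            e[n] += a[j] * s[n - j]
    return e

def lpc_synthesis(u, a):
    """All-pole synthesis filter 1/A(z) driven by excitation u:
    s[n] = u[n] - sum_j a[j] * s[n-j]."""
    p = len(a) - 1
    s = np.zeros_like(u)
    for n in range(len(u)):
        s[n] = u[n]
        for j in range(1, min(p, n) + 1):
            s[n] -= a[j] * s[n - j]
    return s

a = np.array([1.0, -0.9, 0.5])   # a stable all-pole model (poles inside |z| = 1)
rng = np.random.default_rng(2)
s = rng.standard_normal(160)     # one 160-sample "frame"

residue = lpc_residual(s, a)
exact = lpc_synthesis(residue, a)            # waveform approximation: matches s
impulse_train = np.where(np.arange(160) % 40 == 0, 1.0, 0.0)
parametric = lpc_synthesis(impulse_train, a)  # parametric: same envelope, different waveform
```

The residue costs one value per sample to transmit, whereas the parametric path needs only the pitch period, gain and coefficients per frame, which is the bit-rate versus quality trade-off described above.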


More information

Dream DRM Receiver Documentation

Dream DRM Receiver Documentation Dream DRM Receiver Documentation Dream is a software implementation of a Digital Radio Mondiale (DRM) receiver. All what is needed to receive DRM transmissions is a PC with a sound card and a modified

More information

Understanding CIC Compensation Filters

Understanding CIC Compensation Filters Understanding CIC Compensation Filters April 2007, ver. 1.0 Application Note 455 Introduction f The cascaded integrator-comb (CIC) filter is a class of hardware-efficient linear phase finite impulse response

More information

VoIP Technologies Lecturer : Dr. Ala Khalifeh Lecture 4 : Voice codecs (Cont.)

VoIP Technologies Lecturer : Dr. Ala Khalifeh Lecture 4 : Voice codecs (Cont.) VoIP Technologies Lecturer : Dr. Ala Khalifeh Lecture 4 : Voice codecs (Cont.) 1 Remember first the big picture VoIP network architecture and some terminologies Voice coders 2 Audio and voice quality measuring

More information

Speech Compression. 2.1 Introduction

Speech Compression. 2.1 Introduction Speech Compression 2 This chapter presents an introduction to speech compression techniques, together with a detailed description of speech/audio compression standards including narrowband, wideband and

More information

RADIO FREQUENCY INTERFERENCE AND CAPACITY REDUCTION IN DSL

RADIO FREQUENCY INTERFERENCE AND CAPACITY REDUCTION IN DSL RADIO FREQUENCY INTERFERENCE AND CAPACITY REDUCTION IN DSL Padmabala Venugopal, Michael J. Carter*, Scott A. Valcourt, InterOperability Laboratory, Technology Drive Suite, University of New Hampshire,

More information

Artificial Neural Network for Speech Recognition

Artificial Neural Network for Speech Recognition Artificial Neural Network for Speech Recognition Austin Marshall March 3, 2005 2nd Annual Student Research Showcase Overview Presenting an Artificial Neural Network to recognize and classify speech Spoken

More information

Developing an Isolated Word Recognition System in MATLAB

Developing an Isolated Word Recognition System in MATLAB MATLAB Digest Developing an Isolated Word Recognition System in MATLAB By Daryl Ning Speech-recognition technology is embedded in voice-activated routing systems at customer call centres, voice dialling

More information

Time Domain and Frequency Domain Techniques For Multi Shaker Time Waveform Replication

Time Domain and Frequency Domain Techniques For Multi Shaker Time Waveform Replication Time Domain and Frequency Domain Techniques For Multi Shaker Time Waveform Replication Thomas Reilly Data Physics Corporation 1741 Technology Drive, Suite 260 San Jose, CA 95110 (408) 216-8440 This paper

More information

Implementation of Digital Signal Processing: Some Background on GFSK Modulation

Implementation of Digital Signal Processing: Some Background on GFSK Modulation Implementation of Digital Signal Processing: Some Background on GFSK Modulation Sabih H. Gerez University of Twente, Department of Electrical Engineering s.h.gerez@utwente.nl Version 4 (February 7, 2013)

More information

application note Directional Microphone Applications Introduction Directional Hearing Aids

application note Directional Microphone Applications Introduction Directional Hearing Aids APPLICATION NOTE AN-4 Directional Microphone Applications Introduction The inability to understand speech in noisy environments is a significant problem for hearing impaired individuals. An omnidirectional

More information

DOLBY SR-D DIGITAL. by JOHN F ALLEN

DOLBY SR-D DIGITAL. by JOHN F ALLEN DOLBY SR-D DIGITAL by JOHN F ALLEN Though primarily known for their analog audio products, Dolby Laboratories has been working with digital sound for over ten years. Even while talk about digital movie

More information