A Computationally Efficient Multipitch Analysis Model
708 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 8, NO. 6, NOVEMBER 2000

Tero Tolonen, Student Member, IEEE, and Matti Karjalainen, Member, IEEE

Abstract: A computationally efficient model for multipitch and periodicity analysis of complex audio signals is presented. The model essentially divides the signal into two channels, below and above 1000 Hz, computes a generalized autocorrelation of the low-channel signal and of the envelope of the high-channel signal, and sums the autocorrelation functions. The summary autocorrelation function (SACF) is further processed to obtain an enhanced SACF (ESACF). The SACF and ESACF representations are used in observing the periodicities of the signal. The model performance is demonstrated to be comparable to that of recent time-domain models that apply a multichannel analysis. In contrast to the multichannel models, the proposed pitch analysis model can be run in real time on typical personal computers. The parameters of the model are experimentally tuned for best multipitch discrimination with typical mixtures of complex tones. The proposed pitch analysis model may be used in complex audio signal processing applications, such as sound source separation, computational auditory scene analysis, and structural representation of audio signals. The performance of the model is demonstrated by pitch analysis examples using sound mixtures which are available for download at

Index Terms: Auditory modeling, multipitch analysis, periodicity analysis, pitch perception.

I. INTRODUCTION

Many principles have been proposed for the modeling of human pitch perception and for practical pitch determination of simple audio or speech signals [1]-[3]. For regular signals with harmonic structure, such as clean speech of a single speaker, the problem is solved quite reliably.
When the complexity increases further, e.g., when harmonic complexes of sounds or voices are mixed in a single signal channel, the determination of pitches is generally a difficult problem that has not been solved satisfactorily. Computational algorithms for multipitch identification, for instance in automatic transcription of polyphonic music, have been around for over 20 years. The first systems typically had substantial limitations on the content, and they were able to detect at most two simultaneous harmonic tones [4]-[9]. The more recent systems have advanced in performance [10]-[14], allowing more simultaneous tones to be detected with greater accuracy.

Manuscript received January 18, 1999; revised December 27. This work was supported by the GETA Graduate School, Helsinki University of Technology, the Foundation of Jenny and Antti Wihuri (Jenny ja Antti Wihurin rahasto), Tekniikan edistämissäätiö, and Nokia Research Center. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Dennis R. Morgan. The authors are with the Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, FIN Espoo, Finland.

The concept of pitch [15] refers to auditory perception and has a complex relationship to the physical properties of a signal. Thus, it is natural to distinguish it from the estimation of fundamental frequency and to apply methods that simulate human perception. Many such approaches have been proposed, and they generally follow one of two paradigms: place (or frequency) theory and timing (or periodicity) theory. Neither of these in pure form has been shown to be fully compatible with human pitch perception, and it is probable that a combination of the two approaches is needed.
Recently it has been demonstrated that a peripheral auditory model that uses time-domain processing of periodicity properties is able to simulate many known features of pitch perception that are often considered to be more central [16], [17]. Such models are attractive since auditory processes may be simulated with relatively straightforward digital signal processing (DSP) algorithms. Additional features may be readily included using, e.g., frequency-domain algorithms if desired. The unitary pitch analysis model of Meddis and O'Mard [16] and its predecessors by Meddis and Hewitt [17] are among the best known recent models of time-domain pitch analysis. The unitary model has been shown to exhibit qualitatively good correspondence to human perception in many listening tasks, such as the missing fundamental, musical chords, etc. A practical problem with the model is that, despite its quite straightforward principle, the overall algorithm is computationally expensive since the analysis is carried out using a multichannel auditory filterbank. In this paper we present a multipitch analysis model that is computationally efficient. While it does not attempt to simulate the human auditory system in detail, it is still intuitive from the auditory modeling viewpoint. Our pitch analysis model finds applications in complex audio signal processing tasks, such as sound source separation, computational auditory scene analysis [18]-[21], structural representation of audio signals, and content analysis techniques. The proposed model is, to a certain extent, a computationally superior simplification of the Meddis and O'Mard model with very similar behavior, as will be demonstrated below. Additional features are proposed in order to allow for further analysis of multipitch signals, such as musical chords and speech mixtures.
The performance of the model is demonstrated by periodicity analysis examples using sound mixtures available at

Following a common practice, we use the term "pitch" even when "fundamental period" or "pitch period" would be the more precise concept.
Fig. 1. Block diagram of the Meddis-O'Mard model [17].

The paper is organized as follows. In Section II, the proposed pitch analysis model is introduced and compared to pitch perception models reported in the literature. Section III describes how the periodicity representation can be enhanced so that periodicities may be more easily investigated. Section IV discusses the model parameters and shows with examples how they affect the behavior of the model, and Section V demonstrates the model performance in multipitch determination. Finally, Section VI concludes the paper with a summary and discussion.

II. PITCH ANALYSIS MODEL

A. Multichannel Pitch Analysis

In many recent models of human pitch perception, the key component is a filterbank that simulates the behavior of the cochlea. The filterbank separates a sound signal into subband channels that have bandwidths corresponding to the frequency resolution of the cochlea. A common choice is to use a gammatone filterbank [22] with channels corresponding to the equivalent rectangular bandwidth (ERB) channels of human audition [23]. Fig. 1 depicts the pitch perception model of Meddis and O'Mard [17] that uses the filterbank approach. The input signal is first divided into channels, the number of which depends on the implementation [16], [17], [21]. The signal in each channel is half-wave rectified and lowpass filtered. Essentially, this step corresponds to the detection of the envelope of the signal in each channel. From the envelope signals, a periodicity measure, such as the autocorrelation function (ACF), is computed within each channel. Finally, the ACFs are summed across the channels to yield a summary autocorrelation function (SACF) that is used in pitch analysis. In studies that have applied the pitch analysis paradigm of Fig. 1, several implementations are reported.
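The Fig. 1 paradigm can be sketched in a few lines of code. The following is a toy illustration only: a brick-wall FFT bandpass stands in for the gammatone/ERB filterbank, half-wave rectification serves as a crude envelope detector, and all function names and band edges are ours, not the authors'.

```python
import numpy as np

def bandpass_fft(x, fs, lo, hi):
    """Crude brick-wall bandpass via FFT masking; an illustrative
    stand-in for the gammatone/ERB filterbank of the real model."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    X[(f < lo) | (f >= hi)] = 0.0
    return np.fft.irfft(X, len(x))

def acf(x):
    """Autocorrelation via FFT (Wiener-Khinchin), zero-padded so the
    result is the linear (not circular) correlation."""
    n = len(x)
    X = np.fft.rfft(x, 2 * n)
    return np.fft.irfft(np.abs(X) ** 2)[:n]

def sacf_multichannel(x, fs, band_edges):
    """Fig. 1 paradigm: per-channel half-wave rectification (a crude
    envelope detector), per-channel ACF, then a sum over channels."""
    s = np.zeros(len(x))
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        chan = bandpass_fft(x, fs, lo, hi)
        chan = np.maximum(chan, 0.0)   # half-wave rectification
        s += acf(chan)
    return s

# A 200 Hz harmonic complex: the SACF should peak at the 5 ms period,
# i.e. at lag fs/200 = 40 samples (besides the trivial zero-lag peak).
fs = 8000
t = np.arange(2048) / fs
x = sum(np.sin(2 * np.pi * 200 * k * t) for k in range(1, 5))
s = sacf_multichannel(x, fs, [100, 400, 800, 1600, 3200])
```

The summed curve inherits a pronounced peak at the common fundamental period from every channel, which is exactly what the pitch analysis stage exploits.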
In some systems, pre-processing is performed before the signal enters the filterbank. For instance, in [17] a bandpass filter is used to simulate the middle ear transfer function. In [17] the half-wave rectification and lowpass filtering block is replaced with a block that estimates the probability of neural activation in each channel. In [24], an automatic gain control block is added after the half-wave rectification and the lowpass filtering is removed. There are several approaches to the computation of the autocorrelation or a similar periodicity measure within each of the channels. The time-domain approach is a common choice [16], [17], [21]. In these systems, an exponential window is applied with a window time constant that varies from 2.5 ms [17] to 25 ms [21]. Our experiments have determined that the effective length of the window should be approximately ms so that the window spans more than one period of the pitched tone for all fundamental periods in the range of interest. Ellis [21] applies a logarithmic scale of the autocorrelation lag with approximately 48 samples per octave. He motivates the use of such a scale by its closer resemblance to the pitch detection resolution of the human auditory system; this requires interpolation of the signals in the channels. Ellis notes that half-wave rectification is preferred over full-wave rectification in order to suppress octave errors. Some pitch analysis systems use a discrete Fourier transform (DFT) based autocorrelation computation for computational efficiency [24]. This approach also allows for processing of the signal in the frequency domain. As discussed below, nonlinear compression of the DFT magnitude may be used to enhance the performance of the pitch analysis. Such a compression is not readily implementable in a time-domain system.
Although the unitary pitch perception model of Meddis and O'Mard has been widely adopted, some studies question the general validity of the unitary pitch perception paradigm. In particular, it has been suggested that two mechanisms for pitch perception are required: one for resolved harmonics and one for unresolved harmonics [25], [26]. The term resolved harmonics refers to the case where at most one component falls within the 10-dB-down bandwidth of an auditory filter [27]. In the other case, the components are said to be unresolved. The present study does not attempt to answer the question of the pitch detection mechanism of the human auditory system. In fact, the proposed model has only two channels and does not attempt to follow human resolvability directly. Interestingly enough, as shown in the following subsection, the model still produces results qualitatively similar and comparable to those of the more elaborate multichannel pitch analysis systems.

The computational demands of multichannel pitch analysis systems have prohibited their use in practical applications, where real-time performance is typically required. The computational requirements are mostly determined by the number of channels used in the filterbank. This motivates the development of the simplified model of pitch perception presented below, which is more suitable for practical applications and still qualitatively retains the performance of multichannel systems.

B. Two-Channel Pitch Analysis

A block diagram of the proposed two-channel pitch analysis model is illustrated in Fig. 2. The first block is a pre-whitening filter that is used to remove short-time correlation of the signal. The whitening filter is implemented using warped linear prediction (WLP) as described in [28]. The WLP technique works like ordinary linear prediction except that it implements critical-band auditory resolution of spectral modeling instead of uniform frequency resolution, and can be used to reduce the
filter order considerably. A WLP filter of 12th order is used here with a sampling rate of 22 kHz, Hamming windowing, a frame size of 23.2 ms, and a hop size of 10.0 ms. Inverse filtering with the WLP model yields the pre-whitened signal. To a certain extent, the whitening filter may be interpreted as functionally similar to the normalization of the hair cell activity level toward spectral flattening due to adaptation and saturation effects [29], [30].

Fig. 2. Block diagram of the proposed pitch analysis model.

Fig. 3. Comparison of the SACF functions of the two models using the musical chord test signal. The two-channel SACF is plotted on the top and the Meddis-Hewitt SACF on the bottom.

The functionality of the middle part of Fig. 2 corresponds to that of the unitary multichannel pitch analysis model of Fig. 1. The signal is separated into two channels, below and above 1 kHz. The channel separation is done with filters that have 12 dB/octave attenuation in the stop-band. The lowpass block also includes a highpass rolloff of 12 dB/octave below 70 Hz. The high-channel signal is half-wave rectified and lowpass filtered with a filter similar to that used for separating the low channel (including the highpass characteristic at 70 Hz). The periodicity detection is based on generalized autocorrelation, i.e., the computation consists of a discrete Fourier transform (DFT), magnitude compression of the spectral representation, and an inverse transform (IDFT). The SACF signal s in Fig. 2 corresponds to the SACF of Fig. 1 and is obtained as

s = IDFT(|DFT(x_low)|^k) + IDFT(|DFT(x_high)|^k)  (1)

where x_low and x_high are the low and high channel signals before the periodicity detection blocks in Fig. 2. The parameter k determines the frequency-domain compression [31]. For k = 2 the normal autocorrelation is obtained, but, as detailed in Section IV, it is advantageous to use a value smaller than 2.
Note that periodicity computation using the DFT allows control of the parameter k or the use of some other nonlinear processing of the frequency transform, e.g., application of the natural logarithm, resulting in the cepstrum. This is not directly possible with time-domain periodicity detection algorithms. The fast Fourier transform (FFT) and its inverse (IFFT) are typically used to speed up the computation of the transforms. The last block of Fig. 2 represents the processing of the SACF. This part of the algorithm is detailed in Section III.

Before comparing the performance of the models of Figs. 1 and 2, it is instructive to study the sensitivity of pitch analysis to the phase properties of the signal when using a multichannel model or a two-channel model. In the two-channel case, the low channel is phase-insensitive (except for the windowing effects) due to the autocorrelation [notice the modulus in (1)]. However, the high channel is phase-sensitive since it follows the amplitude envelope of the signal in the frequency band above 1000 Hz. Thus, all phase-sensitivity in our model is inherently caused by the high channel. This is different from the Meddis-Hewitt model, where all channels are phase-sensitive since they follow the envelope of the signal in the corresponding frequency band. However, when the lower channels resolve the harmonics, the difference is relatively small, since in that case the autocorrelation computation removes the phase-sensitivity.

C. Comparison of Multichannel and Two-Channel Models

The performance and validity of the proposed two-channel SACF model (without pre-filtering and pre-whitening, using running autocorrelation similar to [17]) in pitch periodicity analysis is evaluated here by comparison with the multichannel SACF model of Meddis and Hewitt. The AIM software [32] was used to compute the Meddis-Hewitt SACFs. The test signals were chosen according to [17].
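The generalized autocorrelation of (1) and the role of the compression parameter k can be sketched as follows. This is a minimal sketch with stand-in input signals; the WLP pre-whitening and the actual channel-separation filters are omitted, and the function names are ours.

```python
import numpy as np

def gacf(x, k):
    """Generalized autocorrelation: IDFT(|DFT(x)|^k), cf. (1).

    k = 2 gives the ordinary autocorrelation; k < 2 compresses the
    magnitude spectrum, which narrows the peaks in the lag domain."""
    n = len(x)
    X = np.fft.rfft(x, 2 * n)   # zero-pad to obtain linear correlation
    return np.fft.irfft(np.abs(X) ** k)[:n]

def sacf_two_channel(x_low, x_high_env, k=0.67):
    """Two-channel SACF: sum of the generalized ACFs of the low-channel
    signal and of the high-channel envelope (both assumed pre-filtered)."""
    return gacf(x_low, k) + gacf(x_high_env, k)

# Stand-in inputs: a 200 Hz tone in the low channel and a high-channel
# envelope beating at the same 200 Hz rate (period 5 ms = 40 samples).
fs = 8000
t = np.arange(2048) / fs
x_low = np.sin(2 * np.pi * 200 * t)
x_high_env = np.maximum(np.sin(2 * np.pi * 200 * t), 0.0)
s = sacf_two_channel(x_low, x_high_env, k=0.67)
```

With k = 2 the routine reproduces the ordinary linear autocorrelation, and for any k the summed curve peaks at the common 5 ms period of the two channel signals.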
The results of the musical chord experiment [17] with the two-channel and the multichannel models are illustrated in the top and bottom plots of Fig. 3, respectively. In this case, the test signal consisted of three harmonic signals with fundamental frequencies 392.0, 523.2, and 659.3 Hz, corresponding to tones G, C, and E, respectively. The G tone consisted of the first four harmonics, and the C and E tones contained the first three harmonics each. All the harmonic components were of equal amplitude. Both models exhibit an SACF peak at a lag of 7.7 ms. This corresponds to a frequency of 130 Hz (tone C), which is the root tone of the chord. The waveforms of the two summary autocorrelation functions are similar, although the scales differ. While it is only possible to report this one experiment here, the models behave similarly with a broader range of test signals. More examples of SACF analysis are available at

III. ENHANCING THE SUMMARY AUTOCORRELATION FUNCTION

The peaks in the SACF curve produced as output of the model in Fig. 2 are relatively good indicators of potential pitch periods in the signal being analyzed, as shown in Fig. 3. However, such a summary periodicity function contains much redundant and spurious information that makes it difficult to estimate
which peaks are true pitch peaks. For instance, the autocorrelation function generates peaks at all integer multiples of the fundamental period. Furthermore, in the case of musical chords the root tone often appears very strong, though in most cases it should not be considered the fundamental period of any source sound. To be more selective, a peak pruning technique similar to [21], but computationally more straightforward, is used in our model.

The technique is the following. The original SACF curve, as demonstrated above, is first clipped to positive values, then time-scaled (expanded in time) by a factor of two and subtracted from the original clipped SACF function, and again the result is clipped to have positive values only. This removes repetitive peaks with double the time lag where the basic peak is higher than the duplicate. It also removes the near-zero time lag part of the SACF curve. This operation can be repeated for time-lag scaling with factors of three, four, five, etc., as far as desired, in order to remove higher multiples of each peak. The resulting function is called here the enhanced summary autocorrelation function (ESACF).

TABLE I. SUGGESTED PARAMETER VALUES FOR THE PROPOSED TWO-CHANNEL MODEL

Fig. 4. An example of multipitch analysis. A test signal with three clarinet tones with fundamental frequencies 147, 185, and 220 Hz, and relative rms values of , , and 1, respectively, was analyzed. Top: two-channel SACF; bottom: two-channel ESACF.

An illustrative example of the enhanced SACF analysis is shown in Fig. 4 for a signal consisting of three clarinet tones. The fundamental frequencies of the tones are 147, 185, and 220 Hz. The SACF is depicted on the top and the enhanced SACF curve on the bottom, showing a clear indication of the three fundamental periodicities and no other peaks. We have experimented with different musical chords and source instrument sounds.
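The pruning operation just described can be sketched compactly. One detail the text leaves open is whether each subtraction stretches the original clipped SACF or the running result; this sketch stretches the original, and the function and variable names are ours.

```python
import numpy as np

def esacf(sacf, max_factor=5):
    """Enhanced SACF: clip the SACF to positive values, then for each
    integer factor m = 2..max_factor subtract a version of the clipped
    curve stretched in time by m, clipping to positive values after
    each subtraction.  This removes duplicate peaks at integer
    multiples of a fundamental lag and the near-zero-lag region."""
    base = np.maximum(sacf, 0.0)
    lags = np.arange(len(base), dtype=float)
    s = base.copy()
    for m in range(2, max_factor + 1):
        # value of the stretched curve at lag tau is base(tau / m)
        stretched = np.interp(lags / m, lags, base)
        s = np.maximum(s - stretched, 0.0)
    return s

# Synthetic SACF: a zero-lag peak, a true pitch peak at lag 40, and a
# weaker duplicate at the double lag 80.  The ESACF should keep only
# the peak at lag 40.
demo = np.zeros(400)
demo[0], demo[40], demo[80] = 2.0, 1.0, 0.6
e = esacf(demo)
```

Because the stretched curve maps the peak at lag 40 onto lag 80, the duplicate at 80 is subtracted away, while the zero-lag peak cancels itself.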
In most cases, sound combinations of two to three sources are resolved quite easily if the amplitude levels of the sources are not too different. For chords with four or more sources, the subsignals easily mask each other so that some of the sources are not resolved reliably. One further idea to improve the pitch resolution with complex mixtures, especially with relatively different amplitudes, is to use an iterative algorithm whereby the most prominent sounds are first detected and filtered out (see Section VI) or attenuated properly, and then the pitch analysis is repeated on the residual.

IV. MODEL PARAMETERS

The model of Fig. 2 has several parameters that affect the behavior of the pitch analysis. In the following, we show with examples the effect of each parameter on the SACF and ESACF representations. In most cases, it is difficult to obtain the correct value for a parameter from the theory of human perception. Rather, we attempt to obtain model performance that is similar to that of human perception, or approximately optimal based on visual inspection of the analysis results. The suggested parameter values are collected in Table I. In the following, there are two types of illustrations. In Figs. 5 and 7, the SACFs of one frame of the signal are plotted on the top, and one or two ESACFs that correspond to the parameter values that we found best suited are depicted on the bottom. In Figs. 6 and 8, consecutive ESACFs are illustrated as a function of time. This representation is somewhat similar to the spectrogram: time is shown on the horizontal axis, ESACF lag on the vertical axis, and the gray level of a point corresponds to the value of the ESACF. Test signals are mixtures of synthetic harmonic tones, noise, and speech signals. Each synthetic tone has the amplitude of the first harmonic equal to 1.0 and the amplitude of the nth harmonic equal to. The initial phases of the harmonics are 0.
The noise that is added in some examples is white Gaussian noise. In this work, we have used speech samples that have been recorded in an anechoic chamber and in normal office conditions. The sampling rate of all the examples is 22 050 Hz.

A. Compression of Magnitude in Transform Domain

In Section II we motivated the use of transform-domain computation of the generalized autocorrelation by two considerations: 1) it allows nonlinear compression of the spectral representation, and 2) it is computationally more efficient. The following examples concentrate on magnitude compression and suggest that the normal autocorrelation function (k = 2) is suboptimal for our periodicity detection model. Exponential compression is easily available by adjusting the parameter k in (1). While we only consider exponential compression in this context, other nonlinear functions may be applied as well. A common choice in the speech processing community is to use the natural logarithm, which results in the cepstrum. Nonlinear compression in the transform domain has been studied in the context of pitch analysis of speech signals [31]. In that study, the periodicity measure was computed using the wideband signal directly, without division into channels. It was reported that the compression parameter k = 0.67 gave the best results. It was also shown that pre-whitening of the signal spectrum showed a tendency to improve the pitch analysis accuracy.

Fig. 5 illustrates the effect of magnitude compression on the SACF. The upper four plots depict the SACFs that are obtained using k = 0.2, k = 0.67, k = 1.0, and k = 2.0, whereas the bottom plot shows the ESACF that is obtained from the SACF with k = 0.67. A Hamming window of 1024 samples (46.4 ms with a sampling frequency of 22 050 Hz) is used. The test signal consists of two synthetic harmonic tones with fundamental frequencies and Hz and white Gaussian noise with a signal-to-noise ratio (SNR) of 2.2 dB. The fundamental frequencies of the harmonic tones are separated
by one semitone.

Fig. 5. An example of magnitude compression. The test signal consists of two harmonic tones with added Gaussian white noise (SNR is 2.2 dB). The first four plots from the top illustrate the SACF computed with k = 0.2, k = 0.67, k = 1.0, and k = 2.0, respectively. The bottom plot depicts the ESACF corresponding to the second plot with k = 0.67.

The two tones are identifiable by listening to the signal, although the task is much more involved than with the signal without the additive noise (see the examples at the WWW). The example shows that the SACF peaks get broader as the value of k increases. However, the performance with low values of k is compromised by sensitivity to noise, as seen from the number and level of the spurious peaks in the top plot. According to this example, k = 0.67 is a good compromise between lag-domain resolution and sensitivity to noise. While we prefer the use of k = 0.67, in some computationally critical applications it may be useful to use k = 0.5 if optimized routines are available for the square-root operation but not for the cube-root operation. It is interesting to compare exponential compression to the cepstrum, where logarithmic compression is used. Cepstral analysis typically results in higher lag-domain resolution than exponential compression with k = 0.67, but it may be problematic with signals with low amplitude levels, since the natural logarithm approaches minus infinity as the signal amplitude approaches 0 [31].

B. Time-Domain Windowing

The choice of the time-domain window in the correlation computation affects the temporal resolution of the method and also sets a lower bound on the lag range. The shape of the window function determines the leakage of energy in the spectral domain. It also determines the effective length of the window, which corresponds to the width of the main lobe of the window in the spectral domain.
We have applied a Hamming window in our pitch analysis model, although other tapered windows may be used as well.

Fig. 6. An example of the effect of the window length. The test signal consists of the Finnish vowel /a/ spoken by a male. The pitch of the vowel is falling. The signal is added onto itself after a delay of 91 ms. The plots from top to bottom illustrate the ESACF computed with window lengths of 23.2, 46.4, 92.9, and 185.8 ms, respectively.

For stationary signals, increasing the window length reduces the variance of the spectral representation. However, the sound signals typically encountered in pitch analysis are far from stationary, and a long window is bound to degrade the performance when the pitch is changing. Fig. 6 illustrates this trade-off. Consecutive frames of ESACFs computed on the test signal are plotted in a spectrogram-like fashion in Fig. 6. The darker a point is in the figure, the higher is the value of the ESACF. The test signal consists of the Finnish vowel /a/ spoken by a male. The pitch of the vowel is falling. The vowel signal is added onto itself after a delay of 91 ms. Listening to the signal reveals that the human auditory system is easily able to distinguish two vowels with falling pitches. The value of the hop size parameter is 10 ms, i.e., consecutive frames are separated in time by 10 ms. The Hamming window lengths of the analysis are, from top to bottom, 23.2, 46.4, 92.9, and 185.8 ms. The two plots on the top of Fig. 6 exhibit two distinct pitch trajectories, as desired. However, as the window length is increased, the two trajectories merge into one broader trajectory, as shown in the two bottom plots. Clearly, the two cases on the top are preferred over the two on the bottom. When the two top plots of Fig. 6 are compared, it is noticed that the one using the shorter window exhibits more undesired peaks in the ESACF. As expected, the use of a longer window reduces the artifacts that correspond to noise and other
un-pitched components. From Fig. 6 it is suggested that a Hamming window with a length of 46.4 ms is a good compromise. When a tapered time-domain window, such as the Hamming window, is used, the hop size is typically chosen to be less than half of the window length. This ensures that the signal is evenly weighted in the computation. The hop size may be further reduced to obtain finer sampling of the ESACF frames in time, if desired. We have chosen a hop size equal to 10 ms which, from our experiments, seems a good compromise between the displacement of consecutive ESACF frames and the computational demands. A similar hop size value is very often used in speech and audio processing applications.

C. Pre-Whitening

The pre-whitening filter that is used before the filterbank removes short-time correlation from the signal, e.g., due to formant resonance ringing in speech signals. In the spectral domain, this corresponds to flattening the spectral envelope. We thus expect the whitening to give better resolution of the peaks in the autocorrelation function. Since whitening flattens the overall spectral envelope of a signal, it may degrade the signal-to-noise ratio of a narrowband signal, since the noise outside the signal band is amplified. However, as the following example illustrates, whitening does not typically degrade the two-channel pitch estimator performance. Fig. 7 demonstrates the effect of pre-whitening with the test signal of Fig. 5, i.e., two harmonic tones with fundamental frequencies separated by one semitone and additive white Gaussian noise. The first two plots illustrate the SACF without (top) and with (second) pre-whitening. The peaks of the whitened SACF are better separated than those without whitening. The spurious peaks that are caused by the noise still appear at a relatively low level.
This is confirmed by investigation of the corresponding ESACFs in the third and fourth plots of Fig. 7.

Fig. 7. An example of the effect of pre-whitening. The test signal is the same as in Fig. 5. The first two plots illustrate the SACF without (top) and with (second) pre-whitening. The third and the fourth plot depict the ESACF corresponding to the first and the second plot, respectively.

D. Two-Channel Filterbank

The choice of the filterbank parameters affects the performance of the periodicity detector quite significantly, as demonstrated by the following example. The most important parameters are the filter orders and cut-off frequencies. As discussed in Section II, the two filters are bandpass filters with a passband from 70 to 1000 Hz for the lower channel and above 1000 Hz for the upper channel. The lowest cut-off frequency, at 70 Hz, is chosen so that DC and very-low-frequency disturbances are suppressed while the periodicity detection of low-pitched tones is not degraded. The crossover frequency at 1000 Hz is not a critical parameter; it may vary between Hz (see, e.g., [30], [33]). It is related to the upper limit of fundamental frequencies that may be estimated properly using the method and naturally also affects the lag-domain temporal resolution of the periodicity analysis. When a tone with a fundamental frequency higher than the crossover frequency is analyzed, the SACF is dominated by the contribution from the high channel. The high-channel compressed autocorrelation is computed after the lowpass filtering at the crossover frequency; thus, the high-channel contribution for fundamental frequencies above the crossover frequency is weak. The method is really a periodicity estimator: it is not capable of properly simulating the spectral pitch, i.e., a pitch that is based on resolved harmonics with a fundamental frequency higher than 1000 Hz.
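The channel-separation stage described in this section can be sketched with standard second-order Butterworth designs. This is a sketch assuming SciPy is available; the filters and function names below are our stand-ins, not the authors' exact implementation.

```python
import numpy as np
from scipy.signal import butter, lfilter

fs = 22050  # sampling rate used in the paper's examples

# 2nd-order sections give the 12 dB/octave rolloff per transition band
# suggested in Table I.
b_hp70, a_hp70 = butter(2, 70.0, btype="highpass", fs=fs)
b_lp1k, a_lp1k = butter(2, 1000.0, btype="lowpass", fs=fs)
b_hp1k, a_hp1k = butter(2, 1000.0, btype="highpass", fs=fs)

def low_channel(x):
    """70 Hz - 1 kHz band: highpass at 70 Hz plus lowpass at 1 kHz."""
    return lfilter(b_lp1k, a_lp1k, lfilter(b_hp70, a_hp70, x))

def high_channel_envelope(x):
    """High channel: highpass at 1 kHz, half-wave rectification, then
    the same lowpass (incl. the 70 Hz highpass) as the low channel."""
    y = np.maximum(lfilter(b_hp1k, a_hp1k, x), 0.0)
    return low_channel(y)

# A 200 Hz tone should pass the low channel nearly unchanged, while a
# 4 kHz tone (two octaves above the crossover) is strongly attenuated
# by the 12 dB/octave lowpass.
t = np.arange(fs) / fs
tone_low = np.sin(2 * np.pi * 200.0 * t)
tone_high = np.sin(2 * np.pi * 4000.0 * t)
```

The rectify-then-lowpass path shifts the beating of unresolved high-frequency partials down to baseband, which is what lets the high channel contribute periodicity information at the fundamental lag.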
Note that the other aforementioned filterbank methods are also periodicity detectors and have their fundamental frequency detection upper limit related to the lowpass filter after the half-wave rectification. Both filters are of the Butterworth type, for maximally flat passbands. The filter order used for each transition band is a parameter that has a significant effect on the temporal resolution of the periodicity detector. Fig. 8 shows an example of the effect of channel-separation filtering. The test signal is the same as that in the example of Fig. 6, i.e., a Finnish vowel /a/ with a falling pitch added to itself after a delay of 91 ms. The plots from top to bottom illustrate the ESACFs when filter orders 1, 2, and 4 (corresponding to rolloffs of 6, 12, and 24 dB/octave) are used for each transition band, respectively. It is observed that the spurious peaks in the ESACF representations are reduced as the filter order is increased. By examination of Fig. 8 we conclude that filter order 2 for each transition band is the best compromise between the resolution of the ESACF and the number of spurious peaks in the ESACF.

V. MODEL PERFORMANCE

The performance of the pitch analysis model is demonstrated with three examples: 1) resolution of harmonic tones with different amplitudes; 2) musical chords played with real instruments; and 3) the ESACF representation of a mixture of two vowels with varying pitches. The test signals can be downloaded at
Fig. 8. An example of the effect of channel-separation filtering. The test signal is the same as in Fig. 6. The plots from top to bottom illustrate the ESACFs when filter orders 1, 2, and 4 (corresponding to rolloffs of 6, 12, and 24 dB/octave) are used for each transition band, respectively.

Fig. 9. An example of the resolution of harmonic tones with varying amplitudes. The amplitude ratios of the two tones are, from top to bottom, 0.0, 3.0, 6.0, and 10.0 dB.

Fig. 9 shows the first example, where the test signal consists of two synthetic harmonic tones with different amplitude ratios. The fundamental frequencies of the tones are and Hz. The amplitude ratios in the plots of Fig. 9 are, from top to bottom, 0.0, 3.0, 6.0, and 10.0 dB. The tones are clearly separated in the top plot, and the separation degrades with increasing amplitude ratio, as expected. This is in agreement with the perception of the test signals; the test signal with 0.0 dB amplitude ratio is clearly perceived as two tones, while at 10.0 dB the weaker signal is only barely audible. Fig. 10 shows an ESACF representation of a mixture of two vowels with varying pitches. The two Finnish /ae/ vowels have been spoken by a male in an anechoic chamber and mixed. The figure shows two distinct trajectories with few spurious peaks. The example of Fig. 11 illustrates pitch analysis of musical chords. The test signals consist of 2-4 clarinet tones that are mixed to obtain typical musical chords. The tones used for mixing are D (146.8 Hz), F (185.0 Hz), A (220.0 Hz), and C (261.6 Hz). The plots from top to bottom show the ESACF of one analysis frame. The test signals are, respectively, tones D and A; D and F; D, F, and A; and D, F, A, and C. This example allows us to investigate the performance of the model when tones are added to the mixture.
The small arrows above the plots indicate the tone peaks in the ESACF representations. The first two plots show the performance with only two tones present. In both cases, the two tones appear as peaks of equal height. The maximum value of the peak corresponding to tone D, at a lag of about 7 ms, is almost 30 in the top plot and only a little more than 20 in the second plot. In the third plot, three tones are present. The tones are the same as in the first two plots, but now the D peak is more pronounced than the peaks corresponding to tones F# and A. Finally, inclusion of tone C again alters the heights of the other peaks. This dependence of the peak heights is caused partly by the computation of the ESACF representation from the SACF representation, and partly by the fact that the tones have harmonics with coinciding frequencies. In all cases, however, the tones are clearly distinguishable from the ESACF representation.

Fig. 10. An example of the pitch analysis of two vowels with crossing pitch trajectories. The test signal consists of two Finnish vowels /ae/, one with a rising pitch and the other with a falling pitch.

VI. SUMMARY AND DISCUSSION

The multipitch analysis model described above has been developed as a compromise between computational efficiency and auditory relevance. The first property is needed to facilitate advanced applications of audio and speech signal analysis: computational auditory scene analysis (CASA) [18]-[21], structured and object-based coding of audio signals, audio content analysis, sound source separation, and separation of speech from severe background noise are among such applications. Auditory relevance is advantageous because it enables comparison of the system performance with human auditory processing. Computational efficiency was demonstrated by testing the algorithm of Fig. 2 on a 300-MHz PowerPC processor (Apple Macintosh G3).
Computation of the SACF using WLP pre-whitening, a sample rate of 22 kHz, and a frame size of 1024 samples (46 ms) took less than 7.0 ms per frame. With a 10-ms hop size, only a
part of the processor's capacity is used in real-time pitch analysis. A multichannel model with correlation computed in each channel could not run as real-time analysis on current general-purpose processors.

Fig. 11. Example of pitch analysis of musical chords with clarinet tones. The plots show the ESACFs of signals consisting of tones D and A (top plot); D and F# (second plot); D, F#, and A (third plot); and D, F#, A, and C (bottom plot).

Although the auditory analogy of the model is not very strong, it has features that make it easier to interpret the analysis results from the point of view of human pitch perception. The pre-whitening of the input signal, often considered useful in pitch analysis, bears a minor resemblance to spectral compression in the auditory nerve. The division of the audio frequency range into two subranges, below and above about 1 kHz, is a kind of minimal division that realizes direct time synchrony of neural firings at low frequencies and envelope-based synchrony at high frequencies. Note that the model is a periodicity analyzer and does not implement spectral pitch analysis, which is needed especially when the fundamental frequency of the input signal exceeds the synchrony limit frequency of about 1 kHz. In this study we focused on pitch analysis of low-to-mid fundamental frequencies and did not attempt to include spectral pitch analysis. The computation of the time-lag correlation (generalized autocorrelation) is difficult to interpret from an auditory point of view, since it is carried out in the frequency domain. The one interesting auditory analogy is that the exponent used for spectral compression is close to the one used in the computation of the loudness density function [34]. It would be interesting to compare the result with neural interspike-interval histograms, which are, however, less efficient to compute.
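The frequency-domain computation of the generalized autocorrelation, i.e., the inverse FFT of the magnitude spectrum raised to a compression exponent, can be sketched as follows. The exponent value here is illustrative (the paper tunes it experimentally), and the WLP pre-whitening step is omitted:

```python
import numpy as np

def generalized_acf(x, k=0.67, nfft=None):
    """Generalized autocorrelation: inverse FFT of the magnitude
    spectrum raised to a compression exponent k. With k = 2 this
    reduces to the ordinary autocorrelation; k < 1 compresses the
    spectral dynamic range. The default k is illustrative only."""
    n = len(x)
    nfft = nfft or 2 * n                  # zero-pad to avoid circular wrap-around
    spec = np.abs(np.fft.rfft(x, nfft))
    acf = np.fft.irfft(spec ** k, nfft)   # real and even, since spec is real
    return acf[:n]
```

The lag of the largest peak (away from lag zero) then gives the candidate fundamental period.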
The enhanced summary autocorrelation function (ESACF) is an interesting and computationally simple means of pruning the periodicity information in the autocorrelation function. In a typical case, this representation helps in finding the fundamental periodicities of harmonic complex tones in a mixture of such tones, and it removes common periodicities such as the root periodicity of musical chords. This is useful in sound source separation of harmonic tones. In music signal analysis, the complement of such pruning, i.e., detection of chord periodicities and rejection of single-tone pitches, might be an equally useful feature for chord analysis. An interesting question in pitch analysis is the temporal integration exhibited by the human auditory system. As with a large set of other psychoacoustic features, the formation of the pitch percept takes ms to reach its full accuracy. Using a single frame length of 46 ms, corresponding to an effective Hamming window length of about 25 ms, is a compromise between pitch-tracking responsiveness and sensitivity to noise. Averaging of consecutive frames can be used to improve the stability of the SACF for steady-pitch signals; better temporal integration strategies may be needed when the pitch varies more rapidly. This paper has dealt with multipitch analysis of an audio signal using the SACF and ESACF representations. The next step in a typical application would be to detect and describe pitch objects and their trajectories in time. Related to such pitch-object detection is the resolution of the analysis when harmonic tones of different amplitudes occur in a mixture signal. As shown in Fig. 9, separation of pitch objects is easy only when the levels of the tones are relatively similar. An effective and natural way to improve pitch analysis with varying amplitudes is to use an iterative technique, whereby the most prominent pitches are detected first and the corresponding harmonic complexes are filtered out by FIR comb filtering tuned to reject a given pitch.
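The comb-filter rejection step might be sketched as follows: the sparse FIR y[n] = x[n] - x[n - P] places nulls at the fundamental 1/P and at all of its harmonics. This is an illustrative sketch assuming an integer period in samples; rejecting arbitrary pitches would require fractional-delay filters.

```python
import numpy as np

def reject_pitch(x, period):
    """Cancel the harmonic complex whose fundamental period is `period`
    samples: H(z) = 1 - z**(-period) has zeros at frequencies k/period
    (cycles/sample), i.e., at every harmonic of the rejected pitch."""
    y = x.copy()
    y[period:] -= x[:-period]
    return y
```

Iterating detect-then-reject lets the weaker tones emerge in the residual, as described above.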
Tones with a low amplitude level can then be analyzed from the residual signal. The same approach is useful in more general sound (source) separation of harmonic tones [35]. The pitch lags can be used to generate sparse FIRs for rejecting (or enhancing) specific harmonic complexes in a given mixture signal. An example of vowel separation and spectral estimation of the constituent vowels is given in [35]. This can be considered a kind of multipitch prediction of harmonic complex mixtures.

REFERENCES

[1] W. M. Hartmann, "Pitch, periodicity, and auditory organization," J. Acoust. Soc. Amer., vol. 100, Dec.
[2] W. Hess, Pitch Determination of Speech Signals. Berlin, Germany: Springer-Verlag.
[3] W. J. Hess, "Pitch and voicing determination," in Advances in Speech Signal Processing, S. Furui and M. M. Sondhi, Eds. New York: Marcel Dekker, 1992, ch. 6.
[4] J. A. Moorer, "On the segmentation and analysis of continuous musical sound by digital computer," Ph.D. dissertation, Dept. Music, Stanford Univ., Stanford, CA, May.
[5] R. C. Maher, "Evaluation of a method for separating digitized duet signals," J. Audio Eng. Soc., vol. 38, June.
[6] R. C. Maher and J. W. Beauchamp, "Fundamental frequency estimation of musical signals using a two-way mismatch procedure," J. Acoust. Soc. Amer., vol. 95, Apr.
[7] C. Chafe, D. Jaffe, K. Kashima, B. Mont-Reynaud, and J. Smith, "Techniques for note identification in polyphonic music," Dept. Music, Stanford Univ., Stanford, CA, Tech. Rep. STAN-M-29, CCRMA, Oct.
[8] C. Chafe and D. Jaffe, "Source separation and note identification in polyphonic music," Dept. Music, Stanford Univ., Stanford, CA, Tech. Rep. STAN-M-36, CCRMA, Apr.
[9] M. Weintraub, "A computational model for separating two simultaneous talkers," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, 1986.
[10] K. Kashino and H. Tanaka, "A sound source separation system with the ability of automatic tone modeling," in Proc. Int. Computer Music Conf.
[11] A. Klapuri, "Number theoretical means of resolving a mixture of several harmonic sounds," in Proc. Signal Processing IX: Theories and Applications, vol. 4, 1998.
[12] K. D. Martin, "Automatic transcription of simple polyphonic music: Robust front end processing," Mass. Inst. Technol. Media Lab Perceptual Computing, Cambridge, Tech. Rep. 399.
[13] K. D. Martin, "A blackboard system for automatic transcription of simple polyphonic music," Mass. Inst. Technol. Media Lab Perceptual Computing, Cambridge, Tech. Rep. 385.
[14] D. P. W. Ellis and B. L. Vercoe, "A wavelet-based sinusoid model of sound for auditory signal separation," in Proc. Int. Computer Music Conf., 1991.
[15] USA Standard Acoustical Terminology, Amer. Nat. Stand. Inst.
[16] R. Meddis and L. O'Mard, "A unitary model for pitch perception," J. Acoust. Soc. Amer., vol. 102, Sept.
[17] R. Meddis and M. Hewitt, "Virtual pitch and phase sensitivity of a computer model of the auditory periphery I: Pitch identification," J. Acoust. Soc. Amer., vol. 89, June.
[18] A. S. Bregman, Auditory Scene Analysis. Cambridge, MA: MIT Press.
[19] M. P. Cooke, "Modeling auditory processing and organization," Ph.D. dissertation, Univ. Sheffield, Sheffield, U.K.
[20] G. J. Brown, "Computational auditory scene analysis: A representational approach," Ph.D. dissertation, Univ. Sheffield, Sheffield, U.K.
[21] D. P. W. Ellis, "Prediction-driven computational auditory scene analysis," Ph.D. dissertation, Mass. Inst. Technol., Cambridge, June.
[22] R. D. Patterson, "The sound of the sinusoid: Spectral models," J. Acoust. Soc. Amer., vol. 96, Sept.
[23] B. C. J. Moore, R. W. Peters, and B. R. Glasberg, "Auditory filter shapes at low center frequencies," J. Acoust. Soc. Amer., vol. 88, July.
[24] M. Slaney and R. F. Lyon, "A perceptual pitch detector," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1.
[25] R. Carlyon and T. M. Shackleton, "Comparing the fundamental frequencies of resolved and unresolved harmonics: Evidence for two pitch mechanisms?," J. Acoust. Soc. Amer., vol. 95, June.
[26] R. P. Carlyon, "Comments on a unitary model of pitch perception," J. Acoust. Soc. Amer., vol. 102, Aug.
[27] B. R. Glasberg and B. C. J. Moore, "Derivation of auditory filter shapes from notched-noise data," Hear. Res., vol. 47.
[28] U. K. Laine, M. Karjalainen, and T. Altosaar, "Warped linear prediction (WLP) in speech and audio processing," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, pp. III.349-III.352.
[29] E. D. Young and M. B. Sachs, "Representation of steady-state vowels in the temporal aspects of the discharge patterns of populations of auditory-nerve fibers," J. Acoust. Soc. Amer., vol. 66, Nov.
[30] S. Seneff, "A joint synchrony/mean-rate model of auditory speech processing," J. Phonetics, vol. 16.
[31] H. Indefrey, W. Hess, and G. Seeser, "Design and evaluation of double-transform pitch determination algorithms with nonlinear distortion in the frequency domain: Preliminary results," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, 1985.
[32] R. D. Patterson, M. H. Allerhand, and C. Giguère, "Time-domain modeling of peripheral auditory processing: A modular architecture and a software platform," J. Acoust. Soc. Amer., vol. 98, Oct.
[33] T. Dau, B. Kollmeier, and A. Kohlrausch, "Modeling auditory processing of amplitude modulation I. Detection and masking with narrow-band carriers," J. Acoust. Soc. Amer., vol. 102, Nov.
[34] E. Zwicker and H. Fastl, Psychoacoustics: Facts and Models. Berlin, Germany: Springer-Verlag.
[35] M. Karjalainen and T. Tolonen, "Multi-pitch and periodicity analysis model for sound separation and auditory scene analysis," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. 2, 1999.

Tero Tolonen (S'98) was born in Oulu, Finland. He majored in acoustics and audio signal processing and received the M.Sc. (Tech.) and Lic.Sc. (Tech.) degrees in electrical engineering from the Helsinki University of Technology (HUT), Espoo, Finland, in January 1998 and December 1999, respectively. He is currently pursuing a postgraduate degree. He has been with the HUT Laboratory of Acoustics and Audio Signal Processing. His research interests include model-based audio representation and coding, physical modeling of musical instruments, and digital audio signal processing. Mr. Tolonen is a student member of the IEEE Signal Processing Society and the Audio Engineering Society.

Matti Karjalainen (M'84) was born in Hankasalmi, Finland. He received the M.Sc. and Dr.Tech. degrees in electrical engineering from the Tampere University of Technology, Tampere, Finland, in 1970 and 1978, respectively. His doctoral dissertation dealt with speech synthesis by rule in Finnish. From 1980 to 1986, he was an Associate Professor and, since 1986, Full Professor of acoustics with the Faculty of Electrical Engineering, Helsinki University of Technology, Espoo, Finland. His research activities cover speech synthesis, analysis, and recognition; auditory modeling and spatial hearing; DSP hardware, software, and programming environments; and various branches of acoustics, including musical acoustics and modeling of musical instruments. Dr. Karjalainen is a Fellow of the AES and a member of the ASA, EAA, ICMA, ESCA, and several Finnish scientific and engineering societies. He was General Chair of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY.
Advanced Signal Processing and Digital Noise Reduction
Advanced Signal Processing and Digital Noise Reduction Saeed V. Vaseghi Queen's University of Belfast UK WILEY HTEUBNER A Partnership between John Wiley & Sons and B. G. Teubner Publishers Chichester New
GSM/EDGE Output RF Spectrum on the V93000 Joe Kelly and Max Seminario, Verigy
GSM/EDGE Output RF Spectrum on the V93000 Joe Kelly and Max Seminario, Verigy Introduction A key transmitter measurement for GSM and EDGE is the Output RF Spectrum, or ORFS. The basis of this measurement
Thirukkural - A Text-to-Speech Synthesis System
Thirukkural - A Text-to-Speech Synthesis System G. L. Jayavardhana Rama, A. G. Ramakrishnan, M Vijay Venkatesh, R. Murali Shankar Department of Electrical Engg, Indian Institute of Science, Bangalore 560012,
Agilent AN 1316 Optimizing Spectrum Analyzer Amplitude Accuracy
Agilent AN 1316 Optimizing Spectrum Analyzer Amplitude Accuracy Application Note RF & Microwave Spectrum Analyzers Table of Contents 3 3 4 4 5 7 8 8 13 13 14 16 16 Introduction Absolute versus relative
A Microphone Array for Hearing Aids
A Microphone Array for Hearing Aids by Bernard Widrow 1531-636X/06/$10.00 2001IEEE 0.00 26 Abstract A directional acoustic receiving system is constructed in the form of a necklace including an array of
Agilent AN 1315 Optimizing RF and Microwave Spectrum Analyzer Dynamic Range. Application Note
Agilent AN 1315 Optimizing RF and Microwave Spectrum Analyzer Dynamic Range Application Note Table of Contents 3 3 3 4 4 4 5 6 7 7 7 7 9 10 10 11 11 12 12 13 13 14 15 1. Introduction What is dynamic range?
Jeff Thomas Tom Holmes Terri Hightower. Learn RF Spectrum Analysis Basics
Jeff Thomas Tom Holmes Terri Hightower Learn RF Spectrum Analysis Basics Learning Objectives Name the major measurement strengths of a swept-tuned spectrum analyzer Explain the importance of frequency
Voltage. Oscillator. Voltage. Oscillator
fpa 147 Week 6 Synthesis Basics In the early 1960s, inventors & entrepreneurs (Robert Moog, Don Buchla, Harold Bode, etc.) began assembling various modules into a single chassis, coupled with a user interface
FREQUENCY RESPONSE ANALYZERS
FREQUENCY RESPONSE ANALYZERS Dynamic Response Analyzers Servo analyzers When you need to stabilize feedback loops to measure hardware characteristics to measure system response BAFCO, INC. 717 Mearns Road
FUNDAMENTALS OF MODERN SPECTRAL ANALYSIS. Matthew T. Hunter, Ph.D.
FUNDAMENTALS OF MODERN SPECTRAL ANALYSIS Matthew T. Hunter, Ph.D. AGENDA Introduction Spectrum Analyzer Architecture Dynamic Range Instantaneous Bandwidth The Importance of Image Rejection and Anti-Aliasing
NRZ Bandwidth - HF Cutoff vs. SNR
Application Note: HFAN-09.0. Rev.2; 04/08 NRZ Bandwidth - HF Cutoff vs. SNR Functional Diagrams Pin Configurations appear at end of data sheet. Functional Diagrams continued at end of data sheet. UCSP
Spectrum Level and Band Level
Spectrum Level and Band Level ntensity, ntensity Level, and ntensity Spectrum Level As a review, earlier we talked about the intensity of a sound wave. We related the intensity of a sound wave to the acoustic
Adding Sinusoids of the Same Frequency. Additive Synthesis. Spectrum. Music 270a: Modulation
Adding Sinusoids of the Same Frequency Music 7a: Modulation Tamara Smyth, [email protected] Department of Music, University of California, San Diego (UCSD) February 9, 5 Recall, that adding sinusoids of
Separation and Classification of Harmonic Sounds for Singing Voice Detection
Separation and Classification of Harmonic Sounds for Singing Voice Detection Martín Rocamora and Alvaro Pardo Institute of Electrical Engineering - School of Engineering Universidad de la República, Uruguay
LOW COST HARDWARE IMPLEMENTATION FOR DIGITAL HEARING AID USING
LOW COST HARDWARE IMPLEMENTATION FOR DIGITAL HEARING AID USING RasPi Kaveri Ratanpara 1, Priyan Shah 2 1 Student, M.E Biomedical Engineering, Government Engineering college, Sector-28, Gandhinagar (Gujarat)-382028,
AN1200.04. Application Note: FCC Regulations for ISM Band Devices: 902-928 MHz. FCC Regulations for ISM Band Devices: 902-928 MHz
AN1200.04 Application Note: FCC Regulations for ISM Band Devices: Copyright Semtech 2006 1 of 15 www.semtech.com 1 Table of Contents 1 Table of Contents...2 1.1 Index of Figures...2 1.2 Index of Tables...2
Taking the Mystery out of the Infamous Formula, "SNR = 6.02N + 1.76dB," and Why You Should Care. by Walt Kester
ITRODUCTIO Taking the Mystery out of the Infamous Formula, "SR = 6.0 + 1.76dB," and Why You Should Care by Walt Kester MT-001 TUTORIAL You don't have to deal with ADCs or DACs for long before running across
Understanding Mixers Terms Defined, and Measuring Performance
Understanding Mixers Terms Defined, and Measuring Performance Mixer Terms Defined Statistical Processing Applied to Mixers Today's stringent demands for precise electronic systems place a heavy burden
Doppler Effect Plug-in in Music Production and Engineering
, pp.287-292 http://dx.doi.org/10.14257/ijmue.2014.9.8.26 Doppler Effect Plug-in in Music Production and Engineering Yoemun Yun Department of Applied Music, Chungwoon University San 29, Namjang-ri, Hongseong,
CLIO 8 CLIO 8 CLIO 8 CLIO 8
CLIO 8, by Audiomatica, is the new measurement software for the CLIO System. The CLIO System is the easiest and less expensive way to measure: - electrical networks - electronic equipment - loudspeaker
