

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 13, NO. 5, SEPTEMBER 2005

A Tutorial on Onset Detection in Music Signals

Juan Pablo Bello, Laurent Daudet, Samer Abdallah, Chris Duxbury, Mike Davies, and Mark B. Sandler, Senior Member, IEEE

Abstract: Note onset detection and localization is useful in a number of analysis and indexing techniques for musical signals. The usual way to detect onsets is to look for transient regions in the signal, a notion that leads to many definitions: a sudden burst of energy, a change in the short-time spectrum of the signal or in the statistical properties, etc. The goal of this paper is to review, categorize, and compare some of the most commonly used techniques for onset detection, and to present possible enhancements. We discuss methods based on the use of explicitly predefined signal features: the signal's amplitude envelope, spectral magnitudes and phases, time-frequency representations; and methods based on probabilistic signal models: model-based change point detection, surprise signals, etc. Using a choice of test cases, we provide some guidelines for choosing the appropriate method for a given application.

Index Terms: Attack transients, audio, note segmentation, novelty detection.

I. INTRODUCTION

A. Background and Motivation

MUSIC is to a great extent an event-based phenomenon for both performer and listener. We nod our heads or tap our feet to the rhythm of a piece; the performer's attention is focused on each successive note. Even in non-note-based music, there are transitions as musical timbre and tone color evolve. Without change, there can be no musical meaning. The automatic detection of events in audio signals gives new possibilities in a number of music applications including content delivery, compression, indexing and retrieval. Accurate retrieval depends on the use of appropriate features to compare and identify pieces of music.
Given the importance of musical events, it is clear that identifying and characterizing these events is an important aspect of this process. Equally, as compression standards advance and the drive for improving quality at low bit-rates continues, so does accurate event detection become a basic requirement: disjoint audio segments with homogeneous statistical properties, delimited by transitions or events, can be compressed more successfully in isolation than they can in combination with dissimilar regions. Finally, accurate segmentation allows a large number of standard audio editing algorithms and effects (e.g., time-stretching, pitch-shifting) to be more signal-adaptive.

Manuscript received August 6, 2003; revised July 21. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Gerald Schuller. J. P. Bello, S. Abdallah, M. Davies, and M. B. Sandler are with the Centre for Digital Music, Department of Electronic Engineering, Queen Mary, University of London, London E1 4NS, U.K. (juan.bello-correa@elec.qmul.ac.uk; samer.abdallah@elec.qmul.ac.uk; mike.davies@elec.qmul.ac.uk; mark.sandler@elec.qmul.ac.uk). L. Daudet is with the Laboratoire d'Acoustique Musicale, Université Pierre et Marie Curie (Paris 6), Paris, France (daudet@lam.jussieu.fr). C. Duxbury is with the Centre for Digital Music, Department of Electronic Engineering, Queen Mary, University of London, London E1 4NS, U.K., and also with WaveCrest Communications Ltd. (christopher.duxbury@elec.qmul.ac.uk). Digital Object Identifier /TSA

Fig. 1. Attack, transient, decay, and onset in the ideal case of a single note.

There have been many different approaches for onset detection. The goal of this paper is to give an overview of the most commonly used techniques, with a special emphasis on the ones that have been employed in the authors' different applications.
For the sake of coherence, the discussion will be focused on the more specific problem of note onset detection in musical signals, although we believe that the discussed methods can be useful for various different tasks (e.g., transient modeling or localization) and different classes of signals (e.g., environmental sounds, speech).

B. Definitions: Transients vs. Onsets vs. Attacks

A central issue here is to make a clear distinction between the related concepts of transients, onsets and attacks. The reason for making these distinctions clear is that different applications have different needs. The similarities and differences between these key concepts are important to consider; it is similarly important to categorize all related approaches. Fig. 1 shows, in the simple case of an isolated note, how one could differentiate these notions.

The attack of the note is the time interval during which the amplitude envelope increases.

The concept of transient is more difficult to describe precisely. As a preliminary informal definition, transients are short intervals during which the signal evolves quickly in some nontrivial or relatively unpredictable way. In the case of acoustic instruments, the transient often corresponds to the period during which the excitation (e.g., a hammer strike) is applied and then damped, leaving only the slow decay at the resonance frequencies of the body. Central to this time duration problem is the issue of the useful time resolution: we will assume that the human ear cannot distinguish between two transients less than 10 ms apart [1]. Note that the release or offset of a sustained sound can also be considered a transient period.

The onset of the note is a single instant chosen to mark the temporally extended transient. In most cases, it will coincide with the start of the transient, or the earliest time at which the transient can be reliably detected.

C. General Scheme of Onset Detection Algorithms

In the more realistic case of a possibly noisy polyphonic signal, where multiple sound objects may be present at a given time, the above distinctions become less precise. It is generally not possible to detect onsets directly without first quantifying the time-varying "transientness" of the signal. Audio signals are both additive (musical objects in polyphonic music superimpose rather than conceal each other) and oscillatory. Therefore, it is not possible to look for changes simply by differentiating the original signal in the time domain; this has to be done on an intermediate signal that reflects, in a simplified form, the local structure of the original. In this paper, we refer to such a signal as a detection function; in the literature, the term novelty function is sometimes used instead [2].

Fig. 2 illustrates the procedure employed in the majority of onset detection algorithms: from the original audio signal, which can be pre-processed to improve the performance of subsequent stages, a detection function is derived at a lower sampling rate, to which a peak-picking algorithm is applied to locate the onsets. Whereas peak-picking algorithms are well documented in the literature, the diversity of existing approaches for the construction of the detection function makes the comparison between onset detection algorithms difficult for audio engineers and researchers.

D. Outline of the Paper

The outline of this paper follows the flowchart in Fig. 2. In Section II, we review a number of preprocessing techniques that can be employed to enhance the performance of some of the detection methods. Section III presents a representative cross-section of algorithms for the construction of the detection function. In Section IV, we describe some basic peak-picking algorithms; this allows the comparative study of the performance of a selection of note onset detection methods given in Section V. We finish our discussion in Section VI with a review of our findings and some thoughts on the future development of these algorithms and their applications.

Fig. 2. Flowchart of a standard onset detection algorithm.

II. PREPROCESSING

The concept of preprocessing implies the transformation of the original signal in order to accentuate or attenuate various aspects of the signal according to their relevance to the task in hand. It is an optional step that derives its relevance from the process or processes to be subsequently performed. There are a number of different treatments that can be applied to a musical signal in order to facilitate the task of onset detection.
However, we will focus only on two processes that are consistently mentioned in the literature, and that appear to be of particular relevance to onset detection schemes, especially when simple reduction methods are implemented: separating the signal into multiple frequency bands, and transient/steady-state separation.

A. Multiple Bands

Several onset detection studies have found it useful to independently analyze information across different frequency bands. In some cases this preprocessing is needed to satisfy the needs of specific applications that require detection in individual sub-bands to complement global estimates; in others, such an approach can be justified as a way of increasing the robustness of a given onset detection method. As examples of the first scenario, two beat tracking systems make use of filter banks to analyze transients across frequencies.

Goto [3] slices the spectrogram into spectrum strips and recognizes onsets by detecting sudden changes in energy. These are used in a multiple-agent architecture to detect rhythmic patterns. Scheirer [4] implements a six-band filter bank, using sixth-order elliptic filters, and psychoacoustically inspired processing to produce onset trains. These are fed into comb-filter resonators in order to estimate the tempo of the signal.

The second case is illustrated by models such as the perceptual onset detector introduced by Klapuri [5]. In this implementation, a filter bank divides the signal into eight nonoverlapping bands. In each band, onset times and intensities are detected and finally combined. The filter-bank model is used as an approximation to the mechanics of the human cochlea. Another example is the method proposed by Duxbury et al. [6], which uses a constant-Q conjugate quadrature filter bank to separate the signal into five subbands. It goes a step further by proposing a hybrid scheme that considers energy changes in high-frequency bands and spectral changes in lower bands. By implementing a multiple-band scheme, the approach effectively avoids the constraints imposed by the use of a single reduction method, while having different time resolutions for different frequency bands.

B. Transient/Steady-State Separation

The process of transient/steady-state separation is usually associated with the modeling of music signals, which is beyond the scope of this paper. However, there is a fine line between modeling and detection, and indeed, some modeling schemes directed at representing transients may hold promise for onset detection. Below, we briefly describe several methods that produce modified signals (residuals, transient signals) that can be, or have been, used for the purpose of onset detection.
Sinusoidal models, such as additive synthesis [7], represent an audio signal as a sum of sinusoids with slowly varying parameters. Amongst these methods, spectral modeling synthesis (SMS) [8] explicitly considers the residual (see footnote 1) of the synthesis method as a Gaussian white noise filtered with a slowly varying low-order filter. Levine [9] calculates the residual between the original signal and a multiresolution SMS model. Significant increases in the energy of the residual show a mismatch between the model and the original, thus effectively marking onsets. An extension of SMS, transient modeling synthesis, is presented in [10]. Transient signals are analyzed by a sinusoidal analysis/synthesis similar to SMS on the discrete cosine transform of the residual, hence in a pseudo-temporal domain. In [11], the whole scheme, including tonal and transient extraction, is generalized into a single matching pursuit formulation.

An alternative approach for the segregation of sinusoids from transient/noise components is proposed by Settel and Lippe [12] and later refined by Duxbury et al. [13]. It is based on the phase-vocoder principle of instantaneous frequency (see Section III-A.3) that allows the classification of individual frequency bins of a spectrogram according to the predictability of their phase components.

Footnote 1: The residual signal results from the subtraction of the modeled signal from the original waveform. When sinusoidal or harmonic modeling is used, then the residual is assumed to contain most of the impulse-like, noisy components of the original signal, e.g., transients.

Other schemes for the separation of tonal from nontonal components make use of lapped orthogonal transforms, such as the modified discrete cosine transform (MDCT), first introduced by Princen and Bradley [14].
These algorithms, originally designed for compression [15], [16], make use of the relative sparsity of MDCT representations of most musical signals: a few large coefficients account for most of the signal's energy. Actually, since the MDCT atoms are very tone-like (they are cosine functions slowly modulated in time by a smooth window), the part of the signal represented by the large MDCT atoms, according to a given threshold, can be interpreted as the tonal part of the signal [10], [17]. Transients and noise can be obtained by removing those large MDCT atoms.

III. REDUCTION

In the context of onset detection, the concept of reduction refers to the process of transforming the audio signal into a highly subsampled detection function which manifests the occurrence of transients in the original signal. This is the key process in a wide class of onset detection schemes and will therefore be the focus of most of our review. We will broadly divide reduction methods in two groups: methods based on the use of explicitly predefined signal features, and methods based on probabilistic signal models.

A. Reduction Based on Signal Features

1) Temporal Features: When observing the temporal evolution of simple musical signals, it is noticeable that the occurrence of an onset is usually accompanied by an increase of the signal's amplitude. Early methods of onset detection capitalized on this by using a detection function which follows the amplitude envelope of the signal [18]. Such an envelope follower can be easily constructed by rectifying and smoothing (i.e., low-pass filtering) the signal

$$E_0(n) = \frac{1}{N} \sum_{m=-N/2}^{N/2-1} |x(n+m)|\, w(m) \tag{1}$$

where $w(m)$ is an $N$-point window or smoothing kernel, centered at $m = 0$. This yields satisfactory results for certain applications where strong percussive transients exist against a quiet background. A variation on this is to follow the local energy, rather than the amplitude, by squaring, instead of rectifying, each sample

$$E(n) = \frac{1}{N} \sum_{m=-N/2}^{N/2-1} [x(n+m)]^2\, w(m) \tag{2}$$

Despite the smoothing, this reduced signal in its raw form is not usually suitable for reliable onset detection by peak picking. A further refinement, included in a number of standard onset detection algorithms, is to work with the time derivative of the energy (or rather the first difference for discrete-time signals) so that sudden rises in energy are transformed into narrow peaks in the derivative. The energy and its derivative are commonly used in combination with preprocessing, both with filter-banks [3] and transient/steady-state separation [9], [19].
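The envelope follower, the local energy, and the first difference of the energy can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the implementation used in any of the cited systems: the Hann kernel, the window length of 64 samples, the zero-padding at the signal edges, and the helper names (`smoothed_power`, `synthetic_note`) are all choices made here for the example, and only the Python standard library is used.

```python
import math

def hann(N):
    """N-point Hann kernel w(m) for m = -N/2 .. N/2-1, stored at index m + N/2."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / N) for i in range(N)]

def smoothed_power(x, N, p):
    """(1/N) * sum_m |x(n+m)|**p * w(m); samples outside x count as zero (assumption)."""
    w = hann(N)
    half = N // 2
    out = []
    for n in range(len(x)):
        acc = 0.0
        for i in range(N):
            idx = n + i - half
            if 0 <= idx < len(x):
                acc += abs(x[idx]) ** p * w[i]
        out.append(acc / N)
    return out

def envelope(x, N=64):
    """Amplitude envelope: rectify, then smooth."""
    return smoothed_power(x, N, 1)

def local_energy(x, N=64):
    """Local energy: square, then smooth."""
    return smoothed_power(x, N, 2)

def first_difference(d):
    """Discrete-time stand-in for the time derivative of a detection function."""
    return [0.0] + [d[n] - d[n - 1] for n in range(1, len(d))]

def synthetic_note(onset=200, length=600):
    """Hypothetical test signal: silence followed by a decaying sinusoid."""
    return [0.0 if n < onset else
            math.exp(-(n - onset) / 300.0) * math.sin(2.0 * math.pi * 0.05 * (n - onset))
            for n in range(length)]

x = synthetic_note()
dE = first_difference(local_energy(x))
peak = max(range(len(dE)), key=dE.__getitem__)
print("largest energy rise near sample", peak)
```

On this toy signal the largest value of the energy difference falls close to the true note start, which is the behavior that makes the derivative more convenient for peak picking than the raw energy.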

Another refinement takes its cue from psychoacoustics: empirical evidence [20] indicates that loudness is perceived logarithmically. This means that changes in loudness are judged relative to the overall loudness, since, for a continuous time signal, $d(\log E)/dt = (dE/dt)/E$. Hence, computing the first-difference of $\log(E(n))$ roughly simulates the ear's perception of loudness. An application of this technique to multiple bands [5] showed a significant reduction in the tendency for amplitude modulation to cause the detection of spurious onsets.

2) Spectral Features: A number of techniques have been proposed that use the spectral structure of the signal to produce more reliable detection functions. While reducing the need for preprocessing (e.g., removal of the tonal part), these methods are also successful in a number of scenarios, including onset detection in polyphonic signals with multiple instruments. Let us consider the short-time Fourier transform (STFT) of the signal $x(n)$

$$X_k(n) = \sum_{m=-N/2}^{N/2-1} x(nh+m)\, w(m)\, e^{-2j\pi mk/N} \tag{3}$$

where $w(m)$ is again an $N$-point window, and $h$ is the hop size, or time shift, between adjacent windows.

In the spectral domain, energy increases linked to transients tend to appear as a broadband event. Since the energy of the signal is usually concentrated at low frequencies, changes due to transients are more noticeable at high frequencies [21]. To emphasize this, the spectrum can be weighted preferentially toward high frequencies before summing to obtain a weighted energy measure

$$\tilde{E}(n) = \frac{1}{N} \sum_{k=-N/2}^{N/2-1} W_k\, |X_k(n)|^2 \tag{4}$$

where $W_k$ is the frequency dependent weighting. By Parseval's theorem, if $W_k = 1$ for all $k$, $\tilde{E}(n)$ is simply equivalent to the local energy as previously defined. Note also that a choice of $W_k = k^2$ would give the local energy of the derivative of the signal. Masri [22] proposes a high frequency content (HFC) function with $W_k = |k|$, linearly weighting each bin's contribution in proportion to its frequency.
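As a rough illustration of the weighted energy measure and of the HFC weighting, the sketch below computes one STFT frame with a naive DFT and sums the weighted squared magnitudes. Everything beyond the two formulas is an assumption made for the example: the Hann window, the frame and hop sizes, the zero-padding outside the signal, the helper names, and the synthetic test signal (a low tone that is joined by uniform noise halfway through). A real implementation would use an FFT rather than the quadratic-time DFT shown here.

```python
import cmath
import math
import random

def hann(N):
    """N-point Hann window w(m) for m = -N/2 .. N/2-1, stored at index m + N/2."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / N) for i in range(N)]

def windowed_frame(x, n, N, h, w):
    """Samples x(n*h + m) * w(m) for m = -N/2 .. N/2-1 (zero outside x, an assumption)."""
    half = N // 2
    return [x[n * h + m] * w[m + half] if 0 <= n * h + m < len(x) else 0.0
            for m in range(-half, half)]

def stft_frame(x, n, N, h, w):
    """X_k(n) for k = -N/2 .. N/2-1 via a naive DFT; result is indexed by k + N/2."""
    half = N // 2
    seg = windowed_frame(x, n, N, h, w)
    return [sum(seg[m + half] * cmath.exp(-2j * math.pi * m * k / N)
                for m in range(-half, half))
            for k in range(-half, half)]

def weighted_energy(X, weight):
    """(1/N) * sum_k W_k |X_k|^2 for one frame X, with W_k = weight(k)."""
    N = len(X)
    half = N // 2
    return sum(weight(k) * abs(X[k + half]) ** 2 for k in range(-half, half)) / N

def hfc(X):
    """High frequency content: linear weighting W_k = |k|."""
    return weighted_energy(X, lambda k: abs(k))

def tone_then_noise(length=512, change=256, seed=0):
    """Hypothetical test signal: a low tone, with uniform noise added from `change` on."""
    rng = random.Random(seed)
    return [math.sin(2.0 * math.pi * 2.0 * t / 64.0)
            + (rng.uniform(-1.0, 1.0) if t >= change else 0.0)
            for t in range(length)]

N, h = 64, 32
w = hann(N)
x = tone_then_noise()
values = [hfc(stft_frame(x, n, N, h, w)) for n in range(len(x) // h)]
print(["%.1f" % v for v in values])
```

With unit weights, the same function returns the energy of the windowed frame (the sum of the squared windowed samples), which is one way to check the Parseval relation numerically; with the HFC weighting, the frames containing broadband noise score clearly higher than the frames containing only the low tone.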
The HFC function produces sharp peaks during attack transients and is notably successful when faced with percussive onsets, where transients are well modeled as bursts of white noise.

These spectrally weighted measures are based on the instantaneous short-term spectrum of the signal, thus omitting any explicit consideration of its temporal evolution. Alternatively, a number of other approaches do consider these changes, using variations in spectral content between analysis frames in order to generate a more informative detection function.

Rodet and Jaillet [21] propose a method where the frequency bands of a sequence of STFTs are analyzed independently using a piece-wise linear approximation to the magnitude profile $|X_k(n)|$ over a short temporal window of frames, with the bin index $k$ held fixed. The parameters of these approximations are used to generate a set of band-wise detection functions, later combined to produce final onset results. Detection results are robust for high-frequencies, showing consistency with Masri's HFC approach.

A more general approach based on changes in the spectrum is to formulate the detection function as a distance between successive short-term Fourier spectra, treating them as points in an $N$-dimensional space. Depending on the metric chosen to calculate this distance, different spectral difference, or spectral flux, detection functions can be constructed: Masri [22] uses the $L_1$-norm of the difference between magnitude spectra, whereas Duxbury [6] uses the $L_2$-norm on the rectified difference

$$SD(n) = \sum_{k=-N/2}^{N/2-1} \left\{ H\left( |X_k(n)| - |X_k(n-1)| \right) \right\}^2 \tag{5}$$

where $H(x) = (x + |x|)/2$, i.e., zero for negative arguments. The rectification has the effect of counting only those frequencies where there is an increase in energy, and is intended to emphasize onsets rather than offsets.

A related form of spectral difference is introduced by Foote [2] to obtain a measure of audio novelty (see footnote 2). A similarity matrix is calculated using the correlation between STFT feature vectors (power spectra).
The matrix is then correlated with a checkerboard kernel to detect the edges between areas of high and low similarity. The resulting function shows sharp peaks at the times of these changes, and is effectively an onset detection function when kernels of small width are used.

3) Spectral Features Using Phase: All the mentioned methods have in common their use of the magnitude of the spectrum as their only source of information. However, recent approaches also make use of the phase spectra to further their analyses of the behavior of onsets. This is relevant since much of the temporal structure of a signal is encoded in the phase spectrum.

Let us define $\varphi_k(n)$ as the $2\pi$-unwrapped phase of a given STFT coefficient $X_k(n)$. For a steady state sinusoid, the phase $\varphi_k(n)$, as well as the phase in the previous window $\varphi_k(n-1)$, are used to calculate a value for the instantaneous frequency $f_k(n)$, an estimate of the actual frequency of the STFT component within this window, as [23]

$$f_k(n) = \left( \frac{\varphi_k(n) - \varphi_k(n-1)}{2\pi h} \right) f_s \tag{6}$$

where $h$ is the hop size between windows and $f_s$ is the sampling frequency.

It is expected that, for a locally stationary sinusoid, the instantaneous frequency should be approximately constant over adjacent windows. Thus, according to (6), this is equivalent to the phase increment from window to window remaining approximately constant (cf. Fig. 3)

$$\varphi_k(n) - \varphi_k(n-1) \simeq \varphi_k(n-1) - \varphi_k(n-2) \tag{7}$$

Footnote 2: The term novelty function is common to the literature in machine learning and communication theory, and is widely used for video segmentation. In the context of onset detection, our notion of the detection function can be seen also as a novelty function, in that it tries to measure the extent to which an event is unusual given a series of observations in the past.

Fig. 3. Phase diagram showing instantaneous frequencies as phase derivative over adjacent frames. For a stationary sinusoid this should stay constant (dotted line).

Equivalently, the phase deviation $\Delta\varphi_k(n)$ can be defined as the second difference of the phase

$$\Delta\varphi_k(n) = \varphi_k(n) - 2\varphi_k(n-1) + \varphi_k(n-2) \simeq 0 \tag{8}$$

During a transient region, the instantaneous frequency is not usually well defined, and hence $\Delta\varphi_k(n)$ will tend to be large. This is illustrated in Fig. 3.

In [24], Bello proposes a method that analyzes the instantaneous distribution (in the sense of a probability distribution or histogram) of phase deviations across the frequency domain. During the steady-state part of a sound, deviations tend to zero, thus the distribution is strongly peaked around this value. During attack transients, values increase, widening and flattening the distribution. In [24], this behavior is quantified by calculating the inter-quartile range and the kurtosis of the distribution. In [25], a simpler measure of the spread of the distribution is calculated as

$$\zeta_p(n) = \frac{1}{N} \sum_{k=-N/2}^{N/2-1} |\Delta\varphi_k(n)| \tag{9}$$

i.e., the mean absolute phase deviation. The method, although showing some improvement for complex signals, is susceptible to phase distortion and to noise introduced by the phases of components with no significant energy.

As an alternative to the sole use of magnitude or phase information, [26] introduces an approach that works with Fourier coefficients in the complex domain. The stationarity of the $k$th spectral bin is quantified by calculating the Euclidean distance $\Gamma_k(n)$ between the observed $X_k(n)$ and that predicted by the previous frames, $\hat{X}_k(n)$

$$\Gamma_k(n) = \left\{ \left[ \Re(\hat{X}_k(n)) - \Re(X_k(n)) \right]^2 + \left[ \Im(\hat{X}_k(n)) - \Im(X_k(n)) \right]^2 \right\}^{1/2} \tag{10}$$

These distances are summed across the frequency domain to generate an onset detection function

$$\eta(n) = \sum_{k=-N/2}^{N/2-1} \Gamma_k(n) \tag{11}$$

See [27] for an application of this technique to multiple bands. Other preprocessing, such as the removal of the tonal part, may introduce distortions to the phase information and thus adversely affect the performance of subsequent phase-based onset detection methods.
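The rectified spectral difference, the mean absolute phase deviation, and the complex-domain distance all operate on the same sequence of STFT frames, so they can be sketched side by side. The code below is an illustration only. Two points are assumptions introduced here rather than taken from the text: the second phase difference is mapped to its principal value instead of unwrapping the phase along time, and the predicted coefficient is built from the previous magnitude and a constant phase increment, i.e., magnitude $|X_k(n-1)|$ and phase $2\varphi_k(n-1) - \varphi_k(n-2)$. The Hann window, the frame and hop sizes, and the two-tone test signal are likewise hypothetical.

```python
import cmath
import math

def hann(N):
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / N) for i in range(N)]

def stft(x, N, h):
    """Frames X[n][k + N/2], one every h samples, via a naive DFT (zero outside x)."""
    w = hann(N)
    half = N // 2
    frames = []
    for n in range(len(x) // h):
        seg = [x[n * h + m] * w[m + half] if 0 <= n * h + m < len(x) else 0.0
               for m in range(-half, half)]
        frames.append([sum(seg[m + half] * cmath.exp(-2j * math.pi * m * k / N)
                           for m in range(-half, half))
                       for k in range(-half, half)])
    return frames

def princarg(p):
    """Map a phase value to the principal range [-pi, pi)."""
    return (p + math.pi) % (2.0 * math.pi) - math.pi

def spectral_difference(X):
    """Sum over bins of the squared, half-wave rectified magnitude increase."""
    out = [0.0]
    for n in range(1, len(X)):
        out.append(sum(max(abs(a) - abs(b), 0.0) ** 2 for a, b in zip(X[n], X[n - 1])))
    return out

def phase_deviation_spread(X):
    """Mean over bins of the absolute second difference of the phase."""
    out = [0.0, 0.0]
    for n in range(2, len(X)):
        total = 0.0
        for a, b, c in zip(X[n], X[n - 1], X[n - 2]):
            total += abs(princarg(cmath.phase(a) - 2.0 * cmath.phase(b) + cmath.phase(c)))
        out.append(total / len(X[n]))
    return out

def complex_domain(X):
    """Sum over bins of |predicted - observed|, with the prediction assumed above."""
    out = [0.0, 0.0]
    for n in range(2, len(X)):
        total = 0.0
        for a, b, c in zip(X[n], X[n - 1], X[n - 2]):
            predicted = cmath.rect(abs(b), 2.0 * cmath.phase(b) - cmath.phase(c))
            total += abs(predicted - a)
        out.append(total)
    return out

def two_tone_signal(length=512, onset=256):
    """Hypothetical test signal: a steady tone, joined by a second tone at `onset`."""
    return [math.sin(2.0 * math.pi * 4.0 * t / 64.0)
            + (math.sin(2.0 * math.pi * 12.0 * (t - onset) / 64.0) if t >= onset else 0.0)
            for t in range(length)]

X = stft(two_tone_signal(), N=64, h=16)
sd, eta = spectral_difference(X), complex_domain(X)
print("steady frame:", sd[8], eta[8])
print("around onset:", max(sd[14:19]), max(eta[14:19]))
```

On this clean synthetic signal the spectral difference and the complex-domain function are essentially zero in the steady state and rise when the second tone enters. The phase deviation spread is not printed because, on such a signal, bins with negligible energy carry arbitrary phases and dominate the unweighted mean, which is the weakness of the measure noted above.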
4) Time-Frequency and Time-Scale Analysis: An alternative to the analysis of the temporal envelope of the signal and of Fourier spectral coefficients is the use of time-scale or time-frequency representations (TFR).

In [28] a novelty function is calculated by measuring the dissimilarity between feature vectors corresponding to a discretized Cohen's class TFR, in this case the result of convolving the Wigner-Ville TFR of the function with a Gaussian kernel. Note that the method could be also seen as a spectral difference approach, given that by choosing an appropriate kernel, the representation becomes equivalent to the spectrogram of the signal.

In [29], an approach for transient detection is described based on a simple dyadic wavelet decomposition of the residual signal. This transform, using the Haar wavelet, was chosen for its simplicity and its good time localization at small scales. The scheme takes advantage of the correlations across scales of the coefficients: large wavelet coefficients, related to transients in the signal, are not evenly spread within the dyadic plane but rather form "structures". Indeed, if a given coefficient has a large amplitude, there is a high probability that the coefficients with the same time localization at smaller scales also have large amplitudes, therefore forming dyadic trees of significant coefficients. The significance of full-size branches of coefficients, from the largest to the smallest scale, can be quantified by a regularity modulus, which is a local measure of the regularity of the signal

$$K(i) = \sum_{d_{j,k} \in \mathcal{B}_i} 2^{js}\, |d_{j,k}| \tag{12}$$

where the $d_{j,k}$ are the wavelet coefficients, $\mathcal{B}_i$ is the full branch leading to a given small-scale coefficient $d_{1,i}$ (i.e., the set of coefficients at larger scale and same time localization), and $s$ a free parameter used to emphasize certain scales ($s = 0$ is often used in practice).
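A toy version of the regularity modulus can be written directly from the description of dyadic branches. The sketch below is an assumption-laden illustration, not the scheme of [29]: it uses an orthonormal Haar decomposition of a signal whose length is a power of two, it takes the branch of the finest-scale coefficient $i$ to be the coefficients at index $i$ shifted right by $j-1$ bits at each scale $j$, it sums $2^{js}|d_{j,k}|$ along that branch, and it is applied to a raw click rather than to a residual signal. The function names are hypothetical.

```python
import math

def haar_details(x):
    """Return [d_1, d_2, ..., d_J] (d_1 = finest scale) for len(x) a power of two."""
    approx = list(x)
    details = []
    while len(approx) > 1:
        pairs = [(approx[2 * k], approx[2 * k + 1]) for k in range(len(approx) // 2)]
        details.append([(a - b) / math.sqrt(2.0) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2.0) for a, b in pairs]
    return details

def regularity_modulus(x, s=0.0):
    """One value per finest-scale coefficient: weighted sum of |d| along its branch."""
    details = haar_details(x)
    K = []
    for i in range(len(details[0])):
        total = 0.0
        for j, d in enumerate(details, start=1):
            # Coefficient at scale j covering the same time location as d_1[i].
            total += 2.0 ** (j * s) * abs(d[i >> (j - 1)])
        K.append(total)
    return K

# Hypothetical test signal: a single click in 64 samples of silence.
click = [0.0] * 64
click[37] = 1.0
K = regularity_modulus(click)
best = max(range(len(K)), key=K.__getitem__)
print("largest modulus at finest-scale index", best, "covering samples", 2 * best, "and", 2 * best + 1)
```

Because each finest-scale Haar coefficient spans two samples, the location of the maximum identifies the click to within a pair of samples in this example.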
Since increases of the regularity modulus are related to the existence of large, transient-like coefficients in the branch, the regularity modulus can effectively act as an onset detection function.

B. Reduction Based on Probability Models

Statistical methods for onset detection are based on the assumption that the signal can be described by some probability model. A system can then be constructed that makes probabilistic inferences about the likely times of abrupt changes in the signal, given the available observations. The success of this approach depends on the closeness of fit between the assumed model, i.e., the probability distribution described by the model, and the "true" distribution of the data, and may be quantified using likelihood measures or Bayesian model selection criteria.

1) Model-Based Change Point Detection Methods: A well-known approach is based on the sequential probability ratio test [30]. It presupposes that the signal samples $x(n)$ are generated from one of two statistical models, $\mathcal{M}_0$ or $\mathcal{M}_1$. The log-likelihood ratio is defined as

$$s(x) = \log \frac{p_1(x)}{p_0(x)} \tag{13}$$

where $p_0(x)$ and $p_1(x)$ are the probability density functions associated with the two models. The expectation of the observed log-likelihood ratio depends on which model the signal is actually following. Under model $\mathcal{M}_0$, the expectation is

$$E_0[s] = \int p_0(x)\, s(x)\, dx = -D(p_0 \,\|\, p_1) \tag{14}$$

where $D(\cdot \,\|\, \cdot)$ denotes the Kullback-Leibler divergence between the model and the observed distributions. Under model $\mathcal{M}_1$, the expectation is

$$E_1[s] = \int p_1(x)\, s(x)\, dx = D(p_1 \,\|\, p_0) \tag{15}$$

If we assume that the signal initially follows model $\mathcal{M}_0$, and switches to model $\mathcal{M}_1$ at some unknown time, then the short-time average of the log-likelihood ratio will change sign. The algorithms described in [30] are concerned with detecting this change of sign. In this context, the log-likelihood ratio can be considered as a detection function, though one that produces changes in polarity, rather than localized peaks, as its detectable feature.

The method can be extended to deal with cases in which the models are unknown and must be estimated from the data. The divergence algorithm [31] manages this by fitting model $\mathcal{M}_0$ to a growing window, beginning at the last detected change point and extending to the current time. Model $\mathcal{M}_1$ is estimated from a sliding window of fixed size, extending back from the current time. Both Jehan [32], and Thornburg and Gouyon [33] apply variants of this method, using parametric Gaussian autoregressive models for $\mathcal{M}_0$ and $\mathcal{M}_1$.

Jehan [32] also applies Brandt's method [34], in which a fixed length window is divided at a hypothetical change point. The two resulting segments are modeled using two separate Gaussian AR models. The model parameters and the change point are then optimized to maximize the log-likelihood ratio between the probability of having a change at that point and the probability of not having an onset at all. Change points are detected when this likelihood ratio surpasses a fixed threshold.
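The sign change of the short-time average log-likelihood ratio is easy to observe when both models are known. The following sketch assumes two zero-mean Gaussian models that differ only in standard deviation, a rectangular averaging window of 50 samples, and a synthetic signal that switches models at a known sample; none of these choices come from [30], and the function names are hypothetical. It does not implement the divergence algorithm or Brandt's method, where the models are estimated from the data.

```python
import math
import random

def gaussian_logpdf(x, sigma):
    """Log density of a zero-mean Gaussian with standard deviation sigma."""
    return -0.5 * math.log(2.0 * math.pi * sigma * sigma) - x * x / (2.0 * sigma * sigma)

def log_likelihood_ratio(x, sigma0, sigma1):
    """s(x) = log p1(x) - log p0(x)."""
    return gaussian_logpdf(x, sigma1) - gaussian_logpdf(x, sigma0)

def kl_gaussian(sigma_a, sigma_b):
    """Kullback-Leibler divergence D(p_a || p_b) between zero-mean Gaussians."""
    return math.log(sigma_b / sigma_a) + sigma_a ** 2 / (2.0 * sigma_b ** 2) - 0.5

def first_sign_change(x, sigma0, sigma1, window=50):
    """First sample index at which the windowed sum of s(x) becomes positive, else None."""
    s = [log_likelihood_ratio(v, sigma0, sigma1) for v in x]
    running = sum(s[:window])
    if running > 0.0:
        return window - 1
    for t in range(window, len(s)):
        running += s[t] - s[t - window]
        if running > 0.0:
            return t
    return None

def switching_signal(change=500, length=1000, sigma0=1.0, sigma1=3.0, seed=0):
    """Hypothetical test signal: Gaussian noise whose standard deviation jumps at `change`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma0 if t < change else sigma1) for t in range(length)]

x = switching_signal()
print("expected s under model 0:", -kl_gaussian(1.0, 3.0))
print("expected s under model 1:", kl_gaussian(3.0, 1.0))
print("sign change detected at sample", first_sign_change(x, 1.0, 3.0))
```

The two printed expectations have opposite signs, as (14) and (15) require, and the detected index lags the true change point by the number of samples needed for the new model to outweigh the old one inside the averaging window.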
2) Approaches Based on "Surprise Signals": The methods described above look for an instantaneous switch between two distinct models. An alternative is to look for surprising moments relative to a single global model. To this end, a detection function is defined as the moment-by-moment trace of the negative log-probability of the signal given its recent history, according to a global model.

The approach, introduced by Abdallah and Plumbley [35], is based on the notion of an observer which becomes familiar with (i.e., builds a model of) a certain class of signals, such that it is able to make predictions about the likely evolution of the signal as it unfolds in time. Such an observer will be relatively surprised at the onset of a note because of its uncertainty about when and what type of event will occur next. However, if the observer is in fact reasonably familiar with typical events (i.e., the model is accurate), that surprise will be localized to the transient region, during which the identity of the event is becoming established. Thus, a dynamically evolving measure of surprise, or novelty, can be used as a detection function.

Let us consider the signal as a multivariate random process where each vector $\mathbf{x}(n)$ is a frame of audio samples. At time $n$, an observer's expectations about $\mathbf{x}(n)$ will be summarized by the conditional probability according to that observer's model: $p(\mathbf{x}(n) \mid \mathbf{x}(n-1), \mathbf{x}(n-2), \ldots)$. When $\mathbf{x}(n)$ is actually observed, the observer will be surprised to a certain degree, which we will define as

$$S(n) = -\log p(\mathbf{x}(n) \mid \mathbf{x}(n-1), \mathbf{x}(n-2), \ldots) \tag{16}$$

This is closely related to the entropy rate of the random process, which is simply the expected surprise according to the "true" model.

An alternative conditional density model can be defined for an audio signal by partitioning the frame $\mathbf{x}(n)$ into two segments, $\mathbf{x}(n) = [\mathbf{x}_1(n), \mathbf{x}_2(n)]$, and then expressing the conditional density $p(\mathbf{x}_2 \mid \mathbf{x}_1)$ in terms of the joint density $p(\mathbf{x}_1, \mathbf{x}_2)$ and the marginal density $p(\mathbf{x}_1)$.
A detection function can then be generated from the surprise associated with $\mathbf{x}_2(n)$

$$S(n) = -\log p(\mathbf{x}_2(n) \mid \mathbf{x}_1(n)) = \log p(\mathbf{x}_1(n)) - \log p(\mathbf{x}_1(n), \mathbf{x}_2(n)) \tag{17}$$

both terms of which may be approximated by any suitable joint density model; for example, [35] uses two separate independent component analysis (ICA) models. In ICA, we assume that a random vector $\mathbf{x}$ is generated by linear transformation of a random vector $\mathbf{s}$ of independent non-Gaussian components; that is, $\mathbf{x} = \mathbf{A}\mathbf{s}$, where $\mathbf{A}$ is an $N \times N$ basis matrix. This model gives

$$\log p(\mathbf{x}) = -\log |\det \mathbf{A}| + \sum_{i=1}^{N} \log p_i(s_i) \tag{18}$$

where $\mathbf{s}$ is obtained from $\mathbf{x}$ using $\mathbf{s} = \mathbf{A}^{-1}\mathbf{x}$, and $p_i$ is the assumed or estimated probability density function of the $i$th component of $\mathbf{s}$. Estimates of $\mathbf{A}$ are relatively easy to obtain [36]. Results obtained with speech and music are given in [37].

It is worth noting that some of the detection functions described in previous sections can be derived within this probabilistic framework by making specific assumptions about the observer's probability model. For example, an observer that believes the audio samples in each frame to be independent and identically distributed according to a Laplacian (double-sided exponential) distribution, such that $p(\mathbf{x}) \propto \prod_i e^{-|x_i|}$, where $x_i$ is the $i$th component of $\mathbf{x}$, would assign $S(\mathbf{x}) = \sum_i |x_i|$ up to an additive constant, which is essentially an envelope follower [cf. (1)]. Similarly, the assumption of a multivariate Gaussian model for the $\mathbf{x}(n)$ would lead to a quadratic form for $S(n)$, of which the short-term energy [(2)] and weighted energy [(4)] measures are special cases. Finally, measures of spectral difference [like (5)] can be associated with specific conditional probability models of one short-term spectrum given the previous one, while the complex domain method [(10) and (11)], depending as it does on a Euclidean distance measure between predicted and observed complex spectra, is related to a time-varying Gaussian process model.
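The link between simple probability models and the earlier feature-based detection functions can be checked numerically. The sketch below is a minimal illustration with hypothetical function names: it evaluates the surprise of a frame under an i.i.d. Laplacian model and under an i.i.d. Gaussian model, and it evaluates an ICA-style log density with unit Laplacian priors on the components. The unmixing matrix `W` (standing for the inverse of the basis matrix) and the value of `log_abs_det_A` are supplied by the caller, an assumption made to keep the example free of linear algebra routines; no model is learned from data here.

```python
import math

def laplacian_surprise(frame, b=1.0):
    """-log p(frame) for i.i.d. Laplacian samples with scale b: const + sum |x_i| / b."""
    return len(frame) * math.log(2.0 * b) + sum(abs(v) for v in frame) / b

def gaussian_surprise(frame, sigma=1.0):
    """-log p(frame) for i.i.d. zero-mean Gaussian samples: const + energy / (2 sigma^2)."""
    return (0.5 * len(frame) * math.log(2.0 * math.pi * sigma ** 2)
            + sum(v * v for v in frame) / (2.0 * sigma ** 2))

def ica_surprise(frame, W, log_abs_det_A):
    """-log p(x) for x = A s, with s = W x and unit Laplacian priors on each s_i."""
    s = [sum(w_ij * x_j for w_ij, x_j in zip(row, frame)) for row in W]
    log_p = -log_abs_det_A + sum(-abs(v) - math.log(2.0) for v in s)
    return -log_p

def surprise_trace(x, frame_len, surprise):
    """Surprise of consecutive non-overlapping frames of x."""
    return [surprise(x[i:i + frame_len])
            for i in range(0, len(x) - frame_len + 1, frame_len)]

# Hypothetical test signal: a quiet passage followed by a louder one.
x = [0.01] * 32 + [0.5, -0.7, 0.9, -0.4] * 8
print(surprise_trace(x, 8, laplacian_surprise))
print(surprise_trace(x, 8, gaussian_surprise))
```

After subtracting the constant term, the Laplacian surprise of a frame is the sum of its absolute sample values and the Gaussian surprise is proportional to its energy, which is the sense in which the envelope follower and the short-term energy are special cases of the probabilistic view. With an identity basis matrix, the ICA expression reduces to the Laplacian case.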

Fig. 4. Comparison of different detection functions for 5 s of a solo violin recording. From top to bottom: time-domain signal, spectrogram, high-frequency content, spectral difference, spread of the distribution of phase deviations, wavelet regularity modulus, and negative log-likelihood using an ICA model. All detection functions have been normalized to their maximum value.

C. Comparison of Detection Functions

All the approaches described above provide a solution to the problem of onset detection in musical signals. However, every method presents shortcomings depending both on its definition and on the nature of the signals to be analyzed. What follows is a discussion of the merits of different reduction approaches, with an emphasis on the ones that have been employed in the various applications developed by the authors. Figs. 4-6 are included to support the discussion. They correspond, respectively, to a pitched nonpercussive sound (solo violin), a pitched percussive sound (solo piano), and a complex mixture (pop music). The figures show the waveforms, spectrograms, and a number of different detection functions for comparison. The hand-labeled onsets for each signal are marked with ticks in the time-axis of the detection functions.

Fig. 5. Comparison of different detection functions for 5 s of a solo piano recording. From top to bottom: time-domain signal, spectrogram, high-frequency content, spectral difference, spread of the distribution of phase deviations, wavelet regularity modulus, and negative log-likelihood using an ICA model. All detection functions have been normalized to their maximum value.

Temporal methods are simple and computationally efficient. Their functioning depends on the existence of clearly identifiable amplitude increases in the analysis signal, which is the case only for highly percussive events in simple sounds.
The robustness of amplitude-based onset detection decreases when facing amplitude modulations (i.e., vibrato, tremolo) or the overlapping of energy produced by simultaneous sounds. This is true even after dividing the signal into multiple bands or after extracting the transient signal. For nontrivial sounds, onset detection schemes benefit from using richer representations of the signal (e.g., a time-frequency representation).

The commonly used HFC [22, eq. (4)] is an example of a spectral weighting method. It is successful at emphasizing the percussiveness of the signal (cf. Figs. 5 and 6), but less robust at detecting the onsets of low-pitched and nonpercussive events (cf. Fig. 4), where energy changes are at low frequencies and hence de-emphasized by the weighting. In some signals, even broadband onsets are susceptible to masking by continuous high-frequency content such as that due to open cymbals in a pop recording. This problem can be overcome by using temporal difference methods such as the $L_2$-norm of the rectified spectral difference [6, eq. (5)], as these can respond to changes in the distribution of spectral energy, as well as the total, in any part of the spectrum. However, the difference calculation relies solely on magnitude information, and so tends to miss events without a strong energy increase: e.g., low notes, transitions between harmonically related notes, or onsets played by bowed instruments (cf. Fig. 4).

Phase-based methods, such as the spread of the distribution of phase deviations in (9) (see [25]), are designed to compensate for such shortcomings. They are successful at detecting low- and high-frequency tonal changes regardless of their intensity. The approach suffers from variations introduced by the phases of noisy low-energy components, and from phase distortions common to complex commercial music recordings (e.g., audio effects and post-production treatments; cf. Fig. 6).

The wavelet regularity modulus [29] in (12) is an example of an approach using an alternative time-scale representation that can be used to precisely localize events down to a theoretical resolution of as little as two samples of the original signal, which for typical audio sampling rates is considerably better than the ear's resolution in time. The price of this is a much less smooth detection function (cf. all figures), which emphasizes the need for post-processing to remove spurious peaks. The method provides an interesting alternative to other feature-based methods, but with an increase in algorithmic complexity.

Approaches based on probabilistic models provide a more general theoretical view of the analysis of onsets. As shown in Section III-B.2, previous reduction methods can be explained within the context of measuring surprise relative to a probabilistic model, while new methods can be proposed and evaluated by studying refinements or alternatives to existing models.
An example is the surprise-based method using ICA to model the conditional probability of a short segment of the signal, calculated in (17) as the difference between two negative log-likelihoods [35]. If the model is adequate (i.e., the assumptions behind the model are accurate and the parameters well-fitted), then robust detection functions for a wide range of signals can be produced. Examples are at the bottom of Figs. 4–6. However, for adaptive statistical models such as ICA, these advantages accrue only after a potentially expensive and time-consuming training process during which the parameters of the model are fitted to a given training set.

Fig. 6. Comparison of different detection functions for 5 s of a pop song. From top to bottom: time-domain signal, spectrogram, high-frequency content, spectral difference, spread of the distribution of phase deviations, wavelet regularity modulus, and negative log-likelihood using an ICA model. All detection functions have been normalized to their maximum value.

IV. PEAK-PICKING

If the detection function has been suitably designed, then onsets or other abrupt events will give rise to well-localized identifiable features in the detection function. Commonly, these features are local maxima (i.e., peaks), generally subject to some level of variability in size and shape, and masked by noise, either due to actual noise in the signal, or other aspects of the signal not specifically to do with onsets, such as vibrato. Therefore, a robust peak-picking algorithm is needed to estimate the onset times of events within the analysis signal.³ We will divide the process of peak-picking a detection function into three steps: post-processing, thresholding, and a final decision process.

A. Post-Processing

Like preprocessing, post-processing is an optional step that depends on the reduction method used to generate the detection function.
The purpose of post-processing is to facilitate the tasks of thresholding and peak-picking by increasing the uniformity and consistency of event-related features in the detection function, ideally transforming them into isolated, easily detectable local maxima. Into this category fall processes intended to reduce the effects of noise (e.g., smoothing) and processes needed for the successful selection of thresholding parameters for a wide range of signals (e.g., normalization and DC removal).

³ It is worth noting that identifiable features are not necessarily peaks; they could be steep rising edges or some other characteristic shape. An algorithm able to identify characteristic shapes in detection functions is presented in [38].

B. Thresholding

For each type of detection function, and even after post-processing, there will be a number of peaks which are not related to onsets. Hence, it is necessary to define a threshold which effectively separates event-related and nonevent-related peaks. There are two main approaches to defining this threshold: fixed thresholding and adaptive thresholding.

Fixed thresholding methods define onsets as peaks where the detection function exceeds the threshold: $d(n) \geq \delta$, where $\delta$ is a positive constant and $d(n)$ is the detection function. Although this approach can be successful with signals with little dynamics, music generally exhibits significant loudness changes over the course of a piece. In such situations, a fixed threshold will tend to miss onsets in the quietest passages, while over-detecting during the loud ones. For this reason, some adaptation of the threshold is usually required.

Generally, an adaptive threshold is computed as a smoothed version of the detection function. This smoothing can be linear, for instance using a low-pass FIR filter

$$\tilde{\delta}(n) = a_0\, d(n) + a_1\, d(n-1) + \cdots + a_M\, d(n-M) \tag{19}$$

with filter coefficients $a_0, \ldots, a_M$. Alternatively, this smoothing can be nonlinear, using for instance the square of the detection function

$$\tilde{\delta}(n) = \delta + \lambda \sum_{i=-M}^{M} w_i\, d^2(n+i) \tag{20}$$

where $\lambda$ is a positive constant and $\{w_i\}_{i=-M}^{M}$ is a (smooth) window. However, a threshold computed in this way can exhibit very large fluctuations when there are large peaks in the detection function, tending to mask smaller adjacent peaks. Methods based on percentiles (such as the local median) are less affected by such outliers:

$$\tilde{\delta}(n) = \delta + \lambda\, \mathrm{median}\{|d(n-M)|, \ldots, |d(n+M)|\} \tag{21}$$

C. Peak-Picking

After post-processing and thresholding the detection function, peak-picking is reduced to identifying local maxima above the defined threshold. For a review of a number of peak-picking algorithms for audio signals, see [39].

For our experiments the detection functions were first normalized by subtracting the mean and dividing by the maximum absolute deviation, and then low-pass filtered. An adaptive threshold, calculated using a moving-median filter [(21)], was then subtracted from the normalized detection function.
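The normalization, the moving-median threshold of (21), and the local-maximum test can be sketched as follows. This is an illustrative reading of the procedure rather than the authors' code: the low-pass filtering step is omitted for brevity, the median window is simply truncated at the edges, "maximum absolute deviation" is taken to mean the largest absolute deviation from the mean, and the parameter defaults and function names are hypothetical.

```python
from statistics import mean, median

def normalize(d):
    """Subtract the mean and divide by the maximum absolute deviation from it."""
    mu = mean(d)
    centered = [v - mu for v in d]
    scale = max(abs(v) for v in centered)
    return [v / scale for v in centered] if scale > 0 else centered

def median_threshold(d, delta, lam, m):
    """delta + lam * median(|d(n-m)|, ..., |d(n+m)|) as in (21), truncated at the edges."""
    out = []
    for n in range(len(d)):
        window = [abs(v) for v in d[max(0, n - m):n + m + 1]]
        out.append(delta + lam * median(window))
    return out

def pick_onsets(d, delta=0.1, lam=1.0, m=3):
    """Indices of local maxima of (normalized d - adaptive threshold) that lie above zero."""
    d = normalize(d)
    thr = median_threshold(d, delta, lam, m)
    r = [v - t for v, t in zip(d, thr)]
    return [n for n in range(1, len(r) - 1)
            if r[n] > 0 and r[n] > r[n - 1] and r[n] >= r[n + 1]]

# Toy detection function with two clear peaks (at indices 3 and 8) and small ripples.
detection = [0.0, 0.1, 0.0, 1.0, 0.2, 0.1, 0.0, 0.1, 0.8, 0.1, 0.0, 0.05]
onsets = pick_onsets(detection)
```

Because the median ignores isolated large values, the peak at index 3 does not inflate the threshold around index 8, which is the robustness to outliers that motivates (21) over (20).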
Finally, every local maximum above zero was counted as an onset. Both the filter and the thresholding parameters (cutoff frequency, $\delta$, $\lambda$, and $M$) were hand-tuned by experiment, resulting in a separate parameter set for each detection function. Values for the cutoff frequency are selected according to the inherent characteristics of each detection method, as discussed in Section III-C; $M$ is set to the longest time interval over which the global dynamics are not expected to evolve (around 100 ms); and $\lambda$ is set to 1, as it is not critical for the detection. However, experiments show sensitivity to variations of $\delta$, such that error rates can be minimized by changing it between different types of music signals (e.g., pitched percussive, nonpercussive, etc.). The signal dependency of the onset detection process is further discussed in Section V-C.

V. RESULTS

A. About the Experiments

This section presents experimental results comparing some of the onset detection approaches described in Section III-C: the high-frequency content, the spectral difference, the spread of the distribution of phase deviations, the wavelet regularity modulus, and the negative log-likelihood of the signal according to a conditional ICA model. Peak-picking was accomplished using the moving-median adaptive threshold method described in Section IV. The experiments were performed on a database of commercial and noncommercial recordings covering a variety of musical styles and instrumentations. All signals were processed as monaural signals sampled at 44.1 kHz. The recordings are broadly divided into four groups according to the characteristics of their onsets: pitched nonpercussive (e.g., bowed strings), pitched percussive (e.g., piano), nonpitched percussive (e.g., drums), and complex mixtures (e.g., pop music). The number of onsets per category is given in Table I; there are 1065 onsets in total.
Onset labeling was done mostly by hand, which is a lengthy and inaccurate process, especially for complex recordings such as pop music, which typically include voice, multiple instruments, and post-production effects. A small subsection of the database corresponds to acoustic recordings of MIDI-generated piano music, which removes the error introduced by hand-labeling. A detection counts as a correct match when the target and detected onsets are within a 50-ms window. This relatively large window allows for the inaccuracy of the hand-labeling process.

B. Discussion: Comparison of Performance

Fig. 7 depicts a graphical comparison of the performance of the different detection functions described in this paper. For each method, it displays the relationship between the percentage of true positives (i.e., correct onset detections relative to the total number of existing onsets) and the percentage of false positives (i.e., erroneous detections relative to the number of detected onsets). All peak-picking parameters (e.g., the filter's cutoff frequency, $\lambda$) were held constant, except for the threshold $\delta$, which was varied to trace out the performance curve. Better performance is indicated by a shift of the curve upwards and to the left. The optimal point on a particular curve can be defined as the closest point to the top-left corner of the axes, where the error is at its minimum.

Fig. 7. Comparison of onset detection algorithms: spectral difference, phase deviation, wavelet regularity modulus, negative log-likelihood, and high-frequency content.

By reading the different optimal points we can retrieve the best set of results for each onset detection method. For the complete database, the negative log-likelihood (90.6%, 4.7%) performs the best, followed by the HFC (90%, 7%), spectral difference (83.0%, 4.1%), phase deviation (81.8%, 5.6%), and the wavelet regularity modulus (79.9%, 8.3%).

However, optimal points are just part of the story. The shape of each curve is also important to analyze, as it contains useful information about the properties of each method that may be relevant to a number of applications. For example, certain applications (e.g., tempo estimation) may require high confidence in the events actually detected, even at the expense of under-detecting, while other applications (e.g., time-stretching) require a maximum percentage of detected onsets, regardless of an increase in false detections.

In this context, the negative log-likelihood shows appeal for a number of applications by remaining close to the top-left corner of the axes (the [100% TP, 0% FP] point). The method successfully characterizes all types of onsets while producing little unrelated noise.

The HFC is able to retrieve a large proportion of the existing onsets for relatively few false positives, reaching 95% true positives for 10% false positives. However, there is a drop in the number of correctly detected onsets as the rate of false positives is brought below 5%. This is similar to the performance of the wavelet regularity modulus, although the corresponding performance curve rises more slowly as the percentage of false positives increases. Both measures generate sharply defined peaks in their detection functions, and are therefore well-suited for precise time-localization of onsets. This also means that both methods are susceptible to producing identifiable peaks even when no onsets are present.

On the other hand, methods that take information from a number of temporal frames into consideration (e.g., spectral difference, phase deviation) present a smoother detection function profile, minimizing the number of spurious detections. The cost of this is a reduced ability to resolve all onsets as distinct detectable features.
This is reflected in a performance curve that manages relatively high correct onset detection rates for low numbers of false positives, while obtaining comparatively fewer good detections for high rates of false positives (more than 25%). These methods are also less precise in their time localization.

C. Discussion: Dependency on the Type of Onsets

The above analysis emphasizes the dependency of the results on the characteristics of each tested method. In Table I, results are categorized according to the different types of onsets in the database. The idea is to illustrate the dependency of the results on the type of analysis signals. The results in the table correspond to the methods' optimal points for each subset of the database.

The selection of a particular method depends on the type and the quality of the input signal. For example, the phase deviation performs successfully for pitched sounds (both percussive and nonpercussive), where tonal information is key to the detection of onsets, while returning poor results for purely percussive sounds and complex mixtures (where it is affected by phase distortions and the artifacts introduced by speech utterances). On the other hand, the HFC performs better for highly percussive sounds and complex mixtures (with drums) than for music with softer onsets. The spectral difference sits in the middle, slightly below phase deviation for pitched sounds and just under-performing HFC for more percussive and complex sounds. The wavelet regularity modulus is at its best when dealing with simple percussive sounds, and otherwise performs poorly with respect to the other methods. Notably, the negative log-likelihood performs relatively well for almost all types of music. This shows the method's effectiveness when fitted with an appropriate model.

These results, while depicting a general trend in the behavior of these approaches, are not absolute.
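For reference, the two percentages reported in Fig. 7 and Table I, and the optimal point of a performance curve, can be computed along the following lines. This is a sketch under stated assumptions: the 50-ms window is interpreted as an absolute time difference of at most 50 ms, each target is paired greedily with at most one detection, and the function names, onset times, and curve values are hypothetical.

```python
def match_onsets(detected, targets, tolerance=0.05):
    """Greedily pair each target with at most one detection within +/- tolerance seconds."""
    detected = sorted(detected)
    used = [False] * len(detected)
    hits = 0
    for t in sorted(targets):
        for i, d in enumerate(detected):
            if not used[i] and abs(d - t) <= tolerance:
                used[i] = True
                hits += 1
                break
    return hits

def tp_fp_percentages(detected, targets, tolerance=0.05):
    """TP% is relative to the number of targets; FP% is relative to the number of detections."""
    hits = match_onsets(detected, targets, tolerance)
    tp = 100.0 * hits / len(targets) if targets else 0.0
    fp = 100.0 * (len(detected) - hits) / len(detected) if detected else 0.0
    return tp, fp

def optimal_point(curve):
    """Point of a list of (FP%, TP%) pairs closest to the top-left corner (0% FP, 100% TP)."""
    return min(curve, key=lambda p: p[0] ** 2 + (100.0 - p[1]) ** 2)

# Hypothetical onset times in seconds: three of four targets are found, two detections are spurious.
targets = [1.0, 2.0, 3.0, 4.0]
detected = [1.02, 2.2, 2.97, 3.99, 5.0]
tp, fp = tp_fp_percentages(detected, targets)

# Hypothetical performance curve traced out by varying the threshold.
best = optimal_point([(1.0, 60.0), (5.0, 90.0), (20.0, 95.0)])
```

Note that the two percentages use different denominators, so they do not sum to 100%.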
As confirmed by the results in Table I, detection results are strongly signal-dependent, and therefore the plots in Fig. 7 might have been significantly different had a different database been used. In addition, the hand-labeling of onsets is in some rare cases (e.g., in the pop signal) ambiguous and subjective. Finally, for the sake of a fair comparison between the detection functions, we opted to use a common post-processing and peak-picking technique. However, performance can be improved for each detection function by fine-tuning the peak-picking algorithm for specific tasks.

TABLE I. Onset detection results. Columns show the percentage of true positives (TP%) and percentage of false positives (FP%) for each method.

VI. CONCLUSIONS

In this paper, we have described and compared a variety of commonly used techniques and emerging methods for the detection of note onsets in audio signals. Given the scope of the paper, we have not mentioned methods that are not explicitly devised for this task but that may nevertheless hold some relevance (e.g., matching pursuits and time-frequency adaptive tiling).

Direct comparisons of performance such as those in Section V have to be carefully considered with respect to the different requirements that a given application may have and the type of audio signals used. Generally speaking, a set of guidelines can be drawn to help find the appropriate method for a specific task.

A. Guidelines for Choosing the Right Detection Function

The general rule of thumb is that one should choose the method with minimal complexity that satisfies the requirements of the application. More precisely, good practice usually requires a balance of complexity between preprocessing, construction of the detection function, and peak-picking. If the signal is very percussive (e.g., drums), then time-domain methods are usually adequate. On the other hand, spectral methods such as those based on phase distributions and spectral difference perform relatively well on strongly pitched transients. The complex-domain spectral difference seems to be a good choice in general, at the cost of a slight increase in computational complexity. If very precise time localization is required, then wavelet methods can be useful, possibly in combination with another method.
If a high computational load is acceptable, and a suitable training set is available, then statistical methods give the best overall results, and are less dependent on a particular choice of parameters. A more detailed description of relative merits can be found in Section III-C and Section V.

B. Perspectives

In this paper, we have only covered the basic principles of each large class of methods. Each one of these methods needs precise fine-tuning, as described in the relevant papers (referenced in Section III). However, it is not expected that a single method will ever be able to perform perfectly well for all audio signals, due to the intrinsically variable nature of the beginning of sound events, especially between percussive instruments (where transients are related to short bursts of energy) and sustained-note instruments (where transients are related to changes in the spectral content, possibly on a longer time-scale). In fact, we believe that the most promising developments for onset detection schemes lie in the combination of cues from different detection functions [6], [26], which is most likely the way human perception works [40]. More generally, there is a need for the development of analysis tools specifically designed for strongly nonstationary signals, which are now recognized to play an important part in the perceived timbre of most musical instruments [41].

ACKNOWLEDGMENT

The authors wish to thank G. Monti and M. Plumbley for fruitful discussions and help; the S2M team at the Laboratoire de Mécanique et d'Acoustique, Marseille, France, for kindly letting us use their Yamaha Disklavier; and the two anonymous reviewers for their great help in improving the structure of this article.

REFERENCES

[1] B. C. J. Moore, An Introduction to the Psychology of Hearing, 5th ed. New York: Academic.
[2] J. Foote, "Automatic audio segmentation using a measure of audio novelty," in Proc. IEEE Int. Conf. Multimedia and Expo (ICME2000), vol. I, New York, Jul. 2000.
[3] M. Goto and Y. Muraoka, "Beat tracking based on multiple-agent architecture: a real-time beat tracking system for audio signals," in Proc. 2nd Int. Conf. Multiagent Systems, Dec. 1996.
[4] E. D. Scheirer, "Tempo and beat analysis of acoustic musical signals," J. Acoust. Soc. Amer., vol. 103, no. 1, Jan.
[5] A. Klapuri, "Sound onset detection by applying psychoacoustic knowledge," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP-99), Phoenix, AZ, 1999.
[6] C. Duxbury, M. Sandler, and M. Davies, "A hybrid approach to musical note onset detection," in Proc. Digital Audio Effects Conf. (DAFX '02), Hamburg, Germany, 2002.
[7] R. J. McAulay and T. F. Quatieri, "Speech analysis/synthesis based on a sinusoidal representation," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-34.
[8] X. Serra and J. O. Smith, "Spectral modeling synthesis: a sound analysis/synthesis system based on a deterministic plus stochastic decomposition," Comput. Music J., vol. 14, no. 4, winter.
[9] S. Levine, "Audio Representations for Data Compression and Compressed Domain Processing," Ph.D. dissertation, Stanford Univ., Stanford, CA.
[10] T. Verma, S. Levine, and T. Meng, "Transient modeling synthesis: A flexible analysis/synthesis tool for transient signals," in Proc. Int. Computer Music Conf., Thessaloniki, Greece, 1997.
[11] T. Verma and T. Meng, "Sinusoidal modeling using frame-based perceptually weighted matching pursuits," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing, vol. 2, Phoenix, AZ, 1999.
[12] Z. Settel and C. Lippe, "Real-time musical applications using the FFT-based resynthesis," in Proc. Int. Computer Music Conf. (ICMC94), Aarhus, Denmark, 1994.
[13] C. Duxbury, M. Davies, and M. Sandler, "Extraction of transient content in musical audio using multiresolution analysis techniques," in Proc. Digital Audio Effects Conf. (DAFX '01), Limerick, Ireland, 2001.
[14] J. Princen and A. Bradley, "Analysis/synthesis filter bank design based on time domain aliasing cancellation," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-34, no. 5, Oct.
[15] S. Shlien, "The modulated lapped transform, its time-varying forms, and its applications to audio coding standards," IEEE Trans. Speech Audio Process., vol. 5, no. 4.
[16] M. Purat and P. Noll, "Audio coding with a dynamic wavelet packet decomposition based on frequency-varying modulated lapped transforms," in Proc. ICASSP, Atlanta, GA, 1996.
[17] L. Daudet and B. Torrésani, "Hybrid representations for audiophonic signal encoding," Signal Process., vol. 82, no. 11.
[18] A. W. Schloss, "On the Automatic Transcription of Percussive Music: From Acoustic Signal to High-Level Analysis," Ph.D. dissertation, Tech. Rep. STAN-M-27, Dept. Hearing and Speech, Stanford Univ., Stanford, CA.
[19] C. Duxbury, M. Davies, and M. Sandler, "Improved time-scaling of musical audio using phase locking at transients," in Proc. AES 112th Conv., Munich, Germany, 2002.
[20] B. Moore, B. Glasberg, and T. Baer, "A model for the prediction of thresholds, loudness and partial loudness," J. Audio Eng. Soc., vol. 45, no. 4.
[21] X. Rodet and F. Jaillet, "Detection and modeling of fast attack transients," in Proc. Int. Computer Music Conf., Havana, Cuba, 2001.
[22] P. Masri, "Computer Modeling of Sound for Transformation and Synthesis of Musical Signal," Ph.D. dissertation, Univ. of Bristol, Bristol, U.K.
[23] M. Dolson, "The phase vocoder: a tutorial," Comput. Music J., vol. 10, no. 4.
[24] J. P. Bello and M. Sandler, "Phase-based note onset detection for music signals," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP-03), Hong Kong, 2003.
[25] C. Duxbury, J. P. Bello, M. Davies, and M. Sandler, "A combined phase and amplitude based approach to onset detection for audio segmentation," in Proc. 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS-03), London, U.K., Apr. 2003.
[26] J. P. Bello, C. Duxbury, M. Davies, and M. Sandler, "On the use of phase and energy for musical onset detection in the complex domain," IEEE Signal Process. Lett., vol. 11, no. 6, Jun.
[27] C. Duxbury, "Signal Models for Polyphonic Music," Ph.D. dissertation, Dept. Electron. Eng., Queen Mary, Univ. of London, London, U.K.
[28] M. Davy and S. Godsill, "Detection of abrupt spectral changes using support vector machines. An application to audio signal segmentation," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP-02), Orlando, FL, 2002.
[29] L. Daudet, "Transients modeling by pruned wavelet trees," in Proc. Int. Computer Music Conf. (ICMC '01), Havana, Cuba, 2001.
[30] M. Basseville and I. V. Nikiforov, Detection of Abrupt Changes: Theory and Application. Englewood Cliffs, NJ: Prentice-Hall.
[31] M. Basseville and A. Benveniste, "Sequential detection of abrupt changes in spectral changes of digital signals," IEEE Trans. Inform. Theory, vol. 29.
[32] T. Jehan, "Musical Signal Parameter Estimation," M.S. thesis, Univ. of California, Berkeley, CA.
[33] H. Thornburg and F. Gouyon, "A flexible analysis-synthesis method for transients," in Proc. Int. Computer Music Conf. (ICMC-2000), Berlin, 2000.
[34] A. von Brandt, "Detecting and estimating parameter jumps using ladder algorithms and likelihood ratio test," in Proc. ICASSP, Boston, MA, 1983.
[35] S. A. Abdallah and M. D. Plumbley, "Probability as metadata: event detection in music using ICA as a conditional density model," in Proc. 4th Int. Symp. Independent Component Analysis and Signal Separation (ICA2003), Nara, Japan, 2003.
[36] J.-F. Cardoso, "Infomax and maximum likelihood for blind source separation," IEEE Signal Process. Lett., vol. 4, no. 4, Apr.
[37] S. A. Abdallah and M. D. Plumbley, "If edges are the independent components of natural scenes, what are the independent components of natural sounds?," in Proc. 3rd Int. Conf. Independent Component Analysis and Signal Separation (ICA2001), San Diego, CA, 2001.
[38] S. A. Abdallah and M. D. Plumbley, "Unsupervised onset detection: a probabilistic approach using ICA and a hidden Markov classifier," in Cambridge Music Processing Colloq., Cambridge, U.K., 2003. [Online]. Available:
[39] I. Kauppinen, "Methods for detecting impulsive noise in speech and audio signals," in Proc. 14th Int. Conf. Digit. Signal Process. (DSP2002), vol. 2, Santorini, Greece, Jul. 2002.
[40] A. S. Bregman, Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA: MIT Press.
[41] M. Castellengo, "Acoustical analysis of initial transients in flute-like instruments," Acta Acustica, vol. 85, no. 3.

Juan Pablo Bello received the engineering degree in electronics from the Universidad Simon Bolivar, Caracas, Venezuela, in 1998 and the Ph.D. degree from Queen Mary, University of London, London, U.K. After a brief period working in industry, he received awards from institutions in Venezuela, the U.S., and the U.K. to pursue Ph.D. studies. He is currently a postdoctoral Researcher for the Centre for Digital Music, Queen Mary, University of London. His research is mainly focused on the semantic analysis of musical signals and its applications to music information retrieval and live electronics.

Laurent Daudet received the degree in statistical and nonlinear physics from the Ecole Normale Supérieure, Paris, in 1997 and the Ph.D. degree in mathematical modeling from the Université de Provence, Marseilles, France, in 2000 on audio coding and physical modeling of piano strings. In 2001 and 2002, he was a Marie Curie post-doctoral fellow at the Department of Electronic Engineering, Queen Mary, University of London, London, U.K. Since 2002, he has been a Lecturer at the Université Pierre et Marie Curie (Paris 6), where he joined the Laboratoire d'Acoustique Musicale. His research interests include audio coding, time-frequency and time-scale transforms, sparse representations of audio, and music signal analysis.

Samer Abdallah was born in Cairo, Egypt. He received the B.A. degree in natural sciences from Cambridge University, Cambridge, U.K., in 1994, and the M.Sc. and Ph.D. degrees from King's College London, London, U.K., in 1998 and 2003, respectively. He spent three years working in industry. He is now a postdoctoral Researcher at the Centre for Digital Music, Queen Mary, University of London.

Mike Davies has worked in signal processing and nonlinear modeling for 15 years and held a Royal Society Research Fellowship at the University College of London, London, U.K., and Cambridge University, Cambridge, U.K., beginning in 1993. In 2001, he co-founded the DSP Research Group at Queen Mary, University of London, where he is a Reader in digital signal processing. He specializes in nonlinear and non-Gaussian signal processing with particular application to audio signals. His interests include non-Gaussian statistics, independent component analysis, sparse signal representations, and machine learning in DSP. Mr. Davies is currently an Associate Editor for the IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING.

Chris Duxbury received the B.Eng. degree in computer systems from King's College London, London, U.K. In 2005, he received the Ph.D. degree from the Centre for Digital Music, Queen Mary, University of London. He is now a developer for WaveCrest Communications Ltd.

Mark B. Sandler (M'87–SM'95) was born in London, U.K. He received the B.Sc. and Ph.D. degrees from the University of Essex, U.K., in 1978 and 1984, respectively. He is Professor of signal processing at Queen Mary, University of London, where he moved in 2001, after 19 years at King's College London. He was founder and CEO of Insonify Ltd., an Internet audio streaming startup, for 18 months. He has published over 250 papers in journals and conferences. Dr. Sandler is a Fellow of the IEE and of the Audio Engineering Society. He is a two-time recipient of the IEE A. H. Reeves Premium Prize.


More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

STUDY OF MUTUAL INFORMATION IN PERCEPTUAL CODING WITH APPLICATION FOR LOW BIT-RATE COMPRESSION

STUDY OF MUTUAL INFORMATION IN PERCEPTUAL CODING WITH APPLICATION FOR LOW BIT-RATE COMPRESSION STUDY OF MUTUAL INFORMATION IN PERCEPTUAL CODING WITH APPLICATION FOR LOW BIT-RATE COMPRESSION Adiel Ben-Shalom, Michael Werman School of Computer Science Hebrew University Jerusalem, Israel. {chopin,werman}@cs.huji.ac.il

More information

Separation and Classification of Harmonic Sounds for Singing Voice Detection

Separation and Classification of Harmonic Sounds for Singing Voice Detection Separation and Classification of Harmonic Sounds for Singing Voice Detection Martín Rocamora and Alvaro Pardo Institute of Electrical Engineering - School of Engineering Universidad de la República, Uruguay

More information

A Secure File Transfer based on Discrete Wavelet Transformation and Audio Watermarking Techniques

A Secure File Transfer based on Discrete Wavelet Transformation and Audio Watermarking Techniques A Secure File Transfer based on Discrete Wavelet Transformation and Audio Watermarking Techniques Vineela Behara,Y Ramesh Department of Computer Science and Engineering Aditya institute of Technology and

More information

Advanced Signal Processing and Digital Noise Reduction

Advanced Signal Processing and Digital Noise Reduction Advanced Signal Processing and Digital Noise Reduction Saeed V. Vaseghi Queen's University of Belfast UK WILEY HTEUBNER A Partnership between John Wiley & Sons and B. G. Teubner Publishers Chichester New

More information


More information

Clustering & Visualization

Clustering & Visualization Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.

More information

Harmonics and Noise in Photovoltaic (PV) Inverter and the Mitigation Strategies

Harmonics and Noise in Photovoltaic (PV) Inverter and the Mitigation Strategies Soonwook Hong, Ph. D. Michael Zuercher Martinson Harmonics and Noise in Photovoltaic (PV) Inverter and the Mitigation Strategies 1. Introduction PV inverters use semiconductor devices to transform the

More information