ISOLATED INSTRUMENT TRANSCRIPTION USING A DEEP BELIEF NETWORK

Gregory Burlet, Abram Hindle
Department of Computing Science, University of Alberta
{gburlet,abram.hindle}@ualberta.ca

ABSTRACT

Automatic music transcription is a difficult task that has provoked extensive research on transcription systems that are predominantly general purpose, processing any number or type of instruments sounding simultaneously. This paper presents a polyphonic transcription system that is constrained to processing the output of a single instrument with an upper bound on polyphony. For example, a guitar has six strings and is limited to producing six notes simultaneously. The transcription system consists of a novel pitch estimation algorithm that uses a deep belief network and multi-label learning techniques to generate multiple pitch estimates for each audio analysis frame, such that the polyphony does not exceed that of the instrument. The implemented transcription system is evaluated on a compiled dataset of synthesized guitar recordings. By comparing these results to those of a prior single-instrument polyphonic transcription system that itself received exceptional results, this paper demonstrates the effectiveness of deep, multi-label learning for the task of polyphonic transcription.

1. INTRODUCTION

The process of automatic music transcription involves the transformation of an audio signal into a digitally encoded music score through an analysis of the frequency and rhythmic properties of the acoustical waveform. The input audio signal may consist of an aggregation of signals from several different instruments and may be monophonic or polyphonic. Though the transcription of monophonic musical passages is considered a solved problem [3], the transcription of polyphonic music falls clearly behind skilled human musicians in accuracy and flexibility [15]. In an effort to reduce the complexity, the transcription problem can be constrained by limiting the number of notes that sound simultaneously (polyphony), the genre of music being analyzed, or the number and type of instruments producing sound [2]. Imposing constraints on the domain of analyzed signals provides meaningful prerequisite knowledge to the transcription algorithm, allowing it to exploit certain properties of its input and consequently reducing the difficulty of transcription. With this in mind, the objective of this research is to improve the quality of transcriptions generated from isolated recordings of individual polyphonic instruments, such as the guitar, bass guitar, or piano.

A solution to the problem of isolated instrument transcription has substantial commercial interest, with applications in musical games, instrument learning software, and music cataloguing. However, these applications seem far out of grasp given that the music information retrieval (MIR) research community has collectively reached a plateau in the accuracy of automatic music transcription systems [3]. In a paper addressing this issue, Benetos et al. [3] stress the importance of extracting expressive audio features and moving towards context-specific transcription systems. Also addressing this issue, Humphrey et al. [13, 14] propose that effort should be focused on audio features generated by deep belief networks instead of hand-engineered audio features, due to the success of these methods in other fields such as computer vision [18] and speech recognition [10].
The aforementioned literature motivates investigating the viability of applying deep belief networks to the problem of isolated instrument transcription. This paper presents a polyphonic transcription system containing a novel pitch estimation algorithm that addresses three arguable shortcomings of modern pattern recognition approaches to pitch estimation: first, the task of estimating multiple pitches sounding simultaneously is often approached using multiple one-versus-all binary classifiers [21, 22] in lieu of estimating the presence of multiple pitches using a single classifier; second, there exists no standard method to impose constraints on the polyphony of pitch estimates at any given time; and third, the discriminative power of latent audio feature representations, as produced by deep belief networks and autoencoders, is often overlooked in favour of more traditional features such as the short-time Fourier transform (STFT). In response to these points, the pitch estimation algorithm described in this work uses a deep belief network in conjunction with multi-label learning techniques to produce multiple pitch estimates for each audio analysis frame that conform to the polyphony constraints of the input instrument.

The structure of this paper is as follows: the subsequent section reviews algorithms for multiple fundamental frequency estimation, pitch detection, and note tracking. Section 3 describes the developed pitch estimation algorithm in the context of a larger polyphonic transcription system, which uses existing algorithms for note tracking. Section 4 presents a compiled ground-truth dataset of acoustic guitar recordings synthesized from crowdsourced tablature transcriptions, which is then used to evaluate the developed polyphonic transcription system. Section 5 ends with a discussion of the strengths and weaknesses of the transcription algorithm.

2. RELATED WORK

The first polyphonic transcription system, which handled duets, imposed constraints on the frequency range and timbre of the two input instruments as well as the intervals between simultaneously performed notes [20]. This work instigated a significant amount of research on the topic, which still aims to further the accuracy of transcriptions while gradually eliminating domain constraints.

In the infancy of the problem, polyphonic transcription algorithms relied heavily on digital signal processing techniques to uncover the fundamental frequencies present in an input audio waveform. To this end, several different algorithms have been proposed: perceptually motivated models that attempt to model human audition [16]; salience methods, which transform the audio signal to accentuate the underlying fundamental frequencies [17, 30]; iterative estimation methods, which iteratively select a predominant fundamental from the frequency spectrum and then subtract an estimate of its harmonics from the residual spectrum until no fundamental frequency candidates remain [17]; and joint estimation methods, which holistically select fundamental frequency candidates that, together, best describe the observed frequency domain of the input audio signal [28].

The MIR research community is gradually adopting a machine-learning-centric paradigm for many MIR tasks, including polyphonic transcription. Several innovative applications of machine learning algorithms to the task of polyphonic transcription have been proposed, including hidden Markov models (HMMs) [23], non-negative matrix factorization [8, 25], support vector machines [22], and shallow artificial neural networks [19]. Although each of these algorithms operates differently, the underlying principle involves the formation of a model that seeks to capture the harmonic, and perhaps temporal, structures of notes present in a set of training audio signals. The trained model then predicts the harmonic and/or temporal structures of notes present in a set of previously unseen audio signals.

Training a machine learning classifier for note pitch estimation involves extracting meaningful features from the audio signal that reflect the harmonic structures of notes and allow discrimination between different pitch classes. The obvious set of features exhibiting this property is the STFT, which computes the discrete Fourier transform (DFT) on a sliding analysis window over the audio signal. However, somewhat recent advances in the field of deep learning have revealed that neural networks with many layers of neurons can be efficiently trained [12] and form a hierarchical, latent representation of the input features [18]. Using a deep belief network (DBN) to learn alternate feature representations of DFT audio features, Nam et al. [21] injected these learned features into 88 binary support vector machine classifiers: one for each possible piano pitch. Each classifier outputs a binary class label denoting whether the pitch is present in a given audio analysis frame. Using the same experimental setup as Poliner and Ellis [22], Nam et al. [21] noted that the learned features computed by the DBN yielded significant improvements in the precision and recall of pitch estimates relative to standard DFT audio features.
After note pitch estimation it is necessary to perform note tracking, which involves the detection of note onsets and offsets [2]. Several techniques have been proposed in the literature, including a multitude of onset estimation algorithms [1, 9], HMM note-duration modelling algorithms [4, 24], and an HMM frame-smoothing algorithm [22]. The output of these note tracking algorithms is a sequence of note event estimates, each having a pitch, onset time, and duration. These note events may then be digitally encoded in a symbolic music notation for cataloguing or publishing.

3. ISOLATED INSTRUMENT TRANSCRIPTION

The workflow of the proposed polyphonic transcription algorithm is presented in Figure 1. The algorithm consists of an audio signal preprocessing step, followed by a novel DBN pitch estimation algorithm that conforms to the polyphony constraints of the input instrument. The note-tracking component of the polyphonic transcription algorithm uses a combination of the frame-smoothing algorithm developed by Poliner and Ellis [22] and the spectral flux onset estimation algorithm described by Dixon [9].

Figure 1. Workflow of the proposed polyphonic transcription algorithm, which converts the recording of a single instrument to a sequence of MIDI note events.

3.1 Audio Signal Preprocessing

The input audio signal is preprocessed before feature extraction. If the audio signal is stereo, the channels are averaged to produce a monaural audio signal. The audio signal is then decimated to lower its sampling rate $f_s$ by an integer factor $k \in \mathbb{N}^+$. Decimation involves low-pass filtering with a cut-off frequency of $f_s/2k$ Hz to mitigate against aliasing, followed by selecting every $k$th sample from the original signal.
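As a concrete illustration of this preprocessing step (the paper's implementation uses Marsyas, so this is only a sketch), the following Python snippet averages stereo channels and decimates by an assumed factor k = 4 using SciPy, whose decimate routine applies the anti-aliasing low-pass filter described above before keeping every k-th sample:

    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import decimate

    def preprocess(path, k=4):
        """Load a WAV file, mix to mono, and decimate by an integer factor k."""
        fs, x = wavfile.read(path)      # x: (samples,) or (samples, channels)
        x = x.astype(np.float64)
        if x.ndim == 2:                 # stereo -> mono by channel averaging
            x = x.mean(axis=1)
        # decimate() low-pass filters the signal, then keeps every k-th sample
        y = decimate(x, k)
        return fs // k, y

    # fs_new, signal = preprocess("guitar.wav")  # hypothetical input file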

Figure 2. Structure of the deep belief network for note pitch estimation. Edge weights are omitted for clarity.

3.2 Note Pitch Estimation

The structure of the DBN pitch estimation algorithm is presented in Figure 2. The algorithm extracts audio features that are subsequently fed forward through the deep network, resulting in an array of posterior probabilities used for pitch and polyphony estimation.

First, features are extracted from the input audio signal. The power spectrum of each audio analysis frame is calculated using a Hamming window of size $w$ samples and a hop size of $h$ samples. Half of the spectrum is retained, resulting in $m = w/2 + 1$ features. The result is a matrix of normalized audio features $\Phi \in [0,1]^{n \times m}$, where $n$ is the number of analysis frames spanning the input signal.

The DBN consumes these normalized audio features; hence, the input layer consists of $m$ nodes. There can be any number of stochastic binary hidden layers, each consisting of any number of nodes. The output layer of the network consists of $k + p$ nodes, where the first $k$ nodes are allocated for pitch estimation and the final $p$ nodes are allocated for polyphony estimation. The network uses a sigmoid activation as the non-linear transfer function. The feature vectors $\Phi$ are fed forward through the network with parameters $\Theta$, resulting in a matrix of probabilities $P(\hat{Y} \mid \Phi, \Theta) \in [0,1]^{n \times (k+p)}$ that is then split into a matrix of pitch probabilities $P(\hat{Y}^{(\mathrm{pitch})} \mid \Phi, \Theta)$ and a matrix of polyphony probabilities $P(\hat{Y}^{(\mathrm{poly})} \mid \Phi, \Theta)$. The polyphony of the $i$th analysis frame is estimated by selecting the polyphony class with the highest probability:

$\rho_i = \operatorname{argmax}_j \, P(\hat{Y}^{(\mathrm{poly})}_{ij} \mid \Phi_i, \Theta)$.    (1)

Pitch estimation is performed using a multi-label learning technique similar to the MetaLabeler system [26], which trains a multi-class classifier for label cardinality estimation using the output values of the original label classifier as features. Instead of using the matrix of pitch probabilities as features for a separate polyphony classifier, increased recall was noted by training the polyphony classifier alongside the pitch classifier on the original audio features. Formally, the pitches sounding in the $i$th analysis frame are estimated by selecting the indices of the $\rho_i$ highest pitch probabilities produced by the DBN. With these estimates, the corresponding vector of pitch probabilities is converted to a binary vector $\hat{Y}^{(\mathrm{pitch})}_i \in \{0,1\}^k$ by turning on the bits that correspond to the $\rho_i$ highest pitch probabilities.

For training and testing the algorithm, a set of pitch and polyphony labels is calculated for each audio analysis frame using an accompanying ground-truth MIDI file. A matrix of pitch annotations $Y^{(\mathrm{pitch})} \in \{0,1\}^{n \times k}$, where $k$ is the number of considered pitches, is computed such that an enabled bit indicates the presence of a pitch. A matrix of polyphony annotations $Y^{(\mathrm{poly})} \in \{0,1\}^{n \times p}$, where $p$ is the maximum frame-wise polyphony, is also computed such that each row is a one-hot binary vector in which the enabled bit indicates the polyphony of the frame. These matrices are horizontally concatenated to form the final matrix $Y \in \{0,1\}^{n \times (k+p)}$ of training and testing labels.

The deep belief network is trained using a modified version of the greedy layer-wise algorithm described by Hinton et al. [12]. Pretraining is performed by stacking a series of restricted Boltzmann machines and sequentially training each in an unsupervised manner using 1-step contrastive divergence [5].
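As an illustration of this pretraining step, the following NumPy sketch performs one 1-step contrastive divergence update for a generic binary restricted Boltzmann machine; the shapes are arbitrary, and only the 0.05 learning rate matches a setting reported later in Section 4.2:

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def cd1_step(v0, W, b_vis, b_hid, lr=0.05):
        """One contrastive divergence (CD-1) update on a batch v0 of visible vectors."""
        # positive phase: hidden probabilities and a binary sample given the data
        ph0 = sigmoid(v0 @ W + b_hid)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        # negative phase: one Gibbs step back to the visible layer, then to hidden
        pv1 = sigmoid(h0 @ W.T + b_vis)
        ph1 = sigmoid(pv1 @ W + b_hid)
        # gradient approximation: data statistics minus reconstruction statistics
        W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
        b_vis += lr * (v0 - pv1).mean(axis=0)
        b_hid += lr * (ph0 - ph1).mean(axis=0)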
Instead of using the up-down fine-tuning algorithm proposed by Hinton et al. [12], the layer of output nodes is treated as a set of logistic regressors and standard backpropagation is conducted on the network. Rather than creating features from scratch, this fine-tuning method is responsible for modifying the latent features in order to adjust the class boundaries [11]. The canonical error function to be minimized for a set of separate pitch and polyphony binary classifications is the cross-entropy error function, which forms the training signal used for backpropagation:

$E(\Theta) = -\sum_{i=1}^{n} \sum_{j=1}^{k+p} \left[ Y_{ij} \ln P(\hat{Y}_{ij} \mid \Phi_i, \Theta) + (1 - Y_{ij}) \ln\left(1 - P(\hat{Y}_{ij} \mid \Phi_i, \Theta)\right) \right]$.    (2)

The aim of this objective function is to adjust the network weights $\Theta$ to pull output node probabilities closer to one for ground-truth label bits that are on and closer to zero for bits that are off.

The described pitch estimation algorithm was implemented using the Theano numerical computation library for Python [6]. Computations for network training and testing are parallelized on the GPU. Feature extraction and audio signal preprocessing are performed using Marsyas [27].
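To tie the pieces of this section together, the following NumPy sketch illustrates the forward pass through sigmoid layers and the polyphony-constrained decoding of Equation (1); it is an illustration under assumed layer shapes, not the Theano implementation used in the paper, and it treats polyphony class 0 as silence (an assumption consistent with the extra class added for silent frames in Section 4):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def feed_forward(phi, weights, biases):
        """Propagate features through sigmoid layers; returns (n, k+p) probabilities."""
        a = phi
        for W, b in zip(weights, biases):
            a = sigmoid(a @ W + b)
        return a

    def decode_frames(probs, k):
        """Split output into pitch/polyphony blocks and apply the polyphony constraint."""
        pitch_probs, poly_probs = probs[:, :k], probs[:, k:]
        rho = poly_probs.argmax(axis=1)      # Equation (1): polyphony per frame
        y_pitch = np.zeros_like(pitch_probs, dtype=int)
        for i, r in enumerate(rho):
            if r > 0:                        # class 0 assumed to denote silence
                top = np.argsort(pitch_probs[i])[-r:]  # the r highest pitch probabilities
                y_pitch[i, top] = 1
        return y_pitch, rho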

3.3 Note Tracking

Although frame-level pitch estimates are essential for transcription, converting these estimates into note events with an onset and duration is not a trivial task. The purpose of note tracking is to process these pitch estimates and determine when a note onsets and offsets.

Figure 3. An overview of the transcription workflow on a four-second segment of a synthesized guitar recording.

3.3.1 Frame-level Smoothing

The frame-smoothing algorithm developed by Poliner and Ellis [22] is used to postprocess the DBN pitch estimates $\hat{Y}^{(\mathrm{pitch})}$ for an input audio signal. The algorithm allows a frame-level pitch estimate to be contextualized amongst its neighbours instead of solely trusting the independent estimates made by a classification algorithm. Formally, the frame-smoothing algorithm [22] operates by training an HMM for each pitch. Each HMM consists of two hidden states: ON and OFF. The transition probabilities are computed by observing the frequency with which a pitch transitions between and within the ON and OFF states across analysis frames. The emission distribution is a Bernoulli distribution that models the certainty of each frame-wise estimate and is represented using the pitch probabilities $P(\hat{Y}^{(\mathrm{pitch})} \mid \Phi, \Theta)$. The output of the Viterbi algorithm, which searches for the optimal underlying state sequence, is a revised binary vector of activation estimates for a single pitch. Concatenating the results of each HMM yields a revised matrix of pitch estimates $\hat{Y}^{(\mathrm{pitch})}$.

3.3.2 Onset Detection

If the HMM frame-smoothing algorithm claims a pitch arises within an analysis frame, it could onset at any time within the window. Arbitrarily setting the note onset time to the beginning of the window often results in choppy sounding transcriptions. In response, an onset detection algorithm that uses spectral flux measurements between analysis frames [9] is run at a finer time resolution to pinpoint the exact note onset time. The onset detection algorithm is run on the original, undecimated audio signal with a window size of 2048 samples and a hop size of 512 samples. When writing the note event estimates as a MIDI file, the onset times calculated by this algorithm are used. The offset time is calculated by following the pitch estimate across consecutive analysis frames until it transitions from ON to OFF, at which point the time stamp of the end of this analysis frame is used. Note events spanning fewer than two audio analysis frames are removed from the transcription to mitigate against spurious notes.

Output of the transcription algorithm at each stage, from feature extraction to DBN pitch estimation to frame smoothing and quantization (note tracking), is displayed in Figure 3 for a four-second segment of a synthesized guitar recording. The pitch probabilities output by the DBN show that the classifier is quite certain about its estimates; there are few grey areas indicating indecision.
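A compact sketch of the per-pitch ON/OFF smoothing of Section 3.3.1 is given below, using log-domain Viterbi decoding with Bernoulli emissions; the transition matrix shown is a placeholder (the paper estimates transition probabilities from ground-truth pitch activity), and a uniform initial state prior is assumed:

    import numpy as np

    def smooth_pitch(probs, trans=np.array([[0.99, 0.01], [0.10, 0.90]])):
        """Viterbi-decode ON/OFF states for one pitch from frame probabilities.

        probs: (n,) DBN probabilities that the pitch is on in each frame.
        trans: 2x2 transition matrix, rows/cols ordered (OFF, ON).
        """
        eps = 1e-12
        n = len(probs)
        # Bernoulli emission log-likelihoods for states OFF and ON
        emit = np.log(np.stack([1.0 - probs, probs], axis=1) + eps)
        log_trans = np.log(trans + eps)
        delta = np.zeros((n, 2))            # best log-probability ending in each state
        psi = np.zeros((n, 2), dtype=int)   # best predecessor for each state
        delta[0] = emit[0]                  # uniform initial state prior assumed
        for t in range(1, n):
            scores = delta[t - 1][:, None] + log_trans   # indexed (from, to)
            psi[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) + emit[t]
        states = np.zeros(n, dtype=int)
        states[-1] = delta[-1].argmax()
        for t in range(n - 2, -1, -1):      # backtrace the optimal state sequence
            states[t] = psi[t + 1, states[t + 1]]
        return states                        # 1 = ON, 0 = OFF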
4. TRANSCRIPTION EVALUATION

The polyphonic transcription algorithm is evaluated on a dataset of synthesized guitar tracks. Knowing that the input instrument is a guitar with six strings, the pitch estimation algorithm considers the $k = 51$ pitches from C2 to D6, spanning the lowest note capable of being produced by a guitar in Drop C tuning to the highest note capable of being produced by a 22-fret guitar in Standard tuning.

Though a guitar with six strings is only capable of producing six notes simultaneously, a chord transition may occur within a frame, and so the maximum polyphony rises above this bound. This is a side effect of a sliding-window analysis of the audio signal. The maximum frame-wise polyphony is calculated using the equation

$p = \max_i \left( Y^{(\mathrm{pitch})}_i \mathbf{1} \right) + 1$,    (3)

where $\mathbf{1}$ is a vector of ones. The addition of one to the maximum polyphony accommodates silence, where no pitches sound in an analysis frame.

4.1 Ground-truth Dataset

Using the methodology proposed by Burlet and Fujinaga [7], a ground-truth dataset of 45 synthesized acoustic guitar recordings paired with MIDI note-event annotations was compiled. The dataset was created by harvesting the abundance of crowdsourced guitar transcriptions uploaded to www.ultimate-guitar.com as tablature files that are manipulated by the Guitar Pro desktop application. The transcriptions in the ground-truth dataset were selected by searching for the keyword "acoustic", filtering results to those that have been rated by the community as five out of five stars, and selecting those that received the most ratings and views. The dataset consists of songs by artists ranging from The Beatles, Eric Clapton, and Neil Young to Led Zeppelin, Metallica, and Radiohead.

Figure 4. Distribution of note pitches in the ground-truth dataset.

Each Guitar Pro file was preprocessed to remove extraneous instrument tracks other than guitar, to remove repeated bars, and to remove note ornamentations such as dead notes, palm muting, harmonics, pitch bends, and vibrato. The guitar model for note synthesis was set to a Martin & Co. acoustic guitar with steel strings and no capo. Finally, each Guitar Pro file is synthesized as a WAV file and also exported as a MIDI file, which captures the note events occurring in the guitar track. Recordings of real guitars would be ideal, but the resources required to record and double-key annotate such a dataset are immense.

In total the dataset consists of approximately 104 minutes of audio, an average tempo of 101 beats per minute, notes, and an average polyphony of . The average polyphony is calculated by dividing the number of note events by the number of chords plus the number of individual notes. The distribution of note pitches in the dataset is displayed in Figure 4.

4.2 Frame-wise Pitch Estimation Evaluation

The songs in the compiled ground-truth dataset are partitioned into a training and a testing set, such that roughly 80% of songs are allocated for training and 20% for testing. Several preliminary experiments with the proposed transcription system revealed that a sampling rate of Hz, a window size of 1024 samples, a hop size of 768 samples, and a network structure of 350 nodes in the first three hidden layers followed by 1200 nodes in the penultimate layer yielded promising results. For network pretraining, 400 epochs were conducted with a learning rate of 0.05 using 1-step contrastive divergence with a batch size of 1000 training instances. For network fine-tuning, epochs were conducted with a learning rate of 0.05 and a batch size of 1000 training instances.

The frame-level pitch estimates computed by the DBN pitch estimation algorithm followed by the HMM frame-smoothing algorithm are evaluated using the following standard multi-label learning metrics [29]: precision (p), recall (r), f-measure (f), one error, and hamming loss. The one error provides insight into the number of audio analysis frames where the predominant pitch is estimated incorrectly. The hamming loss provides insight into the number of false positive and false negative pitch estimates across the audio analysis frames. In addition, the frame-level polyphony recall ($r_{\mathrm{poly}}$) is calculated to evaluate the accuracy of polyphony estimates.

Table 1. Frame-wise pitch estimation evaluation metrics: $r_{\mathrm{poly}}$ denotes the polyphony recall, p denotes precision, r denotes recall, and f denotes f-measure. The first row includes octave errors while the second row excludes them.

Using the ground-truth dataset, pretraining the DBN took 13 hours and fine-tuning took 9 hours using an Nvidia GPU with 1664 CUDA cores. After training, the network weights are saved so that they can be reused for future transcriptions. The results of the DBN pitch estimation algorithm are presented in Table 1. After HMM frame smoothing the results substantially improve, with a precision of 0.79, a recall of 0.64, and an f-measure of 0.71 when considering octave errors. The results reveal that the 52% polyphony estimation accuracy likely hinders the frame-wise f-measure of the pitch estimation algorithm.
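For reference, the frame-wise multi-label metrics reported above can be computed as in the following generic sketch, which follows the standard definitions [29] rather than the paper's evaluation code:

    import numpy as np

    def frame_metrics(y_true, y_pred):
        """Frame-wise multi-label precision, recall, f-measure, and hamming loss.

        y_true, y_pred: (n, k) binary matrices of pitch activations.
        """
        tp = np.logical_and(y_true == 1, y_pred == 1).sum()
        fp = np.logical_and(y_true == 0, y_pred == 1).sum()
        fn = np.logical_and(y_true == 1, y_pred == 0).sum()
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f = 2 * precision * recall / (precision + recall) if tp else 0.0
        hamming = (y_true != y_pred).mean()   # fraction of mispredicted label bits
        return precision, recall, f, hamming

    def one_error(y_true, pitch_probs):
        """Fraction of frames whose highest-probability pitch is not in the ground truth."""
        top = pitch_probs.argmax(axis=1)
        return 1.0 - y_true[np.arange(len(y_true)), top].mean()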
Investigating further, when using the ground-truth polyphony for each frame, an f-measure of 0.68 is noted before HMM smoothing. The 5% increase in f-measure reveals that the polyphony estimates are close to their ground-truth values. With respect to the one error, the results reveal that the DBN's belief of the predominant pitch (the label with the highest probability) is incorrect in 22% of the analysis frames, which improves to 18% when not considering octave errors. With respect to the hamming loss, the results show that, on average, 4% of pitch estimates in an analysis frame are false positives or negatives. Additionally, the results reveal a 7% increase in f-measure when disregarding octave errors. A comparison of these results with a state-of-the-art transcription algorithm is performed in the following section.

4.3 Note Event Evaluation

After HMM smoothing of the frame-level pitch estimates computed by the DBN, onset quantization is performed and a MIDI file, which encodes the pitch, onset time, and duration of note events, is written. An evaluation procedure similar to the music information retrieval evaluation exchange (MIREX) note tracking task is conducted using the metrics of precision, recall, and f-measure. Relative to a ground-truth note event, an estimate is considered correct if its onset time is within 250 ms and its pitch is equivalent. The accuracy of offset times is not considered. A ground-truth note event can only be associated with a single note event estimate. Table 2 presents the results of this evaluation on the guitar tracks in the testing dataset.
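The note-event matching criterion just described admits a simple greedy reading, sketched below; the exact MIREX evaluation procedure may differ in its matching strategy, so treat this as illustrative:

    def match_notes(truth, estimates, tol=0.25):
        """Greedily match estimated notes to ground truth.

        truth, estimates: lists of (pitch, onset_seconds) tuples.
        A match requires equal pitch and an onset within tol seconds;
        each ground-truth note may be matched at most once.
        """
        matched, used = 0, set()
        for pitch, onset in estimates:
            for j, (t_pitch, t_onset) in enumerate(truth):
                if j not in used and pitch == t_pitch and abs(onset - t_onset) <= tol:
                    used.add(j)
                    matched += 1
                    break
        precision = matched / len(estimates) if estimates else 0.0
        recall = matched / len(truth) if truth else 0.0
        f = 2 * precision * recall / (precision + recall) if matched else 0.0
        return precision, recall, f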

Table 2. Precision, recall, and f-measure evaluation of note events transcribed using the DBN transcription algorithm compared to the Zhou and Reiss [30] algorithm. The first row includes octave errors while the second row excludes them.

Additionally, these guitar tracks are transcribed by the single-instrument polyphonic transcription algorithm proposed by Zhou and Reiss [30], which was evaluated in the 2008 MIREX and received an f-measure of 0.76 on a dataset of 30 synthesized and real piano recordings. The transcription algorithm described in this paper resulted in a 9% increase, or a 16% relative increase, in f-measure relative to the transcription algorithm developed by Zhou and Reiss [30], and, further, performed these transcriptions in a quarter of the time. This result emphasizes a valuable property of neural networks: after training, feeding features forward through the network is accomplished in a small amount of time.

An analysis of the number of octave errors made by both algorithms reveals that the DBN transcription algorithm made a similar number of note octave errors as the digital signal processing transcription algorithm proposed by Zhou and Reiss. A subjective, aural analysis of the guitar transcriptions reflects these results: the predominant pitches and temporal structures of notes occurring in the input guitar tracks are more or less maintained. Another remark on the transcriptions is that when several guitar strums occur quickly in succession, the DBN transcription algorithm often transcribes only the first chord and prescribes it a long duration. This is likely a result of the temporally coarse window size of 1024 samples or a product of the HMM frame-smoothing algorithm. A remedy for this issue is to lower the window size, which has the undesirable side-effect of lowering the frequency resolution of the DFT.

5. CONCLUSION

The developed polyphonic transcription algorithm is capable of forming discriminative, latent audio features that are suitable for quickly transcribing isolated instrument recordings. The algorithm workflow consists of audio signal preprocessing, feature extraction, a novel pitch estimation algorithm that uses multi-label learning techniques to enforce polyphony constraints, frame smoothing, and onset quantization. The generated note event transcriptions are digitally encoded as a MIDI file. An evaluation of the frame-level pitch estimates generated by the deep belief network on a dataset of synthesized guitar recordings resulted in an f-measure of 0.71 after frame smoothing. An evaluation of the note events output by the entire transcription algorithm resulted in an f-measure of 0.65, which is 9% higher than the f-measure reported by a state-of-the-art, single-instrument transcription algorithm [30] on the same dataset. A threat to validity is the use of synthesized guitar signals for training and testing, which could be addressed by hiring guitarists and expert annotators to compile a dataset of real recordings.

There are several directions of future work to improve the accuracy of transcriptions. First, there are substantial variations in the distribution of pitches across songs, and so the compilation of more training data is expected to improve the accuracy of frame-level pitch estimates made by the DBN.
Second, alternate methods could be explored to raise the accuracy of frame-level polyphony estimates, such as training a separate classifier for predicting polyphony on potentially different audio features. Third, an alternate frame-smoothing algorithm that jointly considers the probabilities of other pitch estimates across analysis frames could further increase pitch estimation f-measure relative to the HMM method [22], which smooths the estimates of one pitch across the audio analysis frames. Finally, it would be beneficial to investigate whether the latent audio features derived for transcribing one instrument are transferable to the transcription of other instruments.

The results of this work encourage the use of deep architectures such as belief networks or autoencoders to form alternative representations of industry-standard audio features for the purposes of instrument transcription. Moreover, this work demonstrates the effectiveness of multi-label learning for pitch estimation, specifically when an upper bound on polyphony exists.

6. ACKNOWLEDGEMENTS

Special thanks are owed to Ruohua Zhou and Joshua Reiss for the open-source implementation of their transcription algorithm evaluated in this work, as well as to the individuals who uploaded manual tablature transcriptions to www.ultimate-guitar.com. This research was generously funded by an Alberta Innovates Technology Futures Graduate Student Scholarship.

7. REFERENCES

[1] J. P. Bello, L. Daudet, S. Abdallah, C. Duxbury, M. Davies, and M. B. Sandler. A tutorial on onset detection in music signals. IEEE Transactions on Speech and Audio Processing, 13(5):1035–1047, 2005.

[2] E. Benetos, S. Dixon, D. Giannoulis, H. Kirchhoff, and A. Klapuri. Automatic music transcription: Challenges and future directions. Journal of Intelligent Information Systems, 41(3):407–434, 2013.

[3] E. Benetos, S. Dixon, D. Giannoulis, H. Kirchhoff, and A. Klapuri. Automatic music transcription: Breaking the glass ceiling. In Proceedings of the International Society for Music Information Retrieval Conference, Porto, Portugal, 2012.

[4] E. Benetos and T. Weyde. Explicit duration hidden Markov models for multiple-instrument polyphonic music transcription. In Proceedings of the International Society for Music Information Retrieval Conference, Curitiba, Brazil, 2013.

[5] Y. Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127, 2009.

[6] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio. Theano: A CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), pages 3–10, Austin, TX, 2010.

[7] G. Burlet and I. Fujinaga. Robotaba guitar tablature transcription framework. In Proceedings of the International Society for Music Information Retrieval Conference, Curitiba, Brazil, 2013.

[8] A. Dessein, A. Cont, and G. Lemaitre. Real-time polyphonic music transcription with non-negative matrix factorization and beta-divergence. In Proceedings of the International Society for Music Information Retrieval Conference, Utrecht, Netherlands, 2010.

[9] S. Dixon. Onset detection revisited. In Proceedings of the International Conference on Digital Audio Effects, Montréal, QC, 2006.

[10] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine, 29(6):82–97, 2012.

[11] G. E. Hinton. Learning multiple layers of representation. Trends in Cognitive Sciences, 11(10):428–434, 2007.

[12] G. E. Hinton, S. Osindero, and Y. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.

[13] E. Humphrey, J. Bello, and Y. LeCun. Moving beyond feature design: Deep architectures and automatic feature learning in music informatics. In Proceedings of the International Society for Music Information Retrieval Conference, Porto, Portugal, 2012.

[14] E. Humphrey, J. Bello, and Y. LeCun. Feature learning and deep architectures: New directions for music informatics. Journal of Intelligent Information Systems, 41(3), 2013.

[15] A. Klapuri. Automatic music transcription as we know it today. Journal of New Music Research, 33(3), 2004.

[16] A. Klapuri. A perceptually motivated multiple-F0 estimation method. In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, 2005.

[17] A. Klapuri. Multiple fundamental frequency estimation by summing harmonic amplitudes. In Proceedings of the International Society for Music Information Retrieval Conference, Victoria, BC, 2006.

[18] H. Lee, R. Grosse, R. Ranganath, and A. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the International Conference on Machine Learning, Montréal, QC, 2009.

[19] M. Marolt. A connectionist approach to automatic transcription of polyphonic piano music. IEEE Transactions on Multimedia, 6(3), 2004.

[20] J. A. Moorer. On the segmentation and analysis of continuous musical sound by digital computer. PhD thesis, Department of Music, Stanford University, Stanford, CA, 1975.

[21] J. Nam, J. Ngiam, H. Lee, and M. Slaney. A classification-based polyphonic piano transcription approach using learned feature representations. In Proceedings of the International Society for Music Information Retrieval Conference, Miami, FL, 2011.

[22] G. E. Poliner and D. P. W. Ellis. A discriminative model for polyphonic piano transcription. EURASIP Journal on Advances in Signal Processing, pages 1–9, 2007.

[23] C. Raphael. Automatic transcription of piano music. In Proceedings of the International Society for Music Information Retrieval Conference, pages 1–5, Paris, France, 2002.
[24] M. Ryynänen and A. Klapuri. Polyphonic music transcription using note event modeling. In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, 2005.

[25] P. Smaragdis and J. C. Brown. Non-negative matrix factorization for polyphonic music transcription. In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, 2003.

[26] L. Tang, S. Rajan, and V. K. Narayanan. Large scale multi-label classification via metalabeler. In Proceedings of the International Conference on World Wide Web, Madrid, Spain, 2009.

[27] G. Tzanetakis and P. Cook. MARSYAS: A framework for audio analysis. Organised Sound, 4(3):169–175, 2000.

[28] C. Yeh, A. Roebel, and X. Rodet. Multiple fundamental frequency estimation and polyphony inference of polyphonic music signals. IEEE Transactions on Audio, Speech, and Language Processing, 18(6), 2010.

[29] M. Zhang and Z. Zhou. A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 26(8), 2014.

[30] R. Zhou, J. D. Reiss, M. Mattavelli, and G. Zoia. A computationally efficient method for polyphonic pitch estimation. EURASIP Journal on Advances in Signal Processing, 2009(729494):1–11, 2009.


More information

Speech Recognition on Cell Broadband Engine UCRL-PRES-223890

Speech Recognition on Cell Broadband Engine UCRL-PRES-223890 Speech Recognition on Cell Broadband Engine UCRL-PRES-223890 Yang Liu, Holger Jones, John Johnson, Sheila Vaidya (Lawrence Livermore National Laboratory) Michael Perrone, Borivoj Tydlitat, Ashwini Nanda

More information

Video Affective Content Recognition Based on Genetic Algorithm Combined HMM

Video Affective Content Recognition Based on Genetic Algorithm Combined HMM Video Affective Content Recognition Based on Genetic Algorithm Combined HMM Kai Sun and Junqing Yu Computer College of Science & Technology, Huazhong University of Science & Technology, Wuhan 430074, China

More information

List of publications and communications

List of publications and communications List of publications and communications Arnaud Dessein March 2, 2013 Chapters in books [1] Arnaud Dessein, Arshia Cont, and Guillaume Lemaitre. Real-time detection of overlapping sound events with non-negative

More information

Tracking Groups of Pedestrians in Video Sequences

Tracking Groups of Pedestrians in Video Sequences Tracking Groups of Pedestrians in Video Sequences Jorge S. Marques Pedro M. Jorge Arnaldo J. Abrantes J. M. Lemos IST / ISR ISEL / IST ISEL INESC-ID / IST Lisbon, Portugal Lisbon, Portugal Lisbon, Portugal

More information

arxiv:1312.6062v2 [cs.lg] 9 Apr 2014

arxiv:1312.6062v2 [cs.lg] 9 Apr 2014 Stopping Criteria in Contrastive Divergence: Alternatives to the Reconstruction Error arxiv:1312.6062v2 [cs.lg] 9 Apr 2014 David Buchaca Prats Departament de Llenguatges i Sistemes Informàtics, Universitat

More information

Sampling Theorem Notes. Recall: That a time sampled signal is like taking a snap shot or picture of signal periodically.

Sampling Theorem Notes. Recall: That a time sampled signal is like taking a snap shot or picture of signal periodically. Sampling Theorem We will show that a band limited signal can be reconstructed exactly from its discrete time samples. Recall: That a time sampled signal is like taking a snap shot or picture of signal

More information

MODELING DYNAMIC PATTERNS FOR EMOTIONAL CONTENT IN MUSIC

MODELING DYNAMIC PATTERNS FOR EMOTIONAL CONTENT IN MUSIC 12th International Society for Music Information Retrieval Conference (ISMIR 2011) MODELING DYNAMIC PATTERNS FOR EMOTIONAL CONTENT IN MUSIC Yonatan Vaizman Edmond & Lily Safra Center for Brain Sciences,

More information

The Computer Music Tutorial

The Computer Music Tutorial Curtis Roads with John Strawn, Curtis Abbott, John Gordon, and Philip Greenspun The Computer Music Tutorial Corrections and Revisions The MIT Press Cambridge, Massachusetts London, England 2 Corrections

More information

Monotonicity Hints. Abstract

Monotonicity Hints. Abstract Monotonicity Hints Joseph Sill Computation and Neural Systems program California Institute of Technology email: joe@cs.caltech.edu Yaser S. Abu-Mostafa EE and CS Deptartments California Institute of Technology

More information

MapReduce Approach to Collective Classification for Networks

MapReduce Approach to Collective Classification for Networks MapReduce Approach to Collective Classification for Networks Wojciech Indyk 1, Tomasz Kajdanowicz 1, Przemyslaw Kazienko 1, and Slawomir Plamowski 1 Wroclaw University of Technology, Wroclaw, Poland Faculty

More information

Programming Exercise 3: Multi-class Classification and Neural Networks

Programming Exercise 3: Multi-class Classification and Neural Networks Programming Exercise 3: Multi-class Classification and Neural Networks Machine Learning November 4, 2011 Introduction In this exercise, you will implement one-vs-all logistic regression and neural networks

More information

A Content based Spam Filtering Using Optical Back Propagation Technique

A Content based Spam Filtering Using Optical Back Propagation Technique A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT

More information

Neural network software tool development: exploring programming language options

Neural network software tool development: exploring programming language options INEB- PSI Technical Report 2006-1 Neural network software tool development: exploring programming language options Alexandra Oliveira aao@fe.up.pt Supervisor: Professor Joaquim Marques de Sá June 2006

More information

Rameau: A System for Automatic Harmonic Analysis

Rameau: A System for Automatic Harmonic Analysis Rameau: A System for Automatic Harmonic Analysis Genos Computer Music Research Group Federal University of Bahia, Brazil ICMC 2008 1 How the system works The input: LilyPond format \score {

More information

Florida International University - University of Miami TRECVID 2014

Florida International University - University of Miami TRECVID 2014 Florida International University - University of Miami TRECVID 2014 Miguel Gavidia 3, Tarek Sayed 1, Yilin Yan 1, Quisha Zhu 1, Mei-Ling Shyu 1, Shu-Ching Chen 2, Hsin-Yu Ha 2, Ming Ma 1, Winnie Chen 4,

More information

Follow links Class Use and other Permissions. For more information, send email to: permissions@pupress.princeton.edu

Follow links Class Use and other Permissions. For more information, send email to: permissions@pupress.princeton.edu COPYRIGHT NOTICE: David A. Kendrick, P. Ruben Mercado, and Hans M. Amman: Computational Economics is published by Princeton University Press and copyrighted, 2006, by Princeton University Press. All rights

More information

A Segmentation Algorithm for Zebra Finch Song at the Note Level. Ping Du and Todd W. Troyer

A Segmentation Algorithm for Zebra Finch Song at the Note Level. Ping Du and Todd W. Troyer A Segmentation Algorithm for Zebra Finch Song at the Note Level Ping Du and Todd W. Troyer Neuroscience and Cognitive Science Program, Dept. of Psychology University of Maryland, College Park, MD 20742

More information

Practical Design of Filter Banks for Automatic Music Transcription

Practical Design of Filter Banks for Automatic Music Transcription Practical Design of Filter Banks for Automatic Music Transcription Filipe C. da C. B. Diniz, Luiz W. P. Biscainho, and Sergio L. Netto Federal University of Rio de Janeiro PEE-COPPE & DEL-Poli, POBox 6854,

More information

Course: Model, Learning, and Inference: Lecture 5

Course: Model, Learning, and Inference: Lecture 5 Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.

More information

NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE. Venu Govindaraju

NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE. Venu Govindaraju NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE Venu Govindaraju BIOMETRICS DOCUMENT ANALYSIS PATTERN RECOGNITION 8/24/2015 ICDAR- 2015 2 Towards a Globally Optimal Approach for Learning Deep Unsupervised

More information

An Arabic Text-To-Speech System Based on Artificial Neural Networks

An Arabic Text-To-Speech System Based on Artificial Neural Networks Journal of Computer Science 5 (3): 207-213, 2009 ISSN 1549-3636 2009 Science Publications An Arabic Text-To-Speech System Based on Artificial Neural Networks Ghadeer Al-Said and Moussa Abdallah Department

More information

Cell Phone based Activity Detection using Markov Logic Network

Cell Phone based Activity Detection using Markov Logic Network Cell Phone based Activity Detection using Markov Logic Network Somdeb Sarkhel sxs104721@utdallas.edu 1 Introduction Mobile devices are becoming increasingly sophisticated and the latest generation of smart

More information

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

CSCI567 Machine Learning (Fall 2014)

CSCI567 Machine Learning (Fall 2014) CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu September 22, 2014 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 2014) September 22, 2014 1 /

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

GCSE Music Unit 3 (42703) Guidance

GCSE Music Unit 3 (42703) Guidance GCSE Music Unit 3 (42703) Guidance Performance This unit accounts for 40% of the final assessment for GCSE. Each student is required to perform two separate pieces of music, one demonstrating solo/it skills,

More information

MVA ENS Cachan. Lecture 2: Logistic regression & intro to MIL Iasonas Kokkinos Iasonas.kokkinos@ecp.fr

MVA ENS Cachan. Lecture 2: Logistic regression & intro to MIL Iasonas Kokkinos Iasonas.kokkinos@ecp.fr Machine Learning for Computer Vision 1 MVA ENS Cachan Lecture 2: Logistic regression & intro to MIL Iasonas Kokkinos Iasonas.kokkinos@ecp.fr Department of Applied Mathematics Ecole Centrale Paris Galen

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information