Singing Voice Detection in Music Tracks using Direct Voice Vibrato Detection
Lise Regnier, Geoffroy Peeters. Singing Voice Detection in Music Tracks using Direct Voice Vibrato Detection. ICASSP, Apr 2009, Taipei, Taiwan. pp.1. Submitted to HAL on 25 Jan 2012. HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
SINGING VOICE DETECTION IN MUSIC TRACKS USING DIRECT VOICE VIBRATO DETECTION

L. Regnier, G. Peeters
IRCAM Sound Analysis/Synthesis team, CNRS-STMS
1 place Stravinsky, Paris

ABSTRACT

In this paper we investigate the problem of locating singing voice in music tracks. As opposed to most existing methods for this task, we rely on the extraction of characteristics specific to the singing voice. In our approach we suppose that the singing voice is characterized by harmonicity, formants, vibrato and tremolo. In the present study we deal only with the vibrato and tremolo characteristics. For this, we first extract sinusoidal partials from the musical audio signal. The frequency modulation (vibrato) and amplitude modulation (tremolo) of each partial are then studied to determine whether the partial corresponds to singing voice, in which case the corresponding segment is supposed to contain singing voice. For this we estimate for each partial the rate (frequency of the modulation) and the extent (amplitude of the modulation) of both vibrato and tremolo. A partial selection is then operated based on these values. A second criterion, based on harmonicity, is also introduced. Based on this, each segment can be labelled as singing or non-singing. Post-processing of the segmentation is then applied in order to remove short-duration segments. The proposed method is evaluated on a large manually annotated test set. The results of this evaluation are compared to those obtained with a usual machine learning approach (MFCC and SFM modeling with GMM). The proposed method achieves results very close to the machine learning approach: 76.8% compared to 77.4% F-measure (frame classification). This result is very promising, since both approaches are orthogonal and can therefore be combined.

Index Terms: Singing voice detection, vibrato detection, voice segmentation, vibrato and tremolo parameter extraction, feature extraction.

1. INTRODUCTION

Singing voice melody is the most representative and memorable element of a song. Locating vocal segments in a track is the front end of many applications, including singer identification, singing voice separation, query-by-lyrics, query-by-humming and extraction of musical structure. The problem of singing voice detection can be stated as follows: segment a song into vocal and non-vocal (i.e. pure instrumental or silent) parts. The main difficulty comes from the presence of musical instruments during most vocal segments.

We review here existing methods. Usual systems for partitioning a song into vocal and non-vocal segments start by extracting a set of audio features from the audio signal and then use them to classify frames using a threshold method or a statistical classifier. The result of this classification is then used to segment the track.

Features: According to [1], among the various features, MFCCs (Mel Frequency Cepstral Coefficients) and their derivatives are the most appropriate features to detect singing voice ([2], [3]). Other features describing the spectral envelope are also used: LPC and their perceptual variants PLP [4] [5], and Warped Linear Coefficients [6]. These features, traditionally used in speech processing, are however not able to describe the characteristics of the singing voice in the presence of an instrumental background. Energy [7], the harmonic coefficient [8], and perceptually motivated acoustic features [9] such as attack-decay, formants, the singing formant and vibrato have also been investigated.

Classifier: Different statistical classifiers such as GMM, HMM, ANN, MLP or SVM have been proposed to discriminate the classes using the various sets of features listed above.

Post-processing: Frame-based classifiers tend to provide noisy classification leading to over-segmentation (short segments). Conversely, human annotation tends to provide under-segmentation (long segments ignoring instrumental breaks or singer breathing).
For these reasons, post-processing is usually applied to the estimated segmentation. This post-processing can be achieved using median filtering or an HMM trained on segment durations, as suggested in [10]. [3] introduces a method for deriving a decision function and smoothing it with an autoregressive moving-average filter.

The paper is organized as follows. First, the characteristics of the singing voice are introduced in section 2. Section 3 provides an overview of the system used. Results obtained with this method are presented and compared with those obtained with a machine learning method in section 5. Section 6 concludes the paper.
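As an illustration of the median-filtering post-processing mentioned above, a minimal sketch (hypothetical code, not part of the original system) that smooths a binary frame-decision sequence could look like this:

```python
# Minimal sketch (not from the paper): smoothing a noisy binary
# frame-decision sequence (1 = singing, 0 = non-singing) with a
# majority-vote median filter to reduce over-segmentation.
def median_smooth(decisions, width=5):
    """Replace each frame decision by the majority vote taken in a
    centered window of `width` frames (width assumed odd)."""
    half = width // 2
    smoothed = []
    for i in range(len(decisions)):
        window = decisions[max(0, i - half):i + half + 1]
        smoothed.append(1 if sum(window) * 2 > len(window) else 0)
    return smoothed

# A spurious one-frame "singing" detection is removed,
# while the longer run of singing frames is preserved:
print(median_smooth([0, 0, 1, 0, 0, 1, 1, 1, 1, 0], width=3))
```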
2. CHARACTERISTICS OF SINGING VOICE

We introduce here the most representative characteristics of the singing voice.

2.1. Formants

Formants are the meaningful frequency components of speech and singing. Formant frequencies are determined by the shape of the vocal tract. These frequencies (usually 3) allow us to identify vowels. A fourth formant, called the singing formant, exists for singers and ranges from 2 kHz to 3 kHz. The singing formant helps lyric singers to stand out above the instrumental accompaniment. However, the singing formant does not exist in many other types of singing.

2.2. Harmonicity

One of the most discriminative elements to distinguish singing voice from speech is harmonicity [8]. A sound is harmonic when its partials are located at multiples of the fundamental frequency. Singing voice is naturally harmonic due to resonances of the vocal tract. In addition, the number of harmonic partials can increase with singing technique.

2.3. Vibrato and tremolo

One of the specificities of the singing voice is its natural vibrato. Vibrato is a musical effect used to add expression. Nowadays vibrato is used by most musical instruments to add vocal-like qualities to instrumental music. The term vibrato is sometimes used inappropriately to qualify both frequency modulation (or pitch vibrato) and amplitude modulation (or intensity vibrato). Vibrato only refers to the periodic variation of pitch, whereas tremolo is the term to be used for the periodic variation of intensity. Only a few musical instruments can produce both types of modulation simultaneously. In wind and brass instruments the amplitude modulation is predominant. In string instruments the frequency modulation is predominant. In the voice both modulations occur at the same time, due to the mechanical aspects of the voice production system [11]. Vibrato and tremolo can each be described by two elements: their frequency (vibrato/tremolo rate) and their amplitude (vibrato/tremolo extent).
For the voice, the average rate is around 6 Hz and increases exponentially over the duration of a note event [12]. The average extent ranges from 0.6 to 2 semitones for singers and from 0.2 to 0.35 semitones for string players [13]. We exploit these particularities (average rate, average extent and the presence of both modulations) to discriminate the voice from all other musical instruments. We suppose that a partial corresponds to a singing sound if the extent values of its vibrato and tremolo are greater than given thresholds.

3. PROPOSED METHOD

We present here a method to detect voice using only vibrato and tremolo parameters. Frame analysis is first applied using a 40 ms window length with a 20 ms hop size. At each time t, sinusoidal components are extracted. Each component s is characterized by its frequency f_s(t), amplitude a_s(t) and phase φ_s(t). These values are then interpolated over time to create sinusoidal tracks, or partials, p_k(t) [14].

3.1. Selection of partials with vibrato

We consider a partial as vibrated if its extent around 6 Hz, Δf_{p_k}, is greater than a threshold τ_vib. Thus, to determine the set of partials with vibrato P_vib we extract Δf_{p_k} for each partial p_k. For each partial p_k(t) the frequency values f_{p_k}(t) are analyzed. The average frequency of p_k(t) is denoted by μ_{f_{p_k}}, and the Fourier transform of f_{p_k}(t) by F_{p_k}. The extent of the frequency modulation of a partial being proportional to its center frequency, we use relative values. Thus, for a partial existing from time t_i to t_j the Fourier transform is given by:

    F_{p_k}(f) = Σ_{t=t_i}^{t_j} (f_{p_k}(t) − μ_{f_{p_k}}) e^{−2iπft/L}

where L = t_j − t_i. The extent value in Hz is given by:

    Δf_{p_k}(f) = |F_{p_k}(f)| / L

Its relative value is given by:

    Δf_rel,p_k(f) = Δf_{p_k}(f) / μ_{f_{p_k}}

Finally, we determine the relative extent value around 6 Hz as follows:

    Δf_{p_k} = max_{f ∈ [4,8]} Δf_rel,p_k(f)
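The extent computation above can be sketched as follows. This is a hypothetical NumPy implementation, not the authors' code: the function name is invented, the 50 Hz trajectory sampling rate follows from the 20 ms hop size, and the exact DFT normalization constant is an assumption reconstructed from the formulas above.

```python
import numpy as np

FRAME_RATE = 50.0  # partial trajectories sampled every 20 ms (hop size)

def vibrato_extent(freqs):
    """Relative frequency-modulation extent of one partial around 6 Hz.

    `freqs` is the frequency trajectory f_pk(t) of the partial, in Hz,
    sampled at FRAME_RATE.  Returns the maximum of the relative extent
    over the 4-8 Hz modulation band.
    """
    freqs = np.asarray(freqs, dtype=float)
    mu = freqs.mean()                    # average frequency mu_{f_pk}
    L = len(freqs)
    spectrum = np.fft.rfft(freqs - mu)   # Fourier transform of the deviation
    extent_hz = np.abs(spectrum) / L     # extent value in Hz per bin
    extent_rel = extent_hz / mu          # relative to the center frequency
    mod_freqs = np.fft.rfftfreq(L, d=1.0 / FRAME_RATE)
    band = (mod_freqs >= 4.0) & (mod_freqs <= 8.0)
    return extent_rel[band].max()
```

Note that with this normalization a pure sinusoidal modulation of relative depth d yields a value of about d/2, since the DFT magnitude of a sinusoid of amplitude A over L samples is roughly A·L/2; any constant factor is absorbed into the learned threshold anyway.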
To convert this value to cents (100 cents = 1 semitone) we use the following formula:

    cent(Δf_{p_k}) = 1200 log2(Δf_{p_k} + 1)

Thus we have:

    p_k ∈ P_vib ⟺ Δf_{p_k} > τ_vib

3.2. Selection of partials with tremolo

To determine the set of partials with tremolo, P_trem, we extract the extent value of the tremolo using the amplitude values a_{p_k} instead of f_{p_k}. We denote by Δa_{p_k} the relative extent value. We have:

    p_k ∈ P_trem ⟺ Δa_{p_k} > τ_trem
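The cent conversion and the threshold tests can be combined into a small sketch (hypothetical helper names; the default 0.009 matches the threshold appearing in the experiments but is shown here for illustration only):

```python
import math

def extent_in_cents(relative_extent):
    """Convert a relative extent (Delta f / f) to cents,
    where 100 cents = 1 semitone: 1200 * log2(x + 1)."""
    return 1200.0 * math.log2(relative_extent + 1.0)

def has_vibrato(relative_extent, tau_vib=0.009):
    """A partial belongs to P_vib if its relative extent around 6 Hz
    exceeds the threshold tau_vib (default value illustrative)."""
    return relative_extent > tau_vib
```

A relative extent of 0.009 corresponds to roughly 15.5 cents, which is how the threshold is reported in the results section.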
3.3. Selection of singing partials

Let P_voice be the set of partials belonging to the voice. We introduce a second criterion based on the number of partials present at a time t. The selection is made as follows. Let p_k be a partial with a vibrato and a tremolo existing between times t_i and t_j:

    p_k ∈ P_vib ∩ P_trem and p_k(t) ≠ 0, ∀t ∈ [t_i, t_j]

Then p_k belongs to the voice if there exists another partial between t_i and t_j which also has a vibrato and a tremolo:

    p_k ∈ P_voice ⟺ ∃ p_l ∈ P_vib ∩ P_trem, ∀t ∈ [t_i, t_j], p_l(t) ≠ 0

3.4. Post-processing

To be closer to human segmentation we decide to eliminate non-vocal segments with a duration lower than 1 second. Thus a short non-vocal segment located between two vocal segments will be classified as vocal, as shown in figure 1.

Fig. 1. Segmentation into singing/non-singing parts, before and after post-processing, on the signal aline.wav.

4. EXPERIMENTAL RESULTS

4.1. Test set

The song database used to carry out this study is the one used in [10]. This database is composed of 90 songs selected for their variety in artists, languages, tempi and music genres. They constitute a representative sampling of commercial music. The sampling frequency of the songs is 44.1 kHz, stereo, with 16 bits per sample. Each file was annotated manually into singing and non-singing sections by the same person to provide the ground-truth data. Establishing where a singing segment starts and ends with certainty is problematic. The segmentation was done as a human would expect, i.e. small instrumental breaks of short duration were not labeled. The database is split into training (58 songs) and test (32 songs) sets. The whole set is well balanced, since 50.3% of frames belong to singing segments and 49.7% to non-singing segments. We note that all files from the database contain singing and none of them is a cappella music. Note that we are assuming that the signal is known to consist only of music and that the problem is locating singing within it. We are not concerned with the problem of distinguishing between singing voice and regular speech, nor music and speech.

4.2. Results

For this task we only consider the results given for the class "singing segment". The classification accuracy is calculated on all the frames of the test set with the F-measure. Figure 2 shows the precision and recall obtained when one threshold is fixed and the other one varies.

Fig. 2. Precision-recall curves of singing voice identification using the threshold method (one curve with the tremolo threshold fixed at 0.009 and the vibrato threshold varying, the other with the vibrato threshold fixed at 0.009 and the tremolo threshold varying).

For both thresholds, increasing a given threshold increases precision while decreasing recall. Thus we can choose the parameters depending on the task. For singer identification it is necessary to get a high precision, so high thresholds would fit. In general we would choose the thresholds to obtain good precision and good recall at the same time. The thresholds τ_vib and τ_trem are learnt on the training set. We obtain the best classification with both thresholds equal to 0.009, which is equivalent to a vibrato and tremolo extent of 15.5 cents. These results are shown in table 1.

                 Before filtering   After filtering
    F-measure        66.54%            76.83%
    Recall           61.69%            83.57%
    Precision        70.56%            71.09%

Table 1. Results for τ_vib = τ_trem = 0.009.

We compare these results with those obtained with a
machine learning approach. The features chosen are: MFCC, ΔMFCC, ΔΔMFCC, SFM (Spectral Flatness Measure), ΔSFM and ΔΔSFM. These features are normalized by their inter-quartile range (IQR). Features with an IQR equal to zero are deleted. The best 40 features are selected using a feature selection algorithm, in our case Inertia Ratio Maximization with Feature Space Projection (IRMFSP) [15]. Then, a linear discriminant analysis is applied on each class. Finally, each class is modeled with an eight-component GMM. The results are given in table 2. The results of both methods are very close.

                 Before filtering   After filtering
    F-measure        75.81%            77.4%
    Recall           75.8%             77.5%
    Precision        75.80%            77.4%

Table 2. Results with MFCC-based features and a GMM classifier.

We notice that the proposed method gives a better recall whereas the learning approach is more precise.

5. CONCLUSION

Conventional singing detection methods ignore the characteristics of the singing signal. We presented here a method to detect vocal segments within an audio track based only on the vibrato and tremolo parameters of partials. Using these, a partial is classified as belonging to a singing sound or not with a simple threshold method. On a large test set the best classification achieved was 76.8% (F-measure). The threshold method has been compared to a machine learning approach using MFCCs and their derivatives as features and a GMM as statistical classifier, leading to 77.4% F-measure. It is surprising that our simple approach leads to results very close to those of the more sophisticated machine-learning approach. Since both approaches are orthogonal and since they both lead to good results, it could be interesting to combine them. Future work will therefore concentrate on replacing the threshold method by a machine learning approach which could also model the spectral shape representation provided by the MFCCs.

6. ACKNOWLEDGEMENT

This work was partly realized as part of the Quaero Programme, funded by OSEO, French State agency for innovation. We thank M.
Ramona for sharing his annotated audio collection. We also thank our colleague D. Tardieu, who contributed to the contents and approach of this paper.

7. REFERENCES

[1] M. Rocamora and P. Herrera, "Comparing audio descriptors for singing voice detection in music audio files," Brazilian Symposium on Computer Music (SBCM), 2007.
[2] W.H. Tsai, H.M. Wang, and D. Rodgers, "Automatic singer identification of popular music recordings via estimation and modeling of solo vocal signal," in Eighth European Conference on Speech Communication and Technology, ISCA.
[3] H. Lukashevich, M. Gruhne, and C. Dittmar, "Effective singing voice detection in popular music using ARMA filtering," Workshop on Digital Audio Effects (DAFx-07), 2007.
[4] A.L. Berenzweig and D.P.W. Ellis, "Locating singing voice segments within music signals," IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2001.
[5] A. Berenzweig, D.P.W. Ellis, and S. Lawrence, "Using voice segments to improve artist classification of music," AES 22nd International Conference.
[6] Y.E. Kim and B. Whitman, "Singer identification in popular music recordings using voice coding features," in ISMIR.
[7] T. Zhang, "Automatic singer identification," IEEE International Conference on Multimedia and Expo (ICME 03), vol. 1.
[8] W. Chou and L. Gu, "Robust singing detection in speech/music discriminator design," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 01), vol. 2.
[9] T.L. Nwe and H. Li, "Singing voice detection using perceptually-motivated features," in ACM Multimedia, 2007.
[10] M. Ramona, G. Richard, and B. David, "Vocal detection in music with support vector machines," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 08), March 2008.
[11] V. Verfaille, C. Guastavino, and P. Depalle, "Perceptual evaluation of vibrato models," Proceedings of the Conference on Interdisciplinary Musicology.
[12] E. Prame, "Measurements of the vibrato rate of ten singers," The Journal of the Acoustical Society of America, vol. 96, pp. 1979.
[13] R. Timmers and P. Desain, "Vibrato: Questions and answers from musicians and science," Proc. ICMPC.
[14] A. Roebel, "Frequency-slope estimation and its application to parameter estimation for non-stationary sinusoids," Computer Music Journal.
[15] G. Peeters, "Automatic classification of large musical instrument databases using hierarchical classifiers with inertia ratio maximization," 115th AES Convention, New York, USA, October 2003.
Advanced Speech-Audio Processing in Mobile Phones and Hearing Aids Synergies and Distinctions Peter Vary RWTH Aachen University Institute of Communication Systems WASPAA, October 23, 2013 Mohonk Mountain
The Computer Music Tutorial
Curtis Roads with John Strawn, Curtis Abbott, John Gordon, and Philip Greenspun The Computer Music Tutorial Corrections and Revisions The MIT Press Cambridge, Massachusetts London, England 2 Corrections
On the beam deflection method applied to ultrasound absorption measurements
On the beam deflection method applied to ultrasound absorption measurements K. Giese To cite this version: K. Giese. On the beam deflection method applied to ultrasound absorption measurements. Journal
Music Mood Classification
Music Mood Classification CS 229 Project Report Jose Padial Ashish Goel Introduction The aim of the project was to develop a music mood classifier. There are many categories of mood into which songs may
Monophonic Music Recognition
Monophonic Music Recognition Per Weijnitz Speech Technology 5p [email protected] 5th March 2003 Abstract This report describes an experimental monophonic music recognition system, carried out
HOG AND SUBBAND POWER DISTRIBUTION IMAGE FEATURES FOR ACOUSTIC SCENE CLASSIFICATION. Victor Bisot, Slim Essid, Gaël Richard
HOG AND SUBBAND POWER DISTRIBUTION IMAGE FEATURES FOR ACOUSTIC SCENE CLASSIFICATION Victor Bisot, Slim Essid, Gaël Richard Institut Mines-Télécom, Télécom ParisTech, CNRS LTCI, 37-39 rue Dareau, 75014
Musical Literacy. Clarifying Objectives. Musical Response
North Carolina s Kindergarten Music Note on Numbering/Strands: ML Musical Literacy, MR Musical Response, CR Contextual Relevancy Musical Literacy K.ML.1 Apply the elements of music and musical techniques
Adding Sinusoids of the Same Frequency. Additive Synthesis. Spectrum. Music 270a: Modulation
Adding Sinusoids of the Same Frequency Music 7a: Modulation Tamara Smyth, [email protected] Department of Music, University of California, San Diego (UCSD) February 9, 5 Recall, that adding sinusoids of
Overview of model-building strategies in population PK/PD analyses: 2002-2004 literature survey.
Overview of model-building strategies in population PK/PD analyses: 2002-2004 literature survey. Céline Dartois, Karl Brendel, Emmanuelle Comets, Céline Laffont, Christian Laveille, Brigitte Tranchand,
LOW COST HARDWARE IMPLEMENTATION FOR DIGITAL HEARING AID USING
LOW COST HARDWARE IMPLEMENTATION FOR DIGITAL HEARING AID USING RasPi Kaveri Ratanpara 1, Priyan Shah 2 1 Student, M.E Biomedical Engineering, Government Engineering college, Sector-28, Gandhinagar (Gujarat)-382028,
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic
Use of tabletop exercise in industrial training disaster.
Use of tabletop exercise in industrial training disaster. Alexis Descatha, Thomas Loeb, François Dolveck, Nathalie-Sybille Goddet, Valerie Poirier, Michel Baer To cite this version: Alexis Descatha, Thomas
ANALYSIS OF SNOEK-KOSTER (H) RELAXATION IN IRON
ANALYSIS OF SNOEK-KOSTER (H) RELAXATION IN IRON J. San Juan, G. Fantozzi, M. No, C. Esnouf, F. Vanoni To cite this version: J. San Juan, G. Fantozzi, M. No, C. Esnouf, F. Vanoni. ANALYSIS OF SNOEK-KOSTER
Semantic Analysis of Song Lyrics
Semantic Analysis of Song Lyrics Beth Logan, Andrew Kositsky 1, Pedro Moreno Cambridge Research Laboratory HP Laboratories Cambridge HPL-2004-66 April 14, 2004* E-mail: [email protected], [email protected]
FAST MIR IN A SPARSE TRANSFORM DOMAIN
ISMIR 28 Session 4c Automatic Music Analysis and Transcription FAST MIR IN A SPARSE TRANSFORM DOMAIN Emmanuel Ravelli Université Paris 6 [email protected] Gaël Richard TELECOM ParisTech [email protected]
arxiv:1506.04135v1 [cs.ir] 12 Jun 2015
Reducing offline evaluation bias of collaborative filtering algorithms Arnaud de Myttenaere 1,2, Boris Golden 1, Bénédicte Le Grand 3 & Fabrice Rossi 2 arxiv:1506.04135v1 [cs.ir] 12 Jun 2015 1 - Viadeo
Less naive Bayes spam detection
Less naive Bayes spam detection Hongming Yang Eindhoven University of Technology Dept. EE, Rm PT 3.27, P.O.Box 53, 5600MB Eindhoven The Netherlands. E-mail:[email protected] also CoSiNe Connectivity Systems
