Data driven design of filter bank for speech recognition


Lukáš Burget (1,2) and Hynek Heřmanský (2,3)

1 Oregon Graduate Institute, Anthropic Signal Processing Group, 2 NW Walker Rd., Beaverton, Oregon, USA
2 International Computer Science Institute, 1947 Center Street, Suite 6, Berkeley, CA, USA
3 Brno University of Technology, Institute of Radioelectronics, Purkyňova 118, 612 Brno, Czech Republic

Abstract. The filter bank approach is commonly used in the feature extraction phase of speech recognition (e.g. Mel frequency cepstral coefficients). The filter bank is applied to modify the magnitude spectrum according to physiological and psychological findings. However, since the mechanism of the human auditory system is not fully understood, the optimal filter bank parameters are not known. This work presents a method in which a filter bank optimized for discriminability between phonemes is derived directly from phonetically labeled speech data using Linear Discriminant Analysis. The work can be seen as further evidence that incorporating psychoacoustic findings into feature extraction can lead to better recognition performance.

1 Introduction

Feature extraction is an important part of the speech recognition process, in which the input waveform is processed for the subsequent pattern classification. While classification is usually based on stochastic approaches, with models trained on data, feature extraction is generally based on knowledge and beliefs. Current feature extraction methods are mostly based on the short-term Fourier spectrum and its changes in time. Auditory-like modifications inspired by physiological and psychological findings are applied to the spectrum of each speech frame in the sequence. Mel frequency cepstral coefficients [2] are a commonly used feature extraction method in which spectral energies are integrated by a set of band-limited triangular weighting functions (a filter bank). These weighting functions are distributed equidistantly over the mel scale, following psychoacoustic findings that better spectral resolution is preserved at lower frequencies than at higher frequencies. The logarithm of the integrated spectral energies is taken (which corresponds to human perception of loudness), and finally a projection onto cosine bases is performed.
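To make the conventional pipeline concrete, the following is a minimal sketch of an MFCC-style filter bank (not the data-driven method of this paper): triangular weighting functions spaced equidistantly on the mel scale, log compression, and a projection onto cosine bases. The filter count, number of spectral bins, and sampling rate are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def mel(f_hz):
    """Convert frequency in Hz to the mel scale."""
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def inv_mel(m):
    """Convert mel back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters=23, n_bins=129, sample_rate=8000):
    """Triangular weighting functions equidistant on the mel scale.

    Returns an (n_filters, n_bins) matrix; each row integrates one band
    of the magnitude spectrum.
    """
    nyquist = sample_rate / 2.0
    # Filter edge frequencies: equidistant in mel, converted back to Hz.
    mel_edges = np.linspace(0.0, mel(nyquist), n_filters + 2)
    hz_edges = inv_mel(mel_edges)
    bin_freqs = np.linspace(0.0, nyquist, n_bins)

    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, center, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (bin_freqs - lo) / (center - lo)
        falling = (hi - bin_freqs) / (hi - center)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def mfcc_like_features(magnitude_spectrum, filter_bank, n_ceps=13):
    """Integrate spectral energies, take the log, project onto cosine bases."""
    energies = filter_bank @ magnitude_spectrum      # band integration
    log_energies = np.log(energies + 1e-10)          # loudness-like compression
    n_filters = filter_bank.shape[0]
    # DCT-II bases (the "projection to cosine bases").
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    dct_bases = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return dct_bases @ log_energies
```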

However, since the mechanism of the human auditory system is not fully understood, the optimal system for feature extraction is not known. Moreover, psychoacoustic findings often describe limitations of the human auditory system, and it is not clear whether modeling those limitations is useful for speech recognition. This work presents a method in which the filter bank is derived directly from phonetically labeled speech data. Both the frequency warping and the shapes of the individual weighting functions of the filter bank are obtained as results of this method.

2 Linear Discriminant Analysis

The method is based on Linear Discriminant Analysis (LDA), whose use for speech was proposed by Hunt [3]. LDA looks for a linear transform that allows dimension reduction of the input data while preserving the information important for linear discrimination among input vectors belonging to different classes. The output of LDA is a set of linearly independent vectors which form the bases of a linear transform and which are sorted by their importance for discrimination among the classes. Since the importance of each base vector is known, we can pick only the first few bases, which preserve almost all of the variability in the data that is important for discriminability. In other words, the resulting transformation matrix contains only the first few columns of the matrix obtained by LDA.

Fig. 1. Linear discriminant analysis for 2-dimensional data (axes x, y: original coordinates; z: direction found by LDA; classes C1, C2 with mean vectors m1, m2).

Figure 1 demonstrates the effect of LDA on 2-dimensional data vectors belonging to two classes. The grey and the empty ellipses represent the distributions of the two classes C1 and C2 with mean vectors m1 and m2. The axes x and y are the coordinates of the original space; a large overlap of the class distributions can be seen along both of these original coordinates. The axis z shows the direction obtained by LDA: the classes are well separated after projection onto this direction. Since this example deals with just two classes, and since LDA assumes that the distributions of all classes are Gaussian with the same covariance matrix, no other direction with better discrimination can be obtained.

The base vectors of the LDA transform are given by the eigenvectors of the matrix Σ_wc^{-1} Σ_ac. The within-class covariance matrix Σ_wc represents the unwanted variability in the data and is computed as the weighted mean of the class covariance matrices:

Σ_wc = E[Σ_p],

where Σ_p is the covariance matrix of a particular class. The across-class covariance matrix Σ_ac represents the wanted variability in the data and is computed as an estimate of the covariance of the class mean vectors:

Σ_ac = E[(µ_p − µ)(µ_p − µ)^T],

where µ_p is the mean vector of a particular class and µ is the global mean vector. The eigenvalue associated with an eigenvector represents the amount of variability (necessary for discriminability) preserved by projecting the input vectors onto that eigenvector. If LDA is to be used for dimension reduction, only the few eigenvectors corresponding to the highest eigenvalues are used.

3 Filter bank derived from data

The filter bank is derived directly from phonetically labeled speech data using the LDA described in the previous section. In this case, the magnitude Fourier spectra of all training data frames are used directly to compute the across-class and within-class covariance matrices. In our speech recognition task we want to distinguish between different phonemes, so the spectra of speech frames labeled with the same phoneme belong to one class. Examples of across-class and within-class covariance matrices derived this way from the TIMIT database are shown in figure 2. Half of the symmetric magnitude spectrum (129 points) was used as the input vector for deriving these covariance matrices. Figure 3 shows the first 5 LDA spectral bases given by the eigenvectors of the matrix Σ_wc^{-1} Σ_ac. The eigenvalues in figure 3a indicate that almost all variability in the data important for class separability is preserved by the projection onto only the first few base vectors. The linear transform is performed by multiplying an input vector by a matrix M whose columns are the base vectors. In our case we chose only the first 13 base vectors, so the transformation matrix M has 129 rows and 13 columns.
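The following sketch (illustrative only; function and variable names are not from the paper) shows how such LDA spectral bases could be estimated from phonetically labeled magnitude spectra, following the definitions of Σ_wc and Σ_ac above. It assumes each phoneme class contains enough frames for Σ_wc to be non-singular.

```python
import numpy as np

def lda_spectral_bases(spectra, labels, n_bases=13):
    """Derive LDA bases from labeled magnitude spectra.

    spectra : (n_frames, n_bins) array of half magnitude spectra (e.g. 129 bins)
    labels  : (n_frames,) array of phoneme labels, one per frame
    Returns an (n_bins, n_bases) matrix M whose columns are the leading LDA bases.
    """
    classes = np.unique(labels)
    global_mean = spectra.mean(axis=0)
    n_bins = spectra.shape[1]

    sigma_wc = np.zeros((n_bins, n_bins))   # within-class covariance, E[Sigma_p]
    sigma_ac = np.zeros((n_bins, n_bins))   # across-class covariance of class means
    for c in classes:
        frames = spectra[labels == c]
        weight = len(frames) / len(spectra)           # class prior as weight
        sigma_wc += weight * np.cov(frames, rowvar=False)
        diff = frames.mean(axis=0) - global_mean
        sigma_ac += weight * np.outer(diff, diff)

    # Eigenvectors of Sigma_wc^{-1} Sigma_ac, sorted by decreasing eigenvalue.
    # The product is not symmetric, so eig() may return tiny imaginary parts;
    # only the real parts are kept.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(sigma_wc, sigma_ac))
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs.real[:, order[:n_bases]]
```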

Fig. 2. Across-class and within-class covariance matrices computed from the magnitude spectrum.

Fig. 3. Bases derived using LDA from the magnitude spectrum: a) eigenvalues, b)–f) 1st–5th eigenvectors.

3.1 Smoothing of speech spectra

Projecting the magnitude spectrum of one speech frame onto these selected bases results in a new vector (13 points) which should contain almost the same information relevant for correct recognition as the original spectrum. Since the base vectors are linearly independent, it is possible to obtain another transform which projects the reduced vector back into the original space, i.e. a spectrum 129 points long. This transform is given by the pseudoinverse transform matrix M^{-1}. The final transform is obtained by joining (multiplying) both matrices, M M^{-1}. It projects the magnitude spectrum onto its smoothed version, in which the information useless for discriminating among phonemes is removed. Each column of the final transformation matrix represents a weighting function integrating the band of frequencies around the point corresponding to the index of that column. Every 5th of these weighting functions is shown in figure 4a. The resulting weighting functions integrating lower frequencies are very narrow (integrating only a few points of the spectrum and preserving more detail), while the functions integrating higher frequencies are much wider. This also agrees with psychoacoustic findings about human frequency resolution.

3.2 Deriving the filter bank

It is also possible to derive a frequency warping by measuring and integrating the bandwidths (widths) of consecutive weighting functions (figures 4b and 4c). The smoothed spectrum can be represented by selecting only some of its samples without losing any information. This means that we can pick only a few of the weighting functions and project the original spectrum onto them; their selection must be done according to the derived warping. This way we end up with a set of weighting functions which is very similar to the commonly used Mel filter bank (figure 4d). A sketch of this construction follows below.

4 Limitations of the method and conclusions

Our experience shows that recognizers based on feature extraction inspired by psychoacoustic findings about nonuniform human frequency resolution can perform better than those based on the raw short-term Fourier spectrum. This work can be seen as further evidence that incorporating those psychoacoustic findings into feature extraction leads to better separability among phonemes in a low-dimensional feature space and also to better recognition performance. However, the LDA technique assumes that the data belonging to the individual classes share the same Gaussian distribution and that the class means are also Gaussian distributed. This is certainly not true for magnitude spectra of speech. The quest for the optimal filter bank for speech recognition therefore remains open.
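As an illustration of sections 3.1 and 3.2, the sketch below (again illustrative, not the authors' code) builds the smoothing transform M M^{-1} from the selected LDA bases and picks a subset of its weighting functions according to a warping curve. The bandwidth measure and the equidistant sampling along the cumulative bandwidth are plausible assumptions, not necessarily the exact procedure used in the paper.

```python
import numpy as np

def smoothing_transform(M):
    """Build the n_bins x n_bins smoothing transform M M^{-1}.

    M : (n_bins, n_bases) matrix of selected LDA base vectors (e.g. 129 x 13).
    Each column of the result is a weighting function centred on the
    corresponding spectral point.
    """
    return M @ np.linalg.pinv(M)   # pseudoinverse plays the role of M^{-1}

def smooth_spectrum(spectrum, M):
    """Project a magnitude spectrum onto its smoothed version."""
    return smoothing_transform(M) @ spectrum

def weighting_bandwidths(T):
    """Rough bandwidth of each weighting function (column of T).

    Bandwidth is taken here as the number of bins above half of the column's
    peak magnitude -- one possible choice of width measure.
    """
    A = np.abs(T)
    return (A > 0.5 * A.max(axis=0)).sum(axis=0)

def select_filter_bank(T, n_filters=23):
    """Pick a subset of weighting functions according to the derived warping.

    The cumulative sum of bandwidths acts as the warping curve; columns are
    chosen at equidistant steps along it, so narrow low-frequency functions
    are sampled densely and wide high-frequency ones sparsely.
    """
    warp = np.cumsum(weighting_bandwidths(T).astype(float))
    targets = np.linspace(warp[0], warp[-1], n_filters)
    centers = np.searchsorted(warp, targets)
    return T[:, np.clip(centers, 0, T.shape[1] - 1)].T   # (n_filters, n_bins)
```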

Fig. 4. Filter bank and warping derived using LDA: a) every 5th weighting function, b) bandwidths of the weighting functions, c) estimated warping of the spectrum (129 points ~ 4 kHz), d) derived filter bank.

References

1. B. Gold and N. Morgan: Speech and Audio Signal Processing. New York, 2000.
2. S. B. Davis and P. Mermelstein: Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. on Acoustics, Speech & Signal Processing, vol. 28, no. 4, pp. 357-366, 1980.
3. M. J. Hunt: A statistical approach to metrics for word and syllable recognition. J. Acoust. Soc. Am., vol. 66(S1), S35(A), 1979.
4. N. Malayath: Data-Driven Methods for Extracting Features from Speech. Ph.D. thesis, Oregon Graduate Institute, Portland, USA.
5. H. Hermansky and N. Malayath: Spectral basis functions from discriminant analysis. In Proceedings of ICSLP 98, Sydney, Australia, November 1998.
6. L. Rabiner and B. H. Juang: Fundamentals of Speech Recognition. Prentice Hall, Englewood Cliffs, NJ, 1993.
7. S. Young: The HTK Book. Entropic Ltd., 1999.
