Data driven design of filter bank for speech recognition
Lukáš Burget 1,2 and Hynek Heřmanský 2,3

1 Oregon Graduate Institute, Anthropic Signal Processing Group, 2 NW Walker Rd., Beaverton, Oregon, USA, {hynek,lukas}@ece.ogi.edu
2 International Computer Science Institute, 1947 Center Street Suite 6, Berkeley, CA, USA, hynek@icsi.berkeley.edu
3 Brno Univ. of Technology, Inst. of Radioelectronics, Purkyňova 118, 612, Brno, Czech Republic, burget@urel.fee.vutbr.cz

Abstract. The filter bank approach is commonly used in the feature extraction phase of speech recognition (e.g. Mel frequency cepstral coefficients). The filter bank modifies the magnitude spectrum according to physiological and psychological findings. However, since the mechanism of the human auditory system is not fully understood, the optimal filter bank parameters are not known. This work presents a method in which the filter bank, optimized for discriminability between phonemes, is derived directly from phonetically labeled speech data using Linear Discriminant Analysis. The work can be seen as further evidence that incorporating psychoacoustic findings into feature extraction can lead to better recognition performance.

1 Introduction

Feature extraction is an important part of the speech recognition process, in which the input waveform is processed for the subsequent pattern classification. While classification is usually based on stochastic approaches in which models are trained on data, feature extraction is generally based on knowledge and beliefs. Current feature extraction methods are mostly based on the short-term Fourier spectrum and its changes over time. Auditory-like modifications inspired by physiological and psychological findings are applied to the spectrum of each speech frame in the sequence. Mel frequency cepstral coefficients [2] are a commonly used feature extraction method in which the energies in the spectrum are integrated by a set of band-limited triangular weighting functions (a filter bank).
These weighting functions are spaced equidistantly on the mel scale, in accordance with psychoacoustic findings, so that better spectral resolution is preserved at lower frequencies than at higher frequencies. The logarithm of the integrated spectral energies is taken (which corresponds to the human perception of loudness), and finally a projection onto cosine bases is performed.

However, since the mechanism of the human auditory system is not fully understood, the optimal system for feature extraction is not known. Moreover, psychoacoustic findings often describe limitations of the human auditory system, and we do not know whether modeling those limitations is useful for speech recognition.

This work presents a method in which the filter bank is derived directly from phonetically labeled speech data. The method yields both the frequency warping and the shape of the individual weighting functions of the filter bank.

2 Linear Discriminant Analysis

The method is based on Linear Discriminant Analysis (LDA) as proposed by Hunt [3]. LDA seeks a linear transform that reduces the dimension of the input data while preserving the information important for linear discrimination among input vectors belonging to different classes. The output of LDA is a set of linearly independent vectors which form the basis of a linear transform and which are sorted by their importance for discrimination among the classes. Because the importance of each basis vector is known, we can keep only the first few basis vectors, which preserve almost all of the variability in the data that is important for discriminability. In other words, the resulting transformation matrix contains only the first few columns of the matrix obtained by LDA.

Fig. 1. Linear discriminant analysis for 2-dimensional data (axes x and y; classes C1 and C2 with means m1 and m2; LDA direction z)
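The two-class situation sketched in Fig. 1 can be checked numerically. The following is only a minimal sketch with hypothetical numbers (the means, the shared within-class covariance and the helper name `fisher_ratio` are assumptions made for illustration, not values from the paper). It compares how well the class means are separated, relative to the within-class spread, along x, along y, and along the direction obtained by applying the inverse within-class covariance to the difference of the means, which is the LDA direction for two classes with a shared covariance.

```python
import numpy as np

def fisher_ratio(direction, mean_diff, s_wc):
    """Squared distance of the projected class means divided by the
    projected within-class variance along the given direction."""
    v = np.asarray(direction, dtype=float)
    return float((v @ mean_diff) ** 2 / (v @ s_wc @ v))

# Hypothetical numbers, chosen only to mimic the elongated,
# overlapping ellipses of Fig. 1 (not taken from the paper).
m1 = np.array([-0.5, 0.5])
m2 = np.array([0.5, -0.5])
s_wc = np.array([[2.0, 1.8],
                 [1.8, 2.0]])   # shared within-class covariance

mean_diff = m1 - m2
# Two-class LDA direction: inverse within-class covariance times mean difference.
z = np.linalg.solve(s_wc, mean_diff)
z /= np.linalg.norm(z)

for name, direction in [("x", [1.0, 0.0]), ("y", [0.0, 1.0]), ("z", z)]:
    print(name, fisher_ratio(direction, mean_diff, s_wc))
```

With these assumed numbers the ratio is 0.5 along each original axis and roughly 10 along z, which is the kind of improvement the figure illustrates.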
Figure 1 demonstrates the effect of LDA for 2-dimensional data vectors belonging to two classes. The grey and the empty ellipses represent the data distributions of two different classes $C_1$ and $C_2$ with mean vectors $m_1$ and $m_2$. The axes X and Y are the coordinates of the original space. A large overlap of the class distributions can be seen along both of these original coordinates. The axis Z shows the direction obtained by LDA. The classes are well separated after projection onto this direction. Since this example deals with only two classes, and since LDA assumes that the distributions of all classes are Gaussian with the same covariance matrix, no other direction gives better discrimination.

The basis vectors of the LDA transform are given by the eigenvectors of the matrix $\Sigma_{wc}^{-1} \Sigma_{ac}$. The within-class covariance matrix $\Sigma_{wc}$ represents the unwanted variability in the data and is computed as the weighted mean of the covariance matrices of the classes:

$\Sigma_{wc} = E[\Sigma_p]$

where $\Sigma_p$ is the covariance matrix of a particular class. The across-class covariance matrix $\Sigma_{ac}$ represents the wanted variability in the data and is computed as an estimate of the covariance matrix of the class mean vectors:

$\Sigma_{ac} = E[(\mu_p - \mu)(\mu_p - \mu)^T]$

where $\mu_p$ is the mean vector of a particular class and $\mu$ is the global mean vector. The eigenvalue associated with an eigenvector represents the amount of variability (necessary for discriminability) preserved by projecting the input vectors onto that eigenvector. If LDA is to be used for dimension reduction, only the eigenvectors corresponding to the largest eigenvalues are kept.

3 Filter bank derived from data

The filter bank is derived directly from phonetically labeled speech data using the LDA described in the previous section. In this case the magnitude Fourier spectra of all training data frames are used directly to compute the across-class and within-class covariance matrices.
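The computation described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function name `lda_basis` is hypothetical, the expectations are taken with class relative frequencies as weights (one reading of "weighted mean"), and the eigenvectors are obtained from a general eigensolver applied to the product of the inverse within-class matrix and the across-class matrix.

```python
import numpy as np

def lda_basis(spectra, labels):
    """Return (eigenvalues, eigenvectors) of inv(S_wc) @ S_ac,
    sorted by decreasing eigenvalue; eigenvectors are the columns."""
    spectra = np.asarray(spectra, dtype=float)   # shape (n_frames, n_bins)
    labels = np.asarray(labels)
    n_frames, n_bins = spectra.shape
    global_mean = spectra.mean(axis=0)

    s_wc = np.zeros((n_bins, n_bins))   # within-class covariance
    s_ac = np.zeros((n_bins, n_bins))   # across-class covariance
    for phoneme in np.unique(labels):
        frames = spectra[labels == phoneme]
        prior = len(frames) / n_frames           # assumed weighting
        class_mean = frames.mean(axis=0)
        centered = frames - class_mean
        s_wc += prior * (centered.T @ centered) / len(frames)
        diff = (class_mean - global_mean)[:, None]
        s_ac += prior * (diff @ diff.T)

    # Eigen-decomposition of inv(S_wc) @ S_ac without forming the inverse.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(s_wc, s_ac))
    order = np.argsort(eigvals.real)[::-1]
    # The eigenvalues are real in theory; drop round-off imaginary parts.
    return eigvals.real[order], eigvecs.real[:, order]
```

For real speech data `spectra` would hold one 129-point magnitude spectrum per row and `labels` the phoneme label of each frame. With many frequency bins the within-class matrix can be poorly conditioned, in which case a symmetric generalized eigensolver or a small diagonal regularization may be preferable.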
In our speech recognition task, we want to distinguish between different phonemes. Spectra representing speech frames labeled with the same phoneme belong to one class. Examples of across-class and within-class covariance matrices derived in this way from speech data from the TIMIT database are shown in figure 2. One half of the symmetric magnitude spectrum (129 points) was used as the input vector for deriving these covariance matrices. Figure 3 shows the first 5 LDA spectral bases given by the eigenvectors of the matrix $\Sigma_{wc}^{-1} \Sigma_{ac}$. The eigenvalues in figure 3a indicate that almost all of the variability in the data that is important for class separability is preserved by projection onto only the first few basis vectors. The linear transform is performed by multiplying an input vector by a matrix $M$ whose columns are the basis vectors. In our case, we choose only the first 13 basis vectors, so the transform matrix $M$ has 129 rows and 13 columns.
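The dimensions involved can be made concrete with a short sketch. Several things here are assumptions for illustration only: a 256-point FFT (which yields the 129-point half spectrum mentioned above), the helper names, the absence of any windowing, and above all the matrix `m_placeholder`, which is random and merely stands in for the 13 leading LDA basis vectors.

```python
import numpy as np

N_FFT = 256                  # assumed FFT length
N_BINS = N_FFT // 2 + 1      # 129 points of the half spectrum
N_BASES = 13                 # number of basis vectors kept

def half_magnitude_spectrum(frame):
    """Magnitude of the non-redundant half of the Fourier spectrum."""
    return np.abs(np.fft.rfft(frame, n=N_FFT))

def project(spectrum, m):
    """Project a 129-point spectrum onto the columns of m (129 x 13)."""
    return spectrum @ m

rng = np.random.default_rng(0)
# Placeholder only: a real M would hold the leading LDA eigenvectors.
m_placeholder = rng.standard_normal((N_BINS, N_BASES))
frame = rng.standard_normal(N_FFT)    # stand-in for one speech frame

features = project(half_magnitude_spectrum(frame), m_placeholder)
print(features.shape)
```

The row-vector convention used here (spectrum times matrix) matches the description of M as a 129 by 13 matrix with basis vectors in its columns.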
Fig. 2. Across-class and within-class covariance matrix computed from magnitude spectrum

Fig. 3. Basis derived using LDA from magnitude spectrum: a) eigenvalues, b) 1st eigenvector, c) 2nd eigenvector, d) 3rd eigenvector, e) 4th eigenvector, f) 5th eigenvector
3.1 Smoothing of speech spectra

Projecting the magnitude spectrum of one speech frame onto the selected basis vectors yields a new vector (13 points) which should contain almost the same information for correct recognition as the original spectrum. Since the basis vectors are linearly independent, it is possible to obtain another transform which projects the reduced vector back into the original space, that is, into a spectrum 129 points long. This transform is given by the pseudoinverse of the transform matrix, denoted $M^{-1}$. The final transform is obtained by multiplying the two matrices, $M M^{-1}$. It projects the magnitude spectrum onto a smoothed version of itself from which the information useless for discriminating among phonemes has been removed. Each column of the final transformation matrix represents a weighting function which integrates a band of frequencies around the point corresponding to the index of that column. Every fifth weighting function is shown in figure 4a. The resulting weighting functions for the lower frequencies are very narrow (integrating only a few points of the spectrum and preserving more detail), while the functions integrating higher frequencies are much wider. This also agrees with psychoacoustic findings about human frequency resolution.

3.2 Deriving of filter bank

It is also possible to derive a frequency warping by measuring and integrating the bandwidths (widths) of consecutive weighting functions (figures 4b and 4c). The smoothed spectrum can be represented by only some of its samples without losing any information. This means that we can pick only a few of the weighting functions and project the original spectrum onto them. Their selection must follow the derived warping. In this way we end up with a set of weighting functions which is very similar to the commonly used Mel filter bank (figure 4d).
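The smoothing and warping steps can be sketched as follows. The smoothing matrix follows directly from the text (M times its pseudoinverse). Everything else is an assumption made for illustration, since the text does not state the exact procedure: bandwidth is measured crudely as the number of points at or above half of the peak magnitude of each weighting function, the warping is taken as the normalized running sum of reciprocal bandwidths (so that narrow functions occupy more of the warped axis), and filters are picked at equal steps on that warped axis. All function names are hypothetical.

```python
import numpy as np

def smoothing_matrix(m):
    """M times its pseudoinverse: projects a spectrum (as a row vector)
    onto its smoothed version in the span of the kept basis vectors."""
    return m @ np.linalg.pinv(m)

def half_max_bandwidths(s):
    """Crude width of each column of s: the number of points whose
    magnitude reaches at least half of that column's peak."""
    mags = np.abs(s)
    return (mags >= 0.5 * mags.max(axis=0)).sum(axis=0)

def estimate_warping(bandwidths):
    """Assumed warping: running sum of 1/bandwidth, scaled to end at 1."""
    warp = np.cumsum(1.0 / bandwidths)
    return warp / warp[-1]

def select_filters(warp, n_filters):
    """Indices of weighting functions at equal steps on the warped axis."""
    targets = np.linspace(warp[0], warp[-1], n_filters)
    return np.searchsorted(warp, targets)
```

Given a 129 by 13 matrix `m`, a spectrum `x` is smoothed as `x @ smoothing_matrix(m)`, and `select_filters(estimate_warping(half_max_bandwidths(s)), n)` returns the column indices of `s` that would form an `n`-band filter bank under these assumptions. Counting half-maximum points ignores side lobes, so a more careful bandwidth measure may be needed on real data.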
4 Limitations of the method and conclusions

Our experience shows that recognizers based on feature extraction inspired by psychoacoustic findings about the nonuniform frequency resolution of human hearing can perform better than those based on the pure short-term Fourier spectrum. This work can be seen as further evidence that incorporating those psychoacoustic findings into feature extraction leads to better separability among phonemes in a low-dimensional feature space and also to better recognition performance. However, LDA assumes that the data of each class are Gaussian distributed with the same covariance matrix and that the class means also follow a Gaussian distribution. This is, of course, not true for magnitude spectra of speech. The quest for the optimal filter bank for speech recognition is therefore still open.
Fig. 4. Filter bank and warping derived using LDA: a) every 5th weighting function, b) bandwidth of weighting functions, c) estimated warping of spectrum (axis labels: warped spectrum; 129 ~ 4kHz), d) derived filter bank

References

1. B. Gold and N. Morgan. Speech and Audio Signal Processing. New York.
2. S. B. Davis and P. Mermelstein. Comparison of parametric representation for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. on Acoustics, Speech & Signal Processing, vol. 28, no. 4.
3. M. J. Hunt. A statistical approach to metrics for word and syllable recognition. J. Acoust. Soc. Am., vol. 66(S1), S35(A).
4. N. Malayath. Data-Driven Methods for Extracting Features from Speech. Ph.D. thesis, Oregon Graduate Institute, Portland, USA.
5. H. Hermansky and N. Malayath. Spectral Basis Functions from Discriminant Analysis. In Proceedings ICSLP 98, Sydney, Australia, November.
6. L. Rabiner and B. H. Juang. Fundamentals of Speech Recognition. Signal Processing. Prentice Hall, Englewood Cliffs, NJ.
7. S. Young. The HTK Book. Entropics Ltd., 1999.
Speech Recognition System of Arabic Alphabet Based on a Telephony Arabic Corpus Yousef Ajami Alotaibi 1, Mansour Alghamdi 2, and Fahad Alotaiby 3 1 Computer Engineering Department, King Saud University,
More informationProbability and Random Variables. Generation of random variables (r.v.)
Probability and Random Variables Method for generating random variables with a specified probability distribution function. Gaussian And Markov Processes Characterization of Stationary Random Process Linearly
More informationFigure1. Acoustic feedback in packet based video conferencing system
Real-Time Howling Detection for Hands-Free Video Conferencing System Mi Suk Lee and Do Young Kim Future Internet Research Department ETRI, Daejeon, Korea {lms, dyk}@etri.re.kr Abstract: This paper presents
More informationThe Algorithms of Speech Recognition, Programming and Simulating in MATLAB
FACULTY OF ENGINEERING AND SUSTAINABLE DEVELOPMENT. The Algorithms of Speech Recognition, Programming and Simulating in MATLAB Tingxiao Yang January 2012 Bachelor s Thesis in Electronics Bachelor s Program
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
More informationWhat Audio Engineers Should Know About Human Sound Perception. Part 2. Binaural Effects and Spatial Hearing
What Audio Engineers Should Know About Human Sound Perception Part 2. Binaural Effects and Spatial Hearing AES 112 th Convention, Munich AES 113 th Convention, Los Angeles Durand R. Begault Human Factors
More informationPractical Design of Filter Banks for Automatic Music Transcription
Practical Design of Filter Banks for Automatic Music Transcription Filipe C. da C. B. Diniz, Luiz W. P. Biscainho, and Sergio L. Netto Federal University of Rio de Janeiro PEE-COPPE & DEL-Poli, POBox 6854,
More information2x + y = 3. Since the second equation is precisely the same as the first equation, it is enough to find x and y satisfying the system
1. Systems of linear equations We are interested in the solutions to systems of linear equations. A linear equation is of the form 3x 5y + 2z + w = 3. The key thing is that we don t multiply the variables
More informationLecture 4: Jan 12, 2005
EE516 Computer Speech Processing Winter 2005 Lecture 4: Jan 12, 2005 Lecturer: Prof: J. Bilmes University of Washington Dept. of Electrical Engineering Scribe: Scott Philips
More informationDecember 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS
December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B KITCHENS The equation 1 Lines in two-dimensional space (1) 2x y = 3 describes a line in two-dimensional space The coefficients of x and y in the equation
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
More informationEnvironmental Remote Sensing GEOG 2021
Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class
More informationModelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic
More informationFrom Concept to Production in Secure Voice Communications
From Concept to Production in Secure Voice Communications Earl E. Swartzlander, Jr. Electrical and Computer Engineering Department University of Texas at Austin Austin, TX 78712 Abstract In the 1970s secure
More informationUses of Derivative Spectroscopy
Uses of Derivative Spectroscopy Application Note UV-Visible Spectroscopy Anthony J. Owen Derivative spectroscopy uses first or higher derivatives of absorbance with respect to wavelength for qualitative
More information(2) (3) (4) (5) 3 J. M. Whittaker, Interpolatory Function Theory, Cambridge Tracts
Communication in the Presence of Noise CLAUDE E. SHANNON, MEMBER, IRE Classic Paper A method is developed for representing any communication system geometrically. Messages and the corresponding signals
More informationSeparation and Classification of Harmonic Sounds for Singing Voice Detection
Separation and Classification of Harmonic Sounds for Singing Voice Detection Martín Rocamora and Alvaro Pardo Institute of Electrical Engineering - School of Engineering Universidad de la República, Uruguay
More informationSupporting Information
S1 Supporting Information GFT NMR, a New Approach to Rapidly Obtain Precise High Dimensional NMR Spectral Information Seho Kim and Thomas Szyperski * Department of Chemistry, University at Buffalo, The
More information15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
More information