Data driven design of filter bank for speech recognition


Lukáš Burget 1,2 and Hynek Heřmanský 2,3

1 Oregon Graduate Institute, Anthropic Signal Processing Group, 2 NW Walker Rd., Beaverton, Oregon, USA, {hynek,lukas}@ece.ogi.edu
2 International Computer Science Institute, 1947 Center Street Suite 6, Berkeley, CA, USA, hynek@icsi.berkeley.edu
3 Brno Univ. of Technology, Inst. of Radioelectronics, Purkyňova 118, 612, Brno, Czech Republic, burget@urel.fee.vutbr.cz

Abstract. The filter bank approach is commonly used in the feature extraction phase of speech recognition (e.g. in Mel frequency cepstral coefficients). The filter bank modifies the magnitude spectrum according to physiological and psychological findings. However, since the mechanism of the human auditory system is not fully understood, the optimal filter bank parameters are not known. This work presents a method in which the filter bank, optimized for discriminability between phonemes, is derived directly from phonetically labeled speech data using Linear Discriminant Analysis. The work can be seen as further evidence that incorporating psychoacoustic findings into feature extraction can lead to better recognition performance.

1 Introduction

Feature extraction is an important part of the speech recognition process, in which the input waveform is processed for the subsequent pattern classification. While classification is usually based on stochastic approaches where models are trained on data, feature extraction is generally based on knowledge and beliefs. Current methods of feature extraction are mostly based on the short-term Fourier spectrum and its changes over time. Auditory-like modifications inspired by physiological and psychological findings are performed on the spectrum of each speech frame in the sequence.

Mel frequency cepstral coefficients [2] are a commonly used feature extraction method in which energies in the spectrum are integrated by a set of band-limited triangular weighting functions (a filter bank). These weighting functions are distributed equidistantly over the mel scale, in line with psychoacoustic findings: better spectral resolution is preserved for lower frequencies than for higher frequencies. The logarithm of the integrated spectral energies is taken (which corresponds to the human perception of loudness), and finally a projection onto cosine bases is performed.

However, since the mechanism of the human auditory system is not fully understood, the optimal system for feature extraction is not known. Moreover, psychoacoustic findings often describe limitations of the human auditory system, and we do not know whether modeling those limitations is useful for speech recognition. This work presents a method in which the filter bank is derived directly from phonetically labeled speech data. The method yields both the frequency warping and the shape of the individual weighting functions of the filter bank.

2 Linear Discriminant Analysis

The method is based on Linear Discriminant Analysis (LDA) as proposed by Hunt [3]. LDA looks for a linear transform that reduces the dimension of the input data while preserving the information important for linear discrimination among input vectors that belong to different classes. The output of LDA is a set of linearly independent vectors which are the bases of a linear transform and which are sorted by their importance for discrimination among the classes. Since the importance of each base vector is known, we can keep only the first several bases, which preserve almost all the variability in the data that is important for discriminability. In other words, the resulting transformation matrix contains only the first several columns of the matrix obtained by LDA.

Fig. 1. Linear discriminant analysis for 2-dimensional data

Figure 1 demonstrates the effect of LDA on 2-dimensional data vectors which belong to two classes. The grey and the empty ellipses represent the data distributions of two different classes C_1 and C_2 with mean vectors m_1 and m_2. The axes X and Y are the coordinates of the original space. A large overlap of the class distributions can be seen along both of these original coordinates. The axis Z shows the direction obtained by LDA. The classes are well separated after projection onto this direction. Since this example deals with just two classes, and since LDA assumes that the distributions of all classes are Gaussian with the same covariance matrix, no other direction gives better discrimination.

The base vectors of the LDA transform are given by the eigenvectors of the matrix $\Sigma_{wc}^{-1}\Sigma_{ac}$. The within-class covariance matrix $\Sigma_{wc}$ represents the unwanted variability in the data and is computed as the weighted mean of the covariance matrices of the classes:

$\Sigma_{wc} = E[\Sigma_p]$

where $\Sigma_p$ is the covariance matrix of a particular class. The across-class covariance matrix $\Sigma_{ac}$ represents the wanted variability in the data and is computed as an estimate of the covariance matrix of the class mean vectors:

$\Sigma_{ac} = E[(\mu_p - \mu)(\mu_p - \mu)^T]$

where $\mu_p$ is the mean vector of a particular class and $\mu$ is the global mean vector. The eigenvalue associated with an eigenvector represents the amount of variability (necessary for discriminability) preserved by the projection of the input vectors onto that eigenvector. If LDA is to be used for dimension reduction, only the several eigenvectors corresponding to the highest eigenvalues are used.

3 Filter bank derived from data

The filter bank is derived directly from phonetically labeled speech data using the LDA described in the previous section. In this case the magnitude Fourier spectra of all training data frames are used directly for computing the across-class and within-class covariance matrices.
In our speech recognition task, we want to distinguish between different phonemes. Spectra representing speech frames labeled with the same phoneme belong to one class. Examples of across-class and within-class covariance matrices derived this way from speech data in the TIMIT database are shown in figure 2. Half of the symmetric magnitude spectrum (129 points) was used as the vector for deriving these covariance matrices. Figure 3 shows the first 5 LDA spectral bases given by the eigenvectors of the matrix $\Sigma_{wc}^{-1}\Sigma_{ac}$. The eigenvalues in figure 3a indicate that almost all the variability in the data that is important for class separability is preserved by projection onto only the first several base vectors. The linear transform can be performed by multiplying an input vector by a matrix M whose columns are the base vectors. In our case, we choose only the first 13 base vectors, so the transform matrix M has 129 rows and 13 columns.
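The derivation of the matrix M described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `lda_bases` is hypothetical, the data are random synthetic stand-ins for TIMIT magnitude spectra, class priors are used as the weights in both expectations, and the eigenvectors of $\Sigma_{wc}^{-1}\Sigma_{ac}$ are obtained through a Cholesky factor of $\Sigma_{wc}$, a numerical convenience that the paper does not mention.

```python
import numpy as np


def lda_bases(spectra, labels, n_bases):
    """Return (eigenvalues, M); the columns of M are the leading LDA base vectors."""
    spectra = np.asarray(spectra, dtype=float)
    labels = np.asarray(labels)
    n_frames, dim = spectra.shape
    global_mean = spectra.mean(axis=0)

    sigma_wc = np.zeros((dim, dim))
    sigma_ac = np.zeros((dim, dim))
    for phoneme in np.unique(labels):
        frames = spectra[labels == phoneme]
        prior = frames.shape[0] / n_frames
        # Within-class: prior-weighted mean of the class covariance matrices.
        sigma_wc += prior * np.cov(frames, rowvar=False, bias=True)
        # Across-class: prior-weighted covariance of the class mean vectors.
        diff = (frames.mean(axis=0) - global_mean)[:, None]
        sigma_ac += prior * (diff @ diff.T)

    # Eigenvectors of inv(sigma_wc) @ sigma_ac via a symmetric problem:
    # with sigma_wc = L L^T, solve eigh(L^-1 sigma_ac L^-T) and map back by L^-T.
    chol = np.linalg.cholesky(sigma_wc)
    whitened = np.linalg.solve(chol, np.linalg.solve(chol, sigma_ac).T)
    whitened = (whitened + whitened.T) / 2.0  # remove round-off asymmetry
    eigvals, vecs = np.linalg.eigh(whitened)
    order = np.argsort(eigvals)[::-1][:n_bases]  # largest eigenvalues first
    bases = np.linalg.solve(chol.T, vecs[:, order])
    return eigvals[order], bases


# Synthetic stand-in data (assumption): 20 "phoneme" classes, 129-point spectra.
rng = np.random.default_rng(0)
n_classes, frames_per_class, dim = 20, 200, 129
class_means = rng.uniform(1.0, 5.0, size=(n_classes, dim))
labels = np.repeat(np.arange(n_classes), frames_per_class)
spectra = np.abs(class_means[labels] + rng.normal(size=(labels.size, dim)))

eigvals, M = lda_bases(spectra, labels, n_bases=13)
print(M.shape)  # (129, 13)
```

With real data, `spectra` would hold one 129-point magnitude spectrum per frame and `labels` the phoneme label of each frame; a feature vector for a frame `x` is then `x @ M`.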

Fig. 2. Across-class and within-class covariance matrix computed from magnitude spectrum

Fig. 3. Basis derived using LDA from magnitude spectrum: a) eigenvalues, b) 1st eigenvector, c) 2nd eigenvector, d) 3rd eigenvector, e) 4th eigenvector, f) 5th eigenvector

3.1 Smoothing of speech spectra

The projection of the magnitude spectrum of one speech frame onto the selected bases results in a new vector (13 points) which should contain almost the same information for correct recognition as the original spectrum. Since the base vectors are linearly independent, it is possible to obtain another transform which projects the reduced vector back into the original space, i.e. into a spectrum 129 points long. This transform is given by the pseudoinverse of the transform matrix, $M^{-1}$. The final transform is obtained by joining (multiplying) the two matrices: $M M^{-1}$. It projects the magnitude spectrum onto a smoothed version of itself from which the information useless for discriminating among phonemes has been removed. Each column of the final transformation matrix represents a weighting function that integrates a band of frequencies around the point corresponding to the index of that column. Every 5th of these weighting functions is shown in figure 4a. The resulting weighting functions for the lower frequencies are very narrow (integrating only several points of the spectrum and preserving more detail), while the functions integrating higher frequencies are much wider. This corresponds with psychoacoustic findings about human frequency resolution.

3.2 Deriving of filter bank

It is also possible to derive the frequency warping by measuring and integrating the bandwidths (widths) of consecutive weighting functions (figures 4b and 4c). The smoothed spectrum can be represented by only some of its samples without losing any information. This means that we can pick only several weighting functions and project the original spectrum onto them. Their selection must follow the derived warping. This way we end up with a set of weighting functions which are very similar to the commonly used Mel filter bank (figure 4d).
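The smoothing transform of Section 3.1 can be illustrated with a short NumPy sketch. It rests on assumptions that should be read as such: the matrix `M` below is a random stand-in for the 13 LDA base vectors rather than bases trained on speech, the function name `smoothing_matrix` is hypothetical, and the pseudoinverse is taken to be the Moore-Penrose pseudoinverse as computed by `numpy.linalg.pinv`.

```python
import numpy as np


def smoothing_matrix(M):
    """Return M @ pinv(M); column j is the weighting function for spectral point j."""
    return M @ np.linalg.pinv(M)


rng = np.random.default_rng(0)
M = rng.normal(size=(129, 13))           # stand-in for the 13 LDA base vectors
S = smoothing_matrix(M)                  # 129 x 129 final transform

spectrum = np.abs(rng.normal(size=129))  # stand-in magnitude spectrum of one frame
reduced = spectrum @ M                   # 13-point representation
smoothed = reduced @ np.linalg.pinv(M)   # back-projection to 129 points

print(S.shape)                           # (129, 129)
print(np.allclose(smoothed, spectrum @ S))
print(np.linalg.matrix_rank(S))
```

Under these assumptions the final transform has rank 13 and applying it a second time leaves an already smoothed spectrum unchanged, which is one way to read the statement that the 13-point vector and the smoothed spectrum carry the same information.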
4 Limitations of the method and conclusions

Our experience shows that recognizers based on feature extraction inspired by psychoacoustic findings about the nonuniform frequency resolution of human hearing can perform better than those based on the pure short-term Fourier spectrum. This work can be seen as further evidence that incorporating those psychoacoustic findings into feature extraction leads to better separability among phonemes in a low-dimensional feature space and also to better recognition performance. However, the LDA technique assumes that the data belonging to the individual classes have the same Gaussian distribution and that the mean values of the classes also obey a Gaussian distribution. This is of course not true for magnitude spectra of speech. The quest for the optimal filter bank for speech recognition is therefore still open.

Fig. 4. Filter bank and warping derived using LDA: a) every 5th weighting function, b) bandwidth of weighting functions, c) estimated warping of spectrum (axis label: warped spectrum, 129 ~ 4 kHz), d) derived filter bank

References

1. B. Gold and N. Morgan. Speech and Audio Signal Processing. New York.
2. S. B. Davis and P. Mermelstein. Comparison of parametric representation for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. on Acoustics, Speech & Signal Processing, vol. 28, no. 4.
3. M. J. Hunt. A statistical approach to metrics for word and syllable recognition. J. Acoust. Soc. Am., vol. 66(S1), S35(A).
4. N. Malayath. Data-Driven Methods for Extracting Features from Speech. Ph.D. thesis, Oregon Graduate Institute, Portland, USA.
5. H. Hermansky and N. Malayath. Spectral Basis Functions from Discriminant Analysis. In Proceedings ICSLP 98, Sydney, Australia, November.
6. L. Rabiner and B. H. Juang. Fundamentals of Speech Recognition. Signal Processing. Prentice Hall, Englewood Cliffs, NJ.
7. S. Young. The HTK Book. Entropics Ltd., 1999.


More information

Speech Recognition System for Cerebral Palsy

Speech Recognition System for Cerebral Palsy Speech Recognition System for Cerebral Palsy M. Hafidz M. J., S.A.R. Al-Haddad, Chee Kyun Ng Department of Computer & Communication Systems Engineering, Faculty of Engineering, Universiti Putra Malaysia,

More information

CS 688 Pattern Recognition Lecture 4. Linear Models for Classification

CS 688 Pattern Recognition Lecture 4. Linear Models for Classification CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(

More information

B. Raghavendhar Reddy #1, E. Mahender *2

B. Raghavendhar Reddy #1, E. Mahender *2 Speech to Text Conversion using Android Platform B. Raghavendhar Reddy #1, E. Mahender *2 #1 Department of Electronics Communication and Engineering Aurora s Technological and Research Institute Parvathapur,

More information

Lab 1. The Fourier Transform

Lab 1. The Fourier Transform Lab 1. The Fourier Transform Introduction In the Communication Labs you will be given the opportunity to apply the theory learned in Communication Systems. Since this is your first time to work in the

More information

How to Improve the Sound Quality of Your Microphone

How to Improve the Sound Quality of Your Microphone An Extension to the Sammon Mapping for the Robust Visualization of Speaker Dependencies Andreas Maier, Julian Exner, Stefan Steidl, Anton Batliner, Tino Haderlein, and Elmar Nöth Universität Erlangen-Nürnberg,

More information

Convention Paper Presented at the 135th Convention 2013 October 17 20 New York, USA

Convention Paper Presented at the 135th Convention 2013 October 17 20 New York, USA Audio Engineering Society Convention Paper Presented at the 135th Convention 2013 October 17 20 New York, USA This Convention paper was selected based on a submitted abstract and 750-word precis that have

More information

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS 1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of m equations in n unknowns can be written a 11 x 1 + a 12 x 2 +

More information

Chapter 6. Orthogonality

Chapter 6. Orthogonality 6.3 Orthogonal Matrices 1 Chapter 6. Orthogonality 6.3 Orthogonal Matrices Definition 6.4. An n n matrix A is orthogonal if A T A = I. Note. We will see that the columns of an orthogonal matrix must be

More information

10.2 ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS. The Jacobi Method

10.2 ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS. The Jacobi Method 578 CHAPTER 1 NUMERICAL METHODS 1. ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS As a numerical technique, Gaussian elimination is rather unusual because it is direct. That is, a solution is obtained after

More information

Auto-Tuning Using Fourier Coefficients

Auto-Tuning Using Fourier Coefficients Auto-Tuning Using Fourier Coefficients Math 56 Tom Whalen May 20, 2013 The Fourier transform is an integral part of signal processing of any kind. To be able to analyze an input signal as a superposition

More information

CROP CLASSIFICATION WITH HYPERSPECTRAL DATA OF THE HYMAP SENSOR USING DIFFERENT FEATURE EXTRACTION TECHNIQUES

CROP CLASSIFICATION WITH HYPERSPECTRAL DATA OF THE HYMAP SENSOR USING DIFFERENT FEATURE EXTRACTION TECHNIQUES Proceedings of the 2 nd Workshop of the EARSeL SIG on Land Use and Land Cover CROP CLASSIFICATION WITH HYPERSPECTRAL DATA OF THE HYMAP SENSOR USING DIFFERENT FEATURE EXTRACTION TECHNIQUES Sebastian Mader

More information

Question 2: How do you solve a matrix equation using the matrix inverse?

Question 2: How do you solve a matrix equation using the matrix inverse? Question : How do you solve a matrix equation using the matrix inverse? In the previous question, we wrote systems of equations as a matrix equation AX B. In this format, the matrix A contains the coefficients

More information

Automated Stellar Classification for Large Surveys with EKF and RBF Neural Networks

Automated Stellar Classification for Large Surveys with EKF and RBF Neural Networks Chin. J. Astron. Astrophys. Vol. 5 (2005), No. 2, 203 210 (http:/www.chjaa.org) Chinese Journal of Astronomy and Astrophysics Automated Stellar Classification for Large Surveys with EKF and RBF Neural

More information

Thirukkural - A Text-to-Speech Synthesis System

Thirukkural - A Text-to-Speech Synthesis System Thirukkural - A Text-to-Speech Synthesis System G. L. Jayavardhana Rama, A. G. Ramakrishnan, M Vijay Venkatesh, R. Murali Shankar Department of Electrical Engg, Indian Institute of Science, Bangalore 560012,

More information

PERCENTAGE ARTICULATION LOSS OF CONSONANTS IN THE ELEMENTARY SCHOOL CLASSROOMS

PERCENTAGE ARTICULATION LOSS OF CONSONANTS IN THE ELEMENTARY SCHOOL CLASSROOMS The 21 st International Congress on Sound and Vibration 13-17 July, 2014, Beijing/China PERCENTAGE ARTICULATION LOSS OF CONSONANTS IN THE ELEMENTARY SCHOOL CLASSROOMS Dan Wang, Nanjie Yan and Jianxin Peng*

More information

Speech Recognition System of Arabic Alphabet Based on a Telephony Arabic Corpus

Speech Recognition System of Arabic Alphabet Based on a Telephony Arabic Corpus Speech Recognition System of Arabic Alphabet Based on a Telephony Arabic Corpus Yousef Ajami Alotaibi 1, Mansour Alghamdi 2, and Fahad Alotaiby 3 1 Computer Engineering Department, King Saud University,

More information

Probability and Random Variables. Generation of random variables (r.v.)

Probability and Random Variables. Generation of random variables (r.v.) Probability and Random Variables Method for generating random variables with a specified probability distribution function. Gaussian And Markov Processes Characterization of Stationary Random Process Linearly

More information

Figure1. Acoustic feedback in packet based video conferencing system

Figure1. Acoustic feedback in packet based video conferencing system Real-Time Howling Detection for Hands-Free Video Conferencing System Mi Suk Lee and Do Young Kim Future Internet Research Department ETRI, Daejeon, Korea {lms, dyk}@etri.re.kr Abstract: This paper presents

More information

The Algorithms of Speech Recognition, Programming and Simulating in MATLAB

The Algorithms of Speech Recognition, Programming and Simulating in MATLAB FACULTY OF ENGINEERING AND SUSTAINABLE DEVELOPMENT. The Algorithms of Speech Recognition, Programming and Simulating in MATLAB Tingxiao Yang January 2012 Bachelor s Thesis in Electronics Bachelor s Program

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

What Audio Engineers Should Know About Human Sound Perception. Part 2. Binaural Effects and Spatial Hearing

What Audio Engineers Should Know About Human Sound Perception. Part 2. Binaural Effects and Spatial Hearing What Audio Engineers Should Know About Human Sound Perception Part 2. Binaural Effects and Spatial Hearing AES 112 th Convention, Munich AES 113 th Convention, Los Angeles Durand R. Begault Human Factors

More information

Practical Design of Filter Banks for Automatic Music Transcription

Practical Design of Filter Banks for Automatic Music Transcription Practical Design of Filter Banks for Automatic Music Transcription Filipe C. da C. B. Diniz, Luiz W. P. Biscainho, and Sergio L. Netto Federal University of Rio de Janeiro PEE-COPPE & DEL-Poli, POBox 6854,

More information

2x + y = 3. Since the second equation is precisely the same as the first equation, it is enough to find x and y satisfying the system

2x + y = 3. Since the second equation is precisely the same as the first equation, it is enough to find x and y satisfying the system 1. Systems of linear equations We are interested in the solutions to systems of linear equations. A linear equation is of the form 3x 5y + 2z + w = 3. The key thing is that we don t multiply the variables

More information

Lecture 4: Jan 12, 2005

Lecture 4: Jan 12, 2005 EE516 Computer Speech Processing Winter 2005 Lecture 4: Jan 12, 2005 Lecturer: Prof: J. Bilmes University of Washington Dept. of Electrical Engineering Scribe: Scott Philips

More information

December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS

December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B KITCHENS The equation 1 Lines in two-dimensional space (1) 2x y = 3 describes a line in two-dimensional space The coefficients of x and y in the equation

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

More information

Environmental Remote Sensing GEOG 2021

Environmental Remote Sensing GEOG 2021 Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class

More information

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

More information

From Concept to Production in Secure Voice Communications

From Concept to Production in Secure Voice Communications From Concept to Production in Secure Voice Communications Earl E. Swartzlander, Jr. Electrical and Computer Engineering Department University of Texas at Austin Austin, TX 78712 Abstract In the 1970s secure

More information

Uses of Derivative Spectroscopy

Uses of Derivative Spectroscopy Uses of Derivative Spectroscopy Application Note UV-Visible Spectroscopy Anthony J. Owen Derivative spectroscopy uses first or higher derivatives of absorbance with respect to wavelength for qualitative

More information

(2) (3) (4) (5) 3 J. M. Whittaker, Interpolatory Function Theory, Cambridge Tracts

(2) (3) (4) (5) 3 J. M. Whittaker, Interpolatory Function Theory, Cambridge Tracts Communication in the Presence of Noise CLAUDE E. SHANNON, MEMBER, IRE Classic Paper A method is developed for representing any communication system geometrically. Messages and the corresponding signals

More information

Separation and Classification of Harmonic Sounds for Singing Voice Detection

Separation and Classification of Harmonic Sounds for Singing Voice Detection Separation and Classification of Harmonic Sounds for Singing Voice Detection Martín Rocamora and Alvaro Pardo Institute of Electrical Engineering - School of Engineering Universidad de la República, Uruguay

More information

Supporting Information

Supporting Information S1 Supporting Information GFT NMR, a New Approach to Rapidly Obtain Precise High Dimensional NMR Spectral Information Seho Kim and Thomas Szyperski * Department of Chemistry, University at Buffalo, The

More information

15.062 Data Mining: Algorithms and Applications Matrix Math Review

15.062 Data Mining: Algorithms and Applications Matrix Math Review .6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

More information