Artificial Neural Network for Speech Recognition

Similar documents

MUSICAL INSTRUMENT FAMILY CLASSIFICATION

L9: Cepstral analysis

Hardware Implementation of Probabilistic State Machine for Word Recognition

Establishing the Uniqueness of the Human Voice for Security Applications

School Class Monitoring System Based on Audio Signal Processing

Speech: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction

Automatic Evaluation Software for Contact Centre Agents voice Handling Performance

BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION

Feed-Forward mapping networks KAIST 바이오및뇌공학과 정재승

Developing an Isolated Word Recognition System in MATLAB

COMPARATIVE STUDY OF RECOGNITION TOOLS AS BACK-ENDS FOR BANGLA PHONEME RECOGNITION

Speech Signal Processing: An Overview

Ericsson T18s Voice Dialing Simulator

Speech recognition for human computer interaction

Myanmar Continuous Speech Recognition System Based on DTW and HMM

A simple application of Artificial Neural Network to cloud classification

Denoising Convolutional Autoencoders for Noisy Speech Recognition

Chapter 4: Artificial Neural Networks

Gender Identification using MFCC for Telephone Applications A Comparative Study

An Introduction to Neural Networks

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski

Analecta Vol. 8, No. 2 ISSN

Novelty Detection in image recognition using IRF Neural Networks properties

The Scientific Data Mining Process

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

Emotion Detection from Speech

THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

Impact of Feature Selection on the Performance of Wireless Intrusion Detection Systems

NTC Project: S01-PH10 (formerly I01-P10) 1 Forecasting Women s Apparel Sales Using Mathematical Modeling

SELECTING NEURAL NETWORK ARCHITECTURE FOR INVESTMENT PROFITABILITY PREDICTIONS

Using artificial intelligence for data reduction in mechanical engineering

Optimizing content delivery through machine learning. James Schneider Anton DeFrancesco

Neural Networks and Support Vector Machines

Neural Network based Vehicle Classification for Intelligent Traffic Control

Sense Making in an IOT World: Sensor Data Analysis with Deep Learning

Lecture 6. Artificial Neural Networks

GLOVE-BASED GESTURE RECOGNITION SYSTEM

Tennis Winner Prediction based on Time-Series History with Neural Modeling

Electroencephalography Analysis Using Neural Network and Support Vector Machine during Sleep

Mathematical Modelling of Computer Networks: Part II. Module 1: Network Coding

Comparison Between Multilayer Feedforward Neural Networks and a Radial Basis Function Network to Detect and Locate Leaks in Pipelines Transporting Gas

A Simple Feature Extraction Technique of a Pattern By Hopfield Network

Accurate and robust image superresolution by neural processing of local image representations

Data Mining mit der JMSL Numerical Library for Java Applications

NRZ Bandwidth - HF Cutoff vs. SNR

Neural Network Add-in

Back Propagation Neural Networks User Manual

Programming Exercise 3: Multi-class Classification and Neural Networks

Lecture 1-10: Spectrograms

Knowledge Discovery from patents using KMX Text Analytics

Data Acquisition Using NI-DAQmx

APPLYING MFCC-BASED AUTOMATIC SPEAKER RECOGNITION TO GSM AND FORENSIC DATA

Design call center management system of e-commerce based on BP neural network and multifractal

Neural network software tool development: exploring programming language options

Stock Market Prediction System with Modular Neural Networks

Self Organizing Maps: Fundamentals

Comparing Support Vector Machines, Recurrent Networks and Finite State Transducers for Classifying Spoken Utterances

Signal to Noise Instrumental Excel Assignment

K-Means Clustering Tutorial

Monotonicity Hints. Abstract

SUCCESSFUL PREDICTION OF HORSE RACING RESULTS USING A NEURAL NETWORK

SPEAKER IDENTIFICATION FROM YOUTUBE OBTAINED DATA

DSAP - Digital Speech and Audio Processing

Marathi Interactive Voice Response System (IVRS) using MFCC and DTW

Speech Analysis for Automatic Speech Recognition

NTC Project: S01-PH10 (formerly I01-P10) 1 Forecasting Women s Apparel Sales Using Mathematical Modeling

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

RESEARCH ON SPOKEN LANGUAGE PROCESSING Progress Report No. 29 (2008) Indiana University

A Sound Analysis and Synthesis System for Generating an Instrumental Piri Song

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

Machine Learning and Pattern Recognition Logistic Regression

Comparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations

Neural Networks in Data Mining

3 An Illustrative Example

Credit Card Fraud Detection Using Self Organised Map

Blood Vessel Classification into Arteries and Veins in Retinal Images

1. Classification problems

Simplified Machine Learning for CUDA. Umar

New Work Item for ISO Predictive Analytics (Initial Notes and Thoughts) Introduction

Supervised Feature Selection & Unsupervised Dimensionality Reduction

High Quality Integrated Data Reconstruction for Medical Applications

Audio Content Analysis for Online Audiovisual Data Segmentation and Classification

ELLIOTT WAVES RECOGNITION VIA NEURAL NETWORKS

Feature Subset Selection in Spam Detection

Data quality in Accounting Information Systems

B. Raghavendhar Reddy #1, E. Mahender *2

/SOLUTIONS/ where a, b, c and d are positive constants. Study the stability of the equilibria of this system based on linearization.

Predict Influencers in the Social Network

Recurrent Neural Networks

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

Question 2: How do you solve a matrix equation using the matrix inverse?

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Transcription:

Artificial Neural Network for Speech Recognition Austin Marshall March 3, 2005 2nd Annual Student Research Showcase

Overview Presenting an Artificial Neural Network to recognize and classify speech Spoken digits one, two, three, etc Choosing a speech representation scheme Training Perceptron Results 2

Representing Speech Problem Recording samples never produce identical waveforms Length Amplitude Background noise Sample rate However, perceptual information relative to speech remains consistent Solution Extract speech-related information See: Spectrogram 3

Representing Speech one one Waveform Spectrogram 4

Spectrogram Shows change in amplitude spectra over time Three dimensions X Axis: Time Y Axis: Frequency Z axis: Color intensity represents magnitude one 5

Mel Frequency Cepstrum Coefficients Spectrogram provides a good visual representation of speech but still varies significantly between samples A cepstral analysis is a popular method for feature extraction in speech recognition applications, and can be accomplished using Mel Frequency Cepstrum Coefficient analysis (MFCC) 6

Mel Frequency Cepstrum Coefficients Inverse Fourier transform of the log of the Fourier transform of a signal using the Mel Scale filterbank mfcc function returns vectors of 13 dimensions 7

Network Architecture Input layer 26 Cepstral Coefficients Hidden Layer 100 fully-connected hidden-layer units Weight range between -1 +1 Output Initially random Remain constant 1 output unit for each target Limited to values between 0 and +1 8

Sample Training Stimuli (Spectrograms) one two three 9

Training the network Spoken digits were recorded Seven samples of each digit One through eight recorded Total of 56 different recordings with varying lengths and environmental conditions Background noise was removed from each sample 10

Training the network Calculate MFCC using Malcolm Slaney s Auditory Toolbox c=mfcc(s,fs,fix((3*fs)/(length(s)-256))) Limits frame rate such that mfcc always produces a matrix of two vectors corresponding to the coefficients of the two halves of the sample Convert 13x2 matrix to 26 dimensional column vector c=c(:) 11

Training the network Supervised learning Choose intended target and create a target vector 56 dimensional target vector If training the network to recognize spoken one, target has a value of +1 for each of the known one stimuli and 0 for everything else 12

Training the network Train a multilayer perceptron with feature vectors (simplified) Select stimuli at random Calculate response to stimuli Calculate error Update weights Repeat In a finite amount of time, the perceptron will successfully learn to distinguish between stimuli of an intended target and not. 13

Training the network Calculate response to stimuli Calculate hidden layer h=sigmoid(w*s+bias) Calculate response o=sigmoid(v*h+bias) Sigmoid transfer function Maps values between 0 and +1 sigmoid(x)=1/(1+e -x ) 14

Training the network Calculate error For a given stimuli, error is the difference between target and response t-o t will be either 0 or 1 o will be between 0 and +1 15

Training the network Update weights v=v previous +γ(t-o)h T v is weight vector between hidden-layer units and output γ (gamma) is learning rate 16

Results Target = one Learning rate: +1 Bias: -1 100 hidden-layer units 3000 iterations 316 seconds to learn target 17

Results Response to unseen stimuli Stimuli produced by same voice used to train network with noise removed Network was tested against eight unseen stimuli corresponding to eight spoken digits Returned 1 (full activation) for one and zero for all other stimuli. Results were consistent across targets i.e. when trained to recognize two, three, etc sigmoid(v*sigmoid(w*t1+bias)+bias) == 1 18

Results Response to noisy sample Network returned a low, but response > 0 to a sample without noise removed Response to foreign speaker Network responded with mixed results when presented samples from speakers different from training stimuli In all cases, error rate decreased and accuracy improved with more learning iterations 19

References Jurafsky, Daniel and Martin, James H. (2000) Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (1st ed.). Prentice Hall Golden, Richard M. (1996) Mathematical Methods for Neural Network Analysis and Design (1st ed.). MIT Press Anderson, James A. (1995) An Introduction to Neural Networks (1st ed.). MIT Press Hosom, John-Paul, Cole, Ron, Fanty, Mark, Schalkwyk, Joham, Yan, Yonghong, Wei, Wei (1999, February 2). Training Neural Networks for Speech Recognition Center for Spoken Language Understanding, Oregon Graduate Institute of Science and Technology, http://speech.bme.ogi.edu/tutordemos/nnet_training/tutorial.html Slaney, Malcolm Auditory Toolbox Interval Research Corporation 20