COMPARATIVE STUDY OF RECOGNITION TOOLS AS BACK-ENDS FOR BANGLA PHONEME RECOGNITION



Similar documents
Hardware Implementation of Probabilistic State Machine for Word Recognition

Myanmar Continuous Speech Recognition System Based on DTW and HMM

Artificial Neural Network for Speech Recognition

School Class Monitoring System Based on Audio Signal Processing

SELECTING NEURAL NETWORK ARCHITECTURE FOR INVESTMENT PROFITABILITY PREDICTIONS

Ericsson T18s Voice Dialing Simulator

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

Lecture 6. Artificial Neural Networks

L9: Cepstral analysis

Chapter 4: Artificial Neural Networks

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

A Simple Feature Extraction Technique of a Pattern By Hopfield Network

Establishing the Uniqueness of the Human Voice for Security Applications

Keywords: Image complexity, PSNR, Levenberg-Marquardt, Multi-layer neural network.

Emotion Detection from Speech

Iranian J Env Health Sci Eng, 2004, Vol.1, No.2, pp Application of Intelligent System for Water Treatment Plant Operation.

Performance Evaluation of Artificial Neural. Networks for Spatial Data Analysis

A Time Series ANN Approach for Weather Forecasting

Price Prediction of Share Market using Artificial Neural Network (ANN)

Feed-Forward mapping networks KAIST 바이오및뇌공학과 정재승

Power Prediction Analysis using Artificial Neural Network in MS Excel

AN APPLICATION OF TIME SERIES ANALYSIS FOR WEATHER FORECASTING

SPEAKER IDENTIFICATION FROM YOUTUBE OBTAINED DATA

Recurrent Neural Networks

MUSICAL INSTRUMENT FAMILY CLASSIFICATION

Speech Signal Processing: An Overview

Neural network software tool development: exploring programming language options

Gender Identification using MFCC for Telephone Applications A Comparative Study

SUCCESSFUL PREDICTION OF HORSE RACING RESULTS USING A NEURAL NETWORK

Neural Networks and Support Vector Machines

degrees of freedom and are able to adapt to the task they are supposed to do [Gupta].

Chapter 2 The Research on Fault Diagnosis of Building Electrical System Based on RBF Neural Network

Analecta Vol. 8, No. 2 ISSN

Use of Artificial Neural Network in Data Mining For Weather Forecasting

Stock Market Prediction System with Modular Neural Networks

Automatic Detection of Emergency Vehicles for Hearing Impaired Drivers

An Introduction to Neural Networks

Role of Neural network in data mining

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence

Neural Networks and Back Propagation Algorithm

NEURAL NETWORK FUNDAMENTALS WITH GRAPHS, ALGORITHMS, AND APPLICATIONS

GLOVE-BASED GESTURE RECOGNITION SYSTEM

Impact of Feature Selection on the Performance of Wireless Intrusion Detection Systems

Performance Comparison between Backpropagation Algorithms Applied to Intrusion Detection in Computer Network Systems

Comparison of K-means and Backpropagation Data Mining Algorithms

How To Use Neural Networks In Data Mining

AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.

MFCC-Based Voice Recognition System for Home Automation Using Dynamic Programming

NTC Project: S01-PH10 (formerly I01-P10) 1 Forecasting Women s Apparel Sales Using Mathematical Modeling

Design call center management system of e-commerce based on BP neural network and multifractal

Electroencephalography Analysis Using Neural Network and Support Vector Machine during Sleep

Self Organizing Maps: Fundamentals

A Review And Evaluations Of Shortest Path Algorithms

Performance Evaluation On Human Resource Management Of China S Commercial Banks Based On Improved Bp Neural Networks

CONATION: English Command Input/Output System for Computers

Thirukkural - A Text-to-Speech Synthesis System

Data Mining using Artificial Neural Network Rules

EVALUATION OF NEURAL NETWORK BASED CLASSIFICATION SYSTEMS FOR CLINICAL CANCER DATA CLASSIFICATION

Computational Neural Network for Global Stock Indexes Prediction

Credit Card Fraud Detection Using Self Organised Map

BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION

Automatic Evaluation Software for Contact Centre Agents voice Handling Performance

Appendices. Appendix A: List of Publications

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data

An Approach for Utility Pole Recognition in Real Conditions

The Backpropagation Algorithm

3 An Illustrative Example

A Multi-level Artificial Neural Network for Residential and Commercial Energy Demand Forecast: Iran Case Study

ARTIFICIAL NEURAL NETWORKS FOR ADAPTIVE MANAGEMENT TRAFFIC LIGHT OBJECTS AT THE INTERSECTION

Biological Neurons and Neural Networks, Artificial Neurons

CLOUD FREE MOSAIC IMAGES

6.2.8 Neural networks for data mining

How To Get A Computer Engineering Degree

Bank Customers (Credit) Rating System Based On Expert System and ANN

An Arabic Text-To-Speech System Based on Artificial Neural Networks

Final Year Project Progress Report. Frequency-Domain Adaptive Filtering. Myles Friel. Supervisor: Dr.Edward Jones

Technical Trading Rules as a Prior Knowledge to a Neural Networks Prediction System for the S&P 500 Index

Prediction Model for Crude Oil Price Using Artificial Neural Networks

Application of Neural Network in User Authentication for Smart Home System

Cloud User Voice Authentication enabled with Single Sign-On framework using OpenID

Open Access Research on Application of Neural Network in Computer Network Security Evaluation. Shujuan Jin *

Comparison of Supervised and Unsupervised Learning Algorithms for Pattern Classification

TRAINING A LIMITED-INTERCONNECT, SYNTHETIC NEURAL IC

Speech recognition for human computer interaction

Music Genre Classification

Speech: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction

Temporal Difference Learning in the Tetris Game

Adaptive Online Gradient Descent

Follow links Class Use and other Permissions. For more information, send to:

Intelligence Trading System for Thai Stock Index Based on Fuzzy Logic and Neurofuzzy System

Forecasting of Economic Quantities using Fuzzy Autoregressive Model and Fuzzy Neural Network

New Pulse Width Modulation Technique for Three Phase Induction Motor Drive Umesha K L, Sri Harsha J, Capt. L. Sanjeev Kumar

Enterprise Organization and Communication Network

THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS

Transcription:

ITERATIOAL JOURAL OF RESEARCH I COMPUTER APPLICATIOS AD ROBOTICS ISS 2320-7345 COMPARATIVE STUDY OF RECOGITIO TOOLS AS BACK-EDS FOR BAGLA PHOEME RECOGITIO Kazi Kamal Hossain 1, Md. Jahangir Hossain 2, Arifa ferdousi 3 and Md. Farukuzzaman Khan 4 1 Assistant professor, SIBIT, Jessore, Bangladesh, Email: kazikamal77@gmail.com 2 Assistant professor, Pakshi Railway Degree College, Pabna, Bangladesh, Email: jahangirbd18@gmail.com 3 Lecturer, CSE department, Barindra University, Rajshahi, Bangladesh, Email: arifaferdousi@yahoo.com 4 Professor, CSE department, Islamic University, Kushtia-7003, Bangladesh, Email: mfkhanbd2@gmail.com Abstract This paper deals with the development of a speech recognition system and comparative study of recognition results for Bangla phonemes. At first, Phonemes were recorded and converted into digital form. Then MFCC features from phonemes were extracted by Mel scale cepstral analysis. The recognition tools include Hamming and Euclidean distance measurement and learning through a neural network. Ten Bangla phonemes were used to test the system. The performance of the system shows that Euclidean distance measurement is the simplest and better method in recognizing Bangla phonemes. Keywords: recognition, Bangla phonemes, MFCC, eural network, Hamming and Euclidean distance 1. Introduction recognition is a formidable problem; many approaches have been tried, with only mild success. This is an active area of DSP research, and will undoubtedly remain so far many years to come. The human vocal cord vibrates with a frequency ranging from 50Hz to 1000Hz during speech production. This is the fundamental frequency and speaker s sex, age, stress, emotion etc. cause the variation. Furthermore, the vocal tract which functions as resonator during speech production is about 17 cm for adult and this length is also variable depending on the speaker [1]. This variation in vocal tract causes variation in resonance frequencies. These problems contribute a major part in complexities in speech recognition. So the first target is assigned for a speech researcher is to find features within a phoneme, which will be almost stable against all the variability. And off course, final requirement is to select a better method, which can recognize phonemes correctly using these features. During a few decades researchers are continuing search for better methods and techniques that might bring us closer to accurate recognition. In this search for better method, two distance measurement techniques (Hamming and Euclidean) and neural network are used in our study. 2. Distance Measurement Techniques Several distance measurement techniques are used in pattern recognition. Two of them are discussed below. The most basic measure and one that is widely used because of its simplicity, is the Hamming Distance measure. For two vectors X x, x,... x ) Y ( 1 2 ( y1, y2,... x ) M d. F a r u k u z z a m a n K h a n e t a l Page 36

the Hamming distance is found by summing the absolute difference between the corresponding components as H x i y i i0 Another most common methods used is the Euclidean Distance measure. For the two vectors X and Y as defined above, the shortest distance is the Euclidean Distance that is defined by: d( X, Y) euc X i Yi i1 where is the dimensionality of the vectors. 3. Artificial eural etwork 2 Artificial neural networks, specially the multi-layer feedforward neural networks have long been used in the field of pattern learning for their competence in learning. The structure of such network is shown in figure-1. It consists of one input layer, one output layer and one hidden layer between them. In our study, the input layer contents the source nodes that provide physical access points to the input speech. The hidden and output layers consist of computation nodes. The input neurons simply distribute the speech data along multiple paths to the hidden layer neurons. A weight is associated with each connection to hidden neurons. The part of speech presented to the hidden layer neuron due to one single connection is the product of output value of input node and the connected weight. Among the learning algorithms for this feedforward network, the best known is the Error Backpropagation (BP) algorithm [2]. The BP algorithm is an iterative gradient algorithm to minimize the Mean Squire Error (MSE) between the actual output and target output as given below: 1. Initialize weights and thresholds Set all weights and thresholds to small random values. 2. Present input and desired output Present input as x i and target output as t i. 3. Calculate actual output Using sigmoid function of the from f ( x) 1 1 exp( x i ), each node calculates 1 y j n 1 1 exp wij yi i1 Where y i is the output of the node in consideration [3, 4], y i 1 is the output of the nodes of the previous layer - x i in case of first layer, w ij is the weight for i-j connection j is the thresholds for the node in consideration determines the slop of the function at point and is called the activation gain. In our program, two values are used for it, one is spr. and other is spread. 4. Adapt weights and thresholds Start from the output layer and work backwards. The weight correction for the hidden-to-output weight matrix elements [5,6] w 2 ( t y )( 1 y ) y jk k k k k w ( t 1) w ( t) w jk jk jk And the weight correction for the input-to-hidden weight matrix elements [5,6] w 1 ( 1 y ) y ( t y ) w ij k k k k jk k w ( t 1) w ( t) w ij ij ij Similarly, correction for thresholds for the hidden-to-output th 1 ( t y )( 1 y ) y j k k k k th ( t 1) th ( t) th k k j And correction for thresholds for the input-to-hidden M d. F a r u k u z z a m a n K h a n e t a l Page 37

th 1 ( 1 y ) y ( t y ) w j j j k k jk k th ( t 1) th ( t) th j j j Where 1 and 2 are small proportionality constant known as learning rate for input to hidden layer and hidden to output layer respectively. The value of is spread as before. t k denotes target output Figure-1: The structure of multi-layer feedforward neural network. 4. Computation of Mel Frequency Cepstrum Coefficients (MFCC) The sequential steps to compute MFCC features from phonemes are shown in figure-2. After preprocessing (Windowing and Preemphasis), the system calculates the DFT using most efficient FFT algorithm. A set of critical band filters evenly spaced along the mel-scale smoothes and averages the FFTed signal into a smaller number of coefficients. Taking the log of each coefficient will force the signal to be minimum phase. The discrete cosine transform can then be used to derive the mel frequency cepstral coefficients (MFCC). Signal (Phonemes) Preemphasis and windowing DFT (FFT) Mel frequency warping Log 10. DFT -1 / DCT MFCC Fig.-2: The sequence of operations to convert a phoneme into a set of MFCC features. 5. The Developed Software System A software system was developed to implement the algorithms discussed above. The programs are written in C language and compiled with Turbo C++ compiler. The block diagram approach of the program is shown in figure-3. Some of the modules of the software were prepared as a moderated version of that in [5] and [7]. M d. F a r u k u z z a m a n K h a n e t a l Page 38

6. Experiment with Recognition Methods In our experiment with Bangla phonemes, six vowels (/অ/, /আv/, /ই/, /উ/, /এ/ and /ও/) and four consonants (/ক/, /ট/, /ম/ and /শ/) were taken as input to the system. Three hundred utterances of ten phonemes were included and the utterances were recorded in three different time, one week later from the previous one. The duration of each phoneme was 0.5 sec. During computation of MFCC, phoneme data was grouped in a set of samples, called a frame representing 16 msec of speech. A total of 248 feature data are collected from the computation of 8-MFCC per frame. Thus the number of input nodes in the Recognition Tools Hamming Distance Measurement Minimum Unknown Preprocessing Feature Extraction Euclidean Distance Measurement eural Test etwork Distance Calculatio n Decision Making Recognized eural Training etwork Reference Preprocessing Feature Extraction Reference Data Base Figure-3: The main blocks of the developed software Table-I: The target output pattern. Phonemes ode#0 ode#1 ode#2 ode#3 ode#4 ode#5 ode#6 ode#7 ode#8 ode#9 অ 1 0 0 0 0 0 0 0 0 0 আ 0 1 0 0 0 0 0 0 0 0 ই 0 0 1 0 0 0 0 0 0 0 উ 0 0 0 1 0 0 0 0 0 0 এ 0 0 0 0 1 0 0 0 0 0 ও 0 0 0 0 0 1 0 0 0 0 ক 0 0 0 0 0 0 1 0 0 0 ট 0 0 0 0 0 0 0 1 0 0 ম 0 0 0 0 0 0 0 0 1 0 শ 0 0 0 0 0 0 0 0 0 1 Phonemes Table-II: Results of Experiment with Recognition Methods Total o. of Recognized using Recognized using Euclidean Utterances Hamming Distance Distance Recognized using eural etwork অ 30 22 22 23 M d. F a r u k u z z a m a n K h a n e t a l Page 39

আ 30 30 30 30 ই 30 30 30 30 উ 30 30 30 30 এ 30 30 30 30 ও 30 30 30 30 ক 30 23 24 14 ট 30 27 27 29 ম 30 28 28 30 শ 30 30 30 30 Total 300 280 281 276 Percent - 93.33% 93.66% 92% neural network was 248 and 10 output nodes were taken to represent 10 phonemes. The target output pattern is given in table-i. et parameters used in this experiment were as below: Hidden Unit=20 spr.=0.25 spread=0.25 eta1=0.7 eta2=0.1 The results of this experiment are given in table-ii. As seen, all recognition methods perform nearly the same, eural etwork = 92%, Hamming = 93.33% and Euclidean=93.66%. 7. DISCUSSIO AD COCLUSIO Three methods were used in this recognition scheme. They are eural etwork, Hamming and Euclidean distance measurement. As discussed in section-2 and 3, the distance measurements were very simple in computation but neural networks are somewhat more complex. Comparative recognition results were shown in table-2. As seen in table-ii, the all methods produce almost the same result, but Euclidean distance produces slightly better result. Because of these and for the simplicity of calculation, Euclidean distance measurement may be concluded as the better method for Bangla phoneme recognition. 8. REFERECES [1] Jean-Claude Junqua & Jean-Paul Haton, Robustness in Automatic Recognition: Fundamentals and Applications, Kluwer Academic Publishers, Dordrecht, The etherlands, 1997. [2] Rumelhart, D.E., McClelland, J.L. and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, MIT press, Cambridge, MA, 1986. [3].K. Bose and P. Liang, eural etwork Fundamentals: Graphs, Algorithms and applications, Tata McGraw Hill, ew Delhi, 1996. [4] Stamatios V. Kartalopoulos, Understanding eural etworks and Fuzzy Logic: Basic concepts and Application, IEEE, ew Delhi, 1996. [5] Edward Rietman, Exploring Parellel Processing, Windcrest Books,USA, 1990. [6] Rakib Ahmed, M.Sc. Thesis: Pattern Recognition by eural etwork, Department of Applied Physics & Electronics, Rajshahi University, Bangladesh, 1994. [7] Aldebaro Barreto da Rocha Klautau Jr, Old Aldebaro s Home Page, http://speech.ucsd.edu/aldebaro/software.htm, UC San Diego La Jolla, CA 2002. M d. F a r u k u z z a m a n K h a n e t a l Page 40