The ZCPA Based on the Gammachirp Filter Bank Used for Speaker Independent Recognition


X. Zhang, X. Liu, L. Huang, and Z. Wang
College of Information Engineering, Taiyuan University of Technology, Taiyuan, Shanxi, China

Abstract - This paper presents a method of speech feature extraction based on the gammachirp filter bank, which was used to extract the Zero Crossing Peak Amplitude (ZCPA) feature that served as the input of a radial basis function (RBF) network. The gammachirp filter was implemented through a combination of a gammatone filter and an IIR asymmetric filter, which largely reduced the computational cost compared with the FIR method. The experiments were carried out on Korean isolated words in a speaker-independent recognition system. The results show that, when sound intensity is not taken into account, different chirp factors lead to different recognition results. When the chirp factor is 4, the result is better than under the other conditions.

Keywords: Gammachirp filter; feature extraction; Zero Crossing Peak Amplitude (ZCPA); speech recognition

1 Introduction

Speech is the acoustic manifestation of linguistic information. Articulatory phonetics, acoustic phonetics, and auditory phonetics are three branches in modern linguistics [1]. Articulatory phonetics mainly studies the mechanism of speech production. Acoustic phonetics focuses on analyzing speech with acoustic methods. Auditory phonetics studies the physiological properties of speech perception, namely the acquisition of voice in the human hearing process, the comprehension of speech in the brain, and the storage and comparison of speech information in the brain. Therefore, auditory phonetics can be implemented by a speech recognition system.

The perception of speech in the cochlea has long been one of the hot topics in auditory system research. The cochlea is generally modeled as a bank of band-pass filters in which each band has sharp selectivity. This means that every place on the basilar membrane corresponds to a characteristic frequency. When a pure tone at this characteristic frequency stimulates the cochlea, the displacement at the corresponding place on the membrane reaches its peak.

The common auditory filters include the resonant filter [2][3], the roex function filter [4], the gammatone filter [5], and the gammachirp filter [6]. The resonant filter is based on the frequency selectivity of the basilar membrane. It fully considers the sharp frequency selectivity but ignores the active feedback and nonlinearity of the basilar membrane. The roex filter was first used in masking experiments, where it was fitted to the human ear's threshold for identifying a signal at a specific frequency in a noisy environment. The roex function curve slowly widens on the left side as the stimulus intensity increases, which is in line with the asymmetric and level-dependent characteristics of the cochlea. However, on the right side the slope of the curve is almost unchanged, which differs from the filter properties of the cochlea. The gammatone filter has simple parameters, a low order and a simple time-domain function, but it cannot realize an asymmetric, level-dependent frequency response. The gammachirp auditory filter is an extension of the popular gammatone filter; it has an additional frequency-modulation term that produces an asymmetric amplitude spectrum.

In recent years, the gammachirp filter has been applied successfully in many areas. Noureddine Ellouze and colleagues combined the filter with wavelet packets in audio coding [7] and used it in formant estimation [8]. Lotfi Salhi applied the filter to the analysis of speech signals by combining it with the wavelet transform [9]. Other researchers used the gammachirp filter to simulate the basilar membrane of the human ear and obtained good results [10]. In addition, the gammachirp filter was successfully used in speaker recognition as the front-end feature extraction filter [11].

In this paper, the definition and implementation of the gammachirp filter are briefly introduced in Sections 2 and 3. Section 4 presents the experimental results, and Section 5 gives the conclusion.

2 Definition

The gammachirp filter was derived by Irino and Patterson in 1997. The complex impulse response of the gammachirp is given by:

$$g_c(t) = a\,t^{n-1}\exp\bigl(-2\pi b\,\mathrm{ERB}(f_r)\,t\bigr)\exp\bigl(j2\pi f_r t + jc\ln t + j\varphi\bigr)\,u(t)$$

$$u(t) = \begin{cases} 1, & t > 0 \\ 0, & t \le 0 \end{cases} \qquad \mathrm{ERB}(f_r) = 24.7 + 0.108 f_r \tag{1}$$

where the time t > 0, a is the amplitude, n and b are parameters defining the envelope of the gamma distribution (n = 4, b = 1.109), f_r is the asymptotic frequency, c is a parameter for the frequency modulation, or chirp rate, and φ is the initial phase. The initial phase has limited impact on the power spectrum and is generally taken as zero. ln t is the natural logarithm of time, and ERB(f_r) is the equivalent rectangular bandwidth of the auditory filter at f_r. The parameter c is the chirp factor, which is linear in the sound level P_s [12][13].

When c = 0, the chirp term c ln t vanishes and Eq. (1) represents the complex impulse response of the gammatone filter, whose envelope is a gamma distribution function and whose carrier is a sinusoid at frequency f_r. Accordingly, the gammachirp is an extension of the gammatone with a frequency-modulation term.

The Fourier transform of the gammachirp in Eq. (1) is derived as follows:

$$G_C(f) = \frac{a\,\Gamma(n+jc)\,e^{j\varphi}}{\bigl\{2\pi b\,\mathrm{ERB}(f_r) + j2\pi(f-f_r)\bigr\}^{n+jc}} = \frac{a\,\Gamma(n+jc)\,e^{j\varphi}}{\bigl\{2\pi\sqrt{\bar b^{2} + (f-f_r)^{2}}\;e^{j\theta}\bigr\}^{n+jc}}, \qquad \theta = \arctan\frac{f-f_r}{\bar b},\quad \bar b = b\,\mathrm{ERB}(f_r) \tag{2}$$

Simplifying Eq. (2) gives:

$$G_C(f) = \bar a \cdot \frac{1}{\bigl\{2\pi\sqrt{\bar b^{2} + (f-f_r)^{2}}\bigr\}^{n} e^{jn\theta}} \cdot \frac{e^{c\theta}}{\bigl\{2\pi\sqrt{\bar b^{2} + (f-f_r)^{2}}\bigr\}^{jc}}, \qquad \bar a = a\,\Gamma(n+jc)\,e^{j\varphi} \tag{3}$$

where the first term, ā, is a constant. The second term is the Fourier spectrum of the gammatone, G_T(f). The third term represents an asymmetric function, H_A(f), which is described in detail in the next section. If the amplitude is normalized, the frequency response of the gammachirp can be written as:

$$G_C(f) = G_T(f)\,H_A(f) \tag{4}$$

The amplitude spectrum is:

$$|G_C(f)| = |G_T(f)|\,|H_A(f)| \tag{5}$$

3 Implementation

As shown in Eq. (4), a gammachirp filter can be implemented by cascading a gammatone filter and an asymmetric filter.
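The factorization in Eq. (5) can be checked numerically. The sketch below samples Eq. (1) for a gammachirp and for a gammatone (c = 0) with the same parameters, approximates both Fourier transforms by a Riemann sum, and compares the ratio of the amplitude spectra with the factor exp(cθ) that follows from Eq. (3) once the ratio is normalized at f = f_r. The function names, the 44.1 kHz sampling rate, the 50 ms duration, f_r = 1000 Hz and the frequency grid are illustrative assumptions and do not come from the paper; b = 1.109 is the value quoted in Section 2.

```python
import numpy as np

def erb(f):
    """Equivalent rectangular bandwidth in Hz: ERB(f) = 24.7 + 0.108 f (Eq. 1)."""
    return 24.7 + 0.108 * f

def gammachirp(fr, c, fs, dur=0.05, n=4, b=1.109, a=1.0, phi=0.0):
    """Sample the complex gammachirp of Eq. (1); c = 0 yields a gammatone."""
    t = np.arange(1, int(dur * fs) + 1) / fs      # start at 1/fs so ln(t) is finite
    envelope = a * t ** (n - 1) * np.exp(-2 * np.pi * b * erb(fr) * t)
    phase = 2 * np.pi * fr * t + c * np.log(t) + phi
    return t, envelope * np.exp(1j * phase)

def spectrum(t, g, freqs, fs):
    """Riemann-sum approximation of the Fourier transform at the given frequencies."""
    kernel = np.exp(-2j * np.pi * np.outer(freqs, t))
    return kernel @ g / fs

def asymmetry_check(fr=1000.0, c=4.0, fs=44100.0, b=1.109):
    """Return (measured, predicted) values of |G_C| / |G_T|, normalized at f = fr."""
    offsets = np.linspace(-300.0, 300.0, 7)       # Hz around fr; the middle entry is 0
    freqs = fr + offsets
    t, gc = gammachirp(fr, c, fs, b=b)
    _, gt = gammachirp(fr, 0.0, fs, b=b)
    ratio = np.abs(spectrum(t, gc, freqs, fs)) / np.abs(spectrum(t, gt, freqs, fs))
    ratio /= ratio[len(ratio) // 2]               # normalize where theta = 0
    theta = np.arctan(offsets / (b * erb(fr)))
    return ratio, np.exp(c * theta)

measured, predicted = asymmetry_check()
print(np.round(measured, 3))
print(np.round(predicted, 3))
```

With a positive chirp factor the predicted factor exceeds 1 above f_r and falls below 1 beneath it, so the gammatone response is tilted toward higher frequencies; a negative chirp factor tilts it the other way.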
Since efficient implementations of the gammatone are already known [14,15], this section concentrates on an approximation filter for the asymmetric function. The well-known IIR Butterworth and Chebyshev filters cannot meet the requirements of this asymmetric function. Consequently, a new asymmetric compensation filter H_C(f) is designed as follows:

$$H_A(f) \approx H_C(z)\big|_{z = e^{j2\pi f/f_s}}, \qquad H_C(z) = \prod_{k=1}^{4} H_{Ck}(z)$$

$$H_{Ck}(z) = \frac{(1 - r_k e^{j\psi_k} z^{-1})(1 - r_k e^{-j\psi_k} z^{-1})}{(1 - r_k e^{j\varphi_k} z^{-1})(1 - r_k e^{-j\varphi_k} z^{-1})} \tag{6}$$

where the parameters of Eq. (6) are:

$$r_k = \exp\{-k\,p_1\,2\pi b\,\mathrm{ERB}(f_r)/f_s\}$$
$$\varphi_k = 2\pi\{f_r + p_0^{\,k-1}\,p_2\,c\,b\,\mathrm{ERB}(f_r)\}/f_s$$
$$\psi_k = 2\pi\{f_r - p_0^{\,k-1}\,p_2\,c\,b\,\mathrm{ERB}(f_r)\}/f_s$$
$$p_0 = 2, \qquad p_1 = 1.35 - 0.19\,|c|, \qquad p_2 = 0.29 - 0.004\,|c| \tag{7}$$

Here, f_s is the sampling frequency. Figure 1 shows the frequency response of the asymmetric compensation filter for different chirp factors. Figure 2 shows the frequency response of the gammachirp filter for c = 4, f_r = 1931.1 Hz.

[Figure 1. The frequency response of the asymmetric compensation filter with different chirp factors.]

[Figure 2. The frequency response of the gammachirp filter for c = 4, f_r = 1931.1 Hz.]

4 Experiment

The traditional ZCPA system uses FIR filters to simulate the characteristics of the cochlea. In this paper, the gammachirp filter bank was used in the feature extraction process. Without considering the level-dependent property of the gammachirp filter, the recognition results were obtained by changing the value of the chirp factor. Figure 3 shows the frequency responses of the gammachirp filter and the FIR filter.

[Figure 3. The frequency response of the gammachirp and FIR filters. Panels: (a) FIR; (b), (c) gammachirp with c = ±4.]

The experiments were carried out on Korean isolated words in a speaker-independent recognition system. The corpus includes vocabularies of 50, 40, 30, 20 and 10 Korean words spoken by 16 male speakers under different signal-to-noise ratios (SNRs: 15 dB, 20 dB, 25 dB, 30 dB and clean). Each word was spoken 3 times. The utterances were sampled at 11025 Hz with 16-bit resolution. Nine speakers were used as the training set and the remaining seven as the testing set. The ZCPA feature was the input of the RBF network. Table I shows the results of the FIR filter in different SNRs.
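For the settings quoted above (f_r = 1931.1 Hz, a sampling rate of 11025 Hz, chirp factor of magnitude 4), the asymmetric compensation filter of Eqs. (6) and (7) can be sketched as a cascade of four second-order sections. This is a minimal illustration, not the authors' code: the function names are hypothetical, b = 1.109 is taken from Section 2, and it is assumed that the poles lie at the angles built from f_r + p_0^(k-1) p_2 c b ERB(f_r) and the zeros at the mirrored angles, which is the assignment that boosts the same side of f_r as the factor exp(cθ) in Eq. (3).

```python
import numpy as np

def erb(f):
    """Equivalent rectangular bandwidth in Hz: ERB(f) = 24.7 + 0.108 f."""
    return 24.7 + 0.108 * f

def asym_comp_sections(fr, c, fs, b=1.109):
    """Second-order sections (num, den) of the compensation filter, Eqs. (6)-(7)."""
    p0 = 2.0
    p1 = 1.35 - 0.19 * abs(c)
    p2 = 0.29 - 0.004 * abs(c)
    bw = b * erb(fr)
    sections = []
    for k in range(1, 5):
        r = np.exp(-k * p1 * 2 * np.pi * bw / fs)        # common pole/zero radius
        shift = p0 ** (k - 1) * p2 * c * bw              # frequency offset in Hz
        pole_angle = 2 * np.pi * (fr + shift) / fs       # assumed pole position
        zero_angle = 2 * np.pi * (fr - shift) / fs       # assumed zero position
        # (1 - r e^{jw} z^-1)(1 - r e^{-jw} z^-1) = 1 - 2 r cos(w) z^-1 + r^2 z^-2
        num = np.array([1.0, -2 * r * np.cos(zero_angle), r * r])
        den = np.array([1.0, -2 * r * np.cos(pole_angle), r * r])
        sections.append((num, den))
    return sections

def response(sections, freqs, fs):
    """Evaluate the cascade on the unit circle at the given frequencies (Hz)."""
    zinv = np.exp(-2j * np.pi * np.asarray(freqs, dtype=float) / fs)
    h = np.ones_like(zinv)
    for num, den in sections:
        h = h * (num[0] + num[1] * zinv + num[2] * zinv ** 2) \
              / (den[0] + den[1] * zinv + den[2] * zinv ** 2)
    return h

fs, fr = 11025.0, 1931.1
probe = np.array([fr - 200.0, fr + 200.0])               # one point on each side of fr
for c in (-4.0, 0.0, 4.0):
    gain = np.abs(response(asym_comp_sections(fr, c, fs), probe, fs))
    print(c, np.round(gain, 3))
```

For c = 0 the numerator and denominator of every section coincide, so the cascade reduces to unity and the overall filter is a plain gammatone. Cascading the sections with an existing gammatone implementation gives the IIR gammachirp described in Section 3.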

TABLE I. THE RESULTS OF FIR FILTER IN DIFFERENT SNRS (%)

Words    15 dB    20 dB    25 dB    30 dB    Clean
10       87.1     90.5     90.5     91.4     92.9
20       89.8     92.1     93.3     93.1     94.5
30       92.1     93.2     93.0     94.3     94.0
40       91.9     93.7     94.3     94.0     94.4
50       89.7     91.7     93.4     93.3     94.3

TABLE II. THE RESULTS OF GAMMACHIRP FILTER IN DIFFERENT CHIRP FACTORS AND SNR (%)

(a) c = −4
Words    15 dB    20 dB    25 dB    30 dB    Clean
10       94.3     95.2     95.7     94.8     95.2
20       90.7     92.1     92.6     91.4     92.6
30       90.6     91.9     92.5     91.6     92.9
40       92.0     92.1     93.1     92.7     92.9
50       90.8     92.0     92.1     92.3     93.0

(b) c = 0
Words    15 dB    20 dB    25 dB    30 dB    Clean
10       84.3     89.0     89.0     88.1     91.0
20       88.1     90.5     92.2     91.9     93.7
30       88.3     89.6     92.1     91.9     93.7
40       87.7     89.6     92.1     91.9     93.6
50       85.4     88.8     91.0     92.1     93.8

(c) c = 4
Words    15 dB    20 dB    25 dB    30 dB    Clean
10       91.4     92.9     96.2     95.2     95.2
20       92.4     92.6     94.0     94.0     94.5
30       90.6     92.5     93.8     93.7     95.1
40       90.4     93.2     94.6     94.0     95.1
50       89.0     92.6     94.1     94.2     95.6

TABLE III. THE AVERAGE RESULTS OF FEATURES IN DIFFERENT SNR (%)

Filter    15 dB    20 dB    25 dB    30 dB    Clean
c = −4    91.7     92.7     93.2     92.6     93.3
c = 0     85.4     88.8     91.0     92.1     93.8
c = 4     90.8     92.8     94.5     94.2     95.1
FIR       90.1     92.2     93.0     93.2     94.0

By changing the value of the chirp factor, different gammachirp filter banks were obtained. The filter banks were used as the front-end filters for extracting ZCPA, which was taken as the input of the RBF network. The results for different chirp factors are shown in Table II. Table III shows the average recognition rates of the different filters in different SNRs. From the above results, we can find the following facts.

(1) Table II shows that the gammachirp filter banks perform better than the gammatone filter bank (c = 0) from 15 dB to clean. The gain is at least 0.4% (25 dB, 20 words) and reaches 10% (15 dB, 10 words).

(2) Furthermore, the gammachirp filter with c = 4 gives better results than with c = −4. Except for the 15 dB case, where the highest result, 94.3%, occurs with c = −4, the maximum results under the other SNRs occur with c = 4.

(3) Table III shows that, between the gammachirp filter with c = 4 and the FIR filter, the former performs better in the clean condition. The same trend holds in the 30 dB, 25 dB and 20 dB conditions. At 15 dB, the case of c = −4 has better performance.

The explanation of the above results is as follows. The cochlea is considered to be a band-pass filter bank, and one of its properties is that the frequency response of each single filter is asymmetrical. While the frequency response of the gammachirp filter is asymmetrical, the frequency response of the FIR filter banks used in traditional ZCPA is symmetrical about the center frequency. The same is true of the gammatone filter. However, the FIR filter is designed channel by channel, which provides more accuracy in the design than the gammatone filter. This explains why the FIR filter has better recognition results than the gammatone filter.

TABLE IV. THE AVERAGE RESULTS OF DIFFERENT GAMMACHIRP FILTER BANKS IN DIFFERENT SNR (%)

Filter    15 dB    20 dB    25 dB    30 dB    Clean
c = −4    91.7     92.7     93.2     92.6     93.3
c = −2    87.0     89.7     92.6     94.1     94.3
c = 2     83.0     86.8     89.6     90.5     92.6
c = 4     90.8     92.8     94.5     94.2     95.1

In paper [18], the experiments were carried out on the gammachirp filter with c = −2 and c = 2. The results show that the system has fine performance when c = −2. Table IV displays the average results for the different gammachirp filter banks under different SNRs. A property of the cochlea, considered as a filter bank, is that the slope of the response curve at low frequencies is flatter than that at high frequencies. On this basis, a negative chirp factor would be expected to give good performance. However, the data above show that the gammachirp filter with c = 4 works better than with the other chirp factors, while a chirp factor of 2 gives the poorest performance. With chirp factors of −2 and −4, the filter did not perform best. Reference [17] reached the same conclusion that the gammachirp filter with a positive chirp factor gives satisfactory results.
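The averaged rates in Table III are column means over the five vocabulary sizes. As a small worked example, the sketch below recomputes those means for the FIR filter (Table I) and for the gammachirp bank with c = 4 (Table II(c)) and lists the difference at each SNR. The variable and function names are illustrative; the numbers are copied from the tables, and recomputed means may differ from the printed ones by about 0.1 because the tabulated rates are already rounded.

```python
# Recognition rates (%) copied from Table I (FIR) and Table II(c) (gammachirp, c = 4).
# Rows: 10, 20, 30, 40, 50 words. Columns: 15, 20, 25, 30 dB and clean.
SNRS = ["15 dB", "20 dB", "25 dB", "30 dB", "clean"]

FIR = [
    [87.1, 90.5, 90.5, 91.4, 92.9],
    [89.8, 92.1, 93.3, 93.1, 94.5],
    [92.1, 93.2, 93.0, 94.3, 94.0],
    [91.9, 93.7, 94.3, 94.0, 94.4],
    [89.7, 91.7, 93.4, 93.3, 94.3],
]

GAMMACHIRP_C4 = [
    [91.4, 92.9, 96.2, 95.2, 95.2],
    [92.4, 92.6, 94.0, 94.0, 94.5],
    [90.6, 92.5, 93.8, 93.7, 95.1],
    [90.4, 93.2, 94.6, 94.0, 95.1],
    [89.0, 92.6, 94.1, 94.2, 95.6],
]

def column_means(table):
    """Average each SNR column over the vocabulary sizes."""
    n_rows = len(table)
    return [sum(row[j] for row in table) / n_rows for j in range(len(table[0]))]

fir_avg = column_means(FIR)
gc4_avg = column_means(GAMMACHIRP_C4)
gain = [g - f for g, f in zip(gc4_avg, fir_avg)]

for snr, f, g, d in zip(SNRS, fir_avg, gc4_avg, gain):
    print(f"{snr:>6}: FIR {f:.1f}  c=4 {g:.1f}  difference {d:+.1f}")
```

The difference is positive in every column, which matches the comparison between the c = 4 and FIR rows of Table III.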

5 Conclusions

In this paper, the gammachirp filter was used in the front-end ZCPA feature extraction process. The experimental results were obtained under different chirp factors without considering the sound pressure level. The results show that the value of the chirp factor has a significant impact on the properties of the filter. However, the chirp factor is not the sole element that influences the results; the number of channels and the sound intensity of the input signal can also affect them.

Acknowledgment

This project was sponsored by the International Technology Cooperation Plan of Shanxi Province (Grant No.2011081047), and the Returned Overseas Personnel Merit-funded Projects of Shanxi Province (Grant No. [2013] 68).

6 References

[1] Cheng qing Zong. Statistical natural language processing. Tsinghua University Press, pp.1-10, 2008.
[2] Lyon RF, Mead C. An analog electronic cochlea. IEEE Trans. Acoustics, Speech, and Signal Processing, vol.36, no.7, pp.1119-1134, 1988.
[3] Lyon RF. A computational model of filtering, detection, and compression in the cochlea. Proceedings of IEEE - ICASSP, 82: pp.1282-1285, 1982.
[4] Patterson RD, Moore BCJ. Auditory filters and excitation patterns as representations of frequency resolution. Frequency Selectivity in Hearing, pp.123-177, 1986.
[5] Johannesma P. The pre-response stimulus ensemble of neurons in the cochlear nucleus. IPO Symposium on Hearing Theory, pp.58-69, 1972.
[6] Irino T, Patterson RD. A time-domain, level-dependent auditory filter: the gammachirp. J Acoust Soc Am, 101: pp.412-419, 1997.
[7] Samar K, Kaïs O, Noureddine E. Realization of a psychoacoustic model for MPEG 1 using Gammachirp wavelet transform. Turkey: EUSIPCO, pp.120-123, 2005.
[8] Kaïs O, Zied L, Noureddine E. Formant estimation using Gammachirp filterbank. In Eurospeech, pp.2471-2474, 2001.
[9] Lotfi S. Design and implementation of the cochlear filter model based on wavelet transform as part of speech signals analysis. Research Journal of Applied Sciences, vol.2, no.4, pp.512-521, 2007.
[10] Yan Luo, Shouguo Zhao. Simulation of the human ear basilar membrane filter. Beijing Jiaotong University, pp.34-47, 2009.
[11] Yue Wang, Zhihong Qin. Study on speech feature extraction algorithm in speaker recognition system. Jilin University, pp.63-75, 2009.
[12] Irino T, Unoki M. A time-varying, analysis/synthesis auditory filterbank using the Gammachirp. IEEE Int. Conf. Acoust. Speech Signal Processing (ICASSP-98), pp.3653-3656, 1998.
[13] Irino T, Unoki M. Analysis/Synthesis auditory filterbank based on Gammachirp. Computational Models of Auditory Function, S. Greenberg and M. Slaney (Eds.), IOS Press, pp.397-406, 1999.
[14] Lixia Huang, Xueying Zhang, Xueyan Liu. Different channels in Gammatone filter bank based on ZCPA for speaker-independent recognition task. ACPIM. International Asia Conference on Optical Instrument and Measurement, pp.74-77, 2010.
[15] Immerseel LV, Peeters S. Digital implementation of linear gammatone filters: comparison of design methods. Acoustics Research Letters Online (ARLO), vol.4, no.3, pp.59-64, 2003.
[16] Kim DS, Lee SY, Kil RM. Auditory processing of speech signal for robust speech recognition in real-world noisy environments. IEEE Trans. Speech and Audio Proc, vol.7, no.2, pp.55-69, 1999.
[17] Khalil Abid, Kais Ouni, Noureddine Ellouze. The effect chirp term in audio compression using a gammachirp wavelet. Network Infrastructure and Digital Content, 2009. IC-NID. IEEE International Conference on, pp.774-778, 2009.
[18] Xueyan Liu, Xueying Zhang, Lixia Huang. Gammachirp filter bank applied in speech feature extraction [OL]. Chinese scientific and technological papers online, 2011-11-09. http://www.paper.edu.cn/index.php/default/releasepaper/content/201111-158.