Journal of Computer Science 3 (5): 274-280, 2007
ISSN 1549-3636
2007 Science Publications

Intelligent Voice-Based Door Access Control System Using Adaptive-Network-based Fuzzy Inference Systems (ANFIS) for Building Security

Wahyudi, Winda Astuti and Syazilawati Mohamed
Intelligent Mechatronics Systems Research Group, Department of Mechatronics Engineering, International Islamic University Malaysia, P.O. Box 10, 50728, Kuala Lumpur, Malaysia

Abstract: Secure buildings are currently protected from unauthorized access by a variety of devices. Although many kinds of devices are available to guarantee system safety, such as PIN pads, conventional and electronic keys, identity cards, and cryptographic and dual control procedures, a person's voice can also be used. The ability to verify the identity of a speaker by analyzing speech, or speaker verification, is an attractive and relatively unobtrusive means of providing security for admission into an important or secured place. An individual's voice cannot be stolen, lost, forgotten, guessed, or impersonated with accuracy. Because of these advantages, this paper describes the design and prototyping of a voice-based door access control system for building security. In the proposed system, access may be authorized simply by an enrolled user speaking into a microphone attached to the system. The system then decides whether to accept or reject the user's identity claim, or possibly to report insufficient confidence and request additional input before making the decision. Furthermore, an intelligent system approach is used to develop models of the authorized persons based on their voices. In particular, an Adaptive-Network-based Fuzzy Inference System is used in the proposed system to distinguish authorized from unauthorized people. Experimental results confirm the effectiveness of the proposed intelligent voice-based door access control system in terms of the false acceptance rate and the false rejection rate.
Keywords: Access control, voice, intelligent, artificial neural network, security

INTRODUCTION

The personal safety of the population in public and private buildings has always been a concern in daily life. Access control for buildings is an important tool for protecting both the building occupants and the structure itself. One of the important security systems in building security is door access control. Door access control is a form of physical security that secures a room or building by limiting access to specific people and by keeping records of such accesses. The most widespread authentication method for such systems is based on smartcards. A smartcard limits room or building access to only those people who hold an allocated card. However, it is difficult to prevent another person from obtaining and using a legitimate person's card. A conventional smartcard can be lost, duplicated, stolen, forgotten, or impersonated with accuracy.

Due to the limitations of conventional security procedures, a range of biometric verification options are currently under consideration for security systems, including door access control [1,2]. In biometric methods, the idea is to enable automatic verification of identity by computer assessment of one or more behavioral and/or physiological characteristics of a person. Recent biometric methods for personal authentication use features such as the face, the voice, the hand shape, the fingerprint and the iris [1,2]. Each method has its own advantages and disadvantages in terms of usability and security [3]. Among the biometric methods, voice has high usability, which covers simplicity for the user, the user's feeling of resistance, speed of authentication and the level of the false rejection rate [4]. In order to overcome the problems of smartcard-based door access control, this paper introduces an intelligent voice-based door access control system for building security.
The proposed intelligent voice-based access control system is a performance biometric which offers the ability to provide positive verification of identity from an individual's voice characteristics for access to secure locations (e.g. office, laboratory, home). In the proposed system, features are extracted from the person's voice data and then an Adaptive-Network-based Fuzzy Inference System (ANFIS) is used to develop models of the authorized persons based on the features extracted from their voices. First, the prototype of the door access control is described. Next, the speaker verification process used in the proposed system is discussed in detail. Finally, the performance of the proposed intelligent voice-based door access control is evaluated experimentally for door access control in the Intelligent Mechatronics System Laboratory, Faculty of Engineering, International Islamic University Malaysia.

Corresponding Author: Wahyudi, Intelligent Mechatronics Systems Research Group, Department of Mechatronics Engineering, International Islamic University Malaysia, P.O. Box 10, 50728, Kuala Lumpur, Malaysia

PROPOSED VOICE-BASED DOOR ACCESS CONTROL

Proposed system description: Figure 1 shows the schematic diagram of the proposed intelligent voice-based access control. The proposed system basically consists of three main components, namely a voice sensor, a speaker verification system and a door access control. A low-cost microphone of the kind commonly used with computer systems serves as the voice sensor to record the person's voice. The recorded voice is then sent to the voice-based verification system, which verifies the authenticity of the person based on his/her voice.

Fig. 1: Proposed intelligent voice-based door access control system (blocks: Microphone, A/D, Voice-based Verification System, Port, Driver, Magnetic lock)

A personal computer (PC) with a 1.5 MHz Pentium III processor equipped with a sound card is used for the speaker verification implementation. The sound card records the voice data at a sampling frequency of 22 kHz. In this system, all of the voice data processing and speaker verification algorithms are implemented on the PC using MATLAB and its toolboxes. As a result of the voice-based verification, a decision signal that accepts or rejects the access is sent through the parallel port of the PC to the door access control.

As shown in Fig. 2, an electromagnetic lock is attached to the door to control door opening and closing. The electromagnetic lock works on a 12 V DC power supply and is set in the normally closed (NC) condition. Therefore, without a command signal from the verification system, the lock is always switched on and the door remains closed. When a person is verified by the proposed voice-based verification as an authorized user, access is granted. The parallel port sends a signal to the electromagnetic lock driver, shown in Fig. 3, so that the electromagnetic lock is demagnetized. As a result, the door can be opened by that authorized person for a certain period of time.

Fig. 2: Door with electromagnetic lock

Fig. 3: Electromagnetic lock driver

The access control system in general makes four possible decisions: the authorized person is accepted, the authorized person is rejected, the unauthorized person (impostor) is accepted and the unauthorized person (impostor) is rejected. The accuracy of the access control system is then specified by the rates at which the system decides to reject an authorized person and to accept an unauthorized person. The rate at which the access control rejects authorized persons is called the false rejection rate (FRR), and the rate at which it accepts unauthorized persons is called the false acceptance rate (FAR). Mathematically, both rates are expressed as percentages using the following simple calculations [5]:

$$\mathrm{FRR} = \frac{N_{FR}}{N_{AA}} \times 100\% \tag{1}$$

$$\mathrm{FAR} = \frac{N_{FA}}{N_{IA}} \times 100\% \tag{2}$$

where $N_{FR}$ and $N_{FA}$ are the numbers of false rejections and false acceptances respectively, while $N_{AA}$ and $N_{IA}$ are the number of authorized person attempts and the number of impostor attempts. To achieve high security of the door access control system, the proposed system is expected to have both a low FRR and a low FAR.

Voice-based verification system: It is well known that a person's voice not only conveys a message but also indicates the person's identity. Therefore, it can also be used in biometric systems. The use of voice for biometric measurement has become more popular for several reasons: the signal is generated naturally, it is convenient to process and distribute, and it is applicable to remote access. Basically, there are two kinds of voice-based recognition, or speaker recognition: speaker identification and speaker verification [5]. In speaker verification, the system decides whether a person is who he/she claims to be. Speaker identification, on the other hand, decides which person among a group of persons is speaking. Speaker recognition is further divided into two categories, text-dependent and text-independent speaker recognition. Text-dependent speaker recognition recognizes the speaker from specific spoken phrases, whereas in text-independent recognition the speaker can utter any word. The most appropriate method for voice-based door access control is speaker verification, since the objective in access control is to accept or reject a person who wants to enter a specific building or room.

Figure 4 shows the basic structure of the proposed voice-based verification system. As in other biometric-based security systems, there are two phases in the proposed system. The first phase is the training or enrollment phase, shown in Fig. 4(a). In this phase, the authorized persons are registered and their voices are recorded. Features are then extracted from the recorded voices and used to develop models of the authorized persons. The second phase is the testing or operational phase, depicted in Fig. 4(b). In this phase, a person who wants to access the building/room is required to enter the claimed identity and his/her voice. The entered voice is then processed and compared with the claimed person's model to verify the claim. In this phase, there is a decision process in which the system decides whether the features extracted from the given voice match the model of the claimed person. In order to give a definite answer of access acceptance or rejection, a threshold is set. When the degree of similarity between a given voice and the model is greater than the threshold, the system accepts the access; otherwise the system rejects the person's access to the building/room.

Fig. 4: Basic structure of voice-based verification system: (a) training phase (Voice, Feature Extraction, Authorized Person Modeling, Authorized Person Model Database); (b) testing (operational) phase (Voice and Claimed Identity, Feature Extraction, Model Matching, Decision, Acceptance/Rejection, Authorized Person Model Database)

Feature extraction: As shown in Fig. 4, feature extraction is one of the important processes in the proposed system. Feature extraction is the process of converting the raw voice signal into a feature vector which can be used for classification. Features are quantities which are extracted from the preprocessed voice and can be used to represent the voice signal. In general, there are two types of feature extraction technique, namely cepstral-coefficient-based features and prosodic-based features such as fundamental frequency and formant frequency. In this paper, Perceptual Linear Prediction (PLP) coefficients are used as the features in the proposed system.
Figure 5 shows schematically the extraction of the PLP coefficients from the raw voice signal. Perceptual Linear Prediction (PLP), similar to LPC analysis, is based on the short-term spectrum of speech. In contrast to pure linear predictive analysis of speech, PLP modifies the short-term spectrum of the speech by several psychophysically based transformations. In this method, the spectrum is warped according to the Bark scale. PLP uses an all-pole model to smooth the modified power spectrum. The output cepstral coefficients are then computed based on this model [6].

Fig. 5: PLP coefficient extraction process

In summary, the PLP coefficients are calculated in the following steps:

Critical band analysis: First, the voice data is framed and windowed (using an available window function such as the Hamming window) and then transformed into the frequency domain using the Fast Fourier Transform (FFT). Then, the obtained spectrum is warped from the angular frequency $\omega$ to the Bark frequency $\Omega$ using the following formula:

$$\Omega(\omega) = 6 \ln\left\{ \frac{\omega}{1200\pi} + \left[ \left( \frac{\omega}{1200\pi} \right)^{2} + 1 \right]^{0.5} \right\} \tag{3}$$

The power spectrum $S(\Omega)$ of the warped spectrum is then calculated. Next, the warped power spectrum $S(\Omega)$ is convolved with the power spectrum of a simulated critical-band masking curve $\Psi(\Omega)$, which has the following form:

$$\Psi(\Omega) = \begin{cases} 0, & \Omega < -1.3 \\ 10^{2.5(\Omega + 0.5)}, & -1.3 \le \Omega < -0.5 \\ 1, & -0.5 \le \Omega < 0.5 \\ 10^{-2.5(\Omega - 0.5)}, & 0.5 \le \Omega \le 2.5 \\ 0, & \Omega > 2.5 \end{cases} \tag{4}$$

The result of the convolution operation is:

$$\Theta(\Omega_i) = \sum_{\Omega = -1.3}^{2.5} S(\Omega - \Omega_i)\, \Psi(\Omega), \quad i = 1, 2, \ldots, B \tag{5}$$

where $B$ is the number of samples. The sampling intervals are chosen so that the critical bands, added together, represent the frequency scale evenly.

Equal-loudness pre-emphasis: First, an equal-loudness curve is constructed. An approximation of this curve for frequencies up to 5 kHz is

$$E(\omega) = \frac{(\omega^{2} + 56.8 \times 10^{6})\, \omega^{4}}{(\omega^{2} + 6.3 \times 10^{6})^{2}\, (\omega^{2} + 0.38 \times 10^{9})} \tag{6}$$

and the following equation is the approximation for frequencies higher than 5 kHz:

$$E(\omega) = \frac{(\omega^{2} + 56.8 \times 10^{6})\, \omega^{4}}{(\omega^{2} + 6.3 \times 10^{6})^{2}\, (\omega^{2} + 0.38 \times 10^{9})\, (\omega^{6} + 9.58 \times 10^{26})} \tag{7}$$

Then, the simulated equal-loudness curve $E(\omega)$ is used to pre-emphasize the sampled Bark power spectrum $\Theta(\Omega)$. The equal-loudness pre-emphasis gives:

$$L(\Omega(\omega)) = E(\omega)\, \Theta(\Omega(\omega)) \tag{8}$$

Intensity-loudness power conversion: Here, a cubic-root compression of the amplitude is performed as follows:

$$\phi(\Omega) = L(\Omega)^{1/3} \tag{9}$$

Inverse discrete Fourier transform: The power spectrum resulting from the previous amplitude compression is converted back to the time domain using the inverse FFT (IFFT).

Calculation of the all-pole coefficients: The autocorrelation signal obtained from the inverse transform is used to calculate the all-pole coefficients with the well-known Levinson-Durbin algorithm. A detailed discussion of the PLP coefficients can be found in [6].

ANFIS-based speaker model: ANFIS, proposed by Jang [7], is an architecture which functionally integrates the interpretability of a fuzzy inference system with the adaptability of a neural network. Loosely speaking, ANFIS is a method for tuning an existing rule base of a fuzzy system with a learning algorithm of the kind found in artificial neural networks, based on a collection of training data. Because a fuzzy system has fewer tunable parameters than a conventional artificial neural network, ANFIS is trained faster and more accurately than the conventional artificial neural network. An ANFIS which corresponds to a Sugeno-type fuzzy model with two inputs and a single output is shown in Fig. 6. A rule set of a first-order Sugeno fuzzy system has the following form:

Rule $i$: If $x$ is $A_i$ and $y$ is $B_i$ then $f_i = p_i x + q_i y + r_i$.
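As a rough illustration of how a rule set of this form produces an output, the sketch below evaluates a two-rule first-order Sugeno system: the firing strength of each rule is the product of its two membership degrees, and the output is the firing-strength-weighted average of the rule consequents $f_i$. The Gaussian membership shape, the function names and all parameter values are assumptions made for this example, not values from the proposed system.

```python
import math


def gaussian(x, center, sigma):
    """Gaussian membership degree of x (an assumed membership shape)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def sugeno_output(x, y, rules):
    """Evaluate a first-order Sugeno rule set at the input (x, y).

    Each rule holds premise parameters (center, sigma) for A_i and B_i
    and the consequent parameters p, q, r.
    """
    strengths = []
    consequents = []
    for rule in rules:
        # Firing strength: product of the two membership degrees.
        w = gaussian(x, *rule["A"]) * gaussian(y, *rule["B"])
        # Rule consequent: f_i = p_i * x + q_i * y + r_i.
        f = rule["p"] * x + rule["q"] * y + rule["r"]
        strengths.append(w)
        consequents.append(f)
    total = sum(strengths)
    # Weighted average of the consequents (normalized firing strengths).
    return sum(w * f for w, f in zip(strengths, consequents)) / total


# Hypothetical two-rule system; every number here is illustrative only.
RULES = [
    {"A": (0.0, 1.0), "B": (0.0, 1.0), "p": 1.0, "q": 1.0, "r": 0.0},
    {"A": (2.0, 1.0), "B": (2.0, 1.0), "p": 0.0, "q": 0.0, "r": 5.0},
]

print(sugeno_output(1.0, 1.0, RULES))
```

At the input (1, 1) both hypothetical rules fire equally, so the output is the plain average of $f_1 = 2$ and $f_2 = 5$. In a trained model the premise and consequent parameters would come from the design and training procedure rather than being set by hand.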
From the perspective of artificial neural networks, the ANFIS is a feedforward network consisting of five layers.

Fig. 6: ANFIS architecture [8]

Every node in the first layer is an adaptive node with the following node function:

$$O_{1,i} = \begin{cases} \mu_{A_i}(x), & i = 1, 2 \\ \mu_{B_{i-2}}(y), & i = 3, 4 \end{cases} \tag{10}$$

where $x$ (or $y$) is the input to the node and $A_i$ (or $B_{i-2}$) is a linguistic label associated with this node. In other words, $O_{1,i}$ is the membership degree of the input $x$ (or $y$) in the fuzzy set $A$ (or $B$). The membership function for $A$ (or $B$) can be a Gaussian function, a triangular membership function or others. The parameters of the membership functions used in this layer are termed premise parameters. The second layer combines the outputs of the first layer so that it has the following output:

$$O_{2,i} = w_i = \mu_{A_i}(x)\, \mu_{B_i}(y), \quad i = 1, 2 \tag{11}$$

Here each output represents the firing strength of a rule. The third layer normalizes the output of the previous layer as follows:

$$O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2 \tag{12}$$

In the fourth layer, the following output is calculated from the third-layer outputs:

$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i) \tag{13}$$

where $f_i$ is the function used in the first-order Sugeno-type fuzzy system. The parameters in this node ($p_i$, $q_i$ and $r_i$) are referred to as consequent parameters. Finally, the output of the ANFIS is the output of the last layer, given as

$$O_{5,1} = z = \sum_{i} \bar{w}_i f_i \tag{14}$$

The main objective of the ANFIS design is to optimize the ANFIS parameters. There are two steps in the ANFIS design. The first is the design of the premise parameters and the other is the training of the consequent parameters. Several methods have been proposed for designing the premise parameters, such as grid partition, fuzzy c-means clustering and subtractive clustering [8]. Once the premise parameters are fixed, the consequent parameters are obtained from the input-output training data. A hybrid learning algorithm is a popular learning algorithm used to train the ANFIS for this purpose. In summary, the steps of building a person model from the voice data using ANFIS are as follows:

* Voice data collection and feature extraction of the voice data.
* Determining the premise parameters.
* Training of the ANFIS using the input pattern and desired output to obtain the consequent parameters.
* Validation of the trained ANFIS using training data.

RESULTS

Experimental setup: In order to evaluate the effectiveness of the proposed intelligent voice-based door access control, the proposed system is installed at the Intelligent Mechatronics System Laboratory, Faculty of Engineering, International Islamic University Malaysia. Voices of nine (9) speakers from the YOHO database are used in the experiment. Three (3) speakers are considered as the persons authorized to access the laboratory and the other six (6) speakers are assumed to be outside impostors. Each speaker who is assumed to be an authorized person says the word "seven" 70 times, of which 20 voice data are used as training data and the other 50 voice data are used as testing data. This means that a text-dependent speaker verification system is used in the proposed system. An example of the raw voice signal of the word "seven" for an authorized person is shown in Fig. 7. To obtain the PLP coefficients, 17 critical-band filters are used, which cover a 17-Bark frequency range. These filters are simulated by integrating the FFT spectrum of 20-ms Hamming-windowed speech segments, with frames taken every 10 ms. Figure 8 shows the 13 PLP coefficients extracted from the voice signal shown in Fig. 7.

Fig. 7: Example of voice signal of authorized people (magnitude versus time in msec)

Fig. 8: Extracted 13 PLP coefficients (magnitude versus coefficient number)

The effectiveness of the proposed method in the training phase is evaluated by the training time and the classification rate. Moreover, the effectiveness of the proposed system in the testing (operational) phase is evaluated by the FRR and FAR. The FAR is calculated for a closed set and an open set. In the closed-set test, the voice of an authorized person is presented as a disguised voice claiming to be another authorized person. On the other hand, tests on the two students who are regarded as outside impostors constitute the open-set test.

Table 1: Training performances

Model                Authorized person   Training time (sec)   Classification rate (%)
ANFIS 1 (r = 0.25)   Person 1            51                    100
                     Person 2            45                    100
                     Person 3            46                    100
                     Average             47                    100
ANFIS 2 (r = 0.50)   Person 1            25.8                  100
                     Person 2            28.9                  100
                     Person 3            27.8                  100
                     Average             27.8                  100
ANFIS 3 (r = 1.00)   Person 1            0.28                  100
                     Person 2            0.27                  100
                     Person 3            0.41                  100
                     Average             0.32                  100

Training of the ANFIS-based speaker models: The ANFIS-based speaker model is developed using the Fuzzy Logic Toolbox of MATLAB. To allow the ANFIS to learn from the available input-output data so that the consequent parameters are obtained, the structure of the ANFIS first has to be designed. The ANFIS structure is designed by determining the premise parameters. Here the subtractive clustering method is used with different radius parameters. Once the premise parameters are obtained, the ANFIS model is trained using the hybrid learning algorithm for 10 iterations. Table 1 shows the training time and the classification rate for all of the ANFIS-based speaker models for the different subtractive clustering parameters. As shown in the table, all of the speaker models give perfect classification rates. There are no errors in identifying the authorized persons from the voice data used in the training phase. However, the training time differs significantly between radii. A smaller radius causes a longer training time. This is because a smaller cluster radius usually yields more, smaller clusters in the data and hence more rules. More rules in the ANFIS result in a larger number of consequent parameters. As a consequence, a longer time is needed in the training process to optimize the parameters. Hence, it can be concluded that a larger radius is preferable for shortening the training time.

Table 2: Testing performances of ANFIS 1, radius of 0.25

Authorized person   FRR (%)   FAR (%), closed set   FAR (%), open set
Person 1            16        5                     6
Person 2            20        9                     4
Person 3            14        11                    7
Overall             16        8.3                   5.7

Table 3: Testing performances of ANFIS 2, radius of 0.50

Authorized person   FRR (%)   FAR (%), closed set   FAR (%), open set
Person 1            18        5                     6
Person 2            10        7                     6
Person 3            14        8                     7
Overall             14        6.7                   6.3

Table 4: Testing performances of ANFIS 3, radius of 1.00

Authorized person   FRR (%)   FAR (%), closed set   FAR (%), open set
Person 1            36        2                     4
Person 2            6         14                    3
Person 3            16        11                    10
Overall             19.3      9                     5.6

Testing of the ANFIS-based speaker models: Tables 2-4 show the performances of the ANFIS models when the testing voice data is used. In terms of both FAR and FRR, ANFIS 2 produces better performance than the other models. Hence it can be concluded that ANFIS 2 is the best candidate for the voice-based model in the proposed system. From the security point of view, ANFIS 2 is the best model for protecting the laboratory from unauthorized persons (impostors), since it gives the lowest closed-set FAR (6.7%) while its open-set FAR (6.3%) is comparable to that of the other models. The overall FAR of ANFIS 2 is smaller than 10%, which is good enough for a common security system. Where a high level of security is needed, further improvement is required so that the proposed system produces a FAR smaller than 1%. However, although the FRR of ANFIS 2 is also the smallest, it is larger than 10%. Although this does not influence the level of security, a fairly large FRR makes the access control system inconvenient for the authorized persons. Further work is needed to improve the usability of the ANFIS-based model for access control systems.
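The rate calculations of Eqs. (1) and (2) and the accept/reject threshold rule can be sketched in a few lines. The sketch below is only illustrative: the similarity scores and the threshold value are hypothetical, and the function names are chosen for this example. The per-person rates are the ones reported for ANFIS 2 (radius of 0.50); averaging them reproduces that table's "Overall" row, which suggests, but does not confirm, that the overall figures are plain averages of the per-person rates.

```python
def false_rejection_rate(n_false_rejections, n_authorized_attempts):
    """FRR in percent, Eq. (1): false rejections over authorized attempts."""
    return 100.0 * n_false_rejections / n_authorized_attempts


def false_acceptance_rate(n_false_acceptances, n_impostor_attempts):
    """FAR in percent, Eq. (2): false acceptances over impostor attempts."""
    return 100.0 * n_false_acceptances / n_impostor_attempts


def count_rejections(similarities, threshold):
    """Count attempts rejected because the similarity is not above the threshold."""
    return sum(1 for s in similarities if not s > threshold)


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


# Hypothetical similarity scores of five authorized attempts, hypothetical threshold.
authorized_scores = [0.9, 0.4, 0.8, 0.7, 0.3]
rejected = count_rejections(authorized_scores, threshold=0.5)
frr_example = false_rejection_rate(rejected, len(authorized_scores))

# Per-person rates (%) as reported for ANFIS 2 (radius of 0.50).
frr_anfis2 = [18, 10, 14]
far_closed_anfis2 = [5, 7, 8]
far_open_anfis2 = [6, 6, 7]
overall = (
    round(mean(frr_anfis2), 1),
    round(mean(far_closed_anfis2), 1),
    round(mean(far_open_anfis2), 1),
)

print(frr_example, overall)
```

If the FRR of each authorized person is computed over that person's 50 test utterances (an assumption; the attempt counts are not listed with the results), an FRR of 18% corresponds to `false_rejection_rate(9, 50)`, that is, 9 false rejections.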
CONCLUSION

This study has documented the development of an intelligent voice-based door access control system for building security. The proposed system adopted Perceptual Linear Prediction (PLP) coefficients as the features of a person's voice and used Adaptive-Network-based Fuzzy Inference Systems (ANFIS) to develop models of the authorized persons based on their voices. Experimental results showed that the proposed system produced good security performance; in particular, it gave a good false rejection rate (FRR) and a good false acceptance rate (FAR) under the closed-set condition. However, further study is needed to improve its FRR.

REFERENCES

1. Kung, S.Y., M.W. Mak and S.H. Lin, 2004. Biometric Authentication: Machine Learning Approach. Prentice Hall.
2. Zhang, D.D., 2000. Automated Biometrics: Technologies and Systems. Kluwer Academic Publisher.
3. Osadciw, L., P. Varshney and K. Veeramachaneni, 2002. Improving Personal Identification Accuracy Using Multisensor Fusion for Building Access Control Application. In Proceedings of the Fifth Intl. Conf. Information Fusion, pp: 1176-1183.
4. Anonymous, 2004. Door-access-control System Based on Finger-vein Authentication. Hitachi Review. Available online at http://www.hitachi.com/rev/archive/2004.
5. Campbell, J.P., 1997. Speaker Recognition: a Tutorial. In Proc. IEEE, pp: 1437-1462.
6. Hermansky, H., 1990. Perceptual linear predictive (PLP) analysis of speech. J. Acoustical Society of America, pp: 1738-1752.
7. Jang, J.S., 1993. Adaptive-network-based fuzzy inference system. IEEE Trans. on Systems, Man and Cybernetics, pp: 665-685.
8. Jang, J.S.R., C.T. Sun and E. Mizutani, 1997. Neuro-Fuzzy and Soft Computing. Prentice Hall, Upper Saddle River, NJ, USA.