Static dynamic features and hybrid deep learning models based spoof detection system for ASV


ORIGINAL ARTICLE

Static dynamic features and hybrid deep learning models based spoof detection system for ASV

Aakshi Mittal 1 · Mohit Dua 1
Corresponding author: Mohit Dua
1 Department of Computer Engineering, National Institute of Technology, Kurukshetra, India
Received: 2 April 2021 / Accepted: 12 October 2021 / © The Author(s) 2021

Abstract
Detection of spoofed speech is essential for improving the performance of current Automatic Speaker Verification (ASV) systems. Strengthening both the frontend and the backend helps in building robust ASV systems. First, this paper discusses a performance comparison of static and static-dynamic Constant Q Cepstral Coefficient (CQCC) frontend features, using a Long Short Term Memory (LSTM) with Time Distributed Wrappers model at the backend. Second, it presents a comparative analysis of ASV systems built with three deep learning backends, LSTM with Time Distributed Wrappers, LSTM and Convolutional Neural Network (CNN), all using static-dynamic CQCC features at the frontend. Third, it discusses the implementation of two spoof detection systems for ASV that use the same static-dynamic CQCC frontend but different combinations of deep learning models at the backend. The first is a voting protocol based two-level spoof detection system that uses the CNN and LSTM models at the first level and the LSTM with Time Distributed Wrappers model at the second level. The second is a two-level spoof detection system with a user identification and verification protocol, which uses an LSTM model for user identification at the first level and LSTM with Time Distributed Wrappers for verification at the second level. For implementing the proposed work, a variation of the ASVspoof 2019 dataset has been used to bring all types of spoofing attacks, Speech Synthesis (SS), Voice Conversion (VC) and replay, into a single dataset. The results show that, at the frontend, static-dynamic CQCC features outperform static CQCC features and, at the backend, hybrid combinations of deep learning models increase the accuracy of spoof detection systems.

Keywords ASV · Spoof detection · CQCC · LSTM · CNN

Introduction

Building a robust spoof detection system for Automatic Speaker Verification (ASV) is now an essential task, as attention to and demand for voice protected authentication systems is increasing among users of smart devices. According to a survey, users are eagerly looking forward to using speech driven authentication systems [1]. An ASV system verifies whether the input speech signal is actually spoken by the authentic user or generated through tricks by an imposter to gain access to the legitimate user's account. With the availability of low cost voice sensors and advanced research in mathematical and logical techniques for generating synthetic speech, the number of spoofing attack types is also increasing. Speech synthesis (SS), voice conversion (VC), replay, mimicry and twins attacks are the most potent spoofing attacks against these types of systems. An SS attacked utterance is generated by a text to speech technique [2]. VC speech signals are generated by converting the imposter's voice into the legitimate user's voice with the help of transformation functions [3-5]. Replay attacks are one of the easiest forms of attack, in which the spoofed speech is a recorded voice signal of the targeted user [6]. For mimicking the legitimate user's voice, a professional manipulates his/her speech features. The twins attack is also a kind of mimicry attack [7, 8].
In some cases, twin siblings are able to get access to each other's voice locked accounts [5, 9]. SS and VC attacks can be injected into the system via the transmission channel; hence, these attacks are named Logical Access (LA) attacks [9]. Replay, mimicry and twins attacks are inserted into the system through the microphone; hence, these attacks are known as Physical Access (PA) attacks.

The performance of ASV systems is greatly affected in the presence of these spoofing attacks [10]. Various speech corpora enriched with different kinds of spoofing attacks have been proposed. For instance, the ASVspoof 2015 dataset includes SS and VC attacks [11], the ASVspoof 2017 dataset includes only the replay attack [12], the YOHO dataset includes mimicry attacks [13], etc. The recently proposed ASVspoof 2019 dataset includes SS, VC and replay attacks, however, in two separate sets. This paper presents an initiative of putting all kinds of attacks into a single dataset. Along with attack coverage, robust designs of the frontend and backend of an ASV system can become a preventive shield against spoofing attacks. The frontend of an ASV system uses a speech feature extraction technique to extract useful information from the recorded speech signal. Cepstrum domain features such as Mel Frequency Cepstrum Coefficients (MFCC), Inverse Mel Frequency Cepstrum Coefficients (IMFCC) [14], Linear Frequency Cepstrum Coefficients (LFCC) and Constant Q Cepstrum Coefficients (CQCC) have performed remarkably well for spoof detection tasks, as well as for speech and speaker recognition tasks, since these techniques model the human vocal tract and human auditory system very well [15-17]. The human ear is known to be largely insensitive to the phase of sound; however, the phase factor can still be utilized for the frontend of speech driven devices [18, 19] by using the All Pole Group Delay Function (APGDF), the Modified Group Delay Function (MODGDF), etc. Both static and dynamic coefficients of speech features deliver contextual and speaker specific information. These coefficients are passed to the backend spoof detection model. CQCC features were specially designed for spoof detection tasks in the ASV systems of [20, 21], where it is claimed that these features perform better than Instantaneous Frequency Cosine Coefficients (IFCC), MFCC and Epoch Features (EF). The proposed work in this paper also exploits a hybrid of static and dynamic CQCC features for developing the frontend, and it presents a performance comparison of static and static-dynamic CQCC features using a Long Short Term Memory (LSTM) with Time Distributed Wrappers model at the backend. Various machine learning techniques, Gaussian Mixture Model (GMM), Hidden Markov Model (HMM) [22-24], Convolutional Neural Network (CNN), Long Short Term Memory (LSTM), etc., play a crucial role in classification tasks, including speech based systems [25, 26]. In the case of an ASV system, the backend classification model takes the speech features as input and classifies the signal as spoofed or bonafide after analyzing the speaker specific information in them. In the initial research, GMM was used effectively as the backend model [27]. As deep learning algorithms improve day by day, the ASV community has started to use CNN and LSTM models [28-30]. In various speech and speaker recognition tasks, LSTM based deep learning models perform better than other models; however, CNN models also give satisfactory results [31-33]. Furthermore, different arrangements of frontend and backend models can bring smoothness and accuracy to the spoof detection task.
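To make the static-dynamic idea concrete, the following is a minimal Python sketch of how constant-Q based cepstral features and their delta (dynamic) and delta-delta coefficients could be computed with librosa and scipy. The hop length, bins per octave and coefficient count are illustrative assumptions rather than the exact settings of the proposed frontend, and the uniform resampling step of the full CQCC recipe is deliberately omitted here.

```python
import numpy as np
import librosa
import scipy.fftpack


def cqcc_like_features(path, n_coeff=30, bins_per_octave=48, hop_length=256):
    """Rough CQCC-style features: CQT -> log power spectrum -> DCT,
    followed by delta and delta-delta coefficients (illustrative settings).
    The full CQCC recipe also resamples the geometrically spaced CQT bins
    onto a uniform grid before the DCT, which this sketch skips."""
    y, sr = librosa.load(path, sr=None)
    # Constant Q Transform of the raw waveform (7 octaves above the default fmin)
    cqt = librosa.cqt(y, sr=sr, hop_length=hop_length,
                      n_bins=bins_per_octave * 7, bins_per_octave=bins_per_octave)
    log_power = np.log(np.abs(cqt) ** 2 + 1e-10)
    # DCT along the frequency axis gives cepstral coefficients
    static = scipy.fftpack.dct(log_power, type=2, axis=0, norm="ortho")[:n_coeff]
    delta = librosa.feature.delta(static, order=1)    # first derivative (D)
    delta2 = librosa.feature.delta(static, order=2)   # second derivative (DD)
    return np.vstack([static, delta, delta2])          # shape: (3 * n_coeff, frames)
```

With 30 coefficients of each type, this yields a 90 x m_frames style matrix of the kind the paper's find_cqcc_features () function returns; zero padding to a 400 frame minimum and z-score normalization would be applied on top of it.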
The rest of the paper is organized as follows: the second section discusses the related work, the third section discusses the proposed method, the experimental setup details and results are presented in the fourth section, the fifth section explains the performance analysis of the proposed models and systems, the sixth section compares the proposed systems with existing systems, and the seventh section concludes the proposal and sheds some light on future directions.

Related works

This section discusses the related works in this area. The literature is rich with experiments on various audio feature extraction techniques at the frontend and different classification models at the backend. The work by Valenti et al. [34] discusses an approach in which the speech signal is passed end to end to an evolving Recurrent Neural Network (RNN). Their system is designed with an RNN and neuroevolution of augmenting topologies, and it considers the replay attack in particular. The review by Kamble et al. [35] presents a wide analysis of many existing ASV spoof detection systems from the perspective of the ASVspoof challenges. Lai et al. [36] proposed an Attentive Filtering Network and ResNet classifier based system to detect replay attacks, where the attention based filtering approach is used to improve feature representations. Their work used the ASVspoof 2017 Version 2.0 dataset to attain a very low Equal Error Rate (EER), and the authors claimed an improvement of about 30% over the enhanced ASVspoof 2017 baseline system. The ASVspoof 2019 challenge puts the three different types of attacks in one dataset and presents baseline models with LFCC and CQCC features at the frontend and GMM at the backend [27]. Chettri et al. [10] trained various deep learning backend models and tested them with different feature extraction approaches at the frontend; these backend models were further combined into three ensemble models, and all the systems were tested for physical access and logical access attacks. Recently, Dua et al. [30] also proposed an ensemble approach using LSTM based deep learning models at the backend and three different feature extraction techniques, Constant Q Cepstral Coefficients (CQCC), Inverse Mel Frequency Cepstral Coefficients (IMFCC) and MFCC, at the frontend. The authors claimed that their proposed ensemble model with CQCC features outperforms some already existing ASV systems.

Motivated by these works, the proposed work in this paper compares the performances of different deep learning models at the backend by using them with static-dynamic CQCC features at the frontend. The implemented work also uses combinations of LSTM and CNN models for the development of the backend. In addition, two two-level spoof detection systems for ASV using static-dynamic features at the frontend are implemented. The first system is a voting protocol based implementation that uses the CNN and LSTM models at the first level and the LSTM with Time Distributed Wrappers model at the second level. The second system uses an LSTM model for user identification at the first level and LSTM with Time Distributed Wrappers for verification at the second level. These systems can bring new insights into the development of spoof detection methods for ASV.

Proposed method

This section of the paper discusses the architecture of the proposed ASV system. Figure 1a shows the frontend and backend arrangement that has been used for the comparison of static CQCC and static-dynamic CQCC features in the implemented ASV system. Speech signals taken from the dataset are applied to the frontend, where static CQCC features are extracted with the general extraction process and static-dynamic hybrid features are extracted with the proposed methodology. These features, along with the labels from the dataset, are then applied to the backend model that runs the classification, and the classification results are used for the feature comparison. Figure 1b shows the frontend and backend arrangement that has been used for the comparison of various deep learning models while keeping static-dynamic CQCC features at the frontend. The frontend used in this arrangement is the best performing feature extraction technique from the feature comparison. Speech signals and labels are part of the same dataset in the whole arrangement. The backend here contains all the proposed models for spoof detection and a single model for the speaker identification task. At the backend, all chosen models are trained and their performances are analyzed. The systems of Fig. 2 are arrangements of the models from Fig. 1. Figure 2a shows the block diagram of the voting protocol based two-level spoof detection system. This system classifies the speech signal according to the voting protocol that is implemented with the help of level 1 and level 2: level 1 analyzes the input, which is further analyzed at level 2 as per the protocol to declare the decision. Figure 2b gives the block diagram of the two-level user identification and verification system. This two-stage arrangement makes use of a speaker identification model at stage 1, the result of which is passed to stage 2. Stage 2 uses the user identification and verification protocol along with the chosen backend model to declare the classification result.

Table 1 AllSpoofsASV dataset

Sets          Bonafide    SS & VC spoofed    Replay spoofed
Training                  22,800             48,600
Development               22,296             24,300
Evaluation    25,445      63,882             116,640

The pointwise contributions of the proposed work are as follows, and the following subsections discuss each component in detail. This paper promotes the development of a single countermeasure that handles every kind of spoofing attack considered; therefore, the initiative of modifying the used dataset is taken, and the AllSpoofsASV dataset (Fig. 1) is generated as a variation of the standard dataset. Selection of suitable features for the frontend is essential.
This work tests whether static CQCC features or a combination of static and dynamic CQCC features perform better at the frontend, with an LSTM with Time Distributed Wrappers model at the backend in both cases. Different deep learning models, LSTM, LSTM with Time Distributed Wrappers and CNN based systems, are implemented with static-dynamic CQCC features to measure their performances individually. A voting protocol based implementation using the CNN and LSTM models at the first level and the LSTM with Time Distributed Wrappers model at the second level is done, and another implementation using an LSTM model for user identification at the first stage and LSTM with Time Distributed Wrappers for verification at the second stage is performed.

AllSpoofsASV dataset

A generated variant of the ASVspoof 2019 dataset is used for building the proposed ASV systems. The ASVspoof 2019 dataset is provided by the ASVspoof challenge community [37]. The design of this dataset is intended to tackle SS, VC and replay attacks in ASV systems. The LA set of the dataset includes SS and VC spoofed utterances and the PA set includes replay attacked utterances [27]. All the audios are recorded in English and are 2 to 8 s in length; however, most audios lie between 4 and 6 s in both sets. The proposed system makes use of both the LA and PA sets by mixing them into a single set, the AllSpoofsASV dataset. Mixing the sets makes it possible to develop spoof detection systems in one run for all kinds of spoofing attacks included in the used dataset. Table 1 shows the number of bonafide, SS and VC spoofed, and replay spoofed utterances in the training, development and evaluation sets of the AllSpoofsASV dataset.
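As an illustration of how such a merged set could be assembled, the sketch below concatenates the LA and PA countermeasure protocol lists of ASVspoof 2019 into one labelled list. The protocol file names shown, the whitespace-separated column layout (speaker id, utterance id, system or environment fields, then the bonafide/spoof key) and the output representation are assumptions for illustration, not the authors' actual preparation script.

```python
# Assumed names of the ASVspoof 2019 CM training protocol files (adjust to the local copy).
PROTOCOLS = {
    "LA": "ASVspoof2019.LA.cm.train.trn.txt",  # SS and VC spoofed utterances
    "PA": "ASVspoof2019.PA.cm.train.trn.txt",  # replay spoofed utterances
}


def load_protocol(path, subset):
    """Read one CM protocol file: each line is assumed to list a speaker id,
    an utterance id, system/environment fields and finally the key
    ('bonafide' or 'spoof')."""
    entries = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            utt_id, key = fields[1], fields[-1]
            entries.append((subset, utt_id, key))
    return entries


def build_allspoofs_list(protocols=PROTOCOLS):
    """Merge the LA (SS/VC) and PA (replay) lists into one AllSpoofsASV-style set."""
    merged = []
    for subset, path in protocols.items():
        merged.extend(load_protocol(path, subset))
    return merged


if __name__ == "__main__":
    merged = build_allspoofs_list()
    print(len(merged), "utterances,",
          sum(1 for _, _, k in merged if k == "bonafide"), "bonafide")
```

The development and evaluation protocol files could be merged in the same way to produce the three sets summarized in Table 1.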

Fig. 1 a Proposed ASV system for feature comparison. b Proposed system for deep learning models

Fig. 2 a Voting protocol based two-level spoof detection. b Two-level spoof detection system with user identification and verification

Feature extraction using CQCC features

Constant Q Cepstral Coefficient (CQCC) feature extraction is used for extracting useful information from the recorded speech signal during both the training and testing phases of an ASV system. In recent years, this technique has proved to be among the most promising for the development of robust and accurate ASV systems [20, 21]. The mathematical representation of the CQCC feature extraction approach is:

$C_{CQF}(e) = \mathrm{CQT}(p(n))$    (1)

$C_{CQCC}(j) = \sum_{e=1}^{E} \log \left| C_{CQF}(e) \right|^{2} \cos\!\left(\frac{j\,(e-0.5)\,\pi}{E}\right)$    (2)

Here, Eq. (1) computes the Constant Q Transform (CQT) of the input speech signal p(n) in C_CQF(e), and Eq. (2) computes the j-th CQCC coefficient C_CQCC(j), where E is the number of linearly spaced bins and e indexes those bins. The process of CQCC feature extraction applies the Constant Q Transform (CQT) and then takes the log of the power spectrum [38]. Before calculating the Discrete Cosine Transform (DCT), it also applies resampling [39, 40]. It then sets the number of feature coefficients and returns the CQCC features. The proposed system uses the find_cqcc_features () function for implementing CQCC feature extraction. This function applies the actual CQCC feature extraction process to the speech signal: it takes an audio file as input and returns a 90 x m_frames matrix with 30 static, 30 delta (D) and 30 delta-delta (DD) features for m_frames frames, where m_frames denotes the number of audio frames extracted, depending on the length of the input audio.

Firstly, it sets the initial values for the number of bins per octave b, the maximum frequency N_max, the minimum frequency N_min, the number of desired coefficients of each type n_coeff and the feature type f_type. Here, the feature type f_type can be static (S), delta (D) or delta-delta (DD). Secondly, it calls the find_cqcc () function, which takes all these initialized values as input and outputs the static, delta or delta-delta features. The algorithm in this function starts with the calculation of the gamma value, which is one of the parameters of the CQT application process. Then, it calculates the log power spectrum of the CQT output, which is resampled before calculating the DCT. The functions performing these operations are discussed further in this section, including the inputs they take, the operations they apply and the nature of their outputs. The algorithm then keeps only the desired number of features and returns the static, delta or delta-delta coefficients according to the value of f_type. Finally, find_cqcc_features () combines all types of coefficients into one matrix and finds the number of frames. This function ensures a minimum of 400 frames in the output; if the number of frames is less than 400, zero padding is done, and the final matrix is the desired CQCC feature matrix. The whole process uses some inbuilt functions from different Python and MATLAB libraries [41, 42]. In the proposed work, these functions are named according to their functionality and are described further in this section. Function 1, given in the Appendix, gives the pseudo code for find_cqcc_features (), which calls find_cqcc () to compute CQCC features.

audioread (): This function takes an audio file (audio_file) as input and returns its time series y and sampling rate N_s. The number of values in the time series y depends on the length of the audio file, which in turn contributes to the number of frames.

zscore (): This function calculates the row wise z-score for each value of the input matrix. As the values coming out of the find_cqcc () function lie in a continuous range from small to large, applying this function normalizes them. The general formula for the z-score is given by Eq. (3):

$\mathrm{zscore} = \frac{x - \mu}{\sigma}$    (3)

Here, x is the element value to be normalized, μ is the mean of the values of the entire row and σ stands for the standard deviation of those values.

length (): This function takes a matrix as input and outputs the number of columns in it.

zero_padding (): This function adds extra columns of zero values up to the desired number of columns in a given matrix.

cqt (): This function applies the Constant Q Transform (CQT) to the representative values of a speech signal, converting the signal into the frequency domain while maintaining a constant Q factor across the signal. gamma_value is a parameter to this function, calculated from the number of bins b per octave in the speech signal using Eq. (4):

$\mathrm{gamma\_value} = 228.7 \times \left(2^{1/b} - 2^{-1/b}\right)$    (4)

log (): This function applies the logarithm operation to the input values. The logarithm is calculated for the squared spectrum that is the output of the cqt () function.

resample (): This function converts the geometrically spaced bins provided by the CQT into linearly spaced bins, which makes the signal compatible with the Discrete Cosine Transform (DCT).

dct (): This function applies the DCT internally.
Application of the DCT is helpful in signal compression, conversion from the frequency domain to the cepstral domain, etc.

cut (): This function cuts a matrix down to the desired number of rows.

delta (): This function calculates the derivative of the applied values.

Backend classification using deep learning models

This section gives brief details of the deep learning models that are used at the backend of the different architectures proposed in this paper.

Long short term memory (LSTM) with time distributed wrappers (M 1)

The proposed Long Short Term Memory (LSTM) network shown in Fig. 3 comprises three time distributed dense layers, each with a ReLU activation function. Time distribution wrapped layers are especially suitable for time varying data frames such as audio and video. The proposed LSTM model (M 1) has 32, 16 and 10 units in its time distributed dense layers, in this order. The number of units inside the layers is given to provide readers a finer grained view of the model structure; the motivation for choosing these numbers of neurons is taken from related work [30, 31]. After that, a 15% dropout is applied to disable the effect of some randomly selected neurons; the dropout layer prevents the model from overfitting. In the M 1 model, this operation is followed by three LSTM layers having 10, 20 and 30 units, in this order. These layers are again followed by a 10% dropout, and the result of the dropout is passed to a dense layer with a sigmoid activation function.
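A minimal Keras sketch of the M 1 architecture as described above is given below. The input shape of 400 frames by 90 coefficients follows the 90 x m_frames feature matrix with the 400 frame minimum mentioned earlier, but the exact input arrangement, layer options and compile settings are assumptions rather than the authors' exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_m1(frames=400, n_features=90):
    """Sketch of M 1: time distributed dense layers, dropout, stacked LSTMs,
    dropout, and a sigmoid output for the bonafide/spoofed decision."""
    model = models.Sequential([
        tf.keras.Input(shape=(frames, n_features)),
        layers.TimeDistributed(layers.Dense(32, activation="relu")),
        layers.TimeDistributed(layers.Dense(16, activation="relu")),
        layers.TimeDistributed(layers.Dense(10, activation="relu")),
        layers.Dropout(0.15),
        layers.LSTM(10, return_sequences=True),
        layers.LSTM(20, return_sequences=True),
        layers.LSTM(30),                        # last LSTM returns a single vector
        layers.Dropout(0.10),
        layers.Dense(1, activation="sigmoid"),  # score between 0 and 1
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

The M 2, M 3 and M 4 models described next could be sketched in the same way by swapping the layer stack.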

Fig. 3 Long short term memory (LSTM) with time distributed wrappers (M 1)

Fig. 4 Long short term memory (LSTM) (M 2)

Long short term memory (LSTM) (M 2 & M 4)

The proposed Long Short Term Memory (LSTM) network shown in Fig. 4 takes its input on the first LSTM layer, which is followed by two more LSTM layers. These layers have 10, 20 and 30 LSTM units, in this order, chosen as per the results shown in [30, 31]. The output of these layers is passed to a dense layer of 24 units after applying a 10% dropout. The output of this dense layer is then passed, after another 10% dropout, to the last layer, which is a dense layer with a sigmoid activation function. An LSTM model (M 4) with a similar architecture has 20, 30 and 400 units, in this order, in its first three LSTM layers (Fig. 4); all its dropout and dense layers have the same specifications.

Two-dimensional convolutional neural network (2D CNN) (M 3)

As shown in Fig. 5, the two-dimensional convolutional layer (Conv2D) of the proposed Two-Dimensional Convolutional Neural Network (2D CNN) (M 3) comprises 24 filters of 3 x 3 kernel size along with the ReLU activation function. After that, a batch normalization layer is added, which is itself followed by three blocks of Conv2D and two-dimensional (2D) max pooling layers. The Conv2D layers of these blocks have 16 filters of 5 x 5 kernel size, and the 2D max pooling layers have a 2 x 2 pool size. These blocks are followed by a flatten layer, which is followed by a dense layer of 10 units. After that, a 10% dropout is applied to avoid overfitting of the model. The last layer of this 2D CNN model is a dense layer with a sigmoid activation function.

Spoof detection systems

This section discusses the two spoof detection systems (System_1 and System_2) that are developed for the implementation of the proposed ASV system. Both System_1 and System_2 use the static-dynamic hybrid combination of CQCC features at the frontend and different arrangements of the M 1, M 2, M 3 and M 4 models at the backend.

Voting protocol based two-level ASV system (System_1)

The two-level ASV system with voting protocol, i.e. System_1, focuses on the spoof detection task.

Fig. 5 Two-dimensional convolutional neural network (2D CNN) (M 3)

Fig. 6 Two-level ASV system with Voting Protocol (System_1)

It accepts the input speech signal if it is bonafide and rejects it if it is spoofed by any of the SS, VC and replay attacks. Models M 1, M 2 and M 3 provide the corresponding labels, bonafide or spoofed, as output. Figure 6 shows the proposed System_1, which has models M 2 and M 3 at the first level and M 1 at the second level, where F is treated as a global variable. The purpose of putting models M 2 and M 3 at level one is that both of these models are equally good when evaluated for Equal Error Rate (EER), which adds fairness to the classification result of this level. M 1 is the most powerful model and is therefore put at the second level. Firstly, each input audio file is applied to the models M 2 and M 3. Then, the voting protocol is applied to their decisions. A find_binary () function maps these decisions to Boolean values, i.e. FALSE for a spoofed decision (due to any of SS, VC and replay) and TRUE for a bonafide decision made by the model. The voting protocol compares the find_binary () outputs of both first level models. If the outputs of both models are the same, that output is returned as the final classification result of the system. Otherwise, the audio file is tested on model M 1 at the second level, and its classification result, after being passed to the find_binary () function, is returned. In the end, the proposed system returns TRUE or FALSE for the input speech being bonafide or spoofed, respectively. Function 2, added in the Appendix, gives the pseudo code for the implemented voting protocol that uses the find_binary () function.

Two-level ASV system with user identification and verification (System_2)

System_2, as shown in Fig. 7, also executes its process in two stages/levels. In the first stage, it identifies the user id for the applied speech signal. Then, the user's voice signal is verified as bonafide or spoofed in the second stage of the system. The system uses the User Identification and Verification Protocol to accomplish this task, where F and I are treated as global variables. As a result, the system identifies the validity of the claimer along with the genuineness of the applied speech signal. Firstly, the input audio signal is applied to the model M 4 of the first stage. Model M 4 predicts the identity of the user (U i) out of the n already registered users. This predicted identity is supplied to stage 2, where the user identification and verification protocol is applied. At this stage, n instances {(M 1 U 1), (M 1 U 2), ..., (M 1 U n)} of model M 1 reside, which are trained for the n users {U 1, U 2, ..., U n}. Model M 1 checks whether the speech signal is bonafide or spoofed at this stage, and the decision is mapped to an integer value in variable A. The set_ternary () function maps A to THREE if U i and I are not the same, to ONE if the decision is bonafide and U i and I are the same, and to TWO if the decision is spoofed.

Fig. 7 Two-level ASV system with user identification & verification (System_2)

At the output, if A is ONE then the user is valid and the speech is bonafide, if A is TWO then the user is invalid and the speech is spoofed, and if A is THREE then the user is invalid. Function 3, appended in the Appendix, gives the pseudo code for the implementation of System_2.

Experimental setups

This section of the paper deals with the experimental details for the implementation of the proposed ASV system. The frontend feature extraction is implemented using Octave on the Linux operating system. The training, development and evaluation of the backend models are done with the Anaconda platform on the Windows operating system. All the used audios and labels are taken from the training, development and evaluation sets of the AllSpoofsASV dataset. During the training of the deep learning models, Python's inbuilt facilities are used for weight updation, that is, the backpropagation algorithm and the loss functions. For the two class classification problems, binary cross entropy loss is used as the loss function; it produces a probability or score for an utterance between zero and one. The categorical cross entropy loss function is used for the multi class classification of user identities (specifically in the training of M 4). A learning rate is required for the iterative updation of weights during the training process. In the proposed work, the ADAM (Adaptive Moment Estimation) optimizer is used to achieve an adaptive learning rate [43, 44]. It combines the advantages of the Adaptive Gradient Algorithm (AdaGrad) and the Root Mean Square Propagation Algorithm (RMSProp): AdaGrad defines a learning rate for each parameter to improve performance on sparse gradients, whereas RMSProp uses an average of the latest values of the gradients of the weights. The ADAM algorithm passes both the gradient and the squared gradient to an exponential moving average function. For heavy models and large datasets, it can solve practical problems efficiently [43-45]. The system arrangements for the different comparisons and analyses are discussed later in this section. The performance of the proposed architectures and systems is evaluated with the help of two evaluation measures, Equal Error Rate (EER) and percentage accuracy: the spoof detection systems are evaluated using EER and the user identification system is evaluated using percentage accuracy. EER is the value at which the False Acceptance Rate (FAR) and False Rejection Rate (FRR) are equal [27, 28], where FAR is the ratio of the number of spoofed utterances having a score greater than or equal to the threshold to the total number of spoofed utterances, and FRR is the ratio of the number of bonafide utterances having a score less than the threshold to the total number of bonafide utterances. The mathematical representations of FAR and FRR are given by Eqs. (5) and (6), respectively. The EER computation sweeps the threshold to calculate FAR and FRR; the value at which these two rates become equal is declared as the EER of the system.

$\mathrm{FAR} = \frac{\text{count of spoofed utterances with score} \geq \text{threshold}}{\text{total count of spoofed utterances}}$    (5)

$\mathrm{FRR} = \frac{\text{count of bonafide utterances with score} < \text{threshold}}{\text{total count of bonafide utterances}}$    (6)

Percentage accuracy is calculated from the number of correct predictions and the total number of input samples to be checked. The mathematical formula for percentage accuracy is given by Eq. (7):

$\text{Percentage Accuracy} = \frac{\text{Count(correct predictions)}}{\text{Count(input samples)}} \times 100$    (7)

In this case, the total number of correctly predicted user samples is divided by the total number of user input samples and multiplied by 100.
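The EER defined by Eqs. (5) and (6) can be computed from a set of scores and labels by sweeping the threshold. The following is a small numpy sketch of that computation; the score convention (higher score meaning more bonafide-like) is an assumption consistent with the FAR/FRR definitions above.

```python
import numpy as np


def compute_eer(scores, labels):
    """Equal Error Rate from per-utterance scores.
    scores: floats, higher means 'more bonafide-like' (assumed convention).
    labels: 1 for bonafide utterances, 0 for spoofed utterances."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    thresholds = np.sort(np.unique(scores))
    bona, spoof = scores[labels == 1], scores[labels == 0]
    far = np.array([(spoof >= t).mean() for t in thresholds])  # Eq. (5)
    frr = np.array([(bona < t).mean() for t in thresholds])    # Eq. (6)
    idx = np.argmin(np.abs(far - frr))   # point where the two rates cross
    return (far[idx] + frr[idx]) / 2.0


# Example: EER of a toy score set
print(compute_eer([0.9, 0.8, 0.4, 0.3, 0.7, 0.2], [1, 1, 1, 0, 0, 0]))
```

Percentage accuracy (Eq. (7)) reduces to `(predictions == labels).mean() * 100` over the identification outputs.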
Frontend feature extraction

For the spoof detection task, model M 1 is first trained with only the 30 static CQCC features, calculated by making some modifications to the find_cqcc_features () function. The mean over the m_frames frames is used for each of the 30 coefficients, so a vector of 1 x 30 dimensions is extracted in the static case, and model M 1 is trained for up to five epochs with a batch size of 512. Secondly, model M 1 is trained with the static-dynamic hybrid CQCC features calculated by the find_cqcc_features () function. All 30 static, 30 delta and 30 delta-delta CQCC features for all m_frames frames (without taking the mean) are used in this arrangement.

Table 2 Comparative analysis of different CQCC features. Columns: Features; Development set EER over five runs (D 1) to (D 5) with average (mean + sd); Evaluation set EER. Rows: Static CQCC; Static-Dynamic CQCC. Values in bold show the final and best performing results.

Table 3 Comparison of backend spoof detection models. Columns: Model; Development set EER over five runs (D 1) to (D 5) with average (mean + sd); Evaluation set EER; System_1 EER. Rows: M 1; M 2; M 3. Values in bold show the final and best performing results.

Table 4 Performance analysis for LSTM (M 4). Columns: Model; %Accuracy over five runs (D 1) to (D 5) with average (mean + sd); Evaluation set %Accuracy. Row: M 4.

A matrix of 90 x m_frames dimensions is extracted for each audio in this case. To balance the comparison criteria, this arrangement has also been trained for up to five epochs with a batch size of 512. The Equal Error Rate (EER) of both arrangements is computed to compare the performances of the feature sets. The comparative analysis on the evaluation data with both feature sets is shown in Table 2.

Backend deep learning models with System_1

The proposed work compares the performance of all the backend deep learning models M 1, M 2 and M 3, implemented individually, with the voting protocol based System_1, using static-dynamic CQCC features at the frontend and the AllSpoofsASV dataset. Model M 1 is trained with a batch size of 512 for up to five epochs, model M 2 is trained with a batch size of 512 for 20 epochs and model M 3 is trained with a batch size of 500 for 15 epochs. For the training of all three models, a patience of two is used as the early stopping criterion, the binary cross entropy loss function is used to measure the loss and the ADAM optimizer is used for optimization in both systems [43, 44]. As described earlier, the trained models M 2 and M 3 are used at level 1 and M 1 is used at level 2 for the development of the voting protocol based spoof detection system System_1. The performance analysis of M 1, M 2, M 3 and System_1 is done using the EER parameter. Table 3 shows the comparative EER values on the evaluation dataset for all three backend models and System_1.

Model M 4

The user identification model M 4 is trained individually for eight users (n) with a batch size of 512 for up to 80 epochs using the categorical cross entropy loss function. Model M 4 is tested using the percentage accuracy parameter. The percentage accuracy of the model is calculated for the evaluation set, as shown in Table 4.

System_1 and System_2

System_2 uses the trained model M 4 for the user identification task at stage 1 and n instances of model M 1 at stage 2. However, the training of model M 1 in System_2 is different from that in System_1: in System_2, it is trained eight times separately, once for each of the eight existing users, using the bonafide and spoofed utterances of that specific user. Firstly, user identification is done for the eight users at stage 1, and then the user identification and verification protocol is invoked for verification at stage 2. The performances of System_1 and System_2 for the spoof detection task are evaluated using the EER parameter on the evaluation sets, as shown in Table 5.
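As a sketch of the training configuration described above (binary cross entropy, ADAM, batch size 512, early stopping with a patience of two), the models could be fitted along the following lines; build_m1, the feature arrays and the epoch count shown are placeholders rather than the authors' exact training script.

```python
import tensorflow as tf


def train_spoof_model(model, x_train, y_train, x_dev, y_dev, epochs=5, batch_size=512):
    """Fit a backend spoof detection model with the setup described in the text:
    ADAM optimizer, binary cross entropy loss and early stopping (patience=2)."""
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    early_stop = tf.keras.callbacks.EarlyStopping(patience=2, restore_best_weights=True)
    history = model.fit(x_train, y_train,
                        validation_data=(x_dev, y_dev),
                        epochs=epochs, batch_size=batch_size,
                        callbacks=[early_stop])
    return history
```

For M 4, the same routine would be used with the categorical cross entropy loss and a softmax output over the eight registered users.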
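For completeness, the decision logic of the two systems (the voting protocol of Function 2 and the ternary mapping of Function 3 in the Appendix) amounts to a few comparisons. The sketch below assumes each Keras model exposes a predict step wrapped in a hypothetical find_binary helper, that per_user_m1 is a mapping from user index to that user's trained M 1 instance, and that the global I of the paper corresponds to the claimed user id; these names and the 0.5 threshold are illustrative assumptions.

```python
def find_binary(model, features, threshold=0.5):
    """Map a model score to True (bonafide) or False (spoofed); hypothetical helper."""
    score = float(model.predict(features[None, ...], verbose=0)[0, 0])
    return score >= threshold


def system_1_decision(m1, m2, m3, features):
    """Voting protocol: if M2 and M3 agree, return their decision; otherwise ask M1."""
    d2, d3 = find_binary(m2, features), find_binary(m3, features)
    return d2 if d2 == d3 else find_binary(m1, features)


def system_2_decision(m4, per_user_m1, features, claimed_user):
    """User identification and verification protocol (set_ternary mapping):
    1 = valid user and bonafide, 2 = valid user but spoofed, 3 = wrong user."""
    predicted_user = int(m4.predict(features[None, ...], verbose=0).argmax())
    if predicted_user != claimed_user:
        return 3
    return 1 if find_binary(per_user_m1[predicted_user], features) else 2
```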

Table 5 Performance of proposed systems. Columns: System; Equal Error Rate (EER) on the development set; EER on the evaluation set. Rows: System_1; System_2.

Results

This section presents the performance and comparison results of all the systems discussed in the third section. For obtaining the results, the proposed work follows the procedure adopted by the state of the art works of [10, 15, 26]. As described earlier in the AllSpoofsASV dataset section, the dataset used by the proposed system is already divided into training, development and evaluation sets; therefore, it is not required to partition the dataset into ratios for training, development and evaluation samples. For evaluation of ASV systems, EER is the evaluation protocol applied to the classification results of the model for the spoof detection task [10, 15, 26]. The models in this work have been trained five times with the training set, and the development set is applied to each trained model. The network parameters have been tuned for all the systems to obtain stable parameters. The EER evaluation protocol is applied to the development results and the accuracy of the model is verified. The mean of all five development set test results is reported in the presented tables. The evaluation set is applied to the model once it becomes stable after all training passes, and EER is calculated for the classification result. The protocols of System_1 and System_2 are applied to the evaluation set performances of the models. For the speaker identification task, percentage accuracy is calculated as the evaluation measure on the development set results using a fivefold validation approach; it is also evaluated on the evaluation set to check the performance.

Comparison of CQCC features

The models set up for the feature comparison are trained five times, and the average, i.e. mean + standard deviation (SD), of the results is taken to conclude the EER. It can be observed in Table 2 that the combination of static and dynamic CQCC features performs better than static CQCC features alone. Hence, this combination is used in the development of the further proposed spoof detection systems.

Comparison of used deep learning models with System_1

These models are trained five times and the EER evaluation measure is calculated on the development set for each training run of each model. Table 3 presents the EER values for the five training and development passes (denoted by D i in Table 3) along with the average of the results, followed by the performance on the evaluation set and of System_1. The results in Table 3 show that M 1 outperforms the other two backend models for spoof detection when they are implemented individually. However, the voting protocol based System_1 outperforms all three backend models. The voting protocol is applied once the average performances of all the deep learning models have been concluded.

Performance of model M 4

The average percentage accuracy of model M 4 is calculated for the evaluation set by averaging the five runs, as shown in Table 4. The percentage accuracy, as described earlier, is calculated by Eq. (7) using the correct predictions and the total number of input samples to be checked. It can be observed from Table 4 that M 4 performs satisfactorily.

Comparative analysis of System_1 and System_2

The performances of System_1 and System_2 for the spoof detection task are evaluated using the EER parameter for both the development and evaluation sets, as shown in Table 5. It can easily be observed in Table 5 that System_2 performs better than System_1.
However, System_2 is limited to the private or local domain because it supports only a limited number of users. An increase in the number of users adds more complexity to the development of the ASV system, as a separately trained model M 1 is required for each user, which is not practically feasible. Hence, System_1 performs satisfactorily, as it is applicable to the public domain.

Comparison of proposed system with existing systems

This section compares the performances of the proposed systems, System_1 and System_2, with some of the existing systems from the literature. Chettri et al. [10] designed three ensemble systems (E1, E2 and E3) made up of different classical and deep learning models, of which the ensemble system E1 performs the best. Cai et al. [15] trained a ResNet deep learning model with CQCC, LFCC, IMFCC, Short Term Fourier Transform (STFT) grams and Group Delay (GD) gram features; however, it is trained only for the replay attack. Kumar et al. [26] trained a Time Delay Shallow Neural Network (TDSNN) with CQCC, IMFCC, Linear Frequency Band Cepstral Coefficients (LFBC) and LFCC features for SS, VC and replay attacks. The ASVspoof 2019 challenge provides GMM models trained with LFCC and CQCC features at the frontend for SS, VC and replay attacks [27].

Table 6 Comparison of the proposed system with existing systems. Columns: Works; Backend; Frontend features; attacks addressed (SS, VC and Replay); Evaluation set EER. Rows: Chettri et al. [10] (Ensemble 1 and Ensemble 2; MFCC, IMFCC, SCM, i-vectors, long term average spectrum); Cai et al. [15] (ResNet fusion; CQCC, LFCC, IMFCC, STFT, GD gram); ASVspoof 2019 Challenge [27] (GMM; CQCC and LFCC); Kumar et al. [26] (TDSNN; CQCC, IMFCC, LFBC, LFCC); Jung et al. [46] (DNN; 7 spectrograms, i-vectors, raw waveforms); Proposed work (System_1 and System_2; static-dynamic hybrid CQCC). A * indicates that a particular attack is addressed and - indicates that it is not addressed.

Jung et al. [46] trained a Deep Neural Network model with 7 spectrograms, i-vectors and raw waveforms, only for replay attack detection. Table 6 shows the comparison of these systems with the proposed systems of this paper. Although some systems from the literature perform well for the detection of a particular attack type, the proposed system also performs well for the detection of all three kinds of spoofing attacks in a single run.

Conclusion

Undoubtedly, ASV systems are highly exposed to spoofing attacks. However, their performance is good enough that industry is attracted to using them in practical applications. The initiative of designing a single dataset can provide new insights into the spoof detection task; the AllSpoofsASV dataset, a variation of the ASVspoof 2019 dataset, is a small step in this direction. Combining different feature coefficients with hybrid deep learning models can help in the development of robust ASV systems. This paper shows that a combination of static and dynamic CQCC features performs better with LSTM models than static features alone. The comparison of results also shows that the LSTM with Time Distributed Wrappers model (M 1) outperforms the LSTM (M 2) and CNN (M 3) models when evaluated by Equal Error Rate (EER). However, the two-level voting protocol based spoof detection system System_1, which uses M 2 and M 3 at level 1 and M 1 at level 2, performs best of them all. As the LSTM model (M 4) provides satisfactory performance, it can be used for speaker identification combined with spoof detection. Also, the two-level spoof detection system with user identification and verification, System_2, which uses M 4 at stage 1 and M 1 at stage 2, performs better than System_1; however, it is restricted to a limited number of users. Using it in the public domain, or for an organization with a larger and variable number of speakers, would increase the complexity and the storage requirements of the system. For future work, more attacks such as twins and mimicry can be added to the dataset, and more possible hybrid combinations of features and deep learning models can be exploited. Considering the importance of spoof detection in ASV, more efficient and complex structures such as the VGG family of deep learning models can also be used as a future extension of the proposed work.

Declarations

Conflict of interest The submitted work does not have any conflict of interest.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material.
If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix


References

1. Beranek B (2013) Voice biometrics: success stories, success factors and what's next. Biometr Technol Today 2013(7)
2. Indumathi A, Chandra E (2012) Survey on speech synthesis. Signal Process Int J (SPIJ) 6(5)
3. Lim R, Kwan E (2011) Voice conversion application (VOCAL). In: 2011 international conference on uncertainty reasoning and knowledge engineering, vol 1. IEEE
4. Mohammadi SH, Kain A (2017) An overview of voice conversion systems. Speech Commun 88
5. Patil HA, Kamble MR (2018) A survey on replay attack detection for automatic speaker verification (ASV) system. In: 2018 Asia-Pacific signal and information processing association annual summit and conference (APSIPA ASC). IEEE
6. Wu Z, Evans N, Kinnunen T, Yamagishi J, Alegre F, Li H (2015) Spoofing and countermeasures for speaker verification: a survey. Speech Commun 66
7. Hautamäki RG, Kinnunen T, Hautamäki V, Leino T, Laukkanen AM (2013) I-vectors meet imitators: on vulnerability of speaker verification systems against voice mimicry. In: Interspeech
8. Hautamäki RG, Kinnunen T, Hautamäki V, Laukkanen AM (2014) Comparison of human listeners and speaker verification systems using voice mimicry data. Target 4000
9. Lindberg J, Blomberg M (1999) Vulnerability in speaker verification: a study of technical impostor techniques. In: Sixth European conference on speech communication and technology
10. Chettri B, Stoller D, Morfi V, Ramírez MAM, Benetos E, Sturm BL (2019) Ensemble models for spoofing detection in automatic speaker verification. arXiv preprint
11. Sahidullah M, Delgado H, Todisco M, Yu H, Kinnunen T, Evans N, Tan ZH (2016) Integrated spoofing countermeasures and automatic speaker verification: an evaluation on ASVspoof 2015
12. Lavrentyeva G, Novoselov S, Malykh E, Kozlov A, Kudashev O, Shchemelinin V (2017) Audio replay attack detection with deep learning frameworks. In: Interspeech
13. Campbell JP (1995) Testing with the YOHO CD-ROM voice verification corpus. In: 1995 international conference on acoustics, speech, and signal processing, vol 1. IEEE
14. Chakroborty S, Saha G (2009) Improved text-independent speaker identification using fused MFCC & IMFCC feature sets based on Gaussian filter. Int J Signal Process 5(1)
15. Cai W, Wu H, Cai D, Li M (2019) The DKU replay detection system for the ASVspoof 2019 challenge: on data augmentation, feature representation, classification, and fusion. arXiv preprint
16. Balamurali BT, Lin KE, Lui S, Chen JM, Herremans D (2019) Toward robust audio spoofing detection: a detailed comparison of traditional and learned features. IEEE Access 7
17. Dua M, Aggarwal RK, Biswas M (2017) Discriminative training using heterogeneous feature vector for Hindi automatic speech recognition system. In: International conference on computer and applications (ICCA)
18. Sahidullah M, Kinnunen T, Hanilçi C (2015) A comparison of features for synthetic speech detection. In: 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015)
19. Pal M, Paul D, Saha G (2018) Synthetic speech detection using fundamental frequency variation and spectral features. Comput Speech Lang 48
20. Todisco M, Delgado H, Evans NW (2016) Articulation rate filtering of CQCC features for automatic speaker verification. In: Interspeech
21. Jelil S, Das RK, Prasanna SM, Sinha R (2017) Spoof detection using source, instantaneous frequency and cepstral features. In: Interspeech
22. Dua M, Aggarwal R, Kadyan V, Dua S (2012) Punjabi speech to text system for connected words
23. Dua M, Aggarwal RK, Biswas M (2018) Discriminative training using noise robust integrated features and refined HMM modeling. J Intell Syst 29(1)
24. Dua M, Aggarwal RK, Biswas M (2019) GFCC based discriminatively trained noise robust continuous ASR system for Hindi language. J Ambient Intell Hum Comput 10(2)
25. Dua M, Aggarwal RK, Biswas M (2019) Discriminatively trained continuous Hindi speech recognition system using interpolated recurrent neural network language modeling. Neural Comput Appl 31(10)
26. Kumar MG, Kumar SR, Saranya MS, Bharathi B, Murthy HA (2019) Spoof detection using time-delay shallow neural network and feature switching. In: 2019 IEEE automatic speech recognition and understanding workshop (ASRU). IEEE
27. ASVspoof 2019: automatic speaker verification spoofing and countermeasures challenge evaluation plan
28. Huang L, Pun CM (2019) Audio replay spoof attack detection using segment-based hybrid feature and DenseNet-LSTM network. In: ICASSP IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE
29. Mobiny A, Najarian M (2018) Text-independent speaker verification using long short-term memory networks. arXiv preprint
30. Dua M, Jain C, Kumar S (2021) LSTM and CNN based ensemble approach for spoof detection task in automatic speaker verification systems. J Ambient Intell Human Comput
31. Mittal A, Dua M (2021) Automatic speaker verification system using three dimensional static and contextual variation-based features with two dimensional convolutional neural network. Int J Swarm Intell
32. Mittal A, Dua M (2021) Constant Q cepstral coefficients and long short-term memory model-based automatic speaker verification system. In: Proceedings of international conference on intelligent computing, information and control systems
33. Chettri B, Mishra S, Sturm BL, Benetos E (2018) Analysing the predictions of a CNN-based replay spoofing detection system. In: 2018 IEEE spoken language technology workshop (SLT). IEEE
34. Valenti G, Delgado H, Todisco M, Evans NW, Pilati L (2018) An end-to-end spoofing countermeasure for automatic speaker verification using evolving recurrent neural networks. In: Odyssey
35. Kamble MR, Sailor HB, Patil HA, Li H (2019) Advances in anti-spoofing: from the perspective of ASVspoof challenges. APSIPA Trans Signal Inf Process
36. Lai CI, Abad A, Richmond K, Yamagishi J, Dehak N, King S (2019) Attentive filtering networks for audio replay attack detection. In: ICASSP IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE
37. Edinburgh DataShare
38. Brown JC, Puckette MS (1992) An efficient algorithm for the calculation of a constant Q transform. J Acoust Soc Am 92(5)
39. Brown JC (1991) Calculation of a constant Q spectral transform. J Acoust Soc Am 89(1)
40. Yang J, Das RK, Li H (2018) Extended constant-Q cepstral coefficients for detection of spoofing attacks. In: 2018 Asia-Pacific signal and information processing association annual summit and conference (APSIPA ASC). IEEE


Sense Making in an IOT World: Sensor Data Analysis with Deep Learning Sense Making in an IOT World: Sensor Data Analysis with Deep Learning Natalia Vassilieva, PhD Senior Research Manager GTC 2016 Deep learning proof points as of today Vision Speech Text Other Search & information

More information

FRACTAL RECOGNITION AND PATTERN CLASSIFIER BASED SPAM FILTERING IN EMAIL SERVICE

FRACTAL RECOGNITION AND PATTERN CLASSIFIER BASED SPAM FILTERING IN EMAIL SERVICE FRACTAL RECOGNITION AND PATTERN CLASSIFIER BASED SPAM FILTERING IN EMAIL SERVICE Ms. S.Revathi 1, Mr. T. Prabahar Godwin James 2 1 Post Graduate Student, Department of Computer Applications, Sri Sairam

More information

Face Recognition in Low-resolution Images by Using Local Zernike Moments

Face Recognition in Low-resolution Images by Using Local Zernike Moments Proceedings of the International Conference on Machine Vision and Machine Learning Prague, Czech Republic, August14-15, 014 Paper No. 15 Face Recognition in Low-resolution Images by Using Local Zernie

More information

ECE 533 Project Report Ashish Dhawan Aditi R. Ganesan

ECE 533 Project Report Ashish Dhawan Aditi R. Ganesan Handwritten Signature Verification ECE 533 Project Report by Ashish Dhawan Aditi R. Ganesan Contents 1. Abstract 3. 2. Introduction 4. 3. Approach 6. 4. Pre-processing 8. 5. Feature Extraction 9. 6. Verification

More information

Supporting Online Material for

Supporting Online Material for www.sciencemag.org/cgi/content/full/313/5786/504/dc1 Supporting Online Material for Reducing the Dimensionality of Data with Neural Networks G. E. Hinton* and R. R. Salakhutdinov *To whom correspondence

More information

Efficient on-line Signature Verification System

Efficient on-line Signature Verification System International Journal of Engineering & Technology IJET-IJENS Vol:10 No:04 42 Efficient on-line Signature Verification System Dr. S.A Daramola 1 and Prof. T.S Ibiyemi 2 1 Department of Electrical and Information

More information

Optimization of PID parameters with an improved simplex PSO

Optimization of PID parameters with an improved simplex PSO Li et al. Journal of Inequalities and Applications (2015) 2015:325 DOI 10.1186/s13660-015-0785-2 R E S E A R C H Open Access Optimization of PID parameters with an improved simplex PSO Ji-min Li 1, Yeong-Cheng

More information

Applying Deep Learning to Car Data Logging (CDL) and Driver Assessor (DA) October 22-Oct-15

Applying Deep Learning to Car Data Logging (CDL) and Driver Assessor (DA) October 22-Oct-15 Applying Deep Learning to Car Data Logging (CDL) and Driver Assessor (DA) October 22-Oct-15 GENIVI is a registered trademark of the GENIVI Alliance in the USA and other countries Copyright GENIVI Alliance

More information

Impact of Feature Selection on the Performance of Wireless Intrusion Detection Systems

Impact of Feature Selection on the Performance of Wireless Intrusion Detection Systems 2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore Impact of Feature Selection on the Performance of ireless Intrusion Detection Systems

More information

Lecture 6: Classification & Localization. boris. ginzburg@intel.com

Lecture 6: Classification & Localization. boris. ginzburg@intel.com Lecture 6: Classification & Localization boris. ginzburg@intel.com 1 Agenda ILSVRC 2014 Overfeat: integrated classification, localization, and detection Classification with Localization Detection. 2 ILSVRC-2014

More information

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network? - Perceptron learners - Multi-layer networks What is a Support

More information

Gender Identification using MFCC for Telephone Applications A Comparative Study

Gender Identification using MFCC for Telephone Applications A Comparative Study Gender Identification using MFCC for Telephone Applications A Comparative Study Jamil Ahmad, Mustansar Fiaz, Soon-il Kwon, Maleerat Sodanil, Bay Vo, and * Sung Wook Baik Abstract Gender recognition is

More information

SOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS

SOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS SOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS Bálint Tóth, Tibor Fegyó, Géza Németh Department of Telecommunications and Media Informatics Budapest University

More information

Method of Combining the Degrees of Similarity in Handwritten Signature Authentication Using Neural Networks

Method of Combining the Degrees of Similarity in Handwritten Signature Authentication Using Neural Networks Method of Combining the Degrees of Similarity in Handwritten Signature Authentication Using Neural Networks Ph. D. Student, Eng. Eusebiu Marcu Abstract This paper introduces a new method of combining the

More information

Music Genre Classification

Music Genre Classification Music Genre Classification Michael Haggblade Yang Hong Kenny Kao 1 Introduction Music classification is an interesting problem with many applications, from Drinkify (a program that generates cocktails

More information

PIXEL-LEVEL IMAGE FUSION USING BROVEY TRANSFORME AND WAVELET TRANSFORM

PIXEL-LEVEL IMAGE FUSION USING BROVEY TRANSFORME AND WAVELET TRANSFORM PIXEL-LEVEL IMAGE FUSION USING BROVEY TRANSFORME AND WAVELET TRANSFORM Rohan Ashok Mandhare 1, Pragati Upadhyay 2,Sudha Gupta 3 ME Student, K.J.SOMIYA College of Engineering, Vidyavihar, Mumbai, Maharashtra,

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Next Generation Intrusion Detection: Autonomous Reinforcement Learning of Network Attacks

Next Generation Intrusion Detection: Autonomous Reinforcement Learning of Network Attacks Next Generation Intrusion Detection: Autonomous Reinforcement Learning of Network Attacks James Cannady Georgia Tech Information Security Center Georgia Institute of Technology Atlanta, GA 30332-0832 james.cannady@gtri.gatech.edu

More information

IEEE Proof. Web Version. PROGRESSIVE speaker adaptation has been considered

IEEE Proof. Web Version. PROGRESSIVE speaker adaptation has been considered IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING 1 A Joint Factor Analysis Approach to Progressive Model Adaptation in Text-Independent Speaker Verification Shou-Chun Yin, Richard Rose, Senior

More information

ARMORVOX IMPOSTORMAPS HOW TO BUILD AN EFFECTIVE VOICE BIOMETRIC SOLUTION IN THREE EASY STEPS

ARMORVOX IMPOSTORMAPS HOW TO BUILD AN EFFECTIVE VOICE BIOMETRIC SOLUTION IN THREE EASY STEPS ARMORVOX IMPOSTORMAPS HOW TO BUILD AN EFFECTIVE VOICE BIOMETRIC SOLUTION IN THREE EASY STEPS ImpostorMaps is a methodology developed by Auraya and available from Auraya resellers worldwide to configure,

More information

Open Access Research on Application of Neural Network in Computer Network Security Evaluation. Shujuan Jin *

Open Access Research on Application of Neural Network in Computer Network Security Evaluation. Shujuan Jin * Send Orders for Reprints to reprints@benthamscience.ae 766 The Open Electrical & Electronic Engineering Journal, 2014, 8, 766-771 Open Access Research on Application of Neural Network in Computer Network

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents

More information

ARTIFICIAL NEURAL NETWORKS IN THE SCOPE OF OPTICAL PERFORMANCE MONITORING

ARTIFICIAL NEURAL NETWORKS IN THE SCOPE OF OPTICAL PERFORMANCE MONITORING 1 th Portuguese Conference on Automatic Control 16-18 July 212 CONTROLO 212 Funchal, Portugal ARTIFICIAL NEURAL NETWORKS IN THE SCOPE OF OPTICAL PERFORMANCE MONITORING Vítor Ribeiro,?? Mário Lima, António

More information

How To Fix Out Of Focus And Blur Images With A Dynamic Template Matching Algorithm

How To Fix Out Of Focus And Blur Images With A Dynamic Template Matching Algorithm IJSTE - International Journal of Science Technology & Engineering Volume 1 Issue 10 April 2015 ISSN (online): 2349-784X Image Estimation Algorithm for Out of Focus and Blur Images to Retrieve the Barcode

More information

Predict Influencers in the Social Network

Predict Influencers in the Social Network Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons

More information

Lecture 6: CNNs for Detection, Tracking, and Segmentation Object Detection

Lecture 6: CNNs for Detection, Tracking, and Segmentation Object Detection CSED703R: Deep Learning for Visual Recognition (206S) Lecture 6: CNNs for Detection, Tracking, and Segmentation Object Detection Bohyung Han Computer Vision Lab. bhhan@postech.ac.kr 2 3 Object detection

More information

Novelty Detection in image recognition using IRF Neural Networks properties

Novelty Detection in image recognition using IRF Neural Networks properties Novelty Detection in image recognition using IRF Neural Networks properties Philippe Smagghe, Jean-Luc Buessler, Jean-Philippe Urban Université de Haute-Alsace MIPS 4, rue des Frères Lumière, 68093 Mulhouse,

More information

SUCCESSFUL PREDICTION OF HORSE RACING RESULTS USING A NEURAL NETWORK

SUCCESSFUL PREDICTION OF HORSE RACING RESULTS USING A NEURAL NETWORK SUCCESSFUL PREDICTION OF HORSE RACING RESULTS USING A NEURAL NETWORK N M Allinson and D Merritt 1 Introduction This contribution has two main sections. The first discusses some aspects of multilayer perceptrons,

More information

Application of Neural Network in User Authentication for Smart Home System

Application of Neural Network in User Authentication for Smart Home System Application of Neural Network in User Authentication for Smart Home System A. Joseph, D.B.L. Bong, D.A.A. Mat Abstract Security has been an important issue and concern in the smart home systems. Smart

More information

An Arabic Text-To-Speech System Based on Artificial Neural Networks

An Arabic Text-To-Speech System Based on Artificial Neural Networks Journal of Computer Science 5 (3): 207-213, 2009 ISSN 1549-3636 2009 Science Publications An Arabic Text-To-Speech System Based on Artificial Neural Networks Ghadeer Al-Said and Moussa Abdallah Department

More information

Multimodal Biometric Recognition Security System

Multimodal Biometric Recognition Security System Multimodal Biometric Recognition Security System Anju.M.I, G.Sheeba, G.Sivakami, Monica.J, Savithri.M Department of ECE, New Prince Shri Bhavani College of Engg. & Tech., Chennai, India ABSTRACT: Security

More information

The Development of a Pressure-based Typing Biometrics User Authentication System

The Development of a Pressure-based Typing Biometrics User Authentication System The Development of a Pressure-based Typing Biometrics User Authentication System Chen Change Loy Adv. Informatics Research Group MIMOS Berhad by Assoc. Prof. Dr. Chee Peng Lim Associate Professor Sch.

More information

Myanmar Continuous Speech Recognition System Based on DTW and HMM

Myanmar Continuous Speech Recognition System Based on DTW and HMM Myanmar Continuous Speech Recognition System Based on DTW and HMM Ingyin Khaing Department of Information and Technology University of Technology (Yatanarpon Cyber City),near Pyin Oo Lwin, Myanmar Abstract-

More information

Accurate and robust image superresolution by neural processing of local image representations

Accurate and robust image superresolution by neural processing of local image representations Accurate and robust image superresolution by neural processing of local image representations Carlos Miravet 1,2 and Francisco B. Rodríguez 1 1 Grupo de Neurocomputación Biológica (GNB), Escuela Politécnica

More information

LOW COST HARDWARE IMPLEMENTATION FOR DIGITAL HEARING AID USING

LOW COST HARDWARE IMPLEMENTATION FOR DIGITAL HEARING AID USING LOW COST HARDWARE IMPLEMENTATION FOR DIGITAL HEARING AID USING RasPi Kaveri Ratanpara 1, Priyan Shah 2 1 Student, M.E Biomedical Engineering, Government Engineering college, Sector-28, Gandhinagar (Gujarat)-382028,

More information

A secure face tracking system

A secure face tracking system International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 10 (2014), pp. 959-964 International Research Publications House http://www. irphouse.com A secure face tracking

More information

The Artificial Prediction Market

The Artificial Prediction Market The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory

More information

An Early Attempt at Applying Deep Reinforcement Learning to the Game 2048

An Early Attempt at Applying Deep Reinforcement Learning to the Game 2048 An Early Attempt at Applying Deep Reinforcement Learning to the Game 2048 Hong Gui, Tinghan Wei, Ching-Bo Huang, I-Chen Wu 1 1 Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan

More information

A Sound Analysis and Synthesis System for Generating an Instrumental Piri Song

A Sound Analysis and Synthesis System for Generating an Instrumental Piri Song , pp.347-354 http://dx.doi.org/10.14257/ijmue.2014.9.8.32 A Sound Analysis and Synthesis System for Generating an Instrumental Piri Song Myeongsu Kang and Jong-Myon Kim School of Electrical Engineering,

More information

AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS

AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS PIERRE LANCHANTIN, ANDREW C. MORRIS, XAVIER RODET, CHRISTOPHE VEAUX Very high quality text-to-speech synthesis can be achieved by unit selection

More information

6.2.8 Neural networks for data mining

6.2.8 Neural networks for data mining 6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural

More information

Electroencephalography Analysis Using Neural Network and Support Vector Machine during Sleep

Electroencephalography Analysis Using Neural Network and Support Vector Machine during Sleep Engineering, 23, 5, 88-92 doi:.4236/eng.23.55b8 Published Online May 23 (http://www.scirp.org/journal/eng) Electroencephalography Analysis Using Neural Network and Support Vector Machine during Sleep JeeEun

More information

Analysis/resynthesis with the short time Fourier transform

Analysis/resynthesis with the short time Fourier transform Analysis/resynthesis with the short time Fourier transform summer 2006 lecture on analysis, modeling and transformation of audio signals Axel Röbel Institute of communication science TU-Berlin IRCAM Analysis/Synthesis

More information

Speech: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction

Speech: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction : A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction Urmila Shrawankar Dept. of Information Technology Govt. Polytechnic, Nagpur Institute Sadar, Nagpur 440001 (INDIA)

More information

Programming Exercise 3: Multi-class Classification and Neural Networks

Programming Exercise 3: Multi-class Classification and Neural Networks Programming Exercise 3: Multi-class Classification and Neural Networks Machine Learning November 4, 2011 Introduction In this exercise, you will implement one-vs-all logistic regression and neural networks

More information

Cloud User Voice Authentication enabled with Single Sign-On framework using OpenID

Cloud User Voice Authentication enabled with Single Sign-On framework using OpenID Cloud User Voice Authentication enabled with Single Sign-On framework using OpenID R.Gokulavanan Assistant Professor, Department of Information Technology, Nandha Engineering College, Erode, Tamil Nadu,

More information

PRODUCT INFORMATION. Insight+ Uses and Features

PRODUCT INFORMATION. Insight+ Uses and Features PRODUCT INFORMATION Insight+ Traditionally, CAE NVH data and results have been presented as plots, graphs and numbers. But, noise and vibration must be experienced to fully comprehend its effects on vehicle

More information

Advanced Signal Processing and Digital Noise Reduction

Advanced Signal Processing and Digital Noise Reduction Advanced Signal Processing and Digital Noise Reduction Saeed V. Vaseghi Queen's University of Belfast UK WILEY HTEUBNER A Partnership between John Wiley & Sons and B. G. Teubner Publishers Chichester New

More information

The Role of Size Normalization on the Recognition Rate of Handwritten Numerals

The Role of Size Normalization on the Recognition Rate of Handwritten Numerals The Role of Size Normalization on the Recognition Rate of Handwritten Numerals Chun Lei He, Ping Zhang, Jianxiong Dong, Ching Y. Suen, Tien D. Bui Centre for Pattern Recognition and Machine Intelligence,

More information

International Journal of Advanced Information in Arts, Science & Management Vol.2, No.2, December 2014

International Journal of Advanced Information in Arts, Science & Management Vol.2, No.2, December 2014 Efficient Attendance Management System Using Face Detection and Recognition Arun.A.V, Bhatath.S, Chethan.N, Manmohan.C.M, Hamsaveni M Department of Computer Science and Engineering, Vidya Vardhaka College

More information

Final Year Project Progress Report. Frequency-Domain Adaptive Filtering. Myles Friel. Supervisor: Dr.Edward Jones

Final Year Project Progress Report. Frequency-Domain Adaptive Filtering. Myles Friel. Supervisor: Dr.Edward Jones Final Year Project Progress Report Frequency-Domain Adaptive Filtering Myles Friel 01510401 Supervisor: Dr.Edward Jones Abstract The Final Year Project is an important part of the final year of the Electronic

More information

2695 P a g e. IV Semester M.Tech (DCN) SJCIT Chickballapur Karnataka India

2695 P a g e. IV Semester M.Tech (DCN) SJCIT Chickballapur Karnataka India Integrity Preservation and Privacy Protection for Digital Medical Images M.Krishna Rani Dr.S.Bhargavi IV Semester M.Tech (DCN) SJCIT Chickballapur Karnataka India Abstract- In medical treatments, the integrity

More information

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin

More information

Standardization and Its Effects on K-Means Clustering Algorithm

Standardization and Its Effects on K-Means Clustering Algorithm Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03

More information

Index Terms Domain name, Firewall, Packet, Phishing, URL.

Index Terms Domain name, Firewall, Packet, Phishing, URL. BDD for Implementation of Packet Filter Firewall and Detecting Phishing Websites Naresh Shende Vidyalankar Institute of Technology Prof. S. K. Shinde Lokmanya Tilak College of Engineering Abstract Packet

More information

Convention Paper Presented at the 112th Convention 2002 May 10 13 Munich, Germany

Convention Paper Presented at the 112th Convention 2002 May 10 13 Munich, Germany Audio Engineering Society Convention Paper Presented at the 112th Convention 2002 May 10 13 Munich, Germany This convention paper has been reproduced from the author's advance manuscript, without editing,

More information

Appendices. Appendix A: List of Publications

Appendices. Appendix A: List of Publications Appendices Appendix A: List of Publications The following papers highlight the findings of this research. These articles were published in reputed journals during the course of this research program. 1.

More information

NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE. Venu Govindaraju

NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE. Venu Govindaraju NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE Venu Govindaraju BIOMETRICS DOCUMENT ANALYSIS PATTERN RECOGNITION 8/24/2015 ICDAR- 2015 2 Towards a Globally Optimal Approach for Learning Deep Unsupervised

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

More information

Deep Neural Network Approaches to Speaker and Language Recognition

Deep Neural Network Approaches to Speaker and Language Recognition IEEE SIGNAL PROCESSING LETTERS, VOL. 22, NO. 10, OCTOBER 2015 1671 Deep Neural Network Approaches to Speaker and Language Recognition Fred Richardson, Senior Member, IEEE, Douglas Reynolds, Fellow, IEEE,

More information

A Simple Feature Extraction Technique of a Pattern By Hopfield Network

A Simple Feature Extraction Technique of a Pattern By Hopfield Network A Simple Feature Extraction Technique of a Pattern By Hopfield Network A.Nag!, S. Biswas *, D. Sarkar *, P.P. Sarkar *, B. Gupta **! Academy of Technology, Hoogly - 722 *USIC, University of Kalyani, Kalyani

More information

An Introduction to Neural Networks

An Introduction to Neural Networks An Introduction to Vincent Cheung Kevin Cannons Signal & Data Compression Laboratory Electrical & Computer Engineering University of Manitoba Winnipeg, Manitoba, Canada Advisor: Dr. W. Kinsner May 27,

More information

Neural Network based Vehicle Classification for Intelligent Traffic Control

Neural Network based Vehicle Classification for Intelligent Traffic Control Neural Network based Vehicle Classification for Intelligent Traffic Control Saeid Fazli 1, Shahram Mohammadi 2, Morteza Rahmani 3 1,2,3 Electrical Engineering Department, Zanjan University, Zanjan, IRAN

More information

Open-Set Face Recognition-based Visitor Interface System

Open-Set Face Recognition-based Visitor Interface System Open-Set Face Recognition-based Visitor Interface System Hazım K. Ekenel, Lorant Szasz-Toth, and Rainer Stiefelhagen Computer Science Department, Universität Karlsruhe (TH) Am Fasanengarten 5, Karlsruhe

More information

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION Chihli Hung 1, Jing Hong Chen 2, Stefan Wermter 3, 1,2 Department of Management Information Systems, Chung Yuan Christian University, Taiwan

More information

Speech recognition for human computer interaction

Speech recognition for human computer interaction Speech recognition for human computer interaction Ubiquitous computing seminar FS2014 Student report Niklas Hofmann ETH Zurich hofmannn@student.ethz.ch ABSTRACT The widespread usage of small mobile devices

More information

TRAFFIC MONITORING WITH AD-HOC MICROPHONE ARRAY

TRAFFIC MONITORING WITH AD-HOC MICROPHONE ARRAY 4 4th International Workshop on Acoustic Signal Enhancement (IWAENC) TRAFFIC MONITORING WITH AD-HOC MICROPHONE ARRAY Takuya Toyoda, Nobutaka Ono,3, Shigeki Miyabe, Takeshi Yamada, Shoji Makino University

More information