
Improving Artificial Neural Network Estimates of Posterior Probabilities of Speech Sounds

Doctoral Thesis Proposal

Samuel Thomas
Department of Electrical and Computer Engineering
Johns Hopkins University

Hynek Hermansky (Advisor)
Aren Jansen
Mounya Elhilali

October 14, 2011

Abstract

Speech contains information from at least three sources: the message that is being communicated, the speaker who is communicating, and the environment. In this work we propose several approaches to improve the recognition of the speech sounds that convey information about the message. We use phonemes, which span a few tens of milliseconds of the speech signal, as basic units. Improvements in recognizing these units result in considerable performance gains in applications like automatic speech recognition (ASR), where the goal is to transcribe the message into text, and automatic speaker verification, which uses information in the speaker component to verify the speaker's claimed identity. We propose several approaches to improve phoneme posterior estimates from artificial neural networks. These include the combination of information from multiple acoustic streams and different neural network training architectures. For speech recognition, especially in low-resource scenarios where the amount of training data is limited (for example, 1 hour of training data), features extracted from better phoneme posteriors using the proposed techniques provide significant word recognition improvements. For speaker recognition, these posteriors are used in a recently proposed neural network architecture to give considerable improvements over earlier neural network based approaches. In future work we would like to investigate the enhancement of phoneme posteriors for these applications in noisy environments. In a multistream speech recognition framework, we propose to use statistics derived from phoneme posteriors to determine the reliability of individual streams and how the streams can be effectively combined to derive better posteriors. We would also like to explore how these posteriors can be used for reliable voice activity detection in such environments.

Contents

1 Introduction
  1.1 Overview of Speech and Speaker Recognition Systems
  1.2 Main Contributions
2 Deriving Phoneme Posteriors
  2.1 From the Acoustic Signal to Features
    2.1.1 Spectral Envelope Features
    2.1.2 Modulation Features
  2.2 Estimating Phoneme Posteriors
3 Phoneme Posteriors for Speech Recognition
  3.1 Posterior Features for Low-resource Languages
    3.1.1 Mapping Languages with a Common Phone set
    3.1.2 Training MLPs with Language Specific Output Layers
    3.1.3 Enhancing Acoustic Features with Out-of-language Posteriors
4 Phoneme Posteriors for Speaker Verification
  4.1 Traditional Approaches to Speaker Verification
  4.2 Improvements to Neural Networks for Speaker Verification
5 Conclusions and Future Directions
  5.1 Conclusions
  5.2 Future Directions
    5.2.1 Robust Estimation of Posteriors using Multistream Processing
    5.2.2 Posteriors for Voice Activity Detection

1 Introduction

1.1 Overview of Speech and Speaker Recognition Systems

Acoustic models play an important role in both automatic speech recognition (ASR) and speaker verification tasks. Traditionally, generative models, for example Gaussian mixture models (GMMs), have been used to model the underlying distribution of basic acoustic units in speech. In ASR, GMMs are used with hidden Markov models (HMMs), along with separate modules that model the language and pronunciation, to decode the message [1]. For speaker verification, GMMs are first used to train a universal background model (UBM) that captures the general acoustic space of all speakers [2]. The UBM, which is trained on large amounts of data, is then adapted for each enrolled speaker. During test, acoustic evidence in the form of scores from both the UBM and the claimed speaker model are compared to verify whether the claim is true. In advanced speaker verification systems, factor analysis techniques are used with supervectors formed from GMMs to model speakers [3].

More recently, artificial neural networks are being used as acoustic models for these applications. For speech recognition, discriminatively trained multi-layer perceptrons (MLPs) are used to generate acoustic evidence in the form of posterior probabilities of basic speech units like phonemes. These posteriors are used directly in hybrid HMM-ANN systems [4] or converted to features in the Tandem approach for ASR [5]. Apart from being discriminatively trained, MLPs can derive posteriors from high dimensional features without placing any assumptions on the parametric distributions or statistical independence of the features. Another class of neural networks, auto-associative neural networks (AANNs), has recently been proposed as an alternative acoustic model to GMMs for speaker verification [6]. An AANN is a feed-forward neural network trained to reconstruct its input at its output through a hidden compression layer. Similar to MLPs, AANNs also have several advantages over GMMs when used to model the acoustic space: they relax the assumptions on the distributions of feature vectors and can capture higher order moments. In [7] this neural network approach has been extended to use phoneme posteriors, which provide additional evidence of phonetic classes to better model the acoustic space.

1.2 Main Contributions

In this work we focus on improving speech recognition and speaker verification systems using phoneme posteriors derived from MLPs. This is done by first improving the phoneme posteriors at various levels and then integrating the improved posteriors with each application. The process is carried out at multiple levels.

A. Combining evidence from multiple acoustic feature streams
Phoneme posteriors are derived from short-term spectral envelope and long-term modulation frequency features. These features are derived from sub-band temporal envelopes of speech estimated using Frequency Domain Linear Prediction (FDLP) [8]. While the spectral envelope features are obtained by short-term integration of the sub-band envelopes, the modulation frequency components are derived from the sub-band envelopes in long windows [9]. These features are combined at the phoneme posterior level (Section 2).

B. Different MLP training architectures and schemes
We investigate approaches for building large vocabulary continuous speech recognition (LVCSR) systems for new languages or new domains using limited amounts of transcribed training data. In these low-resource conditions, the performance of conventional LVCSR systems degrades significantly. We propose to train low-resource LVCSR systems with additional sources of information like annotated data from other languages (German and Spanish) and various acoustic feature streams (short-term and modulation features). We train multilayer perceptrons (MLPs) in different configurations on these sources of information for low-resource LVCSR (Section 3).

C. Integration of posteriors with speech recognition and verification systems
For speech recognition systems the improved phoneme posteriors are converted back to features using the Tandem approach. For speaker verification these posteriors are used directly with a mixture of AANNs. The mixture consists of several AANNs connected using posterior probabilities of various broad phoneme classes. Since neural networks are not density models, these posteriors are obtained from a separate MLP classifier trained to estimate the posterior probabilities of the phoneme classes (Section 4). For multistream speech recognition, phoneme posterior probabilities are estimated from separate MLPs trained on different spectro-temporal modulations of speech. Statistics derived from the phoneme posteriors are then used to determine the reliability of the individual streams (Section 5).

Fig. 1 is a schematic of the proposed approach for using MLP based phoneme posteriors for speech and speaker recognition.

Figure 1: Applications of phoneme posteriors for speech and speaker recognition.

2 Deriving Phoneme Posteriors

2.1 From the Acoustic Signal to Features

To extract acoustic features we first analyze speech signals in frequency sub-bands over long temporal segments of the signal. This is done by estimating temporal envelopes in frequency sub-bands using the dual of conventional time domain linear prediction (TDLP). In the same way as TDLP fits an all-pole model to the power spectrum of the signal, the frequency domain linear prediction (FDLP) technique fits an all-pole model to the squared Hilbert envelope. These representations of the speech signal are able to capture fine temporal events associated with transients like stop bursts while at the same time summarizing the signal's gross temporal evolution on timescales of several hundred milliseconds. For phoneme recognition, the FDLP technique is implemented in several steps: first, the discrete cosine transform (DCT) is applied to long segments of speech to obtain a real valued spectral representation of the signal; then, linear prediction is performed on the DCT coefficients to obtain a parametric model of the temporal envelope (a short sketch of this estimation appears at the end of this subsection). After the sub-band temporal envelopes are estimated using FDLP, these envelopes are converted into spectral envelope and modulation frequency features.

2.1.1 Spectral Envelope Features

The Hilbert envelope, which is the squared magnitude of the analytic signal, represents the instantaneous energy of a signal in the time domain. Since the integration of signal energy is identical in the time and frequency domains, the sub-band Hilbert envelopes can equivalently be used for obtaining the sub-band energy based short-term spectral envelope features. This is achieved by integrating the sub-band temporal envelopes in short-term frames (of the order of 25 ms with a shift of 10 ms). These short-term sub-band energies are then converted into 13 cepstral features along with their first and second derivatives.

2.1.2 Modulation Features

The long-term sub-band envelopes from FDLP form a compact representation of the temporal dynamics over long regions of the speech signal. The sub-band temporal envelopes are compressed using a static compression scheme (the logarithmic function) and a dynamic compression scheme. The compressed temporal envelopes are divided into 200 ms segments with a shift of 10 ms. Discrete cosine transforms of both the static and the dynamic segments of the temporal envelope yield the static and the dynamic modulation spectrum respectively. We use 14 modulation frequency components from each cosine transform, yielding a modulation spectrum in the 0-70 Hz region with a resolution of 5 Hz [10].
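The duality between TDLP and FDLP can be made concrete with a minimal single-band sketch in Python. The actual system applies this per frequency sub-band (by windowing the DCT coefficients), and the model order here is an illustrative assumption, not a setting used in our experiments:

    import numpy as np
    from scipy.fft import dct
    from scipy.linalg import solve_toeplitz
    from scipy.signal import freqz

    def fdlp_envelope(segment, order=40, n_points=512):
        # DCT of a long speech segment gives a real-valued spectral representation
        c = dct(segment, norm='ortho')
        # autocorrelation of the DCT coefficients, lags 0..order
        r = np.correlate(c, c, mode='full')[len(c) - 1 : len(c) + order]
        # solve the linear-prediction normal equations (Levinson-Durbin step)
        a = solve_toeplitz(r[:-1], -r[1:])
        # the all-pole spectrum of this model approximates the squared
        # Hilbert envelope of the segment
        _, h = freqz(1.0, np.concatenate(([1.0], a)), worN=n_points)
        return np.abs(h) ** 2

    # hypothetical usage on one sub-band signal of a long analysis segment:
    # envelope = fdlp_envelope(subband_signal, order=160)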

2.2 Estimating Phoneme Posteriors

Once these FDLP based short-term spectral features and long-term modulation features have been extracted, they are used to train a phoneme posterior probability estimator. In our case we use a three layered multilayer perceptron (MLP) to estimate the phoneme posterior probabilities. The network is trained using the standard back propagation algorithm with a cross entropy error criterion. The learning rate and stopping criterion are controlled by the error of the frame-based phoneme classification on the cross validation data. For our phoneme recognition experiments we use MLPs along with the FDLP based features. Each frame is appended with the neighboring 8 frames. The static and adaptive modulation features for each sub-band are stacked together and used as the modulation features.

Since the output of each MLP is a posterior vector for each frame, the posteriors can be combined using different probability combination rules. We combine the posteriors using the Dempster-Shafer (DS) theory of evidence [12] to form a joint posterior feature set. This combination technique weights each stream using an entropy based reliability measure. Fig. 2 shows the schematic of the proposed feature extraction technique for estimating phoneme posteriors.

Figure 2: Schematic of estimating posterior vectors. The final posterior probabilities are derived by combining posteriors from two different representations (spectral features, FDLP-S, and statically and adaptively compressed modulation features, FDLP-M).

Phoneme recognition experiments are conducted on the TIMIT database. The phoneme recognition system in our experiments is based on a hybrid HMM/MLP approach, where the posterior probability estimates of the various phonemes are converted to scaled likelihoods to model the HMM states. In these experiments, posterior probabilities are estimated in a hierarchical manner [11]. Table 1 shows the phoneme recognition accuracies that we obtain using the improved posteriors.

                         FDLP-S    FDLP-M    FDLP-S + FDLP-M
    Phoneme Accuracy

    Table 1: Phoneme recognition accuracies (%) on TIMIT.
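The full Dempster-Shafer combination of [12] is more involved; as a simplified stand-in, the sketch below illustrates the underlying idea of entropy based reliability weighting with an inverse-entropy weighted average of the per-frame stream posteriors. The function name and the exact weighting rule are illustrative assumptions:

    import numpy as np

    def merge_streams(streams, eps=1e-10):
        # streams: list of (frames, n_phonemes) posterior arrays, one per
        # feature stream. A confident (low entropy) frame gets a large weight;
        # this is a stand-in for the Dempster-Shafer combination in the text.
        weights = [1.0 / (-(p * np.log(p + eps)).sum(axis=1, keepdims=True) + eps)
                   for p in streams]
        total = sum(weights)
        merged = sum((w / total) * p for w, p in zip(weights, streams))
        return merged / merged.sum(axis=1, keepdims=True)  # renormalize per frame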

3 Phoneme Posteriors for Speech Recognition

For speech recognition, these improved posteriors are converted back to features using the Tandem approach. The phoneme posteriors are first gaussianized using the log function and then decorrelated using the Karhunen-Loeve Transform (KLT) [5]. This reduces the dimensionality of the feature vectors by retaining only the feature components which contribute most to the variance of the data (a minimal sketch of this conversion appears at the end of this section). We use 25 dimensional features in our Tandem representations, similar to [13]. The proposed features are compared with three other feature extraction techniques: PLP features [14] with a 9 frame context [15], which are similar to the spectral envelope features derived using FDLP (FDLP-S), and M-RASTA features [16] and Modulation SpectroGram (MSG) features [17] with a 9 frame context, which are both similar to the modulation frequency features (FDLP-M). We combine the FDLP-S features with the FDLP-M features using the DS theory of evidence to obtain a joint spectro-temporal feature set (FDLP-S+FDLP-M). Similarly, we derive two more feature sets by combining PLP features with M-RASTA features (PLP+M-RASTA) and MSG features (PLP+MSG). 25 dimensional Tandem representations of these features are used for our experiments. We also experiment with 39 dimensional PLP features without any Tandem processing (PLP-D).

    Features           TOT    AMI    CMU    ICSI    NIST    VT
    PLP-D
    PLP
    FDLP-S
    M-RASTA
    MSG
    FDLP-M
    PLP+M-RASTA
    PLP+MSG
    FDLP-S+FDLP-M

    Table 2: Word Error Rates (%) on RT05 Meeting data, for different feature extraction techniques. TOT - total WER (%) for all test sets; AMI, CMU, ICSI, NIST, VT - WER (%) on the individual test sets [18].

We use these features on an LVCSR task using the AMI LVCSR system for meeting transcription [18]. The training data for this system uses individual headset microphone (IHM) data from four meeting corpora: NIST (13 hours), ISL (10 hours), ICSI (73 hours) and a preliminary part of the AMI corpus (16 hours). MLPs are trained on the whole training set in order to obtain estimates of phoneme posteriors for each of the feature sets. The acoustic models are phonetically state tied triphone models trained using standard HTK maximum likelihood training procedures. The recognition experiments are conducted on the NIST RT05 [19] evaluation data. The Juicer large vocabulary decoder is used for recognition with a pruned trigram language model [20], along with the reference speech segments provided by NIST for decoding and the pronunciation dictionary used in the AMI NIST RT05s system [18]. Table 2 shows the word error rates for these techniques on the RT05 meeting corpus. The proposed features (FDLP-S+FDLP-M) obtain a significant relative reduction of about 14% in WER for the LVCSR task (compared to a relative reduction of 5% for the PLP+M-RASTA and PLP+MSG features).
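A minimal sketch of the Tandem conversion described at the start of this section, under the simplifying assumption that the KLT is estimated on the same posteriors it transforms (in practice it would be estimated on training data):

    import numpy as np

    def tandem_features(posteriors, keep=25, eps=1e-10):
        # gaussianize with the log, decorrelate with the KLT (eigenvectors of
        # the covariance), and keep the highest-variance components
        logp = np.log(posteriors + eps)
        logp -= logp.mean(axis=0)
        vals, vecs = np.linalg.eigh(np.cov(logp, rowvar=False))
        return logp @ vecs[:, np.argsort(vals)[::-1][:keep]]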

3.1 Posterior Features for Low-resource Languages

An important factor that impacts the performance of posterior features for LVCSR is the amount of data used to train the MLP systems. For new languages with only a few hours of transcribed data, the performance of these data driven features is low. A potential solution to this problem is to use transcribed data available from other languages to build models which can be shared with the low-resource language. However, training such systems requires all the multilingual data to be transcribed using a common phone set across the different languages. This common phone set can be derived either in a data driven fashion or using phonetic sets such as the International Phonetic Alphabet (IPA) [21]. More recently, cross-lingual training with Subspace Gaussian Mixture Models (SGMMs) [22] has also been proposed for this task. We propose three different approaches to improve posteriors for low-resource languages.

3.1.1 Mapping Languages with a Common Phone set

In the first approach we explore a data driven approach for finding a common phone set across different languages [23]. In this method we initially train a cross-lingual MLP on data from multiple languages using an available phone set that covers the phonemes of those languages. However, this phone set might be different from that of the low-resource language for which we need to build the LVCSR system. In order to describe the low-resource training data in terms of the cross-lingual phone set, we use a count based approach: the accumulated posterior outputs can be considered as soft counts corresponding to the presence or absence of different phoneme classes. The first step in this approach is to forward pass the low-resource (in-language) training data through the cross-lingual MLP to obtain phoneme posteriors. Using these posterior probabilities (described in terms of the cross-lingual phone set) and their true labels from the low-resource phone set, we estimate the following counts:

c(x) - total instances when a particular label x of the low-resource phone set is present in the input.
c(y) - accumulated posterior value for cross-lingual phoneme y.
c(x,y) - accumulated posterior value for cross-lingual phoneme y when x is the true label.

With these counts, we compute C(x,y) = c(x,y) / (c(x) c(y)). For each label x, the more frequently a particular label y co-occurs with it, the higher the value of C(x,y). This measure can hence be used to map a label in the cross-lingual phone set to a particular label in the low-resource phone set.

In our experiments we first train a cross-lingual MLP using German and Spanish data on a set of 52 phones (the combined set of phonemes which covers the German and Spanish data). One hour of English data (considered as the low-resource language) is forward passed through the cross-lingual MLP to obtain phoneme posteriors in terms of the 52 cross-lingual phones. The true labels for the English data contain 47 English phonemes. Using the mapping technique described above we then determine which phoneme in the German-Spanish set each English phoneme maps to. Each English phoneme is mapped to the phone which gives the highest score in the German-Spanish set. Once the English data has been mapped, the cross-lingual MLPs are adapted using 1 hour of English data. We adapt the MLP by retraining it on the new data after initializing it with its original weights.
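The count-based mapping can be sketched in a few lines; the array names and shapes are illustrative assumptions:

    import numpy as np

    def map_phone_sets(cross_posteriors, true_labels, n_lowres):
        # cross_posteriors: (frames, n_cross) outputs of the cross-lingual MLP
        # for the low-resource training data; true_labels: (frames,)
        # low-resource phone indices for the same frames
        c_xy = np.zeros((n_lowres, cross_posteriors.shape[1]))
        for x in range(n_lowres):
            c_xy[x] = cross_posteriors[true_labels == x].sum(axis=0)  # c(x, y)
        c_x = np.bincount(true_labels, minlength=n_lowres).astype(float)  # c(x)
        c_y = cross_posteriors.sum(axis=0)                                # c(y)
        C = c_xy / (c_x[:, None] * c_y[None, :])  # C(x, y) = c(x, y) / (c(x) c(y))
        return C.argmax(axis=1)  # best cross-lingual phone for each low-resource phone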

Fig. 3 is a schematic of the proposed approach.

Figure 3: Deriving cross-lingual and multi-stream posterior features for low-resource LVCSR systems. Cross-lingual MLPs trained on German and Spanish data and adapted with 1 hour of English data are combined with low-resource MLPs on both modulation and spectral envelope feature streams; the merged posteriors are Tandem processed into features for ASR.

All the data for these experiments are from the LDC CallHome Corpus. We use 30 dimensional Tandem features to train the subsequent single pass HTK based recognizer with 600 tied states and 4 mixtures per state. Table 3 shows the improvements we get by using posterior features over conventional PLP features in a low-resource setting with only 1 hour of data.

    Baseline PLP features                           28.8
    Multi-stream Cross-lingual Tandem features      36.5

    Table 3: Word Recognition Accuracies (%) using multi-stream cross-lingual posterior features.

3.1.2 Training MLPs with Language Specific Output Layers

In our second approach we propose a different MLP architecture and training method for deriving posteriors for low-resource languages. The primary advantage of this new architecture is that it does not require the multilingual data to be mapped to a common phoneme set across the various languages. In the proposed architecture, we train a 4 layer multilayer perceptron. The MLP has a linear input layer with a size corresponding to the dimension of the input feature vector, followed by two non-linear hidden layers and a final linear layer with a size corresponding to the phoneme set of the language the MLP is being trained on. Similar to bottleneck MLPs or the HATS approach, while the dimension of the first hidden layer is high, the second hidden layer is low dimensional and is known as the bottleneck layer. While training on multiple languages with different phoneme sets, the first 3 layers are shared; the last layer, which is specific to the phoneme set of each language, is then modified.

Modifying only this layer allows us to train across different languages. Fig. 4 is a schematic of the proposed architecture for two languages, a high-resource language (with several hours of data) and a low-resource language (with only a few hours of data), each having a different phoneme set.

Figure 4: Block schematic of the proposed MLP training scheme for low-resource languages. The input layer (sized to the input feature set, e.g. PLP features), the expansion layer and the bottleneck layer are common across languages; the bottleneck is introduced to let the network learn a common low dimensional representation among languages. The network is first trained on the high-resource language with its phone set at an intermediate output layer, and the final output layer is then trained on the low-resource language with its phone set. Two kinds of features are derived, from the bottleneck and from the final layer.

We derive two kinds of features for the LVCSR task from these networks:

A. Tandem features - These features are derived from the posteriors estimated by the MLP at the fourth layer. When networks are trained on multiple feature representations, better posterior estimates can be derived by combining the outputs from the different systems using posterior probability combination rules. The phoneme posteriors are then converted to features by gaussianizing the posteriors using the log function and decorrelating them. A dimensionality reduction is also performed by retaining only the feature components which contribute most to the variance of the data.

B. Bottleneck features - Unlike Tandem features, bottleneck features are derived as the linear outputs of the neurons in the bottleneck layer. These outputs are used directly as features for LVCSR without applying any transforms. When bottleneck features are derived from multiple feature representations, the features are appended together and a dimensionality reduction is performed using the KLT to retain only the relevant components.
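A minimal forward-pass sketch of this shared-layer architecture follows; training, which would update the shared layers on data from every language but only the active language's output layer, is omitted, and all sizes and names are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)

    def layer(n_in, n_out):
        return rng.normal(scale=0.1, size=(n_in, n_out))

    class MultilingualMLP:
        # shared input -> expansion -> bottleneck weights, plus one
        # language-specific output layer per language (biases omitted)
        def __init__(self, n_in, phone_sets, n_expand=1000, n_bottleneck=40):
            self.shared = [layer(n_in, n_expand), layer(n_expand, n_bottleneck)]
            self.out = {lang: layer(n_bottleneck, n)
                        for lang, n in phone_sets.items()}

        def forward(self, x, lang):
            h = np.tanh(x @ self.shared[0])
            bn = h @ self.shared[1]                # linear bottleneck outputs = bottleneck features
            logits = np.tanh(bn) @ self.out[lang]  # language-specific output layer
            e = np.exp(logits - logits.max(axis=1, keepdims=True))
            return bn, e / e.sum(axis=1, keepdims=True)  # features, posteriors

    # hypothetical usage with the phone sets from the text:
    # net = MultilingualMLP(351, {'german_spanish': 52, 'english': 47})
    # bottleneck_feats, posteriors = net.forward(frames, 'english')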

Both of these MLP features are derived using two acoustic feature representations: short-term spectral PLP features and long-term modulation features derived using frequency domain linear prediction (FDLP-M). Table 4 summarizes the word recognition accuracies for the same LVCSR task described earlier, with 2 languages (Spanish and German) along with 1 hour of English in a low-resource setting.

    Baseline PLP features                                               28.8
    Tandem features derived from posteriors using Spanish and German
    with 1 hour of English and 2 feature representations                35.8
    Bottleneck features with the same setup                             37.2

    Table 4: Word Recognition Accuracies (%) using multi-stream cross-lingual posterior features.

3.1.3 Enhancing Acoustic Features with Out-of-language Posteriors

In this approach, the acoustic features used to train the low-resource MLPs are enhanced with posteriors derived from large amounts of out-of-language data. Fig. 5 is a schematic of the proposed approach, where posterior features from separate MLPs trained on large amounts of out-of-language data (200 hours of Spanish) are used to enhance the acoustic features used to train low-resource MLPs on smaller amounts of data (M hours of English). The Spanish MLPs are trained on two feature streams: short-term spectral PLP features and long-term modulation features derived using FDLP (FDLP-M). Posterior features from the two acoustic streams (PLP and FDLP-M) are combined at the posterior level. This allows us to obtain more accurate and robust estimates of the out-of-language posteriors for LVCSR. 25 dimensional Tandem representations of these features are appended to the 351 dimensional PLP features used to train the low-resource English nets, as shown in Fig. 5.

Figure 5: Low-resource MLP systems trained with acoustic features enhanced with out-of-language posteriors from multiple acoustic representations.

The comparison of LVCSR results for the baseline HMM-GMM setup (CTS data from CallHome English) and the performance of MLP systems trained with enhanced posteriors from out-of-language MLPs is shown in Fig. 6. The plot summarizes the effect of the enhanced posterior features as a function of the equivalent amount of additional in-language training data. The dotted lines indicate the correspondence of enhanced posteriors to the equivalent performance of the baseline system using conventional PLP features with higher amounts of in-language data. With the enhancement of out-of-language posteriors on 1 hour of in-language data, we obtain an LVCSR performance equivalent to 4 hours of in-language data, an increase of 300% in the amount of in-language training data.

Figure 6: Word accuracy improvements for low-resource LVCSR systems with out-of-language posteriors. The plot shows word recognition accuracy (%) against the amount of in-language data (hours), for in-language data (English) alone and for in-language data enhanced with out-of-language posteriors (Spanish), with dotted lines marking the equivalent in-language performance.

However, the improvements with higher amounts of in-language training data are subsequently lower (for example, starting with 5 hours of in-language data, the improvement using out-of-language posteriors is equivalent to 8 hours of in-language training, an additional increase of 60% on the original 5 hours).

4 Phoneme Posteriors for Speaker Verification

4.1 Traditional Approaches to Speaker Verification

The goal of speaker verification is to verify the truth of a speaker's claimed identity. The majority of current speaker verification systems model the overall acoustic feature vector space using a Universal Background Model (UBM), trained on large amounts of data from multiple speakers. In the enrollment phase, the UBM is adapted to model each enrollment speaker using a limited amount of speech from the speaker. During test, the likelihoods of the test utterance under both the UBM and the claimed speaker model are derived. If the claimed identity of the speaker is true, the likelihood from the claimed speaker model is assumed to be higher than the likelihood of the utterance under the UBM, and vice versa if false. The likelihood ratio of the adapted speaker model and the UBM is hence commonly used as an indication of the target speaker. Both the UBM and the speaker-specific models are typically multivariate single-state GMMs with a large number of mixture components. The GMM assumes that the data is composed of normally distributed clusters, each cluster representing a group of speech sounds. During the adaptation, the components that represent sounds present in the adaptation data are well adapted; the other components remain close to what they were in the UBM. The UBM-GMM method has proved successful and has remained in use for the past two decades.
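A hedged sketch of this UBM-GMM likelihood-ratio scoring, where re-fitting a GMM initialized from the UBM's parameters stands in for the MAP adaptation of [2], and the mixture size and data names are illustrative:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_models(ubm_data, speaker_data, n_mix=512):
        # UBM on pooled multi-speaker data; the speaker model is re-fit from
        # the UBM's parameters - a crude stand-in for MAP adaptation
        ubm = GaussianMixture(n_mix, covariance_type='diag').fit(ubm_data)
        spk = GaussianMixture(n_mix, covariance_type='diag',
                              weights_init=ubm.weights_, means_init=ubm.means_,
                              precisions_init=ubm.precisions_).fit(speaker_data)
        return ubm, spk

    def verification_score(ubm, spk, test_data):
        # average per-frame log-likelihood ratio of the claimed model vs. the UBM
        return spk.score(test_data) - ubm.score(test_data)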

Figure 7: Block schematic of the proposed AANN based speaker verification system. An MLP estimates broad class posteriors from the acoustic features of the test utterance; these posteriors connect the component AANNs of the UBM model and the speaker model, whose average reconstruction errors feed the decision logic.

4.2 Improvements to Neural Networks for Speaker Verification

A more recently proposed alternative for modeling the data distribution is the Auto-Associative Neural Network (AANN). AANNs are feed-forward neural networks with an equal number of input and output nodes, trained to learn an identity mapping from the input to the output layer with a restricted number of nodes at a hidden compression layer. In [6], these networks have been used instead of GMMs for speaker verification. However, the performance of AANN speaker verification systems has so far not matched that of the GMM based systems. We attribute this to the relatively unconstrained way in which AANNs are adapted to target speakers. We propose the following improvements to train these models better:

- Forming several independent class-specific AANNs as a UBM. The composite UBM-AANN is additionally trained using side information about the class of sounds present in the data. We use estimates of the posterior probabilities of 5 broad phoneme classes (vowels, fricatives, plosives, nasals and silence) from a multilayer perceptron (MLP) as this side information.
- Training separate AANNs on different channel conditions - microphone and telephone.
- Adapting the parameters of each class-specific AANN only after the compression layer instead of retraining the entire network, since the adaptation data is limited.

The performance of the proposed modeling technique is evaluated on a decimated set of the 8 core conditions of the NIST 2008 speaker recognition evaluations. We train gender specific UBM-AANNs for both the microphone and telephone conditions. The FDLP features described earlier are used to train these models. For each speaker in the enrollment set we adapt the UBM-AANN to create a speaker specific AANN. The difference between the average reconstruction errors of the UBM and the claimed model is used as the score for each test speaker. The final recognition performance is then computed by finding the EER on these scores.
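The posterior-weighted scoring of the mixture of AANNs can be sketched as below; the networks here carry untrained random weights, training and adaptation are omitted, and the layer sizes are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)

    def init_aann(dim, expand=200, compress=20):
        # expansion -> compression -> linear reconstruction (biases omitted)
        return [rng.normal(scale=0.1, size=s)
                for s in [(dim, expand), (expand, compress), (compress, dim)]]

    def frame_errors(w, x):
        h = np.tanh(x @ w[0])   # expansion layer
        z = np.tanh(h @ w[1])   # compression layer
        return ((x - z @ w[2]) ** 2).sum(axis=1)  # squared reconstruction error

    def mixture_score(class_aanns, x, broad_post):
        # broad_post: (frames, 5) broad-class posteriors from the MLP; each
        # frame's error is attributed to the class-specific AANNs in
        # proportion to these posteriors (soft counts)
        errs = np.stack([frame_errors(w, x) for w in class_aanns], axis=1)
        return (broad_post * errs).sum() / len(x)

    # verification score = UBM average error minus claimed model average error:
    # score = mixture_score(ubm_aanns, x, post) - mixture_score(spk_aanns, x, post)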

In order to train the composite UBMs and create the speaker specific models, posteriors from MLPs trained on large amounts of conversational telephone and microphone speech are used. We use the proposed features to train these MLPs. The phoneme posteriors obtained at the outputs of these networks are combined appropriately to obtain 5 broad phonetic class posteriors corresponding to vowels, fricatives, plosives, nasals and silence. Table 6 shows the NIST detection cost function (DCF) scores for all 8 conditions (Table 5) using both the conventional GMM based system and the proposed AANN system. A simple weighted combination of scores from both systems improves the performance still further by minimizing the DCF.

    Cond.  Task
    1      Interview speech in training and test.
    2      Interview speech from the same microphone type in training and test.
    3      Interview speech from different microphone types in training and test.
    4      Interview training speech and telephone test speech.
    5      Telephone training speech and noninterview microphone test speech.
    6      Telephone speech in training and test from multiple languages.
    7      English language telephone speech in training and test.
    8      English language telephone speech spoken by a native U.S. English speaker in training and test.

    Table 5: Core evaluation conditions in the NIST 2008 SRE task.

    System     Cond. 1   Cond. 2   Cond. 3   Cond. 4   Cond. 5   Cond. 6   Cond. 7   Cond. 8
    GMM
    AANN
    Combined

    Table 6: Performance of the various systems in terms of min DCF (x 10^-3).

More recently, factor analysis of GMMs has been used as a front-end for extracting lower dimensional representations of the mean supervectors, known as i-vectors, that capture both the speaker and channel variabilities. In a simple i-vector system, the cosine distance between the test and enrollment i-vectors is used as a score. Similarly, we model the adaptation parameters (last layer weights) of the mixture of AANNs in a lower dimensional subspace that captures both speaker and channel variabilities. The learning of the subspace is formulated as a regularized weighted least squares problem. Posterior probabilities play a significant role by serving as soft counts that determine the number of points aligned with each component of the composite AANN in this formulation [7]. The results using the proposed factor analysis are summarized in Table 7 for conditions 6, 7 and 8 of NIST 2008. We use the same UBMs described above. Gender specific 300 dimensional subspaces are trained for the mixture of AANNs. A 400 dimensional total variability space of GMMs is also trained as a baseline. For both approaches, the cosine distance between test and enrollment i-vectors is used as the score [24]. The proposed subspace approach improves on the basic mixture of AANNs system (see Table 6) and combines well with the state-of-the-art GMM i-vector system, yielding a 10% relative improvement in DCF.
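The cosine distance scoring used by both back-ends reduces to a one-liner; the argument names are illustrative:

    import numpy as np

    def cosine_score(w_enroll, w_test):
        # cosine similarity between enrollment and test i-vectors;
        # higher scores support the claimed identity
        return float(w_enroll @ w_test /
                     (np.linalg.norm(w_enroll) * np.linalg.norm(w_test)))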

    System                                   Cond. 6   Cond. 7   Cond. 8
    Mixture of AANNs (300 dim. i-vectors)
    GMM (400 dim. i-vectors)
    Score combination

    Table 7: Performance in terms of min DCF (x 10^-3) using subspace approaches.

5 Conclusions and Future Directions

5.1 Conclusions

We have presented novel methods for improving phoneme posteriors, by combining posteriors from different streams and through new training methods for MLPs. We have applied the improved posteriors to a variety of tasks in speech and speaker recognition. The results show the usefulness of the proposed techniques for these applications. In the future we propose to extend this work to noisy environments.

5.2 Future Directions

5.2.1 Robust Estimation of Posteriors using Multistream Processing

In the multistream recognition paradigm for processing corrupted signals, various representations of the signal from different frequency bands of the spectrum are processed and classified in separate processing channels [25]. This is done to adaptively suppress the corrupted channels while preserving the uncorrupted channels for further processing. We pursue this approach by deriving several streams from the power spectrum of speech using a bank of 2D Gabor filters tuned to different spectral (scale) and temporal (rate) modulations [26]. We train MLPs on each of these streams and derive statistics from the estimated posteriors by computing their autocorrelation matrix. The diagonal elements of this autocorrelation matrix reflect the occurrence frequency of each phoneme and the off-diagonal values correspond to the coactivation of different phoneme posteriors. The autocorrelation does not tell us anything about whether the posterior estimates were correct; it merely reflects the first order (diagonal) and second order (off-diagonal) statistics of the estimated posteriograms. However, the off-diagonal elements reflect confusions among phoneme classes, because an ideal posteriogram has only one phoneme active at each time instant. The autocorrelation matrix computed from posteriograms of undistorted speech also summarizes the behavior of each stream in the clean condition. Any additional distortion of the posteriogram, due to any factor, results in a change of these statistics. Thus, computing a measure of similarity between the autocorrelation matrices derived from the clean signal and from the corrupted signal indicates the degradation of the stream due to the distortion. In our initial experiments, using the Pearson's correlation as the measure of similarity seems to be an effective predictor of a stream's recognition accuracy in both clean and distorted cases [27]. We propose to investigate how posteriors from the individual streams can be combined based on these reliability measures to obtain better posterior estimates in noise.
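The proposed reliability measure can be sketched directly from this description; the function names are illustrative:

    import numpy as np

    def posteriogram_stats(posteriors):
        # autocorrelation matrix of a (frames, n_phonemes) posteriogram:
        # the diagonal reflects phoneme occurrence frequencies and the
        # off-diagonal entries reflect phoneme co-activations
        return (posteriors.T @ posteriors) / len(posteriors)

    def stream_reliability(clean_stats, test_posteriors):
        # Pearson correlation between the clean-condition statistics and the
        # statistics of the (possibly distorted) test posteriogram
        a = clean_stats.ravel()
        b = posteriogram_stats(test_posteriors).ravel()
        return np.corrcoef(a, b)[0, 1]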

5.2.2 Posteriors for Voice Activity Detection

In most speech processing systems, the first step in dealing with a speech signal is the reliable detection of speech activity. We propose to explore the use of MLP posteriors for voice activity detection (VAD). For VAD, the MLP phoneme posteriors corresponding to the speech classes can be merged to give a two class posterior probability vector with speech/non-speech probabilities. These probabilities can then be hard thresholded into speech/non-speech decisions. A Viterbi decoder can further be used to smooth the decisions. However, this VAD decoder performs well only under matched training and test conditions. In order to improve the applicability of MLP based VAD in mismatched scenarios, there is a need to develop robust phoneme posterior estimation techniques (especially in noisy and low-resource settings). By deriving improved posteriors using some of the approaches described above, we plan to improve VAD performance for various tasks like speech recognition and speaker verification.
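A minimal sketch of this posterior-based VAD, with median filtering as a simple stand-in for the Viterbi smoothing, and with an illustrative threshold and filter length:

    import numpy as np
    from scipy.signal import medfilt

    def vad(posteriors, silence_idx, threshold=0.5, kernel=11):
        # merging all speech classes is equivalent to 1 minus the silence
        # posterior; hard-threshold, then smooth the binary decisions
        speech_prob = 1.0 - posteriors[:, silence_idx]
        decisions = (speech_prob > threshold).astype(float)
        return medfilt(decisions, kernel_size=kernel) > 0.5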

References

[1] F. Jelinek, Statistical Methods for Speech Recognition, The MIT Press, 1997.
[2] D. Reynolds, T. Quatieri and R. Dunn, Speaker verification using adapted Gaussian mixture models, Digital Signal Processing, 2000.
[3] P. Kenny, G. Boulianne, P. Ouellet and P. Dumouchel, Joint factor analysis versus eigenchannels in speaker recognition, IEEE Transactions on Audio, Speech, and Language Processing, 2007.
[4] H. Bourlard and N. Morgan, Connectionist Speech Recognition: A Hybrid Approach, Springer, 1994.
[5] H. Hermansky, D.P.W. Ellis and S. Sharma, Tandem connectionist feature extraction for conventional HMM systems, in IEEE ICASSP, 2000.
[6] B. Yegnanarayana and S. Kishore, AANN: an alternative to GMM for pattern recognition, Neural Networks, 2002.
[7] G.S.V.S. Sivaram, S. Thomas and H. Hermansky, Mixture of Auto-Associative Neural Networks for Speaker Verification, in ISCA Interspeech, 2011.
[8] M. Athineos and D.P.W. Ellis, Frequency-domain linear prediction for temporal features, in IEEE ASRU, 2003.
[9] S. Thomas, S. Ganapathy and H. Hermansky, Phoneme Recognition Using Spectral Envelope and Modulation Frequency Features, in IEEE ICASSP, 2009.
[10] S. Ganapathy, S. Thomas and H. Hermansky, Modulation Frequency Features For Phoneme Recognition In Noisy Speech, JASA Express Letters, 2009.
[11] J. Pinto, G.S.V.S. Sivaram, M. Magimai-Doss, H. Hermansky and H. Bourlard, Analyzing MLP Based Hierarchical Phoneme Posterior Probability Estimator, IEEE Transactions on Audio, Speech, and Language Processing, 2011.
[12] F. Valente and H. Hermansky, Combination of Acoustic Classifiers based on Dempster-Shafer Theory of Evidence, in IEEE ICASSP, 2007.
[13] Q. Zhu, B. Chen, N. Morgan and A. Stolcke, On using MLP features in LVCSR, in ISCA Interspeech, 2004.
[14] H. Hermansky, Perceptual Linear Predictive (PLP) Analysis of Speech, JASA, 1990.
[15] J. Pinto, B. Yegnanarayana, H. Hermansky and M.M. Doss, Exploiting contextual information for improved phoneme recognition, in ISCA Interspeech, 2008.
[16] H. Hermansky and P. Fousek, Multi-resolution RASTA filtering for TANDEM-based ASR, in ISCA Interspeech, 2005.
[17] B. Kingsbury, Perceptually-inspired signal processing strategies for robust speech recognition in reverberant environments, Ph.D. thesis, University of California, Berkeley, 1998.
[18] T. Hain et al., The 2005 AMI system for the transcription of speech in meetings, NIST RT05 Workshop, 2005.
[19] The NIST Rich Transcription Spring 2005 Evaluation, Online Web Link:
[20] D. Moore et al., Juicer: A weighted finite state transducer speech decoder, Lecture Notes in Computer Science, 2006.
[21] H. Lin, L. Deng, D. Yu, Y. Gong, A. Acero and C. Lee, A study on Multilingual Acoustic Modeling for Large Vocabulary ASR, in IEEE ICASSP, 2009.
[22] L. Burget et al., Multilingual Acoustic Modeling for Speech Recognition based on Subspace Gaussian Mixture Models, in IEEE ICASSP, 2010.
[23] S. Thomas, S. Ganapathy and H. Hermansky, Cross-lingual and Multi-stream Posterior Features for Low-resource LVCSR Systems, in ISCA Interspeech, 2010.
[24] N. Dehak, P. Kenny, R. Dehak, P. Dumouchel and P. Ouellet, Front-end factor analysis for speaker verification, IEEE Transactions on Audio, Speech, and Language Processing, 2010.
[25] H. Hermansky, S. Timberwala and M. Pavel, Towards ASR on partially corrupted speech, in ICSLP, 1996.
[26] T. Chi, P. Ru and S.A. Shamma, Multiresolution spectrotemporal analysis of complex sounds, JASA, 2005.
[27] N. Mesgarani, S. Thomas and H. Hermansky, Toward Optimizing Stream Fusion in Multistream Recognition of Speech, JASA Express Letters, 2011.


MPEG Unified Speech and Audio Coding Enabling Efficient Coding of both Speech and Music ISO/IEC MPEG USAC Unified Speech and Audio Coding MPEG Unified Speech and Audio Coding Enabling Efficient Coding of both Speech and Music The standardization of MPEG USAC in ISO/IEC is now in its final

More information

Automatic Evaluation Software for Contact Centre Agents voice Handling Performance

Automatic Evaluation Software for Contact Centre Agents voice Handling Performance International Journal of Scientific and Research Publications, Volume 5, Issue 1, January 2015 1 Automatic Evaluation Software for Contact Centre Agents voice Handling Performance K.K.A. Nipuni N. Perera,

More information

VEHICLE TRACKING USING ACOUSTIC AND VIDEO SENSORS

VEHICLE TRACKING USING ACOUSTIC AND VIDEO SENSORS VEHICLE TRACKING USING ACOUSTIC AND VIDEO SENSORS Aswin C Sankaranayanan, Qinfen Zheng, Rama Chellappa University of Maryland College Park, MD - 277 {aswch, qinfen, rama}@cfar.umd.edu Volkan Cevher, James

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

More information

Lecture 12: An Overview of Speech Recognition

Lecture 12: An Overview of Speech Recognition Lecture : An Overview of peech Recognition. Introduction We can classify speech recognition tasks and systems along a set of dimensions that produce various tradeoffs in applicability and robustness. Isolated

More information

Component Ordering in Independent Component Analysis Based on Data Power

Component Ordering in Independent Component Analysis Based on Data Power Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals

More information

THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS

THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS O.U. Sezerman 1, R. Islamaj 2, E. Alpaydin 2 1 Laborotory of Computational Biology, Sabancı University, Istanbul, Turkey. 2 Computer Engineering

More information

Separation and Classification of Harmonic Sounds for Singing Voice Detection

Separation and Classification of Harmonic Sounds for Singing Voice Detection Separation and Classification of Harmonic Sounds for Singing Voice Detection Martín Rocamora and Alvaro Pardo Institute of Electrical Engineering - School of Engineering Universidad de la República, Uruguay

More information

On sequence kernels for SVM classification of sets of vectors: application to speaker verification

On sequence kernels for SVM classification of sets of vectors: application to speaker verification On sequence kernels for SVM classification of sets of vectors: application to speaker verification Major part of the Ph.D. work of In collaboration with Jérôme Louradour Francis Bach (ARMINES) within E-TEAM

More information

Speech and Network Marketing Model - A Review

Speech and Network Marketing Model - A Review Jastrzȩbia Góra, 16 th 20 th September 2013 APPLYING DATA MINING CLASSIFICATION TECHNIQUES TO SPEAKER IDENTIFICATION Kinga Sałapa 1,, Agata Trawińska 2 and Irena Roterman-Konieczna 1, 1 Department of Bioinformatics

More information

THE goal of Speaker Diarization is to segment audio

THE goal of Speaker Diarization is to segment audio 1 The ICSI RT-09 Speaker Diarization System Gerald Friedland* Member IEEE, Adam Janin, David Imseng Student Member IEEE, Xavier Anguera Member IEEE, Luke Gottlieb, Marijn Huijbregts, Mary Tai Knox, Oriol

More information

MISSING FEATURE RECONSTRUCTION AND ACOUSTIC MODEL ADAPTATION COMBINED FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION

MISSING FEATURE RECONSTRUCTION AND ACOUSTIC MODEL ADAPTATION COMBINED FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION MISSING FEATURE RECONSTRUCTION AND ACOUSTIC MODEL ADAPTATION COMBINED FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION Ulpu Remes, Kalle J. Palomäki, and Mikko Kurimo Adaptive Informatics Research Centre,

More information

Multisensor Data Fusion and Applications

Multisensor Data Fusion and Applications Multisensor Data Fusion and Applications Pramod K. Varshney Department of Electrical Engineering and Computer Science Syracuse University 121 Link Hall Syracuse, New York 13244 USA E-mail: varshney@syr.edu

More information

Speech Signal Processing: An Overview

Speech Signal Processing: An Overview Speech Signal Processing: An Overview S. R. M. Prasanna Department of Electronics and Electrical Engineering Indian Institute of Technology Guwahati December, 2012 Prasanna (EMST Lab, EEE, IITG) Speech

More information

Leveraging Large Amounts of Loosely Transcribed Corporate Videos for Acoustic Model Training

Leveraging Large Amounts of Loosely Transcribed Corporate Videos for Acoustic Model Training Leveraging Large Amounts of Loosely Transcribed Corporate Videos for Acoustic Model Training Matthias Paulik and Panchi Panchapagesan Cisco Speech and Language Technology (C-SALT), Cisco Systems, Inc.

More information

ADVANCES IN ARABIC BROADCAST NEWS TRANSCRIPTION AT RWTH. David Rybach, Stefan Hahn, Christian Gollan, Ralf Schlüter, Hermann Ney

ADVANCES IN ARABIC BROADCAST NEWS TRANSCRIPTION AT RWTH. David Rybach, Stefan Hahn, Christian Gollan, Ralf Schlüter, Hermann Ney ADVANCES IN ARABIC BROADCAST NEWS TRANSCRIPTION AT RWTH David Rybach, Stefan Hahn, Christian Gollan, Ralf Schlüter, Hermann Ney Human Language Technology and Pattern Recognition Computer Science Department,

More information

An Arabic Text-To-Speech System Based on Artificial Neural Networks

An Arabic Text-To-Speech System Based on Artificial Neural Networks Journal of Computer Science 5 (3): 207-213, 2009 ISSN 1549-3636 2009 Science Publications An Arabic Text-To-Speech System Based on Artificial Neural Networks Ghadeer Al-Said and Moussa Abdallah Department

More information

Data quality in Accounting Information Systems

Data quality in Accounting Information Systems Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania

More information

Environmental Remote Sensing GEOG 2021

Environmental Remote Sensing GEOG 2021 Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class

More information

Keywords: Image complexity, PSNR, Levenberg-Marquardt, Multi-layer neural network.

Keywords: Image complexity, PSNR, Levenberg-Marquardt, Multi-layer neural network. Global Journal of Computer Science and Technology Volume 11 Issue 3 Version 1.0 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals Inc. (USA) Online ISSN: 0975-4172

More information

Gender Identification using MFCC for Telephone Applications A Comparative Study

Gender Identification using MFCC for Telephone Applications A Comparative Study Gender Identification using MFCC for Telephone Applications A Comparative Study Jamil Ahmad, Mustansar Fiaz, Soon-il Kwon, Maleerat Sodanil, Bay Vo, and * Sung Wook Baik Abstract Gender recognition is

More information

Weighting and Normalisation of Synchronous HMMs for Audio-Visual Speech Recognition

Weighting and Normalisation of Synchronous HMMs for Audio-Visual Speech Recognition ISCA Archive http://www.isca-speech.org/archive Auditory-Visual Speech Processing 27 (AVSP27) Hilvarenbeek, The Netherlands August 31 - September 3, 27 Weighting and Normalisation of Synchronous HMMs for

More information

How to Improve the Sound Quality of Your Microphone

How to Improve the Sound Quality of Your Microphone An Extension to the Sammon Mapping for the Robust Visualization of Speaker Dependencies Andreas Maier, Julian Exner, Stefan Steidl, Anton Batliner, Tino Haderlein, and Elmar Nöth Universität Erlangen-Nürnberg,

More information

Speech Recognition System of Arabic Alphabet Based on a Telephony Arabic Corpus

Speech Recognition System of Arabic Alphabet Based on a Telephony Arabic Corpus Speech Recognition System of Arabic Alphabet Based on a Telephony Arabic Corpus Yousef Ajami Alotaibi 1, Mansour Alghamdi 2, and Fahad Alotaiby 3 1 Computer Engineering Department, King Saud University,

More information

SUCCESSFUL PREDICTION OF HORSE RACING RESULTS USING A NEURAL NETWORK

SUCCESSFUL PREDICTION OF HORSE RACING RESULTS USING A NEURAL NETWORK SUCCESSFUL PREDICTION OF HORSE RACING RESULTS USING A NEURAL NETWORK N M Allinson and D Merritt 1 Introduction This contribution has two main sections. The first discusses some aspects of multilayer perceptrons,

More information

Classification of Fingerprints. Sarat C. Dass Department of Statistics & Probability

Classification of Fingerprints. Sarat C. Dass Department of Statistics & Probability Classification of Fingerprints Sarat C. Dass Department of Statistics & Probability Fingerprint Classification Fingerprint classification is a coarse level partitioning of a fingerprint database into smaller

More information

COMPARISON OF OBJECT BASED AND PIXEL BASED CLASSIFICATION OF HIGH RESOLUTION SATELLITE IMAGES USING ARTIFICIAL NEURAL NETWORKS

COMPARISON OF OBJECT BASED AND PIXEL BASED CLASSIFICATION OF HIGH RESOLUTION SATELLITE IMAGES USING ARTIFICIAL NEURAL NETWORKS COMPARISON OF OBJECT BASED AND PIXEL BASED CLASSIFICATION OF HIGH RESOLUTION SATELLITE IMAGES USING ARTIFICIAL NEURAL NETWORKS B.K. Mohan and S. N. Ladha Centre for Studies in Resources Engineering IIT

More information

Annotated bibliographies for presentations in MUMT 611, Winter 2006

Annotated bibliographies for presentations in MUMT 611, Winter 2006 Stephen Sinclair Music Technology Area, McGill University. Montreal, Canada Annotated bibliographies for presentations in MUMT 611, Winter 2006 Presentation 4: Musical Genre Similarity Aucouturier, J.-J.

More information

Automatic slide assignation for language model adaptation

Automatic slide assignation for language model adaptation Automatic slide assignation for language model adaptation Applications of Computational Linguistics Adrià Agustí Martínez Villaronga May 23, 2013 1 Introduction Online multimedia repositories are rapidly

More information

Solutions to Exam in Speech Signal Processing EN2300

Solutions to Exam in Speech Signal Processing EN2300 Solutions to Exam in Speech Signal Processing EN23 Date: Thursday, Dec 2, 8: 3: Place: Allowed: Grades: Language: Solutions: Q34, Q36 Beta Math Handbook (or corresponding), calculator with empty memory.

More information

Broadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29.

Broadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29. Broadband Networks Prof. Dr. Abhay Karandikar Electrical Engineering Department Indian Institute of Technology, Bombay Lecture - 29 Voice over IP So, today we will discuss about voice over IP and internet

More information

AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.

AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S. AUTOMATION OF ENERGY DEMAND FORECASTING by Sanzad Siddique, B.S. A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree

More information

Thirukkural - A Text-to-Speech Synthesis System

Thirukkural - A Text-to-Speech Synthesis System Thirukkural - A Text-to-Speech Synthesis System G. L. Jayavardhana Rama, A. G. Ramakrishnan, M Vijay Venkatesh, R. Murali Shankar Department of Electrical Engg, Indian Institute of Science, Bangalore 560012,

More information

Comparison Between Multilayer Feedforward Neural Networks and a Radial Basis Function Network to Detect and Locate Leaks in Pipelines Transporting Gas

Comparison Between Multilayer Feedforward Neural Networks and a Radial Basis Function Network to Detect and Locate Leaks in Pipelines Transporting Gas A publication of 1375 CHEMICAL ENGINEERINGTRANSACTIONS VOL. 32, 2013 Chief Editors:SauroPierucci, JiříJ. Klemeš Copyright 2013, AIDIC ServiziS.r.l., ISBN 978-88-95608-23-5; ISSN 1974-9791 The Italian Association

More information

Strategies for Training Large Scale Neural Network Language Models

Strategies for Training Large Scale Neural Network Language Models Strategies for Training Large Scale Neural Network Language Models Tomáš Mikolov #1, Anoop Deoras 2, Daniel Povey 3, Lukáš Burget #4, Jan Honza Černocký #5 # Brno University of Technology, Speech@FIT,

More information

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

More information

Novelty Detection in image recognition using IRF Neural Networks properties

Novelty Detection in image recognition using IRF Neural Networks properties Novelty Detection in image recognition using IRF Neural Networks properties Philippe Smagghe, Jean-Luc Buessler, Jean-Philippe Urban Université de Haute-Alsace MIPS 4, rue des Frères Lumière, 68093 Mulhouse,

More information

ARMORVOX IMPOSTORMAPS HOW TO BUILD AN EFFECTIVE VOICE BIOMETRIC SOLUTION IN THREE EASY STEPS

ARMORVOX IMPOSTORMAPS HOW TO BUILD AN EFFECTIVE VOICE BIOMETRIC SOLUTION IN THREE EASY STEPS ARMORVOX IMPOSTORMAPS HOW TO BUILD AN EFFECTIVE VOICE BIOMETRIC SOLUTION IN THREE EASY STEPS ImpostorMaps is a methodology developed by Auraya and available from Auraya resellers worldwide to configure,

More information

CROP CLASSIFICATION WITH HYPERSPECTRAL DATA OF THE HYMAP SENSOR USING DIFFERENT FEATURE EXTRACTION TECHNIQUES

CROP CLASSIFICATION WITH HYPERSPECTRAL DATA OF THE HYMAP SENSOR USING DIFFERENT FEATURE EXTRACTION TECHNIQUES Proceedings of the 2 nd Workshop of the EARSeL SIG on Land Use and Land Cover CROP CLASSIFICATION WITH HYPERSPECTRAL DATA OF THE HYMAP SENSOR USING DIFFERENT FEATURE EXTRACTION TECHNIQUES Sebastian Mader

More information

Predict the Popularity of YouTube Videos Using Early View Data

Predict the Popularity of YouTube Videos Using Early View Data 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Automatic Transcription of Conversational Telephone Speech

Automatic Transcription of Conversational Telephone Speech IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 13, NO. 6, NOVEMBER 2005 1173 Automatic Transcription of Conversational Telephone Speech Thomas Hain, Member, IEEE, Philip C. Woodland, Member, IEEE,

More information

EFFECTS OF BACKGROUND DATA DURATION ON SPEAKER VERIFICATION PERFORMANCE

EFFECTS OF BACKGROUND DATA DURATION ON SPEAKER VERIFICATION PERFORMANCE Uludağ Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi, Cilt 18, Sayı 1, 2013 ARAŞTIRMA EFFECTS OF BACKGROUND DATA DURATION ON SPEAKER VERIFICATION PERFORMANCE Cemal HANİLÇİ * Figen ERTAŞ * Abstract:

More information

APPLYING MFCC-BASED AUTOMATIC SPEAKER RECOGNITION TO GSM AND FORENSIC DATA

APPLYING MFCC-BASED AUTOMATIC SPEAKER RECOGNITION TO GSM AND FORENSIC DATA APPLYING MFCC-BASED AUTOMATIC SPEAKER RECOGNITION TO GSM AND FORENSIC DATA Tuija Niemi-Laitinen*, Juhani Saastamoinen**, Tomi Kinnunen**, Pasi Fränti** *Crime Laboratory, NBI, Finland **Dept. of Computer

More information

AS indicated by the growing number of participants in

AS indicated by the growing number of participants in 1960 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 7, SEPTEMBER 2007 State-of-the-Art Performance in Text-Independent Speaker Verification Through Open-Source Software Benoît

More information

Biometric Authentication using Online Signatures

Biometric Authentication using Online Signatures Biometric Authentication using Online Signatures Alisher Kholmatov and Berrin Yanikoglu alisher@su.sabanciuniv.edu, berrin@sabanciuniv.edu http://fens.sabanciuniv.edu Sabanci University, Tuzla, Istanbul,

More information

203.4770: Introduction to Machine Learning Dr. Rita Osadchy

203.4770: Introduction to Machine Learning Dr. Rita Osadchy 203.4770: Introduction to Machine Learning Dr. Rita Osadchy 1 Outline 1. About the Course 2. What is Machine Learning? 3. Types of problems and Situations 4. ML Example 2 About the course Course Homepage:

More information

Convention Paper Presented at the 135th Convention 2013 October 17 20 New York, USA

Convention Paper Presented at the 135th Convention 2013 October 17 20 New York, USA Audio Engineering Society Convention Paper Presented at the 135th Convention 2013 October 17 20 New York, USA This Convention paper was selected based on a submitted abstract and 750-word precis that have

More information

Objective Speech Quality Measures for Internet Telephony

Objective Speech Quality Measures for Internet Telephony Objective Speech Quality Measures for Internet Telephony Timothy A. Hall National Institute of Standards and Technology 100 Bureau Drive, STOP 8920 Gaithersburg, MD 20899-8920 ABSTRACT Measuring voice

More information

Accurate and robust image superresolution by neural processing of local image representations

Accurate and robust image superresolution by neural processing of local image representations Accurate and robust image superresolution by neural processing of local image representations Carlos Miravet 1,2 and Francisco B. Rodríguez 1 1 Grupo de Neurocomputación Biológica (GNB), Escuela Politécnica

More information

New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction

New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.

More information