ACOUSTIC KEYWORD SPOTTING IN SPEECH WITH APPLICATIONS TO DATA MINING

Speech and Audio Research Laboratory of the SAIVT program
Centre for Built Environment and Engineering Research

ACOUSTIC KEYWORD SPOTTING IN SPEECH WITH APPLICATIONS TO DATA MINING

A. J. Kishan Thambiratnam
BE(Electronics)/BInfTech

SUBMITTED AS A REQUIREMENT OF THE DEGREE OF DOCTOR OF PHILOSOPHY
AT QUEENSLAND UNIVERSITY OF TECHNOLOGY
BRISBANE, QUEENSLAND
9 MARCH 2005

Keywords

Keyword Spotting, Wordspotting, Data Mining, Audio Indexing, Keyword Verification, Confidence Scoring, Speech Recognition, Utterance Verification

Abstract

Keyword Spotting is the task of detecting keywords of interest within continuous speech. The applications of this technology range from call centre dialogue systems to covert speech surveillance devices. Keyword spotting is particularly well suited to data mining tasks such as real-time keyword monitoring and unrestricted vocabulary audio document indexing. However, to date, many keyword spotting approaches have suffered from poor detection rates, high false alarm rates, or slow execution times, thus reducing their commercial viability.

This work investigates the application of keyword spotting to data mining tasks. The thesis makes a number of major contributions to the field of keyword spotting.

The first major contribution is the development of a novel keyword verification method named Cohort Word Verification. This method combines high level linguistic information with cohort-based verification techniques to obtain dramatic improvements in verification performance, in particular for the problematic short duration target word class.

The second major contribution is the development of a novel audio document indexing technique named Dynamic Match Lattice Spotting. This technique augments lattice-based audio indexing principles with dynamic sequence matching techniques to provide robustness to erroneous lattice realisations. The resulting algorithm obtains significant improvement in detection rate over lattice-based audio document indexing while still maintaining extremely fast search speeds.

The third major contribution is the study of multiple verifier fusion for the task of keyword verification. The reported experiments demonstrate that substantial improvements in verification performance can be obtained through the fusion of multiple keyword verifiers. The research focuses on combinations of speech background model based verifiers and cohort word verifiers.

The final major contribution is a comprehensive study of the effects of limited training data for keyword spotting. This study is performed with consideration as to how these effects impact the immediate development and deployment of speech technologies for non-English languages.

Contents

Keywords
Abstract
List of Tables
List of Figures
List of Abbreviations
Authorship
Acknowledgments

1 Introduction
  Overview
  Aims and Objectives
  Research Scope
  Thesis Organisation
  Major Contributions of this Research
  List of Publications

2 A Review of Keyword Spotting
  Introduction
  2.2 The keyword spotting problem
  Applications of keyword spotting
  Keyword monitoring applications
  Audio document indexing
  Command controlled devices
  Dialogue systems
  The development of keyword spotting
  Sliding window approaches
  Non-keyword model approaches
  Hidden Markov Model approaches
  Further developments
  Performance Measures
  The reference and result sets
  The hit operator
  Miss rate
  False alarm rate
  False acceptance rate
  Execution time
  Figure of Merit
  Equal Error Rate
  Receiver Operating Characteristic Curves
  Detection Error Trade-off Plots
  Unconstrained vocabulary spotting
  HMM-based approach
  Neural Network Approaches
  Approaches to non-keyword modeling
  Speech background model
  Phone models
  2.7.3 Uniform distribution
  Online garbage model
  Constrained vocabulary spotting
  Language model approaches
  Event spotting
  Keyword verification
  A formal definition
  Combining keyword spotting and verification
  The problem of short duration keywords
  Likelihood ratio based approaches
  Alternate Information Sources
  Audio Document Indexing
  Limitations of the Speech-to-Text Transcription approach
  Reverse dictionary lookup searches
  Indexed reverse dictionary lookup searches
  Lattice based searches

3 HMM-based spotting and verification
  Introduction
  The confusability circle framework
  Analysis of non-keyword models
  All-speech models
  SBM methods
  Phone-set methods
  Target-word-excluding methods
  Evaluation of keyword spotting techniques
  Experiment setup
  3.4.2 Results
  Tuning the phone set non-keyword model
  Output score thresholding for SBM spotting
  Performance across keyword length
  Evaluation sets
  Results
  HMM-based keyword verification
  Evaluation set
  Evaluation procedure
  Results
  Discriminative background model KV
  System architecture
  Results
  Summary and Conclusions

4 Cohort word keyword verification
  Introduction
  Foundational concepts
  Cohort-based scoring
  The use of language information
  Overview of the cohort word technique
  Cohort word set construction
  The choice of d_min and d_max
  Cohort word set downsampling
  Distance function
  Classification approach
  class classification approach
  4.5.2 Hybrid N-class approach
  Summary of the cohort word algorithm
  Comparison of classifier approaches
  Evaluation set
  Recogniser parameters
  Cohort word selection
  Evaluation procedure
  Results
  Performance across target keyword length
  Evaluation set
  Recogniser parameters
  Results
  Analysis of poor 8-phone performance
  Conclusions
  Effects of selection parameters
  Cohort word set downsampling
  Cohort word selection range
  MED cost parameters
  Conclusions
  Fused cohort word systems
  Training dataset
  Neural network architecture
  Experimental procedure
  Baseline unfused results
  Fused SBM-CW experiments
  Fused CW-CW experiments
  Comparison of fused and unfused systems
  Conclusions and Summary

5 Dynamic Match Lattice Spotting
  Introduction
  Motivation
  Dynamic Match Lattice Spotting method
  Basic method
  Optimised Dynamic Match Lattice Search
  Evaluation of DMLS performance
  Evaluation set
  Recogniser parameters
  Lattice building
  Query-time processing
  Baseline systems
  Evaluation procedure
  Results
  Analysis of dynamic match rules
  System configurations
  Results
  Analysis of DMLS algorithm parameters
  Number of lattice generation tokens
  Pruning beamwidth
  Number of lattice traversal tokens
  MED cost threshold
  Tuned systems
  Conclusions
  Conversational telephone speech experiments
  Evaluation set
  Recogniser parameters
  5.7.3 Results
  Non-destructive optimisations
  Prefix sequence optimisation
  Early stopping optimisation
  Combining optimisations
  Optimised system timings
  Experimental procedure
  Results
  Summary

6 Non-English Spotting
  Introduction
  The issue of limited resources
  The role of keyword spotting
  Experiment setup
  Database design
  Model architectures
  Evaluation set design
  Evaluation procedure
  English and Spanish stage 1 evaluations
  English and Spanish post keyword verification
  Indonesian spotting and verification
  Extrapolating Indonesian performance
  Summary and Conclusions

7 Summary, Conclusions and Future Work
  HMM-based Spotting and Verification
  Conclusions
  7.1.2 Future Work
  Cohort Word Verification
  Conclusions
  Future Work
  Dynamic Match Lattice Spotting
  Conclusions
  Future Work
  Non-English Spotting
  Conclusions
  Final Comments

Bibliography

A The Levenstein Distance
  A.1 Introduction
  A.2 Applications
  A.3 Algorithm

List of Tables

3.1 Keyword spotting performance of baseline systems on Switchboard 1 data
Effect of target word insertion penalty on PM-KS performance
Equal error rates of unnormalised and duration normalised output score thresholding applied to SBM-KS
Details of phone-length dependent evaluation sets
SBM-KS performance on Switchboard 1 data for different phone-length target words
Statistics for keyword verification evaluation sets
Equal error rates for SBM-based keyword verification
Equal error rates for SBM and MLP-SBM keyword verification
Evaluated cohort word selection parameters
Performance of selected cohort word KV systems on TIMIT evaluation set. Cohort word systems are qualified with the appropriate cohort word selection parameters using a tag in the format {d_min, d_max, ψ_d, ψ_i}
Performance of SBM-KV and selected cohort word systems on the SWB1 evaluation sets. Cohort word selection parameters are specified with each system in the format {d_min, d_max, ψ_d, ψ_i}
4.4 Mean and standard deviation of the number of cohort words used in the 3 best performing cohort word KV methods for the SWB1 evaluation set
Performance of baseline SBM-KV and best cohort word systems on the SWB1 evaluation sets
Performance of the best fused SBM-cohort systems on the SWB1 evaluation sets
Performance of the best fused cohort-cohort systems on the SWB1 evaluation sets
Correlation analysis of fused EER and individual unfused EER
Summary of best performing systems
Phone substitution costs for DMLS
Baseline keyword spotting results evaluated on TIMIT
TIMIT performance when isolating various DP rules
Effect of adjusting number of lattice generation tokens
Effect of adjusting pruning beamwidth
Effect of adjusting number of traversal tokens
Effect of adjusting MED cost threshold S_max
Optimised DMLS configurations evaluated on TIMIT
Keyword spotting results on SWB
Relative speeds of optimised DMLS systems
Performance of a fully optimised DMLS system on Switchboard data
Summary of key results
Summary of training data sets
Codes used to refer to model architectures
Summary of evaluation data sets
Stage 1 spotting rates for various model sets and database sizes
6.5 Equal error rates after keyword verification for various model sets and training database sizes
Stage 1 spotting and stage 2 post verification results for S1I experiments

List of Figures

2.1 An example of a Receiver Operating Characteristic curve
An example of a Detection Error Trade-off plot
Recognition grammar for HMM-based keyword spotting
Sample recognition grammar for small non-keyword vocabulary keyword spotting
System architecture for HMM keyword spotting using a Speech Background Model as the non-keyword model
System architecture for HMM keyword spotting using a composite non-keyword model constructed from phone models
Constructing a recognition network for constrained vocabulary keyword spotting
An optimised constrained vocabulary keyword spotting recognition network (language model probabilities omitted)
An event spotting network for detecting occurrences of times [16]
Likelihood ratio based keyword occurrence verification with multiple verifier fusion
Applying reverse dictionary searches to the detection of the word ACQUIRE in a phone stream
Example of indexed reverse dictionary searching for the detection of the word ACQUIRE
2.13 Using lattice based searching to locate instances of the word ACQUIRE within a phone lattice
Confusability circle for the target word STOCK
Example of the shared subevent confusable acoustic region for the keyword STOCK
Incorporating target word insertion penalty into HMM-based keyword spotting
DET plots for unnormalised and duration normalised output score thresholding applied to SBM-KS
DET plots for duration normalised output score thresholding applied to SBM-KS for keyword length dependent evaluation sets
DET plots for different target keyword lengths for SBM-KV on Switchboard 1 evaluation sets
System architecture for MLP background model based KV
DET plots for SBM and MLP-SBM systems for 4-phone words
DET plots for SBM and MLP-SBM systems for 6-phone words
DET plots for SBM and MLP-SBM systems for 8-phone words
Controlling the degree of CAR region modeling d_min and d_max tuning
A N-class classifier approach to cohort word verification for the keyword w and cohort word set R(w)
DET plot for best cohort word and SBM-KV systems on SWB1 4-phone length evaluation set
DET plot for best cohort word and SBM-KV systems on SWB1 6-phone length evaluation set
Equal error rate versus mean number of cohort words
Trends in equal error rate with changes in cohort word set downsampling size
4.7 Trends in equal error rate with changes in cohort word selection range for 4-phone length cohort word KV
Trends in equal error rate with changes in cohort word selection range for 6-phone length cohort word KV
Trends in equal error rate with changes in cohort word selection range for 8-phone length cohort word KV
Trends in equal error rate with changes in MED cost parameters
Correlation between unfused system performances and fused system performances
Boxplot of EERs for all evaluated architectures and phone-lengths
Boxplot of log(EERs) for all evaluated architectures and phone-lengths
Segment of phone lattice for an instance of the word STOCK
Effect of lattice traversal token parameter
Trends in miss rate and FA/kw rate performance for various types of tuning
Plot of miss rate versus FA/kw rate for HMM, CLS and DMLS systems evaluated on Switchboard
The relationship between cost matrices for subsequences
Demonstration of the MED prefix optimisation algorithm
Effect of training dataset size on speech recognition [24]
Trends in miss rate across training database size
Trends in FA/kw rate across training database size
DET plot for T16 experiments. 1=T16S3E, 2=T16S2E, 3=T16S1E, 4=T16S2S, 5=T16S1S
DET plot for M16 experiments. 1=M16S3E, 2=M16S2E, 3=M16S1E, 4=M16S2S, 5=M16S1S
6.6 DET plot for M32 experiments. 1=M32S3E, 2=M32S2E, 3=M32S1E, 4=M32S2S, 5=M32S1S
Trends in EER across training dataset size
DET plot for S2S experiments. 1=T16S2S, 2=M16S2S, 3=M32S2S
DET plot for S1I experiments. 1=T16S1I, 2=M16S1I, 3=M32S1I
Extrapolations of Indonesian keyword spotting performance using larger sized databases
A.1 Example of cost matrix calculated using Levenstein algorithm for transforming deranged to hanged. Cost of substitutions, deletions and insertions all fixed at 1, cost of match fixed at

List of Abbreviations

ADI      Audio Document Indexing
CAR      Confusable Acoustic Region
CLS      Conventional Lattice-based Spotting
CMS      Cepstral Mean Subtraction
CW       Cohort Word
DAR      Disparate Acoustic Region
DET      Detection Error Trade-off
DMLS     Dynamic Match Lattice Spotting
EER      Equal Error Rate
FA       False Alarm
GMM      Gaussian Mixture Model
HMM      Hidden Markov Model
IRDL     Indexed Reverse Dictionary Lookup
KS       Keyword Spotting
KV       Keyword Verification
LVCSR    Large Vocabulary Continuous Speech Recognition
MED      Minimum Edit Distance
MLP      Multi-Layer Perceptron
PLP      Perceptual Linear Prediction
RDL      Reverse Dictionary Lookup
ROC      Receiver Operating Characteristic
SBM      Speech Background Model
SBM-KS   Speech Background Model based Keyword Spotting
SBM-KV   Speech Background Model based Keyword Verification
STT      Speech-to-Text Transcription
SWB1     Switchboard-1
TAR      Target Acoustic Region
WSJ1     Wall Street Journal 1

Authorship

The work contained in this thesis has not been previously submitted for a degree or diploma at any other higher educational institution. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made.

Signed:
Date:

Acknowledgments

Foremost I would like to acknowledge my Lord and Saviour Jesus Christ. It is by His grace that I was given the opportunity and necessary abilities to partake in this research.

I would also like to thank my beautiful wife, Melenie, who has been a constant source of support and inspiration. Your words of encouragement have seen me through the more difficult and frustrating times of this work.

To my supervisor, Professor Sridha Sridharan, I would like to offer my heartfelt gratitude for your unrelenting support in bringing this research to completion. Your positive words and guidance have been a true blessing.

I would also like to offer a special thanks for the friendship of the members of the QUT Speech Research Labs. In particular, I would like to thank Terry Martin, Robbie Vogt, Michael Mason and Brendan Baker for their constructive criticism as well as their constant joviality.

Finally, I would like to thank my loving two families for believing in and supporting me during this long venture, and my wonderful dogs for always giving me a reason to smile.

Kit Thambiratnam
Queensland University of Technology
February 2005

Chapter 1

Introduction

1.1 Overview

Keyword Spotting (KS) is the automated task of detecting keywords of interest within continuous speech. This technology has been used in a variety of applications, ranging from telephone call centre systems to covert surveillance applications.

Keyword spotting is closely related to the task of speech transcription, but offers many advantages for certain applications. Primarily, keyword spotting is well suited to data-mining tasks that process large amounts of speech. This is because keyword spotting requires significantly less processing power than transcription, and can therefore run at considerably faster speeds. Real-time stream monitoring is one example where this speed is required. Such applications monitor audio in real time and flag occurrences of segments of interest, such as news stories related to a specific topic. Clearly, the majority of the stream does not require attention, and therefore a keyword spotting solution that simply detects occurrences of topical keywords will be more efficient than a fully-fledged large vocabulary transcription engine.

Keyword spotting is also an excellent technology for audio search applications, such as audio document indexing. In particular, recent developments in KS, including lattice-based searching and reverse dictionary lookup methods, have made possible the development of unrestricted vocabulary audio document database search engines that can search hours of data in seconds.

However, many keyword spotting technologies are encumbered by poor detection performance or slow search speeds. There is a trade-off between accuracy and speed that needs to be managed, and unfortunately, to date, many practical keyword spotting applications are forced to sacrifice detection performance to realise the execution speeds required for commercial deployment. One has only to use speech-recognition-enabled telephony services such as telephone banking to conclude that these systems are far from perfect.

Nevertheless, keyword spotting is a powerful and relevant technology. Used appropriately, a keyword spotting solution brings with it reduced computational requirements, increased scalability and potentially higher accuracies than a large vocabulary transcription system.

Aims and Objectives

This work specifically examines the application of keyword spotting technologies to two data mining tasks: real-time keyword monitoring and large audio document database indexing. With the ever-increasing amounts of audio and multimedia being generated daily, the ability to extract information from audio streams at high speeds while maintaining good detection rates is paramount.

A desirable feature of data mining applications is support for unrestricted vocabulary keyword queries. However, a significant portion of past keyword spotting research has dealt primarily with restricted vocabulary methods. Although these approaches offer advantages in terms of detection and false alarm performance, they limit the flexibility of queries. As such, this work concerns itself solely with the study of unrestricted vocabulary keyword spotting techniques.

Data throughput is another major consideration when dealing with large amounts of data. Although the cost of computing is constantly falling, it is nevertheless beneficial to run at high speeds. This is particularly true for audio indexing applications, where literally hundreds of hours may need to be interactively searched by a user. Unfortunately, many published KS works neglect to consider execution time during experimentation. This research will therefore give considerable attention to the issue of processing speed.

The primary objectives of this thesis are as follows:

1. To review and investigate current state-of-the-art keyword spotting techniques that are relevant to the tasks of real-time keyword monitoring and audio document indexing

2. To assess and evaluate the performance of these techniques with regard to crucial performance metrics relevant to the target applications, and as such, identify potential issues that need to be addressed

3. To investigate and develop novel techniques that can be used to improve the performance of keyword spotting techniques for data mining applications

4. To investigate the application of keyword spotting technologies for non-English data mining

Research Scope

Keyword spotting encompasses a plethora of speech recognition research topics that unfortunately cannot be fully addressed in a single work. As such, the scope of this research was limited to issues that were directly related to the application of keyword spotting to real-time keyword monitoring and audio document indexing. Additionally, the following restrictions and constraints were applied to this research:

1. Primarily this work concerns itself with the application of HMM-based speech recognition techniques to the keyword spotting task. Alternate statistical modeling approaches, such as neural network techniques, have been proposed and demonstrated to be suitable for keyword spotting. However, it is believed that the HMM-based approach provides a greater degree of flexibility, particularly with regard to unrestricted vocabulary tasks, and as such is the modeling architecture of choice for this research.

2. Experiments reported within this work are limited to single keyword detection. Although most practical applications of keyword spotting use multi-word detection during a single pass, it is believed that research constrained to single keyword detection offers a number of advantages. Primarily, it allows ease of comparison between results in this thesis and other published works. Additionally, the variability in performance due to different mixtures of words within a multi-word keyword set can be avoided, thereby ensuring greater consistency between experiments. Finally, it is believed that trends in single keyword spotting across methods will easily translate to multi-word keyword spotting tasks, and as such, this constraint does not limit the value of this research.

1.2 Thesis Organisation

An overview of the organisation of this thesis is given below:

Chapter 2 - A Review of Keyword Spotting presents a thorough review of keyword spotting and associated technologies. A formal definition of the keyword spotting problem is given, as well as a discussion of its primary applications. This is followed by an overview of the key performance metrics that are relevant to evaluating and understanding keyword spotting methodology. A detailed review of KS literature is then presented covering the topics of unrestricted and restricted spotting techniques, non-keyword modeling architectures, keyword verification and confidence scoring methods, and audio indexing approaches.

Chapter 3 - HMM-based Spotting and Verification discusses and evaluates existing HMM-based keyword spotting and verification techniques. Such methods have a strong following within the keyword spotting community. However, to date, there has been little published work that compares the performances of the various approaches. What little has been published has primarily focused on measuring performance for simplistic domains such as read microphone speech. A number of HMM-based techniques are evaluated in this chapter and the strengths and weaknesses of these methods are discussed.

Chapter 4 - Cohort Word Verification proposes a novel keyword verification approach that combines high level linguistic information with cohort-based verification techniques to yield improved performance. A number of experiments are reported that measure the performance of this method for the conversational telephone speech and read microphone speech domains. The results demonstrate that significant gains can be obtained, particularly for the difficult task of short-word keyword verification. In addition, experiments are performed using a fused architecture that combines cohort word verification with traditional background model based verification. Further gains in performance are obtained using this approach.

Chapter 5 - Dynamic Match Lattice Spotting presents and evaluates a novel audio indexing technique. Although existing unrestricted audio indexing methods are capable of very fast search speeds, they are encumbered by very poor miss rate performance. It is argued here that this poor miss rate is a result of inherent phone recogniser errors that are not accommodated by these techniques. As such, a new method of lattice-based searching is proposed that incorporates dynamic sequence matching methods to provide robustness against erroneous lattice realisations. The results demonstrate that dramatic gains in performance can be obtained while still maintaining extremely fast search speeds.

Chapter 6 - Non-English Spotting studies the application of keyword spotting technologies to non-English languages. In particular, it examines the effects of limited training data on keyword spotting performance. The lack of availability of non-English training data has greatly hindered the development of other speech technologies such as large vocabulary speech transcribers. However, keyword spotting is a significantly more constrained task, and therefore may be less affected by reduced amounts of training data. If so, this may allow the immediate development of speech technologies for non-English languages without the need for the costly task of creating large training databases.

Chapter 7 - Summary, Conclusions and Future Work presents the summary and conclusions of this work as well as a discussion of future research directions.

1.3 Major Contributions of this Research

This work has generated a number of novel contributions to the field of keyword spotting. These are:

1. The development of the novel Cohort Word Verification technique. This method combines high level linguistic knowledge with cohort-based verification techniques to yield significant improvements, particularly for the problematic area of short-word keyword verification.

2. The use of multiple keyword verifier fusion, in particular applied to the combination of cohort word verification with existing HMM-based techniques. It is demonstrated that such fusion techniques allow the strengths of individual verifiers to be combined to yield considerable improvements in verification performance.

3. The development of the novel Dynamic Match Lattice Spotting approach. This technique augments existing lattice-based audio indexing techniques with dynamic sequence matching to improve robustness to erroneous lattice realisation. The resulting algorithm is capable of searching hours of speech using seconds of processor time while maintaining good miss and false alarm rates.

4. A detailed study of the effects of limited training data for keyword spotting, as well as how this impacts the immediate development and deployment of speech technologies for non-English languages.

1.4 List of Publications

The research presented in this thesis has resulted in the publication of a number of fully referenced peer reviewed works.

1. K. Thambiratnam and S. Sridharan, "Isolated word verification using Cohort Word-level Verification", in Proceedings of the European Conference on Speech Communication and Technology (EUROSPEECH), (Geneva, Switzerland), 2003

2. K. Thambiratnam and S. Sridharan, "A study on the effects of limited training data for English, Spanish and Indonesian keyword spotting", in Proceedings of the 10th Australian International Conference on Speech Science and Technology (SST), (Sydney, Australia)

3. T. Martin, K. Thambiratnam and S. Sridharan, "Target Structured Cross Language Model Refinement", in Proceedings of the 10th Australian International Conference on Speech Science and Technology (SST), (Sydney, Australia)

4. K. Thambiratnam and S. Sridharan, "Fusion of cohort-word and speech background model based confidence scores for improved keyword confidence scoring and verification", in Proceedings of the IEEE 3rd International Conference on Sciences of Electronic, Technologies of Information and Telecommunications, (Susa, Tunisia)

5. K. Thambiratnam and S. Sridharan, "Dynamic match phone-lattice searches for very fast and accurate unrestricted vocabulary keyword spotting", in Proceedings of the 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), (Philadelphia, USA), 2005

Chapter 2

A Review of Keyword Spotting

2.1 Introduction

This chapter presents a comprehensive review of keyword spotting technologies to date. Section 2.2 gives a formal definition of the keyword spotting problem and is followed by a discussion of the various applications of keyword spotting in section 2.3. A brief synopsis of the development of keyword spotting research is provided in section 2.4, followed by a detailed description of how keyword spotting performance is measured in section 2.5.

Subsequent sections discuss the current methods of keyword spotting with respect to their key applications. Section 2.6 discusses a number of algorithms for unconstrained vocabulary keyword spotting. This is followed by a description of the various approaches to non-keyword modeling in section 2.7. Approaches to constrained vocabulary keyword spotting are then presented in section 2.8, and methods for keyword occurrence verification in section 2.9. Finally, methods of applying KS to the task of audio document indexing are discussed at the end of the chapter.

2.2 The keyword spotting problem

Keyword spotting can be viewed as a special case of Speech-to-Text Transcription (STT), in which the transcription vocabulary is restricted to keywords of interest plus a non-keyword symbol that is used to represent all other words in the target application domain.

Let $O$ be an observation sequence, $V$ be the vocabulary of the target application domain, $Q$ be the set of keywords of interest and $\Omega$ be the non-keyword symbol. If STT is represented as the transformation $W = \mathrm{Transcribe}(O, V)$, where $W = \{w_1, w_2, \ldots\}$ is the resulting hypothesised word sequence, then the keyword spotting task can be defined as

\[
KS(O, V, Q) = f(\mathrm{Transcribe}(O, V), Q) \tag{2.1}
\]

where $f(W, Q)$ is a transformation applied to the output of STT and is given by

\[
f(W, Q) =
\begin{cases}
W & |W| = 1,\ w_1 \in Q \\
\{\Omega\} & |W| = 1,\ w_1 \notin Q \\
\{w_1, f(\mathrm{Tail}(W), Q)\} & |W| > 1,\ w_1 \in Q \\
\{\Omega, f(\mathrm{Tail}(W), Q)\} & |W| > 1,\ w_1 \notin Q,\ w_2 \in Q \\
f(\mathrm{Tail}(W), Q) & \text{otherwise}
\end{cases}
\]

and

\[
\mathrm{Tail}(\{x_i\}_{i=1}^{N}) = \{x_i\}_{i=2}^{N}
\]

$f(W, Q)$ essentially replaces all sequences of non-keywords in the word sequence output by the transcriber by a single non-keyword symbol $\Omega$.

Although valid, this formulation of keyword spotting is inefficient as it requires full transcription using a vocabulary of size $|V|$. Typically keyword spotting is only interested in occurrences of a much smaller set of words defined by $Q$. Given this simplification, a more practical and efficient formulation of keyword spotting is

\[
KS(O, V, Q) = \mathrm{Transcribe}(O, g(Q)) \tag{2.2}
\]

where

\[
g(Q) = Q \cup \{\Omega\}
\]

This alternate approach requires transcription using a much smaller vocabulary of size $|Q| + 1$. Clearly, this is a considerably less computationally intensive task than transcription using the formulation in equation 2.1. However, it introduces the additional burden of an acoustic model representation of the non-keyword symbol $\Omega$. Definition of the non-keyword symbol is one of the active areas of keyword spotting research and is discussed further in section 2.7.

2.3 Applications of keyword spotting

Keyword spotting lends itself to a plethora of speech-enabled applications. Keyword spotting is particularly well suited to applications where large amounts of speech need to be processed. This is because it offers a significant speed benefit over a large vocabulary STT approach. Four major applications of this technology are keyword monitoring, audio document indexing, command control devices and dialogue systems.

Keyword monitoring applications

Keyword monitoring applications are required to continuously monitor a real-time stream of audio and to flag any occurrences of a keyword in the query set. Specific keyword monitoring applications include telephone tapping, listening device monitoring and broadcast monitoring.
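As an aside on the formal definition in section 2.2, the transformation f(W, Q) of equation 2.1 and the reduced vocabulary g(Q) of equation 2.2 can be illustrated with a minimal Python sketch. This is not code from the thesis: the function names, the `NON_KEYWORD` placeholder string standing in for Ω, and the example sentence are all assumptions made for illustration, and the sketch operates on an already transcribed word list rather than on audio.

```python
from __future__ import annotations

# Hypothetical placeholder for the non-keyword symbol (Omega in equation 2.1).
NON_KEYWORD = "<omega>"


def f_recursive(words: list[str], keywords: set[str]) -> list[str]:
    """Direct transcription of the five cases of f(W, Q); requires a non-empty W."""
    w1, tail = words[0], words[1:]
    if len(words) == 1:
        return [w1] if w1 in keywords else [NON_KEYWORD]
    if w1 in keywords:
        return [w1] + f_recursive(tail, keywords)
    if tail[0] in keywords:
        # w1 ends a run of non-keywords: emit a single non-keyword symbol.
        return [NON_KEYWORD] + f_recursive(tail, keywords)
    # w1 and w2 are both non-keywords: drop w1 and continue.
    return f_recursive(tail, keywords)


def f_iterative(words: list[str], keywords: set[str]) -> list[str]:
    """Equivalent single pass: each run of non-keywords becomes one symbol."""
    out: list[str] = []
    for w in words:
        if w in keywords:
            out.append(w)
        elif not out or out[-1] != NON_KEYWORD:
            out.append(NON_KEYWORD)
    return out


def g(keywords: set[str]) -> set[str]:
    """Reduced transcription vocabulary of equation 2.2: Q plus the non-keyword symbol."""
    return set(keywords) | {NON_KEYWORD}


if __name__ == "__main__":
    W = "please check the stock price of the new stock".split()
    Q = {"stock", "price"}
    print(f_recursive(W, Q))  # ['<omega>', 'stock', 'price', '<omega>', 'stock']
    print(len(g(Q)))          # 3, i.e. |Q| + 1
```

The recursive form mirrors the case analysis term by term, while the iterative form makes the intent easier to see: keywords pass through unchanged and every maximal run of non-keywords is replaced by one symbol.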

Telephone tapping and listening device monitoring are used extensively by security organisations to detect criminal or malicious activity. Keyword spotting provides a fast and automatic solution to this task and potentially a higher detection accuracy than human monitoring, particularly when a very large number of audio streams need to be monitored. However, these applications create a considerable challenge for keyword spotting because of the noisy nature of the speech being monitored. Telephone conversations may be plagued with significant background noise, multiple languages and even multiple speakers, providing challenges for acoustic modeling. Listening device audio may suffer from very low signal-to-noise ratios, a difficulty for any speech processing application.

Broadcast monitoring is actively performed by commercial broadcast monitoring companies to locate segments that may be of interest to a client. For example, a senator may be interested in all news stories in which he or she is mentioned; broadcast monitoring organisations provide such a service for a fee. A significant challenge of broadcast monitoring is the amount of audio that needs to be processed daily. Broadcast monitoring clients may be interested in stories from a comprehensive set of broadcast sources, including free-to-air television, cable television, commercial radio and community radio. The vast number of these sources, combined with the fact that many of them broadcast continually, 24 hours a day and 7 days a week, makes broadcast monitoring a very data intensive problem.

Keyword spotting provides an excellent solution to all these keyword monitoring tasks. Faster-than-real-time keyword spotting technologies are likely to process audio faster than a human processor.
Additionally, the accuracy of an automatic system is likely to exceed that of a human processor, since computers do not suffer from the fatigue and mental distractions that plague a human processor. Keyword spotting is particularly well suited to the broadcast monitoring task since audio in this domain is usually of much higher quality than telephone and listening device audio.

2.3.2 Audio document indexing

Audio document indexing is the task of rapidly searching an audio document database for keywords and topics of interest. This functionality is analogous to traditional text document indexing systems such as the Google [11] Internet search engine, but operates on audio documents instead. The need for efficient and fast audio document indexing is paramount in a world where audio and multimedia documents play a greater role in everyday life.

STT systems are one solution to the audio document indexing problem. Audio is first transcribed to text that can then be rapidly searched at query time. However, many applications of audio document indexing, such as news database searching, require support for proper noun queries such as names, places and foreign words: terms that in many cases are not part of the transcription system's vocabulary. As such, alternatives to the STT-based approach that do not constrain the query vocabulary are required.

A keyword spotting solution does provide support for unrestricted vocabulary queries. The trade-off is a reduction in query speed, since most KS approaches are nowhere near as fast as text-based searching methods. Nevertheless, the support for unrestricted vocabulary queries is important, and as such, a keyword spotting system can be used to augment an STT-based system, which provides very fast queries for in-vocabulary words, so that out-of-vocabulary queries are still supported.

2.3.3 Command controlled devices

Command controlled devices monitor the ambient audio and react when they detect specific command words. Examples of command controlled devices are speech-enabled mobile phones, voice-controlled VCRs and command-controlled factory machinery.

Although generic keyword monitoring technologies can be used for command controlled devices, they typically place too high a processing or memory requirement to be feasible, especially in the case of DSP-based or embedded applications. Additionally, the query terms of command controlled devices tend to be fixed, allowing more application-specific information to be incorporated into the keyword detection process. This includes query word linguistic context information and environmental noise conditions.

Hence command controlled device KS lends itself to the development of custom solutions. Though many of these solutions may be based on existing keyword spotting approaches, significant enhancements and modifications are made to provide maximum performance for the intended application.

2.3.4 Dialogue systems

Automated dialogue systems are becoming more common in the commercial environment as a viable alternative to human-operated call centres. A dialogue system mimics a human call-centre operator by playing voice prompts to a caller and then attempting to detect keywords that indicate the response of the caller.

Since the volume of calls processed by a call-centre can be very large, large vocabulary STT approaches have proven infeasible due to their high computational requirements. Instead, restricted grammar speech recognisers or keyword spotting technologies are used to interpret the response of callers.

Keyword spotting approaches offer a benefit over restricted grammar speech recogniser approaches because they allow greater flexibility in the response of the speaker. This is because KS accommodates out-of-vocabulary words by means of non-keyword modeling. However, a cleverly constructed restricted grammar speech recogniser can better understand the intention of a caller using contextual information, and therefore may prove more appropriate for certain applications.

2.4 The development of keyword spotting

In a similar fashion to general speech recognition theory, keyword spotting has undergone a number of generations of development. Early approaches were limited by low computing resources and hence KS research was limited to simpler tasks such as isolated keyword detection. As speech recognition technology matured, more advanced tasks were explored, such as the detection of keywords embedded in noise or continuous speech.

2.4.1 Sliding window approaches

Initial methods focused on using sliding window approaches such as the dynamic time warping approaches proposed by Sakoe and Chiba [29] and Bridle [6], or the sliding window based neural network method prescribed by Zeppenfeld [40]. Such techniques yielded acceptable results in isolated keyword spotting tasks, but suffered from considerable drops in performance when spotting keywords embedded in continuous speech.

A major reason for this drop in performance was that sliding window approaches did not model non-keywords either implicitly or explicitly. Spotting of keywords in continuous speech is essentially a 2-class discrimination task, attempting to classify regions as either a keyword or a non-keyword instance. Since the traditional sliding window approaches did not model non-keywords, they were essentially attempting discrimination with knowledge of the target class only. This was analogous to making measurements without a point of reference: all observations were purely relative and therefore provided little confidence for making absolute decisions.
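The limitation described above can be seen in a minimal sliding-window sketch. The Python below uses a generic textbook DTW recursion with symmetric steps, not the specific formulations of Sakoe and Chiba [29] or Bridle [6]; the one-dimensional toy features, the window length and all function names are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one feature vector per audio frame


def dtw_distance(template: Sequence[Frame], segment: Sequence[Frame]) -> float:
    """Length-normalised DTW alignment cost between two feature sequences."""
    n, m = len(template), len(segment)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(template[i - 1], segment[j - 1])
            # Allow insertion, deletion and match steps.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def sliding_window_spot(
    template: Sequence[Frame], audio: Sequence[Frame], window: int, hop: int = 1
) -> List[Tuple[int, float]]:
    """Score every window of `audio` against the keyword template."""
    return [
        (start, dtw_distance(template, audio[start:start + window]))
        for start in range(0, len(audio) - window + 1, hop)
    ]


# Toy data: the template pattern is embedded at frame 2 of the audio.
template = [(0.0,), (1.0,), (2.0,), (1.0,)]
audio = [(5.0,), (5.0,), (0.0,), (1.0,), (2.0,), (1.0,), (5.0,), (5.0,)]

scores = sliding_window_spot(template, audio, window=4)
best_start, best_score = min(scores, key=lambda s: s[1])
print(best_start, best_score)
```

The output is a list of alignment costs, one per window. Deciding which costs are low enough to report a keyword requires a threshold, and with no non-keyword score to compare against, that threshold can only be chosen relative to the other windows.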

2.4.2 Non-keyword model approaches

To address the lack of knowledge of the non-target class, the concept of non-keyword models (also known as filler models) was introduced into keyword spotting. Non-keyword models attempted to model all speech that did not form a part of the target keyword speech. For example, in a closed vocabulary system, a non-keyword model would attempt to model all words in the vocabulary except for the target keywords. Using a non-keyword model provided more confidence when accepting or rejecting putative instances of target keywords compared to the sliding window approaches because a comparison was being made between the target keyword model and the non-keyword model.

One of the initial approaches used to incorporate non-keyword models was proposed by Higgins and Wohlford [13]. Here a DTW-based continuous speech recogniser was modified to use filler non-keyword models to represent non-keyword speech. The modified speech recogniser was then used to transcribe continuous speech into regions of keywords and non-keywords. Finally, a likelihood ratio was used to normalise keyword likelihoods by the corresponding likelihood of the non-keyword model over the same observation sequence. Non-keyword models in this particular approach were modeled by using pieces and subsequences of the target keyword.

The introduction of non-keyword models into keyword spotting saw the fusion of continuous speech recognition research with keyword spotting techniques. Whereas previously KS approaches had exclusively used sliding window techniques, the use of non-keyword models required a paradigm shift into the speech recognition context. Specifically, keyword spotting could be simply viewed as a special case of continuous speech recognition, where all non-keyword speech was labeled with a single non-keyword tag. Operating within the speech recognition framework allowed the latest developments in continuous speech recognition, such as advances in modeling techniques, to be transferred to the KS domain. Hence, keyword spotting research began to more closely follow the trends of speech recognition research.

2.4.3 Hidden Markov Model approaches

The advent of Hidden Markov Model (HMM) based speech recognition led to the introduction of HMM-based keyword spotting techniques. As for DTW-based keyword spotting, HMM-based keyword spotting could be viewed as a special case of HMM-based speech recognition, where all non-target words were represented by a non-keyword model. One common approach was to use a word loop consisting of all target keywords in parallel with the non-keyword model. Target keywords were typically modeled using either word models or sub-word models, while non-keyword speech was modeled using a plethora of architectures, including a high-order Gaussian Mixture Model as prescribed by Wilpon et al. [35] or a monophone model set as suggested by Rose and Paul [28]. This led to the development of better performing KS systems, paving the way to more complex keyword spotting applications.

2.4.4 Further developments

Advances in high-level linguistic modeling through recognition grammars and language modeling were also incorporated into keyword spotting. These advances were motivated by the need to reduce false alarm rates of KS systems through the use of contextual information, specifically to reduce or constrain the emission of false putative keyword occurrences. Kenji et al. [18] and Gou et al. [12] both described techniques of incorporating finite state grammars into the spotting process. The reported experiments demonstrated significant gains in performance for simple recognition grammar applications compared to non-grammar-constrained

The Data Warehouse Challenge

The Data Warehouse Challenge The Data Warehouse Challenge Taming Data Chaos Michael H. Brackett Technische Hochschule Darmstadt Fachbereichsbibliothek Informatik TU Darmstadt FACHBEREICH INFORMATIK B I B L I O T H E K Irwentar-Nr.:...H.3...:T...G3.ty..2iL..

More information

The effect of mismatched recording conditions on human and automatic speaker recognition in forensic applications

The effect of mismatched recording conditions on human and automatic speaker recognition in forensic applications Forensic Science International 146S (2004) S95 S99 www.elsevier.com/locate/forsciint The effect of mismatched recording conditions on human and automatic speaker recognition in forensic applications A.

More information

Workflow Administration of Windchill 10.2

Workflow Administration of Windchill 10.2 Workflow Administration of Windchill 10.2 Overview Course Code Course Length TRN-4339-T 2 Days In this course, you will learn about Windchill workflow features and how to design, configure, and test workflow

More information

Speech and Data Analytics for Trading Floors: Technologies, Reliability, Accuracy and Readiness

Speech and Data Analytics for Trading Floors: Technologies, Reliability, Accuracy and Readiness Speech and Data Analytics for Trading Floors: Technologies, Reliability, Accuracy and Readiness Worse than not knowing is having information that you didn t know you had. Let the data tell me my inherent

More information

ARMORVOX IMPOSTORMAPS HOW TO BUILD AN EFFECTIVE VOICE BIOMETRIC SOLUTION IN THREE EASY STEPS

ARMORVOX IMPOSTORMAPS HOW TO BUILD AN EFFECTIVE VOICE BIOMETRIC SOLUTION IN THREE EASY STEPS ARMORVOX IMPOSTORMAPS HOW TO BUILD AN EFFECTIVE VOICE BIOMETRIC SOLUTION IN THREE EASY STEPS ImpostorMaps is a methodology developed by Auraya and available from Auraya resellers worldwide to configure,

More information

Phonetic-Based Dialogue Search: The Key to Unlocking an Archive s Potential

Phonetic-Based Dialogue Search: The Key to Unlocking an Archive s Potential white paper Phonetic-Based Dialogue Search: The Key to Unlocking an Archive s Potential A Whitepaper by Jacob Garland, Colin Blake, Mark Finlay and Drew Lanham Nexidia, Inc., Atlanta, GA People who create,

More information

life science data mining

life science data mining life science data mining - '.)'-. < } ti» (>.:>,u» c ~'editors Stephen Wong Harvard Medical School, USA Chung-Sheng Li /BM Thomas J Watson Research Center World Scientific NEW JERSEY LONDON SINGAPORE.

More information

Numerical Field Extraction in Handwritten Incoming Mail Documents

Numerical Field Extraction in Handwritten Incoming Mail Documents Numerical Field Extraction in Handwritten Incoming Mail Documents Guillaume Koch, Laurent Heutte and Thierry Paquet PSI, FRE CNRS 2645, Université de Rouen, 76821 Mont-Saint-Aignan, France Laurent.Heutte@univ-rouen.fr

More information

Establishing the Uniqueness of the Human Voice for Security Applications

Establishing the Uniqueness of the Human Voice for Security Applications Proceedings of Student/Faculty Research Day, CSIS, Pace University, May 7th, 2004 Establishing the Uniqueness of the Human Voice for Security Applications Naresh P. Trilok, Sung-Hyuk Cha, and Charles C.

More information

Business Administration of Windchill PDMLink 10.0

Business Administration of Windchill PDMLink 10.0 Business Administration of Windchill PDMLink 10.0 Overview Course Code Course Length TRN-3160-T 3 Days After completing this course, you will be well prepared to set up and manage a basic Windchill PDMLink

More information

AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS

AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS PIERRE LANCHANTIN, ANDREW C. MORRIS, XAVIER RODET, CHRISTOPHE VEAUX Very high quality text-to-speech synthesis can be achieved by unit selection

More information

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

How To Write A Diagram

How To Write A Diagram Data Model ing Essentials Third Edition Graeme C. Simsion and Graham C. Witt MORGAN KAUFMANN PUBLISHERS AN IMPRINT OF ELSEVIER AMSTERDAM BOSTON LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE

More information

IEEE Proof. Web Version. PROGRESSIVE speaker adaptation has been considered

IEEE Proof. Web Version. PROGRESSIVE speaker adaptation has been considered IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING 1 A Joint Factor Analysis Approach to Progressive Model Adaptation in Text-Independent Speaker Verification Shou-Chun Yin, Richard Rose, Senior

More information

Robustness of a Spoken Dialogue Interface for a Personal Assistant

Robustness of a Spoken Dialogue Interface for a Personal Assistant Robustness of a Spoken Dialogue Interface for a Personal Assistant Anna Wong, Anh Nguyen and Wayne Wobcke School of Computer Science and Engineering University of New South Wales Sydney NSW 22, Australia

More information

Turkish Radiology Dictation System

Turkish Radiology Dictation System Turkish Radiology Dictation System Ebru Arısoy, Levent M. Arslan Boaziçi University, Electrical and Electronic Engineering Department, 34342, Bebek, stanbul, Turkey arisoyeb@boun.edu.tr, arslanle@boun.edu.tr

More information

The Influence of Topic and Domain Specific Words on WER

The Influence of Topic and Domain Specific Words on WER The Influence of Topic and Domain Specific Words on WER And Can We Get the User in to Correct Them? Sebastian Stüker KIT Universität des Landes Baden-Württemberg und nationales Forschungszentrum in der

More information

Advanced Signal Processing and Digital Noise Reduction

Advanced Signal Processing and Digital Noise Reduction Advanced Signal Processing and Digital Noise Reduction Saeed V. Vaseghi Queen's University of Belfast UK WILEY HTEUBNER A Partnership between John Wiley & Sons and B. G. Teubner Publishers Chichester New

More information

Biometrics in Physical Access Control Issues, Status and Trends White Paper

Biometrics in Physical Access Control Issues, Status and Trends White Paper Biometrics in Physical Access Control Issues, Status and Trends White Paper Authored and Presented by: Bill Spence, Recognition Systems, Inc. SIA Biometrics Industry Group Vice-Chair & SIA Biometrics Industry

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Speech Recognition on Cell Broadband Engine UCRL-PRES-223890

Speech Recognition on Cell Broadband Engine UCRL-PRES-223890 Speech Recognition on Cell Broadband Engine UCRL-PRES-223890 Yang Liu, Holger Jones, John Johnson, Sheila Vaidya (Lawrence Livermore National Laboratory) Michael Perrone, Borivoj Tydlitat, Ashwini Nanda

More information

BSM 9.0 ESSENTIALS. Instructor-Led Training

BSM 9.0 ESSENTIALS. Instructor-Led Training BSM 9.0 ESSENTIALS Instructor-Led Training INTENDED AUDIENCE New users of Business Service Management (BSM) 9.0, including: Database Administrators System Administrators Network Administrators Operations

More information

Introduction to Windchill PDMLink 10.0 for Heavy Users

Introduction to Windchill PDMLink 10.0 for Heavy Users Introduction to Windchill PDMLink 10.0 for Heavy Users Overview Course Code Course Length TRN-3146-T 2 Days In this course, you will learn how to complete the day-to-day functions that enable you to create

More information

Concept and Project Objectives

Concept and Project Objectives 3.1 Publishable summary Concept and Project Objectives Proactive and dynamic QoS management, network intrusion detection and early detection of network congestion problems among other applications in the

More information

Speech Analytics. Whitepaper

Speech Analytics. Whitepaper Speech Analytics Whitepaper This document is property of ASC telecom AG. All rights reserved. Distribution or copying of this document is forbidden without permission of ASC. 1 Introduction Hearing the

More information

Secure-Access System via Fixed and Mobile Telephone Networks using Voice Biometrics

Secure-Access System via Fixed and Mobile Telephone Networks using Voice Biometrics Secure-Access System via Fixed and Mobile Telephone Networks using Voice Biometrics Anastasis Kounoudes 1, Anixi Antonakoudi 1, Vasilis Kekatos 2 1 The Philips College, Computing and Information Systems

More information

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

More information

Windchill PDMLink 10.2. Curriculum Guide

Windchill PDMLink 10.2. Curriculum Guide Windchill PDMLink 10.2 Curriculum Guide Live Classroom Curriculum Guide Update to Windchill PDMLink 10.2 from Windchill PDMLink 9.0/9.1 for the End User Introduction to Windchill PDMLink 10.2 for Light

More information

Regulation On Attainment of Doctor of Sciences Degree at SEEU (PhD)

Regulation On Attainment of Doctor of Sciences Degree at SEEU (PhD) According to article 118 of the Law on Higher Education of Republic of Macedonia; articles 60, 68 and 69 of SEEU statute ; based on decision of Council of Teaching and Science of SEEU of date April 12th

More information

Core Training Quick Reference Guide Version 2.0

Core Training Quick Reference Guide Version 2.0 Core Training Quick Reference Guide Version 2.0 Page 1 of 34 Contents Changes from Previous Version... 3 Introduction... 5 Guidance for Professional Users based in Colleges/ Schools/ Departments... 5 Logging

More information

Thresholds & Pre-requisites

Thresholds & Pre-requisites Thresholds & Pre-requisites Please see below the scoring thresholds and pre-requisites for the QS Stars Evaluation: OVERALL 1000 1 Star... 100 Must have the authority to grant valid degree level programs

More information

Dragon Solutions Enterprise Profile Management

Dragon Solutions Enterprise Profile Management Dragon Solutions Enterprise Profile Management summary Simplifying System Administration and Profile Management for Enterprise Dragon Deployments In a distributed enterprise, IT professionals are responsible

More information

Semantic Video Annotation by Mining Association Patterns from Visual and Speech Features

Semantic Video Annotation by Mining Association Patterns from Visual and Speech Features Semantic Video Annotation by Mining Association Patterns from and Speech Features Vincent. S. Tseng, Ja-Hwung Su, Jhih-Hong Huang and Chih-Jen Chen Department of Computer Science and Information Engineering

More information

Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System

Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System Oana NICOLAE Faculty of Mathematics and Computer Science, Department of Computer Science, University of Craiova, Romania oananicolae1981@yahoo.com

More information

Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations

Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations C. Wright, L. Ballard, S. Coull, F. Monrose, G. Masson Talk held by Goran Doychev Selected Topics in Information Security and

More information

15 Organisation/ICT/02/01/15 Back- up

15 Organisation/ICT/02/01/15 Back- up 15 Organisation/ICT/02/01/15 Back- up 15.1 Description Backup is a copy of a program or file that is stored separately from the original. These duplicated copies of data on different storage media or additional

More information

Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup

Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup Network Anomaly Detection A Machine Learning Perspective Dhruba Kumar Bhattacharyya Jugal Kumar KaKta»C) CRC Press J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor

More information

D2.4: Two trained semantic decoders for the Appointment Scheduling task

D2.4: Two trained semantic decoders for the Appointment Scheduling task D2.4: Two trained semantic decoders for the Appointment Scheduling task James Henderson, François Mairesse, Lonneke van der Plas, Paola Merlo Distribution: Public CLASSiC Computational Learning in Adaptive

More information

Mining. Practical. Data. Monte F. Hancock, Jr. Chief Scientist, Celestech, Inc. CRC Press. Taylor & Francis Group

Mining. Practical. Data. Monte F. Hancock, Jr. Chief Scientist, Celestech, Inc. CRC Press. Taylor & Francis Group Practical Data Mining Monte F. Hancock, Jr. Chief Scientist, Celestech, Inc. CRC Press Taylor & Francis Group Boca Raton London New York CRC Press is an imprint of the Taylor Ei Francis Group, an Informs

More information

Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast

Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Hassan Sawaf Science Applications International Corporation (SAIC) 7990

More information

Error Log Processing for Accurate Failure Prediction. Humboldt-Universität zu Berlin

Error Log Processing for Accurate Failure Prediction. Humboldt-Universität zu Berlin Error Log Processing for Accurate Failure Prediction Felix Salfner ICSI Berkeley Steffen Tschirpke Humboldt-Universität zu Berlin Introduction Context of work: Error-based online failure prediction: error

More information

Building a Database to Predict Customer Needs

Building a Database to Predict Customer Needs INFORMATION TECHNOLOGY TopicalNet, Inc (formerly Continuum Software, Inc.) Building a Database to Predict Customer Needs Since the early 1990s, organizations have used data warehouses and data-mining tools

More information

IMPROVEMENT THE PRACTITIONER'S GUIDE TO DATA QUALITY DAVID LOSHIN

IMPROVEMENT THE PRACTITIONER'S GUIDE TO DATA QUALITY DAVID LOSHIN i I I I THE PRACTITIONER'S GUIDE TO DATA QUALITY IMPROVEMENT DAVID LOSHIN ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO Morgan Kaufmann

More information

LIST OF FIGURES. Figure No. Caption Page No.

LIST OF FIGURES. Figure No. Caption Page No. LIST OF FIGURES Figure No. Caption Page No. Figure 1.1 A Cellular Network.. 2 Figure 1.2 A Mobile Ad hoc Network... 2 Figure 1.3 Classifications of Threats. 10 Figure 1.4 Classification of Different QoS

More information

SIPAC. Signals and Data Identification, Processing, Analysis, and Classification

SIPAC. Signals and Data Identification, Processing, Analysis, and Classification SIPAC Signals and Data Identification, Processing, Analysis, and Classification Framework for Mass Data Processing with Modules for Data Storage, Production and Configuration SIPAC key features SIPAC is

More information

ReDBox SUPPORT AGREEMENT

ReDBox SUPPORT AGREEMENT ReDBox SUPPORT AGREEMENT This Agreement is made the. day of 2012, with a Commencement Date of the day of.. 2012 BETWEEN: (1 QUEENSLAND CYBER INFRASTRUCTURE FOUNDATION LTD of c/- Maths, University of Queensland,

More information

THE INTELLIGENT INTERFACE FOR ON-LINE ELECTRONIC MEDICAL RECORDS USING TEMPORAL DATA MINING

THE INTELLIGENT INTERFACE FOR ON-LINE ELECTRONIC MEDICAL RECORDS USING TEMPORAL DATA MINING International Journal of Hybrid Computational Intelligence Volume 4 Numbers 1-2 January-December 2011 pp. 1-5 THE INTELLIGENT INTERFACE FOR ON-LINE ELECTRONIC MEDICAL RECORDS USING TEMPORAL DATA MINING

More information

Towards better accuracy for Spam predictions

Towards better accuracy for Spam predictions Towards better accuracy for Spam predictions Chengyan Zhao Department of Computer Science University of Toronto Toronto, Ontario, Canada M5S 2E4 czhao@cs.toronto.edu Abstract Spam identification is crucial

More information

Speech Analytics Data Reliability: Accuracy and Completeness

Speech Analytics Data Reliability: Accuracy and Completeness Speech Analytics Data Reliability: Accuracy and Completeness THE PREREQUISITE TO OPTIMIZING CONTACT CENTER PERFORMANCE AND THE CUSTOMER EXPERIENCE Summary of Contents (Click on any heading below to jump

More information

Comparative Error Analysis of Dialog State Tracking

Comparative Error Analysis of Dialog State Tracking Comparative Error Analysis of Dialog State Tracking Ronnie W. Smith Department of Computer Science East Carolina University Greenville, North Carolina, 27834 rws@cs.ecu.edu Abstract A primary motivation

More information

What is Data Mining, and How is it Useful for Power Plant Optimization? (and How is it Different from DOE, CFD, Statistical Modeling)

What is Data Mining, and How is it Useful for Power Plant Optimization? (and How is it Different from DOE, CFD, Statistical Modeling) data analysis data mining quality control web-based analytics What is Data Mining, and How is it Useful for Power Plant Optimization? (and How is it Different from DOE, CFD, Statistical Modeling) StatSoft

More information

Convention Paper 7896

Convention Paper 7896 Audio Engineering Society Convention Paper 7896 Presented at the 127th Convention 2009 October 9 12 New York, NY, USA The papers at this Convention have been selected on the basis of a submitted abstract

More information

PSD2 Regulating a New Payments World Patterns of Expertise The quest for a
