Automatic Creation and Tuning of Context Free

Size: px
Start display at page:

Download "Automatic Creation and Tuning of Context Free"

Transcription

1 Proceeding ofnlp - 'E0 5 Automatic Creation and Tuning of Context Free Grammars for Interactive Voice Response Systems Mithun Balakrishna and Dan Moldovan Human Language Technology Research Institute The University of Texas at Dallas Dallas, Texas mithun,moldovan@hlt.utdallas.edu Ellis K. Cave Waterview Parkway Dallas, Texas Skip.Cave@intervoice.com Abstract-In this paper, we shall focus on the design of an AUTOCFGPROCESSOR procedure to automatically create and tune Context Free Grammars (CFGs) for directed dialog speech applications without the use of any domain specific text corpora. A reranking mechanism is used to post-process the Large Vocabulary Continuous Speech Recognizer (LVCSR) n- best lists with additional phonetic and higher level linguistic knowledge for transcribing the user utterances with improved Word Error Rate (WER). We also shall depict the classification of LVCSR transcriptions into semantic categories and the use of a statistical ifitering mechanism on the valid LVCSRtranscriptions for the CFG creation and tuning tasks. We also illustrate the importance of the additional improvements gained by using semantic classification strength in a feedback loop to the transcription mechanism. I. INTRODUCTION The current generation of directed dialog speech applications (DDSAs) predominantly use Context-Free Grammar (CFG) instead of a statistical n-gram based language model [1] i.e. the Automatic Speech Recognizer (ASR) embedded in speech applications rely on a CFG based language model rather than a statistical n-gram based language model to restrict the options presented to the ASR decoder by the ASR acoustic model. The preference for CFG in telephonic Interactive Voice Response (IVR) systems can be attributed to the very tight constraint placed on the ASR's response time to a user's request and the limited availability of text corpora for a wide range of application domains. The need for only the accurate semantic tag and its corresponding arguments associated with the user response rather than the entire set of words spoken by the user also justifies this preference. In the conventional grammar based IVR, the user response to a particular IVR prompt is fed to a CFG-based ASR. The ASR uses the CFGs to decide whether the user response is valid (Match) or invalid (No-Match) and the IVR's response is based on this decision. For example, the prompt "do you want your account balance or your cleared checks?" might have answers that match the words "checks" or "balance." If the user responds "what is the price of pizza?," the system simply puts the answer down as a no-match and repeats the prompt. Every user prompt in a DDSA needs a CFG to accurately recognize and semantically classify the user's response for that particular prompt. For the IVR to work with maximum accuracy, the IVR CFGs should cover the most probable responses that are expected from the user at every prompt in the application call-flow [1]. For the previous example prompt, the person can easily respond with "the sum of my account" or "the total of my account." These responses are out of the grammar that was written for this prompt, but still good answers. To correct this problem in the current IVRs, once prompts are created, a grammar designer must read the prompts and write what they believe to be all the possible answers for each prompt. The success of well-designed CFG-based DDSAs has resulted in the very negligible deployment of their statistical n-gram language model based counterparts. A CFG for a prompt in the IVR call-flow contains (refer [1] for detailed explanations about CFGs in IVR systems): 1) Universals - words for navigation or accessing assistance at any point in the call-flow of the speech application e.g. "help," "repeat," "start over," "operator." 2) The most probable words or phrases that are expected from the user at that particular prompt. The main purpose of this paper is to describe a method of automatically creating and tuning finite state grammars representing users' responses to prompts in IVR systems. For example, to an IVR prompt "do you want your account balance or your cleared checks?," users might make replies of the form "Total in my account" or "Sum of my account"; and our goal is to produce an efficient grammar containing the full set of actual responses. It should be noted that the goal is not to produce a grammar containing all the possible user response altematives to a particular prompt but to produce one containing only the most probable user responses required to improve the IVR performance. In this paper, we will present improvements of techniques presented in earlier work [2] for determining which category of possible response (as specified earlier by the speech application designer) is exemplified by a actual user response. These categorization techniques involve lexical chain paths [3] computed in a lexical resource like WordNet [4] between words in users' actual responses and words in previously prepared semantic descriptions of the possible responses. We will also present new statistically based techniques for adding new user /05/ $ IEEE 158

2 responses to the grammar of responses, and (importantly) for removing dormant word sequences from the grammar. A. Previous Work Reference [5] proposes a semantically structured model to reduce the manual labor in developing IVR system CFGs. This process creates an IVR system grammar by combining statistical n-grams and CFGs to reduce the understanding error rate by half of that obtained from a manually created grammar. The proposed method however requires a partially labeled (manually performed) text corpus in the IVR's domain for training the semantically structured model. Due to the high deployment demand for directed dialog systems in a wide variety of domains and the lack of any respectable size text corpora in these areas, the traditional manual processes of creating and tuning CFGs for each of the designed user prompts uses only the collected test user speech utterances. Until recently, the CFG creation process and the subsequent tuning process, based on collected user utterances, were primarily manual. A speech application designer first analyzes the system prompts and a sample CFG is created (an initial guess of the most probable user words or phrases) for each IVR prompt. This sample CFG is loaded into the IVR and a set of test users call and respond to the prompts. Their responses are manually transcribed and analyzed to modify the initial sample CFG, increasing its coverage of expected user responses and, hence, improving the IVR performance. Adding a large set of user response alternatives does not necessarily imply improved IVR performance. On the contrary, we have noticed severe IVR performance degradation due to increase in the recognition confusion. In this paper, we present improvements of techniques presented in earlier grammar automation work [2]. Since our corpus of user responses is spoken, a necessary first step in the automation process is transcribing the responses more accurately than has previously been possible. To transcribe the user responses, we will use a Large Vocabulary Continuous Speech Recognizer (LVCSR) system with an extemal reranking mechanism made of phonetic, lexical, syntactic and semantic features. We propose a WordNet based semantic categorizer to determine which category of possible response goals (as manually specified in advance by the system designer) is exemplified by the user utterance transcription. We also propose a statistical selection mechanism which not only adds altemative user responses to the CFG but also removes dormant word sequences from the CFG (to completely justify the observation in [2] that bigger CFGs do not necessarily lead to a better IVR performance, as more alternatives increase the utterance recognition confusion). Finally, we will show how constructive feedback can be facilitated between the improved LVCSR results and the grammar creation/tuning process. A multiple loop feedback mechanism between the semantic categorizer and reranking module using lexical chain strength measurements not only improves the transcription accuracy of the LVCSR system but also improves the IVR Fig. 1. Automatic creation and tuning of CFG performance by presenting better response altematives to the statistical selection mechanism. II. AUTOCFGPROCESSOR The goal of the system is to capture the most probable user responses, which have not yet been adequately represented by the semantic tags (creation task) or the previously created CFG (tuning task). For the automatic CFG creation process, the speech application designer just needs to conceive a semantic tag for each option in a prompt i.e. design the application call-flow with its prompts and semantic categories. The use of a "Wizard-of-oz" procedure [2] for guiding the test callers through the call flow and collecting their responses is expensive. Therefore, we recommend using a IVR loaded with a skeletal CFG (containing only the word sequences in the prompt options) to guide the callers. The automatic CFG tuning process modifies the CFGs (manually or automatically created) using a set of caller responses (previously created CFGs guide the callers through the IVR call-flow) to improve the IVR performance. Fig. 1 depicts our proposed design. A set of test callers respond to the IVR loaded with the manually created prompts and previously created CFGs (for CFG tuning). Irrespective of the task (creation or tuning), the user responses are recorded and transcribed using a LVCSR with the reranking mechanism. The transcriptions need to be as accurate as possible to realize the AUTOCFGPROCESSOR goal and hence the LVCSR with the reranking mechanism plays a major role in improving the Word Error Rate (WER). The LVCSR-transcribed responses are grouped based on the prompt for which they were uttered. For the CFG creation task, the semantic categorizer tries to map the user utterance transcription to the best semantic label from the set corresponding to each prompt. Utterance transcription and semantic label pairs are then statistically validated to create the appropriate CFG. For the CFG tuning 159

3 task, the semantic categorizer uses a set of semantic labels and their corresponding response alternatives (from the CFG to be tuned) and assigns the best semantic label to the user utterance. In this paper, the (utterance transcription, semantic label) pairs are then statistically validated to not only add some valid alternatives but also remove some statistically invalid entries from the old CFG. An automatically created/tuned CFG is loaded into the IVR and the entire process is repeated iteratively until no further improvement is achieved. A. LVCSR N-Best List Reranking Based Transcription Process The majority of the current LVCSRs compute the posterior probability using a Hidden Markov Model (HMM) based acoustic model (conditional probability), and an n-gram based language model (prior probability). Though a LVCSR based on such an architecture performs recognition with reasonable accuracy, a significant amount of untapped WER improvements remains hidden within the LVCSR n-best lists and wordlattices. Reference [2] presents an analysis for the HUB test corpus, showing that the choice of the best hypothesis from a 200-best list gives a 12.4% absolute WVER reduction over the baseline LVCSR hypothesis (produced by a 3-pass recognition using Gender-dependent acoustic models, Vocal Tract Length Normalization (VTLN) + Speaker Adaptive Trained (SAT) acoustic models + Maximum Likelihood Linear Regression (MLLR) and, MLLR(2) adaptation), while the word lattice gives a 13.0% absolute WER reduction. Results show that majority of the most accurate hypotheses are captured within the first 80 to 90 hypotheses for each utterance. These experimental results prove that big improvements can be gained by applying a strong reranking mechanism, even at a small n-best list depth. Hence, AUTOCFGPROCESSOR uses a LVCSR along with the n-best list reranking mechanism [2] for transcribing the test user utterance with considerable accuracy (ignoring the small additional improvements present in word lattices). The external reranking mechanism uses additional domain independent phonetic, lexical, syntactic and semantic knowledge in tandem, complementing each other to improve the WER. The reranking score assigned to the LVCSR n-best hypotheses is a simple linear weighed combination of the individual scores of each participating knowledge source [2]. For every n-best list hypothesis, the confidence score is computed in the following manner: if Ph = ph,, ph2,...,phm is the phoneme sequence corresponding to the n-best list hypothesis word sequence Wh = w1, w2,.w.,wng predicted by the LVCSR for the acoustic frames A of an utterance, then Score(Wh) = w1 * P(CategoriesAA)*P(Phi(CategoriesAA)) + +W2*(1 * P(PhJPh')*P(Phs)) P(PhlP()*(P) + W3 *(PLM(Wh)*H1n Count(w'JwV-boundary)) + W4 * (P(rr, Wh)C 1Count(Slots-Filled(j)Avi) rq Count(Slots-Filled(j)Avi)) + W5 * (Z1 Count(vi) Etj=1 Count(vj ) (W1,W2),W3,W4,W5 represent the weights assigned to the phonetic, lexical, syntactic and semantic features respectively. The remaining section will briefly describe the individual knowledge scores involved in the reranking score calculation equation (please refer to [2] for a detailed description). If Categories is the acoustic property based phoneme group sequence (grouping the phonemes into 13 categories based on acoustic properties avoids huge computation overhead) to which the phoneme sequence Ph belongs, then P(CategoriesIA) represents the Support Vector Machine (SVM) posterior probability of the acoustic frames belonging to Categories containing the original LVCSR phoneme sequence. P(PhlCategoriesAA) represents the SVM class posterior probability of the acoustic frames belonging to the original LVCSR phoneme sequence given the Categories. Let PhS = ph', ph',...,ph' be the best phoneme sequence predicted by SVM, for A, not containing phonemes present in W's phoneme sequence. (1 P(PhIPP1I)*P(Ph8)) - equation models the LVCSR phoneme classification accuracy probability with respect to Phs. The equation returns a high value if the original LVCSR phoneme sequence matches the best SVM phoneme sequence i.e. P(Ph) is high while P(PhS) is low. The probability P(PhIPh8) is computed using a phoneme confusion matrix. This matrix (error counts from training the acoustic model and SVM models) provides the probability of a phoneme being misclassified as ph,, instead of ph'. The numerator of P hjph)p(ph') is assigned a high value and its denominator a low value, if SVM disagrees with the LVCSR phoneme classification, resulting in an low overall LVCSR phoneme classification accuracy score. But this score is high if SVM agrees with the LVCSR phoneme classification as the numerator has a low value and the denominator has a high value. For calculating the lexical score, PLM(Wh) is the trigram score associated with the n-best list hypothesis containing n words. The second part of the lexical confidence score is equivalent to computing the unigram probabilities of the words in each hypothesis, given their word boundaries, using the set of words occurring in the n-best list as a reference i.e. the probability of the word w' occurring in a particular hypothesis time-frame with respect to all the words W occurring in that particular hypothesis time-frame in all the hypotheses of the n-best list. The syntactic score P(7r, Wh) is the probability of the best parse (7r), for the hypothesis Wh, generated by the Charniak's parser [6] as a lexicalized syntactic model. S Count(Slots-Filted(j)Avi) Eq Count(Slots-Filled(j)Av )) The semantic score 5r=- =' Count(vE) rep- Z 1 Count(vj) resents the extent of the predicate arguments' coverage in the hypothesis, where p is the number of arguments for predicate vi in hypothesis Wh, q is the total number of arguments possible for predicate vi in PropBank [7] and t is the total number of predicates identified in PropBank. We use the ASSERT semantic parser [8] to extract the predicate-argument 160

4 1) Grammar: Cable Account Change 4 2) User Utterance: "I'd like to speak to a live person please" 3) LVCSR Sequence: "I'd likelvb to speak/vb to a live/jj person/nn of these" 4) Target Semantic Tag: "Customer Service" 5) Target Grammar Entry: "Human/JJ/#2 Operator/NN/#2 " 6) Best 2 Lexical Chains of 8: a) operator/nn/#2 TO speak/vb: (n-operator#2, manipulator#1) HYPONYM (n-telephoneoperator#1, telephonist#1, switchboardcoperator#l) GLOSS (v-get#14) HYPERNYM (v-communicate#2, intercommunicate#2) HYPONYM (v-talk#1, speak#2) VALUE: 9.22 b) human/jj/#2 TO person/nn: (a-human#2) GLOSS (n-person#1, individual#1, someone#1, somebody#1, mortal#1, human#1, soul#2) VALUE: ) Lexical Chain Strength: Fig. 2. An example illustrating the semantic categorization mapping process relations defined in PropBank. B. Semantic Categorizer and Feedback Reference [3] presents a methodology for finding topically related words by increasing the connectivity between WordNet [4] synsets using the information from WordNet glosses. We use lexical chains to classify the LVCSR-transcription into one of the designer conceived semantic tags. For the CFG creation task, the semantic categorizer tries to map the user utterance transcription to the best semantic label from the set corresponding to each prompt. For the CFG tuning task, the semantic categorizer tries to map the user utterance transcription to a set of semantic labels and their corresponding response alternatives (from the CFG to be tuned) and assigns the best semantic label to the utterance. For each utterance, the top ten hypotheses from the reranked n-best list are selected for the classification task. By finding the best lexical chain between the word sequence and the designer conceived semantic label (for tuning, corresponding word sequence alternatives in the previously created CFG are also used), each candidate word sequence is mapped to a semantic tag. We used a majority voting mechanism to select the appropriate semantic tag for a user response. It is very important to note that designers spend the bulk of their time not only adding alternate word sequences but also rejecting invalid (NO-MATCH label in our experiments) or unnecessary user utterances. Fig. 2 illustrates the mapping process. Lexical chains are found between each content word in the LVCSR Sequence (LVCSR-S) and each content word in the Target Grammar Entry (TGE) (semantic label or, word sequences from the previously created CFG representing the Target Semantic Tag (TST)). Prior to the mapping process, all TGE words are manually assigned their POS tags and WordNet2.0 sense numbers. The LVCSR-S words are not sense disambiguated. A LVCSR-S to TGE mapping is valid if and only if there exists a lexical chain between every word in TGE and at least one word in the LVCSR-S. The Lexical Chain Strength (LCS) is the sum of the semantic similarity values of the best lexical chains from every TGE word. In our example, the mapping from LVCSR-S to TGE is valid because human/nn/#1 and operator/nn/#2 map to at least one word in LVCSR-S (person/nn and speak/ VB). The LCS is (the sum of the best lexical chain values) i.e (operator/nn/#2 to speak/vb, Value= 9.22) and (human/jj/#1 to person/nn, Value= 6.68). The first lexical chain (a) links the second WordNet2.0 sense of the noun operator with the verb speak. The lexical chain implementation [3] does not require a word sense for speak. It finds the best path to the second WordNet2.0 sense of speak and assigns a value of 9.22 to the found lexical path (higher values indicate stronger paths). Fig. 3 presents the algorithm used to map an user utterance transcription into the best semantic category for the prompt. For each user utterance, the top ten LVCSR transcriptions are selected and mapped to one of the various semantic categories available for the prompt using the previously illustrated lexical chain mechanism. A majority voting mechanism is then used to find the best semantic category from the various semantic categories proposed by the top ten LVCSR transcriptions. The confidence score associated with the semantic category chosen from the majority voting procedure is used to rerank the top ten hypotheses again and the process continues until the algorithm settles down to a particular transcription and semantic category. The procedure ValidMapping() identifies the mapping between a given (LVCSR-S, TGE) pair and returns true if and only if the validity condition for mapping holds. For such a valid pair, BestLexicalChain() computes the LCS. For each A in utterances of Tuning Task For each B in top ten LVCSR-S of n-best For each C in TSTs of Tuning Task For each D in TGEs of C If ValidMapping(B, D) BestLexicalChain(B, D, LCS[B, D]) else LCS[B,D]= NO-MATCH endif Value[C] = TGE-Max(LCS[B,D]) TST[B] = TST_Max(C,Value[C]) Sem-Cat(A) = Majority(TST[B]) Fig. 3. Algorithm to map the LVCSR n-best list transcriptions into the best semantic category for the IVR prompt 161

5 TGE-Max( ) finds the best TGE (based on LCS) for every TST and TSTMax( ) returns the best TGT (based on the LCS of the best TGE) for the top ten LVCSR-S of the n-best list. Ma j ority ( ) selects as the utterance semantic category, the TGT with the majority LVCSR-S votes. The selection of the word sequence representing the TST is as important as the selection of the correct TST. The LCS of a hypothesis, associated with the chosen TGT for the utterance, is used as a confidence score to rerank the top ten hypotheses again and the process continues until the system settles down to a particular transcription and semantic label. There are some transcription words which are currently unavailable in WordNet2.0. These missing transcription words cannot take part in the semantic categorization process as the lexical chains procedure relies on the availability of the word in WordNet2.0 to map it to previously prepared descriptions of possible responses. In this paper, the missing transcription words are ignored since we have found their frequency of occurrence in our actual user utterance transcriptions to be negligible. C. Statistical Selection Mechanism We propose an improved statistical selection mechanism to not only add user response alternatives but also remove dormant word sequences from the CFGs to improve the performance of the IVR. We run the following filters on valid LVCSR-transcriptions (labeled with semantic category) to find sequences that should be added/removed from the CFG: Occurrence Probability: Each response alternative inside the CFG contains a value indicating the occurrence probability of the word sequence for a particular semantic category in each prompt. This value indicates the probability of the word sequence being uttered by the user for that semantic category. A high value indicates that the users are more probable to use the corresponding word sequence in response to the system prompt. We prune the CFG to prevent the degradation in the IVR performance due to presence of low probability word sequences already added previously into the CFG (pruning is done only in CFG tuning). We remove response alternatives with occurrence probability lower than 0.05 and then re-adjust the occurrence probabilities of all the remaining alternatives for that particular prompt semantic category (the sum of the occurrence probability of all the alternatives for a semantic category is 1). The remaining rules provide a filtering mechanism to add valid LVCSR-transcriptions into the CFG and assign an occurrence probability to each alternative added into the CFG. Frequency: For a particular semantic category, add only widely used LVCSR-transcriptions (frequency greater than 1% of total valid LVCSR-transcriptions) e.g. A specific transcription like "Account number five one four three" for the "Billing Account Type Choice" prompt should not be added, though the user wants the IVR to map the account to a semantic category (Cable, Cellular Phone, etc.) in the prompt. The occurrence probability of all the response alternatives in the corresponding semantic category is recalculated. The Occurrence Probability rule is then applied to remove low probability response alternatives. Smallest Sequence: If a LVCSR-transcription Si is a substring of S2 (both with a frequency greater than the threshold and both mapped to the same semantic tag), then only the smaller sequence Si should be added to the CFG. e.g. If Si = "billing address" and S2 = "get my billing address," add Si to the CFG. The occurrence probability of all the response alternatives in the corresponding semantic category is recalculated with the count of the added smallest sequence (count of S1+ count of S2). The Occurrence Probability rule is then applied to remove low probability response alternatives. Sub-sequences: The above rules add high frequency transcriptions into the CFG. We also need to consider the fact that the entire LVCSR-transcription might not be valid due to LVCSR errors or due to invalid user uttered words. We might have low frequency LVCSR-transcriptions containing important, high frequency word sub-sequences e.g. the utterance "Ah Can I just speak to someone alive" is transcribed by the LVCSR as "that called religious speak someone by" or, utterance "I want to cancel an appointment would you get me somebody that is not a computer." We find valid LVCSR-transcription sub-sequences by extracting the words used to compute the LCS, for its corresponding semantic tag, in each low frequency LVCSR-transcription. If the count of a particular sub-sequence in a semantic tag class is greater than the frequency threshold, then the corresponding subsequence is added to the CFG for that semantic tag. We are basically trying to filter the worthless words and find useful high frequency sub-sequences. The occurrence probability of all the response alternatives in the corresponding semantic category is recalculated with the count of the added subsequence (count of all the LVCSR-transcriptions containing the sub-sequence). The Occurrence Probability rule is then applied to remove low probability response alternatives. III. RESULTS AND EXPERIMENTAL SETTINGS We use SONIC [9], a LVCSR system from the University of Colorado at Boulder, to produce n-best lists for the reranking mechanism. We trained the LVCSR acoustic models for the telephone transcription task using 160 CallHome English corpus conversation sides and 4826 Switchboard-I Release 2 corpus conversation sides. The SRI HUB model is used as a back-off tri-gram language model. We considered a set of 8013 collected test user utterances (live recordings from IVR systems deployed by Intervoice Inc.) for 4 prompts in 3 different applications: Change Cable Account Details (CCAD) (2452 utterances, 12 semantic categories), Billing Account Type Choice (BATC) (1328 utterances, 12 semantic categories), Bank Future Payment Update Options (BFPUO) (224 utterances, 9 semantic categories), Wireless Account Change Options (WACO) (4009 utterances, 13 semantic categories). Table I presents the transcription results obtained on a 30-best list reranking (our analysis similar to the one presented in [2] showed insignificant improvements beyond the 30-best hypotheses list) using the best set of knowledge 162

6 TABLE I VARIOUS WER RESULTS OBTAINED FOR THE AUTOCFGPROCESSOR TRANSCRIPTION TASK. Test User Utterance Set (8013 Utterances) wj=13,w2=18,w3=14 Error (%) Total W4=26,w5=29 Sub Del Ins Total Correct(%) Baseline Hypothesis Best List Reranking Best List Reranking + LCS Score Feedback TABLE II RESULTS OBTAINED FOR THE AUTOCFGPROCESSOR DIRECTED DIALOG SPEECH APPLICATION TASK. Prompt System Collected Test User Utterance Set (4006 Utterances) (Size) Error (%) Total(%) MisCat InCFG OutCFG Total Correct CCAD Original (1226) Manual Autol _ Auto S3 BATC Original (664) Manual Autol ir Auto BFPUO Original (112) Manual Autol Auto WACO Original (2004) Manual Autol Auto Tg.47U J weights obtained by running WER testing trials on 40 HUB Switchboard-I conversation sides using various weight combinations. Using the best weight combination, we achieve 6.1% absolute WER reduction (12.78% relative reduction). We got the best result (8.6% absolute WER reduction and 18.02% relative WER reduction) by using the recurring LCS score feedback to propose the best hypothesis from the reranked top 10 n-best list hypotheses. For the CFG tuning task, the AUTOCFGPROCESSOR performance is presented in Table III. We used the manually created CFGs for the previously mentioned set of 4 prompts as baseline. An application designer manually listed the semantic categories for each of the 4 prompts. The 8013 utterances set is divided equally, for each prompt, into a training and test set. The Original system results are obtained by running the test set using the original manually created CFGs. The system results (Auto], Auto2 and Manual) represent the quality of each prompt CFGs (tuned using the training set) against the test set. The Manual system prompt CFGs are obtained by tuning the original CFGs with word sequences proposed by a speech application designer, who manually analyzed the training set. The Auto2 (AUTOCFGPROCESSOR) system tunes the original CFGs with the word sequences obtained by using the baseline LVCSR, reranking mechanism, semantic categorizer and semantic categorizer feedback while Auto] is the implementation of [2]. MisCat errors are due to mismatches between the category proposed by the IVR and the actual utterance category. InCFG errors are due to the IVR proposing a category while the utterance's actual category is a NO-MATCH. OutCFG errors are due to the IVR proposing a NO-MATCH while the utterance actually has a valid category. Table Ill shows that the AUTOCFGPROCESSOR can successfully add good alternatives to the baseline CFG and improve the IVR performance. Autol consistently comes close to matching the performance of manual additions, though it adds more alternatives and hence achieves less MisCat errors and more InCFG errors than the manually tuned CFGs. Using the improved statistical selection mechanism to remove dormant word sequences from the CFGs, (Auto2) improves the performance of the IVR even in terms of InCFG errors. The multiple loop feedback using lexical chain strength presents better alternatives and hence, reduces the MisCat errors. IV. CONCLUSIONS AND FUTURE WORK This paper presented a novel method to automatically generate and tune grammars for directed dialog speech applications. The results show improved IVR performance closely matching the manual tuning performance. Non-domain-specific information sources based reranking, lexical chain based semantic category classification and, feedback based on semantic classification strength play an important role in improving the IVR performance. One of the future goals is to enable the proposed mechanism to handle grammar automation for user responses containing variable fields (dates, credit card number, etc.). The handling of transcription words missing from WordNet2.0 for the semantic categorization process will also be the focus of future work in this area. ACKNOWLEDGMENT We are grateful to Marta Tatu for many thoughtful suggestions and comments; Steven Ray, Mark Hittinger and Cathie Ranta for their priceless encouragement and support; and to the anonymous reviewers for their valuable feedback. REFERENCES [1] Intervoice Training Document - Voice User Interface Design - Speechworks 7.0 OSS/OSR and Naunce Speech Forms, Intervoice Inc., [2] M. Balakrishna, D. Moldovan, and E. K. Cave, "Higher level phonetic and linguistic knowledge to improve asr accuracy and its relevance in interactive voice response systems," in Proceedings of the AAAI Workshop on Spoken Language Understanding, [3] D. Moldovan and A. Novischi, "Lexical chains for question answering," in Proceedings of COLING 2002, [4] C. Fellbaum, Ed., WordNet: An Electronic Lexical Database. MIT Press, [5] A. Acero, Y. Y. Wang, and K. Wang, "A semantically structured language model," in Special Workshop in Maui (SWIM), [6] E. Chamiak, "Immediate-head parsing for language models," in Proceedings of ACL 2001, [7] P. Kingsbury, M. Palmer, and M. Marcus, "Adding semantic annotation to the penn treebank," in Proceedings of HLT 2002, [8] S. Pradhan, W. Ward, K. Hacioglu, J. H. Martin, and D. Jurafsky, "Shallow semantic parsing using support vector machines," in Proceedings of HLT-NAACL 2004, [9] B. Pellom, SONIC: The University of Colorado Continuous Speech Recognizer, University of Colorado, Boulder, Colorado, March

Turkish Radiology Dictation System

Turkish Radiology Dictation System Turkish Radiology Dictation System Ebru Arısoy, Levent M. Arslan Boaziçi University, Electrical and Electronic Engineering Department, 34342, Bebek, stanbul, Turkey arisoyeb@boun.edu.tr, arslanle@boun.edu.tr

More information

D2.4: Two trained semantic decoders for the Appointment Scheduling task

D2.4: Two trained semantic decoders for the Appointment Scheduling task D2.4: Two trained semantic decoders for the Appointment Scheduling task James Henderson, François Mairesse, Lonneke van der Plas, Paola Merlo Distribution: Public CLASSiC Computational Learning in Adaptive

More information

ABSTRACT 2. SYSTEM OVERVIEW 1. INTRODUCTION. 2.1 Speech Recognition

ABSTRACT 2. SYSTEM OVERVIEW 1. INTRODUCTION. 2.1 Speech Recognition The CU Communicator: An Architecture for Dialogue Systems 1 Bryan Pellom, Wayne Ward, Sameer Pradhan Center for Spoken Language Research University of Colorado, Boulder Boulder, Colorado 80309-0594, USA

More information

Building a Question Classifier for a TREC-Style Question Answering System

Building a Question Classifier for a TREC-Style Question Answering System Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given

More information

Word Completion and Prediction in Hebrew

Word Completion and Prediction in Hebrew Experiments with Language Models for בס"ד Word Completion and Prediction in Hebrew 1 Yaakov HaCohen-Kerner, Asaf Applebaum, Jacob Bitterman Department of Computer Science Jerusalem College of Technology

More information

LIUM s Statistical Machine Translation System for IWSLT 2010

LIUM s Statistical Machine Translation System for IWSLT 2010 LIUM s Statistical Machine Translation System for IWSLT 2010 Anthony Rousseau, Loïc Barrault, Paul Deléglise, Yannick Estève Laboratoire Informatique de l Université du Maine (LIUM) University of Le Mans,

More information

Comparative Error Analysis of Dialog State Tracking

Comparative Error Analysis of Dialog State Tracking Comparative Error Analysis of Dialog State Tracking Ronnie W. Smith Department of Computer Science East Carolina University Greenville, North Carolina, 27834 rws@cs.ecu.edu Abstract A primary motivation

More information

A Comparative Analysis of Speech Recognition Platforms

A Comparative Analysis of Speech Recognition Platforms Communications of the IIMA Volume 9 Issue 3 Article 2 2009 A Comparative Analysis of Speech Recognition Platforms Ore A. Iona College Follow this and additional works at: http://scholarworks.lib.csusb.edu/ciima

More information

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents

More information

Specialty Answering Service. All rights reserved.

Specialty Answering Service. All rights reserved. 0 Contents 1 Introduction... 2 1.1 Types of Dialog Systems... 2 2 Dialog Systems in Contact Centers... 4 2.1 Automated Call Centers... 4 3 History... 3 4 Designing Interactive Dialogs with Structured Data...

More information

Speech: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction

Speech: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction : A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction Urmila Shrawankar Dept. of Information Technology Govt. Polytechnic, Nagpur Institute Sadar, Nagpur 440001 (INDIA)

More information

Automatic slide assignation for language model adaptation

Automatic slide assignation for language model adaptation Automatic slide assignation for language model adaptation Applications of Computational Linguistics Adrià Agustí Martínez Villaronga May 23, 2013 1 Introduction Online multimedia repositories are rapidly

More information

Automated Content Analysis of Discussion Transcripts

Automated Content Analysis of Discussion Transcripts Automated Content Analysis of Discussion Transcripts Vitomir Kovanović v.kovanovic@ed.ac.uk Dragan Gašević dgasevic@acm.org School of Informatics, University of Edinburgh Edinburgh, United Kingdom v.kovanovic@ed.ac.uk

More information

Speech Analytics Data Reliability: Accuracy and Completeness

Speech Analytics Data Reliability: Accuracy and Completeness Speech Analytics Data Reliability: Accuracy and Completeness THE PREREQUISITE TO OPTIMIZING CONTACT CENTER PERFORMANCE AND THE CUSTOMER EXPERIENCE Summary of Contents (Click on any heading below to jump

More information

Spoken Document Retrieval from Call-Center Conversations

Spoken Document Retrieval from Call-Center Conversations Spoken Document Retrieval from Call-Center Conversations Jonathan Mamou, David Carmel, Ron Hoory IBM Haifa Research Labs Haifa 31905, Israel {mamou,carmel,hoory}@il.ibm.com ABSTRACT We are interested in

More information

Open Domain Information Extraction. Günter Neumann, DFKI, 2012

Open Domain Information Extraction. Günter Neumann, DFKI, 2012 Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for

More information

Speech and Data Analytics for Trading Floors: Technologies, Reliability, Accuracy and Readiness

Speech and Data Analytics for Trading Floors: Technologies, Reliability, Accuracy and Readiness Speech and Data Analytics for Trading Floors: Technologies, Reliability, Accuracy and Readiness Worse than not knowing is having information that you didn t know you had. Let the data tell me my inherent

More information

CALL-TYPE CLASSIFICATION AND UNSUPERVISED TRAINING FOR THE CALL CENTER DOMAIN

CALL-TYPE CLASSIFICATION AND UNSUPERVISED TRAINING FOR THE CALL CENTER DOMAIN CALL-TYPE CLASSIFICATION AND UNSUPERVISED TRAINING FOR THE CALL CENTER DOMAIN Min Tang, Bryan Pellom, Kadri Hacioglu Center for Spoken Language Research University of Colorado at Boulder Boulder, Colorado

More information

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Usha Nandini D 1, Anish Gracias J 2 1 ushaduraisamy@yahoo.co.in 2 anishgracias@gmail.com Abstract A vast amount of assorted

More information

A Mixed Trigrams Approach for Context Sensitive Spell Checking

A Mixed Trigrams Approach for Context Sensitive Spell Checking A Mixed Trigrams Approach for Context Sensitive Spell Checking Davide Fossati and Barbara Di Eugenio Department of Computer Science University of Illinois at Chicago Chicago, IL, USA dfossa1@uic.edu, bdieugen@cs.uic.edu

More information

Natural Language to Relational Query by Using Parsing Compiler

Natural Language to Relational Query by Using Parsing Compiler Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,

More information

Corpus Design for a Unit Selection Database

Corpus Design for a Unit Selection Database Corpus Design for a Unit Selection Database Norbert Braunschweiler Institute for Natural Language Processing (IMS) Stuttgart 8 th 9 th October 2002 BITS Workshop, München Norbert Braunschweiler Corpus

More information

Grammars and introduction to machine learning. Computers Playing Jeopardy! Course Stony Brook University

Grammars and introduction to machine learning. Computers Playing Jeopardy! Course Stony Brook University Grammars and introduction to machine learning Computers Playing Jeopardy! Course Stony Brook University Last class: grammars and parsing in Prolog Noun -> roller Verb thrills VP Verb NP S NP VP NP S VP

More information

Comparing Support Vector Machines, Recurrent Networks and Finite State Transducers for Classifying Spoken Utterances

Comparing Support Vector Machines, Recurrent Networks and Finite State Transducers for Classifying Spoken Utterances Comparing Support Vector Machines, Recurrent Networks and Finite State Transducers for Classifying Spoken Utterances Sheila Garfield and Stefan Wermter University of Sunderland, School of Computing and

More information

AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS

AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS PIERRE LANCHANTIN, ANDREW C. MORRIS, XAVIER RODET, CHRISTOPHE VEAUX Very high quality text-to-speech synthesis can be achieved by unit selection

More information

Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast

Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Hassan Sawaf Science Applications International Corporation (SAIC) 7990

More information

Moving Enterprise Applications into VoiceXML. May 2002

Moving Enterprise Applications into VoiceXML. May 2002 Moving Enterprise Applications into VoiceXML May 2002 ViaFone Overview ViaFone connects mobile employees to to enterprise systems to to improve overall business performance. Enterprise Application Focus;

More information

Search Engine Based Intelligent Help Desk System: iassist

Search Engine Based Intelligent Help Desk System: iassist Search Engine Based Intelligent Help Desk System: iassist Sahil K. Shah, Prof. Sheetal A. Takale Information Technology Department VPCOE, Baramati, Maharashtra, India sahilshahwnr@gmail.com, sheetaltakale@gmail.com

More information

Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System

Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System Oana NICOLAE Faculty of Mathematics and Computer Science, Department of Computer Science, University of Craiova, Romania oananicolae1981@yahoo.com

More information

Building A Vocabulary Self-Learning Speech Recognition System

Building A Vocabulary Self-Learning Speech Recognition System INTERSPEECH 2014 Building A Vocabulary Self-Learning Speech Recognition System Long Qin 1, Alexander Rudnicky 2 1 M*Modal, 1710 Murray Ave, Pittsburgh, PA, USA 2 Carnegie Mellon University, 5000 Forbes

More information

Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde

Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde Statistical Verb-Clustering Model soft clustering: Verbs may belong to several clusters trained on verb-argument tuples clusters together verbs with similar subcategorization and selectional restriction

More information

2014/02/13 Sphinx Lunch

2014/02/13 Sphinx Lunch 2014/02/13 Sphinx Lunch Best Student Paper Award @ 2013 IEEE Workshop on Automatic Speech Recognition and Understanding Dec. 9-12, 2013 Unsupervised Induction and Filling of Semantic Slot for Spoken Dialogue

More information

UNKNOWN WORDS ANALYSIS IN POS TAGGING OF SINHALA LANGUAGE

UNKNOWN WORDS ANALYSIS IN POS TAGGING OF SINHALA LANGUAGE UNKNOWN WORDS ANALYSIS IN POS TAGGING OF SINHALA LANGUAGE A.J.P.M.P. Jayaweera #1, N.G.J. Dias *2 # Virtusa Pvt. Ltd. No 752, Dr. Danister De Silva Mawatha, Colombo 09, Sri Lanka * Department of Statistics

More information

ADVANCES IN ARABIC BROADCAST NEWS TRANSCRIPTION AT RWTH. David Rybach, Stefan Hahn, Christian Gollan, Ralf Schlüter, Hermann Ney

ADVANCES IN ARABIC BROADCAST NEWS TRANSCRIPTION AT RWTH. David Rybach, Stefan Hahn, Christian Gollan, Ralf Schlüter, Hermann Ney ADVANCES IN ARABIC BROADCAST NEWS TRANSCRIPTION AT RWTH David Rybach, Stefan Hahn, Christian Gollan, Ralf Schlüter, Hermann Ney Human Language Technology and Pattern Recognition Computer Science Department,

More information

Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems

Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems cation systems. For example, NLP could be used in Question Answering (QA) systems to understand users natural

More information

Offline Recognition of Unconstrained Handwritten Texts Using HMMs and Statistical Language Models. Alessandro Vinciarelli, Samy Bengio and Horst Bunke

Offline Recognition of Unconstrained Handwritten Texts Using HMMs and Statistical Language Models. Alessandro Vinciarelli, Samy Bengio and Horst Bunke 1 Offline Recognition of Unconstrained Handwritten Texts Using HMMs and Statistical Language Models Alessandro Vinciarelli, Samy Bengio and Horst Bunke Abstract This paper presents a system for the offline

More information

31 Case Studies: Java Natural Language Tools Available on the Web

31 Case Studies: Java Natural Language Tools Available on the Web 31 Case Studies: Java Natural Language Tools Available on the Web Chapter Objectives Chapter Contents This chapter provides a number of sources for open source and free atural language understanding software

More information

7-2 Speech-to-Speech Translation System Field Experiments in All Over Japan

7-2 Speech-to-Speech Translation System Field Experiments in All Over Japan 7-2 Speech-to-Speech Translation System Field Experiments in All Over Japan We explain field experiments conducted during the 2009 fiscal year in five areas of Japan. We also show the experiments of evaluation

More information

Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition

Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition , Lisbon Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition Wolfgang Macherey Lars Haferkamp Ralf Schlüter Hermann Ney Human Language Technology

More information

Facilitating Business Process Discovery using Email Analysis

Facilitating Business Process Discovery using Email Analysis Facilitating Business Process Discovery using Email Analysis Matin Mavaddat Matin.Mavaddat@live.uwe.ac.uk Stewart Green Stewart.Green Ian Beeson Ian.Beeson Jin Sa Jin.Sa Abstract Extracting business process

More information

customer care solutions

customer care solutions customer care solutions from Nuance white paper :: Understanding Natural Language Learning to speak customer-ese In recent years speech recognition systems have made impressive advances in their ability

More information

Symbiosis of Evolutionary Techniques and Statistical Natural Language Processing

Symbiosis of Evolutionary Techniques and Statistical Natural Language Processing 1 Symbiosis of Evolutionary Techniques and Statistical Natural Language Processing Lourdes Araujo Dpto. Sistemas Informáticos y Programación, Univ. Complutense, Madrid 28040, SPAIN (email: lurdes@sip.ucm.es)

More information

Establishing the Uniqueness of the Human Voice for Security Applications

Establishing the Uniqueness of the Human Voice for Security Applications Proceedings of Student/Faculty Research Day, CSIS, Pace University, May 7th, 2004 Establishing the Uniqueness of the Human Voice for Security Applications Naresh P. Trilok, Sung-Hyuk Cha, and Charles C.

More information

NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR

NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR Arati K. Deshpande 1 and Prakash. R. Devale 2 1 Student and 2 Professor & Head, Department of Information Technology, Bharati

More information

English Grammar Checker

English Grammar Checker International l Journal of Computer Sciences and Engineering Open Access Review Paper Volume-4, Issue-3 E-ISSN: 2347-2693 English Grammar Checker Pratik Ghosalkar 1*, Sarvesh Malagi 2, Vatsal Nagda 3,

More information

Chapter 8. Final Results on Dutch Senseval-2 Test Data

Chapter 8. Final Results on Dutch Senseval-2 Test Data Chapter 8 Final Results on Dutch Senseval-2 Test Data The general idea of testing is to assess how well a given model works and that can only be done properly on data that has not been seen before. Supervised

More information

Effective Self-Training for Parsing

Effective Self-Training for Parsing Effective Self-Training for Parsing David McClosky dmcc@cs.brown.edu Brown Laboratory for Linguistic Information Processing (BLLIP) Joint work with Eugene Charniak and Mark Johnson David McClosky - dmcc@cs.brown.edu

More information

Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words

Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words , pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan

More information

Brill s rule-based PoS tagger

Brill s rule-based PoS tagger Beáta Megyesi Department of Linguistics University of Stockholm Extract from D-level thesis (section 3) Brill s rule-based PoS tagger Beáta Megyesi Eric Brill introduced a PoS tagger in 1992 that was based

More information

TREC 2003 Question Answering Track at CAS-ICT

TREC 2003 Question Answering Track at CAS-ICT TREC 2003 Question Answering Track at CAS-ICT Yi Chang, Hongbo Xu, Shuo Bai Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China changyi@software.ict.ac.cn http://www.ict.ac.cn/

More information

Using classes has the potential of reducing the problem of sparseness of data by allowing generalizations

Using classes has the potential of reducing the problem of sparseness of data by allowing generalizations POS Tags and Decision Trees for Language Modeling Peter A. Heeman Department of Computer Science and Engineering Oregon Graduate Institute PO Box 91000, Portland OR 97291 heeman@cse.ogi.edu Abstract Language

More information

SOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS

SOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS SOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS Bálint Tóth, Tibor Fegyó, Géza Németh Department of Telecommunications and Media Informatics Budapest University

More information

Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System

Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System Athira P. M., Sreeja M. and P. C. Reghuraj Department of Computer Science and Engineering, Government Engineering

More information

VoiceXML-Based Dialogue Systems

VoiceXML-Based Dialogue Systems VoiceXML-Based Dialogue Systems Pavel Cenek Laboratory of Speech and Dialogue Faculty of Informatics Masaryk University Brno Agenda Dialogue system (DS) VoiceXML Frame-based DS in general 2 Computer based

More information

ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS

ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS Gürkan Şahin 1, Banu Diri 1 and Tuğba Yıldız 2 1 Faculty of Electrical-Electronic, Department of Computer Engineering

More information

Micro blogs Oriented Word Segmentation System

Micro blogs Oriented Word Segmentation System Micro blogs Oriented Word Segmentation System Yijia Liu, Meishan Zhang, Wanxiang Che, Ting Liu, Yihe Deng Research Center for Social Computing and Information Retrieval Harbin Institute of Technology,

More information

Special Topics in Computer Science

Special Topics in Computer Science Special Topics in Computer Science NLP in a Nutshell CS492B Spring Semester 2009 Jong C. Park Computer Science Department Korea Advanced Institute of Science and Technology INTRODUCTION Jong C. Park, CS

More information

Using Words and Phonetic Strings for Efficient Information Retrieval from Imperfectly Transcribed Spoken Documents

Using Words and Phonetic Strings for Efficient Information Retrieval from Imperfectly Transcribed Spoken Documents Using Words and Phonetic Strings for Efficient Information Retrieval from Imperfectly Transcribed Spoken Documents Michael J. Witbrock and Alexander G. Hauptmann Carnegie Mellon University ABSTRACT Library

More information

AUDIMUS.media: A Broadcast News Speech Recognition System for the European Portuguese Language

AUDIMUS.media: A Broadcast News Speech Recognition System for the European Portuguese Language AUDIMUS.media: A Broadcast News Speech Recognition System for the European Portuguese Language Hugo Meinedo, Diamantino Caseiro, João Neto, and Isabel Trancoso L 2 F Spoken Language Systems Lab INESC-ID

More information

CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet

CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet Muhammad Atif Qureshi 1,2, Arjumand Younus 1,2, Colm O Riordan 1,

More information

Robust Methods for Automatic Transcription and Alignment of Speech Signals

Robust Methods for Automatic Transcription and Alignment of Speech Signals Robust Methods for Automatic Transcription and Alignment of Speech Signals Leif Grönqvist (lgr@msi.vxu.se) Course in Speech Recognition January 2. 2004 Contents Contents 1 1 Introduction 2 2 Background

More information

The ROI. of Speech Tuning

The ROI. of Speech Tuning The ROI of Speech Tuning Executive Summary: Speech tuning is a process of improving speech applications after they have been deployed by reviewing how users interact with the system and testing changes.

More information

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!

More information

Text-To-Speech Technologies for Mobile Telephony Services

Text-To-Speech Technologies for Mobile Telephony Services Text-To-Speech Technologies for Mobile Telephony Services Paulseph-John Farrugia Department of Computer Science and AI, University of Malta Abstract. Text-To-Speech (TTS) systems aim to transform arbitrary

More information

Parsing Software Requirements with an Ontology-based Semantic Role Labeler

Parsing Software Requirements with an Ontology-based Semantic Role Labeler Parsing Software Requirements with an Ontology-based Semantic Role Labeler Michael Roth University of Edinburgh mroth@inf.ed.ac.uk Ewan Klein University of Edinburgh ewan@inf.ed.ac.uk Abstract Software

More information

What Is This, Anyway: Automatic Hypernym Discovery

What Is This, Anyway: Automatic Hypernym Discovery What Is This, Anyway: Automatic Hypernym Discovery Alan Ritter and Stephen Soderland and Oren Etzioni Turing Center Department of Computer Science and Engineering University of Washington Box 352350 Seattle,

More information

A Knowledge-Poor Approach to BioCreative V DNER and CID Tasks

A Knowledge-Poor Approach to BioCreative V DNER and CID Tasks A Knowledge-Poor Approach to BioCreative V DNER and CID Tasks Firoj Alam 1, Anna Corazza 2, Alberto Lavelli 3, and Roberto Zanoli 3 1 Dept. of Information Eng. and Computer Science, University of Trento,

More information

A System for Labeling Self-Repairs in Speech 1

A System for Labeling Self-Repairs in Speech 1 A System for Labeling Self-Repairs in Speech 1 John Bear, John Dowding, Elizabeth Shriberg, Patti Price 1. Introduction This document outlines a system for labeling self-repairs in spontaneous speech.

More information

POSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition

POSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition POSBIOTM-NER: A Machine Learning Approach for Bio-Named Entity Recognition Yu Song, Eunji Yi, Eunju Kim, Gary Geunbae Lee, Department of CSE, POSTECH, Pohang, Korea 790-784 Soo-Jun Park Bioinformatics

More information

Membering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN

Membering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN PAGE 30 Membering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN Sung-Joon Park, Kyung-Ae Jang, Jae-In Kim, Myoung-Wan Koo, Chu-Shik Jhon Service Development Laboratory, KT,

More information

Survey: Retrieval of Video Using Content (Speech &Text) Information

Survey: Retrieval of Video Using Content (Speech &Text) Information Survey: Retrieval of Video Using Content (Speech &Text) Information Bhagwant B. Handge 1, Prof. N.R.Wankhade 2 1. M.E Student, Kalyani Charitable Trust s Late G.N. Sapkal College of Engineering, Nashik

More information

A Method for Automatic De-identification of Medical Records

A Method for Automatic De-identification of Medical Records A Method for Automatic De-identification of Medical Records Arya Tafvizi MIT CSAIL Cambridge, MA 0239, USA tafvizi@csail.mit.edu Maciej Pacula MIT CSAIL Cambridge, MA 0239, USA mpacula@csail.mit.edu Abstract

More information

Introduction. BM1 Advanced Natural Language Processing. Alexander Koller. 17 October 2014

Introduction. BM1 Advanced Natural Language Processing. Alexander Koller. 17 October 2014 Introduction! BM1 Advanced Natural Language Processing Alexander Koller! 17 October 2014 Outline What is computational linguistics? Topics of this course Organizational issues Siri Text prediction Facebook

More information

Speech Analytics. Whitepaper

Speech Analytics. Whitepaper Speech Analytics Whitepaper This document is property of ASC telecom AG. All rights reserved. Distribution or copying of this document is forbidden without permission of ASC. 1 Introduction Hearing the

More information

Question Answering and Multilingual CLEF 2008

Question Answering and Multilingual CLEF 2008 Dublin City University at QA@CLEF 2008 Sisay Fissaha Adafre Josef van Genabith National Center for Language Technology School of Computing, DCU IBM CAS Dublin sadafre,josef@computing.dcu.ie Abstract We

More information

Sense-Tagging Verbs in English and Chinese. Hoa Trang Dang

Sense-Tagging Verbs in English and Chinese. Hoa Trang Dang Sense-Tagging Verbs in English and Chinese Hoa Trang Dang Department of Computer and Information Sciences University of Pennsylvania htd@linc.cis.upenn.edu October 30, 2003 Outline English sense-tagging

More information

A Survey on Product Aspect Ranking

A Survey on Product Aspect Ranking A Survey on Product Aspect Ranking Charushila Patil 1, Prof. P. M. Chawan 2, Priyamvada Chauhan 3, Sonali Wankhede 4 M. Tech Student, Department of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra,

More information

SENTIMENT EXTRACTION FROM NATURAL AUDIO STREAMS. Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen

SENTIMENT EXTRACTION FROM NATURAL AUDIO STREAMS. Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen SENTIMENT EXTRACTION FROM NATURAL AUDIO STREAMS Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen Center for Robust Speech Systems (CRSS), Eric Jonsson School of Engineering, The University of Texas

More information

tance alignment and time information to create confusion networks 1 from the output of different ASR systems for the same

tance alignment and time information to create confusion networks 1 from the output of different ASR systems for the same 1222 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 16, NO. 7, SEPTEMBER 2008 System Combination for Machine Translation of Spoken and Written Language Evgeny Matusov, Student Member,

More information

Customizing an English-Korean Machine Translation System for Patent Translation *

Customizing an English-Korean Machine Translation System for Patent Translation * Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,

More information

Speech Recognition on Cell Broadband Engine UCRL-PRES-223890

Speech Recognition on Cell Broadband Engine UCRL-PRES-223890 Speech Recognition on Cell Broadband Engine UCRL-PRES-223890 Yang Liu, Holger Jones, John Johnson, Sheila Vaidya (Lawrence Livermore National Laboratory) Michael Perrone, Borivoj Tydlitat, Ashwini Nanda

More information

Simple Language Models for Spam Detection

Simple Language Models for Spam Detection Simple Language Models for Spam Detection Egidio Terra Faculty of Informatics PUC/RS - Brazil Abstract For this year s Spam track we used classifiers based on language models. These models are used to

More information

Efficient Discriminative Training of Long-span Language Models

Efficient Discriminative Training of Long-span Language Models Efficient Discriminative Training of Long-span Language Models Ariya Rastrow, Mark Dredze, Sanjeev Khudanpur Human Language Technology Center of Excellence Center for Language and Speech Processing, Johns

More information

Pattern based approach for Natural Language Interface to Database

Pattern based approach for Natural Language Interface to Database RESEARCH ARTICLE OPEN ACCESS Pattern based approach for Natural Language Interface to Database Niket Choudhary*, Sonal Gore** *(Department of Computer Engineering, Pimpri-Chinchwad College of Engineering,

More information

Scaling Shrinkage-Based Language Models

Scaling Shrinkage-Based Language Models Scaling Shrinkage-Based Language Models Stanley F. Chen, Lidia Mangu, Bhuvana Ramabhadran, Ruhi Sarikaya, Abhinav Sethy IBM T.J. Watson Research Center P.O. Box 218, Yorktown Heights, NY 10598 USA {stanchen,mangu,bhuvana,sarikaya,asethy}@us.ibm.com

More information

INF5820 Natural Language Processing - NLP. H2009 Jan Tore Lønning jtl@ifi.uio.no

INF5820 Natural Language Processing - NLP. H2009 Jan Tore Lønning jtl@ifi.uio.no INF5820 Natural Language Processing - NLP H2009 Jan Tore Lønning jtl@ifi.uio.no Semantic Role Labeling INF5830 Lecture 13 Nov 4, 2009 Today Some words about semantics Thematic/semantic roles PropBank &

More information

Identifying Focus, Techniques and Domain of Scientific Papers

Identifying Focus, Techniques and Domain of Scientific Papers Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 sonal@cs.stanford.edu Christopher D. Manning Department of

More information

THE RWTH ENGLISH LECTURE RECOGNITION SYSTEM

THE RWTH ENGLISH LECTURE RECOGNITION SYSTEM THE RWTH ENGLISH LECTURE RECOGNITION SYSTEM Simon Wiesler 1, Kazuki Irie 2,, Zoltán Tüske 1, Ralf Schlüter 1, Hermann Ney 1,2 1 Human Language Technology and Pattern Recognition, Computer Science Department,

More information

A Survey on Product Aspect Ranking Techniques

A Survey on Product Aspect Ranking Techniques A Survey on Product Aspect Ranking Techniques Ancy. J. S, Nisha. J.R P.G. Scholar, Dept. of C.S.E., Marian Engineering College, Kerala University, Trivandrum, India. Asst. Professor, Dept. of C.S.E., Marian

More information

RRSS - Rating Reviews Support System purpose built for movies recommendation

RRSS - Rating Reviews Support System purpose built for movies recommendation RRSS - Rating Reviews Support System purpose built for movies recommendation Grzegorz Dziczkowski 1,2 and Katarzyna Wegrzyn-Wolska 1 1 Ecole Superieur d Ingenieurs en Informatique et Genie des Telecommunicatiom

More information

Natural Language Processing. Today. Logistic Regression Models. Lecture 13 10/6/2015. Jim Martin. Multinomial Logistic Regression

Natural Language Processing. Today. Logistic Regression Models. Lecture 13 10/6/2015. Jim Martin. Multinomial Logistic Regression Natural Language Processing Lecture 13 10/6/2015 Jim Martin Today Multinomial Logistic Regression Aka log-linear models or maximum entropy (maxent) Components of the model Learning the parameters 10/1/15

More information

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,

More information

Interactive Dynamic Information Extraction

Interactive Dynamic Information Extraction Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken

More information

PoS-tagging Italian texts with CORISTagger

PoS-tagging Italian texts with CORISTagger PoS-tagging Italian texts with CORISTagger Fabio Tamburini DSLO, University of Bologna, Italy fabio.tamburini@unibo.it Abstract. This paper presents an evolution of CORISTagger [1], an high-performance

More information

OPTIMIZATION OF NEURAL NETWORK LANGUAGE MODELS FOR KEYWORD SEARCH. Ankur Gandhe, Florian Metze, Alex Waibel, Ian Lane

OPTIMIZATION OF NEURAL NETWORK LANGUAGE MODELS FOR KEYWORD SEARCH. Ankur Gandhe, Florian Metze, Alex Waibel, Ian Lane OPTIMIZATION OF NEURAL NETWORK LANGUAGE MODELS FOR KEYWORD SEARCH Ankur Gandhe, Florian Metze, Alex Waibel, Ian Lane Carnegie Mellon University Language Technology Institute {ankurgan,fmetze,ahw,lane}@cs.cmu.edu

More information

The Prolog Interface to the Unstructured Information Management Architecture

The Prolog Interface to the Unstructured Information Management Architecture The Prolog Interface to the Unstructured Information Management Architecture Paul Fodor 1, Adam Lally 2, David Ferrucci 2 1 Stony Brook University, Stony Brook, NY 11794, USA, pfodor@cs.sunysb.edu 2 IBM

More information

Form. Settings, page 2 Element Data, page 7 Exit States, page 8 Audio Groups, page 9 Folder and Class Information, page 9 Events, page 10

Form. Settings, page 2 Element Data, page 7 Exit States, page 8 Audio Groups, page 9 Folder and Class Information, page 9 Events, page 10 The voice element is used to capture any input from the caller, based on application designer-specified grammars. The valid caller inputs can be specified either directly in the voice element settings

More information

Combining Slot-based Vector Space Model for Voice Book Search

Combining Slot-based Vector Space Model for Voice Book Search Combining Slot-based Vector Space Model for Voice Book Search Cheongjae Lee and Tatsuya Kawahara and Alexander Rudnicky Abstract We describe a hybrid approach to vector space model that improves accuracy

More information

Online Latent Structure Training for Language Acquisition

Online Latent Structure Training for Language Acquisition IJCAI 11 Online Latent Structure Training for Language Acquisition Michael Connor University of Illinois connor2@illinois.edu Cynthia Fisher University of Illinois cfisher@cyrus.psych.uiuc.edu Dan Roth

More information

Predicting the Stock Market with News Articles

Predicting the Stock Market with News Articles Predicting the Stock Market with News Articles Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is

More information