Strongly Incremental Repair Detection

Julian Hough 1,2
1 Dialogue Systems Group, Faculty of Linguistics and Literature, Bielefeld University
julian.hough@uni-bielefeld.de

Matthew Purver 2
2 Cognitive Science Research Group, School of Electronic Engineering and Computer Science, Queen Mary University of London
m.purver@qmul.ac.uk

Abstract

We present STIR (STrongly Incremental Repair detection), a system that detects speech repairs and edit terms on transcripts incrementally with minimal latency. STIR uses information-theoretic measures from n-gram models as its principal decision features in a pipeline of classifiers detecting the different stages of repairs. Results on the Switchboard disfluency tagged corpus show utterance-final accuracy on a par with state-of-the-art incremental repair detection methods, but with better incremental accuracy, faster time-to-detection and less computational overhead. We evaluate its performance using incremental metrics and propose new repair processing evaluation standards.

1 Introduction

Self-repairs in spontaneous speech are annotated according to a well established three-phase structure from (Shriberg, 1994) onwards, and as described in Meteer et al. (1995)'s Switchboard corpus annotation handbook:

    John [ likes + {uh} loves ] Mary    (1)

where "likes" is the reparandum, "{uh}" the interregnum and "loves" the repair.

From a dialogue systems perspective, detecting repairs and assigning them the appropriate structure is vital for robust natural language understanding (NLU) in interactive systems. Downgrading the commitment of reparandum phases and assigning appropriate interregnum and repair phases permits computation of the user's intended meaning. Furthermore, the recent focus on incremental dialogue systems (see e.g. (Rieser and Schlangen, 2011)) means that repair detection should operate without unnecessary processing overhead, and function efficiently within an incremental framework. However, such left-to-right operability on its own is not sufficient: in line with the principle of strong incremental interpretation (Milward, 1991), a repair detector should give the best results possible as early as possible. With one exception (Zwarts et al., 2010), there has been no focus on evaluating or improving the incremental performance of repair detection.

In this paper we present STIR (STrongly Incremental Repair detection), a system which addresses the challenges of incremental accuracy, computational complexity and latency in self-repair detection, by making local decisions based on relatively simple measures of fluency and similarity. Section 2 reviews state-of-the-art methods; Section 3 summarizes the challenges and explains our general approach; Section 4 explains STIR in detail; Section 5 explains our experimental set-up and novel evaluation metrics; Section 6 presents and discusses our results; and Section 7 concludes.

2 Previous work

Qian and Liu (2013) achieve the state of the art in Switchboard corpus self-repair detection, with the best published F-score for detecting reparandum words, using a three-step weighted Max-Margin Markov network approach. Similarly, Georgila (2009) uses Integer Linear Programming post-processing of a CRF to achieve F-scores over 0.8 for reparandum start and repair start detection. However, neither approach can operate incrementally.
Recently, there has been increased interest in left-to-right repair detection: Rasooli and Tetreault (2014) and Honnibal and Johnson (2014) present dependency parsing systems with reparandum detection which perform similarly, the latter equalling Qian and Liu (2013)'s F-score. However, while operating left-to-right, these systems are not designed or evaluated for their incremental performance.

The use of beam search over different repair hypotheses in (Honnibal and Johnson, 2014) is likely to lead to unstable repair label sequences, and they report repair hypothesis jitter. Both of these systems use a non-monotonic dependency parsing approach that immediately removes the reparandum from the linguistic analysis of the utterance, in terms of its dependency structure and repair-reparandum correspondence, which from a downstream NLU module's perspective is undesirable. Heeman and Allen (1999) and Miller and Schuler (2008) present earlier left-to-right operational detectors which are less accurate and again give no indication of the incremental performance of their systems. While Heeman and Allen (1999) rely on repair structure template detection coupled with a multi-knowledge-source language model, the rarity of the tail of repair structures is likely to be the reason for lower performance: Hough and Purver (2013) show that only 39% of repair alignment structures appear at least twice in Switchboard, supported by the 29% reported by Heeman and Allen (1999) on the smaller TRAINS corpus. Miller and Schuler (2008)'s encoding of repairs into a grammar also causes sparsity in training: repair is a general processing strategy not restricted to certain lexical items or POS tag sequences.

The model we consider most suitable for incremental dialogue systems so far is Zwarts et al. (2010)'s incremental version of Johnson and Charniak (2004)'s noisy channel repair detector, as it incrementally applies structural repair analyses (rather than just identifying reparanda) and is evaluated for its incremental properties. Following (Johnson and Charniak, 2004), their system uses an n-gram language model trained on roughly 100K utterances of reparandum-excised ("clean") Switchboard data. Its channel model is a statistically-trained S-TAG parser whose grammar has simple reparandum-repair alignment rule categories for its non-terminals (copy, delete, insert, substitute) and words for its terminals. The parser hypothesises all possible repair structures for the string consumed so far in a chart, before pruning the unlikely ones. It performs equally well to the non-incremental model by the end of each utterance (F-score = 0.778), and can make detections early via the addition of a speculative next-word repair completion category to their S-TAG non-terminals.

In terms of incremental performance, they report the novel evaluation metric of time-to-detection for correctly identified repairs, achieving an average of 7.5 words from the start of the reparandum and 4.6 from the start of the repair phase. They also introduce delay accuracy, a word-by-word evaluation against gold-standard disfluency tags up to the word before the current word being consumed (in their terms, the prefix boundary), giving a measure of the stability of the repair hypotheses. They report the F-score at one word back from the current prefix boundary, increasing word-by-word until 6 words back. These results are the point of departure for our work.

3 Challenges and Approach

In this section we summarize the challenges for incremental repair detection: computational complexity, repair hypothesis stability, latency of detection, and repair structure identification. In 3.1 we explain how we address these.
Computational complexity. Approaches to detecting repair structures often use chart storage (Zwarts et al., 2010; Johnson and Charniak, 2004; Heeman and Allen, 1999), which poses a computational overhead: if considering all possible boundary points for a repair structure's 3 phases beginning on any word, for prefixes of length n the number of hypotheses can grow in the order O(n^4). Exploring a subset of this space is necessary for assigning entire repair structures as in (1) above, rather than just detecting reparanda: the (Johnson and Charniak, 2004; Zwarts et al., 2010) noisy-channel detector is the only system that applies such structures, but the potential runtime complexity of decoding these with their S-TAG repair parser is O(n^5). In their approach, complexity is mitigated by imposing a maximum repair length (12 words), and also by using beam search with re-ranking (Lease et al., 2006; Zwarts and Johnson, 2011). If we wish to include full decoding of the repair's structure (argued by Hough and Purver (2013) to be necessary for full interpretation) whilst taking a strictly incremental and time-critical perspective, reducing this complexity by minimizing the size of this search space is crucial.

Stability of repair hypotheses and latency. Using a beam search of n-best hypotheses on a word-by-word basis can cause jitter in the detector's output. While utterance-final accuracy is desired,

for a truly incremental system good intermediate results are equally important. Zwarts et al. (2010)'s time-to-detection results show their system is only certain about a detection after processing the entire repair. This may be due to the string-alignment-inspired S-TAG that matches repair and reparanda: a rough copy dependency only becomes likely once the entire repair has been consumed. The latency of 4.6 words to detection and a relatively slow rise to utterance-final accuracy up to 6 words back is undesirable given repairs have a mean reparandum length of 1.5 words (Hough and Purver, 2013; Shriberg and Stolcke, 1998).

Structural identification. Classifying repairs has been ignored in repair processing, despite the presence of distinct categories (e.g. repeats, substitutions, deletes) with different pragmatic effects (Hough and Purver, 2013).[1] This is perhaps due to lack of clarity in definition: even for human annotators, verbatim repeats notwithstanding, agreement is often poor (Hough and Purver, 2013; Shriberg, 1994). Assigning and evaluating repair (not just reparandum) structures will allow repair interpretation in future; however, work to date evaluates only reparandum detection.

3.1 Our approach

To address the above, we propose an alternative to (Johnson and Charniak, 2004; Zwarts et al., 2010)'s noisy channel model. While the model elegantly captures intuitions about parallelism in repairs and modelling fluency, it relies on string-matching, motivated in a similar way to automatic spelling correction (Brill and Moore, 2000): it assumes a speaker chooses to utter fluent utterance X according to some prior distribution P(X), but a noisy channel causes them instead to utter a noisy Y according to channel model P(Y|X). Estimating P(Y|X) directly from observed data is difficult due to sparsity of repair instances, so a transducer is trained on the rough copy alignments between reparandum and repair. This approach succeeds because repetition and simple substitution repairs are very common; but repair as a psychological process is not driven by string alignment, and deletes, restarts and rarer substitution forms are not captured. Furthermore, the noisy channel model assumes an inherently utterance-global process for generating (and therefore finding) an underlying clean string, much as similar spelling correction models are word-global; we instead take a very local perspective here.

In accordance with psycholinguistic evidence (Brennan and Schober, 2001), we assume characteristics of the repair onset allow hearers to detect it very quickly and solve the continuation problem (Levelt, 1983) of integrating the repair into their linguistic context immediately, before processing or even hearing the end of the repair phase. While repair onsets may take the form of interregna, this is not a reliable signal, occurring in only 15% of repairs (Hough and Purver, 2013; Heeman and Allen, 1999). Our repair onset detection is therefore driven by departures from fluency, via information-theoretic features derived incrementally from a language model, in line with recent psycholinguistic accounts of incremental parsing; see (Keller, 2004; Jaeger and Tily, 2011).

[1] Though see (Germesin et al., 2008) for one approach, albeit using idiosyncratic repair categories.
Considering the time-linear way a repair is processed, and the fact that speakers are exponentially less likely to trace one word further back in a repair as utterance length increases (Shriberg and Stolcke, 1998), backwards search seems to be the most efficient reparandum extent detection method.[2] Features determining the detection of the reparandum extent in the backwards search can also be information-theoretic: entropy measures of distributional parallelism can characterize not only rough copy dependencies, but also distributionally similar or dissimilar correspondences between sequences. Finally, when detecting the repair end and structure, distributional information allows computation of the similarity between reparandum and repair. We argue a local-detection-with-backtracking approach is more cognitively plausible than string-based left-to-right repair labelling, and using this insight should allow an improvement in incremental accuracy, stability and time-to-detection over string-alignment driven approaches to repair detection.

[2] We acknowledge a purely position-based model for reparandum extent detection under-estimates prepositions, which speakers favour as the retrace start, and over-estimates verbs, which speakers tend to avoid retracing back to, preferring to begin the utterance again, as (Healey et al., 2011)'s experiments also demonstrate.

4 STIR: Strongly Incremental Repair detection

Our system, STIR (STrongly Incremental Repair detection), therefore takes a local incremental approach to detecting repairs and isolated edit terms, assigning words the structures in (2). We include interregnum recognition in the process, due to the inclusion of interregnum vocabulary within edit term vocabulary (Ginzburg, 2012; Hough and Purver, 2013), a useful feature for repair detection (Lease et al., 2006; Qian and Liu, 2013).

    John  likes  uh  loves  Mary
    ...  [rm_start/rm_end  + {e}  rp_end]  ...    (2)

Rather than detecting the repair structure in its left-to-right string order as above, STIR functions as in Figure 1: first detecting edit terms (possibly interregna) at step T1; then detecting repair onsets at T2; if one is found, backwards searching to find rm_start at T3; then finally finding the repair end rp_end at T4. Step T1 relies mainly on lexical probabilities from an edit term language model; T2 exploits features of divergence from a fluent language model; T3 uses the fluency of hypothesised repairs; and T4 the similarity between distributions after reparandum and repair. However, each stage integrates these basic insights via multiple related features in a statistical classifier.

Figure 1: Strongly Incremental Repair Detection, shown word-by-word on the utterance "John likes uh loves Mary" (incremental word graph states S_0-S_5, detection steps T0-T5).

4.1 Enriched incremental language models

We derive the basic information-theoretic features required using n-gram language models, as they have a long history of information-theoretic analysis (Shannon, 1948) and provide reproducible results without forcing commitment to one particular grammar formalism. Following recent work on modelling grammaticality judgements (Clark et al., 2013), we implement several modifications to standard language models to develop our basic measures of fluency and uncertainty.

For our main fluent language models we train a trigram model with Kneser-Ney smoothing (Kneser and Ney, 1995) on the words and POS tags of the standard Switchboard training data (all files with conversation numbers beginning sw2*, sw3* in the Penn Treebank III release), consisting of 100K utterances and 600K words. We follow (Johnson and Charniak, 2004) by cleaning the data of disfluencies (i.e. edit terms and reparanda) to approximate a fluent language model. We call these probabilities p_kn^lex and p_kn^pos below.[3]

[3] We suppress the pos and lex superscripts below where we refer to measures from either model.
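The paper does not specify which toolkit was used to train these models, so the following is only a minimal sketch of the setup, assuming NLTK's language-model package; the corpus-loading helper `load_cleaned_switchboard` and its path argument are hypothetical placeholders, not part of STIR.

```python
# Sketch (not the authors' code): train the "fluent" Kneser-Ney trigram model
# on disfluency-cleaned Switchboard utterances with NLTK's lm package.
from nltk.lm import KneserNeyInterpolated
from nltk.lm.preprocessing import padded_everygram_pipeline

def train_fluent_lm(utterances, order=3):
    """utterances: iterable of token lists with reparanda and edit terms removed,
    e.g. [['john', 'loves', 'mary'], ...]."""
    train_ngrams, vocab = padded_everygram_pipeline(order, utterances)
    lm = KneserNeyInterpolated(order)
    lm.fit(train_ngrams, vocab)
    return lm

# cleaned = load_cleaned_switchboard("swbd_train_sw2_sw3/")  # hypothetical loader
# lex_lm = train_fluent_lm(cleaned)
# log (base 2) probability of a word given its trigram history:
# lex_lm.logscore("loves", ["john", "likes"])
```

The same routine would be run a second time over the POS tag sequences to obtain the p_kn^pos model.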

We then derive surprisal as our principal default lexical uncertainty measurement s (equation (3)) in both models; and, following (Clark et al., 2013), the unigram-Weighted Mean Log trigram probability (WML, eq. (4)): the trigram log probability of the sequence divided by the inverse summed log probability of the component unigrams (apart from the first two words in the sequence, which serve as the first trigram history). As here we use a local approach, we restrict the WML measures to single trigrams (weighted by the inverse log probability of the final word). While use of standard n-gram probability conflates syntactic with lexical probability, WML gives us an approximation to incremental syntactic probability by factoring out lexical frequency.

s(w_{i-2} \dots w_i) = -\log_2 p_{kn}(w_i \mid w_{i-2}, w_{i-1})    (3)

WML(w_0 \dots w_n) = \frac{\sum_{i=2}^{n} \log_2 p_{kn}(w_i \mid w_{i-2}, w_{i-1})}{\sum_{j=2}^{n} \log_2 p_{kn}(w_j)}    (4)

Distributional measures. To approximate uncertainty, we also derive the entropy H(w | c) of the possible word continuations w given a context c, from p(w_i | c) for all words w_i in the vocabulary; see (5). Calculating distributions over the entire lexicon incrementally is costly, so we approximate this by constraining the calculation to words which are observed at least once in context c in training, w_c = {w : count(c, w) >= 1}, assuming a uniform distribution over the unseen continuations by using the appropriate smoothing constant; see eq. (6). Manual inspection showed this approximation to be very close, and the trie structure of our n-gram models allows efficient calculation. We also make use of the Zipfian distribution of n-grams in corpora by storing entropy values for the 20% most common trigram contexts observed in training, leaving entropy values of rare or unseen contexts to be computed at decoding time with little search cost due to their small or empty w_c sets.

H(w \mid c) = -\sum_{w \in Vocab} p_{kn}(w \mid c) \log_2 p_{kn}(w \mid c)    (5)

H(w \mid c) \approx -\left[ \sum_{w \in w_c} p_{kn}(w \mid c) \log_2 p_{kn}(w \mid c) + n \lambda \log_2 \lambda \right], \quad n = |Vocab| - |w_c|, \quad \lambda = \frac{1 - \sum_{w \in w_c} p_{kn}(w \mid c)}{n}    (6)

Given entropy estimates, we can also similarly approximate the Kullback-Leibler (KL) divergence (relative entropy) between the continuation distributions in two different contexts c_1 and c_2, i.e. \theta(w \mid c_1) and \theta(w \mid c_2), by pair-wise computing p(w \mid c_1) \log_2 \frac{p(w \mid c_1)}{p(w \mid c_2)} only for words observed in context c_1 or c_2, then approximating unseen values by assuming uniform distributions. Using smoothed p_kn estimates rather than raw maximum likelihood estimates avoids infinite KL divergence values. Again, we found this approximation sufficiently close to the real values for our purposes.

All such probability and distribution values are stored in incrementally constructed directed acyclic graph (DAG) structures (see Figure 1), exploiting the Markov assumption of n-gram models to allow efficient calculation by avoiding re-computation.

4.2 Individual classifiers

This section details the features used by the 4 individual classifiers. To investigate the utility of the features used in each classifier we obtained values on the standard Switchboard heldout data (PTB III files sw4[5-9]*: 6.4K utterances, 49K words).

Edit term detection

In the first component, we utilise the well-known observation that edit terms have a distinctive vocabulary (Ginzburg, 2012), training a bigram model on a corpus of all edit words annotated in Switchboard's training data. The classifier simply uses the surprisal s^lex from this edit word model, and the trigram surprisal s^lex from the standard fluent model of Section 4.1.
At the current position w_n, one, both or none of the words w_n and w_{n-1} are classified as edits. We found this simple approach effective and stable, although some delayed decisions occur in cases where s^lex and WML^lex are high in both models before the end of the edit term, e.g. "I like I {like} want ...". Words classified as edits are removed from the incremental processing graph (indicated by the dotted line transition in Figure 1) and the repair hypothesis stack updated if repair hypotheses are cancelled due to a delayed edit hypothesis of w_n.

Repair start detection

Repair onset detection is arguably the most crucial component: the greater its accuracy, the better the input for downstream components and the lesser the overhead of filtering false positives required.
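Before detailing the repair onset features, here is a minimal sketch of the information-theoretic measures of Section 4.1 that all the classifiers draw on. The `logprob`, `unigram_logprob` and `prob` callables, the `observed` continuation set and `vocab_size` are caller-supplied assumptions (e.g. thin wrappers over the trigram models sketched above), not STIR's actual interface.

```python
import math

def surprisal(logprob, w, context):
    """Eq. (3): s = -log2 p(w | context); `logprob` returns log2 p(w | context)."""
    return -logprob(w, context)

def wml_single_trigram(logprob, unigram_logprob, w, context):
    """Eq. (4) restricted to one trigram: the trigram log-prob weighted by the
    inverse unigram log-prob of the final word."""
    return logprob(w, context) / unigram_logprob(w)

def entropy_restricted(prob, context, observed, vocab_size):
    """Eq. (6): continuation entropy over words observed in this context, with
    the remaining probability mass spread uniformly over unseen words."""
    h = -sum(prob(w, context) * math.log2(prob(w, context)) for w in observed)
    n_unseen = vocab_size - len(observed)
    if n_unseen > 0:
        lam = max(1.0 - sum(prob(w, context) for w in observed), 0.0) / n_unseen
        if lam > 0:
            h -= n_unseen * lam * math.log2(lam)
    return h

def kl_restricted(prob, c1, c2, observed):
    """Approximate KL(theta(.|c1) || theta(.|c2)) over words observed in either
    context; smoothed probabilities keep the ratio finite."""
    return sum(prob(w, c1) * math.log2(prob(w, c1) / prob(w, c2)) for w in observed)
```

Features such as ΔWML, ΔH and WMLboost below are then simple differences of these quantities at neighbouring word positions or between alternative backtrack hypotheses.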

Figure 2: WML^lex values for trigrams in the repair utterance "i havent had any good really very good experience with child care", exhibiting the drop at the repair onset.

We use Section 4.1's information-theoretic features s, WML and H for words and POS, and introduce 5 additional information-theoretic features: ΔWML is the difference between the WML values at w_{n-1} and w_n; ΔH is the difference in entropy between w_{n-1} and w_n; InformationGain is the difference between the expected entropy at w_{n-1} and the observed surprisal s at w_n, a measure that factors out the effect of naturally high entropy contexts; BestEntropyReduce is the best reduction in entropy possible by an early rough hypothesis of reparandum onsets within 3 words; and BestWMLBoost similarly speculates on the best improvement of WML possible by positing rm_start positions up to 3 words back. We also include simple alignment features: binary features which indicate whether the word w_{i-x} is identical to the current word w_i, for x in {1, 2, 3}. With 6 alignment features, 16 n-gram features and a single logical feature "edit" which indicates the presence of an edit word at position w_{i-1}, repair onset detection uses 23 features; see Table 1.

We hypothesised repair onsets would have significantly lower p^lex (lower lexical-syntactic probability) and WML^lex (lower syntactic probability) than other fluent trigrams. This was the case in the Switchboard heldout data for both measures, with the biggest difference obtained for WML^lex (non-repair-onsets: sd=0.359; repair onsets: sd=0.359). In the POS model, the entropy of continuations H^pos was the strongest feature (non-repair-onsets: sd=0.769; repair onsets: sd=0.899). The trigram WML^lex measure for the repair utterance "I haven't had any [ good + really very good ] experience with child care" can be seen in Figure 2. The steep drop at the repair onset shows the usefulness of WML features as fluency measures.

To compare n-gram measures against other local features, we ranked the features by Information Gain using 10-fold cross-validation over the Switchboard heldout data; see Table 1. The language model features are far more discriminative than the alignment features, showing the potential of a general information-theoretic approach.

Reparandum start detection

In detecting rm_start positions given a repair onset hypothesis (stage T3 in Figure 1), we use the noisy channel intuition that removing the reparandum (from rm_start up to the repair onset) increases the fluency of the utterance, expressed here as the WMLboost described above. When using gold standard input we found this was the case on the heldout data, with a positive mean WMLboost for reparandum onsets (sd=0.267) and a negative mean for other words in the 6-word history (sd=0.224): the negative boost for non-reparandum words captures the intuition that backtracking from those points would make the utterance less grammatical, and conversely the boost afforded by the correct rm_start detection helps solve the continuation problem for the listener (and our detector). Parallelism in the onsets of the repair and the reparandum can also help solve the continuation problem, and in fact the KL divergence between the POS continuation distributions θ^pos at the reparandum onset and at the repair onset is the second most useful feature in cross-validation.

The highest ranked feature is ΔWML (0.437), which here encodes the drop in the WMLboost from one backtrack position to the next. In ranking the 32 features we use, again the information-theoretic ones are ranked higher than the logical features.

average rank | attribute
1    | H pos
2    | WML pos
3.4  | WML lex
4    | s pos
5.9  | w_{i-1} = w_i
5.9  | BestWMLBoost lex
5.9  | InformationGain pos
7.9  | BestWMLBoost pos
9    | H lex
10.4 | WML pos (merit 0.08)
10.6 | H pos (merit 0.08)
12   | POS_{i-1} = POS_i
13.1 | s lex
14.2 | WML lex
14.7 | BestEntropyReduce pos
16.3 | InformationGain lex
16.7 | BestEntropyReduce lex
18   | H lex
19   | w_{i-2} = w_i
20   | POS_{i-2} = POS_i (merit 0.01)
21   | w_{i-3} = w_i
22   | edit
23   | POS_{i-3} = POS_i

Table 1: Feature ranking by Information Gain for repair onset detection, 10-fold cross-validation on the Switchboard heldout data.

Repair end detection and structure classification

For rp_end detection, using the notion of parallelism, we hypothesise an effect of divergence between θ^lex at the reparandum-final word rm_end and the repair-final word rp_end: for repetition repairs, the KL divergence will trivially be 0; for substitutions, it will be higher; for deletes, higher still. Upon inspection of our feature ranking this KL measure ranked 5th out of 23 features. We introduce another feature encoding parallelism, ReparandumRepairDifference: the difference in probability between the utterance cleaned of the reparandum and the utterance with its repair phase substituting its reparandum. In both the POS (merit=0.366) and word (merit=0.352) LMs, this was the most discriminative feature.

4.3 Classifier pipeline

STIR effects a pipeline of classifiers as in Figure 3, where the edit term classifier only permits non-edit words to be passed on to repair onset classification and to rp_end classification of the active repair hypotheses, maintained in a stack. The repair onset classifier passes positive repair onset hypotheses to the rm_start classifier, which backwards-searches up to 7 words back in the utterance. If an rm_start is classified, the output is passed on for rp_end classification at the end of the pipeline, and if not rejected there it is pushed onto the repair stack. Repair hypotheses are popped off when the string is 7 words beyond their onset position. Putting limits on the stack's storage space is a way of controlling processing overhead and complexity. Embedded repairs whose rm_start coincides with another's are easily dealt with, as they are added to the stack as separate hypotheses.[4]

Classifiers. Classifiers are implemented using Random Forests (Breiman, 2001), and we use different error functions for each stage using MetaCost (Domingos, 1999). The flexibility afforded by implementing adjustable error functions in a pipelined incremental processor allows control of the trade-off of immediate accuracy against runtime and stability of the sequence classification.

Processing complexity. This pipeline avoids an exhaustive search over all repair hypotheses. If we limit the search to a single repair onset hypothesis per word, with a backwards search over the possible rm_start positions, the number of repair hypotheses grows in the triangular number series, i.e. approximately n(n+1)/2, a nested loop over previous words as n gets incremented, which in terms of its complexity class is quadratic, O(n^2). If we allow more than one rm_start hypothesis per word, the complexity goes up to O(n^3); however, in the tests that we describe below, we are able to achieve good detection results without permitting this extra search space.
Under our assumption that reparandum onset detection is only triggered after repair onset detection, and repair extent detection is dependent on positive reparandum onset detection, a pipeline with accurate components will allow us to limit processing to a small subset of this search space.

[4] We constrain the problem not to include embedded deletes which may share their onset word with another repair; these are in practice very rare.
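The control flow of Sections 4.2-4.3 can be summarised in a short sketch. The classifier bundle `clf` and its method names are hypothetical stand-ins for STIR's trained Random Forest components; only the stack handling and the 7-word backtracking window follow the description above.

```python
# Minimal pipeline sketch (not the released STIR code): edit filter -> repair
# onset detection -> backwards search for rm_start -> rp_end classification,
# with a stack of pending repair hypotheses bounded by a 7-word window.
MAX_BACKTRACK = 7

class RepairHypothesis:
    def __init__(self, rp_start, rm_start=None, rp_end=None):
        self.rp_start, self.rm_start, self.rp_end = rp_start, rm_start, rp_end

def process_word(i, words, stack, clf):
    """clf bundles the four (hypothetical) classifiers: is_edit, is_rp_start,
    is_rm_start and is_rp_end, each wrapping a trained model."""
    if clf.is_edit(words, i):               # T1: edit terms never enter the pipeline
        return
    if clf.is_rp_start(words, i):           # T2: departure-from-fluency features
        hyp = RepairHypothesis(rp_start=i)
        # T3: backwards search for the reparandum onset, at most 7 words back
        for j in range(i - 1, max(i - 1 - MAX_BACKTRACK, -1), -1):
            if clf.is_rm_start(words, j, hyp):
                hyp.rm_start = j
                break
        if hyp.rm_start is not None:
            stack.append(hyp)
    for hyp in list(stack):                  # T4: try to close open hypotheses
        if clf.is_rp_end(words, i, hyp):
            hyp.rp_end = i
            stack.remove(hyp)
        elif i - hyp.rp_start > MAX_BACKTRACK:
            stack.remove(hyp)                # prune stale hypotheses (bounded stack)
```

Under the 1-best condition the inner loops touch at most a constant number of positions per word, which is what keeps the per-word processing overhead low.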

Figure 3: Classifier pipeline.

5 Experimental set-up

We train STIR on the Switchboard data described above, and test it on the standard Switchboard test data (PTB III files 4[0-1]*). In order to avoid overfitting the classifiers to the basic language models, we use a cross-fold training approach: we divide the corpus into 10 folds and use language models trained on 9 folds to obtain feature values for the 10th fold, repeating for all 10. Classifiers are then trained as standard on the resulting feature-annotated corpus. This resulted in better feature utility for the n-grams and better F-score results for detection in all components, in the order of 5-6%.[5]

Training the classifiers. Each Random Forest classifier was limited to 20 trees of maximum depth 4 nodes, putting a ceiling on decoding time. In making the classifiers cost-sensitive, MetaCost resamples the data in accordance with the cost functions: we found using 10 iterations over a resample of 25% of the training data gave the most effective trade-off between training time and accuracy.[6] We use 8 different cost functions in repair onset classification with differing costs for false negatives and false positives, of the form below, where R is a repair element word and F is a fluent word:

             R_hyp   F_hyp
    R_gold     0       2

We adopt a similar technique in rm_start classification using 5 different cost functions and in rp_end classification using 8 different settings, which when combined gives a total of 320 different cost function configurations. We hypothesise that higher recall permitted in the pipeline's first components would result in better overall accuracy as these hypotheses become refined, though at the cost of the stability of the hypothesis sequence and extra downstream processing in pruning false positives.

We also experiment with the number of repair hypotheses permitted per word, using limits of 1-best and 2-best hypotheses. We expect that allowing 2 hypotheses to be explored per repair onset should allow greater final accuracy, but with the trade-off of greater decoding and training complexity, and possible incremental instability. As we wish to explore the incrementality versus final accuracy trade-off that STIR can achieve, we now describe the evaluation metrics we employ.

[5] Zwarts and Johnson (2011) take a similar approach on Switchboard data to train a re-ranker of repair analyses.
[6] As (Domingos, 1999) demonstrated, there are only relatively small accuracy gains when using more than this, with training time increasing in the order of the re-sample size.

5.1 Incremental evaluation metrics

Following (Baumann et al., 2011) we divide our evaluation metrics into similarity metrics (measures of equality with or similarity to a gold standard), timing metrics (measures of the timing of relevant phenomena detected against the gold standard) and diachronic metrics (the evolution of incremental hypotheses over time).

Similarity metrics. For direct comparison to previous approaches we use the standard measure of overall accuracy, the F-score over reparandum words, which we abbreviate F_rm; see (7):

precision = \frac{|rm_{correct}|}{|rm_{hyp}|}, \quad recall = \frac{|rm_{correct}|}{|rm_{gold}|}, \quad F_{rm} = \frac{2 \cdot precision \cdot recall}{precision + recall}    (7)

As we are also interested in repair structural classification, we also measure the F-score over all repair components (rm words, edit words acting as interregna, and rp words), a metric we abbreviate F_s. This is not measured in standard repair detection work on Switchboard.
To investigate incremental accuracy we evaluate the delay accuracy (DA) introduced by (Zwarts et al., 2010), as described in Section 2, against the utterance-final gold standard disfluency annotations, and use the mean of the 6 word-by-word F-scores (from 1 to 6 words back from the prefix boundary).
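A minimal sketch of these two similarity metrics follows, assuming disfluency tags are represented as per-word label sequences ("rm", "e", "rp" or "f" for fluent); this label inventory and the helper names are illustrative assumptions rather than the Switchboard annotation scheme.

```python
# Sketch of the similarity metrics: F-score over reparandum words (eq. (7)) and
# delay accuracy, i.e. word-by-word F-scores k words back from the prefix
# boundary, averaged over k = 1..6.
def f_score_rm(hyp_tags, gold_tags, target="rm"):
    hyp = {i for i, t in enumerate(hyp_tags) if t == target}
    gold = {i for i, t in enumerate(gold_tags) if t == target}
    if not hyp or not gold:
        return 0.0
    correct = len(hyp & gold)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(hyp), correct / len(gold)
    return 2 * precision * recall / (precision + recall)

def delay_accuracy(incremental_outputs, gold_tags, max_delay=6):
    """incremental_outputs[n] is the tag sequence hypothesised after consuming
    word n; the tag at position n-k is compared against the gold standard."""
    scores = []
    for k in range(1, max_delay + 1):
        hyp_at_k, gold_at_k = [], []
        for n, output in enumerate(incremental_outputs):
            if n - k >= 0:
                hyp_at_k.append(output[n - k])
                gold_at_k.append(gold_tags[n - k])
        scores.append(f_score_rm(hyp_at_k, gold_at_k))
    return sum(scores) / len(scores)
```

In practice these counts would be accumulated over the whole test corpus rather than per utterance, but the per-utterance version shows the shape of the computation.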

Figure 4: Edit Overhead example for the incremental input "John likes uh loves Mary": 4 unnecessary edits.

Timing and resource metrics. Again for comparative purposes we use Zwarts et al.'s time-to-detection metrics, that is, the two average distances (in numbers of words) consumed before first detection of gold standard repairs, one from rm_start (TD_rm) and one from the repair onset (TD_rp). In our 1-best detection system, before evaluation we know a priori that TD_rp will be 1 token, and TD_rm will be 1 more than the average length of the rm_start to rp_start spans correctly detected. However, when we introduce a beam where multiple rm_starts are possible per repair onset, with the most likely hypothesis committed as the current output, the latency may begin to increase: the initially most probable hypothesis may not be the correct one. In addition to output timing metrics, we account for intrinsic processing complexity with the metric processing overhead (PO), which is the number of classifications made by all components per word of input.

Diachronic metrics. To measure the stability of repair hypotheses over time we use (Baumann et al., 2011)'s edit overhead (EO) metric. EO measures the proportion of edits (add, revoke, substitute) applied to a processor's output structure that are unnecessary. STIR's output is the repair label sequence shown in Figure 1; however, rather than evaluating its EO against the current gold standard labels, we use a new mark-up we term the incremental repair gold standard: this does not penalise lack of detection of a reparandum word rm as a bad edit until the corresponding repair onset of that rm has been consumed. While F_rm, F_s and DA evaluate against what Baumann et al. (2011) call the current gold standard, the incremental gold standard reflects the repair processing approach we set out in Section 3 (a sketch of the EO computation is given below). An example of a repair utterance with an EO of 44% (4/9) can be seen in Figure 4: of the 9 edits (7 repair annotations and 2 correct fluent words), 4 are unnecessary (bracketed). Note the final rm edit is not counted as a bad edit for the reasons just given.

Figure 6: Delay Accuracy curves.

6 Results and Discussion

We evaluate on the Switchboard test data; Table 2 shows results of the best performing settings for each of the metrics described above, together with the setting achieving the highest total score (TS), the average percentage achieved of the best performing system's result in each metric.[7] The settings found to achieve the highest F_rm (the metric standardly used in disfluency detection), and those found to achieve the highest TS, for each stage in the pipeline, are shown in Figure 5.

Our experiments show that different system settings perform better on different metrics, and no individual setting achieved the best result in all of them. Our best utterance-final F_rm reaches 0.779, marginally though not significantly exceeding (Zwarts et al., 2010)'s measure, and STIR also reports results on the previously unevaluated F_s. The setting with the best DA improves significantly on (Zwarts et al., 2010)'s result in terms of mean values (0.718 for STIR), and also in terms of the steepness of the curves (Figure 6). The fastest average time to detection is 1 word for TD_rp and 2.6 words for TD_rm (Table 3), improving dramatically on the noisy channel model's 4.6 and 7.5 words.
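As flagged above, the edit overhead computation can be sketched as follows, diffing successive incremental label sequences and counting edits that the final output did not need. The per-position diffing scheme is a simplifying assumption, not Baumann et al.'s exact formulation or the incremental repair gold standard variant.

```python
# Sketch of edit overhead (EO): the proportion of edits applied to the
# incremental output that are unnecessary, i.e. not needed to reach the final
# label sequence. Edits here are per-position label changes between successive
# outputs (a simplification of add/revoke/substitute).
def edits_between(prev, curr):
    """Label changes from one incremental output to the next, including labels
    for newly consumed words (prev is padded with None)."""
    padded = list(prev) + [None] * (len(curr) - len(prev))
    return [(i, curr[i]) for i in range(len(curr)) if padded[i] != curr[i]]

def edit_overhead(incremental_outputs):
    final = incremental_outputs[-1]
    total, unnecessary = 0, 0
    prev = []
    for curr in incremental_outputs:
        for position, label in edits_between(prev, curr):
            total += 1
            if label != final[position]:   # an edit later revoked or overwritten
                unnecessary += 1
        prev = curr
    return unnecessary / total if total else 0.0

# e.g. labels per word after each new word of "John likes uh loves Mary":
# outputs = [["f"], ["f", "f"], ["f", "f", "e"], ["f", "rm", "e", "rp"],
#            ["f", "rm", "e", "rp", "f"]]
# edit_overhead(outputs)
```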
Incrementality versus accuracy trade-off. We aim to investigate how well a system can do in terms of achieving both good final accuracy and incremental performance, and while the best F_rm setting had a large PO and a relatively slow DA increase, we find STIR can find a good trade-off setting.

[7] We do not include time-to-detection scores in TS as they did not vary enough between settings to be significant; however, there was a difference in this measure between the 1-best stack condition and the 2-best stack condition; see below.

Figure 5: The cost function settings for the MetaCost classifiers for each component, for the best F_rm setting (false negative costs of 64 for repair onsets, 8 for rm_start and 2 for rp_end; stack depth = 2) and the best total score (TS) setting (false negative costs of 2 for repair onsets, 16 for rm_start and 8 for rp_end; stack depth = 1).

Table 2: Comparison of the best performing system settings on each measure: best final rm F-score (F_rm), best final repair structure F-score (F_s), best delay accuracy of rm (DA), best (lowest) edit overhead (EO), best (lowest) processing overhead (PO), and best total score (mean % of best scores, TS).

Table 3: Comparison of performance of systems with different stack capacities (1-best vs. 2-best rm_start hypotheses) on F_rm, F_s, DA, EO, PO, TD_rp and TD_rm.

The highest TS scoring setting achieves a competitive F_rm whilst also exhibiting a very good DA (0.711), over 98% of the best recorded score, and low PO and EO rates, over 96% of the best recorded scores; see the bottom row of Table 2. As can be seen in Figure 5, the cost functions for these winning settings are different in nature. The best non-incremental F_rm setting requires high recall for the rest of the pipeline to work on, using the highest cost, 64, for false negative repair onset words and the highest stack depth of 2 (similar to a wider beam); but the best overall TS scoring system uses a less permissive setting to increase incremental performance.

We make a preliminary investigation into the effect of increasing the stack capacity by comparing stacks with 1-best rm_start hypotheses per repair onset and 2-best stacks. The average differences between the two conditions are shown in Table 3. Moving to the 2-best stack condition results in a gain in overall accuracy in F_rm and F_s, but at the cost of EO and also the time-to-detection scores TD_rm and TD_rp. The extent to which the stack can be increased without increasing jitter, latency and complexity will be investigated in future work.

7 Conclusion

We have presented STIR, an incremental repair detector that can be used to experiment with incremental performance and accuracy trade-offs. In future work we plan to include probabilistic and distributional features from a top-down incremental parser, e.g. Roark et al. (2009), and to use STIR's distributional features to classify repair type.

Acknowledgements

We thank the three anonymous EMNLP reviewers for their helpful comments. Hough is supported by the DUEL project, financially supported by the Agence Nationale de la Recherche (grant number ANR-13-FRAL-0001) and the Deutsche Forschungsgemeinschaft. Much of the work was carried out with support from an EPSRC DTA scholarship at Queen Mary University of London. Purver is partly supported by ConCreTe: the project ConCreTe acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under an FET grant.

References

T. Baumann, O. Buß, and D. Schlangen. 2011. Evaluation and optimisation of incremental processors. Dialogue & Discourse, 2(1).

Leo Breiman. 2001. Random forests. Machine Learning, 45(1):5-32.

S. E. Brennan and M. F. Schober. 2001. How listeners compensate for disfluencies in spontaneous speech. Journal of Memory and Language, 44(2).

Eric Brill and Robert C. Moore. 2000. An improved error model for noisy channel spelling correction. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics.

Alexander Clark, Gianluca Giorgolo, and Shalom Lappin. 2013. Statistical representation of grammaticality judgements: the limits of n-gram models. In Proceedings of the Fourth Annual Workshop on Cognitive Modeling and Computational Linguistics (CMCL), pages 28-36, Sofia, Bulgaria, August. Association for Computational Linguistics.

Pedro Domingos. 1999. MetaCost: A general method for making classifiers cost-sensitive. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM.

Kallirroi Georgila. 2009. Using integer linear programming for detecting speech disfluencies. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers. Association for Computational Linguistics.

Sebastian Germesin, Tilman Becker, and Peter Poller. 2008. Hybrid multi-step disfluency detection. In Machine Learning for Multimodal Interaction. Springer.

Jonathan Ginzburg. 2012. The Interactive Stance: Meaning for Conversation. Oxford University Press.

P. G. T. Healey, Arash Eshghi, Christine Howes, and Matthew Purver. 2011. Making a contribution: Processing clarification requests in dialogue. In Proceedings of the 21st Annual Meeting of the Society for Text and Discourse, Poitiers, July.

Peter Heeman and James Allen. 1999. Speech repairs, intonational phrases, and discourse markers: modeling speakers' utterances in spoken dialogue. Computational Linguistics, 25(4).

Matthew Honnibal and Mark Johnson. 2014. Joint incremental disfluency detection and dependency parsing. Transactions of the Association for Computational Linguistics (TACL), 2.

Julian Hough and Matthew Purver. 2013. Modelling expectation in the self-repair processing of annotat-, um, listeners. In Proceedings of the 17th SemDial Workshop on the Semantics and Pragmatics of Dialogue (DialDam), Amsterdam, December.

T. Florian Jaeger and Harry Tily. 2011. On language utility: Processing complexity and communicative efficiency. Wiley Interdisciplinary Reviews: Cognitive Science, 2(3).

Mark Johnson and Eugene Charniak. 2004. A TAG-based noisy channel model of speech repairs. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, pages 33-39, Barcelona. Association for Computational Linguistics.

Frank Keller. 2004. The entropy rate principle as a predictor of processing effort: An evaluation against eye-tracking data. In EMNLP.

Reinhard Kneser and Hermann Ney. 1995. Improved backing-off for m-gram language modeling. In Proceedings of the 1995 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-95), volume 1. IEEE.

Matthew Lease, Mark Johnson, and Eugene Charniak. 2006. Recognizing disfluencies in conversational speech. IEEE Transactions on Audio, Speech, and Language Processing, 14(5).

W. J. M. Levelt. 1983. Monitoring and self-repair in speech. Cognition, 14(1).

M. Meteer, A. Taylor, R. MacIntyre, and R. Iyer. 1995. Disfluency annotation stylebook for the Switchboard corpus. Technical report, Department of Computer and Information Science, University of Pennsylvania.

Tim Miller and William Schuler. 2008. A syntactic time-series model for parsing fluent and disfluent speech. In Proceedings of the 22nd International Conference on Computational Linguistics (COLING), Volume 1. Association for Computational Linguistics.

David Milward. 1991. Axiomatic Grammar, Non-Constituent Coordination and Incremental Interpretation. Ph.D. thesis, University of Cambridge.

Xian Qian and Yang Liu. 2013. Disfluency detection using multi-step stacked learning. In Proceedings of NAACL-HLT.

Mohammad Sadegh Rasooli and Joel Tetreault. 2014. Non-monotonic parsing of fluent umm I mean disfluent sentences. In EACL 2014.

Hannes Rieser and David Schlangen. 2011. Introduction to the special issue on incremental processing in dialogue. Dialogue & Discourse, 2(1).

Brian Roark, Asaf Bachrach, Carlos Cardenas, and Christophe Pallier. 2009. Deriving lexical and syntactic expectation-based measures for psycholinguistic modeling via incremental top-down parsing. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Volume 1. Association for Computational Linguistics.

Claude E. Shannon. 1948. A mathematical theory of communication. Bell System Technical Journal.

Elizabeth Shriberg and Andreas Stolcke. 1998. How far do speakers back up in repairs? A quantitative model. In Proceedings of the International Conference on Spoken Language Processing.

Elizabeth Shriberg. 1994. Preliminaries to a Theory of Speech Disfluencies. Ph.D. thesis, University of California, Berkeley.

Simon Zwarts and Mark Johnson. 2011. The impact of language models and loss functions on repair disfluency detection. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (HLT '11), Volume 1, Stroudsburg, PA, USA. Association for Computational Linguistics.

Simon Zwarts, Mark Johnson, and Robert Dale. 2010. Detecting speech repairs incrementally using a noisy channel approach. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING '10), Stroudsburg, PA, USA. Association for Computational Linguistics.


More information

THUTR: A Translation Retrieval System

THUTR: A Translation Retrieval System THUTR: A Translation Retrieval System Chunyang Liu, Qi Liu, Yang Liu, and Maosong Sun Department of Computer Science and Technology State Key Lab on Intelligent Technology and Systems National Lab for

More information

English Grammar Checker

English Grammar Checker International l Journal of Computer Sciences and Engineering Open Access Review Paper Volume-4, Issue-3 E-ISSN: 2347-2693 English Grammar Checker Pratik Ghosalkar 1*, Sarvesh Malagi 2, Vatsal Nagda 3,

More information

Tracking Lexical and Syntactic Alignment in Conversation

Tracking Lexical and Syntactic Alignment in Conversation Tracking Lexical and Syntactic Alignment in Conversation Christine Howes, Patrick G. T. Healey and Matthew Purver {chrizba,ph,mpurver}@dcs.qmul.ac.uk Queen Mary University of London Interaction, Media

More information

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk Ensembles 2 Learning Ensembles Learn multiple alternative definitions of a concept using different training

More information

A Knowledge-Poor Approach to BioCreative V DNER and CID Tasks

A Knowledge-Poor Approach to BioCreative V DNER and CID Tasks A Knowledge-Poor Approach to BioCreative V DNER and CID Tasks Firoj Alam 1, Anna Corazza 2, Alberto Lavelli 3, and Roberto Zanoli 3 1 Dept. of Information Eng. and Computer Science, University of Trento,

More information

A chart generator for the Dutch Alpino grammar

A chart generator for the Dutch Alpino grammar June 10, 2009 Introduction Parsing: determining the grammatical structure of a sentence. Semantics: a parser can build a representation of meaning (semantics) as a side-effect of parsing a sentence. Generation:

More information

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05 Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification

More information

PTE Academic Preparation Course Outline

PTE Academic Preparation Course Outline PTE Academic Preparation Course Outline August 2011 V2 Pearson Education Ltd 2011. No part of this publication may be reproduced without the prior permission of Pearson Education Ltd. Introduction The

More information

Clustering Connectionist and Statistical Language Processing

Clustering Connectionist and Statistical Language Processing Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised

More information

Terminology Extraction from Log Files

Terminology Extraction from Log Files Terminology Extraction from Log Files Hassan Saneifar 1,2, Stéphane Bonniol 2, Anne Laurent 1, Pascal Poncelet 1, and Mathieu Roche 1 1 LIRMM - Université Montpellier 2 - CNRS 161 rue Ada, 34392 Montpellier

More information

Reading Assistant: Technology for Guided Oral Reading

Reading Assistant: Technology for Guided Oral Reading A Scientific Learning Whitepaper 300 Frank H. Ogawa Plaza, Ste. 600 Oakland, CA 94612 888-358-0212 www.scilearn.com Reading Assistant: Technology for Guided Oral Reading Valerie Beattie, Ph.D. Director

More information

Strategies for Training Large Scale Neural Network Language Models

Strategies for Training Large Scale Neural Network Language Models Strategies for Training Large Scale Neural Network Language Models Tomáš Mikolov #1, Anoop Deoras 2, Daniel Povey 3, Lukáš Burget #4, Jan Honza Černocký #5 # Brno University of Technology, Speech@FIT,

More information

A Systematic Cross-Comparison of Sequence Classifiers

A Systematic Cross-Comparison of Sequence Classifiers A Systematic Cross-Comparison of Sequence Classifiers Binyamin Rozenfeld, Ronen Feldman, Moshe Fresko Bar-Ilan University, Computer Science Department, Israel grurgrur@gmail.com, feldman@cs.biu.ac.il,

More information

The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2

The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of

More information

Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

More information

Decision Trees from large Databases: SLIQ

Decision Trees from large Databases: SLIQ Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values

More information

Automated Content Analysis of Discussion Transcripts

Automated Content Analysis of Discussion Transcripts Automated Content Analysis of Discussion Transcripts Vitomir Kovanović v.kovanovic@ed.ac.uk Dragan Gašević dgasevic@acm.org School of Informatics, University of Edinburgh Edinburgh, United Kingdom v.kovanovic@ed.ac.uk

More information

Statistical Machine Translation

Statistical Machine Translation Statistical Machine Translation Some of the content of this lecture is taken from previous lectures and presentations given by Philipp Koehn and Andy Way. Dr. Jennifer Foster National Centre for Language

More information

Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg

Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg March 1, 2007 The catalogue is organized into sections of (1) obligatory modules ( Basismodule ) that

More information

Twitter sentiment vs. Stock price!

Twitter sentiment vs. Stock price! Twitter sentiment vs. Stock price! Background! On April 24 th 2013, the Twitter account belonging to Associated Press was hacked. Fake posts about the Whitehouse being bombed and the President being injured

More information

Cost Model: Work, Span and Parallelism. 1 The RAM model for sequential computation:

Cost Model: Work, Span and Parallelism. 1 The RAM model for sequential computation: CSE341T 08/31/2015 Lecture 3 Cost Model: Work, Span and Parallelism In this lecture, we will look at how one analyze a parallel program written using Cilk Plus. When we analyze the cost of an algorithm

More information

Semantic parsing with Structured SVM Ensemble Classification Models

Semantic parsing with Structured SVM Ensemble Classification Models Semantic parsing with Structured SVM Ensemble Classification Models Le-Minh Nguyen, Akira Shimazu, and Xuan-Hieu Phan Japan Advanced Institute of Science and Technology (JAIST) Asahidai 1-1, Nomi, Ishikawa,

More information

Reading errors made by skilled and unskilled readers: evaluating a system that generates reports for people with poor literacy

Reading errors made by skilled and unskilled readers: evaluating a system that generates reports for people with poor literacy University of Aberdeen Department of Computing Science Technical Report AUCS/TR040 Reading errors made by skilled and unskilled readers: evaluating a system that generates reports for people with poor

More information

PCFGs with Syntactic and Prosodic Indicators of Speech Repairs

PCFGs with Syntactic and Prosodic Indicators of Speech Repairs PCFGs with Syntactic and Prosodic Indicators of Speech Repairs John Hale a Izhak Shafran b Lisa Yung c Bonnie Dorr d Mary Harper de Anna Krasnyanskaya f Matthew Lease g Yang Liu h Brian Roark i Matthew

More information

Applying Repair Processing in Chinese Homophone Disambiguation

Applying Repair Processing in Chinese Homophone Disambiguation Applying Repair Processing in Chinese Homophone Disambiguation Yue-Shi Lee and Hsin-Hsi Chen Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan, R.O.C.

More information

Parallel Data Mining. Team 2 Flash Coders Team Research Investigation Presentation 2. Foundations of Parallel Computing Oct 2014

Parallel Data Mining. Team 2 Flash Coders Team Research Investigation Presentation 2. Foundations of Parallel Computing Oct 2014 Parallel Data Mining Team 2 Flash Coders Team Research Investigation Presentation 2 Foundations of Parallel Computing Oct 2014 Agenda Overview of topic Analysis of research papers Software design Overview

More information

Inference of Probability Distributions for Trust and Security applications

Inference of Probability Distributions for Trust and Security applications Inference of Probability Distributions for Trust and Security applications Vladimiro Sassone Based on joint work with Mogens Nielsen & Catuscia Palamidessi Outline 2 Outline Motivations 2 Outline Motivations

More information

Open Domain Information Extraction. Günter Neumann, DFKI, 2012

Open Domain Information Extraction. Günter Neumann, DFKI, 2012 Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for

More information

Scaling Shrinkage-Based Language Models

Scaling Shrinkage-Based Language Models Scaling Shrinkage-Based Language Models Stanley F. Chen, Lidia Mangu, Bhuvana Ramabhadran, Ruhi Sarikaya, Abhinav Sethy IBM T.J. Watson Research Center P.O. Box 218, Yorktown Heights, NY 10598 USA {stanchen,mangu,bhuvana,sarikaya,asethy}@us.ibm.com

More information

II. RELATED WORK. Sentiment Mining

II. RELATED WORK. Sentiment Mining Sentiment Mining Using Ensemble Classification Models Matthew Whitehead and Larry Yaeger Indiana University School of Informatics 901 E. 10th St. Bloomington, IN 47408 {mewhiteh, larryy}@indiana.edu Abstract

More information

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University sekine@cs.nyu.edu Kapil Dalwani Computer Science Department

More information

A Logistic Regression Approach to Ad Click Prediction

A Logistic Regression Approach to Ad Click Prediction A Logistic Regression Approach to Ad Click Prediction Gouthami Kondakindi kondakin@usc.edu Satakshi Rana satakshr@usc.edu Aswin Rajkumar aswinraj@usc.edu Sai Kaushik Ponnekanti ponnekan@usc.edu Vinit Parakh

More information

2014/02/13 Sphinx Lunch

2014/02/13 Sphinx Lunch 2014/02/13 Sphinx Lunch Best Student Paper Award @ 2013 IEEE Workshop on Automatic Speech Recognition and Understanding Dec. 9-12, 2013 Unsupervised Induction and Filling of Semantic Slot for Spoken Dialogue

More information

AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS

AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS PIERRE LANCHANTIN, ANDREW C. MORRIS, XAVIER RODET, CHRISTOPHE VEAUX Very high quality text-to-speech synthesis can be achieved by unit selection

More information

31 Case Studies: Java Natural Language Tools Available on the Web

31 Case Studies: Java Natural Language Tools Available on the Web 31 Case Studies: Java Natural Language Tools Available on the Web Chapter Objectives Chapter Contents This chapter provides a number of sources for open source and free atural language understanding software

More information

Natural Language to Relational Query by Using Parsing Compiler

Natural Language to Relational Query by Using Parsing Compiler Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,

More information

Data Deduplication in Slovak Corpora

Data Deduplication in Slovak Corpora Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences, Bratislava, Slovakia Abstract. Our paper describes our experience in deduplication of a Slovak corpus. Two methods of deduplication a plain

More information

Chapter 8. Final Results on Dutch Senseval-2 Test Data

Chapter 8. Final Results on Dutch Senseval-2 Test Data Chapter 8 Final Results on Dutch Senseval-2 Test Data The general idea of testing is to assess how well a given model works and that can only be done properly on data that has not been seen before. Supervised

More information

Statistical Machine Translation: IBM Models 1 and 2

Statistical Machine Translation: IBM Models 1 and 2 Statistical Machine Translation: IBM Models 1 and 2 Michael Collins 1 Introduction The next few lectures of the course will be focused on machine translation, and in particular on statistical machine translation

More information

Simple Language Models for Spam Detection

Simple Language Models for Spam Detection Simple Language Models for Spam Detection Egidio Terra Faculty of Informatics PUC/RS - Brazil Abstract For this year s Spam track we used classifiers based on language models. These models are used to

More information

The OpenGrm open-source finite-state grammar software libraries

The OpenGrm open-source finite-state grammar software libraries The OpenGrm open-source finite-state grammar software libraries Brian Roark Richard Sproat Cyril Allauzen Michael Riley Jeffrey Sorensen & Terry Tai Oregon Health & Science University, Portland, Oregon

More information

Log-Linear Models. Michael Collins

Log-Linear Models. Michael Collins Log-Linear Models Michael Collins 1 Introduction This note describes log-linear models, which are very widely used in natural language processing. A key advantage of log-linear models is their flexibility:

More information

Perplexity Method on the N-gram Language Model Based on Hadoop Framework

Perplexity Method on the N-gram Language Model Based on Hadoop Framework 94 International Arab Journal of e-technology, Vol. 4, No. 2, June 2015 Perplexity Method on the N-gram Language Model Based on Hadoop Framework Tahani Mahmoud Allam 1, Hatem Abdelkader 2 and Elsayed Sallam

More information

Leveraging Ensemble Models in SAS Enterprise Miner

Leveraging Ensemble Models in SAS Enterprise Miner ABSTRACT Paper SAS133-2014 Leveraging Ensemble Models in SAS Enterprise Miner Miguel Maldonado, Jared Dean, Wendy Czika, and Susan Haller SAS Institute Inc. Ensemble models combine two or more models to

More information

PoS-tagging Italian texts with CORISTagger

PoS-tagging Italian texts with CORISTagger PoS-tagging Italian texts with CORISTagger Fabio Tamburini DSLO, University of Bologna, Italy fabio.tamburini@unibo.it Abstract. This paper presents an evolution of CORISTagger [1], an high-performance

More information

Turkish Radiology Dictation System

Turkish Radiology Dictation System Turkish Radiology Dictation System Ebru Arısoy, Levent M. Arslan Boaziçi University, Electrical and Electronic Engineering Department, 34342, Bebek, stanbul, Turkey arisoyeb@boun.edu.tr, arslanle@boun.edu.tr

More information

Package syuzhet. February 22, 2015

Package syuzhet. February 22, 2015 Type Package Package syuzhet February 22, 2015 Title Extracts Sentiment and Sentiment-Derived Plot Arcs from Text Version 0.2.0 Date 2015-01-20 Maintainer Matthew Jockers Extracts

More information

Identifying SPAM with Predictive Models

Identifying SPAM with Predictive Models Identifying SPAM with Predictive Models Dan Steinberg and Mikhaylo Golovnya Salford Systems 1 Introduction The ECML-PKDD 2006 Discovery Challenge posed a topical problem for predictive modelers: how to

More information

Applying Co-Training Methods to Statistical Parsing. Anoop Sarkar http://www.cis.upenn.edu/ anoop/ anoop@linc.cis.upenn.edu

Applying Co-Training Methods to Statistical Parsing. Anoop Sarkar http://www.cis.upenn.edu/ anoop/ anoop@linc.cis.upenn.edu Applying Co-Training Methods to Statistical Parsing Anoop Sarkar http://www.cis.upenn.edu/ anoop/ anoop@linc.cis.upenn.edu 1 Statistical Parsing: the company s clinical trials of both its animal and human-based

More information

Gerry Hobbs, Department of Statistics, West Virginia University

Gerry Hobbs, Department of Statistics, West Virginia University Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit

More information

Bootstrapping Big Data

Bootstrapping Big Data Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu

More information

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518 International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 INTELLIGENT MULTIDIMENSIONAL DATABASE INTERFACE Mona Gharib Mohamed Reda Zahraa E. Mohamed Faculty of Science,

More information

TREC 2007 ciqa Task: University of Maryland

TREC 2007 ciqa Task: University of Maryland TREC 2007 ciqa Task: University of Maryland Nitin Madnani, Jimmy Lin, and Bonnie Dorr University of Maryland College Park, Maryland, USA nmadnani,jimmylin,bonnie@umiacs.umd.edu 1 The ciqa Task Information

More information