EASY CONTEXTUAL INTENT PREDICTION AND SLOT DETECTION
A. Bhargava^1, A. Celikyilmaz^2, D. Hakkani-Tür^3, and R. Sarikaya^2
^1 University of Toronto, ^2 Microsoft, ^3 Microsoft Research
(The first author performed the work during an internship at Microsoft.)

ABSTRACT

Spoken language understanding (SLU) is one of the main tasks of a dialog system, aiming to identify semantic components in user utterances. In this paper, we investigate the incorporation of context into the SLU tasks of intent prediction and slot detection. Using a corpus that contains session-level information, including the start and end of a session and the sequence of utterances within it, we experiment with incorporating information from previous intra-session utterances into the SLU tasks on a given utterance. For slot detection, we find that including features indicating the slots appearing in the previous utterances gives no significant increase in performance. In contrast, for intent prediction we find that a similar approach that incorporates the intent of the previous utterance as a feature yields relative error rate reductions of 6.7% on transcribed data and 8.7% on automatically-recognized data. We also find similar gains when treating intent prediction of utterance sequences as a sequential tagging problem via SVM-HMMs.

Index Terms: spoken language understanding, slot detection, intent prediction, contextual models

1. INTRODUCTION

Spoken language understanding (SLU) is an important part of modeling human-computer dialogs, aiming at determining a user's intents from their utterances, known as intent prediction, as well as identifying relevant actionable pieces of the utterance, known as slot detection [1]. For example, given the utterance "Show me Joss Whedon's latest movies," the user's intent is to see a list of content matching the query; "Joss Whedon" fills one slot (director) and "movies" fills another (media type). Once the intents and slots are determined, they are passed to the dialog manager, which forms a query to the back-end and determines the response given to the user.

Traditionally, both of these tasks are considered one utterance at a time by the SLU process (due in part to the fact that gathering session data requires a live, deployed system, which is not available when building the initial SLU models). Both intent prediction and slot detection systems are given isolated utterances and asked to label them with the appropriate information as the task requires. This runs counter to the common means by which these utterances are gathered: by initiating sessions with users and recording their speech (we can see these sessions as crude human-computer conversations). Ultimately, each utterance occurs within the context of a larger discourse between a human and an automated agent. Fig. 1 exemplifies this, showing a case where the user makes clear references to previous utterances or to the system's responses to them.

In this paper, we examine the effect of incorporating information from previous intra-session utterances, to which we refer broadly as context, into both intent prediction and slot detection. Context is an important and viable source of new information, which is especially valuable in the case of noisy automatically-recognized data.

Table 1. An example of noise due to ASR. In this case, it is difficult to determine the correct intent (find similar content) from the recognized utterance alone.

    Source       Utterance
    Transcribed  tv shows like buffy the vampire slayer
    Recognized   tv show with play buffy the vampire slayer
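To make the intent/slot decomposition above concrete, the following is a minimal sketch of the kind of structure an SLU component might hand to the dialog manager for the introduction's example utterance; the class, intent, and slot names are illustrative assumptions, not taken from the paper's system.

```python
from dataclasses import dataclass, field

@dataclass
class SLUResult:
    """Hypothetical container for what SLU hands to the dialog manager."""
    intent: str                                # global property of the utterance
    slots: dict = field(default_factory=dict)  # slot type -> phrase filling it

# The introduction's example, "show me joss whedon's latest movies":
result = SLUResult(intent="find_content",
                   slots={"director": "joss whedon", "media_type": "movies"})
```

The dialog manager would then use such a structure to form its back-end query and choose a response.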
For example, the true (transcribed) utterance in Table 1 contains the word "like," which would strongly indicate the find similar content intent, but this word is lost during the automatic recognition process. This inherent noise in the ASR process makes any alternative hints extremely valuable; in such situations the context may provide additional information that helps find the correct intent. Traditionally, it has been the dialog manager's role to handle the utterance in context, but incorporating this information earlier (during SLU) can help avoid cascading errors further down the pipeline.

We investigate two different approaches for integrating context into SLU for intent and slot determination: (i) modeling sessions as sequences for classification; and (ii) feature engineering. We perform several experiments on both transcribed and automatically-recognized utterances. Using conditional random fields for slot detection, we find that incorporating features based on the appearance of slots in previous utterances yields no significant benefit. In contrast, we are able to increase the performance of intent prediction with both classification and sequential tagging systems. We demonstrate higher reductions in error rate over automatically-recognized data, and find that the classification and sequential tagging approaches perform equally in this regard.

2. BACKGROUND

2.1. Intents and slots

Intents are global properties of utterances. They signify the goal of the user and vary across domains, but ultimately, a dialog system must at some point determine what a user wants given an utterance. This is of particular relevance to command-and-control systems, where each utterance signifies a command upon which the system is to act. To provide an analogy grounded in software development, intents map roughly to functions that are meant to be called: if the intent is detected to be find content, call the relevant find content function.

Intent prediction has traditionally been modeled as an utterance classification task, mainly using local information about the utterance. Early work with discriminative classification algorithms was conducted on the AT&T HMIHY system, using, for example, boosting [2] or max-margin classifiers [3]. Cox [4] proposed the use of generalized probabilistic descent (GPD), corrective training (CT), and linear discriminant analysis (LDA).
Fig. 1. An example session, including both transcribed and recognized versions of the utterances u1, ..., u4. Each utterance has an associated intent, while the slots are shown within each utterance. The slots are annotated on the transcribed data and transferred automatically to the ASR version.

    U   Intent        Transcribed                                   Recognized
    u1  get clip      show me the [firefly](content name)           show me the [firefly](content name)
                      [trailer](content type)                       [trailer](content type)
    u2  find info     who directed [it](content name ref)           who directed [it](content name ref)
    u3  find content  what else has [he](director ref) done         what else has [he](director ref) done
    u4  play content  play [the avengers](content name)             plane [avatars](content name)

Slots, on the other hand, exist within utterances. Slots are local properties in the sense that they span individual words rather than whole utterances. In the context of human-computer dialogs, and in particular command and control, the words that fill slots tend to be the only semantically loaded words in the utterance (i.e., the other words are function words). Slots represent actionable content; continuing the software development analogy, slot values map to values passed as function parameters: we would pass the value "Joss Whedon" to the director parameter, for example as find content(director="Joss Whedon").

Slot detection in SLU has commonly been approached using a classification method for filling frame slots given an application domain. These approaches include, among others, generative models such as hidden Markov models [5], discriminative classification methods [6, 7, 8], and probabilistic context-free grammars [9, 10].

2.2. Session modeling

In the dialog manager, belief-state update is a common method for modeling context. One of the earliest statistical approaches to multi-turn interpretation was the statistical discourse model by Miller et al. [11], which introduced a mapping from the pre-discourse meaning and the previous meaning (as determined after the previous user turn) to the post-discourse meaning. Later on, Levin and Pieraccini [12] proposed using a Markov decision process (MDP) as a dialog model. However, MDPs assume that dialog states are observable, so they do not account for any uncertainty in the dialog history or the user state. The application of partially-observable Markov decision processes (POMDPs) was suggested for dialog modeling [13] to handle this uncertainty. Thomson [14] used dynamic Bayesian networks and separated the belief estimation based on observations up to the current time from the estimation of the optimum action.

In this paper, similar to dialog modeling, we focus on leveraging context, but at the level of SLU modeling. Considering context at the SLU level is important for several reasons:

- As the SLU process occurs earlier in the dialog system's architecture, errors during SLU can cascade throughout the rest of the system. Any improvements to the SLU will help minimize overall system error.
- Not all conversational applications require dialog modeling; in the traditional architecture, such cases would miss out on contextual information.
- Similarly, other downstream applications can benefit from improved SLU performance.
- Where other dialog system components can be closely tied to specific application scenarios and subject to revision based on changes in knowledge sources or user interface, general contextual SLU can be insulated from such sensitivities.
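Continuing the software-development analogy from Section 2.1, here is a hypothetical sketch of how predicted intents could map to function calls, with slot values passed as parameters; the handler names and signatures are assumptions for illustration, not the paper's system.

```python
def find_content(director=None, media_type=None, genre=None):
    """Hypothetical back-end query; each parameter corresponds to a slot type."""
    return f"query: {media_type or 'content'} by {director or 'anyone'}"

def play_content(content_name=None):
    """Hypothetical playback action."""
    return f"playing: {content_name}"

# Intent names map to handlers; slot values become keyword arguments.
HANDLERS = {"find_content": find_content, "play_content": play_content}

def dispatch(intent, slots):
    return HANDLERS[intent](**slots)

print(dispatch("find_content", {"director": "joss whedon", "media_type": "movies"}))
```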
3. METHOD

3.1. Session data

Our session data are internally collected from real-use scenarios of a spoken dialog system. We focus here on the domain of audiovisual media, including movies and television shows. The user is expected to interact by voice with a system that can perform a variety of tasks in relation to such media, including (among others) browsing, searching, querying information, and playing. In total, our corpus comprises utterances grouped into 6390 sessions. There are 28 possible intents and 26 possible slot types. The most frequent intents include find content, play content, find similar, filter, and start over; top slot types include content name, content type, genre, stars, and content description.

Our corpus consists of a series of utterances, where each utterance belongs to a particular session. Each utterance is annotated with its intent as well as the slots occurring in it. Our corpus contains both transcribed (TRA) and speech-recognized (ASR) versions of each utterance; since the intent is a global property of the utterance, it applies trivially to both versions. We transfer the correct slots from TRA to ASR using the word-by-word alignment of hypothesized words to the reference transcriptions. Once the words are aligned, labels are transferred from the reference to the hypothesized words. Since some tokens may be lost in this step (for example, a slot-initial word deleted in the ASR output), we normalize annotations on the ASR side in order to obtain a well-formed set. Fig. 1 shows an example session consisting of four utterances with both TRA and ASR versions along with the intents and slots.
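The TRA-to-ASR slot transfer just described relies on a word-by-word alignment. Below is a simplified sketch of the transfer and normalization steps, assuming IOB-style slot labels and an alignment already computed as (reference index, hypothesis index) pairs; the paper does not specify its exact representation, so this is an illustration rather than the authors' procedure.

```python
def transfer_slot_labels(ref_labels, alignment, hyp_len):
    """Copy per-word slot labels from the transcription (TRA) to the ASR
    hypothesis along a word alignment, then repair ill-formed spans.

    ref_labels: IOB slot label per reference word, e.g. ["O", "B-content_name"]
    alignment:  (ref_index, hyp_index) pairs; words deleted by the ASR
                simply have no pair
    hyp_len:    number of words in the ASR hypothesis
    """
    hyp_labels = ["O"] * hyp_len
    for ref_i, hyp_i in alignment:
        hyp_labels[hyp_i] = ref_labels[ref_i]
    # Normalization: if a slot-initial word was deleted, a span may start
    # with "I-"; promote it to "B-" so the label sequence stays well-formed.
    for i, label in enumerate(hyp_labels):
        if label.startswith("I-") and (i == 0 or hyp_labels[i - 1] == "O"):
            hyp_labels[i] = "B-" + label[2:]
    return hyp_labels
```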
3.2. Learning approaches

The work in this paper employs three machine learning methods.

Support vector machines (SVMs) [15], in their most basic formulation, are a binary classification method based on the intuition of maximizing the margin around the classification boundary. SVMs have seen extensive use for language tasks, owing to their simplicity of use, ability to incorporate many features, and strong performance. SVMs can also be extended to multi-class scenarios and can be used with non-linear kernels. We use linear kernels as provided in the very fast LIBLINEAR package [16], which allows for multi-class classification.

SVMs can also be applied to structured prediction. In particular, we consider their application to the training of hidden Markov models (SVM-HMMs) [17]. In contrast to standard (generative) HMMs, this method allows for the incorporation of numerous (potentially long-range) features. SVM-HMMs solve the following optimization problem:

$$\min_{\mathbf{w},\,\boldsymbol{\xi}} \; \frac{1}{2}\|\mathbf{w}\|^2 + \frac{C}{n}\sum_{i=1}^{n}\xi_i$$

$$\text{s.t.}\;\; \forall i \;\forall \bar{y}: \;\; \mathbf{w}\cdot\sum_{j=1}^{l}\Phi\big(x^i_j, y^i_{j-1}, y^i_j\big) \;\ge\; \mathbf{w}\cdot\sum_{j=1}^{l}\Phi\big(x^i_j, \bar{y}_{j-1}, \bar{y}_j\big) + \Delta\big(y^i, \bar{y}\big) - \xi_i$$

where $\mathbf{w}$ is the weight vector to be learned, $x^i = (x^i_1, \ldots, x^i_l)$ is observation sequence $i$, $y^i = (y^i_1, \ldots, y^i_l)$ is the correct tag (hidden-state) sequence corresponding to $x^i$, and the $\xi_i$ are slack variables, one per input sequence, that measure the degree of misclassification and allow for data that are not linearly separable. The candidate tag sequences $\bar{y} = (\bar{y}_1, \ldots, \bar{y}_l)$ are generated using the Viterbi algorithm. $\Phi$ is a feature function that returns a vector consisting of two components: (1) a feature vector constructed from the observation, as specified by the user; and (2) a vector of binary transition features, having one feature for each possible hidden-state transition, with all of them set to 0 except for the one corresponding to the transition $(y_{j-1}, y_j)$ given to $\Phi$. $\Delta$ is a loss function equal to the number of incorrect tags in the candidate sequence $\bar{y}$. As with regular SVMs, SVM-HMMs have been shown to work well on language tasks [18, 19, 20, 21]. We use the SVM^hmm package.

Lastly, conditional random fields (CRFs) [22] are discriminative (conditionally-trained) graphical models that have been used effectively for sequence labeling tasks, a standard example of which is part-of-speech tagging. CRFs also allow the incorporation of many more features than is possible with HMMs. Using notation similar to that for SVM-HMMs above, CRFs define the conditional probability of $y$ as:

$$p_{\mathbf{w}}(y \mid x) = \frac{1}{Z_{\mathbf{w}}(x)} \prod_{t=1}^{T} \phi_t(y_{t-1}, y_t, x)$$

where $\phi_t(\cdot)$ is the local potential function defined over the maximal cliques of the graph. The partition function $Z_{\mathbf{w}}(x) = \sum_{\bar{y}} \prod_{t=1}^{T} \phi_t(\bar{y}_{t-1}, \bar{y}_t, x)$ ensures that the probabilities of all state sequences sum to one. The $\phi_t$ functions factorize based on feature functions $f_k$:

$$\phi_t(y_{t-1}, y_t, x) = \underbrace{\phi^1(y_t, x)}_{\text{observation}} \;\; \underbrace{\phi^2(y_{t-1}, y_t)}_{\text{transition}}$$

$$\phi^1(y_t, x) = \exp\Big(\textstyle\sum_k w^1_k f^1_k(y_t, x_t)\Big), \qquad \phi^2(y_{t-1}, y_t) = \exp\Big(\textstyle\sum_k w^2_k f^2_k(y_{t-1}, y_t)\Big)$$

where $\mathbf{w}$ is the weight vector and $f^1_k(y_t, x_t)$ and $f^2_k(y_{t-1}, y_t)$ are feature functions that encode the topics (slots) of the current word and the state transitions in the sequence at the current index $t$. We use the CRFsgd package [23] for our CRF implementation.
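Both SVM-HMMs and CRFs rely on Viterbi decoding to find the best tag sequence under a linear model. The following is a minimal sketch, assuming per-position emission scores and a tag-transition score matrix have already been computed from the learned weights; it is illustrative, not the SVM^hmm or CRFsgd implementation.

```python
import numpy as np

def viterbi(emission, transition):
    """Find the highest-scoring tag sequence.

    emission:   (T, K) array, emission[t, k] = score of tag k at position t
    transition: (K, K) array, transition[a, b] = score of moving from tag a to b
    """
    T, K = emission.shape
    score = np.full((T, K), -np.inf)
    back = np.zeros((T, K), dtype=int)
    score[0] = emission[0]
    for t in range(1, T):
        cand = score[t - 1][:, None] + transition   # (K, K): prev tag x next tag
        back[t] = cand.argmax(axis=0)               # best previous tag per tag
        score[t] = cand.max(axis=0) + emission[t]
    tags = [int(score[-1].argmax())]                # backtrace from the best end
    for t in range(T - 1, 0, -1):
        tags.append(int(back[t][tags[-1]]))
    return tags[::-1]
```

For the SVM-HMM, emission and transition scores would come from applying $\mathbf{w}$ to the two components of $\Phi$; for the CRF, they would be the logs of $\phi^1$ and $\phi^2$.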
3.3. Incorporating contextual information

For both intent prediction and slot detection, we aim to incorporate information from the previous utterance. In particular, for each task, we incorporate the values of that task on the previous utterance in the session.

For slot detection, which we treat as a sequential tagging problem as shown in Fig. 2, we apply CRFs. The baseline implementation uses lexical features consisting of unigrams in a five-word window around the current word. To incorporate contextual information, we add binary features for all possible slot types that might have occurred in the previous utterance. The features corresponding to slots occurring in the previous utterance are set to 1 while all others are set to 0.

Fig. 2. Slot detection as a sequential tagging problem with CRFs. Hidden states are shown shaded.

For intent prediction, we try two separate approaches. First, we treat it as an utterance classification problem, with our baseline using bag-of-n-gram features to train an SVM. In this case, we can add context through additional features that indicate the intent of the previous utterance. We have one new feature for each possible intent, with the one corresponding to the intent of the previous utterance set to 1 and all others set to 0. We also include a feature indicating that the given utterance is the first in the session, which handles the case where there is no previous utterance.

Second, we can treat intent prediction as a sequential tagging problem, as shown in Fig. 3. Each utterance is a candidate to be tagged, with the utterances occurring in sequence as defined by the session. We use SVM-HMMs in this case, as they allow for easier processing of sparse feature vectors.

Fig. 3. Contextual intent prediction as a sequential tagging problem with HMMs. Hidden states are shown shaded. (The example session pairs intents with utterances: get clip, "show me the firefly trailer"; find info, "who directed it"; find content, "what else has he done"; play content, "play the avengers".)
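A sketch of the contextual feature construction for the classification approach, using scikit-learn's LinearSVC as a stand-in for the LIBLINEAR setup; the intent inventory and toy data are illustrative, and the analogous slot-context features for the CRF are built the same way, with one binary indicator per slot type.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

INTENTS = ["find_content", "play_content", "find_similar", "filter", "start_over"]

def prev_intent_features(prev_intent):
    """One binary feature per possible previous intent, plus a final flag
    for utterances that open a session (i.e., have no previous utterance)."""
    v = np.zeros(len(INTENTS) + 1)
    v[-1 if prev_intent is None else INTENTS.index(prev_intent)] = 1.0
    return v

# Toy data: (utterance, intent of previous utterance or None, gold intent).
data = [("show me some movies", None, "find_content"),
        ("what else has he done", "find_content", "find_similar"),
        ("play the avengers", "find_similar", "play_content")]

vectorizer = CountVectorizer(ngram_range=(1, 2))            # bag of n-grams
X_lex = vectorizer.fit_transform([u for u, _, _ in data])
X_ctx = csr_matrix(np.vstack([prev_intent_features(p) for _, p, _ in data]))
X = hstack([X_lex, X_ctx])                                  # lexical + context
y = [g for _, _, g in data]

clf = LinearSVC().fit(X, y)                                 # LIBLINEAR backend
```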
4. EXPERIMENTS

4.1. Setup

The data corpus described in Section 3.1 is split into training, development, and test sets comprising 80%, 10%, and 10% of the full corpus, respectively. The test set is held out and used only for evaluation. We use the development set to tune hyperparameters, and then merge it with the training set to train the final model used for testing. We report results on the held-out test set.

For the slot detection (CRF) and intent prediction (SVM classification) cases, we aim to incorporate information from the previous utterance into the task on the current utterance. We aim to gauge both the overall potential of incorporating information from previous utterances and the value of this idea in a real-world scenario. Considering first intent prediction, this leads us to test two alternative cases. First, we test the inclusion of the previous utterance's intent with the intent drawn directly from the corpus; this roughly represents an upper bound on how much improvement we can hope to achieve by incorporating contextual information in this way. Second, we test the more realistic case where the previous intent is instead generated by an intent prediction model. This second case requires that a model first be trained without any contextual features; this model is then used to generate hypothesized intents that can be used to train a new model. Note that this requires care in the experimental architecture: the first, non-contextual model, when trained on the training set, will obviously produce excellent results when evaluated on the training set (which is required to produce the hypothesized intents for training the subsequent contextual model). To produce realistic results, we generate the hypothesized intents by splitting the training set into 10 folds. We then successively hold out one fold, train a (non-contextual) model on the remaining folds, and evaluate on the held-out fold. The results are all merged together to produce hypothesized intents for training the subsequent contextual model. The same process is repeated for slot detection, but of course with slots and slot detection models rather than intents and intent prediction models. Note that this methodology emulates a real-world two-stage model in which a first stage predicts (comparatively) naive outputs, while the second stage incorporates the first stage's outputs for previous utterances.

Note that the SVM-HMM method for intent prediction does not require the above considerations. In effect, it is most comparable to the SVM experiments that use the predicted previous intent, as it does not use the actual previous intent at test time at all, only its own predicted intents.
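A sketch of the 10-fold jackknifing procedure, again in scikit-learn terms (the paper does not name an implementation for this step):

```python
import numpy as np
from sklearn.model_selection import KFold

def jackknife_intents(X, y, make_model, n_folds=10):
    """Produce hypothesized intents for every training utterance: train a
    non-contextual model on nine folds, predict on the held-out fold, and
    merge, so no utterance is labeled by a model that saw it in training."""
    y = np.asarray(y)
    hypothesized = np.empty(len(y), dtype=object)
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    for train_idx, held_idx in kf.split(y):
        model = make_model().fit(X[train_idx], y[train_idx])
        hypothesized[held_idx] = model.predict(X[held_idx])
    return hypothesized
```

The merged predictions then serve as the previous-intent features when training the contextual second-stage model; scikit-learn's cross_val_predict packages essentially the same pattern.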
4.2. Slot detection

Fig. 4 shows the results for slot detection. Here, we consider the effect of including slot types appearing in the previous n intra-session utterances. We evaluate slot detection performance using the F1-score, which is defined in terms of precision P and recall R as F1 = 2PR / (P + R). Ultimately, while we see a small increase when looking at the previous two utterances, it is statistically insignificant, and performance decreases if we look further back than that.

Fig. 4. Slot detection F1-score (TRA and ASR; LEX, ORCLPREV, and PREDPREV feature sets) as a function of the number of previous utterances used, showing a statistically insignificant increase in performance for immediately previous utterances and decreases in performance for larger context windows.

4.3. Intent prediction

Table 2 shows the results for intent prediction, evaluated using accuracy over all utterances in the test set, for the SVM with either the lexical features only or the lexical features combined with the previous-intent features (for both oracle and predicted previous intents).

Table 2. Utterance accuracies in percentages for intent prediction. LEX indicates lexical features only; ORCLPREV indicates lexical features plus the oracle (actual) intent of the previous utterance; and PREDPREV indicates lexical features plus the predicted intent of the previous utterance. TRA indicates the corpus of transcribed utterances, while ASR indicates the corpus of ASR outputs.

As expected, we see a drop in accuracy on the ASR data compared to the TRA data due to the additional noise in, and hypothetical nature of, the ASR outputs. As for the effect of incorporating the previous intent, we see a 6.7% error rate reduction over the TRA baseline and an 8.7% error rate reduction over the ASR baseline. Combining these results with the fact that the predicted-intent accuracies are very close to the oracle-intent accuracies, we can see that this approach of incorporating the previous intent is indeed effective.

Next, we compare the simple additional-feature approach to the more complex SVM-HMM modeling method. We find no statistically significant difference between the SVM-HMM and the SVM with predicted previous intents; so while the SVM-HMM provides an improvement over using lexical features only, SVMs are a better choice due to their lower modeling complexity and commensurate speed. Furthermore, it is important to note that the SVM-HMM result represents an unrealistic scenario, although for different reasons than the oracle-intent scenario for the vanilla SVM. The SVM-HMM approach uses the Viterbi algorithm, which looks through the whole session (including future utterances) to determine the best overall intent sequence. During real-world use, we do not have access to the full session, so even this result is an upper bound on the SVM-HMM's performance for this task.

Lastly, we note that the success of our approach to contextualizing intent prediction makes sense in light of the effect of the previous intent on the distribution of current intents. For example, 87% of first utterances in a session are annotated as having the find content intent, but this drops to 49% if the previous utterance was also a find content utterance. Similarly, if the previous utterance was a play content utterance, the most likely (32%) intent for the current utterance is also play content. This very clearly demonstrates the contextual sensitivity of the intent prediction task.

5. CONCLUSION

In this paper, we examined the effect of adding session context to the SLU tasks of intent prediction and slot detection. Our simple feature-engineering approach involved adding features that indicated the state (whether slot types or intent) of previous utterances; while this approach yielded no significant improvement for CRF-based slot detection, we found significant error rate reductions of 6.7% and 8.7% for SVM-based intent prediction on transcribed and automatically-recognized data, respectively. We found a similar increase when intent prediction was modeled as a sequential tagging problem using SVM-HMMs. These results indicate that incorporating context into the SLU process, rather than leaving it for the dialog manager, is a viable and effective option. Possible future work includes experiments using joint modeling of intents and slots to explore much higher-order models.
6. REFERENCES

[1] G. Tur and R. De Mori, Eds., Spoken Language Understanding: Systems for Extracting Semantic Information from Speech, John Wiley and Sons, New York, NY, USA, 2011.
[2] A. L. Gorin, G. Riccardi, and J. H. Wright, "How may I help you?," Speech Communication, vol. 23, pp. 113-127, 1997.
[3] V. N. Vapnik, Statistical Learning Theory, John Wiley and Sons, New York, NY, 1998.
[4] S. Cox, "Discriminative techniques in call routing," in Proceedings of ICASSP, Hong Kong, April 2003.
[5] R. Pieraccini, E. Tzoukermann, Z. Gorelov, J.-L. Gauvain, E. Levin, C.-H. Lee, and J. G. Wilpon, "A speech understanding system based on statistical representation of semantics," in Proceedings of the ICASSP, San Francisco, CA, USA, March 1992.
[6] R. Kuhn and R. De Mori, "The application of semantic classification trees to natural language understanding," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, 1995.
[7] Y.-Y. Wang and A. Acero, "Discriminative models for spoken language understanding," in Proceedings of the ICSLP, Pittsburgh, PA, USA, September 2006.
[8] C. Raymond and G. Riccardi, "Generative and discriminative algorithms for spoken language understanding," in Proceedings of Interspeech, Antwerp, Belgium, 2007.
[9] S. Seneff, "TINA: A natural language system for spoken language applications," Computational Linguistics, vol. 18, no. 1, pp. 61-86, 1992.
[10] W. Ward and S. Issar, "Recent improvements in the CMU spoken language understanding system," in Proceedings of the ARPA HLT Workshop, March 1994.
[11] S. Miller, D. Stallard, R. Bobrow, and R. Schwartz, "A fully statistical approach to natural language interfaces," in Proceedings of the ACL, 1996.
[12] E. Levin and R. Pieraccini, "A stochastic model of computer-human interaction for learning dialogue strategies," in Proceedings of Eurospeech, 1997.
[13] J. Williams and S. J. Young, "Scaling up POMDPs for dialog management: The summary POMDP method," in Proceedings of the ASRU Workshop, 2005.
[14] B. Thomson, Statistical Methods for Spoken Dialogue Management, Ph.D. thesis, University of Cambridge, 2009.
[15] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, pp. 273-297, September 1995.
[16] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, "LIBLINEAR: A library for large linear classification," Journal of Machine Learning Research, vol. 9, pp. 1871-1874, 2008.
[17] Y. Altun, I. Tsochantaridis, and T. Hofmann, "Hidden Markov support vector machines," in Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003), Washington, DC, USA, August 2003, Morgan Kaufmann Publishers Inc.
[18] S. Bartlett, G. Kondrak, and C. Cherry, "Automatic syllabification with structured SVMs for letter-to-phoneme conversion," in Proceedings of ACL-08: HLT, Columbus, Ohio, June 2008, pp. 568-576, Association for Computational Linguistics.
[19] B. Chang and D. Han, "Enhancing domain portability of Chinese segmentation model using chi-square statistics and bootstrapping," in Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, Cambridge, MA, October 2010, Association for Computational Linguistics.
[20] S. Kiritchenko and C. Cherry, "Lexically-triggered hidden Markov models for clinical document coding," in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, Oregon, USA, June 2011, Association for Computational Linguistics.
[21] M. Verbeke, V. Van Asch, R. Morante, P. Frasconi, W. Daelemans, and L. De Raedt, "A statistical relational learning approach to identifying evidence based medicine categories," in Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jeju Island, Korea, July 2012, Association for Computational Linguistics.
[22] J. D. Lafferty, A. McCallum, and F. C. N. Pereira, "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," in Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), San Francisco, CA, USA, 2001, pp. 282-289, Morgan Kaufmann Publishers Inc.
[23] L. Bottou, "Large-scale machine learning with stochastic gradient descent," in Proceedings of the 19th International Conference on Computational Statistics (COMPSTAT 2010), Paris, France, August 2010, pp. 177-186, Springer.