THE THIRD DIALOG STATE TRACKING CHALLENGE
Matthew Henderson 1, Blaise Thomson 2 and Jason D. Williams 3

1 Department of Engineering, University of Cambridge, UK
2 VocalIQ Ltd., Cambridge, UK
3 Microsoft Research, Redmond, WA, USA
[email protected], [email protected], [email protected]

ABSTRACT

In spoken dialog systems, dialog state tracking refers to the task of correctly inferring the user's goal at a given turn, given all of the dialog history up to that turn. This task is challenging because of speech recognition and language understanding errors, yet good dialog state tracking is crucial to the performance of spoken dialog systems. This paper presents results from the third Dialog State Tracking Challenge, a research community challenge task based on a corpus of annotated logs of human-computer dialogs, with a blind test set evaluation. The main new feature of this challenge is that it studied the ability of trackers to generalize to new entities, i.e. new slots and values not present in the training data. This challenge received 28 entries from 7 research teams. About half the teams substantially exceeded the performance of a competitive rule-based baseline, illustrating not only the merits of statistical methods for dialog state tracking but also the difficulty of the problem.

Index Terms: Dialog state tracking, spoken dialog systems, spoken language understanding.

1. INTRODUCTION

Task-oriented spoken dialog systems interact with users using natural language to help them achieve a goal. As the interaction progresses, the dialog manager maintains a representation of the state of the dialog in a process called dialog state tracking (DST). For example, in a tourist information system, the dialog state might indicate the type of business the user is searching for (pub, restaurant, coffee shop), and further constraints such as their desired price range and type of food served.
Dialog state tracking is difficult because automatic speech recognition (ASR) and spoken language understanding (SLU) errors are common, and can cause the system to misunderstand the user. At the same time, state tracking is crucial because the system relies on the estimated dialog state to choose actions, for example, which restaurants to suggest.

The dialog state tracking challenge is a series of community challenge tasks that enables study of the state tracking problem using common corpora of human-computer dialogs and common evaluation methods. The first dialog state tracking challenge (DSTC1) used data from a bus timetable domain [1]. The second (DSTC2) used restaurant information dialogs, and added emphasis on handling user goal changes [2]. Entries to these challenges broke new ground in dialog state tracking, including the use of conditional random fields [3, 4, 5], sophisticated and robust hand-crafted rules [6], neural networks and recurrent neural networks [7, 8], multi-domain learning [9], and web-style ranking [10].

This paper presents results from the third dialog state tracking challenge (DSTC3). Compared to previous DSTCs, the main feature of this challenge is that it studies the problem of handling new entity (slot) types and values. For example, the training data for DSTC3 covered only restaurants, but the test data also included pubs and coffee shops. In addition, the test data included slots not in the training data, such as whether a coffee shop had internet, or whether a pub had a TV. This type of generalization is crucial for deploying real-world dialog systems, and had not been studied in a controlled fashion before. Seven teams participated, submitting a total of 28 dialog state trackers. The fully labelled dialog data, tracker output, evaluation scripts and baseline trackers are provided on the DSTC2/3 website 1. This paper first describes the data and evaluation methods used in this challenge, in sections 2-3.
Next, the results from the 7 teams are analyzed in section 4, with a particular emphasis on the problem of handling new slots not in the training data. Section 5 concludes.

2. CHALLENGE OVERVIEW

This challenge was very similar in design to the second dialog state tracking challenge [2]. This section gives a summary of the design, with particular emphasis on the new aspects. Full details are given in [13].

2.1. Challenge design

The data used in the challenge is taken from human-computer dialogs in which people are searching for information about restaurants, pubs, and coffee shops in Cambridge, UK. As in DSTC2, the callers are paid crowd-sourced users with a given task. Users may specify constraints (such as price range), and may query for information such as a business's address. Constraints and queries are drawn from a common, provided ontology of slots and slot values (see table 1).

Thus, in this challenge, the dialog state includes: (1) the goal constraints, the set of constraints desired by the user, specified as slot-value pairs such as type=pub, pricerange=cheap; (2) the requested slots, a set of zero or more slots whose values the user wishes to hear, such as address and phone; and (3) the search method employed by the user, which is one of byconstraints (the user is searching by constraining slot/value pairs of interest), byalternatives (as in "What else do you have like that?"), byname (as in "Tell me about Brasserie Gerard"), or finished (the user is done, as in "Thanks, bye"). Each turn of each dialog is labelled with these three dialog state components, and the goal of dialog state tracking is to predict the components at each turn, given the ASR, SLU, and system output prior to that turn. This is a challenging problem because the ASR and SLU often contain errors and conflicting information on their N-best lists.

Like the previous challenges, DSTC3 studies the problem of dialog state tracking as a corpus-based task: the challenge task is to re-run dialog state tracking over a test corpus of dialogs. A corpus-based challenge means all trackers are evaluated on the same dialogs, allowing direct comparison between trackers. There is also no need for teams to expend time and money in building an end-to-end system and attracting users, meaning a low barrier to entry.

Handling new unseen slots and values is a crucial step toward enabling dialog state tracking to adapt to new domains. To study this, and unlike past DSTCs, the test data includes slots and slot values which are not present in the training data.

1 camdial.org/~mh521/dstc/
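The three dialog state components just described can be pictured as a small data structure. The sketch below is purely illustrative; the class and field names are ours, not a format prescribed by the challenge:

```python
from dataclasses import dataclass, field

@dataclass
class DialogState:
    """Illustrative dialog state for the tourist-information domain."""
    # (1) Goal constraints: slot -> value pairs desired by the user.
    goal_constraints: dict = field(default_factory=dict)
    # (2) Requested slots: zero or more slots the user wants to hear.
    requested_slots: set = field(default_factory=set)
    # (3) Search method: byconstraints, byalternatives, byname, or finished.
    method: str = "byconstraints"

# Example state after "I want a cheap pub; what's the address and phone number?"
state = DialogState(
    goal_constraints={"type": "pub", "pricerange": "cheap"},
    requested_slots={"addr", "phone"},
    method="byconstraints",
)
```

A tracker's job is then to output, at every turn, a scored set of hypotheses over each of these components.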
In particular, whereas the training data included dialogs about only restaurants, the test data included coffee shops and pubs, two new values for the type slot. The sets of possible values for slots present in the training set changed in the test set, and several new slots were also introduced: near, which indicates nearby landmarks such as Queens College, and three binary slots: childrenallowed, hastv, and hasinternet. Table 1 gives full details. When a tracker is deployed, it will inevitably alter the performance of the dialog system it is part of, relative to any previously collected dialogs. The inclusion of new slots in the test data ensures that simply fitting the distribution in the training set will not result in good performance.

2.2. Data

A corpus of 2,275 dialogs was collected using paid crowd-sourced workers, as part of a study into the Natural Actor and Belief Critic algorithms for parameter and policy learning in POMDP dialog systems [11]. A set of 11 labelled dialogs was published in advance for debugging; the rest comprised a large test set used for evaluation. The training set consisted of a large quantity of data from DSTC2, in a smaller domain (see table 1).

    Slot             Train  Test  Informable
    type             1*     3     yes
    area             5      15    yes
    food                          yes
    name                          yes
    pricerange       3      4     yes
    addr                          no
    phone                         no
    postcode                      no
    near                    52    yes
    hastv                   2     yes
    hasinternet             2     yes
    childrenallowed         2     yes

Table 1. Ontology used in DSTC3 for tourist information, with the number of possible values for each slot in the train and test sets. Counts do not include the special Dontcare value. All slots are requestable, and all slots are present in the test set. (*) For the type slot, 1 value was present at training time (restaurant), and 3 values were present at test time (restaurant, pub, coffee shop).

Table 2 gives details of the train and test sets, including the Word Error Rate (WER) of the top hypothesis from the Automatic Speech Recognition (ASR), and the F-score of the top Spoken Language Understanding (SLU) hypothesis, which is calculated as in [12].
One key mismatch is the frequency of goal changes: it is much more common in the training data for the user to change their mind about their constraint on a slot (most often when the system informs them that their existing constraint cannot be satisfied).

             # Dialogs  Goal Changes  WER    F-score
    Train    3,…        …             28.1%  74.3%
    Test     2,275      …             31.5%  78.1%

Table 2. Statistics for the Train and Test sets. Goal Changes is the percentage of dialogs in which the user changed their mind for at least one slot. Word Error Rate and F-score are on the top ASR and SLU hypotheses respectively.

3. EVALUATION

A tracker is asked to output a distribution over the three dialog state components (goal constraints, requested slots, and search method) as described in section 2.1. To allow evaluation of the tracker output, the single correct dialog state at each turn is labelled. Labelling of the dialog state is facilitated by first labelling each user utterance with its semantic representation, in the dialog act format described in [13]. The semantic labelling was achieved by first crowd-sourcing the transcription of the audio to text. Next, a semantic decoder was run over the transcriptions, and the authors corrected the decoder's results by hand. Given the sequence of machine actions and user actions, both represented semantically, the true dialog state is computed deterministically using a simple set of rules.

The components of the dialog state (the goal constraint for each slot, the requested slots, and the search method) are each evaluated separately by comparing the tracker output to the correct label. The joint over the goal constraints is evaluated in the same way, where the tracker may either explicitly enumerate and score its joint hypotheses, or let the joint be computed as the product of the distributions over the slots.

A bank of metrics is calculated in the evaluation. The full set of metrics is described in [2], including mean reciprocal rank, average probability, log probability and update accuracy. This section defines the Accuracy, L2 and ROC V2 CA05 metrics, which are the featured metrics of the evaluation. These metrics were chosen to be featured in DSTC2 and DSTC3 as they each represent one of three groups of mostly uncorrelated metrics found in DSTC1 [1]. Accuracy is a measure of 1-best quality, and is the fraction of turns where the top hypothesis is correct. L2 gives a measure of the quality of the tracker scores as probability distributions, and is the square of the l2 norm between the distribution and the correct label (a delta distribution). The ROC V2 metrics look at receiver operating characteristic (ROC) curves, and measure the discrimination in the tracker's output. Correct accepts (CA), false accepts (FA) and false rejects (FR) are calculated as fractions of correctly classified utterances, meaning the values always reach 100% regardless of the accuracy. These metrics measure discrimination independently of the accuracy, and are therefore only comparable between trackers with similar accuracies. Multiple metrics are derived from the ROC statistics, including ROC V2 CA05, the correct acceptance rate at a false-acceptance rate of 0.05.

Two schedules are used to decide which turns to include when computing each metric.
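Concretely, the per-turn Accuracy and L2 computations defined above amount to only a few lines. The sketch below is illustrative (the function name and data layout are ours, not taken from the challenge evaluation scripts):

```python
def accuracy_and_l2(distribution, correct_label):
    """Per-turn featured metrics for one dialog state component.

    distribution: dict mapping each hypothesis to its tracker score,
    interpreted as a probability; any value absent from it scores 0.
    correct_label: the single labelled correct value for this turn.
    """
    # Accuracy: 1 if the top-scoring hypothesis is the correct label.
    top = max(distribution, key=distribution.get)
    acc = 1.0 if top == correct_label else 0.0
    # L2: squared l2 norm between the distribution and the delta
    # distribution that puts probability 1 on the correct label.
    hyps = set(distribution) | {correct_label}
    l2 = sum((distribution.get(h, 0.0) - (1.0 if h == correct_label else 0.0)) ** 2
             for h in hyps)
    return acc, l2
```

Averaging these per-turn values over the turns selected by a schedule yields the reported numbers; lower L2 is better.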
Schedule 1 includes every turn. Schedule 2 only includes a turn if any SLU hypothesis up to and including that turn contains some information about the component of the dialog state in question, or if the correct label is not None. For a goal constraint, for example, this means the slot has appeared with a value in any SLU hypothesis, an affirm/negate act has appeared after a system confirmation of the slot, or the user has in fact informed the slot regardless of the SLU.

The data is labelled using two schemes. The first, scheme A, is considered the standard labelling of the dialog state. Under this scheme, each component of the state is defined as the most recently asserted value given by the user. The None value is used to indicate that a value is yet to be given. A second labelling scheme, scheme B, is included in the evaluation, where labels are propagated backwards through the dialog. This labelling scheme is designed to assess whether a tracker is able to predict a user's intention before it has been stated. Under scheme B, the label at the current turn for a particular component of the dialog state is taken to be the next value which the user settles on, and is reset in the case of goal constraints if the slot-value pair is given in a canthelp act by the system (i.e. the system has informed the user that this constraint is not satisfiable).

The featured metrics (Accuracy, L2 and ROC V2 CA05) are calculated using schedule 2 and labelling scheme A for the joint goal constraints, the search method and the requested slots. This gives 9 numbers altogether. Note that all combinations of schedules, labelling schemes, metrics and state components give a total of 1,265 metrics reported per tracker in the full results, available online.

3.1. Baseline trackers

Four baseline trackers are included in the results, under the ID team0. Source code for all the baseline systems is available on the DSTC website.
The first (team0, entry0) follows simple rules commonly used in spoken dialog systems. It gives a single hypothesis for each slot, whose value is the top-scoring suggestion so far in the dialog. Note that this tracker does not account well for goal constraint changes; the hypothesised value for a slot will only change if a new value occurs with a higher confidence.

The focus baseline (team0, entry1) includes a simple model of changing goal constraints. The belief in the goal constraint s = v at turn t, P(s = v)_t, is updated using the rule:

    P(s = v)_t = q_t · P(s = v)_{t-1} + SLU(s = v)_t

where 0 ≤ SLU(s = v)_t ≤ 1 is the evidence for s = v given by the SLU in turn t, and q_t = 1 − Σ_{v′} SLU(s = v′)_t.

Two further baseline trackers (team0, entry2 and entry3) are included. These are based on the tracker presented in [14], and use a selection of domain-independent rules to update the beliefs, similar to the focus baseline.

4. RESULTS

In total, 7 research teams participated, submitting a total of 28 trackers. Appendix A gives the featured metrics for all submitted trackers, and also indicates whether each tracker used the SLU and/or ASR as input. Tracker output and full evaluation reports (as well as scripts to recreate the results) are available on the DSTC website.

The baseline trackers proved strong competition, with only around half of the entries beating the top baseline (team0, entry2) in terms of joint goal accuracy. Figure 1 shows the fraction of turns where each tracker was better or worse than this baseline for joint goal accuracy. Results are shown for the best-performing entry for each team. Better means the tracker output the correct user goal where the baseline was incorrect; worse means the tracker output the incorrect user goal where the baseline was correct. This shows that even high-performing trackers, such as those of teams 3 and 4, which in total make fewer errors than the baseline, still make some errors that the baselines do not.
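Returning to the focus baseline defined above, its per-slot update can be sketched in a few lines (illustrative naming; the released baseline source differs in structure):

```python
def focus_update(prev_belief, slu_evidence):
    """One turn of the focus rule for a single slot s.

    prev_belief: dict value -> P(s = v) from the previous turn.
    slu_evidence: dict value -> SLU(s = v), the SLU evidence this
    turn (scores in [0, 1], summing to at most 1).
    """
    # q_t = 1 - sum_v' SLU(s = v')_t: the probability mass the SLU
    # leaves for carrying forward the previous turn's belief.
    q = 1.0 - sum(slu_evidence.values())
    values = set(prev_belief) | set(slu_evidence)
    return {v: q * prev_belief.get(v, 0.0) + slu_evidence.get(v, 0.0)
            for v in values}
```

For example, a belief of 1.0 in pricerange=cheap followed by SLU evidence of 0.8 for pricerange=expensive yields 0.2 for cheap and 0.8 for expensive: strong new evidence largely overwrites the old constraint, modelling a goal change.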
Fig. 1. Fraction of 17,667 dialog turns where the best tracker entry from each team was better or worse than the best baseline (team0, entry2) for schedule 2a joint goal accuracy on the test set.

Figure 2 shows the same analysis for an SLU-based oracle tracker, again for the best-performing entry for each team. This tracker considers the items on the SLU N-best list; it is an oracle in the sense that, if a slot/value pair appears that corresponds to the user's goal, it is added to the state with confidence 1.0. In other words, when the user's goal appears somewhere in the SLU N-best list, the oracle always achieves perfect accuracy. The only errors made by the oracle are omissions of slot/value pairs which have not appeared on any SLU N-best list. Figure 2 shows that for teams 2, 3, 4 and 5, 3-7% of tracker turns outperformed the oracle. These teams also used ASR features, which suggests they were successfully using ASR results to infer new slot/value pairs. Unsurprisingly, despite these gains, no team was able to achieve a net performance gain over the oracle.

Fig. 2. Fraction of 17,667 dialog turns where the best tracker entry from each team was better or worse than the oracle tracker for schedule 2a joint goal accuracy on the test set.

4.1. Tracking unseen slots

Figure 3 shows the performance of each team on tracking slots for which training data was given, and slots unseen in training. Some teams performed worse on the slots for which examples existed (such as teams 1, 2 and 7). This may be evidence of over-tuning in training, if systems attempted to tune to the seen slots, but defaulted to general models for the unseen slots. Generalization to new conditions was found to be a key limitation of some approaches in DSTC1 and DSTC2, where, for example, trackers often over-estimated their performance relative to a baseline on development sets. Performance on the individual slots is detailed in appendix A.
No tracker was able to beat the top baseline accuracy on the childrenallowed slot; however, this may be influenced by a small error in labelling found by the authors, which affected 14 turns (out of 17,677 total).

Fig. 3. Performance of each team's best entry under schedule 2a relative to the best baseline (team0, entry2) on Old slots and New slots, i.e. slots found and not found in the training data respectively. Recall that a lower L2 score is better.

4.2. Types of errors

Following [15], for tracking the user's goal three types of slot-level errors can be distinguished:

- Wrong: the user's goal contains a value for a slot, and the tracker outputs an incorrect value for that slot.
- Extra: the user's goal does not contain a value for a slot, and the tracker outputs a value for that slot.
- Missing: the user's goal contains a value for a slot, and the tracker does not output a value for that slot.

Note that a single turn may have multiple slot-level errors. Figure 4 shows the average number of slot-level errors per turn for the best entry from each team, including the best baseline. This figure also shows the average number of correct slots per turn. Missing slot errors account for most of the variation in performance, whereas wrong slot errors were rather consistent across teams.

5. CONCLUSIONS

The third Dialog State Tracking Challenge built on the tradition of the first two DSTCs in providing an evaluation of the
state of the art in state tracking, with a particular focus on the ability of trackers to generalize to an extended domain.

Fig. 4. Average number of slots in error per turn (bar chart, left axis), and average number of correct slots per turn (black diamonds, right axis) for the best tracker from each team, for schedule 2a joint goal accuracy on the test set. See text for explanation of error types.

Results of the blind evaluation show that around half the teams were able to beat the competitive rule-based baseline in terms of joint goal accuracy. Several teams were found to perform better on new parts of the dialog state than they did on parts for which training examples existed. This may be an example of failing to generalize slot-specific models to new conditions, an issue found in the first two challenges. Studying dialog state tracking as an offline corpus task has advantages, and has led to notable advances in the field, but it is clear that more work should be done to verify that improvement on these metrics translates to higher-quality end-to-end dialog systems.

Acknowledgements

The authors would like to thank the DSTC advisory committee and those on the DST mailing list for their invaluable contributions. The authors also thank Zhuoran Wang for providing a baseline tracker. Finally, thanks to SIGdial for their endorsement, SLT for providing a special session, and the participants for their hard work in creating high-quality submissions.

6. REFERENCES

[1] Jason D. Williams, Antoine Raux, Deepak Ramachandran, and Alan Black, The Dialog State Tracking Challenge, in Proceedings of SIGDIAL, August 2013.
[2] Matthew Henderson, Blaise Thomson, and Jason D. Williams, The Second Dialog State Tracking Challenge, in Proceedings of SIGDIAL, 2014.
[3] Sungjin Lee and Maxine Eskenazi, Recipe For Building Robust Spoken Dialog State Trackers: Dialog State Tracking Challenge System Description, in Proceedings of SIGDIAL, 2013.
[4] Sungjin Lee, Structured Discriminative Model For Dialog State Tracking, in Proceedings of SIGDIAL, 2013.
[5] Hang Ren, Weiqun Xu, Yan Zhang, and Yonghong Yan, Dialog State Tracking using Conditional Random Fields, in Proceedings of SIGDIAL, 2013.
[6] Zhuoran Wang and Oliver Lemon, A simple and generic belief tracking mechanism for the dialog state tracking challenge: On the believability of observed information, in Proceedings of SIGDIAL, 2013.
[7] Matthew Henderson, Blaise Thomson, and Steve Young, Deep Neural Network Approach for the Dialog State Tracking Challenge, in Proceedings of SIGDIAL, 2013.
[8] Matthew Henderson, Blaise Thomson, and Steve Young, Word-Based Dialog State Tracking with Recurrent Neural Networks, in Proceedings of SIGDIAL, 2014.
[9] Jason D. Williams, Multi-domain learning and generalization in dialog state tracking, in Proceedings of SIGDIAL, 2013.
[10] Jason D. Williams, Web-style ranking and SLU combination for dialog state tracking, in Proceedings of SIGDIAL, 2014.
[11] Matthew Henderson, Blaise Thomson, and Jason D. Williams, Dialog State Tracking Challenge 2 & 3 Handbook, camdial.org/~mh521/dstc/, 2013.
[12] Filip Jurčíček, Blaise Thomson, and Steve Young, Natural actor and belief critic: Reinforcement algorithm for learning parameters of dialogue systems modelled as POMDPs, TSLP, vol. 7, 2011.
[13] Matthew Henderson, Milica Gašić, Blaise Thomson, Pirros Tsiakoulis, Kai Yu, and Steve Young, Discriminative Spoken Language Understanding Using Word Confusion Networks, in IEEE Spoken Language Technology Workshop, 2012.
[14] Zhuoran Wang and Oliver Lemon, A simple and generic belief tracking mechanism for the dialog state tracking challenge: On the believability of observed information, in Proceedings of SIGDIAL, 2013.
[15] Ronnie Smith, Comparative Error Analysis of Dialog State Tracking, in Proceedings of SIGDIAL, 2014.
Appendix A: Featured results of evaluation

[Table: for every submitted tracker (team, entry), the inputs used (SLU and/or ASR) and the featured metrics (Acc., L2, ROC) for the joint goal constraints, search method, and requested slots; the four baselines are listed as team0.]

[Figure: performance of each team's best entry under schedule 2a relative to the best baseline (team0, entry2) for the goal constraint on every slot (type, area, food, name, pricerange, childrenallowed, hasinternet, hastv, near), in Accuracy and L2. Recall that a lower L2 score is better.]
!"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"!"#"$%&#'()*+',$$-.&#',/"-0%.12'32./4'5,5'6/%&)$).2&'7./&)8'5,5'9/2%.%3%&8':")08';:
Call Recorder Oygo Manual. Version 1.001.11
Call Recorder Oygo Manual Version 1.001.11 Contents 1 Introduction...4 2 Getting started...5 2.1 Hardware installation...5 2.2 Software installation...6 2.2.1 Software configuration... 7 3 Options menu...8
A Two-Pass Statistical Approach for Automatic Personalized Spam Filtering
A Two-Pass Statistical Approach for Automatic Personalized Spam Filtering Khurum Nazir Junejo, Mirza Muhammad Yousaf, and Asim Karim Dept. of Computer Science, Lahore University of Management Sciences
Language technologies for Education: recent results by the MLLP group
Language technologies for Education: recent results by the MLLP group Alfons Juan 2nd Internet of Education Conference 2015 18 September 2015, Sarajevo Contents The MLLP research group 2 translectures
Performance Measures in Data Mining
Performance Measures in Data Mining Common Performance Measures used in Data Mining and Machine Learning Approaches L. Richter J.M. Cejuela Department of Computer Science Technische Universität München
Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition
, Lisbon Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition Wolfgang Macherey Lars Haferkamp Ralf Schlüter Hermann Ney Human Language Technology
(Refer Slide Time: 01:52)
Software Engineering Prof. N. L. Sarda Computer Science & Engineering Indian Institute of Technology, Bombay Lecture - 2 Introduction to Software Engineering Challenges, Process Models etc (Part 2) This
How To Solve The Kd Cup 2010 Challenge
A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China [email protected] [email protected]
Understanding Web personalization with Web Usage Mining and its Application: Recommender System
Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,
Azure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
What makes a good process?
Rob Davis Everyone wants a good process. Our businesses would be more profitable if we had them. But do we know what a good process is? Would we recognized one if we saw it? And how do we ensure we can
IBM SPSS Statistics for Beginners for Windows
ISS, NEWCASTLE UNIVERSITY IBM SPSS Statistics for Beginners for Windows A Training Manual for Beginners Dr. S. T. Kometa A Training Manual for Beginners Contents 1 Aims and Objectives... 3 1.1 Learning
Performance Measures for Machine Learning
Performance Measures for Machine Learning 1 Performance Measures Accuracy Weighted (Cost-Sensitive) Accuracy Lift Precision/Recall F Break Even Point ROC ROC Area 2 Accuracy Target: 0/1, -1/+1, True/False,
Evaluation of speech technologies
CLARA Training course on evaluation of Human Language Technologies Evaluations and Language resources Distribution Agency November 27, 2012 Evaluation of speaker identification Speech technologies Outline
Index Terms Domain name, Firewall, Packet, Phishing, URL.
BDD for Implementation of Packet Filter Firewall and Detecting Phishing Websites Naresh Shende Vidyalankar Institute of Technology Prof. S. K. Shinde Lokmanya Tilak College of Engineering Abstract Packet
2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
Intrusion Detection via Machine Learning for SCADA System Protection
Intrusion Detection via Machine Learning for SCADA System Protection S.L.P. Yasakethu Department of Computing, University of Surrey, Guildford, GU2 7XH, UK. [email protected] J. Jiang Department
Knowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs [email protected] Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
Revel8or: Model Driven Capacity Planning Tool Suite
Revel8or: Model Driven Capacity Planning Tool Suite Liming Zhu 1,2, Yan Liu 1,2, Ngoc Bao Bui 1,2,Ian Gorton 3 1 Empirical Software Engineering Program, National ICT Australia Ltd. 2 School of Computer
SAMPLE BOOKLET Published July 2015
National curriculum tests Key stage 2 Mathematics Mark schemes SAMPLE BOOKLET Published July 2015 This sample test indicates how the national curriculum will be assessed from 2016. Further information
Automatic slide assignation for language model adaptation
Automatic slide assignation for language model adaptation Applications of Computational Linguistics Adrià Agustí Martínez Villaronga May 23, 2013 1 Introduction Online multimedia repositories are rapidly
Chapter 6: The Information Function 129. CHAPTER 7 Test Calibration
Chapter 6: The Information Function 129 CHAPTER 7 Test Calibration 130 Chapter 7: Test Calibration CHAPTER 7 Test Calibration For didactic purposes, all of the preceding chapters have assumed that the
Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast
Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Hassan Sawaf Science Applications International Corporation (SAIC) 7990
Attribution. Modified from Stuart Russell s slides (Berkeley) Parts of the slides are inspired by Dan Klein s lecture material for CS 188 (Berkeley)
Machine Learning 1 Attribution Modified from Stuart Russell s slides (Berkeley) Parts of the slides are inspired by Dan Klein s lecture material for CS 188 (Berkeley) 2 Outline Inductive learning Decision
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
11. Analysis of Case-control Studies Logistic Regression
Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
Measurement in ediscovery
Measurement in ediscovery A Technical White Paper Herbert Roitblat, Ph.D. CTO, Chief Scientist Measurement in ediscovery From an information-science perspective, ediscovery is about separating the responsive
Prevention of Spam over IP Telephony (SPIT)
General Papers Prevention of Spam over IP Telephony (SPIT) Juergen QUITTEK, Saverio NICCOLINI, Sandra TARTARELLI, Roman SCHLEGEL Abstract Spam over IP Telephony (SPIT) is expected to become a serious problem
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
Learning More Effective Dialogue Strategies Using Limited Dialogue Move Features
Learning More Effective Dialogue Strategies Using Limited Dialogue Move Features Matthew Frampton and Oliver Lemon HCRC, School of Informatics University of Edinburgh Edinburgh, EH8 9LW, UK [email protected],
Machine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
technical tips and tricks
technical tips and tricks Document author: Produced by: Displaying a Critical Path Andy Jessop Project Learning International Limited The tips and tricks below are taken from Project Mentor, the smart
Artificial Intelligence
Artificial Intelligence ICS461 Fall 2010 1 Lecture #12B More Representations Outline Logics Rules Frames Nancy E. Reed [email protected] 2 Representation Agents deal with knowledge (data) Facts (believe
CIKM 2015 Melbourne Australia Oct. 22, 2015 Building a Better Connected World with Data Mining and Artificial Intelligence Technologies
CIKM 2015 Melbourne Australia Oct. 22, 2015 Building a Better Connected World with Data Mining and Artificial Intelligence Technologies Hang Li Noah s Ark Lab Huawei Technologies We want to build Intelligent
Measuring Performance in a Biometrics Based Multi-Factor Authentication Dialog. A Nuance Education Paper
Measuring Performance in a Biometrics Based Multi-Factor Authentication Dialog A Nuance Education Paper 2009 Definition of Multi-Factor Authentication Dialog Many automated authentication applications
Data Quality Assessment. Approach
Approach Prepared By: Sanjay Seth Data Quality Assessment Approach-Review.doc Page 1 of 15 Introduction Data quality is crucial to the success of Business Intelligence initiatives. Unless data in source
THE RECRUITMENT INDUSTRY ONLINE
THE RECRUITMENT INDUSTRY ONLINE AS A LEADING DIGITAL MARKETING AGENCY WITHIN THE RECRUITMENT INDUSTRY ENCENDO UNDERTOOK A STUDY IN MAY 2015 INTO THE ATTITUDES AND SPEND OF RECRUITMENT COMPANIES IN IRELAND
Project Code: SPBX. Project Advisor : Aftab Alam. Project Team: Umair Ashraf 03-1853 (Team Lead) Imran Bashir 02-1658 Khadija Akram 04-0080
Test Cases Document VOIP SOFT PBX Project Code: SPBX Project Advisor : Aftab Alam Project Team: Umair Ashraf 03-1853 (Team Lead) Imran Bashir 02-1658 Khadija Akram 04-0080 Submission Date:23-11-2007 SPBX
Predicting the Stock Market with News Articles
Predicting the Stock Market with News Articles Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is
Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer
Machine Learning Chapter 18, 21 Some material adopted from notes by Chuck Dyer What is learning? Learning denotes changes in a system that... enable a system to do the same task more efficiently the next
6.2.8 Neural networks for data mining
6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural
Visual Structure Analysis of Flow Charts in Patent Images
Visual Structure Analysis of Flow Charts in Patent Images Roland Mörzinger, René Schuster, András Horti, and Georg Thallinger JOANNEUM RESEARCH Forschungsgesellschaft mbh DIGITAL - Institute for Information
Automatic Network Protocol Analysis
Gilbert Wondracek, Paolo M ilani C omparetti, C hristopher Kruegel and E ngin Kirda {gilbert,pmilani}@ seclab.tuwien.ac.at chris@ cs.ucsb.edu engin.kirda@ eurecom.fr Reverse Engineering Network Protocols
Main Effects and Interactions
Main Effects & Interactions page 1 Main Effects and Interactions So far, we ve talked about studies in which there is just one independent variable, such as violence of television program. You might randomly
CS231M Project Report - Automated Real-Time Face Tracking and Blending
CS231M Project Report - Automated Real-Time Face Tracking and Blending Steven Lee, [email protected] June 6, 2015 1 Introduction Summary statement: The goal of this project is to create an Android
Response to Critiques of Mortgage Discrimination and FHA Loan Performance
A Response to Comments Response to Critiques of Mortgage Discrimination and FHA Loan Performance James A. Berkovec Glenn B. Canner Stuart A. Gabriel Timothy H. Hannan Abstract This response discusses the
