Learning More Effective Dialogue Strategies Using Limited Dialogue Move Features
Matthew Frampton and Oliver Lemon
HCRC, School of Informatics, University of Edinburgh, Edinburgh, EH8 9LW, UK

Abstract

We explore the use of restricted dialogue contexts in reinforcement learning (RL) of effective dialogue strategies for information-seeking spoken dialogue systems (e.g. COMMUNICATOR (Walker et al., 2001)). The contexts we use are richer than in previous research in this area, e.g. (Levin and Pieraccini, 1997; Scheffler and Young, 2001; Singh et al., 2002; Pietquin, 2004), which uses only slot-based information, but are much less complex than the full dialogue Information States explored in (Henderson et al., 2005), for which tractable learning is an issue. We explore how incrementally adding richer features allows learning of more effective dialogue strategies. We use 2 user simulations learned from COMMUNICATOR data (Walker et al., 2001; Georgila et al., 2005b) to explore the effects of different features on learned dialogue strategies. Our results show that adding the dialogue moves of the last system and user turns increases the average reward of the automatically learned strategies by 65.9% over the original (hand-coded) COMMUNICATOR systems, and by 7.8% over a baseline RL policy that uses only slot-status features. We show that the learned strategies exhibit an emergent focus-switching strategy and effective use of the give help action.

1 Introduction

Reinforcement Learning (RL) applied to the problem of dialogue management attempts to find optimal mappings from dialogue contexts to system actions. The idea of using Markov Decision Processes (MDPs) and reinforcement learning to design dialogue strategies for dialogue systems was first proposed by (Levin and Pieraccini, 1997). There, and in subsequent work such as (Singh et al., 2002; Pietquin, 2004; Scheffler and Young, 2001), only very limited state information was used in strategy learning, based always on the number and status of filled information slots in the application (e.g. departure-city is filled, destination-city is unfilled). This we refer to as low-level contextual information. Much prior work (Singh et al., 2002) concentrated only on specific strategy decisions (e.g. confirmation and initiative strategies), rather than the full problem of what system dialogue move to take next.

The simple strategies learned for low-level definitions of state cannot be sensitive to (sometimes critical) aspects of the dialogue context, such as the user's last dialogue move (DM) (e.g. request-help), unless that move directly affects the status of an information slot (e.g. provide-info(destination-city)). We refer to additional contextual information such as the system's and user's last dialogue moves as high-level contextual information.

(Frampton and Lemon, 2005) learned full strategies with limited high-level information (i.e. the dialogue move(s) of the last user utterance) and only used a stochastic user simulation whose probabilities were supplied via common sense and intuition, rather than learned from data. This paper uses data-driven n-gram user simulations (Georgila et al., 2005a) and a richer dialogue context. On the other hand, increasing the size of the state space for RL has the danger of making the learning problem intractable, and at the very least means that data is more sparse and state approximation methods may need to be used (Henderson et al., 2005).
To date, the use of very large state spaces relies on a hybrid supervised/reinforcement learning technique, where the reinforcement learning element has not yet been shown to significantly improve policies over the purely supervised case (Henderson et al., 2005).
The extended state spaces that we propose are based on theories of dialogue such as (Clark, 1996; Searle, 1969; Austin, 1962; Larsson and Traum, 2000), in which the actions a dialogue participant can or should take next are not based solely on the task state (i.e. in our domain, which slots are filled), but also on wider contextual factors such as a user's dialogue moves or speech acts. In future work we also intend to use feature selection techniques (e.g. correlation-based feature subset (CFS) evaluation (Rieser and Lemon, 2006)) on the COMMUNICATOR data (Georgila et al., 2005a; Walker et al., 2001) in order to identify additional context features that it may be effective to represent in the state.

1.1 Methodology

To explore these issues we have developed a Reinforcement Learning (RL) program to learn dialogue strategies while accurate simulated users (Georgila et al., 2005a) converse with a dialogue manager. See (Singh et al., 2002; Scheffler and Young, 2001) and (Sutton and Barto, 1998) for a detailed description of Markov Decision Processes and the relevant RL algorithms.

In dialogue management we are faced with the problem of deciding which dialogue actions it is best to perform in different states. We use RL because it is a method of learning by delayed reward using trial-and-error search. These two properties appear to make RL techniques a good fit with the problem of automatically optimising dialogue strategies, because in task-oriented dialogue the reward (e.g. successfully booking a flight) is often not obtainable immediately, and the large space of possible dialogues for any task makes some degree of trial-and-error exploration necessary.

We use both 4-gram and 5-gram user simulations for testing and for training (i.e. train with the 4-gram, test with the 5-gram, and vice-versa). These simulations also simulate ASR errors, since their probabilities are learned from recognition hypotheses and system behaviour logged in the COMMUNICATOR data (Walker et al., 2001), further annotated with speech acts and contexts by (Georgila et al., 2005b). Here the task domain is flight-booking, and the aim for the dialogue manager is to obtain values for the user's flight information slots, i.e. departure city, destination city, departure date and departure time, before making a database query. We add the dialogue moves of the last user and system turns as context features and use these in strategy learning. We compare the learned strategies to 2 baselines: the original COMMUNICATOR systems and an RL strategy which uses only slot-status features.

1.2 Outline

Section 2 contains a description of our basic experimental framework, and a detailed description of the reinforcement learning component and user simulations. Sections 3 and 4 describe the experiments and analyse our results, and in section 5 we conclude and suggest future work.

2 The Experimental Framework

Each experiment is executed using the DIPPER Information State Update dialogue manager (Bos et al., 2003) (which here is used to track and update dialogue context rather than to decide which actions to take), a Reinforcement Learning program (which determines the next dialogue action to take), and various user simulations. In sections 2.3 and 2.4 we give more details about the reinforcement learner and the user simulations.

2.1 The action set for the learner

Below is a list of all the different actions that the RL dialogue manager can take and must learn to choose between based on the context:

1. An open question, e.g. "How may I help you?"
2. Ask the value for any one of slots 1...n.
3. Explicitly confirm any one of slots 1...n.
4. Ask for the nth slot whilst implicitly confirming either slot value n-1 (e.g. "So you want to fly from London to where?") or slot value n+1. (Where n = 1 we implicitly confirm the final slot and where n = 4 we implicitly confirm the first slot. This action set does not include actions that ask for the nth slot whilst implicitly confirming slot value n±2; these will be added in future experiments as we continue to increase the action and state space.)
5. Give help.
6. Pass to human operator.
7. Database Query.

There are a couple of restrictions regarding which actions can be taken in which states: an open question is only available at the start of the dialogue, and the dialogue manager can only try to confirm non-empty slots.
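To make this action set and its restrictions concrete, here is a minimal Python sketch (the function and slot names are illustrative, not the system's actual implementation) that enumerates which of the seven actions are available in a given state:

```python
SLOTS = ["dep_city", "dest_city", "dep_date", "dep_time"]  # slots 1..4

def available_actions(slot_status, dialogue_started):
    """Return the actions available in a state.
    slot_status maps each slot to 'empty', 'filled' or 'confirmed'."""
    actions = []
    if not dialogue_started:
        actions.append(("open_question",))               # 1. only at the start of the dialogue
    for i, slot in enumerate(SLOTS):
        actions.append(("ask_slot", slot))               # 2. ask the value of this slot
        if slot_status[slot] != "empty":
            actions.append(("explicit_confirm", slot))   # 3. only non-empty slots can be confirmed
        # 4. ask this slot whilst implicitly confirming an adjacent, non-empty slot
        #    (slot 1 pairs with slot 4 and vice versa)
        for j in (i - 1, i + 1):
            neighbour = SLOTS[j % len(SLOTS)]
            if slot_status[neighbour] != "empty":
                actions.append(("ask_implicit_confirm", slot, neighbour))
    actions.append(("give_help",))                       # 5.
    actions.append(("pass_to_operator",))                # 6.
    actions.append(("database_query",))                  # 7.
    return actions
```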
2.2 The Reward Function

We employ an all-or-nothing reward function which is as follows:

1. Database query, all slots confirmed:
2. Any other database query: 75
3. User simulation hangs-up:
4. DIPPER passes to a human operator:
5. Each system turn: 5

To maximise the chances of a slot value being correct, it must be confirmed rather than just filled. The reward function reflects the fact that a successful dialogue manager must maximise its chances of getting the slots correct, i.e. they must all be confirmed. (Walker et al., 2000) showed with the PARADISE evaluation that confirming slots increases user satisfaction. The maximum reward that can be obtained for a single dialogue is 85 (the dialogue manager prompts the user, the user replies by filling all four of the slots in a single utterance, and the dialogue manager then confirms all four slots and submits a database query).

2.3 The Reinforcement Learner's Parameters

When the reinforcement learner agent is initialized, it is given a parameter string which includes the following:

1. Step Parameter: α = decreasing
2. Discount Factor: γ = 1
3. Action Selection Type = softmax (alternative is ε-greedy)
4. Action Selection Parameter: temperature =
5. Eligibility Trace Parameter: λ =
6. Eligibility Trace = replacing (alternative is accumulating)
7. Initial Q-values = 25

The reinforcement learner updates its Q-values using the Sarsa(λ) algorithm (see (Sutton and Barto, 1998)). The first parameter is the step parameter α, which may be a value between 0 and 1, or specified as decreasing. If it is decreasing, as it is in our experiments, then for any given Q-value update α is 1/k, where k is the number of times that the state-action pair for which the update is being performed has been visited. This kind of step parameter will ensure that, given a sufficient number of training dialogues, each of the Q-values will eventually converge. The second parameter (discount factor) γ may take a value between 0 and 1. For the dialogue management problem we set it to 1 so that future rewards are taken into account as strongly as possible.

Apart from updating Q-values, the reinforcement learner must also choose the next action for the dialogue manager, and the third parameter specifies whether it does this by ε-greedy or softmax action selection (here we have used softmax). The fifth parameter, the eligibility trace parameter λ, may take a value between 0 and 1, and the sixth parameter specifies whether the eligibility traces are replacing or accumulating. We used replacing traces because they produced faster learning for the slot-filling task. The seventh parameter is for supplying the initial Q-values.
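Read as an update rule, these settings correspond to standard tabular Sarsa(λ) with softmax action selection (Sutton and Barto, 1998). The following sketch illustrates that algorithm under the settings above (γ = 1, decreasing α = 1/k, replacing traces, initial Q-values of 25); the data structures and helper names are illustrative, not the learner's actual code:

```python
import math
import random
from collections import defaultdict

GAMMA = 1.0          # discount factor, as in the parameter list above
INITIAL_Q = 25.0     # initial Q-values, as in the parameter list above

Q = defaultdict(lambda: INITIAL_Q)   # Q[(state, action)]
visits = defaultdict(int)            # per state-action visit counts, for alpha = 1/k
traces = defaultdict(float)          # eligibility traces e[(state, action)]

def softmax_action(state, actions, temperature):
    """Softmax (Boltzmann) action selection over the available actions."""
    prefs = [math.exp(Q[(state, a)] / temperature) for a in actions]
    total = sum(prefs)
    r = random.uniform(0.0, total)
    acc = 0.0
    for action, p in zip(actions, prefs):
        acc += p
        if r <= acc:
            return action
    return actions[-1]

def sarsa_lambda_update(s, a, reward, s_next, a_next, lam, terminal=False):
    """One Sarsa(lambda) update with replacing traces and a decreasing step
    parameter alpha = 1/k, where k counts visits to each state-action pair."""
    visits[(s, a)] += 1
    traces[(s, a)] = 1.0                       # replacing, not accumulating, trace
    target = reward + (0.0 if terminal else GAMMA * Q[(s_next, a_next)])
    delta = target - Q[(s, a)]
    for (si, ai), e in list(traces.items()):
        alpha = 1.0 / max(visits[(si, ai)], 1)
        Q[(si, ai)] += alpha * delta * e
        traces[(si, ai)] = GAMMA * lam * e     # decay all traces
```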
2.4 N-Gram User Simulations

Here user simulations, rather than real users, interact with the dialogue system during learning. This is because thousands of dialogues may be necessary to train even a simple system (here we train on up to dialogues), and for a proper exploration of the state-action space the system should sometimes take actions that are not optimal for the current situation, making it a sadistic and time-consuming procedure for any human training the system. (Eckert et al., 1997) were the first to use a user simulation for this purpose, but it was not goal-directed and so could produce inconsistent utterances. The later simulations of (Pietquin, 2004) and (Scheffler and Young, 2001) were to some extent goal-directed and also incorporated an ASR error simulation. The user simulations interact with the system via intentions. Intentions are preferred because they are easier to generate than word sequences and because they allow error modelling of all parts of the system, for example ASR error modelling and semantic errors.

The user and ASR simulations must be realistic if the learned strategy is to be directly applicable in a real system. The n-gram user simulations used here (see (Georgila et al., 2005a) for details and evaluation results) treat a dialogue as a sequence of pairs of speech acts and tasks. They take as input the n-1 most recent speech act-task pairs in the dialogue history, and based on n-gram probabilities learned from the COMMUNICATOR data (automatically annotated with speech acts and Information States (Georgila et al., 2005b)), they then output a user utterance as a further speech act-task pair. These user simulations incorporate the effects of ASR errors, since they are built from the user utterances as they were recognized by the ASR components of the original COMMUNICATOR systems. Note that the user simulations do not provide instantiated slot values, e.g. a response to provide a destination city is the speech act-task pair [provide info] [dest city]. We cannot assume that two such responses in the same dialogue refer to the same destination cities. Hence in the dialogue manager's Information State, where we record whether a slot is empty, filled, or confirmed, we only update from filled to confirmed when the slot value is implicitly or explicitly confirmed.
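In outline, each simulated user turn is produced by conditioning on the n-1 most recent speech act-task pairs and sampling the next pair from the learned n-gram distribution. The following is a minimal sketch of that idea, with illustrative data structures and no smoothing or back-off; see (Georgila et al., 2005a) for the actual models:

```python
import random
from collections import defaultdict

# Illustrative only: counts of (n-1 most recent speech act-task pairs)
# -> next user speech act-task pair, estimated from an annotated corpus.
ngram_counts = defaultdict(lambda: defaultdict(int))

def train(dialogues, n):
    """dialogues: lists of (speech_act, task) pairs in the order they occurred."""
    for dialogue in dialogues:
        padded = [("<s>", "<s>")] * (n - 1) + list(dialogue)
        for i in range(n - 1, len(padded)):
            history = tuple(padded[i - (n - 1):i])
            ngram_counts[history][padded[i]] += 1

def next_user_act(history, n):
    """Sample the next user (speech_act, task) pair given the n-1 most recent pairs."""
    context = tuple(history[-(n - 1):])
    candidates = ngram_counts.get(context)
    if not candidates:
        return ("null", "null")            # e.g. no ASR hypothesis / out of domain
    pairs, counts = zip(*candidates.items())
    return random.choices(pairs, weights=counts, k=1)[0]
```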
An additional function maps the user speech act-task pairs to a form that can be interpreted by the dialogue manager. Post-mapping user responses are made up of one or more of the following types of utterance: (1) Stay quiet, (2) Provide 1 or more slot values, (3) Yes, (4) No, (5) Ask for help, (6) Hang-up, (7) Null (out-of-domain or no ASR hypothesis). The quality of the 4- and 5-gram user simulations has been established through a variety of metrics and against the behaviour of the actual users of the COMMUNICATOR systems; see (Georgila et al., 2005a).

2.4.1 Limitations of the user simulations

The user and ASR simulations are a fundamentally important factor in determining the nature of the learned strategies. For this reason we should note the limitations of the n-gram simulations used here. A first limitation is that we cannot be sure that the COMMUNICATOR training data is sufficiently complete, and a second is that the n-gram simulations only use a window of n moves in the dialogue history. This second limitation becomes a problem when the user simulation's current move ought to take into account something that occurred at an earlier stage in the dialogue. This might result in the user simulation repeating a slot value unnecessarily, or in the chance of an ASR error for a particular word being independent of whether the same word was previously recognised correctly. The latter case means we cannot simulate, for example, a particular slot value always being liable to misrecognition. These limitations will affect the nature of the learned strategies. Different state features may assume more or less importance than they would if the simulations were more realistic. This is a point that we will return to in the analysis of the experimental results. In future work we will use the more accurate user simulations recently developed following (Georgila et al., 2005a), and we expect that these will improve our results still further.

3 Experiments

First we learned strategies with the 4-gram user simulation and tested with the 5-gram simulation, and then did the reverse. We experimented with different feature sets, exploring whether better strategies could be learned by adding limited context features. We used two baselines for comparison: the performance of the original COMMUNICATOR systems in the data set (Walker et al., 2001), and an RL baseline dialogue manager learned using only slot-status features (i.e. for each of slots 1-4, is the slot empty, filled or confirmed?). We then learned two further strategies: Strategy 2 (UDM) was learned by adding the user's last dialogue move to the state, and Strategy 3 (USDM) was learned by adding both the user's and the system's last dialogue moves to the state. The possible system and user dialogue moves were those given in sections 2.1 and 2.4 respectively, and the reward function was that described in section 2.2.
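The three state representations therefore differ only in which context features they include. A schematic sketch, with an illustrative encoding rather than the actual feature representation:

```python
# Schematic sketch (illustrative encoding) of the three state representations
# compared in the experiments. Slot status is 'empty', 'filled' or 'confirmed'.

SLOTS = ("dep_city", "dest_city", "dep_date", "dep_time")

def baseline_state(slot_status):
    """RL baseline: slot-status features only."""
    return tuple(slot_status[s] for s in SLOTS)

def udm_state(slot_status, last_user_dm):
    """Strategy 2 (UDM): slot status plus the user's last dialogue move."""
    return baseline_state(slot_status) + (last_user_dm,)

def usdm_state(slot_status, last_user_dm, last_system_dm):
    """Strategy 3 (USDM): slot status plus the system's and the user's last dialogue moves."""
    return baseline_state(slot_status) + (last_system_dm, last_user_dm)

# e.g. all slots empty, the system has just asked slot 3 and the user produced
# no usable response ("null"):
#   usdm_state(dict.fromkeys(SLOTS, "empty"), "null", "ask_slot_3")
```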
3.1 The COMMUNICATOR data baseline

We computed the scores for the original hand-coded COMMUNICATOR systems as was done by (Henderson et al., 2005), and we call this the HLG05 score. This scoring function is based on task completion and dialogue length rewards as determined by the PARADISE evaluation (Walker et al., 2000). It gives 25 points for each slot which is filled, another 25 for each that is confirmed, and deducts 1 point for each system action. In this case the maximum possible score is 197, i.e. 200 minus 3 actions (the system prompts the user, the user replies by filling all four of the slots in one turn, and the system then confirms all four slots and offers the flight). The average score for the 1242 dialogues in the COMMUNICATOR dataset where the aim was to fill and confirm only the same four slots as we have used here was . The other COMMUNICATOR dialogues involved different slots relating to return flights, hotel-bookings and car-rentals.
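The HLG05 score is straightforward to compute; a minimal sketch:

```python
def hlg05_score(filled_slots, confirmed_slots, num_system_actions):
    """HLG05 score (Henderson et al., 2005): 25 points per filled slot,
    25 more per confirmed slot, minus 1 point per system action.
    With 4 slots the maximum is 4*25 + 4*25 - 3 = 197 for a 3-action dialogue."""
    return 25 * filled_slots + 25 * confirmed_slots - num_system_actions

# e.g. the best possible dialogue described above:
# hlg05_score(filled_slots=4, confirmed_slots=4, num_system_actions=3) == 197
```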
4 Results

Figure 1 tracks the improvement of the 3 learned strategies for training dialogues with the 4-gram user simulation, and figure 2 for training dialogues with the 5-gram simulation. They show the average reward (according to the function of section 2.2) obtained by each strategy over intervals of 1000 training dialogues. Table 1 shows the results for testing the strategies learned after training dialogues (the baseline RL strategy, strategy 2 (UDM) and strategy 3 (USDM)). The (a) strategies were trained with the 4-gram user simulation and tested with the 5-gram, while the (b) strategies were trained with the 5-gram user simulation and tested with the 4-gram. The table also shows average scores for the strategies.

                       Features               Av. Score   HLG05   Filled Slots   Conf. Slots   Length
4-5 gram = (a)
RL Baseline (a)        Slots status
RL Strat 2, UDM (a)    (a) + Last User DM     53.65**
RL Strat 3, USDM (a)   (a) + Last System DM   54.9**
5-4 gram = (b)
RL Baseline (b)        Slots status
RL Strat 2, UDM (b)    (b) + Last User DM     54.46*
RL Strat 3, USDM (b)   (b) + Last System DM   56.24**
RL Baseline (av)       Slots status
RL Strat 2, UDM (av)   + Last User DM         54.06**
RL Strat 3, USDM (av)  + Last System DM       55.57**
COMM Systems
Hybrid RL ***          Information States

Table 1: Testing the learned strategies after training dialogues, average reward achieved per dialogue over 1000 test dialogues. (a) = strategy trained using 4-gram and tested with 5-gram; (b) = strategy trained with 5-gram and tested with 4-gram; (av) = average; * significance level p < 0.025; ** significance level p < 0.005; *** Note: the Hybrid RL scores (here updated from (Henderson et al., 2005)) are not directly comparable since that system has a larger action set and fewer policy constraints.

Column 2 contains the average reward obtained per dialogue by each strategy over 1000 test dialogues (computed using the function of section 2.2). The 1000 test dialogues for each strategy were divided into 10 sets of 100. We carried out t-tests and found that in both the (a) and (b) cases, strategy 2 (UDM) performs significantly better than the RL baseline (significance levels p < 0.005 and p < 0.025), and strategy 3 (USDM) performs significantly better than strategy 2 (UDM) (significance level p < 0.005). With respect to average performance, strategy 2 (UDM) improves over the RL baseline by 4.9%, and strategy 3 (USDM) improves by 7.8%. Although there seem to be only negligible qualitative differences between strategies 2(b) and 3(b) and their (a) equivalents, the former perform slightly better in testing. This suggests that the 4-gram simulation used for testing the (b) strategies is a little more reliable in filling and confirming slot values than the 5-gram.

The 3rd column (HLG05) shows the average scores for the dialogues as computed by the reward function of (Henderson et al., 2005). This is done for comparison with that work, but also with the COMMUNICATOR data baseline. Using the HLG05 reward function, strategy 3 (USDM) improves over the original COMMUNICATOR systems baseline by 65.9%. The components making up the reward are shown in the final 3 columns of table 1. Here we see that all of the RL strategies are able to fill and confirm all of the 4 slots when conversing with the simulated COMMUNICATOR users. The only variation is in the average length of dialogue required to confirm all four slots. The COMMUNICATOR systems were often unable to confirm or fill all of the user slots, and the dialogues were quite long on average.

As stated in section 2.4.1, the n-gram simulations do not simulate the case of a particular user goal utterance being unrecognisable for the system. This was a problem that could be encountered by the real COMMUNICATOR systems. Nevertheless, the performance of all the learned strategies compares very well to the COMMUNICATOR data baseline. For example, in an average dialogue, the RL strategies filled and confirmed all four slots with around 9 actions (not including offering the flight), but the COMMUNICATOR systems took an average of around 33 actions per dialogue, and often failed to complete the task.
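The significance figures above come from t-tests over per-set average rewards (each strategy's 1000 test dialogues split into 10 sets of 100). A minimal sketch of such a comparison; the exact test variant is not stated in the text, so an independent two-sample t-test is assumed here:

```python
from scipy import stats

def compare_strategies(rewards_a, rewards_b, set_size=100):
    """Split each strategy's per-dialogue rewards into sets of `set_size`,
    average each set, and t-test the resulting per-set scores."""
    def set_means(rewards):
        return [sum(rewards[i:i + set_size]) / set_size
                for i in range(0, len(rewards), set_size)]
    return stats.ttest_ind(set_means(rewards_a), set_means(rewards_b))
```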
With respect to the hybrid RL result of (Henderson et al., 2005), shown in the final row of the table, Strategy 3 (USDM) shows a 34% improvement, though these results are not directly comparable because that system uses a larger action set and has fewer constraints (e.g. it can ask "how may I help you?" at any time, not just at the start of a dialogue).

Finally, let us note that the performance of the RL strategies is close to optimal, but that there is some room for improvement. With respect to the HLG05 metric, the optimal system score would be 197, but this would only be available in rare cases where the simulated user supplies all 4 slots in the first utterance. With respect to the metric we have used here (with a 5 per system turn penalty), the optimal score is 85 (and we currently score an average of 55.57). Thus we expect that there are still further improvements that can be made to more fully exploit the dialogue context (see section 4.3).
Figure 1: Training the dialogue strategies with the 4-gram user simulation (average reward vs. number of dialogues in thousands, for the Baseline, Strategy 2 and Strategy 3).

Figure 2: Training the dialogue strategies with the 5-gram user simulation (average reward vs. number of dialogues in thousands, for the Baseline, Strategy 2 and Strategy 3).

4.1 Qualitative Analysis

Below is a list of general characteristics of the learned strategies:

1. The reinforcement learner learns to query the database only in states where all four slots have been confirmed.
2. With sufficient exploration, the reinforcement learner learns not to pass the call to a human operator in any state.
3. The learned strategies employ implicit confirmations wherever possible. This allows them to fill and confirm the slots in fewer turns than if they simply asked the slot values and then used explicit confirmation.
4. As a result of characteristic 3, which slots can be asked and implicitly confirmed at the same time influences the order in which the learned strategies attempt to fill and confirm each slot. E.g. if the status of the third slot is filled and the others are empty, the learner learns to ask for the second or fourth slot rather than the first, since it can implicitly confirm the third while it asks for the second or fourth slots, but it cannot implicitly confirm the third while it asks for the first slot. This action is not available (see section 2.1).

4.2 Emergent behaviour

In testing, the UDM strategy (2) filled and confirmed all of the slots in fewer turns on average than the RL baseline, and strategy 3 (USDM) did this in fewer turns than strategy 2 (UDM). What then were the qualitative differences between the three strategies? The behaviour of the three strategies only seems to really deviate when a user response fails to fill or confirm one or more slots. Then the baseline strategy's state has not changed, and so it will repeat its last dialogue move, whereas the state for strategies 2 (UDM) and 3 (USDM) has changed and, as a result, these may now try different actions. It is in such circumstances that the UDM strategy seems to be more effective than the baseline, and strategy 3 (USDM) more effective than the UDM strategy.

In figure 3 we show illustrative state and learned action pairs for the different strategies. They relate to a situation where the first user response(s) in the dialogue has/have failed to fill a single slot value. NB: here "emp" stands for empty and "fill" for filled, and they appear in the first four state variables, which stand for slot states. For strategy 2 (UDM), the fifth variable represents the user's last
dialogue move, and for strategy 3 (USDM), the fifth variable represents the system's last dialogue move, and the sixth, the user's last dialogue move.

BASELINE STRATEGY
[emp,emp,emp,emp]  Action: askslot2

STRATEGY 2 (UDM)
[emp,emp,emp,emp,user(quiet)]  Action: askslot3
[emp,emp,emp,emp,user(null)]   Action: askslot1

STRATEGY 3 (USDM)
[emp,emp,emp,emp,askslot3,user(quiet)]  Action: askslot3
[emp,emp,emp,emp,askslot3,user(null)]   Action: givehelp
[emp,emp,emp,emp,givehelp,user(quiet)]  Action: askslot3
[emp,emp,emp,emp,givehelp,user(null)]   Action: askslot3

Figure 3: Examples of the different learned strategies and emergent behaviours: focus switching (for UDM) and giving help (for USDM).

Here we can see that, should the user responses continue to fail to provide a slot value, the baseline's state will be unchanged and so the strategy will simply ask for slot 2 again. The state for strategy 2 (UDM) does change, however. This strategy switches focus between slots 3 and 1 depending on whether the user's last dialogue move was null or quiet (NB: as stated in section 2.4, null means out-of-domain or that there was no ASR hypothesis). Strategy 3 (USDM) is different again. Knowledge of the system's last dialogue move as well as the user's last move has enabled the learner to make effective use of the give help action, rather than to rely on switching focus. When the user's last dialogue move is null in response to the system move askslot3, the strategy uses the give help action before returning to ask for slot 3 again. The example described here is not the only example of strategy 2 (UDM) employing focus switching while strategy 3 (USDM) prefers to use the give help action when a user response fails to fill or confirm a slot. This kind of behaviour in strategies 2 and 3 is emergent dialogue behaviour that has been learned by the system rather than explicitly programmed.

4.3 Further possibilities for improvement over the RL baseline

Further improvements over the RL baseline might be possible with a wider set of system actions. Strategies 2 and 3 may learn to make more effective use of additional actions than the baseline, e.g. additional actions that implicitly confirm one slot whilst asking another may allow more of the focus switching described in section 4.1. Other possible additional actions include actions that ask for or confirm two or more slots simultaneously.

In section 2.4.1, we highlighted the fact that the n-gram user simulations are not completely realistic and that this will make certain state features more or less important in learning a strategy. Thus, had we been able to use even more realistic user simulations, including certain additional context features in the state might have enabled a greater improvement over the baseline. Dialogue length is an example of a feature that could have made a difference had the simulations been able to simulate the case of a particular goal utterance being unrecognisable for the system. The reinforcement learner may then be able to use the dialogue length feature to learn when to give up asking for a particular slot value and make a partially complete database query. This would of course require a reward function that gave some reward to partially complete database queries, rather than the all-or-nothing reward function used here.
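For illustration, such a partial-credit reward might look like the following sketch; the event names and numeric values are placeholders, not values from this paper:

```python
def partial_credit_reward(event, confirmed_slots=0, total_slots=4,
                          full_reward=100.0, turn_penalty=5.0):
    """Hypothetical partial-credit reward for the modification suggested in
    section 4.3. All numeric values are illustrative placeholders."""
    if event == "system_turn":
        return -turn_penalty
    if event == "database_query":
        # give credit in proportion to the number of confirmed slots,
        # instead of all-or-nothing
        return full_reward * confirmed_slots / total_slots
    if event in ("hang_up", "pass_to_operator"):
        return -full_reward
    return 0.0
```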
5 Conclusion and Future Work

We have used user simulations that are n-gram models learned from COMMUNICATOR data to explore reinforcement learning of full dialogue strategies with some high-level context information (the user's and system's last dialogue moves). Almost all previous work (e.g. (Singh et al., 2002; Pietquin, 2004; Scheffler and Young, 2001)) has included only low-level information in state representations. In contrast, the exploration of very large state spaces to date relies on a hybrid supervised/reinforcement learning technique, where the reinforcement learning element has not been shown to significantly improve policies over the purely supervised case (Henderson et al., 2005).

We presented our experimental environment, the reinforcement learner, the simulated users, and our methodology. In testing with the simulated COMMUNICATOR users, the new strategies learned with higher-level (i.e. dialogue move) information in the state outperformed the low-level RL baseline (only slot status information)
by 7.8% and the original COMMUNICATOR systems by 65.9%. These strategies obtained more reward than the RL baseline by filling and confirming all of the slots with fewer system turns on average. Moreover, the learned strategies show interesting emergent dialogue behaviour, such as making effective use of the give help action and switching focus to different subtasks when the current subtask is proving problematic.

In future work, we plan to use even more realistic user simulations, for example those developed following (Georgila et al., 2005a), which incorporate elements of goal-directed user behaviour. We will continue to investigate whether we can maintain tractability and learn superior strategies as we add incrementally more high-level contextual information to the state. At some stage this may necessitate using a generalisation method such as linear function approximation (Henderson et al., 2005). We also intend to use feature selection techniques (e.g. CFS subset evaluation (Rieser and Lemon, 2006)) on the COMMUNICATOR data in order to determine which contextual features it suggests are important. We will also carry out a more direct comparison with the hybrid strategies learned by (Henderson et al., 2005). In the slightly longer term, we will test our learned strategies on humans using a full spoken dialogue system. We hypothesize that the strategies which perform the best in terms of task completion and user satisfaction scores (Walker et al., 2000) will be those learned with high-level dialogue context information in the state.

Acknowledgements

This work is supported by the ESRC and the TALK project.

References

John L. Austin. 1962. How To Do Things With Words. Oxford University Press.

Johan Bos, Ewan Klein, Oliver Lemon, and Tetsushi Oka. 2003. DIPPER: Description and formalisation of an information-state update dialogue system architecture. In 4th SIGdial Workshop on Discourse and Dialogue, Sapporo.

Herbert H. Clark. 1996. Using Language. Cambridge University Press.

Wieland Eckert, Esther Levin, and Roberto Pieraccini. 1997. User modeling for spoken dialogue system evaluation. In IEEE Workshop on Automatic Speech Recognition and Understanding.

Matthew Frampton and Oliver Lemon. 2005. Reinforcement Learning of Dialogue Strategies Using the User's Last Dialogue Act. In IJCAI Workshop on Knowledge and Reasoning in Practical Dialogue Systems.

Kallirroi Georgila, James Henderson, and Oliver Lemon. 2005a. Learning User Simulations for Information State Update Dialogue Systems. In Interspeech/Eurospeech: the 9th biennial conference of the International Speech Communication Association.

Kallirroi Georgila, Oliver Lemon, and James Henderson. 2005b. Automatic annotation of COMMUNICATOR dialogue data for learning dialogue strategies and user simulations. In Ninth Workshop on the Semantics and Pragmatics of Dialogue (SEMDIAL: DIALOR).

James Henderson, Oliver Lemon, and Kallirroi Georgila. 2005. Hybrid Reinforcement/Supervised Learning for Dialogue Policies from COMMUNICATOR data. In IJCAI Workshop on Knowledge and Reasoning in Practical Dialogue Systems.

Staffan Larsson and David Traum. 2000. Information state and dialogue management in the TRINDI Dialogue Move Engine Toolkit. Natural Language Engineering, 6(3-4).

Esther Levin and Roberto Pieraccini. 1997. A stochastic model of computer-human interaction for learning dialogue strategies. In Eurospeech, Rhodes, Greece.

Olivier Pietquin. 2004. A Framework for Unsupervised Learning of Dialogue Strategies. Presses Universitaires de Louvain, SIMILAR Collection.
Verena Rieser and Oliver Lemon. 2006. Using machine learning to explore human multimodal clarification strategies. In Proc. ACL.

Konrad Scheffler and Steve Young. 2001. Corpus-based dialogue simulation for automatic strategy learning and evaluation. In NAACL-2001 Workshop on Adaptation in Dialogue Systems, Pittsburgh, USA.

John R. Searle. 1969. Speech Acts. Cambridge University Press.

Satinder Singh, Diane Litman, Michael Kearns, and Marilyn Walker. 2002. Optimizing dialogue management with reinforcement learning: Experiments with the NJFun system. Journal of Artificial Intelligence Research (JAIR).

Richard Sutton and Andrew Barto. 1998. Reinforcement Learning. MIT Press.

Marilyn A. Walker, Candace A. Kamm, and Diane J. Litman. 2000. Towards Developing General Models of Usability with PARADISE. Natural Language Engineering, 6(3).

Marilyn A. Walker, Rebecca J. Passonneau, and Julie E. Boland. 2001. Quantitative and Qualitative Evaluation of DARPA Communicator Spoken Dialogue Systems. In Meeting of the Association for Computational Linguistics.