Flexible Spoken Dialogue System based on User Models and Dynamic Generation of VoiceXML Scripts
Kazunori Komatani, Fumihiro Adachi, Shinichi Ueno, Tatsuya Kawahara, Hiroshi G. Okuno
Graduate School of Informatics, Kyoto University
Yoshida-Hommachi, Sakyo, Kyoto, Japan

Abstract

We realize a telephone-based collaborative natural language dialogue system. Since natural language involves a wide variety of expressions, a large number of VoiceXML scripts would need to be prepared to handle all possible input patterns. We realize flexible dialogue management for various user utterances by generating VoiceXML scripts dynamically. Moreover, we address appropriate user modeling in order to generate cooperative responses to each user. Specifically, we set up three dimensions of user models: skill level to the system, knowledge level on the target domain, and degree of hastiness. The models are automatically derived by decision tree learning using real dialogue data collected by the system. Experimental evaluation shows that the cooperative responses adapted to individual users serve as good guidance for novice users without increasing the dialogue duration for skilled users.

Keywords: spoken dialogue system, user model, VoiceXML, cooperative responses, dialogue strategy

1 Introduction

A spoken dialogue system is one of the promising applications of speech recognition and natural language understanding technologies. A typical task of spoken dialogue systems is database retrieval. Some IVR (interactive voice response) systems using speech recognition technology are already in practical use as its simplest form. With the spread of cellular phones, spoken dialogue systems accessed via telephone enable us to obtain information from various places without any special apparatus.
In order to realize user-friendly interaction, spoken dialogue systems should be able to (1) accept various user utterances to enable mixed-initiative dialogue and (2) generate cooperative responses. Currently, many telephone-based IVR systems operate by using VoiceXML, a script language that prescribes the procedures of spoken dialogues. However, a VoiceXML script prescribes only the next behavior corresponding to each input, so the dialogue procedure is basically designed as a system-initiated one, in which the system asks for the required items one by one. In order to realize mixed-initiative dialogue, the system should be able to accept various user-initiated utterances. Once various user utterances are allowed, however, the number of possible combinations of words in an utterance becomes enormous, and it is practically impossible to prepare VoiceXML scripts corresponding to all these combinations in advance. It is also difficult to generate cooperative responses adaptively in such a framework. We propose a framework that generates VoiceXML scripts dynamically in order to realize mixed-initiative dialogue, in which the system needs to accept various user utterances. This framework realizes flexible dialogue management without requiring
preparation of a large number of VoiceXML scripts in advance. Furthermore, it enables various behaviors adaptive to the dialogue situation, such as the obtained query results.

Another problem in realizing user-friendly interaction is how to generate cooperative responses. When we consider the responses generated from the system side, the dialogue strategies, which determine when to give guidance and what the system should tell the user, are essential factors in spoken dialogue systems. There are many studies on dialogue strategy, such as confirmation management using confidence measures of speech recognition results (Komatani and Kawahara, 2000; Hazen et al., 2000), dynamic change of dialogue initiative (Litman and Pan, 2000; Chu-Carroll, 2000; Lamel et al., 1999), and the addition of cooperative content to system responses (Sadek, 1999). Nevertheless, whether a particular response is cooperative or not depends on the individual user's characteristics. In order to adapt the system's behavior to individual users, it is necessary to model the user's patterns (Kass and Finin, 1988). Most conventional studies on user models have focused on the knowledge of users. Others tried to infer and utilize users' goals to generate responses adapted to the user (van Beek, 1987; Paris, 1988). Elzer et al. (2000) proposed a method to generate adaptive suggestions according to users' preferences. However, these studies depend greatly on knowledge of the target domain, and therefore the user models must be crafted manually before they can be applied to new domains. Moreover, they assumed that the input is text only, which contains no errors. We propose more comprehensive user models for generating user-adapted responses in spoken dialogue systems, taking account of information specific to spoken dialogue.
Spoken utterances include various information, such as the interval between utterances and the presence of barge-in, which can be utilized to judge the user's character. These features are also general across spoken dialogue systems because they do not depend on domain-specific knowledge. As user models in spoken dialogue systems, Eckert et al. (1997) defined stereotypes of users, such as patient, submissive and experienced, in order to evaluate spoken dialogue systems by simulation. We introduce user models not for defining users' behaviors beforehand, but for detecting users' patterns in real-time interaction. We define three dimensions in the user models: skill level to the system, knowledge level on the target domain, and degree of hastiness. The user models are trained by a decision tree learning algorithm, using real data collected from the Kyoto city bus information system. Then, we implement the user models in the system and evaluate them using data collected from 20 novice users.

2 Flexible Spoken Dialogue System based on Dynamic Generation of VoiceXML Scripts

VoiceXML [1] is a script language for prescribing procedures of spoken dialogues, mainly over the telephone, and is becoming a standard language for IVR systems. VoiceXML scripts consist of three parts: (1) specifications of the system's prompts, (2) specifications of grammars to accept a user's utterance, and (3) descriptions of the next behaviors. However, most existing services using VoiceXML impose rigid interaction, in which user utterances are restricted by system-initiated prompts and a user is accordingly allowed to specify only the requested items one by one. It is more user-friendly if users can freely convey their requests in natural language expressions. We present a framework that realizes flexible interaction by generating VoiceXML scripts dynamically (Pargellis et al., 1999; Nyberg et al., 2002).
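The three parts can be illustrated with a minimal VoiceXML form; this fragment is a sketch for illustration only, and the field name, grammar file and CGI URL are hypothetical, not taken from the authors' system.

```xml
<?xml version="1.0"?>
<vxml version="2.0">
  <form id="busquery">
    <field name="busstop">
      <!-- (1) specification of the system's prompt -->
      <prompt>Please tell me your current bus stop.</prompt>
      <!-- (2) grammar restricting what the recognizer accepts -->
      <grammar src="busstops.grxml" type="application/srgs+xml"/>
      <!-- (3) the next behavior once the field is filled -->
      <filled>
        <submit next="http://example.org/cgi-bin/query" namelist="busstop"/>
      </filled>
    </field>
  </form>
</vxml>
```

Because every acceptable input and every next state must be written out like this, a static script can only cover a fixed, system-initiated exchange.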
The framework enables users to express their requests in natural language even in VoiceXML-based systems. Furthermore, the cooperative responses in the Kyoto city bus information system, which has been developed in our laboratory, are also presented in this section.

2.1 Dynamic Generation of VoiceXML Scripts

In VoiceXML scripts, acceptable keywords and the corresponding next states must be explicitly specified. However, since there are enormous combinations of keywords in natural language expressions, it is practically impossible to describe VoiceXML scripts covering all of them. We therefore introduce a framework in which VoiceXML scripts are generated dynamically to enable the system to accept natural language expressions. [1]

[1] VoiceXML Forum.

[Figure 1: Overview of spoken dialogue system based on dynamic generation of VoiceXML scripts. The front end (VWS, i.e. Voice Web Server, with speech recognizer and TTS engine) exchanges VoiceXML scripts and recognized keywords with a CGI script consisting of the dialogue manager and the VoiceXML generator, which accesses the database.]

Figure 1 shows an overview of the framework. The front end, which operates based on VoiceXML scripts, is separated from the dialogue management portion, which accepts speech recognition results and generates the corresponding VoiceXML scripts. A user utterance is recognized based on a grammar rule specified in the VoiceXML script, and keywords extracted from the speech recognition result are passed to a CGI script. The CGI script retrieves the corresponding information from the database on the Web, and generates a VoiceXML script for the succeeding interaction. If sufficient information is not obtained from a user utterance, a script that prompts the user to fill the remaining contents is generated; if the user utterance contains ambiguity, a script that asks a disambiguating question is generated. Consequently, the generation of VoiceXML scripts makes it possible to accept natural language expressions without preparing a large number of scripts corresponding to various inputs beforehand. The framework also makes it possible to generate cooperative responses adapted to the situation, such as retrieval results, without spoiling portability.

Sys: Please tell me your current bus stop, your destination or the specific bus route.
User: Shijo-Kawaramachi.
Sys: Do you take a bus from Shijo-Kawaramachi?
User: Yes.
Sys: Where will you get off the bus?
User: Arashiyama.
Sys: Do you go from Shijo-Kawaramachi to Arashiyama?
User: Yes.
Sys: Bus number 11 bound for Arashiyama has departed Sanjo-Keihanmae, two bus stops away.
Figure 2: Example dialogue of the bus system

2.2 Kyoto City Bus Information System

We have developed the Kyoto city bus information system, which locates the bus a user wants to take and tells him/her how long it will be before it arrives. The system can be accessed via telephone, including cellular phones. From any place, users can easily get information on buses, which changes every minute. Users are requested to input the bus stop where they get on, the destination, or the bus route number by speech, and consequently get the bus information. There are 652 bus stops and 85 bus routes in Kyoto city. Bus stops can also be specified by the names of famous places or public facilities nearby. Figure 2 shows a simple example of a dialogue.

2.3 Generation of Cooperative Responses in Kyoto City Bus Information System

In spoken dialogue systems, it is important to generate cooperative responses by guessing the user's intention. Cooperative responses include adding more information than the user has requested, telling the reason when there is no query result, presenting alternatives, and so on (Sadek, 1999). In order to generate cooperative responses, it is necessary to change the responses according to the situation. Since we separate the language-understanding modules from the individual VoiceXML scripts, which are generated dynamically, our system can generate appropriate cooperative responses according to the query results. The cooperative responses implemented in our Kyoto city bus information system are listed below.
(1) Suggestive answers
Information related to a query is offered when the query result is negative.
(Example)
User: From Hyakumanben to Kyoto Station by bus number 17.
Sys: Bus number 17 does not come for a while, but bus number 206 has departed Takano, four bus stops away.

(2) Corrective answers
The reason is presented when there is no query result or when a user utterance contains an error.
(Example)
User: From Kyoto Station to Arashiyama.
Sys: There is no bus that goes directly from Kyoto Station to Arashiyama.

(3) Intensional answers
If several query results are obtained, generating an intensional answer is more cooperative than listing all the results.
(Example)
User: From Shijo-Kawaramachi to Shijo-Karasuma.
Sys: No bus to Shijo-Karasuma comes for a while.

(4) Completion answers
It is cooperative to generate responses that include more information than the user requested. However, whether the information should be presented or not differs among individual users.
(Example)
User: From Kyoto Station by bus number 5.
Sys: Bus number 5 will depart at 17:34 from platform A1. (There are many platforms at Kyoto Station.)

3 Response Generation Using User Models

We propose comprehensive user models to generate user-adaptive responses, taking account of information specific to spoken dialogue. Whether a particular response is regarded as cooperative depends on the individual user's characteristics. We therefore address appropriate user modeling in order to generate cooperative responses to users.

3.1 Classification of User Models

We define the three dimensions of user models listed below.

- Skill level to the system
- Knowledge level on the target domain
- Degree of hastiness

Skill Level to the System

Since spoken dialogue systems are not yet widespread, there are differences in users' skill levels in operating such systems.
It is desirable that the system change its behavior, including response generation and initiative management, in accordance with the skill level of the user. In conventional systems, system-initiated guidance has been invoked on the spur of the moment, either when the user says nothing or when speech recognition fails. In our framework, we address a radical solution for unskilled users by modeling the skill level as a property of the user before such a problem arises.

Knowledge Level on the Target Domain

There are also differences among users in their knowledge level on the target domain. Thus, it is necessary for the system to change the information it presents to users. For example, it is not cooperative to give overly detailed information to strangers. On the other hand, for inhabitants, it is useful to omit obvious information and to output more detailed information. Therefore, we introduce a dimension that represents the knowledge level on the target domain.

Degree of Hastiness

In speech communication, it is more important to present information promptly and concisely than in other communication modes such as browsing. Especially in the bus system, conciseness is preferred because the bus information is urgent to most users. Therefore, we also take account of the degree of hastiness of the user, and change the system's responses accordingly.

3.2 Response Generation Strategy Using User Models

Next, we describe the response generation strategies adapted to individual users based on the proposed
user models: skill level, knowledge level and hastiness. The basic design of the dialogue management is mixed-initiative dialogue, in which the system makes follow-up questions and gives guidance if necessary while allowing the user to utter freely. Adding various contents to system responses as cooperative responses was investigated in (Sadek, 1999). Such additional information is usually cooperative, but some people may find such responses redundant. Thus, we introduce the user models and control the generation of additional information. By introducing the proposed user models, the system changes the generated responses in two respects: the dialogue procedure and the contents of responses.

Dialogue Procedure

The dialogue procedure is changed based on the skill level and the hastiness. If a user is identified as having a high skill level, the dialogue management is carried out in a user-initiated manner; namely, the system generates only open-ended prompts. On the other hand, when the user's skill level is detected as low, the system takes the initiative and prompts for the necessary items in order. When the degree of hastiness is low, the system confirms the input contents. Conversely, when the hastiness is detected as high, the confirmation procedure is omitted; namely, the system immediately makes a query and outputs the result without confirmation.

Contents of Responses

Information that should be included in the system response can be classified into the following two items:

1. Dialogue management information
2. Domain-specific information

The dialogue management information specifies how to carry out the dialogue, including instructions on the user's expression for yes/no questions, like "Please reply with either yes or no.", and explanations about the following dialogue procedure, like "Now I will ask in order."
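The two controls above (dialogue procedure and response contents) can be sketched as a small rule set. This is an illustrative sketch under assumed rules, not the authors' implementation; the function name and the `domain_info` parameter are hypothetical.

```python
# Sketch of response-content control gated by the three user-model
# dimensions (rules assumed from the description in the text).

def build_response(core, user_model, domain_info=None):
    """Assemble a response: core content plus optional additions."""
    parts = []
    if user_model["skill"] == "low":
        # explain the dialogue procedure to unskilled users
        parts.append("Now I will ask in order.")
    parts.append(core)
    if user_model["hastiness"] == "low":
        if user_model["skill"] == "low":
            # instruction on the expected expression
            parts.append("Please reply with yes or no.")
        if user_model["knowledge"] == "low" and domain_info:
            # domain information, e.g. an explanation of the nearest stop
            parts.append(domain_info)
    # hasty users get only the core content
    return " ".join(parts)

print(build_response("Will you get on at Maruyama Park?",
                     {"skill": "low", "knowledge": "low", "hastiness": "low"},
                     domain_info="The nearest bus stop to Maruyama Park is Gion."))
```

For a hasty, skilled user the same call collapses to the bare confirmation question, which is the behavior described above.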
[Figure 3: Decision tree for the skill level, branching on the dialogue state (initial state or otherwise), the presence of barge-in, the maximum number of filled slots, the rate of no input (threshold 0.07), and the average recognition score (threshold 58.8).]

This dialogue management information is determined by the user's skill level to the system, and is added to system responses when the skill level is considered low. The domain-specific information is generated according to the user's knowledge level on the target domain. Namely, for users unacquainted with the local information, the system adds an explanation about the nearest bus stop, and omits complicated contents such as a proposal of another route. The contents described above are also controlled by the hastiness. For users who are not in a hurry, the system generates the additional contents corresponding to their skill level and knowledge level as cooperative responses. On the other hand, for hasty users, the contents are omitted to prevent the dialogue from becoming redundant.

3.3 Classification of Users based on Decision Trees

In order to implement the proposed user models as classifiers, we adopt decision trees. They are constructed by the decision tree learning algorithm C5.0 (Quinlan, 1993) with data collected by our dialogue system. Figure 3 shows an example of the derived decision tree for the skill level. We use the features listed in Figure 4. They include not only the semantic information contained in the utterances but also information specific to spoken dialogue systems, such as the silence duration prior to the utterance and the presence of barge-in. Except for the last category of Figure 4, which includes the attributes of specified bus stops, most of the features are domain-independent. The classification of each dimension is done for every user utterance, except for the knowledge level. The model of a user can change during a dialogue.
Features extracted from utterances are accumulated as history information during the session.
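This accumulation can be sketched as follows; the class and its method names are assumptions for illustration, and only a few of the session features listed in Figure 4 are shown.

```python
# Sketch of accumulating per-session history features from
# single-utterance observations (feature names follow Figure 4).

class SessionHistory:
    def __init__(self):
        self.utterances = []

    def add(self, barge_in, recognized, score, filled_slots):
        """Record the observations for one user utterance."""
        self.utterances.append(
            {"barge_in": barge_in, "recognized": recognized,
             "score": score, "filled_slots": filled_slots})

    def features(self):
        """Derive session-level features from the accumulated history."""
        n = len(self.utterances)
        return {
            "num_utterances": n,
            "ratio_barge_in": sum(u["barge_in"] for u in self.utterances) / n,
            "ratio_no_input": sum(u["recognized"] == "no input"
                                  for u in self.utterances) / n,
            "avg_score": sum(u["score"] for u in self.utterances) / n,
            "max_filled_slots": max(u["filled_slots"] for u in self.utterances),
        }

h = SessionHistory()
h.add(barge_in=False, recognized="recognized", score=62.0, filled_slots=2)
h.add(barge_in=True, recognized="no input", score=55.0, filled_slots=0)
print(h.features())
```

Each new utterance updates these ratios and counts, which is what lets the classified model of a user change as the dialogue proceeds.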
Features obtained from a single utterance:
- dialogue state (defined by the already filled slots)
- presence of barge-in
- lapsed time of the current utterance
- recognition result (something recognized / uncertain / no input)
- score of the speech recognizer
- the number of slots filled by the current utterance

Features obtained from the session:
- the number of utterances
- dialogue state of the previous utterance
- lapsed time from the beginning of the session
- the number of repetitions of the same question
- the average number of repetitions of the same question
- ratio of the time of user utterances in the whole elapsed time
- ratio of the occurrence of barge-in out of the whole number of utterances
- recognition result of the previous utterance
- ratio of something recognized
- ratio of uncertain results
- ratio of no input
- the number of barge-ins
- the number of something recognized
- the number of uncertain results
- the number of no input
- average of recognition scores
- the maximum number of slots filled by a single utterance

Features obtained from the session (used only in classification of the knowledge level):
- attributes of specified bus stops
- the way of specifying bus stops (whether a bus stop is specified by its correct name or not)
- ratio of each attribute of specified bus stops
- ratio of each way of specifying bus stops

Figure 4: Features to classify the user models

3.4 Decision Tree Learning for User Models

We train and evaluate the decision trees for the user models using dialogue data collected by our system. The data was collected from December 10th, 2001 to May 10th. The number of sessions (telephone calls) is 215. We annotated the subjective labels of the user models by hand. The annotator judged the user models for every utterance based on the recorded speech data and logs. Labels were given for the three dimensions described in section 3.1 as high, indeterminable or low. It is possible for the annotated model of a user to change during a dialogue, especially from indeterminable to low or high. The number of labeled utterances is shown in Table 1.

Case: [skill level: low] [knowledge level: low] [hastiness: low]
Sys: Please tell me your current bus stop, your destination or the specific bus route number.
User: Maruyama Park.
Sys: Let me confirm one by one. (The explanation is added by [skill level: low].)
Will you get on at Maruyama Park? (The content is confirmed by [hastiness: low].)
Please reply with yes or no. (The instruction is added by [skill level: low].)
User: Yes.
Sys: Where will you get off the bus? Please specify by a bus stop name or a well-known landmark. (The explanation is added by [skill level: low].)
User: Tofukuji temple.
Sys: Will you get off the bus at Tofukuji temple? Please reply with yes or no.
User: Yes.
Sys: The following buses are located. (The explanation is added by [skill level: low].) The nearest bus stop to Maruyama Park is Gion. (The domain information is added by [knowledge level: low].) Bus number 202 bound for Nishioji-Kujo has departed Higashiyama-Sanjo, which is two stops away.

Figure 5: An example dialogue with the proposed user models

[Table 1: Number of manually annotated labels (low / indeterminable / high / total) for the skill level, knowledge level and hastiness used for decision tree learning.]

Figure 5 shows an example of the system behavior with the proposed user models. The skill level is classified as low by the decision tree, because the first user utterance includes only one content word. Then, the dialogue procedure is changed to a system-initiated one. Similarly, the hastiness is classified as low by the decision tree, and the system includes the explanation of the dialogue procedure and the instruction on the expression in its responses. They are omitted if the hastiness is identified as high.
Table 2: Classification accuracy of the proposed user models

condition         #1      #2      #3
skill level       80.8%   75.3%   85.6%
knowledge level   73.9%   63.7%   78.2%
hastiness         74.9%   73.7%   78.6%

Using the labeled data, we trained the decision trees and evaluated the classification accuracy of the proposed user models. All the experiments were carried out by 10-fold cross validation: the process in which one tenth of all the data is used as test data and the remainder as training data is repeated ten times, and the average classification accuracy is computed. The result is shown in Table 2. The conditions #1, #2 and #3 in Table 2 are as follows.

#1: The 10-fold cross validation is carried out per utterance.

#2: The 10-fold cross validation is carried out per session (call).

#3: The accuracy is calculated under a more realistic condition: not over the three classes (high / indeterminable / low) but over the two classes that actually affect the dialogue strategies. For example, the accuracy for the skill level is calculated over the two classes: low and the others. For the classification of the knowledge level, the accuracy is calculated per dialogue session, because features such as the attributes of a specified bus stop are not obtained in every utterance. Moreover, in order to smooth the unbalanced distribution of the training data, a cost corresponding to the reciprocal ratio of the number of samples in each class is introduced. With this cost, the chance rate of the two classes becomes 50%.

The difference between conditions #1 and #2 is whether the training is carried out in a speaker-closed or speaker-open manner; the former shows better performance. The result in condition #3 shows useful accuracy for the skill level.
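A classifier of this kind reduces, at run time, to threshold tests on a few features. The sketch below is a hand-coded illustration in the spirit of the tree in Figure 3, reusing the two thresholds that appear there (0.07 for the rate of no input, 58.8 for the average recognition score); the branch structure is assumed for illustration, since the actual tree was induced by C5.0 from the data.

```python
# Illustrative hand-coded decision rules for the skill level
# (branch structure assumed; thresholds taken from Figure 3).

def classify_skill(dialogue_state, max_filled_slots, rate_no_input, avg_score):
    if dialogue_state == "initial":
        # at the first utterance, rely on how many slots were filled at once
        return "high" if max_filled_slots >= 2 else "low"
    if rate_no_input > 0.07:
        # frequent no-input turns suggest an unskilled user
        return "low"
    # otherwise fall back on the average recognition score
    return "high" if avg_score >= 58.8 else "low"

print(classify_skill("initial", max_filled_slots=3,
                     rate_no_input=0.0, avg_score=60.0))
```

In the deployed system these rules would be re-derived from labeled data rather than written by hand, which is what keeps the approach portable across domains.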
The following features play an important part in the decision tree for the skill level: the number of slots filled by the current utterance, the presence of barge-in, and the ratio of no input. For the knowledge level, the recognition result (something recognized / uncertain / no input), the ratio of no input, and the way of specifying bus stops (whether a bus stop is specified by its exact name or not) are effective. The hastiness is classified mainly by three features: the presence of barge-in, the ratio of no input, and the lapsed time of the current utterance.

3.5 System Overview

[Figure 6: Overview of the Kyoto city bus information system with user models. A user model identifier is added to the system of Figure 1: it receives recognition results (including features other than language information) from the VWS and sends user profiles to the dialogue manager, which works with the VoiceXML generator in the CGI script and accesses the database on the Web.]

Figure 6 shows an overview of the Kyoto city bus information system with the user models. The system operates by generating VoiceXML scripts dynamically, as described in section 2.1. The real-time bus information database is provided on the Web and can be accessed via the Internet. We explain the modules in the following.

VWS (Voice Web Server)
The Voice Web Server drives the speech recognition engine and the TTS (Text-To-Speech) module according to the specifications in the generated VoiceXML script.

Speech Recognizer
The speech recognizer decodes user utterances based on the specified grammar rules and vocabulary, which are defined by the VoiceXML script at each dialogue state.
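The VoiceXML generator's job can be sketched as a function that turns the dialogue manager's outputs (a response sentence and a grammar reference) into a script for the next turn. This is an illustrative sketch, not the authors' code; the element layout follows standard VoiceXML, while the function name, field name, grammar file and CGI URL are hypothetical.

```python
# Sketch of dynamic VoiceXML generation: one script per dialogue turn.

def generate_vxml(prompt_text, grammar_uri, submit_uri):
    """Emit a VoiceXML script with a prompt, a grammar, and the
    CGI endpoint that will receive the next recognition result."""
    return f"""<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <field name="answer">
      <prompt>{prompt_text}</prompt>
      <grammar src="{grammar_uri}"/>
      <filled>
        <submit next="{submit_uri}" namelist="answer"/>
      </filled>
    </field>
  </form>
</vxml>"""

script = generate_vxml("Where will you get off the bus?",
                       "busstops.grxml", "http://example.org/cgi-bin/next")
print(script)
```

Because the prompt text and grammar are chosen per turn, the same generator can emit a clarification question, a confirmation, or a cooperative answer, depending on the dialogue state and the user model.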
Dialogue Manager
The dialogue manager generates response sentences based on the recognition results (bus stop names or a route number) received from the VWS. If sufficient information to locate a bus is obtained, it retrieves the corresponding bus information on the Web.

VoiceXML Generator
This module dynamically generates VoiceXML scripts containing the response sentences and the specifications of the speech recognition grammars, which are given by the dialogue manager.

User Model Identifier
This module classifies the user's characteristics based on the user models, using features specific to spoken dialogue as well as semantic attributes. The obtained user profiles are sent to the dialogue manager, and are utilized in the dialogue management and response generation.

4 Experimental Evaluation of the System with User Models

We evaluated the system with the proposed user models using 20 novice subjects who had not used the system before. The experiment was performed in the laboratory under adequate control. A headset microphone was used for the speech input.

4.1 Experiment Procedure

First, we explained the outline of the system to the subjects and gave them a document describing the experiment conditions and scenarios. We prepared two sets of eight scenarios. Subjects were requested to acquire the bus information according to the scenarios, using the system with and without the user models. The scenarios gave neither the concrete names of bus stops nor the bus numbers. For example, one of the scenarios was as follows: "You are in Kyoto for sightseeing. After visiting the Ginkakuji temple, you go to Maruyama Park. Supposing such a situation, please get information on the bus." We also set constraints in order to vary the subjects' hastiness, such as "Please hurry as much as possible in order to save the charge on your cellular phone." The subjects were also told to look over the questionnaire items before the experiment, and filled them in after using each system. This aims to reduce the subjects' cognitive load and possible confusion due to switching between the systems (Over, 1999). The questionnaire consisted of eight items, for example, "When the dialogue did not go well, did the system guide you intelligibly?" Each item was rated on a seven-point scale. Furthermore, subjects were asked to write down the obtained information: the name of the bus stop to get on, the bus number, and how much time it would take before the bus arrived. With this procedure, we intended to make the experimental conditions close to realistic ones. The subjects were divided into two groups: one half (group 1) used the system in the order "with user models, then without user models", and the other half (group 2) used it in the reverse order. The dialogue management in the system without user models is also based on mixed-initiative dialogue. That system generates follow-up questions and guidance if necessary, but behaves in a fixed manner: the additional cooperative contents corresponding to the skill level described in section 3.2 are not generated, and the dialogue procedure is changed only after recognition errors occur. The system without user models behaves equivalently to the initial state of the user models: the hastiness is low, the knowledge level is low and the skill level is high.

[Table 3: Average dialogue duration (sec.) and number of turns for group 1 (with UM, then w/o UM) and group 2 (w/o UM, then with UM). UM: user model.]

4.2 Results

All of the subjects successfully completed the given tasks, although they were allowed to give up if the system did not work well. That is, the task success rate was 100%. The average dialogue duration and the number of turns in the respective cases are shown in Table 3. Though the users had not experienced the system at
all, they got accustomed to the system very rapidly. Therefore, as shown in Table 3, the duration and the number of turns decreased markedly in the latter half of the experiment in both groups. However, in the initial half of the experiment, group 1 completed the task with significantly shorter dialogues than group 2. This means that the incorporation of the user models is effective for novice users.

Table 4: Ratio of utterances for which the skill level was judged as high

group 1 (with UM, then w/o UM):  with UM 0.72,  w/o UM 0.70
group 2 (w/o UM, then with UM):  w/o UM 0.41,  with UM 0.63

Table 4 shows the ratio of utterances for which the skill level was identified as high. The ratio is calculated by dividing the number of utterances judged as high skill level by the number of all utterances. The ratio is much larger for group 1, who initially used the system with the user models. This means that the novice users got accustomed to the system more rapidly with the user models, because they were instructed in the usage by the cooperative responses generated when the skill level is low. The results demonstrate that cooperative responses generated according to the proposed user models can serve as good guidance for novice users. In the latter half of the experiment, the dialogue duration and the number of turns were almost the same for the two groups. This result shows that the proposed models prevent the dialogue from becoming redundant for skilled users, whereas generating cooperative responses for all users would make the dialogue verbose in general. It suggests that the proposed user models appropriately control the generation of cooperative responses by detecting the characteristics of individual users.

5 Conclusions

We have presented a framework that realizes flexible interaction by dynamically generating VoiceXML scripts. This framework realizes mixed-initiative dialogues and the generation of cooperative responses in VoiceXML-based systems.
We have also proposed and evaluated user models for generating cooperative responses adapted to individual users. The proposed user models consist of three dimensions: skill level to the system, knowledge level on the target domain, and degree of hastiness. The user models are identified by decision trees using features specific to spoken dialogue systems as well as semantic attributes. They are automatically derived by decision tree learning, and all the features used for the skill level and the hastiness are independent of domain-specific knowledge. Therefore, the derived user models are expected to be usable in other domains as well. The experimental evaluation with 20 novice users shows that the skill level of novice users improved more rapidly when the user models were incorporated, and accordingly the dialogue duration became shorter more quickly. This result is achieved by the cooperative responses generated based on the proposed user models. The proposed user models also suppress redundancy by changing the dialogue procedure and selecting the contents of responses. Thus, the framework of dynamically generating VoiceXML scripts and the proposed user models realize user-adaptive dialogue strategies, in which the generated cooperative responses serve as good guidance for novice users without increasing the dialogue duration for skilled users.

References

Jennifer Chu-Carroll. 2000. MIMIC: An adaptive mixed initiative spoken dialogue system for information queries. In Proc. of the 6th Conf. on Applied Natural Language Processing.

Wieland Eckert, Esther Levin, and Roberto Pieraccini. 1997. User modeling for spoken dialogue system evaluation. In Proc. IEEE Workshop on Automatic Speech Recognition and Understanding.

Stephanie Elzer, Jennifer Chu-Carroll, and Sandra Carberry. 2000. Recognizing and utilizing user preferences in collaborative consultation dialogues. In Proc. of the 4th Int'l Conf. on User Modeling.

Timothy J.
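The per-utterance skill-level decision described above can be pictured as a small tree over dialogue features. The sketch below hand-codes one such tree for illustration only: the feature names and thresholds are assumptions of this sketch, not the tree actually induced from the collected dialogue data.

```python
# Illustrative decision tree for the "skill level" user-model dimension.
# In the paper the tree is induced automatically by decision tree learning
# from real dialogue data; the features and thresholds below are invented
# for this sketch.

def classify_skill(features):
    """features: dict with keys
       'barge_in'       - whether the user barged in on the system prompt
       'n_filled_slots' - number of task slots filled by one utterance
       'asr_confidence' - speech recognizer confidence score in [0, 1]
    """
    if features["barge_in"]:
        return "high"      # interrupting prompts suggests familiarity
    if features["n_filled_slots"] >= 2:
        # Compound utterances suggest familiarity, but only when the
        # recognizer is reasonably confident in what it heard.
        return "high" if features["asr_confidence"] >= 0.5 else "low"
    return "low"

print(classify_skill({"barge_in": False, "n_filled_slots": 3,
                      "asr_confidence": 0.9}))  # -> high
```

A classifier of this shape is what lets the system decide, utterance by utterance, whether to add guidance for a novice or keep responses terse for a skilled user.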