Membering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN
|
|
|
- Nora Cross
- 9 years ago
- Views:
Transcription
1 PAGE 30 Membering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN Sung-Joon Park, Kyung-Ae Jang, Jae-In Kim, Myoung-Wan Koo, Chu-Shik Jhon Service Development Laboratory, KT, Seoul, Korea School of Computer Science and Engineering, Seoul National University, Seoul, Korea [email protected] Abstract In this paper, we present a conference call service with speaker-independent name dialing, which allows any members of a group to communicate with all the other members by saying their names over the telephone. The service is implemented on AIN with a VoiceXML interpreter. Members names in a group are registered over the telephone or via the Web interface. If users register their names with phones without an operator s help, automatic speech recognition is used. We use speaker-independent sub-word models for speech recognition. The name registration via the Web interface or by an operator is done with text input. In the case of name registration with speech, the average recognition rate of the names for the members who register them is 96.4%, while the average rate for the other members is 9%. When the names are registered with text input, the average recognition rate is 99.8%.. Introduction Of all the applications of speech recognition, those involving recognition of speech transmitted over telephone links have received the most attention. Korea Telecom (KT) has deployed several services with speech recognition over the telephone network (Park, Koo, and Jhon 999; Koo, Park, and Kim 000; Kim, Koo, and Park 00). There have been researches on feature extraction, acoustic models, pronunciation dictionaries, and other areas to improve recognition accuracy, but it is not easy to obtain high recognition accuracy with a large vocabulary. The other way, in this paper, we propose a service using a very small vocabulary, which leads to high recognition accuracy with the current technology. Membering T M is a conference call service with speech recognition developed by KT. Conference call is a service feature that allows a call to be established among three or more stations in such a manner that each of the stations is able to communicate with all the other stations. Membering T M allows any member of a group to communicate with all the other members by saying their names over the telephone. For example, when a user wants to communicate with other members, he dials the phone number of his group and says his name if the number of the calling phone is not a registered one, whereupon the system is expected to recognize the name and connects the call to the other members. In case that the number of the calling phone is a registered one in the group, the system automatically connects the call to the other members without asking the caller s name. The service is implemented on advanced intelligent network (AIN) using a Voice extensible Markup Language (VoiceXML) interpreter. The AIN architecture was derived from a set of functional needs associated with the provision of voice-band telecommunications services (Berman and Brewster 99). AIN services are provided through interactions between switching systems and the systems supporting AIN service logic. The interaction with users may involve the provision of service-specific announcements to a user or the collection of digits input by a user. The participant interaction resources may reside in a switching system or may be provided by an intelligent peripheral (IP). An IP is a system which controls and manages resources such as text-tospeech synthesis, announcements, speech recognition, and digit collection. The IP in our system includes a VoiceXML interpreter that reads and processes VoiceXML pages as described by the VoiceXML language standard (Edgar 00). To accomplish dialing by speech, a user has to first register a set of names together with phone numbers, at which point the system creates a model for each name in the set. The registered models are used later for recognition. The models could be either speaker-dependent or speaker-independent. In general, speaker-dependent models are more accurate, but the memory required for storing the models becomes prohibitively large as the number of registered names increases. To overcome this problem, we use speaker-independent, sub-word models. This means that one need not store acoustic models for each name, resulting in significant savings in memory. Names can be registered with speech or with text. To register names with speech, a user dials the reserved phone number which is used only for registration, say all members names including his name, entering the corresponding phone number for each name. A group number is given after registration. In the registration with speech, the problem of automatically finding transcriptions of the unknown words as sequences of sub-word units arises because the spellings of the words are unknown and are not exploited. Some strategies have been developed in the literature for this problem (Haeb-Umbach, Beyerlein, and Thelen 995; Jain, Cole, and Barnard 996; Elvira and Torrecilla 998). In our system, users utterances are stored when a user registers names. The stored utterances are used when the system provides a feedback of the recognized name by playing them
2 If the SCP instructs the SSP to route a call to an IP, the IP performs a particular function, for example play announcements and collect digits. To support this function, the IP in our system includes a VoiceXML interpreter that has the role of interpreting the VoiceXML documents submitted from the Web, and then generates the interactive voice dialing services. VoiceXML documents are stored in the application server (AS). The AS also includes the VoiceXML documents for other services as well as Membering T M. PAGE 30.. VoiceXML Interpreter Figure : AIN system architecture of Membering T M. and prompting for a confirmation during recognition. Another problem is that a lowering of recognition rates can be caused by the wrong transcriptions. Temporal variations of speech also may decrease the rates because the utterances used for creating the models may not represent all the other utterances at the stage of recognition. So, the system also provides registration with text. The system translates text names into sub-word sequences and stores them in the dictionary. This paper is organized as follows. Section II describes system architecture, a VoiceXML interpreter and speech recognizer. In Section III, two main phases of the service, registration and recognition, are presented. Database and experimental results are presented in Section IV and V. Our conclusions are given in Section VI... System architecture. System overview AIN is a telecommunication network services control and management architecture. Fig. shows the architecture of the Membering T M service on AIN. The service switching point (SSP) is an enhanced switching system. This node has extra call-processing software to enable the switching system to recognize when callprocessing at the external service control point (SCP) is required (Sharp and Clegg 994). When the conditions for external processing are detected, a query/begin transaction message is launched to the service control point, where the application service logic will decide how to process the call. The SCP is a network database which is one of the intelligent network (IN) physical elements that also contain service control logic. Intelligent peripherals (IPs) handle specialized interactions between the user and the intelligent network. The IP has resources such as tones and announcements, speech recognition and voice synthesis. As a result, it has both signaling and voice circuits to the service switching point. The VoiceXML is an interactive markup language, which supports an environment to develop applications that search Web information via voice and phone. It is designed for creating audio dialogs that feature synthesized speech, recognition of spoken and dual-tone multifrequency (DTMF) key input, telephony, and mixed initiative conversations. Its major goal is to bring the full power of Web development and content delivery to voice response applications, and to free the authors of such applications from low-level programming and resource management. The VoiceXML architecture has following key components: Implementation platform, VoiceXML interpreter context, VoiceXML interpreter, and document servers. The relationship between components is shown in Fig.. The implementation platform checks the inbound and outbound calls, and generates events in response to user actions and system status. These events are acted upon by the VoiceXML interpreter or the VoiceXML context. The VoiceXML interpreter context may monitor user inputs and manage the environment variables related to the sessions. VoiceXML interpreter parses the VoiceXML document responded from a document server, and controls the entire logic according to the dialog. The document server processes the requests from the VoiceXML interpreter. The server produces the VoiceXML documents in reply, which are processed by the interpreter. The VoiceXML interpreter can request any file format, such as documents, CGI scripts, and audio files. VoiceXML Interpereter Context Document Server Request VoiceXML Interpreter Implementation Platform Document Figure : Relations between components in VoiceXML.
3 .3. Speech recognizer The system uses the familiar phonetically-based, hidden Markov model (HMM) approach. Logically it is divided into two modules: the front-end and the decoder. The front-end computes dimensional linear predictive coding (LPC) cepstral coefficients and log energy from 6-bit PCM speech data sampled at 8kHz. Speech samples are partitioned into overlapping frames of 0 ms duration with a frame-shift of 0 ms. Each frame of speech is windowed with a Hamming window and represented by a dimensional cepstral vector. With the calculated cepstral coefficients and log energy, the front-end computes first and second differences of the 3-dimensional vectors, and concatenates these with the original elements excluding log energy to yield a 38-dimensional feature vector. To reduce the effects of channel, the front-end performs cepstral mean removal. The cepstral mean is calculated using all the frames of the utterance and is subtracted from each frame. The decoder computes the log likelihood of each feature vector according to observation densities associated with the states of the HMMs. The results are used in a synchronous Viterbi search over the active vocabulary. Words are represented as sequences of context-dependent phonemes, with each phoneme modeled as a three-state HMM. The HMMs of each state are created using tree-based state tying (Young, Kershaw, Odell, Ollason, Valtchev, and Woodland 000). A phonetic decision tree is a binary tree in which a yes/no phonetic question is attached to each node. The question at each node is chosen to maximize the likelihood of the training data given the final set of state tying. The question set of our system has 58 questions in total. Initially all states in a given item list are placed at the root of a tree. Depending on each answer, the pool of states is successively split and this continues until the states have trickled down to leaf-nodes. The final system consists of 4 mixture component context-dependent HMMs. 3. Registration and recognition Any user can get a group number consisting of four digits by registering members names and phone numbers, and can connect to other members by dialing service identification number 646 followed by the group number. The number of members is currently limited up to four. If the phone number of the calling phone is registered, the call is automatically connected to the other numbers. Otherwise the system requests user s name and determines with speech recognition whether the name is a registered one in the group. If the recognized name is a registered one, the system connects the call to other members. The group number expires if it is not used for a week. There are two methods for registration. One is the registration via the Web interface. A user just enters members names and phone numbers, and gets a group number. Input names are transcribed to phoneme sequences and later used in the grammar for speech recognition. In the other method, telephone is used. The phone number is reserved for the registration. The scenario is shown in Fig. 3. Play "You can register 3 or 4 members including you.". Play "Press your phone number and '#'.". Get DTMFs and store them.. Play "Press another mem-. Get DTMFs and store them.. Play "Press another mem-. Get DTMFs and store them.. Play "If you have one more member to register, press '', otherwise press ''.. Get a DTMF.. Play "Press the last mem-. Get DTMFs and store them. Play "Your group number is xxxx. Please call 646-xxxx to use the service. The group number will expire if it is not used for a week. Thank you." End Start. Play "This is the registration for Membering service. Press '' for fast registration, '' for normal registration, and '3' for transfering to an operator.". Get a DTMF. Play "You can register 3 or 4 members including you.". Play "Please say your name (eg N).". Get the name and store it. 3. Play "Press your phone number and '#'.. Play "Say another member's name (eg N).". Get the name and store it. 3. Play "Press N's phone number and '#'.. Play "Say another member's name (eg N3).". Get the name and store it. 3. Play "Press N3's phone number and '#'.. Play "If you have one more member to register, press '', otherwise press ''.. Get a DTMF.. Play "Say the last member's name (eg N4).". Get the name and store it. 3. Play "Press N4's phone number and '#'. Figure 3: Scenario of registration. 3 Transfer to an operator. When connected to an operator, users can register their names and phone numbers by providing necessary information to the operator. In the fast registration, members names are not registered. Accordingly, the phones whose numbers are not registered cannot be used for the service. In the normal registration, the system uses speech recognition to register members names. The registration with speech recognition is done by the recognition of the names spoken by the user. The recognizer produces a sequence of syllables for each name. This sequence is used later as a grammar for recognition. After registration, any member can use the service by calling the service number 646 followed by the group number. The scenario is shown in Fig. 4. If the calling phone is a registered one, the system automatically connects the call to the other members. Otherwise, the caller is requested to say his name for confirmation. The speak-independent phone models along with the grammars obtained during registration are used for recognition. The complete registration-recognition scheme is shown in Fig. 5. PAGE 303
4 Yes Play "This is Membering service. You are connected to other members. Please wait for a moment." Is the call connected to any other members? No Play "The lines are busy or other members can't answer the phone. Please call again later." Start Is the number of the calling phone a registered one? Yes No. Play "This is Membering service. Please say your name.". Get the name using speech recognizer (eg N). 3. Play "If your name is N, press '', otherwise press ''." Play "You are connected to other members. Please wait for a moment." Conversation End Figure 4: Scenario of the service after registration. Figure 5: Scheme of registration and recognition with speech recognition. 4. Database For the experiments, a set of 50 Korean names was selected among directory assistance database that includes 470,000 names. The selected 50 names includes PLUs (phonelike units) which cover over 90% of the occurrences. The 50 names were divided into five groups that contain thirty words respectively. 00 speakers recorded them. Ten male and ten female speakers were assigned to each group and each speaker recorded 30 words six times over the telephone. To reflect temporal variations of pronunciation, the recordings were done at two sessions with an interval of over one week. In each session, three utterances for each word were recorded. From the first session recordings, one utterance for each word was used for grammar. One set or two sets of the first session recordings was used for registration with speech and 00 pronunciation dictionaries were created. 5. Experimental results One group who recorded a set of 30 names consists of 0 people. All the subsets consisting of four names were created for each speaker in the group, registered, and tested by the member who registered the names and the other members of the group. The other four groups followed the same steps for the experiments. The first recognition tests were performed using the data of the first session recordings. The data used for the registration were excluded in the test. In case of registration with speech, the average recognition rates for the members who registered names were 96.8% while the rates for the other members were 9%. The data of the second session recordings showed similar results. The members who registered names showed the rates of 95.9% while the other members showed 9%. In case of registration with text, the recognition rates are higher than those of registration with speech. Because the grammars from the registered names are independent of who registered the names, the recognition rates were averaged for all the members. The rates are 99.7% and 99.8% for the first session recordings and the second session recordings. All the results are summarized in Table. In case of registration with speech, the system is made to ask the user to speak each word just once for users convenience because the registration is an additional burden to users. For the purpose of comparison, however, more experiments were made for the case of using two utterances for the registration with speech. If two utterances for each name are used for the registration, the grammars have two transcriptions for each name and the recognition rates may be improved when the transcriptions are different and reflects the speaker s variations of the utterances. Table shows the results of the experiments in which two utterances were used for registration of a name. The recognition rates of registration with text are the Table : Recognition rates of the registration with speech and with text. Registrant The other members Registration st 96.8% 9% with nd 95.9% 9% speech Average 96.3% 9% Registration st 99.7% with nd 99.8% text Average 99.8% PAGE 304
5 Table : Recognition rates of the registration with two utterances for each name. Registrant The other members Registration st 97.3% 94.7% with nd 97.9% 94.6% speech Average 97.8% 94.7% of International Conference on Speech Processing, Sharp, C. and K. Clegg (994). Advanced intelligent networks : now a reality. Electronics & Communication Engineering Journal, Young, S., D. Kershaw, J. Odell, D. Ollason, V. Valtchev, and P. Woodland (000). The HTK Book, Version 3.0. Microsoft Corporation. PAGE 305 highest. Accordingly, to improve the rates, the system allows users to modify the dictionary via the Web interface even after the names have been registered with speech. 6. Conclusions In this paper, we proposed the Membering T M service which is a conference call service with speech recognition. The architecture of AIN and the components of the system were described. We also presented the experimental results of registrations with speech and with text. While the average recognition rates according to registration with speech are 96.3% and 9%, respectively for the registrant and the other members, the average rate of registration with text is 99.8%. High recognition accuracy could be obtained with a very small vocabulary. References Berman, R. K. and J. H. Brewster (99). Perspectives on the AIN Architecture. IEEE Communications Magazine, 7 3. Edgar, B. (00). The VoiceXML Handbook : Understanding and Building the Phone-Enabled Web. New York: CMP Books. Elvira, J. and J. Torrecilla (998). Name dialing using final user defined vocabularies in mobile (GSM & TACS) and fixed telephone networks. Proceedings of IEEE ICASSP, Haeb-Umbach, R., P. Beyerlein, and E. Thelen (995). Automatic transcription of unknown words in a speech recognition system. Proceedings of IEEE ICASSP, Jain, N., R. Cole, and E. Barnard (996). Creating speaker-specific phonetic templates with a speakerindependent phonetic recognizer: implications for voice dialing. Proceedings of IEEE ICASSP, Kim, H.-G., M.-W. Koo, and Y.-K. Park (00). TeleGateway: A Spoken Dialogue System Based on VoiceXML. KT Journal 6, Koo, M.-W., S.-J. Park, and H.-K. Kim (000). An Experimental Study on a Speech Recognition System for Barge-In and Non-Barge-In Utterances on the Telephone Network. Proceedings of the 7th Western Pacific Regional Acoustics Conference, Park, S.-J., M.-W. Koo, and C.-S. Jhon (999). An Implementation of Continuous Speech Recognition for a Stock Information Retrieval System. Proceedings
Turkish Radiology Dictation System
Turkish Radiology Dictation System Ebru Arısoy, Levent M. Arslan Boaziçi University, Electrical and Electronic Engineering Department, 34342, Bebek, stanbul, Turkey [email protected], [email protected]
Speech Recognition System of Arabic Alphabet Based on a Telephony Arabic Corpus
Speech Recognition System of Arabic Alphabet Based on a Telephony Arabic Corpus Yousef Ajami Alotaibi 1, Mansour Alghamdi 2, and Fahad Alotaiby 3 1 Computer Engineering Department, King Saud University,
Version 2.6. Virtual Receptionist Stepping Through the Basics
Version 2.6 Virtual Receptionist Stepping Through the Basics Contents What is a Virtual Receptionist?...3 About the Documentation...3 Ifbyphone on the Web...3 Setting Up a Virtual Receptionist...4 Logging
AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS
AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS PIERRE LANCHANTIN, ANDREW C. MORRIS, XAVIER RODET, CHRISTOPHE VEAUX Very high quality text-to-speech synthesis can be achieved by unit selection
Speech: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction
: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction Urmila Shrawankar Dept. of Information Technology Govt. Polytechnic, Nagpur Institute Sadar, Nagpur 440001 (INDIA)
Open Source VoiceXML Interpreter over Asterisk for Use in IVR Applications
Open Source VoiceXML Interpreter over Asterisk for Use in IVR Applications Lerato Lerato, Maletšabisa Molapo and Lehlohonolo Khoase Dept. of Maths and Computer Science, National University of Lesotho Roma
VoiceXML Tutorial. Part 1: VoiceXML Basics and Simple Forms
VoiceXML Tutorial Part 1: VoiceXML Basics and Simple Forms What is VoiceXML? XML Application W3C Standard Integration of Multiple Speech and Telephony Related Technologies Automated Speech Recognition
ABSTRACT 2. SYSTEM OVERVIEW 1. INTRODUCTION. 2.1 Speech Recognition
The CU Communicator: An Architecture for Dialogue Systems 1 Bryan Pellom, Wayne Ward, Sameer Pradhan Center for Spoken Language Research University of Colorado, Boulder Boulder, Colorado 80309-0594, USA
An Arabic Text-To-Speech System Based on Artificial Neural Networks
Journal of Computer Science 5 (3): 207-213, 2009 ISSN 1549-3636 2009 Science Publications An Arabic Text-To-Speech System Based on Artificial Neural Networks Ghadeer Al-Said and Moussa Abdallah Department
Secure-Access System via Fixed and Mobile Telephone Networks using Voice Biometrics
Secure-Access System via Fixed and Mobile Telephone Networks using Voice Biometrics Anastasis Kounoudes 1, Anixi Antonakoudi 1, Vasilis Kekatos 2 1 The Philips College, Computing and Information Systems
Hardware Implementation of Probabilistic State Machine for Word Recognition
IJECT Vo l. 4, Is s u e Sp l - 5, Ju l y - Se p t 2013 ISSN : 2230-7109 (Online) ISSN : 2230-9543 (Print) Hardware Implementation of Probabilistic State Machine for Word Recognition 1 Soorya Asokan, 2
Speech Recognition of a Voice-Access Automotive Telematics. System using VoiceXML
Speech Recognition of a Voice-Access Automotive Telematics System using VoiceXML Ing-Yi Chen Tsung-Chi Huang [email protected] [email protected] Department of Computer Science and Information
Establishing the Uniqueness of the Human Voice for Security Applications
Proceedings of Student/Faculty Research Day, CSIS, Pace University, May 7th, 2004 Establishing the Uniqueness of the Human Voice for Security Applications Naresh P. Trilok, Sung-Hyuk Cha, and Charles C.
Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System
Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System Oana NICOLAE Faculty of Mathematics and Computer Science, Department of Computer Science, University of Craiova, Romania [email protected]
Speech-Enabled Interactive Voice Response Systems
Speech-Enabled Interactive Voice Response Systems Definition Serving as a bridge between people and computer databases, interactive voice response systems (IVRs) connect telephone users with the information
Develop Software that Speaks and Listens
Develop Software that Speaks and Listens Copyright 2011 Chant Inc. All rights reserved. Chant, SpeechKit, Getting the World Talking with Technology, talking man, and headset are trademarks or registered
9RLFH$FWLYDWHG,QIRUPDWLRQ(QWU\7HFKQLFDO$VSHFWV
Université de Technologie de Compiègne UTC +(8',$6
1. Public Switched Telephone Networks vs. Internet Protocol Networks
Internet Protocol (IP)/Intelligent Network (IN) Integration Tutorial Definition Internet telephony switches enable voice calls between the public switched telephone network (PSTN) and Internet protocol
Speech recognition technology for mobile phones
Speech recognition technology for mobile phones Stefan Dobler Following the introduction of mobile phones using voice commands, speech recognition is becoming standard on mobile handsets. Features such
Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations
Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations C. Wright, L. Ballard, S. Coull, F. Monrose, G. Masson Talk held by Goran Doychev Selected Topics in Information Security and
Deploying Cisco Unified Contact Center Express 5.0 (UCCX)
Deploying Cisco Unified Contact Center Express 5.0 (UCCX) Course Overview: This course, Deploying Cisco Unified Contact Center Express (UCCX) v5.0, provides the student with handson experience and knowledge
Description: Objective: Upon completing this course, the learner will be able to meet these overall objectives:
Course: Deploying Cisco Unified Contact Center Express Software v9.0 Duration: 5 Day Hands-On Lab & Lecture Course Price: $ 3,695.00 Learning Credits: 37 Description: Deploying Cisco Unified Contact Center
Application Notes for Configuring Microsoft Office Communications Server 2007 R2 and Avaya IP Office PSTN Call Routing - Issue 1.0
Avaya Solution & Interoperability Test Lab Application Notes for Configuring Microsoft Office Communications Server 2007 R2 and Avaya IP Office PSTN Call Routing - Issue 1.0 Abstract These Application
VOICE INFORMATION RETRIEVAL FOR DOCUMENTS. Except where reference is made to the work of others, the work described in this thesis is.
VOICE INFORMATION RETRIEVAL FOR DOCUMENTS Except where reference is made to the work of others, the work described in this thesis is my own or was done in collaboration with my advisory committee. Weihong
VoiceXML-Based Dialogue Systems
VoiceXML-Based Dialogue Systems Pavel Cenek Laboratory of Speech and Dialogue Faculty of Informatics Masaryk University Brno Agenda Dialogue system (DS) VoiceXML Frame-based DS in general 2 Computer based
Reduce Mobile Phone Expense with Avaya Unified Communications
Reduce Mobile Phone Expense with Avaya Unified Communications Table of Contents Section 1: Reduce Inbound Minutes... 2 Section 2: Reduce Outbound Minutes... 3 Section 3: Take Greater Advantage of Free
Intelligent Networks. John Anderson. Principles and ApplicaHons. The Institution of Electrical Engineers
Intelligent Networks Principles and ApplicaHons John Anderson The Institution of Electrical Engineers Contents Preface Acknowledgments Introduction to intelligent networks. The basics of intelligent networks..
Implementing a Digital Answering Machine with a High-Speed 8-Bit Microcontroller
Implementing a Digital Answering Machine with a High-Speed 8-Bit Microcontroller Zafar Ullah Senior Application Engineer Scenix Semiconductor Inc. Leo Petropoulos Application Manager Invox TEchnology 1.0
Ericsson T18s Voice Dialing Simulator
Ericsson T18s Voice Dialing Simulator Mauricio Aracena Kovacevic, Anna Dehlbom, Jakob Ekeberg, Guillaume Gariazzo, Eric Lästh and Vanessa Troncoso Dept. of Signals Sensors and Systems Royal Institute of
Philips 9600 DPM Setup Guide for Dragon
Dragon NaturallySpeaking Version 10 Philips 9600 DPM Setup Guide for Dragon Philips 9600 DPM Setup Guide (revision 1.1) for Dragon NaturallySpeaking Version 10 as released in North America The material
A design of the transcoder to convert the VoiceXML documents into the XHTML+Voice documents
A design of the transcoder to convert the VoiceXML documents into the XHTML+Voice documents JIEUN KIM, JIEUN PARK, JUNSUK PARK, DONGWON HAN Computer & Software Technology Lab, Electronics and Telecommunications
Dialog planning in VoiceXML
Dialog planning in VoiceXML Csapó Tamás Gábor 4 January 2011 2. VoiceXML Programming Guide VoiceXML is an XML format programming language, describing the interactions between human
Application Note. Using Dialogic Boards to Enhance Interactive Voice Response Applications
Using Dialogic Boards to Enhance Interactive Voice Response Applications Using Dialogic Boards to Enhance Interactive Voice Response Applications Executive Summary Interactive Voice Response (IVR) systems
Experiments with Signal-Driven Symbolic Prosody for Statistical Parametric Speech Synthesis
Experiments with Signal-Driven Symbolic Prosody for Statistical Parametric Speech Synthesis Fabio Tesser, Giacomo Sommavilla, Giulio Paci, Piero Cosi Institute of Cognitive Sciences and Technologies, National
Hosted Fax Mail. Hosted Fax Mail. User Guide
Hosted Fax Mail Hosted Fax Mail User Guide Contents 1 About this Guide... 2 2 Hosted Fax Mail... 3 3 Getting Started... 4 3.1 Logging On to the Web Portal... 4 4 Web Portal Mailbox... 6 4.1 Checking Messages
Implementing an In-Service, Non- Intrusive Measurement Device in Telecommunication Networks Using the TMS320C31
Disclaimer: This document was part of the First European DSP Education and Research Conference. It may have been written by someone whose native language is not English. TI assumes no liability for the
How To Develop A Voice Portal For A Business
VoiceMan Universal Voice Dialog Platform VoiceMan The Voice Portal with many purposes www.sikom.de Seite 2 Voice Computers manage to do ever more Modern voice portals can... extract key words from long
Phone Routing Stepping Through the Basics
Ng is Phone Routing Stepping Through the Basics Version 2.6 Contents What is Phone Routing?...3 Logging in to your Ifbyphone Account...3 Configuring Different Phone Routing Functions...4 How do I purchase
VoIP Conferencing. The latest in IP technologies deliver the next level of service innovation for better meetings. Global Collaboration Services
Global Collaboration Services VoIP Conferencing The latest in IP technologies deliver the next level of service innovation for better meetings. ENERGIZE YOUR CONNECTIONS Table of Contents > > Contents...
Information Leakage in Encrypted Network Traffic
Information Leakage in Encrypted Network Traffic Attacks and Countermeasures Scott Coull RedJack Joint work with: Charles Wright (MIT LL) Lucas Ballard (Google) Fabian Monrose (UNC) Gerald Masson (JHU)
BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION
BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION P. Vanroose Katholieke Universiteit Leuven, div. ESAT/PSI Kasteelpark Arenberg 10, B 3001 Heverlee, Belgium [email protected]
Testing IVR Systems White Paper
Testing IVR Systems Document: Nexus8610 IVR 05-2005 Issue date: Author: Issued by: 26MAY2005 Franz Neeser Senior Product Manager Nexus Telecom AG, Switzerland We work to improve your network Abstract Interactive
BT Inbound Contact global (formerly CCS International) Service Annex to the General Service Schedule
1 Definitions Page 1 of 6 The following definitions apply, in addition to those in the General Terms and Conditions and the General Services Schedule of the Agreement. Caller means the person calling the
Online Recruitment - An Intelligent Approach
Online Recruitment - An Intelligent Approach Samah Rifai and Ramzi A. Haraty Department of Computer Science and Mathematics Lebanese American University Beirut, Lebanon Email: {samah.rifai, [email protected]}
A Smart Telephone Answering Machine with Voice Message Forwarding Capability
A Smart Telephone Answering Machine with Voice Message Forwarding Capability Chih-Hung Huang 1 Cheng Wen 2 Kuang-Chiung Chang 3 1 Department of Information Management, Lunghwa University of Science and
Design Grammars for High-performance Speech Recognition
Design Grammars for High-performance Speech Recognition Copyright 2011 Chant Inc. All rights reserved. Chant, SpeechKit, Getting the World Talking with Technology, talking man, and headset are trademarks
Dialogos Voice Platform
Dialogos Voice Platform Product Datasheet D i a l o g o s S p e e c h C o m m u n i c a t i o n S y s t e m s S. A. September 2007 Contents 1 Dialogos Voice Platform... 3 1.1 DVP features... 3 1.1.1 Standards-based
1 Introduction. 2 Assumptions. Implementing roaming for OpenBTS
Implementing roaming for OpenBTS 1 Introduction One of the main advantages of OpenBTS TM system architecture is absence of a legacy GSM core network. SIP is used for registering, call control and messaging.
IVR Primer Introduction
IVR Primer Introduction Speech-enabled applications are quickly becoming very popular. Why? Because using voice to navigate is more natural for users than punching telephone keypads. Speech as a navigation
VOICE RECOGNITION KIT USING HM2007. Speech Recognition System. Features. Specification. Applications
VOICE RECOGNITION KIT USING HM2007 Introduction Speech Recognition System The speech recognition system is a completely assembled and easy to use programmable speech recognition circuit. Programmable,
Deploying Cisco Unified Contact Center Express - Digital
Course Code: CUCCX Vendor: Cisco Course Overview Duration: 5 RRP: 2,396 Deploying Cisco Unified Contact Center Express - Digital Overview This course provides you with hands-on experience and knowledge
1. Login to www.ifbyphone.com with your User ID and password. Select Virtual Receptionist from the Basic Services tab.
Virtual Receptionist Virtual Receptionist is a hosted PBX auto attendant service with intelligent routing that automatically greets and routes phone calls based on your office schedule. It gives your company
Personal Call Manager User Guide. BCM Business Communications Manager
Personal Call Manager User Guide BCM Business Communications Manager Document Status: Standard Document Version: 04.01 Document Number: NN40010-104 Date: August 2008 Copyright Nortel Networks 2005 2008
Myanmar Continuous Speech Recognition System Based on DTW and HMM
Myanmar Continuous Speech Recognition System Based on DTW and HMM Ingyin Khaing Department of Information and Technology University of Technology (Yatanarpon Cyber City),near Pyin Oo Lwin, Myanmar Abstract-
Weighting and Normalisation of Synchronous HMMs for Audio-Visual Speech Recognition
ISCA Archive http://www.isca-speech.org/archive Auditory-Visual Speech Processing 27 (AVSP27) Hilvarenbeek, The Netherlands August 31 - September 3, 27 Weighting and Normalisation of Synchronous HMMs for
Avaya Aura Orchestration Designer
Avaya Aura Orchestration Designer Avaya Aura Orchestration Designer is a unified service creation environment for faster, lower cost design and deployment of voice and multimedia applications and agent
Communication Networks. MAP-TELE 2011/12 José Ruela
Communication Networks MAP-TELE 2011/12 José Ruela Network basic mechanisms Network Architectures Protocol Layering Network architecture concept A network architecture is an abstract model used to describe
CONTROL MICROSYSTEMS DNP3. User and Reference Manual
DNP3 User and Reference Manual CONTROL MICROSYSTEMS SCADA products... for the distance 48 Steacie Drive Telephone: 613-591-1943 Kanata, Ontario Facsimile: 613-591-1022 K2K 2A9 Technical Support: 888-226-6876
Voice Call Addon for Ozeki NG SMS Gateway
Voice Call Addon for Ozeki NG SMS Gateway Document version v.1.0.0.0 Copyright 2000-2011 Ozeki Informatics Ltd. All rights reserved 1 Table of Contents Voice Call Addon for Ozeki NG SMS Gateway Introduction
Combining VoiceXML with CCXML
Combining VoiceXML with CCXML A Comparative Study Daniel Amyot and Renato Simoes School of Information Technology and Engineering University of Ottawa Ottawa, Canada [email protected], [email protected]
Standard Languages for Developing Multimodal Applications
Standard Languages for Developing Multimodal Applications James A. Larson Intel Corporation 16055 SW Walker Rd, #402, Beaverton, OR 97006 USA [email protected] Abstract The World Wide Web Consortium
Emotion Detection from Speech
Emotion Detection from Speech 1. Introduction Although emotion detection from speech is a relatively new field of research, it has many potential applications. In human-computer or human-human interaction
Mobile Application Languages XML, Java, J2ME and JavaCard Lesson 03 XML based Standards and Formats for Applications
Mobile Application Languages XML, Java, J2ME and JavaCard Lesson 03 XML based Standards and Formats for Applications Oxford University Press 2007. All rights reserved. 1 XML An extensible language The
Moving Enterprise Applications into VoiceXML. May 2002
Moving Enterprise Applications into VoiceXML May 2002 ViaFone Overview ViaFone connects mobile employees to to enterprise systems to to improve overall business performance. Enterprise Application Focus;
Robust Methods for Automatic Transcription and Alignment of Speech Signals
Robust Methods for Automatic Transcription and Alignment of Speech Signals Leif Grönqvist ([email protected]) Course in Speech Recognition January 2. 2004 Contents Contents 1 1 Introduction 2 2 Background
Unified Messaging and Fax
April 25, 2007 Telecom White Paper Presented By: Toshiba Telecommunications Systems Division www.telecom.toshiba.com Unified Messaging and Fax Toshiba s Stratagy Enterprise Server Overview: Unified Messaging
Voice XML: Bringing Agility to Customer Self-Service with Speech About Eric Tamblyn Voice XML: Bringing Agility to Customer Self-Service with Speech
Voice XML: Bringing Agility to Customer Self-Service with Speech Author: Eric Tamblyn, Director of Voice Platform Solutions Engineering, Genesys Telecommunications Laboratories, Inc. About Eric Tamblyn
D.N.A. 5.6 MANAGEMENT APPLICATIONS
D.N.A. 5.6 MANAGEMENT APPLICATIONS The D.N.A. suite of is composed of management specific and end user. The management allow administrators to maintain, monitor, and adjust configurations and data to maximize
VECSYS LIMSI ARCHITECTURE
VECSYS LIMSI ARCHITECTURE Samir Bennacef Vecsys Centralized Architecture Acoustic models Language models Caseframe Grammar Task Model Database Speech Recognizer text Semantic Analyzer Semantic frame Dialog
Materials Software Systems Inc (MSSI). Enabling Speech on Touch Tone IVR White Paper
Materials Software Systems Inc (MSSI). Enabling Speech on Touch Tone IVR White Paper Reliable Customer Service and Automation is the key for Success in Hosted Interactive Voice Response Speech Enabled
How To Recognize Voice Over Ip On Pc Or Mac Or Ip On A Pc Or Ip (Ip) On A Microsoft Computer Or Ip Computer On A Mac Or Mac (Ip Or Ip) On An Ip Computer Or Mac Computer On An Mp3
Recognizing Voice Over IP: A Robust Front-End for Speech Recognition on the World Wide Web. By C.Moreno, A. Antolin and F.Diaz-de-Maria. Summary By Maheshwar Jayaraman 1 1. Introduction Voice Over IP is
A TOOL FOR TEACHING LINEAR PREDICTIVE CODING
A TOOL FOR TEACHING LINEAR PREDICTIVE CODING Branislav Gerazov 1, Venceslav Kafedziski 2, Goce Shutinoski 1 1) Department of Electronics, 2) Department of Telecommunications Faculty of Electrical Engineering
An Introduction to astercc Call Center ver 1.0 2012/06/09 [email protected]
An Introduction to astercc Call Center ver 1.0 2012/06/09 [email protected] Contents 1 At a Glance 2 Major Functions 3 Successful Cases 4 About astercc A quick glance at astercc astercc is a call center
Envox CDP 7.0 Performance Comparison of VoiceXML and Envox Scripts
Envox CDP 7.0 Performance Comparison of and Envox Scripts Goal and Conclusion The focus of the testing was to compare the performance of and ENS applications. It was found that and ENS applications have
Feature and Technical
BlackBerry Mobile Voice System for SIP Gateways and the Avaya Aura Session Manager Version: 5.3 Feature and Technical Overview Published: 2013-06-19 SWD-20130619135120555 Contents 1 Overview...4 2 Features...5
Contact Center Help: Campaign Configuration
Contact Center Help: Campaign Configuration Topic: LiveOps Admin Portal > Routing > Campaigns Help: Page Help: Campaigns Site: https://tenant.hostedcc.com/mason/admin/doc/pagehelp/campaigns/chapter0.html
Telephony Systems Trainer CODITEL
Technical Teaching Equipment Telephony Systems Trainer CODITEL Always included in the supply: Teaching Technique used EDIBON Computer Control System 2 Computer Control Software Computer (not included in
TECHNICAL NOTE. GoFree WIFI-1 web interface settings. Revision Comment Author Date 0.0a First release James Zhang 10/09/2012
TECHNICAL NOTE GoFree WIFI-1 web interface settings Revision Comment Author Date 0.0a First release James Zhang 10/09/2012 1/14 Web interface settings under admin mode Figure 1: web interface admin log
UCCXA: Cisco Unified Contact Center Express Advanced v4
Course: UCCXA: Cisco Unified Contact Center Express Advanced v4 Description: Students build on the knowledge and scripting experience gained in the prerequisite UCCXD course by exploring more advanced
Testing IVR systems. White Paper. Document Number: Nexus8610 IVR 05-2005. Issue Date: May 2005. Number of pages: 14. File Name: IVR Testing Ed 2.0.
White Paper Document Number: Nexus8610 IVR 05-2005 Issue Date: May 2005 Number of pages: 14 File Name: IVR Testing Ed 2.0.doc Franz Neeser Senior Product Manager Nexus Telecom AG We Work to Improve your
Need for Signaling and Call Control
Need for Signaling and Call Control VoIP Signaling In a traditional voice network, call establishment, progress, and termination are managed by interpreting and propagating signals. Transporting voice
Network Sensing Network Monitoring and Diagnosis Technologies
Network Sensing Network Monitoring and Diagnosis Technologies Masanobu Morinaga Yuji Nomura Takeshi Otani Motoyuki Kawaba (Manuscript received April 7, 2009) Problems that occur in the information and
Call Center Metrics: Glossary of Terms
Call Center Metrics: Glossary of Terms A. abandoned call. A call or other type of contact that has been offered into a communications network or telephone system but is terminated by the person originating
ATA: An Analogue Telephone Adapter is used to connect a standard telephone to a high-speed modem to facilitate VoIP and/or calls over the Internet.
KEY VOIP TERMS 1 ACD: Automatic Call Distribution is a system used to determine how incoming calls are routed. When the ACD system receives an incoming call it follows user-defined specifications as to
Cisco Unified Contact Center Express Reporting
Cisco Unified Contact Center Express Reporting UNIFIED CONTACT CENTER EXPRESS REPORTING V4.0 CISCO UNIFIED CONTACT CENTER EXPRESS SOFTWARE V9.0 Join Cisco Press Author and one of the world s most respected
Siemens HiPath ProCenter Multimedia
Siemens HiPath ProCenter Multimedia Today s business climate is tougher than ever, and chances are your competitors are no longer just a local concern. All this means finding ways of improving customer
NEW. ProduCt information
NEW ProduCt information Your customers expect you to provide optimal service with maximum availability. We can help. ELSBETH Communication Manager (ECM) is the innovative solution for centralized management
COPYRIGHT 2011 COPYRIGHT 2012 AXON DIGITAL DESIGN B.V. ALL RIGHTS RESERVED
Subtitle insertion GEP100 - HEP100 Inserting 3Gb/s, HD, subtitles SD embedded and Teletext domain with the Dolby HSI20 E to module PCM decoder with audio shuffler A A application product note COPYRIGHT
