Speech Recognition of a Voice-Access Automotive Telematics System using VoiceXML




Ing-Yi Chen (ichen@csie.ntut.edu.tw), Tsung-Chi Huang (rick@ilab.csie.ntut.edu.tw)
Department of Computer Science and Information Engineering, National Taipei University of Technology, Taiwan, ROC

Abstract

To give drivers a safe way to retrieve information, a voice-access Telematics system is implemented based on VoiceXML and web architecture. Noise while the vehicle is moving greatly reduces the recognition rate and forces drivers to repeat commands again and again. To raise recognition accuracy, this paper makes several improvements to the major components of the Automated Speech Recognition (ASR) engine. After these enhancements are applied, the average recognition rate exceeds 70% even at high speed. This result makes the Telematics system more practical.

Keywords: Telematics system, VoiceXML, Speech Recognition

1. INTRODUCTION

The main concept of Telematics is the combination of telecommunication and information technology in the vehicle. Telematics is viewed as the third revolution in the automobile industry, after the high-compression engine and microelectronic systems. The first generation of Telematics systems was composed of a GPS (Global Positioning System) receiver and a CD-ROM module for data storage, and its main function was car navigation. The second generation integrated a telecommunication module to connect to the call center of the service provider. In recent years, with the integration of telecommunication and internet technology, Telematics systems can retrieve data from other content providers and give drivers more useful, real-time information.

As Telematics systems become more convenient, drivers spend more time and attention operating them while driving. These distractions cause many car accidents. For this reason, Telematics systems must be operated in a safer and simpler way. A voice-access interface is a suitable solution to this problem. Drivers send commands by speaking to the Telematics system and receive responses in the form of voice. In this hands-free environment, distractions can be reduced greatly. As voice technologies improve, it becomes easier to apply a voice-access interface to Telematics systems.

2. ANALYSIS

Traditionally, IVR (Interactive Voice Response) systems had to be deployed and implemented on specialized PBX (Private Branch Exchange) hardware. Programmers had to develop applications in a vendor-specific environment, because each vendor provided its own proprietary programming interface and library. In addition, the voice responses used during the conversation usually had to be pre-recorded, and users were restricted to entering commands on the telephone keypad (DTMF mode, Dual-Tone Multi-Frequency, i.e., touch-tone). Service providers therefore spent considerable effort developing IVR systems, and these systems were in fact only one-way voice systems (DTMF input, voice output).

VoiceXML

To provide an open, standard platform for voice systems, several CTI (Computer Telephony Integration) companies (IBM, AT&T, Lucent and Motorola) submitted the VoiceXML specification to the W3C in 2000. The key technologies of VoiceXML are TTS (Text to Speech) and ASR (Automated Speech Recognition). TTS converts large amounts of text data into speech. ASR recognizes what users say. VoiceXML-based systems are therefore truly two-way voice systems (voice input, voice output).

Because of these benefits, it is easier to implement a voice-access system based on web architecture with VoiceXML. A basic voice system is set up by replacing the markup language of a web application with VoiceXML and integrating it with a voice server. The Telematics system is composed of this voice system and the GPS/GSM module.

Figure 1. VoiceXML-based Telematics system.

The major difference between a traditional and a VoiceXML-based IVR system is the speech recognition input mode, which is also the most important part of the whole system. Recognition accuracy greatly affects the practicality of a voice system, especially in the Telematics environment.
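
As an illustration of the two-way dialog described above, the following minimal sketch generates a small VoiceXML 2.0 document in Python. The function name build_command_dialog and the command words are assumptions made for this example and do not come from the system described in this paper; only standard VoiceXML elements (form, field, prompt, grammar, nomatch, filled) are used.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_command_dialog(commands):
    """Return a VoiceXML 2.0 document (as a string) that asks for one command.

    Hypothetical helper for illustration; `commands` is a list of words
    that the dialog should accept.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "main"})
    field = ET.SubElement(form, "field", {"name": "command"})

    # Voice output: the TTS engine reads this prompt to the driver.
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Please say one of: " + ", ".join(commands) + "."

    # Voice input: an inline grammar listing the words the field accepts.
    grammar = ET.SubElement(field, "grammar", {
        "type": "application/srgs+xml", "version": "1.0", "root": "cmd"})
    rule = ET.SubElement(grammar, "rule", {"id": "cmd"})
    one_of = ET.SubElement(rule, "one-of")
    for word in commands:
        ET.SubElement(one_of, "item").text = word

    # Played when the recognized speech matches nothing in the grammar.
    nomatch = ET.SubElement(field, "nomatch")
    ET.SubElement(nomatch, "prompt").text = "Sorry, I did not understand."
    ET.SubElement(nomatch, "reprompt")

    # Played when the field has been filled with a recognized command.
    filled = ET.SubElement(field, "filled")
    confirm = ET.SubElement(filled, "prompt")
    confirm.text = "You selected "
    ET.SubElement(confirm, "value", {"expr": "command"})

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_command_dialog(["traffic", "weather", "navigation"]))
```

A web server could return a document like this in place of an HTML page, and the voice server would then interpret it. How the actual system structures its dialogs is not stated in the paper.
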
To reduce distraction while driving, the recognition rate must be raised as high as possible.

Noise Issues

In the Telematics environment, the major obstacle to recognition is the noise produced while the car is moving. This noise often causes incorrect recognition results, for the following main reasons.

i. A voice system usually plays some prompt information before users select their options, but this is redundant once users are familiar with the system. Most voice systems therefore provide a function called Barge-in, which lets users interrupt the prompt and go directly to the next voice layer. In the Telematics environment, the microphone installed in the car is a sensitive one so that it can pick up the driver's voice precisely. As a result, any loud noise, such as passengers talking or a horn sounding, may be treated as a voice input command.

ii. The second issue is the low-frequency noise produced while the car is moving. This continuous noise does not trigger the Barge-in problem above, but it does reduce recognition accuracy. Mixed into the voice signal, it masks the true signal of the driver's speech. If the noise exceeds a certain limit, the ASR engine receives an incomplete waveform and fails to recognize the input command. This kind of noise depends on road friction, which varies with the smoothness of the road surface and the vehicle speed. When the car drives on a smoother road at a lower speed, the low-frequency noise is smaller; at high speed, in contrast, the Telematics system usually fails to recognize input commands because of the low-frequency noise.

Figure 2 shows the recognition rates in different test environments.

Test environment      Recognition rate
Vehicle (80kph)       0.32
Vehicle (60kph)       0.58
Vehicle (0kph)        0.87
General Phone Line    0.91
PC-Headset            0.93

Figure 2. Recognition rates in different test environments.

Each test environment uses 100 sample single words to test the recognition rate. In the PC-Headset and stationary-vehicle environments, the recognition rates are high enough to meet the requirement. As speed increases, the recognition rate falls steadily. At high speed (100kph), the system is almost unable to recognize any word because of the noise.
This result shows how serious the noise problem is.

3. SOLUTION

Figure 3 shows the basic steps of the recognition process.

Figure 3. Recognition Processes

Step 1: User Input. The microphone captures the user's voice in the form of analog signals.
Step 2: Digitization. The sound card digitizes the analog signals.
Step 3: Phonetic Breakdown. The signals are broken into phonemes.
Step 4: Matching. Based on the grammar, phonetic representation and vocabulary library, the system returns the proper word.

The whole recognition process can be separated into hardware and software parts. The telephone line from the telephone company to the PBX server was a traditional analog line, and it introduced many unwanted signals during transmission. The system provider therefore decided to upgrade the whole telephone system from analog to digital.

Speed             Recognition rate
High (100kph)     0.52
Middle (80kph)    0.67
Low (60kph)       0.80
Park (0kph)       0.91

Figure 4. Recognition rates in digital environment.

The test results in the digital environment are much better than before. The hardware upgrade brings an effective improvement in the process of transforming signals from analog to digital (Step 1 to Step 3), but the recognition rates at higher speeds still do not meet the requirement. To reduce the effect of the noise problems and increase recognition accuracy, the Barge-in function of the Telematics system is disabled to avoid all unexpected interruptions, and several enhancements are made as follows.

Improvements in Recognition

The most important fundamentals of the recognition process are the vocabulary library and the grammar of the applications. When the ASR engine receives the digital signals, it recognizes each phoneme using the vocabulary library and returns a matching list to the VoiceXML application.
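
The role of the vocabulary library in producing a matching list can be sketched with a toy example. This is only an illustration under stated assumptions: the vocabulary entries, the phoneme symbols, the similarity measure (Python's difflib) and the function name matching_list are all hypothetical, and a real ASR engine scores acoustic models rather than comparing symbol lists.

```python
from difflib import SequenceMatcher

# Hypothetical vocabulary library: word -> phoneme sequence.
VOCABULARY = {
    "traffic": ["T", "R", "AE", "F", "IH", "K"],
    "weather": ["W", "EH", "DH", "ER"],
    "navigation": ["N", "AE", "V", "AH", "G", "EY", "SH", "AH", "N"],
    "restaurant": ["R", "EH", "S", "T", "ER", "AA", "N", "T"],
}


def matching_list(heard, vocabulary=VOCABULARY, top_n=3, min_score=0.5):
    """Rank vocabulary words by similarity to the heard phoneme sequence.

    Returns up to `top_n` (word, score) pairs, best first. Words scoring
    below `min_score` are left out, so heavy noise can yield an empty list.
    """
    scored = []
    for word, phonemes in vocabulary.items():
        score = SequenceMatcher(None, heard, phonemes).ratio()
        if score >= min_score:
            scored.append((word, round(score, 2)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # The final phoneme of "traffic" is assumed lost to road noise.
    print(matching_list(["T", "R", "AE", "F", "IH"]))
```

In this sketch a word that is missing from the vocabulary can never appear in the matching list, which is why the contents of the vocabulary library matter so much for domain-specific words.
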

The VoiceXML application receives the matching list and compares it with its grammar list. If the grammar list contains a matching word, the application processes the command. Otherwise, the application returns a nomatch error to the user. Improving these two elements is therefore the most efficient way to raise the recognition rate.

Figure 5. Command matching flow

i. The ASR engine contains a regular vocabulary library, which is sufficient to recognize basic words. In the Telematics system, however, applications usually contain many unusual options such as the names of streets or restaurants. These words are difficult to recognize, especially in a noisy environment. The first enhancement is to rebuild the vocabulary library of the recognition engine with the specific words that are commonly used in this Telematics system. This helps the ASR engine recognize a particular word more easily. The line for the new ASR engine in Figure 6 shows this result. The recognition rate remains above 60% after the ASR engine is rebuilt.

Figure 6. Recognition rate (%) versus car speed (km/h; Park 0, Low 60, Middle 80, High 100) for the Analog, Digital, and Digital + new ASR engine configurations.

ii. In the later stage of implementation, the W3C announced the VoiceXML 2.0 specification. The major improvement in this specification is a stronger grammar recognition capability, which helps an application specify an individual word more effectively. After upgrading VoiceXML from 1.0 to 2.0, the recognition rate is higher than before.

Figure 7 shows the final results after these improvements. Recognition rates exceed 70% at every test speed, which meets the requirement.
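
The command matching flow of Figure 5, described earlier in this section, can be sketched on the application side as follows. The function name handle_recognition, the command words and the street name are hypothetical and chosen only for illustration; in a real deployment this comparison is performed by the VoiceXML interpreter against the active grammar rather than by application code.

```python
def handle_recognition(matching_list, grammar):
    """Pick the first candidate from the ASR engine that the grammar allows.

    matching_list: candidate words from the ASR engine, best first.
    grammar: set of words accepted by the current dialog.
    Returns ("command", word) on success or ("nomatch", None) otherwise.
    """
    for word in matching_list:
        if word in grammar:
            return ("command", word)
    return ("nomatch", None)


if __name__ == "__main__":
    grammar = {"traffic", "weather", "navigation"}

    # A candidate that is in the grammar is processed as a command.
    print(handle_recognition(["tropic", "traffic"], grammar))

    # A street name outside the grammar produces a nomatch.
    print(handle_recognition(["zhongxiao road"], grammar))

    # Adding the domain-specific word to the grammar lets it match.
    print(handle_recognition(["zhongxiao road"], grammar | {"zhongxiao road"}))
```

The sketch shows why both elements matter: a word must first appear in the matching list produced from the vocabulary library, and must then also be present in the grammar list, before the application can act on it.
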

Speed             Recognition rate
High (100kph)     0.74
Middle (80kph)    0.83
Low (60kph)       0.92
Park (0kph)       0.97

Figure 7. Final Recognition Rate

4. CONCLUSION

In this paper, a Telematics system with voice-access capability is implemented using VoiceXML. To provide a safe and convenient voice interface, the most important task for the whole system is to raise the recognition rate as high as possible. After several improvements are applied, the final average recognition rate exceeds 70% in a real driving environment. This result is good enough for use in a Telematics system.

5. ACKNOWLEDGEMENT

The authors would like to thank the National Science Council (NSC 91-2213-E-033-028) for supporting this project.