Real-World Experience Adding Speech to IVR Solutions with MRCP
A webinar by NMS, ScanSoft and Capital One
Agenda
- Introduction to speech technology: Dr. Rob Kassel, Senior Product Manager, ScanSoft, Inc.
- MRCP and Natural Access: Jack Chase, Director, Product Marketing, NMS
- MRCP integration on the TelBert IVR platform using NMS and ScanSoft: Eric Cunningham, Enterprise Architect, Capital One
Slide 2
Introduction to Speech Technology
Rob Kassel, Senior Product Manager, ScanSoft
Slide 3
The Need for Speech Recognition
- Automation is less costly than live agents
  - Increases call-handling capacity and reduces hold times
- DTMF is often pressed into service
  - Numeric entry is easy unless you are reading
  - Spelling entry is more difficult
  - Menus must be enumerated and can't be too long
  - Deep menu structures become tiresome
  - Key assignments are inconsistent between vendors (e.g., voicemail)
  - How do you enter "5 ½%" or "Albuquerque"?
- With speech, questions are answered naturally
  - Caller satisfaction is higher
  - Fewer zero-outs (callers pressing 0 to reach an agent) lead to additional cost savings
Slide 4
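The following minimal C++ sketch makes the spelling-entry problem concrete. It maps letters to digits on the standard telephone keypad (2 = ABC, 3 = DEF, ..., 9 = WXYZ). The function names are hypothetical and are not part of any NMS or ScanSoft API. Spelling "Albuquerque" takes eleven key presses, and different words collapse onto the same digit string, so the IVR still has to work out which word the caller meant.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Return the key ('2'..'9') that carries letter `c` on a standard
// telephone keypad, or '\0' if `c` is not a letter.
char keypad_digit(char c) {
    static const char* const kKeys[] = {"ABC", "DEF", "GHI", "JKL",
                                        "MNO", "PQRS", "TUV", "WXYZ"};
    const char upper = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    for (int key = 0; key < 8; ++key) {
        for (const char* p = kKeys[key]; *p != '\0'; ++p) {
            if (*p == upper) return static_cast<char>('2' + key);
        }
    }
    return '\0';
}

// Hypothetical helper: the DTMF digits a caller presses to spell `word`.
// Precondition: `word` contains letters only.
std::string spell_as_dtmf(const std::string& word) {
    std::string digits;
    for (char c : word) {
        const char d = keypad_digit(c);
        assert(d != '\0' && "spell_as_dtmf expects letters only");
        digits += d;
    }
    return digits;
}
```

For example, spell_as_dtmf("Albuquerque") yields "25287837783". The words "cat", "bat" and "act" all yield "228".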
Speech Recognition Process
Figure: Speech -> Speech Detector -> Feature Extraction -> Phoneme Classifier -> Search -> Confidence Scoring -> Results. Supporting resources: Grammar (compiled by the Grammar Compiler), Acoustic Models, System Dictionary, Pronunciation Rules.
Slide 5
Speech Recognition Challenges
- Speech can be difficult to decode, even for humans
- Fixed, confusable vocabularies: B-C-D-E-G-P-T-V-Z
- Ambiguous word boundaries: "It's hard to wreck a nice beach!" (vs. "It's hard to recognize speech")
- Speaker variability: dialect, volume, gender, etc.
- Noise rejection: hands-free, mobile, telematics
- Out-of-vocabulary rejection and confidence measures
- Processor and memory demands
Slide 6
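Applications typically use confidence measures to choose among three actions: accept a result, confirm it with the caller, or reject it and reprompt. The C++ sketch below assumes a confidence score normalized to [0.0, 1.0] and uses two hypothetical thresholds. Real engines report scores on their own scales, and thresholds are tuned per application.

```cpp
#include <cassert>
#include <string>

// What the dialog does next with a recognition result.
enum class Action { Accept, Confirm, Reject };

// Assumed result shape: recognized text (empty for no match) and a
// confidence score normalized to [0.0, 1.0].
struct RecognitionResult {
    std::string text;
    double confidence;
};

// Accept high-confidence results, confirm middling ones, reject the rest.
// The default thresholds are placeholders, not recommended values.
Action decide(const RecognitionResult& r,
              double accept_threshold = 0.80,
              double reject_threshold = 0.40) {
    assert(reject_threshold <= accept_threshold);
    if (r.text.empty() || r.confidence < reject_threshold) {
        return Action::Reject;   // no match or too uncertain: reprompt
    }
    if (r.confidence >= accept_threshold) {
        return Action::Accept;   // proceed without asking
    }
    return Action::Confirm;      // e.g. "Did you say Albuquerque?"
}
```

An empty result stands in for an out-of-vocabulary (no-match) outcome. Raising the accept threshold means fewer misrecognitions get through, at the cost of more confirmation turns.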
Speech Recognition: State of the Art
- Callers speak naturally in directed dialogs
- Million-word vocabularies: stocks, names, addresses
- Open-ended responses, coupled with language understanding: "How may I help you?"
- High accuracy, infrequent confirmation
  - Transaction completion rates over 90% are typical
- Automatic adaptation to caller population and channel characteristics
Slide 7
The Need for Text-to-Speech
- Professional recordings are costly and time-consuming
  - Large output vocabularies are common (e.g., city names)
- Word concatenation is difficult to do well
  - Often used for numeric output
  - Can sound mechanical; irritating when frequent
- Some applications defy recordings (e.g., messaging)
Slide 8
Text-to-Speech Process
Figure: Text -> Text Normalization -> Pronunciation Generation -> Prosody Generation -> Unit Selection -> Concatenate and Smooth -> Speech. Supporting resources: System Dictionary, Pronunciation Rules, Voice Database.
Slide 9
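The final "Concatenate and Smooth" stage joins the selected units. One simple way to soften the joins is a linear crossfade over a few overlapping samples. The sketch below illustrates that technique under the assumption of 16-bit linear PCM audio. It is not a description of how RealSpeak smooths its output.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Join two units of 16-bit PCM audio. The last `overlap` samples of `a` are
// blended with the first `overlap` samples of `b` using a linear crossfade,
// so the output has a.size() + b.size() - overlap samples.
std::vector<std::int16_t> concat_crossfade(const std::vector<std::int16_t>& a,
                                           const std::vector<std::int16_t>& b,
                                           std::size_t overlap) {
    assert(overlap <= a.size() && overlap <= b.size());
    const std::size_t a_keep = a.size() - overlap;
    std::vector<std::int16_t> out(a.begin(), a.begin() + static_cast<std::ptrdiff_t>(a_keep));
    out.reserve(a.size() + b.size() - overlap);
    for (std::size_t i = 0; i < overlap; ++i) {
        // Weight of `b` rises from 1/(overlap+1) to overlap/(overlap+1).
        const double w = static_cast<double>(i + 1) / static_cast<double>(overlap + 1);
        const double mixed = (1.0 - w) * a[a_keep + i] + w * b[i];
        out.push_back(static_cast<std::int16_t>(std::lround(mixed)));
    }
    out.insert(out.end(), b.begin() + static_cast<std::ptrdiff_t>(overlap), b.end());
    return out;
}
```

With a = {1000, 1000, 1000, 1000}, b = {0, 0, 0, 0} and an overlap of 2, the result is {1000, 1000, 667, 333, 0, 0}. Because each mixed sample is a weighted average of two in-range samples, it cannot overflow 16 bits.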
Text-to-Speech Challenges
- Text normalization
  - Numerics: 12535 (number or zip code?), 2x4
  - Abbreviations: OR ("or" or Oregon?), "Dr. Jones on Elm Dr." (Doctor vs. Drive)
  - Acronyms: "IBM is listed on NASDAQ" (spelled out vs. read as a word)
  - Evolving usage: CUL8R ("see you later")
- Pronunciation generation
  - Homographs: minute (60 seconds / tiny)
  - Vowel reduction: "to" in "he came to town" vs. "he came to" (regained consciousness)
- Prosody generation
  - Phrasing: "he is physically and mentally exhausted"
  - Emphasis: "Are you flying tomorrow?"
  - Emotion: upbeat vs. serious, calming vs. urgent
Slide 10
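The 12535 example shows why normalization needs context. The same token is read as a cardinal number in one sentence and digit by digit as a zip code in another. The C++ sketch below expands a digit string either way. Detecting the context (for example, from a preceding state name) is outside its scope, so the caller supplies it through a hypothetical NumberContext flag. Only 0 through 999,999 are handled.

```cpp
#include <cassert>
#include <cctype>
#include <string>

static const char* const kOnes[] = {
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen"};
static const char* const kTens[] = {"", "", "twenty", "thirty", "forty",
                                    "fifty", "sixty", "seventy", "eighty", "ninety"};

// Spell out 0..999 as English words, e.g. 535 -> "five hundred thirty-five".
std::string below_thousand(int n) {
    assert(n >= 0 && n < 1000);
    std::string s;
    if (n >= 100) {
        s = std::string(kOnes[n / 100]) + " hundred";
        n %= 100;
        if (n == 0) return s;
        s += ' ';
    }
    if (n < 20) return s + kOnes[n];
    s += kTens[n / 10];
    if (n % 10 != 0) s += std::string("-") + kOnes[n % 10];
    return s;
}

// Read a value as a cardinal number (0..999,999 only in this sketch).
std::string as_cardinal(int n) {
    assert(n >= 0 && n < 1000000);
    if (n < 1000) return below_thousand(n);
    std::string s = below_thousand(n / 1000) + " thousand";
    if (n % 1000 != 0) s += " " + below_thousand(n % 1000);
    return s;
}

// Read a digit string one digit at a time, as for a zip code.
std::string as_digits(const std::string& token) {
    std::string s;
    for (char c : token) {
        assert(std::isdigit(static_cast<unsigned char>(c)));
        if (!s.empty()) s += ' ';
        s += kOnes[c - '0'];
    }
    return s;
}

// Hypothetical context flag, set by whatever analysed the surrounding text.
enum class NumberContext { Quantity, ZipCode };

std::string normalize_number(const std::string& token, NumberContext ctx) {
    return ctx == NumberContext::ZipCode ? as_digits(token)
                                         : as_cardinal(std::stoi(token));
}
```

normalize_number("12535", NumberContext::Quantity) gives "twelve thousand five hundred thirty-five". With NumberContext::ZipCode, the same token gives "one two five three five".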
Text-to-Speech: State of the Art
- Natural-sounding output: no more "drunken Swede"
- Seamlessly mix dynamic data with recorded prompts
- Accurate pronunciation, including proper names
- A variety of voices to choose from
- Custom voices to maintain brand identity
- Listen here: http://www.scansoft.com/speechworks/realspeak/teleco/
Slide 11
MRCP and Natural Access
Jack Chase, Director, Product Marketing, NMS
Slide 12
What Is MRCP v1?
Figure: PSTN -> IVR servers -> IP network -> speech servers (including the MRCP server). Control path: MRCP/RTSP/TCP/IP. Speech path: G.711/RTP/UDP/IP.
- Speech servers are connected to the IVR servers by VoIP
- Standard interface for ASR and TTS
- Easy to reconfigure the system as needs change
- Easy to implement redundancy
Slide 13
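On the speech path, each G.711 audio packet travels in RTP over UDP. The C++ sketch below packs the 12-byte fixed RTP header defined in RFC 3550 for G.711 μ-law (PCMU), whose static payload type is 0. The rtpmap line in the slide 25 trace advertises this payload type. The 20 ms packet size is an assumption. At the 8000 Hz clock it gives 160 samples (160 bytes of PCMU) per packet. In TelBert this work is done on the CG 6000 board, so treat the sketch only as an illustration of what travels on the wire.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// RTP/AVP static payload type for G.711 mu-law ("PCMU/8000" in SDP).
constexpr std::uint8_t kPayloadTypePcmu = 0;
// Assumed packetization: 20 ms of audio at the 8000 Hz RTP clock.
constexpr std::uint32_t kSamplesPerPacket = 8000 * 20 / 1000;  // 160

// Pack the 12-byte fixed RTP header (RFC 3550): version 2, no padding,
// no header extension, no CSRCs. Multi-byte fields are big-endian.
std::array<std::uint8_t, 12> make_rtp_header(std::uint16_t seq,
                                             std::uint32_t timestamp,
                                             std::uint32_t ssrc,
                                             bool marker,
                                             std::uint8_t payload_type) {
    assert(payload_type < 128);  // payload type is a 7-bit field
    std::array<std::uint8_t, 12> h{};
    h[0] = 0x80;  // V=2 in the top two bits; P, X and CC are zero
    h[1] = static_cast<std::uint8_t>((marker ? 0x80 : 0x00) | payload_type);
    h[2] = static_cast<std::uint8_t>(seq >> 8);
    h[3] = static_cast<std::uint8_t>(seq & 0xFF);
    for (int i = 0; i < 4; ++i) {
        const int shift = 24 - 8 * i;
        h[4 + i] = static_cast<std::uint8_t>((timestamp >> shift) & 0xFF);
        h[8 + i] = static_cast<std::uint8_t>((ssrc >> shift) & 0xFF);
    }
    return h;
}
```

Each subsequent packet increments seq by 1 and the timestamp by kSamplesPerPacket. Payload type 96, used for telephone-event in the trace, is dynamic. Its meaning comes from the SDP rtpmap line rather than from the number itself.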
Natural Access and MRCP
Figure: Natural Access architecture. Services: Call Control, PSTN Trunking, Conferencing, Universal Speech Access (MRCP), IVR Services, Fusion (VoIP), Fax Services, Video Access, and OAM (with SNMP). Below the service managers and libraries, drivers (over PCI) connect to CX, AG, and CG boards, and IPC (over IP) connects to PacketMedia HMP.
Slide 14
Universal Speech Access Makes Speech Integration Easy
Slide 15
Current Support for Universal Speech Access
Vendor | Type | Universal Speech Access 1.0 | Universal Speech Access 1.1
ScanSoft | ASR | OMS 2.0.1, OSR 2.0 | SWMS 3.1, OSR 3.0
ScanSoft | TTS | OMS 2.0.1, Speechify 2.0 | SWMS 3.1, RealSpeak 4.0
Loquendo | ASR | N/A | Loquendo ASR, LSS 6.0
Nuance | ASR | MRCP Server SP5, Nuance 8.5 | MRCP Server SP7, Nuance 8.5
Nuance | TTS | Vocalizer 3.0 | Vocalizer 3.0.8
Telisma | ASR | Philsoft 3.2 | telispeech 1.0 SP4
Slide 16
What's Next for MRCP?
- MRCP v2: draft-ietf-speechsc-mrcpv2-06, Feb 20, 2005
  - Uses SIP/SDP for session setup, replacing RTSP
  - Adds support for speaker verification
- Little deployment yet
  - NMS will update Universal Speech Access when deployments occur
Slide 17
MRCP Integration on the TelBert IVR Platform using NMS and ScanSoft
Eric Cunningham, Enterprise Architect, Capital One
Slide 18
Agenda
- Why use MRCP
- Main business drivers for voice enablement
- Overview of architecture
- Lessons learned
Slide 19
Why Use MRCP
- Capital One has built its own IVR system (TelBert)
  - Internally built and maintained
  - Linux-based C/C++ system
  - 5000+ ports in production
  - Handles nearly 100% of all inbound credit card calls
- The business wants speech-enabled applications
- Leading speech vendors are embracing MRCP for integration
  - Centralizes automatic speech recognition (ASR) and text-to-speech (TTS) resources in the network
  - Standards-based protocol, allowing multi-vendor interoperability
(continued)
Slide 20
Why Use MRCP (cont'd)
Benefits to Capital One:
- MRCP allows integration with leading vendors and avoids vendor lock-in
- NMS APIs simplify learning the MRCP and RTP protocols and the integration itself, which accelerated the adoption of MRCP into TelBert
- Migration from AG 4000 to CG 6000 is a clean evolution
  - CG 6000 provides on-board Ethernet and T1 terminations, eliminating host-based processing of RTP data
  - Current AG 4000 code is compatible with CG 6000, allowing a quick upgrade of the existing platform
Slide 21
Overview of TelBert Architecture
Figure: TelBert architecture, with these callouts:
- Where applications run: the applications control which grammars are used, how results are processed, and how the user is prompted
- Where the NMS libraries are integrated: a single state-machine model handles 184 ISDN callers, voice processing commands, and the new ASR/TTS commands via Universal Speech Access
- ScanSoft's MRCP server (SWMS) is co-located on the same machine as the OSR and RealSpeak servers. Note: this means that load balancing and failover are done by TelBert, not by the MRCP server
- A private network (100 Mb/s switch) isolates the RTP traffic
Slide 22
Main Business Drivers for Voice Enablement
- Improve the customer experience
  - Provide both touch-tone and speech-enabled handling
  - Let callers switch between modes
- Provide additional automated customer servicing
  - Automate time-consuming call center activities
  - Free call center representatives for more complex tasks
- Basically, all of the standard reasons a business wants to start using voice recognition technologies
Slide 23
Lessons Learned
- NMS Universal Speech Access and Fusion APIs hide the complexity of the RTSP, MRCP, and RTP protocols
- You still need to read the specifications to troubleshoot problems
- You need to understand the specifications in order to talk to the vendors you are integrating with (ScanSoft)
(continued)
Slide 24
Lessons Learned (cont'd)
Example: NMS code

    /* Create a TTS (synthesizer) session through Universal Speech Access */
    if ((nrtn = saicreatesynthesizer(m_cta_context_handle, m_strtpendpointtts,
                                     m_ob_locate.get_server(), TELBERT_CONTEXT_TTS,
                                     &m_stttshd)) != SUCCESS) {
        /* handle the error; nrtn holds the failure code */
    }

RTSP/MRCP sniffer trace (what the MRCP server sees)

Request:

    SETUP rtsp://newbox36/synthesizer/ RTSP/1.0
    CSeq: 7
    Transport: RTP/AVP;unicast;destination=10.87.204.8;client_port=3000-3001
    Content-Type: application/sdp
    Content-Length: 167

    v=0
    o=139112752 0 127.0.0.1
    s=nms speech
    c=IN IP4 0.0.0.0
    t=0 0
    m=audio 3000 RTP/AVP 0 96
    a=rtpmap:0 PCMU/8000
    a=rtpmap:96 telephone-event/8000

Response:

    RTSP/1.0 200 OK
    CSeq: 7
    Session: RQKCRCSPWX0000000368fgJiuWPnxz
    Transport: RTP/AVP;unicast;client_port=3000-3001
    Content-Length: 215
    Content-Type: application/sdp

    v=0
    o=- RQKCRCSPWX0000000368fgJiuWPnxz RQKCRCSPWX0000000368fgJiuWPnxz IN IP4 10.87.204.36
    s=speechworks OpenSpeech Media Server version 2.0
    c=IN IP4 0.0.0.0
    t=0 0
    m=audio 3000 RTP/AVP 0
    a=rtpmap:0 PCMU/8000

Slide 25
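When reading traces like the one above or troubleshooting an integration, it helps to pull two key fields out of the RTSP response. The first is the Session identifier, which later requests in the session must carry. The second is the RTP/RTCP port pair from the Transport header. The C++ sketch below extracts both from a response held in a string. The helper names are hypothetical, not NMS APIs. The sketch also ignores header parameters such as ";timeout=" that a Session header may carry.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <utility>

// Case-insensitive comparison; RTSP header names, like HTTP's, ignore case.
bool iequals(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i]))) return false;
    }
    return true;
}

// Value of the first header called `name`, or std::nullopt. Parsing stops at
// the blank line that separates the headers from the SDP body.
std::optional<std::string> rtsp_header(const std::string& message, const std::string& name) {
    assert(!name.empty());
    std::istringstream in(message);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // strip CR of CRLF
        if (line.empty()) break;                                    // end of headers
        const std::size_t colon = line.find(':');
        if (colon == std::string::npos || !iequals(line.substr(0, colon), name)) continue;
        const std::size_t start = line.find_first_not_of(" \t", colon + 1);
        return start == std::string::npos ? std::string() : line.substr(start);
    }
    return std::nullopt;
}

// Parse "client_port=RTP-RTCP" out of a Transport header value.
std::optional<std::pair<int, int>> client_ports(const std::string& transport) {
    const std::string key = "client_port=";
    const std::size_t pos = transport.find(key);
    if (pos == std::string::npos) return std::nullopt;
    std::istringstream in(transport.substr(pos + key.size()));
    int rtp = 0, rtcp = 0;
    char dash = '\0';
    if (!(in >> rtp >> dash >> rtcp) || dash != '-') return std::nullopt;
    return std::make_pair(rtp, rtcp);
}
```

Given the 200 OK response from the trace, rtsp_header(response, "Session") returns "RQKCRCSPWX0000000368fgJiuWPnxz", and client_ports applied to the Transport value returns the pair (3000, 3001).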
Lessons Learned (cont'd)
Load balancing
- The MRCP specification allows the MRCP server to coordinate where the RTP connection with the ASR/TTS server is set up, which makes load balancing possible
- ScanSoft's MRCP server does not currently provide load balancing, but their engineers are looking at providing it
- Until then, your IVR must implement its own load-balancing and failover logic for the ASR/TTS server farm
(continued)
Slide 26
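As a starting point for that IVR-side logic, the C++ sketch below does round-robin selection across a server farm and skips servers marked unhealthy. The SpeechServer and ServerPool types are hypothetical. This webinar does not show how TelBert's own locator (m_ob_locate in the code above) chooses a server. A production version would also need active health checks, retry limits and thread safety.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// One ASR or TTS server in the farm, as the IVR sees it.
struct SpeechServer {
    std::string host;
    bool healthy;  // cleared when a session setup fails, set again on recovery
};

// Round-robin selection that skips servers currently marked unhealthy.
class ServerPool {
public:
    explicit ServerPool(std::vector<SpeechServer> servers) : servers_(std::move(servers)) {
        assert(!servers_.empty());
    }

    // Next healthy server, or std::nullopt if the whole farm is down.
    std::optional<std::string> next() {
        for (std::size_t tried = 0; tried < servers_.size(); ++tried) {
            const SpeechServer& s = servers_[cursor_];
            cursor_ = (cursor_ + 1) % servers_.size();
            if (s.healthy) return s.host;
        }
        return std::nullopt;
    }

    // Record the outcome of a setup attempt or health check.
    void mark(const std::string& host, bool healthy) {
        for (SpeechServer& s : servers_) {
            if (s.host == host) s.healthy = healthy;
        }
    }

private:
    std::vector<SpeechServer> servers_;
    std::size_t cursor_ = 0;
};
```

Failover then amounts to marking a server unhealthy when its SETUP fails and asking the pool for the next one. Round-robin is only one choice. A least-sessions policy would balance better when call lengths vary widely.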
Lessons Learned (cont'd)
Lots of specifications must be learned, and not just by the integration team:
- Media Resource Control Protocol (MRCP)
  Location: ftp://ftp.rfc-editor.org/in-notes/internet-drafts/draft-shanmugham-mrcp-05.txt
  Who needs to understand or be aware of it: Integration Team, Application Interface Team
- Real Time Streaming Protocol (RTSP)
  Location: ftp://ftp.rfc-editor.org/in-notes/rfc2326.txt
  Who needs to understand or be aware of it: Integration Team
- Real-time Transport Protocol (RTP)
  Location: ftp://ftp.rfc-editor.org/in-notes/std/std64.txt
  Who needs to understand or be aware of it: Integration Team
- Speech Recognition Grammar Specification
  Location: http://www.w3.org/TR/2004/REC-speech-grammar-20040316/
  Who needs to understand or be aware of it: Application Interface Team, Application Developers
- Natural Language Semantics Markup Language for the Speech Interface Framework (nl-spec)
  Location: http://www.w3.org/TR/nl-spec/
  Who needs to understand or be aware of it: Application Interface Team, Application Developers
Slide 27
Thank You!
- Note: the PDF will be posted today
- The recorded version will be posted in a few days
Slide 28
Q & A Session
Please use the text messaging feature to send your questions.
Slide 29
For More Information
Contacts:
- Dr. Rob Kassel, Senior Product Manager, ScanSoft: +1 617 428 4444; rob.kassel@scansoft.com
- Jack Chase, Director, Product Marketing, NMS: +1 508 271 1109; jack_chase@nmss.com
- Eric Cunningham, Enterprise Architect, Capital One: +1 804 855 3597; eric.cunningham@capitalone.com
Upcoming events:
- VON Europe, May 23-26, Stockholm, Sweden, Booth #1040
Upcoming webinars:
- June: Ready for Mainstream: AdvancedTCA Solutions Become Reality
- July: Transforming Speech Applications with NMS' New VoiceXML Server
Slide 30