IVR Primer Introduction




Speech-enabled applications are quickly becoming very popular. Why? Because using voice to navigate is more natural for users than punching telephone keypads. Speech as a navigation tool also allows callers to drill down swiftly to the information they need. Users are more satisfied!

Just as a well-constructed sentence needs nouns, verbs, and flowery adjectives, a well-constructed IVR system needs a voice board featuring echo cancellation, optimized buffer sizes, pre-speech buffers, voice activity detection, and barge-in.

The two sections in this primer, Speech Basics and Putting Speech Together, are intended to help you better understand the importance of voice boards when constructing IVR systems. Let's get started with Speech Basics.

Block One: Speech Basics

What is an IVR?

In its simplest terms, an IVR system connects users to applications through their telephone. Users can navigate using dual-tone multi-frequency (DTMF) touch-tone input or speech.

A good example of a speech-enabled IVR system is one that offers weather updates. After you dial the number, you are greeted by a voice prompting you for the city. If you say "San Francisco," the voice prompt then tells you the temperature in San Francisco. Quick and easy.

With speech, users no longer need to struggle with the myriad choices currently offered by touch-tone (DTMF) menus. Speech-activated responses cut down on the number of options. Speech recognition applications are expected to revitalize the market as enterprises replace or enhance their existing touch-tone IVR systems. Speech-based IVR systems lower expenses and make a dramatic impact on the bottom line, as only the more complex calls require agent intervention.

What are IVRs used for?

Access to weather updates was one example mentioned earlier. More examples are:

- Automated telephone ordering services, support desks, and order tracking;
- Access to traffic conditions, driving directions, e-mail, calendars, personal assistants, and banking information;
- Voice dialing and voice mail messages; and
- Text-to-speech (TTS), automatic speech recognition (ASR), and IVR services.

According to telecommunications industry analysts, the demand for IVRs is very promising. Both Global Information and Radicati Group project the IVR market to reach US$12B by 2005, and The Kelsey Group projects US$16B by 2006.

What components do developers need to build an IVR system?

IVRs consist of several components: voice boards to connect to the telecom network, speech technology for automatic speech recognition and text-to-speech conversion, application software, and the servers and network infrastructure to run the system.

PIKA Technologies specializes in providing voice boards and associated software, so our primer focuses on the features you need from a vendor such as PIKA Technologies when you're putting speech together.

Why are voice boards so important in the design of the IVR system?

It is the responsibility of the voice board to get clean audio buffers to the speech recognition engine as quickly as possible. The speech recognizer deciphers the content of the buffers and returns the results to the application. PIKA Technologies' hardware (voice boards) and software (API and drivers) media processing building blocks are designed to provide full-duplex streaming audio to any third-party speech recognition engine.

In any well-designed IVR, the following features are critical for the overall performance and efficiency of the system:

- Echo cancellation
- Optimized buffer sizes
- Pre-speech buffering
- Voice activity detection (VAD)
- Barge-in

We'll now delve deeper into each of these features in the section Putting Speech Together.

Block Two: Putting Speech Together

Echo Cancellation

What is echo?

Echo occurs when a portion of the transmitted signal is reflected back through the circuit to its point of origin.

Why is echo bad?

Echo decreases speech recognition accuracy, because the incoming speech is corrupted with the echo of the prompt being played. Once the echo is removed, the speech recognizer has a cleaner representation of the actual word or phrase spoken.

How does echo cancellation work?

Echo cancellation is a digital signal processing (DSP) application. Echo cancellers operate on two signals. The first is the source signal, which is corrupted by echo. The second is a reference signal, part of which is reflected as the echo. The reference signal is used to predict the echoed portions within the source signal. The predicted echo is then subtracted from the corrupted source signal, resulting in a much cleaner transmission. This technology is extremely important for cut-through and barge-in functionality, both of which allow users to speak over system voice prompts.

We've got echo cancellation!

PIKA Technologies' echo canceller is designed in conformance with ITU-T Recommendation G.168, a guideline for implementing line echo canceller systems. Implementation requires the use of a reference port that can be selected through PIKASetup in the PIKA API.

The echo canceller API has a pause and resume feature, which increases ASR barge-in performance. The echo canceller resumes from where it left off, using the learned adaptive filter coefficients, which means it requires less adaptation time and therefore produces cleaner audio.

The API is very flexible: it allows the developer to set the tail length (up to 128 ms), double-talk threshold, speech threshold, and suppression threshold of the echo canceller.
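
The predict-and-subtract loop described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a normalized least-mean-squares (NLMS) adaptive filter, which is a common textbook technique. The primer does not say which algorithm the PIKA echo canceller uses, and the function and parameter names below are hypothetical, not part of the PIKA API. Double-talk handling is left out for brevity.

```python
import random


def nlms_echo_cancel(reference, source, taps=16, mu=0.5, eps=1e-6):
    """Subtract an adaptively predicted echo of `reference` from `source`.

    reference: samples being played out (the prompt).
    source:    samples received on the line (speech plus echo of the prompt).
    Returns the echo-reduced samples.
    All names here are illustrative assumptions, not PIKA API names.
    """
    weights = [0.0] * taps   # adaptive filter coefficients (learned echo path)
    history = [0.0] * taps   # most recent reference samples, newest first
    cleaned = []
    for ref, src in zip(reference, source):
        history = [ref] + history[:-1]
        predicted_echo = sum(w * x for w, x in zip(weights, history))
        error = src - predicted_echo              # signal left after subtraction
        energy = sum(x * x for x in history) + eps
        step = mu * error / energy                # normalized update size
        weights = [w + step * x for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def power(samples):
    """Mean squared amplitude of a list of samples."""
    return sum(s * s for s in samples) / len(samples)


# Toy signal: a noise-like "prompt" and a line that carries only its echo,
# delayed by 3 samples and attenuated, with no caller speech.
random.seed(0)
prompt = [random.uniform(-1.0, 1.0) for _ in range(4000)]
echo = [0.0, 0.0, 0.0] + [0.6 * p for p in prompt[:-3]]
residual = nlms_echo_cancel(prompt, echo)

# Compare power over the second half, after the filter has adapted.
echo_power = power(echo[2000:])
residual_power = power(residual[2000:])
print(f"echo power {echo_power:.4f}, residual power {residual_power:.2e}")
```

In this toy run the residual power after adaptation is a small fraction of the echo power. The `taps` value plays the role of the tail length: assuming 8 kHz telephony audio, a 128 ms tail would correspond to 1024 taps. Keeping `weights` between calls is the idea behind the pause and resume behavior mentioned above, since a filter that has already adapted does not need to relearn the echo path.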

Although a suppression threshold is supported, it is generally not considered necessary, since most speech recognition engines can handle consistent background noise. In some cases it may actually decrease speech recognition accuracy, because the beginning or end of a soft-spoken word can fall below the set threshold.

Buffer Size

When an application starts to use audio functions, one of the first things to do is set the audio format and, based on that format, query the driver for the optimal buffer size. For optimal performance, the application should send audio data in chunks that are multiples of this buffer size.

Depending on the application, buffer size can help or hinder system performance:

- Small buffers allow for a quicker transfer of data to the speech recognition engine, but can also overwork the host central processing unit (CPU).
- Large buffers are easier on the host CPU, but increase latency and reduce system performance.

We've got optimized buffer size!

The flexible PIKA API allows you to set the buffer size so that you can optimize your system for your unique needs.
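
To make the trade-off concrete, the sketch below works out how much audio one buffer holds and how often the host must service a buffer, then splits audio into whole buffers. It assumes 8 kHz, one-byte-per-sample telephony audio (for example G.711); the buffer sizes and function names are illustrative assumptions, not values or calls from the PIKA API.

```python
def buffer_duration_ms(buffer_bytes, sample_rate_hz=8000, bytes_per_sample=1):
    """Amount of audio held in one buffer, in milliseconds (added latency)."""
    return 1000.0 * buffer_bytes / (sample_rate_hz * bytes_per_sample)


def buffers_per_second(buffer_bytes, sample_rate_hz=8000, bytes_per_sample=1):
    """Number of buffers the host must handle per second, per channel."""
    return sample_rate_hz * bytes_per_sample / buffer_bytes


def split_for_driver(audio, optimal_buffer_bytes):
    """Split audio into full buffers plus a leftover shorter than one buffer.

    The leftover would be held back until more audio arrives, so every chunk
    handed to the driver is a whole multiple of the optimal buffer size.
    """
    usable = len(audio) - len(audio) % optimal_buffer_bytes
    chunks = [audio[i:i + optimal_buffer_bytes]
              for i in range(0, usable, optimal_buffer_bytes)]
    return chunks, audio[usable:]


# Small buffers: low latency, many host events. Large buffers: the reverse.
for size in (160, 512, 2048):
    print(size, "bytes:",
          buffer_duration_ms(size), "ms per buffer,",
          buffers_per_second(size), "buffers per second")

chunks, leftover = split_for_driver(bytes(1000), optimal_buffer_bytes=160)
```

Under these assumptions a 160-byte buffer adds 20 ms of delay but has to be serviced 50 times per second on every channel, while a 2048-byte buffer adds 256 ms and is serviced about four times per second.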

Voice Activity Detection (VAD)

The ability to detect voice at the DSP level, and then stream the audio buffers to the host-based speech recognition engine, greatly improves system performance because it reduces the load on the host CPU.

The voice activity detector (VAD) is located in the DSP. VAD is used to control (start and stop) a recording at the DSP level in order to filter out silence, so that only relevant audible activity is provided to the application. When energy is detected on the port, the DSP notifies the application and immediately starts to stream the audio buffers to the application for processing. This reduces the number of unnecessary audio buffers sent to the application. A pre-speech buffer in the DSP contains the first part of the utterance, which eliminates the need to keep a circular buffer at the application level.

We've got voice activity detection!

Our VAD is based on energy detection and provides time stamping of the state changes between the presence of voice activity (start events) and the removal of voice activity (stop events). The PIKA VAD API allows you to configure the:

- Maximum threshold (level to start recording);
- Minimum threshold (level to stop recording);
- Activation debounce time;
- Deactivation debounce time; and
- Pre-speech buffer size.

Figure 3: Voice activity detection
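
The threshold and debounce settings listed above can be illustrated with a small energy-based detector. This is a sketch under stated assumptions: it runs in Python on the host rather than on a DSP, the frame length and threshold values are made up for the example, and the names are hypothetical rather than taken from the PIKA VAD API.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def detect_voice(frames, frame_ms=10, start_threshold=0.02, stop_threshold=0.005,
                 activation_debounce_ms=20, deactivation_debounce_ms=50):
    """Return time-stamped ("start" | "stop", time_ms) events for the frames.

    A start event fires once energy has stayed at or above start_threshold for
    the activation debounce time; a stop event fires once energy has stayed at
    or below stop_threshold for the deactivation debounce time.
    """
    events = []
    active = False
    run_ms = 0  # how long the condition for a state change has held
    for index, frame in enumerate(frames):
        energy = frame_energy(frame)
        if not active:
            run_ms = run_ms + frame_ms if energy >= start_threshold else 0
            if run_ms >= activation_debounce_ms:
                active, run_ms = True, 0
                events.append(("start", (index + 1) * frame_ms))
        else:
            run_ms = run_ms + frame_ms if energy <= stop_threshold else 0
            if run_ms >= deactivation_debounce_ms:
                active, run_ms = False, 0
                events.append(("stop", (index + 1) * frame_ms))
    return events


silence = [0.0] * 80   # one 10 ms frame at an assumed 8 kHz sampling rate
speech = [0.5] * 80

events = detect_voice([silence] * 5 + [speech] * 10 + [silence] * 10)
click_events = detect_voice([silence] * 3 + [speech] + [silence] * 3)
print(events)
print(click_events)
```

Using two thresholds keeps the detector from chattering when the level hovers near a single cut-off, and the debounce times keep a short click from being reported as speech, as the second call shows.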

Pre-speech Buffering

The pre-speech buffer saves time by removing the need to keep track of the audio buffers to find out when speech started.

We've got pre-speech buffering!

A pre-speech buffer in the DSP contains the first part of the utterance, which eliminates the need to keep a circular buffer at the application level. As seen in the VAD diagram (Figure 3), speech detection is triggered only when the energy level reaches a specific point, the high threshold. The energy at the very beginning of the word, which is still speech, is below that point and is not detected. To ensure that this data is not lost, the DSP continuously records the most recent 250 ms of audio into the pre-speech buffer. When the VAD is triggered, the pre-speech buffer contains the beginning portion of the utterance. This buffer is sent to the application first, and then the audio is streamed to the application until the VAD detects the end of speech.

Barge-in

The barge-in feature allows users to interrupt voice prompts.

We've got barge-in!

The combination of echo cancellation and VAD on the DSP allows for barge-in (cut-through). This means that the user can start talking while the prompt is playing. The echo canceller removes the prompt from the recording port, giving a cleaner audio signal for recording. When the VAD detects speech, the prompt is stopped and the audio buffers are streamed to the application for processing.

The pre-speech buffer saves time by removing the need to keep track of the audio buffers to find out when speech started. The configurable audio buffer size allows for optimization of the system. All of the events and processes handled on the DSP translate to less host CPU time, which means a faster IVR system.
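
The way the pre-speech buffer and barge-in fit together can be sketched as a short simulation. This is an illustration under assumptions, not PIKA code: the frames are taken to be already echo-cancelled, the detector is a single energy threshold with no debounce, end-of-speech handling is omitted, and all names are hypothetical. The 250 ms pre-speech length comes from the text above.

```python
from collections import deque


def run_barge_in(frames, frame_ms=10, pre_speech_ms=250, start_threshold=0.02):
    """Simulate one prompt-and-listen turn over echo-cancelled input frames.

    Returns (time_ms at which the prompt was stopped, frames delivered to the
    recognizer). The prompt keeps playing until speech energy is detected.
    """
    # Ring buffer that always holds the most recent pre_speech_ms of audio.
    pre_speech = deque(maxlen=pre_speech_ms // frame_ms)
    delivered = []
    prompt_stopped_at = None
    for index, frame in enumerate(frames):
        energy = sum(s * s for s in frame) / len(frame)
        if prompt_stopped_at is None:
            pre_speech.append(frame)
            if energy >= start_threshold:
                prompt_stopped_at = index * frame_ms  # barge-in: stop the prompt
                delivered.extend(pre_speech)          # pre-speech buffer goes first
        else:
            delivered.append(frame)                   # then stream live audio
    return prompt_stopped_at, delivered


silence = [0.0] * 80    # 10 ms frames at an assumed 8 kHz sampling rate
onset = [0.05] * 80     # quiet start of a word, below the start threshold
speech = [0.5] * 80

frames = [silence] * 40 + [onset] * 3 + [speech] * 10
stopped_at, delivered = run_barge_in(frames)
print(stopped_at, len(delivered))
```

The quiet onset frames arrive before the threshold is crossed, yet they still reach the recognizer because they are sitting in the ring buffer when detection fires. Without the buffer, the first 30 ms of the word in this example would be lost.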

Plug-in to the converged network. Expertly.

PIKA Technologies' reliable media processing building blocks deliver the features required to design a low-latency, highly efficient IVR system. The voice boards feature echo cancellation, optimized buffer sizes, pre-speech buffers, voice activity detection, and barge-in. PIKA Technologies delivers the voice boards and features you need to stream clean audio to any speech recognition engine on the market. In addition, when you choose PIKA Technologies, you are in the hands of technical experts.

The PIKA Plus Advantage

PIKA Technologies delivers the products and services you need to design voice and fax applications: high-density digital, analog, and IP boards; a market-leading granular API that works across our entire series of voice boards; powerful and feature-rich DSPs; competitive pricing; and a comprehensive support package that includes direct pre-sales support, free development support, and free ongoing maintenance support by our in-house technical support team. Get a cost-effective total solution from a single source: the PIKA Plus advantage.

About PIKA Technologies

PIKA Technologies' reliable media processing building blocks connect computer systems to TDM and IP networks. Brand-name companies design groundbreaking IVR, call center, custom PC/IP PBX, fax, and logging solutions using PIKA Technologies building blocks. With two decades of experience in this industry, the company is recognized for earning strong relationships with its customers worldwide by delivering direct, expert technical support. Headquartered in Ottawa, ON, Canada, the company has ranked in The Branham300, an authoritative ranking of successful Canadian high-tech firms, for four consecutive years. Visit www.pikatechnologies.com or call +1-613-591-1555 for more information.

This document is provided to you for informational purposes only and is believed to be accurate as of the date of its publication, and is subject to change without notice.

PIKA Technologies Inc. assumes no responsibility for any errors or omissions in this document and shall have no obligation to you as a result of having made this document available to you or based upon the information it contains.

PIKA is a registered trademark of PIKA Technologies Inc. AllOnBoard and AllOnHost are trademarks of PIKA Technologies Inc. LINUX is a trademark of Linus Torvalds. SUSE is a registered trademark of SUSE LINUX AG. RED HAT is a registered trademark of Red Hat, Inc. Microsoft and Windows are either registered trademarks or trademarks of Microsoft Corp. in the United States and/or other countries. All other trademarks, product names and company names and/or logos cited herein, if any, are the property of their respective holders.

Copyright PIKA Technologies Inc., 2005. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted, in any form or by any means, without the express written permission of PIKA Technologies Inc.

535 Legget Drive, Suite 400, Ottawa, Ontario, Canada, K2K 3B8
Tel: 613-591-1555 Fax: 613-591-9295
Visit www.pikatechnologies.com Email: sales@pikatech.com