Speech Technology Summit 2001
Developing Usable VoiceXML Applications
Neil Bowers, neilb@src.co.uk
Contents
- About SRC
  - Professional services company
- Methodology
  - How we go about developing speech applications
- Example development
  - Placing a bet on Formula 1
- Observations
  - What we've learned so far
SRC
- Deploying speech recognition since 1995
  - Desktop dictation solutions
  - Bespoke application development
  - Digital workflow
- SRC Telecom formed December 2000
  - Telephony-based speech solutions
  - Bespoke development
  - Telco grade hosting
  - VXML development hosting
Scope
- Call Centre Automation
- Voice Web
- [mv]-commerce
- Corporate Portals
Hosted speech recognition services
[Diagram. Labels: Customer or Employee; Public Telephony Network + SRC Telco Partner Network; SRC Speech Telephony Platform; ACD / IVR System; Call-centre or customer reps; Data Systems; Internet / Intranet / Data links. Connection types: voice path, data path, call control path. Regions: SRC Telecom, SRC Customer.]
Myths
- "We can just add speech to our IVR menus"
- "Our HTML guys can knock up our VXML apps"
- "ASR is really good (bad)"
- "Development finishes at deployment"
- "It's like the web"
- "If you build it, they will ring"
- "US TTS is ok for the UK"
- "VoiceXML will make everyone open"
Methodology (1)
- Consulting
  - Identify suitable applications & business case
  - Set expectations
  - Start with a small-scope application
- Requirements & Design
  - Observations, recordings, interviews
  - Use cases and simulations
  - System view: the VXML app may not be the hard bit
  - Persona and dialogue strategy
Sidebar: Persona
- What is it?
  - The personality of your application
  - Defines the style of interaction
- Why?
  - Improves user experience
  - Creates stickiness
- How do you pick one?
  - The end-user population & frequency of use
  - Goal of service / motivation of the end-user
  - Corporate & service brand
  - Write a biography (with the customer)
- What does it affect?
  - Dialogue strategy
  - Wording of prompts
  - Choice of voice talent
  - Recording of prompts
Methodology (2)
- Development
  - Short iterations, regular feedback
  - Early integration & test
  - Usability tests critical (early and often)
- Pilot
  - Start with a small call volume
  - Tuning
- Production hosting
  - Ongoing usability evaluation and tuning
Sidebar: Usability Testing
- Don't leave it until the end
- Test early prototypes or simulations with real users
- Test as you develop
- Test major chunks with ~5 non-developers
- Majority of potential problems found with < 10 users
- Several iterations after development
- Pilot with initially small call volume
- Monitor ongoing live usage
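The point about small test groups can be illustrated with a simple model. This is a sketch only, and it is not taken from the talk: it assumes each test user independently uncovers a given problem with the same probability, so n users find a fraction 1 - (1 - p)^n of the problems. The function name and the value p = 0.3 are hypothetical, chosen purely for illustration.

```python
def proportion_found(n_users, p_detect):
    """Expected fraction of usability problems seen at least once.

    Assumes (for illustration) that every user independently hits any
    given problem with the same probability p_detect.
    """
    return 1.0 - (1.0 - p_detect) ** n_users


if __name__ == "__main__":
    # p_detect = 0.3 is an assumed value, not a measurement.
    for n in (1, 3, 5, 10):
        print(n, round(proportion_found(n, 0.3), 2))
```

Under these assumptions the curve flattens quickly, which is consistent with running several small rounds of testing rather than one large one.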
What makes an application hard?
- Intrinsically hard recognition tasks
- Dynamic data
- Users (heterogeneity, noisy environment, mobiles)
- Telephony & CTI
- Integration with customer's systems
- Complexity of dialogue
- Low process maturity
Example: Formula SRC Betting
- Grand Prix betting demonstration
- Driver, race, amount
- Place multiple bets, confirmation, summary at end
- "a tenner on Schumacher"
- Disambiguation ("which Schumacher brother?")
- Mixed initiative, then directed ("in which grand prix?")
- 020 7381 7890
- Username: Murray Walker
- Password: 4 3 7 6
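The dialogue pattern on this slide (accept whatever the caller offers in the opening utterance, then ask directed questions for what is missing or ambiguous) can be sketched in a few lines. This is not the demo's implementation: the driver list, race list, amount phrases, prompt wording, and the function names `parse` and `next_prompt` are all hypothetical, and simple substring matching stands in for a real recogniser and grammar.

```python
import re

# Hypothetical vocabulary for illustration only; not taken from the demo.
DRIVERS = {
    "michael schumacher": "Michael Schumacher",
    "ralf schumacher": "Ralf Schumacher",
    "coulthard": "David Coulthard",
}
AMBIGUOUS = {"schumacher": ["Michael Schumacher", "Ralf Schumacher"]}
RACES = {"monaco": "Monaco", "british": "British", "italian": "Italian"}
AMOUNTS = {"a fiver": 5, "a tenner": 10}


def parse(utterance):
    """Fill as many of the three slots as the utterance allows."""
    text = utterance.lower()
    slots = {"driver": None, "race": None, "amount": None, "candidates": []}
    # Full names first, so "michael schumacher" is never treated as ambiguous.
    for phrase, name in DRIVERS.items():
        if phrase in text:
            slots["driver"] = name
            break
    if slots["driver"] is None:
        for surname, names in AMBIGUOUS.items():
            if surname in text:
                slots["candidates"] = names
                break
    for phrase, race in RACES.items():
        if phrase in text:
            slots["race"] = race
            break
    for phrase, pounds in AMOUNTS.items():
        if phrase in text:
            slots["amount"] = pounds
            break
    if slots["amount"] is None:
        match = re.search(r"(\d+) pounds?", text)
        if match:
            slots["amount"] = int(match.group(1))
    return slots


def next_prompt(slots):
    """Choose the directed follow-up: disambiguate, then fill gaps, then confirm."""
    if slots["candidates"]:
        return "Did you mean " + " or ".join(slots["candidates"]) + "?"
    if slots["driver"] is None:
        return "Which driver?"
    if slots["race"] is None:
        return "In which grand prix?"
    if slots["amount"] is None:
        return "How much would you like to bet?"
    return (f"{slots['amount']} pounds on {slots['driver']} "
            f"in the {slots['race']} Grand Prix. Is that right?")
```

For example, `next_prompt(parse("a tenner on Schumacher"))` asks the caller to choose between the two brothers, while `next_prompt(parse("a tenner on Michael Schumacher"))` moves straight on to the race.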
Iterations
- The main prompt, all 3 tokens
- Straight through the call flow graph
- Other main states in call flow
- And error handling
- Tuning
- Particular focus on grammars
- Top & Tail
- Usability
What it took
- Approach
  - VXML 1.0+: static, generated, & dynamic
  - Close working with VoiceGenie
  - Recorded audio for all prompts
  - Usability testing
- Various tools
  - Text editor, Perl, SP, tuning tools, logfile analysis, index cards, video camera, recording tools
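As an illustration of generated VXML with recorded audio for prompts, the sketch below builds one VoiceXML field from a script. It is not the tooling used on the project (the slide lists Perl): the function name, file paths, and prompt wording are hypothetical, and only standard VoiceXML elements (`vxml`, `form`, `field`, `grammar`, `prompt`, `audio`, `nomatch`, `reprompt`) are emitted.

```python
import xml.etree.ElementTree as ET


def build_field_document(field_name, prompt_text, audio_url, grammar_url):
    """Return a VoiceXML document, as a string, that collects one field."""
    vxml = ET.Element("vxml", version="1.0")
    form = ET.SubElement(vxml, "form", id=f"get_{field_name}")
    field = ET.SubElement(form, "field", name=field_name)
    ET.SubElement(field, "grammar", src=grammar_url)

    # Play the recorded prompt; the text inside <audio> is the fallback
    # rendered if the recording cannot be played.
    prompt = ET.SubElement(field, "prompt")
    audio = ET.SubElement(prompt, "audio", src=audio_url)
    audio.text = prompt_text

    # Minimal error handling: apologise, then replay the field's prompt.
    nomatch = ET.SubElement(field, "nomatch")
    nomatch.text = "Sorry, I didn't catch that."
    ET.SubElement(nomatch, "reprompt")

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    # Paths below are placeholders, not real project files.
    print(build_field_document(
        "race",
        "In which grand prix?",
        "prompts/which_race.wav",
        "grammars/race.gram",
    ))
```

Generating documents this way keeps prompt wording, audio files, and grammars in one place, which makes it easier to change them between tuning iterations.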
Lessons learned
- Don't over-specify the grammars
- Use universals sparingly
- A light touch with mixed initiative (cf. human agents)
- Set up in-house prompt recording facilities
- Use UK acoustic models (not US!)
- Don't put too much in 1st level prompts
- Resist the temptation to tune for the 1% when usability testing
- Barge-in, timing, spacing, & delays are critical
- Spend time on dialogue strategy (including hand-off)
- Get to know your platform and technology suppliers
Skill-set of development team
- Human factors
- Dialogue design
- Grammar development & tuning
- Speech technologist
- Architect
- VoiceXML
- Usability testing
- Software & web development
- Toolsmith
- Telephony
- Audio
- Project management
80/20: Where To Focus
- Understanding the end-users
- Crafting prompts & error handling
- Limits of ASR, and how to move beyond them
Summary
- Multi-disciplinary team
- Make friends with your technology suppliers
- Design separately for voice (but across modes)
- Pick a persona/style for the dialogue
- Learn how to improve perceived recognition
- Usability evaluation
- Prompt crafting and error handling
- Pilot and ramp-up volume gradually
- Monitor and tune
Neil Bowers, neilb@src.co.uk
Grand Prix Betting Demo: 020 7381 7890
- User: Murray Walker
- Password: 4 3 7 6