Develop Software that Speaks and Listens

Copyright 2011 Chant Inc. All rights reserved. Chant, SpeechKit, Getting the World Talking with Technology, talking man, and headset are trademarks or registered trademarks of Chant Inc. Other marks are trademarks or registered trademarks of their respective holders.

You really don't have to sit in front of a computer with a mouse and keyboard to use information technology. Your applications can be enhanced to speak and listen to you wherever you need them.

Speech technology allows you to voice-enable your applications for:
o controlling application functions without having to use a mouse or keyboard;
o prompting users for applicable data capture;
o capturing data by speaking rather than typing; and
o confirming data capture with audio acknowledgement.

It comprises many technologies that can enable users to be more productive with their applications:

Speech Recognition (speech-to-text) involves converting an acoustic signal (i.e., audio data), captured by a microphone or a telephone, to a set of words that can then be used for controlling computer functions, data entry, and application processing.

Speech Synthesis (text-to-speech) is the process of converting words to phonetic and prosodic symbols and generating synthetic speech audio data to be used for answering questions, event notification, and reading documents aloud.

A grammar is a collection of rules composed of words and phrases to be recognized from speech. It enables applications to capture data efficiently and to assert domain constraints that improve accuracy.

A lexicon is a collection of word pronunciations that a speech recognition engine (i.e., recognizer) uses to improve recognition accuracy and a speech synthesis engine (i.e., synthesizer) uses to enhance how it pronounces words.

A profile is a collection of training and background noise information that a speech recognition engine (i.e., recognizer) uses to improve its recognition accuracy for a specific individual's voice and environment.

TTS markup is text with embedded indicators that tailor speaking qualities such as speed, pitch, emphasis, and word pronunciation when reproducing speech from text.

When building applications, it's essential that you have the right tools to get the job done. Now you can develop applications that speak and listen faster and more easily using Developer Workbench from Chant.

Chant Developer Workbench is composed of tools and components for integrating speech technology. As an interactive toolset, it provides a development and testing environment for working with the component libraries and the speech technology objects they manage. You can manage grammars, profiles, lexicons, recognizers, synthesizers, and text-to-speech markup interactively and directly within application software you develop and deploy.

Chant Developer Workbench provides a comprehensive development and testing environment for working with speech technology that features:
o Multi-document, interactive, customizable environment;
o Powerful editor with color-coded formatting, IntelliPrompt, optional outlining, optional line numbers, undo-redo, word wrap, and find/replace;
o Command line testing; and
o Event tracing.

The tabbed-document interface provides for fast switching among multiple speech objects. The editing environment is designed to accelerate speech technology grammar and markup development with built-in syntax checking and prompting. The multi-docked windows layout can be configured to suit various development and testing scenarios.

Toolbars can be easily customized to display the facilities most often used. Window layout and toolbar settings are persisted across sessions of the interactive environment.

Within the Chant Developer Workbench interactive environment, you can:

Manage grammars with GrammarKit:
o Create and edit grammars in native grammar syntax
o Generate word pronunciation phonemes
o Compile and debug grammars
o Test grammars with live and recorded audio, and text simulation (requires SpeechKit)

Manage lexicons with LexiconKit:
o Create and edit lexicons using XML
o Generate word pronunciation phonemes
o Edit word pronunciation phonemes
o Import and export lexicon word pronunciations

Manage speaker profiles with ProfileKit:
o Create and delete speaker profiles
o Enumerate speaker profiles for selection and command line testing
o Invoke speaker training
o Import and export speaker profiles

Manage speech engines with SpeechKit:
o Enumerate audio devices and speech engines for selection and command line testing of audio-, recognizer-, and synthesizer-specific features
o Trace audio, recognition, and synthesis events
o Support grammar activation and testing (requires GrammarKit)
o Support TTS markup playback (requires VoiceMarkupKit)

Manage TTS markup with VoiceMarkupKit:
o Create and edit documents with TTS markup
o Generate TTS markup
o Generate word pronunciation phonemes
o Edit word pronunciation phonemes (requires LexiconKit)
o Play back text with TTS markup (requires SpeechKit)

MANAGING GRAMMARS

A grammar is a collection of rules composed of words and phrases to be recognized from speech. A speech recognition engine (i.e., recognizer) uses a grammar to enhance its ability to recognize specific combinations of spoken words and phrases.

Chant GrammarKit provides you an easy way to create, modify, and test context-free grammars before you integrate and deploy them with your application.
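As a rough illustration of how a context-free grammar constrains what can be recognized, the following Python sketch expands a small set of rules into the phrases they allow and checks a text string against them. The rule names, the dictionary layout, and the expand and recognize functions are assumptions made for this example; they are not GrammarKit syntax or any speech engine's native grammar format.

```python
import itertools

# Hypothetical toy grammar (not a SAPI, SRCL, or BNF+ format):
# each rule maps to a list of alternatives, and each alternative
# is a sequence of literal words or references to other rules.
GRAMMAR = {
    "<command>": [["<action>", "the", "<object>"]],
    "<action>": [["open"], ["close"], ["save"]],
    "<object>": [["file"], ["window"]],
}


def expand(symbol, grammar):
    """Yield every word sequence that the given symbol can produce.

    Assumes the grammar has no recursive rules, so the set is finite.
    """
    if symbol not in grammar:
        # A literal word expands to itself.
        yield [symbol]
        return
    for alternative in grammar[symbol]:
        # Expand each part, then combine the parts in order.
        parts = [list(expand(s, grammar)) for s in alternative]
        for combo in itertools.product(*parts):
            yield [word for part in combo for word in part]


def recognize(text, grammar, start="<command>"):
    """Return True if the text matches a phrase allowed by the grammar."""
    words = text.lower().split()
    return any(words == phrase for phrase in expand(start, grammar))


if __name__ == "__main__":
    print(recognize("open the file", GRAMMAR))
    print(recognize("delete the file", GRAMMAR))
```

Because only six phrases are valid under these rules, an utterance such as "delete the file" is rejected outright. That narrowing of the search space is the kind of domain constraint a grammar gives a recognizer.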

Grammar Editing: Edit SAPI 4, SAPI 5, IBM SRCL, and L&H BNF+ speech grammars faster with built-in IntelliPrompt that suggests valid grammar syntax.

Grammar Compiling and Testing: Compile and test grammars with the click of a button. Review compiler messages in the Output window. Speak into a microphone to test grammars or use the command line to test with recorded audio. Test SAPI 5 grammars with text strings using simulated recognition.

Recognition Results: View recognition results in the Output window.

Recognition Events: Browse recognition events in the Events window.

Error Debugging: Browse compilation errors in the Error window. Click an error to jump to its location in the document window.

MANAGING LEXICONS

A lexicon is a collection of words and information about these words used by a speech recognition engine to increase its recognition accuracy. A text-to-speech engine uses a lexicon to enhance the quality of its pronunciation.

Lexicons play an important role in the accuracy of speech recognition. A speech recognition engine (i.e., recognizer) uses lexicons in the process of recognizing speech. Lexicons consist of the words that a recognizer understands and returns as recognized speech. Since it's impractical for a recognizer to maintain every possible word and context in its spoken language, you enhance the accuracy of speech recognition by extending its lexicon.

Lexicons play an important role in the quality of text-to-speech playback. A text-to-speech engine (i.e., synthesizer) uses lexicons to obtain the pronunciation information associated with a word and generate the appropriate speech sounds for it. For example, with a lexicon you can ensure that "record" is pronounced correctly both when used as a noun and when used as a verb.

Chant LexiconKit provides you an easy way to create, delete, modify, and extend lexicons. It provides a simple way to back up and restore lexicons for distribution with your applications.
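The "record" example can be sketched in a few lines of Python. This is only a conceptual model: the dictionary layout, the part-of-speech labels, and the pronounce function are assumptions for illustration, and the IPA strings are approximate American English pronunciations rather than entries from any engine's lexicon or phoneme set.

```python
# Hypothetical in-memory lexicon keyed by (word, part of speech).
# Pronunciations are written in IPA for readability; a real engine
# would use its own phoneme alphabet.
LEXICON = {
    ("record", "noun"): "ˈrɛkərd",
    ("record", "verb"): "rɪˈkɔrd",
}


def pronounce(word, part_of_speech, lexicon=LEXICON):
    """Return the lexicon pronunciation for a word in a given role.

    Returns None when the lexicon has no entry, so the caller can
    fall back to the engine's default pronunciation rules.
    """
    return lexicon.get((word.lower(), part_of_speech))


if __name__ == "__main__":
    print(pronounce("record", "noun"))
    print(pronounce("record", "verb"))
```

Keying entries by both the word and its role is one simple way to let the same spelling carry two pronunciations. Words missing from the lexicon return None, which mirrors the idea of extending an engine's built-in vocabulary rather than replacing it.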

Lexicon Editing: Edit word pronunciations faster using XML with built-in IntelliPrompt that suggests valid syntax.

Pronunciation Editing: Use the phoneme selection dialog to list the phonemes by speech engine and select them for editing pronunciations.

Pronunciation Generation: Use the Add Words dialog to generate word pronunciation entries with default pronunciations.

MANAGING PROFILES

Your speech recognition profile is a critical component for accurate speech recognition. It contains acoustic information that helps the speech recognition engine (i.e., recognizer) convert your speech to text.

You help the recognizer perform its function by providing it with samples of your speech through training. Training is a process of capturing and analyzing your speech in your environment. The recognizer uses the information saved from training to fine-tune how it distinguishes speech from noise during speech recognition processing.

Some recognizers automatically create a default speech recognition profile if you have not created one. Some can adjust to your speech and environment during speech recognition and automatically update your profile.

Chant ProfileKit provides you an easy way to create, delete, and train speaker profiles. It provides a simple way to back up and restore profiles for distribution with your applications or for administration across your network of end users.

Profile Management: Enumerate, add, delete, back up, restore, and train speech recognition profiles with the click of a button.

MANAGING SPEECH ENGINES

Integrating speech technology involves managing the resources that process speech and audio. Audio devices handle the inbound recording and outbound playback of speech. Speech recognizers handle detecting speech from an audio source and converting it to text. Speech synthesizers handle converting text to speech and generating audio.

Chant SpeechKit provides you an easy way to manage audio devices, speech recognizers, and speech synthesizers. It provides a simple way to construct software that speaks and listens.

MANAGING AUDIO DEVICES

Managing audio devices is an application runtime function. Chant SpeechKit is composed of application-ready software components that handle the complexities of working with audio devices. The components minimize the programming effort necessary to record and play back audio.

Chant Developer Workbench provides a powerful prototyping and testing environment to model and validate your application code for managing audio devices. Within Chant Developer Workbench, you can open an audio device enumerator to perform command line testing and trace callback events. This enables you to model and test your audio device use before, during, and after integrating code in your applications.

Audio Device Management: Enumerate and test audio devices with recording and playback requests. Trace audio events in the Events window.

MANAGING RECOGNIZERS

Managing recognizers for speech recognition is an application runtime function. Chant SpeechKit is composed of application-ready software components that handle the complexities of speech recognition. The components minimize the programming effort necessary to construct software that listens.

Chant Developer Workbench provides a powerful prototyping and testing environment to model and validate your application code for managing recognizers. Within Chant Developer Workbench, you can open a recognizer enumerator to perform command line testing and trace callback events. This enables you to model and test your speech recognizer use before, during, and after integrating code in your applications.

Recognizer Management: Enumerate and test recognizers. Use the command line to invoke methods such as recognizing from prerecorded audio. Trace recognition events in the Events window.

MANAGING SYNTHESIZERS

Managing synthesizers for speech synthesis is an application runtime function. Chant SpeechKit is composed of application-ready software components that handle the complexities of speech synthesis. The components minimize the programming effort necessary to construct software that speaks.

Chant Developer Workbench provides a powerful prototyping and testing environment to model and validate your application code for managing synthesizers. Within Chant Developer Workbench, you can open a synthesizer enumerator to perform command line testing and trace callback events. This enables you to model and test your synthesizer use before, during, and after integrating code in your applications.

Synthesizer Management: Enumerate and test synthesizers. Use the command line to invoke methods such as synthesizing text from a file or playing back an audio file. Trace synthesis events in the Events window.

Command Line IntelliPrompt: Test component methods with built-in prompts for method signatures. Simply begin typing and pop-ups guide you through parameter specification.

Browse Events: Browse synthesis events in the Events window. Analyze event data to determine which callbacks are applicable before integrating them in applications.

MANAGING TEXT-TO-SPEECH MARKUP

A text-to-speech engine (i.e., synthesizer) uses TTS markup to enhance its ability to synthesize speech from text and generate the audio for playback.

Chant VoiceMarkupKit provides you an easy way to create, modify, and test TTS markup before you integrate it with your application.

Marking Up Text: Highlight and click. It's that simple to mark up text for enhanced speech synthesis.

SSML Editing: Edit L&H Native Control Sequence, SAPI 4, SAPI 5, and W3C Speech Synthesis Markup Language (SSML) markup faster with built-in IntelliPrompt that suggests valid markup syntax.
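To give a sense of what TTS markup looks like, the following Python sketch builds a small W3C SSML document with the standard library. It wraps one phrase in the SSML prosody and emphasis elements to adjust rate, pitch, and stress. The build_ssml function and its parameters are assumptions made for this example and are not part of VoiceMarkupKit; the other markup formats named above use different syntax.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(before, emphasized, after, rate="slow", pitch="high"):
    """Return an SSML string that stresses one phrase within a sentence.

    Hypothetical helper: the emphasized phrase is wrapped in <prosody>
    (rate and pitch) and <emphasis>, and the surrounding text is left
    unmarked.
    """
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": SSML_NS,
        "xml:lang": "en-US",
    })
    speak.text = before
    prosody = ET.SubElement(speak, "prosody", {"rate": rate, "pitch": pitch})
    emphasis = ET.SubElement(prosody, "emphasis", {"level": "strong"})
    emphasis.text = emphasized
    # Text that follows the closing </prosody> tag.
    prosody.tail = after
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Your order ", "has shipped", " today."))
```

Generating the markup through an XML library rather than string concatenation keeps the document well-formed and escapes special characters in the text automatically.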

TTS Playback: Play back text-to-speech markup with the click of a button. Highlight specific text or play back the entire document.

MORE INFORMATION

To learn more about developing software that speaks and listens, and to explore how easily you can manage grammars, profiles, lexicons, recognizers, synthesizers, and text-to-speech markup directly within the application software you develop, see the following documents:
o Integrate Speech Technology for Hands-free Operation
o Design Grammars for High-performance Speech Recognition
o Tailor Pronunciations for Maximum Clarity
o Administer Speaker Profiles for Accurate Speech Recognition
o Fine-tune Speech Synthesis Using Text-to-Speech Markup