Develop Software that Speaks and Listens
Copyright 2011 Chant Inc. All rights reserved. Chant, SpeechKit, Getting the World Talking with Technology, talking man, and headset are trademarks or registered trademarks of Chant Inc. Other marks are trademarks or registered trademarks of their respective holders.
You really don't have to sit in front of a computer with a mouse and keyboard to use information technology. Your applications can be enhanced to speak to you and listen to you wherever you need them. Speech technology allows you to voice-enable your applications for: controlling application functions without a mouse or keyboard; prompting users for the data to be captured; capturing data by speaking rather than typing; and confirming captured data with audio acknowledgement.

Speech technology comprises several technologies that can make users more productive with their applications:

Speech Recognition (speech-to-text) converts an acoustic signal (i.e., audio data) captured by a microphone or telephone into a set of words that can then be used for controlling computer functions, data entry, and application processing.

Speech Synthesis (text-to-speech) converts words to phonetic and prosodic symbols and generates synthetic speech audio for answering questions, event notification, and reading documents aloud.

A grammar is a collection of rules, made up of words and phrases to be recognized from speech, that enables applications to capture data efficiently and to apply domain constraints that increase accuracy.

A lexicon is a collection of word pronunciations that a speech recognition engine (i.e., recognizer) uses to improve recognition accuracy and that a speech synthesis engine (i.e., synthesizer) uses to improve how it pronounces words.

A profile is a collection of training and background-noise information that a speech recognition engine (i.e., recognizer) uses to improve its recognition accuracy for a specific individual's voice and environment.

TTS markup is text with embedded indicators that tailor speaking qualities such as speed, pitch, emphasis, and word pronunciation when reproducing speech from text.
When building applications, it's essential that you have the right tools for the job. Now you can develop applications that speak and listen faster and more easily using Developer Workbench from Chant. Chant Developer Workbench consists of tools and components for integrating speech technology. As an interactive toolset, it provides a development and testing environment for working with the component libraries and the speech technology objects they manage. You can manage grammars, profiles, lexicons, recognizers, synthesizers, and text-to-speech markup interactively and directly within the application software you develop and deploy.

Chant Developer Workbench features: a multi-document, interactive, customizable environment; a powerful editor with color-coded formatting, IntelliPrompt, optional outlining, optional line numbers, undo/redo, word wrap, and find/replace; command line testing; and event tracing. The tabbed-document interface provides fast switching among multiple speech objects. The editing environment is designed to accelerate grammar and markup development with built-in syntax checking and prompting. The multi-docked window layout can be configured for productivity in various development and testing scenarios.
Toolbars can be easily customized to display the most frequently used facilities. Window layout and toolbar settings persist across sessions of the interactive environment.

Within the Chant Developer Workbench interactive environment, you can:

Manage grammars with GrammarKit:
o Create and edit grammars in native grammar syntax
o Generate word pronunciation phonemes
o Compile and debug grammars
o Test grammars with live and recorded audio, and with text simulation (requires SpeechKit)

Manage lexicons with LexiconKit:
o Create and edit lexicons using XML
o Generate word pronunciation phonemes
o Edit word pronunciation phonemes
o Import and export lexicon word pronunciations

Manage speaker profiles with ProfileKit:
o Create and delete speaker profiles
o Enumerate speaker profiles for selection and command line testing
o Invoke speaker training
o Import and export speaker profiles

Manage speech engines with SpeechKit:
o Enumerate audio devices and speech engines for selection and command line testing of audio-, recognizer-, and synthesizer-specific features
o Trace audio, recognition, and synthesis events
o Support grammar activation and testing (requires GrammarKit)
o Support TTS markup playback (requires VoiceMarkupKit)

Manage TTS markup with VoiceMarkupKit:
o Create and edit documents with TTS markup
o Generate TTS markup
o Generate word pronunciation phonemes
o Edit word pronunciation phonemes (requires LexiconKit)
o Play back text with TTS markup (requires SpeechKit)

MANAGING GRAMMARS

A grammar is a collection of rules made up of words and phrases to be recognized from speech. A speech recognition engine (i.e., recognizer) uses a grammar to enhance its ability to recognize specific combinations of spoken words and phrases. Chant GrammarKit provides an easy way to create, modify, and test context-free grammars before you integrate and deploy them with your application.
Grammar Editing: Edit SAPI 4, SAPI 5, IBM SRCL, and L&H BNF+ grammars faster with built-in IntelliPrompt, which suggests valid grammar syntax.

Grammar Compiling and Testing: Compile and test grammars with a click of a button, and review compiler messages in the Output window. Speak into a microphone to test grammars, or use the command line to test with recorded audio. Test SAPI 5 grammars with text strings using simulated recognition.
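To make the idea of a context-free grammar and text-based simulated recognition more concrete, here is a minimal Python sketch. It is not GrammarKit's grammar format or API: the rule dictionary, the `expand` and `simulate` functions, and the sample command phrases are all hypothetical. The matcher only checks whether a typed string can be derived from the root rule, and it does not handle left-recursive rules.

```python
# Hypothetical context-free grammar: each rule name maps to a list of
# alternatives; each alternative is a sequence of literal words or <rule> references.
GRAMMAR = {
    "command": [["<action>", "the", "<object>"]],
    "action": [["open"], ["close"], ["print"]],
    "object": [["file"], ["window"], ["current", "document"]],
}


def expand(symbol: str, words: list, pos: int, grammar: dict) -> list:
    """Return every position in `words` reachable after matching rule `symbol` from `pos`."""
    ends = []
    for alternative in grammar[symbol]:
        positions = [pos]
        for token in alternative:
            next_positions = []
            for p in positions:
                if token.startswith("<") and token.endswith(">"):
                    # Rule reference: recurse into the referenced rule.
                    next_positions.extend(expand(token[1:-1], words, p, grammar))
                elif p < len(words) and words[p] == token:
                    # Literal word: advance one position when it matches.
                    next_positions.append(p + 1)
            positions = next_positions
        ends.extend(positions)
    return ends


def simulate(text: str, grammar: dict = GRAMMAR, root: str = "command") -> bool:
    """Simulated recognition: does the whole text string match the root rule?"""
    words = text.lower().split()
    return len(words) in expand(root, words, 0, grammar)


print(simulate("Open the file"))               # True
print(simulate("close the current document"))  # True
print(simulate("open the door"))               # False
```

Constraining recognition to phrases like these is what lets a grammar "apply domain constraints": anything outside the rules simply does not match.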
Recognition Results: View recognition results in the Output window.

Recognition Events: Browse recognition events in the Events window.
Error Debugging: Browse compilation errors in the Error window. Click an error to jump to its location in the document window.

MANAGING LEXICONS

A lexicon is a collection of words and information about those words. A speech recognition engine uses a lexicon to increase its recognition accuracy, and a text-to-speech engine uses one to improve the quality of its pronunciation.

Lexicons play an important role in the accuracy of speech recognition. A speech recognition engine (i.e., recognizer) uses lexicons in the process of recognizing speech; they consist of the words that the recognizer understands and returns as recognized speech. Since it's impractical for a recognizer to maintain every possible word and context in its spoken language, you can improve recognition accuracy by extending its lexicon.

Lexicons also play an important role in the quality of text-to-speech playback. A text-to-speech engine (i.e., synthesizer) uses lexicons to obtain the pronunciation information associated with words so that it generates the appropriate speech sounds. For example, with a lexicon you can ensure that "record" is pronounced correctly both as a noun and as a verb.

Chant LexiconKit provides an easy way to create, delete, modify, and extend lexicons. It also provides a simple way to back up and restore lexicons for distribution with your applications.
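The "record" example can be sketched as a lexicon keyed by word and part of speech. This Python sketch is not LexiconKit's XML format or API: the `LEXICON` table and the `pronounce` and `add_word` functions are hypothetical, and the pronunciations are written in ARPAbet phonemes purely for illustration, since each speech engine defines its own phoneme set.

```python
from typing import Optional

# Hypothetical lexicon: (word, part of speech) -> pronunciation in ARPAbet phonemes.
# The digit on each vowel marks stress (1 = primary stress, 0 = unstressed).
LEXICON = {
    ("record", "noun"): "R EH1 K ER0 D",    # REC-ord
    ("record", "verb"): "R IH0 K AO1 R D",  # re-CORD
}


def pronounce(word: str, part_of_speech: str, lexicon: dict = LEXICON) -> Optional[str]:
    """Look up the pronunciation of a word used as the given part of speech.

    Returns None when there is no entry, leaving the choice of a default
    pronunciation to the caller.
    """
    return lexicon.get((word.lower(), part_of_speech.lower()))


def add_word(word: str, part_of_speech: str, phonemes: str, lexicon: dict = LEXICON) -> None:
    """Extend the lexicon with a new (or replacement) pronunciation."""
    lexicon[(word.lower(), part_of_speech.lower())] = phonemes


print(pronounce("record", "noun"))  # R EH1 K ER0 D
print(pronounce("record", "verb"))  # R IH0 K AO1 R D
```

Keying entries by part of speech as well as spelling is one simple way to separate homographs; real engines may instead rely on context analysis or other lexicon fields.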
Lexicon Editing: Edit word pronunciations faster using XML with built-in IntelliPrompt, which suggests valid syntax.

Pronunciation Editing: Use the phoneme selection dialog to list the phonemes for each speech engine and select them when editing pronunciations.
Pronunciation Generation: Use the Add Words dialog to generate word pronunciation entries with default pronunciations.

MANAGING PROFILES

Your speech recognition profile is a critical component of accurate speech recognition. It contains acoustic information that helps the speech recognition engine (i.e., recognizer) convert your speech to text. You help the recognizer perform this function by providing samples of your speech through training. Training is the process of capturing and analyzing your speech in your environment. The recognizer uses the information saved from training to fine-tune how it distinguishes speech from noise during recognition. Some recognizers automatically create a default profile if you have not created one. Some can adapt to your speech and environment during recognition and automatically update your profile.

Chant ProfileKit provides an easy way to create, delete, and train speaker profiles. It also provides a simple way to back up and restore profiles for distribution with your applications or for administering them across your network of end users.
Profile Management: Enumerate, add, delete, back up, restore, and train speech recognition profiles with a click of a button.

MANAGING SPEECH ENGINES

Integrating speech technology involves managing the resources that process speech and audio. Audio devices handle inbound recording and outbound playback of speech. Speech recognizers detect speech from an audio source and convert it to text. Speech synthesizers convert text to speech and generate audio.

Chant SpeechKit provides an easy way to manage audio devices, speech recognizers, and speech synthesizers, and a simple way to construct software that speaks and listens.

MANAGING AUDIO DEVICES

Managing audio devices is an application runtime function. Chant SpeechKit consists of application-ready software components that handle the complexities of working with audio devices. The components minimize the programming effort necessary to record and play back audio.
Chant Developer Workbench provides a powerful prototyping and testing environment to model and validate your application code for managing audio devices. Within Chant Developer Workbench, you can open an audio device enumerator to perform command line testing and trace callback events. This enables you to model and test your use of audio devices before, during, and after integrating the code into your applications.

Audio Device Management: Enumerate and test audio devices with recording and playback requests. Trace audio events in the Events window.

MANAGING RECOGNIZERS

Managing recognizers for speech recognition is an application runtime function. Chant SpeechKit consists of application-ready software components that handle the complexities of speech recognition. The components minimize the programming effort necessary to construct software that listens.

Chant Developer Workbench provides a powerful prototyping and testing environment to model and validate your application code for managing recognizers. Within Chant Developer Workbench, you can open a recognizer enumerator to perform command line testing and trace callback events. This enables you to model and test your use of speech recognizers before, during, and after integrating the code into your applications.
Recognizer Management: Enumerate and test recognizers. Use the command line to invoke methods such as recognizing speech from prerecorded audio. Trace recognition events in the Events window.

MANAGING SYNTHESIZERS

Managing synthesizers for speech synthesis is an application runtime function. Chant SpeechKit consists of application-ready software components that handle the complexities of speech synthesis. The components minimize the programming effort necessary to construct software that speaks.

Chant Developer Workbench provides a powerful prototyping and testing environment to model and validate your application code for managing synthesizers. Within Chant Developer Workbench, you can open a synthesizer enumerator to perform command line testing and trace callback events. This enables you to model and test your use of synthesizers before, during, and after integrating the code into your applications.
Synthesizer Management: Enumerate and test synthesizers. Use the command line to invoke methods such as synthesizing text from a file or playing back an audio file. Trace synthesis events in the Events window.

Command Line IntelliPrompt: Test component methods with built-in prompts for method signatures. Simply begin typing, and pop-ups guide you through specifying parameters.
Browse Events: Browse synthesis events in the Events window. Analyze event data to determine which callbacks apply before integrating them into your applications.

MANAGING TEXT-TO-SPEECH MARKUP

A text-to-speech engine (i.e., synthesizer) uses TTS markup to enhance its ability to synthesize speech from text and generate audio for playback. Chant VoiceMarkupKit provides an easy way to create, modify, and test TTS markup before you integrate it with your application.
Marking Up Text: Highlight and click. It's that simple to mark up text for enhanced speech synthesis.

SSML Editing: Edit L&H Native Control Sequence, SAPI 4, SAPI 5, and W3C Speech Synthesis Markup Language (SSML) markup faster with built-in IntelliPrompt, which suggests valid markup syntax.
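For a sense of what W3C SSML markup looks like, the following Python sketch builds a small SSML 1.0 document with the standard library's `xml.etree.ElementTree`, using the standard `emphasis` and `prosody` elements to control emphasis, speaking rate, and pitch. It does not use VoiceMarkupKit; the `build_ssml` helper and the sample text are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_ssml(intro: str, emphasized: str, slow_text: str, lang: str = "en-US") -> str:
    """Build a small W3C SSML 1.0 document (hypothetical helper for illustration)."""
    # Root <speak> element declaring the SSML version, namespace, and document language.
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": "http://www.w3.org/2001/10/synthesis",
        "xml:lang": lang,
    })
    speak.text = intro + " "
    # <emphasis> asks the synthesizer to stress the enclosed words.
    emphasis = ET.SubElement(speak, "emphasis", {"level": "strong"})
    emphasis.text = emphasized
    emphasis.tail = " "
    # <prosody> adjusts the speaking rate and pitch of the enclosed text.
    prosody = ET.SubElement(speak, "prosody", {"rate": "slow", "pitch": "low"})
    prosody.text = slow_text
    # ElementTree escapes &, <, and > in text content when serializing.
    return ET.tostring(speak, encoding="unicode")


print(build_ssml("You have", "three new messages.", "Please listen carefully."))
```

Generating markup with an XML library rather than string concatenation keeps the document well-formed even when the spoken text contains characters such as `&` or `<`.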
TTS Playback: Play back text-to-speech markup with a click of a button. Highlight specific text or play back the entire document.

MORE INFORMATION

To learn more about developing software that speaks and listens, and how easily you can manage grammars, profiles, lexicons, recognizers, synthesizers, and text-to-speech markup directly within the application software you develop, see the following documents: Integrate Speech Technology for Hands-free Operation; Design Grammars for High-performance Speech Recognition; Tailor Pronunciations for Maximum Clarity; Administer Speaker Profiles for Accurate Speech Recognition; and Fine-tune Speech Synthesis Using Text-to-Speech Markup.