Design Grammars for High-performance Speech Recognition
Copyright 2011 Chant Inc. All rights reserved. Chant, SpeechKit, Getting the World Talking with Technology, talking man, and headset are trademarks or registered trademarks of Chant Inc. Other marks are trademarks or registered trademarks of their respective holders.
A speech recognition grammar is a collection of rules composed of words and phrases to be recognized from speech. A speech recognition engine (i.e., recognizer) uses a grammar to improve its ability to recognize specific combinations of spoken words and phrases.

With dictation recognition, a recognizer matches against all the word possibilities in a large dictionary and applies contextual analysis to return the correct word (i.e., spelling) for homophones (e.g., right or write).

Unlike dictation recognition, grammar recognition is context-free. A recognizer matches only against the rule definitions in the grammar. Context-free grammar recognition enables your applications to capture data very efficiently. Grammars also let your applications enforce domain constraints, which improves data capture accuracy automatically.

WHAT IS GRAMMAR MANAGEMENT?

Grammar management enables you to customize and tailor grammars in your development environment, compile grammars before application deployment, and integrate grammar generation and compilation into your deployed application. This gives your application the flexibility to run with information that is unknown until configuration time or runtime and to work with the technology available on the deployed system.
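As a rough illustration of building a rule from information that is unknown until runtime and then matching only against that rule, consider the following minimal sketch. It is not GrammarKit code: the function names (normalize, build_rule, recognize) and the contact list are hypothetical, and plain text stands in for recognized speech.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so comparisons ignore formatting."""
    return re.sub(r"\s+", " ", text.strip().lower())


def build_rule(prefix, values):
    """Build the set of phrases a rule accepts from data known only at runtime."""
    return {normalize(f"{prefix} {value}") for value in values}


def recognize(rule, utterance):
    """Return the matched phrase, or None when the utterance is outside the rule."""
    candidate = normalize(utterance)
    return candidate if candidate in rule else None


# Hypothetical contact names stand in for information unknown until runtime.
contacts = ["Alice Jones", "Bob Smith"]
call_rule = build_rule("call", contacts)

print(recognize(call_rule, "Call  Alice Jones"))  # call alice jones
print(recognize(call_rule, "write a letter"))     # None
```

Because the rule accepts only the phrases it was built from, anything outside the domain is rejected rather than guessed at, which is the constraint a grammar imposes on a recognizer.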
WHAT IS GRAMMARKIT?

Chant GrammarKit is composed of application-ready software components that handle the complexities of generating, compiling, and persisting compiled grammar binaries. It simplifies the process of managing grammars declared with IBM SRCL (IBM ViaVoice), Microsoft SAPI 4 Grammar Text File, Microsoft SAPI 5 XML Grammar, Nuance BNF+ (VoCon 3200), Java Speech Grammar Format (JSGF), W3C ABNF, and W3C XML grammar syntax for use with your favorite speech recognizer.

GrammarKit includes ActiveX, C++, C++Builder, Delphi, Java, .NET Framework, Silverlight, and Web component library formats to support your programming languages, and provides sample projects for popular IDEs such as Visual Studio 2010 from Microsoft. The component libraries can be integrated with 32-bit, 64-bit, and mobile applications.

GRAMMARKIT FEATURES

The goal of good grammar design is to maximize application performance. With GrammarKit you can:

- Generate syntax-independent and syntax-specific grammars.
- Compile grammar source from buffer, file, resource, stream, and string formats.
- Persist compiled grammar binary to buffer, file, and stream formats.
- Generate pronunciation phonemes from SAPI 4, SAPI 5, and VoCon 3200 recognizers.
- Dynamically switch among grammar compilers and syntax formats.

These components allow you to distribute compiled grammar binary files with your application, generate and compile grammars as part of your deployed application, and optimize grammar enablement at runtime by using compiled binary files.

Recognizers have their own syntax for expressing grammars. GrammarKit supports the following recognizers and their grammar syntax:
Recognizer                                               Speech API   Grammar Syntax
Nuance Dragon NaturallySpeaking V6 - V9 (all languages)  SAPI 4       SAPI 4 Grammar Text File
IBM ViaVoice (all languages)                             SMAPI        IBM SRCL
Microsoft SAPI 4 (all languages)                         SAPI 4       SAPI 4 Grammar Text File
Microsoft SAPI 5 (all languages)                         SAPI 5       SAPI 5 XML Grammar, W3C SRGS XML
Nuance VoCon 3200 (all languages)                        VoCon 3200   Nuance BNF+ V1.0, V1.1, V2.0, W3C SRGS ABNF, Java Speech Grammar Format (JSGF)

GRAMMAR MANAGEMENT COMPONENT ARCHITECTURE

The GrammarKit component library includes a grammar management class that provides a simple way to generate and compile speech recognition grammars. Your application can build and compile grammars as part of its runtime operation to enable real-time customization and tailoring of your speech recognition environment.

The grammar management class, ChantGM, enables you to build a grammar independent of grammar syntax. Your application uses the ChantGrammar and adjunct classes to construct and modify grammar objects as needed and to generate compiler-specific syntax on demand.

With the ChantGM class, you can select a grammar compiler, compile the grammar, and optionally persist the compiled grammar binary. The ChantGM class manages the compilation activities and resources on behalf of your application and interacts directly with the applicable grammar compiler. It supports the following grammar syntax: IBM Speech Recognition Control Language (SRCL), Microsoft SAPI 4 context-free grammar, Microsoft SAPI 5 XML grammar, Nuance VoCon 3200 BNF+ V1.0, V1.1, V2.0, Java Speech Grammar Format (JSGF), W3C SRGS ABNF, and W3C SRGS XML.

Your application receives compiled grammar binary, warnings, and error messages through event callbacks. The ChantGM class encapsulates all of the technologies necessary to make the process of building and compiling grammars simple and efficient for your application. Optionally, it can persist the grammar binary across application invocations.
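The callback-and-persistence pattern described above can be sketched roughly as follows. This is not the ChantGM API: the GrammarCompileManager class, its callback signatures, and the placeholder compiler are all hypothetical, chosen only to show results arriving through callbacks and a compiled binary being reused across invocations.

```python
import hashlib
import os
import tempfile


class GrammarCompileManager:
    """Hypothetical stand-in for a grammar manager; not the ChantGM API."""

    def __init__(self, cache_dir, on_compiled, on_error):
        self.cache_dir = cache_dir
        self.on_compiled = on_compiled  # called with (binary, from_cache)
        self.on_error = on_error        # called with an error message

    def _fake_compile(self, source):
        # Placeholder for a real grammar compiler: rejects empty input and
        # otherwise returns the UTF-8 bytes of the source as the "binary".
        if not source.strip():
            raise ValueError("grammar source is empty")
        return source.encode("utf-8")

    def compile(self, source):
        # Key the persisted binary by a hash of the grammar source.
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        path = os.path.join(self.cache_dir, key + ".bin")
        if os.path.exists(path):
            with open(path, "rb") as f:
                self.on_compiled(f.read(), True)
            return
        try:
            binary = self._fake_compile(source)
        except ValueError as exc:
            self.on_error(str(exc))
            return
        with open(path, "wb") as f:
            f.write(binary)
        self.on_compiled(binary, False)


events = []
cache_dir = tempfile.mkdtemp()
manager = GrammarCompileManager(
    cache_dir,
    on_compiled=lambda binary, from_cache: events.append(("compiled", from_cache)),
    on_error=lambda message: events.append(("error", message)),
)
manager.compile("public <command> = open | close;")  # compiled fresh
manager.compile("public <command> = open | close;")  # loaded from the cache
manager.compile("   ")                                # reported as an error
print(events)
```

Keying the persisted file by a hash of the source is one possible design choice: a changed grammar is recompiled automatically, while an unchanged grammar skips compilation on later runs.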
[Figure: Your Application calls ChantGM, which works with the Dragon, SAPI 4, SAPI 5, SMAPI, and VoCon speech APIs and the SAPI 4 CFG, SAPI 5 XML, W3C XML, IBM SRCL, Nuance BNF+, JSGF, and W3C ABNF grammar syntaxes.]

The ChantGM class simplifies the process of building and compiling grammars by handling the low-level activities directly with the grammar compiler. You instantiate a ChantGM class object before you build or compile a grammar within your application. You destroy the ChantGM class object and release its resources when you no longer need to compile grammars within your application.

The GrammarKit management component is designed to provide flexibility and to minimize the programming necessary to manage the construction and compilation of your grammars. Your grammar source can be in a variety of formats (e.g., buffer, stream, and file), and your compiled binary can be saved to a variety of formats (e.g., buffer, stream, and file).

To compile your grammar and determine whether it contains any errors, all you need to do is pass the name of your grammar source file. You may optionally provide compiler-specific options to use when compiling your grammar and indicate whether the compilation process is synchronous or asynchronous.

You can instantiate syntax-independent grammar objects from which you can generate compiler-specific syntax. These objects support generic and syntax-specific definitions that enable you to tailor grammars to leverage features across recognizers.
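To make the idea of a syntax-independent grammar object concrete, the following minimal sketch defines a grammar once and generates two of the syntaxes named in this paper, JSGF and W3C SRGS XML. The Grammar and Rule classes are hypothetical and are not the ChantGrammar classes. The sketch assumes at least one rule and phrases made of plain words with no JSGF special characters.

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import escape


@dataclass
class Rule:
    name: str
    phrases: list


@dataclass
class Grammar:
    """Hypothetical syntax-independent grammar; not the ChantGrammar class."""
    name: str
    language: str = "en-US"
    rules: list = field(default_factory=list)

    def to_jsgf(self):
        # JSGF: a header, a grammar name, then one public rule per line.
        lines = ["#JSGF V1.0;", f"grammar {self.name};"]
        for rule in self.rules:
            alternatives = " | ".join(rule.phrases)
            lines.append(f"public <{rule.name}> = {alternatives};")
        return "\n".join(lines)

    def to_srgs_xml(self):
        # W3C SRGS XML: the first rule is used as the root rule.
        lines = [
            '<?xml version="1.0" encoding="UTF-8"?>',
            '<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"',
            f'         xml:lang="{self.language}" root="{self.rules[0].name}">',
        ]
        for rule in self.rules:
            lines.append(f'  <rule id="{rule.name}" scope="public">')
            lines.append("    <one-of>")
            for phrase in rule.phrases:
                lines.append(f"      <item>{escape(phrase)}</item>")
            lines.append("    </one-of>")
            lines.append("  </rule>")
        lines.append("</grammar>")
        return "\n".join(lines)


grammar = Grammar("commands", rules=[Rule("command", ["open file", "close file"])])
print(grammar.to_jsgf())
print(grammar.to_srgs_xml())
```

Keeping the rule definitions separate from the output syntax means the same grammar object can target whichever compiler is available on the deployed system.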
MORE INFORMATION

To learn more about developing software that speaks and listens, explore how easily you can manage grammars, profiles, lexicons, recognizers, synthesizers, and text-to-speech markup directly within the application software you develop in the following documents:

- Develop Software That Speaks and Listens
- Integrate Speech Technology for Hands-free Operation
- Tailor Pronunciations for Maximum Clarity
- Administer Speaker Profiles for Accurate Speech Recognition
- Fine-tune Speech Synthesis Using Text-to-Speech Markup