A Speech Recognition and Synthesis Tool
Hala ElAarag, Laura Schindler
Department of Mathematics and Computer Science, Stetson University, DeLand, Florida, U.S.A.
{helaarag,

ABSTRACT
Many of the new technologies designed to help worldwide communication (e.g., telephones, fax machines, computers) have created new problems, especially among the hearing and visually impaired. A person who has severe hearing impairments, particularly to the extent that deafness occurs, may experience difficulty communicating over a telephone, as he or she is unable to hear the recipient's responses. Conversely, someone with visual impairments would have little inconvenience using a telephone but may not be able to communicate through a computer because of the difficulty (or, in the case of blindness, impossibility) of reading the screen. The goal of this project is to develop a Speech Recognition and Synthesis Tool (SRST) that provides a solution to communication between the hearing and visually impaired. SRST is free and does not require any additional equipment besides a computer. Additionally, SRST can be used in educational settings, regardless of students' or teachers' disabilities, as a teaching aid.

Categories and Subject Descriptors
H.5.1 [Information Systems]: Information Interfaces and Presentation - Multimedia Information Systems

General Terms
Performance, Human Factors

Keywords
Speech synthesis, speech recognition, human-computer interaction, web-based education, tools for the visually and hearing impaired.

1. INTRODUCTION
For most people, communication with others is quite simple. There are many options available: telephones, mail, electronic mail, chat rooms, instant messaging, etc.
However, this becomes a more difficult task for those with disabilities. A deaf person does not have the luxury of being able to dial someone on the telephone and talk without additional equipment. Similarly, it is difficult for a blind person to communicate through mail or electronic means that require being able to see a screen. Direct communication between a deaf and a blind person is almost impossible without a mediator. There currently does not appear to be much software that directly addresses communication between the blind and the deaf. Much of the technology found is quite expensive, involves additional hardware, or uses Braille rather than speech. For example, Freedom Scientific offers a package solution that allows for communication face-to-face, over telephones, and over the Internet; however, it costs $6,705 [1]. The goal of this research is to incorporate current speech recognition (speech-to-text) and speech synthesis (text-to-speech) technology into a chat room, thus providing a solution to communication between the deaf and blind that is free and does not require any additional equipment besides a computer. Several programming languages could have been used in the creation of SRST. Ultimately, C++ was chosen because of its speed, its cross-platform capabilities, and the fact that well-tested packages pertaining to speech recognition and synthesis were found to be written in C++ and C.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ACM SE'06, March 10-12, 2006, Melbourne, Florida, USA. Copyright /06/0004 $5.00.
Due to time constraints, it would have been unwise to create a speech synthesis or recognition program from scratch. A search of the Internet produced several packages that had been written over the course of several years by groups of highly skilled individuals who specialize in speech recognition and synthesis. Two of the packages found, Festival [2] and Sphinx-3 [3], were incorporated into SRST. The Festival Speech Synthesis System, written in C++, offers a framework for building a speech synthesis system. Sphinx-3 provides the means for the speech recognition aspects. It is written in C, so minor adjustments needed to be made for it to work well with the otherwise C++ code. There are several aspects of this application that must be addressed in addition to the speech synthesis and recognition. First, SRST involves a client and server, with a graphical user interface (GUI) for the client. SDL_net was chosen to handle the networking, while Gtkmm was used to create the GUI. Secondly, additional planned features included multiple voices so that individuals do not all sound the same, thus allowing easier recognition (at a minimum, one male voice and one female voice); easy customization to allow phonetic pronunciations to be associated with new words or names; recognition of basic chat
lingo (e.g., emoticons, lol, rotfl); and the ability to load preexisting (or custom-made) voices. Lastly, conversations are also savable for future review or use. In addition to allowing communication between the blind and deaf, SRST has many other applications. For example, a teacher may wear a headset with a microphone while lecturing. The lecture could then be saved as a text file that any student could access during the lecture (if they are linked through a computer) or afterwards. Additionally, the speech synthesis aspect of SRST could be used as a tool for learning phonics. To ensure that this software is developed in such a manner that it is usable by those with disabilities, the researchers have opened communication with the director of Academic Resources at Stetson University, who has offered her assistance, as well as that of current students with impaired vision or hearing, as a resource for the testing and development of the software to ensure that the needs of the disabled are adequately met. The rest of the paper is organized as follows. Section 2 provides general knowledge of what speech synthesis and recognition entail. Some past work concerning applications of current speech synthesis and recognition and web-based education is described in Section 3. Section 4 describes the various packages and techniques used in the implementation of SRST. Finally, Section 5 presents the conclusion of the paper.

2. BACKGROUND

2.1 Speech Synthesis
As the term text-to-speech implies, speech synthesis involves two basic processes: the reading in of text and its production into sound. For simplicity, call these the front end and back end, respectively. First, the front end must read the text and transform any numbers or abbreviations into words. For example, lol would be changed to laugh out loud. A phonetic transcription is then ascribed to each word using text-to-phoneme (TTP) or grapheme-to-phoneme (GTP) processes.
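The abbreviation-expansion step just described can be sketched as a simple table lookup applied word by word. The expansion table below is illustrative only; it is not the dictionary actually shipped with SRST:

```cpp
#include <map>
#include <sstream>
#include <string>

// Front-end normalization sketch: expand chat lingo and abbreviations into
// plain words before any phonetic transcription takes place. Unknown words
// pass through unchanged.
std::string normalize(const std::string& text) {
    static const std::map<std::string, std::string> expansions = {
        {"lol", "laugh out loud"},
        {"rotfl", "rolling on the floor laughing"},
        {"brb", "be right back"},
    };
    std::istringstream in(text);
    std::ostringstream out;
    std::string word;
    bool first = true;
    while (in >> word) {
        if (!first) out << ' ';
        auto it = expansions.find(word);
        out << (it != expansions.end() ? it->second : word);
        first = false;
    }
    return out.str();
}
```

For instance, normalize("ok lol see you") yields "ok laugh out loud see you", which can then be handed to the phonetic transcription stage.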
How a text should be spoken, including the pitch, frequency, and length of the phonemes, is determined in this stage. This makes up the symbolic linguistic representation. The back end then takes that representation and attempts to convert it into actual sound output according to the rules created in the front end. Speech synthesis has little, if any, understanding of the actual text being read. Such software is typically not concerned with what a sentence or word actually means. Rather, it simply uses dictionaries or rules to guess how the text should be read. Text-to-phoneme conversion guesses the pronunciation using either a dictionary-based approach or a rule-based approach. In the dictionary-based approach, a large dictionary of words and their pronunciations is stored by the program and accessed at appropriate times. This method, however, consumes a great deal of space. The other option, the rule-based approach, uses preset rules of pronunciation to sound out how a word should be pronounced. Most speech synthesizers use a combination of both approaches. While the previously mentioned methods certainly help the computer, it is difficult to determine pronunciation without grasping the meaning. For example, how should the front end translate 1983? If it is used in the sentence There are 1983 students, then it would be pronounced one thousand nine hundred and eighty-three. However, if it appears in the sentence She was born in 1983, it is pronounced nineteen eighty-three. It is almost impossible for a computer to decipher such pronunciations, especially when both are used in the same sentence (e.g., In 1983, 1983 ducks swam the English Channel). Similar problems exist for words that have two pronunciations, such as read (pronounced rēd or rĕd). This has yet to be perfected, and errors are still common.
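The combined dictionary-plus-rules strategy described above can be illustrated with a toy lookup: try a stored pronunciation dictionary first, and fall back to letter-to-sound rules for unknown words. The dictionary entries, phoneme symbols, and rules here are hypothetical simplifications; real synthesizers use far richer data:

```cpp
#include <map>
#include <string>

// Hybrid text-to-phoneme sketch: dictionary lookup with a naive
// rule-based (letter-to-sound) fallback for out-of-dictionary words.
std::string toPhonemes(const std::string& word) {
    static const std::map<std::string, std::string> dictionary = {
        {"read",  "R IY D"},    // stores only one of the two pronunciations
        {"hello", "HH AH L OW"},
    };
    auto it = dictionary.find(word);
    if (it != dictionary.end()) return it->second;

    // Rule-based fallback: map each letter to a rough phoneme guess.
    static const std::map<char, std::string> letterRules = {
        {'a', "AE"}, {'b', "B"}, {'c', "K"}, {'d', "D"}, {'e', "EH"},
        {'g', "G"},  {'o', "OW"}, {'t', "T"},
    };
    std::string phones;
    for (char c : word) {
        auto r = letterRules.find(c);
        if (r != letterRules.end()) {
            if (!phones.empty()) phones += ' ';
            phones += r->second;
        }
    }
    return phones;
}
```

Note that the dictionary can only store pronunciations, not choose between them: "read" always comes back as one pronunciation, which is exactly the ambiguity problem discussed above.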
2.2 Speech Recognition
Speech recognition allows a computer to interpret sound input (through either a microphone or an audio file) to be transcribed or used to interact with the computer. A speech recognition application may be used by a large number of users without any training or may be specifically designed to be used by one user. In this speaker-dependent model, accuracy rates are typically at their highest, at approximately 98% (that is, getting two words in a hundred wrong) when operated under optimal conditions (i.e., a quiet room, a high-quality microphone, etc.). Generally, modern speech recognition systems are based on hidden Markov models (HMMs). An HMM is a statistical model that attempts to determine hidden components from the known parameters. For example, if a person stated that he wore a raincoat yesterday, then one would predict that it must have been raining. Using this technique, a speech recognition system may determine the probability of a sequence of acoustic data given one word (or word sequence). Then, the most likely word sequence may be determined using Bayes' rule:

Pr(word | acoustics) = Pr(acoustics | word) Pr(word) / Pr(acoustics)

According to this rule, for any given sequence of acoustic data (for example, an audio file or microphone input), Pr(acoustics) is a constant and, thus, ignorable. Pr(word) is the prior probability of the word according to a language model. [As an example, this should ensure that Pr(mushroom soup) > Pr(much rooms hope).] Pr(acoustics | word) is obtained using the aforementioned HMMs [6]. While recognition of words has risen to 80-90% accuracy (depending on the setting), grammar remains less focused upon and, thus, less accurate. In order to determine punctuation, it is necessary to differentiate between the stressed syllables or words in an utterance. For instance, when naturally speaking, it is easy to differentiate between Go. Go! and Go? However, most speech recognition systems solely report what word was uttered and do not note stress or intonation.
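The decoding rule above reduces to choosing the candidate that maximizes Pr(acoustics | word) * Pr(word), since the constant Pr(acoustics) drops out. A numerical sketch follows; the probability values used in the usage example are made up for illustration, not the output of a real acoustic or language model:

```cpp
#include <string>
#include <vector>

// One candidate word with its HMM likelihood and language-model prior.
struct Candidate {
    std::string word;
    double acousticLikelihood;  // Pr(acoustics | word), from the HMM
    double prior;               // Pr(word), from the language model
};

// Pick the word maximizing the Bayes numerator; Pr(acoustics) is ignored
// because it is the same for every candidate.
std::string mostLikelyWord(const std::vector<Candidate>& candidates) {
    std::string best;
    double bestScore = -1.0;
    for (const auto& c : candidates) {
        double score = c.acousticLikelihood * c.prior;
        if (score > bestScore) {
            bestScore = score;
            best = c.word;
        }
    }
    return best;
}
```

With illustrative values such as {"soup", 0.6, 0.05} versus {"hope", 0.7, 0.001}, the language-model prior outweighs the slightly better acoustic match, so "soup" wins, mirroring the mushroom-soup example above.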
As such, this information cannot be used by the recognizer. The most frequent solution is simply to require the user to announce when and where punctuation should occur.

3. RELATED WORK
It is necessary to differentiate between speech synthesis and speech recognition. While it is true that both involve a computer and its interpretation of human speech, the reversal of input and output between recognition and synthesis differs enough that
much work focuses on one rather than developing tools for both simultaneously (e.g., Festival, Sphinx, ModelTalker [7], Ventrilo [8]). This is exemplified by the fact that two different packages were used for the speech synthesis and recognition aspects of this project. However, a connection does exist between the two. Mari Ostendorf and Ivan Bulyko discuss how the two are intertwined and may influence each other's advancement now and in the future [10]. The emphasis in speech recognition has shifted over the past decade from a tool to study linguistic development to an emphasis on engineering techniques. Today, the concern focuses more on optimization: getting the most accurate results in the shortest amount of time. This shift has produced considerable advancement, although error rates are still fairly high. For instance, word error rates on broadcast news are 13%, while conversational speech and meeting speech produce error rates of 24% and 36%, respectively. These rates increase further under noisy conditions [11]. Various techniques are implemented to increase the accuracy of speech recognition, including mel-cepstral speech analysis, hidden Markov modeling (HMMs), clustering techniques, n-gram models of word sequences, a beam search to choose between candidate hypotheses, acoustic models that adapt to better match a single target speaker, and multi-pass search techniques to incorporate adaptation as well as models of increased complexity [9]. Speech synthesis has adopted the search algorithms found in speech recognition. Rather than relying on a sole instance of each unit in the database, speech synthesis now often incorporates multiple instances to allow for more choices and increase the quality of the concatenative synthesizer. Unit selection is implemented using the Viterbi search, which is, in a sense, a reversal of the decoding process in speech recognition.
Instead of finding the word or word sequence that most closely matches the audio input, unit selection search finds a sequence of acoustic units that optimally matches a given word or word sequence. Both techniques involve concatenating context-dependent phonetic subword units into words, although synthesis must also include proper lexical stress, pitch, and duration [10]. There are limits to how much speech recognition can influence speech synthesis or vice versa. Fundamentally, these are two separate and distinct problems. For instance, speech recognition must be established to work for various speakers, who all have voices of varying pitches and use different stresses. On the other hand, speech synthesis has only one steadfast source of input: text. It simply needs to produce one accurate acoustic output. Additionally, recognition and synthesis techniques depart on signal processing means. Speech recognition mostly ignores prosody and relies heavily on mel-cepstral processing. In speech synthesis, mel-cepstral processing has been shown to have low efficacy, while prosody (the patterns of stress and intonation in a language) is essential [10]. Despite the fact that a variety of groups are working to improve speech synthesis and recognition, there do not appear to be many free applications designed to help and encourage communication. Several applications, such as Ventrilo [8], advertise themselves as a more convenient means of communication for gamers. Other applications are used primarily for commercial purposes, such as telephone calls from doctors' offices, pharmacies, and other automated services. SRST is devoted simply to communication, whether between those who are visually and hearing impaired, as a transcriber, or as a teaching aid in learning phonics. In the past decade there has been an increasing amount of work dedicated to web-based education.
The benefits of such systems are clear: distance is no longer an issue, feedback is expedient, and assignments may be catered to specific classes or individuals. Applications such as the Blackboard system have been implemented at many universities to work in conjunction with classes. However, the focus has shifted from such static environments to more adaptive ones, which alter teachings or assign different homework according to the assessed level of the individual student. Additional web-based education systems include virtual environments, which provide a multimedia experience without the student leaving the computer, and whiteboards. Virtual environments can include field trips through museums or historical sites, or scientific experiments and dissections. Interactivity allows students to observe at their own speed and learn more about specific areas at the click of a mouse. Whiteboards provide a space for the user to type, draw, or present other data that can be viewed by anyone else connected to the board. In this manner, the student is not limited to simple text communication but can easily include his or her own drawings and images. Speech synthesis and recognition may also be used in web-based education to provide another means of communication and interactivity.

4. IMPLEMENTATION

4.1 GUI
As a primary concern of this research is to increase the ease with which the visually and hearing impaired can communicate, it was essential to create a simple and straightforward GUI. Gtkmm [4] is a C++ wrapper for GTK+, a popular GUI library that is often packaged with UNIX systems. It provides a basic GUI toolkit with widgets such as windows, buttons, and toolbars. While the primary development of SRST is taking place in MSVC++, an effort has been made to remain open to the possibility of cross-platform capabilities.
Gtkmm meets this need by supporting such systems as Linux (gcc), Solaris (gcc, Forte), Win32 (gcc, MSVC++ .NET 2003), Mac OS X (gcc), and more. SRST has a simple GUI, which displays widgets for the basic text input and networking capabilities. There are two basic GUI views, which exist according to whether or not the user is connected to a server. The first view serves primarily as a text editor. It consists of a window with one text area in which text can be entered either through the keyboard or using speech recognition. This text can then be saved to the computer. As long as the user is disconnected, text files may also be opened and displayed in the text area. Using the Options choice under the Connections menu, the user (client) is able to specify the server's IP address and port, and choose when to connect or disconnect. After the IP and port have been entered, the user then chooses connect. If the address is correct and the server is running, then the user will connect. The GUI reflects this occurrence. Rather than one window, three windows
will appear: one to type in, one showing the conversation, and another showing the names of the users who are currently online. Three buttons are also available, which allow the user to send the text, clear his or her text area, and close the application. The GUI contains a means to choose whether speech recognition, speech synthesis, or both are currently being used. An options window allows the user more control over which speech aspects are in use. Without such a feature, a hearing-impaired person might unnecessarily run the speech synthesis component and waste memory. Additionally, the user is allowed to choose a screen name and create a profile so that chat members are differentiated. There are several other GUI features that would be useful but not necessary: font attributes stored with each user's profile, a data file that stores all of the preferences, the ability to block users, private conversations, user-specific voices, and volume control, all of which are planned to be incorporated into SRST.

4.2 Networking
In addition to incorporating speech recognition and synthesis, SRST acts as a chat room to allow for communication over long distances. Thus, a basic client/server network was necessary. SDL_net [5] fit these needs precisely. SDL_net is a small, simple cross-platform networking library built on the much larger, C-written SDL (Simple DirectMedia Layer). The aim of SDL_net is to allow for easy cross-platform programming and to simplify the handling of network connections and data transfer. It accomplishes this through its simple and portable interface for the TCP and UDP protocols. With UDP sockets, SDL_net allows half-open connections: binding a socket to an address and thus avoiding filling every outgoing packet with the destination address. A connectionless method is also provided. The current implementation of SRST uses SDL_net to open a socket that waits for a user to connect.
Once the user connects, a test message is sent. If it is successfully received, then the client's GUI changes to reflect that it has successfully connected.

4.3 Speech Synthesis
The Festival Speech Synthesis System was chosen to accomplish the task of speech synthesis. Festival provides a general framework for multilingual speech synthesis systems, although it has the most features for English. It is usable through a multitude of APIs, including C++, Java, Emacs, and a Scheme command interpreter [2]. Three specific classes of users are targeted: speech synthesis researchers, speech application developers, and end users. As such, Festival is open source, although many alterations can be made through its functions rather than by altering its code. Festival was built on a UNIX system and has been most thoroughly tested on such platforms. However, there is a Windows port that has had minor testing on MSVC6 and Cygwin [12], which provides a UNIX-like environment for machines running Windows. As SRST was being built on MSVC7, several difficulties were encountered. First, while Festival does have a port for MSVC6, one still requires a UNIX-like environment to first compile and make the VCFMakefile; Cygwin provided the mediator from tar to MSVC projects. Secondly, Festival contains some deprecated code that had to be changed. Festival has been successfully run using either Cygwin or the Visual Studio Command Prompt. Overall, the basic commands are quite simple. For example, the command (SayText "Hello world.") successfully produces the spoken words Hello world. Festival also comes with the ability to directly read a text file with only a few commands. A variety of voices are available, which allows users to find the voice that sounds most natural to them. Furthermore, any produced speech may also be saved as a sound file for future use.
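A short Festival command session illustrates the voice selection, speaking, and save-to-file steps just described. The commands below follow the Festival Scheme interface as documented in its manual; available voices vary by installation (kal_diphone is a common default American English diphone voice), so this should be read as a sketch rather than a guaranteed session:

```
festival> (voice_kal_diphone)                ; select a voice
festival> (SayText "Hello world.")           ; speak text aloud
festival> (utt.save.wave                     ; synthesize and save to a file
            (utt.synth (Utterance Text "Hello world."))
            "hello.wav" 'riff)
```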
4.4 Speech Recognition
Although SRST was coded and primarily tested on Windows XP, an effort has been made to maintain easy cross-platform capabilities. Sphinx-3 continues this trend, as it works on GNU/Linux, UNIX variants, and Windows NT or later. Written in C, Sphinx-3 is a product of Carnegie Mellon University and is well known in the speech recognition community for its large vocabulary and relatively timely results. It includes both an acoustic trainer and various decoders (e.g., text recognition, phoneme recognition, N-best list generation) [13]. Sphinx-3 uses phonetic units to determine and build a word pronunciation. Two types of output may be produced: a recognition hypothesis and a word lattice. The recognition hypothesis consists of the best recognition result for each utterance processed. A word lattice provides a list of all possible word candidates that were recognized during the decoding of an utterance [13]. It is very useful for determining whether an utterance was clearly spoken, as, in theory, the utterance should appear somewhere in the word lattice even if it does not appear in the recognition hypothesis. In order to test that it has been successfully installed, and to provide a sample of how to incorporate it, Sphinx-3 comes with sphinx3-simple.bat, which allows the user to practice its speech recognition abilities using a limited vocabulary. A user who has never used a speech recognition program or has not yet used SphinxTrain can expect 30-40% successful speech results. One early test of the recognition process involved stating, Start recording. One two three four five six. This produced START THREE CODE R E Y TWO THREE FOUR I SIXTH as Sphinx-3's hypothesis as to what was spoken. For the most part, it is evident where these mistakes may have come from. SIXTH is very close to six.
In fact, the added -th is most likely a misinterpretation of the ending pause and possible background noise that was picked up. Sphinx-3 attempts to eliminate silences, coughs, and white noise, but it is not always perfectly accurate. Additionally, THREE CODE vaguely resembles record. R E Y, however, does not resemble any portion of recording or one, exemplifying the imperfection of the untrained recognition system. Through SphinxTrain, the accuracy rate drastically increased to over double its initial rate. Besides occasional misrecognized words, Sphinx-3 has several other limitations. First of all, without a segmenter, Sphinx-3 cannot be used on utterances longer than 300 seconds. Secondly, Sphinx-3 does not recognize capitalization. All spoken words are transformed into entirely capitalized hypotheses. Thus, a separate grammar correction portion of SRST is necessary to ensure correct capitalization. Punctuation is also omitted. However, it may be possible to adjust the filler penalty file, which differentiates stutters or silences from words, such that prolonged pauses signify periods.
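The grammar-correction pass described above can be sketched as a small post-processing function over the decoder output. This sketch assumes a hypothetical <sil> token marking a prolonged pause; it is an assumption for illustration, not Sphinx-3's actual output format:

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Post-processing sketch for an all-caps recognition hypothesis:
// lowercase the words, turn pause markers into periods, and capitalize
// the first word of each sentence.
std::string correct(const std::string& hypothesis) {
    std::istringstream in(hypothesis);
    std::string token, out;
    bool startOfSentence = true;
    while (in >> token) {
        if (token == "<sil>") {            // prolonged pause -> period
            if (!out.empty()) out += '.';
            startOfSentence = true;
            continue;
        }
        for (auto& c : token)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        if (startOfSentence)
            token[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(token[0])));
        if (!out.empty()) out += ' ';
        out += token;
        startOfSentence = false;
    }
    if (!out.empty() && out.back() != '.') out += '.';
    return out;
}
```

For example, correct("HELLO WORLD <sil> HOW ARE YOU") produces "Hello world. How are you." Proper nouns and other capitalization rules would, of course, need additional handling.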
Additionally, the dictionaries of both Sphinx-3 and Festival can be easily edited to include common chat lingo, emoticons, and text that, outside of a chat room setting, would be considered gibberish.

5. CONCLUSION
Vigorous searching of the Internet produced a variety of applications and packages that incorporate speech synthesis or recognition. However, few incorporate both. Furthermore, some had not been updated for several years or had limited testing or documentation, and thus were not particularly useful in studying the technologies and developing the application. The success of such applications (as measured by accuracy) is currently mediocre under normal conditions. Under optimal conditions (often requiring quiet isolation), significant increases in speech recognition accuracy were observed. Speech synthesis had high accuracy; however, many of the noncommercial voices still have a computerized, unnatural sound, with slight imperfections in pitch, intensity, or duration. SRST uses two popular speech recognition and synthesis packages, Sphinx-3 and Festival respectively, to ensure accurate and well-tested results. Basic networking allows for simple text chatting. Conversion using recognition or synthesis occurs at the user end and is toggled by the user, so that either feature may be turned off when not needed. While SRST has currently only been tested on Windows XP, all of the packages used have been tested on Linux systems and thus should be cross-platform with little, if any, change. Through SRST, a simple solution is provided to communication between the hearing and visually impaired that is free and does not require any additional equipment besides a computer. Additionally, SRST may also be used in educational settings, regardless of students' or teachers' disabilities, as a teaching aid. Thus, it may be employed for web-based education purposes.

6.
REFERENCES
[1] Freedom Scientific, "FSTTY Deaf-Blind Solution Pricing," asp
[2] Black, Alan, et al. The Festival Speech Synthesis System.
[3] The CMU Sphinx Group Open Source Speech Recognition Engines.
[4] Cumming, Murray, et al. Gtkmm: the C++ Interface to GTK+.
[5] Lantinga, Sam, Masahiro, Minami, and Wood, Roy. SDL_net Documentation Homepage.
[6] McTear, M. Spoken Dialogue Technology: Enabling the Conversational User Interface. ACM Computing Surveys, Vol. 34, Issue 1, March.
[7] ModelTalker Synthesizer, Speech Research Lab. [Online]. Available:
[8] Ventrilo Surround Sound Voice Communication Software.
[9] Brusilovsky, Peter. Adaptive and intelligent technologies for web-based education. Special Issue on Intelligent Systems and Teleteaching, Künstliche Intelligenz, 4.
[10] Bulyko, Ivan and Ostendorf, Mari. The Impact of Speech Recognition on Speech Synthesis. Proceedings of the 2002 IEEE Workshop on Speech Synthesis.
[11] Le, A., et al. The 2002 NIST RT Evaluation Speech-to-Text Results. Proc. RT02 Workshop. Available:
[12] Cygwin Information and Installation.
[13] Ravishankar, Mosur K. Sphinx-3 s3.x Decoder (X=5).
Graphical Environment Tool for Development versus Non Graphical Development Tool
Section 4 Computing, Communications Engineering and Signal Processing & Interactive Intelligent Systems Graphical Environment Tool for Development versus Non Graphical Development Tool Abstract S.Daniel
VoIP Conferencing Best Practices. Ultimate Guide for Hosting VoIP Conferences. A detailed guide on best practices for VoIP conferences:
VoIP Conferencing Best Practices Ultimate Guide for Hosting VoIP Conferences A detailed guide on best practices for VoIP conferences: 1. Setting Up Your Hardware 2. VoIP Conference Software and Its Settings
Industry Guidelines on Captioning Television Programs 1 Introduction
Industry Guidelines on Captioning Television Programs 1 Introduction These guidelines address the quality of closed captions on television programs by setting a benchmark for best practice. The guideline
Speech: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction
: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction Urmila Shrawankar Dept. of Information Technology Govt. Polytechnic, Nagpur Institute Sadar, Nagpur 440001 (INDIA)
IP Office Contact Center R9.0 Interactive Voice Response Voluntary Product Accessibility Template (VPAT)
IP Office Contact Center R9.0 Interactive Voice Response Voluntary Product Accessibility Template (VPAT) The IP Office Contact Center solution consists of a suite of software applications. The statements
Dragon speech recognition Nuance Dragon NaturallySpeaking 13 comparison by product. Feature matrix. Professional Premium Home.
matrix Recognition accuracy Recognition speed System configuration Turns your voice into text with up to 99% accuracy New - Up to a 15% improvement to out-of-the-box accuracy compared to Dragon version
Lecture 12: An Overview of Speech Recognition
Lecture : An Overview of peech Recognition. Introduction We can classify speech recognition tasks and systems along a set of dimensions that produce various tradeoffs in applicability and robustness. Isolated
Servicom G.R.I.P. Enabling Global Push-to-Talk over BGAN and Fleet Broadband Version 01 30.09.11
Servicom G.R.I.P. Enabling Global Push-to-Talk over BGAN and Fleet Broadband Version 01 30.09.11 Contents 1 Overview... 1 2 Background... 1 3 Key Features... 2 4 Typical Users... 2 5 Benefits to BGAN and
Utilizing Automatic Speech Recognition to Improve Deaf Accessibility on the Web
Utilizing Automatic Speech Recognition to Improve Deaf Accessibility on the Web Brent Shiver DePaul University [email protected] Abstract Internet technologies have expanded rapidly over the past two
AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS
AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS PIERRE LANCHANTIN, ANDREW C. MORRIS, XAVIER RODET, CHRISTOPHE VEAUX Very high quality text-to-speech synthesis can be achieved by unit selection
Introduction to Wireshark Network Analysis
Introduction to Wireshark Network Analysis Page 2 of 24 Table of Contents INTRODUCTION 4 Overview 4 CAPTURING LIVE DATA 5 Preface 6 Capture Interfaces 6 Capture Options 6 Performing the Capture 8 ANALYZING
Making Distance Learning Courses Accessible to Students with Disabilities
Making Distance Learning Courses Accessible to Students with Disabilities Adam Tanners University of Hawaii at Manoa Exceptionalities Doctoral Program Honolulu, HI, USA [email protected] Kavita Rao Pacific
An Arabic Text-To-Speech System Based on Artificial Neural Networks
Journal of Computer Science 5 (3): 207-213, 2009 ISSN 1549-3636 2009 Science Publications An Arabic Text-To-Speech System Based on Artificial Neural Networks Ghadeer Al-Said and Moussa Abdallah Department
Avaya DECT R4 Telephones Models 3720, 3725, 3740, 3745 and 3749
Avaya DECT R4 Telephones Models Voluntary Product Accessibility Template (VPAT) The DECT ( Digital Enhanced Cordless Technology ) standard originated in Europe as a replacement for earlier cordless telephone
Avaya Model 9608 H.323 Deskphone
Avaya Model 9608 H.323 Deskphone Voluntary Product Accessibility Template (VPAT) The statements in this document apply to Avaya Model 9608 Deskphones only when they are configured with Avaya one-x Deskphone
Voice Messaging. Reference Guide
Voice Messaging Reference Guide Table of Contents Voice Messaging 1 Getting Started 3 To Play a Message 4 To Answer a Message 5 To Make a Message 6 To Give a Message 7 Message Addressing Options 8 User
To ensure you successfully install Timico VoIP for Business you must follow the steps in sequence:
To ensure you successfully install Timico VoIP for Business you must follow the steps in sequence: Firewall Settings - you may need to check with your technical department Step 1 Install Hardware Step
Network operating systems typically are used to run computers that act as servers. They provide the capabilities required for network operation.
NETWORK OPERATING SYSTEM Introduction Network operating systems typically are used to run computers that act as servers. They provide the capabilities required for network operation. Network operating
Reading Assistant: Technology for Guided Oral Reading
A Scientific Learning Whitepaper 300 Frank H. Ogawa Plaza, Ste. 600 Oakland, CA 94612 888-358-0212 www.scilearn.com Reading Assistant: Technology for Guided Oral Reading Valerie Beattie, Ph.D. Director
Designing a Graphical User Interface
Designing a Graphical User Interface 1 Designing a Graphical User Interface James Hunter Michigan State University ECE 480 Design Team 6 5 April 2013 Summary The purpose of this application note is to
Pronunciation in English
The Electronic Journal for English as a Second Language Pronunciation in English March 2013 Volume 16, Number 4 Title Level Publisher Type of product Minimum Hardware Requirements Software Requirements
Welcome to The Grid 2
Welcome to 1 Thanks for choosing! These training cards will help you learn about, providing step-by-step instructions for the key skills you will need and introducing the included resources. What does
9RLFH$FWLYDWHG,QIRUPDWLRQ(QWU\7HFKQLFDO$VSHFWV
Université de Technologie de Compiègne UTC +(8',$6
EUROPEAN TECHNOLOGY. Optimize your computer classroom. Convert it into a real Language Laboratory
Optimize your computer classroom. Convert it into a real Language Laboratory WHAT IS OPTIMAS SCHOOL? INTERACTIVE LEARNING, COMMUNICATION AND CONTROL Everything combined in the same intuitive, easy to use
Intelligent Human Machine Interface Design for Advanced Product Life Cycle Management Systems
Intelligent Human Machine Interface Design for Advanced Product Life Cycle Management Systems Zeeshan Ahmed Vienna University of Technology Getreidemarkt 9/307, 1060 Vienna Austria Email: [email protected]
Speech Recognition Software Review
Contents 1 Abstract... 2 2 About Recognition Software... 3 3 How to Choose Recognition Software... 4 3.1 Standard Features of Recognition Software... 4 3.2 Definitions... 4 3.3 Models... 5 3.3.1 VoxForge...
How to Develop Accessible Linux Applications
Sharon Snider Copyright 2002 by IBM Corporation v1.1, 2002 05 03 Revision History Revision v1.1 2002 05 03 Revised by: sds Converted to DocBook XML and updated broken links. Revision v1.0 2002 01 28 Revised
Chapter 3. Basic Application Software. McGraw-Hill/Irwin. Copyright 2008 by The McGraw-Hill Companies, Inc. All rights reserved.
Chapter 3 Basic Application Software McGraw-Hill/Irwin Copyright 2008 by The McGraw-Hill Companies, Inc. All rights reserved. Competencies (Page 1 of 2) Discuss common features of most software applications
To help manage calls:
Mobile Phone Feature Definitions To help manage calls: Call waiting and call hold Allows you to accept a second incoming call with out losing the original call, then switch back and forth between them.
Audio and Web Conferencing
DATA SHEET MITEL Audio and Web Conferencing Simple, Cost-effective Audio and Web Conferencing Mitel Audio and Web Conferencing (AWC) is a simple, cost-effective and scalable audio and web conferencing
Tibetan For Windows - Software Development and Future Speculations. Marvin Moser, Tibetan for Windows & Lucent Technologies, USA
Tibetan For Windows - Software Development and Future Speculations Marvin Moser, Tibetan for Windows & Lucent Technologies, USA Introduction This paper presents the basic functions of the Tibetan for Windows
MICROSOFT WORD (2003) FEATURES
MICROSOFT WORD (2003) FEATURES There are many features built into Word 2003 that support learning and instruction. Several of these features are also supported in earlier versions of Word. This hands-on
Functions of NOS Overview of NOS Characteristics Differences Between PC and a NOS Multiuser, Multitasking, and Multiprocessor Systems NOS Server
Functions of NOS Overview of NOS Characteristics Differences Between PC and a NOS Multiuser, Multitasking, and Multiprocessor Systems NOS Server Hardware Windows Windows NT 4.0 Linux Server Software and
Summary Table Voluntary Product Accessibility Template
PLANTRONICS VPAT 7 Product: Call Center Hearing Aid Compatible (HAC) Polaris Headsets Over the Head Noise Canceling: P161N, P91N, P51N, P251N Over the Head Voice Tube: P161, P91, P51, P251 Over the Ear
Nuance PDF Converter Enterprise 8
8 Date: June 1 st 2012 Name of Product: 8 Contact for more Information: http://nuance.com/company/company-overview/contactus/index.htm or http://nuance.com/company/company-overview/companypolicies/accessibility/index.htm
Lone Star College System Disability Services Accommodations Definitions
Lone Star College System Disability Services Accommodations Definitions This document is intended to provide both students and employees with a better understanding of the most common reasonable accommodations
Summary Table Voluntary Product Accessibility Template. Criteria Supporting Features Remarks and explanations
Plantronics VPAT 1 Product: Call Center Hearing Aid Compatible (HAC) Headsets Operated with Amplifier Models M12, MX10, P10, or SQD: Over the Head Noise Canceling: H161N, H91N, H51N, H251N Over the Head
An Avatar Based Translation System from Arabic Speech to Arabic Sign Language for Deaf People
International Journal of Information Science and Education. ISSN 2231-1262 Volume 2, Number 1 (2012) pp. 13-20 Research India Publications http://www. ripublication.com An Avatar Based Translation System
Avaya Model 1408 Digital Deskphone
Avaya Model 1408 Digital Deskphone Voluntary Product Accessibility Template (VPAT) The statements in this document apply to Avaya Model 1408 Digital Deskphones only when they are used in conjunction with
Desktop Reference Guide
Desktop Reference Guide 1 Copyright 2005 2009 IPitomy Communications, LLC www.ipitomy.com IP550 Telephone Using Your Telephone Your new telephone is a state of the art IP Telephone instrument. It is manufactured
Voice Driven Animation System
Voice Driven Animation System Zhijin Wang Department of Computer Science University of British Columbia Abstract The goal of this term project is to develop a voice driven animation system that could take
Mac Built-in Accessibility (10.7 - Lion) - Quick Start Guide
Mac Built-in Accessibility (10.7 - Lion) - Quick Start Guide Overview The Mac operating system has many helpful features to help users with a wide range of abilities access their computer. This Quickstart
Outline. hardware components programming environments. installing Python executing Python code. decimal and binary notations running Sage
Outline 1 Computer Architecture hardware components programming environments 2 Getting Started with Python installing Python executing Python code 3 Number Systems decimal and binary notations running
Use of ICT in National Literacy Units: Reading
Use of ICT in National Literacy Units: Reading This edition: November 2014, version 1.1 Published by the Scottish Qualifications Authority The Optima Building, 58 Robertson Street, Glasgow G2 8DQ Lowden,
INTEGRATING THE COMMON CORE STANDARDS INTO INTERACTIVE, ONLINE EARLY LITERACY PROGRAMS
INTEGRATING THE COMMON CORE STANDARDS INTO INTERACTIVE, ONLINE EARLY LITERACY PROGRAMS By Dr. Kay MacPhee President/Founder Ooka Island, Inc. 1 Integrating the Common Core Standards into Interactive, Online
An Overview of Cisco IP Communicator
CHAPTER 1 An Overview of Cisco IP Communicator Cisco IP Communicator is a software-based application that allows users to place and receive phone calls using their personal computers. Cisco IP Communicator
Avaya Speech Analytics Desktop Client 2.0
Avaya Speech Analytics Desktop Client 2.0 Voluntary Product Accessibility Template (VPAT) Avaya Speech Analytics Desktop Client is a thick client desktop application for the Microsoft Windows operating
Dragon Solutions. Using A Digital Voice Recorder
Dragon Solutions Using A Digital Voice Recorder COMPLETE REPORTS ON THE GO USING A DIGITAL VOICE RECORDER Professionals across a wide range of industries spend their days in the field traveling from location
Online Recruitment - An Intelligent Approach
Online Recruitment - An Intelligent Approach Samah Rifai and Ramzi A. Haraty Department of Computer Science and Mathematics Lebanese American University Beirut, Lebanon Email: {samah.rifai, [email protected]}
SUMMARY TABLE VOLUNTARY PRODUCT ACCESSIBILITY TEMPLATE
Date: 1 May 2009 Name of Product: Polycom VVX1500 Telephone Company contact for more Information: Ian Jennings, [email protected] Note: This document describes normal operational functionality.
ABSTRACT 2. SYSTEM OVERVIEW 1. INTRODUCTION. 2.1 Speech Recognition
The CU Communicator: An Architecture for Dialogue Systems 1 Bryan Pellom, Wayne Ward, Sameer Pradhan Center for Spoken Language Research University of Colorado, Boulder Boulder, Colorado 80309-0594, USA
Blackboard Instant Messenger: Virtual Office Hours
Blackboard Instant Messenger: Virtual Office Hours Faculty can now conduct Virtual Office Hours through Blackboard Instant Messenger. To begin with, the Virtual Office Hours on BbIM are a convenient way
Adobe Connect Quick Guide
Leicester Learning Institute Adobe Connect Quick Guide Request an account If you want to publish materials to Adobe Connect or run online meetings or teaching sessions, contact the IT Service Desk on 0116
Objectives. Chapter 2: Operating-System Structures. Operating System Services (Cont.) Operating System Services. Operating System Services (Cont.
Objectives To describe the services an operating system provides to users, processes, and other systems To discuss the various ways of structuring an operating system Chapter 2: Operating-System Structures
Developing accessible portals and portlets with IBM WebSphere Portal
Developing accessible portals and portlets with IBM WebSphere Portal Level: Introductory IBM Human Ability and Accessibility Center Austin, Texas February, 2006 Copyright International Business Machines
COMPARATIVE ANALYSIS OF COMPUTER SOFTWARE AND BRAILLE LITERACY TO EDUCATE STUDENTS HAVING VISUAL IMPAIRMENT
COMPARATIVE ANALYSIS OF COMPUTER SOFTWARE AND BRAILLE LITERACY TO EDUCATE STUDENTS HAVING VISUAL IMPAIRMENT Ismat Bano PhD. Scholar, University of Education Lahore Syed Abir Hassan Naqvi Lecturer, IER,
Hosted Fax Mail. Hosted Fax Mail. User Guide
Hosted Fax Mail Hosted Fax Mail User Guide Contents 1 About this Guide... 2 2 Hosted Fax Mail... 3 3 Getting Started... 4 3.1 Logging On to the Web Portal... 4 4 Web Portal Mailbox... 6 4.1 Checking Messages
Well, now you have a Language Lab! www.easylab.es
Do you have a computer lab? Well, now you have a Language Lab! What exactly is easylab? easylab can transform any ordinary computer laboratory into a cuttingedge Language Laboratory. easylab is a system
VPAT 1 Product: Call Center Hearing Aid Compatible (HAC) Headsets Operated with Amplifier Models M12, MX10, P10, or SQD:
Plantronics VPAT 1 Product: Call Center Hearing Aid Compatible (HAC) Headsets Operated with Amplifier Models M12, MX10, P10, or SQD: Over the Head Noise Canceling: H161N, H91N, H51N Over the Head Voice
SPAMfighter Mail Gateway
SPAMfighter Mail Gateway User Manual Copyright (c) 2009 SPAMfighter ApS Revised 2009-05-19 1 Table of contents 1. Introduction...3 2. Basic idea...4 2.1 Detect-and-remove...4 2.2 Power-through-simplicity...4
FiliText: A Filipino Hands-Free Text Messaging Application
FiliText: A Filipino Hands-Free Text Messaging Application Jerrick Chua, Unisse Chua, Cesar de Padua, Janelle Isis Tan, Mr. Danny Cheng College of Computer Studies De La Salle University - Manila 1401
About. IP Centrex App for ios Tablet. User Guide
About IP Centrex App for ios Tablet User Guide December, 2015 1 2015 by Cox Communications. All rights reserved. No part of this document may be reproduced or transmitted in any form or by any means, electronic,
OPC UA vs OPC Classic
OPC UA vs OPC Classic By Paul Hunkar Security and Communication comparison In the world of automation security has become a major source of discussion and an important part of most systems. The OPC Foundation
Grand Valley State University Disability Support Services Guide to Assistive Technology
Grand Valley State University Disability Support Services Guide to Assistive Technology List of Assistive Technology Solutions FTP Access to Network Storage Prepared by Jeff Sykes, Assistive Technology
Adobe Connect Support Guidelines
THINK TANK Online Services Adobe Connect Support Guidelines Page 1 Contents Introduction... 4 What is Adobe Connect?... 4 Adobe Connect Usage Quick Guide... 4 Items Required for Accessing Think Tank Online
Enterprise Messaging, Basic Voice Mail, and Embedded Voice Mail Card
MITEL Enterprise Messaging, Basic Voice Mail, and Embedded Voice Mail Card User Guide Notice This guide is released by Mitel Networks Corporation and provides information necessary to use Mitel voice
Lync TM Phone User Guide Polycom CX600 IP Phone
The Polycom CX600 IP (Internet Protocol) phone is a full-featured unified communications desktop phone, optimized for use with Microsoft Lync environments. It features Polycom HD Voice technology for crystal-clear
DRAGON NATURALLYSPEAKING 12 FEATURE MATRIX COMPARISON BY PRODUCT EDITION
1 Recognition Accuracy Turns your voice into text with up to 99% accuracy NEW - Up to a 20% improvement to out-of-the-box accuracy compared to Dragon version 11 Recognition Speed Words appear on the screen
VPAT for Apple MacBook Pro (Late 2013)
VPAT for Apple MacBook Pro (Late 2013) The following Voluntary Product Accessibility information refers to the Apple MacBook Pro (Late 2013). For more information on the accessibility features of this
