FiliText: A Filipino Hands-Free Text Messaging Application
Jerrick Chua, Unisse Chua, Cesar de Padua, Janelle Isis Tan, Mr. Danny Cheng
College of Computer Studies, De La Salle University - Manila
1401 Taft Avenue, Manila (63)
[email protected], [email protected], [email protected], [email protected], [email protected]

ABSTRACT
This research aims to create a hands-free text messaging application capable of recognizing the Filipino language, allowing users to send text messages using speech. Building on this research, other developers may study Filipino speech recognition further and apply it to Filipino text messaging.

Keywords
Speech Recognition, Filipino Language, Text Messaging

1. INTRODUCTION
Texting while driving has been a problem in most countries since the late 2000s. The Philippine National Police (PNP) reported about 15,000 traffic accidents in 2006, averaging 41 accidents per day. It was concluded that most accidents are caused by driver error. Additionally, traffic accidents caused by cellphone use while driving showed the highest increase among the causes of traffic accidents [2]. According to Bonabente (2008), "The Automobile Association Philippines (AAP) has called for an absolute ban on use of mobile phones while driving, saying it was the 12th most common cause of traffic accidents in the country in 2006." AAP said that using cell phones while driving, even with hands-free sets, could impair the driver's attention and lead to accidents.

An existing software application that lets people command their phones by voice is Vlingo, an intelligent voice application capable of much more than just allowing users to text while driving [3]. It is also a multiplatform application, available for Apple, Android, Nokia, BlackBerry, and Windows Mobile devices. Another application, VoiceText, was developed by Clemson University.
It allows drivers to send text messages while keeping their eyes on the road. Drivers using VoiceText put their mobile phones in Bluetooth mode and connect them to their car; it is through the car's speaker system, or through a Bluetooth headset, that drivers are able to give a voice command and deliver a text message. StartTalking, from AdelaVoice, allows the user to initiate, compose, review, edit, and send a text message entirely by voice command; however, this application is only available for Android 2.0 and above [4]. There are other applications similar to the ones mentioned above, but they all have the same purpose: to help lessen car accidents caused by distracted driving.

2. SIGNIFICANCE OF RESEARCH
Western countries have started to develop hands-free texting applications that have helped reduce the number of car accidents caused by texting while driving, and some of these can understand Chinese. However, the Philippines, considered the text capital of the world, still has no such application to keep drivers from texting while driving, mainly because existing applications have no Filipino language capabilities. There are party-lists and organizations that support legislation in this area. The Buhay party-list filed a bill seeking to penalize persons caught using their mobile phones on the road. The cities of Manila, Makati, and Cebu have banned the practice on paper, but the ban has not been properly enforced [1]. Even where a law bans Filipinos from using their mobile phones while driving, it has not been strictly implemented, and there are only a few ways of knowing whether a person is really following it. The development of a local version of an existing hands-free text messaging application, Vlingo InCar, delivers an alternative for Filipinos.
This service aims to keep drivers' hands on the wheel and their concentration solely on their environment. The Philippines, known as the texting capital of the world, may see fewer traffic accidents in the future when drivers can use mobile phones hands-free on the road rather than having to glance down and read a text message from one of their contacts. Hands-free text messaging is not only helpful in keeping drivers from using their hands to text while driving; it may also be used by physically disabled individuals with normal speech, by those who are used to multitasking, or in cases where a person needs both hands to perform an activity. One application would be for busy businessmen and women who need to do many things in a short amount of time because of their workload. Such an application would be helpful in their daily routine because they would no longer need their hands to send an urgent message to colleagues while attending to other urgent matters. Alongside this useful application, this research shines a light on speech recognition for the Filipino language, a topic lacking research and in-depth analysis. Deeper studies here could serve as a stepping stone for future work involving Filipino speech recognition.

Proceedings of the 8th National Natural Language Processing Research Symposium, De La Salle University, Manila, November 2011
3. RELATED LITERATURE
3.1 Filipino Text Messaging Language
Manila's lingua franca was used as the base for the Philippines' national language, Filipino, which is commonly used in the urban areas of the country and is spreading fast across it [6]. Tagalog is the structural base of the Filipino language and was commonly spoken in Manila and the provinces of Rizal, Cavite, Laguna, and others. According to a study by Shane Snow, the Philippines is still considered the text capital of the world. With the Short Message Service (SMS) constraint of 160 characters per message, people learned to shorten what they wanted to say, a practice now referred to as text speak. One simple way of shortening a message is to take out all the vowels; however, this does not work for some words because it creates ambiguity between words with similar consonant sequences. Phonetics, or how a word sounds, also plays a role in how Filipino texters shorten messages [7]. For example, "Dito na kami" becomes "D2 n kmi" and "Kumain na ako" becomes "Kumain n ko".

3.2 Speech-to-Text Libraries
Speech-to-text systems are already available as desktop applications, and some of them provide APIs and/or libraries for those who want to build new applications on top of them. Among these systems are CMUSphinx, Android Speech Input, the Java Speech API, and SpinVox Create. Of the available APIs and libraries, CMUSphinx is the most appropriate. CMUSphinx provides the CMUSphinx Toolkit, which comes with various tools for building speech applications, including a recognizer library, a support library, language model and acoustic model training tools, and a decoder for speech recognition research. It also has a library for mobile support called PocketSphinx.
CMUSphinx can also generate its own pronunciation dictionary with the help of an existing dictionary as a basis, but the pronunciation generation code only supports English and Mandarin.

3.3 CMU Sphinx-4
Sphinx-4 is a Java-based, open-source automatic speech recognition system [8]. Sphinx-4 is made up of three core modules: the FrontEnd, the Linguist, and the Decoder. The Decoder is the central module; it takes in the output of the FrontEnd and Linguist modules, generates its results from them, and passes those results to the calling application. The Decoder has a single submodule, the SearchManager, which it uses to recognize a set of frames of features. The SearchManager is not limited to any single search algorithm, and its capabilities are further extended by the design of the FrontEnd module. The FrontEnd module is responsible for digital signal processing: it takes in one or more input signals, parameterizes them into features, and passes these features to the Decoder. Finally, the Linguist module is responsible for generating the SearchGraph; the Decoder compares the features from the FrontEnd against this SearchGraph to generate its results. The Linguist is made up of three submodules: the AcousticModel, the Dictionary, and the LanguageModel. The AcousticModel handles the mapping between units of speech and their respective hidden Markov models. The LanguageModel provides implementations that represent the word-level language structure. The Dictionary dictates how words in the LanguageModel are pronounced. CMU Sphinx can also use a language model built by a user. With such a language model, users can create a grammar suitable for their own language and, with the help of an acoustic model, fully utilize a language model patterned after their native language.

4. SYSTEM DESIGN
4.1 Overview
FiliText is an application for desktop computers designed especially for Filipinos. It serves as a stepping stone for future developers to create a hands-free texting application for mobile phones. FiliText accepts audio files, specifically in the Waveform Audio File Format (WAV), as input and processes them through a speech recognition API to convert the message into text. The conversion is performed after the user acknowledges that the voice input is done. The application produces two outputs. The first is the converted message with proper and complete spelling in Filipino. As an option, the user may choose to compress the text output to fit an SMS, since most cell phone carriers allow only up to 160 characters per message.

4.2 Architecture
The system begins by gathering input, a spoken message, through the input module. The input module passes the unprocessed spoken message to the Sphinx-4 module, configured for recognizing Filipino. Sphinx-4 then passes the now text-based message to the message shrinking module, which applies common methods of reducing word length. Finally, the shrunken, text-based message is passed to the output module, which displays it to the user.

Figure 2. Architectural Design of FiliText

The system relies on Sphinx-4 to convert the spoken message into its respective text format. Because Sphinx-4 is highly configurable, a speech recognition module would not need
to be coded from scratch. Instead, the Sphinx-4 module will be trained and configured to recognize informal Filipino. The input first passes through the FrontEnd module, which handles the cleaning and normalizing of the input. Little effort will be placed into configuring and optimizing the FrontEnd, as it deals with digital signal processing. The Linguist module creates the SearchGraph, which the Decoder module compares the input against in order to generate its results. The Linguist will be the most heavily configured of the three modules, as it contains the hidden Markov models, the list of possible words and their respective pronunciations, and the acoustic models of phonemes. Sphinx-4 does not ship with any of the files necessary to understand Filipino, so the dictionary, the acoustic models, and the language models will be created by the proponents using the tools provided by the CMU Sphinx group.

The language model will be created using a compilation of Filipino text messages, newscast transcripts, Facebook posts, and Twitter feeds. These will be placed into a text file in the format <s> text </s>, one sentence per line. A vocabulary file, a text file listing all Filipino words used, will also be created and used to generate the language model. The vocabulary file will not include names and numbers. These two files will be used by the CMU-Cambridge Language Modeling Toolkit (CMUCLMTK) to create an n-gram language model. Aside from its use in the implementation, the created language model is also necessary for creating the acoustic model.

The AcousticModel submodule of the Linguist will be trained using SphinxTrain, a tool also created by the CMU Sphinx group for generating the acoustic models a system will use. To train the acoustic model, recordings from different speakers will be compiled and each audio file transcribed. Each recording will be a WAV file sampled at 16 kHz, 16-bit mono, and segmented to between 5 and 30 seconds long [5].
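The two language-model input files described above are plain text. A minimal sketch of how they could be assembled is shown below; the class and method names are illustrative only (they are not part of CMUCLMTK), and the actual toolkit invocation and file I/O are omitted.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Sketch of preparing CMUCLMTK input: each training sentence is wrapped
// in <s> ... </s> sentence markers, and a vocabulary file lists every
// word used. Filtering out names and numbers is omitted for brevity.
public class LmPrep {

    // Wrap one sentence in the sentence-boundary markers.
    static String corpusLine(String sentence) {
        return "<s> " + sentence.trim().toLowerCase(Locale.ROOT) + " </s>";
    }

    // Collect the unique words of all sentences, in first-seen order.
    static Set<String> vocabulary(String[] sentences) {
        Set<String> vocab = new LinkedHashSet<>();
        for (String s : sentences) {
            for (String w : s.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
                vocab.add(w);
            }
        }
        return vocab;
    }

    public static void main(String[] args) {
        String[] sentences = { "Kumain na ako", "Dito na kami" };
        for (String s : sentences) {
            System.out.println(corpusLine(s));  // one corpus line per sentence
        }
        System.out.println(vocabulary(sentences));
    }
}
```

In a real run, the corpus lines and the vocabulary would each be written to their own file and handed to CMUCLMTK's tools to produce the n-gram model.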
The set of speakers will include males and females 16 years of age or older. The Decoder module compares the processed input against the SearchGraph produced by the Linguist to produce its results. The Decoder's SearchManager submodule will be configured to use the SimpleBreadthFirstSearchManager, an implementation of the frame-synchronous Viterbi algorithm. The message shrinker module will use word alignment, a statistical machine translation technique, to shorten the output of the Sphinx-4 module while keeping it understandable. The output of this module will be a text of at most 160 characters, unless there is no possible way to make the text shorter than 160 characters. This shortened message is then sent to the output module, which displays it to the user.

4.3 Customization of Sphinx-4
Since the Sphinx-4 documentation specifies the steps for creating the language model and acoustic model of a new language, it was reasonably easy to create a prototype. The challenge in customizing Sphinx-4 for a totally different language is gathering all the recordings and having them trained with SphinxTrain. The initial task was to gather audio recordings of different speakers that together cover all the phonemes of Filipino. Each audio file must be 16-bit mono at 16 kHz and must not be shorter than 5 seconds, to aid the accuracy of the acoustic model training. After all the speech recordings are gathered, they are placed in a folder and run through CMUCLMTK so that it can create the dictionary of used words and list the different phonemes it was able to detect. After the recordings have been run through the language modeling toolkit, they are ready for training under SphinxTrain, which creates the acoustic model the application uses to understand the words uttered by the end user.
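The paper's shrinker uses statistical word alignment; as a much simpler illustration of the idea, the vowel-dropping heuristic from Section 3.1 can be sketched with a regular expression. The specific rule here, keeping the first letter of words of four or more characters and dropping their remaining vowels, is an assumption for illustration, not the paper's algorithm.

```java
public class Shrinker {

    // Naive text-speak heuristic: for words of four or more letters,
    // keep the first letter and drop the remaining vowels; shorter words
    // are left as-is. Illustration only; the paper's shrinker uses
    // word alignment rather than this rule.
    static String shorten(String message) {
        StringBuilder out = new StringBuilder();
        for (String word : message.trim().split("\\s+")) {
            if (out.length() > 0) out.append(' ');
            if (word.length() >= 4) {
                out.append(word.charAt(0))
                   .append(word.substring(1).replaceAll("[aeiouAEIOU]", ""));
            } else {
                out.append(word);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(shorten("Maganda ba yun palabas"));  // Mgnd ba yun plbs
    }
}
```

A production shrinker would also have to check the 160-character limit and fall back to the full text when no safe shortening exists, as the architecture above requires.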
4.4 Data Collection
The corpus the proponents will be using is the Filipino Speech Corpus (FSC) created by Guevarra, Co, Espina, Garcia, Tan, Ensomo, and Sagum of the University of the Philippines - Diliman Digital Signal Processing Laboratory. The corpus contains audio files and matching transcription files, which will be used to train the acoustic model and the language model for the Filipino language; these trained models are then used for recognition. For the language model, contemporary sentences were also gathered from social networking sites such as Facebook and Twitter.

4.5 Training the Acoustic and Language Models
To create a new acoustic model for Sphinx-4 to use, the CMU Sphinx project provides tools that aid in creating these models for recognizing speech. The required files are placed in a single folder, which is considered the speech database. Sound recordings are placed into the directory wav/speaker_n. The dictionary, the mapping of words to their respective pronunciations, used by SphinxTrain will be the same as the one used in the Sphinx-4 implementation. A single text file houses the transcriptions of each recording; each entry must follow the format <s> transcription of file n </s> (file_n). A file listing the filenames of all the sound recordings must also be present, as it maps the files to the transcriptions. The ARPA language model will also be used by SphinxTrain to generate the acoustic models. A phoneset file, listing each phoneme tag, is needed as well. The filler dictionary will be created to include only silences. All of these are used by SphinxTrain to generate the acoustic model, and all mentioned files other than the audio recordings are placed inside a folder labeled etc. Because the FSC recordings are 25 to 35 minutes long each, the proponents had to segment each file into the specified 5-to-30-second lengths.
They were able to automate the process using SoX, an open-source audio manipulation tool, which segmented the sound files according to the existing transcriptions that came with the speech corpus. After segmenting the sound files, the filenames were written to a fileids file and the transcriptions of each sound file were compiled into a single file, ready for training.
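The fileids and transcription files described above are simple line-oriented formats, one entry per recording. A minimal sketch follows; the speaker/utterance naming scheme (speaker_n/utt_nnn) is an assumption for illustration, and SphinxTrain's exact conventions should be checked against its documentation.

```java
// Sketch of the two bookkeeping files SphinxTrain expects: a transcription
// file pairing each utterance with its file id, and a fileids file listing
// the recordings relative to the wav/ directory, without extensions.
public class TrainFiles {

    // One line of the transcription file: "<s> text </s> (file_id)".
    static String transcriptLine(String transcription, String fileId) {
        return "<s> " + transcription + " </s> (" + fileId + ")";
    }

    // One line of the fileids file, e.g. "speaker_1/utt_001".
    static String fileIdLine(int speaker, int utterance) {
        return String.format("speaker_%d/utt_%03d", speaker, utterance);
    }

    public static void main(String[] args) {
        String id = fileIdLine(1, 1);
        System.out.println(transcriptLine("nandito ako", id));
        System.out.println(id);
    }
}
```

The order of lines must match between the two files, since SphinxTrain pairs the nth transcription with the nth file id.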
The language model needed for the etc folder was created using the transcription file and CMUCLMTK. The phonetic dictionary was also created with the aid of the language modeling toolkit: in the process of creating the language model itself, the toolkit first creates a dictionary file containing all the words in the transcription file. The phonetic dictionary the proponents created uses the letters of each Filipino word as the phones for that word. According to the SphinxTrain acoustic model training documentation, this approach is used when no phoneme book is available, and it gives very good results [5].

Figure 3. Phonetic Dictionary Sample

The folder structure must be followed because the training process is controlled by Perl scripts that set up the rest of the training binaries and configuration files. Before starting the training, the training configuration file (sphinx_train.cfg) must be edited according to the size of the speech database to be trained. The model parameters that must be taken into consideration before training are the number of tied states (senones) and the density.

Figure 4. Approximation of Senones and Number of Densities

Training internals include, but are not limited to, computing the features of the audio files, training the context-independent models and the context-dependent untied models, building decision trees, pruning the decision trees, and finally training the context-dependent tied models.

4.6 Mobile Application
A desktop application is very different from a mobile application because of the limitations of mobile devices in storage capacity, processing speed, and more. When moving the application to a mobile device, it would be better if the whole application were smaller while still performing similarly to the desktop version. This is a challenge, since the application needs the acoustic model, the language model, and the other linguistic models required to recognize the spoken text. Sphinx-4 has a mobile counterpart called PocketSphinx, which is usually used for mobile applications that require speech recognition; it has previously been used to develop applications for the Apple iPhone [5].

5. TEST RESULTS
The proponents trained three sets of acoustic and language models using 20 speakers, 40 speakers, and 60 speakers from the FSC. The trainings were split to see whether accuracy would improve as the training data increased. The proponents conducted two types of testing: controlled and uncontrolled. Controlled testing used existing recordings from the speech corpus, while uncontrolled testing was done with random people who were not in the corpus. To determine the accuracy of the system, the generated result text is compared to the correct transcription of the recording. The following formula is used to obtain the accuracy rate of the system:

Accuracy Rate = (matching_words / total_words_in_transcription) x 100

Controlled testing was done for all three trained sets, and the accuracy for each speaker per set is shown in Figure 5.

Figure 5. Accuracy Comparison for 3 Sets

The accuracy for the 40-speaker set dropped because it lacked training. For the 60-speaker set, the variables were adjusted to fit the size of the training data, which in turn gave better results than the 20-speaker set. Figure 6 shows the average accuracy rate for each set in the controlled testing; the accuracy of the 60-speaker set clearly increased compared to the 20-speaker set. The mean accuracy rates in controlled testing were 45% for the 20-speaker set, 43.25% for the 40-speaker set, and 58.32% for the 60-speaker set.

Figure 6. Average Accuracy Rate for Controlled Test
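The accuracy-rate formula above translates directly into code. The position-wise word matching used here is one possible interpretation, labeled as an assumption, since the paper does not specify how matching words are counted.

```java
public class Accuracy {

    // Accuracy rate used in the paper: matching words over total words
    // in the reference transcription, as a percentage.
    static double accuracyRate(int matchingWords, int totalWordsInTranscription) {
        return 100.0 * matchingWords / totalWordsInTranscription;
    }

    // Count words that match at the same position in reference and
    // hypothesis. Assumption: the paper may align words differently.
    static int matchingWords(String[] reference, String[] hypothesis) {
        int n = Math.min(reference.length, hypothesis.length);
        int matches = 0;
        for (int i = 0; i < n; i++) {
            if (reference[i].equalsIgnoreCase(hypothesis[i])) matches++;
        }
        return matches;
    }

    public static void main(String[] args) {
        // e.g. 3 of 4 reference words recognized correctly -> 75.0
        System.out.println(accuracyRate(3, 4));
    }
}
```

Note that under this position-wise reading, a joined word such as "nanaman" for "na naman" costs every following word its match, which matters for outputs like those in Table 1.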
In Table 1, actual results produced by the system are shown:

Table 1. Sample Output
Expected Output              Actual Output
Nandito ako                  Nandito ako
Maganda ba yun palabas       Maganda bayong palabas
Tumatawag na naman siya      Tumatawag nanaman siya
Pauwi ka na ba               Pauwi kalaban

For better results, speakers are advised to speak in a clear, loud voice and to avoid mispronouncing words. The speaker should also speak at a slower pace to keep each word distinct and avoid two different words being joined together. The accuracy of the system also drops when too much background noise is present.

For the uncontrolled testing, the proponents designed the language model to consist of the transcriptions from UP Diliman and contemporary sentences gathered from different social networking sites. The proponents gathered 20 speakers, 10 male and 10 female, to test the system with sentences that are commonly used in daily conversations. Using the trained data with the 120-speaker set and the new language model, the system attained an average accuracy of 69.67% and an error rate of 30.33%.

Figure 7. Accuracy and Error Rate for Uncontrolled Test - Male
Figure 8. Accuracy and Error Rate for Uncontrolled Test - Female

5.1 Creating the Mobile Application
Attempting to port the existing desktop speech recognition application to an Android device proved to be a challenge. Since PocketSphinx, the mobile version of Sphinx-4, is not yet well documented for Android, the proponents had a hard time installing the required software and creating the application for the mobile device. Some demo-level sample applications were available online, but they were tricky to install on the device. Another challenge was the limitations of the phones the proponents had: the downloaded and modified demo application was too heavy for the HTC Hero, so that when it was opened, it would close itself without any warning. Another mobile phone available for testing was a Samsung Galaxy Ace; however, the proponents have yet to test on that device.

5.2 Improving Performance
As mentioned above, the accuracy for sentences not found in the language model was very low. The proponents are continuing research on how to improve the performance of the system. There are two approaches the proponents will take: building a new language model with sentences consisting of everyday conversational Filipino, and training the acoustic model to be phoneme dependent.

The new language model will be built with the help of the Department of Filipino of De La Salle University. The department will advise the researchers on which sentences are considered conversational Filipino, and these sentences will be included in the language model. An additional source of sentences for the language model would be a collection of existing text messages sent in Filipino. After this language model is completed, the system will be retrained on it and tested for improvement.

Training the acoustic model to be phoneme dependent would let the system use letter-to-sound rules to guess the pronunciation of unknown words, that is, words not found in the dictionary or the transcription files and therefore never trained. The letter-to-sound rules attempt to guess the pronunciation of unknown words based on the existing phonemes and words in the dictionary. Again, testing will be conducted on the different sets of trained models to see whether improvement occurs with increments to the training data, and the results will be compared with the existing test results to see whether unknown words are really recognized.
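The letters-as-phones dictionary from Section 4.5 makes one letter-to-sound rule trivial: any word, known or unknown, can be "pronounced" by spelling it out. A minimal sketch of that idea follows; the treatment of digraphs such as "ng" as separate letters is a simplifying assumption, and the real dictionary format should follow SphinxTrain's conventions.

```java
import java.util.Locale;

public class LetterPhones {

    // Under the letters-as-phones scheme, the pronunciation of a word is
    // simply its letters, separated by spaces. Non-letter characters are
    // skipped; digraphs like "ng" are treated as two phones for brevity.
    static String pronunciation(String word) {
        StringBuilder phones = new StringBuilder();
        for (char c : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (Character.isLetter(c)) {
                if (phones.length() > 0) phones.append(' ');
                phones.append(c);
            }
        }
        return phones.toString();
    }

    // One dictionary entry in the usual "WORD PH1 PH2 ..." layout.
    static String dictionaryEntry(String word) {
        return word.toUpperCase(Locale.ROOT) + " " + pronunciation(word);
    }

    public static void main(String[] args) {
        System.out.println(dictionaryEntry("kumain"));  // KUMAIN K U M A I N
    }
}
```

Because the rule needs no phoneme book, an unknown word encountered at recognition time can be assigned a pronunciation on the fly, which is exactly what the phoneme-dependent training above aims to exploit.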
6. CHALLENGES ENCOUNTERED
Sphinx-4 is a very flexible system capable of performing many types of recognition tasks, and a lot of documentation is publicly available. However, since the tool was not made specifically for the Filipino language, many modifications had to be made. The demonstration programs provided with Sphinx-4 had low accuracy, due to the noise and echo present during testing. This was remedied by switching off noise reduction and acoustic echo cancellation in the microphone's settings.

The Sphinx documentation also specifies that recorded WAV files should be 16-bit, 16 kHz mono for training. However, the first set of recordings did not follow these specifications, and changing the sample rate of a given sound file only resulted in it being slowed down. The issue was resolved by changing the sample rate of the recording project itself instead of converting the files.

Although Sphinx-4 has been built and tested on the Solaris Operating Environment, Mac OS X, Linux, and Win32 operating systems, CMUCLMTK requires UNIX. This was remedied by using Cygwin, as recommended by the Sphinx team. The tool also requires line breaks to be in UNIX format; this was resolved by converting to UNIX line breaks using Notepad++.

The recorded messages used in training the system had little background noise. When the application is used in a normal environment with more noise and echo, the system's accuracy could drop. Creating a hands-free texting application also involves the users' styles of texting and speaking. The type of keypad on a user's phone, whether QWERTY or T9, affects how they type their text messages, and there is also a difference between how a person composes a text and how he or she speaks in conversation. A further issue in the results is the lack of relevance of the trained language model to the messages sent in real text messaging.
Because of the low accuracy rate in the uncontrolled tests, the proponents believe that the language model is the main contributor to the drop in accuracy. The language model was patterned on the speech uttered by the speakers in the FSC, which includes stories and word lists. These are not sentences often used in everyday texting, which is why the sentences uttered by the speakers in the uncontrolled test were barely recognized by the system.

7. CONCLUSION
CMU Sphinx-4 is an effective tool for developing a desktop application that recognizes speech in the Filipino language and produces its text equivalent. The system is also able to apply simple rule-based text shortening using regular expressions to give users the text-speak equivalent of the output. There are several recommendations future developers may follow to improve the system. First, increase the data in the language model; this data may include English words, since most Filipinos do not text in plain Tagalog but mix English and Tagalog. Developers may also allow the user to place punctuation marks in sentences for better readability of the result. Other commands, such as starting and ending the recording for speech recognition, may also be added as feature enhancements. Lastly, it is recommended that the application be ported to mobile devices running different operating systems such as Android and iOS.

8. ACKNOWLEDGMENTS
The researchers would like to thank the following: (1) Mr. Danny Cheng, for advising and guiding the group throughout the research; (2) the panelists, Mr. Allan Borra and Mr. Clement Ong, for their remarks and suggestions to further improve the research; and (3) the EEE department of the University of the Philippines Diliman, for allowing us to use their Filipino Speech Corpus (FSC).

9. REFERENCES
[1] Bonabente, C. L. (2008).
Total ban sought on cell phone use while driving. Retrieved from
[2] CarAccidents. (2010). Philippines Crash Accidents, Driving, Car, Manila Auto Crashes Pictures, Statistics, Info. Retrieved from
[3] Vlingo Incorporated. (2010). Voice to text applications powered by intelligent voice recognition. Retrieved from
[4] AdelaVoice Corporation. (2010). StartTalking. Retrieved from
[5] CMU Sphinx. (2010). CMU Sphinx Speech Recognition Toolkit. Retrieved from
[6] Gonzalez, A. (1998). The Language Planning Situation in the Philippines. Journal of Multilingual and Multicultural Development, 55, 5-6.
[7] BBC h2g2. (2002). Writing Text Messages. Retrieved from
[8] CMU Sphinx Group. (2011). CMU Sphinx. Retrieved from
[9] Cu, J., Ilao, J., Ong, E. (2010). O-COCOSDA 2010 Philippine Country Report. Retrieved from COCOSDA2010/o-cocosda2010-abstract.pdf
Gladinet Cloud Backup V3.0 User Guide
Gladinet Cloud Backup V3.0 User Guide Foreword The Gladinet User Guide gives step-by-step instructions for end users. Revision History Gladinet User Guide Date Description Version 8/20/2010 Draft Gladinet
Text-To-Speech Technologies for Mobile Telephony Services
Text-To-Speech Technologies for Mobile Telephony Services Paulseph-John Farrugia Department of Computer Science and AI, University of Malta Abstract. Text-To-Speech (TTS) systems aim to transform arbitrary
An Arabic Text-To-Speech System Based on Artificial Neural Networks
Journal of Computer Science 5 (3): 207-213, 2009 ISSN 1549-3636 2009 Science Publications An Arabic Text-To-Speech System Based on Artificial Neural Networks Ghadeer Al-Said and Moussa Abdallah Department
INTERNATIONAL JOURNAL OF ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY An International online open access peer reviewed journal
INTERNATIONAL JOURNAL OF ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY An International online open access peer reviewed journal Research Article ISSN 2277 9140 ABSTRACT An e-college Time table Retrieval
Estonian Large Vocabulary Speech Recognition System for Radiology
Estonian Large Vocabulary Speech Recognition System for Radiology Tanel Alumäe, Einar Meister Institute of Cybernetics Tallinn University of Technology, Estonia October 8, 2010 Alumäe, Meister (TUT, Estonia)
FAQ OLYMPUS DICTATION SMARTPHONE APPLICATION
SMOLTZ Distributing, Inc. North American Distributor of Choice 877.476.6589 www.smoltz.com FAQ OLYMPUS DICTATION SMARTPHONE APPLICATION I have a client that would like to purchase the Olympus Smartphone
Robust Methods for Automatic Transcription and Alignment of Speech Signals
Robust Methods for Automatic Transcription and Alignment of Speech Signals Leif Grönqvist ([email protected]) Course in Speech Recognition January 2. 2004 Contents Contents 1 1 Introduction 2 2 Background
C E D A T 8 5. Innovating services and technologies for speech content management
C E D A T 8 5 Innovating services and technologies for speech content management Company profile 25 years experience in the market of transcription/reporting services; Cedat 85 Group: Cedat 85 srl Subtitle
Blue&Me. Live life while you drive. What you can do: Introduction. What it consists of:
Blue&Me Live life while you drive Introduction Blue&Me is an innovative in-car system that allows you to use your Bluetooth mobile phone and to listen to your music while you drive. Blue&Me can be controlled
Automated Speech to Text Transcription Evaluation
Automated Speech to Text Transcription Evaluation Ryan H Email: [email protected] Haikal Saliba Email: [email protected] Patrick C Email: [email protected] Bassem Tossoun Email: [email protected]
DEPARTMENT OF PUBLIC WORKS AND HIGHWAYS TROUBLE TICKETING SYSTEM
DEPARTMENT OF PUBLIC WORKS AND HIGHWAYS TROUBLE TICKETING SYSTEM Rocelyn Nicole Alicbusan, Emar Nathniel de Pano, Jed Kevin Fermo, Brian Joseph Tan, and Marivic Tangkeko 1 1 Center for ICT for Development
Mobile Accessibility. Jan Richards Project Manager Inclusive Design Research Centre OCAD University
Mobile Accessibility Jan Richards Project Manager Inclusive Design Research Centre OCAD University Overview I work at the Inclusive Design Research Centre (IDRC). Located at OCAD University in downtown
Dragon Solutions Transcription Workflow
Solutions Transcription Workflow summary Improving Transcription and Workflow Efficiency Law firms have traditionally relied on expensive paralegals, legal secretaries, or outside services to transcribe
SPeach: Automatic Classroom Captioning System for Hearing Impaired
SPeach: Automatic Classroom Captioning System for Hearing Impaired Andres Cedeño, Riya Fukui, Zihe Huang, Aaron Roe, Chase Stewart, Peter Washington Problem Definition Over one in seven Americans have
CatDV Pro Workgroup Serve r
Architectural Overview CatDV Pro Workgroup Server Square Box Systems Ltd May 2003 The CatDV Pro client application is a standalone desktop application, providing video logging and media cataloging capability
Controlling the computer with your voice
AbilityNet Factsheet August 2015 Controlling the computer with your voice This factsheet provides an overview of how you can control computers (and tablets and smartphones) with your voice. Communication
Automatic Text Analysis Using Drupal
Automatic Text Analysis Using Drupal By Herman Chai Computer Engineering California Polytechnic State University, San Luis Obispo Advised by Dr. Foaad Khosmood June 14, 2013 Abstract Natural language processing
Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast
Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Hassan Sawaf Science Applications International Corporation (SAIC) 7990
AFTER EFFECTS FOR FLASH FLASH FOR AFTER EFFECTS
and Adobe Press. For ordering information, CHAPTER please EXCERPT visit www.peachpit.com/aeflashcs4 AFTER EFFECTS FOR FLASH FLASH FOR AFTER EFFECTS DYNAMIC ANIMATION AND VIDEO WITH ADOBE AFTER EFFECTS
Learning Translation Rules from Bilingual English Filipino Corpus
Proceedings of PACLIC 19, the 19 th Asia-Pacific Conference on Language, Information and Computation. Learning Translation s from Bilingual English Filipino Corpus Michelle Wendy Tan, Raymond Joseph Ang,
Dragon Solutions. Using A Digital Voice Recorder
Dragon Solutions Using A Digital Voice Recorder COMPLETE REPORTS ON THE GO USING A DIGITAL VOICE RECORDER Professionals across a wide range of industries spend their days in the field traveling from location
Dragon Solutions Using A Digital Voice Recorder
Dragon Solutions Using A Digital Voice Recorder COMPLETE REPORTS ON THE GO USING A DIGITAL VOICE RECORDER Professionals across a wide range of industries spend their days in the field traveling from location
Video Transcription in MediaMosa
Video Transcription in MediaMosa Proof of Concept Version 1.1 December 28, 2011 SURFnet/Kennisnet Innovatieprogramma Het SURFnet/ Kennisnet Innovatieprogramma wordt financieel mogelijk gemaakt door het
Transcription FAQ. Can Dragon be used to transcribe meetings or interviews?
Transcription FAQ Can Dragon be used to transcribe meetings or interviews? No. Given its amazing recognition accuracy, many assume that Dragon speech recognition would be an ideal solution for meeting
Getting Started with Windows Speech Recognition (WSR)
Getting Started with Windows Speech Recognition (WSR) A. OVERVIEW After reading Part One, the first time user will dictate an E-mail or document quickly with high accuracy. The instructions allow you to
COPYRIGHT 2011 COPYRIGHT 2012 AXON DIGITAL DESIGN B.V. ALL RIGHTS RESERVED
Subtitle insertion GEP100 - HEP100 Inserting 3Gb/s, HD, subtitles SD embedded and Teletext domain with the Dolby HSI20 E to module PCM decoder with audio shuffler A A application product note COPYRIGHT
AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS
AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS PIERRE LANCHANTIN, ANDREW C. MORRIS, XAVIER RODET, CHRISTOPHE VEAUX Very high quality text-to-speech synthesis can be achieved by unit selection
Technology Finds Its Voice. February 2010
Technology Finds Its Voice February 2010 Technology Finds Its Voice Overview Voice recognition technology has been around since the early 1970s, but until recently the promise of new advances has always
Tibetan For Windows - Software Development and Future Speculations. Marvin Moser, Tibetan for Windows & Lucent Technologies, USA
Tibetan For Windows - Software Development and Future Speculations Marvin Moser, Tibetan for Windows & Lucent Technologies, USA Introduction This paper presents the basic functions of the Tibetan for Windows
Witango Application Server 6. Installation Guide for OS X
Witango Application Server 6 Installation Guide for OS X January 2011 Tronics Software LLC 503 Mountain Ave. Gillette, NJ 07933 USA Telephone: (570) 647 4370 Email: [email protected] Web: www.witango.com
USER MANUAL DUET EXECUTIVE USB DESKTOP SPEAKERPHONE
USER MANUAL DUET EXECUTIVE USB DESKTOP SPEAKERPHONE DUET EXE OVERVIEW Control Button Panel Connector Panel Loudspeaker Microphone The Duet is a high performance speakerphone for desktop use that can cover
Design Grammars for High-performance Speech Recognition
Design Grammars for High-performance Speech Recognition Copyright 2011 Chant Inc. All rights reserved. Chant, SpeechKit, Getting the World Talking with Technology, talking man, and headset are trademarks
German Speech Recognition: A Solution for the Analysis and Processing of Lecture Recordings
German Speech Recognition: A Solution for the Analysis and Processing of Lecture Recordings Haojin Yang, Christoph Oehlke, Christoph Meinel Hasso Plattner Institut (HPI), University of Potsdam P.O. Box
Integrating NLTK with the Hadoop Map Reduce Framework 433-460 Human Language Technology Project
Integrating NLTK with the Hadoop Map Reduce Framework 433-460 Human Language Technology Project Paul Bone [email protected] June 2008 Contents 1 Introduction 1 2 Method 2 2.1 Hadoop and Python.........................
Speech Recognition of a Voice-Access Automotive Telematics. System using VoiceXML
Speech Recognition of a Voice-Access Automotive Telematics System using VoiceXML Ing-Yi Chen Tsung-Chi Huang [email protected] [email protected] Department of Computer Science and Information
Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations
Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations C. Wright, L. Ballard, S. Coull, F. Monrose, G. Masson Talk held by Goran Doychev Selected Topics in Information Security and
Transcription Module Easy Start Guide
Transcription Module Easy Start Guide 1. Open the Transcription Module a. Double-Click on the Transcription Module icon on your desktop. b. Start Menu\Programs\Olympus DSS Player Pro\Transcription Module.
Objectives. Chapter 2: Operating-System Structures. Operating System Services (Cont.) Operating System Services. Operating System Services (Cont.
Objectives To describe the services an operating system provides to users, processes, and other systems To discuss the various ways of structuring an operating system Chapter 2: Operating-System Structures
A Review of Different Comparative Studies on Mobile Operating System
Research Journal of Applied Sciences, Engineering and Technology 7(12): 2578-2582, 2014 ISSN: 2040-7459; e-issn: 2040-7467 Maxwell Scientific Organization, 2014 Submitted: August 30, 2013 Accepted: September
Longman English Interactive
Longman English Interactive Level 2 Orientation (English version) Quick Start 2 Microphone for Speaking Activities 2 Translation Setting 3 Goals and Course Organization 4 What is Longman English Interactive?
A Speech Recognition and Synthesis Tool
A Speech Recognition and Synthesis Tool Hala ElAarag Laura Schindler Department of Mathematics and Computer Science Stetson University DeLand, Florida, U.S.A. Email: {helaarag, lschindl}@stetson.edu ABSTRACT
Voice-Recognition Software An Introduction
Voice-Recognition Software An Introduction What is Voice Recognition? Voice recognition is an alternative to typing on a keyboard. Put simply, you talk to the computer and your words appear on the screen.
DSD Native DAC Setup Guide
CHANNEL D Pure Music DSD Native DAC Setup Guide Release 1.0 Copyright 2012 Channel D www.channel-d.com CHANNEL D Pure Music DSD Native DAC Setup Guide These instructions outline the setup steps required
Using a Digital Recorder with Dragon NaturallySpeaking
Using a Digital Recorder with Dragon NaturallySpeaking For those desiring to record dictation on the go and later have it transcribed by Dragon, the use of a portable digital dictating device is a perfect
Voice Input Computer Systems Computer Access Series
AT Quick Reference Guide Voice Input Computer Systems Computer Access Series Voice input computer systems (or speech recognition systems) learn how a particular user pronounces words and uses information
Using the Amazon Mechanical Turk for Transcription of Spoken Language
Research Showcase @ CMU Computer Science Department School of Computer Science 2010 Using the Amazon Mechanical Turk for Transcription of Spoken Language Matthew R. Marge Satanjeev Banerjee Alexander I.
Call Recorder Oygo Manual. Version 1.001.11
Call Recorder Oygo Manual Version 1.001.11 Contents 1 Introduction...4 2 Getting started...5 2.1 Hardware installation...5 2.2 Software installation...6 2.2.1 Software configuration... 7 3 Options menu...8
31 Case Studies: Java Natural Language Tools Available on the Web
31 Case Studies: Java Natural Language Tools Available on the Web Chapter Objectives Chapter Contents This chapter provides a number of sources for open source and free atural language understanding software
Mobile Labs Plugin for IBM Urban Code Deploy
Mobile Labs Plugin for IBM Urban Code Deploy Thank you for deciding to use the Mobile Labs plugin to IBM Urban Code Deploy. With the plugin, you will be able to automate the processes of installing or
Dictation Software Feature Comparison
Dictation Software Feature Comparison Software Version Direct Recording Window Dictation operation ODMS Dictation Module DSS Player Pro R5 Dictation Module DSS Player Standard R2 DSS Player Plus for Mac
A CHINESE SPEECH DATA WAREHOUSE
A CHINESE SPEECH DATA WAREHOUSE LUK Wing-Pong, Robert and CHENG Chung-Keng Department of Computing, Hong Kong Polytechnic University Tel: 2766 5143, FAX: 2774 0842, E-mail: {csrluk,cskcheng}@comp.polyu.edu.hk
Operating Systems Overview As we have learned in working model of a computer we require a software system to control all the equipment that are
Session 07 Operating Systems Overview As we have learned in working model of a computer we require a software system to control all the equipment that are connected to computer and provide good environment
interviewscribe User s Guide
interviewscribe User s Guide YANASE Inc 2012 Contents 1.Overview! 3 2.Prepare for transcribe! 4 2.1.Assign the audio file! 4 2.2.Playback Operation! 5 2.3.Adjust volume and sound quality! 6 2.4.Adjust
SecureVault Online Backup Service FAQ
SecureVault Online Backup Service FAQ C0110 SecureVault FAQ (EN) - 1 - Rev. 19-Nov-2007 Table of Contents 1. General 4 Q1. Can I exchange the client type between SecureVault PC Backup Manager and SecureVault
CS 3530 Operating Systems. L02 OS Intro Part 1 Dr. Ken Hoganson
CS 3530 Operating Systems L02 OS Intro Part 1 Dr. Ken Hoganson Chapter 1 Basic Concepts of Operating Systems Computer Systems A computer system consists of two basic types of components: Hardware components,
Information Leakage in Encrypted Network Traffic
Information Leakage in Encrypted Network Traffic Attacks and Countermeasures Scott Coull RedJack Joint work with: Charles Wright (MIT LL) Lucas Ballard (Google) Fabian Monrose (UNC) Gerald Masson (JHU)
Mobile Operating Systems. Week I
Mobile Operating Systems Week I Overview Introduction Mobile Operating System Structure Mobile Operating System Platforms Java ME Platform Palm OS Symbian OS Linux OS Windows Mobile OS BlackBerry OS iphone
Automated Segmentation and Tagging of Lecture Videos
Automated Segmentation and Tagging of Lecture Videos Submitted in partial fulfillment of the requirements for the degree of Master of Technology by Ravi Raipuria Roll No: 113050077 Supervisor: Prof. D.
NIST/ITL CSD Biometric Conformance Test Software on Apache Hadoop. September 2014. National Institute of Standards and Technology (NIST)
NIST/ITL CSD Biometric Conformance Test Software on Apache Hadoop September 2014 Dylan Yaga NIST/ITL CSD Lead Software Designer Fernando Podio NIST/ITL CSD Project Manager National Institute of Standards
Register your product and get support at www.philips.com/welcome LFH0645 LFH0648. EN User manual
Register your product and get support at www.philips.com/welcome LFH0645 LFH0648 EN User manual Table of contents 1 Welcome 3 Product highlights 3 2 Important 4 Safety 4 Hearing safety 4 Disposal of your
Chapter 2 System Structures
Chapter 2 System Structures Operating-System Structures Goals: Provide a way to understand an operating systems Services Interface System Components The type of system desired is the basis for choices
Year 1 reading expectations (New Curriculum) Year 1 writing expectations (New Curriculum)
Year 1 reading expectations Year 1 writing expectations Responds speedily with the correct sound to graphemes (letters or groups of letters) for all 40+ phonemes, including, where applicable, alternative
Voice and data recording Red Box makes it easier than you imagine
Voice and data recording Red Box makes it easier than you imagine SIMPLER SMARTER VOICE If you re reading this, there s a good chance your organization has to record phone calls, radio conversations or
Single or multi-channel recording from microphone channels and telecommunications lines simultaneously
Single or multi-channel recording from microphone channels and telecommunications lines simultaneously Options for remote control and remote management Special voice recorder functions Processing and management
Email to Voice. Tips for Email to Broadcast
Contents Email to Broadcast Services addresses... 2 Job Name... 2 Email to Text To Speech... 2 Voice Software... 3 Write out the words... 4 Names... 4 Punctuation... 4 Pauses... 4 Vowels... 4 Telephone
Natural Language Database Interface for the Community Based Monitoring System *
Natural Language Database Interface for the Community Based Monitoring System * Krissanne Kaye Garcia, Ma. Angelica Lumain, Jose Antonio Wong, Jhovee Gerard Yap, Charibeth Cheng De La Salle University
Tablets in Data Acquisition
Tablets in Data Acquisition Introduction In the drive to smaller and smaller data acquisition systems, tablet computers bring a great appeal. Desktop personal computers gave engineers the power to create
Understanding Video Lectures in a Flipped Classroom Setting. A Major Qualifying Project Report. Submitted to the Faculty
1 Project Number: DM3 IQP AAGV Understanding Video Lectures in a Flipped Classroom Setting A Major Qualifying Project Report Submitted to the Faculty Of Worcester Polytechnic Institute In partial fulfillment
VOICE RECOGNITION KIT USING HM2007. Speech Recognition System. Features. Specification. Applications
VOICE RECOGNITION KIT USING HM2007 Introduction Speech Recognition System The speech recognition system is a completely assembled and easy to use programmable speech recognition circuit. Programmable,
AUDIMUS.media: A Broadcast News Speech Recognition System for the European Portuguese Language
AUDIMUS.media: A Broadcast News Speech Recognition System for the European Portuguese Language Hugo Meinedo, Diamantino Caseiro, João Neto, and Isabel Trancoso L 2 F Spoken Language Systems Lab INESC-ID
Tracking Network Changes Using Change Audit
CHAPTER 14 Change Audit tracks and reports changes made in the network. Change Audit allows other RME applications to log change information to a central repository. Device Configuration, Inventory, and
Ansur Test Executive. Users Manual
Ansur Test Executive Users Manual April 2008 2008 Fluke Corporation, All rights reserved. All product names are trademarks of their respective companies Table of Contents 1 Introducing Ansur... 4 1.1 About
Digital Asset Management. Content Control for Valuable Media Assets
Digital Asset Management Content Control for Valuable Media Assets Overview Digital asset management is a core infrastructure requirement for media organizations and marketing departments that need to
How to use PDFlib products with PHP
How to use PDFlib products with PHP Last change: July 13, 2011 Latest PDFlib version covered in this document: 8.0.3 Latest version of this document available at: www.pdflib.com/developer/technical-documentation
CS3600 SYSTEMS AND NETWORKS
CS3600 SYSTEMS AND NETWORKS NORTHEASTERN UNIVERSITY Lecture 2: Operating System Structures Prof. Alan Mislove ([email protected]) Operating System Services Operating systems provide an environment for
320 E. 46 th Street, 11G New York, NY 10017 Tel. 212-751-5150 Ext. 5
320 E. 46 th Street, 11G New York, NY 10017 Tel. 212-751-5150 Ext. 5 Ms. Lisa Fowlkes Deputy Chief, Public Safety and Homeland Security Bureau Federal Communications Commission 445 12 th Street, NW, Room
