TalkVox. Project Advisor. Javier Thiele-Ruiz Computer Engineering June John Oliver

Similar documents
Develop Software that Speaks and Listens

Read&Write 11 Home Version Download Instructions for Windows 7

Total Quality Management (TQM) Quality, Success and Failure. Total Quality Management (TQM) vs. Process Reengineering (BPR)

Registry Tuner. Software Manual

COMPUTER TECHNOLOGY IN TEACHING READING

Foundations for Systems Development

Manage Software Development in LabVIEW with Professional Tools

Dragon Medical Practice Edition v2 Best Practices

Algorithm & Flowchart & Pseudo code. Staff Incharge: S.Sasirekha

Source Code Translation

General Problem Solving Model. Software Development Methodology. Chapter 2A

The ROI. of Speech Tuning

USABILITY OF A FILIPINO LANGUAGE TOOLS WEBSITE

textthing MFL KS2 Content Evaluation (by Suzanne Ford)

How to become a successful language learner

Java Programming (10155)

Ten steps to better requirements management.

have more skill and perform more complex

Screen Design : Navigation, Windows, Controls, Text,

Introduction to Systems Analysis and Design

THE SOFTWARE DEVELOPMENT LIFE CYCLE *The following was adapted from Glencoe s Introduction to Computer Science Using Java

Chapter 12 Programming Concepts and Languages

Lawyer Search Search for attorneys licensed by the Supreme Court of Illinois

CHRIS User Guide: Dated Information and Date Tracking

Assessing A Student s Need for Assistive Technology: Where to Start?

Upgrade from Sage Instant Accounts v15

Using an Edline Gradebook. EGP Teacher Guide

ntier Verde: Simply Affordable File Storage No previous storage experience required

ELIMINATING RISKS FOR MANUFACTURING COMPANIES IMPLEMENTING MICROSOFT DYNAMICS AX

Easy Bangla Typing for MS-Word!

Text-to-Speech and Read Aloud Decision Guidelines Page 1

Top 10 Skills and Knowledge Set Every User Experience (UX) Professional Needs

Read&Write 10 Home Version Download Instructions for Windows XP, Vista and 7

N Q.3 Choose a level of accuracy appropriate to limitations on measurement when reporting quantities.

Multi-user Collaboration with Autodesk Revit Worksharing

Design Analysis of Everyday Thing: Nintendo Wii Remote

one Managing your PBX Administrator ACCESSING YOUR PBX ACCOUNT CHECKING ACCOUNT ACTIVITY

Teaching Writing to Students with Learning Disabilities by Bruce Johnson

Quick Start. Guide. The. Guide

What's new in Word 2010

Video conferencing for anyone, anywhere, on any device-that s the Connected Experience.

Consolidate Distributed Documents by Simplifying Information Sharing

Content Author's Reference and Cookbook

The preliminary design of a wearable computer for supporting Construction Progress Monitoring

Special Reports. Finding Actionable Insights through AdWords Reporting

EPiServer and XForms - The Next Generation of Web Forms

Collaboration Solution Helps Law Firm Bolster Relationships, Streamline Processes

Spiel. Connect to people by sharing stories through your favorite discoveries

Jack s Dyslexia Index indicates he has dyslexic difficulties that are mild in extent.

The Human Side of Test Automation

BPM: Chess vs. Checkers

Intranet Buyers Workbook

Chapter 8 Speech Recognition Tools

Graphical Environment Tool for Development versus Non Graphical Development Tool

Table of Contents User Guide... 1 Welcome... 4 Front End Life Blue... 5 General Navigation... 5 Menu Bar... 5 Continuous Scroll... 5 Hyperlinks...

Interpretype Video Remote Interpreting (VRI) Subscription Service White Paper September 2010

Microsoft Office Project Standard 2007 Project Professional April February 2006

Online Test Monitor Certification Course Transcript

BCSD WebMail Documentation

ADVANTAGES OF IMPLEMENTING A DATA WAREHOUSE DURING AN ERP UPGRADE

User experience storyboards: Building better UIs with RUP, UML, and use cases

White Paper. Java versus Ruby Frameworks in Practice STATE OF THE ART SOFTWARE DEVELOPMENT 1

Test Specification. Introduction

Once you have obtained a username and password you must open one of the compatible web browsers and go to the following address to begin:

Lab 4.4 Secret Messages: Indexing, Arrays, and Iteration

Study Tools Software Contents

Software localization testing at isp

The «include» and «extend» Relationships in Use Case Models

Development Methodologies Compared

The Secret to Playing Your Favourite Music By Ear

QUALITY TOOLBOX. Understanding Processes with Hierarchical Process Mapping. Robert B. Pojasek. Why Process Mapping?

OVERVIEW AND TERMINOLOGY

Quick Start Guide: Read & Write 11.0 Gold for PC

FPGA Prototyping Primer

Comparative Analysis on the Armenian and Korean Languages

The modern marketer s guide to global content creation

How to Use e-commerce on

Software Life Cycle Processes

Speech Recognition Software Review

PROJECT MANAGEMENT PLAN CHECKLIST

Speech Analytics. Whitepaper

SCREAM (SCRUM TEAM MANAGEMENT TOOL)

Continuous Integration

CAPTURING A VM700T SCREEN DISPLAY with VMTWIN

Universal Tools, Supports, and Accommodations For MSP Science Online Assessment,

Avaya one-x Mobile User Guide for iphone

Read&Write 9 Home Version Download Instructions for Windows XP, Vista and 7

RingCentral Office. Basic Start Guide FOR USERS

imc FAMOS 6.3 visualization signal analysis data processing test reporting Comprehensive data analysis and documentation imc productive testing

RFID Based 3D Indoor Navigation System Integrated with Smart Phones

stress, intonation and pauses and pronounce English sounds correctly. (b) To speak accurately to the listener(s) about one s thoughts and feelings,

Smart Thermostat page 1

Name of Lesson: Properties of Equality A Review. Mathematical Topic: The Four Properties of Equality. Course: Algebra I

Programmer s Reference

Release Notes. Asset Control and Contract Management Solution 6.1. March 30, 2005

Dept. of Communication Studies Senior Portfolio Instructions

BIGPOND ONLINE STORAGE USER GUIDE Issue August 2005

top tips to help you save on travel and expenses

Getting Started with the DCHR Service Desk. District Service Management Program

Transcription:

TalkVox Javier Thiele-Ruiz Computer Engineering June 2014 Project Advisor John Oliver Winter 2014 Spring 2014 1

Table of Contents Introduction...3 Project Overview 3 Need Statement 3 Project Definition...4 Project Goals 4 Project Objectives 4 Project Deliverables 4 Marketing Requirements 5 Engineering Requirements 5 User Experience 6 Project Planning and Design...7 Overview 7 Project Schedule 7 Software Design 8 Closing Statements...9 Sustainability Impact 9 Health and Safety Impact 9 Conclusion 10 2

Introduction Project Overview The problem I am trying to solve is the difficulty associated with editing the pronunciations of words for Text-to-Speech (TTS) programs in a fast and efficient manner. When using certain text-to-speech software, the pronunciation must first be converted to a different format, and then a command line argument is used to create a.wav file with the pronunciation of that word. This is incredibly inefficient and time-consuming so I set out to develop a program which would be able to remove the majority of the tedium involved with testing how a given word is pronounced by TTS programs. With fewer steps required from the user, more progress is able to be made and more words are able to be edited. Need Statement TTS programs are built upon large dictionaries of words with the phonetic spelling of each word alongside them. When editing these pronunciations, many steps are required in order to change, test, and save the changes to the dictionary. This makes it extremely time-consuming to actually edit these dictionary files and results in either an incomplete TTS program or too many hours spent working on ensuring the pronunciations are correct. I took on this project as a way to be able to streamline the process of editing and saving these phonetic pronunciations. 3

Project Definition Project Goals Develop a program which would remove the intermediate steps required to listen to a given word s pronunciation when read from a file. Reduce time required to edit a dictionary file of words. Project Objectives Develop a User Interface using Java Swing. Use the Pico2Wav executable to play the phonetic pronunciations. Save and revert changes made to dictionary file. Project Deliverables Upon completion, the project will include a program which is capable of reading in a dictionary file with a list of words and their given pronunciations. The program will then allow the user to navigate the dictionary with ease while listening to and editing the words which are included. The source code, a sample dictionary, and all of the utilities required to run the program will also be included. 4

Marketing Requirements Load and save an existing dictionary file Display word and associated pronunciation Mark which phonemes are invalid in pronunciation field Display current word number as well as total word count Play button which plays current word s pronunciation Next and Previous buttons which update with new word s information and play its pronunciation Word/Phoneme search to navigate dictionary easier Change language TTS voice uses Engineering Requirements Category Engineering Requirement Justification Functionality The program must be able to support different phonetic mappings Functionality/Usability The program must be able to play the updated word pronunciations instantly Usability The program must display which phonemes are invalid in the current pronunciation Usability The program should have a word/phoneme search In order to accommodate different phonetic alphabets, the ability to change the mappings is necessary If this is not done instantly then Without this feature, incorrect pronunciations could be saved to the original file This allows for the quick navigation of large dictionary files 5

The User Experience The primary user to use this would most likely be a developer trying to build up a better TTS dictionary for modifying existing TTS engines or creating their own. They would begin by loading a dictionary file containing a list of words whose pronunciations would need to be added or have their accuracies verified. Once the list is loaded they would then proceed to go through every word as quickly as possible to ensure each word s accuracy, or add pronunciations where needed. Once they reached the end, they would save the changes to the file and open a new file to repeat the process. Another user could potentially be someone who is only looking to add certain words to TTS engines in order to fix how certain words sound, like acronyms or foreign names. They would directly contribute to helping a TTS engine improve its accuracy by crowdsourcing its resources. 6

Project Planning and Design Overview When I began this project I laid out a timeline with my advisor John Oliver which broke down every section of the project and when each one should be done. Included in the timeline were prototypes, basic features, extra features, debugging, and testing. Throughout the project there were also regularly scheduled meetings with Dr. Oliver to update him on my status along the program. Project Schedule The project will be broken down into several phases. The first phase is a prototype phase where the viability of the chosen TTS engine (Pico SVOX) is examined and a small, simple program is written with it to demonstrate it working. The next phase is one where the foundation of the program is built, with only extremely basic functionality present such as loading and parsing a dictionary file, translating the pronunciations so they can be read by Pico, and playing how they currently are. After that phase is complete, the next phase commences where the majority of the program is completed. This includes pronunciation editing and display, several visual indicators to mark where in the file the current word is located, and the ability to save the new pronunciations back to the file. The following phase includes the adding of several features such as a display of invalid phonemes for each word, a word and phoneme search, an option to choose which language to run the TTS engine in, and the ability to revert changes made to the file. The final phase is the testing phase where the viability of the program is tested among several different conditions and the functionality of each feature is tested. 7

Software Design I initially decided on using Java Swing and Pico2Wav for this project. I decided on Swing because it is what I have the most experience in when building UIs, and because running system commands from Java programs (which is necessary in this program) is very simple and straightforward. Pico2Wav was chosen as a result of its very clear mapping of IPA to X-Sampa, two different phonetic notations, and its ability to play words based on their phonetic spelling, rather than their actual spelling. While I had some difficulty getting the Pico2Wav command to run from within the program initially and debated using a different language to overcome that issue, I eventually realized my problem and was able to fix it without having to start over in a different language. The diagram (right) illustrates the user-software interaction by outlining the basic flowchart the user would take when initally starting the program. It follows a straightforward path designed for simplicity and immediate usabilty. 8

Closing Statements Sustainability Impact Because this project is 100% software, the sustainability impacts are very minimal as there is very little hardware to worry about other than the computer running the program. With that said, it should still be mentioned that as a result of the reduced time it now takes to alter the dictionary files, it is possible the computer will remain on for less time thereby reducing overall energy consumption for those using this program. Health and Safety Impact This project has a very minimal impact on health and safety because of the fact that it is purely software. It is possible however that due to the decreased time required to edit these files, overeager users could have so much fun editing pronunciations they neglect their physical health. Conclusion The final implementation at the end of the project time did meet all of the initial expectations which were set. The following requirements were met: o Load and save an existing dictionary file o Display word and associated pronunciation o Mark which phonemes are invalid in pronunciation field 9

o Display current word number as well as total word count o Play button which plays current word s pronunciation o Next and Previous buttons which update with new word s information and play its pronunciation o Word/Phoneme search to navigate dictionary easier o Change language TTS voice uses The progress made on this program was the result of a familiarity with the programming aspects involved with this project as well as a solid conceptual understanding of utilizing the Pico SVOX TTS engine. Overall this project was a success as all of the requirements which were initially declared were met, and the final product is a stable working program. 10