Mobile Speech Processing

David Huggins-Daines
Language Technologies Institute
Carnegie Mellon University
September 19, 2008

Outline
- Mobile Devices
  - What are they?
  - What would we like to do with them?
- Mobile Speech Applications
- Mobile Speech Technologies
- Current Research

Mobile Devices
What is a mobile device?
- A hammer is a device, and you can carry it around with you!
- But no, that's not what we mean here.

Mobile Devices
What is a mobile device?
- A device that goes everywhere with you...
- ...which provides some or all of the functions of a computer...
- ...and some things a computer doesn't, such as a cell phone or GPS.

Speech on Mobile Devices
Why do we care about speech processing on these devices?
- Because they are the future of computers
- Because speech is actually a useful way to interact with them, unlike full-sized computers
What kind of speech processing do we care about?
- Speech coding to improve voice quality for cellular and VoIP
- Speech recognition for hands-free input to apps
- Speech synthesis for eyes-free output from apps
In some cases, speech is a natural and convenient modality. In other cases, it is a necessity (e.g. in-car navigation).

Speech on Mobiles vs. Mobile Speech
- None of this necessarily implies doing actual speech processing (aside from coding) on the device itself
- Telephone dialog systems are mobile by any definition:
  - Let's Go: bus scheduling information
  - HealthLine: medical information for rural health workers
- But all synthesis and recognition is done on a server
  - This can be a good thing, especially in the latter case: you can't run a speech recognizer on a Motofone or a Nokia 1010
- Speech processing on the device is useful for:
  - Multimodal applications
  - Disconnected applications
  - Access to local data

Some Mobile Speech Applications
- GPS navigation
  - Older systems used a small number of recorded prompts ("turn left", "100 metres", etc.)
  - More recently, TTS has been used to speak street names
  - Even more recently, ASR is used for input
- Voice dialing
  - Old systems used DTW and required training
  - Newer ones build models from your address book
  - Cactus for iPhone: uses CMU Flite and Sphinx
- Voice-driven search (local, web, etc.)
  - Nuance, Vlingo, TellMe, and Microsoft are all doing this
- Voice-to-text
  - Typically server-based; requires a data connection
  - On-line, ASR-based: Vlingo, Nuance
  - Off-line, human-assisted: SpinVox, Jott, ReQall
- Speech-to-speech translation
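Older voice-dialing systems compared a spoken name against templates recorded during a training (enrollment) session. Here is a minimal sketch of that template matching with dynamic time warping (DTW). It assumes feature extraction has already produced one vector per frame. The Euclidean frame distance, the symmetric step pattern, and the function names are illustrative choices, not those of any particular product.

```python
import math
from typing import Dict, Sequence

Frame = Sequence[float]

def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors (e.g. cepstra)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(query: Sequence[Frame], template: Sequence[Frame]) -> float:
    """Minimum cumulative distance over all monotonic alignments of the two sequences."""
    n, m = len(query), len(template)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], template[j - 1])
            # Predecessors: advance the query only, advance the template only, or advance both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(query: Sequence[Frame], templates: Dict[str, Sequence[Frame]]) -> str:
    """Return the enrolled name whose template is closest to the query."""
    return min(templates, key=lambda name: dtw_distance(query, templates[name]))

# Toy one-dimensional "features"; a real system would use cepstral vectors
templates = {"alice": [[0.0], [1.0], [2.0]], "bob": [[2.0], [1.0], [0.0]]}
print(recognize([[0.0], [0.0], [1.0], [2.0]], templates))  # alice
```

DTW lets one template frame match several input frames, and vice versa. A name spoken more slowly than at enrollment can therefore still align at low cost. Because every name needs its own recorded template, these systems required training.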

Mobile Speech Technologies
- Speech coding
  - Efficient digital representation of speech signals
  - Fundamental for 2G and 3G cell networks and VoIP
- Speech synthesis
  - Speech output for commands, directions
  - Text-to-speech for messages, books, other content
- Speech recognition
  - Command and control ("voice control")
  - Dictation (speech-to-text for email, SMS)
  - Search input (questions, keywords)
  - Dialogue

Speech Coding
- A fairly mature technology (started in the 1960s)
  - Early versions were mostly for military applications
  - Digital cell phone networks changed this dramatically
- Almost universally based on linear prediction and the source-filter model
  - Each sample is predicted as a weighted sum of the P previous samples
  - The weights are the linear prediction coefficients (LPCs), calculated to minimize the mean squared prediction error
  - Conveniently enough, this is actually a good model of the frequency response of the vocal tract (given enough LPCs)
  - An excitation function models the glottal source
- Everything else is just tweaking:
  - Better excitation functions (CELP)
  - Variable bit rates (AMR)
  - Compression tricks (VAD + comfort noise)
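One standard way to find the LPC weights is the autocorrelation method solved with the Levinson-Durbin recursion. It minimizes the mean squared prediction error over a frame. The sketch below assumes a single frame of floating-point samples with no windowing or pre-emphasis. Real codecs add those steps, quantize the coefficients, and usually work in fixed point. The function names are illustrative.

```python
from typing import List, Sequence, Tuple

def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """r[k] = sum over t of x[t] * x[t-k], for k = 0..max_lag."""
    return [sum(x[t] * x[t - k] for t in range(k, len(x))) for k in range(max_lag + 1)]

def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Solve the LPC normal equations.

    Returns (a, err): x[t] is predicted as sum(a[k] * x[t-1-k] for k in range(order)),
    and err is the remaining prediction-error energy.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:  # silent or perfectly predicted frame: nothing left to model
            break
        # Reflection coefficient for predictor order i+1
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err

def lpc(frame: Sequence[float], order: int) -> List[float]:
    return levinson_durbin(autocorrelation(frame, order), order)[0]

def residual(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """Prediction error e[t] = x[t] - sum_k a[k] * x[t-1-k]; the excitation models this signal."""
    p = len(a)
    return [x[t] - sum(a[k] * x[t - 1 - k] for k in range(p)) for t in range(p, len(x))]
```

As a sanity check, a decaying signal x[t] = 0.9^t is generated by a first-order predictor. So `lpc(x, 1)` should return a coefficient very close to 0.9 and leave an almost-zero residual.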

Mobile Speech Synthesis
- Two traditional categories, one new one:
  - Synthesis by rule, e.g. formant synthesis
  - Concatenative synthesis, e.g. diphone, unit selection
  - Statistical-parametric synthesis ("HMM synthesis")
- We have had very efficient (often hardware-based) implementations of TTS for decades
  - They sound terrible (but are often quite intelligible)
- The challenges for mobile devices are:
  - Achieving natural-sounding speech
  - Dealing with very large, irregular vocabularies
  - Dealing with raw and diverse input text

Mobile Speech Synthesis
- Unit selection currently gives the most natural output
- But it is very ill-suited to mobile implementations
  - The best systems use gigabytes of speech data
  - "But," you say, "I have an 8GB microSD card in my phone!"
  - Storage is not the real problem; the costs lie elsewhere:
    - Search time: finding the right units of speech
    - Access time: loading them from the storage medium
    - Signal generation, which can also be time-consuming if not efficiently implemented
- Some ways to improve efficiency:
  - Compress the speech database
  - Prune the speech database by discarding units that are infrequently or never used
  - Approximate search algorithms (much like ASR)
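Pruning by usage could work like this. Synthesize a representative text corpus, count how often each unit is selected, and discard the rarely used units. Several parts of this sketch are assumptions: the `selections` input, the `min_count` threshold, and the rule that always keeps the most-used unit of each type, so that no diphone disappears entirely.

```python
from collections import Counter
from typing import Dict, Iterable, Set

def prune_units(unit_types: Dict[str, str], selections: Iterable[str], min_count: int = 1) -> Set[str]:
    """Return the IDs of units to keep in the database.

    unit_types maps unit ID -> unit type (e.g. a diphone name).
    selections lists the unit IDs chosen while synthesizing a (hypothetical)
    representative text corpus.
    """
    counts = Counter(selections)  # units never selected count as 0
    keep = {u for u in unit_types if counts[u] >= min_count}
    # Assumed safeguard: keep the most-used unit of every type so coverage is preserved
    best_per_type: Dict[str, str] = {}
    for unit, utype in unit_types.items():
        if utype not in best_per_type or counts[unit] > counts[best_per_type[utype]]:
            best_per_type[utype] = unit
    keep.update(best_per_type.values())
    return keep

units = {"u1": "a-b", "u2": "a-b", "u3": "b-c", "u4": "b-c"}
print(sorted(prune_units(units, ["u1", "u1", "u3"], min_count=2)))  # ['u1', 'u3']
```

In this example `u3` falls below the threshold but survives as the only used unit of type `b-c`. The unused units `u2` and `u4` are discarded.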

Mobile Speech Synthesis
- Statistical-parametric synthesis is quite promising
  - Models are quite small (1-2MB)
  - The search problem is nonexistent
- Parameter and waveform generation are currently the most time-consuming parts
  - They require higher-dimensional parameterizations than concatenative synthesis
  - Output parameters are smoothed using an iterative algorithm (similar to EM)
  - Waveform generation from mel-cepstra (mcep) is much slower than from LPC
- Dictionary compression and text normalization
  - The dictionary can be compressed by building letter-to-sound models and listing only the exceptions
  - Efficient finite-state transducer representations can be created for pronunciation and text-processing rules
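Listing only the exceptions works like this: any word whose pronunciation the letter-to-sound (LTS) model already predicts correctly is dropped from the stored dictionary. In the sketch, `toy_lts` is a hypothetical one-letter-to-one-phone mapping used only to make the example runnable. A real system would use a trained LTS model.

```python
from typing import Callable, Dict, List

Pron = List[str]

# Hypothetical toy letter-to-sound rules: one phone per letter
TOY_RULES = {"a": "AE", "b": "B", "c": "K", "e": "EH", "n": "N", "o": "AA", "t": "T"}

def toy_lts(word: str) -> Pron:
    return [TOY_RULES[ch] for ch in word if ch in TOY_RULES]

def compress_lexicon(lexicon: Dict[str, Pron], lts: Callable[[str], Pron]) -> Dict[str, Pron]:
    """Keep only the words whose pronunciation the LTS model gets wrong."""
    return {word: pron for word, pron in lexicon.items() if lts(word) != pron}

def pronounce(word: str, exceptions: Dict[str, Pron], lts: Callable[[str], Pron]) -> Pron:
    """Consult the exception list first, then fall back to the LTS model."""
    pron = exceptions.get(word)
    return pron if pron is not None else lts(word)

lexicon = {"cat": ["K", "AE", "T"], "bat": ["B", "AE", "T"], "one": ["W", "AH", "N"]}
exceptions = compress_lexicon(lexicon, toy_lts)
print(exceptions)  # {'one': ['W', 'AH', 'N']}
```

The better the LTS model, the shorter the exception list. Lookups for the dropped words still return the correct pronunciation.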

Mobile Speech Recognition
- Challenges for mobile devices are:
  - Variable and noisy acoustic environments
  - Large vocabularies
  - Open-domain dictation input
- As with speech synthesis, simple ASR is not very resource-intensive, although it has not been as widely implemented
- Even with large vocabularies, ASR can be done efficiently
  - The most important factor is the complexity of the grammar
- Commercial systems achieve impressive performance based on very constrained grammars
  - Systems tend to be extensively tuned for a given application

Mobile Speech Recognition: Acoustic Issues
- How do you talk to a device?
  - This depends on the application, user, and environment
  - Acoustic feature vectors can look very different
  - Microphones may not be optimized for all positions
- Noisy environments
  - Mobile devices are more likely to be used in noisy environments
  - Worse, they are more likely to be used in difficult ones: non-stationary noise, crosstalk, human babble
  - Array processing is not well suited to handheld devices
- On the bright side: usually a mobile device has only one user
  - Speaker adaptation can improve acoustic modeling
  - Speaker identification can be used to filter out babble and crosstalk

Mobile Speech Recognition: Computational Issues
- Acoustic feature extraction
  - Efficient, as long as it is implemented properly: fixed-point arithmetic, data-parallel processing
- Most processing time is consumed, in roughly equal amounts, by:
  - Acoustic model evaluation
  - Search (hypothesis generation and evaluation)
- These can be made computationally efficient, but they must also be made memory-efficient (search in particular)
  - This necessarily involves tuning heuristics, because a complete solution is intractable
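Fixed-point feature extraction can be illustrated with a pre-emphasis filter, y[n] = x[n] - αx[n-1], computed in Q15 format using only integer multiplies and shifts. Several choices here are assumptions: pre-emphasis as the example step, α = 0.97, and the helper names. Python integers stand in for the 16-bit samples and 32-bit products that a DSP or ARM core would use.

```python
from typing import List, Sequence

Q15_ONE = 1 << 15
INT16_MIN, INT16_MAX = -32768, 32767

def saturate16(x: int) -> int:
    """Clamp to the 16-bit range, as saturating DSP instructions do."""
    return max(INT16_MIN, min(INT16_MAX, x))

def to_q15(value: float) -> int:
    """Convert a coefficient in [-1, 1) to Q15 (15 fractional bits)."""
    return saturate16(round(value * Q15_ONE))

def preemphasis_q15(samples: Sequence[int], alpha: float = 0.97) -> List[int]:
    """y[n] = x[n] - alpha * x[n-1] on 16-bit integer samples, without floating point in the loop."""
    a = to_q15(alpha)  # 0.97 -> 31785
    out: List[int] = []
    prev = 0
    for x in samples:
        # a * prev is a Q15 product (fits in 32 bits); >> 15 brings it back to sample scale
        out.append(saturate16(x - ((a * prev) >> 15)))
        prev = x
    return out

print(preemphasis_q15([1000, 1000, -1000]))  # [1000, 30, -1970]
```

Saturation matters here. Without clamping, a full-scale negative sample followed by a full-scale positive one would overflow the 16-bit range and wrap around.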

Mobile Speech Recognition: Acoustic Modeling
- Exact acoustic model evaluation is intractable. For state s_i with K diagonal-covariance Gaussian densities, Σ_ik = diag(σ²_ik1, ..., σ²_ikD):

$$P(o \mid s_i, \lambda) = \sum_{k=1}^{K} w_{ik} \frac{1}{\sqrt{(2\pi)^{D} |\Sigma_{ik}|}} \exp\left( -\sum_{d=1}^{D} \frac{(o_d - \mu_{ikd})^2}{2\sigma_{ikd}^2} \right)$$

- Typical continuous-density acoustic model: 5000 tied states, each with 32 Gaussian densities of 39 dimensions
- Complete evaluation of all log-likelihoods for one 10 ms frame requires a large number of log-additions, subtractions, and multiplications
- That's 2500 million operations per second!
- Your new MacBook Pro can do that, but just barely (yes, its video card can do it easily)
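The sketch below evaluates the diagonal-covariance mixture in the log domain and reproduces the operation count. The count assumes four operations per dimension of each density (subtract the mean, square, scale by 1/(2σ²), accumulate) and 100 frames per second. This is one plausible way to arrive at the figure of roughly 2500 million, not necessarily the way the author counted. The function names are illustrative.

```python
import math
from typing import Sequence

def log_gaussian_diag(o: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """log N(o; mean, diag(var)) for a single density."""
    dim = len(o)
    # Normalizer -0.5 * log((2*pi)^D * |Sigma|); real decoders precompute this per density
    log_norm = -0.5 * (dim * math.log(2.0 * math.pi) + sum(math.log(v) for v in var))
    # Per dimension: subtract, square, scale by 1/(2*sigma^2) (precomputed in practice), accumulate
    return log_norm - sum((x - m) ** 2 / (2.0 * v) for x, m, v in zip(o, mean, var))

def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a < b:
        a, b = b, a
    if b == -math.inf:
        return a
    return a + math.log1p(math.exp(b - a))

def log_likelihood(o, weights, means, variances) -> float:
    """log P(o | s_i): log-add the weighted densities of one tied state."""
    total = -math.inf
    for w, mean, var in zip(weights, means, variances):
        total = log_add(total, math.log(w) + log_gaussian_diag(o, mean, var))
    return total

def ops_per_second(states=5000, densities=32, dims=39, ops_per_dim=4, frames_per_second=100) -> int:
    """Operations for exhaustive evaluation (mixture log-additions not counted)."""
    return states * densities * dims * ops_per_dim * frames_per_second

print(ops_per_second())  # 2496000000, i.e. roughly 2500 million
```

The per-dimension work (5000 × 32 × 39 ≈ 6.2 million dimension evaluations per frame) dominates. The 160,000 log-additions needed to combine the mixture components are small by comparison.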

Mobile Speech Recognition: Acoustic Modeling
How do we make this fast enough?
- Only evaluate densities for phones that are active in the search
- Predict which densities will score highly using a smaller, approximate model set, and only evaluate those
- Use fewer densities, and:
  - Share them between all HMM states (semi-continuous HMM) or all the states for some phonetic class (phonetically-tied HMM)
- Make density computation faster by quantizing acoustic features and parameters
- Skip some frames in the input, either by:
  - Blindly computing only every Nth frame (usually N = 2 or 3)
  - Detecting interesting regions in the input and only computing densities there (landmark detection)
- Every ASR system in existence uses some combination of these
  - However, too many approximations can make the system slower (e.g. less accurate scores can leave more hypotheses active in the search)
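Blind frame skipping can be sketched as follows. Densities are evaluated only on every `skip`-th frame, and the skipped frames reuse the most recent scores. Reusing scores, rather than interpolating them, is an assumption made for simplicity. `score_fn` is a hypothetical stand-in for whatever acoustic-model evaluation the decoder performs.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

F = TypeVar("F")
S = TypeVar("S")

def score_with_frame_skipping(frames: Sequence[F], score_fn: Callable[[F], S], skip: int = 2) -> List[S]:
    """Evaluate the acoustic model on every `skip`-th frame; skipped frames reuse the last scores."""
    if skip < 1:
        raise ValueError("skip must be >= 1")
    scores: List[S] = []
    last: Optional[S] = None
    for t, frame in enumerate(frames):
        if t % skip == 0:
            last = score_fn(frame)  # the expensive density evaluation happens only here
        scores.append(last)
    return scores

calls = []

def fake_score(frame: int) -> List[float]:
    """Stand-in for density evaluation that records which frames were scored."""
    calls.append(frame)
    return [float(frame)]

scores = score_with_frame_skipping(list(range(7)), fake_score, skip=3)
print(calls)  # [0, 3, 6]
```

With `skip=3`, the acoustic model runs on only 3 of the 7 frames. The search still receives a score vector for every frame.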

Mobile Speech Recognition: Search
- Search is not arithmetically intensive
  - It largely consists of adding up scores and comparing them to other scores
- However, it is very memory-intensive. The search module in an ASR system touches:
  - Acoustic scores
  - Language model scores
  - Dictionary entries
  - Viterbi path scores and backpointers
  - Backpointer table entries
- In other words, pretty much every piece of memory except the acoustic model parameters
- Worse yet, there are sequential dependencies between all these memory accesses

Mobile Speech Recognition: Search
- Fundamentally, the speed of the recognizer is proportional to the number of different hypotheses it considers at once
- Optimizing search is entirely devoted to reducing this number without significantly affecting accuracy. This includes:
  - Careful tuning of various thresholds (beams) for word transitions, phone transitions, etc.
  - Absolute pruning: hard limits on words per frame
  - Phonetic lookahead
  - Language model lookahead (factorization / weight pushing)
- Finite-state transducer systems can be very fast
  - Dictionary, grammar, and (part of) the acoustic model are composed into a single decoding network
  - Determinization: allows exact language model search
  - Minimization: merges common subpaths
  - Weight pushing: a more general kind of LM lookahead
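Beam pruning and absolute pruning can be sketched as a single per-frame step over a dictionary of active hypotheses. Scores are assumed to be log-probabilities, so higher is better. The names `prune`, `beam`, and `max_active` are illustrative. The full sort is for clarity only; many decoders avoid it, for example by using histogram pruning to find the cutoff.

```python
from typing import Dict, Hashable

def prune(hyps: Dict[Hashable, float], beam: float, max_active: int) -> Dict[Hashable, float]:
    """Beam pruning followed by absolute pruning.

    `beam` is a positive width in the same log units as the scores;
    `max_active` is the hard limit on surviving hypotheses per frame.
    """
    if not hyps:
        return {}
    best = max(hyps.values())
    # Beam pruning: drop anything too far below the best hypothesis in this frame
    survivors = {h: s for h, s in hyps.items() if s >= best - beam}
    # Absolute pruning: keep at most max_active of the best-scoring survivors
    if len(survivors) > max_active:
        ranked = sorted(survivors.items(), key=lambda item: item[1], reverse=True)
        survivors = dict(ranked[:max_active])
    return survivors

active = {"w1": -10.0, "w2": -12.0, "w3": -25.0, "w4": -11.0}
print(prune(active, beam=5.0, max_active=2))  # {'w1': -10.0, 'w4': -11.0}
```

The beam adapts to how confident the current frame is. The absolute limit caps memory and time in the worst case, for example in noisy frames where many hypotheses score similarly.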

Common Problems for Mobile Speech Processing
- Moore's Law works differently for mobile devices
  - Instead of getting faster, they get smaller and cheaper
  - Storage gets bigger, RAM doesn't
  - Memory doesn't get much faster
- Memory bandwidth is a major bottleneck
  - Making things smaller almost always makes them faster
  - Memory allocations can be very expensive (depending on the operating system)
- Audio input quality is often much lower
  - Typically 8kHz or 11kHz maximum sampling rate
  - Dubious microphones

Current and Future Research
- Incorporating user feedback in multimodal (speech + touch) applications
- Presenting information efficiently using speech synthesis
- Very low bitrate speech coding using ASR and TTS
- Distributed processing for mobile speech recognition
- Acoustic robustness for handheld mobile devices
- Voice and multimodal user interface design


More information

Sense Making in an IOT World: Sensor Data Analysis with Deep Learning

Sense Making in an IOT World: Sensor Data Analysis with Deep Learning Sense Making in an IOT World: Sensor Data Analysis with Deep Learning Natalia Vassilieva, PhD Senior Research Manager GTC 2016 Deep learning proof points as of today Vision Speech Text Other Search & information

More information

Calculating Bandwidth Requirements

Calculating Bandwidth Requirements Calculating Bandwidth Requirements Codec Bandwidths This topic describes the bandwidth that each codec uses and illustrates its impact on total bandwidth. Bandwidth Implications of Codec 22 One of the

More information

x64 Servers: Do you want 64 or 32 bit apps with that server?

x64 Servers: Do you want 64 or 32 bit apps with that server? TMurgent Technologies x64 Servers: Do you want 64 or 32 bit apps with that server? White Paper by Tim Mangan TMurgent Technologies February, 2006 Introduction New servers based on what is generally called

More information

Setting up for Adobe Connect meetings

Setting up for Adobe Connect meetings Setting up for Adobe Connect meetings When preparing to lead a live lecture or meeting, you probably ensure that your meeting room and materials are ready before your participants arrive. You run through

More information

ASR Resource Websites

ASR Resource Websites ATIM Module ASR Page 1 of 5 ASR Resource Websites Adaptive Solutions, Inc. Dragon Products: voice recognition products and microphones. http://www.talksight.com/home.html Ars technical: Review of Speech

More information

Voice Driven Animation System

Voice Driven Animation System Voice Driven Animation System Zhijin Wang Department of Computer Science University of British Columbia Abstract The goal of this term project is to develop a voice driven animation system that could take

More information

TeamAgenda Synchronization under Windows. Microsoft Outlook as a Synchronization Gateway

TeamAgenda Synchronization under Windows. Microsoft Outlook as a Synchronization Gateway TeamAgenda Synchronization under Windows This document describes the process of synchronizing TeamAgenda s Address Book, Calendar and To Do List with handheld devices such as Personal Digital Assistants

More information

Marathi Interactive Voice Response System (IVRS) using MFCC and DTW

Marathi Interactive Voice Response System (IVRS) using MFCC and DTW Marathi Interactive Voice Response System (IVRS) using MFCC and DTW Manasi Ram Baheti Department of CSIT, Dr.B.A.M. University, Aurangabad, (M.S.), India Bharti W. Gawali Department of CSIT, Dr.B.A.M.University,

More information

Introduction and Comparison of Common Videoconferencing Audio Protocols I. Digital Audio Principles

Introduction and Comparison of Common Videoconferencing Audio Protocols I. Digital Audio Principles Introduction and Comparison of Common Videoconferencing Audio Protocols I. Digital Audio Principles Sound is an energy wave with frequency and amplitude. Frequency maps the axis of time, and amplitude

More information

Develop Software that Speaks and Listens

Develop Software that Speaks and Listens Develop Software that Speaks and Listens Copyright 2011 Chant Inc. All rights reserved. Chant, SpeechKit, Getting the World Talking with Technology, talking man, and headset are trademarks or registered

More information

SPEAKER IDENTIFICATION FROM YOUTUBE OBTAINED DATA

SPEAKER IDENTIFICATION FROM YOUTUBE OBTAINED DATA SPEAKER IDENTIFICATION FROM YOUTUBE OBTAINED DATA Nitesh Kumar Chaudhary 1 and Shraddha Srivastav 2 1 Department of Electronics & Communication Engineering, LNMIIT, Jaipur, India 2 Bharti School Of Telecommunication,

More information

Designing the NEWCARD Connector Interface to Extend PCI Express Serial Architecture to the PC Card Modular Form Factor

Designing the NEWCARD Connector Interface to Extend PCI Express Serial Architecture to the PC Card Modular Form Factor Designing the NEWCARD Connector Interface to Extend PCI Express Serial Architecture to the PC Card Modular Form Factor Abstract This paper provides information about the NEWCARD connector and board design

More information

Thin Client Development and Wireless Markup Languages cont. VoiceXML and Voice Portals

Thin Client Development and Wireless Markup Languages cont. VoiceXML and Voice Portals Thin Client Development and Wireless Markup Languages cont. David Tipper Associate Professor Department of Information Science and Telecommunications University of Pittsburgh tipper@tele.pitt.edu http://www.sis.pitt.edu/~dtipper/2727.html

More information

Dragon Solutions Enterprise Profile Management

Dragon Solutions Enterprise Profile Management Dragon Solutions Enterprise Profile Management summary Simplifying System Administration and Profile Management for Enterprise Dragon Deployments In a distributed enterprise, IT professionals are responsible

More information

How To Synchronize With A Cwr Mobile Crm 2011 Data Management System

How To Synchronize With A Cwr Mobile Crm 2011 Data Management System CWR Mobility Customer Support Program Page 1 of 10 Version [Status] May 2012 Synchronization Best Practices Configuring CWR Mobile CRM for Success Whitepaper Copyright 2009-2011 CWR Mobility B.V. Synchronization

More information

Robust Methods for Automatic Transcription and Alignment of Speech Signals

Robust Methods for Automatic Transcription and Alignment of Speech Signals Robust Methods for Automatic Transcription and Alignment of Speech Signals Leif Grönqvist (lgr@msi.vxu.se) Course in Speech Recognition January 2. 2004 Contents Contents 1 1 Introduction 2 2 Background

More information

Dovid Coplon, Product Management Director http://gipscorp.com

Dovid Coplon, Product Management Director http://gipscorp.com Dovid Coplon, Product Management Director http://gipscorp.com VoIP Quality Mobile VoIP & Technology Trends Greater accessibility and affordability Network operators are introducing new pricing models Handset

More information

Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program.

Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program. Name: Class: Date: Exam #1 - Prep True/False Indicate whether the statement is true or false. 1. Programming is the process of writing a computer program in a language that the computer can respond to

More information

Exhibit n.2: The layers of a hierarchical network

Exhibit n.2: The layers of a hierarchical network 3. Advanced Secure Network Design 3.1 Introduction You already know that routers are probably the most critical equipment piece in today s networking. Without routers, internetwork communication would

More information

IVR Primer Introduction

IVR Primer Introduction IVR Primer Introduction Speech-enabled applications are quickly becoming very popular. Why? Because using voice to navigate is more natural for users than punching telephone keypads. Speech as a navigation

More information

Video-Conferencing System

Video-Conferencing System Video-Conferencing System Evan Broder and C. Christoher Post Introductory Digital Systems Laboratory November 2, 2007 Abstract The goal of this project is to create a video/audio conferencing system. Video

More information

2014/02/13 Sphinx Lunch

2014/02/13 Sphinx Lunch 2014/02/13 Sphinx Lunch Best Student Paper Award @ 2013 IEEE Workshop on Automatic Speech Recognition and Understanding Dec. 9-12, 2013 Unsupervised Induction and Filling of Semantic Slot for Spoken Dialogue

More information

Abstract. Cycle Domain Simulator for Phase-Locked Loops

Abstract. Cycle Domain Simulator for Phase-Locked Loops Abstract Cycle Domain Simulator for Phase-Locked Loops Norman James December 1999 As computers become faster and more complex, clock synthesis becomes critical. Due to the relatively slower bus clocks

More information

Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition

Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition , Lisbon Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition Wolfgang Macherey Lars Haferkamp Ralf Schlüter Hermann Ney Human Language Technology

More information

Introduction to acoustic imaging

Introduction to acoustic imaging Introduction to acoustic imaging Contents 1 Propagation of acoustic waves 3 1.1 Wave types.......................................... 3 1.2 Mathematical formulation.................................. 4 1.3

More information

Tutorial about the VQR (Voice Quality Restoration) technology

Tutorial about the VQR (Voice Quality Restoration) technology Tutorial about the VQR (Voice Quality Restoration) technology Ing Oscar Bonello, Solidyne Fellow Audio Engineering Society, USA INTRODUCTION Telephone communications are the most widespread form of transport

More information

Solid State Drive Architecture

Solid State Drive Architecture Solid State Drive Architecture A comparison and evaluation of data storage mediums Tyler Thierolf Justin Uriarte Outline Introduction Storage Device as Limiting Factor Terminology Internals Interface Architecture

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

The Problem with Faxing over VoIP Channels

The Problem with Faxing over VoIP Channels The Problem with Faxing over VoIP Channels Lower your phone bill! is one of many slogans used today by popular Voice over IP (VoIP) providers. Indeed, you may certainly save money by leveraging an existing

More information

ABSTRACT 2. SYSTEM OVERVIEW 1. INTRODUCTION. 2.1 Speech Recognition

ABSTRACT 2. SYSTEM OVERVIEW 1. INTRODUCTION. 2.1 Speech Recognition The CU Communicator: An Architecture for Dialogue Systems 1 Bryan Pellom, Wayne Ward, Sameer Pradhan Center for Spoken Language Research University of Colorado, Boulder Boulder, Colorado 80309-0594, USA

More information

ORGANIZER QUICK REFERENCE GUIDE. Install GoToMeeting. Schedule a Meeting. Start a Scheduled Meeting. Start an Instant Meeting.

ORGANIZER QUICK REFERENCE GUIDE. Install GoToMeeting. Schedule a Meeting. Start a Scheduled Meeting. Start an Instant Meeting. GoToMeeting organizers on both personal and corporate plans can hold meetings with up to 25 attendees. Organizers must first create a GoToMeeting account and then download the GoToMeeting desktop application

More information

Technology Finds Its Voice. February 2010

Technology Finds Its Voice. February 2010 Technology Finds Its Voice February 2010 Technology Finds Its Voice Overview Voice recognition technology has been around since the early 1970s, but until recently the promise of new advances has always

More information