Artificial speech in communications research


Artificial speech in communications research. ASHA 2011. Brad Story, Speech, Language, and Hearing Sciences, University of Arizona. Research supported by NIH R01-04789.

What is artificial speech? Speech produced by mechanical, electronic, or digital means that emulates the human speech production system and/or the acoustic characteristics of the speech signal. (Diagram: production, speech signal, perception. Example: MITalk, 1979.)

The Historical Challenge of Artificial Speech. "Without doubt it would be one of the most important discoveries to construct a machine that could properly express all sounds and tones of our speech with all articulations." Leonhard Euler (1761), Letters to a German Princess. Our ability to produce convincing artificial speech is a measure of the degree to which we understand human speech production.

What purpose does artificial speech serve? As an augmentative device to aid those whose speech production system is impaired. L. Euler (1761): "The preachers and orators whose voices were not strong or attractive enough could then play their sermons and discourses on such a[n artificial speech] machine, in the way that the organ players perform their pieces of music."

What purpose does artificial speech serve? As an augmentative device to aid those whose speech production system is impaired; as an educational tool; as an entertainment device; as a text reading system; and as a research tool that facilitates investigations of speech production and speech perception.


Development of artificial speech. (Timeline: pre-1600, 1600, 1700, 1800, 1900, 2000.)

Talking Heads (deceptions): Albertus Magnus (1198-1280); Roger Bacon (1214-1294).

C. G. Kratzenstein (1780)*, German physiologist/physicist. Used pressure-driven reeds to excite air cavities corresponding to five vowels. (Won a prize offered by the Imperial Academy of St. Petersburg under the guidance of Leonhard Euler.) This device didn't actually "talk"; it only produced static sounds that replicated the acoustic characteristics of the five vowels. *May have been an inspiration for Mary Shelley's Dr. Frankenstein.

Wolfgang von Kempelen (1791)*, Hungarian engineer/industrialist. Spent 20 years developing a talking machine that, to a large degree, simulated the human speech production system. A bellows was used to drive the vibration of a metal reed, and a pliable leather tube served as the vocal tract. The original purpose was for the machine to become an augmentative device. *His credibility was compromised by his construction of a chess automaton, which truly was a deception.

Joseph Faber (1844-), German anatomist/mechanic. "The Amazing Talking Machine": perhaps the most well-designed and functional mechanical talking machine. It represented the sound-generating parts of the speech production system (simulation).

What was Faber's purpose in developing a talking machine? It seems to have simply been a desire to create a machine that speaks like a human (to simulate human speech production). Some observers noted that the machine had a strong German accent but in general spoke better English than Faber himself. Faber's own speaking patterns were imposed on the hand and foot motions used to produce speech with the machine!

1928-1940: "In October, 1928, Homer Dudley of Bell Telephone Laboratories sketched in his technical notebook a device which subsequently became known as 'vocoder', a term derived from the words VOice and CODER." (Schroeder, 1966, IEEE.) This was the beginning of electronic speech coding and electronic speech synthesis.

Vocoder. The idea: send speech signals over the transatlantic telegraph cable(s). The problem: cable bandwidth was 100 Hz, whereas telephone-quality speech bandwidth is about 3000 Hz. But Dudley knew that the speech articulators move rather slowly and, as a result, produce slowly varying spectral characteristics; i.e., the wide (3000 Hz) speech bandwidth results from the comparatively high frequency of the voice excitation, not from the movement of the articulators. The solution: transmit only the slowly varying spectral characteristics and supply the high-frequency excitation locally.

Decompose, send, and remake (synthesize) the speech: filter the speech into discrete frequency bands; extract the amplitude envelope in each band; send only the envelopes; unpack the envelopes at the receiving end; use them to remake the speech signal. The high-frequency excitation (carrier) signal is provided locally. (Diagram: an excitation "buzzer" feeds Filters 1 through n, each followed by an envelope modulator, Env. Mod. 1 through n.)
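The decompose/send/remake chain on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the band centres, the 200 Hz filter bandwidth, the 25 Hz envelope cutoff, the simple two-pole filters, and every function name are hypothetical choices, not Dudley's actual design.

```python
import math

def resonator(x, fc, bw, fs):
    """Two-pole band-pass filter centred at fc Hz with bandwidth bw Hz."""
    r = math.exp(-math.pi * bw / fs)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * fc / fs)
    a2 = -r * r
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = (1.0 - r) * s + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def envelope(x, cutoff, fs):
    """Rectify, then smooth with a one-pole low-pass (cutoff in Hz)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff / fs)
    e = 0.0
    out = []
    for s in x:
        e += alpha * (abs(s) - e)
        out.append(e)
    return out

def buzz(n, f0, fs):
    """Locally supplied excitation: one unit pulse per pitch period."""
    period = int(round(fs / f0))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def analyze(speech, centers, bw, fs, env_cutoff=25.0):
    """Sender side: one slowly varying envelope per frequency band."""
    return [envelope(resonator(speech, fc, bw, fs), env_cutoff, fs)
            for fc in centers]

def synthesize(envelopes, centers, bw, fs, f0=100.0):
    """Receiver side: the envelopes modulate band-filtered copies of the buzz."""
    n = len(envelopes[0])
    carrier = buzz(n, f0, fs)
    out = [0.0] * n
    for env, fc in zip(envelopes, centers):
        band = resonator(carrier, fc, bw, fs)
        for i in range(n):
            out[i] += env[i] * band[i]
    return out

# Hypothetical demo: a 1000 Hz tone stands in for the speech input.
fs = 8000
centers = [250.0, 500.0, 1000.0, 2000.0, 3000.0]
tone = [math.sin(2.0 * math.pi * 1000.0 * i / fs) for i in range(800)]
envs = analyze(tone, centers, 200.0, fs)
rebuilt = synthesize(envs, centers, 200.0, fs)
```

Only `envs` would need to cross the cable in this sketch; because each envelope is smoothed to a few tens of hertz, it varies far more slowly than the 3000 Hz speech signal it describes.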

Dudley's next step: artificial speech as a means of understanding human speech production/perception. "After one believes he has a good understanding of the physical nature of speech, there comes the acid test of whether he understands the construction of speech well enough to fashion it from suitably chosen elements." (Dudley, 1940.) Relating the vocoder to human speech production.

VODER = Voice Operation DEmonstratoR (1939). Ten finger keys control the amplitude modulation (envelope) of each frequency band. Essentially the same as the vocoder, except that a human operator generates the input with finger, wrist, and foot controls. Operators required at least one year of training to become fluent, intelligible "speakers."

The VODER was designed as an exhibit for the 1939 San Francisco and New York World's Fairs.

Similarities of a human talker and the Voder. In both cases:
1. The message originates in the brain of the sender.
2. Control signals are transmitted by the talker's nervous system to the appropriate muscles.
3. Muscles produce displacements of body parts, formulating speech information as mechanical (syllabic) message waves. Human: vocal tract movements. Voder: fingers, wrist, and foot.
4. Slow modulations (message waves) are superimposed on a high-frequency carrier (voice or noise) to make the signal audible.

Common Thread: The operators of these synthesizers developed and internalized a set of rules or principles by which they coaxed the device into talking. In other words, they learned to "play" the machine.

The Sound Spectrograph (1946). Now the interest in artificial speech shifted to what could be seen in a spectrogram (i.e., formants).

Playback Synthesis: Pattern Playback (F. Cooper, 1950; 1952). To synthesize speech, the user would literally paint the formants on a transparent film. (Figure: painted pattern, frequency in Hz versus time.)

Pattern Playback was used to generate stimuli for many types of speech perception experiments. Q: What is significant in the spectrographic pattern and what is not?

Formant Synthesis: electrical resonators tuned to the formant frequencies observed in a spectrogram. (Diagram: an input source at F0 passes through a transfer function (filter) made of resonators R1 to R5, tuned to formants F1 to F5, to produce the output.)
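As a rough digital counterpart of the resonator diagram, the sketch below passes a pulse-train source through second-order resonators connected in series. The cascade arrangement, the three formant frequencies and bandwidths, and all function names are assumptions chosen for illustration; the slide does not specify them.

```python
import cmath
import math

def resonator(x, f, bw, fs):
    """Second-order digital resonator with unity gain at 0 Hz."""
    c = -math.exp(-2.0 * math.pi * bw / fs)
    b = 2.0 * math.exp(-math.pi * bw / fs) * math.cos(2.0 * math.pi * f / fs)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = a * s + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def pulse_train(n, f0, fs):
    """Source: one unit pulse per fundamental period."""
    period = int(round(fs / f0))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def synthesize_vowel(formants, bandwidths, f0, fs, n):
    """Filter the source through one resonator per formant, in series."""
    signal = pulse_train(n, f0, fs)
    for f, bw in zip(formants, bandwidths):
        signal = resonator(signal, f, bw, fs)
    return signal

def magnitude_at(x, f, fs):
    """Magnitude of the discrete Fourier sum of x at frequency f (Hz)."""
    return abs(sum(s * cmath.exp(-2j * math.pi * f * i / fs)
                   for i, s in enumerate(x)))

# Hypothetical neutral-vowel settings (Hz); not taken from the slides.
fs = 10000
vowel = synthesize_vowel((500.0, 1500.0, 2500.0), (60.0, 90.0, 150.0),
                         f0=100.0, fs=fs, n=2000)
peak = magnitude_at(vowel, 500.0, fs)     # harmonic at the first formant
valley = magnitude_at(vowel, 1000.0, fs)  # harmonic between F1 and F2
```

Changing the formant tuple over time, frame by frame, is what turns a static vowel like this one into the moving formant patterns seen in a spectrogram.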

1950-1970: Formant synthesis examples. Gunnar Fant, OVE, 1953; Walter Lawrence, PAT, 1953; Gunnar Fant, OVE II, 1962; Walter Lawrence, PAT, 1962; Klatt, Stevens, Holmes, Rosen, (Figure: F1 and F2 tracks versus time.)

Note: Formant synthesizers were a starting point for text-to-speech systems, which need a set of rules to transform orthographic representations into phonetic and, finally, acoustic ones.

Along came something new (or old)... Articulatory Synthesis: mathematical replication of the physics and physiology of the speech production system; a computational form of the speaking machines of previous centuries. Allows control of the positions and physical characteristics of the tongue, velum, jaw, lips, and larynx/vocal folds. (Coker, 1968; 1976; Mermelstein, 1973; Rubin et al., 1981.)

Computational Models of Articulatory Structures (synthesis, simulation). Figure labels: tongue FEM, velum FEM, vocal folds. Works cited on the slide: Wilhelms-Tricarico (1995); Baker (2008); Dang and Honda (2004); Perrier et al. (2003); Berry et al. (1999); Alipour and Titze; Thomson, Mongeau & Frankel (2005).

"The traditional motivation for research in speech synthesis has been simply to explain how humans use their vocal tracts to produce connected speech." I. Mattingly (1974), Speech synthesis for phonetic and phonological models. (Figure: vocal tract representations ranging from complex to simple, labeled pharynx, vocal tract, oral cavity, lips, glottis, trachea.)

Complexity vs. Simplicity: "... indeed, the purpose of a model is to substitute simple structures for complex ones." F. Cooper (1961), Speech synthesizers.

TubeTalker: Airway Modulation Model*. Modulation of vocal tract shape (gestures): 1-D wave propagation with losses. Voice source: glottal flow based on nonlinear interaction with vocal tract and tracheal pressures. Energy source. *Story (2005). JASA, 117, 3231-3254.

Modulation of the glottis by vocal fold vibration; modulation of the vocal tract shape. (Figure: vocal folds; Titze, 2006.)

Build a phrase: Happy Birthday

A. Starting point: neutral vocal tract shape, constant voice source. Even though the neutral vowel is "neutral," it does carry information about the speaker's identity.

Vocal tract modulations: a continuous flow of vowel transitions interrupted by consonant constrictions. (Diagram: C V V = V C V; "Happy Birthday.")

B: Vowel transitions

B. Second step: modulate vocal tract shape for vowel transitions, constant voice source

C. Impose consonants (constrictions) on the flow of vowel transitions. (Figure labels: d; p, b.)
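Steps B and C can be pictured as a slowly gliding vowel shape multiplied by a brief, localized constriction. The sketch below is a loose illustration under assumptions, not the TubeTalker formulation: the ten-section area functions, the Gaussian constriction shape, the raised-cosine timing, and all names and numbers are hypothetical.

```python
import math

def interpolate(a, b, w):
    """Blend two area functions (cm^2 per section); w=0 gives a, w=1 gives b."""
    return [(1.0 - w) * x + w * y for x, y in zip(a, b)]

def constriction(n_sections, location, width, magnitude):
    """Scaling that is near 1 far from `location` and 1 - magnitude at it."""
    return [1.0 - magnitude * math.exp(-((i - location) / width) ** 2)
            for i in range(n_sections)]

def area_at(t, vowel_a, vowel_b, cons_loc, cons_time, cons_dur, width=2.0):
    """Area function at normalized time t in [0, 1]."""
    # Step B: smooth vowel-to-vowel transition.
    w = 0.5 - 0.5 * math.cos(math.pi * t)
    base = interpolate(vowel_a, vowel_b, w)
    # Step C: constriction strength rises and falls around cons_time.
    d = abs(t - cons_time) / (cons_dur / 2.0)
    mag = 0.5 * (1.0 + math.cos(math.pi * d)) if d < 1.0 else 0.0
    scale = constriction(len(base), cons_loc, width, mag)
    return [a * s for a, s in zip(base, scale)]

# Hypothetical area functions, index 0 near the glottis, index 9 at the lips.
vowel_a = [0.5, 1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.5, 2.0]
vowel_b = [0.5, 3.0, 4.0, 3.5, 2.0, 1.0, 0.5, 0.8, 1.5, 2.5]

# A lip closure (as for p or b) imposed midway through the transition.
start = area_at(0.0, vowel_a, vowel_b, cons_loc=9, cons_time=0.5, cons_dur=0.4)
closed = area_at(0.5, vowel_a, vowel_b, cons_loc=9, cons_time=0.5, cons_dur=0.4)
end = area_at(1.0, vowel_a, vowel_b, cons_loc=9, cons_time=0.5, cons_dur=0.4)
```

Moving `cons_loc` back from the lips would place the closure elsewhere along the tract, which is how a consonant such as d would differ from p or b in this toy picture.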


D. Not there yet: the voice source still needs to be modified. The fundamental frequency F0 (the vibrational frequency of the vocal folds) must change over the phrase. The vocal folds must be abducted for voiceless consonants (and respiration) and adducted for voiced consonants and vowels.

Example: Vocal fold vibration with adduction and abduction
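The two source controls named in step D, a changing F0 and adduction/abduction of the vocal folds, can be illustrated with a toy source signal. Everything here is an assumption made for the sketch: the sine-squared pulse shape, the 0.6 open fraction, the abrupt gating, and the contour values are hypothetical, and the sketch omits the aspiration noise and the nonlinear source-tract interaction that the model described in the talk includes.

```python
import math

def voice_source(f0_track, adduction, fs, open_fraction=0.6):
    """Toy glottal-flow signal.

    f0_track  : F0 in Hz, one value per sample
    adduction : 0.0 (abducted, no voicing) to 1.0 (adducted), per sample
    """
    phase = 0.0
    out = []
    for f0, ad in zip(f0_track, adduction):
        # Advance the cycle phase at the current F0.
        phase = (phase + f0 / fs) % 1.0
        # Flow pulse during the open part of the cycle, zero when closed.
        if phase < open_fraction:
            pulse = math.sin(math.pi * phase / open_fraction) ** 2
        else:
            pulse = 0.0
        out.append(ad * pulse)
    return out

# Hypothetical 0.2 s contour: F0 falls from 120 Hz to 100 Hz, and the folds
# are abducted (voiceless) between samples 600 and 1000.
fs = 8000
n = 1600
f0_track = [120.0 - 20.0 * i / (n - 1) for i in range(n)]
adduction = [0.0 if 600 <= i < 1000 else 1.0 for i in range(n)]
source = voice_source(f0_track, adduction, fs)
```

In a fuller sketch the adduction track would change gradually rather than switch between 0 and 1, so that voicing fades in and out around a voiceless consonant.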

Finally, all of it together


Modifications to the system for research purposes: scaling of the vocal tract and vocal folds; hypo-adduction and vocal fold asymmetry (Robin Samlan); vocal tremor (Rosemary Lester); insufficient closure of the nasal port (Co-PI: Kate Bunton); centralized vowels; alternate timing of vocal tract movement.

TubeTalker model scaled for age. (Figure: vocal tract and trachea for an adult, a 6-year-old, and a 2-year-old.) The objective is to develop a model of sound production in children for the purpose of understanding how a child uses this system to generate speech.
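One reason scaling matters can be seen with the textbook approximation of the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances fall at odd multiples of c/(4L). The sketch below uses that simplification only; it is not the scaled TubeTalker model, and the tract lengths assigned to each age are hypothetical placeholders rather than measured values.

```python
def tube_formants(length_m, n_formants=3, c=350.0):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other.

    length_m : tube length in metres
    c        : speed of sound in m/s (assumed value for warm, moist air)
    """
    return [(2 * k - 1) * c / (4.0 * length_m)
            for k in range(1, n_formants + 1)]

# Hypothetical vocal tract lengths in metres; placeholders, not data.
lengths = {"adult": 0.175, "6 yr": 0.13, "2 yr": 0.10}
formants = {age: tube_formants(length) for age, length in lengths.items()}

for age, values in formants.items():
    print(age, [round(v) for v in values])
```

Shorter tubes give proportionally higher resonances, so a child's speech cannot be modeled by reusing adult formant values unchanged.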

Synthesis of singing (soprano). Nonlinear interaction of source and filter. Q: What is significant about the movement patterns of the vocal tract and vocal folds, and what is not?

"The essential point here, as in all science, is that we must simplify nature if we are to understand nature. The great virtue of speech synthesizers is that they help us make such simplifications." F. Cooper (1961), Speech synthesizers.

The End
