Overview. 9/8/11 PSY Speech 1

Similar documents
The sound patterns of language

Lecture 12: An Overview of Speech Recognition

Articulatory Phonetics. and the International Phonetic Alphabet. Readings and Other Materials. Introduction. The Articulatory System

English Phonetics: Consonants (i)

Articulatory Phonetics. and the International Phonetic Alphabet. Readings and Other Materials. Review. IPA: The Vowels. Practice

Thirukkural - A Text-to-Speech Synthesis System

Workshop Perceptual Effects of Filtering and Masking Introduction to Filtering and Masking

L3: Organization of speech sounds

Some Basic Concepts Marla Yoshida

Speech Therapy for Cleft Palate or Velopharyngeal Dysfunction (VPD) Indications for Speech Therapy

Lecture 1-10: Spectrograms

4 Phonetics. Speech Organs

Mathematical modeling of speech acoustics D. Sc. Daniel Aalto

Bachelors of Science Program in Communication Disorders and Sciences:

SPEECH Biswajeet Sarangi, B.Sc.(Audiology & speech Language pathology)

Phonetics and Phonology

Prelinguistic vocal behaviors. Stage 1 (birth-1 month) Stage 2 (2-3 months) Stage 4 (7-9 months) Stage 3 (4-6 months)

Phonemic Awareness. Section III

Spanish-influenced English: Typical phonological patterns in the English language learner

Introduction PURPOSE OF THE GUIDE

Speech Production 2. Paper 9: Foundations of Speech Communication Lent Term: Week 4. Katharine Barden

Trigonometric functions and sound

How can a speech-language pathologist assess velopharyngeal function without instrumentation?

MATTHEW K. GORDON (University of California, Los Angeles) THE NEUTRAL VOWELS OF FINNISH: HOW NEUTRAL ARE THEY? *

Guidelines for Transcription of English Consonants and Vowels

The Vowels & Consonants of English

Ph.D in Speech-Language Pathology

Things to remember when transcribing speech

5 Free Techniques for Better English Pronunciation

Basic Acoustics and Acoustic Filters

Common Pronunciation Problems for Cantonese Speakers

Understanding Impaired Speech. Kobi Calev, Morris Alper January 2016 Voiceitt

SEDAT ERDOĞAN. Ses, Dil, Edebiyat, Öğrenim... TEMEL İNGİLİZCE. Ses dilin temelidir, özüdür... Türkiye de ses öğrenimi

EARLY INTERVENTION: COMMUNICATION AND LANGUAGE SERVICES FOR FAMILIES OF DEAF AND HARD-OF-HEARING CHILDREN

Functional Auditory Performance Indicators (FAPI)

Common Phonological processes - There are several kinds of familiar processes that are found in many many languages.

The Pronunciation of the Aspirated Consonants P, T, and K in English by Native Speakers of Spanish and French

Author's Name: Stuart Davis Article Contract Number: 17106A/0180 Article Serial Number: Article Title: Loanwords, Phonological Treatment of

Speech Signal Processing: An Overview

Stricture and Nasal Place Assimilation. Jaye Padgett

TOOLS for DEVELOPING Communication PLANS

Portions have been extracted from this report to protect the identity of the student. RIT/NTID AURAL REHABILITATION REPORT Academic Year

Effects of Pronunciation Practice System Based on Personalized CG Animations of Mouth Movement Model

Domain and goal Activities Dancing game Singing/Vocalizing game Date What did your child do?

4 Phonetics and Phonology

Your Hearing ILLUMINATED

CONSONANTS (ordered by manner of articulation) Chapters 4, 6, 7. The larynx s structure is made of cartilage.

BBC Learning English - Talk about English July 18, 2005

Pronunciation Difficulties of Japanese Speakers of English: Predictions Based on a Contrastive Analysis Steven W. Carruthers

THE VOICE OF LOVE. Trisha Belanger, Caroline Menezes, Claire Barboa, Mofida Helo, Kimia Shirazifard

ACOUSTICAL CONSIDERATIONS FOR EFFECTIVE EMERGENCY ALARM SYSTEMS IN AN INDUSTRIAL SETTING

Thai Pronunciation and Phonetic Symbols Prawet Jantharat Ed.D.

Lecture 1-6: Noise and Filters

A Sound Analysis and Synthesis System for Generating an Instrumental Piri Song

THESE ARE A FEW OF MY FAVORITE THINGS DIRECT INTERVENTION WITH PRESCHOOL CHILDREN: ALTERING THE CHILD S TALKING BEHAVIORS

Indiana Department of Education

Nonverbal Factors in Influence

PERCENTAGE ARTICULATION LOSS OF CONSONANTS IN THE ELEMENTARY SCHOOL CLASSROOMS

An Arabic Text-To-Speech System Based on Artificial Neural Networks

Office Phone/ / lix@cwu.edu Office Hours: MW 3:50-4:50, TR 12:00-12:30

The Phonological Role in English Pronunciation Instruction

Glossary of commonly used Speech Therapy/Language terms

Tutorial about the VQR (Voice Quality Restoration) technology

Lecture 4: Jan 12, 2005

Neurogenic Disorders of Speech in Children and Adults

Monophonic Music Recognition

Department of English and American Studies. English Language and Literature

From Concept to Production in Secure Voice Communications

DRA2 Word Analysis. correlated to. Virginia Learning Standards Grade 1

62 Hearing Impaired MI-SG-FLD062-02

Quarterly Progress and Status Report. An ontogenetic study of infant speech perception

Create stories, songs, plays, and rhymes in play activities. Act out familiar stories, songs, rhymes, plays in play activities

NLPA-Phon1 (4/10/07) P. Coxhead, 2006 Page 1. Natural Language Processing & Applications. Phones and Phonemes

Chapter 4 COMMUNICATION SKILLS. The difference between verbal and nonverbal communication. The difference between hearing and listening

Part 1: Physiology. Below is a cut-through view of a human head:

Waveforms and the Speed of Sound

Quarterly Progress and Status Report. Preaspiration in Southern Swedish dialects

Fast Track to Reading Arabic Notes

ACOUSTIC CHARACTERISTICS OF CLEARLY SPOKEN ENGLISH TENSE AND LAX VOWELS

Phonics. Phonics is recommended as the first strategy that children should be taught in helping them to read.

SPEAKER IDENTIFICATION FROM YOUTUBE OBTAINED DATA

BBC Learning English - Talk about English July 11, 2005

have more skill and perform more complex

L2 EXPERIENCE MODULATES LEARNERS USE OF CUES IN THE PERCEPTION OF L3 TONES

Resonance in a Closed End Pipe

Acoustics and perception of emphasis in Urban Jordanian Arabic

Unit 1 Title: Word Work Grade Level: 1 st Grade Timeframe: 6 Weeks

Copyright 2008 Pearson Education, Inc., publishing as Pearson Addison-Wesley.

Alphabetic Knowledge / Exploring with Letters

Annotation Pro Software Speech signal visualisation, part 1

Program curriculum for graduate studies in Speech and Music Communication

Aircraft cabin noise synthesis for noise subjective analysis

THE ASYMMETRY OF C/V COARTICULATION IN CV AND VC

Formant Bandwidth and Resilience of Speech to Noise

Short vowel a The Apple huge Short vowel o Cute Baby [Boston accent] standard dialect

AP1 Waves. (A) frequency (B) wavelength (C) speed (D) intensity. Answer: (A) and (D) frequency and intensity.

The use of Praat in corpus research

Intonation difficulties in non-native languages.

Linear Predictive Coding

INCREASE YOUR PRODUCTIVITY WITH CELF 4 SOFTWARE! SAMPLE REPORTS. To order, call , or visit our Web site at

Transcription:

Overview 1) Speech articulation and the sounds of speech. 2) The acoustic structure of speech. 3) The classic problems in understanding speech perception: segmentation, units, and variability. 4) Basic perceptual data and the mapping of sound to phoneme. 5) Higher level influences on perception. 6) Physiology of speech perception and language. 9/8/11 PSY 719 - Speech 1

Vocal Tract 9/8/11 PSY 719 - Speech 2

Articulation - 1 The speech signal is the result of the movement of the tongue, lips, jaw, and vocal cords in modifying the air stream from the lungs. The movement of the articulators takes time (they have inertia). The movements are rapid (for communication). This leads to the phenomena of coarticulation. The preceding segment alters the precise realization of the current segment (inertia or perseveration). The next segment also alters how the current segment is realized (planning or anticipation). 9/8/11 PSY 719 - Speech 3

Articulation - 2 For example, when a vowel sound is produced, the vocal cords vibrate and the tongue is in a particular position within the oral cavity. The lips are open (either spread or rounded). For a nasal consonant, such as /m/, the uvula is pulled down and sound is allowed to resonate (flow) through the nasal cavity. During /m/ production, the lips are closed for a brief interval. For a fricative consonant, such as /s/, the vocal folds are held open (they do not vibrate) and air is forced through a narrow opening between the tongue and the alveolar ridge. This produces the noise quality of /s/. 9/8/11 PSY 719 - Speech 4

Articulation - 3 Basic dimensions of articulation: 1) Voicing - Vocal folds vibrate or are held open (voiced or voiceless) 2) Nasalization - Nasalized or not (uvula closed) 3) Place - Location in vocal tract of constriction bilabial, labiodental, inter-dental, alveolar, palatal, velar, glottal 4) Manner - Degree of constriction in vocal tract (open, moderate, constricted and closed) 9/8/11 PSY 719 - Speech 5

Phonemes Phonemes are the smallest segment of the signal that, if changed, would produce a different word with a different meaning. Thus, while words carry meaning, phonemes are the units from which words are built. /m/ and /b/ are different phonemes in English because /mqd/ (mad) and /bqd/ (bad) are different words. Different languages have different numbers of phonemes (Hawaiian has 11, Midwestern American English has 39), but all come from a universal set. All languages divide their inventory of phonemes into vowels and consonants. All languages group phonemes into sequences to form syllables. 9/8/11 PSY 719 - Speech 6

Phonemes - 2 Every phoneme represents the coordinated movement of the articulators that results in a different sound. Because the movement from one phoneme to the next is continuous, the precise sound that represents a phoneme varies with the nature of what precedes and follows it. This phenomenon is called coarticulation. The position of the articulators is similar to the different pipes in an organ. The position of the tongue, lips and jaw produces a set of resonators (tubes with a particular length and area). These amplify some frequencies and attenuate others. The pattern of this frequency information, over time, is speech. 9/8/11 PSY 719 - Speech 7

Some Terms The terms to describe the segments include allophonic, phonetic, phonemic and phonological: Phonetic a description of the sound segment that includes details of production. [p h ] vs [p] Phonemic a description of the sound segment where anything predictable is omitted. /p/ Allophonic variations within a segment category (phonetic) that do no change the identity of the segment (phonemic). Phonological The sound segment inventory and constraints on sequences. 9/8/11 PSY 719 - Speech 8

Acoustics of Speech The speech signal is broken down by the ear into a representation of the intensity at each frequency over time. The sound spectrogram is a similar representation of the acoustic information in speech. Dark areas are concentrations of energy at a particular frequency. When such a concentration occurs over time, it is called a formant. In the next graph, the energy in four syllables that start with the consonant /b/ and end in the consonant /g/ are shown. These syllables differ in the vowel and illustrate the different sound for these four vowels. 9/8/11 PSY 719 - Speech 9

bvg Examples time (100 msec) 5 formant frequency (khz) 3 1 /big/ /beg/ /bçg/ /bug/ 9/8/11 PSY 719 - Speech 10

Speech Acoustics - 2 The formants in the speech signal vary with the position of the tongue, lips and jaw. They are a cue for listeners to recognize the sounds of speech. For the vowel in /beg/ (beg), spoken by this male talker, the center frequencies of the first three formants at the middle of the syllable are approximately 560 Hz, 1750 Hz and 2400 Hz. For the consonants /b/ and /g/ at the beginning and end of each syllable, the formants change rapidly over time. These changes are called formant transitions and are critical to our ability to recognize consonants and vowels. 9/8/11 PSY 719 - Speech 11

Consonants and Vowels The sounds of speech vary in: 1) The frequencies and intensities of the formants 2) The pattern of change in the formants, over time 3) Voicing (whether the vocal folds vibrate or not) 4) The presence of nasal formants 5) Presence of noise in the spectrum and the frequency/ intensity distribution of the noise 6) The duration of 1 through 5 above 9/8/11 PSY 719 - Speech 12

The Challenge of Speech Perception The question of how humans perceive speech is complex because of two classical problems: 1) How is a continuous signal divided up into phonemes, syllables and words? 2) How does the listener recognize the sequence of phonemes, syllables and words when the speech signal changes because of differences in speaker, speaking rate, dialect, and coarticulation? 9/8/11 PSY 719 - Speech 13

Segmentation Speech is continuous. There are no breaks between words in fluent speech. One effect of coarticulation is to smear the boundaries between adjacent phonemes, syllables and words. As an illustration, consider the following sentence. Where does one word end and the next begin? The sentence is shown first as a waveform then as a spectrogram. 9/8/11 PSY 719 - Speech 14

Segmentation Illustration - 1 amplitude time(msec) In this sentence, where are the boundaries between words? 9/8/11 PSY 719 - Speech 15

Segmentation Illustration - 2 5 frequency (khz) 3 1 time 9/8/11 PSY 719 - Speech 16

Segmentation - 2 In fluent speech, silence occurs when we take a breath, when we deliberately pause for emphasis (or to think), or when a particular type of phoneme occurs (stops) in which the vocal tract is briefly closed. These silence intervals that occur with stops can be within words or between words. 9/8/11 PSY 719 - Speech 17

Segmentation Illustration - 3 Our waiter was rudely interrupted while working. silent gaps waiter interrupted working 9/8/11 PSY 719 - Speech 18

Variability Variability refers to the changes that occur in phonemes, syllables, and words because: 1) They occur with different other sounds before and after them. This influence of coarticulation alters the sound for a phoneme based on what came before and after. 2) Different talkers have different length vocal tracts and speak with different dialects and idiolects. 3) Talkers vary their rate of speech and the accuracy (carefulness) of their articulation. 9/8/11 PSY 719 - Speech 19

Variability: Coarticulation The formant transitions that characterize a /b/ or a /g/ change with the vowel. That is, the acoustic details of /b/ in the words beat, bit, bet, bat, box, bought, boat, book, boot, but, bird, bite, bout, and boy are different. One of the primary goals of research in speech is to find a way to characterize a pattern of change in the sound which is the same for all examples of a particular phoneme (e. g. /b/s) and distinguishes it from other phonemes (an invariant). Finding such a pattern for each consonant and vowel has so far proved elusive. 9/8/11 PSY 719 - Speech 20

Variability: Talkers 1) Individuals differ in vocal tract length and size of their larynx. They speak different dialects with different accents. This leads to different physical sounds that correspond to the same phonemes and words. 2) Individuals vary in how careful or sloppy they are in their articulation. In saying Did you get to know him well, Did you is often said as dija, get to becomes geta, and the h in him is omitted. In spite of this, listeners have relatively little difficulty understanding speech across this range of variation. 9/8/11 PSY 719 - Speech 21

Variability: Speaking Rate A person may intrinsically speak rapidly or slowly. Each individual also varies their rate of speech. This causes segments (phonemes and syllables) to vary in duration. However, some phonemes such as /b/ and /w/ ( beat and wheat ) are differentiated (in part) by their duration. This implies that listeners adjust their perception for the rate at which the person is speaking. Like size-distance scaling in vision, this implies that certain properties of the sound must be extracted first to properly perceive the distal object. 9/8/11 PSY 719 - Speech 22

Variability: Environment A person listens to speech in many different environments. They may hear a person speaking against a quiet background, against a background of environmental noise (e.g., a busy street) or against a background of other conversations. How does a listeners recover the intended message? How do they separate the aspects of sound that belong to the speech signal that they are trying to recognize from other sound or other speech? 9/8/11 PSY 719 - Speech 23

Perception In speech perception, listeners show evidence of phonetic constancy. They hear the same speech sounds in spite of variation in who is talking, how fast they talk, or other variation in the sound. From an ecological perspective, this has led to a search for invariant properties or features in the sound. When the feature is present, it would signal the listener that a particular phoneme has been spoken. While some invariant properties may have been identified, there are also phonemes for which no invariant properties have been found yet. 9/8/11 PSY 719 - Speech 24

Speech by Ear and by Eye The perception of speech can take advantage of visual information (from looking at the face of the speaker) in addition to the sound. That is, a listener is more accurate in their perception when they have both auditory and visual information. This can also lead to illusions. If we edit a video to show a speaker saying /ga/ while the audio track plays /ba/, an observer will report that they hear da. If the observer closes her/his eyes, they hear ba. Known as the McGurk Effect, this illustrates how listeners integrate information from two sensory systems in speech perception. 9/8/11 PSY 719 - Speech 25

Articulation of Vowels In English, vowels are voiced, non-nasal and their manner is vocalic and continuant (very little constriction). They are classified according to the position of the tongue and the shape of the lips. The point of maximal constriction in the oral tract with the tongue can be front, mid or back. The height of the tongue in the vocal tract can be high, mid or low. The lips can be rounded (round opening) or spread (wide, shallow opening). The vowel can be tense (long duration) or lax (short duration). 9/8/11 PSY 719 - Speech 26

Articulation of Vowels - 2 For example, /i/ as in beat is a high (height), front (front to back), spread (lip rounding), tense (long duration), non-nasal vowel. / / is a low, mid, spread, lax vowel. The next diagram show the position of the maximal constriction of the tongue in vowel production for most of the vowels of American English. 9/8/11 PSY 719 - Speech 27

American English Vowel Space i u high I U Symbols in dotted boxes are allophones E Q a A o ç low front back 9/8/11 PSY 719 - Speech 28

Articulation of Vowels - 3 The vowel / / was not show in the diagram because it is distinguished by the curvature of the tongue and not the tongue position or height, where it is similar to /U/ or / /. The diphthongs were not shown since they are characterized by movement of the tongue over time. 9/8/11 PSY 719 - Speech 29

200 300 i X X u 400 e X X I X X U X o F1 (Hz) 500 600 Q X X E X oi X! X ç A Talker s Vowel Space 700 ai X au X X a 800 900 2400 2000 1600 1200 800 F2 (Hz) 9/8/11 PSY 719 - Speech 30

Acoustics of Vowels Plotted as a function of the first formant (F1) and the second formant (F2) frequencies The diphthongs are shown as an arrow representing their formant frequencies at the beginning, middle and end. 9/8/11 PSY 719 - Speech 31

Other Languages There are attributes of vowel articulation that English does not use. 1) Nasalization. A vowel can be nasal or non-nasal. French and Hindi use this distinction. 2) Long or short (duration). A vowel can be long in duration or short. Japanese uses this distinction. Note that English speakers make long and short, nasal and non nasal vowels, but these are allophonic variations. 9/8/11 PSY 719 - Speech 32