What Audio Engineers Should Know About Human Sound Perception. Part 2: Binaural Effects and Spatial Hearing. AES 112th Convention, Munich; AES 113th Convention, Los Angeles. Durand R. Begault, Human Factors Research & Technology Division, NASA Ames Research Center, Moffett Field, California
Overview: ILD, ITD differences and lateralization; HRTF spectral changes for 3D imagery; binaural versus monaural influence of echoes; effects of reverberation on perception of the environmental context; cues to auditory distance; cognitive and multisensory cues
Communication chain for acoustic events. SOURCE: sound source(s), interaction with room acoustics. MEDIUM: recording & playback (acoustical-electrical-acoustical transformation). RECEIVER: hearing (perception, cognition, multisensory interaction). Frequency, amplitude, spectrum, and location at the source are heard as pitch, loudness, timbre, and localization at the receiver.
Communication chain for acoustic events (SOURCE, MEDIUM, RECEIVER): sound source(s) and interaction with room acoustics; recording & playback (acoustical-electrical-acoustical transformation); hearing (perception, cognition, multisensory interaction). The result can be a mismatch between prescribed & perceived spatial events.
Model of the binaural hearing system. Binaural hearing (localization; signal separation & detection): forming spatial auditory events from acoustical (bottom-up, acoustic signal-driven) and psychological (top-down, psychologically-driven) inputs. Figure adapted from Jens Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization. Revised Edition. 1983, MIT Press.
Model of the binaural hearing system, from acoustic signal-driven to psychologically-driven stages: filtering of the acoustic signal by pinnae and ear canal; filtering by the inner ear, with frequency-specific neuron firings; physiological evaluation of interaural timing and level differences; binaural hearing (localization; signal separation & detection); multi-sensory information and cognition.
Two important functions of the binaural hearing system for recording engineers: localization (lateral and 3-dimensional); binaural masking (echo suppression, room perception)
Lateral localization of auditory images. Duplex theory of localization: ILD (interaural level difference) and ITD (interaural time difference)
Lateral spatial image shift: ILD (interaural level difference), caused by head shadowing of frequencies above about 1.5 kHz. [Figure: image shift versus level difference (dB)]
Perceptual decoding of spatial cues in a cross-coincident microphone recording is based on ILDs
Lateral image shift: ITD (interaural time difference)
Lateralization demo. A simple time or level difference can make headphone images move from side to side inside the head. [Figure: lateral shift from the center of the head (0 = center, 5 = max) versus interaural level difference (0 to 12 dB) and versus interaural time difference (0 to 1.5 msec). Adapted from Toole & Sayers, 1965 and Blauert, 1983 (click stimuli), and from Blauert, 1983 (broadband noise).] 1. ILD demo: 2 dB, 4 dB, 6 dB, 8 dB, 12 dB. 2. ITD demo: 0.00 ms, 0.25 ms, 0.50 ms, 0.75 ms, 1.00 ms, 1.50 ms.
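The lateralization demo can be approximated in a few lines. The following is a minimal sketch, not the method used to produce the original audio examples: it attenuates or delays one channel of a mono click to create the ILD and ITD steps listed above. The function names `lateralize` and `click`, the 48 kHz sample rate, and the convention that positive values push the image toward the right ear are all assumptions made for illustration.

```python
import math


def lateralize(mono, sample_rate, ild_db=0.0, itd_ms=0.0):
    """Return (left, right) sample lists derived from a mono signal.

    Assumed convention: positive ild_db attenuates the left channel and
    positive itd_ms delays it, so both move the image toward the right ear.
    """
    gain = 10.0 ** (-ild_db / 20.0)                    # dB to linear amplitude
    delay = int(round(itd_ms * 1e-3 * sample_rate))    # ITD in whole samples
    left = [0.0] * delay + [gain * s for s in mono]
    right = list(mono) + [0.0] * delay                 # pad to equal length
    return left, right


def click(sample_rate, duration_ms=50.0):
    """A single-sample click followed by silence (hypothetical stimulus)."""
    n = int(sample_rate * duration_ms / 1000.0)
    return [1.0] + [0.0] * (n - 1)


if __name__ == "__main__":
    fs = 48000  # assumed sample rate
    for ild in (2, 4, 6, 8, 12):
        left, right = lateralize(click(fs), fs, ild_db=ild)
        print(f"ILD {ild:2d} dB: left peak {max(left):.3f}, right peak {max(right):.3f}")
    for itd in (0.0, 0.25, 0.5, 0.75, 1.0, 1.5):
        left, right = lateralize(click(fs), fs, itd_ms=itd)
        print(f"ITD {itd:.2f} ms: left click at sample {left.index(1.0)}")
```

Writing the two lists to a stereo file and listening over headphones should reproduce the side-to-side, inside-the-head movement the slide describes. The delay is rounded to whole samples, which is coarse at low sample rates.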
Elevation and front-back discrimination: HRTF, pinnae cues
The cone of confusion causes reversals for virtual sources with identical or near-identical ITD or ILD. [Figure: listener with sources at left 150 and left 30; externalized perception.]
Head-related transfer functions (HRTFs) provide cues for front-back discrimination and elevation. [Figure: log magnitude (dB, 10 to -50) versus frequency (100 to 16000) for three source positions: right 30, elevated; right 90, ear level; right 120, below.] 3. Audio example: HRTF clock positions 45, 0 135, 0
Variation in HRTF magnitude with elevation at one azimuth. 4. Audio example: 120 degree azimuth, at +36, 0, and -36 degrees elevation. Graphic by William L. Martens, University of Aizu
Perceptual errors with headphone 3-D sound include inside-the-head localization (solution: reverberation cues) and reversals (solution: head tracking). [Figure labels: target position; reversal error with localization error (elevation); region of localization error (azimuth); intracranial location (distance error); reversal error.]
Localization error for headphone stimuli (azimuth). Anechoic speech: individual differences. [Figure: unsigned azimuth error (degrees, 0 to 30), mean values for different reverberation treatments: anechoic, early reflections, full auralization.]
Echoes, reverberation and background sound: perception of the environmental context
Spatial hearing fundamentally involves perception of the location of a sound source at a point in space (azimuth, elevation, distance). But a sound source simultaneously reveals information about its environmental context: reverberation, and image size & extent. [Figure: listener and source, labeled with azimuth, elevation, distance, image size, and environmental context.]
Effect of delay time for a single echo. [Figure: percept (image shift, image broadening, echo) versus approximate delay time to left channel (msec), with axis marks at 0, 0.6, 1.5, 10, and 40.] Sound examples: 5. stereo echo; 6. monaural echo. Relative to the reference condition, spatially separated echoes create spatial percepts; non-spatially separated echoes create timbral effects.
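The contrast between the stereo and monaural echo examples can be sketched as follows. This is an illustration under stated assumptions, not a reconstruction of the original sound examples: `stereo_echo` places the direct sound in the right channel and an equal-level delayed copy in the left, while `monaural_echo` sums the two into one channel. The helper `first_comb_notch_hz` shows one reason a same-channel echo is heard as a timbral change: summing a signal with an equal-level delayed copy produces comb-filter notches, the lowest at 1/(2 x delay). All three function names are hypothetical.

```python
def _delay_samples(delay_ms, sample_rate):
    """Convert a delay in milliseconds to a whole number of samples."""
    return int(round(delay_ms * 1e-3 * sample_rate))


def stereo_echo(mono, sample_rate, delay_ms):
    """Spatially separated echo: direct sound right, delayed copy left."""
    d = _delay_samples(delay_ms, sample_rate)
    left = [0.0] * d + list(mono)
    right = list(mono) + [0.0] * d
    return left, right


def monaural_echo(mono, sample_rate, delay_ms):
    """Non-separated echo: direct sound and delayed copy in one channel."""
    d = _delay_samples(delay_ms, sample_rate)
    out = list(mono) + [0.0] * d
    for i, s in enumerate(mono):
        out[i + d] += s
    return out


def first_comb_notch_hz(delay_ms):
    """Lowest notch frequency for an equal-level copy delayed by delay_ms."""
    return 1000.0 / (2.0 * delay_ms)


if __name__ == "__main__":
    for delay in (0.6, 1.5, 10.0, 40.0):  # delays marked on the slide's axis
        print(f"{delay:5.1f} ms delay: first comb notch near "
              f"{first_comb_notch_hz(delay):.1f} Hz if summed in one channel")
```

The equal echo level is a simplification; a real reflection is usually attenuated and filtered, which makes the notches shallower.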
Early and late reverberant sound fields. [Figure: relative amplitude versus time, showing direct sound, early reflections, and late reflections (dense reverberation); paths labeled D, R1, R2.] 7. Audio examples: direct sound; direct w/ 1st, 2nd order ERs; direct with full auralization.
Early and late reverberant sound fields. 8. Audio examples: normal and 0.25 speed impulse response. [Figure: impulse response, relative amplitude versus time, showing direct sound, early reflections, and late reflections (dense reverberation).]
Echo thresholds: sensitivity can increase by as much as 10 dB if echoes occur at different locations; late reverberation can decrease sensitivity; sensitivity increases with increasing time delay.
Although thresholds for reverberation are relatively low, background noise (e.g., NC 35) can mask the reverberant decay. [Figure: Noise Criteria (NC) curves up to NC 65 versus octave-band center frequency (31.5 Hz to 8 kHz), with the approximate threshold of hearing for continuous noise, a speech curve, and the reverberation threshold. Inset: reverberation threshold (speech) re 60 dB SPL, from -10 to -40 dB, for Small, Medium, and Large at 250, 500, 1000, 2000 Hz and fbw (fbw = full bandwidth).]
Distance perception: amplitude cues. The inverse square law states that sound level decays by 6 decibels per doubling of distance in a reflection-free environment. dB SPL: 85 at 1', 79 at 2', 73 at 4', 67 at 8'. 9. Sound example.
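Distance perception: amplitude cues. However, half-as-loud corresponds to a 10 dB reduction in level, so a source that sounds half as loud at each doubling of distance needs about 10 dB per doubling. dB SPL: 85 at 1', 75 at 2', 65 at 4', 55 at 8'. 10. Sound example.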
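The two level-versus-distance series on these slides follow from one formula: the level drops by a fixed number of decibels for every doubling of distance. A minimal sketch, assuming the slide's rounded 6 dB figure for the inverse square law (the exact value is 20·log10(2), about 6.02 dB); the function name `level_at_distance` is hypothetical.

```python
import math


def level_at_distance(ref_db, ref_dist, dist, db_per_doubling=6.0):
    """Level at dist, given ref_db measured at ref_dist.

    db_per_doubling = 6 matches the slide's inverse-square figure;
    db_per_doubling = 10 matches the half-as-loud rule of thumb.
    """
    doublings = math.log2(dist / ref_dist)
    return ref_db - db_per_doubling * doublings


if __name__ == "__main__":
    for rate in (6.0, 10.0):
        levels = [level_at_distance(85.0, 1.0, d, rate) for d in (1, 2, 4, 8)]
        print(f"{rate:4.1f} dB per doubling:",
              ", ".join(f"{lv:.0f} dB at {d}'" for lv, d in zip(levels, (1, 2, 4, 8))))
```

With an 85 dB SPL reference at 1 foot, the 6 dB rate gives 85, 79, 73, 67 and the 10 dB rate gives 85, 75, 65, 55, matching the values on the slides.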
Distance perception: reverberant ratio cues. An increase in reverberant level indicates movement into the diffuse sound field. [Figure: level (vertical axis, 70 to 94 in steps of 3) versus distance (feet, 0 to 10) for three conditions: anechoic, w/ ER, w/ ER + LR.]
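The reverberant ratio cue can be illustrated with a simplified model. The sketch below assumes the direct sound follows the inverse square law while the reverberant level stays constant with distance, and that the two add as uncorrelated powers. These are textbook idealizations, not the conditions behind the plotted data, and the 85 dB and 73 dB figures in the demo are hypothetical.

```python
import math


def direct_level_db(ref_db, ref_dist, dist):
    """Direct-sound level under the inverse square law."""
    return ref_db - 20.0 * math.log10(dist / ref_dist)


def sum_levels_db(a_db, b_db):
    """Combine two levels as uncorrelated powers (an assumption)."""
    return 10.0 * math.log10(10.0 ** (a_db / 10.0) + 10.0 ** (b_db / 10.0))


def direct_to_reverberant_db(ref_db, ref_dist, dist, reverb_db):
    """Direct level minus an assumed distance-independent reverberant level."""
    return direct_level_db(ref_db, ref_dist, dist) - reverb_db


if __name__ == "__main__":
    reverb_db = 73.0  # hypothetical constant reverberant level
    for dist in (1, 2, 4, 8):
        direct = direct_level_db(85.0, 1.0, dist)  # hypothetical 85 dB at 1 ft
        total = sum_levels_db(direct, reverb_db)
        ratio = direct_to_reverberant_db(85.0, 1.0, dist, reverb_db)
        print(f"{dist} ft: direct {direct:.1f} dB, total {total:.1f} dB, D/R {ratio:+.1f} dB")
```

In this model the total level flattens out with distance while the direct-to-reverberant ratio keeps falling, which is why the ratio can signal distance even when overall level changes little.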
Concert hall reverberation: physical-perceptual parameters. Reverberance (reverberation time, strength); apparent source width (ASW) (interaural cross-correlation); envelopment (spatial diffusion of reflections from all around); clarity (ratio of first 50-80 ms of early sound to late sound); warmth (ratio of bass frequency RT to mid-band RT).
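Two of these parameters reduce to simple ratios and can be sketched directly. The code below assumes one common formulation of each: clarity as the early-to-late energy ratio of an impulse response in dB, split at 50 or 80 ms, and warmth as the bass ratio (RT at 125 and 250 Hz) / (RT at 500 and 1000 Hz). The function names, the synthetic exponentially decaying impulse response, and the reverberation times in the demo are hypothetical.

```python
import math


def clarity_db(impulse_response, sample_rate, split_ms=80.0):
    """Early-to-late energy ratio in dB, split at split_ms.

    Assumes the direct sound arrives at index 0 of impulse_response.
    """
    split = int(round(split_ms * 1e-3 * sample_rate))
    early = sum(s * s for s in impulse_response[:split])
    late = sum(s * s for s in impulse_response[split:])
    if early <= 0.0 or late <= 0.0:
        raise ValueError("need nonzero energy on both sides of the split")
    return 10.0 * math.log10(early / late)


def bass_ratio(rt_125, rt_250, rt_500, rt_1000):
    """Ratio of low-band to mid-band reverberation times (seconds)."""
    return (rt_125 + rt_250) / (rt_500 + rt_1000)


if __name__ == "__main__":
    fs = 8000    # assumed sample rate
    rt60 = 1.5   # hypothetical reverberation time in seconds
    # Synthetic envelope whose amplitude reaches -60 dB after rt60 seconds.
    ir = [10.0 ** (-3.0 * n / (rt60 * fs)) for n in range(int(2 * rt60 * fs))]
    print(f"C50 = {clarity_db(ir, fs, 50.0):.1f} dB")
    print(f"C80 = {clarity_db(ir, fs, 80.0):.1f} dB")
    print(f"bass ratio = {bass_ratio(2.2, 2.0, 1.9, 1.9):.2f}")
```

A measured impulse response would need its onset aligned to index 0 before `clarity_db` gives a meaningful value.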
Cognitive cues; multisensory cues
Cognitive cues to distance perception: shouting versus whispering
Auditory localization can be influenced or biased by cognitive mapping
Influence of visual, vibratory cues: helicopter fly-overs; explosions & crashes
Summary: ILD, ITD differences and lateralization; HRTF spectral changes for 3D imagery; binaural versus monaural influence of echoes; effects of reverberation on perception of the environmental context; cues to auditory distance; cognitive and multisensory cues