Room Acoustic Reproduction by Spatial Room Response Rendering

Hoda Nasereddin 1, Mohammad Asgari 2 and Ayoub Banoushi 3
1 Audio Engineer, Broadcast Engineering Department, IRIB University, Tehran, Iran
2 Assistant Professor, Broadcast Engineering Department, IRIB University, Tehran, Iran
3 Assistant Professor, NRP Department, INRA Organization, Tehran, Iran
*hoda.nassereddin@gmail.com

Abstract

Spatial room response rendering is a recent technique for reproducing the acoustics of an enclosed space in spatial form. In this technique the original sound field is not reproduced precisely; instead, based on psychoacoustic concepts of human localization and the physics of sound, the original sound field is analyzed and a sound field is synthesized that perceptually corresponds to the original one. To reach this goal, the room response is first measured and the desired response is then synthesized. In this method, the loudspeaker system is not limited to a specific structure, and any loudspeaker setup can be used for reproduction. In this paper, we report the results of applying the spatial room response rendering technique to measured data of a room. An exponential sine sweep (ESS) covering the whole audio frequency range was used as the excitation signal, the B-format microphone technique was used for the first time for room response measurement, and the reproduction setup is based on an assembly of three full-band loudspeakers. The time dependency of the arrival direction and diffuseness of the measured room responses is analyzed, and based on the results of this analysis, a multichannel response is synthesized for the loudspeaker setup.

Keywords: Spatial rendering; acoustic reproduction; B-format microphone technique.

1. Introduction

Multichannel loudspeaker setups are currently used for surround sound reproduction. The multidimensional sound produced in this way has good directional accuracy, and the precision can be further enhanced by adding more channels [1].
Multichannel microphone techniques have to be used in order to make use of multichannel loudspeaker setups. Conventional microphone techniques have problems coping with the wide variety of existing loudspeaker setups, so close microphone techniques are used instead of multichannel microphone techniques [1]. In a close microphone technique, a microphone is placed near each audio source and a dry sound is captured by the microphone. After recording all the audio sources in this way, virtual placement of the sources is performed by amplitude panning. Spatial impression is created with the help of reverberators or by
adding the signals of additional microphones placed farther away from the source(s) in the recording room. With convolving reverberators it has thus become possible to simulate recording in any performance venue using close microphone recordings and measured room responses. The problems of conventional microphone techniques also apply to capturing the impulse response. Spatial response rendering (SRR) is a new processing method that overcomes these problems by using a perceptually motivated analysis-synthesis approach. Other processing methods such as wave field synthesis (WFS) require many loudspeakers to create an accurate spatial sound field, but spatial response rendering can be used with any loudspeaker setup. SRR processing consists of an analysis of the direction and diffuseness of sound within frequency bands, followed by a synthesis yielding multichannel impulse responses that can be tailored to an arbitrary loudspeaker system [1]. SRR is suitable for processing room responses for convolving reverberators.

2. Psychoacoustical background

Due to practical limitations, no established recording and reproduction technique can recreate the sound field of a recording space perfectly in a listening room [1]. However, this is typically not even the goal of recording. Instead of physical accuracy, it is more important to convey a perception. In order to recreate the perception of an existing room or hall, it is thus important to know what kind of information human listeners use in spatial sound perception [1]. Human sound localization is based on five frequency-dependent cues which therefore need to be reproduced. 1) Interaural time difference (ITD) and 2) interaural level difference (ILD) are the dominant cues for determining in which cone of confusion a sound source is located, that is, in which cone forming a constant angle with the line connecting both ear canal entrances of a listener.
Although they also depend on the cone of confusion, the most important role of 3) monaural spectral cues is resolving the direction of the source within the cone of confusion. Furthermore, 4) the effect of head rotation on the previous cues helps in determining the correct direction [1]. In addition, human listeners are sensitive to 5) interaural coherence (IC), which has recently been proposed as an important cue for localization in reverberant environments and multi-source scenarios [1]. The first assumption of SRR processing is that it is not necessary to reconstruct the original sound field perfectly in order to reproduce the spatial impression of an existing performance venue. Instead of sound field reconstruction, SRR aims at a time- and frequency-dependent recreation of the features that are relevant for human perception, as mentioned above [1].

3. Room response measurement

A room or a hall has a considerable effect on the localization cues described in the previous section. The transmission of sound from a source to a receiver position can be described using room responses [1]. In this paper, the room response of the IRIB University hall was measured using the B-format microphone technique as a multichannel receiver and an exponential sine sweep as the excitation signal instead of conventional impulse signals. A B-format microphone consists of an omnidirectional microphone and three directional microphones placed close to each other and oriented along the corresponding Cartesian coordinates, as shown in Figure 1. A schematic of the room response measurement is illustrated in Figure 2.
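As an illustration of the excitation stage, the exponential sine sweep and a matching inverse filter can be generated as in the following minimal NumPy sketch. This is our own illustration, not code from the paper: the function names, sample rate and sweep range are assumptions. Following Farina's ESS method, convolving the recorded room signal with the inverse filter recovers the impulse response.

```python
import numpy as np

def ess(f1, f2, T, fs):
    """Exponential sine sweep from f1 to f2 Hz over T seconds (Farina)."""
    t = np.arange(int(T * fs)) / fs
    R = np.log(f2 / f1)                      # logarithmic sweep rate
    return np.sin(2 * np.pi * f1 * T / R * (np.exp(t * R / T) - 1.0))

def inverse_filter(sweep, f1, f2, T, fs):
    """Time-reversed sweep with an exponentially decaying envelope that
    compensates the sweep's extra energy at low frequencies."""
    t = np.arange(len(sweep)) / fs
    R = np.log(f2 / f1)
    return sweep[::-1] * np.exp(-t * R / T)
```

Convolving the sweep itself with its inverse filter yields an approximate delta pulse at the end of the sweep, which is how the measured impulse response is extracted from the recorded B-format channels.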
2nd International Conference on Acoustics & Vibration (ISAV2012), Tehran, Iran, 26-27 Dec. 2012

Figure 1. B-format microphone setup: an AKG C414 omnidirectional microphone and directional microphones in the x, y and z Cartesian directions.

Figure 2. Schematic of the room response measurement with an omnidirectional loudspeaker and the B-format microphone technique.

4. Experimental method and results

Based on the psychoacoustical considerations above, spatial room response rendering can now be formulated. The formulation consists of an analysis part and a synthesis part. In the analysis part, the room responses measured at one point are transformed into useful data based on the previous considerations, and in the synthesis part a spatial response is synthesized.

4.1 Analysis part

The analysis was applied to the room responses measured with the B-format microphone technique at point 3. A block diagram of the analysis part is illustrated in Figure 3. The analysis is based on sound field energy analysis and, in particular, on sound intensity. Intensity vectors are extracted from the B-format microphone output signals w(t), x(t), y(t) and z(t). The omnidirectional output signal is proportional to the sound pressure p(t) at the measuring point [1], so
$p(t) = w(t),$ (1)

and the directional output signals x(t), y(t) and z(t) are proportional to the particle velocity components in the corresponding Cartesian coordinates. In a plane wave, the sound pressure p(t) and the particle velocity $\mathbf{u}(t)$ have the following relationship [1]:

$\mathbf{u}(t) = \frac{p(t)}{Z_0}\,\mathbf{n}, \qquad Z_0 = \rho_0 c,$ (2)

where $\mathbf{n}$ is a unit vector in the propagation direction, $Z_0$ is the characteristic acoustic impedance, $\rho_0$ is the mean density of air and $c$ is the speed of sound. The particle velocity calculated from the B-format microphone signals is [1]

$\mathbf{u}(t) = \frac{1}{\sqrt{2}\,Z_0}\left[x(t)\,\mathbf{e}_x + y(t)\,\mathbf{e}_y + z(t)\,\mathbf{e}_z\right],$ (3)

where $\mathbf{e}_x$, $\mathbf{e}_y$ and $\mathbf{e}_z$ are unit vectors in the directions of the Cartesian coordinate axes. The instantaneous active intensity of the sound field is

$\mathbf{I}(t) = p(t)\,\mathbf{u}(t).$ (4)

In an analysis window, the single-sided frequency distribution of the active intensity is

$\mathbf{I}(\omega) = \mathrm{Re}\{P^{*}(\omega)\,\mathbf{U}(\omega)\},$ (5)

and the single-sided frequency distribution of the energy density estimate is

$E(\omega) = \frac{\rho_0}{2}\left[\lVert\mathbf{U}(\omega)\rVert^{2} + \frac{\lvert P(\omega)\rvert^{2}}{Z_0^{2}}\right].$ (6)

By substituting the Fourier transforms of Eqs. (1) and (3) into Eqs. (5) and (6), we now have the frequency distribution of the active intensity,

$\mathbf{I}(\omega) = \frac{1}{\sqrt{2}\,Z_0}\,\mathrm{Re}\left\{W^{*}(\omega)\left[X(\omega)\,\mathbf{e}_x + Y(\omega)\,\mathbf{e}_y + Z(\omega)\,\mathbf{e}_z\right]\right\},$ (7)

and the diffuseness estimate,

$\psi(\omega) = 1 - \frac{\lVert\mathbf{I}(\omega)\rVert}{c\,E(\omega)},$ (8)

which is 0 for a single plane wave and approaches 1 in an ideally diffuse field. Using the intensity vector, the azimuth $\theta$ and the elevation $\varphi$ of the direction of arrival can be written in the form

$\theta(\omega) = \tan^{-1}\!\left[\frac{-I_y(\omega)}{-I_x(\omega)}\right],$ (9)

$\varphi(\omega) = \tan^{-1}\!\left[\frac{-I_z(\omega)}{\sqrt{I_x^{2}(\omega) + I_y^{2}(\omega)}}\right],$ (10)

where $I_x$, $I_y$ and $I_z$ are the components of the active intensity vector in the directions of the corresponding Cartesian coordinate axes [1]. The minus signs appear because the direction of arrival is opposite to the direction of energy flow indicated by the intensity vector.
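As an illustration, the analysis equations (1)-(10) can be evaluated for one analysis window of the B-format signals. This is a hedged sketch in NumPy: the function name, the window choice, and the numeric values assumed for the air density and the speed of sound are our own assumptions, since the paper does not specify the windowing details.

```python
import numpy as np

RHO0 = 1.21    # density of air (kg/m^3), assumed value
C = 343.0      # speed of sound (m/s), assumed value

def sirr_analysis(w, x, y, z, nfft=1024):
    """Per-frequency azimuth, elevation and diffuseness for one analysis
    window of B-format signals w, x, y, z (hypothetical helper)."""
    Z0 = RHO0 * C                                  # characteristic impedance
    win = np.hanning(nfft)
    W = np.fft.rfft(w[:nfft] * win)
    V = np.stack([np.fft.rfft(s[:nfft] * win) for s in (x, y, z)])
    # Eq. (7): frequency distribution of the active intensity
    I = np.real(np.conj(W) * V) / (np.sqrt(2.0) * Z0)
    # Eq. (6) with Eqs. (1) and (3) substituted: energy density estimate
    E = (RHO0 / (2.0 * Z0**2)) * (np.abs(W)**2
                                  + 0.5 * np.sum(np.abs(V)**2, axis=0))
    # Eq. (8): diffuseness, 0 for a plane wave, toward 1 for a diffuse field
    psi = 1.0 - np.linalg.norm(I, axis=0) / (C * E + 1e-20)
    # Eqs. (9)-(10): arrival direction is opposite to the intensity vector
    azimuth = np.arctan2(-I[1], -I[0])
    elevation = np.arctan2(-I[2], np.sqrt(I[0]**2 + I[1]**2))
    return azimuth, elevation, np.clip(psi, 0.0, 1.0)
```

For a coherent plane-wave-like input the diffuseness estimate is close to 0, while four mutually independent noise signals give a clearly higher diffuseness, matching the interpretation of Eq. (8).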
Figure 3. Block diagram of the SRR analysis part: the B-format output signals w(t), x(t), y(t) and z(t) are converted into 3D intensity vectors and energy density, from which azimuth, elevation and diffuseness are derived.

4.2 Synthesis part

In this part, a multichannel impulse response is created based on the number of loudspeakers. The impulse response can be loaded into convolving reverberators, and by convolving an audio source with this multichannel impulse response, spatial sound is created whose auditory impression matches that of the original hall. The processing procedure applied to the recorded signals is illustrated in Figure 4. The extracted azimuth, elevation and diffuseness are used to synthesize a multichannel spatial response. According to the diffuseness ψ, the omnidirectional output signal is divided into diffuse and non-diffuse parts. The non-diffuse part is the directional section of the signal and has to be reproduced from the correct direction. Vector base amplitude panning (VBAP) is used to place the signals in the correct direction [2]. In this paper, vector base amplitude panning is implemented for three full-band loudspeakers, and three directional signals corresponding to the Cartesian coordinates are produced. The final impulse responses are created after inverse transforms of these signals. In order to increase the auditory reverberation impact, reverberation is added to the final spatial impulse responses by convolving the diffuse part of the omnidirectional signal with a short exponentially decaying noise burst [3]. This signal is sent to all three loudspeakers. The produced directional signals in the corresponding Cartesian coordinates are illustrated in Figures 5, 6 and 7. The results show that the response in the x direction has lower peaks.
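For the panning step, the VBAP gains of Pulkki [2] for a triplet of loudspeakers can be computed as follows. This is a generic sketch rather than the paper's implementation: the loudspeaker directions, the function name, and the handling of negative gains (clipped here, meaning the direction lies outside the triplet) are our own assumptions.

```python
import numpy as np

def vbap_gains(azimuth, elevation, speaker_dirs):
    """Gain factors for a 3-loudspeaker triplet via vector base amplitude
    panning. speaker_dirs is a 3x3 matrix whose rows are unit vectors
    pointing toward the loudspeakers (an assumed setup)."""
    # unit vector toward the desired panning direction
    p = np.array([np.cos(elevation) * np.cos(azimuth),
                  np.cos(elevation) * np.sin(azimuth),
                  np.sin(elevation)])
    g = p @ np.linalg.inv(speaker_dirs)    # solve g . L = p for the gains
    g = np.clip(g, 0.0, None)              # negative gain: outside triplet
    norm = np.linalg.norm(g)
    return g / norm if norm > 0 else g     # energy-normalized gains
```

With loudspeakers on the Cartesian axes, a source on the x axis is reproduced by the x loudspeaker alone, and a source at 45° azimuth is shared equally between the x and y loudspeakers.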
Figure 4. Block diagram of the SRR synthesis part: the omnidirectional output signal w(t) is split by the diffuseness estimate; the non-diffuse part is panned into three directional signals with vector base amplitude panning, while the diffuse part is convolved with a short exponentially decaying noise burst and sent to all speakers.
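The split shown in Figure 4 can be sketched as follows. We assume a square-root energy split by the diffuseness estimate and a particular burst length; both are illustration choices of ours rather than values given in the paper.

```python
import numpy as np

def split_and_decorrelate(w, psi, fs, burst_ms=30.0, seed=0):
    """Split the omnidirectional signal w into a non-diffuse stream (to be
    panned with VBAP) and a diffuse stream convolved with a short
    exponentially decaying noise burst, fed to all loudspeakers.
    psi holds the per-bin diffuseness (same length as rfft(w))."""
    W = np.fft.rfft(w)
    # square-root split preserves total energy across the two streams
    non_diffuse = np.fft.irfft(np.sqrt(1.0 - psi) * W, n=len(w))
    diffuse = np.fft.irfft(np.sqrt(psi) * W, n=len(w))
    # short exponentially decaying noise burst (length is an assumption)
    n = int(fs * burst_ms / 1000.0)
    rng = np.random.default_rng(seed)
    burst = rng.standard_normal(n) * np.exp(-np.arange(n) / (n / 5.0))
    burst /= np.linalg.norm(burst)          # unit-energy burst
    diffuse_out = np.convolve(diffuse, burst)
    return non_diffuse, diffuse_out
```

When the diffuseness is zero everywhere, the non-diffuse stream is simply the original signal and the diffuse stream vanishes; when it is one everywhere, all energy goes to the decorrelated diffuse stream.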
Figure 5. Synthesized directional x(t) signal.

Figure 6. Synthesized directional y(t) signal.
Figure 7. Synthesized directional z(t) signal.

5. Conclusion

In this paper, spatial room response rendering for spatial acoustic reproduction of an enclosed space has been implemented on multichannel measured room responses. Spatial room response rendering is a novel multichannel method for spatial acoustic reproduction of enclosed spaces. In this method there are no limitations on the number of loudspeakers and channels at the destination, and it can be implemented with any loudspeaker setup. First, the multichannel room response of the IRIB University hall was measured. The required data were extracted from these responses in the analysis part of the spatial room rendering algorithm, and then in the synthesis part, based on these data, the multichannel impulse response of the hall was produced. This multichannel spatial impulse response can be loaded into convolving reverberators to simulate recording in any desired hall.

REFERENCES

1. J. Merimaa and V. Pulkki, "Spatial Impulse Response Rendering: Analysis and Synthesis", Journal of the Audio Engineering Society, Vol. 53, No. 12 (2005).
2. V. Pulkki, "Virtual Sound Source Positioning Using Vector Base Amplitude Panning", Journal of the Audio Engineering Society, Vol. 45, pp. 456-466 (1997).
3. M. O. J. Hawksford and N. Harris, "Diffuse Signal Processing and Acoustic Source Characterization for Application in Synthetic Loudspeaker Arrays", Proceedings of the 112th Convention of the Audio Engineering Society, Vol. 50, pp. 511-512 (2002).