
Local sound field reproduction using two closely spaced loudspeakers

Ole Kirkeby a) and Philip A. Nelson
Institute of Sound and Vibration Research, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom

Hareo Hamada
Department of Information and Communication Engineering, Tokyo Denki University, Tokyo 101, Japan

(Received 2 December 1997; accepted for publication 5 June 1998)

When only two loudspeakers are used for the reproduction of sound for a single listener, time domain simulations show that it is advantageous that the two loudspeakers are very close together. The sound field reproduced by two loudspeakers that span 10 degrees as seen by the listener is simpler, and locally more similar to the sound field generated by a real sound source, than that reproduced by two loudspeakers that span 60 degrees. The basic physics of the problem is first explained by assuming that the sound propagates under free-field conditions. It is then demonstrated that when the influence of the listener on the incident sound waves is taken into account by modeling the listener's head as a rigid sphere, the results are qualitatively the same as in the free-field case. Consequently, two closely spaced loudspeakers are capable of accurately reproducing a desired sound field, not only at the ears of the listener but also in the vicinity of the listener's head. This result, although counter-intuitive, is very encouraging. In particular, it suggests that many low-fidelity audio systems, such as those currently supplied with most multi-media computers, can be greatly improved. © 1998 Acoustical Society of America. [S0001-4966(98)04309-4]

PACS numbers: 43.20.Fn, 43.20.Px, 43.60.Pt, 43.66.Pn [DEC]

INTRODUCTION

A virtual source imaging system attempts to give a listener the impression that there is a sound source at a position in space where no real sound source exists. The overwhelming part of current research into virtual source imaging relies heavily on binaural technology.[1-4] A notable exception is when large arrays of loudspeakers are used for the reproduction. In that case it is possible to synthesize the entire sound field under certain conditions.[5] Binaural technology is based on the sensible engineering principle that if a sound reproduction system can generate the same sound pressures at the listener's eardrums as would have been produced there by a real sound source, then the listener should not be able to tell the difference between the virtual image and the real sound source. In order to determine these binaural signals, or target signals, it is necessary to know how the listener's torso, head, and pinnae (outer ears) modify incoming sound waves as a function of the position of the sound source. This information can be obtained by making measurements on dummy-heads or human subjects.[6,7] The results of such measurements are usually called head-related transfer functions, or HRTFs.

When only two loudspeakers are used for the reproduction, it is necessary to consider how to deal with cross-talk.[8-12] Cross-talk in the context of sound reproduction is usually interpreted as sound reproduced at a location where it was not intended to be heard.[8] For example, when a dummy-head recording[6] is played back over two loudspeakers, the sound emitted from the right loudspeaker, and heard at the left ear, is cross-talk. Similarly, the sound emitted from the left loudspeaker, and heard at the right ear, is also cross-talk. In 1966, the first method for cross-talk cancellation was developed by Atal et al.[13] Their method was based on a free-field model that did not account for the presence of a listener in the sound field. Since then, more sophisticated methods, some based on digital signal processing techniques, have been developed.[10-12,14-17] With a few notable exceptions,[11,18] most researchers have concentrated on systems using loudspeaker arrangements spanning an angle of typically 60 degrees as seen by the listener.
A fundamental problem that one faces when using relatively widely spaced loudspeakers is that convincing virtual images are experienced only within a very tight "bubble" surrounding the listener's head. In contrast, a system using two closely spaced loudspeakers is surprisingly robust with respect to head movement.[19] The size of the bubble around the listener's head is increased significantly without any noticeable reduction in performance. We use the term "stereo dipole"[20,21] to describe such a virtual source imaging system since the inputs to the two closely spaced loudspeakers are close to being exactly out of phase over a wide frequency range.[22] Consequently, they reproduce a sound field very similar to that generated by a point dipole source. Strictly speaking, the field that they reproduce approximates that generated by a combination of a point dipole and a point monopole source.[23]

In this paper, time domain simulations are used to show the form of the sound fields that are reproduced by two loudspeakers placed symmetrically about the median plane in front of the listener. We concentrate on two loudspeaker arrangements: a wide spacing where the loudspeakers span 60 degrees as seen by the listener, and a narrow spacing where the loudspeakers span 10 degrees as seen by the listener. The results calculated by using a free-field model are compared to the results calculated by using a rigid sphere model. The emphasis is on the basic physical principles, and so we do not attempt to include the detailed high-frequency information contained in HRTFs measured on human subjects, or dummy-heads, in this analysis.

a) Now at the Department of Communication Technology, Institute for Electronic Systems, Aalborg University, Denmark.

J. Acoust. Soc. Am. 104 (4), October 1998. 0001-4966/98/104(4)/1973/9/$15.00. © 1998 Acoustical Society of America.

FIG. 1. The arrangement of the two loudspeakers and the two microphones. Note that left and right are defined to be relative to the listener.

FIG. 2. Definitions of the electro-acoustic transfer functions C_1, C_2, A_1, and A_2.

I. GEOMETRY

The geometry of the problem is shown in Fig. 1. Two loudspeakers (sources), separated by the distance S, are positioned on the x_1 axis symmetrically about the x_2 axis. We imagine that a listener is positioned r_0 meters away from the loudspeakers, directly in front of them. The ears of the listener are represented by two microphones, separated by the distance 2a, that are also positioned symmetrically about the x_2 axis. Note that when we later refer to left and right, we consider this to be relative to the listener's point of view, as indicated in Fig. 1. The loudspeakers span an angle of θ as seen from the position of the listener. The shortest distance, in a straight line, from the loudspeakers to the microphones is the direct path r_1, and the furthest distance is the cross-talk path r_2. When the system is operating at a single frequency, we can use complex notation. Thus, the inputs to the right and left loudspeaker are denoted by V_1 and V_2, respectively, and the outputs from the right and left microphone are denoted by W_1 and W_2, respectively. The variables V_1, V_2, W_1, and W_2 are all complex scalars.

II. ELECTROACOUSTIC TRANSFER FUNCTIONS

A. General notation

Since the loudspeaker-microphone arrangement is symmetric about the x_2 axis, only two of the four transfer functions relating the loudspeaker inputs to the microphone outputs are different. In complex notation, these are denoted C_1 and C_2. Similarly, the transfer functions relating the input D to a virtual source to the microphone outputs are denoted by A_1 and A_2. This is shown in Fig. 2. Thus, for the two loudspeakers we have

C_1 = \left. \frac{W_1}{V_1} \right|_{V_2=0} = \left. \frac{W_2}{V_2} \right|_{V_1=0}   (1a)

and

C_2 = \left. \frac{W_2}{V_1} \right|_{V_2=0} = \left. \frac{W_1}{V_2} \right|_{V_1=0}.   (1b)

For the virtual source,

A_1 = W_1/D   (2a)

and

A_2 = W_2/D.   (2b)

Using these two transfer functions, the output from the microphones as a function of the inputs to the loudspeakers is conveniently expressed as the matrix-vector multiplication

\mathbf{w} = \mathbf{C}\mathbf{v},   (3)

where

\mathbf{w} = \begin{bmatrix} W_1 \\ W_2 \end{bmatrix},   (4)

\mathbf{C} = \begin{bmatrix} C_1 & C_2 \\ C_2 & C_1 \end{bmatrix},   (5)

and

\mathbf{v} = \begin{bmatrix} V_1 \\ V_2 \end{bmatrix}.   (6)

The aim of the system shown in Fig. 1 is to reproduce a pair of desired signals D_1 and D_2 at the microphones. Consequently, we will generally want W_1 to be equal to D_1, and W_2 to be equal to D_2. The desired signals will sometimes be referred to individually as D_1 and D_2, and sometimes referred to as a vector d of the same type as w,

\mathbf{d} = \begin{bmatrix} D_1 \\ D_2 \end{bmatrix}.   (7)

B. Free-field conditions

The sound field p_ff radiated from a monopole in a free field is given by

p_{ff} = \frac{j \omega \rho_0 q \, e^{-jkr}}{4 \pi r},   (8)

where j is the square root of -1, ω is the angular frequency, ρ_0 is the density of the medium, q is the source strength, k is the wave number (ω/c_0, where c_0 is the speed of sound), and r is the distance from the source to the field point.[24] If V is defined as

V = \frac{j \omega \rho_0 q}{4 \pi},   (9)

then the transfer function C_ff is given by

C_{ff} = \frac{p_{ff}}{V} = \frac{e^{-jkr}}{r}.   (10)

This expression is easy to implement and quick to evaluate numerically, and it is therefore very suitable for computer simulations.

C. Rigid sphere model

An analytical expression for the total sound field produced by a plane wave impinging on a rigid sphere was first published by Rayleigh before the turn of the century.[25] Since then, Rayleigh's result has been used by many other authors (see, for example, Refs. 24 and 26). In the following, we will derive the expression for the total sound field produced by a spherical wave that impinges on a rigid sphere. This is straightforward to do once the spherical wave has been expanded into an appropriate infinite series. The results we need to do this are listed in the comprehensive work by Abramowitz and Stegun (Ref. 27).

The variables that are used in the derivation are shown in Fig. 3. A rigid sphere, whose radius is a, is positioned at the origin of the coordinate system. A monopole source is positioned on the positive half of the x_1 axis at a distance l from the origin. A straight line connecting the field point with the origin has the length r, and it forms the angle θ with the x_1 axis.

FIG. 3. The variables used when calculating the sound field scattered from a rigid sphere.
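Equations (5) and (10) together define the free-field system matrix used later in the simulations. The following minimal Python sketch is our own illustration, not the authors' code; the path lengths and frequency are arbitrary illustrative values.

```python
import cmath
import math

C0 = 340.0  # speed of sound c_0 [m/s], the value used later in the paper

def c_ff(r, f_hz):
    """Free-field transfer function of Eq. (10): C_ff = exp(-jkr)/r."""
    k = 2.0 * math.pi * f_hz / C0   # wave number k = omega/c_0
    return cmath.exp(-1j * k * r) / r

def c_matrix(r1, r2, f_hz):
    """Symmetric 2x2 system matrix of Eq. (5) under free-field conditions.

    C1 is the direct-path response (distance r1) and C2 the cross-talk-path
    response (distance r2); symmetry about the x_2 axis makes C Toeplitz."""
    c1, c2 = c_ff(r1, f_hz), c_ff(r2, f_hz)
    return [[c1, c2], [c2, c1]]

# Illustrative geometry (not the paper's exact values): direct path 0.5 m,
# cross-talk path 0.6 m, at 1 kHz.
C = c_matrix(0.5, 0.6, 1000.0)
print(abs(C[0][0]))   # direct-path amplitude decays as 1/r, here 1/0.5
print(C[0][1] == C[1][0] and C[0][0] == C[1][1])   # symmetry of Eq. (5)
```

The phase of each entry corresponds to the propagation delay r/c_0, which is what makes the later time-domain pictures of pulse trains easy to interpret.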
The sound pressure p_t (subscript t is for "total," and p, rather than P, is used in order to avoid confusion with the Legendre polynomial) at the field point can be written as the sum of two components: p_ff, the incident field, and p_s, the scattered field. Thus,

p_t = p_{ff} + p_s.   (11)

The incident field p_ff is already known; it is given by Eq. (8). The scattered field p_s must be determined by imposing the relevant boundary conditions on the total field p_t. Since the sphere is assumed to be perfectly rigid, the particle velocity in the radial direction must be zero on the surface of the sphere. This is equivalent to requiring that the gradient of p_t in the radial direction must be zero on the surface of the sphere,

\left. \frac{\partial p_t}{\partial r} \right|_{r=a} = 0.   (12)

In order to be able to impose this boundary condition on p_t, p_ff is expanded into an infinite series by using the results 10.1.45 and 10.1.46 listed on p. 440 in Ref. 27. These two results give series expansions for cos(kr)/kr and sin(kr)/kr, and these series can be combined to give a series expansion for exp(-jkr)/kr. By using the expression for V as defined in Eq. (9), we find

p_{ff} = -jkV \sum_{m=0}^{\infty} (2m+1) j_m(kr) [j_m(kl) - j n_m(kl)] P_m(\cos\theta).   (13)

In this expression, j_m and n_m are mth-order spherical Bessel functions of the first and second kind, respectively, and P_m is the mth-order Legendre polynomial. It can be shown that the scattered field p_s can be expanded into an infinite series of waves propagating outwards, away from the origin (Ref. 28, pp. 533-535),

p_s = kV \sum_{m=0}^{\infty} b_m [j_m(kr) - j n_m(kr)] P_m(\cos\theta).   (14)

If the sign of j n_m(kr), the second term in the square brackets, is reversed, then the series represents an infinite sum of waves converging on the origin rather than diverging from the origin. Although p_s could, in principle, contain such converging waves, we do not have to consider these components since they violate the so-called radiation condition at infinity (Ref. 28, p. 534).
Consequently, the problem is now to choose the constants b_m such that the boundary condition given by Eq. (12) is satisfied. After some algebra we find

b_m = j(2m+1) \frac{j_m(kl) - j n_m(kl)}{1 - j n_m'(ka)/j_m'(ka)},   (15)

where the prime denotes differentiation with respect to the function's single argument. The scattered field is now readily calculated by substituting b_m back into Eq. (14). Note that the value of b_m for a given wave number k and order m does not depend on the position of the field point, and so it is possible to speed up the numerical evaluation of p_s dramatically by storing the constants b_m in a two-dimensional lookup table rather than calculating them repeatedly for each field point. In addition, it is important to know that the series expansion of p_ff converges very slowly for large kr whereas the series expansion of p_s converges quite quickly for large kr. In fact, p_s converges most slowly when the field point is on the surface of the sphere (r = a). For values of ka smaller than about 10, it is more than enough to include only the first 20 terms of the series. Consequently, p_t should be calculated as the sum of the simple expression for p_ff given by Eq. (8), and the series expansion of p_s given by Eq. (14). It is not a good idea to calculate p_t from the single infinite series that can be obtained by adding together the series expansions of p_ff and p_s [Eqs. (13) and (14)]. Although this series looks quite compact when written out in full, it is not very efficient for calculating p_t numerically.

III. TIME DOMAIN SIMULATION OF SCATTERING FROM A RIGID SPHERE

In practice, we can calculate the field scattered by the sphere only at a finite number of discrete points and frequencies. Consequently, the time response at a given field point must be calculated by an inverse fast Fourier transform of the sampled frequency response at that point. In order to avoid the undesirable wrap-around effect,[29] p_t must be calculated at a relatively large number of frequencies even if we are interested in only a single snapshot of the total sound field. Thus, a time domain simulation typically requires several orders of magnitude more computational time than a frequency domain simulation. Nevertheless, it is still feasible to calculate the full time history of the scattered field on a fast PC.

Figure 4 illustrates what happens when a short pulse emitted from a monopole source impinges on a rigid sphere. Figure 4 shows nine snapshots, or frames, of the total sound field p_t. The frames are listed sequentially in a reading sequence from top left to bottom right; top left is the earliest time, bottom right is the latest time. The sound field is calculated at 65 frequencies between DC (0 Hz) and the Nyquist frequency f_Nyq (half the sampling frequency), both frequencies included. The sampling frequency f_s is 12.8 kHz, and the time increment between each frame is three sampling intervals, which is equivalent to the time it takes the sound to travel approximately 8 cm. In each frame, the value of p_t is calculated at 61 × 61 points over an area of 0.6 × 0.6 m². Values greater than 1 are plotted as white, values smaller than -1 are plotted as black, and values between -1 and 1 are shaded appropriately. The sphere, whose radius a is 9 cm, is positioned at the center of each frame. The source is positioned 0.5 m away from the sphere directly below it. Thus, the source is 20 cm below the bottom edge of each frame. The pulse emitted by the source is a Hanning pulse (one period of a raised cosine) specified by

d_{pulse}(t) = \begin{cases} [1 - \cos(\omega_0 t)]/2, & 0 \le t \le 2\pi/\omega_0, \\ 0, & \text{all other } t, \end{cases}   (16)

where ω_0 is chosen to be 2π times 3.2 kHz. The spectrum of this pulse has its first zero at 6.4 kHz, and the main part of its energy is concentrated below 3 kHz. This pulse, which is defined as a continuous function of time, is used for all free-field simulations. However, it is necessary to use a digital version of the Hanning pulse for the simulations based on the rigid sphere since in this case we do not have direct access to a time domain expression for the scattered field. Instead, the simulated time responses are calculated by an inverse Fourier transform of the sampled frequency response (inverse FFT). The digital Hanning pulse is given by the time sequence 0, 0.5, 1, 0.5, 0.

FIG. 4. The sound field produced by a spherical wave impinging on a rigid sphere. Note the secondary wave that starts radiating away from the bright spot at the back of the sphere.
It is seen that when the incident wave hits the front of the sphere, it causes a reflected wave, which appears to be almost perfectly spherical, to radiate away from this point. This is not surprising since any hard obstacle reflects high-frequency components according to ray theory. However, it is interesting to observe that a similar wave starts to radiate away from the point directly at the back of the sphere a short while later. This phenomenon can be understood by realizing that the amplitude response at the point directly at the back of the sphere decays away quite slowly with frequency even though it is right in the middle of the shadow zone. All the paths the incident wave can follow around the sphere to this "bright spot"[30] have the same length, and so all components of the incident field arriving at this point are bound to be in phase and interfere constructively. In the time domain, this is associated with a secondary wave that propagates away from the bright spot. Note that the density plots in Fig. 4 are effectively clipped in order to make it easy to see the low-amplitude secondary wave. Had the sphere not been present, the amplitude of the incident field would have been two at the center of each frame, but since any value greater than one is plotted as white, the duration of the incident pulse looks greater than it really is.
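The series of Eqs. (13)-(15) is straightforward to evaluate numerically. The sketch below is our own minimal Python illustration, not the authors' code; it uses only the standard library, with a downward ("Miller") recurrence for j_m because the upward recurrence is unstable once the order exceeds the argument, and the frequency, field point, and truncation order mmax = 30 are arbitrary test choices. It checks two things: off the sphere, the incident-field series of Eq. (13) reproduces the closed-form monopole field of Eq. (8), and on the surface r = a the coefficients b_m of Eq. (15) make the radial derivative of the total field vanish, as required by Eq. (12).

```python
import cmath
import math

C0 = 340.0  # speed of sound [m/s]

def sph_jn(mmax, z):
    """j_m(z), m = 0..mmax, by downward (Miller) recurrence; the upward
    recurrence is numerically unstable for j_m once m exceeds z."""
    M = mmax + 16
    raw = [0.0] * (M + 1)
    f, fprev = 1e-30, 0.0          # arbitrary tiny seed at order M
    raw[M] = f
    for m in range(M, 0, -1):      # j_{m-1} = (2m+1)/z j_m - j_{m+1}
        f, fprev = (2 * m + 1) / z * f - fprev, f
        raw[m - 1] = f
    scale = (math.sin(z) / z) / raw[0]   # normalize with the exact j_0
    return [v * scale for v in raw[:mmax + 1]]

def sph_yn(mmax, z):
    """n_m(z), m = 0..mmax, by upward recurrence (stable for n_m)."""
    y = [-math.cos(z) / z, -math.cos(z) / z ** 2 - math.sin(z) / z]
    for m in range(1, mmax):
        y.append((2 * m + 1) / z * y[m] - y[m - 1])
    return y[:mmax + 1]

def legendre(mmax, x):
    """P_m(x), m = 0..mmax, by the standard three-term recurrence."""
    p = [1.0, x]
    for m in range(1, mmax):
        p.append(((2 * m + 1) * x * p[m] - m * p[m - 1]) / (m + 1))
    return p[:mmax + 1]

def dsph(fm, fm1, m, z):
    """Derivative f'_m(z) = (m/z) f_m(z) - f_{m+1}(z)."""
    return (m / z) * fm - fm1

def sphere_series(f_hz, a, l, r, theta, V=1.0, mmax=30):
    """Return (p_ff, p_s, dp_ff/dr, dp_s/dr) from the series of Eqs.
    (13)-(15): sphere radius a, monopole at distance l on the x_1 axis,
    field point at (r, theta)."""
    k = 2 * math.pi * f_hz / C0
    ja, ya = sph_jn(mmax + 1, k * a), sph_yn(mmax + 1, k * a)
    jr, yr = sph_jn(mmax + 1, k * r), sph_yn(mmax + 1, k * r)
    jl, yl = sph_jn(mmax + 1, k * l), sph_yn(mmax + 1, k * l)
    P = legendre(mmax, math.cos(theta))
    p_ff = p_s = dp_ff = dp_s = 0j
    for m in range(mmax + 1):
        h2l = jl[m] - 1j * yl[m]                      # j_m(kl) - j n_m(kl)
        jpa = dsph(ja[m], ja[m + 1], m, k * a)        # j'_m(ka)
        ypa = dsph(ya[m], ya[m + 1], m, k * a)        # n'_m(ka)
        b = 1j * (2 * m + 1) * h2l / (1 - 1j * ypa / jpa)     # Eq. (15)
        jpr = dsph(jr[m], jr[m + 1], m, k * r)
        ypr = dsph(yr[m], yr[m + 1], m, k * r)
        p_ff += -1j * k * V * (2 * m + 1) * jr[m] * h2l * P[m]        # Eq. (13)
        dp_ff += -1j * k * V * (2 * m + 1) * k * jpr * h2l * P[m]
        p_s += k * V * b * (jr[m] - 1j * yr[m]) * P[m]                # Eq. (14)
        dp_s += k * V * b * k * (jpr - 1j * ypr) * P[m]
    return p_ff, p_s, dp_ff, dp_s

# Arbitrary test values: a = 9 cm sphere, source 0.5 m away, 3 kHz.
a, l, f_hz, theta = 0.09, 0.5, 3000.0, 1.0

# 1) Off the sphere, the Eq. (13) series must agree with Eq. (8)/(10):
r = 0.2
p_ff, _, _, _ = sphere_series(f_hz, a, l, r, theta)
R = math.sqrt(r * r + l * l - 2 * r * l * math.cos(theta))  # source-to-field distance
k = 2 * math.pi * f_hz / C0
p_exact = cmath.exp(-1j * k * R) / R
series_err = abs(p_ff - p_exact) / abs(p_exact)

# 2) On the surface r = a, the boundary condition Eq. (12) must hold:
_, _, dp_ff, dp_s = sphere_series(f_hz, a, l, a, theta)
bc_residual = abs(dp_ff + dp_s) / abs(dp_ff)

print(series_err, bc_residual)   # both should be tiny
```

Caching the value of b inside the loop, keyed by (k, m), would implement the two-dimensional lookup table that the text recommends.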

IV. OPTIMAL SOURCE INPUTS

The pair of desired signals can be specified with two fundamentally different objectives in mind: cross-talk cancellation or virtual source imaging. Perfect cross-talk cancellation requires that a signal is reproduced perfectly at one ear of the listener while nothing is heard at the other ear, so if we want to produce a desired signal D_2 at the listener's left ear, then D_1 must be zero. Virtual source imaging, on the other hand, requires that the signals reproduced at the ears of the listener are identical (up to a common delay and a common scaling factor) to the signals that would have been produced there by a real source. For given values of D_1 and D_2, v must be calculated by solving Cv = d for v. In order to be able to determine the impulse responses corresponding to V_1 and V_2 for general choices of C_1, C_2, D_1, and D_2, it is necessary to use a modeling delay,[31] and it is usually also necessary to use regularization in order to avoid very large values of V_1 and V_2 caused by ill conditioning of C.[12,17,32]

V. CROSS-TALK CANCELLATION SYSTEMS

A. Free-field conditions

It is advantageous to define D_2 to be the product of D_pulse and the phase factor exp(-jkr_1) since this ensures that the source inputs are both causal. The source inputs can be determined by solving the linear equation system Cv = [0, D_2]^T for v. After an inverse Fourier transform of v, we have the source inputs as a function of time. The solution for each source input[20,23] consists of an exponentially decaying train of delta functions convolved with d_pulse (which is a Hanning pulse defined as a continuous function of time, see Eq. (16)). Figure 5 shows the two source inputs v_1(t) and v_2(t) for the two loudspeaker spans (a) 60 degrees and (b) 10 degrees. The x axis is normalized so that instead of showing time in seconds it indicates the distance in meters travelled by the sound as a function of time, assuming that c_0, the speed of sound, is 340 m/s.
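The inversion prescribed above can be sketched per frequency. The Tikhonov form v = (C^H C + βI)^{-1} C^H d used below is one standard way of implementing the regularization the text calls for, shown here as our own illustration (the modeling delay is omitted, and the path lengths, roughly those of a narrow loudspeaker span, are illustrative values).

```python
import cmath
import math

C0 = 340.0  # speed of sound [m/s]

def c_ff(r, f_hz):
    """Free-field transfer function, Eq. (10): C_ff = exp(-jkr)/r."""
    k = 2 * math.pi * f_hz / C0
    return cmath.exp(-1j * k * r) / r

def solve2(M, b):
    """Solve a 2x2 complex system M v = b by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return ((b[0] * M[1][1] - b[1] * M[0][1]) / det,
            (M[0][0] * b[1] - M[1][0] * b[0]) / det)

def source_inputs(r1, r2, f_hz, d, beta=0.0):
    """Regularized solution of Cv = d at one frequency:
    v = (C^H C + beta I)^(-1) C^H d; beta = 0 gives the exact inverse."""
    c1, c2 = c_ff(r1, f_hz), c_ff(r2, f_hz)
    C = [[c1, c2], [c2, c1]]
    CH = [[C[0][0].conjugate(), C[1][0].conjugate()],
          [C[0][1].conjugate(), C[1][1].conjugate()]]
    M = [[CH[0][0] * C[0][0] + CH[0][1] * C[1][0] + beta,
          CH[0][0] * C[0][1] + CH[0][1] * C[1][1]],
         [CH[1][0] * C[0][0] + CH[1][1] * C[1][0],
          CH[1][0] * C[0][1] + CH[1][1] * C[1][1] + beta]]
    rhs = (CH[0][0] * d[0] + CH[0][1] * d[1],
           CH[1][0] * d[0] + CH[1][1] * d[1])
    return solve2(M, rhs)

# Cross-talk cancellation at the left ear: d = (0, D2), illustrative paths.
r1, r2, f_hz = 0.502, 0.518, 1000.0
d = (0.0, 1.0)
v_exact = source_inputs(r1, r2, f_hz, d)             # beta = 0
v_reg = source_inputs(r1, r2, f_hz, d, beta=1e-2)    # regularized

# The exact solution reproduces d; regularization trades a small
# reproduction error for smaller, better-behaved source inputs.
c1, c2 = c_ff(r1, f_hz), c_ff(r2, f_hz)
w = (c1 * v_exact[0] + c2 * v_exact[1], c2 * v_exact[0] + c1 * v_exact[1])
print(abs(w[0]), abs(w[1] - 1.0))   # both ~0
```

The frequency-dependent regularization schemes of the cited references refine this idea by making β a function of frequency; here a single scalar β is enough to show the mechanism.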
The distance to the listener is 0.5 m, and the microphone separation, which is an approximation to the head diameter, is 18 cm. In Fig. 5(a), it is seen that when the loudspeakers span 60 degrees, each of the pulses emitted by the sources is clearly separated. This is because the duration of the Hanning pulse is short compared to the interval between adjacent pulses. When the loudspeaker span is reduced to 10 degrees this is no longer the case. Figure 5(b) shows that the individual pulses then start to overlap because the interval between adjacent pulses is much shorter. This, in turn, increases the low-frequency energy content of the source inputs. As a rule of thumb, the interval between adjacent pulses is equivalent to a frequency f_0, which is reasonably well approximated by 100 kHz divided by the loudspeaker span in degrees.[20] We call f_0 the ringing frequency. For a loudspeaker span of 60 degrees f_0 is 1.9 kHz; for a loudspeaker span of 10 degrees f_0 is 11 kHz.

FIG. 5. The time response of the two source input signals v_1(t) (thick line) and v_2(t) (thin line) required to achieve perfect cross-talk cancellation at the listener's left ear under free-field conditions. The two loudspeaker spans are (a) 60 degrees and (b) 10 degrees.

Figure 6 shows the sound field reproduced by the two loudspeaker spans (a) 60 degrees and (b) 10 degrees. Each of the two plots contains nine snapshots, or frames, of the sound field as in Fig. 4. The frames are listed sequentially in a reading sequence from top left to bottom right; top left is the earliest time (t = 0.2/c_0), bottom right is the latest time (t = 1.0/c_0). The time increment between each frame is 0.1/c_0, which is equivalent to the time it takes the sound to travel 10 cm. The normalization of D_2 ensures that the left loudspeaker starts emitting sound at exactly t = 0; the right loudspeaker starts emitting sound a short while later. Each frame is calculated at 101 × 101 points over an area of 1 × 1 m².
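The quoted values of f_0 can be recovered from the geometry of Fig. 1. The sketch below assumes that the listener sits at the perpendicular distance 0.5 m from the loudspeaker axis with ears 18 cm apart, and that f_0 = c_0/(2(r_2 - r_1)), the factor of 2 arising because successive pulses in the delta-function train alternate in sign; both assumptions are our reading of the setup, not equations stated in the text.

```python
import math

C0 = 340.0  # speed of sound [m/s]

def ringing_frequency(span_deg, r0=0.5, ear_half=0.09):
    """Estimate the ringing frequency f_0 from the geometry of Fig. 1.

    Assumed geometry (ours, for illustration): loudspeakers on the x_1
    axis at +/- r0*tan(span/2), microphones 2*ear_half apart at the
    perpendicular distance r0; f_0 = c0 / (2 (r2 - r1))."""
    half_span = math.radians(span_deg / 2)
    s_half = r0 * math.tan(half_span)          # loudspeaker offset S/2
    r1 = math.hypot(s_half - ear_half, r0)     # direct path r_1
    r2 = math.hypot(s_half + ear_half, r0)     # cross-talk path r_2
    return C0 / (2 * (r2 - r1))

for span in (60, 10):
    print(span, ringing_frequency(span), 100e3 / span)
# 60 degrees: f_0 about 1.9 kHz (rule of thumb gives 1.67 kHz)
# 10 degrees: f_0 about 11 kHz  (rule of thumb gives 10 kHz)
```

Both geometric estimates land on the values quoted in the text, and the 100 kHz/span rule of thumb tracks them to within roughly 15 percent.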
The positions of the loudspeakers and the microphones are indicated by circles. As in Fig. 4, values greater than 1 are plotted as white, values smaller than -1 are plotted as black, and values between -1 and 1 are shaded appropriately. Figure 6(a) illustrates the cross-talk cancellation principle when the loudspeakers span 60 degrees. It is easy to identify a sequence of positive pulses from the left loudspeaker, and a sequence of negative pulses from the right loudspeaker. Both pulse trains are emitted with a ringing frequency of 1.9 kHz. Only the first pulse emitted from the left loudspeaker is actually seen at the left microphone; consecutive pulses are cancelled out at both the left and right microphone. However, many copies of the original Hanning pulse are seen at other locations in the sound field, even very close to the two microphones. When the loudspeaker span is reduced to 10 degrees [Fig. 6(b)] the reproduced sound field becomes much simpler. The desired Hanning pulse is now beamed towards the left microphone, and a similar "line" of cross-talk cancellation extends through the position of the right microphone. The only disturbance seen at most locations in the sound field is a single attenuated and delayed copy of the original Hanning pulse. This suggests that reducing the loudspeaker span improves the system's robustness with respect to head movement.

FIG. 6. The sound field reproduced by two monopole sources whose inputs are adjusted to achieve perfect cross-talk cancellation at the listener's left ear under free-field conditions. The two loudspeaker spans are (a) 60 degrees and (b) 10 degrees.

B. Rigid sphere model

Figure 7 is equivalent to Fig. 5 but the source inputs are now calculated by using a rigid sphere model rather than a free-field model. For the purpose of visual presentation, the source inputs presented in Fig. 7, and also later in Fig. 11, have been calculated using a digital Hanning pulse given by the time sequence 0, 0.15, 0.5, 0.85, 1, 0.85, 0.5, 0.15, 0. Since the duration of this sequence is twice as long as the sequence 0, 0.5, 1, 0.5, 0, the sampling frequency has also been increased by a factor of 2, from 12.8 to 25.6 kHz. Note that since these results are obtained by an inverse fast Fourier transform of sampled frequency data, there is no obvious way to calibrate the time axis, and so the time response has been manually shifted an integer number of sampling intervals in order to make Fig. 7 compatible with Fig. 5.

FIG. 7. As in Fig. 5 (cross-talk cancellation), but the listener's head is now modeled as a rigid sphere.

It is seen that when the loudspeakers span 60 degrees [Fig. 7(a)], the presence of the sphere causes the recursive source inputs to decay away much quicker with time than in the free-field case [Fig. 5(a)]. This is because the shadowing of the sphere reduces the amount of cross-talk that needs to be cancelled out. When the loudspeakers span only 10 degrees [Fig. 7(b)], the source inputs are very similar to the free-field case [Fig. 5(b)].
This similarity is not surprising, since in this case the two points on each side of the sphere are very close to the edge of the shadow zone, and so within the frequency range considered here the sphere does not modify the free-field transfer functions much.

Figure 8 is equivalent to Fig. 6, but the sound field is now calculated by using a rigid sphere model rather than a free-field model. As in Fig. 4, the sound field is plotted over an area of 0.6 m x 0.6 m; note that this area is smaller than the 1 m x 1 m square used in the free-field case. The time is calibrated such that in frame four (the first frame in the second row) the center of the Hanning pulse is at the left ear. It is seen that the total sound field is qualitatively similar to the sound field generated by two monopoles in a free field. When the loudspeakers span 60 degrees (Fig. 8(a)), the zone of efficient cross-talk cancellation is still quite small, even though the recursive nature of the source outputs is less pronounced than in the free-field case (Fig. 6(a)), whereas the 10-degree loudspeaker span still produces a very clean sound field (Fig. 8(b)).

VI. VIRTUAL IMAGING SYSTEMS

It is, in principle, a trivial task to create a virtual source once it is known how to implement a cross-talk cancellation system. If the cross-talk cancellation problem can be solved for each ear separately, then any pair of desired signals can be reproduced by adding the two solutions together. In practice it is far easier for the loudspeakers to create the signals due to a virtual source than to achieve perfect cross-talk cancellation at one point.

FIG. 9. The time response of the two source input signals v1(t) (thick line) and v2(t) (thin line) required to create a virtual source at the position (0.5 m, 0 m) under free-field conditions. The two loudspeaker spans are (a) 60 degrees and (b) 10 degrees. Note that the effective duration of both v1(t) and v2(t) decreases as the loudspeaker span is decreased.

FIG. 8. As in Fig. 6 (cross-talk cancellation), but the listener's head is now modeled as a rigid sphere. Note that each frame represents a smaller area than each of the frames shown in Fig. 6.

A. Free-field conditions

As in the cross-talk cancellation case, it is convenient to normalize the desired signals D1 and D2 in order to ensure causality of the source inputs. In addition, we want the louder of the two desired signals to be equal to D_pulse; it must not be scaled, no matter what the distance is from the virtual source to the center of the listener's head. The desired signals are therefore defined as

    d = e^(-jkr_1) [1, A_1/A_2]^T.    (17)

This definition assumes that the virtual source is to the left of the listener (at a coordinate for which x_1 > 0). The source inputs can be calculated by solving Cv = d for v, and the time domain responses can then be determined by taking the inverse Fourier transform. The result is that each source input is now the convolution of d_pulse with the sum of two decaying trains of delta functions, one positive and one negative.[20] The virtual source is positioned at (0.5 m, 0 m), which means that it is at an angle of 45 degrees to the left of straight ahead as seen by the listener. Figure 9 shows the two source inputs v1(t) and v2(t) for the two loudspeaker spans, (a) 60 degrees and (b) 10 degrees. When the loudspeakers span 60 degrees (Fig.
9(a)), both the positive and the negative pulse trains can be seen clearly in v1(t) and v2(t), just as in the cross-talk cancellation case (Fig. 5(a)). However, as the loudspeaker span is reduced to 10 degrees (Fig. 9(b)), the positive and negative pulse trains start to cancel out. The two source inputs then look roughly like square pulses of relatively short duration; this duration is given by the difference in arrival time at the two microphones of a pulse emitted from the virtual source. The advantage of this cancelling of the positive and negative parts of the pulse trains is that it greatly reduces the low-frequency content of the source inputs, and this is why virtual source imaging systems are much easier to implement in practice than cross-talk cancellation systems.

Figure 10 shows another nine snapshots of the reproduced sound field. It is equivalent to Fig. 6, but for a virtual source at (0.5 m, 0 m), indicated in the bottom right-hand corner of each frame, rather than for a cross-talk cancellation system. The plots show how the arrival times of the desired pulses are matched at the two ears, and they suggest, once again, that the reproduced sound field becomes simpler as the loudspeaker span is reduced from 60 degrees (Fig. 10(a)) to 10 degrees (Fig. 10(b)). Note that the inputs to the two

FIG. 11. As in Fig. 9 (virtual source imaging), but the listener's head is now modeled as a rigid sphere.

FIG. 10. The sound field reproduced by two sources whose outputs are adjusted to create a virtual source at (0.5 m, 0 m) under free-field conditions. The two loudspeaker spans are (a) 60 degrees and (b) 10 degrees.

loudspeakers contain roughly the same amount of energy. This means that the loudspeakers have to work almost equally hard even when the virtual source is quite close to the left loudspeaker, as is the case when the loudspeaker span is 60 degrees.

B. Rigid sphere model

Figure 11 is equivalent to Fig. 9, but the source inputs are now calculated by using a rigid sphere model rather than a free-field model. The source inputs will produce the same sound pressures at the listener's ears as would have been produced there by the virtual source; this means that the output from the virtual source is still a delayed Hanning pulse, as in the free-field case, but d1(t) and d2(t) are no longer simple Hanning pulses. It is seen, as in the cross-talk cancellation case, that the presence of the sphere makes the recursive nature of the source inputs less pronounced when the loudspeakers span 60 degrees (Fig. 11(a)), whereas it has little effect on the source inputs when the loudspeakers span only 10 degrees (Fig. 11(b)). Figure 12 is equivalent to Fig. 8, but for a rigid sphere model rather than a free-field model. The first pulse from the virtual source is reproduced at the left ear in frame four, and a second pulse, which is significantly weaker than the first because of the shadowing of the sphere, is reproduced at the right ear in frame six. When the loudspeakers span 60 degrees (Fig. 12(a)), the reproduced sound field looks quite complicated. Since the amplitude of the desired pulse at the right ear is quite low, the total sound field is not too different from the sound field reproduced by the corresponding cross-talk cancellation system (Fig. 8(a)).
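The rigid-sphere transfer functions used here come from series expansions of the sound field around a sphere. As a self-contained illustration of how such a partial-wave series is evaluated numerically, the following sketch computes the field scattered by a rigid sphere for an incident plane wave; this textbook case (not the paper's point-source formulation), the e^(-iwt) sign convention, and the truncation length are all assumptions:

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn, eval_legendre

def scattered_pressure(ka, kr, theta, n_terms=40):
    """Partial-wave series for the pressure scattered by a rigid sphere
    of radius a, incident plane wave of unit amplitude, observation
    point at distance r and polar angle theta from the incidence axis.
    With the exp(-iwt) convention, h_n = j_n + i*y_n is outgoing."""
    n = np.arange(n_terms)
    # Radial derivative of the incident partial waves at the surface
    jn_prime = spherical_jn(n, ka, derivative=True)
    # Outgoing spherical Hankel functions and their radial derivatives
    hn_prime = (spherical_jn(n, ka, derivative=True)
                + 1j * spherical_yn(n, ka, derivative=True))
    hn_r = spherical_jn(n, kr) + 1j * spherical_yn(n, kr)
    # Coefficients chosen so that d/dr(incident + scattered) = 0 at r = a
    coeff = -(2 * n + 1) * (1j ** n) * jn_prime / hn_prime
    return np.sum(coeff * hn_r * eval_legendre(n, np.cos(theta)))

# Field just behind the sphere (the "bright spot" direction) for ka = 2
print(scattered_pressure(2.0, 2.0, np.pi))
```

For a point source the incident-field coefficients change, but the structure is the same: the rigid-surface boundary condition fixes the coefficients, and outgoing spherical Hankel functions carry the scattered wave. Truncating the series a few terms beyond ka is normally sufficient for convergence.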
When the loudspeakers span 10 degrees (Fig. 12(b)), the properties of the reproduced sound field are qualitatively the same as in the free-field case (Fig. 10(b)), as expected.

VII. SUMMARY

When only two loudspeakers placed symmetrically in front of a single listener are used to control the sound field around the listener's head, the area over which the sound field can be controlled is larger when the two loudspeakers are close together than when they are far apart. However, the smaller the loudspeaker span as seen by the listener, the harder it is to achieve efficient cross-talk cancellation at low frequencies and, in addition, the more low-frequency energy is required to create a virtual image at a position well outside the two loudspeakers. In practice, a loudspeaker span of 10 degrees is a good compromise. The total sound field produced by a spherical wave impinging on a rigid sphere can be expressed as an infinite series. The scattered field, as opposed to the incident field, can be evaluated quite efficiently numerically from its series expansion. A time domain simulation shows that when the incident wave hits the front of the sphere, it causes a reflected wave to radiate away from this point, and that a

FIG. 12. As in Fig. 10 (virtual source imaging), but the listener's head is now modeled as a rigid sphere. Note that each frame represents a smaller area than each of the frames shown in Fig. 10.

short while later a similar, but weaker, wave starts to radiate away from the bright spot directly at the back of the sphere. The properties of the sound field reproduced by a loudspeaker arrangement spanning 10 degrees as seen by the listener do not change markedly when a rigid sphere model is used instead of the simple free-field model. When the loudspeakers span 60 degrees, however, the shadowing of the sphere causes the series of impulses emitted from each of the two sources to decay away more rapidly than in the free-field case. Nevertheless, the zone of cross-talk cancellation still remains much smaller than when the loudspeakers span only 10 degrees.

[1] F. L. Wightman and D. J. Kistler, "Headphone simulation of free-field listening. Parts I and II," J. Acoust. Soc. Am. 85, 858-867, 868-878 (1989).
[2] H. Møller, "Fundamentals of binaural technology," Appl. Acoust. 36, 171-218 (1992).
[3] D. R. Begault, 3-D Sound for Virtual Reality and Multimedia (AP Professional, Cambridge, MA, 1994).
[4] J. Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization (MIT Press, Cambridge, MA, 1997).
[5] A. J. Berkhout, D. de Vries, and P. Vogel, "Acoustic control by wave field synthesis," J. Acoust. Soc. Am. 93, 2764-2778 (1993).
[6] M. Kleiner, "Problems in the design and use of dummy-heads," Acustica 41, 183-193 (1978).
[7] H. Møller, C. B. Jensen, D. Hammershøi, and M. F. Sørensen, "Evaluation of artificial heads in listening tests," presented at the 102nd Audio Engineering Society Convention, 22-25 March 1997, Munich, Germany. AES preprint 4404 (A1).
[8] P. Damaske, "Head-related two-channel stereophony with loudspeaker reproduction," J. Acoust. Soc. Am. 50, 1109-1115 (1971).
[9] D. Griesinger, "Equalization and spatial equalization of dummy-head recordings for loudspeaker reproduction," J. Audio Eng. Soc.
37(1/2), 20-29 (1989).
[10] D. H. Cooper and J. L. Bauck, "Prospects for transaural recording," J. Audio Eng. Soc. 37, 3-19 (1989).
[11] J. L. Bauck and D. H. Cooper, "Generalized transaural stereo and applications," J. Audio Eng. Soc. 44(9), 683-705 (1996).
[12] O. Kirkeby, P. A. Nelson, H. Hamada, and F. Orduna-Bustamante, "Fast deconvolution of multi-channel systems using regularisation," IEEE Trans. Speech Audio Process. 6(2), 189-194 (1998).
[13] B. S. Atal, M. Hill, and M. R. Schroeder, "Apparent Sound Source Translator," United States Patent No. 3,236,949 (22 February 1966).
[14] P. A. Nelson, H. Hamada, and S. J. Elliott, "Adaptive inverse filters for stereophonic sound reproduction," IEEE Trans. Signal Process. 40(7), 1621-1632 (1992).
[15] P. A. Nelson, F. Orduna-Bustamante, and H. Hamada, "Inverse filter design and equalisation zones in multi-channel sound reproduction," IEEE Trans. Speech Audio Process. 3(3), 185-192 (1995).
[16] P. A. Nelson and F. Orduna-Bustamante, "Multi-channel signal processing techniques in the reproduction of sound," J. Audio Eng. Soc. 44, 973-989 (1996).
[17] O. Kirkeby, P. A. Nelson, F. Orduna-Bustamante, and H. Hamada, "Local sound field reproduction using digital signal processing," J. Acoust. Soc. Am. 100, 1584-1593 (1996).
[18] F. Heegaard, "The reproduction of sound in auditory perspective and a compatible system of stereophony," EBU Rev., Pt. A-Technical, No. 50, pp. 2-6 (December 1958); reprinted in J. Audio Eng. Soc. 40, 802-808 (1992).
[19] T. Takeuchi, P. A. Nelson, O. Kirkeby, and H. Hamada, "Robustness of the performance of the Stereo Dipole to misalignment of head position," presented at the 102nd Audio Engineering Society Convention, 22-25 March 1997, Munich, Germany. AES preprint 4464 17.
[20] O. Kirkeby, P. A. Nelson, and H. Hamada, "The stereo dipole: binaural sound reproduction using two closely spaced loudspeakers," presented at the 102nd Audio Engineering Society Convention, 22-25 March 1997, Munich, Germany. AES preprint 4463 (I6). Also published in J. Audio Eng. Soc.
46(5), 387-395 (1998).
[21] O. Kirkeby, P. A. Nelson, and H. Hamada, "Stereo dipole," Patent Application PCT/GB97/00415 (1997).
[22] O. Kirkeby and P. A. Nelson, "Virtual source imaging using the stereo dipole," presented at the 103rd Audio Engineering Society Convention, 26-29 September 1997, New York. AES preprint 4574 (J10).
[23] P. A. Nelson, O. Kirkeby, T. Takeuchi, and H. Hamada, "Sound fields for the production of virtual acoustic images," J. Sound Vib. 204(2), 386-396 (1997).
[24] P. M. Morse and K. U. Ingard, Theoretical Acoustics (McGraw-Hill, New York, 1968).
[25] J. W. S. Rayleigh, The Theory of Sound, 2nd rev. ed., Vol. 2 (Dover, New York, 1945).
[26] I. Malecki, Physical Foundations of Technical Acoustics (Pergamon, New York, 1969).
[27] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions (Dover, New York, 1965).
[28] D. G. Crighton, A. P. Dowling, J. E. Ffowcs Williams, M. Heckl, and F. G. Leppington, Modern Methods in Analytical Acoustics (Springer-Verlag, New York, 1992).
[29] A. V. Oppenheim and R. W. Schafer, Digital Signal Processing (Prentice-Hall, Englewood Cliffs, NJ, 1975).
[30] G. F. Kuhn, "Physical acoustics and measurements pertaining to directional hearing," in Directional Hearing, edited by W. A. Yost and G. Gourevitch (Springer-Verlag, New York, 1987).
[31] B. Widrow and S. D. Stearns, Adaptive Signal Processing (Prentice-Hall, Englewood Cliffs, NJ, 1985).
[32] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C, 2nd ed. (Cambridge University Press, Cambridge, 1992).