Automatic Detection of Emergency Vehicles for Hearing Impaired Drivers

Sung-won Park and Jose Trevino
Texas A&M University-Kingsville, EE/CS Department, MSC 92, Kingsville, TX 78363
TEL (36) 593-2638, FAX (36) 593-2, e-mail: park@tamuk.edu

Abstract

In this paper, a simple algorithm that uses linear prediction to detect an emergency vehicle's siren is presented. By measuring the means and variances of the reflection coefficients over a preselected number of successive frames, automatic detection of an emergency vehicle's siren is possible. It is shown that only two coefficients are enough for successful detection. Because the algorithm is simple, it can be implemented easily on any of Texas Instruments' TMS DSPs.

Keywords: linear prediction, reflection coefficients, pattern recognition.

* This research was supported by NASA Grant Number NAG 9-93.

I. Introduction

Hearing impaired drivers, or drivers who set the volume of their car's audio system very high, cannot hear an approaching emergency vehicle such as a police car, fire engine, or ambulance. This may result in a collision or an unnecessary delay. We propose an algorithm that can automatically detect an emergency vehicle's siren. It can be implemented easily on any TMS DSP. In practice, once a detection is made, it may be displayed visually on the dashboard.

A siren usually has several dominant frequencies that typically last for several seconds. However, because of the Doppler effect, the frequencies are maintained within a tolerance for tens or hundreds of milliseconds. To find the dominant frequencies, the linear prediction model is used. Linear predictive coding [1] has been very successful for speech coding, and Durbin's efficient recursive algorithm is used to find the prediction coefficients. If the coefficients stay within a pre-selected tolerance for a pre-selected time, a detection is made.

In Section II, linear prediction is introduced.
Durbin's efficient method for computing the linear prediction coefficients is explained in Section III. The siren detection algorithm and some simulation results are presented in Section IV. Finally, conclusions are drawn in Section V.

II. Linear Prediction

In a variety of applications, it is desirable to compress a speech signal for efficient transmission or storage. For medium or low bit-rate speech coders, linear predictive coding (LPC) [1] is the most widely used technique. Redundancy in the speech signal is removed by passing the signal through a speech analysis filter. The output of the filter, termed the residual error signal, has less redundancy than the original speech signal and can be quantized with fewer bits. The residual error signal is transmitted to the receiver along with the filter coefficients. At the receiver, the speech is reconstructed by passing the residual error signal through the synthesis filter.

The linear prediction model is used to model the human speech production system. This model works well not only for speech but also for other kinds of signals. Assume that the present sample of the signal is predicted from the past $P$ samples of the signal such that

$$\tilde{x}(n) = b_1 x(n-1) + b_2 x(n-2) + \cdots + b_P x(n-P) = \sum_{m=1}^{P} b_m x(n-m) \tag{1}$$

where $\tilde{x}(n)$ is the prediction of $x(n)$, $x(n-k)$ is the sample $k$ steps earlier, and $\{b_m\}$ are called the linear prediction coefficients. The error between the actual sample and the predicted one can be expressed as

$$\varepsilon(n) = x(n) - \tilde{x}(n) = x(n) - \sum_{m=1}^{P} b_m x(n-m). \tag{2}$$

The sum of the squared error to be minimized is expressed as

$$E = \sum_n \varepsilon^2(n) = \sum_n \left[ x(n) - \sum_{m=1}^{P} b_m x(n-m) \right]^2. \tag{3}$$

By setting the derivative of $E$ with respect to each $b_k$ to zero, one obtains

$$\sum_n x(n-k) \left[ x(n) - \sum_{m=1}^{P} b_m x(n-m) \right] = 0 \quad \text{for } k = 1, 2, 3, \ldots, P. \tag{4}$$

Equation (4) results in $P$ equations in $P$ unknowns such that

$$b_1 \sum_n x(n-k)x(n-1) + b_2 \sum_n x(n-k)x(n-2) + \cdots + b_P \sum_n x(n-k)x(n-P) = \sum_n x(n-k)x(n) \quad \text{for } k = 1, 2, 3, \ldots, P. \tag{5}$$

Let us assume that the signal is divided into frames, each with $N$ samples. If each frame is short enough, the signal within the frame may be regarded as stationary. If the $N$ samples in the sequence are indexed from 0 to $N-1$ such that $\{x(n)\} = \{x(0), x(1), x(2), \ldots, x(N-2), x(N-1)\}$, Equation (5) can be expressed as the matrix equation

$$\begin{bmatrix} r(0) & r(1) & \cdots & r(P-2) & r(P-1) \\ r(1) & r(0) & \cdots & r(P-3) & r(P-2) \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ r(P-2) & r(P-3) & \cdots & r(0) & r(1) \\ r(P-1) & r(P-2) & \cdots & r(1) & r(0) \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_{P-1} \\ b_P \end{bmatrix} = \begin{bmatrix} r(1) \\ r(2) \\ \vdots \\ r(P-1) \\ r(P) \end{bmatrix} \tag{6}$$

where

$$r(k) = \sum_{n=0}^{N-1-k} x(n)\,x(n+k). \tag{7}$$

To solve the matrix equation (6), Durbin's method, described in the next section, is used.

III. Durbin's Recursive Method

The sum of squared errors of the $P$-th order prediction (or simply the $P$-th order prediction error) in Equation (3) can be rewritten as

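$$E_P = \sum_n x(n)\,\varepsilon(n) - \sum_{m=1}^{P} b_m \sum_n x(n-m)\,\varepsilon(n) \tag{8}$$

where the subscript of $E$ denotes the order of prediction. Equation (4) can be rewritten as

$$\sum_n x(n-m)\,\varepsilon(n) = 0 \quad \text{for } m = 1, 2, 3, \ldots, P. \tag{9}$$

Because of Equation (9), the second summation of Equation (8) is zero. Thus, the final expression for the prediction error becomes

$$E_P = \sum_n x(n) \left[ x(n) - \sum_{m=1}^{P} b_m x(n-m) \right] = r(0) - b_1 r(1) - b_2 r(2) - \cdots - b_{P-1} r(P-1) - b_P r(P) = r(0) - \sum_{m=1}^{P} b_m r(m). \tag{10}$$

We now want to develop a recursive method to solve Equation (6). Let us start from order 0 and increase the order until the desired order is reached. When $P = 0$ (i.e., when no prediction is made), the error follows from Equation (10):

$$E_0 = r(0). \tag{11}$$

When $P = 1$, the error is expressed as

$$E_1 = r(0) - b_{11}\, r(1) \tag{12}$$

where the second subscript of $b$ indicates that the prediction order in this case is 1. The solution of Equation (6) is

$$b_{11} = r(1)/r(0) = \kappa_1 \tag{13}$$

where $\kappa_1$ is termed the reflection coefficient. Note that the magnitude of $\kappa_1$ is less than 1 ($|\kappa_1| < 1$) because $|r(1)|$ is less than $r(0)$. The prediction error for $P = 1$ now becomes

$$E_1 = r(0) - \kappa_1 r(1) = r(0)\left[1 - \kappa_1^2\right] = E_0\left[1 - \kappa_1^2\right]. \tag{14}$$

One can see that the prediction error $E_1$ is smaller than $E_0$. When $P = 2$, Equations (10) and (6) can be combined in a single matrix equation

$$\begin{bmatrix} r(0) & r(1) & r(2) \\ r(1) & r(0) & r(1) \\ r(2) & r(1) & r(0) \end{bmatrix} \begin{bmatrix} 1 \\ -b_{12} \\ -b_{22} \end{bmatrix} = \begin{bmatrix} E_2 \\ 0 \\ 0 \end{bmatrix}. \tag{15}$$

Assume that the solution can be found recursively as shown below:

$$\begin{bmatrix} b_{12} \\ b_{22} \end{bmatrix} = \begin{bmatrix} b_{11} \\ 0 \end{bmatrix} - \kappa_2 \begin{bmatrix} b_{11} \\ -1 \end{bmatrix} \tag{16}$$

where $\kappa_2$ is the second reflection coefficient. The second subscript 2 of $b_{12}$ and $b_{22}$ indicates that these are the second order linear prediction coefficients. Using Equation (16), Equation (15) becomes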

$$\begin{bmatrix} r(0) & r(1) & r(2) \\ r(1) & r(0) & r(1) \\ r(2) & r(1) & r(0) \end{bmatrix} \left\{ \begin{bmatrix} 1 \\ -b_{11} \\ 0 \end{bmatrix} - \kappa_2 \begin{bmatrix} 0 \\ -b_{11} \\ 1 \end{bmatrix} \right\} = \begin{bmatrix} E_1 \\ 0 \\ q_2 \end{bmatrix} - \kappa_2 \begin{bmatrix} q_2 \\ 0 \\ E_1 \end{bmatrix} = \begin{bmatrix} E_2 \\ 0 \\ 0 \end{bmatrix} \tag{17}$$

where

$$q_2 = r(2) - b_{11}\, r(1). \tag{18}$$

Because $q_2 = \kappa_2 E_1$ from the third row of Equation (17), the second reflection coefficient becomes

$$\kappa_2 = q_2 / E_1. \tag{19}$$

The new prediction error for $P = 2$, from the first row of Equation (17), becomes

$$E_2 = E_1 - \kappa_2 q_2 = E_1\left[1 - \kappa_2^2\right]. \tag{20}$$

The linear prediction coefficients can be obtained using Equation (16) such that

$$b_{12} = b_{11} - \kappa_2 b_{11} \quad \text{and} \quad b_{22} = \kappa_2. \tag{21}$$

The recursive solution method for any prediction order $P$ is described below.

Initial values: $E_0 = r(0)$, $b_{11} = \kappa_1 = r(1)/E_0$, $E_1 = E_0\,(1 - \kappa_1^2)$.

Starting with $p = 2$, the following recursion is performed:

(i) $q_p = r(p) - \sum_{m=1}^{p-1} b_{m(p-1)}\, r(p-m)$
(ii) $\kappa_p = q_p / E_{p-1}$
(iii) $b_{pp} = \kappa_p$
(iv) $b_{mp} = b_{m(p-1)} - \kappa_p\, b_{(p-m)(p-1)}$ for $m = 1, \ldots, p-1$
(v) $E_p = E_{p-1}\left[1 - \kappa_p^2\right]$
(vi) If $p < P$, increase $p$ to $p+1$ and go to (i). If $p = P$, stop.

IV. Results

A signal is sampled at 11,025 Hz and divided into frames of 200 samples each. In each frame, the first 100 samples are used to compute the reflection coefficients. In addition to the reflection coefficients, we extracted the linear prediction coefficients, the roots of the prediction polynomial, and the LSP (line spectrum pair) frequencies [2]. All of them showed similar patterns. We chose the reflection coefficients because of their simplicity.

Fig. 1 shows the first four reflection coefficients for 150 successive frames of an ambulance siren. The blue line (bottom) is the first reflection coefficient, $\kappa_1$, and the green line (top) is the second reflection coefficient, $\kappa_2$. As seen in Fig. 1(a), the first two reflection coefficients can be easily identified, whereas the last two cannot be told apart. In the case of wind noise, all four coefficients are mixed up. Because the first two reflection coefficients have a distinct feature, the prediction order is chosen to be two from now on.
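As a concrete illustration of the recursion in Section III, the following Python sketch computes the autocorrelation of Equation (7) and then runs steps (i) to (vi). The function names `autocorrelation` and `durbin` are hypothetical, chosen only for this example, and the code is a minimal sketch rather than the authors' implementation. It starts the loop at order 1 with an empty coefficient list, which reproduces the stated initial values.

```python
def autocorrelation(x, max_lag):
    """r(k) = sum_{n=0}^{N-1-k} x(n) x(n+k) for k = 0..max_lag (Equation (7))."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def durbin(r, order):
    """Durbin's recursion, steps (i)-(vi).

    Returns (b, kappa, error) where b[m-1] is b_m at the final order,
    kappa[p-1] is the reflection coefficient kappa_p, and error is E_P.
    """
    if r[0] <= 0.0:
        raise ValueError("r(0) must be positive (the frame is all zeros)")
    error = r[0]   # E_0 = r(0)
    b = []         # prediction coefficients of the previous order
    kappa = []
    for p in range(1, order + 1):
        # (i) q_p = r(p) - sum_{m=1}^{p-1} b_{m(p-1)} r(p-m)
        q = r[p] - sum(b[m - 1] * r[p - m] for m in range(1, p))
        # (ii) kappa_p = q_p / E_{p-1}
        k = q / error
        # (iv) b_{mp} = b_{m(p-1)} - kappa_p b_{(p-m)(p-1)}, then (iii) b_{pp} = kappa_p
        b = [b[m - 1] - k * b[p - m - 1] for m in range(1, p)] + [k]
        # (v) E_p = E_{p-1} [1 - kappa_p^2]
        error *= 1.0 - k * k
        kappa.append(k)
    return b, kappa, error
```

For the short sequence {1, 2, 3, 4}, Equation (7) gives r(0) = 30, r(1) = 20, and r(2) = 11, so that $\kappa_1 = 2/3$ and $\kappa_2 = -0.14$, and the resulting $b_{12} = 0.76$ and $b_{22} = -0.14$ satisfy both rows of Equation (6). The sketch does not guard against a perfectly predictable frame, for which the prediction error reaches zero and the next division would fail.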

Fig. 1 Plots of the first four reflection coefficients of 150 successive frames of two signals: (a) ambulance siren, (b) typical wind noise. Blue (bottom): first reflection coefficient; green (top): second; red: third; cyan: fourth.

The first two reflection coefficients for four different signals are computed and displayed in Fig. 2. For the police siren, the two coefficients are widely separated and each has a relatively small standard deviation. The fire engine siren has about the same property as the police siren except for the second part, which was in fact the sound of a horn rather than a siren. In the case of the speech signal, vowel sounds show the same property as sirens, while consonants behave like noise. In both cases, the second reflection coefficient shows a relatively large standard deviation. Finally, in the case of noise, the two coefficients are mixed up and cannot be distinguished.

Based on the observations made so far, the algorithm to detect an emergency vehicle's siren is proposed and summarized below.

1. A signal is sampled at 11,025 samples/sec and divided into frames of 200 samples.
2. The first two reflection coefficients are calculated from the first 100 samples of each frame.
3. For the next 10 successive frames, the means and standard deviations of the two reflection coefficients are computed.
4. Detection is made if the difference between the mean of the first reflection coefficient and the mean of the second coefficient is greater than a pre-selected number (.2 in this case) and the sum of the standard deviations of the two reflection coefficients is smaller than a pre-selected threshold (.3).

Even though each frame is 200 samples long, only the first 100 samples are collected and used to compute the reflection coefficients. This leaves the microprocessor free, for the rest of the frame, to compute the two reflection coefficients.
Little more than 300 multiplies and 300 adds are required to compute the two reflection coefficients. At every tenth frame, the difference between the means and the sum of the standard deviations of the two reflection coefficients, taken over the past nine frames and the current frame, are computed. If these parameters satisfy the condition described above (item 4), a detection is made and the result may be displayed on the dashboard. Occasionally, some other signals may cause a false alarm.
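The decision rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' DSP implementation: the names `first_two_reflection_coefficients` and `detect_siren` are hypothetical, the frame length (200), the number of samples used (100), and the thresholds (0.2 and 0.3) are this sketch's reading of the figures quoted in the text, the difference of means is taken in absolute value, and the population standard deviation is used because the text does not say which estimator was applied.

```python
from statistics import mean, pstdev


def first_two_reflection_coefficients(samples):
    """Return (kappa_1, kappa_2) using Equations (7), (13), (18) and (19)."""
    n = len(samples)
    r0 = sum(v * v for v in samples)
    if r0 == 0.0:
        return 0.0, 0.0                      # silent frame: nothing to predict
    r1 = sum(samples[i] * samples[i + 1] for i in range(n - 1))
    r2 = sum(samples[i] * samples[i + 2] for i in range(n - 2))
    k1 = r1 / r0                             # kappa_1 = r(1) / r(0)
    e1 = r0 * (1.0 - k1 * k1)                # E_1 = E_0 [1 - kappa_1^2]
    if e1 <= 0.0:
        return k1, 0.0                       # perfectly predictable frame
    k2 = (r2 - k1 * r1) / e1                 # kappa_2 = q_2 / E_1
    return k1, k2


def detect_siren(signal, frame_len=200, used=100, block=10,
                 min_mean_gap=0.2, max_std_sum=0.3):
    """Return one True/False decision per block of `block` frames.

    All default values are assumptions made for this sketch.
    """
    decisions = []
    k1s, k2s = [], []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        # Only the first `used` samples of each frame are analysed.
        k1, k2 = first_two_reflection_coefficients(signal[start:start + used])
        k1s.append(k1)
        k2s.append(k2)
        if len(k1s) == block:
            gap = abs(mean(k1s) - mean(k2s))      # separation of the means
            spread = pstdev(k1s) + pstdev(k2s)    # sum of standard deviations
            decisions.append(gap > min_mean_gap and spread < max_std_sum)
            k1s, k2s = [], []
    return decisions
```

Under these assumed values, a steady tone with a period of 20 samples gives widely separated, nearly constant coefficients and is flagged, whereas a silent input is not. One decision covers 10 x 200 / 11,025, or roughly 0.18 seconds of signal.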

A false alarm of this kind, however, is displayed for only about 0.2 seconds. A user may not pay attention if a detection is displayed for such a short period of time.

Fig. 2 The first two reflection coefficients for 150 frames of four signals: (a) police car siren, (b) fire engine, (c) human speech, (d) noise. Blue (bottom): first reflection coefficient; green (top): second reflection coefficient.

By applying the algorithm to the aforementioned four signals, detections were made and the results are plotted in Fig. 3. The police siren was detected successfully except for the duration of some frames. The first part of the fire engine siren was detected successfully. It should be noted that the second part of that signal came from a horn rather than a siren. Human speech occasionally results in a false alarm. However, it is very unlikely that human speech will be loud enough to be picked up by a microphone that is mounted somewhere outside the car and intended to pick up loud sirens.

Fig. 3 Detection (1) and no detection (0) of a siren, decided over the duration of 10 frames (about 0.2 seconds long) and plotted for 150 frames: (a) police car, (b) fire engine, (c) human speech, (d) noise.

V. Conclusions

In this paper, a simple algorithm that uses the linear prediction model to detect an emergency vehicle's siren for hearing impaired drivers is presented. By measuring the means and variances of the reflection coefficients over a pre-selected number of successive frames, automatic detection of an emergency vehicle's siren is possible. It has been shown that only two reflection coefficients are enough for successful detection. Because the algorithm is simple, it can be implemented easily on any TMS DSP. The algorithm is not foolproof and may result in false alarms. However, these occur only occasionally, last for just 0.2 or 0.4 seconds, and can be ignored. A detection displayed for several seconds will get the driver's attention.

References

[1] Atal, B.S. and S.L. Hanauer, "Speech analysis and synthesis by linear prediction of speech wave," J. Acoust. Soc. Am., vol. 50, pp. 637-665, 1971.
[2] Park, S. and A. Ratanavarinchai, "Simple Quantization of LSP frequencies," Proceedings of the IASTED/ISMM International Conference on Modeling and Simulation, Pittsburgh, PA, April 1996, pp. 39-4.