67 Detection: Determining the Number of Sources


Williams, D.B. "Detection: Determining the Number of Sources," Digital Signal Processing Handbook, Ed. Vijay K. Madisetti and Douglas B. Williams, Boca Raton: CRC Press LLC, 1999. © 1999 by CRC Press LLC.

Douglas B. Williams
Georgia Institute of Technology

67.1 Formulation of the Problem
67.2 Information Theoretic Approaches: AIC and MDL; EDC
67.3 Decision Theoretic Approaches: The Sphericity Test; Multiple Hypothesis Testing
67.4 For More Information
References

The processing of signals received by sensor arrays generally can be separated into two problems: (1) detecting the number of sources and (2) isolating and analyzing the signal produced by each source. We make this distinction because many of the algorithms for separating and processing array signals assume that the number of sources is known a priori and may give misleading results if the wrong number of sources is used [3]. A good example is the errors produced by many high resolution bearing estimation algorithms (e.g., MUSIC) when the wrong number of sources is assumed. Because, in general, it is easier to determine how many signals are present than to estimate the bearings of those signals, signal detection algorithms typically can correctly determine the number of signals present even when bearing estimation algorithms cannot resolve them. In fact, the capability of an array to resolve two closely spaced sources could be said to be limited by its ability to detect that there are actually two sources present. If we have a reliable method of determining the number of sources, not only can we correctly use high resolution bearing estimation algorithms, but we can also use this knowledge to exploit more effectively the information obtained from the bearing estimation algorithms. If the bearing estimation algorithm gives fewer source directions than we know there are sources, then we know that there is more than one source in at least one of those directions and have thus essentially increased the resolution of the algorithm. If analysis of the information provided by the bearing estimation algorithm indicates more source directions than we know there are sources, then we can safely assume that some of the directions are the results of false alarms and may be ignored, thus decreasing the probability of false alarm for the bearing estimation algorithms. In this section we will present and discuss the more common approaches to determining the number of sources.

67.1 Formulation of the Problem

The basic problem is that of determining how many signal producing sources are being observed by an array of sensors. Although this problem addresses issues in several areas including sonar, radar, communications, and geophysics, one basic formulation can be applied to all these applications. We will give only a basic, brief description of the assumed signal structure; more detail can be found in references such as the book by Johnson and Dudgeon [3]. We will assume that an array of $M$ sensors observes signals produced by $N_s$ sources. The array is allowed to have an arbitrary geometry. For our discussion here, we will assume that the sensors are omnidirectional. However, this assumption is only for notational convenience, as the algorithms to be discussed will work for more general sensor responses.

The output of the $m$th sensor can be expressed as a linear combination of signals and noise

$$y_m(t) = \sum_{i=1}^{N_s} s_i\left(t - \Delta_i(m)\right) + n_m(t).$$

The noise observed at the $m$th sensor is denoted by $n_m(t)$. The propagation delays, $\Delta_i(m)$, are measured with respect to an origin chosen to be at the geometric center of the array. Thus, $s_i(t)$ indicates the $i$th propagating signal observed at the origin, and $s_i(t - \Delta_i(m))$ is the same signal measured by the $m$th sensor. For a plane wave in a homogeneous medium, these delays can be found from the dot product between a unit vector in the signal's direction of propagation, $\vec{\zeta}_i^{\,o}$, and the sensor's location, $\vec{x}_m$,

$$\Delta_i(m) = \frac{\vec{\zeta}_i^{\,o} \cdot \vec{x}_m}{c},$$

where $c$ is the plane wave's speed of propagation. Most algorithms used to detect the number of sources incident on the array are frequency domain techniques that assume the propagating signals are narrowband about a common center frequency, $\omega_o$. Consequently, after Fourier transforming the measured signals, only one frequency is of interest and the propagation delays become phase shifts

$$Y_m(\omega_o) = \sum_{i=1}^{N_s} S_i(\omega_o)\, e^{-j\omega_o \Delta_i(m)} + N_m(\omega_o).$$

The detection algorithms then exploit the form of the spatial correlation matrix, $R$, for the array. The spatial correlation matrix is the $M \times M$ matrix formed by correlating the vector of the Fourier transforms of the sensor outputs at the particular frequency of interest

$$\mathbf{Y} = \left[\, Y_0(\omega_o)\;\; Y_1(\omega_o)\;\; \cdots\;\; Y_{M-1}(\omega_o) \,\right]^T.$$

If the sources are assumed to be uncorrelated with the noise, then the form of $R$ is

$$R = E\left\{ \mathbf{Y}\mathbf{Y}^{\dagger} \right\} = K_n + S C S^{\dagger},$$

where $K_n$ is the correlation matrix of the noise, $S$ is the matrix whose columns correspond to the vector representations of the signals, $S^{\dagger}$ is the conjugate transpose of $S$, and $C$ is the matrix of the correlations between the signals. Thus, the matrix $S$ has the form

$$S = \begin{bmatrix} e^{-j\omega_o \Delta_1(0)} & \cdots & e^{-j\omega_o \Delta_{N_s}(0)} \\ \vdots & & \vdots \\ e^{-j\omega_o \Delta_1(M-1)} & \cdots & e^{-j\omega_o \Delta_{N_s}(M-1)} \end{bmatrix}.$$

If we assume that the noise is additive, white Gaussian noise with power $\sigma_n^2$ and that none of the signals are perfectly coherent with any of the other signals, then $K_n = \sigma_n^2 I_M$, $C$ has full rank, and the form of $R$ is

$$R = \sigma_n^2 I_M + S C S^{\dagger}. \tag{67.1}$$
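As a concrete illustration of this model, the following sketch (Python with NumPy; the uniform linear array, half-wavelength spacing, source bearings, and unit-power uncorrelated sources are illustrative assumptions, not part of the chapter) builds the steering matrix $S$ and the ideal spatial correlation matrix of Eq. (67.1):

```python
import numpy as np

M = 8                    # number of sensors (assumed)
Ns = 2                   # number of sources (assumed)
sigma_n2 = 1.0           # noise power sigma_n^2

# Uniform linear array with half-wavelength spacing: the phase shift at
# sensor m for a source at bearing theta is exp(-j*pi*m*sin(theta)).
bearings = np.deg2rad([-10.0, 25.0])     # illustrative source directions
m_idx = np.arange(M)
S = np.exp(-1j * np.pi * np.outer(m_idx, np.sin(bearings)))  # M x Ns steering matrix

# Signal correlation matrix C: uncorrelated, unit-power sources (full rank).
C = np.eye(Ns)

# Spatial correlation matrix of Eq. (67.1): R = sigma_n^2 I_M + S C S^dagger.
R = sigma_n2 * np.eye(M) + S @ C @ S.conj().T
```

With fewer uncorrelated sources than sensors, $SCS^{\dagger}$ has rank $N_s$, which is exactly the structure the detection algorithms exploit.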

We will assume that the columns of $S$ are linearly independent when there are fewer sources than sensors, which is the case for most common array geometries and expected source locations. As $C$ is of full rank, if there are fewer sources than sensors, then the rank of $SCS^{\dagger}$ is equal to the number of signals incident on the array or, equivalently, the number of sources. If there are $N_s$ sources, then $SCS^{\dagger}$ is of rank $N_s$ and its $N_s$ eigenvalues in descending order are $\delta_1, \delta_2, \ldots, \delta_{N_s}$. The $M$ eigenvalues of $\sigma_n^2 I_M$ are all equal to $\sigma_n^2$, and the eigenvectors are any orthonormal set of length $M$ vectors. So the eigenvectors of $R$ are the $N_s$ eigenvectors of $SCS^{\dagger}$ plus any $M - N_s$ eigenvectors which complete the orthonormal set, and the eigenvalues in descending order are $\sigma_n^2 + \delta_1, \ldots, \sigma_n^2 + \delta_{N_s}, \sigma_n^2, \ldots, \sigma_n^2$. The correlation matrix is generally divided into two parts: the signal-plus-noise subspace formed by the largest eigenvalues $(\sigma_n^2 + \delta_1, \ldots, \sigma_n^2 + \delta_{N_s})$ and their eigenvectors, and the noise subspace formed by the smallest, equal eigenvalues and their eigenvectors. The reason for these labels is obvious, as the space spanned by the signal-plus-noise subspace eigenvectors contains the signals and a portion of the noise, while the noise subspace contains only that part of the noise that is orthogonal to the signals [3].

If there are fewer sources than sensors, the smallest $M - N_s$ eigenvalues of $R$ are all equal, and to determine exactly how many sources there are, we must simply determine how many of the smallest eigenvalues are equal. If there are not fewer sources than sensors ($N_s \ge M$), then none of the smallest eigenvalues are equal. The detection algorithms then assume that only the smallest eigenvalue is in the noise subspace, as it is not equal to any of the other eigenvalues. Thus, these algorithms can detect up to $M - 1$ sources and for $N_s \ge M - 1$ will say that there are $M - 1$ sources, as this is the greatest detectable number.

Unfortunately, all that is usually known is $\hat{R}$, the sample correlation matrix, which is formed by averaging $N$ samples of the correlation matrix taken from the outputs of the array sensors. As $\hat{R}$ is formed from only a finite number of samples of $R$, the smallest $M - N_s$ eigenvalues of $\hat{R}$ are subject to statistical variations and are unequal with probability one [4]. Thus, solutions to the detection problem have concentrated on statistical tests to determine how many of the eigenvalues of $R$ are equal when only the sample eigenvalues of $\hat{R}$ are available.

When performing statistical tests on the eigenvalues of the sample correlation matrix to determine the number of sources, certain assumptions must be made about the nature of the signals. In array processing, both deterministic and stochastic signal models are used depending on the application. However, for the purpose of testing the sample eigenvalues, the Fourier transforms of the signals at frequency $\omega_o$, $S_i(\omega_o)$, $i = 1, \ldots, N_s$, are assumed to be zero mean Gaussian random processes that are statistically independent of the noise and have a positive definite correlation matrix $C$. We also assume that the $N$ samples taken when forming $\hat{R}$ are statistically independent of each other. With these assumptions, the spatial correlation matrix is still of the same form as in (67.1), except that now we can more easily derive statistical tests on the eigenvalues of $\hat{R}$.
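Only the sample correlation matrix $\hat{R}$ is available in practice. Continuing the sketch above (reusing M, Ns, S, and sigma_n2; the snapshot count N = 100 and the random seed are arbitrary assumptions), the following forms $\hat{R}$ from independent snapshots and lists its eigenvalues, whose smallest $M - N_s$ cluster near $\sigma_n^2$ without being exactly equal:

```python
rng = np.random.default_rng(0)
N = 100                                   # number of snapshots (assumed)

# Complex Gaussian source amplitudes and noise for each snapshot.
A = (rng.standard_normal((Ns, N)) + 1j * rng.standard_normal((Ns, N))) / np.sqrt(2)
W = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) * np.sqrt(sigma_n2 / 2)
Y = S @ A + W                             # M x N matrix of array snapshots

# Sample correlation matrix: average of N rank-one outer products.
R_hat = (Y @ Y.conj().T) / N

# Eigenvalues in descending order; the smallest M - Ns cluster near sigma_n^2
# but, with finite N, are unequal with probability one.
l = np.sort(np.linalg.eigvalsh(R_hat))[::-1]
print(l)
```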
67.2 Information Theoretic Approaches

We will see that the source detection methods to be described all share common characteristics. However, we will classify them into two groups, information theoretic and decision theoretic approaches, determined by the statistical theories used to derive them. Although the decision theoretic techniques are quite a bit older, we will first present the information theoretic algorithms as they are currently much more commonly used.

67.2.1 AIC and MDL

AIC and MDL are both information theoretic model order determination techniques that can be used to test the eigenvalues of a sample correlation matrix to determine how many of the smallest eigenvalues of the correlation matrix are equal. The AIC and MDL algorithms both consist of minimizing a criterion over the number of signals that are detectable, i.e., $N_s = 0, \ldots, M - 1$.

To construct these criteria, a family of probability densities, $f(\mathbf{Y} \mid \theta(N_s))$, $N_s = 0, \ldots, M - 1$, is needed, where $\theta$, which is a function of the number of sources, $N_s$, is the vector of parameters needed for the model that generated the data $\mathbf{Y}$. The criteria are composed of the negative of the log-likelihood function of the density $f(\mathbf{Y} \mid \hat{\theta}(N_s))$, where $\hat{\theta}(N_s)$ is the maximum likelihood estimate of $\theta$ for $N_s$ signals, plus an adjusting term for the model dimension. The adjusting term is needed because the negative log-likelihood function always achieves a minimum for the highest dimension model possible, which in this case is the largest possible number of sources. Therefore, the adjusting term will be a monotonically increasing function of $N_s$ and should be chosen so that the algorithm is able to determine the correct model order.

AIC was introduced by Akaike [1]. Originally, the "IC" stood for information criterion and the "A" designated it as the first such test, but it is now more commonly considered an acronym for the Akaike Information Criterion. If we have $N$ independent observations of a random variable with probability density $g(\mathbf{Y})$ and a family of models in the form of probability densities $f(\mathbf{Y} \mid \theta)$, where $\theta$ is the vector of parameters for the models, then Akaike chose his criterion to minimize

$$I\left(g; f(\cdot \mid \theta)\right) = \int g(\mathbf{Y}) \ln g(\mathbf{Y})\, d\mathbf{Y} - \int g(\mathbf{Y}) \ln f(\mathbf{Y} \mid \theta)\, d\mathbf{Y}, \tag{67.2}$$

which is known as the Kullback-Leibler mean information distance. $\frac{1}{N}\,\mathrm{AIC}(\theta)$ is an estimate of $-2E\left\{\int g(\mathbf{Y}) \ln f(\mathbf{Y} \mid \theta)\, d\mathbf{Y}\right\}$, and minimizing $\mathrm{AIC}(\theta)$ over the allowable values of $\theta$ should minimize (67.2). The expression for $\mathrm{AIC}(\theta)$ is

$$\mathrm{AIC}(\theta) = -2 \ln f\left(\mathbf{Y} \mid \hat{\theta}(N_s)\right) + 2\eta,$$

where $\eta$ is the number of independent parameters in $\theta$.

Following AIC, MDL was developed by Schwarz [6] using Bayesian techniques. He assumed that the a priori density of the observations comes from a suitable family of densities that possess efficient estimates [7]; they are of the form

$$f(\mathbf{Y} \mid \theta) = \exp\left(\theta \cdot p(\mathbf{Y}) - b(\theta)\right).$$

The MDL criterion was then found by choosing the model that is most probable a posteriori. This choice is equivalent to selecting the model for which

$$\mathrm{MDL}(\theta) = -\ln f\left(\mathbf{Y} \mid \hat{\theta}(N_s)\right) + \frac{1}{2}\, \eta \ln N$$

is minimized. This criterion was independently derived by Rissanen [5] using information theoretic techniques. Rissanen noted that each model can be perceived as encoding the observed data and that the optimum model is the one that yields the minimum code length. Hence, the name MDL comes from Minimum Description Length.

For the purpose of using AIC and MDL to determine the number of sources, the forms of the log-likelihood function and the adjusting terms have been given by Wax [8]. For $N_s$ signals the parameters that completely parameterize the correlation matrix $R$ are $\{\sigma_n^2, \lambda_1, \ldots, \lambda_{N_s}, \mathbf{v}_1, \ldots, \mathbf{v}_{N_s}\}$, where $\lambda_i$ and $\mathbf{v}_i$, $i = 1, \ldots, N_s$, are the eigenvalues and their respective eigenvectors of the signal-plus-noise subspace of the correlation matrix. As the vector of sensor outputs is a Gaussian random vector with correlation matrix $R$ and all the samples of the sensor outputs are independent, the likelihood function is

$$f\left(\mathbf{Y} \mid \sigma_n^2, \lambda_1, \ldots, \lambda_{N_s}, \mathbf{v}_1, \ldots, \mathbf{v}_{N_s}\right) = \pi^{-MN} \left(\det R\right)^{-N} \exp\left(-N \operatorname{tr}\left(R^{-1}\hat{R}\right)\right),$$

where $\operatorname{tr}(\cdot)$ denotes the trace of the matrix, $\hat{R}$ is the sample correlation matrix, and $R$ is the unique correlation matrix formed from the given parameters. The maximum likelihood estimates of the parameters are [2, 4]

$$\hat{\mathbf{v}}_i = \mathbf{u}_i; \quad i = 1, \ldots, N_s$$
$$\hat{\lambda}_i = l_i; \quad i = 1, \ldots, N_s \tag{67.3}$$
$$\hat{\sigma}_n^2 = \frac{1}{M - N_s} \sum_{i=N_s+1}^{M} l_i,$$

where $l_1, \ldots, l_M$ are the eigenvalues in descending order of $\hat{R}$ and $\mathbf{u}_i$ are the corresponding eigenvectors. Therefore, the log-likelihood function of $f(\mathbf{Y} \mid \hat{\theta}(N_s))$ is

$$\ln f\left(\mathbf{Y} \mid l_1, \ldots, l_{N_s}, \mathbf{u}_1, \ldots, \mathbf{u}_{N_s}\right) = \ln \left[ \frac{\prod_{i=N_s+1}^{M} l_i^{1/(M-N_s)}}{\frac{1}{M-N_s} \sum_{i=N_s+1}^{M} l_i} \right]^{(M-N_s)N}.$$

Remembering that the eigenvalues of a complex correlation matrix are real and that the eigenvectors are complex and orthonormal, the number of degrees of freedom in the parameters of the model is classically chosen to be $\eta = N_s(2M - N_s) + 1$. Noting that any constant term in the criteria which is common to the entire family of models for either AIC or MDL may be ignored, we have the criterion for AIC as

$$\mathrm{AIC}(N_s) = -2N \ln \left[ \frac{\prod_{i=N_s+1}^{M} l_i^{1/(M-N_s)}}{\frac{1}{M-N_s} \sum_{i=N_s+1}^{M} l_i} \right]^{(M-N_s)} + 2 N_s \left(2M - N_s\right); \quad N_s = 0, \ldots, M-1,$$

and the criterion for MDL as

$$\mathrm{MDL}(N_s) = -N \ln \left[ \frac{\prod_{i=N_s+1}^{M} l_i^{1/(M-N_s)}}{\frac{1}{M-N_s} \sum_{i=N_s+1}^{M} l_i} \right]^{(M-N_s)} + \frac{1}{2} N_s \left(2M - N_s\right) \ln N; \quad N_s = 0, \ldots, M-1.$$

For both of these methods, the estimate of the number of sources is that value of $N_s$ which minimizes the criterion. In [9] there is a more thorough discussion concerning determining the number of degrees of freedom and the advantages of choosing instead $\eta = N_s(2M - N_s)$.
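Both criteria depend on the data only through the sample eigenvalues, so a direct implementation is short. The sketch below (Python/NumPy; the function name and interface are ours, not from the chapter) evaluates AIC and MDL for every candidate order and returns the minimizers:

```python
import numpy as np

def aic_mdl(l, N):
    """Estimate the number of sources from the eigenvalues l (descending)
    of an M x M sample correlation matrix formed from N snapshots."""
    M = len(l)
    aic = np.zeros(M)
    mdl = np.zeros(M)
    for Ns in range(M):           # candidate model orders Ns = 0, ..., M-1
        tail = l[Ns:]             # the M - Ns smallest eigenvalues
        k = M - Ns
        # Log of (geometric mean / arithmetic mean) of the tail eigenvalues.
        log_ratio = np.sum(np.log(tail)) / k - np.log(np.mean(tail))
        aic[Ns] = -2 * N * k * log_ratio + 2 * Ns * (2 * M - Ns)
        mdl[Ns] = -N * k * log_ratio + 0.5 * Ns * (2 * M - Ns) * np.log(N)
    return int(np.argmin(aic)), int(np.argmin(mdl))

# Usage with the running example: Ns_aic, Ns_mdl = aic_mdl(l, N)
```

Applied to the eigenvalues of the running example, both criteria would be expected to return the true $N_s = 2$ for reasonable snapshot counts and signal-to-noise ratios.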

In general, MDL is considered to perform better than AIC. Schwarz [6], through his derivation of the MDL criterion, showed that if his assumptions are accepted, then AIC cannot be asymptotically optimal. He also mentioned that MDL tends toward lower-dimensional models than AIC, as the model dimension term is multiplied by $\frac{1}{2}\ln N$ in the MDL criterion. Zhao et al. [14] showed that MDL is consistent (the probability of detecting the correct number of sources, i.e., $\Pr(\hat{N}_s = N_s)$, goes to 1 as $N$ goes to infinity), but AIC is not consistent and will tend to overestimate the number of sources as $N$ goes to infinity. Thus, most people in array processing prefer to use MDL over AIC. Interestingly, many statisticians prefer AIC because many of their modeling problems have a very large penalty for underestimating the model order but a relatively mild penalty for overestimating it. Xu and Kaveh [12] have provided a thorough discussion of the asymptotic properties of AIC and MDL, including an examination of their sensitivities to modelling errors and bounds on the probability that AIC will overestimate the number of sources.

67.2.2 EDC

Clearly, the only difference between the implementations of AIC and MDL is the choice of the adjusting term that penalizes for choosing larger model orders. Several people have examined using other adjusting terms to arrive at other criteria. In particular, statisticians at the University of Pittsburgh [13, 14] have developed the Efficient Detection Criterion (EDC) procedure, which is actually a family of criteria chosen such that they are all consistent. The general form of these criteria is

$$\mathrm{EDC}(\theta) = -\ln f\left(\mathbf{Y} \mid \hat{\theta}(N_s)\right) + \eta\, C_N,$$

where $C_N$ can be any function of $N$ such that

1) $\lim_{N \to \infty} C_N / N = 0$
2) $\lim_{N \to \infty} C_N / \ln(\ln N) = \infty$.

Thus, for the array processing source detection problem the EDC procedure chooses the value of $N_s$ that minimizes

$$\mathrm{EDC}(N_s) = -N \ln \left[ \frac{\prod_{i=N_s+1}^{M} l_i^{1/(M-N_s)}}{\frac{1}{M-N_s} \sum_{i=N_s+1}^{M} l_i} \right]^{(M-N_s)} + N_s \left(2M - N_s\right) C_N; \quad N_s = 0, \ldots, M-1.$$

In their analysis of the EDC procedure, Zhao et al. [14] showed that not only are all the EDC criteria consistent for the data assumptions we have made, but under certain conditions they remain consistent even when the data sample vectors used to form the estimate $\hat{R}$ are not independent or Gaussian. The choice of $C_N = \frac{1}{2}\ln N$ satisfies the restrictions on $C_N$ and, thus, produces one of the EDC procedures. This particular criterion is identical to MDL and shows that the MDL criterion is included as one of the EDC procedures. Another relatively common choice for $C_N$ is $C_N = \sqrt{N \ln N}$.
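Because EDC changes only the penalty weight $C_N$, it drops into the same eigenvalue computation as AIC and MDL. A minimal sketch, assuming the $C_N = \sqrt{N \ln N}$ choice mentioned above (function name and default ours):

```python
import numpy as np

def edc(l, N, C_N=None):
    """EDC estimate of the number of sources from eigenvalues l (descending).
    C_N defaults to sqrt(N ln N); C_N = 0.5*log(N) reproduces MDL exactly."""
    M = len(l)
    if C_N is None:
        C_N = np.sqrt(N * np.log(N))
    crit = np.zeros(M)
    for Ns in range(M):
        tail = l[Ns:]
        k = M - Ns
        log_ratio = np.sum(np.log(tail)) / k - np.log(np.mean(tail))
        crit[Ns] = -N * k * log_ratio + Ns * (2 * M - Ns) * C_N
    return int(np.argmin(crit))
```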

67.3 Decision Theoretic Approaches

The methods that we term decision theoretic approaches all rely on the statistical theory of hypothesis testing to determine the number of sources. The first of these that we will discuss, the sphericity test, is by far the oldest algorithm for source detection.

67.3.1 The Sphericity Test

Originally, the sphericity test was a hypothesis testing method designed to determine if the correlation (or covariance) matrix, $R$, of a length $M$ Gaussian random vector is proportional to the identity matrix, $I_M$, when only $\hat{R}$, the sample correlation matrix, is known. If $R \propto I_M$, then the contours of equal density for the Gaussian distribution form concentric spheres in $M$-dimensional space. The sphericity test derives its name from being a test of the sphericity of these contours. The original sphericity test had two possible hypotheses

$$H_0: R = \sigma_n^2 I_M$$
$$H_1: R \ne \sigma_n^2 I_M$$

for some unknown $\sigma_n^2$. If we denote the eigenvalues of $R$ in descending order by $\lambda_1, \lambda_2, \ldots, \lambda_M$, then equivalent hypotheses are

$$H_0: \lambda_1 = \lambda_2 = \cdots = \lambda_M$$
$$H_1: \lambda_1 > \lambda_M.$$

For the appropriate statistic, $T(\hat{R})$, the test is of the form

$$T(\hat{R}) \underset{H_0}{\overset{H_1}{\gtrless}} \gamma,$$

where the threshold, $\gamma$, can be set according to the Neyman-Pearson criterion [7]. That is, if the distribution of $T(\hat{R})$ is known under the null hypothesis, $H_0$, then for a given probability of false alarm, $P_F$, we can choose $\gamma$ such that

$$\Pr\left(T(\hat{R}) > \gamma \mid H_0\right) = P_F.$$

Using the alternate form of the hypotheses, $T(\hat{R})$ is actually $T(l_1, l_2, \ldots, l_M)$, and the eigenvalues of the sample correlation matrix are a sufficient statistic for the hypothesis test. The correct form of the sphericity test statistic is the generalized likelihood ratio [4]

$$T(l_1, l_2, \ldots, l_M) = \ln \left[ \frac{\left( \frac{1}{M} \sum_{i=1}^{M} l_i \right)^M}{\prod_{i=1}^{M} l_i} \right],$$

which was also a major component of the information theoretic tests. For the source detection problem we are interested in testing a subset of the smaller eigenvalues for equality. In order to use the sphericity test, the hypotheses are generally broken down into pairs of hypotheses that can be tested in a series of hypothesis tests. For testing $M - N_s$ eigenvalues for equality, the hypotheses are

$$H_0: \lambda_1 \ge \cdots \ge \lambda_{N_s} \ge \lambda_{N_s+1} = \cdots = \lambda_M$$
$$H_1: \lambda_1 \ge \cdots \ge \lambda_{N_s} \ge \lambda_{N_s+1} > \lambda_M.$$

We are interested in finding the smallest value of $N_s$ for which $H_0$ is true, which is done by testing $N_s = 0$, $N_s = 1$, and so on until $N_s = M - 2$ or the test does not fail. If the test fails for $N_s = M - 2$, then we consider none of the smallest eigenvalues to be equal and say that there are $M - 1$ sources. If $N_s$ is the smallest value for which $H_0$ is true, then we say that there are $N_s$ sources.

There is also a problem involved in setting the desired $P_F$. The Neyman-Pearson criterion is not able to determine a threshold for a given $P_F$ for the overall detection problem. The best that can be done is to set a $P_F$ for each individual test in the nested series of hypothesis tests using Neyman-Pearson methods. Unfortunately, as the hypothesis tests are obviously not statistically independent and their statistical relationship is not very clear, how this $P_F$ for each test relates to the $P_F$ for the entire series of tests is not known.

To use the sphericity test to detect sources, we need to be able to set accurately the threshold $\gamma$ according to the desired $P_F$, which requires knowledge of the distribution of the sphericity test statistic $T(l_{N_s+1}, \ldots, l_M)$ under the null hypothesis. The exact form of this distribution is not available in a form that is very useful, as it is generally written as an infinite series of Gaussian, chi-squared, or beta distributions [2, 4]. However, if the test statistic is multiplied by a suitable function of the eigenvalues of $\hat{R}$, then its distribution can be accurately approximated as being chi-squared [10]. Thus, the statistic

$$\left( N - N_s - \frac{2(M - N_s)^2 + 1}{6(M - N_s)} + \sum_{i=1}^{N_s} \frac{\bar{l}^2}{\left(l_i - \bar{l}\right)^2} \right) \left( (M - N_s) \ln \bar{l} - \sum_{i=N_s+1}^{M} \ln l_i \right)$$

is approximately chi-squared distributed with degrees of freedom given by $d = (M - N_s)^2 - 1$, where $\bar{l} = \frac{1}{M - N_s} \sum_{i=N_s+1}^{M} l_i$. Although the performance of the sphericity test is comparable to that of the information theoretic tests, it is not as popular because it requires selection of the $P_F$ and calculation of the test thresholds for each value of $N_s$. However, if the received data does not match the assumed model, the ability to change the test thresholds gives the sphericity test a robustness lacking in the information theoretic methods.
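A sketch of the resulting sequential procedure (Python with NumPy and SciPy; setting a common per-test $P_F$ and using the chi-squared inverse CDF for the threshold are our assumptions about a reasonable implementation of the approximation above):

```python
import numpy as np
from scipy.stats import chi2

def sphericity_detect(l, N, Pf=0.01):
    """Sequential sphericity test on eigenvalues l (descending) from N
    snapshots; Pf is the false-alarm probability for each individual test."""
    M = len(l)
    for Ns in range(M - 1):                     # test Ns = 0, 1, ..., M-2
        k = M - Ns                              # eigenvalues tested for equality
        tail = l[Ns:]
        lbar = np.mean(tail)
        # Multiplier that makes the statistic approximately chi-squared.
        mult = N - Ns - (2 * k**2 + 1) / (6 * k) \
               + np.sum(lbar**2 / (l[:Ns] - lbar)**2)
        T = mult * (k * np.log(lbar) - np.sum(np.log(tail)))
        gamma = chi2.ppf(1 - Pf, df=k**2 - 1)   # Neyman-Pearson threshold
        if T <= gamma:                          # H0 accepted: M - Ns equal eigenvalues
            return Ns
    return M - 1                                # every test failed: say M - 1 sources
```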

67.3.2 Multiple Hypothesis Testing

The sphericity test relies on a sequence of binary hypothesis tests to determine the number of sources. However, the optimum test for this situation would be to test all hypotheses simultaneously:

$$H_0: \lambda_1 = \lambda_2 = \cdots = \lambda_M$$
$$H_1: \lambda_1 > \lambda_2 = \cdots = \lambda_M$$
$$H_2: \lambda_1 \ge \lambda_2 > \lambda_3 = \cdots = \lambda_M$$
$$\vdots$$
$$H_{M-1}: \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{M-1} > \lambda_M$$

to determine how many of the smaller eigenvalues are equal. While it is not possible to generalize the sphericity test directly, it is possible to use an approximation to the probability density function (pdf) of the eigenvalues to arrive at a suitable test. Using the theory of multiple hypothesis tests, we can derive a test that is similar to AIC and MDL and is implemented in exactly the same manner, but is designed to minimize the probability of choosing the wrong number of sources.

To arrive at our statistic, we start with the joint probability density function (pdf) of the eigenvalues of the $M \times M$ sample covariance matrix when the $M - N_s$ smallest eigenvalues are known to be equal. We will denote this pdf by $f_{N_s}(l_1, \ldots, l_M \mid \lambda_1 \ge \cdots \ge \lambda_{N_s} \ge \lambda_{N_s+1} = \cdots = \lambda_M)$, where the $l_i$ denote the eigenvalues of the sample matrix and the $\lambda_i$ are the eigenvalues of the true covariance matrix. The asymptotic expression for $f_{N_s}(\cdot)$ is given by Wong et al. [11] for the complex-valued data case as

$$\begin{split}
f_{N_s}\!\left(l_1, \ldots, l_M \mid \lambda_1 \ge \cdots \ge \lambda_{N_s} \ge \lambda_{N_s+1} = \cdots = \lambda_M\right) \approx{}& \frac{n^{Mn}\, \pi^{\left(M(M-1) - N_s(N_s-1)\right)/2}}{\tilde{\Gamma}_M(n)\, \tilde{\Gamma}_{M-N_s}(M - N_s)} \cdot \frac{\prod_{i=1}^{M} l_i^{\,n-M}}{\prod_{i=1}^{M} \lambda_i^{\,n}}\, \exp\left\{-n \sum_{i=1}^{M} \frac{l_i}{\lambda_i}\right\} \\
& \cdot \prod_{i<j}^{N_s} \frac{\left(l_i - l_j\right)\lambda_i \lambda_j}{\lambda_i - \lambda_j} \cdot \prod_{i=1}^{N_s} \prod_{j=N_s+1}^{M} \frac{\left(l_i - l_j\right)\lambda_i \lambda_j}{\lambda_i - \lambda_j} \cdot \prod_{N_s < i < j}^{M} \left(l_i - l_j\right)^2,
\end{split}$$

where $n = N - 1$ is one less than the number of samples and $\tilde{\Gamma}_{(\cdot)}(\cdot)$ is the multivariate gamma function for complex-valued data [11]. We then form $M$ likelihood ratios by dividing each joint pdf by $f_{M-1}(\cdot)$ to form

$$\Lambda(N_s) = \frac{f_{N_s}\left(l_1, \ldots, l_M \mid \lambda_1 \ge \cdots \ge \lambda_{N_s} \ge \lambda_{N_s+1} = \cdots = \lambda_M\right)}{f_{M-1}\left(l_1, \ldots, l_M \mid \lambda_1 \ge \cdots \ge \lambda_M\right)}, \quad N_s = 0, \ldots, M-1.$$

Assuming that each value of $N_s$ is equally likely, multiple hypothesis testing theory tells us that the value of $N_s$ that maximizes $\Lambda(N_s)$ is the optimum choice in that it minimizes the probability of choosing the incorrect $N_s$ [7]. Because $\Lambda(N_s)$ in this form requires knowledge of the unknown parameters $\lambda_i$, we must use a generalized likelihood ratio test and independently substitute the maximum likelihood estimates of the $\lambda_i$ [see Eq. (67.3) for these expressions] into both $f_{N_s}(\cdot)$, for which we assume $M - N_s$ equal $\lambda_i$'s, and $f_{M-1}(\cdot)$, for which we assume no equal $\lambda_i$'s, to get our new statistics $\tilde{\Lambda}(N_s)$. After much simplification, including dropping terms that are common to $\tilde{\Lambda}(N_s)$ for every allowable value of $N_s$ and then taking the natural logarithm of each $\tilde{\Lambda}(N_s)$, we get the statistic

$$\begin{split}
\tilde{\Lambda}(N_s) ={}& n\left(M - N_s\right) \ln\left[\frac{\prod_{i=N_s+1}^{M} l_i^{1/(M-N_s)}}{\frac{1}{M-N_s}\sum_{i=N_s+1}^{M} l_i}\right] + \ln\left[\frac{\pi^{N_s(N_s+1)/2}}{\tilde{\Gamma}_{M-N_s}(M - N_s)}\right] - \frac{1}{2} N_s \left(2M - N_s\right) \ln[n] \\
&+ \sum_{i=1}^{N_s} \sum_{j=N_s+1}^{M} \ln\left[\frac{l_i - l_j}{\bar{l}}\right] + \sum_{i=1}^{N_s} \sum_{j=i+1}^{N_s} 2 \ln\left[\frac{l_i - l_j}{\left(l_i\, l_j\right)^{1/2}}\right],
\end{split}$$

where $\bar{l} = \frac{1}{M - N_s} \sum_{i=N_s+1}^{M} l_i$. The terms in the first line of this equation are almost identical to the negative of the MDL criterion, especially when the degrees of freedom recommended in [9] are used. Note that the change in sign is necessary because we are finding the maximum of this criterion, not the minimum. The extra terms on the following line include both the eigenvalues being tested for equality and those not being tested. These extra terms allow this test to outperform the information theoretic techniques, since the use of all the eigenvalues for each value of $N_s$ being tested allows this criterion to be more adaptive.

67.4 For More Information

Most of the original papers on model order determination appeared in the statistical literature in journals such as The Annals of Statistics and the Journal of Multivariate Analysis. However, almost all of the more recent developments that apply these techniques to the source detection problem have appeared in signal processing journals such as the IEEE Transactions on Signal Processing. More advanced topics that have been addressed in the signal processing literature but not discussed here include: detecting coherent (i.e., completely correlated) signals, detecting sources in unknown colored noise, and developing more robust source detection methods.

References

[1] Akaike, H., A new look at the statistical model identification, IEEE Trans. on Automatic Control, AC-19, 716-723, Dec. 1974.
[2] Anderson, T.W., An Introduction to Multivariate Statistical Analysis, 2nd ed., John Wiley & Sons, New York, 1984.
[3] Johnson, D.H. and Dudgeon, D.E., Array Signal Processing: Concepts and Techniques, Prentice-Hall, Englewood Cliffs, NJ, 1993.
[4] Muirhead, R.J., Aspects of Multivariate Statistical Theory, John Wiley & Sons, New York, 1982.
[5] Rissanen, J., Modeling by shortest data description, Automatica, 14, 465-471, Sept. 1978.
[6] Schwarz, G., Estimating the dimension of a model, Ann. Stat., 6, 461-464, Mar. 1978.
[7] Van Trees, H.L., Detection, Estimation, and Modulation Theory, Part I, John Wiley & Sons, New York, 1968.
[8] Wax, M. and Kailath, T., Detection of signals by information theoretic criteria, IEEE Trans. Acoustics, Speech, and Signal Processing, ASSP-33, 387-392, Apr. 1985.
[9] Williams, D.B., Counting the degrees of freedom when using AIC and MDL to detect signals, IEEE Trans. Signal Processing, 42, 3282-3284, Nov. 1994.
[10] Williams, D.B. and Johnson, D.H., Using the sphericity test for source detection with narrowband passive arrays, IEEE Trans. Acoustics, Speech, and Signal Processing, 38, 2008-2014, Nov. 1990.
[11] Wong, K.M., Zhang, Q.-T., Reilly, J.P. and Yip, P.C., On information theoretic criteria for determining the number of signals in high resolution array processing, IEEE Trans. Acoustics, Speech, and Signal Processing, 38, 1959-1971, Nov. 1990.
[12] Xu, W. and Kaveh, M., Analysis of the performance and sensitivity of eigendecomposition-based detectors, IEEE Trans. Signal Processing, 43, 1413-1426, June 1995.
[13] Yin, Y.Q. and Krishnaiah, P.R., On some nonparametric methods for detection of the number of signals, IEEE Trans. Acoustics, Speech, and Signal Processing, ASSP-35, 1533-1538, Nov. 1987.
[14] Zhao, L.C., Krishnaiah, P.R. and Bai, Z.D., On detection of the number of signals in presence of white noise, J. Multivariate Analysis, 20, 1-25, Oct. 1986.
