# 67 Detection: Determining the Number of Sources


Williams, D.B. "Detection: Determining the Number of Sources." Digital Signal Processing Handbook. Ed. Vijay K. Madisetti and Douglas B. Williams. Boca Raton: CRC Press LLC, 1999. © 1999 by CRC Press LLC

Douglas B. Williams, Georgia Institute of Technology

- 67.1 Formulation of the Problem
- 67.2 Information Theoretic Approaches
  - AIC and MDL
  - EDC
- 67.3 Decision Theoretic Approaches
  - The Sphericity Test
  - Multiple Hypothesis Testing
- 67.4 For More Information
- References

The processing of signals received by sensor arrays generally can be separated into two problems: 1) detecting the number of sources and 2) isolating and analyzing the signal produced by each source. We make this distinction because many of the algorithms for separating and processing array signals assume that the number of sources is known a priori and may give misleading results if the wrong number of sources is used [3]. A good example is the errors produced by many high resolution bearing estimation algorithms (e.g., MUSIC) when the wrong number of sources is assumed. Because, in general, it is easier to determine how many signals are present than to estimate the bearings of those signals, signal detection algorithms typically can correctly determine the number of signals present even when bearing estimation algorithms cannot resolve them. In fact, the capability of an array to resolve two closely spaced sources could be said to be limited by its ability to detect that there are actually two sources present.

If we have a reliable method of determining the number of sources, not only can we correctly use high resolution bearing estimation algorithms, but we can also use this knowledge to make more effective use of the information obtained from the bearing estimation algorithms. If the bearing estimation algorithm gives fewer source directions than we know there are sources, then we know that there is more than one source in at least one of those directions and have thus essentially increased the resolution of the algorithm.
If analysis of the information provided by the bearing estimation algorithm indicates more source directions than we know there are sources, then we can safely assume that some of the directions are the results of false alarms and may be ignored, thus decreasing the probability of false alarm for the bearing estimation algorithms. In this section we will present and discuss the more common approaches to determining the number of sources.

## 67.1 Formulation of the Problem

The basic problem is that of determining how many signal producing sources are being observed by an array of sensors. Although this problem addresses issues in several areas including sonar, radar, communications, and geophysics, one basic formulation can be applied to all these applications. We will give only a basic, brief description of the assumed signal structure, but more detail can be found in references such as the book by Johnson and Dudgeon [3]. We will assume that an array of $M$ sensors observes signals produced by $N_s$ sources. The array is allowed to have an arbitrary geometry. For our discussion here, we will assume that the sensors are omnidirectional. However, this assumption is only for notational convenience as the algorithms to be discussed will work for more general sensor responses.

The output of the $m$th sensor can be expressed as a linear combination of signals and noise

$$y_m(t) = \sum_{i=1}^{N_s} s_i\bigl(t - \Delta_i(m)\bigr) + n_m(t).$$

The noise observed at the $m$th sensor is denoted by $n_m(t)$. The propagation delays, $\Delta_i(m)$, are measured with respect to an origin chosen to be at the geometric center of the array. Thus, $s_i(t)$ indicates the $i$th propagating signal observed at the origin, and $s_i(t - \Delta_i(m))$ is the same signal measured by the $m$th sensor. For a plane wave in a homogeneous medium, these delays can be found from the dot product between a unit vector in the signal's direction of propagation, $\vec{\zeta}_i^{\,o}$, and the sensor's location, $\vec{x}_m$,

$$\Delta_i(m) = \frac{\vec{\zeta}_i^{\,o} \cdot \vec{x}_m}{c},$$

where $c$ is the plane wave's speed of propagation.

Most algorithms used to detect the number of sources incident on the array are frequency domain techniques that assume the propagating signals are narrowband about a common center frequency, $\omega_o$. Consequently, after Fourier transforming the measured signals, only one frequency is of interest and the propagation delays become phase shifts

$$Y_m(\omega_o) = \sum_{i=1}^{N_s} S_i(\omega_o)\, e^{-j\omega_o \Delta_i(m)} + N_m(\omega_o).$$

The detection algorithms then exploit the form of the spatial correlation matrix, $\mathbf{R}$, for the array.
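The spatial correlation matrix is the $M \times M$ matrix formed by correlating the vector of the Fourier transforms of the sensor outputs at the particular frequency of interest,

$$\mathbf{Y} = \bigl[\, Y_0(\omega_o) \;\; Y_1(\omega_o) \;\; \cdots \;\; Y_{M-1}(\omega_o) \,\bigr]^T.$$

If the sources are assumed to be uncorrelated with the noise, then the form of $\mathbf{R}$ is

$$\mathbf{R} = E\{\mathbf{Y}\mathbf{Y}'\} = \mathbf{K}_n + \mathbf{S}\mathbf{C}\mathbf{S}',$$

where $\mathbf{K}_n$ is the correlation matrix of the noise, $\mathbf{S}$ is the matrix whose columns correspond to the vector representations of the signals, $\mathbf{S}'$ is the conjugate transpose of $\mathbf{S}$, and $\mathbf{C}$ is the matrix of the correlations between the signals. Thus, the matrix $\mathbf{S}$ has the form

$$\mathbf{S} = \begin{bmatrix} e^{-j\omega_o \Delta_1(0)} & \cdots & e^{-j\omega_o \Delta_{N_s}(0)} \\ \vdots & & \vdots \\ e^{-j\omega_o \Delta_1(M-1)} & \cdots & e^{-j\omega_o \Delta_{N_s}(M-1)} \end{bmatrix}.$$

If we assume that the noise is additive, white Gaussian noise with power $\sigma_n^2$ and that none of the signals are perfectly coherent with any of the other signals, then $\mathbf{K}_n = \sigma_n^2 \mathbf{I}_M$, $\mathbf{C}$ has full rank, and the form of $\mathbf{R}$ is

$$\mathbf{R} = \sigma_n^2 \mathbf{I}_M + \mathbf{S}\mathbf{C}\mathbf{S}'. \tag{67.1}$$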
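As a rough illustration of the signal model and Eq. (67.1), the sketch below builds a matrix of phase shifts for a uniform line array, forms the true correlation matrix, and compares it with a sample correlation matrix averaged over simulated snapshots. The array geometry, source bearings, powers, and sound speed are hypothetical values chosen only for the example, the helper names (`steering_matrix`, `true_correlation`, `sample_correlation`, `crandn`) are not from the text, the sign convention for the delays is an assumption, and NumPy is assumed to be available.

```python
import numpy as np

def steering_matrix(sensor_pos, directions, omega0, c):
    """M x Ns matrix of phase shifts exp(-j*omega0*delay), delay = (zeta . x_m) / c."""
    delays = sensor_pos @ directions.T / c           # shape (M, Ns)
    return np.exp(-1j * omega0 * delays)

def true_correlation(S, C, noise_power):
    """R = sigma_n^2 I_M + S C S' as in Eq. (67.1)."""
    M = S.shape[0]
    return noise_power * np.eye(M) + S @ C @ S.conj().T

def sample_correlation(Y):
    """Average of the N outer products of the snapshot columns of Y (shape M x N)."""
    return Y @ Y.conj().T / Y.shape[1]

rng = np.random.default_rng(0)

def crandn(*shape):
    """Zero-mean, unit-variance circular complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Hypothetical scenario: 6 sensors, 2000 snapshots, 100 Hz signals in water.
M, N, c = 6, 2000, 1500.0
omega0 = 2 * np.pi * 100.0
spacing = 0.5 * (2 * np.pi * c / omega0)             # half a wavelength
x = (np.arange(M) - (M - 1) / 2) * spacing           # origin at the array center
sensor_pos = np.column_stack([x, np.zeros(M)])
bearings = np.deg2rad([10.0, 40.0])                  # measured from broadside (assumption)
directions = np.column_stack([np.sin(bearings), np.cos(bearings)])

S = steering_matrix(sensor_pos, directions, omega0, c)
C = np.diag([4.0, 2.0])                              # two uncorrelated sources
sigma2 = 1.0
R = true_correlation(S, C, sigma2)

# Snapshots Y = S s + n with Gaussian source amplitudes and white noise.
amps = np.sqrt(np.diag(C))[:, None] * crandn(2, N)
Y = S @ amps + np.sqrt(sigma2) * crandn(M, N)
R_hat = sample_correlation(Y)

print("relative error of sample matrix:",
      np.linalg.norm(R_hat - R) / np.linalg.norm(R))
```

The sample matrix approaches the true one only as the number of snapshots grows, which is why its eigenvalues have to be treated statistically.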

We will assume that the columns of $\mathbf{S}$ are linearly independent when there are fewer sources than sensors, which is the case for most common array geometries and expected source locations. As $\mathbf{C}$ is of full rank, if there are fewer sources than sensors, then the rank of $\mathbf{S}\mathbf{C}\mathbf{S}'$ is equal to the number of signals incident on the array or, equivalently, the number of sources. If there are $N_s$ sources, then $\mathbf{S}\mathbf{C}\mathbf{S}'$ is of rank $N_s$ and its $N_s$ nonzero eigenvalues in descending order are $\delta_1, \delta_2, \ldots, \delta_{N_s}$. The $M$ eigenvalues of $\sigma_n^2 \mathbf{I}_M$ are all equal to $\sigma_n^2$, and the eigenvectors are any orthonormal set of length-$M$ vectors. So the eigenvectors of $\mathbf{R}$ are the $N_s$ eigenvectors of $\mathbf{S}\mathbf{C}\mathbf{S}'$ plus any $M - N_s$ eigenvectors which complete the orthonormal set, and the eigenvalues in descending order are $\sigma_n^2 + \delta_1, \ldots, \sigma_n^2 + \delta_{N_s}, \sigma_n^2, \ldots, \sigma_n^2$. The correlation matrix is generally divided into two parts: the signal-plus-noise subspace formed by the largest eigenvalues ($\sigma_n^2 + \delta_1, \ldots, \sigma_n^2 + \delta_{N_s}$) and their eigenvectors, and the noise subspace formed by the smallest, equal eigenvalues and their eigenvectors. The labels reflect the fact that the space spanned by the signal-plus-noise subspace eigenvectors contains the signals and a portion of the noise, while the noise subspace contains only that part of the noise that is orthogonal to the signals [3].

If there are fewer sources than sensors, the smallest $M - N_s$ eigenvalues of $\mathbf{R}$ are all equal, and to determine exactly how many sources there are, we must simply determine how many of the smallest eigenvalues are equal. If there are not fewer sources than sensors ($N_s \ge M$), then none of the smallest eigenvalues are equal. The detection algorithms then assume that only the smallest eigenvalue is in the noise subspace as it is not equal to any of the other eigenvalues. Thus, these algorithms can detect up to $M - 1$ sources and for $N_s \ge M$ will say that there are $M - 1$ sources, as this is the greatest detectable number.
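The eigenvalue structure just described can be checked numerically. The following minimal sketch assumes a hypothetical five-sensor, half-wavelength line array with two partially correlated sources, builds the true correlation matrix, and counts how many of its eigenvalues equal the noise power. All numerical values are illustrative assumptions, and NumPy is assumed to be available.

```python
import numpy as np

M, Ns, sigma2 = 5, 2, 0.5

# Hypothetical phase-shift matrix for a half-wavelength uniform line array.
m = np.arange(M)[:, None]
phase = np.pi * np.sin(np.deg2rad([-20.0, 35.0]))[None, :]
S = np.exp(-1j * m * phase)

# Full-rank signal correlation matrix (sources partially correlated, not coherent).
C = np.array([[3.0, 0.5],
              [0.5, 1.0]])

R = sigma2 * np.eye(M) + S @ C @ S.conj().T

# R is Hermitian, so its eigenvalues are real; sort them in descending order.
eigs = np.linalg.eigvalsh(R)[::-1]
n_noise = int(np.sum(np.isclose(eigs, sigma2)))

print("eigenvalues:", np.round(eigs, 4))
print("equal smallest eigenvalues:", n_noise)
print("number of sources:", M - n_noise)
```

With the exact correlation matrix the count is unambiguous. The difficulty addressed in the rest of the section is that only an estimate of this matrix is available in practice.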
Unfortunately, all that is usually known is $\hat{\mathbf{R}}$, the sample correlation matrix, which is formed by averaging $N$ samples of the correlation matrix taken from the outputs of the array sensors. As $\hat{\mathbf{R}}$ is formed from only a finite number of samples of $\mathbf{R}$, the smallest $M - N_s$ eigenvalues of $\hat{\mathbf{R}}$ are subject to statistical variations and are unequal with probability one [4]. Thus, solutions to the detection problem have concentrated on statistical tests to determine how many of the eigenvalues of $\mathbf{R}$ are equal when only the sample eigenvalues of $\hat{\mathbf{R}}$ are available.

When performing statistical tests on the eigenvalues of the sample correlation matrix to determine the number of sources, certain assumptions must be made about the nature of the signals. In array processing, both deterministic and stochastic signal models are used depending on the application. However, for the purpose of testing the sample eigenvalues, the Fourier transforms of the signals at frequency $\omega_o$, $S_i(\omega_o)$, $i = 1, \ldots, N_s$, are assumed to be zero mean Gaussian random processes that are statistically independent of the noise and have a positive definite correlation matrix $\mathbf{C}$. We also assume that the $N$ samples taken when forming $\hat{\mathbf{R}}$ are statistically independent of each other. With these assumptions, the spatial correlation matrix is still of the same form as in Eq. (67.1), except that now we can more easily derive statistical tests on the eigenvalues of $\hat{\mathbf{R}}$.

## 67.2 Information Theoretic Approaches

We will see that the source detection methods to be described all share common characteristics. However, we will classify them into two groups, information theoretic and decision theoretic approaches, determined by the statistical theories used to derive them.
Although the decision theoretic techniques are quite a bit older, we will first present the information theoretic algorithms as they are currently much more commonly used.

### 67.2.1 AIC and MDL

AIC and MDL are both information theoretic model order determination techniques that can be used to test the eigenvalues of a sample correlation matrix to determine how many of the smallest eigenvalues of the correlation matrix are equal. The AIC and MDL algorithms both consist of minimizing a criterion over the number of signals that are detectable, i.e., $N_s = 0, \ldots, M - 1$.

To construct these criteria, a family of probability densities, $f(\mathbf{Y} \mid \theta(N_s))$, $N_s = 0, \ldots, M - 1$, is needed, where $\theta$, which is a function of the number of sources, $N_s$, is the vector of parameters needed for the model that generated the data $\mathbf{Y}$. The criteria are composed of the negative of the log-likelihood function of the density $f(\mathbf{Y} \mid \hat{\theta}(N_s))$, where $\hat{\theta}(N_s)$ is the maximum likelihood estimate of $\theta$ for $N_s$ signals, plus an adjusting term for the model dimension. The adjusting term is needed because the negative log-likelihood function always achieves a minimum for the highest dimension model possible, which in this case is the largest possible number of sources. Therefore, the adjusting term will be a monotonically increasing function of $N_s$ and should be chosen so that the algorithm is able to determine the correct model order.

AIC was introduced by Akaike [1]. Originally, the "IC" stood for information criterion and the "A" designated it as the first such test, but it is now more commonly considered an acronym for the Akaike Information Criterion. If we have $N$ independent observations of a random variable with probability density $g(y)$ and a family of models in the form of probability densities $f(y \mid \theta)$, where $\theta$ is the vector of parameters for the models, then Akaike chose his criterion to minimize

$$I\bigl(g; f(\cdot \mid \theta)\bigr) = \int g(y) \ln g(y)\, dy - \int g(y) \ln f(y \mid \theta)\, dy, \tag{67.2}$$

which is known as the Kullback-Leibler mean information distance. To within a constant factor, $\frac{1}{N}\mathrm{AIC}(\theta)$ is an estimate of $-E\{\int g(y) \ln f(y \mid \theta)\, dy\}$, and minimizing $\mathrm{AIC}(\theta)$ over the allowable values of $\theta$ should minimize (67.2). The expression for $\mathrm{AIC}(\theta)$ is

$$\mathrm{AIC}(\theta) = -2 \ln\Bigl[ f\bigl(\mathbf{Y} \mid \hat{\theta}(N_s)\bigr) \Bigr] + 2\eta,$$

where $\eta$ is the number of independent parameters in $\theta$.

Following AIC, MDL was developed by Schwarz [6] using Bayesian techniques. He assumed that the a priori density of the observations comes from a suitable family of densities that possess efficient estimates [7]; they are of the form

$$f(y \mid \theta) = \exp\bigl(\theta \cdot p(y) - b(\theta)\bigr).$$
The MDL criterion was then found by choosing the model that is most probable a posteriori. This choice is equivalent to selecting the model for which

$$\mathrm{MDL}(\theta) = -\ln\Bigl[ f\bigl(\mathbf{Y} \mid \hat{\theta}(N_s)\bigr) \Bigr] + \tfrac{1}{2}\, \eta \ln N$$

is minimized. This criterion was independently derived by Rissanen [5] using information theoretic techniques. Rissanen noted that each model can be perceived as encoding the observed data and that the optimum model is the one that yields the minimum code length. Hence, the name MDL comes from "Minimum Description Length."

For the purpose of using AIC and MDL to determine the number of sources, the forms of the log-likelihood function and the adjusting terms have been given by Wax [8]. For $N_s$ signals the parameters that completely parameterize the correlation matrix $\mathbf{R}$ are $\{\sigma_n^2, \lambda_1, \ldots, \lambda_{N_s}, \mathbf{v}_1, \ldots, \mathbf{v}_{N_s}\}$, where $\lambda_i$ and $\mathbf{v}_i$, $i = 1, \ldots, N_s$, are the eigenvalues and their respective eigenvectors of the signal-plus-noise subspace of the correlation matrix. As the vector of sensor outputs is a Gaussian random vector with correlation matrix $\mathbf{R}$ and all the samples of the sensor outputs are independent, the log-likelihood function of $f(\mathbf{Y} \mid \theta)$ is

$$\ln f\bigl(\mathbf{Y} \mid \sigma_n^2, \lambda_1, \ldots, \lambda_{N_s}, \mathbf{v}_1, \ldots, \mathbf{v}_{N_s}\bigr) = \ln\Bigl[ \pi^{-MN} (\det \mathbf{R})^{-N} \exp\bigl( -N\, \mathrm{tr}(\mathbf{R}^{-1} \hat{\mathbf{R}}) \bigr) \Bigr]$$

where $\mathrm{tr}(\cdot)$ denotes the trace of the matrix, $\hat{\mathbf{R}}$ is the sample correlation matrix, and $\mathbf{R}$ is the unique correlation matrix formed from the given parameters. The maximum likelihood estimates of the parameters are [2, 4]

$$\begin{aligned} \hat{\mathbf{v}}_i &= \mathbf{u}_i, & i &= 1, \ldots, N_s \\ \hat{\lambda}_i &= l_i, & i &= 1, \ldots, N_s \\ \hat{\sigma}_n^2 &= \frac{1}{M - N_s} \sum_{i=N_s+1}^{M} l_i = \bar{l}, \end{aligned} \tag{67.3}$$

where $l_1, \ldots, l_M$ are the eigenvalues of $\hat{\mathbf{R}}$ in descending order and $\mathbf{u}_i$ are the corresponding eigenvectors. Therefore, up to additive terms that do not depend on $N_s$, the log-likelihood function of $f(\mathbf{Y} \mid \hat{\theta}(N_s))$ is

$$\ln f\bigl(\mathbf{Y} \mid \bar{l}, l_1, \ldots, l_{N_s}, \mathbf{u}_1, \ldots, \mathbf{u}_{N_s}\bigr) = \ln\left[ \frac{\prod_{i=N_s+1}^{M} l_i^{1/(M-N_s)}}{\frac{1}{M-N_s} \sum_{i=N_s+1}^{M} l_i} \right]^{(M-N_s)N}.$$

Remembering that the eigenvalues of a complex correlation matrix are real and that the eigenvectors are complex and orthonormal, the number of degrees of freedom in the parameters of the model is classically chosen to be $\eta = N_s(2M - N_s) + 1$. Noting that any constant term in the criteria which is common to the entire family of models for either AIC or MDL may be ignored, we have the criterion for AIC as

$$\mathrm{AIC}(N_s) = -2N \ln\left[ \frac{\prod_{i=N_s+1}^{M} l_i}{\left( \frac{1}{M-N_s} \sum_{i=N_s+1}^{M} l_i \right)^{M-N_s}} \right] + 2 N_s (2M - N_s); \quad N_s = 0, \ldots, M - 1,$$

and the criterion for MDL as

$$\mathrm{MDL}(N_s) = -N \ln\left[ \frac{\prod_{i=N_s+1}^{M} l_i}{\left( \frac{1}{M-N_s} \sum_{i=N_s+1}^{M} l_i \right)^{M-N_s}} \right] + \tfrac{1}{2} N_s (2M - N_s) \ln N; \quad N_s = 0, \ldots, M - 1.$$

For both of these methods, the estimate of the number of sources is that value of $N_s$ which minimizes the criterion. In [9] there is a more thorough discussion concerning determining the number of degrees of freedom and the advantages of choosing instead $\eta = N_s(2M - N_s - 1)$.

In general, MDL is considered to perform better than AIC. Schwarz [6], through his derivation of the MDL criterion, showed that if his assumptions are accepted, then AIC cannot be asymptotically optimal. He also mentioned that MDL tends toward lower-dimensional models than AIC as the model dimension term is multiplied by $\tfrac{1}{2} \ln N$ in the MDL criterion. Zhao et al. [14] showed that MDL is consistent (the probability of detecting the correct number of sources, i.e., $\Pr(\hat{N}_s = N_s)$, goes to 1 as $N$ goes to infinity), but AIC is not consistent and will tend to overestimate the number of sources as $N$ goes to infinity. Thus, most people in array processing prefer to use MDL over AIC. Interestingly, many statisticians prefer AIC because many of their modeling problems have a very large penalty for underestimating the model order but a relatively mild penalty for overestimating it. Xu and Kaveh [12] have provided a thorough discussion of the asymptotic properties of AIC and MDL, including an examination of their sensitivities to modeling errors and bounds on the probability that AIC will overestimate the number of sources.

### 67.2.2 EDC

Clearly, the only difference between the implementations of AIC and MDL is the choice of the adjusting term that penalizes for choosing larger model orders. Several people have examined using other adjusting terms to arrive at other criteria. In particular, statisticians at the University of Pittsburgh [13, 14] have developed the Efficient Detection Criterion (EDC) procedure, which is actually a family of criteria chosen such that they are all consistent. The general form of these criteria is

$$\mathrm{EDC}(\theta) = -\ln\Bigl[ f\bigl(\mathbf{Y} \mid \hat{\theta}(N_s)\bigr) \Bigr] + \eta\, C_N,$$

where $C_N$ can be any function of $N$ such that

1) $\displaystyle \lim_{N \to \infty} C_N / N = 0$

2) $\displaystyle \lim_{N \to \infty} C_N / \ln(\ln N) = \infty$.

Thus, for the array processing source detection problem the EDC procedure chooses the value of $N_s$ that minimizes

$$\mathrm{EDC}(N_s) = -N \ln\left[ \frac{\prod_{i=N_s+1}^{M} l_i}{\left( \frac{1}{M-N_s} \sum_{i=N_s+1}^{M} l_i \right)^{M-N_s}} \right] + N_s (2M - N_s)\, C_N; \quad N_s = 0, \ldots, M - 1.$$

In their analysis of the EDC procedure, Zhao et al. [14] showed that not only are all the EDC criteria consistent for the data assumptions we have made, but under certain conditions they remain consistent even when the data sample vectors used to form the estimate $\hat{\mathbf{R}}$ are not independent or Gaussian. The choice of $C_N = \tfrac{1}{2} \ln N$ satisfies the restrictions on $C_N$ and, thus, produces one of the EDC procedures.
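The information theoretic criteria differ only in their penalty terms, so one small routine can cover all of them. The sketch below is an illustration rather than a reference implementation: it evaluates the common log-likelihood term from a list of sample eigenvalues and adds the AIC, MDL, or EDC penalty. It uses the identity $\ln[\prod l_i / (\text{arithmetic mean})^{M-N_s}] = (M-N_s)\ln(\text{geometric mean}/\text{arithmetic mean})$. The function names and the example eigenvalues are hypothetical.

```python
import math

def log_gm_over_am(noise_eigs):
    """ln(geometric mean / arithmetic mean) of the eigenvalues tested for equality."""
    k = len(noise_eigs)
    log_gm = sum(math.log(l) for l in noise_eigs) / k
    return log_gm - math.log(sum(noise_eigs) / k)

def edc(eigs, N, Ns, C_N):
    """EDC(Ns) for eigenvalues sorted in descending order and penalty weight C_N."""
    M = len(eigs)
    return -N * (M - Ns) * log_gm_over_am(eigs[Ns:]) + Ns * (2 * M - Ns) * C_N

def aic(eigs, N, Ns):
    # AIC has the same form with weight 1 and an overall factor of 2.
    # (A constant weight does not satisfy the EDC conditions on C_N.)
    return 2.0 * edc(eigs, N, Ns, 1.0)

def mdl(eigs, N, Ns):
    return edc(eigs, N, Ns, 0.5 * math.log(N))

def estimate_sources(eigs, N, criterion):
    """Return the Ns in 0..M-1 that minimizes criterion(eigs, N, Ns)."""
    eigs = sorted(eigs, reverse=True)
    return min(range(len(eigs)), key=lambda Ns: criterion(eigs, N, Ns))

# Hypothetical sample eigenvalues from N = 100 snapshots of a 6-sensor array.
example_eigs = [25.3, 12.8, 1.08, 1.02, 0.97, 0.93]
N = 100

def edc_sqrt(e, n, Ns):
    # EDC with the alternative weight C_N = sqrt(N ln N).
    return edc(e, n, Ns, math.sqrt(n * math.log(n)))

print("AIC estimate:", estimate_sources(example_eigs, N, aic))
print("MDL estimate:", estimate_sources(example_eigs, N, mdl))
print("EDC estimate:", estimate_sources(example_eigs, N, edc_sqrt))
```

Passing the criterion as a function keeps the search over $N_s$ identical for every penalty, which mirrors the observation that the methods are implemented in the same way.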
The EDC criterion with $C_N = \tfrac{1}{2} \ln N$ is identical to MDL, which shows that the MDL criterion is included as one of the EDC procedures. Another relatively common choice for $C_N$ is $C_N = \sqrt{N \ln N}$.

## 67.3 Decision Theoretic Approaches

The methods that we term decision theoretic approaches all rely on the statistical theory of hypothesis testing to determine the number of sources. The first of these that we will discuss, the sphericity test, is by far the oldest algorithm for source detection.

### 67.3.1 The Sphericity Test

Originally, the sphericity test was a hypothesis testing method designed to determine if the correlation (or covariance) matrix, $\mathbf{R}$, of a length-$M$ Gaussian random vector is proportional to the identity matrix, $\mathbf{I}_M$, when only $\hat{\mathbf{R}}$, the sample correlation matrix, is known. If $\mathbf{R} \propto \mathbf{I}_M$, then the contours of equal density for the Gaussian distribution form concentric spheres in $M$-dimensional space. The sphericity test derives its name from being a test of the sphericity of these contours. The original sphericity test had two possible hypotheses

$$\begin{aligned} H_0 &: \mathbf{R} = \sigma_n^2 \mathbf{I}_M \\ H_1 &: \mathbf{R} \ne \sigma_n^2 \mathbf{I}_M \end{aligned}$$

for some unknown $\sigma_n^2$. If we denote the eigenvalues of $\mathbf{R}$ in descending order by $\lambda_1, \lambda_2, \ldots, \lambda_M$, then equivalent hypotheses are

$$\begin{aligned} H_0 &: \lambda_1 = \lambda_2 = \cdots = \lambda_M \\ H_1 &: \lambda_1 > \lambda_M. \end{aligned}$$

For the appropriate statistic, $T(\hat{\mathbf{R}})$, the test is of the form

$$T(\hat{\mathbf{R}}) \underset{H_0}{\overset{H_1}{\gtrless}} \gamma,$$

where the threshold, $\gamma$, can be set according to the Neyman-Pearson criterion [7]. That is, if the distribution of $T(\hat{\mathbf{R}})$ is known under the null hypothesis, $H_0$, then for a given probability of false alarm, $P_F$, we can choose $\gamma$ such that $\Pr(T(\hat{\mathbf{R}}) > \gamma \mid H_0) = P_F$. Using the alternate form of the hypotheses, $T(\hat{\mathbf{R}})$ is actually $T(l_1, l_2, \ldots, l_M)$, and the eigenvalues of the sample correlation matrix are a sufficient statistic for the hypothesis test. The correct form of the sphericity test statistic is the generalized likelihood ratio [4]

$$T(l_1, l_2, \ldots, l_M) = \ln\left[ \frac{\left( \frac{1}{M} \sum_{i=1}^{M} l_i \right)^{M}}{\prod_{i=1}^{M} l_i} \right],$$

which was also a major component of the information theoretic tests.

For the source detection problem we are interested in testing a subset of the smaller eigenvalues for equality. In order to use the sphericity test, the hypotheses are generally broken down into pairs of hypotheses that can be tested in a series of hypothesis tests. For testing $M - N_s$ eigenvalues for equality, the hypotheses are

$$\begin{aligned} H_0 &: \lambda_1 \ge \cdots \ge \lambda_{N_s} \ge \lambda_{N_s+1} = \cdots = \lambda_M \\ H_1 &: \lambda_1 \ge \cdots \ge \lambda_{N_s} \ge \lambda_{N_s+1} > \lambda_M. \end{aligned}$$
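The generalized likelihood ratio statistic depends only on the eigenvalues being tested, so it is simple to evaluate. The short sketch below computes it for an arbitrary list of eigenvalues (for instance the $M - N_s$ smallest sample eigenvalues) and illustrates that it is zero when they are all equal and positive otherwise, which follows from the arithmetic-geometric mean inequality. The function name and the test values are hypothetical.

```python
import math

def sphericity_statistic(eigs):
    """T = ln[ (arithmetic mean of eigs)^K / product of eigs ] for the K values given."""
    K = len(eigs)
    return K * math.log(sum(eigs) / K) - sum(math.log(l) for l in eigs)

equal = [2.0, 2.0, 2.0, 2.0]      # spherical case: statistic is (numerically) zero
unequal = [4.0, 1.0, 1.0, 1.0]    # one dominant eigenvalue: statistic is positive

print("T(equal)   =", sphericity_statistic(equal))
print("T(unequal) =", sphericity_statistic(unequal))

# Testing only the three smallest eigenvalues of the second set for equality:
print("T(subset)  =", sphericity_statistic(unequal[1:]))
```

Multiplying this statistic by the number of samples gives the log-likelihood term that appears in the information theoretic criteria.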
We are interested in finding the smallest value of $N_s$ for which $H_0$ is true, which is done by testing $N_s = 0$, $N_s = 1$, $\ldots$, until $N_s = M - 2$ or the test does not fail. If the test fails for $N_s = M - 2$, then we consider none of the smallest eigenvalues to be equal and say that there are $M - 1$ sources. If $N_s$ is the smallest value for which $H_0$ is true, then we say that there are $N_s$ sources.

There is also a problem involved in setting the desired $P_F$. The Neyman-Pearson criterion is not able to determine a threshold for a given $P_F$ for the overall detection problem. The best that can be done is to set a $P_F$ for each individual test in the nested series of hypothesis tests using Neyman-Pearson methods. Unfortunately, as the hypothesis tests are not statistically independent and their statistical relationship is not very clear, how this $P_F$ for each test relates to the $P_F$ for the entire series of tests is not known.

To use the sphericity test to detect sources, we need to be able to set accurately the threshold $\gamma$ according to the desired $P_F$, which requires knowledge of the distribution of the sphericity test statistic $T(l_{N_s+1}, \ldots, l_M)$ under the null hypothesis. The exact form of this distribution is not available in a form that is very useful as it is generally written as an infinite series of Gaussian, chi-squared, or beta distributions [2, 4]. However, if the test statistic is multiplied by a suitable function of the eigenvalues of $\hat{\mathbf{R}}$, then its distribution can be accurately approximated as being chi-squared [10]. Thus, the statistic

$$2 \left[ (N - 1) - N_s - \frac{2(M - N_s)^2 + 1}{6(M - N_s)} + \sum_{i=1}^{N_s} \left( \frac{l_i}{\bar{l}} - 1 \right)^{-2} \right] (M - N_s) \ln\left[ \frac{\frac{1}{M-N_s} \sum_{i=N_s+1}^{M} l_i}{\prod_{i=N_s+1}^{M} l_i^{1/(M-N_s)}} \right]$$

is approximately chi-squared distributed with degrees of freedom given by $d = (M - N_s)^2 - 1$, where $\bar{l} = \frac{1}{M-N_s} \sum_{i=N_s+1}^{M} l_i$.

Although the performance of the sphericity test is comparable to that of the information theoretic tests, it is not as popular because it requires selection of the $P_F$ and calculation of the test thresholds for each value of $N_s$.
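The nested sphericity tests can be sketched as a short loop over $N_s$. The code below is a minimal illustration under stated assumptions: it uses the chi-squared-corrected statistic with $(M - N_s)^2 - 1$ degrees of freedom, and it obtains each threshold from the Wilson-Hilferty cube-root approximation to the chi-squared quantile so that only the Python standard library is needed. That approximation is a substitution made for this example and is not part of the method described in the text; a statistics package's exact quantile function could be used instead. Function names and the example eigenvalues are hypothetical.

```python
import math
from statistics import NormalDist

def corrected_statistic(eigs, N, Ns):
    """Chi-squared-corrected sphericity statistic for the M - Ns smallest eigenvalues."""
    M = len(eigs)
    k = M - Ns
    noise = eigs[Ns:]
    l_bar = sum(noise) / k
    log_am_over_gm = math.log(l_bar) - sum(math.log(l) for l in noise) / k
    correction = sum((l / l_bar - 1.0) ** -2 for l in eigs[:Ns])
    scale = (N - 1) - Ns - (2 * k * k + 1) / (6 * k) + correction
    return 2.0 * scale * k * log_am_over_gm

def chi2_threshold(dof, p_false_alarm):
    """Approximate chi-squared upper quantile (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(1.0 - p_false_alarm)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * math.sqrt(a)) ** 3

def sphericity_detect(eigs, N, p_false_alarm=0.01):
    """Smallest Ns whose null hypothesis is accepted, or M - 1 if every test fails."""
    eigs = sorted(eigs, reverse=True)
    M = len(eigs)
    for Ns in range(M - 1):                      # Ns = 0, ..., M - 2
        dof = (M - Ns) ** 2 - 1
        if corrected_statistic(eigs, N, Ns) <= chi2_threshold(dof, p_false_alarm):
            return Ns                            # remaining eigenvalues look equal
    return M - 1

# Hypothetical sample eigenvalues from N = 100 snapshots of a 6-sensor array.
example_eigs = [25.3, 12.8, 1.08, 1.02, 0.97, 0.93]
print("estimated number of sources:", sphericity_detect(example_eigs, N=100))
```

The per-test false alarm probability is an input here, which reflects the point made above: it is set for each test in the series, not for the overall decision.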
The ability to change the test thresholds does give the sphericity test a robustness lacking in the information theoretic methods when the received data do not match the assumed model.

### 67.3.2 Multiple Hypothesis Testing

The sphericity test relies on a sequence of binary hypothesis tests to determine the number of sources. However, the optimum test for this situation would be to test all hypotheses simultaneously:

$$\begin{aligned} H_0 &: \lambda_1 = \lambda_2 = \cdots = \lambda_M \\ H_1 &: \lambda_1 > \lambda_2 = \cdots = \lambda_M \\ H_2 &: \lambda_1 \ge \lambda_2 > \lambda_3 = \cdots = \lambda_M \\ &\;\;\vdots \\ H_{M-1} &: \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{M-1} > \lambda_M \end{aligned}$$

to determine how many of the smaller eigenvalues are equal. While it is not possible to generalize the sphericity test directly, it is possible to use an approximation to the probability density function (pdf) of the eigenvalues to arrive at a suitable test. Using the theory of multiple hypothesis tests, we can derive a test that is similar to AIC and MDL and is implemented in exactly the same manner, but is designed to minimize the probability of choosing the wrong number of sources.

To arrive at our statistic, we start with the joint pdf of the eigenvalues of the $M \times M$ sample covariance matrix when the $M - N_s$ smallest eigenvalues are known to be equal. We will denote this pdf by $f_{N_s}(l_1, \ldots, l_M \mid \lambda_1 \ge \cdots \ge \lambda_{N_s+1} = \cdots = \lambda_M)$, where the $l_i$ denote the eigenvalues of the sample matrix and the $\lambda_i$ are the eigenvalues of the true covariance matrix. The asymptotic expression for $f_{N_s}(\cdot)$ is given by Wong et al. [11] for the complex-valued data case as

$$\begin{aligned} f_{N_s}(l_1, \ldots, l_M \mid \lambda_1 \ge \cdots \ge \lambda_{N_s+1} = \cdots = \lambda_M) \approx{}& \frac{n^{Mn - \frac{N_s}{2}(2M - N_s - 1)}\; \pi^{M(M-1) - \frac{N_s}{2}(2M - N_s - 1)}}{\tilde{\Gamma}_M(n)\, \tilde{\Gamma}_{M-N_s}(M - N_s)} \prod_{i=1}^{M} \lambda_i^{-n} \prod_{i=1}^{M} l_i^{\,n-M} \\ &\times \exp\left\{ -n \sum_{i=1}^{M} \frac{l_i}{\lambda_i} \right\} \prod_{i<j}^{M} (l_i - l_j)^2 \prod_{i<j}^{N_s} \frac{\lambda_i \lambda_j}{(l_i - l_j)(\lambda_i - \lambda_j)} \prod_{i=1}^{N_s} \prod_{j=N_s+1}^{M} \frac{\lambda_i \lambda_j}{(l_i - l_j)(\lambda_i - \lambda_j)}, \end{aligned}$$

where $n = N - 1$ is one less than the number of samples and $\tilde{\Gamma}_M(\cdot)$ is the multivariate gamma function for complex-valued data [11]. We then form $M$ likelihood ratios by dividing each joint pdf by $f_{M-1}(\cdot)$ to form

$$\Lambda(N_s) = \frac{f_{N_s}(l_1, \ldots, l_M \mid \lambda_1 \ge \cdots \ge \lambda_{N_s+1} = \cdots = \lambda_M)}{f_{M-1}(l_1, \ldots, l_M \mid \lambda_1 \ge \cdots \ge \lambda_M)}, \quad N_s = 0, \ldots, M - 1.$$

Assuming that each value of $N_s$ is equally likely, multiple hypothesis testing theory tells us that the value of $N_s$ that maximizes $\Lambda(N_s)$ is the optimum choice in that it minimizes the probability of choosing the incorrect $N_s$ [7]. Because $\Lambda(N_s)$ in this form requires knowledge of the unknown parameters $\lambda_i$, we must use a generalized likelihood ratio test and independently substitute the maximum likelihood estimates of the $\lambda_i$ [see Eq. (67.3) for these expressions] into both $f_{N_s}(\cdot)$, for which we assume $M - N_s$ equal $\lambda_i$'s, and $f_{M-1}(\cdot)$, for which we assume no equal $\lambda_i$'s, to get our new statistics $\Lambda(N_s)$.
After much simplification, including dropping terms that are common to $\Lambda(N_s)$ for every allowable value of $N_s$ and then taking the natural logarithm of each $\Lambda(N_s)$, we get, up to terms that do not depend on $N_s$, the statistic

$$\begin{aligned} \Lambda(N_s) ={}& n (M - N_s) \ln\left[ \frac{\prod_{i=N_s+1}^{M} l_i^{1/(M-N_s)}}{\frac{1}{M-N_s} \sum_{i=N_s+1}^{M} l_i} \right] - \tfrac{1}{2} N_s (2M - N_s - 1) \ln[n] \\ &+ \ln\left[ \frac{\pi^{-N_s(2M - N_s - 1)/2}}{\tilde{\Gamma}_{M-N_s}(M - N_s)} \right] + \sum_{i=1}^{N_s} \sum_{j=i+1}^{N_s} 2 \ln\left[ \frac{(l_i l_j)^{1/2}}{l_i - l_j} \right] + \sum_{i=1}^{N_s} \sum_{j=N_s+1}^{M} \ln\left[ \frac{l_i\, \bar{l}}{(l_i - l_j)(l_i - \bar{l})} \right], \end{aligned}$$

where $\bar{l} = \frac{1}{M-N_s} \sum_{i=N_s+1}^{M} l_i$. The terms in the first line of this equation are almost identical to the negative of the MDL criterion, especially when the degrees of freedom recommended in [9] are used. Note that the change in sign is necessary because we are finding the maximum of this criterion, not the minimum. The extra terms on the following line include both the eigenvalues being tested for equality and those not being tested. These extra terms allow this test to outperform the information theoretic techniques, since the use of all the eigenvalues for each value of $N_s$ being tested allows this criterion to be more adaptive.

## 67.4 For More Information

Most of the original papers on model order determination appeared in the statistical literature in journals such as The Annals of Statistics and the Journal of Multivariate Analysis. However, almost all of the more recent developments that apply these techniques to the source detection problem have appeared in signal processing journals such as the IEEE Transactions on Signal Processing. More advanced topics that have been addressed in the signal processing literature but not discussed here include: detecting coherent (i.e., completely correlated) signals, detecting sources in unknown colored noise, and developing more robust source detection methods.

## References

- [1] Akaike, H., A new look at the statistical model identification, IEEE Trans. on Automatic Control, AC-19, Dec.
- [2] Anderson, T.W., An Introduction to Multivariate Statistical Analysis, 2nd ed., John Wiley & Sons, New York, 1984.
- [3] Johnson, D.H. and Dudgeon, D.E., Array Signal Processing: Concepts and Techniques, Prentice-Hall, Englewood Cliffs, NJ, 1993.
- [4] Muirhead, R.J., Aspects of Multivariate Statistical Theory, John Wiley & Sons, New York, 1982.
- [5] Rissanen, J., Modeling by shortest data description, Automatica, 14, Sept.
- [6] Schwarz, G., Estimating the dimension of a model, Annal. Stat., 6, Mar.
- [7] Van Trees, H.L., Detection, Estimation, and Modulation Theory, Part I, John Wiley & Sons, New York, 1968.
- [8] Wax, M. and Kailath, T., Detection of signals by information theoretic criteria, IEEE Trans. Acoustics, Speech, and Signal Processing, ASSP-33, Apr.
- [9] Williams, D.B., Counting the degrees of freedom when using AIC and MDL to detect signals, IEEE Trans. Signal Processing, 42, Nov.
- [10] Williams, D.B. and Johnson, D.H., Using the sphericity test for source detection with narrowband passive arrays, IEEE Trans. Acoustics, Speech, and Signal Processing, 38, Nov.
- [11] Wong, K.M., Zhang, Q.-T., Reilly, J.P. and Yip, P.C., On information theoretic criteria for determining the number of signals in high resolution array processing, IEEE Trans. Acoustics, Speech, and Signal Processing, 38, Nov.
- [12] Xu, W. and Kaveh, M., Analysis of the performance and sensitivity of eigendecomposition-based detectors, IEEE Trans. Signal Processing, 43, June 1995.
- [13] Yin, Y.Q. and Krishnaiah, P.R., On some nonparametric methods for detection of the number of signals, IEEE Trans. Acoustics, Speech, and Signal Processing, ASSP-35, Nov.
- [14] Zhao, L.C., Krishnaiah, P.R. and Bai, Z.D., On detection of the number of signals in presence of white noise, J. Multivariate Analysis, 20, 1–25, Oct. 1986.


### Nonlinear Iterative Partial Least Squares Method

Numerical Methods for Determining Principal Component Analysis Abstract Factors Béchu, S., Richard-Plouet, M., Fernandez, V., Walton, J., and Fairley, N. (2016) Developments in numerical treatments for

### Introduction to Matrix Algebra

Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra - 1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary

### MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### Multiple Testing. Joseph P. Romano, Azeem M. Shaikh, and Michael Wolf. Abstract

Multiple Testing Joseph P. Romano, Azeem M. Shaikh, and Michael Wolf Abstract Multiple testing refers to any instance that involves the simultaneous testing of more than one hypothesis. If decisions about

### MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

### Interpreting Kullback-Leibler Divergence with the Neyman-Pearson Lemma

Interpreting Kullback-Leibler Divergence with the Neyman-Pearson Lemma Shinto Eguchi a, and John Copas b a Institute of Statistical Mathematics and Graduate University of Advanced Studies, Minami-azabu

### Linear Codes. Chapter 3. 3.1 Basics

Chapter 3 Linear Codes In order to define codes that we can encode and decode efficiently, we add more structure to the codespace. We shall be mainly interested in linear codes. A linear code of length

### A Statistical Framework for Operational Infrasound Monitoring

A Statistical Framework for Operational Infrasound Monitoring Stephen J. Arrowsmith Rod W. Whitaker LA-UR 11-03040 The views expressed here do not necessarily reflect the views of the United States Government,

### MATH 240 Fall, Chapter 1: Linear Equations and Matrices

MATH 240 Fall, 2007 Chapter Summaries for Kolman / Hill, Elementary Linear Algebra, 9th Ed. written by Prof. J. Beachy Sections 1.1 1.5, 2.1 2.3, 4.2 4.9, 3.1 3.5, 5.3 5.5, 6.1 6.3, 6.5, 7.1 7.3 DEFINITIONS

### Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components

Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components The eigenvalues and eigenvectors of a square matrix play a key role in some important operations in statistics. In particular, they

### Subspace Analysis and Optimization for AAM Based Face Alignment

Subspace Analysis and Optimization for AAM Based Face Alignment Ming Zhao Chun Chen College of Computer Science Zhejiang University Hangzhou, 310027, P.R.China zhaoming1999@zju.edu.cn Stan Z. Li Microsoft

### By choosing to view this document, you agree to all provisions of the copyright laws protecting it.

This material is posted here with permission of the IEEE Such permission of the IEEE does not in any way imply IEEE endorsement of any of Helsinki University of Technology's products or services Internal

### Introduction to acoustic imaging

Introduction to acoustic imaging Contents 1 Propagation of acoustic waves 3 1.1 Wave types.......................................... 3 1.2 Mathematical formulation.................................. 4 1.3

### Understanding and Applying Kalman Filtering

Understanding and Applying Kalman Filtering Lindsay Kleeman Department of Electrical and Computer Systems Engineering Monash University, Clayton 1 Introduction Objectives: 1. Provide a basic understanding

### 3. INNER PRODUCT SPACES

. INNER PRODUCT SPACES.. Definition So far we have studied abstract vector spaces. These are a generalisation of the geometric spaces R and R. But these have more structure than just that of a vector space.

### Similarity and Diagonalization. Similar Matrices

MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that

### DATA ANALYSIS II. Matrix Algorithms

DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where

### A Introduction to Matrix Algebra and Principal Components Analysis

A Introduction to Matrix Algebra and Principal Components Analysis Multivariate Methods in Education ERSH 8350 Lecture #2 August 24, 2011 ERSH 8350: Lecture 2 Today s Class An introduction to matrix algebra

### Component Ordering in Independent Component Analysis Based on Data Power

Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals

### 1. LINEAR EQUATIONS. A linear equation in n unknowns x 1, x 2,, x n is an equation of the form

1. LINEAR EQUATIONS A linear equation in n unknowns x 1, x 2,, x n is an equation of the form a 1 x 1 + a 2 x 2 + + a n x n = b, where a 1, a 2,..., a n, b are given real numbers. For example, with x and

### Coding and decoding with convolutional codes. The Viterbi Algor

Coding and decoding with convolutional codes. The Viterbi Algorithm. 8 Block codes: main ideas Principles st point of view: infinite length block code nd point of view: convolutions Some examples Repetition

### CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

### Communication on the Grassmann Manifold: A Geometric Approach to the Noncoherent Multiple-Antenna Channel

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 48, NO. 2, FEBRUARY 2002 359 Communication on the Grassmann Manifold: A Geometric Approach to the Noncoherent Multiple-Antenna Channel Lizhong Zheng, Student

### Estimation and Inference in Cointegration Models Economics 582

Estimation and Inference in Cointegration Models Economics 582 Eric Zivot May 17, 2012 Tests for Cointegration Let the ( 1) vector Y be (1). Recall, Y is cointegrated with 0 cointegrating vectors if there

### Signal Detection C H A P T E R 14 14.1 SIGNAL DETECTION AS HYPOTHESIS TESTING

C H A P T E R 4 Signal Detection 4. SIGNAL DETECTION AS HYPOTHESIS TESTING In Chapter 3 we considered hypothesis testing in the context of random variables. The detector resulting in the minimum probability

### SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING

AAS 07-228 SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING INTRODUCTION James G. Miller * Two historical uncorrelated track (UCT) processing approaches have been employed using general perturbations

### Radar Systems Engineering Lecture 6 Detection of Signals in Noise

Radar Systems Engineering Lecture 6 Detection of Signals in Noise Dr. Robert M. O Donnell Guest Lecturer Radar Systems Course 1 Detection 1/1/010 Block Diagram of Radar System Target Radar Cross Section

### Machine Learning and Pattern Recognition Logistic Regression

Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,

### Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

### Recall the basic property of the transpose (for any A): v A t Aw = v w, v, w R n.

ORTHOGONAL MATRICES Informally, an orthogonal n n matrix is the n-dimensional analogue of the rotation matrices R θ in R 2. When does a linear transformation of R 3 (or R n ) deserve to be called a rotation?

### The Singular Value Decomposition in Symmetric (Löwdin) Orthogonalization and Data Compression

The Singular Value Decomposition in Symmetric (Löwdin) Orthogonalization and Data Compression The SVD is the most generally applicable of the orthogonal-diagonal-orthogonal type matrix decompositions Every

### Face Recognition using Principle Component Analysis

Face Recognition using Principle Component Analysis Kyungnam Kim Department of Computer Science University of Maryland, College Park MD 20742, USA Summary This is the summary of the basic idea about PCA

### Nonlinear Blind Source Separation and Independent Component Analysis

Nonlinear Blind Source Separation and Independent Component Analysis Prof. Juha Karhunen Helsinki University of Technology Neural Networks Research Centre Espoo, Finland Helsinki University of Technology,

### CONTROL SYSTEMS, ROBOTICS, AND AUTOMATION - Vol. V - Relations Between Time Domain and Frequency Domain Prediction Error Methods - Tomas McKelvey

COTROL SYSTEMS, ROBOTICS, AD AUTOMATIO - Vol. V - Relations Between Time Domain and Frequency Domain RELATIOS BETWEE TIME DOMAI AD FREQUECY DOMAI PREDICTIO ERROR METHODS Tomas McKelvey Signal Processing,

### Hypothesis Testing in the Classical Regression Model

LECTURE 5 Hypothesis Testing in the Classical Regression Model The Normal Distribution and the Sampling Distributions It is often appropriate to assume that the elements of the disturbance vector ε within

### 2D Geometric Transformations. COMP 770 Fall 2011

2D Geometric Transformations COMP 770 Fall 2011 1 A little quick math background Notation for sets, functions, mappings Linear transformations Matrices Matrix-vector multiplication Matrix-matrix multiplication

### Linear Threshold Units

Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

### C: LEVEL 800 {MASTERS OF ECONOMICS( ECONOMETRICS)}

C: LEVEL 800 {MASTERS OF ECONOMICS( ECONOMETRICS)} 1. EES 800: Econometrics I Simple linear regression and correlation analysis. Specification and estimation of a regression model. Interpretation of regression

### A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS

A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS Eusebio GÓMEZ, Miguel A. GÓMEZ-VILLEGAS and J. Miguel MARÍN Abstract In this paper it is taken up a revision and characterization of the class of

### VEHICLE TRACKING USING ACOUSTIC AND VIDEO SENSORS

VEHICLE TRACKING USING ACOUSTIC AND VIDEO SENSORS Aswin C Sankaranayanan, Qinfen Zheng, Rama Chellappa University of Maryland College Park, MD - 277 {aswch, qinfen, rama}@cfar.umd.edu Volkan Cevher, James

### 7. Tests of association and Linear Regression

7. Tests of association and Linear Regression In this chapter we consider 1. Tests of Association for 2 qualitative variables. 2. Measures of the strength of linear association between 2 quantitative variables.

### Statistics in Geophysics: Linear Regression II

Statistics in Geophysics: Linear Regression II Steffen Unkel Department of Statistics Ludwig-Maximilians-University Munich, Germany Winter Term 2013/14 1/28 Model definition Suppose we have the following

### Probabilistic Methods for Time-Series Analysis

Probabilistic Methods for Time-Series Analysis 2 Contents 1 Analysis of Changepoint Models 1 1.1 Introduction................................ 1 1.1.1 Model and Notation....................... 2 1.1.2 Example:

### NOV - 30211/II. 1. Let f(z) = sin z, z C. Then f(z) : 3. Let the sequence {a n } be given. (A) is bounded in the complex plane

Mathematical Sciences Paper II Time Allowed : 75 Minutes] [Maximum Marks : 100 Note : This Paper contains Fifty (50) multiple choice questions. Each question carries Two () marks. Attempt All questions.

### MAT 200, Midterm Exam Solution. a. (5 points) Compute the determinant of the matrix A =

MAT 200, Midterm Exam Solution. (0 points total) a. (5 points) Compute the determinant of the matrix 2 2 0 A = 0 3 0 3 0 Answer: det A = 3. The most efficient way is to develop the determinant along the

### Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics

INTERNATIONAL BLACK SEA UNIVERSITY COMPUTER TECHNOLOGIES AND ENGINEERING FACULTY ELABORATION OF AN ALGORITHM OF DETECTING TESTS DIMENSIONALITY Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree

### Approximation Algorithms

Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms

Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

### Generating Gaussian Mixture Models by Model Selection For Speech Recognition

Generating Gaussian Mixture Models by Model Selection For Speech Recognition Kai Yu F06 10-701 Final Project Report kaiy@andrew.cmu.edu Abstract While all modern speech recognition systems use Gaussian

### QUICKEST MULTIDECISION ABRUPT CHANGE DETECTION WITH SOME APPLICATIONS TO NETWORK MONITORING

QUICKEST MULTIDECISION ABRUPT CHANGE DETECTION WITH SOME APPLICATIONS TO NETWORK MONITORING I. Nikiforov Université de Technologie de Troyes, UTT/ICD/LM2S, UMR 6281, CNRS 12, rue Marie Curie, CS 42060

### ADVANCED LINEAR ALGEBRA FOR ENGINEERS WITH MATLAB. Sohail A. Dianat. Rochester Institute of Technology, New York, U.S.A. Eli S.

ADVANCED LINEAR ALGEBRA FOR ENGINEERS WITH MATLAB Sohail A. Dianat Rochester Institute of Technology, New York, U.S.A. Eli S. Saber Rochester Institute of Technology, New York, U.S.A. (g) CRC Press Taylor

### Common factor analysis

Common factor analysis This is what people generally mean when they say "factor analysis" This family of techniques uses an estimate of common variance among the original variables to generate the factor

### Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation

Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2015 CS 551, Fall 2015

### Bayes and Naïve Bayes. cs534-machine Learning

Bayes and aïve Bayes cs534-machine Learning Bayes Classifier Generative model learns Prediction is made by and where This is often referred to as the Bayes Classifier, because of the use of the Bayes rule

### AN INTRODUCTION TO ERROR CORRECTING CODES Part 1

AN INTRODUCTION TO ERROR CORRECTING CODES Part 1 Jack Keil Wolf ECE 154C Spring 2008 Noisy Communications Noise in a communications channel can cause errors in the transmission of binary digits. Transmit:

### Data Clustering. Dec 2nd, 2013 Kyrylo Bessonov

Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

### x1 x 2 x 3 y 1 y 2 y 3 x 1 y 2 x 2 y 1 0.

Cross product 1 Chapter 7 Cross product We are getting ready to study integration in several variables. Until now we have been doing only differential calculus. One outcome of this study will be our ability

### Introduction to Principal Components and FactorAnalysis

Introduction to Principal Components and FactorAnalysis Multivariate Analysis often starts out with data involving a substantial number of correlated variables. Principal Component Analysis (PCA) is a

### Time Series Analysis

Time Series Analysis hm@imm.dtu.dk Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby 1 Outline of the lecture Identification of univariate time series models, cont.:

### MIMO: What shall we do with all these degrees of freedom?

MIMO: What shall we do with all these degrees of freedom? Helmut Bölcskei Communication Technology Laboratory, ETH Zurich June 4, 2003 c H. Bölcskei, Communication Theory Group 1 Attributes of Future Broadband

Facts About Eigenvalues By Dr David Butler Definitions Suppose A is an n n matrix An eigenvalue of A is a number λ such that Av = λv for some nonzero vector v An eigenvector of A is a nonzero vector v

### Chapter 4: Vector Autoregressive Models

Chapter 4: Vector Autoregressive Models 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie IV.1 Vector Autoregressive Models (VAR)...

### Introduction to Detection Theory

Introduction to Detection Theory Reading: Ch. 3 in Kay-II. Notes by Prof. Don Johnson on detection theory, see http://www.ece.rice.edu/~dhj/courses/elec531/notes5.pdf. Ch. 10 in Wasserman. EE 527, Detection

### Lecture 3 : Hypothesis testing and model-fitting

Lecture 3 : Hypothesis testing and model-fitting These dark lectures energy puzzle Lecture 1 : basic descriptive statistics Lecture 2 : searching for correlations Lecture 3 : hypothesis testing and model-fitting

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

### Detection of changes in variance using binary segmentation and optimal partitioning

Detection of changes in variance using binary segmentation and optimal partitioning Christian Rohrbeck Abstract This work explores the performance of binary segmentation and optimal partitioning in the

### Exploiting A Constellation of Narrowband RF Sensors to Detect and Track Moving Targets

Exploiting A Constellation of Narrowband RF Sensors to Detect and Track Moving Targets Chris Kreucher a, J. Webster Stayman b, Ben Shapo a, and Mark Stuff c a Integrity Applications Incorporated 900 Victors

### MATHEMATICAL METHODS OF STATISTICS

MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS

### Data quality in Accounting Information Systems

Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania

### Penalized regression: Introduction

Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

### 15.062 Data Mining: Algorithms and Applications Matrix Math Review

.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

### a 11 x 1 + a 12 x 2 + + a 1n x n = b 1 a 21 x 1 + a 22 x 2 + + a 2n x n = b 2.

Chapter 1 LINEAR EQUATIONS 1.1 Introduction to linear equations A linear equation in n unknowns x 1, x,, x n is an equation of the form a 1 x 1 + a x + + a n x n = b, where a 1, a,..., a n, b are given

### D-optimal plans in observational studies

D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational