The method of least squares
Contents
1. Introduction
2. Statistical interpretation of the curve fitting problem
3. The parametric model
4. The stochastic model
5. The least squares estimators of parameters
6. Covariance of the estimators
Introduction
A common problem in many applications is the determination of a function that fits, in some sense, a set of $n$ points $(t_i, y_i^o)$. The function $\varphi(t_i)$ depends on $m < n$ parameters, and the problem is to find these $m$ parameters. We give this problem a statistical interpretation: the abscissa $t_i$ is a known number (controlled variable), while the ordinate $y_i^o$ is the value of an RV $Y_i$ with mean $\varphi(t_i)$.
Statistical interpretation
Imagine that $t_i$ and $y_i$ are the values of two variables related by a physical law $y = \varphi(t)$. It is often the case that the values $t_i$ are known exactly, while the values $y_i^o$ involve random error. For example, $t_i$ could be the precisely measured water temperature, while $y_i^o$ is an imperfect measurement of the solubility $y_i$ of a chemical substance. We use the random character of the errors to find an estimate of the fitting curve.
The probabilistic model
We are given $n$ random variables $Y_i$ whose mean is the vector
$$E\{Y\} = y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = A x + a$$
where
$A$ is an n-by-m matrix of constant numbers,
$x$ is a vector of $m$ unknown parameters,
$a$ is a vector of $n$ constant numbers.
The probabilistic model
In the case where the sought fitting curve is a straight line, the model becomes
$$E\{Y\} = y = \begin{pmatrix} b_0 + b_1 t_1 \\ \vdots \\ b_0 + b_1 t_n \end{pmatrix} = A x + a$$
where
$$A = \begin{pmatrix} 1 & t_1 \\ \vdots & \vdots \\ 1 & t_n \end{pmatrix}, \qquad x = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix}, \qquad a = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}.$$
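As a concrete illustration (not part of the original slides), the sketch below builds the design matrix $A$, the parameter vector $x$ and the constant vector $a$ for the straight-line model with NumPy; the data values and the "true" parameters are invented for the example.

```python
# Minimal sketch of the linear model E{Y} = A x + a for y_i = b0 + b1 * t_i.
# The names (t, A, a, x) mirror the notation of the slides; the numbers are made up.
import numpy as np

t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # controlled variable (known exactly)
n = t.size

A = np.column_stack([np.ones(n), t])       # n-by-2 design matrix with rows [1, t_i]
a = np.zeros(n)                            # known constant vector (zero for this model)

b0_true, b1_true = 2.0, 0.5                # hypothetical "true" parameters
x = np.array([b0_true, b1_true])

y_mean = A @ x + a                         # E{Y} = A x + a
print(y_mean)
```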
The probabilistic model
[figure: observed values $y_i^o$ scattered around the fitting line $y_i = b_0 + b_1 t_i$, plotted against $t_i$]
The probabilistic model
We can write the RV $Y$ as the sum
$$Y = E\{Y\} + V = y + V$$
where $V$ is a vector of $n$ RVs such that
$$E\{V\} = 0, \qquad C_{VV} = \sigma_0 Q.$$
The covariance matrix of $V$ is known up to the constant factor $\sigma_0$. It follows that
$$C_{YY} = C_{VV} = \sigma_0 Q,$$
where the n-by-n matrix $Q$ is called the cofactor matrix.
The probabilistic model
$V$ is interpreted in many cases as the observation error. In most cases we assume that the errors in the different observations are described by RVs with the same mean, equal to zero, and the same variance. In addition, the errors are assumed to be independent of one another. In this case we have
$$C_{VV} = \sigma_0 I_n,$$
where $I_n$ is the n-by-n identity matrix.
The least squares method
Let's consider our example. We want to estimate the best fitting line, that is, the two parameters that define the line. In more general terms, the problem is to estimate the $m$ parameters on which the mean of our observations depends. The sought parameters are obtained by minimizing the following target function:
$$\Phi(b_0, b_1) = \sum_{i=1}^{n} \left(y_i^o - b_0 - b_1 t_i\right)^2 = \sum_{i=1}^{n} v_i^2 = \min_{b_0,\, b_1}$$
The least squares method
In general the target function will be
$$\Phi(\hat{y}, y^o; Q) = (y^o - \hat{y})^T Q^{-1} (y^o - \hat{y}) = \hat{v}^T Q^{-1} \hat{v} = \min, \qquad \hat{y} = A\hat{x} + a.$$
The least squares estimates
By applying the minimum condition, which takes into account the cofactor matrix when it differs from the identity matrix, one gets the following estimates:
$$\hat{x} = \left(A^T Q^{-1} A\right)^{-1} A^T Q^{-1} \left(y^o - a\right) = N^{-1} A^T Q^{-1} \left(y^o - a\right)$$
$$\hat{y} = A\hat{x} + a, \qquad \hat{v} = y^o - \hat{y}$$
$$\hat{\sigma}_0 = \frac{\hat{v}^T Q^{-1} \hat{v}}{n - m}$$
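A minimal numerical sketch of these formulas follows; the data, the choice $Q = I$ and all numbers are hypothetical, chosen only to make the example run.

```python
# Sketch of the least squares estimates
#   x_hat = N^{-1} A^T Q^{-1} (y^o - a),  y_hat = A x_hat + a,
#   sigma0_hat = v_hat^T Q^{-1} v_hat / (n - m)
# on simulated straight-line data.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 20)
n, m = t.size, 2

A = np.column_stack([np.ones(n), t])
a = np.zeros(n)
Q = np.eye(n)                                      # cofactor matrix (equal, uncorrelated errors)

y_obs = 2.0 + 0.5 * t + rng.normal(0.0, 0.3, n)    # simulated observations y^o

Qinv = np.linalg.inv(Q)
N = A.T @ Qinv @ A                                 # normal matrix N = A^T Q^{-1} A
x_hat = np.linalg.solve(N, A.T @ Qinv @ (y_obs - a))
y_hat = A @ x_hat + a
v_hat = y_obs - y_hat
sigma0_hat = v_hat @ Qinv @ v_hat / (n - m)

print(x_hat, sigma0_hat)
```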
The least squares estimates
And in the example of the best line fitting the observations,
$$\hat{x} = \begin{pmatrix} \hat{b}_0 \\ \hat{b}_1 \end{pmatrix} = \left(A^T A\right)^{-1} A^T y^o = \begin{pmatrix} n & \sum_{i=1}^{n} t_i \\ \sum_{i=1}^{n} t_i & \sum_{i=1}^{n} t_i^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum_{i=1}^{n} y_i^o \\ \sum_{i=1}^{n} t_i\, y_i^o \end{pmatrix}$$
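The same 2-by-2 system can be checked numerically; the sketch below (data invented for the example) solves the normal equations explicitly and compares the result with numpy.polyfit.

```python
# Hypothetical check: closed-form 2x2 normal equations for the line fit
# versus numpy.polyfit on the same simulated data.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 5.0, 15)
y_obs = 1.0 + 2.0 * t + rng.normal(0.0, 0.2, t.size)
n = t.size

M = np.array([[n,            t.sum()],
              [t.sum(), (t * t).sum()]])            # A^T A for the line model
rhs = np.array([y_obs.sum(), (t * y_obs).sum()])    # A^T y^o
b0_hat, b1_hat = np.linalg.solve(M, rhs)

b1_ref, b0_ref = np.polyfit(t, y_obs, 1)            # polyfit returns the slope first
print(b0_hat, b1_hat)
print(b0_ref, b1_ref)                               # should agree with the values above
```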
The least squares estimators
The relations found by applying the least squares method can be seen as linear transformations of the sample RV $Y$. Through them we obtain other RVs of which we can compute the mean and the covariance matrix. As usual we refer to RVs by using capital letters:
$$\hat{X} = N^{-1} A^T Q^{-1} (Y - a)$$
$$\hat{Y} = A\hat{X} + a = A N^{-1} A^T Q^{-1} (Y - a) + a$$
$$\hat{V} = Y - \hat{Y}, \qquad \hat{\Sigma} = \frac{\hat{V}^T Q^{-1} \hat{V}}{n - m}$$
The least squares parameters estimator
Note that the estimator of the parameters is a sequence of $m$ RVs. By applying the propagation laws one finds
$$E\{\hat{X}\} = N^{-1} A^T Q^{-1} \left(E\{Y\} - a\right) = N^{-1} A^T Q^{-1} \left(Ax + a - a\right) = N^{-1} A^T Q^{-1} A\, x = x$$
$$C_{\hat{X}\hat{X}} = N^{-1} A^T Q^{-1}\, C_{YY}\, Q^{-1} A N^{-1} = \sigma_0\, N^{-1} A^T Q^{-1} Q Q^{-1} A N^{-1} = \sigma_0 N^{-1}$$
As the mean of the parameters estimator equals the vector of the unknown parameters, the estimator is said to be UNBIASED. This holds for all the least squares estimators.
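Unbiasedness and the covariance $\sigma_0 N^{-1}$ can be illustrated with a small Monte Carlo experiment; the sketch below is purely illustrative, with simulated data and $Q = I$.

```python
# Monte Carlo sketch: repeat the experiment many times and check that
# the sample mean of x_hat is close to x and its sample covariance
# is close to sigma0 * N^{-1}.
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0.0, 10.0, 25)
n = t.size
A = np.column_stack([np.ones(n), t])
x_true = np.array([2.0, 0.5])
sigma0 = 0.3 ** 2                          # variance factor (Q = I, so C_VV = sigma0 * I)

N = A.T @ A                                # with Q = I, N = A^T A
estimates = []
for _ in range(5000):
    y_obs = A @ x_true + rng.normal(0.0, np.sqrt(sigma0), n)
    estimates.append(np.linalg.solve(N, A.T @ y_obs))
estimates = np.array(estimates)

print(estimates.mean(axis=0))              # close to x_true  -> unbiasedness
print(np.cov(estimates.T))                 # close to sigma0 * inv(N)
print(sigma0 * np.linalg.inv(N))
```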
The least squares estimators
The estimator of the mean of the sample RVs is a sequence of $n$ RVs. By applying the propagation laws one finds
$$E\{\hat{Y}\} = E\{A\hat{X} + a\} = Ax + a = y$$
$$C_{\hat{Y}\hat{Y}} = A\, C_{\hat{X}\hat{X}}\, A^T = \sigma_0\, A N^{-1} A^T$$
The least squares estimators
The estimator of the deviations is a sequence of $n$ RVs. By applying the propagation laws one finds
$$E\{\hat{V}\} = E\{Y - \hat{Y}\} = y - y = 0$$
$$C_{\hat{V}\hat{V}} = \sigma_0 \left(I - A N^{-1} A^T Q^{-1}\right) Q \left(I - Q^{-1} A N^{-1} A^T\right) = \sigma_0 \left(Q - A N^{-1} A^T\right)$$
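To see where the last equality comes from, it may help to write $\hat{V}$ explicitly as a linear function of $Y$ and apply covariance propagation; a short derivation sketch, using only the definitions above:

```latex
\begin{aligned}
\hat{V} &= Y - \hat{Y} = \left(I - A N^{-1} A^{T} Q^{-1}\right)\left(Y - a\right),
  \qquad C_{YY} = \sigma_0 Q,\\[4pt]
C_{\hat{V}\hat{V}} &= \left(I - A N^{-1} A^{T} Q^{-1}\right)\, \sigma_0 Q\,
  \left(I - Q^{-1} A N^{-1} A^{T}\right)\\
 &= \sigma_0 \left(Q - 2\, A N^{-1} A^{T}
  + A N^{-1} \underbrace{A^{T} Q^{-1} A}_{=\,N}\, N^{-1} A^{T}\right)
  = \sigma_0 \left(Q - A N^{-1} A^{T}\right).
\end{aligned}
```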
The hypothesis of normality
If we suppose that the errors have a normal distribution, with the mean and covariance matrix given before, we are able to determine the distribution of the least squares estimators. As we will see, the knowledge of these distributions will allow us to draw some conclusions about the quality of the solution we have obtained.
The distribution of the parameters estimator
It can be shown that when the observations are normally distributed then
$$\hat{X} \sim N\!\left(x,\; \sigma_0 N^{-1}\right)$$
Furthermore, the following quadratic form satisfies
$$\frac{(\hat{X} - x)^T N (\hat{X} - x)}{\sigma_0} \sim \chi^2_m$$
and
$$\frac{(\hat{X} - x)^T N (\hat{X} - x)}{m\, \hat{\Sigma}} \sim \frac{\chi^2_m / m}{\chi^2_{n-m} / (n-m)} = F_{m,\, n-m}$$
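As an illustration (simulated data, $Q = I$ assumed), the last quadratic form can be compared with an F quantile from scipy.stats; with the true $x$ inserted, the statistic should exceed the 95% quantile only about 5% of the time.

```python
# Illustrative sketch: compute (x_hat - x)^T N (x_hat - x) / (m * Sigma_hat)
# on simulated normal data and compare it with the 95% quantile of F(m, n-m).
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(3)
t = np.linspace(0.0, 10.0, 30)
n, m = t.size, 2
A = np.column_stack([np.ones(n), t])
x_true = np.array([2.0, 0.5])

y_obs = A @ x_true + rng.normal(0.0, 0.3, n)

N = A.T @ A                                      # Q = I
x_hat = np.linalg.solve(N, A.T @ y_obs)
v_hat = y_obs - A @ x_hat
Sigma_hat = v_hat @ v_hat / (n - m)

F_stat = (x_hat - x_true) @ N @ (x_hat - x_true) / (m * Sigma_hat)
print(F_stat, f.ppf(0.95, m, n - m))
```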
The chi-square density
The chi-square is a density function depending on one parameter, called the degrees of freedom. It is the distribution of the sum of a sequence of independent squared standard normal RVs. The degrees of freedom equal the number of squared standard normal RVs in the sum:
$$\chi^2_p = \sum_{i=1}^{p} Z_i^2, \qquad Z_i \sim N(0, 1)$$
The chi-square density
[figure: chi-square density curves of $\chi^2_p = \sum_{i=1}^{p} Z_i^2$, $Z_i \sim N(0,1)$, for several values of $p$]
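A quick simulation (illustrative only) confirms the definition: the sum of $p$ squared standard normal RVs has mean $p$ and variance $2p$, matching the chi-square distribution with $p$ degrees of freedom.

```python
# Simulate the sum of p squared standard normal RVs and compare its
# moments with those of the chi-square distribution with p degrees of freedom.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
p = 5
samples = (rng.standard_normal((100_000, p)) ** 2).sum(axis=1)

print(samples.mean(), chi2.mean(p))   # both close to p
print(samples.var(), chi2.var(p))     # both close to 2p
```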
The Fisher density
The Fisher density function depends on two parameters. It is the distribution of the ratio between two independent chi-square RVs, each divided by its own degrees of freedom:
$$\frac{\chi^2_l / l}{\chi^2_m / m} = F_{l,\, m}$$
The Fisher density
[figure: Fisher F density curves]
The distribution of the estimators
It can be shown that when the observations are normally distributed then
$$\frac{(n - m)\, \hat{\Sigma}}{\sigma_0} \sim \chi^2_{n-m}$$
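This relation can be used to build a confidence interval for the variance factor $\sigma_0$; the sketch below (simulated data, $Q = I$ assumed) inverts the chi-square statement with scipy.stats.chi2.

```python
# Sketch: (1 - alpha) confidence interval for sigma0 from
# (n - m) * Sigma_hat / sigma0 ~ chi2(n - m), on simulated data.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
t = np.linspace(0.0, 10.0, 30)
n, m = t.size, 2
A = np.column_stack([np.ones(n), t])
y_obs = A @ np.array([2.0, 0.5]) + rng.normal(0.0, 0.3, n)

x_hat = np.linalg.solve(A.T @ A, A.T @ y_obs)
v_hat = y_obs - A @ x_hat
Sigma_hat = v_hat @ v_hat / (n - m)

alpha = 0.05
lower = (n - m) * Sigma_hat / chi2.ppf(1 - alpha / 2, n - m)
upper = (n - m) * Sigma_hat / chi2.ppf(alpha / 2, n - m)
print(Sigma_hat, (lower, upper))      # interval for sigma0 (true value 0.3**2 = 0.09)
```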