The method of least squares

Contents

1. Introduction
2. Statistical interpretation of the curve fitting problem
3. The parametric model
4. The stochastic model
5. The least squares estimators of parameters
6. Covariance of the estimators

Introduction

A common problem in many applications is the determination of a function that fits, in some sense, a set of n points $(t_i, y_i^o)$. The function $\varphi(t_i)$ depends on $m < n$ parameters, and the problem is to find these m parameters. We give this problem a statistical interpretation: the abscissa $t_i$ is a known number (controlled variable), while the ordinate $y_i^o$ is an observed value of an RV $Y_i$ with mean $\varphi(t_i)$.

Statistical interpretation

Imagine that $t_i$ and $y_i$ are the values of two variables related by a physical law $y = \varphi(t)$. It is often the case that the values $t_i$ are known exactly, while the values $y_i^o$ involve random error. For example, $t_i$ could be the precisely measured water temperature, while $y_i^o$ is the imperfect measurement of the solubility $y_i$ of a chemical substance. We use the random character of the errors to find an estimate of the fitting curve.

The probabilistic model

We are given n random variables $Y_i$ whose mean is the vector

$$E\{Y\} = y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = Ax + a$$

where A is an n-by-m matrix of constant numbers, x is a vector of m unknown parameters, and a is a vector of n constant numbers.

The probabilistic model

In the case where the fitting curve we seek is a line, the model becomes

$$E\{Y\} = y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} b_0 + b_1 t_1 \\ \vdots \\ b_0 + b_1 t_n \end{pmatrix} = \begin{pmatrix} 1 & t_1 \\ \vdots & \vdots \\ 1 & t_n \end{pmatrix} \begin{pmatrix} b_0 \\ b_1 \end{pmatrix} + \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}$$

where

$$A = \begin{pmatrix} 1 & t_1 \\ \vdots & \vdots \\ 1 & t_n \end{pmatrix}, \qquad x = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix}, \qquad a = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}$$
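As a concrete illustration, the following is a minimal NumPy sketch of this design matrix; the values of the controlled variable and of the parameters $b_0$, $b_1$ are invented for the example.

```python
# Minimal sketch of the straight-line design matrix (the values of t_i and of
# the parameters b0, b1 below are made up for the example).
import numpy as np

t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # controlled variable t_i
A = np.column_stack([np.ones_like(t), t])  # n-by-2 matrix with rows (1, t_i)
a = np.zeros_like(t)                       # here the constant vector a is zero

x = np.array([0.5, 2.0])                   # hypothetical parameters (b0, b1)
y = A @ x + a                              # mean vector E{Y} = A x + a
print(A)
print(y)
```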

The probabilistic model

[Figure: the observed values $y_i^o$ plotted against $t_i$, scattered around the line $y_i = b_0 + b_1 t_i$.]

The probabilistic model

We can write the RV $Y$ as the sum

$$Y = E\{Y\} + V = y + V$$

where V is a vector of n RVs such that

$$E\{V\} = 0, \qquad C_{VV} = \sigma_0^2 Q$$

The covariance matrix of V is known up to the constant factor $\sigma_0^2$. It follows that

$$C_{YY} = C_{VV} = \sigma_0^2 Q$$

where the n-by-n matrix Q is called the cofactor matrix.

The probabilistic model

V is interpreted in many cases as the observation error. In most cases we assume that the errors in the different observations are described by RVs with the same mean, equal to zero, and the same variance. In addition, the errors are assumed to be independent of one another. In these cases we have

$$C_{VV} = \sigma_0^2 I_n$$

The least squares method

Let's consider our example. We want to estimate the best fitting line, that is, the two parameters that define the line. In more general terms, the problem is to estimate the m parameters on which the mean of our observations depends. The sought parameters are obtained by minimizing the following target function:

$$\Phi(b_0, b_1; y^o) = \sum_{i=1}^{n} \left( y_i^o - b_0 - b_1 t_i \right)^2 = \sum_{i=1}^{n} v_i^2 = \min_{b_0, b_1}$$
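A short sketch of this target function, on invented data, might look as follows (the arrays `t` and `y_obs` are hypothetical):

```python
# Sketch of the target function Phi(b0, b1) for the line-fitting example;
# the arrays t and y_obs are hypothetical data.
import numpy as np

t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_obs = np.array([2.4, 4.6, 6.4, 8.7, 10.4])

def target(b0, b1):
    """Sum of squared residuals: sum_i (y_i^o - b0 - b1*t_i)**2."""
    v = y_obs - b0 - b1 * t
    return np.sum(v ** 2)

# The minimizing pair (b0, b1) is the least squares estimate.
print(target(0.5, 2.0))
```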

The least squares method

In general, the target function will be

$$\Phi(\hat y, y^o; Q) = \left( y^o - \hat y \right)^+ Q^{-1} \left( y^o - \hat y \right) = \hat v^+ Q^{-1} \hat v = \min, \qquad \hat y = A\hat x + a$$

(here the superscript $+$ denotes transposition).

The least squares estimates

By applying the minimum condition, which takes into account the cofactor matrix when it differs from the identity matrix, one gets the following estimates:

$$\hat x = \left( A^+ Q^{-1} A \right)^{-1} A^+ Q^{-1} \left( y^o - a \right) = N^{-1} A^+ Q^{-1} \left( y^o - a \right)$$

$$\hat y = A\hat x + a$$

$$\hat v = y^o - \hat y, \qquad \hat\sigma_0^2 = \frac{\hat v^+ Q^{-1} \hat v}{n - m}$$
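The following is a minimal sketch, on simulated data with $Q = I$, of how these estimates can be computed numerically; all names and values are hypothetical:

```python
# Minimal sketch of the general estimates x_hat, y_hat and sigma0_hat^2,
# computed from y^o, A, a and the cofactor matrix Q (all values hypothetical).
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 2
t = np.linspace(0.0, 7.0, n)
A = np.column_stack([np.ones(n), t])
a = np.zeros(n)
Q = np.eye(n)                                   # equal, uncorrelated observations
y_obs = A @ np.array([1.0, 0.8]) + a + rng.normal(scale=0.3, size=n)

Qinv = np.linalg.inv(Q)
N = A.T @ Qinv @ A                              # normal matrix N = A^+ Q^-1 A
x_hat = np.linalg.solve(N, A.T @ Qinv @ (y_obs - a))
y_hat = A @ x_hat + a
v_hat = y_obs - y_hat
sigma0_hat2 = (v_hat @ Qinv @ v_hat) / (n - m)  # estimate of sigma_0^2
print(x_hat, sigma0_hat2)
```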

The least squares estimates

And in the example of the best line fitting the observations:

$$\hat x = \begin{pmatrix} \hat b_0 \\ \hat b_1 \end{pmatrix} = \left( A^+ A \right)^{-1} A^+ y^o = \begin{pmatrix} n & \sum_{i=1}^{n} t_i \\ \sum_{i=1}^{n} t_i & \sum_{i=1}^{n} t_i^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum_{i=1}^{n} y_i^o \\ \sum_{i=1}^{n} t_i y_i^o \end{pmatrix}$$
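As a quick sanity check on invented data, the explicit sums above can be compared with the generic solution computed by `np.linalg.lstsq`:

```python
# Check, on hypothetical data, that the explicit sums above reproduce the
# generic solution (A^+ A)^-1 A^+ y^o.
import numpy as np

t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_obs = np.array([2.4, 4.6, 6.4, 8.7, 10.4])
n = len(t)

N = np.array([[n, t.sum()],
              [t.sum(), (t ** 2).sum()]])           # A^+ A written with sums
rhs = np.array([y_obs.sum(), (t * y_obs).sum()])    # A^+ y^o written with sums
b_hat = np.linalg.solve(N, rhs)

A = np.column_stack([np.ones(n), t])
print(np.allclose(b_hat, np.linalg.lstsq(A, y_obs, rcond=None)[0]))  # True
```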

The least squares estimators

The relations found by applying the least squares method can be seen as linear transformations of the sample RV Y. Through them we obtain other RVs, for which we can compute the mean and the covariance matrix. As usual, we refer to RVs using capital letters:

$$\hat X = N^{-1} A^+ Q^{-1} \left( Y - a \right)$$

$$\hat Y = A\hat X + a = A N^{-1} A^+ Q^{-1} \left( Y - a \right) + a$$

$$\hat V = Y - \hat Y, \qquad \hat\Sigma_0 = \frac{\hat V^+ Q^{-1} \hat V}{n - m}$$

The least squares parameters estimator

Note that the estimator of the parameters is a vector of m RVs. By applying the propagation laws one finds

$$E\{\hat X\} = N^{-1} A^+ Q^{-1} \left( E\{Y\} - a \right) = N^{-1} A^+ Q^{-1} \left( Ax + a - a \right) = N^{-1} A^+ Q^{-1} A x = x$$

$$C_{\hat X \hat X} = N^{-1} A^+ Q^{-1}\, C_{YY}\, Q^{-1} A N^{-1} = \sigma_0^2\, N^{-1} A^+ Q^{-1} Q Q^{-1} A N^{-1} = \sigma_0^2 N^{-1}$$

Since the mean of the parameters estimator equals the vector of the unknown parameters, the estimator is said to be UNBIASED. The same holds for all the least squares estimators.
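A Monte Carlo sketch, on a hypothetical straight-line model with $Q = I$, illustrating both properties (empirical mean close to $x$, empirical covariance close to $\sigma_0^2 N^{-1}$):

```python
# Monte Carlo sketch on a hypothetical model: the empirical mean of X_hat
# should approach x, and its empirical covariance should approach sigma0^2 N^-1.
import numpy as np

rng = np.random.default_rng(1)
n, m = 10, 2
t = np.linspace(0.0, 9.0, n)
A = np.column_stack([np.ones(n), t])
x_true = np.array([1.0, 0.5])
sigma0 = 0.4                       # square root of the variance factor
N = A.T @ A                        # Q = I in this example

samples = np.array([
    np.linalg.solve(N, A.T @ (A @ x_true + rng.normal(scale=sigma0, size=n)))
    for _ in range(20000)
])

print(samples.mean(axis=0), x_true)                       # unbiasedness
print(np.cov(samples.T), sigma0 ** 2 * np.linalg.inv(N))  # covariance sigma0^2 N^-1
```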

The least squares estimators

The estimator of the mean of the sample RVs is a vector of n RVs. By applying the propagation laws one finds

$$E\{\hat Y\} = E\{A\hat X + a\} = Ax + a = y$$

$$C_{\hat Y \hat Y} = A\, C_{\hat X \hat X}\, A^+ = \sigma_0^2\, A N^{-1} A^+$$

The least squares estimators

The estimator of the deviations is a vector of n RVs. By applying the propagation laws one finds

$$E\{\hat V\} = E\{Y - \hat Y\} = y - y = 0$$

$$C_{\hat V \hat V} = \sigma_0^2 \left( I - A N^{-1} A^+ Q^{-1} \right) Q \left( I - A N^{-1} A^+ Q^{-1} \right)^+ = \sigma_0^2 \left( Q - A N^{-1} A^+ \right)$$
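The simplification of $C_{\hat V \hat V}$ can be checked numerically; the sketch below uses a random $A$ and a random positive definite $Q$ of hypothetical size:

```python
# Numerical check, with random A and Q of hypothetical size, of the identity
# (I - A N^-1 A^+ Q^-1) Q (I - A N^-1 A^+ Q^-1)^+ = Q - A N^-1 A^+.
import numpy as np

rng = np.random.default_rng(2)
n, m = 6, 2
A = rng.normal(size=(n, m))
R = rng.normal(size=(n, n))
Q = R @ R.T + n * np.eye(n)        # a symmetric positive definite cofactor matrix

Qinv = np.linalg.inv(Q)
N = A.T @ Qinv @ A
P = np.eye(n) - A @ np.linalg.solve(N, A.T) @ Qinv   # I - A N^-1 A^+ Q^-1
lhs = P @ Q @ P.T
rhs = Q - A @ np.linalg.solve(N, A.T)
print(np.allclose(lhs, rhs))       # True up to round-off
```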

The hypothesis of normality

If we suppose that the errors have a normal distribution, with the mean and covariance matrix given before, we will be able to determine the distribution of the least squares estimators. As we will see, knowledge of these distributions will allow us to draw some conclusions about the quality of the solutions we obtain.

The distribution of the parameters estimator

It can be shown that when the observations are normally distributed,

$$\hat X \sim N\!\left( x,\ \sigma_0^2 N^{-1} \right)$$

Furthermore, for the following quadratic forms one has

$$\frac{ \left( \hat X - x \right)^+ N \left( \hat X - x \right) }{ \sigma_0^2 } \sim \chi^2_m$$

and

$$\frac{ \left( \hat X - x \right)^+ N \left( \hat X - x \right) }{ m\, \hat\Sigma_0 } \sim \frac{ \chi^2_m / m }{ \chi^2_{n-m} / (n - m) } = F_{m,\, n-m}$$
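A Monte Carlo sketch, again on a hypothetical straight-line model with $Q = I$, showing that the first quadratic form has the mean and variance of a $\chi^2_m$ variable:

```python
# Monte Carlo sketch on a hypothetical model: the quadratic form
# (X_hat - x)^+ N (X_hat - x) / sigma0^2 should behave like chi^2 with m d.o.f.
import numpy as np

rng = np.random.default_rng(3)
n, m = 10, 2
t = np.linspace(0.0, 9.0, n)
A = np.column_stack([np.ones(n), t])
x_true = np.array([1.0, 0.5])
sigma0 = 0.4
N = A.T @ A                                    # Q = I

qf = []
for _ in range(20000):
    y = A @ x_true + rng.normal(scale=sigma0, size=n)
    d = np.linalg.solve(N, A.T @ y) - x_true
    qf.append(d @ N @ d / sigma0 ** 2)
qf = np.array(qf)

print(qf.mean(), m)       # chi^2_m has mean m
print(qf.var(), 2 * m)    # ... and variance 2m
```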

The chi-square density

The chi-square is a density function depending on one parameter, called the degrees of freedom. It is the distribution of the sum of a sequence of independent squared standard normal RVs. The degrees of freedom equal the number of squared standard normal RVs in the sum:

$$\chi^2_p = \sum_{i=1}^{p} Z_i^2, \qquad Z_i \sim N(0, 1)$$

The chi-square density

[Figure: chi-square density functions for different values of the degrees of freedom p.]
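A quick simulation of this definition (the value of $p$ is chosen arbitrarily):

```python
# Quick simulation (p chosen arbitrarily): summing p squared standard normal
# variables reproduces the chi-square mean p and variance 2p.
import numpy as np

rng = np.random.default_rng(4)
p = 5
Z = rng.standard_normal(size=(100_000, p))
chi2_samples = np.sum(Z ** 2, axis=1)
print(chi2_samples.mean(), p)        # mean of chi^2_p is p
print(chi2_samples.var(), 2 * p)     # variance of chi^2_p is 2p
```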

The Fisher density

The Fisher density function depends on two parameters. It is the distribution of the ratio of two independent chi-square RVs, each divided by its own degrees of freedom:

$$\frac{ \chi^2_l / l }{ \chi^2_m / m } = F_{l, m}$$

The Fisher density

[Figure: Fisher density functions for different pairs of degrees of freedom.]
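A quick simulation of this definition (the values of $l$ and $m$ are chosen arbitrarily):

```python
# Quick simulation (l and m chosen arbitrarily): the ratio of two independent
# chi-square variables, each divided by its degrees of freedom, follows F(l, m).
import numpy as np

rng = np.random.default_rng(5)
l, m, size = 4, 12, 100_000
chi2_l = np.sum(rng.standard_normal(size=(size, l)) ** 2, axis=1)
chi2_m = np.sum(rng.standard_normal(size=(size, m)) ** 2, axis=1)
F = (chi2_l / l) / (chi2_m / m)
print(F.mean(), m / (m - 2))         # the mean of F(l, m) is m/(m-2) for m > 2
```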

The distribution of the estimators

It can be shown that when the observations are normally distributed,

$$\frac{ \hat\Sigma_0 \, (n - m) }{ \sigma_0^2 } \sim \chi^2_{n-m}$$
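A Monte Carlo sketch of this result, on the same kind of hypothetical straight-line model with $Q = I$ used above:

```python
# Monte Carlo sketch on a hypothetical model: (n - m) * Sigma0_hat / sigma0^2
# should behave like chi^2 with n - m degrees of freedom.
import numpy as np

rng = np.random.default_rng(6)
n, m = 10, 2
t = np.linspace(0.0, 9.0, n)
A = np.column_stack([np.ones(n), t])
x_true = np.array([1.0, 0.5])
sigma0 = 0.4
N = A.T @ A                                    # Q = I

stats = []
for _ in range(20000):
    y = A @ x_true + rng.normal(scale=sigma0, size=n)
    v = y - A @ np.linalg.solve(N, A.T @ y)
    stats.append((v @ v) / sigma0 ** 2)        # = (n - m) * Sigma0_hat / sigma0^2
stats = np.array(stats)

print(stats.mean(), n - m)       # chi^2_{n-m} has mean n - m
print(stats.var(), 2 * (n - m))  # ... and variance 2(n - m)
```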