Efficiency and the Cramér-Rao Inequality
Clearly we would like an unbiased estimator $\hat\phi(X)$ of $\phi(\theta)$ to produce, in the long run, estimates which are fairly concentrated, i.e. have high precision. An estimator $\hat\phi$ is unlikely to impress anybody if it produces widely varying estimates on different occasions. This suggests that $Var(\hat\phi)$ should be as small as possible. But how small can $Var(\hat\phi)$ be? The answer is given by the following result.

The Cramér-Rao Inequality. Let $X = (X_1, X_2, \ldots, X_n)$ be a random sample from a distribution with p.m/d.f. $f(x\,|\,\theta)$, where $\theta$ is a scalar parameter. Under certain regularity conditions on $f(x\,|\,\theta)$, for any unbiased estimator $\hat\phi(X)$ of $\phi(\theta)$,

$$Var(\hat\phi) \ge \frac{\left[\frac{d\phi(\theta)}{d\theta}\right]^2}{E(S^2(X))} \tag{2.1}$$

where

$$S(X) = \frac{\partial}{\partial\theta} \log f_X(X\,|\,\theta) = \sum_{i=1}^n \frac{\partial}{\partial\theta} \log f(X_i\,|\,\theta) \tag{2.2}$$
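As a concrete illustration, the bound can be estimated by Monte Carlo. The sketch below is not part of the original notes; the Bernoulli family and all numeric values are illustrative choices. For a Bernoulli($\theta$) sample the score is $S(X) = \sum X_i/\theta - (n - \sum X_i)/(1-\theta)$, the CRLB for $\phi(\theta) = \theta$ works out to $\theta(1-\theta)/n$, and this equals $Var(\bar X)$, so the inequality holds with equality for the sample mean in this family.

```python
# Monte Carlo sketch of the CRLB (illustrative, not from the notes).
# Bernoulli(theta) sample: S(X) = sum(X_i)/theta - (n - sum(X_i))/(1 - theta).
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 0.3, 20, 100_000

x = rng.binomial(1, theta, size=(reps, n))     # reps samples of size n
t = x.sum(axis=1)
score = t / theta - (n - t) / (1 - theta)      # S(X), one value per sample

crlb_mc = 1.0 / np.mean(score**2)              # [dphi/dtheta]^2 / E(S^2) with phi(theta) = theta
print(crlb_mc, theta * (1 - theta) / n)        # Monte Carlo CRLB vs exact theta(1-theta)/n
print(x.mean(axis=1).var())                    # Var(X_bar): equals the CRLB here
```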
Remark. (1) The function $S(X)$ is called the Score statistic and will be seen to play an important role in statistical inference.

(2) The bound on the r.h.s. of the Cramér-Rao inequality, $\left[\frac{d\phi(\theta)}{d\theta}\right]^2 / E(S^2(X))$, is known as the Cramér-Rao Lower Bound (CRLB) for the variance of an unbiased estimator of $\phi(\theta)$.

(3) An estimator whose variance is equal to the CRLB is called a most efficient estimator. The efficiency $eff(\hat\phi)$ of an unbiased estimator $\hat\phi$ of $\phi(\theta)$ is defined to be the ratio

$$eff(\hat\phi) = \frac{CRLB}{Var(\hat\phi)}$$

Note that $0 \le eff(\hat\phi) \le 1$ for all $\theta \in \Theta$.

(4) The quantity

$$I(\theta) = E(S^2(X)) \tag{2.3}$$

is called the Fisher information on the parameter $\theta$ contained in the sample $X$. Note that (i) the Fisher information is, in general, a function of the value of $\theta$; (ii) the higher the Fisher information, i.e. the more information there is in the sample, the smaller the CRLB and consequently the smaller the variance of the most efficient estimator (when it exists).

Proof of the Cramér-Rao inequality (for the case when the observations are continuous; for discrete observations replace integrals by sums). Since $\hat\phi$ is an unbiased estimator, $E(\hat\phi) = \phi(\theta)$ for all $\theta$, i.e.

$$\int \cdots \int \hat\phi(x_1, x_2, \ldots, x_n)\, f_X(x_1, x_2, \ldots, x_n\,|\,\theta)\, dx_1\, dx_2 \cdots dx_n = \phi(\theta)$$

and for convenience we write

$$\int \hat\phi(x)\, f_X(x\,|\,\theta)\, dx = \phi(\theta) \tag{2.4}$$
Differentiating both sides with respect to $\theta$ and interchanging the order of differentiation and integration (assuming this is possible, i.e. regularity conditions hold) we get

$$\int \hat\phi(x)\, \frac{\partial}{\partial\theta} f_X(x\,|\,\theta)\, dx = \frac{d\phi(\theta)}{d\theta} \tag{2.5}$$

We now note that from the definition of the score statistic we have

$$S(x) = \frac{\partial}{\partial\theta} \log f_X(x\,|\,\theta) = \frac{1}{f_X(x\,|\,\theta)} \frac{\partial}{\partial\theta} f_X(x\,|\,\theta)$$

and hence

$$\frac{\partial}{\partial\theta} f_X(x\,|\,\theta) = S(x)\, f_X(x\,|\,\theta) \tag{2.6}$$

Substituting (2.6) in (2.5) we have

$$\int \hat\phi(x)\, S(x)\, f_X(x\,|\,\theta)\, dx = \frac{d\phi(\theta)}{d\theta}$$

i.e.

$$E(\hat\phi(X)\, S(X)) = \frac{d\phi(\theta)}{d\theta} \tag{2.7}$$

Next we note that the score statistic has zero expectation, since

$$E(S(X)) = \int S(x)\, f_X(x\,|\,\theta)\, dx = \int \frac{\partial}{\partial\theta} f_X(x\,|\,\theta)\, dx$$

because of (2.6). Again interchanging differentiation with integration (i.e. assuming the necessary regularity conditions hold) we get

$$E(S(X)) = \int \frac{\partial}{\partial\theta} f_X(x\,|\,\theta)\, dx = \frac{\partial}{\partial\theta} \int f_X(x\,|\,\theta)\, dx = \frac{\partial}{\partial\theta}\, 1 = 0 \tag{2.8}$$

Hence

$$Cov(\hat\phi, S) = E(\hat\phi\, S) - E(\hat\phi)\, E(S) = E(\hat\phi\, S) = \frac{d\phi(\theta)}{d\theta} \tag{2.9}$$

the last equality following from (2.7), and

$$Var(S) = E(S^2) - E^2(S) = E(S^2) = I(\theta) \tag{2.10}$$
Finally, since for any two random variables the correlation coefficient between them is always at most $1$ in absolute value, we have $\rho^2(\hat\phi, S) \le 1$, i.e.

$$\left[Cov(\hat\phi, S)\right]^2 \le Var(\hat\phi)\, Var(S)$$

or

$$Var(\hat\phi) \ge \frac{\left[Cov(\hat\phi, S)\right]^2}{Var(S)} = \frac{\left[\frac{d\phi(\theta)}{d\theta}\right]^2}{I(\theta)}$$

which proves the Cramér-Rao inequality.

This proof has also provided us with the following result.

Corollary. Under certain regularity conditions the score statistic $S(X)$ has mean zero and variance equal to the Fisher information, i.e.

$$E(S(X)) = 0 \tag{2.11}$$

$$Var(S(X)) = I(\theta) \tag{2.12}$$
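The Corollary is easy to check by simulation. Below is a small sketch (not part of the notes) using the Exponential($\theta$) sample that is worked in detail later in this chapter, for which $S(X) = n/\theta - \sum X_i$ and $I(\theta) = n/\theta^2$.

```python
# Monte Carlo check of the Corollary (illustrative, not from the notes):
# for an Exponential(theta) sample, E(S(X)) = 0 and Var(S(X)) = I(theta) = n/theta^2.
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 2.0, 10, 200_000

x = rng.exponential(scale=1.0 / theta, size=(reps, n))  # Exp(theta) has mean 1/theta
score = n / theta - x.sum(axis=1)                       # S(X) for each replicate

print(score.mean())                  # ~ 0          : (2.11)
print(score.var(), n / theta**2)     # ~ n/theta^2  : (2.12)
```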
Efficiency: Since we now know that the CRLB is the smallest value that the variance of any unbiased estimator of $\phi(\theta)$ can take, the efficiency of an estimator of $\phi(\theta)$ can be assessed by comparing its variance against the CRLB. In fact, as we have seen, we have the following definition for the efficiency $eff(\hat\phi)$ of an estimator $\hat\phi$ of $\phi(\theta)$.

Definition. The efficiency of an unbiased estimator $\hat\phi$ of $\phi(\theta)$ is defined to be

$$eff(\hat\phi) = \frac{CRLB}{Var(\hat\phi)} = \frac{\left[\frac{d\phi(\theta)}{d\theta}\right]^2}{I(\theta)\, Var(\hat\phi)} \tag{2.13}$$

Remark. Note that $0 \le eff(\hat\phi) \le 1$ and that the closer the value of $eff(\hat\phi)$ is to $1$, the more efficient the estimator $\hat\phi$ is; indeed, when $\hat\phi$ is most efficient then $eff(\hat\phi) = 1$. Note that a most efficient estimator of $\phi(\theta)$ need not always exist. When dealing with unbiased estimators, if a most efficient unbiased estimator does not exist (i.e. if there is no unbiased estimator whose variance is as small as the CRLB), we seek out the unbiased estimator whose variance is closest to the CRLB, i.e. the unbiased estimator with the highest efficiency.

When does a most efficient estimator exist? The following result not only gives the necessary and sufficient conditions for a most efficient estimator to exist, it also provides this estimator.

Theorem. Under the same regularity conditions as in the Cramér-Rao Inequality, there exists an unbiased estimator $\hat\phi$ of $\phi(\theta)$ whose variance attains the CRLB (i.e. there exists a most efficient unbiased estimator $\hat\phi$ of $\phi(\theta)$) if and only if the score statistic $S(X)$ can be expressed in the form

$$S(X) = \alpha(\theta)\left[\hat\phi(X) - \phi(\theta)\right] \tag{2.14}$$

in which case

$$\alpha(\theta) = \frac{I(\theta)}{\frac{d\phi(\theta)}{d\theta}}$$

or, equivalently, if and only if the function

$$S(X)\, \frac{d\phi(\theta)/d\theta}{I(\theta)} + \phi(\theta)$$

is independent of $\theta$ and dependent only on $X$, in which case this statistic is the most efficient unbiased estimator of $\phi(\theta)$.

Proof. Recall that the Cramér-Rao inequality was a result of the fact that for any unbiased estimator $\hat\phi$ of $\phi(\theta)$, $\rho^2(\hat\phi, S) \le 1$. Hence the CRLB is attained if and only if $\rho^2(\hat\phi, S) = 1$. But this can happen if and only if $\hat\phi$ and $S$ are linearly related random variables, i.e. if and only if there exist functions $\alpha(\theta)$ and $\beta(\theta)$ which are independent of $X$ such that

$$S(X) = \alpha(\theta)\, \hat\phi(X) + \beta(\theta) \tag{2.15}$$
Recall, however (from the Corollary above), that $S(X)$ has zero expectation and that $\hat\phi$ is an unbiased estimator of $\phi(\theta)$. Hence taking expectations of both sides of (2.15) we get

$$0 = \alpha(\theta)\, \phi(\theta) + \beta(\theta) \quad \text{or} \quad \beta(\theta) = -\alpha(\theta)\, \phi(\theta)$$

which, on substituting in (2.15), gives

$$S(X) = \alpha(\theta)\, \hat\phi(X) - \alpha(\theta)\, \phi(\theta) = \alpha(\theta)\left[\hat\phi(X) - \phi(\theta)\right] \tag{2.16}$$

confirming (2.14). Multiplying both sides of (2.16) by $S(X)$ and taking expectations throughout we get

$$E(S^2(X)) = \alpha(\theta)\left[E(\hat\phi(X)\, S(X)) - \phi(\theta)\, E(S(X))\right]$$

and in view of the Corollary we have

$$E(S^2(X)) = \alpha(\theta)\, Cov(\hat\phi(X), S(X))$$

which by (2.9) and (2.12) reduces to

$$I(\theta) = \alpha(\theta)\, \frac{d\phi(\theta)}{d\theta}$$

completing the proof.
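The factorisation criterion (2.14) is straightforward to verify symbolically. The sketch below is not from the notes; the Bernoulli family is an illustrative choice. For a Bernoulli($\theta$) sample with $t = \sum x_i$ the score factorises as $\alpha(\theta)(\bar x - \theta)$ with $\alpha(\theta) = n/(\theta(1-\theta)) = I(\theta)/\phi'(\theta)$, so $\bar X$ is the most efficient unbiased estimator of $\theta$.

```python
# Symbolic check of criterion (2.14) for a Bernoulli(theta) sample (a sketch,
# not from the notes). With t = sum(x_i): log L = t*log(theta) + (n-t)*log(1-theta).
import sympy as sym

theta, n, t = sym.symbols('theta n t', positive=True)
logL = t * sym.log(theta) + (n - t) * sym.log(1 - theta)
S = sym.diff(logL, theta)                        # score statistic

alpha = n / (theta * (1 - theta))                # candidate alpha(theta) = I(theta)/phi'(theta)
print(sym.simplify(S - alpha * (t / n - theta))) # 0: S = alpha(theta) * (x_bar - theta)
```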
2.0.3 A computationally useful result

The expression

$$I(\theta) = E(S^2(X)) = E\left[\left(\frac{\partial}{\partial\theta} \log f_X(X\,|\,\theta)\right)^2\right]$$

for the Fisher information does not lend itself to easy computation. The alternative form given below is computationally more attractive and you are strongly recommended to use it when finding the Fisher information in examples, exercises or exam questions.

Proposition. Under the same regularity conditions as in the Cramér-Rao Inequality,

$$I(\theta) = -E\left(\frac{\partial^2}{\partial\theta^2} \log f_X(X\,|\,\theta)\right) \tag{2.17}$$

Proof.

$$\frac{\partial^2}{\partial\theta^2} \log f_X(X\,|\,\theta) = \frac{\partial}{\partial\theta}\left\{\frac{\frac{\partial}{\partial\theta} f_X(X\,|\,\theta)}{f_X(X\,|\,\theta)}\right\}$$

$$= -\left[f_X(X\,|\,\theta)\right]^{-2}\left[\frac{\partial}{\partial\theta} f_X(X\,|\,\theta)\right]^2 + \frac{1}{f_X(X\,|\,\theta)}\, \frac{\partial^2}{\partial\theta^2} f_X(X\,|\,\theta)$$

$$= -\left[\frac{\partial}{\partial\theta} \log f_X(X\,|\,\theta)\right]^2 + \frac{1}{f_X(X\,|\,\theta)}\, \frac{\partial^2}{\partial\theta^2} f_X(X\,|\,\theta)$$

$$= -S^2(X) + \frac{1}{f_X(X\,|\,\theta)}\, \frac{\partial^2}{\partial\theta^2} f_X(X\,|\,\theta)$$

Taking expectations throughout we get

$$E\left(\frac{\partial^2}{\partial\theta^2} \log f_X(X\,|\,\theta)\right) = -E(S^2(X)) + E\left(\frac{1}{f_X(X\,|\,\theta)}\, \frac{\partial^2}{\partial\theta^2} f_X(X\,|\,\theta)\right) = -I(\theta) + E\left(\frac{1}{f_X(X\,|\,\theta)}\, \frac{\partial^2}{\partial\theta^2} f_X(X\,|\,\theta)\right) \tag{2.18}$$

But (assuming continuous observations; replace integrals by summations if the observations are discrete)

$$E\left(\frac{1}{f_X(X\,|\,\theta)}\, \frac{\partial^2}{\partial\theta^2} f_X(X\,|\,\theta)\right) = \int \frac{1}{f_X(x\,|\,\theta)}\left(\frac{\partial^2}{\partial\theta^2} f_X(x\,|\,\theta)\right) f_X(x\,|\,\theta)\, dx = \int \frac{\partial^2}{\partial\theta^2} f_X(x\,|\,\theta)\, dx = \frac{\partial^2}{\partial\theta^2} \int f_X(x\,|\,\theta)\, dx = \frac{\partial^2}{\partial\theta^2}\, 1 = 0$$

i.e. the second term in (2.18) is zero and the result follows.
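Anticipating the example below, here is a symbolic check of (2.17) for a single Exponential($\theta$) observation (a sketch, not from the notes): both $E(S^2)$ and $-E(\partial^2 \log f / \partial\theta^2)$ evaluate to $1/\theta^2$.

```python
# Symbolic check of (2.17) for one Exponential(theta) observation (a sketch,
# not from the notes): f(x|theta) = theta * exp(-theta*x) on x > 0.
import sympy as sym

x, theta = sym.symbols('x theta', positive=True)
f = theta * sym.exp(-theta * x)
logf = sym.log(f)

ES2 = sym.integrate(sym.diff(logf, theta)**2 * f, (x, 0, sym.oo))        # E(S^2)
neg_E_dd = -sym.integrate(sym.diff(logf, theta, 2) * f, (x, 0, sym.oo))  # -E(d^2 log f / dtheta^2)
print(sym.simplify(ES2), sym.simplify(neg_E_dd))                         # both 1/theta^2
```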
Example. Let $X = (X_1, X_2, \ldots, X_n)$ be a random sample from the exponential distribution with parameter $\theta$. Find the Cramér-Rao lower bound for unbiased estimators of $\theta$. Does there exist an unbiased estimator of $\theta$ whose variance is equal to the CRLB?

Solution: The Cramér-Rao inequality is

$$Var(\hat\theta) \ge \frac{1}{I(\theta)}$$

with

$$I(\theta) = E(S^2(X)) = -E\left(\frac{\partial^2}{\partial\theta^2} \log f_X(X\,|\,\theta)\right)$$

But

$$f_X(x\,|\,\theta) = \prod_{i=1}^n \theta e^{-\theta x_i} = \theta^n e^{-\theta \sum_{i=1}^n x_i}$$

$$\log f_X(x\,|\,\theta) = n \log\theta - \theta \sum_{i=1}^n x_i$$

$$S(x) = \frac{\partial}{\partial\theta} \log f_X(x\,|\,\theta) = \frac{n}{\theta} - \sum_{i=1}^n x_i$$

$$\frac{\partial^2}{\partial\theta^2} \log f_X(x\,|\,\theta) = -\frac{n}{\theta^2}$$

Hence

$$I(\theta) = -E\left(\frac{\partial^2}{\partial\theta^2} \log f_X(X\,|\,\theta)\right) = \frac{n}{\theta^2}$$

and

$$Var(\hat\theta) \ge \frac{1}{I(\theta)} = \frac{\theta^2}{n}.$$

Now

$$S(X) = \frac{n}{\theta} - \sum_{i=1}^n X_i$$

and this cannot be expressed in the form $\alpha(\theta)\left[\hat\theta(X) - \theta\right]$, i.e. there does not exist an unbiased estimator of $\theta$ whose variance is equal to the CRLB. This can be confirmed by the fact that with $\phi(\theta) = \theta$ and $\phi'(\theta) = \frac{d\phi(\theta)}{d\theta} = 1$,

$$S(X)\, \frac{\phi'(\theta)}{I(\theta)} + \phi(\theta) = \left(\frac{n}{\theta} - \sum_{i=1}^n X_i\right) \frac{\theta^2}{n} + \theta = 2\theta - \frac{\theta^2}{n} \sum_{i=1}^n X_i$$

which is not independent of $\theta$.
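A simulation sketch (not from the notes): a standard unbiased estimator of $\theta$ in this model is $(n-1)/\sum X_i$ (its unbiasedness is a known property of the Gamma distribution of $\sum X_i$, not derived here); its variance is $\theta^2/(n-2)$, strictly above the CRLB $\theta^2/n$, consistent with the conclusion above.

```python
# Simulation sketch (not from the notes): no unbiased estimator of theta attains
# the CRLB theta^2/n here; the standard unbiased estimator (n-1)/sum(X_i) has
# variance theta^2/(n-2) > theta^2/n.
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 2.0, 10, 200_000

x = rng.exponential(scale=1.0 / theta, size=(reps, n))
est = (n - 1) / x.sum(axis=1)                       # unbiased for theta (Gamma fact)

print(est.mean(), theta)                            # ~ theta: unbiased
print(est.var(), theta**2 / (n - 2), theta**2 / n)  # variance > CRLB
```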
Example. In the last example, what is the CRLB for the variances of unbiased estimators of $1/\theta$? Does there exist an unbiased estimator of $1/\theta$ whose variance is equal to the CRLB?

Solution: For any unbiased estimator $\hat\psi$ of $1/\theta$,

$$Var(\hat\psi) \ge \frac{\left[\frac{d}{d\theta}\left(\frac{1}{\theta}\right)\right]^2}{I(\theta)} = \left[\frac{1}{\theta^2}\right]^2 \frac{\theta^2}{n} = \frac{1}{n\theta^2}$$

Further, from the last example,

$$S(X) = \frac{n}{\theta} - \sum_{i=1}^n X_i = -n\left[\frac{1}{n}\sum_{i=1}^n X_i - \frac{1}{\theta}\right] = -n\left[\bar X - \frac{1}{\theta}\right]$$

which implies that $\bar X$ is an unbiased estimator of $1/\theta$ whose variance is equal to the CRLB $= \frac{1}{n\theta^2}$. Further, comparing with (2.14) gives $\alpha(\theta) = -n$, and since $\alpha(\theta) = I(\theta)/\frac{d\phi(\theta)}{d\theta}$ with $\phi(\theta) = 1/\theta$,

$$I(\theta) = \alpha(\theta)\, \frac{d}{d\theta}\left(\frac{1}{\theta}\right) = (-n)\left(-\frac{1}{\theta^2}\right) = \frac{n}{\theta^2}$$

which agrees with the direct computation of $I(\theta)$.
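By contrast with the previous example, the attainment here is easy to see numerically (a sketch, not from the notes):

```python
# Simulation sketch (not from the notes): X_bar is unbiased for 1/theta and its
# variance equals the CRLB 1/(n*theta^2).
import numpy as np

rng = np.random.default_rng(4)
theta, n, reps = 2.0, 10, 200_000

xbar = rng.exponential(scale=1.0 / theta, size=(reps, n)).mean(axis=1)
print(xbar.mean(), 1 / theta)           # ~ 1/theta : unbiased
print(xbar.var(), 1 / (n * theta**2))   # ~ CRLB    : bound attained
```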
A Remark on the regularity conditions: We have seen that the regularity conditions we require are those which allow us to interchange differentiation with integration on two occasions. These regularity conditions will normally fail to hold if:

1. the density/mass function $f(x\,|\,\theta)$ does not tail off rapidly enough (i.e. high values of $x$ can be produced with relatively high probability), so that the integrals converge slowly. In practice most of the distributions we deal with satisfy this regularity condition.

2. the effective range of integration depends on $\theta$. This will be the case if the range $\mathcal{X}$ of possible values of $X$ depends on $\theta$, as the simulation below illustrates. One such case is when $X$ has the Uniform $U(0,\theta)$ distribution over the interval $(0,\theta)$, so that

$$f(x\,|\,\theta) = \begin{cases} 1/\theta & \text{for } 0 < x < \theta \\ 0 & \text{otherwise} \end{cases}$$
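The $U(0,\theta)$ case shows why the conditions matter. Formally applying the recipe gives $S(x) = -n/\theta$, $E(S^2) = n^2/\theta^2$ and a "bound" of $\theta^2/n^2$; yet the unbiased estimator $\frac{n+1}{n}\max_i X_i$ has variance $\theta^2/(n(n+2))$, which is smaller, so the "bound" fails. The sketch below (not from the notes) illustrates this.

```python
# Simulation sketch (not from the notes): for U(0, theta) the formal "CRLB"
# theta^2/n^2 is violated by the unbiased estimator (n+1)/n * max(X_i),
# whose variance is theta^2/(n(n+2)) -- the regularity conditions fail
# because the support (0, theta) depends on theta.
import numpy as np

rng = np.random.default_rng(5)
theta, n, reps = 3.0, 10, 200_000

x = rng.uniform(0, theta, size=(reps, n))
est = (n + 1) / n * x.max(axis=1)

print(est.mean(), theta)                        # ~ theta : unbiased
print(est.var(), theta**2 / (n * (n + 2)))      # its variance
print("formal 'bound':", theta**2 / n**2)       # larger than the variance above
```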
2.0.4 Multivariate version of the Cramér-Rao Inequality

Theorem: Let $X$ be a sample of observations with joint p.d/m.f. $f_X(x\,|\,\theta)$ dependent on a vector parameter $\theta = (\theta_1, \theta_2, \ldots, \theta_k)^T$. Let $\phi(\theta)$ be a real-valued function of $\theta$. Then under some regularity conditions, for any unbiased estimator $\hat\phi(X)$ of $\phi(\theta)$,

$$Var(\hat\phi) \ge \delta^T \left[I(\theta)\right]^{-1} \delta$$

where $\delta$ is the vector of derivatives of $\phi(\theta)$, i.e.

$$\delta = \left(\frac{\partial}{\partial\theta_1}\phi(\theta), \frac{\partial}{\partial\theta_2}\phi(\theta), \ldots, \frac{\partial}{\partial\theta_k}\phi(\theta)\right)^T \tag{2.19}$$

and the matrix $I(\theta)$ is the Fisher Information matrix with $(i,j)$th element

$$I_{ij}(\theta) = E(S_i(X)\, S_j(X)) = -E\left(\frac{\partial^2}{\partial\theta_i\, \partial\theta_j} \log f_X(X\,|\,\theta)\right) \tag{2.20}$$

$i, j = 1, 2, \ldots, k$. Here

$$S_i(X) = \frac{\partial}{\partial\theta_i} \log f_X(X\,|\,\theta) \tag{2.21}$$

is the $i$th score statistic, $i = 1, 2, \ldots, k$. In particular, if $\hat\theta_r(X)$ is an unbiased estimator of the $r$th parameter $\theta_r$, then under the same regularity conditions

$$Var(\hat\theta_r) \ge J_{rr}(\theta)$$

where $J_{rr}(\theta)$ is the $r$th diagonal element of the inverse matrix $[I(\theta)]^{-1}$, $r = 1, 2, \ldots, k$.

Proof: The proof is very similar to that of the scalar case and its details are not provided. Below, however, is a sketch of the proof. The details can be filled in by you.

1. Following exactly the same approach as in the scalar case, show that

$$\frac{\partial}{\partial\theta_j}\phi(\theta) = Cov(\hat\phi(X), S_j(X))$$

2. Make use of the fact that for any coefficients $c_1, c_2, \ldots, c_k$,

$$Var\left(\hat\phi(X) - \sum_{i=1}^k c_i S_i(X)\right) \ge 0 \tag{2.22}$$

In particular, this is true for the values $c_1 = c_1^*, c_2 = c_2^*, \ldots, c_k = c_k^*$ which minimize $Var(\hat\phi(X) - \sum_{i=1}^k c_i S_i(X))$.

3. Using 1. show that

$$Var\left(\hat\phi(X) - \sum_{i=1}^k c_i S_i(X)\right) = Var(\hat\phi(X)) + c^T I(\theta)\, c - 2\, c^T \delta$$

with $c = (c_1, c_2, \ldots, c_k)^T$, and then using calculus show that

$$c^* = (c_1^*, c_2^*, \ldots, c_k^*)^T = \left[I(\theta)\right]^{-1} \delta$$

4. Thus show that

$$\min_c Var\left(\hat\phi(X) - \sum_{i=1}^k c_i S_i(X)\right) = Var(\hat\phi(X)) - \delta^T \left[I(\theta)\right]^{-1} \delta$$

and because of 2. the result follows.

Remark: As in the univariate case the proof also shows that

1. $E(S_i(X)) = 0$, $i = 1, 2, \ldots, k$.  (2.23)
2. $I_{ij}(\theta) = Cov(S_i(X), S_j(X))$, $i, j = 1, 2, \ldots, k$.  (2.24)

Example. Let $X_1, X_2, \ldots, X_n$ be a random sample from the $N(\mu, \sigma^2)$ distribution. Find the CRLB (and, in cases 1. and 2., check whether it is attained) for the variance of an unbiased estimator of

1. $\mu$ when $\sigma^2$ is known,
2. $\sigma^2$ when $\mu$ is known,
3. $\mu$ when $\sigma^2$ is unknown,
4. $\sigma^2$ when $\mu$ is unknown,
5. the coefficient of variation $\sigma/\mu$ when both $\mu$ and $\sigma^2$ are unknown.

Solution: The sample joint p.d.f. is

$$f_X(x\,|\,\theta) = \prod_{i=1}^n \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-(x_i - \mu)^2 / 2\sigma^2\right)$$

and

$$\log f_X(x\,|\,\theta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \sum_{i=1}^n (x_i - \mu)^2 / 2\sigma^2$$

1. When $\sigma^2$ is known, $\theta = \mu$ and

$$\log f_X(x\,|\,\theta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \sum_{i=1}^n (x_i - \theta)^2 / 2\sigma^2$$

$$S(x) = \frac{\partial}{\partial\theta}\log f_X(x\,|\,\theta) = \sum_{i=1}^n \frac{x_i - \theta}{\sigma^2} = \frac{n}{\sigma^2}\left[\bar x - \theta\right]$$

where $\bar x = \frac{1}{n}\sum_{i=1}^n x_i$. By the attainment theorem above, this implies that $\bar X = \frac{1}{n}\sum_{i=1}^n X_i$ is an unbiased estimator of $\theta = \mu$ whose variance equals the CRLB, and that $\frac{n}{\sigma^2} = I(\theta)$, i.e. CRLB $= \frac{\sigma^2}{n}$. Thus $\bar X$ is a most efficient estimator.
2. When $\mu$ is known but $\sigma^2$ is unknown, $\theta = \sigma^2$ and

$$\log f_X(x\,|\,\theta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\theta - \frac{1}{2\theta}\sum_{i=1}^n (x_i - \mu)^2$$

Hence

$$S(x) = \frac{\partial}{\partial\theta}\log f_X(x\,|\,\theta) = -\frac{n}{2\theta} + \frac{1}{2\theta^2}\sum_{i=1}^n (x_i - \mu)^2 = \frac{n}{2\theta^2}\left[\frac{1}{n}\sum_{i=1}^n (x_i - \mu)^2 - \theta\right]$$

By the attainment theorem above, $\frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2$ is an unbiased estimator of $\theta = \sigma^2$ whose variance equals the CRLB, and $\frac{n}{2\theta^2} = I(\theta)$, i.e. the CRLB $= \frac{2\theta^2}{n} = \frac{2\sigma^4}{n}$.

3. and 4. Case: both $\mu$ and $\sigma^2$ unknown. Here

$$\theta = \begin{pmatrix}\theta_1 \\ \theta_2\end{pmatrix} = \begin{pmatrix}\mu \\ \sigma^2\end{pmatrix}, \quad \text{i.e. } \theta_1 = \mu \text{ and } \theta_2 = \sigma^2,$$

$$f_X(x\,|\,\theta) = \prod_{i=1}^n \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-(x_i - \mu)^2 / 2\sigma^2\right) \propto \theta_2^{-n/2} \exp\left(-\frac{1}{2\theta_2}\sum_{i=1}^n (x_i - \theta_1)^2\right)$$

and

$$\log f_X(x\,|\,\theta) = -\frac{n}{2}\log\theta_2 - \frac{1}{2\theta_2}\sum_{i=1}^n (x_i - \theta_1)^2 + \text{const.}$$

Thus

$$\frac{\partial}{\partial\theta_1}\log f_X(x\,|\,\theta) = \frac{1}{\theta_2}\sum_{i=1}^n (x_i - \theta_1)$$

$$\frac{\partial}{\partial\theta_2}\log f_X(x\,|\,\theta) = -\frac{n}{2\theta_2} + \frac{1}{2\theta_2^2}\sum_{i=1}^n (x_i - \theta_1)^2$$

$$\frac{\partial^2}{\partial\theta_1^2}\log f_X(x\,|\,\theta) = -\frac{n}{\theta_2}$$

$$\frac{\partial^2}{\partial\theta_1\,\partial\theta_2}\log f_X(x\,|\,\theta) = -\frac{1}{\theta_2^2}\sum_{i=1}^n (x_i - \theta_1)$$
$$\frac{\partial^2}{\partial\theta_2^2}\log f_X(x\,|\,\theta) = \frac{n}{2\theta_2^2} - \frac{1}{\theta_2^3}\sum_{i=1}^n (x_i - \theta_1)^2$$

Consequently

$$I_{11}(\theta) = -E\left(-\frac{n}{\theta_2}\right) = \frac{n}{\theta_2}$$

$$I_{12}(\theta) = E\left(\frac{1}{\theta_2^2}\sum_{i=1}^n (X_i - \theta_1)\right) = 0$$

$$I_{22}(\theta) = -E\left(\frac{n}{2\theta_2^2} - \frac{1}{\theta_2^3}\sum_{i=1}^n (X_i - \theta_1)^2\right) = \frac{n}{2\theta_2^2}$$

i.e.

$$I(\theta) = \begin{pmatrix} \dfrac{n}{\theta_2} & 0 \\ 0 & \dfrac{n}{2\theta_2^2} \end{pmatrix}$$

and

$$[I(\theta)]^{-1} = J(\theta) = \begin{pmatrix} \dfrac{\theta_2}{n} & 0 \\ 0 & \dfrac{2\theta_2^2}{n} \end{pmatrix} = \begin{pmatrix} \dfrac{\sigma^2}{n} & 0 \\ 0 & \dfrac{2\sigma^4}{n} \end{pmatrix}$$

Consequently, for unbiased estimators $\hat\mu$, $\hat\sigma^2$ of $\mu$ and $\sigma^2$ respectively,

$$Var(\hat\mu) \ge \frac{\sigma^2}{n} \quad \text{and} \quad Var(\hat\sigma^2) \ge \frac{2\sigma^4}{n}$$
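A numerical sketch for cases 3. and 4. (illustrative, not from the notes): build $I(\theta)$ and $J(\theta)$, then check by simulation that $\bar X$ attains the bound $\sigma^2/n$, while the usual unbiased variance estimator $S^2$ (divisor $n-1$) has variance $2\sigma^4/(n-1)$, above the bound $2\sigma^4/n$ (the exact variance of $S^2$ is a standard normal-theory fact, not derived in these notes).

```python
# Numerical sketch for cases 3. and 4. (illustrative, not from the notes).
import numpy as np

rng = np.random.default_rng(6)
mu, sigma2, n, reps = 1.0, 4.0, 10, 200_000

I = np.array([[n / sigma2, 0.0],
              [0.0, n / (2 * sigma2**2)]])      # Fisher information matrix
J = np.linalg.inv(I)                            # diag(sigma^2/n, 2*sigma^4/n)

x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
print(x.mean(axis=1).var(), J[0, 0])            # Var(X_bar) ~ sigma^2/n : attained
print(x.var(axis=1, ddof=1).var(),              # Var(S^2) ~ 2*sigma^4/(n-1) ...
      2 * sigma2**2 / (n - 1), J[1, 1])         # ... which exceeds the bound J_22
```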
5. We are now interested in estimating the ratio $\sigma/\mu = \sqrt{\theta_2}/\theta_1$ when both $\theta_1 = \mu$ and $\theta_2 = \sigma^2$ are unknown. Since

$$\delta = \left(\frac{\partial}{\partial\theta_1}\frac{\sqrt{\theta_2}}{\theta_1},\ \frac{\partial}{\partial\theta_2}\frac{\sqrt{\theta_2}}{\theta_1}\right)^T = \left(-\frac{\sqrt{\theta_2}}{\theta_1^2},\ \frac{1}{2\theta_1\sqrt{\theta_2}}\right)^T = \left(-\frac{\sigma}{\mu^2},\ \frac{1}{2\mu\sigma}\right)^T$$

it follows that if $\hat\phi$ is an unbiased estimator of $\sigma/\mu$ then

$$Var(\hat\phi) \ge \left(-\frac{\sigma}{\mu^2},\ \frac{1}{2\mu\sigma}\right) [I(\theta)]^{-1} \begin{pmatrix} -\dfrac{\sigma}{\mu^2} \\ \dfrac{1}{2\mu\sigma} \end{pmatrix} = \left(-\frac{\sigma}{\mu^2},\ \frac{1}{2\mu\sigma}\right) \begin{pmatrix} \dfrac{\sigma^2}{n} & 0 \\ 0 & \dfrac{2\sigma^4}{n} \end{pmatrix} \begin{pmatrix} -\dfrac{\sigma}{\mu^2} \\ \dfrac{1}{2\mu\sigma} \end{pmatrix} = \frac{\sigma^4}{n\mu^4} + \frac{\sigma^2}{2n\mu^2}$$
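The plug-in estimator $S/\bar X$ of the coefficient of variation is only asymptotically unbiased, but for large $n$ its variance approaches this bound (a simulation sketch, not from the notes):

```python
# Simulation sketch (not from the notes): for large n the variance of the
# plug-in coefficient-of-variation estimator S/X_bar approaches the bound
# sigma^4/(n*mu^4) + sigma^2/(2*n*mu^2).
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, n, reps = 2.0, 1.0, 400, 20_000

x = rng.normal(mu, sigma, size=(reps, n))
cv = x.std(axis=1, ddof=1) / x.mean(axis=1)     # plug-in estimate of sigma/mu

bound = sigma**4 / (n * mu**4) + sigma**2 / (2 * n * mu**2)
print(cv.var(), bound)                          # close for large n
```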
Example. Let $X_1, X_2, \ldots, X_n$ be a random sample from the trinomial distribution with p.m.f.

$$f(x\,|\,\theta) = \frac{m!}{x_1!\, x_2!\, (m - x_1 - x_2)!}\, \theta_1^{x_1}\, \theta_2^{x_2}\, (1 - \theta_1 - \theta_2)^{m - x_1 - x_2}$$

where $\theta = (\theta_1, \theta_2)$ are unknown values. Find lower bounds for the variances of unbiased estimators of $\theta_1$ and $\theta_2$.

Solution: The sample joint p.m.f. is

$$f_X(x\,|\,\theta) = \prod_{i=1}^n \frac{m!}{x_{i1}!\, x_{i2}!\, (m - x_{i1} - x_{i2})!}\, \theta_1^{x_{i1}}\, \theta_2^{x_{i2}}\, (1 - \theta_1 - \theta_2)^{m - x_{i1} - x_{i2}}$$

and

$$\log f_X(x\,|\,\theta) = \sum_{i=1}^n x_{i1} \log\theta_1 + \sum_{i=1}^n x_{i2} \log\theta_2 + \sum_{i=1}^n (m - x_{i1} - x_{i2}) \log(1 - \theta_1 - \theta_2) + \text{const.}$$

Hence

$$\frac{\partial}{\partial\theta_1}\log f_X(x\,|\,\theta) = \frac{1}{\theta_1}\sum_{i=1}^n x_{i1} - \frac{1}{1 - \theta_1 - \theta_2}\sum_{i=1}^n (m - x_{i1} - x_{i2})$$

$$\frac{\partial^2}{\partial\theta_1^2}\log f_X(x\,|\,\theta) = -\frac{1}{\theta_1^2}\sum_{i=1}^n x_{i1} - \frac{1}{(1 - \theta_1 - \theta_2)^2}\sum_{i=1}^n (m - x_{i1} - x_{i2})$$

and since $E(X_{i1}) = m\theta_1$, $E(X_{i2}) = m\theta_2$,

$$I_{11}(\theta) = -E\left(\frac{\partial^2}{\partial\theta_1^2}\log f_X(X\,|\,\theta)\right) = \frac{nm}{\theta_1} + \frac{nm}{1 - \theta_1 - \theta_2} \tag{2.25}$$

By symmetry

$$I_{22}(\theta) = -E\left(\frac{\partial^2}{\partial\theta_2^2}\log f_X(X\,|\,\theta)\right) = \frac{nm}{\theta_2} + \frac{nm}{1 - \theta_1 - \theta_2} \tag{2.26}$$

Finally

$$\frac{\partial^2}{\partial\theta_1\,\partial\theta_2}\log f_X(x\,|\,\theta) = -\frac{1}{(1 - \theta_1 - \theta_2)^2}\sum_{i=1}^n (m - x_{i1} - x_{i2})$$

and hence

$$I_{12}(\theta) = -E\left(\frac{\partial^2}{\partial\theta_1\,\partial\theta_2}\log f_X(X\,|\,\theta)\right) = \frac{nm}{1 - \theta_1 - \theta_2} \tag{2.27}$$

Hence the Fisher Information matrix is

$$I(\theta) = nm \begin{pmatrix} \dfrac{1}{\theta_1} + \dfrac{1}{1 - \theta_1 - \theta_2} & \dfrac{1}{1 - \theta_1 - \theta_2} \\ \dfrac{1}{1 - \theta_1 - \theta_2} & \dfrac{1}{\theta_2} + \dfrac{1}{1 - \theta_1 - \theta_2} \end{pmatrix}$$

and

$$J(\theta) = [I(\theta)]^{-1} = \frac{1}{nm} \begin{pmatrix} \theta_1(1 - \theta_1) & -\theta_1\theta_2 \\ -\theta_1\theta_2 & \theta_2(1 - \theta_2) \end{pmatrix}$$

Thus if $\hat\theta_1$ is an unbiased estimator of $\theta_1$,

$$Var(\hat\theta_1) \ge J_{11}(\theta) = \frac{\theta_1(1 - \theta_1)}{nm}$$

Similarly, if $\hat\theta_2$ is an unbiased estimator of $\theta_2$,

$$Var(\hat\theta_2) \ge J_{22}(\theta) = \frac{\theta_2(1 - \theta_2)}{nm}$$

Notice that if $\theta_2$ were assumed to be known, then the CRLB for the variance of $\hat\theta_1$ would have been $Var(\hat\theta_1) \ge \frac{1}{I_{11}(\theta)}$, i.e.

$$Var(\hat\theta_1) \ge \frac{1}{nm\left(\dfrac{1}{\theta_1} + \dfrac{1}{1 - \theta_1 - \theta_2}\right)} = \frac{\theta_1(1 - \theta_1 - \theta_2)}{nm(1 - \theta_2)}$$
Since $(1 - \theta_1)(1 - \theta_2) > 1 - \theta_1 - \theta_2$ for $0 < \theta_1, \theta_2 < 1$, it follows that

$$J_{11}(\theta) > \frac{1}{I_{11}(\theta)}$$

which indicates that if $\theta_2$ is unknown it would be wrong to use the inequality $Var(\hat\theta_1) \ge \frac{1}{I_{11}(\theta)}$, since this inequality is not as tight as the correct inequality

$$Var(\hat\theta_1) \ge J_{11}(\theta)$$

In general it can be shown that

$$J_{ii}(\theta) \ge \frac{1}{I_{ii}(\theta)} \quad \text{for all } i,$$

with strict inequality (and hence an improved lower bound for the variance of an unbiased estimator of $\theta_i$) unless the Fisher information matrix $I(\theta)$ is diagonal (see C. R. Rao (1973), Linear Statistical Inference and its Applications).
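A quick numerical sketch (not from the notes; the parameter values are illustrative) makes the comparison concrete:

```python
# Numerical sketch of the trinomial comparison (illustrative, not from the notes):
# J_11 equals theta_1*(1-theta_1)/(n*m) and exceeds the naive bound 1/I_11.
import numpy as np

n, m = 25, 4
t1, t2 = 0.2, 0.5
q = 1 - t1 - t2

I = n * m * np.array([[1 / t1 + 1 / q, 1 / q],
                      [1 / q, 1 / t2 + 1 / q]])   # Fisher information matrix
J = np.linalg.inv(I)

print(J[0, 0], t1 * (1 - t1) / (n * m))  # J_11 and its closed form agree
print(1 / I[0, 0], "<", J[0, 0])         # naive known-theta_2 bound is smaller
```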
