Lecture 21. The Multivariate Normal Distribution

21.1 Definitions and Comments

The joint moment-generating function of $X_1,\dots,X_n$ [also called the moment-generating function of the random vector $(X_1,\dots,X_n)$] is defined by

$$M(t_1,\dots,t_n)=E[\exp(t_1X_1+\cdots+t_nX_n)].$$

Just as in the one-dimensional case, the moment-generating function determines the density uniquely. The random variables $X_1,\dots,X_n$ are said to have the multivariate normal distribution or to be jointly Gaussian (we also say that the random vector $(X_1,\dots,X_n)$ is Gaussian) if

$$M(t_1,\dots,t_n)=\exp(t_1\mu_1+\cdots+t_n\mu_n)\exp\Big(\frac{1}{2}\sum_{i,j=1}^n t_ia_{ij}t_j\Big)$$

where the $t_i$ and $\mu_j$ are arbitrary real numbers, and the matrix $A=[a_{ij}]$ is symmetric and positive definite.

Before we do anything else, let us indicate the notational scheme we will be using. Vectors will be written with an underbar, and are assumed to be column vectors unless otherwise specified. If $t$ is a column vector with components $t_1,\dots,t_n$, then to save space we write $t=(t_1,\dots,t_n)$. The row vector with these components is the transpose of $t$, written $t'$. The moment-generating function of jointly Gaussian random variables therefore has the form

$$M(t_1,\dots,t_n)=\exp(t'\mu)\exp\Big(\frac{1}{2}t'At\Big).$$

We can describe Gaussian random vectors much more concretely.

21.2 Theorem

Jointly Gaussian random variables arise from nonsingular linear transformations on independent normal random variables.

Proof. Let $X_1,\dots,X_n$ be independent, with $X_i$ normal $(0,\lambda_i)$, and let $X=(X_1,\dots,X_n)$. Let $Y=BX+\mu$ where $B$ is nonsingular. Then $Y$ is Gaussian, as can be seen by computing the moment-generating function of $Y$:

$$M_Y(t)=E[\exp(t'Y)]=E[\exp(t'BX)]\exp(t'\mu).$$

But

$$E[\exp(u'X)]=\prod_{i=1}^n E[\exp(u_iX_i)]=\exp\Big(\sum_{i=1}^n\lambda_iu_i^2/2\Big)=\exp\Big(\frac{1}{2}u'Du\Big)$$

where $D$ is the diagonal matrix with the $\lambda_i$ down the main diagonal. Set $u=B't$, so that $u'=t'B$; then

$$M_Y(t)=\exp(t'\mu)\exp\Big(\frac{1}{2}t'BDB't\Big)$$

and $BDB'$ is symmetric since $D$ is symmetric. Since $t'BDB't=u'Du$, which is greater than 0 except when $u=0$ (equivalently when $t=0$, because $B$ is nonsingular), $BDB'$ is positive definite, and consequently $Y$ is Gaussian.

Conversely, suppose that the moment-generating function of $Y$ is $\exp(t'\mu)\exp[(1/2)t'At]$ where $A$ is symmetric and positive definite. Let $L$ be an orthogonal matrix such that $L'AL=D$, where $D$ is the diagonal matrix of eigenvalues of $A$. Set $X=L'(Y-\mu)$, so that $Y=\mu+LX$. The moment-generating function of $X$ is

$$E[\exp(t'X)]=\exp(-t'L'\mu)E[\exp(t'L'Y)].$$

The last term is the moment-generating function of $Y$ with $t'$ replaced by $t'L'$, or equivalently, $t$ replaced by $Lt$. Thus the moment-generating function of $X$ becomes

$$\exp(-t'L'\mu)\exp(t'L'\mu)\exp\Big(\frac{1}{2}t'L'ALt\Big).$$

This reduces to

$$\exp\Big(\frac{1}{2}t'Dt\Big)=\exp\Big(\frac{1}{2}\sum_{i=1}^n\lambda_it_i^2\Big).$$

Therefore the $X_i$ are independent, with $X_i$ normal $(0,\lambda_i)$.

21.3 A Geometric Interpretation

Assume for simplicity that the random variables $X_i$ have zero mean. If $E(U)=E(V)=0$, then the covariance of $U$ and $V$ is $E(UV)$, which can be regarded as an inner product. Then $Y_1-\mu_1,\dots,Y_n-\mu_n$ span an $n$-dimensional space, and $X_1,\dots,X_n$ is an orthogonal basis for that space. We will see later in the lecture that orthogonality is equivalent to independence. (Orthogonality means that the $X_i$ are uncorrelated, i.e., $E(X_iX_j)=0$ for $i\ne j$.)

21.4 Theorem

Let $Y=\mu+LX$ as in the proof of (21.2), and let $A$ be the symmetric, positive definite matrix appearing in the moment-generating function of the Gaussian random vector $Y$. Then $E(Y_i)=\mu_i$ for all $i$, and furthermore $A$ is the covariance matrix of the $Y_i$; in other words, $a_{ij}=\mathrm{Cov}(Y_i,Y_j)$ (and $a_{ii}=\mathrm{Cov}(Y_i,Y_i)=\mathrm{Var}\,Y_i$). It follows that the means of the $Y_i$ and their covariance matrix determine the moment-generating function, and therefore the density.

Proof. Since the $X_i$ have zero mean, we have $E(Y_i)=\mu_i$. Let $K$ be the covariance matrix of the $Y_i$. Then $K$ can be written in the following peculiar way:

$$K=E\left[\begin{pmatrix}Y_1-\mu_1\\\vdots\\Y_n-\mu_n\end{pmatrix}(Y_1-\mu_1,\dots,Y_n-\mu_n)\right].$$

Note that if a matrix $M$ is $n$ by 1 and a matrix $N$ is 1 by $n$, then $MN$ is $n$ by $n$. In this case, the $ij$ entry is $E[(Y_i-\mu_i)(Y_j-\mu_j)]=\mathrm{Cov}(Y_i,Y_j)$. Thus

$$K=E[(Y-\mu)(Y-\mu)']=E(LXX'L')=LE(XX')L'$$

since expectation is linear. [For example, $E(MX)=ME(X)$ because $E(\sum_j m_{ij}X_j)=\sum_j m_{ij}E(X_j)$.] But $E(XX')$ is the covariance matrix of the $X_i$, which is $D$. Therefore $K=LDL'=A$ (because $L'AL=D$).

21.5 Finding the Density

From $Y=\mu+LX$ we can calculate the density of $Y$. The Jacobian of the transformation from $X$ to $Y$ is $\det L=\pm 1$, and

$$f_X(x_1,\dots,x_n)=\frac{1}{(\sqrt{2\pi})^n\sqrt{\lambda_1\cdots\lambda_n}}\exp\Big(-\frac{1}{2}\sum_{i=1}^n x_i^2/\lambda_i\Big).$$

We have $\lambda_1\cdots\lambda_n=\det D=\det K$ because $\det L=\det L'=\pm 1$. Thus

$$f_X(x_1,\dots,x_n)=\frac{1}{(\sqrt{2\pi})^n\sqrt{\det K}}\exp\Big(-\frac{1}{2}x'D^{-1}x\Big).$$

But $y=\mu+Lx$, $x=L'(y-\mu)$, $x'D^{-1}x=(y-\mu)'LD^{-1}L'(y-\mu)$, and [see the end of (21.4)] $K=LDL'$, so $K^{-1}=LD^{-1}L'$. The density of $Y$ is

$$f_Y(y_1,\dots,y_n)=\frac{1}{(\sqrt{2\pi})^n\sqrt{\det K}}\exp\Big[-\frac{1}{2}(y-\mu)'K^{-1}(y-\mu)\Big].$$

21.6 Individually Gaussian Versus Jointly Gaussian

If $X_1,\dots,X_n$ are jointly Gaussian, then each $X_i$ is normally distributed (see Problem 4), but not conversely. For example, let $X$ be normal (0,1) and flip an unbiased coin. If the coin shows heads, set $Y=X$, and if tails, set $Y=-X$. Then $Y$ is also normal (0,1) since

$$P\{Y\le y\}=\frac{1}{2}P\{X\le y\}+\frac{1}{2}P\{-X\le y\}=P\{X\le y\}$$

because $-X$ is also normal (0,1). Thus $F_X=F_Y$. But with probability 1/2, $X+Y=2X$, and with probability 1/2, $X+Y=0$. Therefore $P\{X+Y=0\}=1/2$. If $X$ and $Y$ were jointly Gaussian, then $X+Y$ would be normal (Problem 4). We conclude that $X$ and $Y$ are individually Gaussian but not jointly Gaussian.
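This construction is easy to check by simulation. Here is a minimal sketch in Python with NumPy (the sample size and seed are arbitrary choices, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.standard_normal(n)            # X ~ N(0, 1)
s = rng.choice([-1.0, 1.0], size=n)   # fair coin, independent of X
y = s * x                             # Y = +/- X, so Y is also N(0, 1)

print(y.mean(), y.var())              # approximately 0 and 1
print(np.mean(x + y == 0.0))          # approximately 0.5: an atom at 0
```

If $X$ and $Y$ were jointly Gaussian, $X+Y$ would be normal, hence continuous, and an atom at 0 of probability 1/2 would be impossible.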

21.7 Theorem

If $X_1,\dots,X_n$ are jointly Gaussian and uncorrelated ($\mathrm{Cov}(X_i,X_j)=0$ for all $i\ne j$), then the $X_i$ are independent.

Proof. The moment-generating function of $X=(X_1,\dots,X_n)$ is

$$M_X(t)=\exp(t'\mu)\exp\Big(\frac{1}{2}t'Kt\Big)$$

where $K$ is a diagonal matrix with entries $\sigma_1^2,\sigma_2^2,\dots,\sigma_n^2$ down the main diagonal, and 0's elsewhere. Thus

$$M_X(t)=\prod_{i=1}^n\exp(t_i\mu_i)\exp\Big(\frac{1}{2}\sigma_i^2t_i^2\Big)$$

which is the joint moment-generating function of independent random variables $X_1,\dots,X_n$, where $X_i$ is normal $(\mu_i,\sigma_i^2)$.

21.8 A Conditional Density

Let $X_1,\dots,X_n$ be jointly Gaussian. We find the conditional density of $X_n$ given $X_1,\dots,X_{n-1}$:

$$f(x_n\mid x_1,\dots,x_{n-1})=\frac{f(x_1,\dots,x_n)}{f(x_1,\dots,x_{n-1})}$$

with

$$f(x_1,\dots,x_n)=(2\pi)^{-n/2}(\det K)^{-1/2}\exp\Big[-\frac{1}{2}\sum_{i,j=1}^n y_iq_{ij}y_j\Big]$$

where $Q=K^{-1}=[q_{ij}]$ and $y_i=x_i-\mu_i$. Also,

$$f(x_1,\dots,x_{n-1})=\int_{-\infty}^{\infty}f(x_1,\dots,x_{n-1},x_n)\,dx_n=B(y_1,\dots,y_{n-1}).$$

Now

$$\sum_{i,j=1}^n y_iq_{ij}y_j=\sum_{i,j=1}^{n-1}y_iq_{ij}y_j+y_n\sum_{j=1}^{n-1}q_{nj}y_j+y_n\sum_{i=1}^{n-1}q_{in}y_i+q_{nn}y_n^2.$$

Thus the conditional density has the form

$$\frac{A(y_1,\dots,y_{n-1})}{B(y_1,\dots,y_{n-1})}\exp\big[-\big(Cy_n^2+D(y_1,\dots,y_{n-1})y_n\big)\big]$$

with $C=(1/2)q_{nn}$ and $D=\sum_{j=1}^{n-1}q_{nj}y_j=\sum_{i=1}^{n-1}q_{in}y_i$ since $Q=K^{-1}$ is symmetric. Completing the square, the conditional density may now be expressed as

$$\frac{A}{B}\exp\Big(\frac{D^2}{4C}\Big)\exp\Big[-C\Big(y_n+\frac{D}{2C}\Big)^2\Big].$$

We conclude that given $X_1,\dots,X_{n-1}$, $X_n$ is normal. The conditional variance of $X_n$ (the same as the conditional variance of $Y_n=X_n-\mu_n$) is $1/q_{nn}$, because matching $\exp[-C(y_n+D/2C)^2]$ with a normal density gives $2C=1/\sigma^2$, i.e., $\sigma^2=1/(2C)=1/q_{nn}$. Thus

$$\mathrm{Var}(X_n\mid X_1,\dots,X_{n-1})=\frac{1}{q_{nn}}$$

and the conditional mean of $Y_n$ is

$$-\frac{D}{2C}=-\frac{1}{q_{nn}}\sum_{j=1}^{n-1}q_{nj}Y_j,$$

so the conditional mean of $X_n$ is

$$E(X_n\mid X_1,\dots,X_{n-1})=\mu_n-\frac{1}{q_{nn}}\sum_{j=1}^{n-1}q_{nj}(X_j-\mu_j).$$

Recall from Lecture 8 that $E(Y\mid X)$ is the best estimate of $Y$ based on $X$, in the sense that the mean square error is minimized. In the jointly Gaussian case, the best estimate of $X_n$ based on $X_1,\dots,X_{n-1}$ is linear, and it follows that the best linear estimate is in fact the best overall estimate. This has important practical applications, since linear systems are usually much easier than nonlinear systems to implement and analyze.

Problems

1. Let $K$ be the covariance matrix of arbitrary random variables $X_1,\dots,X_n$. Assume that $K$ is nonsingular to avoid degenerate cases. Show that $K$ is symmetric and positive definite. What can you conclude if $K$ is singular?
2. If $X$ is a Gaussian $n$-vector and $Y=AX$ with $A$ nonsingular, show that $Y$ is Gaussian.
3. If $X_1,\dots,X_n$ are jointly Gaussian, show that $X_1,\dots,X_m$ are jointly Gaussian for $m\le n$.
4. If $X_1,\dots,X_n$ are jointly Gaussian, show that $c_1X_1+\cdots+c_nX_n$ is a normal random variable (assuming it is nondegenerate, i.e., not identically constant).
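The conditional mean and variance formulas of (21.8) translate directly into a few lines of code. The following Python/NumPy sketch evaluates them for a hypothetical 3-dimensional example; the mean vector, covariance matrix and observed values are invented for illustration.

```python
import numpy as np

# Illustrative (made-up) mean vector and covariance matrix for (X_1, X_2, X_3).
mu = np.array([1.0, -2.0, 0.5])
K = np.array([[4.0, 1.2, 0.8],
              [1.2, 9.0, 2.0],
              [0.8, 2.0, 1.0]])
Q = np.linalg.inv(K)   # precision matrix Q = K^{-1} = [q_ij]

def conditional_mean_var(x_given):
    """E(X_n | X_1,...,X_{n-1}) and Var(X_n | X_1,...,X_{n-1}) via (21.8)."""
    y = np.asarray(x_given) - mu[:-1]                # y_j = x_j - mu_j
    mean = mu[-1] - (Q[-1, :-1] @ y) / Q[-1, -1]     # mu_n - (1/q_nn) sum_j q_nj y_j
    var = 1.0 / Q[-1, -1]                            # 1 / q_nn
    return mean, var

print(conditional_mean_var([2.0, -1.0]))
```

Note that the conditional variance $1/q_{nn}$ does not depend on the observed values, only on $K$.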

Lecture 22. The Bivariate Normal Distribution

22.1 Formulas

The general formula for the $n$-dimensional normal density is

$$f_X(x_1,\dots,x_n)=\frac{1}{(\sqrt{2\pi})^n\sqrt{\det K}}\exp\Big[-\frac{1}{2}(x-\mu)'K^{-1}(x-\mu)\Big]$$

where $E(X)=\mu$ and $K$ is the covariance matrix of $X$. We specialize to the case $n=2$:

$$K=\begin{bmatrix}\sigma_1^2&\sigma_{12}\\\sigma_{12}&\sigma_2^2\end{bmatrix}=\begin{bmatrix}\sigma_1^2&\rho\sigma_1\sigma_2\\\rho\sigma_1\sigma_2&\sigma_2^2\end{bmatrix},\qquad\sigma_{12}=\mathrm{Cov}(X_1,X_2);$$

$$K^{-1}=\frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)}\begin{bmatrix}\sigma_2^2&-\rho\sigma_1\sigma_2\\-\rho\sigma_1\sigma_2&\sigma_1^2\end{bmatrix}=\frac{1}{1-\rho^2}\begin{bmatrix}1/\sigma_1^2&-\rho/\sigma_1\sigma_2\\-\rho/\sigma_1\sigma_2&1/\sigma_2^2\end{bmatrix}.$$

Thus the joint density of $X_1$ and $X_2$ is

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\bigg\{-\frac{1}{2(1-\rho^2)}\bigg[\Big(\frac{x_1-\mu_1}{\sigma_1}\Big)^2-2\rho\Big(\frac{x_1-\mu_1}{\sigma_1}\Big)\Big(\frac{x_2-\mu_2}{\sigma_2}\Big)+\Big(\frac{x_2-\mu_2}{\sigma_2}\Big)^2\bigg]\bigg\}.$$

The moment-generating function of $X$ is

$$M_X(t_1,t_2)=\exp(t'\mu)\exp\Big(\frac{1}{2}t'Kt\Big)=\exp\Big[t_1\mu_1+t_2\mu_2+\frac{1}{2}\big(\sigma_1^2t_1^2+2\rho\sigma_1\sigma_2t_1t_2+\sigma_2^2t_2^2\big)\Big].$$

If $X_1$ and $X_2$ are jointly Gaussian and uncorrelated, then $\rho=0$, so that $f(x_1,x_2)$ is the product of a function $g(x_1)$ of $x_1$ alone and a function $h(x_2)$ of $x_2$ alone. It follows that $X_1$ and $X_2$ are independent. (We proved independence in the general $n$-dimensional case in Lecture 21.)

From the results at the end of Lecture 21, the conditional distribution of $X_2$ given $X_1$ is normal, with

$$E(X_2\mid X_1=x_1)=\mu_2-\frac{q_{21}}{q_{22}}(x_1-\mu_1)\quad\text{where}\quad\frac{q_{21}}{q_{22}}=\frac{-\rho/\sigma_1\sigma_2}{1/\sigma_2^2}=-\frac{\rho\sigma_2}{\sigma_1}.$$

Thus

$$E(X_2\mid X_1=x_1)=\mu_2+\frac{\rho\sigma_2}{\sigma_1}(x_1-\mu_1)\quad\text{and}\quad\mathrm{Var}(X_2\mid X_1=x_1)=\frac{1}{q_{22}}=\sigma_2^2(1-\rho^2).$$

For $E(X_1\mid X_2=x_2)$ and $\mathrm{Var}(X_1\mid X_2=x_2)$, interchange $\mu_1$ and $\mu_2$, and interchange $\sigma_1$ and $\sigma_2$.
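As a numerical consistency check (not part of the original notes), the sketch below compares the precision-matrix formulas from Lecture 21 with the closed bivariate forms just derived; the parameter values and the observed $x_1$ are arbitrary.

```python
import numpy as np

mu1, mu2 = 0.0, 0.0
s1, s2, rho = 2.0, 3.0, 0.6        # illustrative parameters
x1 = 1.5                           # illustrative observed value

K = np.array([[s1**2, rho*s1*s2],
              [rho*s1*s2, s2**2]])
Q = np.linalg.inv(K)               # precision matrix [q_ij]

# Lecture 21 form: mu2 - (q21/q22)(x1 - mu1) and 1/q22 ...
mean_q = mu2 - (Q[1, 0] / Q[1, 1]) * (x1 - mu1)
var_q = 1.0 / Q[1, 1]
# ... versus the closed bivariate forms.
mean_c = mu2 + rho * (s2 / s1) * (x1 - mu1)
var_c = s2**2 * (1 - rho**2)
print(mean_q, mean_c)   # equal up to rounding
print(var_q, var_c)     # equal up to rounding
```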

22.2 Example

Let $X$ be the height of the father, $Y$ the height of the son, in a sample of father-son pairs. Assume $X$ and $Y$ bivariate normal, as found by Karl Pearson around 1900. Assume $E(X)=68$ (inches), $E(Y)=69$, $\sigma_X=\sigma_Y=2$, $\rho=.5$. (We expect $\rho$ to be positive because on the average, the taller the father, the taller the son.) Given $X=80$ (6 feet 8 inches), $Y$ is normal with mean

$$\mu_Y+\frac{\rho\sigma_Y}{\sigma_X}(x-\mu_X)=69+.5(80-68)=75$$

which is 6 feet 3 inches. The variance of $Y$ given $X=80$ is $\sigma_Y^2(1-\rho^2)=4(3/4)=3$. Thus the son will tend to be of above average height, but not as tall as the father. This phenomenon is often called regression, and the line $y=\mu_Y+(\rho\sigma_Y/\sigma_X)(x-\mu_X)$ is called the line of regression or the regression line.

Problems

1. Let $X$ and $Y$ have the bivariate normal distribution. The following facts are known: $\mu_X=1$, $\sigma_X=2$, and the best estimate of $Y$ based on $X$, i.e., the estimate that minimizes the mean square error, is given by $3X+7$. The minimum mean square error is 28. Find $\mu_Y$, $\sigma_Y$ and the correlation coefficient $\rho$ between $X$ and $Y$.
2. Show that the bivariate normal density belongs to the exponential class, and find the corresponding complete sufficient statistic.
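A quick check of the arithmetic in this example, in plain Python with the values from the text:

```python
mu_x, mu_y = 68.0, 69.0      # mean heights in inches
sigma_x = sigma_y = 2.0
rho = 0.5

x = 80.0                                                    # father's height
cond_mean = mu_y + rho * (sigma_y / sigma_x) * (x - mu_x)   # regression line at x
cond_var = sigma_y**2 * (1 - rho**2)
print(cond_mean, cond_var)   # 75.0 and 3.0, as computed above
```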

Lecture 23. The Cramér-Rao Inequality

23.1 A Strange Random Variable

Given a density $f_\theta(x)$, $-\infty<x<\infty$, $a<\theta<b$. We have found maximum likelihood estimates by computing $\frac{\partial}{\partial\theta}\ln f_\theta(x)$. If we replace $x$ by $X$, we have a random variable. To see what is going on, let's look at a discrete example. If $X$ takes on values $x_1,x_2,x_3,x_4$ with $p(x_1)=.5$, $p(x_2)=p(x_3)=.2$, $p(x_4)=.1$, then $p(X)$ is a random variable with the following distribution:

$$P\{p(X)=.5\}=.5,\qquad P\{p(X)=.2\}=.4,\qquad P\{p(X)=.1\}=.1.$$

For example, if $X=x_2$ then $p(X)=p(x_2)=.2$, and if $X=x_3$ then $p(X)=p(x_3)=.2$. The total probability that $p(X)=.2$ is .4.

The continuous case is, at first sight, easier to handle. If $X$ has density $f$ and $X=x$, then $f(X)=f(x)$. But what is the density of $f(X)$? We will not need the result, but the question is interesting and is considered in Problem 1.

The following two lemmas will be needed to prove the Cramér-Rao inequality, which can be used to compute uniformly minimum variance unbiased estimates. In the calculations to follow, we are going to assume that all differentiations under the integral sign are legal.

23.2 Lemma

$$E_\theta\Big[\frac{\partial}{\partial\theta}\ln f_\theta(X)\Big]=0.$$

Proof. The expectation is

$$\int_{-\infty}^{\infty}\Big[\frac{\partial}{\partial\theta}\ln f_\theta(x)\Big]f_\theta(x)\,dx=\int_{-\infty}^{\infty}\frac{1}{f_\theta(x)}\frac{\partial f_\theta(x)}{\partial\theta}f_\theta(x)\,dx$$

which reduces to

$$\frac{\partial}{\partial\theta}\int_{-\infty}^{\infty}f_\theta(x)\,dx=\frac{\partial}{\partial\theta}(1)=0.$$

23.3 Lemma

Let $Y=g(X)$ and assume $E_\theta(Y)=k(\theta)$. If $k'(\theta)=dk(\theta)/d\theta$, then

$$k'(\theta)=E_\theta\Big[Y\frac{\partial}{\partial\theta}\ln f_\theta(X)\Big].$$

Proof. We have

$$k'(\theta)=\frac{\partial}{\partial\theta}E_\theta[g(X)]=\frac{\partial}{\partial\theta}\int_{-\infty}^{\infty}g(x)f_\theta(x)\,dx=\int_{-\infty}^{\infty}g(x)\frac{\partial f_\theta(x)}{\partial\theta}\,dx$$

Multiplying and dividing by $f_\theta(x)$, this equals

$$\int_{-\infty}^{\infty}g(x)\frac{\partial f_\theta(x)}{\partial\theta}\frac{f_\theta(x)}{f_\theta(x)}\,dx=\int_{-\infty}^{\infty}g(x)\Big[\frac{\partial}{\partial\theta}\ln f_\theta(x)\Big]f_\theta(x)\,dx=E_\theta\Big[g(X)\frac{\partial}{\partial\theta}\ln f_\theta(X)\Big]=E_\theta\Big[Y\frac{\partial}{\partial\theta}\ln f_\theta(X)\Big].$$

23.4 Cramér-Rao Inequality

Under the assumptions of (23.3), we have

$$\mathrm{Var}_\theta\,Y\ge\frac{[k'(\theta)]^2}{E_\theta\Big[\Big(\frac{\partial}{\partial\theta}\ln f_\theta(X)\Big)^2\Big]}.$$

Proof. By the Cauchy-Schwarz inequality,

$$[\mathrm{Cov}(V,W)]^2=\big(E[(V-\mu_V)(W-\mu_W)]\big)^2\le\mathrm{Var}\,V\,\mathrm{Var}\,W,$$

hence

$$\Big[\mathrm{Cov}_\theta\Big(Y,\frac{\partial}{\partial\theta}\ln f_\theta(X)\Big)\Big]^2\le\mathrm{Var}_\theta\,Y\ \mathrm{Var}_\theta\,\frac{\partial}{\partial\theta}\ln f_\theta(X).$$

Since $E_\theta[(\partial/\partial\theta)\ln f_\theta(X)]=0$ by (23.2), this becomes

$$\Big(E_\theta\Big[Y\frac{\partial}{\partial\theta}\ln f_\theta(X)\Big]\Big)^2\le\mathrm{Var}_\theta\,Y\,E_\theta\Big[\Big(\frac{\partial}{\partial\theta}\ln f_\theta(X)\Big)^2\Big].$$

By (23.3), the left side is $[k'(\theta)]^2$, and the result follows.

23.5 A Special Case

Let $X_1,\dots,X_n$ be iid, each with density $f_\theta(x)$, and take $X=(X_1,\dots,X_n)$. Then $f_\theta(x_1,\dots,x_n)=\prod_{i=1}^n f_\theta(x_i)$ and by (23.2),

$$E_\theta\Big[\Big(\frac{\partial}{\partial\theta}\ln f_\theta(X)\Big)^2\Big]=\mathrm{Var}_\theta\,\frac{\partial}{\partial\theta}\ln f_\theta(X)=\mathrm{Var}_\theta\sum_{i=1}^n\frac{\partial}{\partial\theta}\ln f_\theta(X_i)=n\,\mathrm{Var}_\theta\,\frac{\partial}{\partial\theta}\ln f_\theta(X_i)=nE_\theta\Big[\Big(\frac{\partial}{\partial\theta}\ln f_\theta(X_i)\Big)^2\Big].$$

23.6 Theorem

Let $X_1,\dots,X_n$ be iid, each with density $f_\theta(x)$. If $Y=g(X_1,\dots,X_n)$ is an unbiased estimate of $\theta$, then

$$\mathrm{Var}_\theta\,Y\ge\frac{1}{nE_\theta\Big[\Big(\frac{\partial}{\partial\theta}\ln f_\theta(X_i)\Big)^2\Big]}.$$

Proof. Applying (23.5), we have a special case of the Cramér-Rao inequality (23.4) with $k(\theta)=\theta$, $k'(\theta)=1$.

The lower bound in (23.6) is $1/nI(\theta)$, where

$$I(\theta)=E_\theta\Big[\Big(\frac{\partial}{\partial\theta}\ln f_\theta(X_i)\Big)^2\Big]$$

is called the Fisher information. It follows from (23.6) that if $Y$ is an unbiased estimate that meets the Cramér-Rao lower bound for all $\theta$ (an efficient estimate), then $Y$ must be a UMVUE of $\theta$.

23.7 A Computational Simplification

From (23.2) we have

$$\int_{-\infty}^{\infty}\Big(\frac{\partial}{\partial\theta}\ln f_\theta(x)\Big)f_\theta(x)\,dx=0.$$

Differentiate again to obtain

$$\int_{-\infty}^{\infty}\frac{\partial^2\ln f_\theta(x)}{\partial\theta^2}f_\theta(x)\,dx+\int_{-\infty}^{\infty}\frac{\partial\ln f_\theta(x)}{\partial\theta}\frac{\partial f_\theta(x)}{\partial\theta}\,dx=0.$$

Thus

$$\int_{-\infty}^{\infty}\frac{\partial^2\ln f_\theta(x)}{\partial\theta^2}f_\theta(x)\,dx+\int_{-\infty}^{\infty}\frac{\partial\ln f_\theta(x)}{\partial\theta}\Big[\frac{\partial f_\theta(x)}{\partial\theta}\Big/f_\theta(x)\Big]f_\theta(x)\,dx=0.$$

But the term in brackets on the right is $\partial\ln f_\theta(x)/\partial\theta$, so we have

$$\int_{-\infty}^{\infty}\frac{\partial^2\ln f_\theta(x)}{\partial\theta^2}f_\theta(x)\,dx+\int_{-\infty}^{\infty}\Big(\frac{\partial}{\partial\theta}\ln f_\theta(x)\Big)^2f_\theta(x)\,dx=0.$$

Therefore

$$I(\theta)=E_\theta\Big[\Big(\frac{\partial}{\partial\theta}\ln f_\theta(X_i)\Big)^2\Big]=-E_\theta\Big[\frac{\partial^2\ln f_\theta(X_i)}{\partial\theta^2}\Big].$$

Problems

1. If $X$ is a random variable with density $f(x)$, explain how to find the distribution of the random variable $f(X)$.
2. Use the Cramér-Rao inequality to show that the sample mean is a UMVUE of the true mean in the Bernoulli, normal (with $\sigma^2$ known) and Poisson cases.
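As an illustration of (23.6), here is a Monte Carlo sketch for the Poisson case of Problem 2 (Python with NumPy; the parameter value, sample size and replication count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 3.0, 25, 200_000   # illustrative parameter and sample sizes

# For Poisson(theta): ln f_theta(x) = x ln(theta) - theta - ln(x!), so the
# score is (d/dtheta) ln f = x/theta - 1 and I(theta) = E[(X/theta - 1)^2] = 1/theta.
samples = rng.poisson(theta, size=(reps, n))
score_sq = (samples[:, 0] / theta - 1.0) ** 2
print(score_sq.mean(), 1.0 / theta)          # estimated vs. exact Fisher information

# The sample mean is unbiased with variance theta/n = 1/(n I(theta)),
# so it attains the Cramer-Rao lower bound.
print(samples.mean(axis=1).var(), theta / n)
```

Since $\mathrm{Var}(\bar X)=\theta/n$ equals the lower bound $1/nI(\theta)$, the sample mean is efficient, hence a UMVUE.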

Lecture 24. Nonparametric Statistics

We wish to make a statistical inference about a random variable $X$ even though we know nothing at all about its underlying distribution.

24.1 Percentiles

Assume $F$ continuous and strictly increasing. If $0<p<1$, then the equation $F(x)=p$ has a unique solution $\xi_p$, so that $P\{X\le\xi_p\}=p$. When $p=1/2$, $\xi_p$ is the median; when $p=.3$, $\xi_p$ is the 30-th percentile, and so on. Let $X_1,\dots,X_n$ be iid, each with distribution function $F$, and let $Y_1,\dots,Y_n$ be the order statistics. We will consider the problem of estimating $\xi_p$.

24.2 Point Estimates

On the average, $np$ of the observations will be less than $\xi_p$. (We have $n$ Bernoulli trials, with probability of success $P\{X_i<\xi_p\}=F(\xi_p)=p$.) It seems reasonable to use $Y_k$ as an estimate of $\xi_p$, where $k$ is approximately $np$. We can be a bit more precise. The random variables $F(X_1),\dots,F(X_n)$ are iid, uniform on (0,1) [see (8.5)]. Thus $F(Y_1),\dots,F(Y_n)$ are the order statistics from a uniform (0,1) sample. We know from Lecture 6 that the density of $F(Y_k)$ is

$$\frac{n!}{(k-1)!(n-k)!}x^{k-1}(1-x)^{n-k},\quad 0<x<1.$$

Therefore

$$E[F(Y_k)]=\int_0^1\frac{n!}{(k-1)!(n-k)!}x^k(1-x)^{n-k}\,dx=\frac{n!}{(k-1)!(n-k)!}\beta(k+1,n-k+1).$$

Now $\beta(k+1,n-k+1)=\Gamma(k+1)\Gamma(n-k+1)/\Gamma(n+2)=k!(n-k)!/(n+1)!$, and consequently

$$E[F(Y_k)]=\frac{k}{n+1},\quad 1\le k\le n.$$

Define $Y_0=-\infty$ and $Y_{n+1}=\infty$, so that

$$E[F(Y_{k+1})-F(Y_k)]=\frac{1}{n+1},\quad 0\le k\le n.$$

(Note that when $k=n$, the expectation is $1-[n/(n+1)]=1/(n+1)$, as asserted.) The key point is that on the average, each interval $[Y_k,Y_{k+1}]$ produces area $1/(n+1)$ under the density $f$ of the $X_i$. This is true because

$$\int_{Y_k}^{Y_{k+1}}f(x)\,dx=F(Y_{k+1})-F(Y_k)$$

and we have just seen that the expectation of this quantity is $1/(n+1)$, $k=0,1,\dots,n$. If we want to accumulate area $p$, set $k/(n+1)=p$, that is, $k=(n+1)p$.

Conclusion: If $(n+1)p$ is an integer, estimate $\xi_p$ by $Y_{(n+1)p}$. If $(n+1)p$ is not an integer, we can use a weighted average. For example, if $p=.6$ and $n=13$ then $(n+1)p=14(.6)=8.4$. Now if $(n+1)p$ were 8, we would use $Y_8$, and if $(n+1)p$ were 9 we would use $Y_9$. If $(n+1)p=8+\lambda$, we use $(1-\lambda)Y_8+\lambda Y_9$. In the present case, $\lambda=.4$, so we use $.6Y_8+.4Y_9=Y_8+.4(Y_9-Y_8)$.

24.3 Confidence Intervals

Select order statistics $Y_i$ and $Y_j$, where $i$ and $j$ are (approximately) symmetrical about $(n+1)p$. Then $P\{Y_i<\xi_p<Y_j\}$ is the probability that the number of observations less than $\xi_p$ is at least $i$ but less than $j$, i.e., between $i$ and $j-1$, inclusive. The probability that exactly $k$ observations will be less than $\xi_p$ is $\binom{n}{k}p^k(1-p)^{n-k}$, hence

$$P\{Y_i<\xi_p<Y_j\}=\sum_{k=i}^{j-1}\binom{n}{k}p^k(1-p)^{n-k}.$$

Thus $(Y_i,Y_j)$ is a confidence interval for $\xi_p$, and we can find the confidence level by evaluating the above sum, possibly with the aid of the normal approximation to the binomial.

24.4 Hypothesis Testing

First let's look at a numerical example. The 30-th percentile $\xi_{.3}$ will be less than 68 precisely when $F(\xi_{.3})<F(68)$, because $F$ is continuous and strictly increasing. Therefore $\xi_{.3}<68$ iff $F(68)>.3$. Similarly, $\xi_{.3}>68$ iff $F(68)<.3$, and $\xi_{.3}=68$ iff $F(68)=.3$. In general,

$$\xi_{p_0}<\xi\iff F(\xi)>p_0,\qquad\xi_{p_0}>\xi\iff F(\xi)<p_0,\qquad\xi_{p_0}=\xi\iff F(\xi)=p_0.$$

In our numerical example, if $F(68)$ were actually .4, then on the average, 40 percent of the observations will be 68 or less, as opposed to 30 percent if $F(68)=.3$. Thus a larger than expected number of observations less than or equal to 68 will tend to make us reject the hypothesis that the 30-th percentile is exactly 68. In general, our problem will be

$$H_0\colon\xi_{p_0}=\xi\ (\iff F(\xi)=p_0)\qquad\text{vs.}\qquad H_1\colon\xi_{p_0}<\xi\ (\iff F(\xi)>p_0)$$

where $p_0$ and $\xi$ are specified. If $Y$ is the number of observations less than or equal to $\xi$, we propose to reject $H_0$ if $Y\ge c$. (If $H_1$ is $\xi_{p_0}>\xi$, i.e., $F(\xi)<p_0$, we reject if $Y\le c$.) Note that $Y$ is the number of nonpositive signs in the sequence $X_1-\xi,\dots,X_n-\xi$, and for this reason, the terminology sign test is used.

Since we are trying to determine whether $F(\xi)$ is equal to $p_0$ or greater than $p_0$, we may regard $\theta=F(\xi)$ as the unknown state of nature. The power function of the test is

$$K(\theta)=P_\theta\{Y\ge c\}=\sum_{k=c}^n\binom{n}{k}\theta^k(1-\theta)^{n-k}$$

and in particular, the significance level (probability of a type 1 error) is $\alpha=K(p_0)$.

The above confidence interval estimates and the sign test are distribution free, that is, independent of the underlying distribution function $F$. Problems are deferred to Lecture 25.
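Both the confidence level in (24.3) and the power function in (24.4) are finite binomial sums, so they can be evaluated exactly. A small sketch in plain Python; the sample size, indices and cutoff below are arbitrary illustrations, not values from the text.

```python
from math import comb

def ci_level(n, p, i, j):
    """P{Y_i < xi_p < Y_j} = sum_{k=i}^{j-1} C(n,k) p^k (1-p)^(n-k)  (24.3)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(i, j))

def power(n, theta, c):
    """K(theta) = P_theta{Y >= c} for the sign test  (24.4)."""
    return sum(comb(n, k) * theta**k * (1 - theta)**(n - k) for k in range(c, n + 1))

print(ci_level(15, 0.5, 4, 12))  # level of (Y_4, Y_12) as a CI for the median
print(power(15, 0.3, 8))         # alpha = K(p_0) for p_0 = .3 when we reject for Y >= 8
```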

Lecture 25. The Wilcoxon Test

We will need two formulas:

$$\sum_{k=1}^n k^2=\frac{n(n+1)(2n+1)}{6},\qquad\sum_{k=1}^n k^3=\Big[\frac{n(n+1)}{2}\Big]^2.$$

For a derivation via the calculus of finite differences, see my on-line text A Course in Commutative Algebra, Section 5.1.

The hypothesis testing problem addressed by the Wilcoxon test is the same as that considered by the sign test, except that:

(1) We are restricted to testing the median $\xi_{.5}$.

(2) We assume that $X_1,\dots,X_n$ are iid and the underlying density is symmetric about the median (so we are not quite nonparametric). There are many situations where we suspect an underlying normal distribution but are not sure. In such cases, the symmetry assumption may be reasonable.

(3) We use the magnitudes as well as the signs of the deviations $X_i-\xi_{.5}$, so the Wilcoxon test should be more accurate than the sign test.

25.1 How The Test Works

Suppose we are testing $H_0\colon\xi_{.5}=m$ vs. $H_1\colon\xi_{.5}>m$ based on observations $X_1,\dots,X_n$. We rank the absolute values $|X_i-m|$ from smallest to largest. For example, let $n=5$ and

$$X_1-m=1.7,\quad X_2-m=-2.3,\quad X_3-m=0.3,\quad X_4-m=-3.1,\quad X_5-m=2.4.$$

Then

$$|X_3-m|<|X_1-m|<|X_2-m|<|X_5-m|<|X_4-m|.$$

Let $R_i$ be the rank of $|X_i-m|$, so that $R_3=1$, $R_1=2$, $R_2=3$, $R_5=4$, $R_4=5$. Let $Z_i$ be the sign of $X_i-m$, so that $Z_i=\pm 1$. Then $Z_1=1$, $Z_2=-1$, $Z_3=1$, $Z_4=-1$, $Z_5=1$. The Wilcoxon statistic is

$$W=\sum_{i=1}^n Z_iR_i.$$

In this case, $W=2-3+1-5+4=-1$.

Because the density is symmetric about the median, if $R_i$ is given then $Z_i$ is still equally likely to be $\pm 1$, so $(R_1,\dots,R_n)$ and $(Z_1,\dots,Z_n)$ are independent. (Note that if $R_j$ is given, the odds about $Z_i$ ($i\ne j$) are unaffected since the observations $X_1,\dots,X_n$ are independent.) Now the $R_i$ are simply a permutation of $(1,2,\dots,n)$, so $W$ is a sum of independent random variables $V_1,\dots,V_n$ where $V_i=\pm i$ with equal probability.
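The ranking procedure is mechanical. Here is a short Python/NumPy sketch, assuming no ties among the $|X_i-m|$, that reproduces the computation of $W$ for the example above (the deviations are entered directly, so $m=0$ here):

```python
import numpy as np

def wilcoxon_W(x, m):
    """Signed-rank statistic W = sum_i Z_i R_i of 25.1 (assumes no ties)."""
    d = np.asarray(x, dtype=float) - m
    ranks = np.argsort(np.argsort(np.abs(d))) + 1   # rank 1 = smallest |X_i - m|
    return int(np.sum(np.sign(d) * ranks))

# The n = 5 example, with the deviations X_i - m given directly.
print(wilcoxon_W([1.7, -2.3, 0.3, -3.1, 2.4], 0.0))   # -1
```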

25.2 Properties Of The Wilcoxon Statistic

Under $H_0$, $E(V_i)=0$ and $\mathrm{Var}\,V_i=E(V_i^2)=i^2$, so

$$E(W)=\sum_{i=1}^n E(V_i)=0,\qquad\mathrm{Var}\,W=\sum_{i=1}^n i^2=\frac{n(n+1)(2n+1)}{6}.$$

The $V_i$ do not have the same distribution, but the central limit theorem still applies because Liapounov's condition is satisfied:

$$\frac{\sum_{i=1}^n E[|V_i-\mu_i|^3]}{\big(\sum_{i=1}^n\sigma_i^2\big)^{3/2}}\to 0\quad\text{as}\quad n\to\infty.$$

Now the $V_i$ have mean $\mu_i=0$, so $|V_i-\mu_i|^3=|V_i|^3=i^3$ and $\sigma_i^2=\mathrm{Var}\,V_i=i^2$. Thus the Liapounov fraction is the sum of the first $n$ cubes divided by the 3/2 power of the sum of the first $n$ squares, which is

$$\frac{n^2(n+1)^2/4}{[n(n+1)(2n+1)/6]^{3/2}}.$$

For large $n$, the numerator is of the order of $n^4$ and the denominator is of the order of $(n^3)^{3/2}=n^{9/2}$. Therefore the fraction is of the order of $1/\sqrt{n}\to 0$ as $n\to\infty$.

By the central limit theorem, $[W-E(W)]/\sigma(W)$ is approximately normal (0,1) for large $n$, with $E(W)=0$ and $\sigma^2(W)=n(n+1)(2n+1)/6$. If the median is larger than its value $m$ under $H_0$, we expect $W$ to have a positive bias. Thus we reject $H_0$ if $W\ge c$. (If $H_1$ were $\xi_{.5}<m$, we would reject if $W\le c$.) The value of $c$ is determined by our choice of the significance level $\alpha$.

Problems

1. Suppose we are using a sign test with $n=12$ observations to decide between the null hypothesis $H_0\colon m=40$ and the alternative $H_1\colon m>40$, where $m$ is the median. We use the statistic $Y=$ the number of observations that are less than or equal to 40. We reject $H_0$ if and only if $Y\le c$. Find the power function $K(p)$ in terms of $c$ and $p=F(40)$, and the probability $\alpha$ of a type 1 error for $c=2$.
2. Let $m$ be the median of a random variable with density symmetric about $m$. Using the Wilcoxon test, we are testing $H_0\colon m=60$ vs. $H_1\colon m>60$ based on $n=16$ observations, which are as follows: 76.9, 58.3, 5., 58.8, 7.4, 69.8, 59.7, 6.7, 56.6, 74.5, 84.4, 65., 47.8, 77.8, 60. Compute the Wilcoxon statistic and determine whether $H_0$ is rejected at the .05 significance level, i.e., with probability of a type 1 error equal to .05.
3. When $n$ is small, the distribution of $W$ can be found explicitly. Do it for $n=1,2,3$.
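For a problem like Problem 2, the normal approximation supplies the cutoff. A minimal sketch in plain Python; the value of $W$ passed in is only an illustrative placeholder, not the answer to the problem.

```python
import math

def wilcoxon_z(W, n):
    """Standardize W using E(W) = 0 and Var(W) = n(n+1)(2n+1)/6 under H0."""
    return W / math.sqrt(n * (n + 1) * (2 * n + 1) / 6)

# Reject H0: median = m in favor of H1: median > m at level .05 when the
# standardized statistic exceeds 1.645 (the one-sided normal critical value).
z = wilcoxon_z(W=72, n=16)   # W = 72 is a placeholder value for illustration
print(z, z >= 1.645)
```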
