Lecture 1. Introduction and probability review



1.1. What is Statistics?

There are many definitions; I will use:

"A set of principles and procedures for gaining and processing quantitative evidence in order to help us make judgements and decisions."

It can include:
- design of experiments and studies;
- exploring data using graphics;
- informal interpretation of data;
- formal statistical analysis;
- clear communication of conclusions and uncertainty.

It is NOT just data analysis!

In this course we shall focus on formal statistical inference:
- we assume we have data generated from some unknown probability model;
- we aim to use the data to learn about certain properties of the underlying probability model.

1.2. Idea of parametric inference

Let $X$ be a random variable (r.v.) taking values in $\mathcal{X}$. Assume the distribution of $X$ belongs to a family of distributions indexed by a scalar or vector parameter $\theta$, taking values in some parameter space $\Theta$. Call this a parametric family. For example, we could have
- $X \sim \text{Poisson}(\mu)$, $\theta = \mu$, $\Theta = (0, \infty)$;
- $X \sim N(\mu, \sigma^2)$, $\theta = (\mu, \sigma^2)$, $\Theta = \mathbb{R} \times (0, \infty)$.

BIG ASSUMPTION: for some results (bias, mean squared error, linear model) we do not need to specify the precise parametric family, but generally we assume that we know which family of distributions is involved and that only the value of $\theta$ is unknown.

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed (iid) with the same distribution as $X$, so that $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is a simple random sample (our data). We use the observed $\mathbf{X} = \mathbf{x}$ to make inferences about $\theta$, such as:
(a) giving an estimate $\hat\theta(\mathbf{x})$ of the true value of $\theta$ (point estimation);
(b) giving an interval estimate $(\hat\theta_1(\mathbf{x}), \hat\theta_2(\mathbf{x}))$ for $\theta$;
(c) testing a hypothesis about $\theta$, e.g. testing the hypothesis $H : \theta = 0$ means determining whether or not the data provide evidence against $H$.

We shall be dealing with these aspects of statistical inference. Other tasks (not covered in this course) include:
- checking and selecting probability models;
- producing predictive distributions for future random variables;
- classifying units into pre-determined groups ("supervised learning");
- finding clusters ("unsupervised learning").
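
As a minimal added illustration of point estimation in this parametric setting (not part of the original slides; the true value of $\mu$, the sample size and the use of the sample mean as estimator are illustrative assumptions), one might simulate an iid Poisson sample in R and estimate $\mu$:

# Hedged sketch: point estimation for X_1,...,X_n iid Poisson(mu).
# mu_true and n are arbitrary illustration values; the sample mean is one natural estimator.
set.seed(1)
mu_true <- 3
n <- 100
x <- rpois(n, lambda = mu_true)   # simulated data x = (x_1,...,x_n)
mu_hat <- mean(x)                 # point estimate theta_hat(x) of mu
mu_hat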

Statistical inference is needed to answer questions such as:
- What are the voting intentions before an election? [Market research, opinion polls, surveys]
- What is the effect of obesity on life expectancy? [Epidemiology]
- What is the average benefit of a new cancer therapy? [Clinical trials]
- What proportion of temperature change is due to man? [Environmental statistics]
- What is the benefit of speed cameras? [Traffic studies]
- What portfolio maximises expected return? [Financial and actuarial applications]
- How confident are we the Higgs Boson exists? [Science]
- What are possible benefits and harms of genetically-modified plants? [Agricultural experiments]
- What proportion of the UK economy involves prostitution and illegal drugs? [Official statistics]
- What is the chance Liverpool will beat Arsenal next week? [Sport]

1.3. Probability review

Let $\Omega$ be the sample space of all possible outcomes of an experiment or some other data-gathering process, e.g. when flipping two coins, $\Omega = \{HH, HT, TH, TT\}$. Nice (measurable) subsets of $\Omega$ are called events, and $\mathcal{F}$ is the set of all events; when $\Omega$ is countable, $\mathcal{F}$ is just the power set (set of all subsets) of $\Omega$.

A function $P : \mathcal{F} \to [0, 1]$ called a probability measure satisfies:
- $P(\emptyset) = 0$;
- $P(\Omega) = 1$;
- $P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n)$, whenever $\{A_n\}$ is a disjoint sequence of events.

A random variable is a (measurable) function $X : \Omega \to \mathbb{R}$. Thus for the two coins, we might set $X(HH) = 2$, $X(HT) = 1$, $X(TH) = 1$, $X(TT) = 0$, so $X$ is simply the number of heads.

Our data are modelled by a vector $\mathbf{X} = (X_1, \ldots, X_n)$ of random variables: each observation is a random variable.

The distribution function of a r.v. $X$ is $F_X(x) = P(X \le x)$, for all $x \in \mathbb{R}$. So $F_X$ is non-decreasing, $0 \le F_X(x) \le 1$ for all $x$, $F_X(x) \to 1$ as $x \to \infty$, and $F_X(x) \to 0$ as $x \to -\infty$.

A discrete random variable takes values only in some countable (or finite) set $\mathcal{X}$, and has a probability mass function (pmf) $f_X(x) = P(X = x)$. Then $f_X(x)$ is zero unless $x$ is in $\mathcal{X}$, $f_X(x) \ge 0$ for all $x$, $\sum_{x \in \mathcal{X}} f_X(x) = 1$, and $P(X \in A) = \sum_{x \in A} f_X(x)$ for a set $A$.

We say $X$ has a continuous (or, more precisely, absolutely continuous) distribution if it has a probability density function (pdf) $f_X$ such that $P(X \in A) = \int_A f_X(t)\,dt$ for nice sets $A$. Thus
$$\int_{-\infty}^{\infty} f_X(t)\,dt = 1, \qquad F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt.$$

[Notation note: there will be inconsistent use of a subscript in mass, density and distribution functions to denote the r.v. Also $f$ will sometimes be $p$.]
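
As a small added check on these definitions (not from the original slides; the object names are arbitrary), the R lines below enumerate the two-coin sample space, build the pmf of $X$ = number of heads, and verify that it sums to 1 and that $F_X(1) = 3/4$:

# Hedged illustration: pmf and distribution function of X = number of heads for two fair coins.
omega <- c("HH", "HT", "TH", "TT")                               # sample space
x_of <- sapply(strsplit(omega, ""), function(w) sum(w == "H"))   # X evaluated at each outcome
pmf <- table(x_of) / length(omega)                               # each outcome has probability 1/4
sum(pmf)                                                         # sums to 1
sum(pmf[as.numeric(names(pmf)) <= 1])                            # F_X(1) = P(X <= 1) = 0.75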

1.4. Expectation and variance

If $X$ is discrete, the expectation of $X$ is
$$E(X) = \sum_{x \in \mathcal{X}} x\, P(X = x) \quad \text{(exists when } \textstyle\sum_x |x|\, P(X = x) < \infty\text{)}.$$
If $X$ is continuous, then
$$E(X) = \int x\, f_X(x)\, dx \quad \text{(exists when } \textstyle\int |x|\, f_X(x)\, dx < \infty\text{)}.$$
$E(X)$ is also called the expected value or the mean of $X$.

If $g : \mathbb{R} \to \mathbb{R}$ then
$$E(g(X)) = \begin{cases} \sum_{x \in \mathcal{X}} g(x)\, P(X = x) & \text{if } X \text{ is discrete}, \\ \int g(x)\, f_X(x)\, dx & \text{if } X \text{ is continuous}. \end{cases}$$

The variance of $X$ is $\mathrm{var}(X) = E\big[(X - E(X))^2\big] = E(X^2) - (E(X))^2$.

1.5. Independence

The random variables $X_1, \ldots, X_n$ are independent if for all $x_1, \ldots, x_n$,
$$P(X_1 \le x_1, \ldots, X_n \le x_n) = P(X_1 \le x_1) \cdots P(X_n \le x_n).$$
If the independent random variables $X_1, \ldots, X_n$ have pdfs or pmfs $f_{X_1}, \ldots, f_{X_n}$, then the random vector $\mathbf{X} = (X_1, \ldots, X_n)$ has pdf or pmf
$$f_{\mathbf{X}}(\mathbf{x}) = \prod_i f_{X_i}(x_i).$$
Random variables that are independent and that all have the same distribution (and hence the same mean and variance) are called independent and identically distributed (iid) random variables.

1.6. Maxima of iid random variables

Let $X_1, \ldots, X_n$ be iid r.v.'s, and $Y = \max(X_1, \ldots, X_n)$. Then
$$F_Y(y) = P(Y \le y) = P(\max(X_1, \ldots, X_n) \le y) = P(X_1 \le y, \ldots, X_n \le y) = \prod_i P(X_i \le y) = [F_X(y)]^n.$$
The density for $Y$ can then be obtained by differentiation (if continuous), or differencing (if discrete). A similar analysis can be done for minima of iid r.v.'s.

1.7. Sums and linear transformations of random variables

For any random variables,
$$E(X_1 + \cdots + X_n) = E(X_1) + \cdots + E(X_n), \qquad E(a_1 X_1 + b_1) = a_1 E(X_1) + b_1,$$
$$E(a_1 X_1 + \cdots + a_n X_n) = a_1 E(X_1) + \cdots + a_n E(X_n), \qquad \mathrm{var}(a_1 X_1 + b_1) = a_1^2\, \mathrm{var}(X_1).$$
For independent random variables,
$$E(X_1 \cdots X_n) = E(X_1) \cdots E(X_n), \qquad \mathrm{var}(X_1 + \cdots + X_n) = \mathrm{var}(X_1) + \cdots + \mathrm{var}(X_n),$$
$$\mathrm{var}(a_1 X_1 + \cdots + a_n X_n) = a_1^2\, \mathrm{var}(X_1) + \cdots + a_n^2\, \mathrm{var}(X_n).$$
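
To make the maxima result concrete, here is an added simulation sketch (the Uniform[0,1] distribution and the values n = 5, y = 0.8 are arbitrary choices) comparing the empirical value of $P(Y \le y)$ with $[F_X(y)]^n = y^n$:

# Hedged check of P(max(X_1,...,X_n) <= y) = [F_X(y)]^n for iid X_i ~ Uniform[0,1].
set.seed(2)
n <- 5; reps <- 10000; y <- 0.8
maxima <- replicate(reps, max(runif(n)))
mean(maxima <= y)   # empirical P(Y <= y)
y^n                 # theoretical [F_X(y)]^n, since F_X(y) = y for Uniform[0,1]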

1.8. Standardised statistics

Suppose $X_1, \ldots, X_n$ are iid with $E(X_1) = \mu$ and $\mathrm{var}(X_1) = \sigma^2$. Write their sum as $S_n = \sum_{i=1}^n X_i$. From the preceding slide, $E(S_n) = n\mu$ and $\mathrm{var}(S_n) = n\sigma^2$.

Let $\bar{X}_n = S_n / n$ be the sample mean. Then $E(\bar{X}_n) = \mu$ and $\mathrm{var}(\bar{X}_n) = \sigma^2 / n$.

Let
$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}} = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}.$$
Then $E(Z_n) = 0$ and $\mathrm{var}(Z_n) = 1$. $Z_n$ is known as a standardised statistic.

1.9. Moment generating functions

The moment generating function for a r.v. $X$ is
$$M_X(t) = E(e^{tX}) = \begin{cases} \sum_{x \in \mathcal{X}} e^{tx}\, P(X = x) & \text{if } X \text{ is discrete}, \\ \int e^{tx} f_X(x)\, dx & \text{if } X \text{ is continuous}, \end{cases}$$
provided $M_X$ exists for $t$ in a neighbourhood of 0. We can use this to obtain moments of $X$, since $E(X^n) = M_X^{(n)}(0)$, i.e. the $n$th derivative of $M_X$ evaluated at $t = 0$. Under broad conditions, $M_X(t) = M_Y(t)$ implies $F_X = F_Y$.

Mgfs are useful for proving distributions of sums of r.v.'s since, if $X_1, \ldots, X_n$ are iid, $M_{S_n}(t) = M_X(t)^n$.

Example: sum of Poissons. If $X_i \sim \text{Poisson}(\mu)$, then
$$M_{X_i}(t) = E(e^{tX}) = \sum_{x=0}^{\infty} e^{tx} e^{-\mu} \mu^x / x! = e^{-\mu(1 - e^t)} \sum_{x=0}^{\infty} e^{-\mu e^t} (\mu e^t)^x / x! = e^{-\mu(1 - e^t)}.$$
And so $M_{S_n}(t) = e^{-n\mu(1 - e^t)}$, which we immediately recognise as the mgf of a Poisson($n\mu$) distribution. So a sum of iid Poissons is Poisson.

1.10. Convergence

The Weak Law of Large Numbers (WLLN) states that for all $\epsilon > 0$,
$$P\big(|\bar{X}_n - \mu| > \epsilon\big) \to 0 \quad \text{as } n \to \infty.$$
The Strong Law of Large Numbers (SLLN) says that
$$P\big(\bar{X}_n \to \mu\big) = 1.$$
The Central Limit Theorem tells us that
$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}} = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}$$
is approximately $N(0, 1)$ for large $n$.
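
The following added sketch illustrates the Central Limit Theorem by simulation (the Exponential(1) distribution, for which $\mu = \sigma = 1$, and the values of n and reps are illustrative assumptions, not part of the slides):

# Hedged CLT illustration: Z_n = sqrt(n)(Xbar_n - mu)/sigma for X_i iid Exponential(1).
set.seed(3)
n <- 50; reps <- 10000
z <- replicate(reps, sqrt(n) * (mean(rexp(n, rate = 1)) - 1) / 1)
c(mean(z), var(z))   # close to 0 and 1
mean(z <= 1.96)      # close to pnorm(1.96), i.e. about 0.975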

1.11. Conditioning

Let $X$ and $Y$ be discrete random variables with joint pmf $p_{X,Y}(x, y) = P(X = x, Y = y)$. Then the marginal pmf of $Y$ is
$$p_Y(y) = P(Y = y) = \sum_x p_{X,Y}(x, y).$$
The conditional pmf of $X$ given $Y = y$ is
$$p_{X \mid Y}(x \mid y) = P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{p_{X,Y}(x, y)}{p_Y(y)},$$
if $p_Y(y) \ne 0$ (and is defined to be zero if $p_Y(y) = 0$).

In the continuous case, suppose that $X$ and $Y$ have joint pdf $f_{X,Y}(x, y)$, so that for example
$$P(X \le x_1, Y \le y_1) = \int_{-\infty}^{y_1} \int_{-\infty}^{x_1} f_{X,Y}(x, y)\, dx\, dy.$$
Then the marginal pdf of $Y$ is
$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx.$$
The conditional pdf of $X$ given $Y = y$ is
$$f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)},$$
if $f_Y(y) \ne 0$ (and is defined to be zero if $f_Y(y) = 0$).

1.12. Conditioning (continued)

The conditional expectation of $X$ given $Y = y$ is
$$E(X \mid Y = y) = \begin{cases} \sum_x x\, p_{X \mid Y}(x \mid y) & \text{(pmf)}, \\ \int x\, f_{X \mid Y}(x \mid y)\, dx & \text{(pdf)}. \end{cases}$$
Thus $E(X \mid Y = y)$ is a function of $y$, and $E(X \mid Y)$ is a function of $Y$ and hence a r.v.

The conditional expectation formula says
$$E[X] = E[E(X \mid Y)].$$
Proof [discrete case]:
$$E[E(X \mid Y)] = \sum_y \Big[\sum_x x\, p_{X \mid Y}(x \mid y)\Big] p_Y(y) = \sum_x x \sum_y p_{X,Y}(x, y) = \sum_x x \Big[\sum_y p_{Y \mid X}(y \mid x)\Big] p_X(x) = \sum_x x\, p_X(x).$$

The conditional variance of $X$ given $Y = y$ is defined by
$$\mathrm{var}(X \mid Y = y) = E\big[(X - E(X \mid Y = y))^2 \mid Y = y\big],$$
and this is equal to $E(X^2 \mid Y = y) - \big(E(X \mid Y = y)\big)^2$.

We also have the conditional variance formula:
$$\mathrm{var}(X) = E[\mathrm{var}(X \mid Y)] + \mathrm{var}[E(X \mid Y)].$$
Proof:
$$\begin{aligned}
\mathrm{var}(X) &= E(X^2) - [E(X)]^2 = E\big[E(X^2 \mid Y)\big] - \big[E[E(X \mid Y)]\big]^2 \\
&= E\big[E(X^2 \mid Y) - (E(X \mid Y))^2\big] + E\big[(E(X \mid Y))^2\big] - \big[E[E(X \mid Y)]\big]^2 \\
&= E\big[\mathrm{var}(X \mid Y)\big] + \mathrm{var}\big[E(X \mid Y)\big].
\end{aligned}$$
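
As an added numerical sanity check of the conditional expectation and conditional variance formulas (the small joint pmf below is invented purely for illustration), both sides of each identity can be computed directly in R:

# Hedged check of E[X] = E[E(X|Y)] and var(X) = E[var(X|Y)] + var[E(X|Y)]
# for an arbitrary small joint pmf p(x, y); rows index x values, columns index y values.
x_vals <- c(0, 1, 2)
p <- matrix(c(0.10, 0.20,
              0.30, 0.15,
              0.05, 0.20), nrow = 3, byrow = TRUE)    # entries sum to 1
p_y <- colSums(p)                                     # marginal pmf of Y
E_X_given_y   <- colSums(x_vals * p) / p_y            # E(X | Y = y) for each y
var_X_given_y <- colSums(x_vals^2 * p) / p_y - E_X_given_y^2
E_X <- sum(x_vals * rowSums(p))
c(E_X, sum(E_X_given_y * p_y))                        # conditional expectation formula: equal
var_X <- sum(x_vals^2 * rowSums(p)) - E_X^2
c(var_X, sum(var_X_given_y * p_y) +
         sum(E_X_given_y^2 * p_y) - sum(E_X_given_y * p_y)^2)   # conditional variance formula: equal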

1.13. Some important discrete distributions: Binomial

$X$ has a binomial distribution with parameters $n$ and $p$ ($n \in \mathbb{N}$, $0 \le p \le 1$), written $X \sim \text{Bin}(n, p)$, if
$$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad \text{for } x \in \{0, 1, \ldots, n\}$$
(zero otherwise). We have $E(X) = np$, $\mathrm{var}(X) = np(1 - p)$. This is the distribution of the number of successes out of $n$ independent Bernoulli trials, each of which has success probability $p$.

Example: throwing dice. Let $X$ = the number of sixes when we throw 10 fair dice, so $X \sim \text{Bin}(10, \tfrac{1}{6})$.

R code: barplot( dbinom(0:10, 10, 1/6), names.arg=0:10, xlab="number of sixes in 10 throws" )

1.14. Some important discrete distributions: Poisson

$X$ has a Poisson distribution with parameter $\mu$ ($\mu > 0$), written $X \sim \text{Poisson}(\mu)$, if
$$P(X = x) = e^{-\mu} \mu^x / x!, \quad \text{for } x \in \{0, 1, 2, \ldots\}$$
(zero otherwise). Then $E(X) = \mu$ and $\mathrm{var}(X) = \mu$.

In a Poisson process the number of events $X(t)$ in an interval of length $t$ is Poisson($\mu t$), where $\mu$ is the rate per unit time. The Poisson($\mu$) is the limit of the Bin($n$, $p$) distribution as $n \to \infty$, $p \to 0$, with $\mu = np$.

Example: plane crashes. Assume scheduled plane crashes occur as a Poisson process with a rate of 1 every 2 months. How many ($X$) will occur in a year (12 months)? The number in two months is Poisson(1), and so $X \sim \text{Poisson}(6)$.

barplot( dpois(0:15, 6), names.arg=0:15, xlab="number of scheduled plane crashes in a year" )
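
As a quick added check of the statement that Poisson($\mu$) is the limit of Bin($n$, $p$) with $\mu = np$ (the particular n, p and range of x are arbitrary), the two pmfs can be compared directly:

# Hedged illustration: Bin(n, p) probabilities are close to Poisson(n*p) when n is large and p is small.
n <- 1000; p <- 0.006; mu <- n * p
x <- 0:15
round(cbind(binomial = dbinom(x, n, p), poisson = dpois(x, mu)), 4)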

1.15. Some important discrete distributions: Negative Binomial

$X$ has a negative binomial distribution with parameters $k$ and $p$ ($k \in \mathbb{N}$, $0 \le p \le 1$), if
$$P(X = x) = \binom{x - 1}{k - 1} (1 - p)^{x - k} p^k, \quad \text{for } x = k, k + 1, \ldots$$
(zero otherwise). Then $E(X) = k/p$, $\mathrm{var}(X) = k(1 - p)/p^2$. This is the distribution of the number of trials up to and including the $k$th success, in a sequence of independent Bernoulli trials each with success probability $p$. The negative binomial distribution with $k = 1$ is called a geometric distribution with parameter $p$.

The r.v. $Y = X - k$ has
$$P(Y = y) = \binom{y + k - 1}{k - 1} (1 - p)^y p^k, \quad \text{for } y = 0, 1, \ldots.$$
This is the distribution of the number of failures before the $k$th success in a sequence of independent Bernoulli trials each with success probability $p$. It is also sometimes called the negative binomial distribution: be careful!

Example: How many times do I have to flip a coin before I get 10 heads? This is the first ($X$) definition of the negative binomial since it includes all the flips. R uses the second definition ($Y$), so we need to add in the 10 heads:

barplot( dnbinom(0:30, 10, 1/2), names.arg=0:30 + 10, xlab="number of flips before 10 heads" )

1.16. Some important discrete distributions: Multinomial

Suppose we have a sequence of $n$ independent trials where at each trial there are $k$ possible outcomes, and at each trial the probability of outcome $i$ is $p_i$. Let $N_i$ be the number of times outcome $i$ occurs in the $n$ trials, and consider $N_1, \ldots, N_k$. They are discrete random variables, taking values in $\{0, 1, \ldots, n\}$. The multinomial distribution with parameters $n$ and $p_1, \ldots, p_k$ ($n \in \mathbb{N}$, $p_i \ge 0$ for all $i$ and $\sum_i p_i = 1$) has joint pmf
$$P(N_1 = n_1, \ldots, N_k = n_k) = \frac{n!}{n_1! \cdots n_k!} p_1^{n_1} \cdots p_k^{n_k}, \quad \text{if } \textstyle\sum_i n_i = n,$$
and is zero otherwise. The r.v.'s $N_1, \ldots, N_k$ are not independent, since $\sum_i N_i = n$. The marginal distribution of $N_i$ is Binomial($n$, $p_i$).

Example: I throw 6 dice; what is the probability that I get one of each face 1, 2, 3, 4, 5, 6? This can be calculated to be $\frac{6!}{1! \cdots 1!} \left(\tfrac{1}{6}\right)^6 = 0.015$.

dmultinom( x=c(1,1,1,1,1,1), size=6, prob=rep(1/6,6) )

1.17. Some important continuous distributions: Normal

$X$ has a normal (Gaussian) distribution with mean $\mu$ and variance $\sigma^2$ ($\mu \in \mathbb{R}$, $\sigma^2 > 0$), written $X \sim N(\mu, \sigma^2)$, if it has pdf
$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \quad x \in \mathbb{R}.$$
We have $E(X) = \mu$, $\mathrm{var}(X) = \sigma^2$.

If $\mu = 0$ and $\sigma^2 = 1$, then $X$ has a standard normal distribution, $X \sim N(0, 1)$. We write $\phi$ for the standard normal pdf, and $\Phi$ for the standard normal distribution function, so that
$$\phi(x) = \frac{1}{\sqrt{2\pi}} \exp(-x^2/2), \qquad \Phi(x) = \int_{-\infty}^{x} \phi(t)\, dt.$$
The upper $100\alpha\%$ point of the standard normal distribution is $z_\alpha$, where $P(Z > z_\alpha) = \alpha$, with $Z \sim N(0, 1)$. Values of $\Phi$ are tabulated in normal tables, as are percentage points $z_\alpha$.
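
Since the slides refer to tabulated values of $\Phi$ and the percentage points $z_\alpha$, it may help to note (an added aside; the choice $\alpha = 0.05$ is arbitrary) that the same quantities are available directly in R:

# Hedged illustration: standard normal distribution function and upper percentage point.
alpha <- 0.05
pnorm(1.96)        # Phi(1.96), approximately 0.975
qnorm(1 - alpha)   # z_alpha with P(Z > z_alpha) = alpha, approximately 1.645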

1.18. Some important continuous distributions: Uniform

$X$ has a uniform distribution on $[a, b]$, written $X \sim U[a, b]$ ($-\infty < a < b < \infty$), if it has pdf
$$f_X(x) = \frac{1}{b - a}, \quad x \in [a, b].$$
Then $E(X) = \frac{a + b}{2}$ and $\mathrm{var}(X) = \frac{(b - a)^2}{12}$.

1.19. Some important continuous distributions: Gamma

$X$ has a Gamma($\alpha$, $\lambda$) distribution ($\alpha > 0$, $\lambda > 0$) if it has pdf
$$f_X(x) = \frac{\lambda^\alpha x^{\alpha - 1} e^{-\lambda x}}{\Gamma(\alpha)}, \quad x > 0,$$
where $\Gamma(\alpha)$ is the gamma function defined by $\Gamma(\alpha) = \int_0^\infty x^{\alpha - 1} e^{-x}\, dx$ for $\alpha > 0$.

We have $E(X) = \frac{\alpha}{\lambda}$ and $\mathrm{var}(X) = \frac{\alpha}{\lambda^2}$. The moment generating function $M_X(t)$ is
$$M_X(t) = E\big(e^{Xt}\big) = \left(\frac{\lambda}{\lambda - t}\right)^\alpha, \quad \text{for } t < \lambda.$$
Note the following two results for the gamma function: (i) $\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1)$; (ii) if $n \in \mathbb{N}$ then $\Gamma(n) = (n - 1)!$.

Some Gamma distributions:

a<-c(1, 3, 10); b<-c(1, 3, 0.5)
for(i in 1:3){
  y = dgamma(x, a[i], b[i])
  plot(x, y, ...)
}

1.20. Some important continuous distributions: Exponential

$X$ has an exponential distribution with parameter $\lambda$ ($\lambda > 0$) if $X \sim \text{Gamma}(1, \lambda)$, so that $X$ has pdf
$$f_X(x) = \lambda e^{-\lambda x}, \quad x > 0.$$
Then $E(X) = \frac{1}{\lambda}$ and $\mathrm{var}(X) = \frac{1}{\lambda^2}$.

Note that if $X_1, \ldots, X_n$ are iid Exponential($\lambda$) r.v.'s then $\sum_{i=1}^n X_i \sim \text{Gamma}(n, \lambda)$.

Proof: the mgf of $X_i$ is $\frac{\lambda}{\lambda - t}$, and so the mgf of $\sum_{i=1}^n X_i$ is $\left(\frac{\lambda}{\lambda - t}\right)^n$, which we recognise as the mgf of a Gamma($n$, $\lambda$).
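
As an added simulation check of the result that a sum of iid Exponential($\lambda$) r.v.'s is Gamma($n$, $\lambda$) (the values of n, lambda and reps are arbitrary), the sample mean and variance of simulated sums can be compared with the Gamma($n$, $\lambda$) moments $n/\lambda$ and $n/\lambda^2$:

# Hedged check: sum of n iid Exponential(lambda) r.v.'s behaves like Gamma(n, lambda).
set.seed(4)
n <- 5; lambda <- 2; reps <- 10000
sums <- replicate(reps, sum(rexp(n, rate = lambda)))
c(mean(sums), n / lambda)     # compare with E = n/lambda
c(var(sums),  n / lambda^2)   # compare with var = n/lambda^2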

1.21. Some important continuous distributions: Chi-squared

If $Z_1, \ldots, Z_k$ are iid $N(0, 1)$ r.v.'s, then $X = \sum_{i=1}^k Z_i^2$ has a chi-squared distribution on $k$ degrees of freedom, written $X \sim \chi^2_k$. Since $E(Z_i^2) = 1$ and $E(Z_i^4) = 3$, we find that $E(X) = k$ and $\mathrm{var}(X) = 2k$.

Further, the moment generating function of $Z_i^2$ is
$$M_{Z_i^2}(t) = E\big(e^{Z_i^2 t}\big) = \int_{-\infty}^{\infty} e^{z^2 t} \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\, dz = (1 - 2t)^{-1/2} \quad \text{for } t < 1/2$$
(check), so that the mgf of $X = \sum_{i=1}^k Z_i^2$ is $M_X(t) = (M_{Z^2}(t))^k = (1 - 2t)^{-k/2}$ for $t < 1/2$.

We recognise this as the mgf of a Gamma($k/2$, $1/2$), so that $X$ has pdf
$$f_X(x) = \frac{1}{\Gamma(k/2)} \left(\frac{1}{2}\right)^{k/2} x^{k/2 - 1} e^{-x/2}, \quad x > 0.$$

Some chi-squared distributions (k = 1, 2, 10):

k<-c(1,2,10)
for(i in 1:3){
  y = dchisq(x, k[i])
  plot(x, y, ...)
}

Note:
1. We have seen that if $X \sim \chi^2_k$ then $X \sim \text{Gamma}(k/2, 1/2)$.
2. If $Y \sim \text{Gamma}(n, \lambda)$ then $2\lambda Y \sim \chi^2_{2n}$ (prove via mgf's or the density transformation formula).
3. If $X \sim \chi^2_m$, $Y \sim \chi^2_n$ and $X$ and $Y$ are independent, then $X + Y \sim \chi^2_{m+n}$ (prove via mgf's). This is called the additive property of $\chi^2$.
4. We denote the upper $100\alpha\%$ point of $\chi^2_k$ by $\chi^2_k(\alpha)$, so that, if $X \sim \chi^2_k$, then $P(X > \chi^2_k(\alpha)) = \alpha$. These are tabulated.

The above connections between the gamma and $\chi^2$ distributions mean that we can sometimes use $\chi^2$-tables to find percentage points for gamma distributions.

1.22. Some important continuous distributions: Beta

$X$ has a Beta($\alpha$, $\beta$) distribution ($\alpha > 0$, $\beta > 0$) if it has pdf
$$f_X(x) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)}, \quad 0 < x < 1,$$
where $B(\alpha, \beta)$ is the beta function defined by $B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha + \beta)$.

Then $E(X) = \frac{\alpha}{\alpha + \beta}$ and $\mathrm{var}(X) = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$. The mode is $(\alpha - 1)/(\alpha + \beta - 2)$. Note that Beta(1, 1) $\equiv U[0, 1]$.
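
As an added numerical illustration of note 2 and of using $\chi^2$-tables for gamma percentage points (the values n = 3, lambda = 2 and alpha = 0.05 are arbitrary): if $Y \sim \text{Gamma}(n, \lambda)$ then $2\lambda Y \sim \chi^2_{2n}$, so the upper $100\alpha\%$ point of $Y$ is $\chi^2_{2n}(\alpha)/(2\lambda)$:

# Hedged check: upper 100*alpha% point of Gamma(n, lambda) via the chi-squared distribution on 2n d.f.
n <- 3; lambda <- 2; alpha <- 0.05
qgamma(1 - alpha, shape = n, rate = lambda)    # direct gamma percentage point
qchisq(1 - alpha, df = 2 * n) / (2 * lambda)   # same value via 2*lambda*Y ~ chi-squared_{2n}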

Some beta distributions:

k<-c(1,2,10)
for(i in 1:3){
  y = dbeta(x, a[i], b[i])
  plot(x, y, ...)
}
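
To complement these plots, here is an added check (the choice $\alpha = 2$, $\beta = 5$ is arbitrary) of the Beta mean and variance formulas against simulation:

# Hedged check of E(X) = a/(a+b) and var(X) = ab/((a+b)^2 (a+b+1)) for X ~ Beta(a, b).
set.seed(5)
a <- 2; b <- 5
x <- rbeta(100000, a, b)
c(mean(x), a / (a + b))                        # sample mean vs theoretical mean
c(var(x),  a * b / ((a + b)^2 * (a + b + 1)))  # sample variance vs theoretical variance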