1 Sufficient statistics

A statistic is a function $T = r(X_1, X_2, \dots, X_n)$ of the random sample $X_1, X_2, \dots, X_n$. Examples are
$$
\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i, \quad \text{the sample mean},
$$
$$
s^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2, \quad \text{the sample variance},
$$
$$
T_1 = \max\{X_1, X_2, \dots, X_n\}, \qquad T_2 = 5 \qquad (1)
$$
The last statistic is a bit strange: it completely ignores the random sample, but it is still a statistic. We say a statistic $T$ is an estimator of a population parameter $\theta$ if $T$ is usually close to $\theta$. The sample mean is an estimator for the population mean; the sample variance is an estimator for the population variance.

Obviously, there are lots of functions of $X_1, X_2, \dots, X_n$ and so lots of statistics. When we look for a good estimator, do we really need to consider all of them, or is there a much smaller set of statistics we could consider? Another way to ask the question is whether there are a few key functions of the random sample which by themselves contain all the information the sample does. For example, suppose we know the sample mean and the sample variance. Does the random sample contain any more information about the population than this?

We should emphasize that we are always assuming our population is described by a given family of distributions (normal, binomial, gamma, ...) with one or several unknown parameters. The answer to the above question will depend on which family of distributions we assume describes the population.

We start with a heuristic definition of a sufficient statistic. We say $T$ is a sufficient statistic if the statistician who knows the value of $T$ can do just as good a job of estimating the unknown parameter $\theta$ as the statistician who knows the entire random sample.

The mathematical definition is as follows. A statistic $T = r(X_1, X_2, \dots, X_n)$ is a sufficient statistic if for each $t$, the conditional distribution of $X_1, X_2, \dots, X_n$ given $T = t$ and $\theta$ does not depend on $\theta$.
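The defining property can be checked directly in a simple discrete case. The following sketch uses a Bernoulli($p$) sample (an illustrative model chosen here, not one discussed above) with $T = \sum_i X_i$: the conditional probability of a specific sample given $T = t$ works out to $1/\binom{n}{t}$, which is free of $p$.

```python
import math

# Illustrative check of sufficiency for a Bernoulli(p) sample (a model
# chosen here for simplicity). With T = X_1 + ... + X_n, the conditional
# probability of a specific sample (x_1, ..., x_n) given T = t is
#   p^t (1-p)^(n-t) / [C(n,t) p^t (1-p)^(n-t)] = 1 / C(n,t),
# which does not depend on p -- the defining property of sufficiency.
def cond_prob(xs, p):
    n, t = len(xs), sum(xs)
    joint = p ** t * (1 - p) ** (n - t)                        # P(sample = xs)
    marginal = math.comb(n, t) * p ** t * (1 - p) ** (n - t)   # P(T = t)
    return joint / marginal

sample = (1, 0, 1, 1, 0)
for p in (0.2, 0.5, 0.9):
    assert abs(cond_prob(sample, p) - 1 / math.comb(5, 3)) < 1e-12
```

The same conditional probability comes out for every value of $p$, so a statistician who only knows $t$ loses nothing.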

To motivate the mathematical definition, we consider the following experiment. Let $T = r(X_1, \dots, X_n)$ be a sufficient statistic. There are two statisticians; we will call them A and B. Statistician A knows the entire random sample $X_1, \dots, X_n$, but statistician B only knows the value of $T$, call it $t$. Since the conditional distribution of $X_1, \dots, X_n$ given $\theta$ and $T$ does not depend on $\theta$, statistician B knows this conditional distribution. So he can use his computer to generate a random sample $X_1', \dots, X_n'$ which has this conditional distribution. But then his random sample has the same distribution as a random sample drawn from the population with its unknown value of $\theta$. So statistician B can use his random sample $X_1', \dots, X_n'$ to compute whatever statistician A computes using his random sample $X_1, \dots, X_n$, and on average he will do as well as statistician A. Thus the mathematical definition of a sufficient statistic implies the heuristic definition.

It is difficult to use the definition to check whether a statistic is sufficient or to find a sufficient statistic. Luckily, there is a theorem that makes it easy to find sufficient statistics.

Theorem 1 (Factorization theorem). Let $X_1, X_2, \dots, X_n$ be a random sample with joint density $f(x_1, x_2, \dots, x_n | \theta)$. A statistic $T = r(X_1, X_2, \dots, X_n)$ is sufficient if and only if the joint density can be factored as follows:
$$
f(x_1, x_2, \dots, x_n | \theta) = u(x_1, x_2, \dots, x_n) \, v(r(x_1, x_2, \dots, x_n), \theta) \qquad (2)
$$
where $u$ and $v$ are non-negative functions. The function $u$ can depend on the full random sample $x_1, \dots, x_n$, but not on the unknown parameter $\theta$. The function $v$ can depend on $\theta$, but can depend on the random sample only through the value of $r(x_1, \dots, x_n)$.

It is easy to see that if $f(t)$ is a one-to-one function and $T$ is a sufficient statistic, then $f(T)$ is a sufficient statistic. In particular, we can multiply a sufficient statistic by a nonzero constant and get another sufficient statistic.

We now apply the theorem to some examples.
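Statistician B's procedure can be sketched in code for the Bernoulli($p$) case (again an illustrative model, with names of our own choosing): given only $n$ and $t$, the conditional law places the $t$ ones in a uniformly random size-$t$ subset of the positions, independently of $p$, so B can regenerate a sample $X_1', \dots, X_n'$ with the right distribution.

```python
import random

# Sketch of statistician B's resampling step for Bernoulli(p) data.
# B never needs to know p: given T = t, the conditional distribution is
# uniform over the C(n, t) arrangements of t ones among n positions.
def regenerate(n, t, rng):
    ones = set(rng.sample(range(n), t))  # uniform over size-t subsets
    return [1 if i in ones else 0 for i in range(n)]

rng = random.Random(0)
sample = regenerate(10, 4, rng)
assert len(sample) == 10 and sum(sample) == 4
```

Any estimator A computes from his sample, B can now compute from his regenerated one, which is the content of the thought experiment above.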
Example (normal population, unknown mean, known variance). We consider a normal population for which the mean $\mu$ is unknown but the variance $\sigma^2$ is known. The joint density is
$$
f(x_1, \dots, x_n | \mu) = (2\pi)^{-n/2} \sigma^{-n} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2 \right)
$$
$$
= (2\pi)^{-n/2} \sigma^{-n} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n x_i^2 + \frac{\mu}{\sigma^2} \sum_{i=1}^n x_i - \frac{n\mu^2}{2\sigma^2} \right)
$$
Since $\sigma^2$ is known, we can let
$$
u(x_1, \dots, x_n) = (2\pi)^{-n/2} \sigma^{-n} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n x_i^2 \right)
$$
and
$$
v(r(x_1, x_2, \dots, x_n), \mu) = \exp\left( -\frac{n\mu^2}{2\sigma^2} + \frac{\mu}{\sigma^2} \, r(x_1, x_2, \dots, x_n) \right)
$$
where
$$
r(x_1, x_2, \dots, x_n) = \sum_{i=1}^n x_i
$$
By the factorization theorem this shows that $\sum_{i=1}^n X_i$ is a sufficient statistic. It follows that the sample mean $\bar{X}_n$ is also a sufficient statistic.

Example (uniform population). Now suppose the $X_i$ are uniformly distributed on $[0, \theta]$ where $\theta$ is unknown. Then the joint density is
$$
f(x_1, \dots, x_n | \theta) = \theta^{-n} \prod_{i=1}^n 1(x_i \le \theta)
$$
Here $1(E)$ is an indicator function: it is $1$ if the event $E$ holds and $0$ if it does not. Now $x_i \le \theta$ for $i = 1, 2, \dots, n$ if and only if $\max\{x_1, x_2, \dots, x_n\} \le \theta$. So we have
$$
f(x_1, \dots, x_n | \theta) = \theta^{-n} \, 1(\max\{x_1, x_2, \dots, x_n\} \le \theta)
$$
By the factorization theorem this shows that $T = \max\{X_1, X_2, \dots, X_n\}$ is a sufficient statistic.

What about the sample mean? Is it a sufficient statistic in this example? By the factorization theorem it is a sufficient statistic only if we can write $1(\max\{x_1, x_2, \dots, x_n\} \le \theta)$ as a function of just the sample mean and $\theta$. This is impossible, so the sample mean is not a sufficient statistic in this setting.
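The normal factorization above can be verified numerically. This sketch (with made-up values for $\sigma$ and the data) checks that $u(x) \, v(\sum_i x_i, \mu)$ reproduces the joint density computed factor by factor:

```python
import math

sigma = 2.0  # known standard deviation (illustrative value)

def u(xs):
    # The mu-free factor: (2*pi)^(-n/2) * sigma^(-n) * exp(-sum(x^2)/(2*sigma^2))
    n = len(xs)
    return (2 * math.pi) ** (-n / 2) * sigma ** (-n) * math.exp(
        -sum(x * x for x in xs) / (2 * sigma ** 2))

def v(r, mu, n):
    # Depends on the data only through r = sum(x_i)
    return math.exp(-n * mu ** 2 / (2 * sigma ** 2) + mu * r / sigma ** 2)

def f(xs, mu):
    # Joint normal density computed directly, one factor per observation
    return math.prod(
        math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        for x in xs)

xs, mu = [1.3, -0.2, 2.7, 0.5], 0.8
assert abs(f(xs, mu) - u(xs) * v(sum(xs), mu, len(xs))) < 1e-12
```

The check passes for any $\mu$, confirming that $\mu$ enters the density only through $\sum_i x_i$.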

Example (gamma population, $\alpha$ unknown, $\beta$ known). Now suppose the population has a gamma distribution and we know $\beta$ but $\alpha$ is unknown. Then the joint density is
$$
f(x_1, \dots, x_n | \alpha) = \frac{\beta^{n\alpha}}{\Gamma(\alpha)^n} \left( \prod_{i=1}^n x_i^{\alpha-1} \right) \exp\left( -\beta \sum_{i=1}^n x_i \right)
$$
We can write
$$
\prod_{i=1}^n x_i^{\alpha-1} = \exp\left( (\alpha - 1) \sum_{i=1}^n \ln(x_i) \right)
$$
By the factorization theorem this shows that $T = \sum_{i=1}^n \ln(X_i)$ is a sufficient statistic. Note that $\exp(T) = \prod_{i=1}^n X_i$ is also a sufficient statistic. But the sample mean is not a sufficient statistic.

Now we reconsider the example of a normal population, but suppose that both $\mu$ and $\sigma^2$ are unknown. Then the sample mean by itself is not a sufficient statistic. In this case we need to use more than one statistic to get sufficiency. The definition (both heuristic and mathematical) of sufficiency extends to several statistics in a natural way. We consider $k$ statistics
$$
T_i = r_i(X_1, X_2, \dots, X_n), \quad i = 1, 2, \dots, k \qquad (3)
$$
We say $T_1, T_2, \dots, T_k$ are jointly sufficient statistics if the statistician who knows the values of $T_1, T_2, \dots, T_k$ can do just as good a job of estimating the unknown parameter $\theta$ as the statistician who knows the entire random sample. In this setting $\theta$ typically represents several parameters, and the number of statistics, $k$, is equal to the number of unknown parameters.

The mathematical definition is as follows. The statistics $T_1, T_2, \dots, T_k$ are jointly sufficient if for each $t_1, t_2, \dots, t_k$, the conditional distribution of $X_1, X_2, \dots, X_n$ given $T_i = t_i$ for $i = 1, 2, \dots, k$ and $\theta$ does not depend on $\theta$. Again, we don't have to work with this definition because we have the following theorem:
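The gamma factorization can be checked numerically in the same way as the normal one. In this sketch (with illustrative values for $\beta$, $\alpha$, and the data), the $\exp(-\beta \sum_i x_i)$ piece is folded into $u$, and $v$ sees the data only through $t = \sum_i \ln(x_i)$:

```python
import math

beta = 1.5  # known rate parameter; alpha is the unknown (illustrative values)

def f(xs, alpha):
    # Gamma joint density computed directly, one factor per observation
    return math.prod(
        beta ** alpha / math.gamma(alpha) * x ** (alpha - 1) * math.exp(-beta * x)
        for x in xs)

def u(xs):
    # Depends on the data but not on alpha
    return math.exp(-beta * sum(xs))

def v(t, alpha, n):
    # Depends on the data only through t = sum(ln x_i)
    return beta ** (n * alpha) / math.gamma(alpha) ** n * math.exp((alpha - 1) * t)

xs, alpha = [0.7, 1.9, 2.4], 2.3
t = sum(math.log(x) for x in xs)
assert abs(f(xs, alpha) - u(xs) * v(t, alpha, len(xs))) < 1e-12
```

Since $\beta$ is known here, folding $\exp(-\beta \sum_i x_i)$ into $u$ is legitimate; with $\beta$ unknown that factor would have to move into $v$, which is why both parameters unknown requires two statistics.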

Theorem 2 (Factorization theorem). Let $X_1, X_2, \dots, X_n$ be a random sample with joint density $f(x_1, x_2, \dots, x_n | \theta)$. The statistics
$$
T_i = r_i(X_1, X_2, \dots, X_n), \quad i = 1, 2, \dots, k \qquad (4)
$$
are jointly sufficient if and only if the joint density can be factored as follows:
$$
f(x_1, x_2, \dots, x_n | \theta) = u(x_1, x_2, \dots, x_n) \, v(r_1(x_1, \dots, x_n), r_2(x_1, \dots, x_n), \dots, r_k(x_1, \dots, x_n), \theta)
$$
where $u$ and $v$ are non-negative functions. The function $u$ can depend on the full random sample $x_1, \dots, x_n$, but not on the unknown parameters $\theta$. The function $v$ can depend on $\theta$, but can depend on the random sample only through the values of $r_i(x_1, \dots, x_n)$, $i = 1, 2, \dots, k$.

Recall that for a single statistic we can apply a one-to-one function to the statistic and get another sufficient statistic. This generalizes as follows. Let $g(t_1, t_2, \dots, t_k)$ be a function whose values are in $\mathbb{R}^k$ and which is one-to-one. Let $g_i(t_1, \dots, t_k)$, $i = 1, 2, \dots, k$, be the component functions of $g$. If $T_1, T_2, \dots, T_k$ are jointly sufficient, then $g_1(T_1, \dots, T_k), \dots, g_k(T_1, \dots, T_k)$ are jointly sufficient.

We now revisit two examples. First consider a normal population with unknown mean and variance. Our previous equations show that
$$
T_1 = \sum_{i=1}^n X_i, \qquad T_2 = \sum_{i=1}^n X_i^2
$$
are jointly sufficient statistics. Another set of jointly sufficient statistics is the sample mean and the sample variance. What is $g(t_1, t_2)$?

Now consider a population with the gamma distribution with both $\alpha$ and $\beta$ unknown. Then
$$
T_1 = \sum_{i=1}^n X_i, \qquad T_2 = \sum_{i=1}^n \ln(X_i)
$$
are jointly sufficient statistics.
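For the normal example, one candidate map $g$ (a sketch of an answer to the question just posed, using the identity $\sum_i (x_i - \bar{x})^2 = \sum_i x_i^2 - (\sum_i x_i)^2 / n$) is $g(t_1, t_2) = \bigl(t_1/n, \, (t_2 - t_1^2/n)/(n-1)\bigr)$, which sends $(\sum X_i, \sum X_i^2)$ to the sample mean and sample variance and is one-to-one for fixed $n \ge 2$:

```python
# g maps (t1, t2) = (sum of x_i, sum of x_i^2) to (sample mean, sample
# variance); here we verify it against the direct definitions on made-up data.
def g(t1, t2, n):
    return t1 / n, (t2 - t1 ** 2 / n) / (n - 1)

xs = [2.0, 4.5, 3.1, 5.2]
n = len(xs)
t1, t2 = sum(xs), sum(x * x for x in xs)

mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / (n - 1)

m, s2 = g(t1, t2, n)
assert abs(m - mean) < 1e-12 and abs(s2 - var) < 1e-12
```

Inverting $g$ recovers $(t_1, t_2) = (n \bar{x}, (n-1) s^2 + n \bar{x}^2)$, which shows the map is one-to-one.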

2 Exercises

1. Gamma population, both parameters unknown: show that the sum $\sum_{i=1}^n X_i$ and the product $\prod_{i=1}^n X_i$ form a pair of jointly sufficient statistics.

2. Uniform population on $[\theta_1, \theta_2]$, both endpoints unknown: find two statistics that are jointly sufficient.

3. Poisson or geometric population: find a sufficient statistic.

4. Beta population: find sufficient statistics.