The Variance of Sample Variance for a Finite Population




Eungchun Cho
Kentucky State University, 400 E. Main Street, Frankfort, KY 40601, USA, eccho@gwmail.kysu.edu

November 11, 2004

Key Words: variance of variance, variance estimator, sampling variance, randomization variance, moments

Abstract

The variance of the variance of samples from a finite population is given in terms of the second and the fourth moments of the population. An example for the special case of a uniformly distributed population is given.

1 Introduction

If the population variance is estimated by a sample variance, it is important to know the relationship between the variance of the population and the variance of the variances of all samples of a certain fixed size. Though the sample variance is an unbiased estimator of the population variance, it is problematic if the variance of the variance estimator is too large. For an arbitrary finite population one establishes the unbiasedness of the sample mean \bar{x} as an estimator of the population mean and evaluates the randomization variance of the sample mean. One subsequently uses similar developments for related randomized designs, e.g., stratified random sampling and some forms of cluster sampling. In applications of this material to practical problems, it is often important to evaluate the variance of the variance estimator. For example, some cluster sample designs may be considered problematic if the resulting variance estimator is unstable, i.e., has an unreasonably large variance. Historically, introductory sampling textbooks have addressed this issue in a relatively limited form, either through a brief reference to the elaborate algebra of polykays developed by Tukey [4, pages 37-54] and Wishart [8, pages 1-13], or through a direct appeal to large-sample approximations based on the normal and chi-square distributions. See, e.g., Cochran [1, pages 9, 96] for examples of these two approaches. In addition, the statistical literature has developed computational tools to implement the above-mentioned polykay results.

We present a relatively simple, direct derivation of the variance of variance, that is, the variance of the variances of samples from a finite population. We also provide results for the important special case of a uniformly distributed population.

2 The Variance of the Sample Variance

Routine arguments (e.g., Cochran [1], Theorems 2.1, 2.2 and 2.4) show

    E\{a(S)\} = \bar{A}    (1)

    E\{v(S)\} = V(A)    (2)

    V\{a(S)\} = \left(1 - \frac{n}{N}\right) \frac{V(A)}{n}    (3)

where a(S) is the mean of the sample S, v(S) the variance of the sample S, \bar{A} the mean of A, and V(A) the full finite-population analogue of the sample variance v(S). That is,

    a(S) = \frac{1}{n} \sum_{a_i \in S} a_i    (4)

    v(S) = \frac{1}{n-1} \sum_{a_i \in S} \left(a_i - a(S)\right)^2    (5)

    \bar{A} = \frac{1}{N} \sum_{i=1}^{N} a_i    (6)

    V(A) = \frac{1}{N-1} \sum_{i=1}^{N} \left(a_i - \bar{A}\right)^2    (7)

Equation (2) shows that v(S) is an unbiased estimator of V(A). The principal task in this paper is to obtain a relatively simple expression for the variance of the variance estimator v(S),

    V\{v(S)\} = \binom{N}{n}^{-1} \sum_{S \in L_{n,A}} \left\{v(S) - V(A)\right\}^2    (8)

in terms of the a_i's in the underlying population. The formula is useful for estimating the variance of the variance estimator v(S) when straightforward computation from the definition is impractical due to the combinatorial explosion when N is not very small.

3 The Main Theorem and Formula

A relationship between the variance of a given population A of size N and the variance of the variances of the samples (of size n) of A is given. The formula

for the variance of the variance estimator is given in terms of the fourth and the second moments of the population, the size of the population, and the sample size. Recall \mu_4 and \mu_2, the fourth and the second moments of the population A:

    \mu_4 = \frac{1}{N} \sum_{i=1}^{N} x_i^4    (9)

    \mu_2 = \frac{1}{N} \sum_{i=1}^{N} x_i^2    (10)

Since the variance is invariant under shifting by a constant, we assume the mean of the population is zero, which considerably simplifies the formulas.

Theorem 1  Let A be a population of size N \ge 4, represented as a list of N numbers, A = [a_1, a_2, \ldots, a_N]. Assume the mean of A is 0. Consider the list of all possible samples of n (2 \le n \le N-1) numbers selected without replacement from A,

    L_{n,A} = [S_1, S_2, \ldots, S_\alpha]

where S_i \subset A, |S_i| = n for all i, and \alpha = N!/(n!(N-n)!) (the number of sublists of size n). Let V_n = [v(S_1), v(S_2), \ldots, v(S_\alpha)] be the list of the sample variances v(S_i) for S_i \in L_{n,A}. Then V(v(S)) = E\left([v(S) - E\{v(S)\}]^2\right), the variance of the variance estimator v(S) taken over all S in L_{n,A}, is a linear combination

    V(v(S)) = a_1 \mu_4 + a_2 \mu_2^2    (11)

where

    a_1 = \frac{N (N-n) (Nn - N - n - 1)}{(N-3)(N-2)(N-1)(n-1) n}    (12)

    a_2 = -\frac{N (N-n) \left(N^2 n - 3n - 3N^2 + 6N - 3\right)}{(N-3)(N-2)(N-1)^2 (n-1) n}    (13)

Remark 1  The following is an alternate form of the formula:

    V(v(S)) = \frac{N (N-n)}{(N-3)(N-2)(N-1)^2 (n-1) n} \left( b_1 \mu_4 - b_2 \mu_2^2 \right)    (14)

where

    b_1 = (N-1)(Nn - N - n - 1)    (15)

    b_2 = N^2 n - 3n - 3N^2 + 6N - 3    (16)

Note that b_1 and b_2 can be written as

    b_1 = (N-1)\left((N-1)(n-1) - 2\right)    (17)

    b_2 = n \left(N^2 - 3\right) - 3(N-1)^2 = N^2 (n-3) + 3(N-n) + 3(N-1)    (18)
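Theorem 1 lends itself to a direct numerical check: for a small mean-zero population one can enumerate every sample, compute the variance of the sample variances from definition (8), and compare it with the closed form (11)-(13). The following Python sketch does this with exact rational arithmetic; the test populations are illustrative choices, not taken from the paper.

```python
from fractions import Fraction as F
from itertools import combinations

def theorem_var(A, n):
    """V(v(S)) via Theorem 1; the population A must have mean 0."""
    N = len(A)
    mu4 = F(sum(x ** 4 for x in A), N)  # fourth moment, equation (9)
    mu2 = F(sum(x ** 2 for x in A), N)  # second moment, equation (10)
    a1 = F(N * (N - n) * (N * n - N - n - 1),
           (N - 3) * (N - 2) * (N - 1) * (n - 1) * n)          # (12)
    a2 = F(-N * (N - n) * (N**2 * n - 3 * n - 3 * N**2 + 6 * N - 3),
           (N - 3) * (N - 2) * (N - 1) ** 2 * (n - 1) * n)     # (13)
    return a1 * mu4 + a2 * mu2 ** 2                            # (11)

def brute_var(A, n):
    """V(v(S)) from definition (8) by enumerating all C(N, n) samples."""
    def v(S):
        m = F(sum(S), len(S))
        return sum((x - m) ** 2 for x in S) / (len(S) - 1)
    VA = v(A)  # V(A) is the same formula applied to the whole population
    samples = list(combinations(A, n))
    return sum((v(S) - VA) ** 2 for S in samples) / len(samples)
```

For example, for the mean-zero population [-3, -1, 1, 3] and n = 2, both routines return 296/9, and they agree for every (N, n) small enough to enumerate.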

Written out in full,

    V(v(S)) = \frac{N (N-n)}{(N-3)(N-2)(N-1)^2 (n-1) n} \left( (N-1)(Nn - N - n - 1)\, \mu_4 - \left(N^2 n - 3n - 3N^2 + 6N - 3\right) \mu_2^2 \right)    (19)

A preliminary formula in terms of all the possible fourth-order products of the elements of the population is derived in the lemma; it is then simplified into a form involving only the second and the fourth moments of the population.

Lemma 1  The variance V(v(S)) of the variances of the samples can be represented as a linear combination of sums of fourth-degree terms,

    V(v(S)) = C_1 \sum_{i} a_i^4 + C_2 \sum_{i \ne j} a_i^3 a_j + C_3 \sum_{i<j} a_i^2 a_j^2 + C_4 \sum_{\substack{i \ne j,\ i \ne k \\ j<k}} a_i^2 a_j a_k + C_5 \sum_{i<j<k<l} a_i a_j a_k a_l    (20)

where i, j, k, l run from 1 to N and

    C_1 = \frac{N-n}{N^2 n}    (21)

    C_2 = -\frac{4 (N-n)}{N^2 (N-1) n}    (22)

    C_3 = -\frac{2 (N-n)(Nn - 3N - 3n + 3)}{N^2 (N-1)^2 n (n-1)}    (23)

    C_4 = \frac{8 (N-n)(2Nn - 3N - 3n + 3)}{N^2 (N-1)^2 (N-2) n (n-1)}    (24)

    C_5 = -\frac{48 (N-n)(2Nn - 3N - 3n + 3)}{N^2 (N-1)^2 (N-2)(N-3) n (n-1)}    (25)

Remark 2  Note that the coefficients C_i have the common factor C_1 = (N-n)/(N^2 n). This makes it obvious, as expected, that V(v(S)) = 0 when N = n.

Sketch of the Proof of the Theorem  Let s_d represent the power sum \sum_{i=1}^{N} a_i^d, and let

    s_{31} = \sum_{i \ne j} a_i^3 a_j, \quad s_{22} = \sum_{i<j} a_i^2 a_j^2, \quad s_{211} = \sum_{\substack{i \ne j,\ i \ne k \\ j<k}} a_i^2 a_j a_k, \quad s_{1111} = \sum_{i<j<k<l} a_i a_j a_k a_l.

Then

    s_{31} = s_3 s_1 - s_4    (26)

    s_{22} = \frac{1}{2}\left( s_2^2 - s_4 \right)    (27)

    s_{211} = \frac{1}{2} s_2 s_1^2 - s_3 s_1 + s_4 - \frac{1}{2} s_2^2    (28)

    s_{1111} = \frac{1}{24} s_1^4 - \frac{1}{4} s_4 + \frac{1}{3} s_3 s_1 + \frac{1}{8} s_2^2 - \frac{1}{4} s_2 s_1^2    (29)
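The lemma can likewise be verified numerically. Since its statement does not use the mean-zero assumption, a population with nonzero mean makes a sharper test. The following Python sketch evaluates the right-hand side of (20) with the coefficients (21)-(25); the test populations are illustrative choices, not taken from the paper.

```python
from fractions import Fraction as F
from itertools import combinations

def lemma_var(A, n):
    """V(v(S)) via the lemma: fourth-degree symmetric sums weighted by C1..C5."""
    N = len(A)
    s4 = sum(a ** 4 for a in A)
    s31 = sum(A[i] ** 3 * A[j] for i in range(N) for j in range(N) if i != j)
    s22 = sum(A[i] ** 2 * A[j] ** 2 for i, j in combinations(range(N), 2))
    s211 = sum(A[i] ** 2 * A[j] * A[k]
               for i in range(N)
               for j, k in combinations([m for m in range(N) if m != i], 2))
    s1111 = sum(A[i] * A[j] * A[k] * A[l]
                for i, j, k, l in combinations(range(N), 4))
    C1 = F(N - n, N**2 * n)                                             # (21)
    C2 = F(-4 * (N - n), N**2 * (N - 1) * n)                            # (22)
    C3 = F(-2 * (N - n) * (N*n - 3*N - 3*n + 3),
           N**2 * (N - 1)**2 * n * (n - 1))                             # (23)
    C4 = F(8 * (N - n) * (2*N*n - 3*N - 3*n + 3),
           N**2 * (N - 1)**2 * (N - 2) * n * (n - 1))                   # (24)
    C5 = F(-48 * (N - n) * (2*N*n - 3*N - 3*n + 3),
           N**2 * (N - 1)**2 * (N - 2) * (N - 3) * n * (n - 1))         # (25)
    return C1 * s4 + C2 * s31 + C3 * s22 + C4 * s211 + C5 * s1111
```

For the mean-zero population [-3, -1, 1, 3] with n = 2 this reproduces 296/9, matching Theorem 1, and for the shifted population [1, 2, 4, 8] it agrees with direct enumeration of all samples.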

Now the double, triple and quadruple summations in the formula given by the lemma are replaced by simpler summations using these relations and our simplifying assumption \mu_1 = 0, hence s_1 = 0. Collecting like terms after the cancellation of various intermediate terms, we get the formula in terms of \mu_4 and \mu_2 only. The formula in the lemma follows from the definition of the variance of the sample variances. The proof of the formula involves determining the coefficients of all the fourth-degree terms a_i^4, a_i^2 a_j^2, a_i^2 a_j a_k, a_i^3 a_j, and a_i a_j a_k a_l that appear in the summation. The summations in the formula are arranged so that all like terms are combined and thus appear only once. For example, a_i a_j a_k a_l appears only for the indices arranged in increasing order i<j<k<l.

Example  Consider the first, second and fourth moments of the population A of size N that is uniformly distributed on [-1/2, 1/2],

    A = \left[ -\tfrac{1}{2},\ -\tfrac{1}{2} + h,\ -\tfrac{1}{2} + 2h,\ \ldots,\ \tfrac{1}{2} \right], \quad h = \frac{1}{N-1}.

Obviously, the first moment \mu_1 = 0. The second and fourth moments are also easily computed, e.g., from the moment generating function:

    \mu_2 = \frac{N+1}{12(N-1)}, \qquad \mu_4 = \frac{(N+1)\left(3N^2 - 7\right)}{240(N-1)^3} = \frac{3N^4 - 10N^2 + 7}{240(N-1)^4}.

Substituting \mu_2 and \mu_4 into equation (11), we get the variance of the variances of the samples of size n,

    V(v(S)) = \frac{N (N-n) \left( 2N^4 n + 3N^4 - 5N^3 n - 9N^3 - 10N^2 n - 9N^2 + 15Nn + 21N + 18n + 18 \right)}{360\, (n-1)\, n\, (N-3)(N-2)(N-1)^4}.

When N is large,

    \mu_2 \approx \frac{1}{12}, \qquad \mu_4 \approx \frac{1}{80}

and

    V(v(S)) \approx \frac{2n + 3}{360\, (n-1)\, n}.

References

[1] W. G. Cochran, Sampling Techniques (3rd ed.), John Wiley, 1977, pages 9, 96.

[2] R. L. Graham, D. E. Knuth and O. Patashnik, Concrete Mathematics, Addison-Wesley, 1989.

[3] J. W. Tukey, Some sampling simplified, Journal of the American Statistical Association, 45 (1950), 501-519.

[4] J. W. Tukey, Keeping moment-like sampling computations simple, The Annals of Mathematical Statistics, 27 (1956), 37-54.

[5] J. W. Tukey, Variances of variance components: I. Balanced designs, The Annals of Mathematical Statistics, 27 (1956), 722-736.

[6] J. W. Tukey, Variances of variance components: II. The unbalanced single classification, The Annals of Mathematical Statistics, 28 (1957), 43-56.

[7] J. W. Tukey, Variance components: III. The third moment in a balanced single classification, The Annals of Mathematical Statistics, 28 (1957), 378-384.

[8] J. Wishart, Moment coefficients of the k-statistics in samples from a finite population, Biometrika, 39 (1952), 1-13.
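Addendum. The uniform-population example can also be checked by machine: for small N one can compare the value of V(v(S)) computed from definition (8) by enumerating all samples against the closed form a_1 \mu_4 + a_2 \mu_2^2 of Theorem 1 evaluated at the moments of the uniform population. The Python sketch below writes that closed form as a single expanded polynomial; the choice of test values of N and n is illustrative, not from the paper.

```python
from fractions import Fraction as F
from itertools import combinations

def uniform_population(N):
    """N equally spaced points on [-1/2, 1/2] with step h = 1/(N-1)."""
    h = F(1, N - 1)
    return [F(-1, 2) + k * h for k in range(N)]

def closed_form(N, n):
    """Closed-form V(v(S)) for the uniform population, expanded form."""
    P = (2 * N**4 * n + 3 * N**4 - 5 * N**3 * n - 9 * N**3
         - 10 * N**2 * n - 9 * N**2 + 15 * N * n + 21 * N + 18 * n + 18)
    return F(N * (N - n) * P,
             360 * (n - 1) * n * (N - 3) * (N - 2) * (N - 1)**4)

def brute(N, n):
    """V(v(S)) from definition (8) for the uniform population."""
    A = uniform_population(N)
    def v(S):
        m = F(sum(S), len(S))
        return sum((x - m) ** 2 for x in S) / (len(S) - 1)
    VA = v(A)  # V(A): same formula with divisor N - 1
    samples = list(combinations(A, n))
    return sum((v(S) - VA) ** 2 for S in samples) / len(samples)
```

The same enumeration also confirms the stated moments, e.g., \mu_2 = (N+1)/(12(N-1)) gives 1/8 at N = 5, in agreement with direct summation.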