Unequal Probability Inverse Adaptive Cluster Sampling



Similar documents
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

SAMPLING & INFERENTIAL STATISTICS. Sampling is necessary to make inferences about a population.

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m

The Characteristic Polynomial

MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

Chapter 6: Point Estimation. Fall Probability & Statistics

Comparing Alternate Designs For A Multi-Domain Cluster Sample

WHERE DOES THE 10% CONDITION COME FROM?

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES

GEORG-AUGUST UNIVERSITÄT GÖTTINGEN

Basic Probability Concepts

NOTES ON LINEAR TRANSFORMATIONS

FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL

6.3 Conditional Probability and Independence

A Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images

2. Linear regression with multiple regressors

1 Prior Probability and Posterior Probability

The Sample Overlap Problem for Systematic Sampling

Chapter 1: The Nature of Probability and Statistics

E3: PROBABILITY AND STATISTICS lecture notes

Descriptive Methods Ch. 6 and 7

Markov random fields and Gibbs measures

Estimation and Confidence Intervals

University of Arkansas Libraries ArcGIS Desktop Tutorial. Section 2: Manipulating Display Parameters in ArcMap. Symbolizing Features and Rasters:

Simple Random Sampling

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

Chapter 3. Distribution Problems. 3.1 The idea of a distribution The twenty-fold way

Statistics 522: Sampling and Survey Techniques. Topic 5. Consider sampling children in an elementary school.

Introduction to Matrix Algebra

Least Squares Estimation

Au = = = 3u. Aw = = = 2w. so the action of A on u and w is very easy to picture: it simply amounts to a stretching by 3 and 2, respectively.

Missing Data in Longitudinal Studies: To Impute or not to Impute? Robert Platt, PhD McGill University

Properties of Stabilizing Computations

Visualization of Complex Survey Data: Regression Diagnostics

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Polarization codes and the rate of polarization

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

Elementary Statistics

An Introduction to Basic Statistics and Probability

Measuring Line Edge Roughness: Fluctuations in Uncertainty

Binomial Sampling and the Binomial Distribution

Testing Surface Disinfectants

SEQUENCES OF MAXIMAL DEGREE VERTICES IN GRAPHS. Nickolay Khadzhiivanov, Nedyalko Nenov

Social Studies 201 Notes for November 19, 2003

2 When is a 2-Digit Number the Sum of the Squares of its Digits?

Mining Social-Network Graphs

Optimization of sampling strata with the SamplingStrata package

AP STATISTICS REVIEW (YMS Chapters 1-8)

CONDITIONAL, PARTIAL AND RANK CORRELATION FOR THE ELLIPTICAL COPULA; DEPENDENCE MODELLING IN UNCERTAINTY ANALYSIS

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Fairfield Public Schools

2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)

PLANNING PROBLEMS OF A GAMBLING-HOUSE WITH APPLICATION TO INSURANCE BUSINESS. Stockholm

NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS

Quantitative Methods for Finance

Formal Languages and Automata Theory - Regular Expressions and Finite Automata -

1 Short Introduction to Time Series

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

In mathematics, there are four attainment targets: using and applying mathematics; number and algebra; shape, space and measures, and handling data.

Probability Distributions

Generalized modified linear systematic sampling scheme for finite populations

Continued Fractions and the Euclidean Algorithm

Algebra 1 Course Title

Supporting Online Material for

a 11 x 1 + a 12 x a 1n x n = b 1 a 21 x 1 + a 22 x a 2n x n = b 2.

9.2 Summation Notation

Characterization Of Polynomials Using Reflection Coefficients

Geometric Stratification of Accounting Data

Survey Data Analysis in Stata

Approximability of Two-Machine No-Wait Flowshop Scheduling with Availability Constraints

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

UNDERSTANDING THE TWO-WAY ANOVA

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

Social Media Mining. Data Mining Essentials

CHAPTER 15 NOMINAL MEASURES OF CORRELATION: PHI, THE CONTINGENCY COEFFICIENT, AND CRAMER'S V

Tenacity and rupture degree of permutation graphs of complete bipartite graphs

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools

430 Statistics and Financial Mathematics for Business

Every Positive Integer is the Sum of Four Squares! (and other exciting problems)

Algebra 1 Course Information

On the k-path cover problem for cacti

Calculating VaR. Capital Market Risk Advisors CMRA

New SAS Procedures for Analysis of Sample Survey Data

Chapter 3. Cartesian Products and Relations. 3.1 Cartesian Products

Multiple Imputation for Missing Data: A Cautionary Tale

Computationally Complete Spiking Neural P Systems Without Delay: Two Types of Neurons Are Enough

Algebra Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard

ON SOME ANALOGUE OF THE GENERALIZED ALLOCATION SCHEME

Friday, January 29, :15 a.m. to 12:15 p.m., only

D-optimal plans in observational studies

Degree Hypergroupoids Associated with Hypergraphs

CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA

9. Sampling Distributions

People have thought about, and defined, probability in different ways. important to note the consequences of the definition:

Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs

The primary goal of this thesis was to understand how the spatial dependence of

Graphing Calculator Workshops

WHEN DOES A CROSS PRODUCT ON R n EXIST?

THE SELECTION OF RETURNS FOR AUDIT BY THE IRS. John P. Hiniker, Internal Revenue Service

Transcription:

736 Chiang Mai J Sci 2013; 40(4) Chiang Mai J Sci 2013; 40(4) : 736-742 http://itsciencecmuacth/ejournal/ Contributed Paper Unequal Probability Inverse Adaptive Cluster Sampling Prayad Sangngam* Department of Statistics, Faculty of Science, Silpakorn University, Nakorn Pathom, Thail *Author for correspondence; e-mail: prayad@suacth Received: 25 August 2011 Accepted: 8 August 2013 ABSTRACT This paper incorporates an unequal probability inverse sampling with adaptive cluster sampling Unbiased estimators of the population total its variance are given The proposed sampling design is compared with the inverse adaptive cluster sampling design using simulation study The results indicate that the proposed sampling design can be more efficient than inverse adaptive cluster sampling design In particular, when an auxiliary variable is highly correlated to the study variable, the estimator under the proposed sampling design produces large efficiency Keywords: unequal probability sampling, adaptive sampling, inverse sampling, unbiased estimator 1 INTRODUCTION A rare population is a population in which only a few units exhibit the characteristics of interest A problem encountered from a fixed sample size sampling for the population is that the sample might yield a zero estimate for the population mean or total In some surveys, population units can be partitioned into two classes Assume that the class in which a unit belongs is not known until the unit is observed Inverse sampling is an efficient sampling design for estimating the parameters of these populations In inverse sampling, units are drawn until the fixed number of units with characteristics of interest is obtained Usually, the purpose of the sample survey is to estimate the proportion of units of interest or to estimate the parameters of the whole population Haldane [1] considered an inverse sampling with equal probability with replacement An unbiased estimator of the proportion of units of interest its variance were derived However, an unbiased estimator of the variance was not given Finney [2] proposed an unbiased estimator of the variance Chistman Lan [3] considered inverse sampling with without replacement when the selection of units is with equal probability in each draw An unbiased estimator of the population total its variance were provided but an unbiased estimator of the variance was not given Salehi Seber [4] proved that Murthy s estimator can be applied to sequential sampling design Using this approach, they obtained an unbiased estimator of the

Chiang Mai J Sci 2013; 40(4) 737 variance for the estimator given by [3] Greco Naddeo [5] considered an unequal probability inverse sampling with replacement They provided an unbiased estimator of the population total the corresponding unbiased variance estimator One way for sampling a rare population is by adaptive cluster sampling Thompson [6] proposed adaptive cluster sampling design which has been shown that to be useful application for rare clustered population However, an initial sample for adaptive cluster sampling is commonly drawn by a classical fixed sample size design so that a final sample may not consist of a unit of interest Chistman Lan [3] considered inverse adaptive cluster sampling when the initial sample is selected by inverse sampling with equal probability Salehi Seber [7] proposed general inverse adaptive cluster sampling Commonly, if the probability of selection units is highly correlated to the study variable, unequal probability sampling design can be higher efficient than equal probability sampling design This paper combines the unequal probability inverse sampling with adaptive cluster sampling The parameter to be estimated is the population total An unbiased estimator of the parameter an unbiased variance estimator are derived A comparison of the proposed sampling design to the inverse adaptive cluster sampling design is performed using simulation study 2 PROPOSED SAMPLING A finite population consists of N distinct units labeled 1,2,,N with their associated study values y 1, y 2,,y N Let y i be an initial selection probability of the i-th unit The parameter of interest is the population total, Assume that population units are divided into two classes according to whether the study values satisfy a condition A common form of the condition is {y : y>c} where c is a given constant The class of units in which study values satisfy the condition is defined to be the class C The notation C is the class of the remaining units Each unit in the population is defined to have a neighborhood, a set of other units associated with that unit In the proposed sampling, an initial sample is drawn by unequal probability inverse sampling with replacement The units are selected with unequal probability (z i ) with replacement until the initial sample consists of m units from the class C For the sample units in class C, their neighborhoods are added to be sampled observed The procedure continues until no more units in the class C are found Since the sampling begins with unequal probability inverse sampling incorporates to adaptive cluster sampling, this sampling scheme is called unequal probability inverse adaptive cluster sampling The final sample consists of the initial sample all adaptively sample units The initial sample s can be partitioned into two parts: parts s C s C are the set of sample units from the class C C, respectively The set of units that is adaptively sampled as a result of the unit i-th being sampled that is also the member of class C is called the network to which the i-th unit belongs The units that are adaptively sampled but are in the class C are called edge units By this way, if any unit in the i-th network is selected in the initial sample,

738 Chiang Mai J Sci 2013; 40(4) all units in the network are sampled From definition of network, the population can be divided into K mutually exclusive networks 3 PARAMETER ESTIMATION Let n 0 denote the initial sample size n be the final sample size So the initial sample consists of m units from the class C n 0 m units from class C Let k denote the set of units in the k-th network m k denote a number of units in the network The total value of the study variable in the network k is the probability of selection of that network is The parameter to be estimated can be written as Since the probability of any edge unit is included in the final sample is not known, some estimators included edge units will be biased [6] So the edge units in the final sample are excluded from the estimation stage The proposed unbiased estimator for the population total uses the sample units in the class C only when they are drawn to be the initial sample The estimator is formed by modifying the unbiased estimator given by Greco Naddeo [5] Theorem 1 Under the proposed sampling design, an unbiased estimator of the population total is ^, (1) network, given by w i = total is Note that ^ The population The expectation of ^ is The notation E 2 refers to the conditional expectation given the initial sample size n 0 E 1 is the unconditional expectation taken under all possible initial samples Under the initial sample, Greco Naddeo [5] showed that for given the initial sample size n 0, the selection with the inverse sampling is exactly the same as the selection procedure with stratified sampling The sample from two strata are independent the selection probability for the i-th unit in the classes C C are, respectively We have where, We obtain that Proof: Let w i represent the new value of a study variable of the i-th unit in the k-th

Chiang Mai J Sci 2013; 40(4) 739 Theorem 2 The variance of the estimator ^ for the parameter is (2) where In addition, the result, so we obtain Theorem 3 An unbiased estimator of the variance V( ^ ) is the variance of P^ V(P^ ) is for the parameter z C (3) Proof: The variance of ^ is equal to where,, The symbols E 2 V 2 denote the conditional expectation variance given the initial sample size n 0, respectively The notations E 1 V 1 represent the unconditional expectation variance taking over n 0, respectively When the initial sample size n 0 is given, we have Proof: Consider Therefore,

740 Chiang Mai J Sci 2013; 40(4) Conditioning on the initial sample size, we have Moreover, We obtain that 4 SIMULATION STUDY The ring-necked ducks data given by Smith et al [8] was used as the study population The population consists of N=200 units The number of ring-necked ducks is used as the study variable (y) Auxiliary variables (x s) correlated to the study variable are created with the coefficients of correlation equal to 01, 02, 05, 06, 08 09 For unequal probability inverse adaptive cluster sampling, the population units are selected by probabilities proportional to the auxiliary variable Simulations of sampling from the population were carried out to study the properties of the unequal probability inverse adaptive cluster sampling (UIACS) compared to the inverse adaptive cluster sampling (IACS) given by Christman Lan [3] We chose the condition {y:y>0} for dividing the units into class C or C The numbers of initial sample units satisfying the condition m in the initial sample are 2, 4, 6 8 The neighborhood of a unit is defined as the set of the four adjacent units The simulation consists of 50,000 replications The population total ( ) was estimated for each sample In each sampling design, the values of the estimates ( ^ ), the final sample size (n) were averaged The averages were interpreted as expected values, ie, Note that the variance of the estimator will be small when the value of y i* /z i * is closed to C for units in class C value of y i* /z i * is closed to C for units in class C If the selection probabilities are equal for every unit, the proposed sampling design is the inverse adaptive cluster sampling given by Christman Lan [3] In addition, the unbiased estimator of the parameter is equivalent to the expression given by [3] The estimate of the variance is where The estimate of stard error is equal to the squared root of the estimated variance,

Chiang Mai J Sci 2013; 40(4) 741 In the Table 1, the averaged values of the estimates are close to the true value of the population total ( =23,333) This result complies with Theory 1 In Table 2, the results indicate that the averaged values of the final sample size under UIACS are smaller than the values under IACS their values increase as m increase Under UIACS with given the number m, when the coefficients of correlation increase, the averaged values of the final sample sizes decrease Table 3 shows that the unequal probability inverse adaptive cluster sampling design outperforms the inverse adaptive cluster sampling design With given the number m, the estimates of stard errors of the estimators under UIACS decrease when the coefficients of correlation increase Table 1 The averages of estimates with vary numbers of initial sample units in class C m IACS UIACS = 01 = 02 = 05 = 06 = 08 = 09 2 23,39631 23,44459 23,51631 23,61257 23,66167 23,35422 23,42758 4 23,49137 23,23950 23,22215 23,09235 23,20613 23,22056 23,27166 6 23,44978 23,28025 23,40485 23,33390 23,25543 23,17689 23,22547 8 23,29932 23,19447 23,24325 23,21396 23,24299 23,29070 23,33418 Table 2 The averages of final sample sizes with vary numbers of initial sample units in class m IACS UIACS = 01 = 02 = 05 = 06 = 08 = 09 2 3547 3527 3498 3400 3362 3217 3090 4 7088 7052 7001 6817 6740 6467 6202 6 10644 10551 10473 10181 10070 9672 9258 8 14203 14086 13982 13606 13462 12909 12362 Table 3 The estimates of stard error of the estimators with vary numbers of initial sample units in class C m IACS UIACS = 01 = 02 = 05 = 06 = 08 = 09 2 72,37944 66,08123 61,37073 52,64163 50,49104 42,28685 37,43459 4 38,30430 35,45889 33,61988 28,27731 26,95717 23,03724 20,87693 6 29,41298 26,78938 25,78874 21,78725 20,54986 17,65388 15,66958 8 24,37467 22,48454 21,41435 18,24797 17,27490 14,87170 13,35546

742 Chiang Mai J Sci 2013; 40(4) 5 CONCLUSION An adaptive cluster sampling is an efficient sampling design for rare clustered population However, an initial sample in adaptive cluster sampling is commonly selected by fixed sample size design This paper proposed an unequal probability inverse sampling to draw the initial sample from the population The neighborhoods of sample units in the class of interest are added as in the adaptive cluster sampling An unbiased estimator of the population total an unbiased estimate of its variance are given The simulation study showed that the efficiency of the proposed sampling design depends on the coefficient of correlation between the study variable the auxiliary variable When the auxiliary variable is highly correlated with the study variable, the unequal probability inverse adaptive cluster sampling design is more efficient than the inverse adaptive cluster sampling design However, when the auxiliary variable is not appropriate for the study variable, an estimator of parameter by using the unequal probability inverse adaptive cluster sampling design may not always increase the efficiency ACKNOWLEDGEMENTS This work was supported by the Faculty of Science, Silpakorn University, Nakorn Pathom, Thail I am grateful to referees associate editors for their comments REFERENCES [1] Haldane JBS, On a method of estimating frequencies, Biometrika, 1945; 33: 222 225 [2] Finney DJ, On a method of estimating frequencies, Biometrika, 1949; 36(1): 223 234 [3] Chistman MC Lan F, Inverse adaptive cluster sampling, Biometrics, 2001; 57: 1096-1105 [4] Salehi MM Seber GAF, A new proof of Murthy s estimator with applies to sequential sampling, Aus NZ Stat, 2001; 43(3): 281-286 [5] Greco L Naddeo N, Inverse sampling with unequal selection probabilities, Communications in Statistics : Theory Methods, 2007; 36(5): 1039-1048 [6] Thompson SK, Adaptive cluster sampling, J Am Stat Assoc, 1990; 85: 1050-1059 [7] Salehi MM Seber GAF, A general inverse sampling scheme its application to adaptive cluster sampling, Aus NZ J Stat, 2004; 46(3): 483-494 [8] Smith DR, Conroy MJ Bralhage DH, Efficiency of adaptive cluster sampling for estimating density of wintering waterfowl, Biometrics, 1995; 51: 777-788