Steps in applying Probability Proportional to Size (PPS) and calculating Basic Probability Weights



Similar documents
SAMPLING & INFERENTIAL STATISTICS. Sampling is necessary to make inferences about a population.

INTRODUCTION TO SURVEY DATA ANALYSIS THROUGH STATISTICAL PACKAGES

Statistical & Technical Team

Chapter 8: Quantitative Sampling

8 9 +

Northumberland Knowledge

An Introduction to Basic Statistics and Probability

Chapter 7 Sampling (Reminder: Don t forget to utilize the concept maps and study questions as you study this and the other chapters.

Descriptive Methods Ch. 6 and 7

Introduction to Sampling. Dr. Safaa R. Amer. Overview. for Non-Statisticians. Part II. Part I. Sample Size. Introduction.

6.2 Normal distribution. Standard Normal Distribution:

Data Collection and Sampling OPRE 6301

Why Sample? Why not study everyone? Debate about Census vs. sampling

Optimization of sampling strata with the SamplingStrata package

STATISTICAL DATA ANALYSIS

Elementary Statistics

AP STATISTICS REVIEW (YMS Chapters 1-8)

Pr(X = x) = f(x) = λe λx

AP STATISTICS 2010 SCORING GUIDELINES

Normal distribution. ) 2 /2σ. 2π σ

COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared.

Need for Sampling. Very large populations Destructive testing Continuous production process

Annex 6 BEST PRACTICE EXAMPLES FOCUSING ON SAMPLE SIZE AND RELIABILITY CALCULATIONS AND SAMPLING FOR VALIDATION/VERIFICATION. (Version 01.

Adverse Impact Ratio for Females (0/ 1) = 0 (5/ 17) = Adverse impact as defined by the 4/5ths rule was not found in the above data.

Sampling and Sampling Distributions

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

Chapter 11 Introduction to Survey Sampling and Analysis Procedures

New SAS Procedures for Analysis of Sample Survey Data

MATH 140 Lab 4: Probability and the Standard Normal Distribution

Sampling. COUN 695 Experimental Design

SAMPLING METHODS IN SOCIAL RESEARCH

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Analysis of Regression Based on Sampling Weights in Complex Sample Survey: Data from the Korea Youth Risk Behavior Web- Based Survey

Confidence Intervals

Audit Sampling 101. BY: Christopher L. Mitchell, MBA, CIA, CISA, CCSA

Comparing Alternate Designs For A Multi-Domain Cluster Sample

CHAPTER 15 NOMINAL MEASURES OF CORRELATION: PHI, THE CONTINGENCY COEFFICIENT, AND CRAMER'S V

XI XI. Community Reinvestment Act Sampling Guidelines. Sampling Guidelines CRA. Introduction

430 Statistics and Financial Mathematics for Business

Stats on the TI 83 and TI 84 Calculator

THE JOINT HARMONISED EU PROGRAMME OF BUSINESS AND CONSUMER SURVEYS

As we saw in the previous chapter, statistical generalization requires a representative sample. Chapter 6. Sampling. Population or Universe

Wireless Sensor Networks Coverage Optimization based on Improved AFSA Algorithm

Chapter 3. Sampling. Sampling Methods

Chapter 4. Probability and Probability Distributions

Introduction Qualitative Data Collection Methods... 7 In depth interviews... 7 Observation methods... 8 Document review... 8 Focus groups...

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS

Accounting for complex survey design in modeling usual intake. Kevin W. Dodd, PhD National Cancer Institute

Sample Size and Power in Clinical Trials

2011 UK Census Coverage Assessment and Adjustment Methodology. Owen Abbott, Office for National Statistics, UK 1

Simple Linear Regression Inference

On Correlating Performance Metrics

Behavioral Entropy of a Cellular Phone User

Life Table Analysis using Weighted Survey Data

Complex Survey Design Using Stata

Elementary Statistics and Inference. Elementary Statistics and Inference. 17 Expected Value and Standard Error. 22S:025 or 7P:025.

University of Arkansas Libraries ArcGIS Desktop Tutorial. Section 2: Manipulating Display Parameters in ArcMap. Symbolizing Features and Rasters:

Binomial Probability Distribution

Appendix B Checklist for the Empirical Cycle

(Following Paper ID and Roll No. to be filled in your Answer Book) Roll No. METHODOLOGY

How To Collect Data From A Large Group

Simple Regression Theory II 2010 Samuel L. Baker

Regression with a Binary Dependent Variable

Selecting Research Participants

STAT/MATH 3379: Dr. Manage Chapter Assignment Chapter 1: The Nature of Statistics-Solutions

Research design and methods Part II. Dr Brian van Wyk POST-GRADUATE ENROLMENT AND THROUGHPUT

ENERGY ADVISORY COMMITTEE. Electricity Supply in Hong Kong: Establishment Survey

SAMPLING DISTRIBUTIONS

CHAPTER 14 ORDINAL MEASURES OF CORRELATION: SPEARMAN'S RHO AND GAMMA

Missing data in randomized controlled trials (RCTs) can

Robert L. Santos, Temple University

Chapter 19 Statistical analysis of survey data. Abstract

Package samplesize4surveys

Introduction to Quantitative Research Contact: tel

Majority of Americans Use Recycled Boxes When Moving and Take About Two Months to Unpack

1. Data and information

Multivariate Analysis of Health Survey Data

ESTIMATING THE DISTRIBUTION OF DEMAND USING BOUNDED SALES DATA

Chapter 1: The Nature of Probability and Statistics

Statistics 522: Sampling and Survey Techniques. Topic 5. Consider sampling children in an elementary school.

Household Survey Data Basics

Practical Database Design and Tuning

The End Point: Example of a Good Survey Report

QUANTITATIVE METHODS IN BUSINESS PROBLEM #2: INTRODUCTION TO SIMULATION

People have thought about, and defined, probability in different ways. important to note the consequences of the definition:

HANDOUT #2 - TYPES OF STATISTICAL STUDIES

INTERNATIONAL STANDARD ON AUDITING 530 AUDIT SAMPLING

Transcription:

Steps in applying Probability Proportional to Size (PPS) and calculating Basic Probability Weights First stage: PPS sampling larger clusters have bigger probability of being sampled Second stage: Sampling exactly the same number of individuals per cluster individuals in large clusters have smaller probability of being sampled Overall: Second stage compensates first stage, so that each individual in the population has the same probability of being sampled 1. Calculate the sample size for each strata. 2. Separate population data into strata. The following steps will have to be applied for each strata. 3. List the primary sampling units (Column A) and their population sizes (Column B). Each cluster has its own Cluster Population Size (a). 4. Calculate the cumulative sum of the population sizes (Column C). The Total Population (b) will be the last figure in Column C. 5. Determine the Number of Clusters (d) that will be sampled in each strata. 6. Determine the Number of Individuals to be sampled from each cluster (c). In order to ensure that all individuals in the population have the same probability of selection irrespective of the size of their cluster, the same number of individuals has to be sampled from each cluster. 7. Divide the total population by the number of clusters to be sampled, to get the Sampling Interval (SI). 1

8. Choose a random number between 1 and the SI. This is the Random Start (RS). The first cluster to be sampled contains this cumulative population (Column C). [Excel command =rand()*si] 9. Calculate the following series: RS; RS + SI; RS + 2SI;. RS+(d-1)*SI. 10. The clusters selected are those for which the cumulative population (Column C) contains one of the serial numbers calculated in item 8. Depending on the population size of the cluster, it is possible that big clusters will be sampled more than once. Mark the sampled clusters in another column (Column D). 11. Calculate for each of the sampled clusters the Probability of Each Cluster Being Sampled (Prob 1) (Column E). Prob 1= (a x d) b a= Cluster population b= Total Population d= Number of Clusters 12. Calculate for each of the sampled clusters the Probability of each individual being sampled in each cluster (Prob 2) (Column G). Prob 2= c / a a= Cluster population c= Number of individuals to be sampled in each cluster 13. Calculate the overall basic weight of an individual being sampled in the population. The basic weight is the inverse of the probability of selection. BW=1/(prob 1 * prob 2) 2

Example: Population 20000 in 30 clusters. Sample 3000 from 10 clusters using PPS. Calculate Prob. 1 = probability of selection for each sampled cluster, Calculate Prob. 2 = probability of selection for each individual in each of the sampled clusters, Calculate the overall weight = inverse of the probability of each individual being sampled in the population A B C D E F G H Cluster Size (a) Cumulative Clusters sum sampled Prob 1 Individuals Prob 2 Overall weight per cluster (c) 1 1028 1028 905 51% 300 29% 6.7 2 555 1583 3 390 1973 4 1309 3282 2905 65% 300 23% 6.7 5 698 3980 6 907 4887 7 432 5319 4905 22% 300 69% 6.7 8 897 6216 9 677 6893 10 501 7394 6905 25% 300 60% 6.7 11 867 8261 12 867 9128 8905 43% 300 35% 6.7 13 1002 10130 14 1094 11224 10905 55% 300 27% 6.7 15 668 11892 16 500 12392 17 835 13227 12905 42% 300 36% 6.7 18 396 13623 19 630 14253 20 483 14736 21 319 15055 14905 16% 300 94% 6.7 22 569 15624 23 987 16611 24 598 17209 16905 30% 300 50% 6.7 25 375 17584 26 387 17971 27 465 18436 28 751 19187 18905 38% 300 40% 6.7 29 365 19552 30 448 20000 (b) 3

Number of clusters (d) = 10 Sampling interval (SI) = Cumulative population (B) / Number clusters (D) = 20000/10 = 2000 Random Start (RS) = 905 Series numbers 1 RS= 905 6 RS+(5*SI)= 10905 2 RS+(1*SI)= 2905 7 RS+(6*SI)= 12905 3 RS+(2*SI)= 4905 8 RS+(7*SI)= 14905 4 RS+(3*SI)= 6905 9 RS+(8*SI)= 16905 5 RS+(4*SI)= 8905 10 RS+(9*SI)= 18905 Probability 1 = (a*d) / b Probability 2 = c / a Probability of selection for each sampled cluster Probability of selection for each individual in each of the sampled clusters Overall weight = 1 / (Prob1 * Prob2) a= number individuals in each cluster b=sum individuals in all clusters c=number individuals sampled per cluster d=number sampled clusters Inverse of the probability of each individual being sampled in the population 4

Definitions The sampling frame is the list of ultimate sampling units, which may be people, households, organizations, or other units of analysis. Random sampling is data collection in which every person in the population has a chance of being selected which is known in advance. Normally this is an equal chance of being selected. Random samples are always strongly preferred, as only random samples permit statistical inference. Probability proportion to size is a sampling procedure under which the probability of a unit being selected is proportional to the size of the ultimate unit, giving larger clusters a greater probability of selection and smaller clusters a lower probability. In order to ensure that all units (ex. individuals) in the population have the same probability of selection irrespective of the size of their cluster, each of the hierarchical levels prior to the ultimate level has to be sampled according to the size of ultimate units it contains, but the same number of units has to be sampled from each cluster at the last hierarchical level. This method also facilitates planning for field work because a pre-determined number of individuals is interviewed in each unit selected, and staff can be allocated accordingly It is most useful when the sampling units vary considerably in size because it assures that those in larger sites have the same probability of getting into the sample as those in smaller sites, and vice verse. The design effect (D) is a coefficient which reflects how sampling design affects the computation of significance levels compared to simple random sampling (discussed below). A design effect coefficient of 1.0 means the sampling design is equivalent to simple random sampling. A design effect greater than 1.0 means the sampling design reduces precision of estimate compared to simple random sampling (cluster sampling, for instance, reduces precision). A design effect less than 1.0 means the sampling design increases precision compared to simple random sampling (stratified sampling, for instance, increases precision). 5