ONS Methodology Working Paper Series No 4. Non-probability Survey Sampling in Official Statistics


Debbie Cooper and Matt Greenaway
June 2015

1. Introduction

Non-probability sampling is generally avoided in official statistics, often for good reasons. A lack of selection probabilities makes inference from the sample to the population extremely challenging. Quality measures such as standard errors are difficult or impossible to calculate. And the official statistics context, with a wide variety of users who may use the data for different purposes, does not fit well with non-probability methods. However, because of ever-increasing nonresponse rates, the costs associated with probability sampling and the ease of carrying out web surveys, some survey researchers have shifted their attention to developing better non-probability sampling and estimation techniques. As there have been numerous developments in the domain of non-probability sampling, this paper endeavours to raise awareness amongst producers of official statistics of the challenges and developments relating to non-probability sampling.

This paper aims to achieve four main outcomes:
i. Provide a concise review of the types of non-probability samples
ii. Highlight the key challenges associated with non-probability sampling
iii. Increase awareness of techniques available to potentially overcome these challenges
iv. Provide guidance to help inform decision-making on whether a non-probability sample is justified

In order to achieve these outcomes we first identify the characteristics of non-probability sampling and discuss the growing interest in it. This is followed by an overview of various types of non-probability sampling techniques. The key challenges associated with non-probability sampling are then highlighted. Next, techniques developed to overcome some of the challenges associated with non-probability sampling and estimation are discussed. This is followed by a section providing guidance on when the use of non-probability sampling is justified. Finally, a set of recommendations regarding the use of non-probability sampling in official statistics is provided.

2. What constitutes non-probability sampling?

Non-probability sampling has two distinguishing characteristics (Frankfort-Nachmias and Nachmias, 1996):
i. one cannot specify the probability of selection for each unit that will be included in the sample
ii. it is not possible to ensure that every unit in the population has a nonzero probability of inclusion

In probability sampling, the ability to calculate selection probabilities allows researchers to create design weights which result in an unbiased estimator. Probability samples also allow for representativeness, as each unit in the target population has a nonzero probability of selection, and they allow sampling variability to be estimated; these are crucial advantages. However, non-random nonresponse and undercoverage violate the assumptions of probability sampling, introducing a non-probability element into probability samples. Various methods have been developed to deal with coverage and nonresponse issues in probability sampling. These include using multiple sampling frames; adjusting the weights for nonresponse (and, if relevant, attrition) based on sample characteristics; and calibrating to target population totals in order to produce more representative estimates.

However, concerns about increasing nonresponse, coupled with the high costs associated with traditional probability sampling methods, have led some survey researchers to turn their attention to non-probability sampling. This growing interest also results from the fact that web data collection (most of which uses non-probability sampling) has become much easier to carry out. It is also much less costly than certain types of probability sampling. Sometimes non-probability sampling is used because there is no other option available to the researcher. This may be because the target population is a hidden population (so that no sampling frame is available) or because resources are limited.
Section 6 provides guidance on deciding whether the use of a non-probability sample is justified. If it is, it is essential to bear in mind the challenges associated with non-probability sampling (see Section 4) and to attempt to use techniques aimed at overcoming these challenges. Some of these techniques are described in Section 5.

3. Types of non-probability sampling

It is extremely difficult to categorise non-probability sampling techniques because there is a lot of inconsistency in the literature regarding the definitions and applications of the types of non-probability sampling methods. These blurred boundaries and differing interpretations should be borne in mind when interpreting the framework below, which attempts to identify the main categories of non-probability sampling.

Given the multitude of non-probability sampling techniques available, the aim of this section is not to provide a comprehensive review of them but rather to give a flavour of the types of techniques available. This will form the basis for discussing the challenges and limitations of non-probability sampling later on, as well as the methods that have been proposed to overcome some of these limitations. A review of the literature revealed four common categories for classifying non-probability sampling techniques; these are described below.

3.1 Convenience/accidental sampling

According to Baker et al. (2013), "Convenience sampling is a form of non-probability sampling in which the ease with which potential participants can be located or recruited is the primary consideration." Therefore, no formal sample design is used. Types of convenience sampling techniques include:
i. mall-intercept sampling: this is frequently used in market research and involves interviewers attempting to recruit passers-by to participate in a survey.
ii. volunteer sampling (e.g. some types of online opt-in panels): this consists of people signing up to participate in research studies. Volunteer sampling is usually done online, whereby volunteers are put on a mailing list and receive invitations to participate in surveys.

3.2 Purposive sampling

This consists of the researcher using their judgement and approaching only those people whom they decide are most appropriate to participate in the study, e.g. a sample of experts on a particular topic.

3.3 Sample matching

This involves selecting a sample that matches a set of population characteristics of interest (rather than bringing the sample and population into alignment after carrying out the survey, as is done with post-stratification). The most common type of sample matching is quota sampling (described in further detail in Section 5.1.1).
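For contrast with sample matching, the post-stratification mentioned above can be sketched as follows. This is a minimal illustration, not the procedure of any particular survey: the age cells, population counts and survey values are invented. Each respondent in cell h receives the weight N_h / n_h (population count over sample count in that cell), so that the weighted cell counts reproduce the assumed population totals after the survey has been carried out.

```python
from collections import Counter

def poststratification_weights(sample_cells, population_totals):
    """Return one weight per respondent: N_h / n_h for the respondent's cell h.

    sample_cells: list giving the post-stratum (cell) of each respondent.
    population_totals: dict mapping each cell to its known population count N_h.
    """
    n_h = Counter(sample_cells)
    empty = set(population_totals) - set(n_h)
    if empty:
        # A cell with no respondents cannot be weighted up; cells would need collapsing.
        raise ValueError(f"no respondents in cells: {sorted(empty)}")
    return [population_totals[c] / n_h[c] for c in sample_cells]

# Hypothetical sample that over-represents people aged 45+.
cells = ["16-44"] * 2 + ["45+"] * 8
y = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]   # 1 = has some characteristic of interest
N = {"16-44": 600, "45+": 400}       # assumed population counts

w = poststratification_weights(cells, N)
weighted_total = sum(wi * yi for wi, yi in zip(w, y))
print(sum(w))                    # 1000.0: weights sum to the population size
print(weighted_total / sum(w))   # 0.6, versus an unweighted mean of 0.7
```

Post-stratification only corrects the sample's composition on the variables used to form the cells; if respondents within a cell differ from non-respondents on the variable of interest, the weighted estimate is still biased.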

3.4 Chain referral methods

These tend to be used for researching rare or hard-to-reach populations. They usually involve obtaining an initial set of respondents (called seeds) from the population of interest and using their links to obtain further respondents from that population. Types of chain referral sampling include:
i. snowball sampling: there is a lot of confusion regarding the meaning of snowball sampling. In many texts it is described as a non-probability convenience method used to access hard-to-reach populations, whereby respondents from hidden populations are asked to recommend other respondents from the population of interest. However, this method was originally developed by researchers such as Coleman (1958) and Goodman (1961) to investigate social networks rather than as a means of finding participants to interview (Vogt et al., 2012).
ii. respondent-driven sampling (RDS): in response to the use of snowball sampling as a type of convenience sample, survey researchers focused on developing chain referral methods which could be used to produce good estimates. RDS refers to this method (Heckathorn, 2011). As RDS uses a more structured approach to sampling, convenience is not the primary consideration of this type of sampling. RDS is described in further detail in Section 5.1.2.

4. Key challenges associated with non-probability sampling

The two main concerns when using non-probability sampling are:
i. There is a greater likelihood of selection bias. Consequently, the resulting sample may not be representative of the population.
ii. It is generally impossible to use unbiased estimators and the associated quality measures (e.g. variance, standard errors and confidence intervals). (Some researchers have focused on developing unbiased estimators for use with RDS. However, these require a number of assumptions to be made and should be used with caution. See Section 5.2.1.)

These two concerns are described in further detail in Sections 4.1 and 4.2 below.

4.1 Selection bias

One of the key challenges when using non-probability sampling is selection bias. Selection bias is "the error introduced when the study population does not represent the target population" (Delgado-Rodriguez and Llorca, 2004). Selection bias occurs during the recruitment and retention of participants, and the most effective way of avoiding such bias is to have a well-designed study.
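To make this definition concrete, the following minimal simulation (all numbers are invented for illustration) draws a synthetic population in which about 30% of people have some characteristic, then lets the chance of taking part depend on that characteristic, as can happen with volunteer samples. Under the assumed participation probabilities (3% for people with the characteristic, 1% for those without), the expected share in the volunteer sample is 0.009 / (0.009 + 0.007), or about 0.56, so the volunteer sample overstates the prevalence, whereas a simple random sample of the same size does not.

```python
import random
import statistics

random.seed(1)

# Synthetic population of 100,000 people: 1 = has the characteristic, 0 = does not.
population = [1 if random.random() < 0.3 else 0 for _ in range(100_000)]
true_mean = statistics.mean(population)

def volunteer_sample(pop, p_if_1=0.03, p_if_0=0.01):
    """Each person volunteers independently; people with the characteristic
    are (hypothetically) three times as likely to volunteer."""
    return [y for y in pop if random.random() < (p_if_1 if y == 1 else p_if_0)]

vol = volunteer_sample(population)
srs = random.sample(population, len(vol))  # simple random sample of the same size

print(f"population mean: {true_mean:.3f}")
print(f"volunteer mean:  {statistics.mean(vol):.3f}")
print(f"SRS mean:        {statistics.mean(srs):.3f}")
```

Increasing the size of the volunteer sample would not remove this bias, because it arises from who participates rather than from how many.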

Selection bias occurs in both probability and non-probability sampling; however, non-probability sampling is more prone to it. Below is a (non-exhaustive) list of causes of selection bias:
i. undercoverage: this occurs when some units in the target population have a zero probability of selection, making the sample unrepresentative of the population
ii. volunteer bias: many non-probability sampling techniques rely on units volunteering to participate in a study, and since volunteers may have different characteristics from those who have not volunteered, this may result in an unrepresentative sample
iii. interviewer/researcher unconscious bias: unconscious biases may incline interviewers/researchers to select participants with particular characteristics, e.g. people who look friendly or helpful, or people who are more similar to themselves. This is a particular problem with quota and purposive sampling, where the selection of participants is left to the interviewer/researcher.

In all of the cases above, sampled individuals may differ systematically from non-sampled individuals on variables of interest, so using the non-probability sample may result in biased estimates. Of the three causes of selection bias listed above, undercoverage may also be an issue in probability sampling. However, it is not usually as extensive or common in probability sampling as it is in non-probability sampling.

4.2 Unbiased estimators and lack of quality measures

Standard practice in official statistics, and indeed in most large-scale social surveys, is to use probability sampling and design-based estimation, whereby a design weight is calculated as the inverse of the selection probability. This produces the Horvitz-Thompson estimator, which is unbiased for any design where all units have a non-zero probability of selection.
Frequently, additional auxiliary information is used to adjust these design weights, which technically makes the estimator model-assisted, although the term design-based is often still used whenever the estimator accounts for the survey design. This methodology has a number of advantages: the resulting estimator is unbiased regardless of the purpose for which it is used, and sampling variability can be estimated directly. Since design-based estimation is not suitable for most non-probability samples, these advantages are lost if a non-probability sample is used.
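As a minimal sketch of the design-based idea described above (the population values and inclusion probabilities are invented), the Horvitz-Thompson estimator of a population total weights each sampled value by the inverse of its inclusion probability, $\hat{Y}_{HT} = \sum_{i \in s} y_i / \pi_i$. The example assumes Poisson sampling, where each unit is included independently with probability $\pi_i$, so that every possible sample and its probability can be enumerated. The probability-weighted average of the estimates over all samples equals the true total, which is what design-unbiasedness means.

```python
from fractions import Fraction
from itertools import product

def horvitz_thompson_total(sample, y, pi):
    """HT estimate of the population total: sum of y_i / pi_i over sampled units."""
    return sum(Fraction(y[i]) / pi[i] for i in sample)

# Hypothetical population of four units with unequal inclusion probabilities.
y = [10, 20, 30, 40]
pi = [Fraction(1, 2), Fraction(1, 4), Fraction(3, 4), Fraction(1, 2)]

# Poisson sampling: enumerate all 2^N possible samples and their probabilities,
# and accumulate the expected value of the HT estimator exactly.
expected = Fraction(0)
for included in product([False, True], repeat=len(y)):
    p_sample = Fraction(1)
    for inc, p in zip(included, pi):
        p_sample *= p if inc else 1 - p
    sample = [i for i, inc in enumerate(included) if inc]
    expected += p_sample * horvitz_thompson_total(sample, y, pi)

print(expected, sum(y))   # 100 100: the estimator is design-unbiased
```

The result relies on every $\pi_i$ being known and non-zero; with a non-probability sample neither condition can be guaranteed, which is why this property is lost.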

5. Types of sampling and estimation methods developed to overcome issues associated with non-probability sampling

Recently, researchers have focused on developing methods for overcoming the challenges relating to non-probability sampling described above. These methods focus on both the sample selection and the estimation stages. Some of them are described in Sections 5.1 and 5.2 below.

5.1 Overcoming challenges at the sampling stage

When using non-probability sampling, the main challenge at the sampling stage is obtaining a representative sample. Two popular non-probability sampling strategies developed to obtain a more representative sample are sample matching and respondent-driven sampling (RDS).

5.1.1 Sample matching

As described in Section 3.3, the most common type of sample matching is quota sampling. In quota sampling, interviewers are asked to interview a certain number of people (or units) with particular characteristics so that the final sample mirrors the target population in terms of these characteristics. For this to be successful, good estimates of the population characteristics used for matching need to be available (e.g. from a good-quality probability sample or a census). By using quota sampling, researchers hope to achieve a more representative sample (for further details of sample matching see Rivers, 2007; Bethlehem, 2014). However, since the choice of whom to interview is still in the hands of the interviewer, there may still be a substantial amount of selection bias resulting from interviewers approaching certain people over others (because of the unconscious bias described earlier). For example, in their review of the 1948 United States election poll results, Mosteller et al. (1949) suggest that the unconscious bias of interviewers may have contributed considerably to the incorrect prediction of the results, even though quota sampling was used.
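The mechanics of setting and filling quotas can be sketched as follows. This is a minimal illustration under stated assumptions: the population shares are invented (in practice they would come from a census or a good-quality probability survey), largest-remainder rounding is one arbitrary choice for turning shares into whole-number quotas, and the order in which people are approached is made up. The sketch only controls how many people are interviewed in each cell; it says nothing about which people within a cell are approached, which is exactly where interviewer selection bias enters.

```python
import math

def set_quotas(population_shares, sample_size):
    """Allocate sample_size across cells in proportion to population shares,
    using largest-remainder rounding so the quotas sum exactly to sample_size."""
    raw = {cell: share * sample_size for cell, share in population_shares.items()}
    quotas = {cell: math.floor(v) for cell, v in raw.items()}
    shortfall = sample_size - sum(quotas.values())
    # Give the remaining places to the cells with the largest fractional parts.
    for cell in sorted(raw, key=lambda c: raw[c] - quotas[c], reverse=True)[:shortfall]:
        quotas[cell] += 1
    return quotas

def fill_quotas(approached, quotas):
    """Accept approached people in order until every cell's quota is full."""
    remaining = dict(quotas)
    interviewed = []
    for person_id, cell in approached:
        if remaining.get(cell, 0) > 0:
            interviewed.append(person_id)
            remaining[cell] -= 1
        if not any(remaining.values()):
            break
    return interviewed, remaining

# Hypothetical population shares and a hypothetical stream of passers-by.
shares = {"male 16-44": 0.24, "male 45+": 0.25, "female 16-44": 0.24, "female 45+": 0.27}
quotas = set_quotas(shares, 10)
stream = ["female 45+", "male 45+", "female 45+", "male 16-44", "female 45+", "female 45+",
          "male 45+", "female 16-44", "male 45+", "male 45+", "male 16-44", "female 16-44"]
approached = list(enumerate(stream))

interviewed, remaining = fill_quotas(approached, quotas)
print(quotas)        # {'male 16-44': 2, 'male 45+': 3, 'female 16-44': 2, 'female 45+': 3}
print(interviewed)   # [0, 1, 2, 3, 4, 6, 7, 8, 10, 11]
```

People 5 and 9 are turned away only because their cells are already full, not because of anything measured about them; the representativeness of the final sample therefore depends entirely on who happened to be approached.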
Another problem with quota sampling can be undercoverage. For example, a quota sample collected on the High Street will not capture people who are at home or at work. Consequently, quota sampling alone is not sufficient for obtaining a representative sample. In fact, Rubin (1979) recommended using both sample matching and weighting in order to obtain more accurate estimates. Various estimation methods for non-probability sampling are discussed in Section 5.2.

5.1.2 Respondent-driven sampling (RDS)

This type of sampling is mainly aimed at sampling hidden (or hard-to-reach) populations. It is a type of chain referral sampling that uses link-tracing to obtain respondents from the target population, and it is typically used when a sampling frame is not available.
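The wave structure of RDS, described in detail below, can be illustrated with a small simulation. This is a toy sketch under explicit assumptions, not a field protocol: the social network is a random graph generated for the example, each respondent is (hypothetically) given three coupons, and each coupon is assumed to be redeemed with probability 0.5. Seeds form Wave 0, and each later wave consists of people recruited by members of the previous wave, until the target sample size is reached or recruitment dies out. The function names and parameters are illustrative only.

```python
import random
from collections import Counter

def simulate_rds(network, seeds, coupons_per_person=3, p_redeem=0.5,
                 target_size=50, rng=None):
    """Toy RDS recruitment; returns ({person: wave}, {recruit: recruiter})."""
    rng = rng or random.Random()
    wave_of = {s: 0 for s in seeds}      # Wave 0: convenience sample of seeds
    recruiter_of = {}
    current_wave, wave = list(seeds), 0
    while current_wave and len(wave_of) < target_size:
        wave += 1
        next_wave = []
        for person in current_wave:
            # Coupons can only go to contacts who have not already taken part.
            eligible = sorted(c for c in network[person] if c not in wave_of)
            for contact in rng.sample(eligible, min(coupons_per_person, len(eligible))):
                if len(wave_of) >= target_size:
                    break
                if rng.random() < p_redeem:   # contact takes the coupon to the survey centre
                    wave_of[contact] = wave
                    recruiter_of[contact] = person
                    next_wave.append(contact)
        current_wave = next_wave
    return wave_of, recruiter_of

# Hypothetical hidden population of 500 people, each linked to roughly 12 others.
rng = random.Random(42)
network = {p: set() for p in range(500)}
for p in range(500):
    for q in rng.sample(range(500), 6):
        if q != p:
            network[p].add(q)
            network[q].add(p)

wave_of, recruiter_of = simulate_rds(network, seeds=[0, 1, 2], rng=rng)
print(len(wave_of), "respondents")
print(sorted(Counter(wave_of.values()).items()))   # (wave, number recruited) pairs
```

Recording who recruited whom (`recruiter_of`) is what allows the recruitment chains to be traced afterwards, which is the information RDS estimators draw on.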

RDS consists of two distinct sampling phases. In the first phase, a convenience sample (the seeds at Wave 0) is chosen from the target population. The rest of the sample (Waves 1 onwards) is selected by following the links from previous respondents. This method, developed by Heckathorn (1997), uses an innovative approach for recruiting participants after Wave 0: respondents are given a fixed number of coupons to hand out to other members of the target population, and people who decide to participate in the survey simply take the coupon to the survey centre. Therefore, after Wave 0, each successive wave of the sample consists of population members who are given coupons by members of the previous wave and return those coupons to the survey centre. This process is repeated several times (until the desired sample size is achieved), so that each time respondents from one wave drive the following wave (Gile and Handcock, 2010). Using coupons in this way reduces confidentiality concerns in marginalised populations. It also enables the researcher to track social networks for use in estimation. Respondents usually receive additional compensation for each successful recruitment. Respondents handing out coupons are asked to report how many coupons they have distributed, which enables researchers to develop more accurate estimation methods (refer to Heckathorn, 2011 for a description of the various estimators available using RDS). Moreover, the numerous waves in the study reduce the dependence of the final sample on the original convenience sample (Gile and Handcock, 2010).

5.2 Overcoming challenges at the weighting and estimation stage

As described in Section 4.2, it is difficult to obtain unbiased estimates and calculate traditional quality measures when using non-probability sampling. In order to overcome these difficulties at the weighting and estimation stage, researchers have developed a number of methods for use with various sampling techniques.
This section focuses on a number of these methods. The aim is not to provide instructions on how to apply them, but rather to make the reader aware of the weighting and estimation methods that exist when using non-probability sampling. This will enable readers to make better-informed decisions regarding the type of non-probability sampling method that would best suit their needs.

Section 4.2 outlined why non-probability sampling typically rules out the traditional design-based approach to estimation. The main alternative is model-based estimation, where selection probabilities are not accounted for and the estimator is based on an (explicit or implicit) model. The precise specification of this estimator will depend on the type of sample and on the type and purpose of the estimate required. Some of the methods below propose the use of model-based estimation. In general, two points are important. First, a model-based estimator can produce inaccurate or misleading results if the underlying model is incorrect, which is often impossible to verify. Second, there are typically a number of models or estimation methods to choose from, each of which may produce different estimates. In contrast, a design-based estimator will not (usually) produce biased estimates, is appropriate in most situations, and is (broadly) unique: there is only one Horvitz-Thompson estimator for a given design. For this reason, model-based estimators should be treated with caution, and users should be aware that the results could always be open to dispute.

5.2.1 Available estimators and methods for calculating quality measures when using RDS

i. Estimators

With reference to non-probability sampling, Salganik (2006, p.i98) stated: "For many years, researchers thought it was impossible to make unbiased estimates from this type of sample. However, it was recently shown that if certain conditions are met and if the appropriate procedures are used, then the prevalence estimates from respondent-driven sampling are asymptotically unbiased." Heckathorn (2011) provides a description of RDS and the various estimators available when using this sampling approach, and sets out the strengths and limitations of these estimators. It is essential to bear in mind that these estimators require a number of assumptions to be made: Gile and Handcock (2010) caution users of RDS that biased estimates may be produced when these assumptions are not met.

In a separate research strand using link-tracing designs, Chow and Thompson (2003) proposed a Bayesian approach to estimation. They state that when prior information is available for the characteristics one wants to estimate, their Bayesian approach should provide better estimators than when no prior information is used. When such prior information is not available, Chow and Thompson (2003) suggest conducting a sensitivity analysis.

ii. Quality measures

In terms of quality measures available when using RDS, Salganik (2006) proposes a bootstrap method to construct confidence intervals around estimates produced from RDS samples. Furthermore, following the calculation of design effects for his data, he provides advice regarding the sample sizes required for RDS studies.
He recommends that researchers using RDS should use a sample size twice as large as that required under simple random sampling. For link-tracing designs using the Bayesian approach described above, Chow and Thompson (2003) describe how, once the estimators have been obtained, little additional effort is required to obtain interval estimates with which to assess the accuracy of the estimators. Credibility intervals for use with the Bayesian approach are described in further detail in Section 5.2.3.

RDS is not suitable for all research and is generally used for researching hidden populations. When RDS is not a suitable non-probability sampling method, other estimators and quality measures are required. The use of Propensity Score Adjustments, described next, is one alternative.
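Salganik's (2006) recommendation to double the simple random sampling size amounts to assuming a design effect of about 2. As a worked sketch (the target margin of error, the assumed prevalence and the 95% confidence level are hypothetical inputs), the simple random sampling size needed to estimate a proportion p to within a margin e is n = z²p(1 - p)/e², ignoring the finite population correction, and the RDS sample size is then the design effect multiplied by that figure.

```python
import math

def srs_sample_size(p, margin, z=1.96):
    """SRS sample size for estimating a proportion p to within +/- margin."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

def rds_sample_size(p, margin, design_effect=2.0, z=1.96):
    """Inflate the SRS size by an assumed design effect (about 2, per Salganik, 2006)."""
    return math.ceil(design_effect * srs_sample_size(p, margin, z))

print(srs_sample_size(0.5, 0.05))   # 385
print(rds_sample_size(0.5, 0.05))   # 770
```

Using p = 0.5 gives the most conservative (largest) sample size when the prevalence is unknown; the design effect of 2 is a rule of thumb, and the actual design effect in a given RDS study may be larger or smaller.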

5.2.2 Using weighting for estimation

We have already outlined why design-based estimation is not possible with non-probability sampling. However, survey researchers have proposed the use of Propensity Score Adjustments (PSA) to approximate a design-based approach. PSA has largely been used and tested on web panel surveys. There are various methods for using PSA. One of these, outlined by Valliant and Dever (2011), involves constructing pseudo design weights and using covariates from a reference (probability) survey to adjust these weights for nonresponse. These adjusted weights are then used to construct estimators. To construct the pseudo design weights in the first place, Valliant and Dever (2011) propose that, if a subsample from a large panel is used, the pseudo design weights could be calculated as the inverses of the selection probabilities from the panel. For a comparison of the quality of estimators using PSA, see Valliant and Dever (2011).

Lee (2006) calculated the bias of estimates when using PSA (by comparing PSA-weighted and unweighted estimates with the reference survey estimates), as well as their standard errors. He found that although PSA seems to reduce bias resulting from nonresponse, it seems to increase variance. This should be borne in mind when using PSA techniques. Lee (2006) also recommends that covariates which are highly related to the study outcomes should be used in the PSA. For further detail on using PSA for weighting and estimation see Lee and Valliant (2009) and Lee (2006).

5.2.3 Additional quality measures proposed for non-probability sampling

Some quality measures for estimators calculated from non-probability samples have been discussed in the previous sections. A number of other quality measures for use with non-probability sampling have been proposed, including credibility intervals and participation rates. These are briefly described below.

i. Credibility intervals seem to be gaining popularity when using online opt-in panels. They should be used when a Bayesian approach is adopted. A credibility interval is similar to a confidence interval in that it is used to provide an indication of the uncertainty of estimates. In practice it tends to be calculated in exactly the same way as a confidence interval. However, the interpretation of a credibility interval is different from that of a confidence interval (Gill, 2014). Unlike the confidence interval, the credibility interval is directly related to the actual data distribution. Consequently, the interval may or may not include the estimate (e.g. the mean), depending on whether the actual data distribution is skewed. Therefore, unlike the confidence interval, the credibility interval is not an interval around the mean (United States Environmental Protection Agency, 2003). Moreover, "a Bayesian credible interval has a precise probabilistic meaning" (United States Environmental Protection Agency, 2003, p.33), so that, for example, a 90% credibility interval would be interpreted as there being a 90% probability that the true value lies within the interval. For further information on interpreting credibility intervals see AAPOR (2012) and United States Environmental Protection Agency (2003).

ii. Unweighted probability survey response rates are calculated as:

$$\text{Response rate} = \frac{\text{number of eligible sample units that responded}}{\text{total number of eligible units in the sample}}$$

Since the total number of eligible units is generally not known in non-probability samples, it is not possible to calculate response rates. Consequently, some researchers have started using the term participation rates for non-probability samples. Participation rates (Baker et al., 2013) can be defined as:

$$\text{Participation rate} = \frac{\text{number of respondents who provided a usable response}}{\text{total number of initial personal invitations requesting participation}}$$

Baker et al. (2013) state that it is essential for researchers to report on the quality of their estimates so that readers can use their results appropriately. Unfortunately, there is not currently a widely accepted framework for assessing the quality of estimates resulting from non-probability samples, as there is for estimates produced from probability samples. Therefore, Baker et al. (2013) encourage the development of new quality measures for use with non-probability sampling. They also note the importance of using different terminology for quality measures associated with non-probability sampling in order to differentiate them from the quality measures associated with probability sampling.

6. When is use of non-probability sampling justified?

In designing a study one must consider fitness for purpose. This is a well-known concept in probability sampling too, as some compromise always has to be struck between cost and precision (or minimisation of error). Groves (2004, p.10) emphasises the difference between what he refers to as "modellers" and "describers". Modellers are researchers from psychometric or econometric backgrounds who are mainly interested in relationships between variables. Describers, on the other hand, are researchers who are mainly concerned with describing the target population, e.g. in terms of means and totals. These include producers of official statistics.
Groves (2004) highlights the fact that, because of their differing research aims, modellers and describers are interested in different types of errors. As a result, they tend to use different sampling techniques. This is important in terms of fitness for purpose because there is no single correct survey method or survey sampling technique. Nor is there a single correct level of accuracy that should be achieved when carrying out a survey. These considerations should be made within the context of the study being carried out, bearing in mind its aims. For example, modellers tend to be interested in a narrower range of variables than describers (who tend to conduct large surveys with hundreds of variables); therefore certain techniques, such as PSA, lend themselves better to modellers' needs. In the case of PSA, this is because it was found that unless the covariates included in the analysis are highly related to the study outcomes (i.e. the variables which will be used to produce estimates), the resulting estimates have similar bias and increased variance compared with estimates produced from the reference survey (Lee, 2006).

It follows that it is not possible to make sweeping statements regarding the utility of non-probability sampling techniques in official statistics. Decisions as to whether to use probability or non-probability sampling boil down to what the researcher is hoping to achieve from the survey. Since in official statistics we are mainly (although by no means exclusively) describers, we need to consider the implications of using non-probability sampling where a large number of variables are collected and used to estimate a fairly wide variety of characteristics of finite populations. As discussed in previous sections, problems such as selection bias and the need for model-based estimation make non-probability sampling much less desirable than probability sampling in this context. In addition, the fact that a wide variety of estimates may be produced makes it extremely challenging to ensure that all estimates are fit for purpose.

However, sometimes the researcher has no other feasible option, for example when attempting to describe characteristics of a hidden population for which there is no sampling frame. In such a case it is essential to consider the outcomes one hopes to achieve in order to design a study that will achieve the best possible outcomes despite the associated challenges. In particular, we advise limiting the uses to which the estimates can be put in order to ensure that all estimates are fit for purpose, although this is challenging in an official statistics context where statistics must be publicly available.
At the very least, it is crucial to ensure that any quality issues stemming from the choice of sampling method are communicated clearly to users. One aim of this paper was to make researchers aware of the various non-probability sampling and estimation techniques available. These should be carefully considered if a decision to use a non-probability sample is made. For example, with hidden populations RDS may be a suitable option, as much more effort has gone into developing unbiased estimators and good quality measures for RDS than for other sampling types such as convenience sampling. Whether using probability or non-probability sampling, it is the responsibility of researchers to consider carefully the most suitable option, to state clearly the reasoning behind their choice of sampling and estimation techniques, and to make every effort to describe clearly the quality of the resulting estimates, thus ensuring compliance as far as possible with the UK Statistics Authority (2009) Code of Practice for Official Statistics.

7. Recommendations

Following a review of the literature on non-probability sampling, we make three main recommendations regarding the use of non-probability sampling in official statistics:
i. Fitness for purpose should be used to drive survey design.
ii. Non-probability sampling does not necessarily equate to a lack of quality, and the various methods available should be carefully considered in order to obtain the best quality estimates for the study at hand.
iii. It is essential to be transparent regarding the choice of sampling and estimation techniques, describing the quality of the resulting estimates as well as their limitations.

References

AAPOR (2012) Understanding a credibility interval and how it differs from the margin of sampling error in a public opinion poll. [online] Available at: atementoncredibilityintervals.pdf [Accessed 7th May 2015].
Baker, R., Brick, J.M., Bates, N.A., Battaglia, M., Couper, M.P., Dever, J.A., Gile, K.J. and Tourangeau, R. (2013) Report of the AAPOR Task Force on Non-Probability Sampling. [online] Available at: Final_7_revised_FNL_6_22_13.pdf [Accessed 7th May 2015].
Bethlehem, J. (2014) Solving the nonresponse problem with sample matching? Statistics Netherlands Discussion Paper. [online] [Accessed 7th May 2015].
Chow, M. and Thompson, S.K. (2003) Estimation with link-tracing sampling designs: a Bayesian approach. Survey Methodology, 29(2).
Coleman, J.S. (1958) Relational Analysis: The Study of Social Organizations with Survey Methods. Human Organization, 17(4).
Delgado-Rodriguez, M. and Llorca, J. (2004) Bias. Journal of Epidemiology & Community Health, 58.
Frankfort-Nachmias, C. and Nachmias, D. (1996) Research Methods in the Social Sciences. New York: Worth Publishers.
Gile, K.J. and Handcock, M.S. (2010) Respondent-Driven Sampling: An Assessment of Current Methodology. Sociological Methodology, 40(1).
Gill, J. (2014) Bayesian Methods: A Social and Behavioral Sciences Approach. CRC Press.
Goodman, L.A. (1961) Snowball Sampling. Annals of Mathematical Statistics, 32.
Groves, R.M. (2004) Survey Errors and Survey Costs. New Jersey: Wiley & Sons.
Heckathorn, D.D. (1997) Respondent-driven sampling: A new approach to the study of hidden populations. Social Problems, 44(2).
Heckathorn, D.D. (2011) Snowball versus Respondent-Driven Sampling. Sociological Methodology, 41(1).
Johnston, L.G. and Sabin, K. (2010) Sampling hard-to-reach populations with respondent driven sampling. Methodological Innovations Online, 5(2).

Lee, S. (2006) Propensity Score Adjustment as a Weighting Scheme for Volunteer Panel Web Surveys. Journal of Official Statistics, 22(2).
Mosteller, F., Hyman, H., McCarthy, P., Marks, E. and Truman, D. (1949) The Pre-Election Polls of 1948: Report to the Committee on Analysis of Pre-election Polls and Forecasts. New York: Social Science Research Council.
Rivers, D. (2007) Sampling for Web Surveys. [online] [Accessed 7th May 2015].
Rubin, D.B. (1979) Using Multivariate Matched Sampling and Regression Adjustment to Control Bias in Observational Studies. Journal of the American Statistical Association, 74(366). [online] [Accessed 7th May 2015].
Salganik, M.J. (2006) Variance Estimation, Design Effects, and Sample Size Calculations for Respondent-Driven Sampling. Journal of Urban Health: Bulletin of the New York Academy of Medicine, 83(7), pp.i98-i112.
Salganik, M.J. and Heckathorn, D.D. (2004) Sampling and Estimation in Hidden Populations Using Respondent-Driven Sampling. Sociological Methodology, 34.
UK Statistics Authority (2009) Code of Practice for Official Statistics. Edition 1.0. [online] [Accessed 7th May 2015].
United States Environmental Protection Agency (2003) Occurrence Estimation Methodology and Occurrence Findings Report for the Six-Year Review of Existing National Primary Drinking Water Regulations. [online] Available at: df [Accessed 18th June 2015].
Valliant, R. and Dever, J.A. (2011) Estimating Propensity Adjustments for Volunteer Web Surveys. Sociological Methods & Research, 40(1).
Vogt, W.P., Gardner, D.C. and Haeffele, L.M. (2012) When to Use What Research Design. Guilford Press.