Inferential Problems with Nonprobability Samples



Similar documents
The Youth Vote in 2012 CIRCLE Staff May 10, 2013

Reflections on Probability vs Nonprobability Sampling

ONS Methodology Working Paper Series No 4. Non-probability Survey Sampling in Official Statistics

Elementary Statistics

RECOMMENDED CITATION: Pew Research Center, January, 2016, Republican Primary Voters: More Conservative than GOP General Election Voters

USUAL WEEKLY EARNINGS OF WAGE AND SALARY WORKERS FIRST QUARTER 2015

COLLEGE ENROLLMENT AND WORK ACTIVITY OF 2014 HIGH SCHOOL GRADUATES

Educational Attainment of Veterans: 2000 to 2009

[Class Survey for Statistics/Sociology/CSSS 221]

Hoover Institution Golden State Poll Fieldwork by YouGov October 3-17, List of Tables. 1. Family finances over the last year...

Appendix I: Methodology

Selected Socio-Economic Data. Baker County, Florida

THE FIELD POLL. By Mark DiCamillo, Director, The Field Poll

The Presidential Election, Same-Sex Marriage, and the Economy May 11-13, 2012

Clinton Leads Sanders by 29%

Distribution of Household Wealth in the U.S.: 2000 to 2011

Explorations in Non-Probability Sampling Using the Web

THE FIELD POLL. By Mark DiCamillo, Director, The Field Poll

In 2013, 75.9 million workers age 16 and older in the. Characteristics of Minimum Wage Workers, Highlights CONTENTS

A Basic Introduction to Missing Data

MARYLAND: CLINTON LEADS SANDERS BY 25

Chapter 19 Statistical analysis of survey data. Abstract

STATISTICAL BRIEF #87

Educational Attainment in the United States: 2015

Non-response Bias in the 2013 CPS ASEC Content Test

FLORIDA: TRUMP WIDENS LEAD OVER RUBIO

Business Cycles and Divorce: Evidence from Microdata *

THE CNN / WMUR NH PRIMARY POLL THE UNIVERSITY OF NEW HAMPSHIRE

Continued Majority Support for Death Penalty

Kaiser Family Foundation/New York Times Survey of Chicago Residents

Health Status, Health Insurance, and Medical Services Utilization: 2010 Household Economic Studies

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

Module 14: Missing Data Stata Practical

OHIO: KASICH, TRUMP IN GOP SQUEAKER; CLINTON LEADS IN DEM RACE

Real Mean and Median Income, Families and Individuals, , and Households, (Reported in $2012).

SAMPLING DISTRIBUTIONS

Mind on Statistics. Chapter 4

U.S. Population Projections: 2012 to 2060

THE FIELD POLL. By Mark DiCamillo, Director, The Field Poll

2003 National Survey of College Graduates Nonresponse Bias Analysis 1

How the Survey was Conducted Nature of the Sample: NBC News/Marist New Hampshire Poll of 1,037 Adults

Retirement Readiness in New York City: Trends in Plan Sponsorship, Participation and Income Security

Allen Elementary School

Using Proxy Measures of the Survey Variables in Post-Survey Adjustments in a Transportation Survey

Voting and Political Demography in 1996

Association Between Variables

A General Approach to Variance Estimation under Imputation for Missing Survey Data

How To Calculate Health Insurance Coverage In The United States

Employment-Based Health Insurance: 2010

THE FIELD POLL. By Mark DiCamillo, Director, The Field Poll

America Is Changing. National Conference of State Legislatures. August 15, 2013 Atlanta, GA

AP Statistics Chapters Practice Problems MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Do Introductory Sentences Cause Acquiescence Response Bias in Survey Questions? Jon A. Krosnick. Ana Villar. and. Bo MacInnis. Stanford University

Appending a Prepaid Phone Flag to the Cell Phone Sample

JSM Survey Research Methods Section

Population, by Race and Ethnicity: 2000 and 2010

Educational Attainment. Five Key Data Releases From the U.S. Census Bureau

MICHIGAN: TRUMP, CLINTON IN FRONT

THE ASSOCIATED PRESS-CNBC INVESTORS SURVEY CONDUCTED BY KNOWLEDGE NETWORKS

HOUSINGSPOTLIGHT FEDERALLY ASSISTED HOUSING? Characteristics of Households Assisted by HUD programs. Our findings affirm that

Assignments Analysis of Longitudinal data: a multilevel approach

Multivariate Logistic Regression

Voting and the Scottish Parliament

Selecting Research Participants

The Economist/YouGov Poll

High School Graduation Rates in Maryland Technical Appendix

Relationship between patient reported experience (PREMs) and patient reported outcomes (PROMs) in elective surgery

Educational Attainment

Trump Still Strong Kasich/Cruz Rise (Trump 42% - Kasich 19.6% - Cruz 19.3% - Rubio 9%)

Executive Summary. Public Support for Marriage for Same-sex Couples by State by Andrew R. Flores and Scott Barclay April 2013

POLLING STANDARDS. The following standards were developed by a committee of editors and reporters and should be adhered to when using poll results.

SAMPLING & INFERENTIAL STATISTICS. Sampling is necessary to make inferences about a population.

Get Britain Working Measures Official Statistics

behavior research center s

CCF Guide to the ACS Health Insurance Coverage Data

ALABAMA and OKLAHOMA: TRUMP LEADS IN BOTH CLINTON LEADS IN AL, SANDERS IN OK

Arizona Report March 2014

Decision Trees from large Databases: SLIQ

Trump Continues Big Michigan Lead (Trump 39% - Rubio 19% - Cruz 14% - Kasich 12%)

BY Maeve Duggan NUMBERS, FACTS AND TRENDS SHAPING THE WORLD FOR RELEASE AUGUST 19, 2015 FOR FURTHER INFORMATION ON THIS REPORT:

States Future Economic Standing

Transcription:

Inferential Problems with Nonprobability Samples Richard Valliant University of Michigan & University of Maryland 9 Sep 2015 (UMich & UMD) WSS seminar 1 / 18

Types of samples Not all nonprobability samples are created equal College sophomores in Psych 100 Mall intercepts Volunteer samples, river samples, snowball samples Probability samples with low response rates Coalitions of the willing AAPOR task force report on non-probability samples (2013) (UMich & UMD) WSS seminar 2 / 18

Types of samples Not all nonprobability samples are created equal College sophomores in Psych 100 Mall intercepts Volunteer samples, river samples, snowball samples Probability samples with low response rates Coalitions of the willing AAPOR task force report on non-probability samples (2013) (UMich & UMD) WSS seminar 2 / 18

Types of samples Not all nonprobability samples are created equal College sophomores in Psych 100 Mall intercepts Volunteer samples, river samples, snowball samples Probability samples with low response rates Coalitions of the willing AAPOR task force report on non-probability samples (2013) (UMich & UMD) WSS seminar 2 / 18

Types of samples Declining response rates Pew Research response rates in typical telephone surveys dropped from 36% in 1997 to 9% in 2012 (Kohut et al. 2012) With such low RRs, a sample initially selected randomly can hardly be called a probability sample Low RRs raise the question of whether probability sampling is worthwhile, at least for some applications Non-probs are faster, cheaper No worse? (UMich & UMD) WSS seminar 3 / 18

Types of samples Polls that failed British parliamentary election May 2015 Final Ipsos/MORI East Anglia/LSE/Durham U Party (online panel) (using poll aggregation) Conservative 51% 36% 43% Labour 36% 35% 41% Israeli March 2015 election (seats); online panels Final Smith- TNS/ Maariv Channel Party Reshet Bet Walla 1 Likud 30 21 23 21 25 Zionist Union 24 25 25 25 25 (UMich & UMD) WSS seminar 4 / 18

Types of samples One that worked Xbox gamers: 345,000 people surveyed in opt-in poll for 45 days continuously before 2012 US presidential election Xboxers much different from overall electorate 18- to 29-year olds were 65% of dataset, compared to 19% in national exit poll 93% male vs. 47% in electorate Unadjusted data suggested landslide for Romney Gelman, et al. used some sort of regression and poststratification to get good estimates Covariates: sex, race, age, education, state, party ID, political ideology, and who voted for in the 2008 pres. election. Wang, W., D. Rothschild, S. Goel, and A. Gelman. 2015. Forecasting Elections with Non-representative Polls. International Journal of Forecasting (UMich & UMD) WSS seminar 5 / 18

Universe & sample Inference problem U Fc Covered s Potentially covered Fpc U-F Not covered For example... U = adult population F pc = adults with internet access F c = adults with internet access who visit some webpage(s) s = adults who volunteer for a panel (UMich & UMD) WSS seminar 6 / 18

Inference problem Ideas used in missing data literature MCAR Every unit has same probability of appearing in sample MAR Probability of appearing depends on covariates known for sample and nonsample cases NINR Probability of appearing depends on covariates and y s (UMich & UMD) WSS seminar 7 / 18

Inference problem Table: Percentages of US households with Internet subscriptions; 2013 American Community Survey Total households 74 Race and Hispanic origin of householder White alone, non-hispanic 77 Black alone, non-hispanic 61 Asian alone, non-hispanic 87 Hispanic (of any race) 67 Household income Less than $25,000 48 $25,000-$49,999 69 $50,000-$99,999 85 $100,000-$149,999 93 $150,000 and more 95 Educational attainment of householder Less than high school graduate 44 High school graduate 63 Some college or associate s degree 79 Bachelor s degree or higher 90 Percent of households with Internet subscription (UMich & UMD) WSS seminar 8 / 18

Inference problem Estimating a total Pop total t = P s y i + P F c s y i + P F pc F c To estimate t, predict 2nd, 3rd, and 4th sums y i + P U F y i What if non-covered units are much different from covered? No 70+ year old Black women in a web panel No 18-21 year old Hispanic males in a phone survey Difference from a bad probability sample with a good frame but low RR: No unit in U F or F pc F c had any chance of appearing in the sample (UMich & UMD) WSS seminar 9 / 18

Full pop vs. Domains Inference problem If domain is completely or mostly in the uncovered part (U F, F pc F c ), then direct domain estimates not possible Small area approach where n D = 0 might be tried Full pop estimates may be OK if uncovered are "like" covered (UMich & UMD) WSS seminar 10 / 18

Quasi-randomization Methods of Inference Quasi-randomization Model probability of appearing in sample Pr (i 2 s) = Pr (has Internet ) Pr (visits webpage j Internet ) Pr (volunteers for panel j Internet ; visits webpage) Pr (participates in survey j Internet ; visits webpage ; volunteers) (UMich & UMD) WSS seminar 11 / 18

Methods of Inference Quasi-randomization Reference sample Select a probability sample from a frame with good coverage Combine probability and non-probability samples together Estimate probability of being in non-probability sample using logistic regression (or similar) Use inverse probability as a Horvitz-Thompson-like weight What does this probability mean? In a volunteer sample, there are people who would never visit recruiting webpage or never volunteer if they did visit The probability has no relative frequency interpretation (UMich & UMD) WSS seminar 12 / 18

Methods of Inference Quasi-randomization Reference sample Select a probability sample from a frame with good coverage Combine probability and non-probability samples together Estimate probability of being in non-probability sample using logistic regression (or similar) Use inverse probability as a Horvitz-Thompson-like weight What does this probability mean? In a volunteer sample, there are people who would never visit recruiting webpage or never volunteer if they did visit The probability has no relative frequency interpretation (UMich & UMD) WSS seminar 12 / 18

Methods of Inference Quasi-randomization Reference sample Select a probability sample from a frame with good coverage Combine probability and non-probability samples together Estimate probability of being in non-probability sample using logistic regression (or similar) Use inverse probability as a Horvitz-Thompson-like weight What does this probability mean? In a volunteer sample, there are people who would never visit recruiting webpage or never volunteer if they did visit The probability has no relative frequency interpretation (UMich & UMD) WSS seminar 12 / 18

Methods of Inference Model for y Superpopulation model Use a model to predict the value for each nonsample unit Linear model: y i = x T i + i If this model holds, then ^t = = X s X y i + s : = t T Ux ^ X F c s ^y i + y i + t T (U s);x ^ X F pc F c ^y i + X U F ^y i where ^y i = x T i ^ (UMich & UMD) WSS seminar 13 / 18

Methods of Inference Model for y Unit-level weights Prediction estimator does lead to weights w i = 1 + t T (U s);x : = t T Ux X T s X s 1 xi X T s X s 1 xi (UMich & UMD) WSS seminar 14 / 18

Methods of Inference Model for y y s & Covariates If y is binary, a linear model is being used to predict a 0-1 variable Done routinely in surveys without thinking explicitly about a model Every y may have a different model ) pick a set of x s good for many y s Same thinking as done for GREG and other calibration estimators Undercoverage: use x s associated with coverage Also done routinely in surveys (UMich & UMD) WSS seminar 15 / 18

Methods of Inference Model for y Modeling considerations Good modeling should consider how to predict y s and how to correct for coverage errors Covariates: an extensive set of covariates needed Dever, Rafferty, & Valliant (2008). Svy. Rsch. Meth. Valliant, Dever (2011). Soc. Meth. Res. Gelman, et al. (2015). Intl. Jnl. Forecasting Model fit for sample needs to hold for nonsample Proving that model estimated from sample holds for nonsample seems impossible (UMich & UMD) WSS seminar 16 / 18

Methods of Inference Model for y SE estimation Variance estimator must be model-based Replication is an option Jackknife or bootstrap Approximate jackknife Large sample approximation to jackknife X P wi 2 v r i 2 J = (1 h ii ) 2 n 1 s w i r i r i = y i ^y i (1 h s ii ) h ii is a leverage v J is consistent if the linear model holds with uncorrelated errors 2 (UMich & UMD) WSS seminar 17 / 18

Methods of Inference Model for y Certification Idea of Joe Sedransk AAPOR sets up certification program for panel purveyors Take questions from some legitimate survey (CPS, HRS, NHIS, a good opinion poll) Panel vendor includes Q s in its survey Make estimates using vendor methods Compare estimates to ones from legitimate survey Vendor must provide microdata and complete description of methods to be reviewed by AAPOR committee in order to be certified" (UMich & UMD) WSS seminar 18 / 18