Complex Survey Design Using Stata



Similar documents
Failure to take the sampling scheme into account can lead to inaccurate point estimates and/or flawed estimates of the standard errors.

Youth Risk Behavior Survey (YRBS) Software for Analysis of YRBS Data

Survey Data Analysis in Stata

Using Stat/Transfer on the Linux/UNIX Systems

Software for Analysis of YRBS Data

Sampling Error Estimation in Design-Based Analysis of the PSID Data

STATA SURVEY DATA REFERENCE MANUAL

The SURVEYFREQ Procedure in SAS 9.2: Avoiding FREQuent Mistakes When Analyzing Survey Data ABSTRACT INTRODUCTION SURVEY DESIGN 101 WHY STRATIFY?

Using NVivo for Qualitative Data Analysis

INTRODUCTION TO SURVEY DATA ANALYSIS THROUGH STATISTICAL PACKAGES

Survey Data Analysis in Stata

Using ATLAS.ti for Qualitative Data Analysis

Workshop on Using the National Survey of Children s s Health Dataset: Practical Applications

The Research Data Centres Information and Technical Bulletin

National Longitudinal Study of Adolescent Health. Strategies to Perform a Design-Based Analysis Using the Add Health Data

XPost: Excel Workbooks for the Post-estimation Interpretation of Regression Models for Categorical Dependent Variables

Using Excel for Data Manipulation and Statistical Analysis: How-to s and Cautions

Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses

Regression Clustering

Using Excel for Statistical Analysis

6. CONDUCTING SURVEY DATA ANALYSIS

Setting Up My Business Account - Wireless

Introduction to Stata... 2 Stata Sessions Running Stata in Batch Mode...2 Running Stata Interactively...3 Data Management...

Tutorial Segmentation and Classification

Or log on to your account via the Employers / Recruiters button located on the right side of the screen.

How to set the main menu of STATA to default factory settings standards

LinkPoint Connect for Salesforce Tutorial Lotus Notes Edition

In all 1336 (35.9%) said yes, 2051 (55.1%) said no and 335 (9.0%) said not sure

IBM SPSS Missing Values 22

Event counters in NOVA

Introduction to Event History Analysis DUSTIN BROWN POPULATION RESEARCH CENTER

Process Document Campus Community: Create Communication Template. Document Generation Date 7/8/2009 Last Changed by Status

UNDERGRADUATE DEGREE DETAILS : BACHELOR OF SCIENCE WITH

User Replicator USER S GUIDE

This tutorial provides detailed instructions to help you download and configure Internet Explorer 6.0 for use with Web Commerce application.

From this it is not clear what sort of variable that insure is so list the first 10 observations.

Basic Formatting of a Microsoft Word. Document for Word 2003 and Center for Writing Excellence

Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study)

Variance Estimation Guidance, NHIS (Adapted from the NHIS Survey Description Documents) Introduction

Chapter 11 Introduction to Survey Sampling and Analysis Procedures

HOME PAGE. Quick Start Guide. Here s how to navigate the Films On Demand home page you first see when you log in.

Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.)

SPSS: Getting Started. For Windows

Why Sample? Why not study everyone? Debate about Census vs. sampling

XI XI. Community Reinvestment Act Sampling Guidelines. Sampling Guidelines CRA. Introduction

Quick Start Configuration Guide Salesforce.com Integration

SharePoint Wiki Redirect Installation Instruction

An introduction to IBM SPSS Statistics

Introduction to Exploratory Data Analysis

IBM SPSS Bootstrapping 22

Binary Diagnostic Tests Two Independent Samples

Using R for Windows and Macintosh

Didacticiel Études de cas

User Guide. Making EasyBlog Your Perfect Blogging Tool

A Guide to Survey Analysis in GenStat. by Steve Langton. Defra Environmental Observatory, 1-2 Peasholme Green, York YO1 7PX, UK.

Multinomial and Ordinal Logistic Regression

VENDORS Welcome to College of DuPage. Next slide please

Web Ambassador Training on the CMS

Northumberland Knowledge

Kuali Requisition Training

Accounting for complex survey design in modeling usual intake. Kevin W. Dodd, PhD National Cancer Institute

IBM SPSS Direct Marketing 23

Rational Quality Manager. Quick Start Tutorial

How to set up Outlook Anywhere on your home system

Tests for Two Survival Curves Using Cox s Proportional Hazards Model

1. Examples using data from the Survey of industrial and service firms

IBM SPSS Direct Marketing 22

Robert L. Santos, Temple University

Tutorial #7A: LC Segmentation with Ratings-based Conjoint Data

Setting Preferences in QuickBooks

Remote Viewer Recording Backup

How To Use Ticket Validation Software On A Pc Or Mac Or Macbook Or Ipad (For Acedo) On A Computer Or Ipa (For An Ipa) On An Ipad Or Macintosh (For Macintosh) On Pc

HLM software has been one of the leading statistical packages for hierarchical

SAS Software to Fit the Generalized Linear Model

Introduction to Using SPSS Command Files

CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS

and Gologit2: A Program for Ordinal Variables Last revised May 12, 2005 Page 1 ologit y x1 x2 x3 gologit2 y x1 x2 x3, pl lrforce


Before You Begin... 2 Running SAS in Batch Mode... 2 Printing the Output of Your Program... 3 SAS Statements and Syntax... 3

Multinomial Logistic Regression

Dreamweaver Tutorials Creating a Web Contact Form

Introduction to SPSS (version 16) for Windows

Ivy Tech Community College Virtual Library

Intellicus Enterprise Reporting and BI Platform

Plot and Solve Equations

IBM SPSS Statistics 20 Part 1: Descriptive Statistics

Development of the nomolog program and its evolution

SharePoint List Filter Favorites Installation Instruction

Using CyTOF Data with FlowJo Version Revised 2/3/14

Directions for using SPSS

Desktop Sync is recommended for use only by teachers and other staff members in schools, not by students.

Transcription:

Complex Survey Design Using Stata 2010 This document provides a basic overview of how to handle complex survey design using Stata. Probability weighting and compensating for clustered and stratified samples are common survey design features that Stata can help you address in your statistical analyses. Stata is publicly available in both the library and residential computer clusters. Stata is also available on Stanford s UNIX system. To use any of the public computers on campus, or to access Stata on UNIX, you must have a SUNet ID (Stanford University Network Identifier). See the following URL for information on obtaining a SUNet ID: http://www.stanford.edu/group/itss/services/sunetid/ Note: Stata commands are indicated in bold type. Variable names in italics should be substituted with your actual variable names. For a list of Stata s default settings, estimation methods and subcommands not discussed here, please consult Stata s reference manuals (available in the SSRC at Green Library) or its online resources at: http://www.stata.com/links/resources1.html. Table of Contents What is Complex Survey Design?... 2 Specifying Survey Design Features... 2 Specifying Survey Design Features Using Menus...2 Specifying Survey Design Features Using Syntax...3 SVY Commands... 2-4 Subpopulation Analysis... 4 For More Information and Assistance... 4-5 Stata Tutorials...4 Documentation and Books...4 Consulting...5 What is Complex Survey Design? Many common statistical analyses assume that the data being analyzed constitute a simple random sample, in which everyone in the population of interest has exactly the same probability of being chosen during sample selection. In real-life survey research, however, sampling frequently works differently. Samples are often stratified by variables of interest (such as region of the country or racial background of the individual) to ensure a balanced number of respondents for each category of the variable. For example, a national survey in the U.S. may have two strata: urban areas and rural areas and within each stratum, individuals may be sampled independently. Additionally, survey samples are frequently clustered; for example, schools are sampled first, then a certain number of students are chosen from within each sampled school. In this case, schools are the survey s primary sampling units, even though students may be the units of interest in the analysis. Both of these sample features can skew the standard errors from results of statistical analyses. Because standard errors affect significance levels, the conclusions you draw from an analysis that does not take these complex survey design features into account may be false. Stanford University Social Science Data and Software

Weighting is another common feature of survey samples. Many surveys over-sample subpopulations of interest in order to include enough people for meaningful statistical analysis of a subgroup. For example, Cuban Americans may be oversampled in a national survey in order for researchers to be able to analyze their ethnic group on its own. In this example, then, the survey includes disproportionately more Cuban Americans than the U.S. population in general. Therefore, when a survey uses oversampling, results cannot be generalized to the broader population until probability weights are applied. These weights take into account the greater probability that Cuban Americans will be included in the sample compared to other groups. It is important to apply probability weights if you want to generalize your results to a larger population. Stata commands that can adjust the standard errors to correct for complex survey design factors and simplifies oversampling can be found in the following sections. Specifying Survey Design Features In Stata, you can use the svyset command to specify the stratification scheme, sampling weights and primary sampling units (used in clustering) for your data. The keywords used here are strata (which specifies the stratification variable) and pweight (which specifies the weight variable). You must also specify the primary sampling units (PSUs). Stata allows you to specify the data s survey design in two ways: using drop-down menus and using syntax. Specifying Survey Design Features Using Menus If you are using menus to run commands, you should choose the Survey data analysis submenu under the Statistics menu. Select Setup & utilities, then Declare survey design for dataset. Enter the relevant information in the Strata and Sampling units boxes. To specify weights, click on the Weights tab, select the Sampling weight variable button and input the name of the variable identifying the sampling weight. On the Main tab, note that you can increase the number of stages to specify the strata and PSUs for multistage designs. Specifying Survey Design Features Using Syntax If you are typing syntax into the Command window instead of using drop-down menus, then you can submit this information in a single command line: svyset yourpsu [pweight=yourpweight], strata(yourstrata) You do not need to enter every subcommand in this line; for example, if your survey only uses cluster sampling, then you do not need to include the pweight and strata subcommands. Since Stata will remember these specifications and will apply them to every survey analysis procedure (SVY command) that follows it, you will only need to enter it once at the start of your program. SVY Commands The following table lists many of the most commonly used SVY commands for survey data. You can use these commands once you have given Stata the relevant information about the survey design using 2 Complex Survey Design Using Stata

the commands above. Each of the SVY commands described in this section produces estimates that are corrected for the complex design of the survey data that you are analyzing. Note that the commands needed to analyze survey data, require the use of the svy: prefix. For additional information, consult Stata s help function. For example, for more information on the svy: regress command, type: help svy: regress. General/Set-Up svyset svydescribe Means, Proportions, Ratios and Totals svy: mean svy: proportion svy: ratio svy: total Specify survey design features Describe strata and PSUs Estimate population means Estimate population proportions Estimate population ratios Estimate population totals Cross-Tabulations svy: tabulate One- and Two-way contingency tables Regression Models svy: regress svy: intreg svy: ivregress svy: logit svy: logistic svy: mlogit svy: ologit svy: probit svy: oprobit svy: poisson Estimate linear regression models Estimate interval regression models Estimate single-equation instrumentalvariables regression models Estimate logistic regression models, reporting coefficients Estimate logistic regression models, reporting odds ratios Estimate multinomial logistic regression models Estimate ordered logistic regression models Estimate probit models Estimate ordered probit models Estimate Poisson regression models Complex Survey Design Using Stata - 3

Post-Estimation These commands are used after using svy estimation commands. lincom test Estimate linear combinations of parameters Perform hypotheses tests Survival Analysis svy: streg svy: stcox Estimate parametric survival models Estimate cox proportional hazards models Subpopulation Analysis Using the subpop option allows you to look at subgroups in your data. For example, if you have an indicator variable, female (where 1=female and 0=male), you could examine the annual income (income) for females using the following command: svy, subpop(female): mean income For More Information and Assistance Stata Tutorials Stata s website (http://www.stata.com) also has online resources such as FAQ, listserver, and online courses, etc.; see Stata users support at http://www.stata.com/support/. Finally, many Stata resources are available at other places on the Web. For a comprehensive list of such resources, see http://www.stata.com/links/resources1.html or do a general web search. Stata Documentation and Books This document provides only basic information about handling complex survey design in Stata. To learn more about features of this software package and to find out the syntax of specific statements, consult the Stata manuals available in the Velma Denning Room of the SSRC (Green Library Bing Wing Room 120F). To view the current collection, you can either stop by or consult our website (http://ssds.stanford.edu) for book lists organized by title, subject and call number. The collection is non-circulating, but additional copies of some books and manuals may be available in the Green Library reserves system. 4 Complex Survey Design Using Stata

Consulting If you have questions about using Stata, please contact the software consultants at Social Science Data and Software. Our website is http://ssds.stanford.edu. The software consultants are available during the academic year on a walk-in basis. Please see our website for our current walk-in hours. Note: This document is based on Stata 11. Copyright 2010, by The Board of Trustees of the Leland Stanford Junior University. Permission granted to copy for noncommercial purposes, provided we receive acknowledgment and a copy of the document in which our material appears. No right is granted to quote from or use any material in this document for purposes of promoting any product or service. Software Support, Social Science Data and Software Document revised: 09/29/10 Complex Survey Design Using Stata - 5