
The SURVEYFREQ Procedure in SAS 9.2: Avoiding FREQuent Mistakes When Analyzing Survey Data

Kathryn Martin, Maternal, Child and Adolescent Health Program, California Department of Public Health

ABSTRACT

With recent releases, SAS has become increasingly capable of analyzing survey data: descriptive statistics, crosstabulations, and regression models can be computed using procedures that account for complex sampling schemes and survey designs. The SURVEYFREQ procedure, for instance, is commonly used to compute population-based prevalence estimates of health indicators from large national and statewide surveys. Although this procedure may seem familiar to users of PROC FREQ, there are important differences between the analysis of survey data and the analysis of simple random samples. This paper provides an overview of PROC SURVEYFREQ and discusses the syntax needed to produce weighted estimates of prevalence and standard errors.

INTRODUCTION

Surveys are widely used across the United States and in public health. Survey methods such as stratification, clustering, and oversampling allow researchers to gather data from a segment of the population (i.e., a sample) and generalize to a larger population (i.e., the target population) with efficiency and precision. At the same time, these aspects of a survey's design affect the computation of prevalence and variance estimates and therefore must be accounted for in the analysis.

With the release of version 9.2, SAS is more capable than ever of analyzing survey data using procedures like PROC SURVEYFREQ. Although survey syntax may seem familiar to users of PROC FREQ, there are important differences between the analysis of survey data and the analysis of simple random samples (SRS). The purpose of this paper is to provide an overview of PROC SURVEYFREQ, paying specific attention to the syntax needed to produce weighted estimates of prevalence and appropriate standard errors.
SURVEY DESIGN 101

Stratification, clustering, and oversampling are common characteristics of surveys, and understanding why these methods are used helps in understanding how to account for them in the analysis.

WHY STRATIFY?

A stratified random sample is a sample drawn from a population that has been divided into subgroups. Information on the stratification variable(s) is available beforehand and is used to create a sampling frame that divides the population into segments. A random sample is then drawn from each subgroup independently of the other groups. Stratified samples are used for several reasons (Lohr, 1999):

- Protect against the possibility of obtaining a poor sample. African Americans, for instance, comprise about 5% of mothers giving birth each year in California. An SRS of 1,000 may very well yield no African American mothers, or too few to obtain meaningful estimates of prevalence by race. In comparison, if a stratified sample is taken and African American mothers are sampled proportionate to their distribution in the population, about 50 African American mothers would be included in the sample.

- Obtain a known precision for subgroups. If only 50 African American mothers were sampled in the scenario above, prevalence estimates for African Americans would be less precise than those for other groups. Therefore, instead of sampling African American women proportionate to their distribution in the population, they could be oversampled so that the precision of estimates among African Americans is comparable to that among White and Hispanic mothers, who comprise a greater percentage of births.

- Convenience and cost. Different sampling approaches may be used in different strata, which may increase the feasibility of the survey and decrease its cost.

- Obtain more precise estimates for the entire population. People from the same group tend to have similar responses or characteristics, and therefore the variance within strata may be smaller than the variance in the population as a whole.

California's Maternal and Infant Health Assessment (MIHA) is an example of a stratified sample. Conducted by the Maternal, Child and Adolescent Health Program at the California Department of Public Health in collaboration with researchers from the University of California, San Francisco, and modeled after the Centers for Disease Control and Prevention's (CDC) Pregnancy Risk Assessment Monitoring System (PRAMS), MIHA is an annual population-based survey of mothers with a recent live birth who are sampled from birth certificates. Designed to produce a representative sample of live births in the State, the sample is stratified by African American race, high school graduation, and region of California, all of which are available on the birth certificate, with oversampling of African Americans (California Department of Public Health, 2009). MIHA will be used as an example when demonstrating SAS survey procedures below.

WHY CLUSTER?

A stratified sample can only be taken if information about the population and sampling units is readily available, as is the case with birth certificates: all births occurring in California and in the United States are registered with vital statistics, and birth certificate data on maternal and infant characteristics are available to help construct a sampling frame. Often this type of detailed information is not available, making the construction of a sampling frame time consuming, costly, or infeasible. Dividing the population into clusters can address these issues, although there are some trade-offs. Whereas stratifying a sample can reduce the standard errors if the population in each stratum is homogeneous, clustering usually increases the variance if the members of a cluster are alike. Nevertheless, cluster samples are often used for the following reasons (Lohr, 1999):

- A sampling frame is difficult or impossible to construct for the entire population. Suppose a researcher wanted to sample high school students in California. It would be difficult to obtain a list of students from every school to compile a sampling frame, but it would be possible to obtain a list of schools, sample a proportion of them, and then sample all students (i.e., one-stage cluster sampling) or a proportion of students (i.e., multi-stage cluster sampling) from each school that agrees to participate.
- The population is widely distributed geographically or clustered. It would be much cheaper to select schools and interview students within these schools than to select students using an SRS. An SRS would result in a sample that was geographically dispersed and that contained only a small number of students per school. This would require more resources for travel and interviewers and would increase the cost of the study.

The Youth Risk Behavior Surveillance System (YRBSS) is an example of a survey that uses cluster sampling. The survey was first conducted in California in the spring of 2009 by the California Department of Public Health, the California Department of Education, and the Public Health Institute in cooperation with the Centers for Disease Control and Prevention (Survey Research Group, 2009). Designed to produce a representative sample of students in 9th through 12th grade, the state-based YRBS employs a two-stage cluster sample in which a stratified sample of schools is taken first, proportionate to enrollment size, and classes are randomly selected second; all students in the selected classes are eligible to participate. The national YRBS employs a more complex, three-stage cluster sample in which a stratified sample of counties is selected first, according to enrollment size and other factors such as urban or rural location. Schools and classrooms are selected next (Centers for Disease Control and Prevention, 2004). YRBS will also be used as an example below.

SPECIFYING YOUR SURVEY'S DESIGN IN SAS

The rules used to select units from a population constitute a survey's sample design and must be accounted for in SAS. Sample design information is provided in the WEIGHT, STRATA, and CLUSTER statements, and in the RATE= option in the PROC SURVEYFREQ statement. Each of these pieces of syntax plays a role in the estimation of either prevalence or variance (SAS Institute Inc., 2009).
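As a minimal sketch of how these statements fit together, the step below builds a small made-up data set and passes its design variables to PROC SURVEYFREQ. The data set name and every variable name (toy, stratum, psu, wt, smoke) are hypothetical and chosen only for illustration; they are not taken from MIHA or YRBSS.

```sas
/* Hypothetical toy data: two strata, two first-stage clusters (PSUs)  */
/* per stratum, a sampling weight, and a yes/no outcome.               */
data toy;
  input stratum psu wt smoke;
  datalines;
1 1 10 1
1 1 10 0
1 2 12 0
1 2 12 0
2 3 20 1
2 3 20 0
2 4 25 1
2 4 25 1
;
run;

/* Each design feature maps to one statement. A RATE= or TOTAL=        */
/* option could be added to the PROC statement to request a finite     */
/* population correction; it is omitted in this sketch.                */
proc surveyfreq data = toy;
  strata  stratum;   /* first-stage strata                    */
  cluster psu;       /* first-stage clusters (PSUs)           */
  weight  wt;        /* sampling weight                       */
  tables  smoke;     /* variable whose prevalence is estimated */
run;
```

Leaving out the STRATA, CLUSTER, or WEIGHT statement would still produce output, but the estimates would then describe a different design than the one that generated the data.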
ESTIMATING PREVALENCE: THE WEIGHT STATEMENT

Weights adjust for different components of a survey's design, and it is important to know about these aspects of your survey's methods even if the weights have already been calculated for you. In stratified samples, like MIHA, one component of the weight is the inverse of the sampling fraction in each stratum, where the sampling fraction is the ratio of the number of people sampled to the number of people in the target population or sampling frame (n_s / N_s). This weight, often called the sample design weight, can be adjusted for factors like survey non-response (i.e., the tendency of certain groups not to respond to the questionnaire) and noncoverage (i.e., the sampling frame may not be complete at the time the sample is drawn).

Below, PROC SURVEYFREQ is used to calculate the prevalence of smoking during the 3rd trimester of pregnancy in MIHA, weighted to represent the proportion of women with a recent live birth in California who smoked at the end of their pregnancy.

    proc surveyfreq data = miha rate = samprate nomcar;

ESTIMATING VARIANCE

Estimating variance is slightly more complicated than estimating prevalence in SAS: you must consider sampling rates and whether to use a finite population correction (FPC), strata and cluster information, domain analysis, and missing data.

If not otherwise specified, SAS uses the Taylor linearization method to estimate variance. In MIHA and YRBSS, the Taylor linearization method is appropriate and will be used in the rest of the examples. Briefly, however, SAS also offers two re-sampling methods for estimating variance: balanced repeated replication (BRR) and the jackknife. These methods can be requested using the VARMETHOD= option in the PROC SURVEYFREQ statement together with the REPWEIGHTS statement. If replicate weights are provided, the STRATA and CLUSTER statements are not necessary.

Variance estimates in SAS are calculated based on the first stage of the sampling process. In a multi-stage cluster sample, such as YRBS, schools might be selected first. In this case, all of the information provided about strata, clusters, and sampling rates should be at the level of the school.

FINITE POPULATION CORRECTION

By default, the Taylor linearization method assumes that the sampling fraction is small, or that the first-stage sample is drawn with replacement, so that the sampling fraction is negligible (i.e., the population is treated as infinite). Sometimes, particularly in stratified designs where the sample is drawn without replacement, the sampling fraction is not small. In strata where the sampling fraction (n_s / N_s) is large, the sample contains more information about the target population, which reduces the variance. The finite population correction (FPC) accounts for the extra efficiency gained in these instances. As the sampling rate approaches one, the FPC has a larger impact on the reduction in standard errors.
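As a rough illustration of why the correction matters, consider the textbook case of a simple random sample drawn without replacement within a single stratum. This is a simplification assumed here for exposition, not the exact Taylor linearization expression that SAS evaluates, and \hat{p}_s (the estimated proportion in stratum s) is notation introduced for this sketch. The estimated variance of the proportion is the with-replacement variance multiplied by the FPC factor (1 - n_s / N_s):

```latex
\widehat{\mathrm{Var}}(\hat{p}_s)
  = \left(1 - \frac{n_s}{N_s}\right)\frac{\hat{p}_s\,(1 - \hat{p}_s)}{n_s - 1},
\qquad
\mathrm{SE}_{\text{with FPC}}
  = \sqrt{1 - \frac{n_s}{N_s}}\;\times\;\mathrm{SE}_{\text{without FPC}}
```

With a sampling fraction of 0.01, the standard error is multiplied by about 0.995, which is a negligible change. With a sampling fraction of 0.5, it is multiplied by about 0.707, a reduction of roughly 29%.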
The correction is made in SAS using the RATE= option in the PROC SURVEYFREQ statement. In the example below, SAMPRATE is a data set that contains the sampling fraction in each stratum. The stratum identifier should be stored in a variable with the same name as the variable specified in the STRATA statement, and the sampling fraction should be stored in a variable called _RATE_. Note that you could accomplish the same thing using the TOTAL= option, naming a data set that contains the population totals by stratum (N_s) in a variable called _TOTAL_.

    proc surveyfreq data = miha rate = samprate nomcar;

Note that SAS does not include the FPC in the variance calculation if replicate weights are used.

THE STRATA STATEMENT

The STRATA statement should be used when the sample design is stratified at the first stage of sampling. The STRATA variable represents non-overlapping subgroups that were sampled independently.

    proc surveyfreq data = miha rate = samprate nomcar;
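The layout of the rate data set and its link to the STRATA statement can be sketched on made-up data as follows. All data set names, variable names, and values here (toy, samprate, stratum, wt, smoke) are hypothetical; only the special variable name _RATE_ and the RATE= option come from the text above. The weights are set to the inverse of each stratum's sampling fraction so that the toy example is internally consistent.

```sas
/* Hypothetical toy sample: four respondents in each of two strata.    */
data toy;
  input stratum wt smoke;
  datalines;
1 2 1
1 2 0
1 2 0
1 2 0
2 10 1
2 10 0
2 10 1
2 10 0
;
run;

/* One row per stratum. The key variable carries the same name as the  */
/* STRATA variable, and _RATE_ holds the sampling fraction n_s / N_s.  */
data samprate;
  input stratum _rate_;
  datalines;
1 0.5
2 0.1
;
run;

/* RATE= requests the finite population correction; STRATA names the   */
/* first-stage strata and is matched against the rows of SAMPRATE.     */
proc surveyfreq data = toy rate = samprate;
  strata stratum;
  weight wt;
  tables smoke;
run;
```

In this sketch the equivalent TOTAL= data set would hold _TOTAL_ values of 8 and 40, since four people were sampled at rates of 0.5 and 0.1 respectively.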

THE CLUSTER STATEMENT

The CLUSTER statement identifies the variables that contain information on the first-stage clusters, or primary sampling units (PSUs), in a cluster-sample design such as the YRBSS.

    proc surveyfreq data = yrbss rate = samprate nomcar;
       cluster psu;
       weight weight;
       tables smoke;
    run;

DOMAINS AND SUBPOPULATIONS

Domain analysis refers to the computation of statistics for subpopulations (e.g., stratifying your analysis on race). Other SAS procedures (e.g., PROC SURVEYMEANS) and other software (e.g., SUDAAN, Stata) have specific SUBPOP or DOMAIN statements. In PROC SURVEYFREQ, however, you request a domain analysis by putting the stratification variable in the TABLES statement.

    proc surveyfreq data = miha rate = samprate nomcar;
       by race;

If you stratify your analysis using a BY statement, as in the example above, or subset your analysis using a WHERE statement, the standard errors will differ from those produced had you included the variable in the TABLES statement. This has to do with the way SAS processes data. When SAS sees a BY statement, it treats each BY group as a completely separate analysis, virtually on a separate data set. When SAS sees a WHERE statement, the data are subset to exclude all observations that do not meet the condition before the analysis begins. However, standard errors for stratified survey designs are calculated using the total number of individuals in each stratum in the entire sample, n_s, and the same is true of cluster designs. Using a BY or WHERE statement therefore excludes individuals who should be contributing to these total counts. Using a data set in PROC SURVEYFREQ that has been subset in a DATA step through an IF or WHERE statement would be inappropriate for the same reason.
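The contrast between the two approaches can be tried out on a small made-up data set. The sketch below is illustrative only: the data set and variable names (toy, toy_sorted, stratum, wt, race, smoke) and all values are hypothetical, and the FPC is omitted to keep the example short.

```sas
/* Hypothetical toy sample: two strata, two race groups per stratum.   */
data toy;
  input stratum wt race smoke;
  datalines;
1 2 1 1
1 2 1 0
1 2 2 0
1 2 2 0
2 10 1 1
2 10 1 0
2 10 2 1
2 10 2 0
;
run;

/* Domain analysis: race enters the TABLES statement, so every sampled */
/* person still counts toward the stratum sample sizes. The ROW option */
/* requests percentages of smoke within each race group.               */
proc surveyfreq data = toy;
  strata stratum;
  weight wt;
  tables race*smoke / row;
run;

/* For contrast only: BY processing analyzes each race group as if it  */
/* were a separate data set, which can change the standard errors.     */
proc sort data = toy out = toy_sorted;
  by race;
run;

proc surveyfreq data = toy_sorted;
  by race;
  strata stratum;
  weight wt;
  tables smoke;
run;
```

Comparing the row percentages and their standard errors from the first step with the BY-group output from the second shows whether, and by how much, the two approaches diverge for a given data set.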
Placing the variable in the TABLES statement instead keeps all observations in the analysis:

    proc surveyfreq data = miha rate = samprate nomcar;
       tables race*smktri3;

ACCOUNTING FOR MISSING DATA IN SAS 9.2

If a variable in the TABLES statement contains missing values, SAS by default assumes these values are missing completely at random and excludes the affected observations from the data set before performing the analysis. This is equivalent to excluding observations with missing values using a WHERE statement, as discussed above, and may result in different estimates of standard error. To request that SAS include observations with missing values in the variance estimation, you must specify the NOMCAR option in the PROC SURVEYFREQ statement, available starting in SAS 9.2. Note that SUDAAN does not exclude missing values from variance estimation (Chen and Gorrell, 2004). Also note that NOMCAR differs from the MISSING option in the PROC SURVEYFREQ statement, which includes missing values both in the table and in the variance estimation. In contrast, NOMCAR includes missing values in the variance estimation but not in the table, so that you can obtain percentages and corresponding standard errors for non-missing values.

    proc surveyfreq data = miha rate = samprate nomcar;

ACCOUNTING FOR MISSING DATA PRIOR TO SAS 9.2

If you have not upgraded to version 9.2 yet want to account for missing data when estimating the variance around percentages of non-missing values, there is a workaround (Chen and Gorrell, 2004). Unlike PROC SURVEYFREQ, PROC SURVEYMEANS has a DOMAIN statement, which can be used to simulate an analysis in which missing values are not assumed to be missing completely at random.

The first step is recoding your data into dummy variables for the variable(s) of interest. In the example below, two dummy variables are created. Assume the original variable for smoking during the 3rd trimester of pregnancy in MIHA, SMKTRI3, has two categories, coded 1 for Yes and 2 for No. In the first dummy variable, values of 1 represent smokers and values of 0 represent everyone else in the sample. In the second dummy variable, values of 1 represent responses with missing data on smoking.

    data surveymeans;
       set miha;
       smktri3_yes = (smktri3 = 1);
       smktri3_missing = (smktri3 = .);
    run;

The mean of the dummy variable for smoking, coded 1 and 0, is the percentage of smokers. By including the dummy variable for missing values in the DOMAIN statement of PROC SURVEYMEANS, the percentage of smokers is calculated among the non-missing values of smoking status. SAS also attempts this calculation among the missing values, even though that mean cannot be meaningfully computed. Because the missing values are included as a category through the DOMAIN statement, they are not excluded from the variance estimation.

    proc surveymeans data = surveymeans rate = samprate missing;
       var smktri3_yes;
       domain smktri3_missing;
    run;

CONCLUSION

Surveys are common and useful: they allow us to study health behaviors and other phenomena with greater precision, and they make data collection more efficient, saving time and money.
Because surveys employ more complex sampling designs than an SRS, accounting for their methods in PROC SURVEYFREQ is more complicated than using PROC FREQ. Considerations when estimating prevalence and variance using PROC SURVEYFREQ have been outlined in this paper in order to help others avoid FREQuent mistakes in survey analysis.

REFERENCES

California Department of Public Health. Maternal and Infant Health Assessment (MIHA) Survey. Available at: Accessed July 14,

Centers for Disease Control and Prevention. Methodology of the Youth Risk Behavior Surveillance System. MMWR 2004;53(No. RR-12):[inclusive page numbers].

Chen X, Gorrell P. Variance Estimation With Complex Surveys: Some SAS-SUDAAN Comparisons. Proceedings of the 17th Annual Northeast SAS Users Group Conference, 2004.

Lohr SL. Sampling: Design and Analysis. Pacific Grove, California: Duxbury Press Publishing Company; 1999.

SAS Institute Inc. SAS/STAT 9.2 User's Guide: The SURVEYFREQ Procedure. Available at: Accessed July 14,

Survey Research Group. The California Youth Risk Behavior Surveillance System. Available at: Accessed July 14,

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Kathryn Martin
Maternal, Child and Adolescent Health Program
California Department of Public Health
1615 Capitol Avenue, MS 8304, PO Box
Sacramento, CA

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.


More information

Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses

Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses G. Gordon Brown, Celia R. Eicheldinger, and James R. Chromy RTI International, Research Triangle Park, NC 27709 Abstract

More information

Northumberland Knowledge

Northumberland Knowledge Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

More information

Instructions for Analyzing Data from CAHPS Surveys:

Instructions for Analyzing Data from CAHPS Surveys: Instructions for Analyzing Data from CAHPS Surveys: Using the CAHPS Analysis Program Version 3.6 The CAHPS Analysis Program...1 Computing Requirements...1 Pre-Analysis Decisions...2 What Does the CAHPS

More information

SECTION 3.2: MOTOR VEHICLE TRAFFIC CRASHES

SECTION 3.2: MOTOR VEHICLE TRAFFIC CRASHES SECTION 3.2: MOTOR VEHICLE TRAFFIC CRASHES 1,155 Deaths* 4,755 Hospitalizations 103,860 ED Visits *SOURCE: OHIO DEPARTMENT OF HEALTH, VITAL STATISTICS SOURCE: OHIO HOSPITAL ASSOCIATION CHAPTER HIGHLIGHTS:

More information

Life Table Analysis using Weighted Survey Data

Life Table Analysis using Weighted Survey Data Life Table Analysis using Weighted Survey Data James G. Booth and Thomas A. Hirschl June 2005 Abstract Formulas for constructing valid pointwise confidence bands for survival distributions, estimated using

More information
