The SURVEYFREQ Procedure in SAS 9.2: Avoiding FREQuent Mistakes When Analyzing Survey Data
Kathryn Martin, Maternal, Child and Adolescent Health Program, California Department of Public Health

ABSTRACT

With recent releases, SAS has become increasingly capable of analyzing survey data: descriptive statistics, crosstabulations, and regression models can be computed using procedures that account for complex sampling schemes and survey designs. The SURVEYFREQ procedure, for instance, is commonly used to compute population-based prevalence estimates of health indicators using data from large national and statewide surveys. Although this procedure may seem familiar to users of PROC FREQ, there are important differences between the analysis of survey data and the analysis of simple random samples. This paper provides an overview of PROC SURVEYFREQ and discusses the syntax needed to produce weighted estimates of prevalence and standard errors.

INTRODUCTION

Surveys are widely used across the United States and in public health. Survey methods, such as stratification, clustering, and oversampling, allow researchers to gather data from a segment of the population (i.e., a sample) and make generalizations to a larger population (i.e., the target population) with efficiency and precision. At the same time, these aspects of a survey's design affect the computation of prevalence and variance estimates and, therefore, must be accounted for in the analysis of data. With the release of version 9.2, SAS is more capable than ever before of analyzing survey data using procedures like PROC SURVEYFREQ. Although survey syntax may seem familiar to users of PROC FREQ, there are important differences between the analysis of survey data and the analysis of simple random samples (SRS). The purpose of this paper is to provide an overview of PROC SURVEYFREQ, paying specific attention to syntax that is important for producing weighted estimates of prevalence and appropriate standard errors.

SURVEY DESIGN 101

Stratification, clustering, and oversampling are common characteristics of surveys, and understanding why these methods are used is helpful for understanding how to account for them in the analysis.

WHY STRATIFY?

A stratified random sample is a sample drawn from a population that has been divided into subgroups. Information on the stratification variable(s) is available beforehand and is used to create a sampling frame that divides the population into segments. A random sample is then drawn from each subgroup independently of the other groups. Stratified samples are used for several reasons (Lohr, 1999):

- Protect from the possibility of obtaining a poor sample. African Americans, for instance, comprise about 5% of mothers giving birth each year in California. An SRS of 1,000 may very well yield no African American mothers, or too few to obtain meaningful estimates of prevalence by race. In comparison, if a stratified sample is taken and African American mothers are sampled proportionate to their distribution in the population, about 50 African American mothers would be included in the sample.
- Obtain a known precision for subgroups. If only 50 African American mothers were sampled in the scenario above, prevalence estimates for African Americans would be less precise than those for other groups sampled. Therefore, instead of sampling African American women proportionate to their distribution in the population, they could be oversampled so that the level of precision for estimates among African Americans is comparable to that among White and Hispanic mothers, who comprise a greater percentage of births.
- Convenience and cost. Different sampling approaches may be used in different strata, which may increase the feasibility of the survey and decrease cost.
- Obtain more precise estimates for the entire population. People from the same group tend to have similar responses or characteristics, and therefore the variance within strata may be smaller than the variance in the population as a whole.

California's Maternal and Infant Health Assessment (MIHA) is an example of a stratified sample. Conducted by the Maternal, Child and Adolescent Health Program at the California Department of Public Health in collaboration with researchers from the University of California, San Francisco, and modeled after the Centers for Disease Control and Prevention's (CDC) Pregnancy Risk Assessment Monitoring System (PRAMS), MIHA is an annual population-based survey of mothers with a recent live birth who are sampled from birth certificates. Designed to produce a representative sample of live births in the State, the sample is stratified by African American race, high school graduation, and region of California, all of which are available on the birth certificate, with oversampling of African Americans (California Department of Public Health, 2009). MIHA will be used as an example when demonstrating SAS survey procedures below.

WHY CLUSTER?

A stratified sample can only be taken if information about the population and sampling units is readily available, as is the case with birth certificates: all births occurring in California and in the United States are registered with vital statistics, and birth certificate data on maternal and infant characteristics are available to help construct a sampling frame. Often this type of detailed information is not available, making the construction of a sampling frame time consuming, costly, or not feasible. Dividing the population into clusters can address these issues, although there are some trade-offs. Whereas stratifying a sample can reduce the standard errors if the population in each stratum is homogeneous, clustering usually increases the variance if the members of a cluster are alike. Nevertheless, cluster samples are often used for the following reasons (Lohr, 1999):

- A sampling frame is difficult or impossible to construct for the entire population. Suppose a researcher wanted to sample high school students in California. It would be difficult to obtain a list of students from every school to compile a sampling frame, but it would be possible to obtain a list of schools, sample a proportion of them, and then sample all students (i.e., one-stage cluster sampling) or a proportion of students (i.e., multi-stage cluster sampling) from each school that agrees to participate.
- The population is widely distributed geographically or clustered. It would be much cheaper to select schools and interview students within these schools than to select students using an SRS. An SRS would result in a sample that was geographically dispersed and that contained only a small number of students per school. This would require more resources for travel and interviewers, and increase the cost of the study.

The Youth Risk Behavior Surveillance System (YRBSS) is an example of a survey that uses cluster sampling. The survey was first conducted in California in the spring of 2009 by the California Department of Public Health, the California Department of Education, and the Public Health Institute in cooperation with the Centers for Disease Control and Prevention (Survey Research Group, 2009). Designed to produce a representative sample of students in 9th through 12th grade, the state-based YRBS employs a two-stage cluster sample in which a stratified sample of schools is taken first, proportionate to enrollment size, and classes are randomly selected second, from which all students are eligible to participate. The national YRBS employs a more complex, three-stage cluster sample in which a stratified sample of counties is selected first, according to enrollment size and other factors, such as urban or rural location. Schools and classrooms are selected next (Centers for Disease Control and Prevention, 2004). YRBS will also be used as an example below.

SPECIFYING YOUR SURVEY'S DESIGN IN SAS

The rules used to select units from a population constitute a survey's sample design and must be accounted for in SAS. Sample design information is provided in the WEIGHT, STRATA, and CLUSTER statements, and in the RATE= option in the PROC SURVEYFREQ statement. Each of these pieces of syntax plays a role in the estimation of either prevalence or variance (SAS Institute Inc., 2009).
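
As a rough orientation to where each piece of design information goes, the following is a minimal sketch rather than code from the paper. The data set TOY, its variables (STRATUM, PSU, WT, SMOKE), and every value in it are invented for illustration; a real analysis would use the design variables documented for the survey at hand.

```sas
/* Toy data, invented purely for illustration:            */
/* 2 first-stage strata, 2 PSUs per stratum, 2 people each */
data toy;
  input stratum psu wt smoke;
  datalines;
1 1 10 1
1 1 10 0
1 2 12 0
1 2 12 0
2 3 25 1
2 3 25 0
2 4 20 1
2 4 20 0
;
run;

proc surveyfreq data = toy;  /* RATE= or TOTAL= would be added here to request an FPC */
  strata  stratum;           /* first-stage strata                  */
  cluster psu;               /* first-stage clusters (PSUs)         */
  weight  wt;                /* analysis weight                     */
  tables  smoke;             /* variable for which prevalence is estimated */
run;
```

The WEIGHT statement drives the prevalence estimate, while STRATA, CLUSTER, and RATE= (or TOTAL=) affect only the variance estimate, which is why the sections that follow treat them separately.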

ESTIMATING PREVALENCE: THE WEIGHT STATEMENT

Weights adjust for different components of a survey's design, and it is generally important to understand these aspects of your survey's methods even if the weights have already been calculated for you. In stratified samples, like MIHA, one component of the weight is the inverse of the sampling fraction in each stratum, where the sampling fraction is the ratio of the number of people sampled to the number of people in the target population or sampling frame, n_s / N_s. This weight, often called the sample design weight, can be adjusted for factors like survey non-response (i.e., the tendency of certain groups not to respond to the questionnaire) and noncoverage (i.e., the sampling frame may not be complete at the time the sample is drawn). Below, PROC SURVEYFREQ is used to calculate the prevalence of smoking during the 3rd trimester of pregnancy in MIHA, weighted to represent the proportion of women with a recent live birth in California who smoked at the end of their pregnancy.

    proc surveyfreq data = miha rate = samprate nomcar;

ESTIMATING VARIANCE

Estimating variance is slightly more complicated than estimating prevalence in SAS: you must consider sampling rates and whether to use a finite population correction (FPC), strata and cluster information, domain analysis, and missing data. If not otherwise specified, SAS uses the Taylor linearization method to estimate variance. In MIHA and YRBSS, the Taylor linearization method is appropriate and will be used in the rest of the examples. Briefly, however, SAS also offers two re-sampling methods for estimating variance: balanced repeated replication (BRR) and the jackknife method. These methods can be requested using the VARMETHOD= option in the PROC SURVEYFREQ statement together with the REPWEIGHTS statement. If replicate weights are provided, the STRATA and CLUSTER statements are not necessary.

Variance estimates in SAS are calculated based on the first stage of the sampling process. In a multi-stage cluster sample, such as YRBS, schools might be selected first. In this case, all of the information provided about strata, clusters, and sampling rates should be at the level of the school.

FINITE POPULATION CORRECTION

By default, the Taylor linearization method assumes that the sampling fraction is small, or that the first-stage sample is drawn with replacement, so that the sampling fraction is negligible (i.e., the population is treated as infinite). Sometimes, particularly in stratified designs where the sample is drawn without replacement, the sampling fraction is not small. In strata where the sampling fraction (n_s / N_s) is large, the sample contains more information about the target population, reducing the variance. The finite population correction (FPC) accounts for the extra efficiency gained in these instances. As the sampling rate becomes large (i.e., approaches one), the FPC has a larger impact on the reduction in standard errors.
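
To make the size of this effect concrete, the standard textbook form of the correction for a mean estimated from a simple random sample drawn without replacement within stratum s can be written as follows. This is an illustrative special case, not the exact Taylor linearization expression SAS evaluates; the symbol s_s^2 (the sample variance within stratum s) is introduced here and does not appear in the paper.

```latex
\[
\widehat{\mathrm{Var}}(\bar{y}_s) \;=\; \left(1 - \frac{n_s}{N_s}\right)\frac{s_s^{2}}{n_s}
\]
```

With a sampling fraction of n_s / N_s = 0.01, the factor in parentheses is 0.99 and the correction is negligible; with n_s / N_s = 0.5, the estimated variance is halved.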
The correction is made in SAS using the RATE= option in the PROC SURVEYFREQ statement. In the example below, SAMPRATE is a data set that contains the sampling fraction in each stratum. The stratum identifier should be stored in a variable with the same name as the variable specified in the STRATA statement. The sampling fraction should be stored in a variable called _RATE_. Note that you could accomplish the same thing using the TOTAL= option, naming a data set that contains the population totals by stratum (N_s) in a variable called _TOTAL_.

    proc surveyfreq data = miha rate = samprate nomcar;

Note that SAS does not include the FPC in the variance calculation if replicate weights are used.

THE STRATA STATEMENT

The STRATA statement should be used when the sample design is stratified at the first stage of sampling. The STRATA variable represents non-overlapping subgroups that were sampled independently.

    proc surveyfreq data = miha rate = samprate nomcar;
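
The RATE= data set and the STRATA and WEIGHT statements described above might fit together as in the sketch below. It is an illustration under stated assumptions, not the paper's own code: the data set names SAMPRATE and MIHA and the variables SMKTRI3 and _RATE_ come from the text, but the variable names STRATUM, WEIGHT, N_SAMPLED, and N_FRAME, and every number in the toy data, are made up.

```sas
/* Stratum-level counts, invented for illustration only */
data samprate;
  input stratum n_sampled n_frame;
  _rate_ = n_sampled / n_frame;   /* sampling fraction n_s / N_s */
  keep stratum _rate_;            /* stratum name must match the STRATA variable */
  datalines;
1 4 40
2 4 160
;
run;

/* Toy respondent file standing in for MIHA; weight = 1 / _rate_ */
/* smktri3: 1 = Yes, 2 = No, . = missing                         */
data miha;
  input stratum weight smktri3;
  datalines;
1 10 1
1 10 2
1 10 2
1 10 .
2 40 2
2 40 2
2 40 1
2 40 2
;
run;

proc surveyfreq data = miha rate = samprate nomcar;
  strata stratum;    /* first-stage strata                        */
  weight weight;     /* sample design weight                      */
  tables smktri3;    /* weighted prevalence of 3rd trimester smoking */
run;
```

Here the weights are simply the inverse of the sampling fractions (10 = 1/0.1 and 40 = 1/0.025); in a real survey the delivered weights usually also carry non-response and noncoverage adjustments.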

THE CLUSTER STATEMENT

The CLUSTER statement is used to identify variables that contain information on the first-stage clusters, or primary sampling units (PSUs), in a cluster-sample design such as the YRBSS.

    proc surveyfreq data = yrbss rate = samprate nomcar;
    cluster psu;
    weight weight;
    tables smoke;
    run;

DOMAINS AND SUBPOPULATIONS

Domain analysis refers to the computation of statistics for subpopulations (e.g., stratifying your analysis on race). Other SAS procedures (e.g., PROC SURVEYMEANS) and other software (e.g., SUDAAN, Stata) have specific SUBPOP or DOMAIN statements. In PROC SURVEYFREQ, however, to request a domain analysis you must put the stratification variable in the TABLES statement.

    proc surveyfreq data = miha rate = samprate nomcar;
    by race;
    run;

If you stratify your analysis using a BY statement, as in the example above, or subset your analysis using a WHERE statement, the standard errors produced will differ from those produced had you included the variable in the TABLES statement. This has to do with the way SAS processes data. When SAS sees a BY statement, it treats each BY group as a completely separate analysis, virtually on a separate data set. When SAS sees a WHERE statement, the data are subset to exclude all observations that do not meet the condition before the analysis begins. However, standard errors for stratified survey designs are calculated using the total number of individuals in each stratum in the entire sample, n_s, and the same is true of cluster designs. BY and WHERE statements therefore exclude individuals who should be contributing to these total counts. Using a data set in PROC SURVEYFREQ that has been subset in a DATA step through an IF or WHERE statement would also be inappropriate.

    proc surveyfreq data = miha rate = samprate nomcar;
    tables race*smktri3;
    run;

ACCOUNTING FOR MISSING DATA IN SAS 9.2

If a variable in the TABLES statement contains missing values, SAS by default assumes these values are missing completely at random and excludes these observations from the data set prior to performing the analysis. This is equivalent to excluding observations with missing values using a WHERE statement, as discussed above, and may result in different estimates of standard error. To request that SAS include observations with missing values in the variance estimation calculations, you must specify the NOMCAR option in the PROC SURVEYFREQ statement, available starting in SAS 9.2. Note that SUDAAN does not exclude missing values from variance estimation (Chen and Gorrell, 2004). Also note that this is different from the MISSING option in the PROC SURVEYFREQ statement, which includes missing values both in the table and in the variance estimation. In contrast, NOMCAR includes missing values in the variance estimation but not in the table, so that you can obtain percentages and corresponding standard errors for non-missing values.

    proc surveyfreq data = miha rate = samprate nomcar;

ACCOUNTING FOR MISSING DATA PRIOR TO SAS 9.2

If you have not upgraded to version 9.2 but want to account for missing data when estimating the variance around percentages of non-missing values, there is a workaround (Chen and Gorrell, 2004). Unlike PROC SURVEYFREQ, PROC SURVEYMEANS has a DOMAIN statement, which can be used to simulate an analysis in which missing values are not assumed to be missing completely at random. The first step is recoding your data to dummy variables for the variable(s) of interest. In the example below, two dummy variables are created. Assume the original variable for smoking during the 3rd trimester of pregnancy in MIHA, SMKTRI3, has two categories, coded 1 for Yes and 2 for No. In the first dummy variable, values of 1 represent smokers and values of 0 represent everyone else in the sample. In the second dummy variable, values of 1 represent responses with missing data on smoking.

    data surveymeans;
    set miha;
    smktri3_yes = (smktri3 = 1);
    smktri3_missing = (smktri3 = .);
    run;

The mean of the dummy variable for smoking, coded 1 and 0, is the percentage of smokers. By including the dummy variable for missing values in the DOMAIN statement of PROC SURVEYMEANS, the percentage of smokers is calculated among the non-missing values of smoking status. SAS also attempts the calculation for the domain of missing values, even though a mean among missing values is not meaningful. Because the missing values are included as a category through the DOMAIN statement, they are not excluded from the variance estimation.

    proc surveymeans data = surveymeans rate = samprate missing;
    var smktri3_yes;
    domain smktri3_missing;
    run;

CONCLUSION

Surveys are common and useful: they allow us to study health behaviors and other phenomena with greater precision, and they make data collection more efficient, saving time and money.
Because surveys employ more complex sampling designs than an SRS, accounting for their methods in PROC SURVEYFREQ is more complicated than using PROC FREQ. This paper has outlined considerations for estimating prevalence and variance with PROC SURVEYFREQ in order to help others avoid FREQuent mistakes in survey analysis.

REFERENCES

California Department of Public Health. Maternal and Infant Health Assessment (MIHA) Survey. Available at: Accessed July 14,

Centers for Disease Control and Prevention. Methodology of the Youth Risk Behavior Surveillance System. MMWR 2004;53(No. RR-12):[inclusive page numbers].

Chen X, Gorrell P. Variance Estimation With Complex Surveys: Some SAS-SUDAAN Comparisons. Proceedings of the 17th Annual Northeast SAS Users Group Conference, 2004.

Lohr SL. Sampling: Design and Analysis. Pacific Grove, California: Duxbury Press Publishing Company; 1999.

SAS Institute Inc. SAS/STAT 9.2 User's Guide: The SURVEYFREQ Procedure. Available at: Accessed July 14,

Survey Research Group. The California Youth Risk Behavior Surveillance System. Available at: Accessed July 14,

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Kathryn Martin
Maternal, Child and Adolescent Health Program
California Department of Public Health
1615 Capitol Avenue, MS 8304, PO Box
Sacramento, CA

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
Paper PO-21 Permuted-block randomization with varying block sizes using SAS Proc Plan Lei Li, RTI International, RTP, North Carolina ABSTRACT Permuted-block randomization with varying block sizes using
Obesity and Socioeconomic Status in Children and Adolescents: United States, 2005 2008
Obesity and Socioeconomic Status in Children and Adolescents: United States, 2005 2008 Cynthia L. Ogden, Ph.D.; Molly M. Lamb, Ph.D.; Margaret D. Carroll, M.S.P.H.; and Katherine M. Flegal, Ph.D. Key findings
National Endowment for the Arts. A Technical Research Manual
2012 SPPA PUBLIC-USE DATA FILE USER S GUIDE A Technical Research Manual Prepared by Timothy Triplett Statistical Methods Group Urban Institute September 2013 Table of Contents Introduction... 3 Section
Failure to take the sampling scheme into account can lead to inaccurate point estimates and/or flawed estimates of the standard errors.
Analyzing Complex Survey Data: Some key issues to be aware of Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 24, 2015 Rather than repeat material that is
Optimization of sampling strata with the SamplingStrata package
Optimization of sampling strata with the SamplingStrata package Package version 1.1 Giulio Barcaroli January 12, 2016 Abstract In stratified random sampling the problem of determining the optimal size
Graduate Student Epidemiology Program
Graduate Student Epidemiology Program To promote training in MCH Epidemiology Real-World Experience in: Data Analysis and Monitoring Needs Assessment Program Evaluation 2015 Program Guide Submit your application
DATA IN A RESEARCH. Tran Thi Ut, FEC/HSU October 10, 2013
DATA IN A RESEARCH Tran Thi Ut, FEC/HSU October 10, 2013 IMPORTANCE OF DATA In research, it needs data for the stages: Research design Sampling design Data gathering and /or field work techniques Data
A COMPREHENSIVE SOFTWARE PACKAGE FOR SURVEY DATA ANALYSIS
SUDAAN: A COMPREHENSIVE SOFTWARE PACKAGE FOR SURVEY DATA ANALYSIS Lisa M. LaVange, Babubhai V. Shah, Beth G. Barnwell and Joyce F. Killinger Lisa M. LaVan~e. Research Triangle Institute KEYWORDS: variance
Sampling strategies *
UNITED NATIONS SECRETARIAT ESA/STAT/AC.93/2 Statistics Division 03 November 2003 Expert Group Meeting to Review the Draft Handbook on Designing of Household Sample Surveys 3-5 December 2003 English only
Multilevel Modeling of Complex Survey Data
Multilevel Modeling of Complex Survey Data Sophia Rabe-Hesketh, University of California, Berkeley and Institute of Education, University of London Joint work with Anders Skrondal, London School of Economics
Approaches for Analyzing Survey Data: a Discussion
Approaches for Analyzing Survey Data: a Discussion David Binder 1, Georgia Roberts 1 Statistics Canada 1 Abstract In recent years, an increasing number of researchers have been able to access survey microdata
Simple Random Sampling
Source: Frerichs, R.R. Rapid Surveys (unpublished), 2008. NOT FOR COMMERCIAL DISTRIBUTION 3 Simple Random Sampling 3.1 INTRODUCTION Everyone mentions simple random sampling, but few use this method for
The Research Data Centres Information and Technical Bulletin
Catalogue no. 12-002 X No. 2014001 ISSN 1710-2197 The Research Data Centres Information and Technical Bulletin Winter 2014, vol. 6 no. 1 How to obtain more information For information about this product
Analysis of complex survey samples.
Analysis of complex survey samples. Thomas Lumley Department of Biostatistics University of Washington April 15, 2004 Abstract I present software for analysing complex survey samples in R. The sampling
Ana M. Viamonte Ros, M.D., M.P.H. State Surgeon General
Florida Department of Health Division of Disease Control Bureau of Epidemiology Chronic Disease Epidemiology Section Charlie Crist Governor Ana M. Viamonte Ros, M.D., M.P.H. State Surgeon General Florida
Chapter 8: Quantitative Sampling
Chapter 8: Quantitative Sampling I. Introduction to Sampling a. The primary goal of sampling is to get a representative sample, or a small collection of units or cases from a much larger collection or
Diabetes. African Americans were disproportionately impacted by diabetes. Table 1 Diabetes deaths by race/ethnicity CHRONIC DISEASES
Diabetes African Americans were disproportionately impacted by diabetes. African Americans were most likely to die of diabetes. People living in San Pablo, Pittsburg, Antioch and Richmond were more likely
Chapter 2: Research Methodology
Chapter 2: Research Methodology 1. Type of Research 2. Sources of Data 3. Instruments for Data Collection 4. Research Methods 5. Sampling 6. Limitations of the Study 6 Chapter 2: Research Methodology Research
Determines if the data you collect is practical for analysis. Reviews the appropriateness of your data collection methods.
Performing a Community Assessment 37 STEP 5: DETERMINE HOW TO UNDERSTAND THE INFORMATION (ANALYZE DATA) Now that you have collected data, what does it mean? Making sense of this information is arguably
Statistical Methods for Sample Surveys (140.640)
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
Annex 6 BEST PRACTICE EXAMPLES FOCUSING ON SAMPLE SIZE AND RELIABILITY CALCULATIONS AND SAMPLING FOR VALIDATION/VERIFICATION. (Version 01.
Page 1 BEST PRACTICE EXAMPLES FOCUSING ON SAMPLE SIZE AND RELIABILITY CALCULATIONS AND SAMPLING FOR VALIDATION/VERIFICATION (Version 01.1) I. Introduction 1. The clean development mechanism (CDM) Executive
Papers presented at the ICES-III, June 18-21, 2007, Montreal, Quebec, Canada
A Comparison of the Results from the Old and New Private Sector Sample Designs for the Medical Expenditure Panel Survey-Insurance Component John P. Sommers 1 Anne T. Kearney 2 1 Agency for Healthcare Research
THE HYBRID CART-LOGIT MODEL IN CLASSIFICATION AND DATA MINING. Dan Steinberg and N. Scott Cardell
THE HYBID CAT-LOGIT MODEL IN CLASSIFICATION AND DATA MINING Introduction Dan Steinberg and N. Scott Cardell Most data-mining projects involve classification problems assigning objects to classes whether
2014-16. Orange County Health Improvement Plan. 2014 Annual Report. www.ochealthiertogether.org
2014-16 Orange County Health Improvement Plan 2014 Annual Report www.ochealthiertogether.org The Orange County Health Improvement Plan (OCHIP) was published in May 2014 for the time period January 2014-December
An Assessment of the Effect of Misreporting of Phone Line Information on Key Weighted RDD Estimates
An Assessment of the Effect of Misreporting of Phone Line Information on Key Weighted RDD Estimates Ashley Bowers 1, Jeffrey Gonzalez 2 1 University of Michigan 2 Bureau of Labor Statistics Abstract Random-digit
