Survey Analysis: Options for Missing Data


Paul Gorrell, Social & Scientific Systems, Inc., Silver Spring, MD

Abstract

A common situation that researchers working with survey data face is the analysis of data with missing values, often due to nonresponse. In addition to omitting observations with missing values for analysis variables, SAS excludes observations if the weight or any of the design variables (strata, cluster, domain) has a missing value. This paper discusses two options available with the SAS survey procedures (e.g. SURVEYFREQ, SURVEYMEANS): the MISSING option and the NOMCAR option. The MISSING option is used with categorical variables to instruct SAS to treat missing values as a valid category. The NOMCAR option (new with version 9.2) is used when the default assumption that missing values for analysis variables are missing completely at random (i.e. the group of nonrespondents does not differ in any relevant respect from the group of respondents) is not appropriate. Use of the NOMCAR option instructs SAS to perform a domain analysis of missing and nonmissing values. Specific examples illustrate the effect of these two options on variance estimation and on the computation of confidence limits.

Introduction

A useful starting point for the discussion of missing data in this paper is the following text from the section on missing values in the SAS documentation for PROC SURVEYMEANS:

(1) By default, when computing statistics for an analysis variable, PROC SURVEYMEANS omits observations with missing values for that variable. The procedure computes statistics for each variable based only on observations that have nonmissing values for that variable. This treatment is based on the assumption that the missing values are missing completely at random (MCAR). However, this assumption is sometimes not true. For example, evidence from other surveys might suggest that observations with missing values are systematically different from observations without missing values. If you believe that missing values are not missing completely at random, then you can specify the NOMCAR option to let variance estimation include these observations with missing values in the analysis variables.

For the analysis of complex surveys another factor comes into play: omitting observations with missing values potentially removes important information about the design of the survey, e.g. strata and cluster information. In the discussion of an example from the Medical Expenditure Panel Survey (MEPS) below, we will see that the NOMCAR option can serve as an alternative to a DOMAIN analysis, which is often recommended instead of restricting the analysis file to the target subpopulation beforehand. The effect of using the NOMCAR option is described in (2).

(2) When the NOMCAR option is used, the procedure treats observations with and without missing values for analysis variables as two different domains, and it performs a domain analysis in the domain of nonmissing observations.

Although SAS 9.2 includes options for replication methods of variance estimation (BRR, jackknife), the NOMCAR option applies only to the default Taylor series method. Note also that (2) refers to analysis variables. In contrast, the MISSING option affects categorical variables. The text in (3) is from the SAS 9.2 documentation for PROC SURVEYMEANS.

(3) [The MISSING option] treats missing values as a valid (nonmissing) category for all categorical variables, which includes CLASS, STRATA, CLUSTER, and DOMAIN variables.

By default, if you do not specify the MISSING option, an observation is excluded from the analysis if it has a missing value. Note that SAS' characterization of a variable as categorical is based on its use in one of the listed statements (e.g. DOMAIN), not on the variable's values or range of values.

The rest of this paper consists of three examples: Example 1 shows the effect of the NOMCAR option with a simple stratified sample with missing data for the analysis variable; Example 2 shows the effect of the MISSING option for a similar stratified sample with missing values for a categorical variable used in the DOMAIN statement; Example 3 uses a more complex example to compare the effect of using the NOMCAR option with a DOMAIN analysis based on missing values for an analysis variable.

The goal of this paper is to illustrate some of the effects you will observe when using the NOMCAR and MISSING options. This paper is by no means an exhaustive discussion of the topic, nor does it advise you when it is appropriate to use or not use these options. Often this is determined by the design properties of the survey data you are analyzing and/or your research goals. A discussion of all the different design and analytic factors to consider is beyond the scope of this paper. But the examples discussed below should give you a concrete sense of the use of these options, as well as specific questions to consider when weighing their use.

Example 1 (Spending on Ice Cream by Grade Level)

This example is taken directly from the SAS 9.2 documentation for PROC SURVEYMEANS (Example 85.4, Analyzing Survey Data with Missing Values). In this example students from three grades (7, 8, and 9) are sampled with respect to spending on ice cream (a more user-friendly listing of the ICECREAM data set, sorted by GRADE and SPENDING, is in Appendix A). The value of WEIGHT is assigned as the inverse of the probability of selection (1/PROB). For each grade, PROB is defined as the ratio of the number sampled to the total number of students. Not shown here is a separate data set (STUDENTTOTALS) that has the total number of students for each grade, i.e. the population totals for each stratum.

(4)
DATA ICECREAM;
   INPUT GRADE SPENDING @@;
   /* Selection probability = number sampled / number of students in the grade */
   IF GRADE = 7 THEN PROB = 20/1824;
   IF GRADE = 8 THEN PROB = 9/1025;
   IF GRADE = 9 THEN PROB = 11/1151;
   WEIGHT = 1/PROB;
   DATALINES;
;

For comparison purposes we will first show output for the SURVEYMEANS code shown in (5). Here the mean and sum are requested. Although not germane to the missing data issues discussed here, the STUDENTTOTALS data set is used to compute a finite population correction for variance estimation (it is included here to maintain consistency with the SAS documentation example). In the code in (5), GRADE is the stratification variable, SPENDING the analysis variable, and WEIGHT the weight variable. The LIST option on the STRATA statement requests a Stratum Information table as part of the procedure output.
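The weight arithmetic in (4) can be checked outside SAS. The Python sketch below takes the sample sizes and population totals from the PROB assignments, computes each grade's weight (1/PROB) and sampling rate, and confirms that the weights add up to the 4000 students reported as the Sum of Weights in the Data Summary table; the dictionary layout and the function name `stratum_weight` are illustrative only and are not part of the SAS example.

```python
# Sample sizes (n) and population totals (N) per grade, taken from the
# PROB assignments in DATA step (4).
STRATA = {
    7: {"n": 20, "N": 1824},
    8: {"n": 9, "N": 1025},
    9: {"n": 11, "N": 1151},
}


def stratum_weight(n, N):
    """Sampling weight = 1/PROB, where PROB = n/N."""
    prob = n / N
    return 1 / prob


weights = {g: stratum_weight(s["n"], s["N"]) for g, s in STRATA.items()}
sampling_rates = {g: s["n"] / s["N"] for g, s in STRATA.items()}

# Each of the n sampled students in a grade carries that grade's weight,
# so the weights within a grade add up to the grade's population total.
sum_of_weights = sum(s["n"] * weights[g] for g, s in STRATA.items())

for g in STRATA:
    print(f"grade {g}: WEIGHT = {weights[g]:.2f}, sampling rate = {sampling_rates[g]:.2%}")
print(f"sum of weights = {sum_of_weights:.0f}")
```

Because every sampled student in a grade carries the same weight, each stratum's weights add up to its population total, which is why the sum of weights equals the total number of students in the three grades.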

(5)
PROC SURVEYMEANS DATA=ICECREAM TOTAL=STUDENTTOTALS MEAN SUM;
   STRATA GRADE / LIST;
   VAR SPENDING;
   WEIGHT WEIGHT;
RUN;

The Data Summary table in (6) below lists the number of strata (i.e. grades), the number of observations (cf. the PROC PRINT output in Appendix A), and the sum of weights (i.e. the sum of the population totals for all grades). The Stratum Information table lists descriptive information for each stratum (grade). The N Obs column shows the number sampled, and the N column shows the number of observations with nonmissing values for the analysis variable SPENDING. Subtracting N from N Obs shows that Grade 7 has 3 missing values and Grades 8 and 9 have 2 missing values each (see Appendix A). The Statistics table shows the requested MEAN and SUM, along with the variance estimate for each. For these estimates the observations with missing values were excluded; as stated, this is the default SAS behavior.

(6) Output for the SURVEYMEANS code in (5)

Data Summary
  Number of Strata           3
  Number of Observations    40
  Sum of Weights          4000

Stratum Information
  Stratum Index  GRADE  Population Total  Sampling Rate  N Obs  Variable  N
  1              7      1824              1.10%          20     SPENDING  17
  2              8      1025              0.88%           9     SPENDING   7
  3              9      1151              0.96%          11     SPENDING   9

Statistics
  Variable  Mean  Std Error of Mean  Sum  Std Dev
  SPENDING

Keeping especially the estimates of variance in mind, we now modify the example by including the NOMCAR option to see its effect.
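The mechanics behind the two treatments can be sketched in Python before looking at the NOMCAR output. This is a minimal sketch on invented data, assuming a stratified design with one observation per PSU and no finite population correction; it applies the standard Taylor-series (linearization) variance for a weighted mean and is not SAS's own code. Under the default treatment, rows with a missing value are dropped; under the NOMCAR-style treatment they stay in their stratum with a zero residual, which is how a domain analysis of the nonmissing observations, as described in (2), enters the linearization formula.

```python
# Hypothetical toy sample: (stratum, weight, value); None marks a missing value.
# The numbers are invented for illustration; they are not the ICECREAM data.
SAMPLE = [
    ("A", 10.0, 8.0), ("A", 10.0, 10.0), ("A", 10.0, None), ("A", 10.0, 12.0),
    ("B", 20.0, 1.0), ("B", 20.0, None), ("B", 20.0, 3.0),
]


def mean_and_variance(rows, nomcar):
    """Weighted mean and its Taylor-series (linearization) variance for a
    stratified design with one observation per PSU and no FPC."""
    if not nomcar:
        # Default (MCAR) treatment: drop rows with a missing value entirely.
        rows = [r for r in rows if r[2] is not None]

    observed = [(w, y) for _, w, y in rows if y is not None]
    w_sum = sum(w for w, _ in observed)
    mean = sum(w * y for w, y in observed) / w_sum

    # Linearized residuals; under NOMCAR a missing row stays in its stratum
    # with residual 0 (it lies outside the domain of nonmissing values).
    resid = {}
    for h, w, y in rows:
        u = w * (y - mean) / w_sum if y is not None else 0.0
        resid.setdefault(h, []).append(u)

    variance = 0.0
    for us in resid.values():
        n_h = len(us)
        u_bar = sum(us) / n_h
        variance += n_h / (n_h - 1) * sum((u - u_bar) ** 2 for u in us)
    return mean, variance


mean_mcar, var_mcar = mean_and_variance(SAMPLE, nomcar=False)
mean_nom, var_nom = mean_and_variance(SAMPLE, nomcar=True)
print(f"default: mean={mean_mcar:.4f}  SE={var_mcar ** 0.5:.4f}")
print(f"NOMCAR : mean={mean_nom:.4f}  SE={var_nom ** 0.5:.4f}")
```

With these invented numbers both calls return the same mean (38/7), while the NOMCAR-style variance is larger, because the zero residuals of the retained rows add within-stratum variation. With other data the difference can go the other way; the sketch shows the mechanism, not a guarantee about direction.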

(7)
PROC SURVEYMEANS DATA=ICECREAM TOTAL=STUDENTTOTALS NOMCAR MEAN SUM;
   STRATA GRADE / LIST;
   VAR SPENDING;
   WEIGHT WEIGHT;
RUN;

The Data Summary and Stratum Information tables are unchanged from the prior example, so they are not reproduced below. The output in (8) does show a new Variance Estimation table reflecting the NOMCAR option. As stated, this option is specific to the Taylor series method for variance estimation; the table lists this method, as well as the fact that observations with missing values for the analysis variable are included.

(8) Output for the SURVEYMEANS code in (7)

Variance Estimation
  Method          Taylor Series
  Missing Values  Included (NOMCAR)

Statistics
  Variable  Mean  Std Error of Mean  Sum  Std Dev
  SPENDING

Of particular interest here is the difference in the standard error of the mean and the standard deviation of the sum. But first note that the point estimates (MEAN, SUM) are unaffected; only the variance estimation is affected. This is particularly important when variance estimates are used to determine whether two point estimates (e.g. the MEAN or SUM in different years) are significantly different. Standard errors and standard deviations tend to be larger when the NOMCAR option is used than when missing values are assumed to be missing completely at random, and this is certainly the case with the example shown. Therefore the assumption that missing values are not missing completely at random is the more conservative assumption.

Example 2 (Spending on Ice Cream: Domain Analysis, Parent's Education)

This example modifies the input data used in Example 1 by adding a new binary variable, PARENT_ED, which indicates whether the student's parent completed high school or college. In addition to the values COLLEGE and HIGHSCHOOL, the variable also has missing values in this data set. The code below is similar to that in (5), except for the inclusion of the DOMAIN statement.
In addition to the Statistics table we saw in Example 1, the DOMAIN statement produces a Domain Analysis table.

(9)
PROC SURVEYMEANS DATA=ICECREAM TOTAL=STUDENTTOTALS MEAN SUM;
   STRATA GRADE / LIST;
   VAR SPENDING;
   DOMAIN PARENT_ED;
   WEIGHT WEIGHT;
RUN;

In the Statistics table below, the overall estimates (Mean, Sum, and their variance estimates) are identical to those we saw in Example 1 when the NOMCAR option was not used. The Domain Analysis table shows these estimates for the subpopulations of students whose parents have either a college or a high school education. Note that observations are not included if the value of PARENT_ED is missing.

(10) Output for the SURVEYMEANS code in (9)

Statistics
  Variable  Mean  Std Error of Mean  Sum  Std Dev
  SPENDING

Domain Analysis: PARENT_ED
  PARENT_ED   Variable  Mean  Std Error of Mean  Sum  Std Dev
  COLLEGE     SPENDING
  HIGHSCHOOL  SPENDING

Below we add the MISSING option in order to include all observations in the data set, including those for students for whom we don't have information about their parents' education.

(11)
PROC SURVEYMEANS DATA=ICECREAM TOTAL=STUDENTTOTALS MISSING MEAN SUM;
   STRATA GRADE / LIST;
   VAR SPENDING;
   DOMAIN PARENT_ED;
   WEIGHT WEIGHT;
RUN;

(12) Output for the SURVEYMEANS code in (11)

Statistics
  Variable  Mean  Std Error of Mean  Sum  Std Dev
  SPENDING

Domain Analysis: PARENT_ED
  PARENT_ED   Variable  Mean  Std Error of Mean  Sum  Std Dev
              SPENDING
  COLLEGE     SPENDING
  HIGHSCHOOL  SPENDING
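The mechanism behind the change in the COLLEGE estimates can be sketched in Python. This is a minimal sketch on invented data, assuming one observation per PSU and no finite population correction, and using the standard stratified variance estimator for a weighted domain total (not SAS's own code). Without the MISSING option, rows with a missing PARENT_ED value leave the analysis; with it, they stay in their stratum as a third category and contribute zeros to the COLLEGE domain.

```python
# Hypothetical toy sample: (stratum, weight, PARENT_ED, SPENDING).
# None marks a missing PARENT_ED value. The numbers are invented.
SAMPLE = [
    ("A", 10.0, "COLLEGE", 8.0), ("A", 10.0, "HIGHSCHOOL", 4.0),
    ("A", 10.0, None, 6.0), ("A", 10.0, "COLLEGE", 10.0),
    ("B", 20.0, "COLLEGE", 3.0), ("B", 20.0, None, 5.0),
    ("B", 20.0, "HIGHSCHOOL", 2.0),
]


def domains(rows, missing_option):
    """Domain categories that appear in the Domain Analysis table."""
    values = {dom for _, _, dom, _ in rows}
    return values if missing_option else values - {None}


def domain_total(rows, domain, missing_option):
    """Weighted total for one domain and its stratified variance
    (one observation per PSU, no FPC)."""
    if not missing_option:
        # Default: an observation with a missing domain value is excluded.
        rows = [r for r in rows if r[2] is not None]

    # z = w * y for rows in the domain, 0 for every other retained row.
    z_by_stratum = {}
    for h, w, dom, y in rows:
        z_by_stratum.setdefault(h, []).append(w * y if dom == domain else 0.0)

    total = sum(sum(zs) for zs in z_by_stratum.values())
    variance = 0.0
    for zs in z_by_stratum.values():
        n_h = len(zs)
        z_bar = sum(zs) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in zs)
    return total, variance


t_excl, v_excl = domain_total(SAMPLE, "COLLEGE", missing_option=False)
t_incl, v_incl = domain_total(SAMPLE, "COLLEGE", missing_option=True)
print(f"without MISSING: total={t_excl:.0f}  Std Dev={v_excl ** 0.5:.2f}")
print(f"with MISSING   : total={t_incl:.0f}  Std Dev={v_incl ** 0.5:.2f}")
```

Here the COLLEGE total is 240 either way, but its variance rises from 12,000 to about 14,667 when the missing-category rows are kept, because each stratum now has more observations, several of them zero for this domain. This mirrors the direction of the change from 3,000 to 3,363 reported for the ICECREAM data, though the toy numbers say nothing about the magnitude in the real data.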

In the Domain Analysis table in (12) we now see three rows for the PARENT_ED domain variable. In addition to seeing the estimates for the subpopulation with missing PARENT_ED values, we also see that including these observations, by changing the total number of observations within each stratum, has also changed the variance estimates. For example, the Std Dev for students whose parents attended college is 3,000 when the observations with missing PARENT_ED values are excluded, but 3,363 when those observations are included; i.e. including these observations with missing values yields more conservative estimates of reliability. This difference points to the importance of determining, for the survey analysis you are conducting, whether or not it is appropriate to exclude missing values for categorical variables in generating variances for your estimates.

Next we turn to a real-world example using data from the Medical Expenditure Panel Survey.

Example 3 (Hospital Stay Expenses)

The Medical Expenditure Panel Survey (MEPS) is a complex national probability survey of the civilian noninstitutionalized population. Each year MEPS collects healthcare utilization, expenditure, and other information for approximately 32,000 individuals. Public use files (PUFs) are released each year. The data in the example discussed below are from the 2006 MEPS Full-Year Consolidated Data file (HC-105), available for download from the Agency for Healthcare Research and Quality's Web site.

In order to use MEPS data for national estimates, person- and family-level weights are developed and released on the annual public use files. In the example used here the 2006 person-level weight variable PERWT06F is used. In addition, the MEPS sample design includes stratification, clustering, multiple stages of selection, and disproportionate sampling. Because of these complex design properties, it is not appropriate to assume simple random sampling for variance estimation.
To obtain accurate variance estimates, an appropriate technique for deriving standard errors of the weighted estimates must be used. Several methods for estimating standard errors for estimates from complex surveys have been developed, including the Taylor-series linearization method, balanced repeated replication, and the jackknife method. The MEPS public use files include variables to obtain weighted estimates and to implement a Taylor-series approach to estimating standard errors for weighted survey estimates. These variables, which jointly reflect the MEPS survey design, include the estimation weight, sampling strata, and the cluster or primary sampling unit (PSU).

Standard errors for MEPS estimates normally require the analytic file to contain all of the MEPS sample persons (e.g., those with positive values for the person weight variable) in order for the analysis to correctly account for the MEPS strata and PSUs. Subsetting to a population of interest (e.g. persons with a particular condition, procedure, or utilization), although normally an efficient programming move, potentially removes important stratification and clustering information from the analysis procedure. Indeed, accounting for this design information is often the reason to use a survey procedure such as SURVEYMEANS or SURVEYFREQ rather than their counterparts MEANS and FREQ.

In the examples discussed below the following design variables are used: PERWT06F (person-level weight variable); VARSTR (stratum variable); VARPSU (PSU, i.e. cluster, variable). The analysis variable is IPFEXP06 (2006 inpatient hospital stay facility expenses).

Consider a situation where you are asked to generate the mean and total person-level expenditures for hospital stays in 2006, but only for persons with hospital stay expenses. This is a typical way to look at average expenditures, because the majority of persons have zero hospital stay expenses in a given year.
You could remove persons with zero expenses from the analysis by deleting them from the input data set. But this conflicts with the recommendation not to subset in this way, because it removes important strata and cluster (PSU) information from the variance estimation calculations. As Machlin et al. (2005) point out,

(12) Analyses are often limited to a subgroup of the population. However, creating a special analysis file that contains only observations for the subgroup of interest may yield incorrect standard errors because all of the observations corresponding to a stage of the MEPS sample design may be deleted. Therefore, it is advisable to preserve the entire survey design structure for the program by reading in the entire person-level file.
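The effect that Machlin et al. describe can be sketched in Python on a toy file. This is a minimal sketch under stated assumptions: invented records, a stratified design with PSUs, the with-replacement Taylor-series approximation for a weighted total, and no finite population correction; it is not the MEPS or SAS implementation. Deleting persons without an expense also deletes every PSU in which nobody had an expense (PSU 2 in both strata here), and the variance is then computed as if those PSUs had never been sampled.

```python
from collections import defaultdict

# Hypothetical toy records: (stratum, psu, weight, expense). Invented numbers.
PERSONS = [
    (1, 1, 100.0, 0.0), (1, 1, 100.0, 500.0),
    (1, 2, 100.0, 0.0), (1, 2, 100.0, 0.0),
    (1, 3, 100.0, 300.0),
    (2, 1, 200.0, 100.0), (2, 2, 200.0, 0.0), (2, 3, 200.0, 50.0),
]


def has_expense(y):
    return y > 0


def total_and_variance(records, in_domain):
    """Weighted domain total and its Taylor-series variance for a stratified
    design with PSUs (with-replacement approximation, no FPC)."""
    psu_totals = defaultdict(float)  # (stratum, psu) -> sum of w*y in the domain
    for h, psu, w, y in records:
        # Every PSU present on the file gets an entry, even if its total is 0.
        psu_totals[(h, psu)] += w * y if in_domain(y) else 0.0

    by_stratum = defaultdict(list)
    for (h, _), z in psu_totals.items():
        by_stratum[h].append(z)

    total = sum(psu_totals.values())
    variance = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)  # PSUs of this stratum that remain on the file
        z_bar = sum(zs) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in zs)
    return total, variance


# Domain analysis on the full file: all PSUs stay in the design.
full = total_and_variance(PERSONS, has_expense)
# Subsetting first: persons (and whole PSUs) without an expense disappear.
subset = total_and_variance([p for p in PERSONS if has_expense(p[3])], has_expense)
print("full file :", full)
print("subset    :", subset)
```

Both calls return the same total (110,000), but the subset file gives a variance of 5.0e8 while the domain analysis on the full file gives 2.2e9: the PSUs without any expense carry real between-PSU variation that the subset discards. With real data a subset can also leave a stratum with a single PSU, for which this formula is undefined.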

One apparent alternative is to recode the zero values to missing, as in (13) below, in order to exclude these observations from the analysis. This would indeed exclude those observations, since the analysis procedure omits observations with missing values for the analysis variable. But it is equivalent to the subsetting already discussed.

(13)
DATA IP2006M;
   SET CDATA.H105 (KEEP= IPFEXP06 VARSTR VARPSU PERWT06F);
   /* Recode zero facility expenses to missing */
   IF IPFEXP06 = 0 THEN IPFEXP06 = .;
RUN;

(14)
PROC SURVEYMEANS DATA=IP2006M MEAN SUM;
   STRATA VARSTR;
   CLUSTER VARPSU;
   VAR IPFEXP06;
   WEIGHT PERWT06F;
RUN;

(15) Output for the SURVEYMEANS code in (14)

Data Summary
  Number of Strata                            203
  Number of Clusters                          451
  Number of Observations                   34,145
  Number of Observations Used              32,577
  Number of Obs with Nonpositive Weights    1,568
  Sum of Weights

Statistics
  Variable  Label                   Mean  Std Error of Mean  Sum  Std Dev
  IPFEXP06  HOSP FACILITY EXPENSES

As we saw with the previous examples, the Data Summary table contains basic information such as the number of strata, clusters (PSUs), and observations. Note that the number of observations used in this table is the number of observations with a positive weight. The number of observations used plus the number of observations with nonpositive weights equals the number of observations (32,577 + 1,568 = 34,145). The Statistics table shows that, in 2006, the mean per-person hospital stay expense, for those with a stay, is $12,584; the table also reports the standard error for this estimate. The total expense is $264.9 billion, with a standard deviation of $13.3 billion.

Having seen in (15) the variance estimates when persons with zero expenses (recoded to missing) are excluded from the analysis, we modify the example to include the NOMCAR option. Note that the input data set here still has zero values recoded to missing.
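The relationship between recoding to missing and a domain analysis can be sketched on toy data. This is a minimal Python sketch with invented records, assuming a stratified design with PSUs, the with-replacement Taylor-series approximation for a weighted mean (a ratio estimator), and no finite population correction; it is not SAS's implementation. Route 1 recodes zero expenses to missing, as DATA step (13) does, but keeps those records on the file, which is what NOMCAR does. Route 2 keeps the original zeros and defines the domain of persons with a positive expense directly, as a DOMAIN variable would.

```python
from collections import defaultdict

# Hypothetical toy records: (stratum, psu, weight, expense). Invented numbers.
PERSONS = [
    (1, 1, 100.0, 0.0), (1, 1, 100.0, 500.0),
    (1, 2, 100.0, 0.0), (1, 2, 100.0, 0.0),
    (1, 3, 100.0, 300.0),
    (2, 1, 200.0, 100.0), (2, 2, 200.0, 0.0), (2, 3, 200.0, 50.0),
]


def domain_mean(records, in_domain):
    """Weighted domain mean and its Taylor-series variance (strata + PSUs,
    with-replacement approximation, no FPC). Records outside the domain stay
    on the file and add a zero residual to their PSU."""
    members = [(w, y) for _, _, w, y in records if in_domain(y)]
    w_sum = sum(w for w, _ in members)
    mean = sum(w * y for w, y in members) / w_sum

    psu_resid = defaultdict(float)  # (stratum, psu) -> summed linearized residuals
    for h, psu, w, y in records:
        psu_resid[(h, psu)] += w * (y - mean) / w_sum if in_domain(y) else 0.0

    by_stratum = defaultdict(list)
    for (h, _), u in psu_resid.items():
        by_stratum[h].append(u)

    variance = 0.0
    for us in by_stratum.values():
        n_h = len(us)
        u_bar = sum(us) / n_h
        variance += n_h / (n_h - 1) * sum((u - u_bar) ** 2 for u in us)
    return mean, variance


# Route 1: zero expenses recoded to missing (None) but kept on the file.
recoded = [(h, psu, w, None if y == 0 else y) for h, psu, w, y in PERSONS]
nomcar_style = domain_mean(recoded, lambda y: y is not None)

# Route 2: original zeros kept; the domain is "expense > 0".
explicit_domain = domain_mean(PERSONS, lambda y: y > 0)

print("recode + keep missing:", nomcar_style)
print("explicit domain      :", explicit_domain)
```

Both routes produce the same domain mean (550/3) and the same variance, because they use the same domain membership and the same set of strata and PSUs; only the way the domain is encoded differs.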

(16)
PROC SURVEYMEANS DATA=IP2006M NOMCAR MEAN SUM;
   STRATA VARSTR / LIST;
   CLUSTER VARPSU;
   VAR IPFEXP06;
   WEIGHT PERWT06F;
RUN;

(17) Output for the SURVEYMEANS code in (16)

Variance Estimation
  Method          Taylor Series
  Missing Values  Included (NOMCAR)

Statistics
  Variable  Label                   Mean  Std Error of Mean  Sum  Std Dev
  IPFEXP06  HOSP FACILITY EXPENSES

Again, as we saw in the ice cream example, neither the mean nor the total is affected by the use of the NOMCAR option. But the standard error of the mean and the standard deviation of the sum are both larger when the observations with zero hospital expenses are included than when they are excluded. For the sum, the standard deviation is $13.3 billion when the zero-expense records are excluded, but $14.2 billion when those records are included.

As the SAS documentation says, when the NOMCAR option is used, the procedure treats observations with and without missing values for analysis variables as two different domains, and it performs a domain analysis in the domain of nonmissing observations. We can see this explicitly if we consider that, prior to the introduction of the NOMCAR option in version 9.2, the only alternative was to create a domain variable and use the DOMAIN statement to instruct SAS to perform a domain analysis. Consider the domain variable SUBPOP created in (18) and used in (19). Here the zero values for IPFEXP06 have not been recoded to missing but keep their original value.

(18)
DATA IP2006;
   SET CDATA.H105 (KEEP= IPFEXP06 VARSTR VARPSU PERWT06F);
   /* Declare the length first; otherwise SUBPOP takes its length (8) from
      'WITH EXP' and 'WITHOUT EXP' is truncated to 'WITHOUT'. */
   LENGTH SUBPOP $ 11;
   IF IPFEXP06 > 0 THEN SUBPOP = 'WITH EXP';
   ELSE SUBPOP = 'WITHOUT EXP';
RUN;

(19)
PROC SURVEYMEANS DATA=IP2006 MEAN SUM;
   STRATA VARSTR;
   CLUSTER VARPSU;
   VAR IPFEXP06;
   WEIGHT PERWT06F;
   DOMAIN SUBPOP;
RUN;

(20) Output for the SURVEYMEANS code in (19)

Statistics
  Variable  Label                   Mean  Std Error of Mean  Sum  Std Dev
  IPFEXP06  HOSP FACILITY EXPENSES

Domain Analysis: SUBPOP
  SUBPOP       Variable  Label                   Mean  Std Error of Mean  Sum  Std Dev
  WITH EXP     IPFEXP06  HOSP FACILITY EXPENSES
  WITHOUT EXP  IPFEXP06  HOSP FACILITY EXPENSES

Here the Statistics table gives the estimates for the full population, i.e. persons with and without a hospital stay expense. The Domain Analysis table shows the estimates of interest, i.e. those for persons with an expense (WITH EXP). What is important to note here is that both the standard error of the mean and the standard deviation of the sum for this domain are identical to those produced with the NOMCAR option in (17). This follows from the fact that the NOMCAR option, behind the scenes, performs the domain analysis explicitly coded in (18) and (19). One potential advantage of the explicit DOMAIN analysis is that the output more accurately reflects the input data and the analysis performed. The NOMCAR option, although potentially a useful shortcut, masks both the properties of the input data and the fact that a domain analysis is being performed.

Summary

This paper has illustrated two options of potential use when working with survey data with missing values. The MISSING option overrides SAS' default behavior of excluding observations where the value of a categorical variable is missing; instead, it treats missing values as a valid analysis category. The NOMCAR option is intended for use when the default assumption that missing values of the analysis variable are missing completely at random is not justified. This option instructs SAS to perform a domain analysis for observations with and without missing values for the analysis variable.

References

Machlin, S., Yu, W., and Zodet, M. Computing Standard Errors for MEPS Estimates. January 2005. Agency for Healthcare Research and Quality, Rockville, MD. Available at:

Acknowledgements

I would like to thank my colleagues at Social & Scientific Systems, Inc. for lots of help with SAS in general and survey analysis in particular.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

Contact Information

Paul Gorrell
Social & Scientific Systems, Inc.
Georgia Avenue
Silver Spring, MD

APPENDIX A: ICECREAM DATA SET

Obs  GRADE  SPENDING    Obs  GRADE  SPENDING

1 The SAS System 13:06 Tuesday, December 16, 2008



More information

Modeling Lifetime Value in the Insurance Industry

Modeling Lifetime Value in the Insurance Industry Modeling Lifetime Value in the Insurance Industry C. Olivia Parr Rud, Executive Vice President, Data Square, LLC ABSTRACT Acquisition modeling for direct mail insurance has the unique challenge of targeting

More information

MEPS HC-168B: 2014 Dental Visits

MEPS HC-168B: 2014 Dental Visits MEPS HC-168B: 2014 Dental Visits June 2016 Agency for Healthcare Research and Quality Center for Financing, Access, and Cost Trends 5600 Fishers Lane Rockville, MD 20857 (301) 427-1406 Table of Contents

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

ANOMALIES IN FORM 5500 FILINGS: LESSONS FROM SUPPLEMENTAL DATA FOR GROUP HEALTH PLAN FUNDING

ANOMALIES IN FORM 5500 FILINGS: LESSONS FROM SUPPLEMENTAL DATA FOR GROUP HEALTH PLAN FUNDING ANOMALIES IN FORM 5500 FILINGS: LESSONS FROM SUPPLEMENTAL DATA FOR GROUP HEALTH PLAN FUNDING Final Report December 14, 2012 Michael J. Brien, PhD Deloitte Financial Advisory Services LLP 202-378-5096 michaelbrien@deloitte.com

More information

AP STATISTICS 2010 SCORING GUIDELINES

AP STATISTICS 2010 SCORING GUIDELINES 2010 SCORING GUIDELINES Question 4 Intent of Question The primary goals of this question were to (1) assess students ability to calculate an expected value and a standard deviation; (2) recognize the applicability

More information

Missing Data Part 1: Overview, Traditional Methods Page 1

Missing Data Part 1: Overview, Traditional Methods Page 1 Missing Data Part 1: Overview, Traditional Methods Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 17, 2015 This discussion borrows heavily from: Applied

More information

National Endowment for the Arts. A Technical Research Manual

National Endowment for the Arts. A Technical Research Manual 2012 SPPA PUBLIC-USE DATA FILE USER S GUIDE A Technical Research Manual Prepared by Timothy Triplett Statistical Methods Group Urban Institute September 2013 Table of Contents Introduction... 3 Section

More information

Descriptive Methods Ch. 6 and 7

Descriptive Methods Ch. 6 and 7 Descriptive Methods Ch. 6 and 7 Purpose of Descriptive Research Purely descriptive research describes the characteristics or behaviors of a given population in a systematic and accurate fashion. Correlational

More information

IBM SPSS Direct Marketing 23

IBM SPSS Direct Marketing 23 IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release

More information

ESTIMATION OF THE EFFECTIVE DEGREES OF FREEDOM IN T-TYPE TESTS FOR COMPLEX DATA

ESTIMATION OF THE EFFECTIVE DEGREES OF FREEDOM IN T-TYPE TESTS FOR COMPLEX DATA m ESTIMATION OF THE EFFECTIVE DEGREES OF FREEDOM IN T-TYPE TESTS FOR COMPLEX DATA Jiahe Qian, Educational Testing Service Rosedale Road, MS 02-T, Princeton, NJ 08541 Key Words" Complex sampling, NAEP data,

More information

Health Care Expenditures for Uncomplicated Pregnancies

Health Care Expenditures for Uncomplicated Pregnancies Health Care Expenditures for Uncomplicated Pregnancies Agency for Healthcare Research and Quality U.S. Department of Health & Human Services August 2007 ABSTRACT This report uses data pooled from three

More information

Elementary Statistics

Elementary Statistics Elementary Statistics Chapter 1 Dr. Ghamsary Page 1 Elementary Statistics M. Ghamsary, Ph.D. Chap 01 1 Elementary Statistics Chapter 1 Dr. Ghamsary Page 2 Statistics: Statistics is the science of collecting,

More information

Papers presented at the ICES-III, June 18-21, 2007, Montreal, Quebec, Canada

Papers presented at the ICES-III, June 18-21, 2007, Montreal, Quebec, Canada A Comparison of the Results from the Old and New Private Sector Sample Designs for the Medical Expenditure Panel Survey-Insurance Component John P. Sommers 1 Anne T. Kearney 2 1 Agency for Healthcare Research

More information

Using SAS Proc Mixed for the Analysis of Clustered Longitudinal Data

Using SAS Proc Mixed for the Analysis of Clustered Longitudinal Data Using SAS Proc Mixed for the Analysis of Clustered Longitudinal Data Kathy Welch Center for Statistical Consultation and Research The University of Michigan 1 Background ProcMixed can be used to fit Linear

More information

Failure to take the sampling scheme into account can lead to inaccurate point estimates and/or flawed estimates of the standard errors.

Failure to take the sampling scheme into account can lead to inaccurate point estimates and/or flawed estimates of the standard errors. Analyzing Complex Survey Data: Some key issues to be aware of Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 24, 2015 Rather than repeat material that is

More information

Foundations & Fundamentals. A PROC SQL Primer. Matt Taylor, Carolina Analytical Consulting, LLC, Charlotte, NC

Foundations & Fundamentals. A PROC SQL Primer. Matt Taylor, Carolina Analytical Consulting, LLC, Charlotte, NC A PROC SQL Primer Matt Taylor, Carolina Analytical Consulting, LLC, Charlotte, NC ABSTRACT Most SAS programmers utilize the power of the DATA step to manipulate their datasets. However, unless they pull

More information

Quick Start to Data Analysis with SAS Table of Contents. Chapter 1 Introduction 1. Chapter 2 SAS Programming Concepts 7

Quick Start to Data Analysis with SAS Table of Contents. Chapter 1 Introduction 1. Chapter 2 SAS Programming Concepts 7 Chapter 1 Introduction 1 SAS: The Complete Research Tool 1 Objectives 2 A Note About Syntax and Examples 2 Syntax 2 Examples 3 Organization 4 Chapter by Chapter 4 What This Book Is Not 5 Chapter 2 SAS

More information

Reshaping & Combining Tables Unit of analysis Combining. Assignment 4. Assignment 4 continued PHPM 672/677 2/21/2016. Kum 1

Reshaping & Combining Tables Unit of analysis Combining. Assignment 4. Assignment 4 continued PHPM 672/677 2/21/2016. Kum 1 Reshaping & Combining Tables Unit of analysis Combining Reshaping set: concatenate tables (stack rows) merge: link tables (attach columns) proc summary: consolidate rows proc transpose: reshape table Hye-Chung

More information

Constructing a Table of Survey Data with Percent and Confidence Intervals in every Direction

Constructing a Table of Survey Data with Percent and Confidence Intervals in every Direction Constructing a Table of Survey Data with Percent and Confidence Intervals in every Direction David Izrael, Abt Associates Sarah W. Ball, Abt Associates Sara M.A. Donahue, Abt Associates ABSTRACT We examined

More information

EXST SAS Lab Lab #4: Data input and dataset modifications

EXST SAS Lab Lab #4: Data input and dataset modifications EXST SAS Lab Lab #4: Data input and dataset modifications Objectives 1. Import an EXCEL dataset. 2. Infile an external dataset (CSV file) 3. Concatenate two datasets into one 4. The PLOT statement will

More information

PROC SUMMARY Options Beyond the Basics Susmita Pattnaik, PPD Inc, Morrisville, NC

PROC SUMMARY Options Beyond the Basics Susmita Pattnaik, PPD Inc, Morrisville, NC Paper BB-12 PROC SUMMARY Options Beyond the Basics Susmita Pattnaik, PPD Inc, Morrisville, NC ABSTRACT PROC SUMMARY is used for summarizing the data across all observations and is familiar to most SAS

More information

PROC LOGISTIC: Traps for the unwary Peter L. Flom, Independent statistical consultant, New York, NY

PROC LOGISTIC: Traps for the unwary Peter L. Flom, Independent statistical consultant, New York, NY PROC LOGISTIC: Traps for the unwary Peter L. Flom, Independent statistical consultant, New York, NY ABSTRACT Keywords: Logistic. INTRODUCTION This paper covers some gotchas in SAS R PROC LOGISTIC. A gotcha

More information

Appendix G STATISTICAL METHODS INFECTIOUS METHODS STATISTICAL ROADMAP. Prepared in Support of: CDC/NCEH Cross Sectional Assessment Study.

Appendix G STATISTICAL METHODS INFECTIOUS METHODS STATISTICAL ROADMAP. Prepared in Support of: CDC/NCEH Cross Sectional Assessment Study. Appendix G STATISTICAL METHODS INFECTIOUS METHODS STATISTICAL ROADMAP Prepared in Support of: CDC/NCEH Cross Sectional Assessment Study Prepared by: Centers for Disease Control and Prevention National

More information

Sensitivity Analysis in Multiple Imputation for Missing Data

Sensitivity Analysis in Multiple Imputation for Missing Data Paper SAS270-2014 Sensitivity Analysis in Multiple Imputation for Missing Data Yang Yuan, SAS Institute Inc. ABSTRACT Multiple imputation, a popular strategy for dealing with missing values, usually assumes

More information

SIMPLE LINEAR CORRELATION. r can range from -1 to 1, and is independent of units of measurement. Correlation can be done on two dependent variables.

SIMPLE LINEAR CORRELATION. r can range from -1 to 1, and is independent of units of measurement. Correlation can be done on two dependent variables. SIMPLE LINEAR CORRELATION Simple linear correlation is a measure of the degree to which two variables vary together, or a measure of the intensity of the association between two variables. Correlation

More information

IBM SPSS Direct Marketing 22

IBM SPSS Direct Marketing 22 IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release

More information

Samuel Zuvekas. Agency for Healthcare Research and Quality Working Paper No. 09003. August 2009

Samuel Zuvekas. Agency for Healthcare Research and Quality Working Paper No. 09003. August 2009 Validity of Household Reports of Medicare-covered Home Health Agency Use Samuel Zuvekas Agency for Healthcare Research and Quality Working Paper No. 09003 August 2009 Suggested citation: Zuvekas S. Validity

More information

Chapter 19 Statistical analysis of survey data. Abstract

Chapter 19 Statistical analysis of survey data. Abstract Chapter 9 Statistical analysis of survey data James R. Chromy Research Triangle Institute Research Triangle Park, North Carolina, USA Savitri Abeyasekera The University of Reading Reading, UK Abstract

More information

Logistic (RLOGIST) Example #3

Logistic (RLOGIST) Example #3 Logistic (RLOGIST) Example #3 SUDAAN Statements and Results Illustrated PREDMARG (predicted marginal proportion) CONDMARG (conditional marginal proportion) PRED_EFF pairwise comparison COND_EFF pairwise

More information

Salary. Cumulative Frequency

Salary. Cumulative Frequency HW01 Answering the Right Question with the Right PROC Carrie Mariner, Afton-Royal Training & Consulting, Richmond, VA ABSTRACT When your boss comes to you and says "I need this report by tomorrow!" do

More information

A Guide to Survey Analysis in GenStat. by Steve Langton. Defra Environmental Observatory, 1-2 Peasholme Green, York YO1 7PX, UK.

A Guide to Survey Analysis in GenStat. by Steve Langton. Defra Environmental Observatory, 1-2 Peasholme Green, York YO1 7PX, UK. Survey Analysis A Guide to Survey Analysis in GenStat by Steve Langton Defra Environmental Observatory, 1-2 Peasholme Green, York YO1 7PX, UK. GenStat is developed by VSN International Ltd, in collaboration

More information

Research. Dental Services: Use, Expenses, and Sources of Payment, 1996-2000

Research. Dental Services: Use, Expenses, and Sources of Payment, 1996-2000 yyyyyyyyy yyyyyyyyy yyyyyyyyy yyyyyyyyy Dental Services: Use, Expenses, and Sources of Payment, 1996-2000 yyyyyyyyy yyyyyyyyy Research yyyyyyyyy yyyyyyyyy #20 Findings yyyyyyyyy yyyyyyyyy U.S. Department

More information

WHO STEPS Surveillance Support Materials. STEPS Epi Info Training Guide

WHO STEPS Surveillance Support Materials. STEPS Epi Info Training Guide STEPS Epi Info Training Guide Department of Chronic Diseases and Health Promotion World Health Organization 20 Avenue Appia, 1211 Geneva 27, Switzerland For further information: www.who.int/chp/steps WHO

More information

Survey Inference for Subpopulations

Survey Inference for Subpopulations American Journal of Epidemiology Vol. 144, No. 1 Printed In U.S.A Survey Inference for Subpopulations Barry I. Graubard 1 and Edward. Korn 2 One frequently analyzes a subset of the data collected in a

More information

Introduction to Sampling. Dr. Safaa R. Amer. Overview. for Non-Statisticians. Part II. Part I. Sample Size. Introduction.

Introduction to Sampling. Dr. Safaa R. Amer. Overview. for Non-Statisticians. Part II. Part I. Sample Size. Introduction. Introduction to Sampling for Non-Statisticians Dr. Safaa R. Amer Overview Part I Part II Introduction Census or Sample Sampling Frame Probability or non-probability sample Sampling with or without replacement

More information

Interaction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015

Interaction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Interaction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Note: This handout assumes you understand factor variables,

More information

Creating Pivot Tables Using Excel 2003

Creating Pivot Tables Using Excel 2003 1 Creating Pivot Tables Using Excel 2003 Creating Six Kinds of Tables Milo Schield Member: International Statistical Institute US Rep: International Statistical Literacy Project Director, W M Keck Statistical

More information

Statistical and Methodological Issues in the Analysis of Complex Sample Survey Data: Practical Guidance for Trauma Researchers

Statistical and Methodological Issues in the Analysis of Complex Sample Survey Data: Practical Guidance for Trauma Researchers Journal of Traumatic Stress, Vol. 2, No., October 2008, pp. 440 447 ( C 2008) Statistical and Methodological Issues in the Analysis of Complex Sample Survey Data: Practical Guidance for Trauma Researchers

More information

Optimization of sampling strata with the SamplingStrata package

Optimization of sampling strata with the SamplingStrata package Optimization of sampling strata with the SamplingStrata package Package version 1.1 Giulio Barcaroli January 12, 2016 Abstract In stratified random sampling the problem of determining the optimal size

More information

A General Approach to Variance Estimation under Imputation for Missing Survey Data

A General Approach to Variance Estimation under Imputation for Missing Survey Data A General Approach to Variance Estimation under Imputation for Missing Survey Data J.N.K. Rao Carleton University Ottawa, Canada 1 2 1 Joint work with J.K. Kim at Iowa State University. 2 Workshop on Survey

More information

Sampling strategies *

Sampling strategies * UNITED NATIONS SECRETARIAT ESA/STAT/AC.93/2 Statistics Division 03 November 2003 Expert Group Meeting to Review the Draft Handbook on Designing of Household Sample Surveys 3-5 December 2003 English only

More information

Statistics and Data Analysis

Statistics and Data Analysis NESUG 27 PRO LOGISTI: The Logistics ehind Interpreting ategorical Variable Effects Taylor Lewis, U.S. Office of Personnel Management, Washington, D STRT The goal of this paper is to demystify how SS models

More information

Northumberland Knowledge

Northumberland Knowledge Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

More information

The Program Data Vector As an Aid to DATA step Reasoning Marianne Whitlock, Kennett Square, PA

The Program Data Vector As an Aid to DATA step Reasoning Marianne Whitlock, Kennett Square, PA PAPER IN09_05 The Program Data Vector As an Aid to DATA step Reasoning Marianne Whitlock, Kennett Square, PA ABSTRACT The SAS DATA step is easy enough for beginners to produce results quickly. You can

More information

Paper 109-25 Merges and Joins Timothy J Harrington, Trilogy Consulting Corporation

Paper 109-25 Merges and Joins Timothy J Harrington, Trilogy Consulting Corporation Paper 109-25 Merges and Joins Timothy J Harrington, Trilogy Consulting Corporation Abstract This paper discusses methods of joining SAS data sets. The different methods and the reasons for choosing a particular

More information

The HPSUMMARY Procedure: An Old Friend s Younger (and Brawnier) Cousin Anh P. Kellermann, Jeffrey D. Kromrey University of South Florida, Tampa, FL

The HPSUMMARY Procedure: An Old Friend s Younger (and Brawnier) Cousin Anh P. Kellermann, Jeffrey D. Kromrey University of South Florida, Tampa, FL Paper 88-216 The HPSUMMARY Procedure: An Old Friend s Younger (and Brawnier) Cousin Anh P. Kellermann, Jeffrey D. Kromrey University of South Florida, Tampa, FL ABSTRACT The HPSUMMARY procedure provides

More information

Methodology. Report 1. Design and Methods of the Medical Expenditure Panel Survey Household Component

Methodology. Report 1. Design and Methods of the Medical Expenditure Panel Survey Household Component Design and Methods of the Medical Expenditure Panel Survey Household Component Methodology,,,,, yyyyy zzzz Report 1 {{{{{ U.S. Department of Health and Human Services Public Health Service Agency for Health

More information

Designing a Sampling Method for a Survey of Iowa High School Seniors

Designing a Sampling Method for a Survey of Iowa High School Seniors Designing a Sampling Method for a Survey of Iowa High School Seniors Kyle A. Hewitt and Michael D. Larsen, Iowa State University Department of Statistics, Snedecor Hall, Ames, Iowa 50011-1210, larsen@iastate.edu

More information

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple

More information

Why Sample? Why not study everyone? Debate about Census vs. sampling

Why Sample? Why not study everyone? Debate about Census vs. sampling Sampling Why Sample? Why not study everyone? Debate about Census vs. sampling Problems in Sampling? What problems do you know about? What issues are you aware of? What questions do you have? Key Sampling

More information

Programming Tricks For Reducing Storage And Work Space Curtis A. Smith, Defense Contract Audit Agency, La Mirada, CA.

Programming Tricks For Reducing Storage And Work Space Curtis A. Smith, Defense Contract Audit Agency, La Mirada, CA. Paper 23-27 Programming Tricks For Reducing Storage And Work Space Curtis A. Smith, Defense Contract Audit Agency, La Mirada, CA. ABSTRACT Have you ever had trouble getting a SAS job to complete, although

More information

Introduction; Descriptive & Univariate Statistics

Introduction; Descriptive & Univariate Statistics Introduction; Descriptive & Univariate Statistics I. KEY COCEPTS A. Population. Definitions:. The entire set of members in a group. EXAMPLES: All U.S. citizens; all otre Dame Students. 2. All values of

More information

Samuel Zuvekas and Gary Olin. Agency for Healthcare Research and Quality Working Paper No. 08004. March 2008

Samuel Zuvekas and Gary Olin. Agency for Healthcare Research and Quality Working Paper No. 08004. March 2008 Validating the Collection of Separately Billed Doctor Expenditures for Hospital Services: Results from the Medicare-MEPS Validation Study Samuel Zuvekas and Gary Olin Agency for Healthcare Research and

More information

Linear Models in STATA and ANOVA

Linear Models in STATA and ANOVA Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples

More information

The Essentials of Finding the Distinct, Unique, and Duplicate Values in Your Data

The Essentials of Finding the Distinct, Unique, and Duplicate Values in Your Data The Essentials of Finding the Distinct, Unique, and Duplicate Values in Your Data Carter Sevick MS, DoD Center for Deployment Health Research, San Diego, CA ABSTRACT Whether by design or by error there

More information

Community Tracking Study. Comparison of Selected Statistical Software Packages for Variance Estimation in the CTS Surveys

Community Tracking Study. Comparison of Selected Statistical Software Packages for Variance Estimation in the CTS Surveys Community Tracking Study Comparison of Selected Statistical Software Packages for Variance Estimation in the CTS Surveys Elizabeth Schaefer Frank Potter Stephen Williams Nuria Diaz-Tena James D. Reschovsky

More information

Stepwise Regression. Chapter 311. Introduction. Variable Selection Procedures. Forward (Step-Up) Selection

Stepwise Regression. Chapter 311. Introduction. Variable Selection Procedures. Forward (Step-Up) Selection Chapter 311 Introduction Often, theory and experience give only general direction as to which of a pool of candidate variables (including transformed variables) should be included in the regression model.

More information

Testing Research and Statistical Hypotheses

Testing Research and Statistical Hypotheses Testing Research and Statistical Hypotheses Introduction In the last lab we analyzed metric artifact attributes such as thickness or width/thickness ratio. Those were continuous variables, which as you

More information

Accounting for complex survey design in modeling usual intake. Kevin W. Dodd, PhD National Cancer Institute

Accounting for complex survey design in modeling usual intake. Kevin W. Dodd, PhD National Cancer Institute Accounting for complex survey design in modeling usual intake Kevin W. Dodd, PhD National Cancer Institute Slide 1 Hello and welcome to today s webinar, the fourth in the Measurement Error Webinar Series.

More information

Module 14: Missing Data Stata Practical

Module 14: Missing Data Stata Practical Module 14: Missing Data Stata Practical Jonathan Bartlett & James Carpenter London School of Hygiene & Tropical Medicine www.missingdata.org.uk Supported by ESRC grant RES 189-25-0103 and MRC grant G0900724

More information

SAS: A Mini-Manual for ECO 351 by Andrew C. Brod

SAS: A Mini-Manual for ECO 351 by Andrew C. Brod SAS: A Mini-Manual for ECO 351 by Andrew C. Brod 1. Introduction This document discusses the basics of using SAS to do problems and prepare for the exams in ECO 351. I decided to produce this little guide

More information

NCSS Statistical Software

NCSS Statistical Software Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

More information

Guidelines for Analyzing Add Health Data. Ping Chen Kim Chantala

Guidelines for Analyzing Add Health Data. Ping Chen Kim Chantala Guidelines for Analyzing Add Health Data Ping Chen Kim Chantala Carolina Population Center University of North Carolina at Chapel Hill Last Update: March 2014 Table of Contents Overview... 3 Understanding

More information

Can SAS Enterprise Guide do all of that, with no programming required? Yes, it can.

Can SAS Enterprise Guide do all of that, with no programming required? Yes, it can. SAS Enterprise Guide for Educational Researchers: Data Import to Publication without Programming AnnMaria De Mars, University of Southern California, Los Angeles, CA ABSTRACT In this workshop, participants

More information

Better Safe than Sorry: A SAS Macro to Selectively Back Up Files

Better Safe than Sorry: A SAS Macro to Selectively Back Up Files Better Safe than Sorry: A SAS Macro to Selectively Back Up Files Jia Wang, Data and Analytic Solutions, Inc., Fairfax, VA Zhengyi Fang, Social & Scientific Systems, Inc., Silver Spring, MD ABSTRACT SAS

More information

Chapter 63 The SURVEYSELECT Procedure

Chapter 63 The SURVEYSELECT Procedure Chapter 63 The SURVEYSELECT Procedure Chapter Table of Contents OVERVIEW...3275 GETTING STARTED...3276 Simple Random Sampling...3277 StratifiedSampling...3279 Stratified Sampling with Control Sorting...3282

More information