Constructing a Table of Survey Data with Percent and Confidence Intervals in every Direction




David Izrael, Abt Associates
Sarah W. Ball, Abt Associates
Sara M.A. Donahue, Abt Associates

ABSTRACT

We examined a survey sample consisting of treated and not-treated respondents. We show how, using SAS macros based on PROC SURVEYFREQ, the user can easily construct a table that presents the survey findings of interest: the unweighted sample, the unweighted sample percent (column percent), and the weighted sample for the characteristics/variables of interest (rows). We then show how to use the macros to compute a weighted column percent and the weighted treatment ratio (weighted row percent), with their respective confidence intervals. We demonstrate the application of the macros to two types of variables: those representing single-select survey questions (i.e., survey questions with one response allowed) and those representing survey questions that allow the respondent to choose more than one response.

INTRODUCTION

Our assumptions are as follows:

1) The survey data include calculated survey weights and a variable that identifies treated vs. not-treated respondents (TX);
2) The survey data include variables that describe a respondent's demographic characteristics, such as age, gender, education, and health insurance;
3) The survey design uses stratification and clustering, and thus the survey data include the variables strata and cluster.

We constructed a table of survey data based on the following shell:

Demographic characteristic          Sample   Sample %   Weighted sample   Weighted % (95% CI)   Percent treated (95% CI)
Total
Age
  18-24
  25-35
  36-55
  56-65
  66+
Gender
  Male
  Female
Race/Ethnicity
  Non-Hispanic white
  Non-Hispanic black
  Hispanic
  Non-Hispanic other
Education
  High school graduate or less
  Some college or Associate degree
  Bachelor's degree
  Master's degree or above
Insurance*
  Private medical insurance
  Medicare
  Medicaid
  Other public insurance
  No Health Insurance

CI: confidence interval
*Respondents may select more than one type of insurance

Note that the variables age, gender, race/ethnicity, and education represent single-select survey questions, in contrast to insurance, which represents a survey question for which respondents may select more than one response.

Sample % is the column percent based on the unweighted sample; it totals 100 across the categories of each variable that represents a single-select survey question.

Weighted % is the column percent based on the weighted sample; it also totals 100 across the categories of each variable that represents a single-select survey question.

Percent treated is the weighted percent of a given category's population that is identified as treated (weighted row percent).

For a multiple-response survey item, such as insurance, the sums of Sample % and Weighted % may be greater than 100 because respondents may select more than one response. Thus, an individual respondent may be counted in more than one category of the variable representing insurance.

SAS MACROS TO CALCULATE PERCENT AND CONFIDENCE INTERVALS IN EVERY DIRECTION

1. The first macro (%TOTAL) computes the first row (Total) of the table and is driven by two procedures.

PROC SUMMARY gives us the sample and the weighted sample:

    PROC SUMMARY nway data=ourdata noprint;
      var N final_wgt;   /* N is presumably a counter equal to 1 for every respondent */
      output out=out(drop=_:) sum=n wgt_n;
    run;

PROC SURVEYFREQ calculates the total percent of treated respondents and the lower and upper limits of the 95% confidence interval:

    PROC SURVEYFREQ data=ourdata nosummary;
      tables TX / cl nostd;
      ods output OneWay=Tot(keep=TX Frequency WgtFreq Percent LowerCL UpperCL);
      strata strata;
      cluster cluster;
      weight final_wgt;
    run;

As expected, the sample percent and the weighted percent for the total row are 100. The macro %TOTAL results in the data set total, which carries all the values needed for the first row.

2. The second macro (%SINGLE) calculates the column values for single-select survey questions (such as age, gender, etc.). The macro call looks like the following:

    %SINGLE (var, charact, fmt);

where var is a reported variable (age, for example), charact is the label that precedes the categories in the leftmost column of the table shell (Race/Ethnicity, for example), and fmt is a user format with which the categories of the variable will be printed.

For the variables that represent single-select survey questions in the above table shell, the macro calls look like the following:

    %SINGLE (age, %NRBQUOTE (Age in years), agef);
    %SINGLE (sex, %NRBQUOTE (Gender), sexf);
    %SINGLE (race_ethn, %NRBQUOTE (Race/Ethnicity), racef);
    %SINGLE (education, %NRBQUOTE (Education), educationf);

Here and below we use the %NRBQUOTE macro function to accommodate special symbols in the labels, such as &, %, etc.

Each macro call ultimately creates a data set with the name of the variable the macro processes. This data set contains all the numbers needed to fill the table shell. To combine these data sets for printing we use the following data step:

    data combined_single;
      set age sex race_ethn education;
    run;

The core of the macro %SINGLE contains two PROC SURVEYFREQ steps and one PROC FREQ step.
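The fmt argument in the %SINGLE calls above presupposes that user formats such as agef and sexf already exist; the paper does not show them. A minimal sketch of how they might be defined, assuming (hypothetically) that age and sex are stored as the numeric codes shown in the comments:

```sas
/* Hypothetical format definitions; the numeric codes 1-5 and 1-2 are
   assumptions, only the category labels come from the table shell. */
proc format;
  value agef
    1 = '18-24'
    2 = '25-35'
    3 = '36-55'
    4 = '56-65'
    5 = '66+';
  value sexf
    1 = 'Male'
    2 = 'Female';
run;
```

The formats racef and educationf would be defined the same way, with codes that match however those variables are actually stored.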
The first PROC SURVEYFREQ calculates the weighted percent of a given category's population that is identified as treated (weighted row percent), with a 95% confidence interval:

    PROC SURVEYFREQ data=f nosummary;
      tables &var*tx / cl row nostd;
      ods output CrossTabs=goriz;
      strata strata;
      cluster cluster;
      weight final_wgt;
    run;

The data set goriz has all components (Percent treated and its 95% CI) of the estimates for all categories of the variable.

The second PROC SURVEYFREQ calculates the unweighted and weighted sample for each category of the variable, as well as the weighted column percent and its 95% confidence interval:

    PROC SURVEYFREQ data=f nosummary;
      tables &var / cl nostd;
      ods output OneWay=vertic(keep=&var frequency wgtfreq percent LowerCL UpperCL
                               rename=(frequency=n wgtfreq=wgt_n));
      strata strata;
      cluster cluster;
      weight final_wgt;
    run;

The data set vertic has all components (Sample, Weighted sample, Weighted % and its 95% CI) of the estimates for all categories of the variable.

Finally, to calculate the unweighted percent for each category of the variable, we use PROC FREQ (unfortunately, PROC SURVEYFREQ does not calculate the unweighted percent), as follows:

    PROC FREQ data=f;
      tables &var / noprint out=unw(keep=&var percent rename=(percent=unw_pct));
    run;

The data set unw has the remaining component (Sample %) of the estimates for all categories of the variable.

3. The third macro (%MULTY) calculates the column values for each category of those survey questions for which respondents may select more than one response (multiple response items, such as insurance). As a rule, a multiple response item in the SAS data set consists of several variables that represent the individual response options. For insurance (shown in the table shell) the variables are I1-I5. Each variable is coded as selected (1) or not selected (0). In contrast to single response items, where one macro call processes all categories of the variable, the %MULTY macro calculates the values of the columns for each variable (I1-I5) separately. The macro call looks like the following:

    %MULTY (var, text);

where var is a variable representing a response option (for example, I1) and text is the name we would like to assign to this variable in the leftmost column of the table shell (for example, Private medical insurance for I1).
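The paper does not list the body of %MULTY. The following is a minimal sketch of what such a macro might look like, built from the same PROC steps described for %SINGLE above. It is not the authors' code: the macro name multy_sketch, the final merge, the output name r_&var, and the assumption that TX = 1 marks treated respondents are all hypothetical. The columns RowPercent, RowLowerCL, and RowUpperCL are the names PROC SURVEYFREQ is expected to write to the CrossTabs table when ROW and CL are requested; checking them with PROC CONTENTS before relying on them is advisable.

```sas
/* Hypothetical sketch, not the authors' macro.
   Assumes: &var is a 0/1 indicator, tx = 1 means treated,
   and the input data set f carries strata, cluster and final_wgt. */
%macro multy_sketch(var, text);

  /* Weighted percent treated (row percent) with 95% CI */
  proc surveyfreq data=f nosummary;
    tables &var*tx / cl row nostd;
    ods output CrossTabs=goriz;
    strata strata;
    cluster cluster;
    weight final_wgt;
  run;

  /* Sample, weighted sample, weighted column percent with 95% CI */
  proc surveyfreq data=f nosummary;
    tables &var / cl nostd;
    ods output OneWay=vertic(keep=&var frequency wgtfreq percent LowerCL UpperCL
                             rename=(frequency=n wgtfreq=wgt_n));
    strata strata;
    cluster cluster;
    weight final_wgt;
  run;

  /* Unweighted column percent */
  proc freq data=f;
    tables &var / noprint out=unw(keep=&var percent rename=(percent=unw_pct));
  run;

  /* Keep only the "selected" level (1) and, for the row percent,
     only the treated cell; each input then has exactly one row. */
  data r_&var;
    length characteristic $100;
    merge vertic(where=(&var = 1))
          unw   (where=(&var = 1))
          goriz (keep=&var tx RowPercent RowLowerCL RowUpperCL
                 where=(&var = 1 and tx = 1));
    by &var;
    characteristic = "&text";   /* row label for the leftmost column */
    drop tx;
  run;

%mend multy_sketch;
```

Filtering on &var = 1 also drops the total rows that PROC SURVEYFREQ adds to its ODS tables, since the analysis variable is missing on those rows.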
For the insurance multiple response item in the above table shell, the macro calls look like the following:

    %MULTY (I1, %NRBQUOTE (Private medical insurance));
    %MULTY (I2, %NRBQUOTE (Medicare));
    %MULTY (I3, %NRBQUOTE (Medicaid));
    %MULTY (I4, %NRBQUOTE (Other public insurance));
    %MULTY (I5, %NRBQUOTE (No Health Insurance));

Each macro call ultimately creates a data set, named after the variable it processes with the prefix r_, that contains all of the numbers needed to fill the table shell. To combine those data sets for printing we use the following data step:

    data combined_multy;
      set r_i1-r_i5;
    run;

Unlike in the macro %SINGLE, however, the user must assign the title of the multiple response item (Insurance in our case) to the variables representing the response options. This can be done by creating a dummy data set as follows:

    data dummy;
      length characteristic $100;
      characteristic='Insurance';
      output;
    run;

After the title is assigned, the data set containing all values for each variable that represents an individual response option of the multiple response item is created:

    data combined_multy;
      set dummy combined_multy;
    run;

At the core of the macro %MULTY are essentially the same PROC SURVEYFREQ and PROC FREQ steps as described above; however, the user should remember that, unlike %SINGLE, %MULTY works only with dichotomized variables (with values 1 and 0), and only level 1 (selected) is the object of the estimate.

RESULTS

Finally, the user combines the data sets total, combined_single, and combined_multy and then prints the resulting data set in the format of the table shell. The resulting table for the example described is presented below.

Demographic characteristic          Sample   Sample %   Weighted sample   Weighted % (95% CI)   Percent treated (95% CI)
Total                                 3000      100          10464000     100                   49.9 (47.1, 52.6)
Age
  18-24                                320      10.7          3625000     34.6 (31.7, 37.6)     48.2 (42.3, 54.0)
  25-35                                601      20            2893000     27.6 (25.3, 30.0)     51.3 (46.4, 56.2)
  36-55                               1205      40.2          2253000     21.5 (19.8, 23.2)     50.4 (46.6, 54.1)
  56-65                                284      9.5           1358000     13.0 (11.2, 14.8)     49.9 (42.6, 57.2)
  66+                                  590      19.7           335000     3.2 (2.9, 3.5)        52.2 (48.2, 56.2)
Gender
  Male                                1487      49.6          5162485     49.3 (46.6, 52.1)     50.0 (46.1, 53.9)
  Female                              1513      50.4          5301515     50.7 (47.9, 53.4)     49.7 (45.9, 53.5)
Race/Ethnicity
  Non-Hispanic white                  2082      69.4          5311387     50.8 (48.0, 53.5)     51.1 (47.8, 54.3)
  Non-Hispanic black                   314      10.5          2476003     23.7 (21.0, 26.4)     48.4 (41.5, 55.3)
  Hispanic                             453      15.1          2234485     21.4 (19.0, 23.7)     48.0 (41.6, 54.3)
  Non-Hispanic other                   151      5              442125     4.2 (3.2, 5.2)        52.7 (40.8, 64.6)
Education
  High school graduate or less         899      30            3194838     30.5 (28.0, 33.1)     49.7 (44.7, 54.8)
  Some college or Associate degree     602      20.1          2073367     19.8 (17.6, 22.0)     53.2 (47.1, 59.4)
  Bachelor's degree                    891      29.7          3105997     29.7 (27.2, 32.2)     45.4 (40.5, 50.4)
  Master's degree or above             608      20.3          2089798     20.0 (17.8, 22.1)     53.3 (47.3, 59.3)
Insurance*
  Private medical insurance           2090      69.7          7468136     71.4 (68.9, 73.8)     49.9 (46.6, 53.2)
  Medicare                             927      30.9          3203223     30.6 (28.1, 33.1)     50.0 (45.2, 54.9)
  Medicaid                             589      19.6          2090449     20.0 (17.8, 22.2)     50.7 (44.5, 56.9)
  Other public insurance               612      20.4          2141661     20.5 (18.2, 22.7)     50.0 (43.8, 56.1)
  No Health Insurance                  410      13.7          1391595     13.3 (11.5, 15.1)     51.1 (43.8, 58.4)

CI: confidence interval
*Respondents may select more than one type of insurance

FLEXIBILITY

How flexible is our table? Suppose we need to replace education with marital status and place marital status after insurance. We would write the following statements:

    %SINGLE (age, %NRBQUOTE (Age in years), agef);
    %SINGLE (sex, %NRBQUOTE (Gender), sexf);
    %SINGLE (race_ethn, %NRBQUOTE (Race/Ethnicity), racef);
    /* %SINGLE (education, %NRBQUOTE (Education), educationf); OLD LINE COMMENTED */
    %SINGLE (marital_status, %NRBQUOTE (Marital status), maritalf); /* NEW LINE */

and then construct the data set for printing like this:

    data forprint;
      set total age sex race_ethn combined_multy marital_status;
    run;

where combined_multy is the combined insurance data set created earlier. Could not be easier!

What if the format of the table is different? For example, a table might require parallel columns for two separate groups of survey respondents (males and females in the example below).

                                    Males      Females
Demographic characteristic          Sample     Sample
Total
Age
  18-24
  25-35
  36-55
  56-65
  66+
Race/Ethnicity
  Non-Hispanic white
  Non-Hispanic black
  Hispanic
  Non-Hispanic other
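For a two-group layout like this shell, the per-group results have to end up side by side, one row per category. A minimal sketch of that merge for the age variable, assuming per-group data sets named age_male and age_female that are each sorted by age and carry the columns n, wgt_n, percent, LowerCL, and UpperCL (the _m and _f suffixed names are hypothetical):

```sas
/* Hypothetical sketch: put the male and female estimates for each
   age category on one row. Assumes both inputs are sorted by age. */
data age;
  merge age_male  (rename=(n=n_m wgt_n=wgt_n_m percent=pct_m
                           LowerCL=lcl_m UpperCL=ucl_m))
        age_female(rename=(n=n_f wgt_n=wgt_n_f percent=pct_f
                           LowerCL=lcl_f UpperCL=ucl_f));
  by age;
run;
```

Any other column that appears under the same name in both inputs would need the same renaming, because MERGE keeps only the value read last for a shared variable name.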

No worry! Using the variable that indicates the group of survey respondents (in this example, Gender), apply the macros presented above (%TOTAL, %SINGLE, and %MULTY, as needed) to the first group (Males), renaming the macros' output data sets with a marker for the group (e.g., age_male). Then apply the same macros to the second group (Females). Before combining the resulting data sets using the data combined_single step outlined above, merge the two data sets for each individual variable (in the above example, there will be two data sets each for age and race_ethn). After merging, the data sets that now contain output from the two separate groups of survey respondents can be combined using the data combined_single step outlined above. For the table shell presented above, do not forget to drop the unweighted percent. As needed, apply the other macros and combine all data sets to print in the format of the table shell. To print this kind of table, one can use PROC REPORT rather than PROC PRINT. Done!

DISCLAIMER

All the numbers in the table are based on randomly generated data and, therefore, have nothing in common with any survey data we have dealt with in our work.

CONTACT INFORMATION

David Izrael
Abt Associates Inc.
617.349.2434
david_izrael@abtassoc.com