Simulate PRELOADFMT Option in PROC FREQ Ajay Gupta, PPD, Morrisville, NC




PharmaSUG 2015 - Paper QT33

ABSTRACT

In the pharmaceutical and CRO industries, table programming often starts when only partial data are available, and summaries commonly need to show every possible combination of values. If no data have been collected for some levels of these values, SAS procedures by default do not display those levels in a summary table. In procedures that support it, the PRELOADFMT option is a viable solution. PROC FREQ is a commonly used procedure for summarizing data and calculating statistics, but it does not support the PRELOADFMT option. However, using existing SAS procedures (e.g. PROC FORMAT, PROC DATASETS, and PROC FREQ), we can create a SAS macro that performs this task without much programming work.

INTRODUCTION

Table programming often starts when only partial data are available, and summaries commonly need to show every possible combination of values. If no data have been collected for some levels of these values, SAS procedures by default do not display those levels in a summary table. A viable solution is to add the PRELOADFMT option to these procedures. This option preloads every level defined in the formats of the variables of interest, so that levels absent from the data can still be reported. However, some procedures, such as PROC FREQ, which is widely used to summarize data and calculate statistics, do not have the PRELOADFMT option.

To simulate the PRELOADFMT option in PROC FREQ, a viable solution is to preprocess the data so that it contains all plausible levels of the values, and then execute PROC FREQ with the SPARSE option in the TABLES statement.

HANDLING MISSING VALUES USING PROC FREQ

This paper first demonstrates the issue by executing PROC FREQ on demographic data with missing levels. It then walks through the solution described above step by step.
CREATE DEMOGRAPHIC DATA AND FORMATS

For demonstration purposes, we use a demographic data set with the following scenarios:

- The data can have three treatment groups. In this example, Treatment 3 is not present in the data set.
- The data can have both male and female subjects. In this example, there are no male subjects in Treatment 2.
- There is one associated format each for gender and treatment.

    /* Create formats for treatment group and gender */
    PROC FORMAT;
      VALUE $trt "1"="Treatment 1"
                 "2"="Treatment 2"
                 "3"="Treatment 3";
      VALUE $sex "1"="Male"
                 "2"="Female";
    RUN;

    /* dm has data for only two treatments, and treatment 2 has no male subject */
    DATA dm;
      INPUT subject $ sex $ treatment $;
      CARDS;
    101 1 1
    102 2 1
    202 2 2
    203 1 1
    301 1 1
    303 2 2
    ;
    RUN;

    /* Apply formats to treatment and gender */
    DATA dm1;
      SET dm;
      FORMAT treatment $trt. sex $sex.;
      LABEL treatment="treatment" sex="sex";
    RUN;

Output 1 shows the resulting demographic data set. Note that there is no observation for Treatment 3, and no male subject is present for Treatment 2.

Output 1. Demographic data set

EXAMPLE 1: PROC FREQ WITH NO OPTION

Initially, to get the frequency counts by gender and treatment, we execute PROC FREQ on the demographic data with no option.

    /* Execute PROC FREQ to get the counts */
    PROC FREQ DATA=dm1;
      TABLES treatment*sex / OUT=count(DROP=percent);
    RUN;

Output 2 shows the output data set. Note that the following frequency counts are missing:

- For Treatment 2, the frequency count for male subjects is missing.
- For Treatment 3, the frequency counts for both male and female subjects are missing.

Output 2. Count data set when no option is used in PROC FREQ
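For reference, the behavior being simulated can be seen in a procedure that does support PRELOADFMT. The following is a minimal sketch and not part of the paper's own code. It assumes the dm1 data set and the $trt and $sex formats created above; the output data set name target is hypothetical. With COMPLETETYPES on the PROC statement and PRELOADFMT on the CLASS statement, PROC SUMMARY should write one row per treatment-by-sex combination defined in the formats, with a count of 0 for combinations that never occur in the data.

```sas
/* Sketch: the target behavior in a procedure that supports PRELOADFMT.  */
/* "target" is a hypothetical output data set name.                      */
PROC SUMMARY DATA=dm1 COMPLETETYPES NWAY;
  /* PRELOADFMT loads every level defined in $trt. and $sex. */
  CLASS treatment sex / PRELOADFMT;
  /* _FREQ_ holds the number of observations in each cell */
  OUTPUT OUT=target(DROP=_type_ RENAME=(_freq_=count));
RUN;
```

The rest of the paper reproduces this kind of result with PROC FREQ, which has no equivalent CLASS option.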

EXAMPLE 2: PROC FREQ WITH SPARSE OPTION

To try to get all of the missing frequency counts, let's execute PROC FREQ on the demographic data with only the SPARSE option. The SPARSE option reports all possible combinations of the variable values, even if a combination does not occur in the data. When you use the SPARSE and OUT= options together, PROC FREQ includes empty cross-tabulation table cells in the output data set. By default, PROC FREQ does not include zero-frequency table cells in the output data set.

    /* Execute PROC FREQ with the SPARSE option to get the counts */
    PROC FREQ DATA=dm1;
      TABLES treatment*sex / SPARSE OUT=count(DROP=percent);
    RUN;

Output 3 shows the output data set. Note the following changes in the frequency count data set:

- For Treatment 2, the frequency count for male subjects is now present (the highlighted row in Output 3).
- For Treatment 3, the frequency counts for both male and female subjects are still missing. SPARSE can only cross levels that occur somewhere in the data, and Treatment 3 never occurs.

Output 3. Count data set when only SPARSE option is used in PROC FREQ

EXAMPLE 3: PROC FREQ WITH SPARSE AND PRELOADFMT (SIMULATE) OPTION

To get all of the missing frequency counts, PROC FREQ would need options similar to COMPLETETYPES and PRELOADFMT, which are available in the MEANS, SUMMARY, and REPORT procedures. PROC FREQ has the SPARSE option, which is similar to COMPLETETYPES, but it has no option similar to PRELOADFMT. The best solution here is therefore to prepare the data in a way that mimics the PRELOADFMT process and then execute PROC FREQ.

The steps to produce the effect of the PRELOADFMT option on the demographic data are:

1. Create a SAS data set using PROC FORMAT with the CNTLOUT= option, selecting only the treatment format. This data set contains all possible treatment levels.
2. Merge this data set with the demographic data set. The new data set now contains all possible treatment levels.
3. Execute PROC FREQ with the SPARSE option to get all possible frequency counts.
The SAS code below shows the details:

    /* Get all possible treatments from the format and rename START to treatment */
    PROC FORMAT CNTLOUT=trt(KEEP=start RENAME=(start=treatment));
      SELECT $TRT;
    RUN;

    /* Merge all possible treatment levels with dm */
    PROC SORT DATA=trt;
      BY treatment;
    RUN;

    PROC SORT DATA=dm;
      BY treatment;
    RUN;

    DATA dm1;
      MERGE trt dm;
      BY treatment;
      FORMAT treatment $trt. sex $sex.;
      LABEL treatment="treatment" sex="sex";
    RUN;

    /* WHERE= drops the cells for the blank sex level contributed by format-only rows */
    PROC FREQ DATA=dm1 NOPRINT;
      TABLES treatment*sex / SPARSE OUT=count(WHERE=(sex~=" ") DROP=percent);
    RUN;

Output 4 shows the output data set. Note the following changes in the frequency count data set:

- For Treatment 2, the frequency count for male subjects is now present (highlighted in Output 4).
- For Treatment 3, the frequency counts for both male and female subjects are now present (highlighted in Output 4).

Output 4. Count data set when SPARSE and PRELOADFMT (Simulate) options are used in PROC FREQ

In summary, with the above method the results look as if the PRELOADFMT option had been used in PROC FREQ. If levels are missing for more than one category, each category must be merged separately from its format data set with the original data to obtain the required missing frequency counts. Also, a later change to a category, e.g. an additional treatment group added at a later stage of the study, does not require any change in the code, because the levels are read from the format.

CONCLUSION

The above solution demonstrates an innovative approach to simulating the PRELOADFMT option in PROC FREQ. It requires a minimal amount of coding and avoids hard-coding zero counts for missing categories.

REFERENCES

SAS Knowledge Base: http://support.sas.com/resources/

Surampalli, Naren (2012), The power of using options COMPLETETYPES and PRELOADFMT, Proceedings of the PharmaSUG 2012 Conference, http://www.pharmasug.org/proceedings/2012/cc/pharmasug-2012-cc26.pdf
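As a supplementary note on the summary's remark about several categories with unobserved levels: the sketch below is one possible way to cover both treatment and sex at once, and it is not the paper's method. Instead of merging each format separately, it builds the full grid of format levels with PROC SQL, appends it to the data with a weight of 0, and counts with the WEIGHT statement's ZEROS option. The data set names fmts, grid, dm_all, and count_all and the variable w are hypothetical, and the sketch assumes the dm data set and the $trt and $sex formats defined earlier.

```sas
/* Sketch only: names fmts, grid, dm_all, count_all, and w are hypothetical. */

/* Export the levels of both formats; FMTNAME identifies the format */
PROC FORMAT CNTLOUT=fmts(KEEP=fmtname start);
  SELECT $trt $sex;
RUN;

/* Cross the two level lists to get every treatment-by-sex combination */
PROC SQL;
  CREATE TABLE grid AS
  SELECT STRIP(t.start) AS treatment LENGTH=8,
         STRIP(s.start) AS sex LENGTH=8
  FROM fmts(WHERE=(fmtname="TRT")) AS t,
       fmts(WHERE=(fmtname="SEX")) AS s;
QUIT;

/* Real observations get weight 1, grid rows get weight 0 */
DATA dm_all;
  SET dm(IN=observed) grid;
  w = observed;
  FORMAT treatment $trt. sex $sex.;
RUN;

/* ZEROS keeps the zero-weight rows, so every combination gets a cell */
PROC FREQ DATA=dm_all NOPRINT;
  TABLES treatment*sex / OUT=count_all(DROP=percent);
  WEIGHT w / ZEROS;
RUN;
```

Because the grid rows carry zero weight, they should not inflate the counts of combinations that do occur, and no WHERE= filter on blank levels is needed. Whether this is preferable to the per-format merge depends on how many categories are involved.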

ACKNOWLEDGMENTS

Thanks to Lindsay Dean, Daniela Popa, Ken Borowiak, David Gray, Richard DAmato, Lynn Clipstone, Thomas Fritchey, and PPD Management for their reviews and comments. Thanks to my family for their support.

DISCLAIMER

The contents of this paper are the work of the author and do not necessarily represent the opinions, recommendations, or practices of PPD.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Ajay Gupta, M.S
PPD
3900 Paramount Parkway
Morrisville, NC-27560
Phone: (919) -456-6461
Email: Ajay.gupta@ppdi.com
       Ajaykailasgupta@aol.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.