Oh No, a Zero Row: 5 Ways to Summarize Absolutely Nothing



Similar documents
Simulate PRELOADFMT Option in PROC FREQ Ajay Gupta, PPD, Morrisville, NC

Instant Interactive SAS Log Window Analyzer

Using the Magical Keyword "INTO:" in PROC SQL

Counting the Ways to Count in SAS. Imelda C. Go, South Carolina Department of Education, Columbia, SC

Programming Tricks For Reducing Storage And Work Space Curtis A. Smith, Defense Contract Audit Agency, La Mirada, CA.

SAS Views The Best of Both Worlds

ing Automated Notification of Errors in a Batch SAS Program Julie Kilburn, City of Hope, Duarte, CA Rebecca Ottesen, City of Hope, Duarte, CA

USING PROCEDURES TO CREATE SAS DATA SETS... ILLUSTRATED WITH AGE ADJUSTING OF DEATH RATES 1

Preserving Line Breaks When Exporting to Excel Nelson Lee, Genentech, South San Francisco, CA

Effective Use of SQL in SAS Programming

The Essentials of Finding the Distinct, Unique, and Duplicate Values in Your Data

Post Processing Macro in Clinical Data Reporting Niraj J. Pandya

From Database to your Desktop: How to almost completely automate reports in SAS, with the power of Proc SQL

Guido s Guide to PROC FREQ A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY

PharmaSUG 2014 Paper CC23. Need to Review or Deliver Outputs on a Rolling Basis? Just Apply the Filter! Tom Santopoli, Accenture, Berwyn, PA

Labels, Labels, and More Labels Stephanie R. Thompson, Rochester Institute of Technology, Rochester, NY

PharmaSUG Paper QT26

PROC MEANS: More than just your average procedure

PharmaSUG Paper CC30

A Macro to Create Data Definition Documents

Beyond the Simple SAS Merge. Vanessa L. Cox, MS 1,2, and Kimberly A. Wildes, DrPH, MA, LPC, NCC 3. Cancer Center, Houston, TX.

Christianna S. Williams, University of North Carolina at Chapel Hill, Chapel Hill, NC

Foundations & Fundamentals. A PROC SQL Primer. Matt Taylor, Carolina Analytical Consulting, LLC, Charlotte, NC

PROC SUMMARY Options Beyond the Basics Susmita Pattnaik, PPD Inc, Morrisville, NC

Anyone Can Learn PROC TABULATE

To complete this workbook, you will need the following file:

Can SAS Enterprise Guide do all of that, with no programming required? Yes, it can.

Fun with PROC SQL Darryl Putnam, CACI Inc., Stevensville MD

Creating Dynamic Reports Using Data Exchange to Excel

Different ways of calculating percentiles using SAS Arun Akkinapalli, ebay Inc, San Jose CA

Data Presentation. Paper Using SAS Macros to Create Automated Excel Reports Containing Tables, Charts and Graphs

Paper TU_09. Proc SQL Tips and Techniques - How to get the most out of your queries

Normalized EditChecks Automated Tracking (N.E.A.T.) A SAS solution to improve clinical data cleaning

Converting Electronic Medical Records Data into Practical Analysis Dataset

Introduction to proc glm

PharmaSUG Paper DG06

Let SAS Modify Your Excel File Nelson Lee, Genentech, South San Francisco, CA

SQL SUBQUERIES: Usage in Clinical Programming. Pavan Vemuri, PPD, Morrisville, NC

Combining SAS LIBNAME and VBA Macro to Import Excel file in an Intriguing, Efficient way Ajay Gupta, PPD Inc, Morrisville, NC

Taming the PROC TRANSPOSE

REx: An Automated System for Extracting Clinical Trial Data from Oracle to SAS

Building and Customizing a CDISC Compliance and Data Quality Application Wayne Zhong, Accretion Softworks, Chester Springs, PA

Histogram of Numeric Data Distribution from the UNIVARIATE Procedure

A Method for Cleaning Clinical Trial Analysis Data Sets

Imputing Missing Data using SAS

Paper Merges and Joins Timothy J Harrington, Trilogy Consulting Corporation

We begin by defining a few user-supplied parameters, to make the code transferable between various projects.

Web Reporting by Combining the Best of HTML and SAS

Using Pharmacovigilance Reporting System to Generate Ad-hoc Reports

Alternatives to Merging SAS Data Sets But Be Careful

Top Ten Reasons to Use PROC SQL

The SAS Data step/macro Interface

Paper PO06. Randomization in Clinical Trial Studies

Same Data Different Attributes: Cloning Issues with Data Sets Brian Varney, Experis Business Analytics, Portage, MI

Identifying Invalid Social Security Numbers

EXST SAS Lab Lab #4: Data input and dataset modifications

Essential Project Management Reports in Clinical Development Nalin Tikoo, BioMarin Pharmaceutical Inc., Novato, CA

Salary. Cumulative Frequency

Constructing a Table of Survey Data with Percent and Confidence Intervals in every Direction

Creating Word Tables using PROC REPORT and ODS RTF

Using DATA Step MERGE and PROC SQL JOIN to Combine SAS Datasets Dalia C. Kahane, Westat, Rockville, MD

Tales from the Help Desk 3: More Solutions for Simple SAS Mistakes Bruce Gilsen, Federal Reserve Board

Managing very large EXCEL files using the XLS engine John H. Adams, Boehringer Ingelheim Pharmaceutical, Inc., Ridgefield, CT

Managing Tables in Microsoft SQL Server using SAS

Base Conversion written by Cathy Saxton

Importing Excel File using Microsoft Access in SAS Ajay Gupta, PPD Inc, Morrisville, NC

The Query Builder: The Swiss Army Knife of SAS Enterprise Guide

Let the CAT Out of the Bag: String Concatenation in SAS 9 Joshua Horstman, Nested Loop Consulting, Indianapolis, IN

Statistics and Analysis. Quality Control: How to Analyze and Verify Financial Data

Instructions for Creating Silly Survey Database

Pharmaceutical Applications

SUGI 29 Statistics and Data Analysis

Storing and Using a List of Values in a Macro Variable

OS/390 SAS/MXG Computer Performance Reports in HTML Format

LOCF-Different Approaches, Same Results Using LAG Function, RETAIN Statement, and ARRAY Facility Iuliana Barbalau, ClinOps LLC. San Francisco, CA.

Report Customization Using PROC REPORT Procedure Shruthi Amruthnath, EPITEC, INC., Southfield, MI

How to Color Your Report? By Xuefeng Yu, Celgene Co., Summit, NJ

ODS for PRINT, REPORT and TABULATE

Paper PO12 Pharmaceutical Programming: From CRFs to Tables, Listings and Graphs, a process overview with real world examples ABSTRACT INTRODUCTION

Automated distribution of SAS results Jacques Pagé, Les Services Conseils HARDY, Quebec, Qc

ABSTRACT INTRODUCTION %CODE MACRO DEFINITION

Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Do It Yourself (DIY) Data: Creating a Searchable Data Set of Available Classrooms using SAS Enterprise BI Server

Release Management in Vasont

Transcription:

Paper CC22 Oh No, a Zero Row: 5 Ways to Summarize Absolutely Nothing Stacey D. Phillips, i3 Statprobe, San Diego, CA Gary Klein, i3 Statprobe, San Diego, CA ABSTRACT SAS is wonderful at summarizing our data, including creating frequency counts and percentages. However, sometimes, what isn t in the data is just as important as what is in the data. Unfortunately, it is not so easy to get SAS to summarize what isn t there, e.g., how can a PROC FREQ count data points that do not exist in the data? This paper will explore five different methods of creating summaries that will contain a count of zero. INTRODUCTION Frequently the role of the SAS programmer is to present summaries of information from various sources. For example, in the pharmaceutical industry, the programmer may have to summarize all of the demographics that appear on a case report form. However, when the data contains a small population or there is something obscure on the CRF which no subject in the data fulfills, the summarization of all the points on the CRF becomes difficult. In another example, a statistician may want to see all of the values on the CRF in a table even if no subject in the data reported that characteristic. In these cases we are interested in the fact that no one is actually in the data---or as we call it here, a zero row. The goal of this paper is to present five different examples of how to get SAS to summarize those zero rows for us, that is, summarize records that aren t there. INITIAL DATA For our examples, we will create a dataset with some categories missing that we will need to summarize later. We will count the number of subjects with the different combinations of brown, blonde, and red hair with brown, blue and hazel eyes. So we will begin by creating the following dataset and formats: proc format value $ hair "Br" = "" "Bl" = "Blonde" "Re" = "Red" value $ eyes "Br" = "" "Bl" = "Blue" "Ha" = "Hazel" quit data temp attrib col1 label = 'Hair' format = $hair. col2 label = 'Eyes' format = $eyes. col1 = "Br" col2 = "Br" output col1 = "Br" col2 = "Bl" output col1 = "Bl" col2 = "Br" output Below is a simple PROC PRINT of our data: Hair Blonde Eyes Blue It should be apparent that the combination of blonde hair with blue eyes is missing, as well as all different combinations with

red hair and hazel eyes. We will need to summarize every combination. TECHNIQUES TO SUMMARIZE MISSING DATA METHOD 1 PROC FREQ USING A DUMMY HARD-CODED DATASET title3 "Method 1: PROC FREQ using hard-coded dummy dataset" title4 "Benefits: No formats necessary." title5 "Limitations: Need to create a dummy dataset - susceptible to errors if any changes" proc freq data=temp noprint table col1 * col2 /out=way1a(drop=percent) data dummy do col1 = "Br", "Bl", "Re" do col2 = "Br", "Bl", "Ha" output end end proc sort data=way1a by col1 col2 proc sort data=dummy by col1 col2 data way1b merge way1a dummy by col1 col2 if count =. then count = 0 proc print data=way1b l noobs title3 Hair Eyes Frequency Count Hazel 0 Red 0 In this example, we use simple output statements to create a blank record for each possible combination of eye and hair color. After using PROC FREQ to create a dataset with the counts of actual data, we merge this dataset with the dummy dataset and have a complete set of frequencies for every possible combination of hair and eye color. The biggest advantage to this method is that it is simple and requires no formats. One large disadvantage, however, is that the programmer needs to be aware of all of the possible combinations before programming. It could become a maintenance nightmare if the possible values change. METHOD 2 PROC FREQ USING THE SPARSE OPTION

title3 "Method 2: PROC FREQ using sparse option" title4 "Benefits: Simple Code. No formats needed" title5 "Limitations: - Need at least one from each row" proc freq data=temp noprint table col1 * col2 /out=way2(drop=percent) sparse proc print data=way2 l noobs title3 Hair Eyes Frequency Count Again in this example we use PROC FREQ to create a dataset that includes the frequencies of the various combinations of hair and eye color. This time, however, we use the sparse option in SAS. If we use the TEMP dataset from the first example there are 2 options for hair color: blond and brown and 2 options for eye color: blue and brown. Using the sparse option in PROC FREQ, SAS outputs a record for every possible combination that could potentially occur in the data rather than just the combinations that do occur. For example, there is no record of blonde hair and blue eyes in the data, but SAS lists it as a possible combination because blond hair and blue eyes occur in other data points. The sparse option is convenient to use and allows for simpler code. A glaring limitation is that the sparse option will only summarize what it sees in the data. So although we know hazel eyes and red hair are options from the CRF, SAS does not know this and these two characteristics are left off of the frequency counts. METHOD 3 PROC FREQ USING AN AUTOMATED DUMMY DATASET title3 "Method 3: PROC FREQ with an automated dummy dataset" title4 "Benefits: Automates all values from formats - less chance for error" title5 "Limitations: Complicated Code, also need to know which format to apply" title6 "(Could find formats using proc contents or sashelp - but it could get overly complicated)" proc freq data=temp noprint table col1 * col2 /out=way3a(drop=percent) proc format cntlout=formats proc sql create table dummy as select a.start label="hair" format=$hair. as col1, b.start label="eyes" format=$eyes. as col2 from formats(where=(fmtname='hair')) as a, formats(where=(fmtname='eyes')) as b quit create table way3b as select a.col1, a.col2, coalesce(b.count, 0) as count from dummy as a left join way3a as b on a.col1 = b.col1 and a.col2 = b.col2 proc print data=way3b l noobs

title3 Hazel 0 Red 0 This example automatically uses PROC SQL to create a dummy dataset based on the values of formats specified by the programmer. Then, using a combination of PROC SQL and the coalesce function, SAS joins the dataset with the counts (created from the PROC FREQ) with the dummy dataset and fills in a count of zero where an actual count from the data does not exist. This method is great because it is automatic and based on the formats, but is problematic because the code is rather complicated. METHOD 4 PROC MEANS USING COMPLETETYPES OPTION title3 "Method 4: PROC MEANS using completetypes " title4 "Benefits: Simple code" title5 "Limitations: Need at least one from each row" proc means data=temp completetypes noprint nway class col1 col2 output out=way4(rename=(_freq_=count) drop=_type_) proc print data=way4 l noobs title3 In this example, we use a method that is very similar to using the sparse option in PROC FREQ but instead this time we use PROC MEANS. Using PROC MEANS on dataset TEMP and the completetypes option, we get an output dataset that includes all possible combinations that could potentially occur in the data in addition to combinations that do occur.. So, even though there is no record with blonde hair and blue eyes, the completetypes option includes this combination because blonde hair and blue eyes occur elsewhere in dataset TEMP. Like the sparse option in PROC FREQ, the completypes option is very simple to use. Similar again to sparse, there must be at least one occurrence of a value for completypes to summarize appropriately. METHOD 5 PROC MEANS USING COMPLETETYPES AND THE PRELOADFMT OPTION title3 "Method 5: PROC MEANS using completetypes and preloadfmt" title4 "Benefits: Automated to get all records from formats and do not necessarily need to know format if assigned" title5 "Limitations: Needs formats" proc means data=temp completetypes noprint nway class col1 col2 /PRELOADFMT output out=way5(rename=(_freq_=count) drop=_type_) proc print data=way5 l noobs

title3 Hazel 0 Red 0 In the original dataset TEMP, the formats for both hair and eye color were assigned to each variable using ATTRIB statements. The PRELOADFMT option in PROC MEANS uses these assigned formats to determine what the possible combinations of values could be. Advantages to this method include simplicity of use and the fact there is no requirement to have at least one occurrence of a value in the data. A disadvantage is that this method only works if you are using formats in combination with your data. You don t necessarily need to know what the format values are, but you do need to be sure that the formats are assigned to the variables you are trying to summarize. CONCLUSION When producing summary tables in the pharmaceutical industry, it is frequently important to summarize what is not there as well as what is there. In this paper we ve discussed five separate ways to accomplish this task and which method you choose depends on the complexity and characteristics of your data. Whichever method you choose, you should now be armed with the knowledge and the ability to summarize nothing! CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the authors at: Gary Klein i3 Statprobe 10052 Mesa Ridge Court Suite 200 San Diego, CA 92121 Work Phone: (858) 597-1000 ext. 2460 Email: gary.klein@i3statprobe.com Stacey D. Phillips i3 Statprobe 10052 Mesa Ridge Court Suite 200 San Diego, CA 92121 Email: stacey.phillips@i3statprobe.com SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies.