Taming the PROC TRANSPOSE




Matt Taylor, Carolina Analytical Consulting, LLC

ABSTRACT

PROC TRANSPOSE is often misunderstood and seldom used. SAS users are unsure of the results it will give and puzzled by syntax that is not particularly intuitive. Many programmers resort to the DATA step to transpose data because they believe it gives them better control. This paper is intended to demystify the procedure by explaining the syntax and providing useful examples of how to use it.

INTRODUCTION

PROC TRANSPOSE is a part of the SAS language that does not get used as much as it should. It is very helpful when you need to shift data from rows to columns or vice versa. The same result achieved in a DATA step can be much more cumbersome to code. Once it is properly explained, PROC TRANSPOSE can save time and reduce complexity.

TRANSPOSE SYNTAX:

The syntax for PROC TRANSPOSE is somewhat misunderstood amongst SAS users. It is a little unorthodox when compared with other procedures that analysts use. This section details the statements and options of PROC TRANSPOSE.

BY
This statement allows you to transpose data within each combination of the BY variables; the BY variables themselves aren't transposed. The source data must be sorted by the BY variables (typically with a PROC SORT) for them to be processed properly, unless the NOTSORTED option is used. The DESCENDING option indicates that the data are sorted in descending order.

COPY
This statement copies the value of a variable from the source data set to the data set that results from the procedure. Because of this, the number of records in your output data set will be the same as in your input data set. Missing data will show up for your duplicate records.

ID
This statement identifies the variable whose values create the names of the transposed columns. If the variable in the ID statement is numeric, an underscore will be put at the beginning of the variable name, in keeping with SAS variable naming conventions. Without an ID variable, the default names will be COL1, COL2, etc.

IDLABEL
This statement labels the variable being transposed. In order for this statement to work properly, it must follow the ID statement.

VAR
This statement lists the actual data that needs to be transposed. If you do not include a VAR statement, the procedure will transpose all numeric variables that are not included in a BY statement or an ID statement. If you want to transpose a character variable, a VAR statement is required.

DATA=
This option specifies the input data set.

LABEL=
This option allows you to choose a name for the automatic variable _LABEL_ created by the procedure. In many cases this variable is dropped from the final results.

NAME=
This option allows you to choose a name for the automatic variable _NAME_ created by the procedure. If you are working on a more complex problem that involves multiple variables in the VAR statement, this additional variable becomes important for identifying which variable is represented in each row of the results. Otherwise it can generally be dropped.

OUT=
This option creates a new data set for your results. If you do not specify an output data set in the code, the results will be put into a default data set called DATA1.

PREFIX=
This option adds a string to the beginning of the name of each transposed variable. In the default case, the prefix is COL, as mentioned in the ID description. This option can be used in conjunction with the ID variable or with the default names.

EXAMPLE 1: SIMPLE TRANSPOSE

We start with a simple example of the transpose procedure. The data we are beginning with looks like the results of a typical PROC FREQ. It includes a state, a population flag (popflag), and a count for each combination. The first rows look like this:

State   Popflag   COUNT
DC      Pop2          6
DC      Pop3          2
DC      Pop4          3
DE      Pop2          6
DE      Pop3          5
DE      Pop4          6
FL      Pop2          8
FL      Pop3          6
FL      Pop4          6
GA      Pop2          8

Our desired output is to transpose the value of count for each state. We would also like the columns to be titled with the values of popflag so that the data is clearly labeled. The following code illustrates how this would be done:

    proc sort data=tr1;
       by state;
    run;

    proc transpose data=tr1 out=tr2;
       by state;
       id popflag;
       var count;
    run;

The variable we want to transpose is count, so it goes into the VAR statement. The titles for the transposed columns come from the popflag field, which goes in the ID statement. Because we would like the transpose to occur for each value of state, state goes into the BY statement. The results of this code look like the following:

Obs   State   _NAME_   _LABEL_           Pop2   Pop3   Pop4   Pop1
1     DC      COUNT    Frequency Count      6      2      3      .
2     DE      COUNT    Frequency Count      6      5      6      .
3     FL      COUNT    Frequency Count      8      6      6      .
4     GA      COUNT    Frequency Count      8      8      4      .
5     NC      COUNT    Frequency Count     16      9      2     22
6     PA      COUNT    Frequency Count      2      .      .      .
7     SC      COUNT    Frequency Count     12     14      3      .
8     VA      COUNT    Frequency Count     25     12      5     10

Note that the default variables _NAME_ and _LABEL_ were created by the procedure to indicate which variable was transposed. If your ID variable were numeric, SAS would automatically put an underscore in front of each value to conform to SAS rules on naming conventions.

EXAMPLE 2: A MORE COMPLEX TRANSPOSE

Our beginning data for this example has another dimension. It includes the state, the popflag, a count of accounts, and the sum of the balances on those accounts. The first rows look like this:

State   Popflag   count   balance
DC      Pop2          6     23861
DC      Pop3          2      3544
DC      Pop4          3     15485
DE      Pop2          6     24388
DE      Pop3          5     19154
DE      Pop4          6     26540
FL      Pop2          8     42289
FL      Pop3          6     33745
FL      Pop4          6     32695

The desired output keeps the state value on the left. We wish to transpose all of the numeric variables in the data set, those being count and balance. The ID variable will be the popflag variable in the data set. In this example we will use the _NAME_ variable to keep the transposed variables straight. The code would look like this:

    proc sort data=test1;
       by state;
    run;

    proc transpose data=test1 out=test2(drop=_label_) name=metrics;
       by state;
       id popflag;
       var count balance;
    run;

As with the previous example, the transposed variables are listed in the VAR statement. In this example we use the NAME= option to title the name column, and we drop the _LABEL_ field. The resulting data looks like this:

Obs   State   metrics    Pop2    Pop3    Pop4   Pop1
1     DC      count         6       2       3      .
2     DC      balance   23861    3544   15485      .
3     DE      count         6       5       6      .
4     DE      balance   24388   19154   26540      .
5     FL      count         8       6       6      .
6     FL      balance   42289   33745   32695      .
7     GA      count         8       8       4      .
8     GA      balance   54868   59833   25787      .

Note that the name variable is now titled metrics, while the _LABEL_ field has been deleted. The popflag values now run across as columns, and the transposed variables run vertically as rows.
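The syntax section notes that a numeric ID variable produces column names that begin with an underscore, and that PREFIX= replaces the default naming, but neither example shows this. The following is a minimal sketch of both behaviors. The data set monthly and its numeric variable month are hypothetical and are not part of the paper's data; the sketch also assumes the default VALIDVARNAME=V7 setting.

```sas
/* Hypothetical data: month is a numeric ID variable (not from the paper) */
data monthly;
   input state $ month count;
   datalines;
DC 1 6
DC 2 2
DC 3 3
DE 1 6
DE 2 5
DE 3 6
;
run;

proc sort data=monthly;
   by state;
run;

/* Numeric ID values cannot start a variable name, so SAS     */
/* prepends an underscore: the new columns are _1, _2 and _3. */
proc transpose data=monthly out=wide_default;
   by state;
   id month;
   var count;
run;

/* With PREFIX=, the prefix is placed in front of each ID     */
/* value instead: the new columns are month1, month2, month3. */
proc transpose data=monthly out=wide_prefixed prefix=month;
   by state;
   id month;
   var count;
run;

proc print data=wide_prefixed;
run;
```

Because monthly is built with an INPUT statement, its variables carry no labels, so no _LABEL_ variable is created here and there is nothing to drop.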

EXAMPLE 3: THE DOUBLE TRANSPOSE

In theory, transposing the same data twice should return you to exactly the same data. However, there are a few quirks to the procedure that a programmer can use to advantage. For this example, our starting data has a count for each combination of state and popflag:

Obs   State   Popflag   COUNT
1     GA      Pop2          8
2     GA      Pop3          8
3     GA      Pop4          4
4     NC      Pop1         22
5     NC      Pop2         16
6     NC      Pop3          9
7     NC      Pop4          2

One note about our beginning data is that not all values of popflag are represented in each state. On some occasions, you would like to report on all values in all states, whether they had any population or not. This is where the double transpose can come in handy.

The first transpose is similar to the previous examples. We are transposing the count variable by state, with popflag as the title of each column.

    proc sort data=tr1;
       by state;
    run;

    proc transpose data=tr1 out=tr2(drop=_label_);
       by state;
       id popflag;
       var count;
    run;

The resulting data has a placeholder for the popflags without data, which is the result we wanted. Also note that we kept the _NAME_ variable this time because we will need it in the second transpose.

Obs   State   _NAME_   Pop2   Pop3   Pop4   Pop1
1     GA      COUNT       8      8      4      .
2     NC      COUNT      16      9      2     22

The second transpose is an attempt to restore the data that we had originally. The syntax is designed to reverse the previous procedure. However, since the first transpose added placeholders for the missing popflag values, they will be kept in the resulting data.

    proc sort data=tr2;
       by state;
    run;

    proc transpose data=tr2 out=tr3(drop=_label_) name=popflag;
       by state;
       id _name_;
       var pop1 pop2 pop3 pop4;
    run;

The second transpose returns the data to its previous structure, but adds the rows that were missing from the original data. This gives the user a complete look at the data for all values of popflag and all states.
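The two transposes can be run end to end with the following self-contained sketch. It rebuilds the seven starting rows with a DATA step, which is an assumption made here for illustration: the paper's tr1 appears to come from PROC FREQ and carries a label on COUNT, while this version has no labels, so no _LABEL_ variable is created and the DROP= option is left out.

```sas
/* Starting rows typed in from the listing above.               */
/* Built by hand here, so the variables have no labels.         */
data tr1;
   input state $ popflag $ count;
   datalines;
GA Pop2 8
GA Pop3 8
GA Pop4 4
NC Pop1 22
NC Pop2 16
NC Pop3 9
NC Pop4 2
;
run;

proc sort data=tr1;
   by state;
run;

/* First transpose: one row per state, one column per popflag.  */
/* GA has no Pop1 row, so its Pop1 column is set to missing.    */
proc transpose data=tr1 out=tr2;
   by state;
   id popflag;
   var count;
run;

/* Second transpose: back to one row per state and popflag.     */
/* _NAME_ holds the value "count", which names the new column;  */
/* NAME= stores the old column names (Pop1-Pop4) in popflag.    */
proc transpose data=tr2 out=tr3 name=popflag;
   by state;
   id _name_;
   var pop1 pop2 pop3 pop4;
run;

proc print data=tr3;
run;
```

The order of the variables in the VAR statement of the second transpose determines the row order within each state, which is why Pop1 comes first in tr3 even though it was the last column created in tr2.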

Data set tr3 after the second transpose:

Obs   State   popflag   COUNT
1     GA      Pop1          .
2     GA      Pop2          8
3     GA      Pop3          8
4     GA      Pop4          4
5     NC      Pop1         22
6     NC      Pop2         16
7     NC      Pop3          9
8     NC      Pop4          2

CONCLUSIONS

PROC TRANSPOSE can be a very useful procedure for SAS users. Once you pick up the syntax, it can serve a useful purpose in your coding arsenal and can make your life easier when you need to shift data.

ACKNOWLEDGMENTS

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Matt Taylor
Carolina Analytical Consulting, LLC
8511 Davis Lake Parkway Ste # C6-285
Charlotte, NC 28269
704-947-8882
taylor_matthew@yahoo.com
www.cacanalytics.com