Preparing your data for analysis using SAS. Landon Sego, 24 April 2003. Department of Statistics, UW-Madison.
Assumptions: that you have used SAS at least a few times. It doesn't matter whether you run SAS in interactive mode (Windows) or in batch mode (Unix/Linux).
Interactive SAS for Windows
Editing SAS code in EMACS for Batch mode execution
Executing SAS in batch mode at the Linux prompt
Where we're going. Rarely does data come to you in a form that is ready to analyze. In the best case, all you need to do is clean your data and check it for consistency. In the worst case, extensive manipulation of the data is needed before you can analyze it. We want to get familiar with some tools in SAS used to check, clean, and manipulate data.
The dark toolbox. SAS is like a toolbox as big as a garage, with thousands of tools. For many SAS users, this toolbox is dark and there is a monster lurking inside. Let's turn on the light and meet a few of the tools available in SAS. No guarantees about the monster.
Published resources: lots are available. I've learned on my own, with the SAS documentation, and through Google searches.
SAS Online Documentation http://shelf.doit.wisc.edu/sasonlinedocs/onldoc.htm or http://shelf.doit.wisc.edu/sasonlinedocs/main.htm
What I use most often in the SAS Online Documentation: Base SAS Software (SAS Language Reference: Concepts; SAS Language Reference: Dictionary; SAS Macro Language Reference; SAS Procedures Guide) and SAS/STAT (SAS/STAT User's Guide).
SAS Language Reference: Dictionary
SAS Procedures Guide
SAS/STAT User's Guide
Conventions, terminology, and options
Conventions: SAS terminology in red; SAS code in blue.
Basic terminology. Data in SAS exists as a data set, with variables and observations. Variables are the columns; observations are the rows. There are two types of variables: character and numeric. Character variables can range in length from 1 to 32,767 (2^15 - 1) characters. Numeric variables are stored as floating-point numbers (8 bytes by default) and can hold virtually any value within the limitations of the computer.
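As a quick illustration, variable types and lengths can be declared explicitly with a length statement before the variables are first used. This is a minimal sketch; the data set and variable names are hypothetical:

```sas
* Hypothetical example: declaring a character and a numeric variable;
data example;
   length city $ 14   /* character, up to 14 characters */
          count 8;    /* numeric, stored in 8 bytes (the default) */
   city  = 'Albuquerque';
   count = 3;
run;
```

Without the length statement, SAS infers each variable's type and length from its first appearance, which is how unintended truncation often happens.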
My favorite options. On the first line of almost every SAS program I write, I include the following: options nodate nocenter nonumber ps=3000 ls=200 mprint mlogic symbolgen; The first five options control the format of the output (suppressing the date, centering, and page numbers, and setting the page and line sizes); mprint, mlogic, and symbolgen make macro code easier to debug.
Importing data into SAS
Importing data into SAS. Data can exist in many forms (text file, Excel spreadsheet, permanent SAS data set, etc.). Excel spreadsheets are probably the most common form. You can use DDE (dynamic data exchange; Windows version of SAS only), but for Excel files I like to use the CSV (comma-separated value) file format, which works on any platform.
Excel to CSV text file to SAS data set. The column of j's provides a buffer at the end of each line of text in the CSV file. If you are running SAS on a Linux or UNIX machine, you need to add the j's (or use the dos2unix command to convert the CSV file to the text formatting used by UNIX).
Save Excel spreadsheet in the CSV format
How the CSV file looks (when viewed with a text editor):

Location,Type,Length,j
Albuquerque,1,1.414,j
Albuquerque,1,2.000,j
Albuquerque,1,1.414,j
Albuquerque,1,2.236,j
Albuquerque,2,2.000,j
Albuquerque,2,2.236,j
Albuquerque,2,2.236,j
Albuquerque,2,2.236,j
Lexington,1,2.000,j
SAS code to import the CSV file:

data fake;
   infile 'c:\mydata\fake.csv' dsd firstobs=2;
   input location :$14. type length;
proc print;
run;

Note: if I were to use input location $ type length; it would truncate the location variable to 8 characters in length.
Results from the proc print:

Obs  location      type  length
 1   Albuquerque    1    1.414
 2   Albuquerque    1    2.000
 3   Albuquerque    1    2.236
 4   Albuquerque    2    2.000
 5   Albuquerque    2    2.236
 6   Albuquerque    2    2.236
 7   Lexington      1    2.000
 8   Lexington      1    2.000
 9   Lexington      1    1.414
10   Lexington      1    2.000
11   Lexington      2    1.732
12   Lexington      2    1.732
13   Lexington      2    2.236
14   Johannesburg   1    2.000
15   Johannesburg   1    2.236
16   Johannesburg   1    2.236
17   Johannesburg   2    2.000
18   Johannesburg   2    1.414
19   Johannesburg   2    2.000
Checking and summarizing data
Checking and summarizing data. This is IMPORTANT! Don't take for granted that there aren't mistakes in your data.

proc contents data=fake;
proc freq data=fake;
run;
Result from proc contents:

Data Set Name: WORK.FAKE                            Observations: 19
Member Type:   DATA                                 Variables: 3
Engine:        V8                                   Indexes: 0
Created:       11:46 Wednesday, April 16, 2003      Observation Length: 32
Last Modified: 11:46 Wednesday, April 16, 2003      Deleted Observations: 0
File Name: /tmp/sastmp_sego/sas_workb6b700000836_gstat201.stat.wisc.edu/fake.sas7bdat

-----Alphabetic List of Variables and Attributes-----
#  Variable  Type  Len  Pos
3  length    Num    8    8
1  location  Char  14   16
2  type      Num    8    0
Results from proc freq (The FREQ Procedure):

                                        Cumulative   Cumulative
location      Frequency   Percent      Frequency    Percent
Albuquerque       6        31.58            6         31.58
Johannesburg      6        31.58           12         63.16
Lexington         7        36.84           19        100.00

                                        Cumulative   Cumulative
type          Frequency   Percent      Frequency    Percent
1                10        52.63           10         52.63
2                 9        47.37           19        100.00

                                        Cumulative   Cumulative
length        Frequency   Percent      Frequency    Percent
1.414             3        15.79            3         15.79
1.732             2        10.53            5         26.32
2                 8        42.11           13         68.42
2.236             6        31.58           19        100.00
Selecting subsets of the data
Selecting observations (rows). You can select: a large contiguous group of observations; specific observation numbers; or observations meeting selection criteria (e.g., when the location is Lexington, or when the length is between 1 and 2).
Selecting a group of contiguous observations:

data smallfake;
   set fake (firstobs=10 obs=15);
proc print;

The data set options firstobs=10 and obs=15 select observations 10 through 15.

Obs  location      type  length
 1   Lexington      1    2.000
 2   Lexington      2    1.732
 3   Lexington      2    1.732
 4   Lexington      2    2.236
 5   Johannesburg   1    2.000
 6   Johannesburg   1    2.236
Selecting specific observation numbers:

data smallfake;
   set fake;
   if _n_ in (7,11,16);
proc print;

This selects observation numbers 7, 11, and 16.

Obs  location      type  length
 1   Lexington      1    2.000
 2   Lexington      2    1.732
 3   Johannesburg   1    2.236
Selection criteria: the where statement.

data smallfake;
   set fake;
   where location = 'Lexington';
or
   where location ne 'Lexington';
or
   where location in ('Lexington','Albuquerque');
or
   where (1 le length le 2);
or
   where (length > 2.3);
Selection criteria: the if statement.

data smallfake;
   set fake;
   if location in ('Lexington','Albuquerque');
or
   if location = 'Lexington' or location = 'Albuquerque';
or
   if location = 'Johannesburg' then delete;

These three if statements produce identical results.
Some comparison operators:

ne or ^=   not equal to
eq or =    equal to
ge or >=   greater than or equal to
gt or >    greater than
le or <=   less than or equal to
lt or <    less than
in         true if contained in a group
not in     true if not contained in a group
and or &   logical and
or or |    logical or
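Several of these operators can be combined in a single subsetting statement. A small sketch using the fake data set from earlier (the cutoff values here are arbitrary, chosen only for illustration):

```sas
* Keep observations where length is in [1.5, 2.2) or location is not Lexington;
data subset;
   set fake;
   if (length ge 1.5 and length lt 2.2) or location ne 'Lexington';
run;
```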
Selecting and managing variables
Selecting variables using keep. As a data set option on the output data set:

data smallfake (keep = location length);
   set fake;
   where type = 1;

(type is available for processing, but is not written to the smallfake data set)

As a data set option on the input data set:

data smallfake;
   set fake (keep = location length);

(type is not available for processing)
Selecting variables using drop. As a data set option on the output data set:

data smallfake (drop = type);
   set fake;
   where type = 1;

(type is available for processing, but is not written to the smallfake data set)

As a data set option on the input data set:

data smallfake;
   set fake (drop = type);

(type is not available for processing)
Renaming variables (location becomes place, type becomes trt):

data fake1;
   set fake (rename=(location=place type=trt));
   where trt = 1;

data fake2 (rename=(location=place type=trt));
   set fake;
   where type = 1;

data fake3 (drop = location type);
   set fake;
   where type = 1;
   place = location;
   trt = type;

These three pieces of code achieve the same result. Look closely at the where statements: when the rename is applied to the input data set (fake1), the new name trt must be used; when it is applied to the output data set (fake2), the old name type is used.
Concatenation
Concatenation (stacking). SAS can stack multiple data sets on top of one another. Pay attention to whether the variables and their attributes (length and variable type) match among the different data sets. You can use the set statement or proc append to concatenate data sets.
Using the set statement to concatenate data sets. Suppose you wanted to stack the three data sets fake, faux, and fraud on top of one another:

data fantastic;
   set fake faux fraud;
Using proc append to concatenate data sets. proc append concatenates only two data sets at a time, and typically these data sets must have the same variable names with the same attributes.

proc append base=fake data=faux;

Here the observations in faux are tacked onto the end of the fake data set. The combined data set is called fake.
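When the two data sets do not have identical variables, proc append will refuse to run unless you add the force option. A minimal sketch, assuming a hypothetical data set faux2 that lacks one of the variables in fake:

```sas
* force lets the append proceed; variables absent from faux2 are set to missing;
proc append base=fake data=faux2 force;
run;
```

Use force with care: variables in the data= data set that are not in the base data set are dropped, so check the log for warnings after the append.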
Splitting data into several data sets. Suppose we want all Albuquerque observations to go into a data set called albuq, the Lexington observations to go into the data set lexing, and observations that have lengths larger than 3.0 to go into the data set large.

data albuq lexing large;
   set fake;
   if location = 'Albuquerque' then output albuq;
   else if location = 'Lexington' then output lexing;
   if length gt 3 then output large;
Merging data
Merging (combining) data. Merging places two or more data sets side by side in a single data set. If you simply want to place two data sets side by side (1-to-1 merging):

data faux;
   set faux (rename=(location=location1 type=type1));
data fantastic;
   merge fake faux;
proc print data = fantastic;
Results of 1-to-1 merge:

Obs  location      type  length  location1     type1  weight
 1   Albuquerque    1    1.414   Lexington       2    4.456
 2   Albuquerque    1    2.000   Lexington       2    2.546
 3   Albuquerque    1    2.236   Lexington       2    3.789
 4   Albuquerque    2    2.000   Lexington       1    2.457
 5   Albuquerque    2    2.236   Lexington       1    4.456
 6   Albuquerque    2    2.236   Lexington       1    4.551
 7   Lexington      1    2.000   Lexington       1    3.246
 8   Lexington      1    2.000   Johannesburg    1    2.998
 9   Lexington      1    1.414   Johannesburg    1    2.721
10   Lexington      1    2.000   Johannesburg    1    2.115
11   Lexington      2    1.732   Johannesburg    2    3.489
12   Lexington      2    1.732   Johannesburg    2    3.447
13   Lexington      2    2.236   Johannesburg    2    4.122
14   Johannesburg   1    2.000   Albuquerque     1    2.118
15   Johannesburg   1    2.236   Albuquerque     1    2.871
16   Johannesburg   1    2.236   Albuquerque     1    2.516
17   Johannesburg   2    2.000   Albuquerque     2    3.445
18   Johannesburg   2    1.414   Albuquerque     2    2.998
19   Johannesburg   2    2.000   Albuquerque     2    3.549
Merging data with a by variable. All data sets that will be merged must first be sorted by the linking variables:

proc sort data = fake;
   by location type;
proc sort data = faux;
   by location type;
data fantastic;
   merge fake faux;
   by location type;
proc print data = fantastic;
Results of merging with by variables:

Obs  location      type  length  weight
 1   Albuquerque    1    1.414   2.118
 2   Albuquerque    1    2.000   2.871
 3   Albuquerque    1    2.236   2.516
 4   Albuquerque    2    2.000   3.445
 5   Albuquerque    2    2.236   2.998
 6   Albuquerque    2    2.236   3.549
 7   Johannesburg   1    2.000   2.998
 8   Johannesburg   1    2.236   2.721
 9   Johannesburg   1    2.236   2.115
10   Johannesburg   2    2.000   3.489
11   Johannesburg   2    1.414   3.447
12   Johannesburg   2    2.000   4.122
13   Lexington      1    2.000   2.457
14   Lexington      1    2.000   4.456
15   Lexington      1    1.414   4.551
16   Lexington      1    2.000   3.246
17   Lexington      2    1.732   4.456
18   Lexington      2    1.732   2.546
19   Lexington      2    2.236   3.789
More about merging. When you merge with a by statement, you may only want observations that have by-variable matches in both data sets. (Venn diagram: the fake and fraud data sets share some, but not all, by-variable combinations.)
Using the (in=) data set option. Assume both fake and fraud are sorted by location and type.

data fantastic;
   merge fake (in = tmp1) fraud (in = tmp2);
   by location type;
   from_fake = tmp1;
   from_fraud = tmp2;
proc print;
Identifying obs from both data sets:

Obs  location      type  length  thickness  from_fake  from_fraud
 1   Albuquerque    1    1.414     1.120        1          1
 2   Albuquerque    1    2.000     1.120        1          1
 3   Albuquerque    1    2.236     1.120        1          1
 4   Albuquerque    2    2.000      .           1          0
 5   Albuquerque    2    2.236      .           1          0
 6   Albuquerque    2    2.236      .           1          0
 7   Albuquerque    3     .        1.400        0          1
 8   Albuquerque    3     .        1.540        0          1
 9   Albuquerque    3     .        1.968        0          1
10   Johannesburg   1    2.000     1.320        1          1
11   Johannesburg   1    2.236     1.945        1          1
12   Johannesburg   1    2.236     1.945        1          1
13   Johannesburg   2    2.000     1.440        1          1
14   Johannesburg   2    1.414     1.440        1          1
15   Johannesburg   2    2.000     1.440        1          1
16   Johannesburg   3     .        1.521        0          1
17   Johannesburg   3     .        1.500        0          1
18   Lexington      1    2.000      .           1          0
19   Lexington      1    2.000      .           1          0
20   Lexington      1    1.414      .           1          0
21   Lexington      1    2.000      .           1          0
22   Lexington      2    1.732     1.160        1          1
23   Lexington      2    1.732     0.783        1          1
24   Lexington      2    2.236     0.783        1          1
25   Lexington      4     .        1.372        0          1
26   Lexington      4     .        1.980        0          1
Using the (in=) data set option. Now select observations that are common to both data sets:

data fantastic;
   merge fake (in = tmp1) fraud (in = tmp2);
   by location type;
   if tmp1=1 and tmp2=1;
proc print;
After selecting for observations in common:

Obs  location      type  length  thickness
 1   Albuquerque    1    1.414     1.120
 2   Albuquerque    1    2.000     1.120
 3   Albuquerque    1    2.236     1.120
 4   Johannesburg   1    2.000     1.320
 5   Johannesburg   1    2.236     1.945
 6   Johannesburg   1    2.236     1.945
 7   Johannesburg   2    2.000     1.440
 8   Johannesburg   2    1.414     1.440
 9   Johannesburg   2    2.000     1.440
10   Lexington      2    1.732     1.160
11   Lexington      2    1.732     0.783
12   Lexington      2    2.236     0.783
Merge mania. All observations from both data sets:

data fantastic;
   merge fake fraud;
   by location type;

Only observations that appear in fake:

data fantastic;
   merge fake (in=tmp1) fraud;
   by location type;
   if tmp1 = 1;

Only observations that appear in both data sets:

data fantastic;
   merge fake (in=tmp1) fraud (in=tmp2);
   by location type;
   if tmp1 = 1 and tmp2 = 1;
Creating new variables
Creating new variables.

data fantastic;
   merge fake faux;
   by location type;
   newcode = substr(location,1,1) || '-' || trim(left(type));
   growth_index = length + weight**2;
   if (growth_index gt 15) then large = '*';
   else large = ' ';

(Note the concatenation operator ||; applying left to the numeric variable type implicitly converts it to character.)
Results of new variables:

Obs  location      type  length  weight  newcode  growth_index  large
 1   Albuquerque    1    1.414   2.118    A-1        5.8999
 2   Albuquerque    1    2.000   2.871    A-1       10.2426
 3   Albuquerque    1    2.236   2.516    A-1        8.5663
 4   Albuquerque    2    2.000   3.445    A-2       13.8680
 5   Albuquerque    2    2.236   2.998    A-2       11.2240
 6   Albuquerque    2    2.236   3.549    A-2       14.8314
 7   Johannesburg   1    2.000   2.998    J-1       10.9880
 8   Johannesburg   1    2.236   2.721    J-1        9.6398
 9   Johannesburg   1    2.236   2.115    J-1        6.7092
10   Johannesburg   2    2.000   3.489    J-2       14.1731
11   Johannesburg   2    1.414   3.447    J-2       13.2958
12   Johannesburg   2    2.000   4.122    J-2       18.9909      *
13   Lexington      1    2.000   2.457    L-1        8.0368
14   Lexington      1    2.000   4.456    L-1       21.8559      *
15   Lexington      1    1.414   4.551    L-1       22.1256      *
16   Lexington      1    2.000   3.246    L-1       12.5365
17   Lexington      2    1.732   4.456    L-2       21.5879      *
18   Lexington      2    1.732   2.546    L-2        8.2141
19   Lexington      2    2.236   3.789    L-2       16.5925      *
Common functions used to manipulate text strings: compress, index, left, scan, substr, trim. Refer to the SAS Online Docs: Base SAS Software, SAS Language Reference: Dictionary, Dictionary of Language Elements, Functions and Call Routines.
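A brief sketch of what these functions do (the data set and the input string are hypothetical, invented only to illustrate each call):

```sas
* Hypothetical example of the common string functions;
data strings;
   s = '  New Mexico, USA';
   a = left(s);               * shift leading blanks to the right end;
   b = trim(left(s));         * remove leading and trailing blanks;
   c = compress(s);           * remove all blanks;
   d = scan(s, 2, ' ,');      * second word, using blank and comma as delimiters;
   e = substr(left(s), 1, 3); * first three characters;
   f = index(s, 'Mexico');    * starting position of a substring (0 if absent);
run;
```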
by-group processing
by-group processing. Suppose you wanted a subset of the data that contained the observation with the smallest length from each location.

proc sort data = fake;
   by location length;
data shortest;
   set fake;
   by location length;
   first = first.location;
   last = last.location;
Output from by-group processing:

Obs  location      type  length  first  last
 1   Albuquerque    1    1.414     1      0
 2   Albuquerque    1    2.000     0      0
 3   Albuquerque    2    2.000     0      0
 4   Albuquerque    1    2.236     0      0
 5   Albuquerque    2    2.236     0      0
 6   Albuquerque    2    2.236     0      1
 7   Johannesburg   2    1.414     1      0
 8   Johannesburg   1    2.000     0      0
 9   Johannesburg   2    2.000     0      0
10   Johannesburg   2    2.000     0      0
11   Johannesburg   1    2.236     0      0
12   Johannesburg   1    2.236     0      1
13   Lexington      1    1.414     1      0
14   Lexington      2    1.732     0      0
15   Lexington      2    1.732     0      0
16   Lexington      1    2.000     0      0
17   Lexington      1    2.000     0      0
18   Lexington      1    2.000     0      0
19   Lexington      2    2.236     0      1
by-group processing:

proc sort data = fake;
   by location length;
data shortest;
   set fake;
   by location length;
   if first.location = 1;

Obs  location      type  length
 1   Albuquerque    1    1.414
 2   Johannesburg   2    1.414
 3   Lexington      1    1.414
Basic macros
Basic macros. SAS macros allow you to easily program repetitive tasks. On the surface, creating a SAS macro is very similar to creating a function in R or S-Plus. The SAS macro facility is actually a text-generation tool: a macro writes SAS code, which is then compiled and executed.
Macro example:

* Name the macro, begin the macro definition, and identify the macro variables;
%macro analyze(dataset=,response=);

* Code to be generated by the macro;
proc mixed data = &dataset;
   class location type;
   model &response = location type;
   lsmeans location type;
   ods output lsmeans=model_means;
data model_means;
   set model_means;
   variable = "&response";

* Note the use of proc append;
proc append base=results data=model_means;

* End the macro definition;
%mend analyze;

* Call the macro;
%analyze(dataset=fake,response=length)
%analyze(dataset=faux,response=weight)

* Print the results;
proc print data = results;
Results from the macro code:

Obs  Effect    location      type  Estimate  StdErr   DF  tValue  Probt   variable
 1   location  Albuquerque    _    2.0203    0.1243   15  16.25   <.0001  length
 2   location  Johannesburg   _    1.9810    0.1243   15  15.94   <.0001  length
 3   location  Lexington      _    1.8728    0.1155   15  16.21   <.0001  length
 4   type                     1    1.9621    0.09678  15  20.27   <.0001  length
 5   type                     2    1.9540    0.1015   15  19.25   <.0001  length
 6   location  Albuquerque    _    2.9162    0.2844   15  10.25   <.0001  weight
 7   location  Johannesburg   _    3.1487    0.2844   15  11.07   <.0001  weight
 8   location  Lexington      _    3.6842    0.2643   15  13.94   <.0001  weight
 9   type                     1    2.9614    0.2214   15  13.37   <.0001  weight
10   type                     2    3.5379    0.2322   15  15.23   <.0001  weight
proc transpose
Rearranging data with proc transpose. Consider this output from proc mixed:

Obs  year  cultivar  Effect       trt  Estimate  StdErr    DF   tValue  Probt
 1   95    1         Intercept    _    1.5930    0.4477    38    3.56   0.0010
 2   95    1         trt          1    0.2987    0.7485    35    0.40   0.6923
 3   95    1         calevel      _   -0.00155   0.002401  35   -0.65   0.5229
 4   95    1         calevel*trt  1   -0.00010   0.003992  35   -0.02   0.9804
 5   95    2         Intercept    _    0.5331    0.2782    38    1.92   0.0628
 6   95    2         trt          1    0.3637    0.3501    36    1.04   0.3058
 7   95    2         calevel      _    0.002348  0.001570  36    1.49   0.1437
 8   95    2         calevel*trt  1   -0.00170   0.002173  36   -0.78   0.4388
13   96    1         Intercept    _    1.4129    0.3650    10    3.87   0.0031
14   96    1         trt          1    0.4670    0.4387    160   1.06   0.2887
15   96    1         calevel      _    0.003074  0.001146  160   2.68   0.0081
16   96    1         calevel*trt  1   -0.00127   0.001387  160  -0.92   0.3598
Results from proc transpose:

proc transpose data=solution out=tsolution;
   by year cultivar;
   var estimate;
   id effect;

Obs  year  cultivar  _NAME_    Intercept  trt       calevel   calevel_trt
 1   95    1         Estimate  1.5930     0.2987   -0.00155   -0.00010
 2   95    2         Estimate  0.5331     0.3637    0.002348  -0.00170
 3   95    3         Estimate  1.3996     0.08453  -0.00121    0.000326
 4   96    1         Estimate  1.4129     0.4670    0.003074  -0.00127
 5   96    2         Estimate  1.9705     0.1312    0.003656  -0.00073
 6   96    3         Estimate  3.2758    -0.1568   -0.00131    0.001427
 7   96    4         Estimate  2.6396     0.7034    0.001114  -0.00237
 8   97    1         Estimate  1.3943     0.2684   -0.00026   -0.00120
 9   97    2         Estimate  1.5388    -0.2840   -0.00183    0.002834
Parting words of advice
Attributes of SAS. SAS is read/write intensive: every time you create a data set, the data set is written to disk. Where does it get written? To the SAS Work Library, which is assigned to a directory somewhere; use proc contents to find out where. For CALS HP users and PC users, the SAS Work Library resides on the local machine.
Attributes of SAS. Users of the AFS system beware! (Stat department, CS department.) The SAS Work Library is assigned to your account in AFS, not to the local machine that is running SAS. Every data set you create therefore generates network traffic between your local computer running SAS and AFS, which records and reads your SAS data sets.
Assigning the SAS Work Library. To assign the SAS Work Library to a local directory (when running in batch mode on a Linux or Unix system):

$ sas mysascode.sas -work /scratch
Synthesis. Most of what we've covered today involves the data step. Many of the techniques shown in this presentation can be applied together in a single data step. Now that you know the names of some of the tools, use the Online Documentation!