SAS AGGREGATING MULTIPLE RECORDS INTO ONE WIDE RECORD

Similar documents
One problem > Multiple solutions; various ways of removing duplicates from dataset using SAS Jaya Dhillon, Louisiana State University


Beyond the Simple SAS Merge. Vanessa L. Cox, MS 1,2, and Kimberly A. Wildes, DrPH, MA, LPC, NCC 3. Cancer Center, Houston, TX.

ln(p/(1-p)) = α +β*age35plus, where p is the probability or odds of drinking

Programming Tricks For Reducing Storage And Work Space Curtis A. Smith, Defense Contract Audit Agency, La Mirada, CA.

Best Practice in SAS programs validation. A Case Study

The Essentials of Finding the Distinct, Unique, and Duplicate Values in Your Data

Quick Start to Data Analysis with SAS Table of Contents. Chapter 1 Introduction 1. Chapter 2 SAS Programming Concepts 7

Please use BLOCK LETTERS and place an X in the relevant boxes. Please answer all questions that apply to you.

Survey Analysis: Options for Missing Data

Counting the Ways to Count in SAS. Imelda C. Go, South Carolina Department of Education, Columbia, SC

(V1) Independent Student Standard Verification Group

Paper Merges and Joins Timothy J Harrington, Trilogy Consulting Corporation

Introduction to SAS Informats and Formats

SPSS Workbook 1 Data Entry : Questionnaire Data

containing Kendall correlations; and the OUTH = option will create a data set containing Hoeffding statistics.

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Paper Creating Variables: Traps and Pitfalls Olena Galligan, Clinops LLC, San Francisco, CA

Getting started with the Stata

Introduction to Criteria-based Deduplication of Records, continued SESUG 2012

EXST SAS Lab Lab #4: Data input and dataset modifications

PROC SUMMARY Options Beyond the Basics Susmita Pattnaik, PPD Inc, Morrisville, NC


ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS

Labels, Labels, and More Labels Stephanie R. Thompson, Rochester Institute of Technology, Rochester, NY

Running Descriptive Statistics: Sample and Population Values

Fuel Allowance under the National Fuel Scheme

B) Mean Function: This function returns the arithmetic mean (average) and ignores the missing value. E.G: Var=MEAN (var1, var2, var3 varn);

Student Quick Guide to MMU Online Graduation booking system 2015

: SAS-Institute-Systems A : Sas Base Programming Exam

HELP! - My MERGE Statement Has More Than One Data Set With Repeats of BY Values!

Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217

Health and Safety Benefit

Disablement Benefit and/or Incapacity Supplement under the Occupational Injuries Scheme

Text Analytics Illustrated with a Simple Data Set

Personal Financial Manager (PFM) FAQ s

How To Claim Death Benefits In The United States

Salary. Cumulative Frequency

4 Other useful features on the course web page. 5 Accessing SAS

Intro to Longitudinal Data: A Grad Student How-To Paper Elisa L. Priest 1,2, Ashley W. Collinsworth 1,3 1

Banner Web Time Entry User Guide. Students. Delaware State University 1 Banner Web Entry

SIMPLE LINEAR CORRELATION. r can range from -1 to 1, and is independent of units of measurement. Correlation can be done on two dependent variables.

White Earth Early Learning Scholarship Program Information about the program Household Size Gross income How to complete the application:

Everything you wanted to know about MERGE but were afraid to ask

Computer Analysis of Survey Data -- File Organization for Multi-Level Data. Chris Wolf Ag Econ Computer Service Michigan State University 1/3/90

Let SAS Modify Your Excel File Nelson Lee, Genentech, South San Francisco, CA

5. Crea+ng SAS Datasets from external files. GIORGIO RUSSOLILLO - Cours de prépara+on à la cer+fica+on SAS «Base Programming»

SAS Certified Base Programmer for SAS 9 A SAS Certification Questions and Answers with explanation

Household V1-Veri ication Worksheet McMurry University

College SAS Tools. Hope Opportunity Jobs

Setting your Themis notification preferences

Host Excellence. Client Helpdesk. Version 1.0

Paper PO06. Randomization in Clinical Trial Studies

Navigation Course. Introduction to Navigation in SAP Solutions and Products

Electronic Reporting of Drinking Water Quality Monitoring Information. User Manual for Drinking Water Operators. Web Form Data Submission

Independent Verification

ORBIS QuickGuide Copyright 2003 Bureau van Dijk Electronic Publishing ( Last updated July 2003

Document Services Customer Accounts Online

How Parents Use Single Sign On and New PowerSchool Features

Importing Data into SAS

Foundations & Fundamentals. A PROC SQL Primer. Matt Taylor, Carolina Analytical Consulting, LLC, Charlotte, NC

AN INTRODUCTION TO MACRO VARIABLES AND MACRO PROGRAMS Mike S. Zdeb, New York State Department of Health

Performing Queries Using PROC SQL (1)

Web Advisor Instructions Link to WebAdvisor through Sacred Heart Website by clicking on MYSHU Log In and Choose Faculty Point of Entry

Tax Preparation - Client Information Sheet

An automatic predictive datamining tool. Data Preparation Propensity to Buy v1.05

Introduction to proc glm

SNACS. Free and Reduced Price School Meal Application Guide

Consumer Guide for Annual Household Income Data Matching Issues

SDTM, ADaM and define.xml with OpenCDISC Matt Becker, PharmaNet/i3, Cary, NC

Guido s Guide to PROC FREQ A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY

Functional Test Plan Template

Health and Safety Benefit Forms

Fidelity Charitable Gift Fund Gender Differences in Charitable Giving 2009 Executive Summary

Deciding How Much Financial Assistance to Use to Lower Your Monthly Premiums

QAS Small Business for Salesforce CRM

TRAINING FOR VSO LESSON TEN DEPENDENCY WHO AND HOW?

GOD GAVE HIS CHILDREN A PATH THROUGH THE SEA (A.2.Spring.7)

VPN 3000 Concentrator Bandwidth Management Configuration Example

Categorical Variables

USING PROCEDURES TO CREATE SAS DATA SETS... ILLUSTRATED WITH AGE ADJUSTING OF DEATH RATES 1

Step-by-Step Guide to Creating a Hunting, Angling and Trapping Licence (HAL) Account and Purchasing a Licence

HIPAA Privacy and Security. Rochelle Steimel, HIPAA Privacy Official Judy Smith, Staff Development January 2012

Jefferson County Public Schools. FRYSC Tips and Tricks

Business Intelligence Getting Started Guide

Transcription:

SAS AGGREGATING MULTIPLE S INTO ONE WIDE In some cases raw data files have multiple records that all belong to one larger overall unit. In the example below, there is one record in the raw data file for each member of the larger overall unit: household. A common requirement is to combine these multiple records into one single wide record that contains all the information for the overall unit. In the example for this tip sheet the requirement is to create one wide record containing all the information for all the members of the household. In SAS, this operation is referred to as aggregation. In the example below: the SAS dataset households has information on the Household ID (variable: householdid), the Household member (variable: householdmember), the Household members gender (variable: gender), and the Household member s age (variable: age). Note there are multiple observations for a given Household in the raw data. This program illustrates how to create a new SAS dataset named: singlerecord, which has all the information for all the members of the household in one wide observation. In SAS terminology, the program aggregates the individual records for the Household into one wide record. Please refer to the Code Notes section below for an explanation of what the code is doing in those sections identified by a number in a red-filled circle. * Aggregation: combining multiple observations into one wide record in SAS */ data households ; input householdid householdmember :$30. gender :$1. age ; datalines ; 23914 head F 47 23914 child F 8 11654 child M 6 11654 spouse M 31 11654 head F 30 11654 child F 2 91203 head M 57 91203 spouse F 61 ; proc print data = households; title "Information on a sample of Canadian Households"; title2 "Source file has one record for each member of each household" ; run; Step 1: Create the SAS dataset: households Step 2: Print the dataset to illustrate it has multiple records for a household unit. 2/18/2009 9:50:00 AM Page 1

SAS AGGREGATING MULTIPLE S INTO ONE WIDE proc sort data = households ; by householdid ; Step 3: Sort the dataset by the overall unit identifier to group all records for a given household together. data singlerecord ; set households ; by householdid ; 1 2 2 /* if this is the first record for the household then initialize all variables for this household */ if first.householdid then do ; headgender = ' '; headage =. ; end; spousegender = ' ' ; spouseage =. ; numberofchildren = 0 ; if householdmember = 'head' then do ; headgender = gender ; headage = age ; end ; if householdmember = 'spouse' then do ; spousegender = gender ; spouseage = age ; end ; Step 4: Create the wide dataset singlerecord. (this step is continued on the next page). See the Code Notes section of this document for an explanation of what the code is doing in those areas identified with red-filled circles. 3 if householdmember = 'child' then do ; numberofchildren = numberofchildren + 1 ; end ; 2/18/2009 9:50:00 AM Page 2

SAS AGGREGATING MULTIPLE S INTO ONE WIDE 4 5 /* if the last record for this household has been processed, output the single wide record to the dataset */ if last.householdid then output ; retain numberofchildren headgender headage spousegender spouseage; keep householdid headgender headage spousegender spouseage numberofchildren ; Step 4 continued proc print data = singlerecord ; title "Information on a sample of Canadian Households"; title2 "One record for the entire household"; run; Step 5: Print the wide record dataset to confirm the aggregation worked as expected. proc means data = singlerecord ; title "Descriptive information on Canadian Households" ; var headage spouseage numberofchildren ; Step 6: This step illustrates an optional procedure (PROC MEANS) that would be difficult to perform on multiple records per unit dataset households. 2/18/2009 9:50:00 AM Page 3

SAS AGGREGATING MULTIPLE S INTO ONE WIDE Output produced by this program s PROCS (PROCEDURES): Information on a sample of Canadian Households Source file has one record for each member of each household household household Obs ID Member gender age 1 23914 head F 47 2 23914 child F 8 3 11654 child M 6 4 11654 spouse M 31 5 11654 head F 30 6 11654 child F 2 7 91203 head M 57 8 91203 spouse F 61 Output produced by Step 2 (Print the dataset to illustrate it has multiple records for a household unit.) Information on a sample of Canadian Households One record for the entire household number household head head spouse spouse Of Obs ID Gender Age Gender Age Children 1 11654 F 30 M 31 2 2 23914 F 47. 1 3 91203 M 57 F 61 0 Output produced by Step 5 (Print the wide record dataset to confirm the aggregation worked as expected.) 2/18/2009 9:50:00 AM Page 4

SAS AGGREGATING MULTIPLE S INTO ONE WIDE Descriptive information on Canadian Households The MEANS Procedure Variable N Mean Std Dev Minimum Maximum headage 3 44.6666667 13.6503968 30.0000000 57.0000000 spouseage 2 46.0000000 21.2132034 31.0000000 61.0000000 numberofchildren 3 1.0000000 1.0000000 0 2.0000000 Output produced by Step 6 (This step illustrates an optional procedure (PROC MEANS) that would be difficult to perform on multiple records per unit dataset households.) Code Notes: (explanation of what the code is doing in those areas identified by a number in a red-filled circle) 1. The code in this section is executed when the first record for any household is encountered. In this section you do two things: (a) name all the variables to appear on the wide record and (b) initialize these variables i.e set them all to missing. 2. Add additional IF code blocks for each possible value of householdmember you wish to map to variables in the wide record. In this example we have two IF code blocks. One is mapping information on the head of the household to variables headgender and headage., The second IF code block is mapping information on the spouse in the household to variables spousegender and spouseage. 3. In this example, we re just counting the number of children in the household, not mapping each child s information as a separate set of variables on the wide record as we did in area 2 above. 4. Write out the wide observation to the dataset singlerecord when you have processed the last observation for this household from the households dataset. 5. The RETAIN statement tells SAS to remember the values of these variables as a new observation is read in from the dataset households. 2/18/2009 9:50:00 AM Page 5

SAS AGGREGATING MULTIPLE S INTO ONE WIDE 6. The KEEP statement names the variables that should appear in the new dataset being constructed (singlerecord). Any variables in the data step not named on the KEEP statement are not written to the dataset. Tips: All records in the multiple-records-per-unit dataset (dataset: households in this example) must have a common variable that uniquely identifies the overall unit the record belongs to. (in this example, the variable is householdid ). All records in the multiple-records-per-unit dataset (dataset: households in this example) must have a common variable that uniquely identifies the role the observation plays in the larger overall unit. (in this example, this variable is householdmember this variable indicates whether the observation is for the head of the household, the spouse, a child, etc. ). See Step 3: the multiple-records-per-unit dataset (dataset: households in this example) must be sorted by the overall unit identifier (in this example, the variable householdid) for the code in Step 4 to work. It is helpful to print the aggregated dataset (singlerecord) to verify that the aggregation worked as expected (see Step 5 above). Always examine the SAS log for any error messages or warnings. 2/18/2009 9:50:00 AM Page 6