LOCF-Different Approaches, Same Results Using LAG Function, RETAIN Statement, and ARRAY Facility
Iuliana Barbalau, ClinOps LLC, San Francisco, CA




ABSTRACT

LOCF stands for Last Observation Carried Forward. It is a frequently used method in the clinical trials environment and a popular imputation method in the pharmaceutical industry. For example, if a patient drops out of the study after the second week, the last observed value is carried forward until the end of treatment as a conservative estimate of how well the subject would have done had he or she remained in the study [1]. In another application, a vital signs data set contains an observation for each scheduled or unscheduled visit a patient has with a doctor. If a patient misses an assigned visit, the primary measure is missing. In this paper the measurement of interest is weight: LOCF imputation uses the non-missing weight from an earlier visit as the weight for a later visit where it is missing. This paper introduces SAS syntax to accomplish LOCF and demonstrates the use of the RETAIN statement, the ARRAY facility, and the LAG function.

THE ORGANIZATION OF THIS PAPER

1. Example data set before and after LOCF
2. LOCF using the RETAIN statement
3. LOCF using the ARRAY facility and the LAG function
4. Conclusions

1. EXAMPLE OF DATA SET BEFORE AND AFTER LOCF

The SAS code below generates four patients, each with two to four visits (numbered 1 to 4), and illustrative weight measurements. Several records have a missing weight: patient 1 at visits 2 and 3, patient 2 at visit 3, patient 3 at visit 2, and patient 4 at visits 1 and 3. LOCF methods retain the non-missing weight from an earlier visit whenever the weight at the current visit is missing.

data sda;
  input ptno visit weight;
  format ptno z3.;
  cards;
1 1 122
1 2 .
1 3 .
1 4 123
2 1 156
2 3 .
3 1 112
3 2 .
4 1 .
4 2 123
4 3 .
;
run;

The data before LOCF is applied are shown in Figure 1.
Note that every patient except patient 4 has a weight measurement at visit 1 (considered baseline, or screening). Patient 4 is the exception: it has no weight at visit 1, so there is no earlier value to carry forward. We will check this patient after applying LOCF. If the method is applied correctly, weight remains missing for patient 4 at visit 1 in the final data set, while for the other patients missing weights are carried forward from earlier non-missing visits.
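Patients without a baseline value can also be identified before any imputation. The sketch below is an assumption and is not part of the paper's method. It flags every patient whose earliest record has a missing weight, because LOCF has nothing earlier to carry into that record. The data set name no_baseline is hypothetical.

```sas
/* Recreate the example data so this sketch runs on its own */
data sda;
  input ptno visit weight @@;
  format ptno z3.;
  datalines;
1 1 122  1 2 .  1 3 .  1 4 123
2 1 156  2 3 .
3 1 112  3 2 .
4 1 .    4 2 123  4 3 .
;
run;

proc sort data=sda;
  by ptno visit;
run;

/* Hypothetical check: keep a patient's first record when its weight
   is missing, because LOCF has no earlier value to carry into it */
data no_baseline;
  set sda;
  by ptno visit;
  if first.ptno and missing(weight);
  keep ptno visit;
run;

proc print data=no_baseline noobs;
run;
```

With the example data, only patient 004 at visit 1 is listed.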

Figure 1. Data set sda before LOCF

After LOCF is applied, the data should look like Figure 2. Note that patient 4 still has no weight at visit 1. When applying LOCF, we need to make sure the correct value for the correct patient is carried forward from the non-missing visits.

Figure 2. Data set sda after LOCF

2. LOCF USING THE RETAIN STATEMENT

A very elegant way to use the RETAIN statement is presented below. This approach is inspired by the paper "LOCF Method and Application in Clinical Data Analysis" by Huijuan Xu [3]. It uses a RETAIN statement to create a temporary variable called tempval. This variable holds the last non-missing weight so that it can be imputed into the following visits. The only drawback of this approach is that it assumes the baseline value (visit 1) is always non-missing. Because tempval is never reset between patients, a missing baseline would receive the previous patient's value.

First, we create a list of all scheduled visits for every subject using the code below.

data all;
  format ptno z3.;
  do i=1 to 4;
    do j=1 to 4;
      ptno=i;
      visit=j;
      output;
    end;
  end;
  drop i j;
run;

Data set all is shown in Figure 3. It includes four patients, each assigned four scheduled visits, for a total of sixteen observations.
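The grid all above hard-codes four patients and four visits. A data-driven alternative, sketched here, derives the patient list from sda itself so that new patients are picked up automatically. The sketch assumes that visits 1 to 4 make up the complete schedule. The helper data set visits is hypothetical; in practice it would come from the protocol's visit schedule.

```sas
/* Recreate the example data so this sketch runs on its own */
data sda;
  input ptno visit weight @@;
  format ptno z3.;
  datalines;
1 1 122  1 2 .  1 3 .  1 4 123
2 1 156  2 3 .
3 1 112  3 2 .
4 1 .    4 2 123  4 3 .
;
run;

/* Hypothetical visit schedule: assumes visits 1-4 are all planned visits */
data visits;
  do visit = 1 to 4;
    output;
  end;
run;

/* Cartesian product: every distinct patient in sda x every planned visit */
proc sql;
  create table all as
    select p.ptno format=z3., v.visit
    from (select distinct ptno from sda) as p, visits as v
    order by p.ptno, v.visit;
quit;
```

PROC SQL writes a note about the Cartesian product. Here that join is intended. The result is already ordered by ptno and visit, as the merge requires.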

Figure 3. Data set all

Next, we sort the visit grid all and the analysis data set sda by patient and visit. Both data sets must be in this order because they will be merged BY patient and visit.

proc sort data=sda;
  by ptno visit;
run;

proc sort data=all;
  by ptno visit;
run;

The data step below retains tempval with an initial value of 0. The subsetting IF uses val, the IN= variable for data set all, so every scheduled visit is kept.

data final (drop=tempval);
  retain tempval 0;
  merge sda(in=b) all(in=val);
  by ptno visit;
  if val;                              /* keep all scheduled visits */
  if weight eq . then weight=tempval;  /* impute the last non-missing weight */
  else tempval=weight;                 /* remember the latest observed weight */
run;

The code above produces the data shown in Figure 4.

Figure 4. Data set final

Note that with this method, patient 4 at visit 1 receives the weight carried forward from the previous patient (patient 3, weight 112). The method gives valid results when visit 1 is present. When data are missing at the first visit, however, LOCF is not implemented correctly. It is a good idea to check the data before attempting any LOCF method and to check it again after applying LOCF.
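A common way to remove the baseline assumption is to reset the retained value at the first record of each patient using FIRST.ptno. This variation is a sketch and is not part of the method in [3]. The names final_reset and lastwt are hypothetical. With this change, patient 4 keeps a missing weight at visit 1 instead of inheriting 112 from patient 3.

```sas
/* Recreate the example data and the 4 x 4 visit grid */
data sda;
  input ptno visit weight @@;
  format ptno z3.;
  datalines;
1 1 122  1 2 .  1 3 .  1 4 123
2 1 156  2 3 .
3 1 112  3 2 .
4 1 .    4 2 123  4 3 .
;
run;

data all;
  format ptno z3.;
  do ptno = 1 to 4;
    do visit = 1 to 4;
      output;
    end;
  end;
run;

proc sort data=sda;
  by ptno visit;
run;

data final_reset;
  merge sda all(in=inall);
  by ptno visit;
  if inall;                        /* keep every scheduled visit */
  retain lastwt;                   /* last observed weight, starts missing */
  if first.ptno then lastwt = .;   /* never carry values across patients */
  if missing(weight) then weight = lastwt;
  else lastwt = weight;
  drop lastwt;
run;
```

Resetting the retained variable at FIRST.ptno removes the need for a non-missing baseline. A leading missing value simply stays missing.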

3. LOCF USING ARRAY FACILITY AND LAG FUNCTION

LAGn() returns the value of its argument from the nth previous execution of that LAGn call. When the call is executed for every observation, as in this paper, that is the value from the nth previous observation. For example, if our data has 3 observations where x takes the values 1, 2 and 3, then LAG2(x) on the third observation returns 1, the value from the first observation. LAG() is the same as LAG1().

This approach is lengthier. It obtains the last available non-missing observation through a set of conditions while data set sda is read BY patient and visit. Whenever a new patient begins (FIRST.ptno), we reset to missing the lagged values that reach back into the previous patient. We then assign LOCF values using the ARRAY facility.

FIRST STEP - sort data set sda by patient and visit. This way, the LAGn() functions in the data step return the appropriate earlier weights.

SECOND STEP - define the array reset, whose elements lagx1-lagx3 hold the lagged weights. As a general rule, we need n-1 array elements, where n is the total number of possible (scheduled) visits. In our case, n is 4. The reason is that a value can be carried forward over at most n-1 visits: every visit except the one at which it was originally observed.

THIRD STEP - set the lagged values to missing whenever they reach into the previous patient. At the first record of a patient, lagx1, lagx2 and lagx3 are all cleared. At the second record, lagx2 and lagx3 are cleared, and at the third record only lagx3. The counter count, reset at FIRST.ptno, tracks the record's position within the patient and controls how many elements are cleared. This prevents a weight measurement from being carried forward from one patient to another.

FOURTH STEP - consider all LAGn() weight values that may need to be carried forward from one visit to another. If weight is missing at a visit and lagx1 is not missing, we use lagx1 to populate the missing weight. If both the current weight and lagx1 are missing, we use the most recent non-missing value available, checking lagx2 first and then lagx3.
data final3;
  set sda;
  by ptno visit;
  array reset(*) lagx1-lagx3;
  /* LAG calls run on every observation so the queues stay aligned */
  lagx1=lag(weight);
  lagx2=lag2(weight);
  lagx3=lag3(weight);
  /* count = position of the current record within the patient */
  if first.ptno then count=1;
  /* clear the lags that still point at the previous patient */
  do i=count to dim(reset);
    reset(i)=.;
  end;
  count+1;
  if weight=. and lagx1 ne . then weight=lagx1;
  else if weight=. and lagx1 eq . and lagx2 ne . then weight=lagx2;
  else if weight=. and lagx1 eq . and lagx2 eq . then weight=lagx3;
run;

The data set created with this method is shown in Figure 5. For patient 1, the weight from visit 1 is carried forward to visits 2 and 3. For patient 4, the weight at visit 1 remains missing because no earlier information is available. Note that the number of observations (eleven) is the same as in the original data set sda (Figure 2). In contrast, the previous example's data set final (Figure 4) is built from a common data set (sixteen observations) containing all the possible (scheduled) visits a patient might have.

Figure 5. Interim data set final3
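The LAG behavior this step relies on can be checked in isolation. The sketch below uses the three-value example from the LAGn definition. It adds a hypothetical variable lagcond, computed by a LAG call that is skipped on the second observation. This shows why the lags in final3 are computed unconditionally, before any IF logic.

```sas
data lagdemo;
  input x @@;
  lag1x = lag(x);    /* x from the previous observation */
  lag2x = lag2(x);   /* x from two observations back */
  /* Conditional LAG: its queue is updated only when this call executes,
     so on observation 3 it returns x from observation 1, not 2 */
  if x ne 2 then lagcond = lag(x);
  datalines;
1 2 3
;
run;

proc print data=lagdemo noobs;
run;
```

On the third observation, lag1x is 2 and lag2x is 1. However, lagcond is also 1 rather than 2, because its LAG call did not execute on observation 2.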

The final data set of interest is shown in Figure 6.

Figure 6. Data set final3

4. CONCLUSIONS

It is useful to know that there are multiple ways to obtain LOCF results, because this encourages SAS programmers to be more creative when writing their code. Before starting LOCF, we need to ask ourselves what final data set we want to obtain. Do we want a common data set with all the possible values for scheduled visits, or are we interested only in LOCF values for missing measurements in the data set of interest? After answering this question, we need to check the structure of our data and choose the most efficient method that produces accurate results.

The most important point about LOCF is that we need to be familiar with our data. If we are not careful enough, we could impute incorrect information into other patients or time points, and that could affect the integrity of the data to be analyzed.

REFERENCES

[1] Analysis of clinical trials (definition). http://en.wikipedia.org/wiki/analysis_of_clinical_trials
[2] Shein-Chung Chow. Encyclopedia of Biopharmaceutical Statistics, page 176.
[3] Huijuan Xu, Biogen Idec, Inc. LOCF Method and Application in Clinical Data Analysis.

ACKNOWLEDGMENTS

I would like to thank my manager Irina Walsh for continuous support, Patrick Thornton for helpful mentoring, and Jeanina Worden for encouraging me to be one of the SAS gigs at the WUSS Conference.

CONTACT INFORMATION

Iuliana Barbalau
ClinOps LLC.
353 Sacramento Street, Suite 800
San Francisco, CA 94111
Work Phone: (415) 679-2373
Fax: (415) 679-3280
E-mail: iuliana24@yahoo.com
Web: www.clinops.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.