Paper TT15

SAS and Microsoft Excel for Tracking and Managing Clinical Trial Data: Methods and Applications for Information Delivery

Na Li, Pharmacyclics, Sunnyvale, CA
Kathy Boussina, Pharmacyclics, Sunnyvale, CA

ABSTRACT

Microsoft Excel is widely used in the pharmaceutical industry, and output in Excel format is frequently requested because of its familiarity and ease of use. This paper outlines automated data communication between Excel and SAS programs: how to consolidate data from multiple Excel files, MS Access files and other data sources; how to establish a data warehouse by manipulating and integrating information in SAS; and how to deliver information in Excel format, either by applying Excel templates with reusable macros or by dynamically creating multiple Excel worksheets. Several SAS methods are explored, including the DATA step, SAS/ACCESS (PROC ACCESS, the SQL procedure, PROC DBLOAD, PROC IMPORT/EXPORT), DDE (Dynamic Data Exchange), and ODS (Output Delivery System). The pros and cons of each method are summarized, and the scheduling of daily SAS executions is addressed. The application of these methods is then detailed, illustrating how they help in the management of clinical trials when delivering information to Clinical Research Associates and to Medical and Safety Monitors. The data warehousing techniques and information delivery methods of the SAS system are used to integrate multiple data sources and deliver consolidated Microsoft Excel worksheets to medical professionals.

INTRODUCTION

MS Excel is user-friendly and commonly used by non-programming professionals. The expandable column fields and the ability to format cells, highlight, sort and even add data are some of the reasons that non-programming professionals rely on Excel. In the conduct of clinical trials, Clinical Research Associates, Data Managers and Medical Monitors are among the personnel who may create Excel files to help manage their activities and track information.
SAS data warehousing and data manipulation procedures are well suited to bringing these multiple data sources together into a data warehouse. The information can then be delivered back in a consolidated Excel file, a format that the end user finds familiar and comfortable.

Traditionally, the DATA step and the IMPORT/EXPORT wizard have been the most commonly used methods for bringing external data into SAS or exporting SAS data sets to external files. Newer methods such as ODS (Output Delivery System) lack the flexibility to export data to complicated, user-defined, formatted Excel templates within limited programming time. No single method is flexible enough to cover all of the Excel user's needs. To deliver the data correctly and promptly, it is critical to understand that different methods have different installation requirements, and to know the procedures and limitations of each method. This paper compares the methods to help SAS programmers choose the right one for their needs.

IMPORTING EXCEL, MS ACCESS INTO SAS

SAS/ACCESS software establishes a data transfer method between almost any database product and the SAS system. The corresponding components of SAS/ACCESS must be installed and licensed at your site before they can be used. SAS/ACCESS (PROC ACCESS, PROC IMPORT, and PROC SQL), the DATA step, and DDE are common approaches for importing Excel data into SAS data sets. Each method is evaluated in the context of its application in the PC (Windows and OS/2) environment.

Before the different importing methods are introduced, it is necessary to understand the Excel data types, date/time convention and naming convention. Excel has two data types: character and numeric. In character type, data can be entered as text, numbers, or dates. In numeric type, data can be entered as numbers (beginning with +, $, @, -, =, or #), dates, times, or formulas.
In Excel, dates are stored as numeric values counting the days from 1/1/1900 to the specified date, while in SAS dates are stored as numeric values counting the days from 1/1/1960 to the specified date. For example, March 12, 1994 is recorded as 34405 in Excel but as 12489 in SAS. When all of the data in a date column are saved as numeric type, SAS converts the data correctly. A field with mixed data types, however, can cause problems. In that case, either save the column as a text field in Excel or perform the conversion in an additional DATA step.

PROC ACCESS

SAS/ACCESS to PC File Formats has to be installed and licensed if you are running SAS/ACCESS under the PC operating environment. PROC ACCESS can be run within the SAS Display Manager via the IMPORT Wizard for ease of use. After the wizard has run, recalling the code in the Program window provides SAS code that can be saved to a file for future use. The code is readable, and the syntax provides options to, for example, select all or some variables from the source or to provide variable names or labels for the resulting data set. PROC ACCESS is also flexible in handling source data with mixed data types.

The drawbacks of this method are as follows:

The Excel file has to be saved in Excel 5.0/95 or 4.0 workbook format.

The Excel file has to be closed when the program executes.

The data range can only be selected as rows and columns and cannot contain a block of empty cells, which can be a problem if records are added or removed.

Mixed data types in the same column can cause text information to be lost. Use the TYPE statement (column-identifier=C) to avoid this, since the MIXED=YES statement sometimes does not resolve the problem.

Content may be truncated if the lengths of the character fields are not specified in the FORMAT statement and SCANTYPE=Y is not in the program.

If the column names are included in the SAS data set, the Excel column names have to start with a letter and may contain only A-Z, 0-9, space, underscore, and hyphen; otherwise error messages result. To be able to format variables, use column names without spaces, hyphens, or underscores.

When date-time columns are imported, the values have to follow the Excel numeric date/time format to avoid missing information.
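As an illustration of the date issue discussed above, the following is a minimal sketch, not taken from the original programs, of converting Excel date values in a DATA step. The constant 21916 is simply the difference between the two example values given earlier (34405 - 12489). The variable names EXCEL_SERIAL, EXCEL_TEXT, SAS_DATE and SAS_DATE2 are hypothetical, and the text date layout in the second case is an assumption.

```sas
/* Sketch: convert Excel date values to SAS dates (variable names are hypothetical) */
data _null_;
  /* Case 1: the column arrived as a raw Excel serial number */
  excel_serial = 34405;                 /* March 12, 1994 in Excel          */
  sas_date     = excel_serial - 21916;  /* 21916 = 34405 - 12489            */
  put sas_date= sas_date date9.;

  /* Case 2: the column was saved as text in Excel (assumed MM/DD/YYYY layout) */
  excel_text = '03/12/1994';
  sas_date2  = input(excel_text, mmddyy10.);
  put sas_date2= date9.;
run;
```

Both cases should arrive at the same SAS date value, 12489, which the DATE9. format displays as 12MAR1994.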
Below are the example Excel snapshot (saved as an MS Excel 5.0/95 workbook), the SAS code and the SAS output.

Excel file:

Site    Patient  Exam         Modality  Date Digital   Date Films  Date        Date Films
Number  Number   Date                   Data Received  Received    Translated  Digitized
999     9999     01/Jan/2003            01/01/1900     01/01/1900  01/01/1900  01/01/1900
002     0001     17/Jan/2001  MRI       03/05/2003     03/25/2003  03/25/2003
002     0001     17/Jan/2002  MRI
002     0001     10/Feb/2003  Port      03/05/2004     03/28/2003  03/28/2003
002     0001     12/Feb/2004  Port      03/05/2004     03/28/2003  03/28/2003
002     0002     07/Aug/2003  Port      08/25/2002
002     0002     04/Aug/2003  MRI       08/25/2002     08/28/2003  08/29/2003
002     0002     04/Aug/2003  MRI
002     0002     04/Aug/2003  MRI
002     0002     04/Aug/2003  MRI

SAS code:

    PROC ACCESS DBMS=XLS;
      CREATE WORK.BITI.ACCESS;
      PATH='C:\WINNT\PROFILES\NLI\DESKTOP\PHARMSUG\BITI.XLS';
      WORKSHEET='TRACKING';
      SKIPROWS=2;
      MIXED=YES;
      TYPE VAR6=N;
      FORMAT VAR6 DATE9. VAR3 $10.;
      CREATE WORK.BITI.VIEW;
      SELECT VAR0 VAR1 VAR3 VAR6;
    RUN;

SAS output:

    Obs  VAR0  VAR1  VAR3  VAR6
     1   002   2001  MRI   25MAR2003
     2   002   2001  MRI   .
     3   002   2001  Port  28MAR2003
     4   002   2001  Port  28MAR2003
     5   002   2     Port  .
     6   002   2     MRI   28AUG2003
     7   002   2     MRI   .
     8   002   2     MRI   .
     9   002   2     MRI   .

PROC IMPORT

The PROC IMPORT method is similar to PROC ACCESS and uses the same settings. The code can also be generated using the IMPORT Data Wizard. This method can read the newer Excel file version (Excel 97). When importing Excel 97 data, the data range has to be defined in the Excel file. The menu sequence is Insert > Name > Define: provide a name for the range, enter the range reference, and click Add. The input data range for PROC IMPORT can contain empty rows or columns, whereas PROC ACCESS does not allow this and will produce an error.
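The named-range approach just described might look like the following sketch. It assumes that the workbook has been re-saved in Excel 97/2000 format and that a range called TRACKDATA was defined through Insert > Name > Define; the range name and the output data set name are hypothetical and do not appear in the original programs.

```sas
/* Sketch: import a named range from an Excel 97/2000 workbook.          */
/* TRACKDATA is a hypothetical range name defined inside the Excel file. */
proc import out=work.extract97
            datafile="C:\WINNT\PROFILES\NLI\DESKTOP\PHARMSUG\BITI.XLS"
            dbms=excel2000 replace;
  range="TRACKDATA";   /* named range instead of an absolute cell range */
  getnames=yes;        /* first row of the range holds the column names */
run;
```

Because the range is maintained in Excel, the SAS code would not need to change when the range is redefined to cover added rows.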
PROC IMPORT evaluates the first 20 rows of each column to determine the attributes of the variables. This can be a drawback, since it can cause problems both for mixed data type fields and through truncation of data. For example, if a field is interpreted as numeric based on the first 20 observations, any subsequent character data will result in missing values in that field. Similarly, truncation will occur if the length setting does not accommodate longer values occurring after observation 20. The default setting of 20 rows can be increased as follows: in the SAS interactive display, type REGEDIT on the command line to open the Registry Editor window; select Products > Base > EFI; in the right-hand window click GuessingRows, enter a new value, and click OK.

The following is a coding example and the SAS output for the Excel file that was shown previously.

    PROC IMPORT OUT=WORK.EXTRACT
                DATAFILE="C:\WINNT\PROFILES\NLI\DESKTOP\PHARMSUG\BITI.XLS"
                DBMS=EXCEL5 REPLACE;
      SHEET='TRACKING';
      RANGE='A2..H11';   /* DEFINE THE DATA RANGE A2..H11 */
      GETNAMES=YES;
    RUN;

    Obs  VAR0  VAR1  VAR2      VAR3  VAR4      VAR5      VAR6      VAR7
     1   002   2001  01/17/03  MRI   03/05/03  03/25/03  03/25/03
     2   002   2001  01/17/03  MRI   .         .         .
     3   002   2001  02/10/03  Port  03/05/03  03/28/03  03/28/03
     4   002   2001  02/12/03  Port  03/05/03  03/28/03  03/28/03
     5   002   2     08/07/03  Port  08/25/03  .         .
     6   002   2     08/04/03  MRI   08/25/03  08/28/03  08/29/03
     7   002   2     08/04/03  MRI   .         .         .
     8   002   2     08/04/03  MRI   .         .         .
     9   002   2     08/04/03  MRI   .         .         .

PROC SQL

SAS/ACCESS to ODBC (Open Database Connectivity) may be considered the most flexible technique for handling large databases, since it allows conditional access to the data and follows the standard PROC SQL format. For this method, SAS/ACCESS Interface to ODBC needs to be installed and licensed. A Microsoft Excel ODBC driver must also be installed and configured on your PC or server. PROC SQL gives access to the features and structure of the SQL language, such as the SELECT statement to select certain fields and the WHERE clause to filter information.
PROC SQL can be helpful when the information in the Excel file changes on a regular basis, with rows and columns being added. There is no need to define variable names unless renaming is needed. The database or Excel file can be open or closed when the program executes. This method is very convenient if the name and location of the database (Excel file) are fixed, since the data location has to be set in the PC Control Panel or on the server. Each data source (Excel file) also requires its own alias name and its own setting in the Control Panel or on the server. This setup can be tedious if many Excel files are processed, and it must be defined manually on each computer before the same PROC SQL code can be run.

By default, the procedure sets the character field length to 200, so text truncation is usually not an issue unless the contents are very long. Mixed data types can be an issue and cause inadvertent missing values. To avoid this, the field has to be defined as a text field in Excel: select the column, choose Text to Columns from the Data menu, click Next > in steps 1 and 2, choose the Text button in step 3, and click Finish. Additionally, a date/time field is converted to the DATETIME22. format. To get the correct date or time information, use the DATEPART or TIMEPART function together with FORMAT= in the SELECT statement.

The following code is an example of using PROC SQL to access MS Access and Excel files.

    PROC SQL;
      CONNECT TO ODBC AS PT_QCS (DSN=PT_QCS);      /* DSN= IS THE NAME DEFINED IN CONTROL PANEL */
      CONNECT TO ODBC AS CRAEDIT (DSN=CRAUPDATE);  /* ACCESS TO EXCEL FILE */
      CONNECT TO ODBC AS AERECON (DSN=AERECON);    /* ACCESS TO EXCEL FILE */
      CONNECT TO ODBC AS SAE (DSN=ACCESS);         /* ACCESS TO MS ACCESS */

      CREATE TABLE SAE AS
        SELECT * FROM CONNECTION TO SAE
          (SELECT SUBJECTNBR AS PAT_ID, FOLLOWUP AS FU_NO, SAEID, PROTNO
           FROM TBLSAE
           WHERE PROTNO = 'XXXX-XXX');

      CREATE TABLE PTQCS AS
        SELECT PATIENT, OUTSTAND
        FROM CONNECTION TO PT_QCS
          (SELECT * FROM "CTQCS$")
        WHERE PATIENT ^= .;

      CREATE TABLE CRAEDIT AS
        SELECT PATIENT_ AS PATIENT, F5 AS COMPLET1,
               DATEPART(F6) AS NO_QRFS FORMAT=MMDDYY8.
        FROM CONNECTION TO CRAEDIT
          (SELECT * FROM "SHEET1$")
        WHERE PATIENT_ ^= .;
      ALTER TABLE CRAEDIT
        MODIFY COMPLET1 CHAR(10) FORMAT=$10.;

      CREATE TABLE AERECON AS
        SELECT SUBJ_ AS PATIENT_, _1ST_REC,
               LVL_3_CL AS LVL3CLN, LVL_5_CL AS LVL5CLN
        FROM CONNECTION TO AERECON
          (SELECT * FROM "AE_SAE MASTER RECON$");
      ALTER TABLE AERECON
        MODIFY PATIENT_ CHAR(8) FORMAT=$8.,
               LVL3CLN CHAR(50) FORMAT=$50.,
               LVL5CLN CHAR(50) FORMAT=$50.;

      DISCONNECT FROM PT_QCS;
      DISCONNECT FROM CRAEDIT;
      DISCONNECT FROM AERECON;
      DISCONNECT FROM SAE;
    QUIT;

DATA STEP

The DATA step is a traditional and commonly used method. It does not require any software beyond Base SAS. To begin, the Excel file needs to be saved as a .csv (comma-separated values) file. A CSV file cannot hold multiple worksheets, so one CSV file is needed per worksheet. This method is easy to use but can be cumbersome, because each variable name and length has to be defined. Truncation may result if a variable length is not assigned correctly. Additionally, data may be inadvertently omitted if LRECL= is not defined appropriately. For example, the default of LRECL=256 had to be increased to LRECL=500 to avoid data omission problems. Including the MISSOVER or TRUNCOVER option is suggested to avoid incorrect or missing information on short lines or short values. The DSD option is needed to handle embedded commas in text values. The Excel file has to be closed when the program executes. This method is not generally recommended because of its limitations compared to the other methods.
An example follows.

Excel file:

    a  1  1  23  12:45:30 AM
    b  1  2  24  11:36:45 PM
    c  1  3  24  10:28:00 PM
    d  2  2
    e  2  3  33  U

The SAS code:

    DATA TEST2;
      INFILE 'C:\WINNT\PROFILES\NLI\DESKTOP\PHARMSUG\TEST2.CSV'
             DELIMITER=',' MISSOVER DSD LRECL=500;
      FORMAT   TRTGRP $1. PTID $1. SITE $1. AGE $2. TIME TIMEAMPM8.;
      INFORMAT TRTGRP $1. PTID $1. SITE $1. AGE $2. TIME TIME11.;
      /*** INFORMAT HAS TO BE TIME, NOT TIMEAMPM **/
      INPUT TRTGRP $ PTID $ SITE $ AGE $ TIME;
    RUN;

The SAS output:

    Obs  TRTGRP  PTID  SITE  AGE  TIME
     1   a       1     1     23   12:45 AM
     2   b       1     2     24   11:36 PM
     3   c       1     3     24   10:28 PM
     4   d       2     2          .
     5   e       2     3     33   U
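The role of the DSD and TRUNCOVER options mentioned above can be seen in a small self-contained sketch. It reads in-stream data rather than a CSV file so that it can be run as is; the data lines and the variable CRA are hypothetical and are not part of the study files.

```sas
/* Sketch: effect of DSD and TRUNCOVER on comma-delimited input. */
/* The data lines and the variable CRA are hypothetical.         */
data demo;
  infile datalines dlm=',' dsd truncover;
  length ptid $4 cra $20 site $3;
  input ptid cra site;
  datalines;
0001,"Smith, John",002
0002,,002
0003,Martha
;
run;

proc print data=demo noobs;
run;
```

With DSD, the quoted value on the first line is read as a single field despite its embedded comma, and the two consecutive commas on the second line are read as a missing value. With TRUNCOVER, the short third line leaves SITE missing instead of causing SAS to read the next line to fill it.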
DYNAMIC DATA EXCHANGE (DDE)

Dynamic Data Exchange (DDE) is also easy and flexible to use, but it is only available in the PC environment. This method does not require any additional license or installation, since it is part of Base SAS. Because it involves a DATA step, the list of variable names has to be defined explicitly. Sample code and the output are shown below. The input file is the test2.csv file from the example above, saved as test2.xls.

    OPTIONS NOXWAIT NOXSYNC MISSING=' ' SYMBOLGEN;
    X '"C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\EXCEL.EXE"';  /** EXCEL EXECUTION FILE **/
    DATA _NULL_;
      WAIT_SEC=SLEEP(5);   /** GIVE EXCEL TIME TO START **/
    RUN;
    FILENAME EXCELCMD DDE 'EXCEL|SYSTEM';
    DATA _NULL_;
      FILE EXCELCMD;
      PUT '[OPEN("C:\WINNT\PROFILES\NLI\DESKTOP\PHARMSUG\TEST2.XLS")]';
    RUN;
    FILENAME INEXCEL DDE "EXCEL|[TEST2.XLS]TEST2!R1C1:R5C5" NOTAB;
    DATA ONE;
      INFILE INEXCEL DLM='09'X DSD MISSOVER;
      FORMAT   TRTGRP $1. PTID $1. SITE $1. AGE $2. TIME TIMEAMPM8.;
      INFORMAT TRTGRP $1. PTID $1. SITE $1. AGE $2. TIME TIME11.;
      INPUT TRTGRP $ PTID $ SITE $ AGE $ TIME;
    RUN;
    DATA _NULL_;
      FILE EXCELCMD;
      PUT '[QUIT()]';
    RUN;

SAS output:

    Obs  TRTGRP  PTID  SITE  AGE  TIME
     1   a       1     1     23   12:45 AM
     2   b       1     2     24   11:36 PM
     3   c       1     3     24   10:28 PM
     4   d       2     2
     5   e       2     3     33   U

DATA WAREHOUSING

Once data from Excel are read into SAS data sets, they can be merged with other SAS data sets such as the study Clinical Database. Integrating the various tracking Excel files with the clinical data helps provide the big picture needed to complete the data management, clinical monitoring and safety activities required for the conduct of the trial. For example, patient termination and death dates can be consolidated with Excel files that track Case Report Forms in-house, monitoring and Safety reporting activities. Projected timings, missing information or discrepancies in information can be summarized and reported programmatically. The following report provides an example. The Problems/Notes column and the Date Issues column indicate the discrepancies.
At the bottom of the report, a summary displays the overall discrepancies and data collection status.

Check of Consistency of Death Reports and Dates

             SAE     -- CRF In --    --- Death Dates ---
    Subj.    (MW)                    SAE         Death       Term.       Status
    ID       Death   Term.   Death   Death       CRF         Date        at Term.
    119008   Yes     Yes     Yes     01/04/2004  01/04/2004  01/04/2004  DEAD
    127001   Yes     No      No      12/30/2003
             Problems/Notes, Date Issues: Death & Term form missing
    127002   Yes     Yes     No      11/17/2003              09/05/2003  ALIVE
             Problems/Notes, Date Issues: Term. Status is Dth., but Term. Date is bfr. Dth Date
    133003   Yes     No      No      01/05/2004
             Problems/Notes, Date Issues: Death & Term form missing
    149002   Yes     Yes     No      07/24/2003              07/24/2003  DEAD
             Problems/Notes, Date Issues: Term. form indicates death, also in SAE db. Not captured on Death form
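The kind of consistency check behind a report like this can be sketched as a simple merge. This is an illustration under stated assumptions, not the authors' production code: the data set names SAE, TERM and CHECK, the variable names, and the wording of the messages are hypothetical, and the few records are taken from the report rows above.

```sas
/* Sketch: flag discrepancies between safety tracking data and termination CRFs. */
/* Data set names, variable names and messages are hypothetical.                 */
data sae;       /* deaths reported in the safety (MedWatch) tracking file */
  input ptid sae_dth :mmddyy10.;
  format sae_dth mmddyy10.;
  datalines;
119008 01/04/2004
127001 12/30/2003
127002 11/17/2003
;
run;

data term;      /* termination forms from the clinical database */
  input ptid term_dt :mmddyy10. status $;
  format term_dt mmddyy10.;
  datalines;
119008 01/04/2004 DEAD
127002 09/05/2003 ALIVE
;
run;

data check;
  merge sae(in=in_sae) term(in=in_term);   /* both inputs are already sorted by PTID */
  by ptid;
  length problem $60;
  if in_sae and not in_term then
    problem = 'Termination form missing';
  else if in_sae and in_term and term_dt < sae_dth then
    problem = 'Termination date is before death date';
run;

proc print data=check noobs;
run;
```

In this sketch patient 119008 passes both checks, 127001 is flagged for a missing termination form, and 127002 is flagged because its termination date precedes the reported death date.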
    Total Medwatch Reported Deaths                   71
    Total Death Page Reported Deaths                 65
    Total Death not Included in Medwatch Count        0
    Total Death not Included in Death Page Count      6
    Total Termination Forms in                       72
    Grand Total Deaths                               71
    Grand Total Terminations                         75

EXPORTING SAS TO EXCEL

The data warehouse can also be exported back to Excel to provide the information to various audiences, such as Data Management, Clinical, Safety or Finance. The methods used to transfer data from SAS back to Excel differ from the importing methods, although some are similar with only minor syntax changes. SAS/ACCESS (PROC DBLOAD, PROC EXPORT), the DATA step, DDE, and ODS are discussed. PROC SQL, although a very powerful method for importing data, is not recommended for exporting data because of the overhead involved; the other methods are more convenient.

PROC DBLOAD

To use this method, SAS/ACCESS to PC File Formats has to be installed and licensed. This method is a quick and easy way to turn a SAS data set into an Excel file. Output fields can be selected, formatted and renamed in the program. Conditional export is possible. The ability to use SAS labels as Excel column headers is another advantage.

However, this procedure cannot write to individual worksheets within one Excel file. In addition, the Excel file cannot already exist in the same location when the program runs, so an Excel file template cannot be used. The method only creates new Excel files and cannot append data to an existing Excel file.

A reusable Excel macro can compensate for the inability to use an Excel template. To set one up, open Excel, choose Tools > Macro > Record New Macro, give the macro a name, record the cell format settings and page layout, and click Stop Recording. Then save the Excel file containing the macro as personal.xls under C:\Program Files\Microsoft Office\Office\XLSTART.
Storing the macro(s) in personal.xls under the specially named subfolder XLSTART makes them available to any Excel file, much as an autoexec file serves any SAS program. To use a macro, open the Excel file, choose Tools > Macro > Macros, select the macro name that you want to use, and click Run. In this way, a simple procedure can be reused to produce the expected cell formatting and print setup. Below is the PROC DBLOAD example.

    OPTIONS NOXWAIT NOXSYNC;
    X 'DEL C:\WINNT\PROFILES\NLI\DESKTOP\PHARMSUG\DBTEST.XLS';  ** DELETE IT IN CASE IT EXISTS;
    DATA _NULL_;
      WAIT_SEC=SLEEP(5);
    RUN;
    LIBNAME EXFILE 'C:\WINNT\PROFILES\NLI\DESKTOP\PHARMSUG';
    PROC DBLOAD DBMS=EXCEL DATA=EXFILE.A_PT_MON;
      PATH="C:\WINNT\PROFILES\NLI\DESKTOP\PHARMSUG\DBTEST.XLS";
      PUTNAMES YES;
      LABEL;
      RESET ALL;
      WHERE PTID>2000;
      LIMIT=0;
      RENAME CRA=CRAS;
      LOAD;
    RUN;

Resulting Excel file:

    Data Processing Status   Death Date in Safety Database   Patient ID   CRAS   DEATH DATE (CHAR)   DATE OF TERMINATION
    2004    John Smith  01/04/2004  01/04/04  3  01/04/04
    43002   John Smith  12/21/2003  12/21/03  3  12/21/03
    45002   Martha      01/19/2004  01/19/04  3  01/19/04
    119008  Martha      01/04/2004  01/04/04  3  01/04/04
    127001  Patricia    12/30/03
    127002  Patricia    09/05/03  3  11/17/03  Manual Review Complete
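Because PROC DBLOAD requires that the target workbook not already exist, the example above deletes it with an operating system command and then waits. As an alternative sketch, which is a swapped-in technique and not the approach used in this paper, the same cleanup can be done with the Base SAS functions FEXIST and FDELETE, which avoids the shell call and the fixed wait. The fileref OLDXLS is a hypothetical name.

```sas
/* Sketch: remove an existing workbook before PROC DBLOAD runs. */
/* The fileref OLDXLS is a hypothetical name.                   */
filename oldxls 'C:\WINNT\PROFILES\NLI\DESKTOP\PHARMSUG\DBTEST.XLS';

data _null_;
  if fexist('oldxls') then do;
    rc = fdelete('oldxls');     /* a return code of 0 means the file was deleted */
    if rc ne 0 then put 'WARNING: could not delete DBTEST.XLS, rc=' rc;
  end;
run;

filename oldxls clear;
```

If the workbook is open in Excel, the deletion is likely to fail, so the return code is worth checking before PROC DBLOAD is attempted.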
DYNAMIC DATA EXCHANGE (DDE)

This method is the most powerful way to export SAS data to Excel. It does not require any software other than Base SAS and Excel. An Excel template can be created and applied in situations where the data change but the structure is retained. The ability to apply Excel templates and to append data to an existing Excel file are unique advantages over the other exporting methods. Additionally, with simple syntax, the method allows multiple Excel files to be created conditionally. For example, if each clinical monitor needs a subset of the information for his or her specific site, each site can be a stand-alone Excel file. The same logic can be used to create multiple worksheets, as in the following example. Furthermore, additional formatting can be applied to the Excel file from SAS code, such as adding a row between patients or suppressing repeated information in certain columns. This method allows SAS to control the Excel application, for example by sending commands that clear previous Excel data contents, save the file, or control the formatting and position of variables.

The limitations of this method are that every variable name and the data range need to be defined. The information may be truncated if LRECL= is not large enough, and the same problem may occur when the specified output range is not large enough. The Excel file name has to be pre-assigned, that is, an empty file has to be created in advance. This can be done manually before running the SAS code or incorporated into the SAS code. DDE is only available on the OS/2 and Windows platforms.

The Excel file shown above is created by the SAS code below to report the summary information with the user-defined template, which can be printed on legal-size paper with a clean layout. In this Excel file, the information for each study is on its own worksheet.
On each worksheet, the clinical sites are separated by an underline row to improve readability. The clinical site name appears only once across all the rows related to that site.

    ******* INITIATE EXCEL, EXCEL HAS TO BE CLOSED ********;
    OPTIONS NOXWAIT NOXSYNC MISSING=' ' SYMBOLGEN;
    X '"C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\EXCEL.EXE"';
    DATA _NULL_;
      WAIT_SEC=SLEEP(5);   *** ALLOW EXCEL TO FINISH STARTING BEFORE THE NEXT STEP;
    RUN;
    FILENAME EXCELCMD DDE 'EXCEL|SYSTEM';

    *** COMMAND TO OPEN THE EXCEL FILE;
    DATA _NULL_;
      FILE EXCELCMD;
      PUT '[OPEN("\\S1\DATA\DEVLPMNT\CLINICAL\REGULATORY DOCUMENT TRACKING\REG_DOC_TRACKING_SHEET.XLS")]';
    RUN;

    *** A MACRO ALLOWS THE PROCESS TO BE REPEATED TO CREATE DIFFERENT
        WORKSHEETS FOR DIFFERENT STUDIES. PARAMETERS:
        OUTNAME DEFINES THE FILE NAME USED IN THE FILE STATEMENT,
        COND=   SELECTS THE STUDIES,
        OUTS=   GIVES THE WORKSHEET LOCATION AND RANGE;
    %MACRO OUTFILES(OUTNAME, COND=, OUTS=);
      FILENAME &OUTNAME &OUTS;
      DATA _NULL_;
        SET OUTALL END=EOF;
        BY USERS CENTERNO CENTERID SORTORD;  /* BY STATEMENT ALLOWS REPEATED INFO TO APPEAR ONCE */
        /* UNXXX VARIABLES DEFINE THE LENGTH OF THE UNDERLINE THAT
           SEPARATES THE CENTERS ON THE EXCEL REPORT */
        UNDT =REPEAT('_',6);    /* UNDERLINE LENGTH FOR DATE RELATED FIELDS */
        UNYS =REPEAT('_',2);    /* UNDERLINE LENGTH FOR YES, NO FIELDS */
        UND3 =REPEAT('_',3);    /* UNDERLINE REPEATED 3 TIMES FOR THE FIELDS THAT NEED IT */
        UND14=REPEAT('_',14);
        UND16=REPEAT('_',16);
        UND21=REPEAT('_',21);
        UND23=REPEAT('_',23);
        FILE &OUTNAME LRECL=8000;
        &COND
        PUT PI_CNT '09'X SITENO '09'X USERSO '09'X CRADT '09'X INITDT '09'X
            AMENDNO '09'X PSIGNDT '09'X PPROVDT '09'X RELEVEL2 '09'X IPROVDT '09'X
            SPNSPROV '09'X SPNSRECV '09'X F1572DT '09'X LASTNAME '09'X CVRECV '09'X
            PREVDT '09'X EFFDT '09'X BROCHDTC '09'X MEMBLIST '09'X LABSITE '09'X
            LABTYPEC '09'X LABDT '09'X INIVSDT '09'X DRUGDT '09'X DVCDT;
        IF LAST.CENTERID THEN
          PUT UND21 '09'X UND3 '09'X UND16 '09'X UNDT '09'X UNDT '09'X
              UNDT '09'X UNDT '09'X UNDT '09'X UNDT '09'X UNDT '09'X
              UNYS '09'X UNYS '09'X UNDT '09'X UND14 '09'X UNYS '09'X
              UNDT '09'X UNDT '09'X UNDT '09'X UND21 '09'X UND23 '09'X
              UND3 '09'X UNDT '09'X UNDT '09'X UNDT '09'X UNDT '09'X;
      RUN;
    %MEND;

    *** THE FOLLOWING STEP CLEARS THE CONTENTS OF SPECIFIC WORKSHEETS;
    DATA _NULL_;
      FILE EXCELCMD;
      PUT '[WORKBOOK.ACTIVATE("XXXX-1111")]';
      PUT '[SELECT("R2C1:R100C26")]';
      PUT '[CLEAR(3)]';          ** 3 MEANS CLEAR CONTENTS;
      PUT '[SELECT("R1C1")]';    ** CLEAR THE PREVIOUS SELECTED RANGE;
      PUT '[WORKBOOK.ACTIVATE("XXXX-0304")]';
      PUT '[SELECT("R2C1:R100C26")]';
      PUT '[CLEAR(3)]';
      PUT '[SELECT("R1C1")]';
    RUN;

    *** MACRO CALL TO SEND THE OUTPUT FOR THE SPECIFIC STUDY TO THE
        DESIGNATED WORKSHEET (XXXX-1111), RANGE ROW 2 COLUMN 1 TO
        ROW 100 COLUMN 26;
    %OUTFILES(OUTFILE1,
              COND=%STR(WHERE PROTOCOL='XXXX-1111';),
              OUTS=%STR(DDE "EXCEL|\\S1\DATA\DEVLPMNT\CLINICAL\REGULATORY DOCUMENT TRACKING\[REG_DOC_TRACKING_SHEET.XLS]XXXX-1111!R2C1:R100C26"));

    *** COMMANDS TO HAVE EXCEL SAVE THE FILE AND QUIT;
    DATA _NULL_;
      FILE EXCELCMD;
      PUT '[SAVE()]';
      PUT '[QUIT()]';
    RUN;

PROC EXPORT

This method is the opposite of PROC IMPORT. Following the IMPORT/EXPORT Wizard makes this procedure very easy to use. The EFI (External File Interface) facility can be used to change field names, formats, and data types. The method has the same limitation as PROC DBLOAD: it cannot control individual worksheets within one file. It also does not allow conditional output.

    *** EXPORTING DATA TO EXCEL;
    LIBNAME EXFILE 'C:\WINNT\PROFILES\NLI\DESKTOP\PHARMSUG';
    PROC EXPORT DATA=EXFILE.A_PT_MON
                OUTFILE="C:\WINNT\PROFILES\NLI\DESKTOP\PHARMSUG\EXPRT.XLS"
                DBMS=EXCEL2000 REPLACE;
    RUN;

DATA STEP

This method is useful if you do not have the SAS/ACCESS components. ASCII files can be created using tab delimiters. Conditional output can be achieved with the IF statement, and the output fields can be selected. The limitation is that all the fields need to be predefined, and the column headers of the Excel file need to be specified in the program, as do formats and data types. This method cannot write to a specific worksheet or put the information into multiple worksheets.
    LIBNAME TESTLIB 'C:\WINNT\PROFILES\NLI\DESKTOP\PHARMSUG';
    DATA _NULL_;
      FILE 'C:\WINNT\PROFILES\NLI\DESKTOP\PHARMSUG\OUTEXCEL.XLS';
      SET TESTLIB.TEST2;
      IF _N_=1 THEN
        PUT 'TREATMENT' '09'X 'PTID' '09'X 'SITE' '09'X 'AGE' '09'X 'TIME';
      PUT TRTGRP '09'X PTID '09'X SITE '09'X AGE '09'X TIME;
    RUN;

OUTPUT DELIVERY SYSTEM (ODS)

ODS is a good method for exporting SAS data to Excel files in Version 8 and higher. It provides more flexibility than most of the export methods. Variables can be selected, and variable types can be defined. Additionally, the font and style can be modified using PROC TEMPLATE or HTML style sheets, although this requires some familiarity with the syntax.

However, this method still cannot provide the flexibility found in DDE. Access to multiple Excel worksheets can be difficult. A user-defined Excel template cannot be used; instead, a PROC TEMPLATE definition or a style sheet has to be created, which requires time and maintenance by the SAS user rather than the Excel user. Another drawback is that the Excel file has to be closed when the program executes, which is a problem if other users are accessing the file. Following is an example of the SAS code and the output:

    LIBNAME DATAS 'C:\WINNT\PROFILES\NLI\DESKTOP\PHARMSUG';
    ODS LISTING CLOSE;
    ODS HTML PATH='C:\WINNT\PROFILES\NLI\DESKTOP\PHARMSUG'
             BODY='TESTHTML.XLS' STYLE=MINIMAL;
    ODS NOPTITLE;
    DATA _NULL_;
      SET DATAS.TEST2;
      FILE PRINT ODS=(VARIABLES=(TRTGRP PTID SITE AGE TIME));
      PUT _ODS_;
    RUN;
    ODS HTML CLOSE;
    ODS LISTING;

    The SAS System

    TRTGRP  PTID  SITE  AGE  TIME
    a       1     1     23   12:45 AM
    b       1     2     24   11:36 PM
    c       1     3     24   10:28 PM
    d       2     2          .
    e       2     3     33   U

AUTOMATE THE PROCEDURE BY SCHEDULING SAS EXECUTION

Since some of the imported data sources are updated daily, the integrated report in Excel format also needs to be updated frequently. On the Windows platform, the procedure can be automated by setting an execution schedule in SAS. An example of setting up an automated procedure follows:

    %MACRO DOLOOP;
      %DO I=5 %TO 9;
        /* RUN THE PROGRAM DAILY FROM 1/5/04 TO 1/9/04 AT 5:01:00 AM */
        DATA _NULL_;
          START=DHMS(MDY(1,&I,2004),05,01,0);
          CALL SYMPUT('START',START);
        RUN;
        %LET AWOKE=%SYSFUNC(WAKEUP(&START));
        %CONTROL;   /* THE PROGRAM THAT NEEDS TO BE RUN, WRITTEN AS A MACRO */
      %END;
    %MEND;
    %DOLOOP;

CONCLUSION

This paper puts each import and export method in the context of its benefits, pitfalls, and limitations to help users determine the right method for their purpose. In general, several aspects need to be considered. When importing data, accuracy is very important: date, time, truncation, and missing data issues all have to be checked. When exporting data, the flexibility to meet the Excel users' needs in an automated fashion is very important, although the same accuracy issues also need to be taken into account.

The advantages and disadvantages of each method can also change over time. SAS continues to improve its procedures, and more features are released with each new version. Starting with Version 9.0, SAS allows the LIBNAME statement to be used in conjunction with ODBC, and mixed data types are allowed with the LIBNAME option MIXED=YES. With Version 9.1, PROC IMPORT also supports MIXED=YES to allow mixed data types.
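As a rough illustration of the Version 9 features just mentioned, the sketch below reads a worksheet through a LIBNAME statement and through PROC IMPORT with MIXED=YES. It assumes SAS 9.1 with SAS/ACCESS Interface to PC Files licensed and uses that product's EXCEL engine; the libref XLS and the output data set names are hypothetical, and the workbook path is reused from the earlier examples only for continuity.

```sas
/* Sketch for SAS 9.x: read a worksheet through the LIBNAME engine. */
/* The libref XLS and the output data set names are hypothetical.   */
libname xls excel 'C:\WINNT\PROFILES\NLI\DESKTOP\PHARMSUG\BITI.XLS' mixed=yes;

data work.tracking;
  set xls.'TRACKING$'n;   /* a worksheet is addressed as a table whose name ends in $ */
run;

libname xls clear;        /* release the workbook */

/* SAS 9.1: the same option as a PROC IMPORT statement */
proc import out=work.tracking2
            datafile='C:\WINNT\PROFILES\NLI\DESKTOP\PHARMSUG\BITI.XLS'
            dbms=excel replace;
  sheet='TRACKING';
  mixed=yes;
run;
```

With MIXED=YES, a column containing both numbers and text is expected to be read as character rather than having its text values set to missing, which addresses the mixed data type problem described for the earlier methods.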
By implementing SAS data warehousing procedures in conjunction with importing and exporting Excel files, information can be consolidated to provide the big picture in the clinical trial setting. This process of importing multiple Excel data sets, creating a data warehouse and delivering an integrated Excel file has been successfully applied at Pharmacyclics, Inc. Applications include:

- Tracking Safety information: ensuring timely review and reporting of Safety events.
- Tracking Clinical Monitoring and database status to allow freezing the database on a by-patient basis.
- Tracking in order to coordinate the timing of external committee meetings, such as Data Safety Monitoring Boards, by integrating patient enrollment, adverse event mapping, adverse event and serious adverse event reconciliation, and CRFs in-house.

REFERENCES

Jensen, Karl and Greathouse, Matt (1999), From SAS Access to E-mail: Building an Automated Report Application, Proceedings of the 7th Annual Western Users of SAS Software Conference, 7, Paper 330-336.

Loren, Judy (2003), SAS/ACCESS to External Databases: Wisdom for the Warehouse User, Proceedings of the 28th Annual SAS Users Group International Conference, 28, Paper 18-28.

Mumma, Michael T. (1999), The Redmond to Cary Express: A Comparison of Methods to Automate Data Transfer Between SAS and Microsoft Excel, Proceedings of the 12th Annual Northeast SAS Users Group Conference, 12, Paper 654-662.

Parker, Chevell (2003), Generating Custom Excel Spreadsheets using ODS, Proceedings of the 28th Annual SAS Users Group International Conference, 28, Paper 12-28.

SAS Institute Inc., Technical Support Notes, TS-589B: Importing Excel Files to SAS Datasets, Cary, NC: SAS Institute Inc.

SAS Institute Inc. (2000), SAS OnlineDoc, Version 8, Cary, NC: SAS Institute Inc.

ACKNOWLEDGMENTS

The authors would like to express their appreciation to Eugene Yeh and Sy Truong for their technical contributions and suggestions. Gratitude also goes to our coworkers Marilyn Ciraulo, Shuling Hwang, Sho-Rong Lee, Betty Harris, and Brad Harris for their encouragement. Na Li would like to honor God for His inspiration, encouragement and support in writing this paper.

CONTACT INFORMATION

If you have any comments or questions regarding this paper, please contact:

Na Li
Pharmacyclics, Inc.
995 East Arques Avenue
Sunnyvale, CA 94089
(408) 990-7293 (work)
E-mail: nli@pcyc.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.