A Recursive SAS Macro to Automate Importing Multiple Excel Worksheets into SAS Data Sets



Similar documents
Writing Data with Excel Libname Engine

Choosing the Best Method to Create an Excel Report Romain Miralles, Clinovo, Sunnyvale, CA

Customized Excel Output Using the Excel Libname Harry Droogendyk, Stratia Consulting Inc., Lynden, ON

Managing very large EXCEL files using the XLS engine John H. Adams, Boehringer Ingelheim Pharmaceutical, Inc., Ridgefield, CT

Methodologies for Converting Microsoft Excel Spreadsheets to SAS datasets

SPSS for Windows importing and exporting data

ABSTRACT INTRODUCTION SAS AND EXCEL CAPABILITIES SAS AND EXCEL STRUCTURES

Essential Project Management Reports in Clinical Development Nalin Tikoo, BioMarin Pharmaceutical Inc., Novato, CA

AN ANIMATED GUIDE: SENDING SAS FILE TO EXCEL

Import and Output XML Files with SAS Yi Zhao Merck Sharp & Dohme Corp, Upper Gwynedd, Pennsylvania

It s not the Yellow Brick Road but the SAS PC FILES SERVER will take you Down the LIBNAME PATH= to Using the 64-Bit Excel Workbooks.

Flat Pack Data: Converting and ZIPping SAS Data for Delivery

Integrating SAS and Excel: an Overview and Comparison of Three Methods for Using SAS to Create and Access Data in Excel

SUGI 29 Applications Development

Combining SAS LIBNAME and VBA Macro to Import Excel file in an Intriguing, Efficient way Ajay Gupta, PPD Inc, Morrisville, NC

How To Write A Clinical Trial In Sas

Moving Data Between Access and Excel

Importing Excel File using Microsoft Access in SAS Ajay Gupta, PPD Inc, Morrisville, NC

Create an Excel report using SAS : A comparison of the different techniques

Microsoft' Excel & Access Integration

Data Presentation. Paper Using SAS Macros to Create Automated Excel Reports Containing Tables, Charts and Graphs

5. Crea+ng SAS Datasets from external files. GIORGIO RUSSOLILLO - Cours de prépara+on à la cer+fica+on SAS «Base Programming»

Create a New Database in Access 2010

Clinical Trial Data Integration: The Strategy, Benefits, and Logistics of Integrating Across a Compound

How to Use SDTM Definition and ADaM Specifications Documents. to Facilitate SAS Programming

SAS 9.4 PC Files Server

Using PRX to Search and Replace Patterns in Text Strings

Importing TSM Data into Microsoft Excel using Microsoft Query

Advanced Excel 10/20/2011 1

Let SAS Modify Your Excel File Nelson Lee, Genentech, South San Francisco, CA

ACCESS Importing and Exporting Data Files. Information Technology. MS Access 2007 Users Guide. IT Training & Development (818)

Microsoft Office. Mail Merge in Microsoft Word

Effective Use of SQL in SAS Programming

How to easily convert clinical data to CDISC SDTM

Same Data Different Attributes: Cloning Issues with Data Sets Brian Varney, Experis Business Analytics, Portage, MI

OBJECT_EXIST: A Macro to Check if a Specified Object Exists Jim Johnson, Independent Consultant, North Wales, PA

CDW DATA QUALITY INITIATIVE

Importing Data into SAS

Training Needs Analysis

Excel basics. Before you begin. What you'll learn. Requirements. Estimated time to complete:

Managing Tables in Microsoft SQL Server using SAS

SDTM, ADaM and define.xml with OpenCDISC Matt Becker, PharmaNet/i3, Cary, NC

A Method for Cleaning Clinical Trial Analysis Data Sets

REx: An Automated System for Extracting Clinical Trial Data from Oracle to SAS

PharmaSUG 2014 Paper CC23. Need to Review or Deliver Outputs on a Rolling Basis? Just Apply the Filter! Tom Santopoli, Accenture, Berwyn, PA

A Macro to Create Data Definition Documents

Using DDE and SAS/Macro for Automated Excel Report Consolidation and Generation

Introduction. Why Use ODBC? Setting Up an ODBC Data Source. Stat/Math - Getting Started Using ODBC with SAS and SPSS

Importing from Tab-Delimited Files

Importing Excel Files Into SAS Using DDE Curtis A. Smith, Defense Contract Audit Agency, La Mirada, CA

Instructions for Configuring a SAS Metadata Server for Use with JMP Clinical

Pharmacy Affairs Branch. Website Database Downloads PUBLIC ACCESS GUIDE

Technical Paper. Defining an ODBC Library in SAS 9.2 Management Console Using Microsoft Windows NT Authentication

Paper Creating SAS Datasets from Varied Sources Mansi Singh and Sofia Shamas, MaxisIT Inc, NJ

Managing Custom Data Standards in SAS Clinical Data Integration

Preserving Line Breaks When Exporting to Excel Nelson Lee, Genentech, South San Francisco, CA

ABSTRACT INTRODUCTION THE MAPPING FILE GENERAL INFORMATION

Tips and Tricks for Creating Multi-Sheet Microsoft Excel Workbooks the Easy Way with SAS. Vincent DelGobbo, SAS Institute Inc.

Talend Component: tjasperreportexec

WHITE PAPER. CONVERTING SDTM DATA TO ADaM DATA AND CREATING SUBMISSION READY SAFETY TABLES AND LISTINGS. SUCCESSFUL TRIALS THROUGH PROVEN SOLUTIONS

Paper FF-014. Tips for Moving to SAS Enterprise Guide on Unix Patricia Hettinger, Consultant, Oak Brook, IL

Simply Accounting Intelligence Tips and Tricks Booklet Vol. 1

Tales from the Help Desk 3: More Solutions for Simple SAS Mistakes Bruce Gilsen, Federal Reserve Board

Chapter 2 The Data Table. Chapter Table of Contents

Release 2.1 of SAS Add-In for Microsoft Office Bringing Microsoft PowerPoint into the Mix ABSTRACT INTRODUCTION Data Access

Introduction to Microsoft Access 2003

Search help. More on Office.com: images templates

PharmaSUG Paper AD11

Microsoft Access Introduction

MAS 500 Intelligence Tips and Tricks Booklet Vol. 1

SAS Programming Tips, Tricks, and Techniques

Using SAS Enterprise Business Intelligence to Automate a Manual Process: A Case Study Erik S. Larsen, Independent Consultant, Charleston, SC

Using Delphi Data with Excel and Access

Suite. How to Use GrandMaster Suite. Exporting with ODBC

How to Excel with CUFS Part 2 Excel 2010

Instant Interactive SAS Log Window Analyzer

Encoding the Password

SAS Analyst for Windows Tutorial

RA MODEL VISUALIZATION WITH MICROSOFT EXCEL 2013 AND GEPHI

Microsoft Excel 2013: Macro to apply Custom Margins, Titles, Gridlines, Autofit Width & Add Macro to Quick Access Toolbar & How to Delete a Macro.

PharmaSUG AD08. Maximize the power of %SCAN using WORDSCAN utility Priya Saradha, Edison, NJ

ECDL. European Computer Driving Licence. Spreadsheet Software BCS ITQ Level 2. Syllabus Version 5.0

EXCEL Tutorial: How to use EXCEL for Graphs and Calculations.

How To Create A Report In Excel

Setting Up ALERE with Client/Server Data

CHAPTER 1 Overview of SAS/ACCESS Interface to Relational Databases

PharmaSUG Paper QT26

SAS to Excel with ExcelXP Tagset Mahipal Vanam, Kiran Karidi and Sridhar Dodlapati

PharmaSUG Paper AD08

Microsoft Access 2007 Introduction

Business Intelligence Tutorial

SAS ODS HTML + PROC Report = Fantastic Output Girish K. Narayandas, OptumInsight, Eden Prairie, MN

Microsoft Excel Training - Course Topic Selections

Word 2010: Mail Merge to with Attachments

1.5 MONITOR. Schools Accountancy Team INTRODUCTION

Transcription:

PharmaSUG2011 - Paper CC10 A Recursive SAS Macro to Automate Importing Multiple Excel Worksheets into SAS Data Sets Wenyu Hu, Merck Sharp & Dohme Corp., Upper Gwynedd, PA Liping Zhang, Merck Sharp & Dohme Corp., Upper Gwynedd, PA ABSTRACT In order to import data from Microsoft Excel into SAS, there are various methods such as PROC IMPORT, IMPORT WIZARD, ODBC and Excel LIBNAME engine. The optimal way often depends on the project needs and programmer's preference. The need for importing multiple worksheets adds more complexity to an already challenging job. This paper introduces a SAS macro that can automatically handle all Excel files with various numbers of worksheets in different subfolders and generate separate SAS datasets for each Excel file. As Excel files often contain multiple worksheets, we believe this macro is a flexible and user-friendly tool that has wide applicability for importing Excel files to SAS. Keywords: proc import, Excel, recursive INTRODUCTION In the pharmaceutical industry, SDTM and ADaM Metadata information are usually stored in Excel spreadsheets. The current version of Excel files often need to be compared to previous versions to see what changes were made to the data definition file. Since there are no good tools to compare two files and identify the exact differences, we decided to convert the Excel files to SAS datasets and then compare the SAS datasets. Normally for different studies, the number of sheets and sheet names are different, thus using hard coding is very time consuming and error prone. An automated SAS macro would be a desirable utility tool to accomplish this task. This paper introduces a macro which reads multiple Excel files that have multi-sheets under different folders and subfolders by using a recursive macro technique and the Excel LIBNAME engine. The Excel LIBNAME engine was introduced in SAS version 9 and constructed a new way to connect SAS files and Microsoft Excel files. By using this new LIBNAME engine, along with other data transfer tools such as PROC EXPORT, PROC IMPORT, PROC SQL or DATA step, transferring data between Excel and SAS can be greatly simplified. THE MACRO DESIGN The fundamental design of this macro (%xls2sas) is to use a recursive approach. The macro flow is as follows: it first opens the root folder using DOPEN. If DOPEN returns a directory id value of greater than 0, it then uses DNUM to return the number of members in this directory. Once it gets the total number of members in the folder, it uses DREAD to get the member name and the scan function to extract the second word of the member name with the period as the delimiter. If the second word is XLS, then the Excel spreadsheet is read in using Excel LIBNAME. Once the LIBNAME statement is successfully executed, the Excel workbook, the sheets, and named ranges defined in the workbook will be available in the SAS Explorer window. If the second word resolves to null, it means the first word is considered as sub-folder. %xls2sas calls itself upon reaching sub-folders not listed in the exclusion folders list. When %xls2sas is called, it repeats the same process as the above. Upon finishing reading any file or folder, it goes up one level to its parent folder to read the next XLS file or folder. The process finishes when all Excel files and folders under the root folder are read. The key to the recursive design is to localize the macro variables so that every call to %xls2sas will have its local instance of variables. This prevents subsequent %xls2sas calls from overwriting the macro variable values of previous macro calls. 1

MACRO CODE STRUCTURE AND PARAMETERS THE MACRO CODE STRUCTURE IS: %macro xls2sas(folder=,subfd=y,exclfd=,startrow=); STEP #1: Declare local macro variables %local _j _cnt _dsid _i _num _s_ext _s_name _filename _rc; %let _rc=%qsysfunc(filename(filrf,&folder)); %if &_rc=0 %then STEP #2: Open the folders %let _dsid=%sysfunc(dopen(&filrf)); %if &_dsid>0 %then STEP #3: Get the number of files and folders %let _num=%sysfunc(dnum(&_dsid)); %do _i=1 %to &_num; STEP #4: Check if the file extension is XLS. If so, open Excel workbook and read in the multi-sheets or named ranges %if %upcase(&_s_ext)=xls %then %end; Libname excellib excel "&folder\&_filename"; libname excellib clear; 2

STEP #5: If the file extension resolves to NULL, check if the read subfolder flag is Y and also it is not in the exclusion folder list. If so, call %xls2sas to open the subfolder and Excel files under it %mend xls2sas; %else %if %scan(&_filename, 2,.)= and &subfd=y and %qsysfunc(indexw(&exclfd,&_filename))=0 %then %end; %xls2sas(folder=&folder\&_filename, subfd=&subfd, exclfd=&exclfd, startrow=&startrow) THE MACRO PARAMETERS ARE: folder subfd exclfd startrow root folder name to be searched for include subfolder (Y/N, default=y) subfolder list to be excluded starting row of Excel worksheet A CLOSER LOOK AT HOW TO PROCESS EXCEL WORKBOOKS First we use the LIBNAME statement to define libref Excellib as shown in step 4 above. Once the library is established, the worksheet names become available under Excellib, as shown in figure 1 below: Figure 1 3

By using Data Step or Proc SQL, we can create a list of macro variables from SASHELP.VSTABVW. In our example &sheet1, &sheet2, &sheet3, &sheet4 and &sheet5 are resolved to AE$, CM$, CO$, DM$ and DS$ respectively. Notice that here we use SASHELP.VSTABVW instead of SASHEP.VTABLE to get sheet names because if there are quotation marks in sheet names, then SASHELP.VTABLE will not have these sheet names in memname column. data _null_; set sashelp.vstabvw end=last; where libname="excellib"; memname=upcase(scan(memname, 1, '$')); call symputx(cats('sheet', _n_), memname, 'L'); if last then call symputx('_cnt', _n_, 'L'); run; libname excellib clear; After the macro variables list is created, we need to clear the LIBNAME for future use. The next step uses Proc Import combined with %do loop to read in the data from multiple Excel worksheets. In the code example below, &folder refers to the folder/subfolder %xls2sas currently processing. Since this is a recursive macro, &folder could be the subfolder under &folder user originally specified and &_filename is the Excel file currently under &folder. We also use "range" instead of the "sheet" option to specify the sheet name and combine it with the starting row if the variable values do not start on row 2. This adds more flexibility to the macro. Finally, the sheet information is appended to the master dataset created for the Excel file read in. In our example, the master dataset name is created using the folder/subfolder name together with the Excel spreadsheet name so that each Excel spreadsheet is guaranteed to have a unique master output dataset name for later comparisons. %do _j=1 %to &_cnt; %end; proc import datafile="&folder\&_filename" out= &&sheet&_j dbms=excel replace; range="&&sheet&_j..$&startrow.:65000"; mixed=yes; getnames=yes; run; data &master; set &master &&sheet&_j; run; 4

CAPABILITIES AND LIMITATIONS WITH EXCEL LIBNAME ENGINE The Excel LIBNAME engine has the capabilities to: Read data from pre-existing sheets and named ranges Append data to pre-existing sheets and named ranges Create new workbooks Create new spreadsheets and named ranges Write data to empty existing named ranges Delete existing data from spreadsheets Delete existing data from named ranges The following are not possible via Excel LIBNAME engine Read data from hidden worksheets Delete entire workbook or spreadsheets Change or apply formatting, i.e, font, color Delete cells containing formula Write formulas to the cell Rename spreadsheets within a workbook OTHER LIMITATIONS By default, SAS only scans the first 8 records of XLS file to determine column attributes/types. If the first 8 records contain only numeric data type and records with character data type are in the records afterwards, this column is considered as numeric data type when imported into SAS even though mix=yes is specified as an option in proc import statement. To solve this issue, the user needs to increase the TypeGuessRows in the Windows registry for the Microsoft Jet Engine to 0 which indicates 16,384 rows. This would force Proc Import to scan until the last non-empty data row in the workbook to determine the column data type. In our example of reading multiple Excel spreadsheets and appending them to the master dataset, we assumed that all worksheets within a single Excel file have similar attributes and data types because this is the nature of our metadata. Special attention is needed if the user has different data types for the same column within 2 different spreadsheets. CONCLUSION %xls2sas provides a flexible way to import a variable number of Excel worksheets under different folder/subfolder into SAS datasets. The macro, built with the Excel LIBNAME engine in conjunction with SASHELP.VSTABVW and recursive techniques, greatly reduces the amount of manual work. REFERENCES Ed Heaton "So, Your Data are in Excel!" Proceedings of the 31 st Annual SAS Users Group International Paul A. Choate, Carol A. Martell "De-Mystifying the SAS LIBNAME Engine in Microsoft Excel: A Practical Guide" Proceedings of the 31 st Annual SAS Users Group International Harry Droogendyk "Customized Excel Output Using the Excel Libname" Proceedings of South East SAS Users Group 2008 5

TRADEMARKS SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries Other brand and product names are registered trademarks or trademarks of their respective companies. CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the authors at: Wenyu Hu Liping Zhang UG 1D-88 UG 1D-10 Merck Sharp & Dohme Corp. Merck Sharp & Dohme Corp. Upper Gwynedd, PA 19454 Upper Gwynedd, PA 19454 (267) 305-6847 (267) 305-7980 wenyu_hu@merck.com liping_zhang@merck.com 6