Automating SAS Macros: Run SAS Code when the Data is Available and a Target Date Reached

Nitin Gupta, Tailwind Associates, Schenectady, NY

ABSTRACT

This paper describes a method to run discrete macro(s) embedded in a single SAS program on a particular day of the week or date in the month. A macro is initiated only when both the target day has been reached and the source data are available. This paper is appropriate for SAS users who run SAS jobs in a production environment. The macro schedules analytical activities according to the available schedule and data, avoiding redundant runs of SAS jobs. The code is a combination of SQL and the SAS macro language. This paper is intended for SAS users of all skill levels.

Keywords: SAS Macros, Scheduling SAS Jobs, SQL, %Sysfunc(exist), DO loop

INTRODUCTION

Maintenance and execution of multiple SAS programs become more difficult as organizations grow and the number of variables involved increases. Scheduling the right program on the right day may involve multiple departments and various resources to get the desired results. Also, the ability to run multiple programs simultaneously across multiple SAS sessions may require expensive scheduling tool(s). This paper presents three steps that make the scheduling process simpler and less error-prone. After reading this paper, you should be able to tailor your SAS programs to process macro statements on the schedule used in your organization. This combination of macros and SQL is introduced through an example. After the example, the syntax section describes each of the macros and their arguments.

The weekly processing for each of our SAS programs consisted of multiple SAS steps, and an analyst needed a number of hours just to run each step interactively and maintain the output of these programs. A production framework was needed to make better use of our staff and to ensure that SAS processing completed successfully, with a clear log as output.
In the process, we developed a consistent scheduling structure for our SAS macro processing environment by running different macros on different days from a single SAS batch file. The scheduling macro requires two parameters: the execution date of the SAS process and the source data set to be processed by the SAS macro code. The macro then reads and validates the source data sets if they exist, or ends execution if they do not.
The figure below shows the relationship between scheduling tasks and individual processing steps.

METHOD

Step A involves establishing the SAS macro environment and saving key macro variables in a single SAS file. The data sources pass key variables to the SAS macro, which checks the return code from each step. The production task is executed if the return code is Yes.

For the purpose of this paper, assume that all the macros are included in a single SAS batch file, which is used for scheduling purposes. These programs were set up to run under SAS in the Microsoft Windows environment, so keep that in mind when looking at them. The system interfaces of these programs can be modified so that they run in a UNIX, Linux, or z/OS environment.

The following three-step process takes data from the sources and transforms them to the target using scheduling logic:

1. Prepare a single file containing the various macro programs that need to be executed on a given day or date.
2. Create a list of all the dates on which the SAS macros will be run.
3. Check whether the data sources are available and the day has been reached for the macro(s) to run.
STEP 1: CREATE A DATE FILE

A SAS DATA step is used to create a data set called dates, which holds a list of dates. You can choose any start and end year; in our example we have chosen the dates between 1980 and 2020. The table contains each date and other information related to that date, i.e. month, week of the year, quarter, name of the day of the week, etc. The variable thedate holds the specific date and helps define any schedule for SAS program execution. In the following example, the variables thedate and dayofweekname are used to create the boundary conditions. You can use any of the other variables, depending on how often your SAS program is executed.

    data dates;
       retain dayofyear;
       format datekey 11. thedate mmddyy10.;
       do year = 1980 to 2020;
          dayofyear = 0;
          do month = 1 to 12;
             do dayofmonth = 1 to 31;
                thedate = mdy(month, dayofmonth, year);
                /* MDY returns missing for invalid dates such as Feb 30 */
                if (thedate = .) then leave;
                dayofyear = dayofyear + 1;
                weekofyear = (1 + floor(dayofyear / 7));
                dayofweekname = left(trim(put(thedate, downame9.)));
                quarter = put(thedate, qtr1.);
                /* numeric key of the form YYYYMMDD */
                datekey = input(put(thedate, yymmddn8.), 8.);
                output;
             end;
          end;
       end;
    run;

Explanation of Syntax: Nested DO loops generate every calendar date in the range; the RUN statement follows the last END so that all loops are closed before the step ends. The date variable is formatted and also broken out into day, week, month, quarter and year.

Table 1: The result of the above DATA step would produce the following dates table.
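Because the contents of the dates table drive every later step, it can help to inspect the table before moving on. The following is a minimal sketch, assuming the dates data set from Step 1 exists in the WORK library; the choice of five preview rows and the 365/366 sanity check are illustrative additions, not part of the original program.

```sas
/* Preview the first few rows of the dates table built in Step 1. */
proc print data=dates(obs=5) noobs;
   var thedate year month dayofmonth dayofyear weekofyear
       dayofweekname quarter datekey;
run;

/* Sanity check (illustrative): list any year that does not have */
/* 365 or 366 rows. An empty result means the loops behaved.     */
proc sql;
   select year, count(*) as days_in_year
   from dates
   group by year
   having count(*) not in (365, 366);
quit;
```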
STEP 2: CREATE THE BOUNDARY CONDITIONS FOR THE EXECUTION SCHEDULE

SQL code is used to create the date range (minimum and maximum dates) within which the macro step(s) are executed. In the example below, the SAS job should run on each Friday between January 1, 2009 (&mindate) and April 30, 2009 (&maxdate). The number of observations, or the frequency of dates, can be modified by adjusting the WHERE clause in the SQL statement.

    %let mindate = '1Jan09'd;
    %let maxdate = '30Apr09'd;

    proc sql;
       create table snapshotdates as
       select thedate
       from dates
       where dayofweekname = 'Friday'
         and (thedate between &mindate and &maxdate)
       order by thedate;
    quit;

The data set snapshotdates contains the dates of all Fridays between the minimum and maximum dates specified in our schedule. Step 3 below relies on the Windows batch job scheduler to submit the batch file; the SAS macro then runs the relevant job once the date is reached. The following section explains the process using SAS syntax.

STEP 3: VERIFY THE EXISTENCE OF THE SPECIFIED SOURCES AND THE SAS PROGRAM EXECUTION DATE

The DATA step below checks whether the date for the process has been reached by comparing the thedate variable with the system date (&sysdate) of the machine. If the run date of the program has been reached, there is one observation in the snapshotdates_job1 data set; otherwise the data set is empty.

    data snapshotdates_job1;
       set snapshotdates;
       where (thedate) = "&sysdate."d;
       keep thedate;
    run;

A DATA _NULL_ step and macro logic are then used to carry the conditional logic that checks the data sets. The SAS EXIST function verifies that a SAS library member exists and is available to access. By default the EXIST function checks for a member type of DATA. The EXIST function returns 1 if the library member exists and 0 if the member does not exist or if the member type is invalid.
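The two checks described above can be tried out in open code before they are wrapped in a macro. The sketch below assumes that the snapshotdates data set from Step 2 exists in the WORK library; the macro variable n_rundates is a hypothetical name introduced only for illustration. Note that &sysdate holds the date on which the SAS session started, written with a two-digit year, while &sysdate9 carries a four-digit year.

```sas
/* Show the session start date in both forms. */
%put NOTE: session start date = &sysdate (&sysdate9);

/* Count the scheduled dates that match the session date.       */
/* n_rundates is a hypothetical macro variable for this sketch. */
proc sql noprint;
   select count(*) into :n_rundates
   from snapshotdates
   where thedate = "&sysdate9"d;
quit;
%let n_rundates = &n_rundates;   /* strips leading blanks */
%put NOTE: scheduled dates matching the session date = &n_rundates;

/* EXIST returns 1 when the member is found, 0 otherwise. */
%put NOTE: exist(work.snapshotdates) = %sysfunc(exist(work.snapshotdates));
```

A count of 1 corresponds to the single observation that Step 3 writes to snapshotdates_job1 on a scheduled Friday; a count of 0 corresponds to an empty data set.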
In the example below, the macro %CheckfordateandSource requires two input parameters: the name of the source data set, including its location, and the data set snapshotdates_job1. The macro %RunSASMacro is executed only if both the source data set and the date data set are available and contain observations. The macro variable &dsn takes the name of the source data set, and the macro variable &dsn1 takes the date data set, which is used to check whether the scheduled date has been reached. Additional conditional logic can be used to schedule multiple SAS macros on different days by creating a separate Snapshotdates_job data set for each schedule date.

We suggest using the ERRORABEND option to terminate the SAS process if the source data are not available. The logic in the macro %CheckfordateandSource is designed so that SAS triggers ERRORABEND only if the data source(s) are not available. If only the date condition is not met, the SAS log shows the statement "Date is not reached to run the SAS Macro" and the SAS session is not terminated. Since the production job runs without manual intervention, a notification is recorded in the SAS log if either the source(s) are not available or the day of the week has not been reached, so that corrective action can begin immediately. An email notification can also be set up using an email client function (not presented in this paper), including the log of the failing step as an attachment.
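The ERRORABEND suggestion can be sketched as follows. This is only an illustration of where the option might be set in the batch file, assuming that &Source has already been defined and that both macros have been compiled earlier in the same file; the placement of the OPTIONS statements is an assumption, not the author's documented layout.

```sas
/* With ERRORABEND in effect, any ERROR ends the SAS session. */
options errorabend;

/* Scheduling macro call, as used later in this paper. */
%CheckfordateandSource(&Source, SnapshotDates_Job1);

/* NOERRORABEND is the SAS default; restore it afterwards. */
options noerrorabend;
```

The macro statement %ABORT ABEND is another way to stop a batch job from inside a macro. It is mentioned here only as an alternative and is not the technique used in this paper.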
    %macro CheckfordateandSource(dsn, dsn1);
       %local dsid numobs dsid1 numobs1 rc;
       /* Source check: the data set must exist and contain observations */
       %if %sysfunc(exist(&dsn)) = 1 %then %do;
          %let dsid   = %sysfunc(open(&dsn));
          %let numobs = %sysfunc(attrn(&dsid, nlobs));
          %let rc     = %sysfunc(close(&dsid));
          %if &numobs gt 0 %then %do;
             /* Date check: one observation means the run date is reached */
             %if %sysfunc(exist(&dsn1)) = 1 %then %do;
                %let dsid1   = %sysfunc(open(&dsn1));
                %let numobs1 = %sysfunc(attrn(&dsid1, nlobs));
                %let rc      = %sysfunc(close(&dsid1));
                %if &numobs1 gt 0 %then %do;
                   /* The name of the macro which would be executed */
                   %RunSASMacro;
                   /* Multiple SAS macros can be scheduled using conditional logic */
                %end;
                %else %do;
                   %put ##########################################;
                   %put Date is not reached to run the SAS Macro;
                   %put ##########################################;
                %end;
             %end;
          %end;
       %end;
       %else %do;
          %put #########################################;
          %put ERROR: The data set &dsn does not;
          %put ERROR- exist in the specified location;
          %put ERROR- Terminating SAS Process;
          %put #########################################;
          /* Deliberately invalid statement: it raises an ERROR so that */
          /* the ERRORABEND option terminates the SAS session           */
          data _null_;
             xxxx;
          run;
       %end;
    %mend CheckfordateandSource;

    /** &Source: the data set that %RunSASMacro uses as a source        **/
    /** SnapshotDates_Job1: the day of the week the job is scheduled on **/
    %CheckfordateandSource(&Source, SnapshotDates_Job1);

Explanation of Syntax: Each data set of interest is opened for inquiry with the OPEN function. The ATTRN function with the NLOBS attribute returns the number of observations in the data set, and the CLOSE function releases the data set once the count has been read.
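The open, count and close sequence appears twice in the macro above, so it could be factored into a small function-style helper. The following is a minimal sketch; the macro name %nobs and its return value of -1 for a missing or unreadable data set are hypothetical choices made for this illustration and are not part of the original program.

```sas
/* Hypothetical helper: resolves to the number of observations */
/* in a data set, or -1 if it does not exist or cannot be      */
/* opened.                                                     */
%macro nobs(ds);
   %local dsid n rc;
   %let n = -1;
   %if %sysfunc(exist(&ds)) %then %do;
      %let dsid = %sysfunc(open(&ds));
      %if &dsid > 0 %then %do;
         %let n  = %sysfunc(attrn(&dsid, nlobs));
         %let rc = %sysfunc(close(&dsid));
      %end;
   %end;
   &n
%mend nobs;

/* Usage with a data set shipped in the SASHELP library. */
%put NOTE: sashelp.class has %nobs(sashelp.class) observations;
```

With such a helper, a condition like %if %nobs(&dsn1) gt 0 would express the date check in one line, at the cost of one more macro to maintain in the batch file.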
CONCLUSION

%SYSFUNC(EXIST) verifies that each data set exists and ATTRN returns its number of observations; the macro %RunSASMacro is executed only if EXIST returns 1 and both data sets contain observations. This paper presents a framework that automates the task of running various macro statements on a particular day of the week or date in the month if the source(s) are available. Three easy SAS steps can help schedule tasks more efficiently and with fewer errors, notifying an analyst of the successful completion of each step. The analyst is freed from the tedium of running each step manually and only needs to correct errors or check results at completion of the entire process. A production framework like the one described above helps maintain standard programming practices and reduces the occurrence of errors.

DISCLAIMER

The contents of this paper are the work of the author and do not necessarily represent the opinions, recommendations, or practices of NYS Office of Mental Health.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

REFERENCES

Additional information about the SAS Macro Facility can be found in the SAS Macro Language: Reference, Version 8.

ACKNOWLEDGMENTS

This work was supported by the New York State Office of Mental Health. I would like to express my appreciation to Dr. Emily Leckman-Westin for her contributions to this paper.

CONTACT INFORMATION

Please feel free to contact me if you have any questions or comments about this paper. You may reach me at:

Nitin Gupta
1462 Erie Boulevard, Schenectady, NY 12305
Work Phone: 518-486-9644
Email: issdnxg@omh.state.ny.us