Automation of Large SAS Processes with Email and Text Message Notification
Seva Kumar, JPMorgan Chase, Seattle, WA

ABSTRACT
SAS includes powerful features in the Linux SAS server environment. While creating large programs for automated data extraction and delivery on a daily, weekly, and monthly basis, we developed an internal process for scheduling, detecting database readiness, and reporting the success or failure of the process via email and text messaging. This paper illustrates the methods used to check database readiness and the readiness of dependent SAS data, FTP the final data, and notify all recipients via email and text messaging from the Linux SAS server environment.

INTRODUCTION
In a large and complex data environment, there is an inevitable need to automate existing and new processes, especially when the organization is in transition. Ensuring that the automated process runs correctly and delivers its results is critical. Imagine, for example, having to deliver results from multiple SAS programs, each thousands of lines long. To automate such programs and ensure there are no failures in the process, we developed a wrapper macro program to control the process. The wrapper macro program detects events, runs the program, and sends appropriate messages via email and text messaging.

The overarching steps of the automation process are the following:
1. Crontab must start the wrapper macro program on an hourly, daily, or monthly basis. Crontab is part of the Linux scheduling process.
2. The wrapper macro program should generally carry out the following steps:
   a. Detect the current date and set up environmental variables.
   b. Determine the success or failure of a previous run.
   c. Query database readiness.
   d. Run the SAS program.
   e. Generate a success or failure email and text message.
   f. Copy the log and output to an archive.

CRONTAB
For the full automation process to work, a process scheduler such as Crontab is necessary.
One way to determine whether you have Crontab privileges is to type the following command at the command prompt. If you don't have privileges, ask the system administrator to set them up for you.

$ crontab -l
You (abcde) are not allowed to use this program (crontab)
Contact your sysadmin to change

If you have privileges, you will see a listing of the existing Crontab jobs. To edit the Crontab file, use the command below; by default, the editing is done in the vi editor. Each scheduled job is one line in the following form:

$ crontab -e
# a pound sign starts a comment
02 6 1-5 * * /abc/def/sas/config/sas.sh -SASenv /def/evar.txt -- -log "/afsdf/log" -print "/afsdf/log" /werqw/prog.sas

The five fields at the start of the line represent the following (from left to right): minute (0-59), hour (0-23), day of month (1-31), month (1-12), and day of week (0-6, Sunday = 0 or 7). An asterisk matches every value, so the line above runs at 6:02 a.m. on the 1st through 5th day of every month; to run on weekdays instead, put 1-5 in the day-of-week field (02 6 * * 1-5). The command that follows the schedule starts SAS, sets the environmental variables (including database access), routes the log and output to the /afsdf/log folder, and finally runs the program. If you can set up a small program and run it via Crontab, the remaining aspects of the automation are programming tasks rather than system configuration and setup.
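Before scheduling the real wrapper, a throwaway program such as the one below can confirm that Crontab starts SAS with the environment the later steps need. This is a minimal sketch, not the author's program: the file name cron_smoke_test.sas and the macro SMOKE_CHECK are hypothetical, and the two options checked (XCMD for the X "mkdir" and X "mv" commands, EMAILHOST for FILENAME EMAIL) were chosen only because later sections rely on them.

```sas
/* cron_smoke_test.sas (hypothetical name): schedule this with Crontab
   before automating the real process, then read its captured log. */
%let smoke_start = %sysfunc(datetime(), datetime20.);
%let smoke_xcmd  = %sysfunc(getoption(xcmd));
%let smoke_mail  = %sysfunc(getoption(emailhost));

/* Six asterisks make these lines easy to find in a long log. */
%put ******CRON SMOKE TEST started: &smoke_start;
%put ******User: &sysuserid  SAS release: &sysvlong;
%put ******XCMD setting (needed for the X mkdir and mv commands): &smoke_xcmd;
%put ******EMAILHOST setting (needed for FILENAME EMAIL): &smoke_mail;

/* Hypothetical helper: warn when X commands are disabled in batch. */
%macro smoke_check;
  %if &smoke_xcmd = NOXCMD %then
    %put WARNING: X commands are disabled, so the mkdir and mv steps will fail.;
  %else
    %put ******X commands are allowed.;
%mend smoke_check;
%smoke_check
```

To use it, point the crontab line above at this file in place of /werqw/prog.sas; the ****** lines in the log written to /afsdf/log then show whether the scheduled session can run X commands and send email.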
ENVIRONMENTAL VARIABLES
Since Crontab runs the wrapper macro program on a schedule, it is very important to automatically create separate folders, date- and time-stamp the log, and include the date and time of the process in the email. Generally, take the following steps:

1. Include the SAS-encrypted passwords used to access the databases. The SQL procedure needs the user ID and password to access the various databases. Keep the SAS-encrypted passwords in a protected location that only the automated process can access.

OPTIONS NOSOURCE2; /* Do not echo the included code in the log */
%INCLUDE '/top/secret/place/oh_nothing.sas';

2. The following date constructs are very useful for controlling the program logic, querying the databases, and naming the data sets, log, and output.

%LET SASdate = %sysfunc(date()); /* Today as a SAS date value */
%LET orcltoday = %bquote(')%sysfunc(date(), mmddyy10.)%bquote('); /* Today's date as 'mm/dd/yyyy' for use in an SQL WHERE clause */
%LET orcltom = %bquote(')%sysfunc(intnx(day, &SASdate, 1), mmddyy10.)%bquote('); /* Tomorrow's date as 'mm/dd/yyyy' for use in an SQL WHERE clause */
%LET SASyest = %sysfunc(intnx(day, &SASdate, -1)); /* Yesterday as a SAS date value, for comparison with the date of the data */
%LET sdate = %sysfunc(date(), yymmddn8.); /* Date in the format YYYYMMDD */
/* %LET sdate = 20090331; */ /* A reminder of the format, helpful in testing */
%LET datehr = %sysfunc(date(), yymmddn8.)%sysfunc(hour(%sysfunc(datetime())), z2.); /* Date and hour in the format YYYYMMDDHH, for hourly processes */
/* %LET datehr = 2009033109; */
%put ******today in SAS: &SASdate -- today: &orcltoday -- tomorrow: &orcltom -- SASyest: &SASyest; /* Writes the date variables to the log */

3. The following code checks whether a library exists and, if it does not, creates it.

*1) Is there a library?
If not, create one for the day.;
libname OUTLIB "/where/in/the/world/is/champ/&sdate.";
%if &syslibrc ne (0) %then %do;
  %let PATH=/where/in/the/world/is/champ/&sdate.;
  x "mkdir /where/in/the/world/is/champ/&sdate.";
  libname OUTLIB "/where/in/the/world/is/champ/&sdate.";
  %if &syslibrc ne (0) %then %do;
    %PUT "***** Unable to create or establish OUTLIB library. ******";
  %end;
%end;
%else %let PATH=/where/in/the/world/is/champ/&sdate.;

Setting up these environmental variables is important because it makes the process easy to run. For example, if a process runs every hour, create a date-hour (datehr) macro variable at the very beginning. Once that macro variable exists, it is easy to write out unique data set, log, and output names just by including the macro variable in the file name. It is also recommended that macro variables be written to the log with a specific identifier (such as six asterisks) at the beginning. When you are searching for a macro variable in a log that is hundreds of pages long, the identifier makes it easy to find.

SUCCESS OR FAILURE OF PREVIOUS RUN
It is critical to detect whether the process already completed and ran successfully. If so, the wrapper macro should stop processing and exit the program.
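The early-exit logic this section depends on can be sketched as a skeleton wrapper macro. This is a minimal sketch, not the author's wrapper: DAILY_WRAPPER, its parameters, and the WRAPPER_STATUS flag are hypothetical names, and the data set name y&dt._co only follows the naming used in this section's checks.

```sas
/* DAILY_WRAPPER (hypothetical): skeleton showing only step b, the
   early exit when a previous run already produced today's output. */
%macro daily_wrapper(outlib=WORK, dt=);
  %global wrapper_status;
  %local target;
  %let target = &outlib..y&dt._co;

  /* If today's main data set exists, a previous run finished: exit
     before querying the database or rerunning the main program. */
  %if %sysfunc(exist(&target)) %then %do;
    %put ******&target already exists. Previous run completed. Exiting.;
    %let wrapper_status = SKIPPED;
    %return;
  %end;

  %put ******&target not found. Continuing to the readiness checks.;
  %let wrapper_status = CONTINUE;
  /* Steps c-f would follow: query database readiness, %include the
     main program, send notifications, and archive the log and output. */
%mend daily_wrapper;

/* Usage: the first call continues; once the data set exists,
   the second call exits early. */
%daily_wrapper(outlib=WORK, dt=20090331)
%put ******First call: &wrapper_status;
data work.y20090331_co;
  x = 1;
run;
%daily_wrapper(outlib=WORK, dt=20090331)
%put ******Second call: &wrapper_status;
```

The log shows CONTINUE for the first call and SKIPPED for the second. Because %RETURN leaves the macro immediately, every check placed before the expensive steps can exit the same way.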
1. One of the simplest ways to check whether the program already ran is to check whether its output files exist. If the expected files are there, the wrapper macro should stop.

%LET fe=%sysfunc(fileexist(&path./y&dt._co_template.csv), z1.); /* Checks if the file exists. 1=file exists, 0=file does not */
%LET dsid=%sysfunc(exist(libref.y&dt._co)); /* Checks if the data set exists. 1=data set exists, 0=data set does not */
%if &dsid eq (1) %then %do;
  %PUT "******The charge-off file (libref.y&dt._co) already exists! Program exit!";
  %RETURN;
%end;

2. Another type of check is the number of observations in the main data set. The OBSCOUNT macro variable can be checked against the expected norms.

proc sql noprint;
  select count(*) into :OBSCOUNT from libref.y&dt._co;
quit;

Depending on the process, there are multiple ways to check for process success or failure. The wrapper macro must detect whether the previous job completed and then act accordingly. For example, if the program takes six hours to run, we do not want to run it again. If the files are there, it is better to exit the wrapper process and save CPU resources.

QUERY DATABASE READINESS
In a complex database environment, there are interdependent data feeds across various databases. For an automated system to function effectively, it must detect whether the database is ready to be queried. If the database is ready, run the main program; otherwise, the program must exit with an appropriate message to the people expecting the data. For example, consider a situation where the database can be updated anywhere from 1 to 5 a.m. Crontab can execute the wrapper macro hourly from 1 to 9 a.m. Each run must detect whether the files have already been created. If the files were not created, the run checks the database to determine whether it is ready. If the system has data-readiness tables, the following code, for example, can be used to check whether the tables are ready for processing.
*Check if the data is ready for query.;
PROC SQL;
  CONNECT TO ORACLE AS ORCL (USER="&UID" PASS="&PWD" PATH=DB);
  CREATE TABLE WORK.READY AS
    SELECT TABLE_NAME,
           datepart(lst_dt) AS AS_OF_DT format=yymmdd10.,
           today()-1 AS YESTERDAY format=yymmdd10.,
           case when TABLE_NAME ne 'LS_WK' then datepart(lst_dt)=today()-1
                else datepart(lst_dt)=intnx('week',today(),-1,'end')
           end AS READY
    FROM CONNECTION TO ORCL (
      SELECT TBL_NAME AS TABLE_NAME, MAX(AS_DT) AS LST_DT
      FROM ED_DAT_RDY_SUM
      WHERE TBL_NAME in ('LS_WK','ABC_TBL','CDE_TBL','DEF_TBL','EFG_TBL',
                         'FGH_TBL','GHI_TBL','HIJ_TBL','IJK_TBL')
        AND DT_RDY_T <> 'NA'
      GROUP BY TBL_NAME
      ORDER BY TBL_NAME
    );
  DISCONNECT FROM ORCL;
QUIT;

proc sql noprint;
  select sum(ready) into :READY from work.ready;
quit;
%put ******READY at end--&ready;

READY counts the tables whose latest as-of date matches the expected date; when it equals the number of tables in the list (nine in this example), the database is ready.

Another method is to submit a query that counts the observations on the table for the timeframe. If the count is within a reasonable range, consider the table ready.

RUNNING THE SAS PROGRAM
Running the main SAS program is easy. If the process passes all of the above conditions, bring in the program with a %INCLUDE statement. The SOURCE2 option writes the included source code to the log:

%include '/some/place/now/where/did/i/put/it/big_sas_program.sas' / source2;

In the wrapper macro, all the conditions must be checked before the main program runs.

GENERATE SUCCESS OR FAILURE EMAIL
No matter how complex the program is or how well it is crafted, something will eventually fail. Even the best automated systems still need a person to monitor them. Because people prefer email, the process must generate emails that explain its status. Sample macro to generate email:

/* Email in the event of the program success or failure */
%macro email_sf(TO=, CC=, SUB="&date.-- DAILY PROCESS: Please enter subject line.", att=,
                MSG0="", MSG1="", MSG2="", MSG3="", MSG4="", MSG5="");
  options nosyntaxcheck;
  /* If SAS is in syntax-check mode because of an error, turn it off so the email is still sent. */
  filename mail email ' ' to=(&to) cc=(&cc) subject=&sub &ATT;
  DATA _NULL_;
    FILE mail;
    PUT "START EMAIL";
    PUT &MSG0;
    PUT "-------------------------------------------------------------------";
    PUT &MSG1;
    PUT &MSG2;
    PUT &MSG3;
    PUT &MSG4;
    PUT &MSG5;
    PUT "-------------------------------------------------------------------";
    PUT "END EMAIL";
  RUN;
  %if &syserr ne (0) %then %do;
    %put "*****Unable to EMAIL! Unable to send an error email because of an error!";
    %put "*****syserr : &syserr";
    %put "*****sysmsg : &sysmsg";
    %ABORT return;
  %end;
%mend email_sf;

/* Recipients of email from the process */
%let TO= "abc.def@world.net" "def.x.efg@world.com" "the.p.afdsf@world.gov";
%let CC= "you.m.sas@sas.net" "mail.box@archive.net";

/* Example: failure email (the attachment is commented out) */
%email_sf(TO=&TO, CC=&CC, SUB="&date.-- DAILY PROCESS: ERROR!",
          ATT=/*%STR(attach =("&PATH./DAILY.log"))*/,
          MSG0="This is an automated message from the DAILY creation process. The DAILY program did not run! Add the reason for the error here.",
          MSG1="*****This is a test message 1.",
          MSG2="*****This is a test message 2.",
          MSG3="*****This is a test message 3.",
          MSG4="*****This is a test message 4.",
          MSG5="*****You can access the log, output, and program here: &SBPATH.");

/* Example: success email with attachments */
%email_sf(TO=&TO, CC=&CC, SUB="&date.-- DAILY PROCESS: SUCCESS!",
          ATT=%STR(attach =("&PATH./DAILY.SAS" "&PATH./DAILY.SAS")),
          MSG0="This is an automated message from the DAILY creation process. The DAILY program ran successfully! Please review the notes below:",
          MSG1="*****This is a test message 1.",
          MSG2="*****This is a test message 2.",
          MSG3="*****This is a test message 3.",
          MSG4="*****This is a test message 4.",
          MSG5="*****You can access the log, output, and program here: &PATH.");

proc format;
  value yn 1="***YES***"
           0="@@@NO@@@";
run;

%let M1=******COUNT : ABCS (%trim(&abcs_cnt)) + BCS (%trim(&bcs_cnt.)) = TOTAL (%trim(&cnt.));
%let M2=******BALANCE : ABCS (%trim(&acls_bal)) + BCS (%trim(&bcs_BAL.)) = TOTAL (%trim(&bal.));
%let M3=******BASE FINAL : COUNT (%trim(&fin_cnt.)) PASS: %sysfunc(putn(%eval(&cnt.=&fin_cnt.),yn.)) AND BALANCE (%trim(&fin_bal.)) PASS: %sysfunc(putn(%eval(&bal.=&fin_bal.),yn.));
%let M4=******BASE FINAL DAY: COUNT (%trim(&find_cnt.)) PASS: %sysfunc(putn(%eval(&cnt.=&find_cnt.),yn.)) AND BALANCE (%trim(&find_bal.)) PASS: %sysfunc(putn(%eval(&bal.=&find_bal.),yn.));

%email_sf(TO=&TO, CC=&CC, SUB="&date.-- DAILY PROCESS: ***The Files are ready!***",
          att=/*%STR(attach =("&PATH./crontab/DAILY.log" "&PATH./crontab/DAILY.lst"))*/,
          MSG0="This is an automated message from the DAILY creation process. The DAILY program ran! The tables are ready for processing. The program will run again tomorrow when the tables are ready.",
          MSG1="&M1.",
          MSG2="&M2.",
          MSG3="&M3.",
          MSG4="&M4.",
          MSG5="******You can access the log here: &PATH./_DAILY/LOG and output here: &PATH./DAILY/RAW_DAT");

The key to the email process is a generic macro: the same %email_sf call sends failure, success, and status messages just by changing its parameters.

GENERATE SUCCESS OR FAILURE TEXT MESSAGE
Text messaging is easy once the email process is in place. Carriers accept a standard email sent to an SMS gateway address and route it as a text message. Below is a sample macro to send a text message:

/* Text message in the event of the program success or failure */
%macro email_sf(TO=, SUB="&date.-- DAILY PROCESS: Please enter subject line.",
                MSG0="", MSG1="", MSG2="", MSG3="", MSG4="", MSG5="");
  options nosyntaxcheck;
  /* If SAS is in syntax-check mode because of an error, turn it off so the message is still sent. */
  filename mail email ' ' to=(&to) subject=&sub;
  DATA _NULL_;
    FILE mail;
    PUT "START";
    PUT &MSG0;
    PUT &MSG1;
    PUT &MSG2;
    PUT &MSG3;
    PUT &MSG4;
    PUT &MSG5;
    PUT "END";
  RUN;
  %if &syserr ne (0) %then %do;
    %put "*****Unable to EMAIL! Unable to send an error email because of an error!";
    %put "*****syserr : &syserr";
    %put "*****sysmsg : &sysmsg";
    %ABORT return;
  %end;
%mend email_sf;

/* Recipients of the text message: use the SMS gateway address of the
   recipient's carrier (gateways are listed at http://en.wikipedia.org/wiki/sms_gateways).
   Keep only the line for the carrier in use. */
%let TO= "2061234567@txt.att.net"; /* AT&T */
%let TO= "2061234567@tmomail.net"; /* T-Mobile */
%LET date = %sysfunc(date(), yymmddn8.);

/* Example */
%email_sf(TO=&TO, SUB="&date.-- DAILY PROCESS",
          MSG0="This is an automated message from the DAILY creation process.",
          MSG1="*****This is a test message 1.",
          MSG2="*****This is a test message 2.",
          MSG3="*****This is a test message 3.",
          MSG4="*****This is a test message 4.",
          MSG5="*****You can access the log, output, and program here:");

Most providers limit the length of text messages, so design messages with this restriction in mind. Because this macro has the same name as the email macro, compiling it replaces the email version; give one of them a different name if a single program sends both.

FTPING THE DATA
Some automated processes run a scheduled job that FTPs data to a server. In that case, it is often efficient to check for updated files and copy them to the SAS server. Below is a sample macro to retrieve FTP data on a schedule:

/* This macro copies a file from the FTP server to the SAS server. */
%macro GET_FIL(inpath=, outpath=, FN=);
  FILENAME INFL FTP "&FN" LRECL=32767 CD="/&inpath"
           HOST="someftpserver.net" USER="anonymous" PASS="";
  filename OUTFL "&path./&outpath/&fn" LRECL=32767;
  %let CKINFL=%sysfunc(fexist(INFL));
  %*****If unable to establish a filename, report the error and stop;
  %if (&CKINFL=0) %then %do;
    %put ******Unable to establish FTP connection to file--&fn;
    %put ******Problem with input file: &CKINFL;
    %abort; /* This ends the program */
  %end;
  %else %put ******Established FTP connection to file--&fn;
  DATA _null_;
    INFILE INFL;
    FILE OUTFL;
    input;
    put _infile_;
  RUN;
  %let CKOUTFL=%sysfunc(fexist(OUTFL));
  %*****If unable to create the output file, stop;
  %if (&CKOUTFL=0) %then %do;
    %put ******Unable to establish output file--&fn;
    %put ******Problem with output file: &CKOUTFL;
    %abort; /* This ends the program */
  %end;
  %else %put ******Downloaded file--&fn;
%mend GET_FIL;

The key to setting up an automated FTP connection is to make sure that a manual FTP session can be completed without problems. With slight modification, the same approach can also write files to an FTP server.

LOG AND OUTPUT
It is a very good idea to keep the log and output of each run. When Crontab starts SAS with the -log and -print options, SAS writes the log and output to files at the locations provided. In the example below, the log and output are sent to /afsdf/log/prog.log and /afsdf/log/prog.lst.

02 6 1-5 * * /abc/def/sas/config/sas.sh -SASenv /def/evar.txt -- -log "/afsdf/log" -print "/afsdf/log" /werqw/prog.sas

Because every Crontab run generates the same log and output file names, rename the log and output at the end of the program, and set the group ownership and permissions so that the team can use the files.

/* Housekeeping */
/* The log file: change its name and move it to the appropriate folder */
X "mv &PATH./SAT/ctab/DAILY.log &SBPATH./DAILY/LOG/Y&date.DAILY.log";
/* The output file: change its name and move it to the appropriate folder */
X "mv &PATH./SAT/ctab/DAILY.lst &PATH./DAILY/LOG/Y&date.DAILY.lst";
/* Grant access to the files */
%let PATH=/somewhere/over/the/rainbow;
X "chgrp -R abcusers &PATH; chmod -R g=rwx &PATH; chmod -R o=r &PATH";

Keeping the log and output is good record keeping. Importantly, if the process fails in the future, you will know exactly where and when it failed.
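Where X commands are disabled, or where the wrapper should confirm that each file actually moved before reporting success, the same housekeeping can be sketched with the SAS RENAME function, which returns 0 on success. This is an alternative to the paper's mv commands, not the author's code: ARCHIVE_FILE and ARCHIVE_RC are hypothetical names, and the sample moves a file created in WORK so that it is self-contained.

```sas
/* ARCHIVE_FILE (hypothetical): move one file to an archive name and
   record the result in the global macro variable ARCHIVE_RC. */
%macro archive_file(src=, dest=);
  %global archive_rc;
  %if not %sysfunc(fileexist(&src)) %then %do;
    %put ******Nothing to archive: &src was not found.;
    %let archive_rc = -1;
    %return;
  %end;
  /* With type FILE, RENAME moves an external file; 0 means success. */
  %let archive_rc = %sysfunc(rename(&src,&dest,file));
  %if &archive_rc = 0 %then %put ******Archived &src to &dest;
  %else %put ******Unable to archive &src (rc=&archive_rc): %sysfunc(sysmsg());
%mend archive_file;

/* Usage: stamp the archive name with the date-hour value (datehr)
   and archive a sample log written to the WORK folder. */
%let datehr = %sysfunc(date(), yymmddn8.)%sysfunc(hour(%sysfunc(datetime())), z2.);
%let workdir = %sysfunc(pathname(work));

data _null_;
  file "&workdir./DAILY.log";
  put "sample log line";
run;

%archive_file(src=&workdir./DAILY.log, dest=&workdir./Y&datehr.DAILY.log)
```

Because ARCHIVE_RC is global, the wrapper can test it before sending the success email and report a failed move the same way it reports any other error.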
CONCLUSION
The goal of this paper is to provide a set of tools for creating automated processes and keeping the recipients informed. The basic tools for creating such processes are outlined in the paper. Given the right level of access, it is easy to set up automated processes that continually check a program's status and report to the users, day or night, via email and text messaging. This increases data reliability and reduces recipients' concerns about status updates.

REFERENCES
1. http://en.wikipedia.org/wiki/cron
2. http://en.wikipedia.org/wiki/sms_gateways

CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Seva Kumar
JPMorgan Chase
1301 2nd Ave, Floor 25
Seattle, WA, 98101
206-500-6385
E-mail: sevak@hevanet.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.