SQL Case Expression: An Alternative to DATA Step IF/THEN Statements John Q. Zhang LCS Industries, Inc. Clifton NJ ABSTRACT This paper will show that case-expression utilized in PROC SQL can be as efficient and flexible as the logical expression IF/THEN statement. In addition to the advantage of clear, brief code, PROC SQL can not only perform procedural steps but data steps as well. This paper will show how powerful a tool PROC SQL can be. INTRODUCTION Quite a few papers presented in the previous conferences and current one have elaborated the fundamentals of SAS SQL procedure; the syntax like CREATE, SELECT *, FROM, GROUP BY, ORDER BY, WHERE, INTERSECT, UNION, EXCEPT, EXISTS, INSERT and COALESCE function become more and more familiar to the programmers and users. It is not premature, definitely, to further exploit the potential and capacity of PROC SQL with the objective of benefiting the SAS data processing community. The paper represents one attempt at this exploitation. RATIONALE The SQL CASE expression normally is composed of case-operand, when-condition and resultexpression. They must be valid sql-expressions. The syntax format goes like the following: CASE<case-operand> WHEN when condition THEN result-expression <WHEN when-condition THEN result-expression>... <ELSE result-expression> END. Multiple Nested CASE expression can also be used correspondingly to nested IF/THEN statement: CASE<case-operand> WHEN when condition THEN( CASE<case-operand> WHEN when condition THEN( result-expression) <ELSE(result-expression)> END) <ELSE(result-expression)> END. Three versions are available of the SAS secondorder program (also called program generator ). Two 1 versions are compiled by the base SAS DATA step, with very little PROC SQL involved, while the third version quite heavily leaned on PROC SQL (the complete program presented in APPENDIX III). The function of the programs is using SAS to read COBOL copybooks (layouts)to create SAS programs which are able to create SAS data sets ready for further manipulation and analysis. Some of the copybooks are longer than 500 lines. It would be considerably timeconsuming, tedious and error-prone to manually put in the input statement, start position, variable names, variable length and label statement, etc. Those programs can take care of the problems and produce exactly the same output in a few seconds. To set up the input statement (a sample of a copybook can be found in APPENDIX I)copybooks have to be cut in 2 parts: the description part (called dummy or dummy1 in the program generator); and the length part (called code ). To create the variables using the description part sequence numbers have to be concatenated with character strings to avoid duplicates. After the variables are created they can be concatenated with the description part itself to set up label statement. To set up the length of the variables data transformation and manipulation of the length part of the copybook are necessary, which is the major component of the first 2 versions of the program generator, and where IF/THEN statements or PROC SQL is heavily involved. The basic operation is to take the number of the bytes that the each variable occupies and express them as SAS informats. After the length of variables is set up, the program can calculate the starting position of the variables. First pick just the number (of bytes)in the length, then sum the number for each variable using the LAG function. The first program uses IF/THEN statements in SAS DATA step to assign SAS informats to the length of variables created, i.e. to change COBOL notation to SAS notation. The CASE expression, specially the nested CASE WHEN condition, supported in PROC SQL is
introduced in the second program to handle the same task. The rest of the program is quite similar to the first one. In APPENDIX IV part of the first program (IF/THEN statements to set up length)is presented to compare with the CASE expression utilized in the second program. The third program also uses DATA step statements to manipulate the description part of the copybook and the starting position of SAS variables as well. But the PROC FORMAT procedure is used to create SAS informats assigned to the variables established. The first program looks lengthier and more redundant than the second as well as the third. The third one works as well as the other two, and it is concise and brief but a user-defined format, created by a SAS macro, must be stored in a SAS catalog library. CONCLUSION After running the second program it can be seen that the CASE expression of PROC SQL may completely take the place of IF/THEN statements and result in the same output(provided in APPENDIX II). It is more efficient in terms of CPU time used. The code would be more brief and concise combined with in-line views. Therefore, another alternative to IF/THEN statements would be available whenever a particular situation fits. SAS users would be better armed in terms of choice of techniques. It also shows the versatility of PROC SQL. This paper does not intend to prove that PROC SQL, especially the CASE expression, leaves nothing to be desired in comparison with IF/THEN statements in regular DATA step. In fact, it does have some disadvantages. For instance, if the situation is a multiple condition with a single result CASE expressions would work as well as IF/THEN statements, and the code even briefer. This is what the 2nd program shows. However, in the case of a single condition with multiple result the CASE expression would appear lengthy and clumsy, even though the output would be the same, for example, THEN(2) END AS B, CASE WHEN(A=1) THEN(3) END AS C, CASE WHEN(A=1) THEN(4) END AS D... The repetitive CASE WHEN condition makes the code look awkward. Secondly, some useful SAS functions and operators are not supported in PROC SQL, definitely not in CASE expression, such as the family of LAG functions. Note: this paper is just a experiment that attempts to exploit and expand the capacity of PROC SQL. Any comments, suggestions and advice would certainly be welcome. With respect to the application, the program generators can be modified to read any flat file of layouts, either in COBOL or in other languages, which could readily facilitate SAS data step programming and analysis. The program generators appended to the paper could set up the SAS programs that read only the fixedlength-portion of COBOL files, in consideration of space. Those for reading both fixed and variable-length-portion COBOL files are also available but they become lengthier and more complex. ACKNOWLEDGMENTS SAS is a registered trademark or trademark of SAS Institute Inc. in the USA and other countries. indicates USA registration. Thanks for the advice and information from Mr. Bernard F. Kaine & Mr. Harold B. Legg. CONTACT INFORMATION John Q. Zhang e-mail addr: jzhang@lcsi.com (201)778-5588/EXT3486(O) (212)567-6092(H) IF A=1 THEN DO; B=2; C=3; D=4; ; For PROC SQL, the code would be like the following: CASE WHEN(A=1) 2
APPENDIX I: SAMPLE OF COBOL COPYBOOK *********************************************************** 96/08/30 * CPXXXXXX * XXXXXXX STANDARDIZATION RECORD LAYOUT * LV009 *********************************************************** * DPFF-FIELDS = STANDARDIZED RETURN FIELDS * IN-FIELDS = ORIGINAL DATA PLUS COMPLETE NAME FIELD * PERS-FIELDS = PERSONS OUTPUT *********************************************************** * PROJECT * 01 DPFF-SITE-RECORD. 03 DPFF-REC-LENGTH PIC X(05). 03 DPFF-ORIGINAL-LIST PIC X(03). 03 DPFF-SEQUENCE PIC X(08). 03 DPFF-STD-ADDRESS. 05 DPFF-TITLE PIC X(25). 05 DPFF-ADDRESS1 PIC X(10). 05 DPFF-ADDRESS2 PIC X(20). 05 DPFF-POST-CODE PIC X(08). 05 DPFF-COUNTRY PIC X(25). 07 PERS-GEOCODE PIC X(12). 07 PERS-GEOCODE-LEVEL PIC X(05). 07 PERS-STREET-KEY PIC X(05). 07 PERS-SEX-CODE PIC X(01). 03 DPFF-TOWN-POSITION PIC S9(09) COMP-3. 03 DPFF-TITLE-LENGTH PIC 9(02). 03 DPFF-TITLE-LENGTH2 PIC 9(04). 03 DPFF-TV-AREA-CODE PIC 9(04) COMP. 03 DPFF-PREMIS-POSITION PIC 9(08) COMP. 03 FILLER PIC X(45). 03 DPFF-CREATION-DATE PIC 9(06) COMP-3. 03 DPFF-THOROUGHFARE-POS PIC 9(03)V9 COMP-3. 03 DPFF-MAJOR-THORFR-POS PIC 9(03)V99 COMP-3. 03 DPFF-DEC-POSITION PIC S9(07)V999 COMP-3. 03 DPFF-COMPANY-INDICATOR PIC X(04). 03 DPFF-PRIVATE-ADDR-IND PIC X(04). 03 DPFF-FLAT-KEY PIC X(02). 03 DPFF-POSTCODE-DPS PIC X(02). 03 DPFF-LIST-CODES PIC X(10). 03 IN-ORIGINAL-RECORD. 05 IN-CESSATION-REASON PIC X(02). 05 IN-EXCHANGE-CODE PIC X(06). 05 IN-INSTALLATION-STATUS PIC X(01). 05 IN-CUSTOMER-CLASS PIC X(01). 05 IN-CUSTOMER-TYPE PIC X(02). 05 IN-END-DATE PIC X(08). 05 IN-START-DATE PIC X(08). 05 IN-SITE-INFO. 10 IN-TITLE PIC X(28). 10 IN-INITIALS PIC X(28). 05 IN-POSTCODE PIC X(10). 03 FILLER PIC X(1808). APPENDIX II: OUTPUT DATA XXXX; INFILE YYYY; INPUT @0001 RECLE14 $CHAR05. @0006 ORIGI15 $CHAR03. @0009 SEQUE16 $CHAR08. @0017 TITLE18 $CHAR25. @0042 ADDRE19 $CHAR10. @0052 ADDRE20 $CHAR20. @0072 POSTC21 $CHAR08. @0080 COUNT22 $CHAR25. @0105 GEOCO23 $CHAR12. @0117 GEOCO24 $CHAR05. @0122 STREE25 $CHAR05. @0127 SEXCO26 $CHAR01. @0128 TOWNP27 PD5. @0133 TITLE28 2. @0135 TITLE29 4. 3
@0139 TVARE30 PIB2. @0141 PREMI31 PIB4. /*@0145 FILL32 $CHAR45. */ @0190 CREAT33 PD3. @0193 THORO34 PD2.1 @0195 MAJOR35 PD3.2 @0198 DECPO36 PD5.3 @0203 COMPA37 $CHAR04. @0207 PRIVA38 $CHAR04. @0211 FLATK39 $CHAR02. @0213 POSTC40 $CHAR02. @0215 LISTC41 $CHAR10. @0225 CESSA43 $CHAR02. @0227 EXCHA44 $CHAR06. @0233 INSTA45 $CHAR01. @0234 CUSTO46 $CHAR01. @0235 CUSTO47 $CHAR02. @0237 ENDDA48 $CHAR08. @0245 START49 $CHAR08. @0253 TITLE51 $CHAR28. @0281 INITI52 $CHAR28. @0309 POSTC53 $CHAR10. /*@0319 FILL54 $CHAR1808.*/ ; LABEL RECLE14="REC LENGTH" ORIGI15="ORIGINAL LIST" SEQUE16="SEQUENCE" TITLE18="TITLE" ADDRE19="ADDRESS1" ADDRE20="ADDRESS2" POSTC21="POST CODE" COUNT22="COUNTRY" GEOCO23="GEOCODE" GEOCO24="GEOCODE LEVEL" STREE25="STREET KEY" SEXCO26="SEX CODE" TOWNP27="TOWN POSITION" TITLE28="TITLE LENGTH" TITLE29="TITLE LENGTH2" TVARE30="TV AREA CODE" PREMI31="PREMIS POSITION" CREAT33="CREATION DATE" THORO34="THOROUGHFARE POS" MAJOR35="MAJOR THORFR POS" DECPO36="DEC POSITION" COMPA37="COMPANY INDICATOR" PRIVA38="PRIVATE ADDR IND" FLATK39="FLAT KEY" POSTC40="POSTCODE DPS" LISTC41="LIST CODES" CESSA43="CESSATION REASON" EXCHA44="EXCHANGE CODE" INSTA45="INSTALLATION STATUS" CUSTO46="CUSTOMER CLASS" CUSTO47="CUSTOMER TYPE" ENDDA48="END DATE" START49="START DATE" TITLE51="TITLE" INITI52="INITIALS" POSTC53="POSTCODE" ; 4
APPENDIX III: THE 2nd PROGRAM(PROC SQL version) FILENAME X 'CPXXXXXX.SAS'; OPTIONS LS=132 MPRINT; DATA INPUT; INFILE X MISSOVER LENGTH=L; INPUT DUMMY $VARYING68. L; DUMMY=TRANSLATE(DUMMY,' ','-'); PIC=INDEXW(DUMMY,'PIC'); SEQ=_N_; IF PIC THEN DO; DUMMY1=SUBSTR(DUMMY,1,PIC-1); CODE=SUBSTR(DUMMY,PIC,INDEX(DUMMY,'.')-PIC+1); ELSE DELETE; FILL=INDEXW(DUMMY,'FILLER'); L1=INDEX(CODE,'('); L2=INDEX(CODE,')'); X=INDEX(CODE,'X('); T9=INDEX(CODE,'9('); T=INDEX(CODE,'3.'); V=INDEX(CODE,'V'); V9=INDEX(CODE,'V9'); V99=INDEX(CODE,'V99'); V999=INDEX(CODE,'V999'); COMP=INDEX(CODE,'COMP'); IF PIC & FILL=0 & INDEXC(DUMMY,'0123456789') THEN WORD=SCAN(DUMMY,2); PROC SQL; CREATE VIEW PICK AS SELECT UNIQUE WORD FROM INPUT WHERE WORD NE ' '; DATA _NULL_; SET PICK; S+1; CALL SYMPUT('SYS' LEFT(PUT(S,2.)),TRIM(LEFT(WORD))); CALL SYMPUT('CON',PUT(S,2.)); %MACRO SRCH1; %DO J=1 %TO &CON; Y&J=INDEX(DUMMY,"&&SYS&J"); % %M /*SET UP FIELDS, LENGTH AND LABEL*/ DATA INPUT2; SET INPUT; %SRCH1 PROC SQL; CREATE VIEW MANU AS SELECT LENG,FIELD,LENGTH,TRIM(FIELD) '=' '"' TRIM(LEFT(DSCRPTN)) '"' AS LABEL FROM( SELECT CASE WHEN(INDEX(LENGTH,'.')) THEN(COMPRESS(SUBSTR(LENGTH,1,INDEX(LENGTH,'.')-1),'$CHARPIBD.')) ELSE(LENGTH) END AS LENG, FIELD, LENGTH FROM( SELECT CASE %MACRO SRCH2; %DO K=1 %TO &CON; WHEN(Y&K AND LENGTH(COMPRESS(DUMMY1))>=5) THEN(SUBSTR(COMPRESS(SUBSTR(DUMMY1,Y&K+LENGTH("&&SYS&K"))),1,5) LEFT(PUT(SEQ,3.))) WHEN(Y&K AND LENGTH(COMPRESS(DUMMY1))<5) THEN(COMPRESS(DUMMY1) LEFT(PUT(SEQ,3.))) % %M %SRCH2 WHEN(FILL) THEN(SUBSTR(DUMMY1,FILL,4) LEFT(PUT(SEQ,3.))) ELSE(DUMMY1) END AS FIELD, CASE %MACRO SRCH3; %DO L=1 %TO &CON; WHEN(Y&L) 5
THEN(SUBSTR(DUMMY1,Y&L+LENGTH("&&SYS&L"),PIC-Y&L-LENGTH("&&SYS&L"))) % %M %SRCH3 ELSE(DUMMY1) END AS DSCRPTN, CASE WHEN(X & T9=0) THEN(COMPRESS('$CHAR' SUBSTR(CODE,L1+1,L2-L1-1) '.')) WHEN(X=0 & COMP=0) THEN(CASE WHEN(T9) THEN(CASE WHEN(V9=0) THEN( COMPRESS(PUT(INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.),3.) '.')) WHEN(V9 & V99=0 & V999=0) THEN( COMPRESS(PUT((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+1),3.) '.1')) WHEN(V9 & V99 & V999=0) THEN( COMPRESS(PUT((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+2),3.) '.2')) WHEN(V9 & V99 & V999) THEN( COMPRESS(PUT((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+3),3.) '.3')) WHEN(X=0 & COMP>0) THEN(CASE WHEN(T=0 & T9) THEN( COMPRESS('PIB' PUT((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)/2),3.) '.')) WHEN(T) THEN( CASE WHEN(T9) THEN( CASE WHEN(V9=0) THEN( COMPRESS('PD' PUT(CEIL(INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)/2),3.) '.')) WHEN(V9 & V99=0 & V999=0) THEN( COMPRESS('PD' PUT(CEIL((INPUT(SUBSTR( CODE,L1+1,L2-L1-1),3.)+1)/2),3.) '.1')) WHEN(V9 & V99 & V999=0) THEN( COMPRESS('PD' PUT(CEIL((INPUT(SUBSTR( CODE,L1+1,L2-L1-1),3.)+2)/2),3.) '.2')) WHEN(V9 & V99 & V999) THEN( COMPRESS('PD' PUT(CEIL((INPUT(SUBSTR( CODE,L1+1,L2-L1-1),3.)+3)/2),3.) '.3')) ELSE(CODE) END AS LENGTH FROM INPUT2)); /* SET UP STARTING POSITIONS */ DATA START(KEEP=START FIELD LENGTH LABEL); IF _N_=1 THEN STRT=1; SET MANU; STRT+INPUT(LENG,4.); STAT=LAG(STRT); IF STAT=. THEN STAT=1; START='@' LEFT(PUT(STAT,Z4.)); /* WRITE TO A OUTPUT FILE AND MAKE A SAS PROGRAM READY */ DATA _NULL_; IF _N_=1 THEN PUT @5 'DATA XXXX;' /@7 'INFILE YYYY;' /@7 'INPUT'; DO UNTIL(END1); SET START END=END1; IF FIELD ^=:'FILL' THEN PUT @10 START @20 FIELD @31 LENGTH; ELSE PUT @8 '/*' @10 START @20 FIELD @31 LENGTH @41 '*/'; IF END1 THEN PUT @10 ';' /@7 ' ' /@7 'LABEL'; DO UNTIL(END2); SET START END=END2; IF INDEX(LABEL,'FILLER')=0 THEN PUT @10 LABEL; IF END2 THEN PUT @7 ';' /@7 ''; 6
APPENDIX IV: corresponding IF/THEN statements IF PIC & X THEN DO; LENGTH='$CHAR' SUBSTR(CODE,L1+1,L2-L1-1) '.'; ELSE IF PIC & X=0 THEN DO; IF COMP=0 & T=0 THEN DO; IF T9 & V=0 THEN DO; LENGTH=COMPRESS(PUT(INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.),3.) '.'); ELSE IF T9 & V THEN DO; IF V9 & V99=0 THEN DO; LENGTH=COMPRESS(PUT((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+1),3.) '.1'); ELSE IF V9 & V99 & V999=0 THEN DO; LENGTH=COMPRESS(PUT((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+2),3.) '.2'); ELSE IF V9 & V99 & V999 THEN DO; LENGTH=COMPRESS(PUT((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+3),3.) '.3'); ELSE IF COMP & T=0 THEN DO; IF T9 & V=0 THEN DO; LENGTH=COMPRESS('PIB' PUT(INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)/2,3.) '.'); ELSE IF COMP & T THEN DO; IF T9 & V=0 THEN DO; LENGTH=COMPRESS('PD' PUT((CEIL(INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)/2)),3.) '.'); ELSE IF T9 & V THEN DO; IF V9 & V99=0 THEN DO; LENGTH=COMPRESS('PD' PUT(CEIL((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+1)/2),3.) '.1'); ELSE IF V9 & V99 & V999=0 THEN DO; LENGTH=COMPRESS('PD' PUT(CEIL((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+2)/2),3.) '.2'); ELSE IF V9 & V99 & V999 THEN DO; LENGTH=COMPRESS('PD' PUT(CEIL((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+3)/2),3.) '.3'); 7