How to Reduce the Disk Space Required by a SAS Data Set

Selvaratnam Sridharma, U.S. Census Bureau, Washington, DC

ABSTRACT

SAS data sets can be large, and disk space is often at a premium. This paper discusses how SAS options such as COMPRESS=, SAS statements such as LENGTH and ATTRIB, SAS views, and macros can be used to reduce the size of a SAS data set. An existing macro, %SQUEEZE, finds the minimum lengths required for both the numeric and the character variables in a SAS data set and uses these minimum lengths to reduce the size of the data set. Another macro, %DROPMISS, developed here, automatically identifies and drops SAS variables that have only missing or null values.

INTRODUCTION

When a large data set is stored, storage space can be exhausted. Compressing the data set with the SAS COMPRESS= option can save a large amount of storage space, and so can using SAS views instead of SAS data sets. Another way to reduce the size of a SAS data set is to save only the needed variables, using the DROP= and/or KEEP= options. Using LENGTH or ATTRIB statements to assign the minimum lengths required by the variables also reduces the size of a SAS data set.

It is often difficult to find the minimum length required by a variable in a SAS data set. The existing macro %SQUEEZE finds the minimum lengths required by the variables in a SAS data set and assigns those lengths to the variables; it is modified slightly here to make it more efficient. Sometimes all the values of some variables in a SAS data set are missing, and we would like to drop these variables to save storage space. The %DROPMISS macro developed here does this.
SAS SYSTEM OPTIONS / STATEMENTS

Some SAS system options and statements, such as LENGTH, ATTRIB, KEEP, DROP, and COMPRESS, can be used to reduce the space required by a SAS data set.

LENGTH AND ATTRIB

Controlling the lengths of individual variables can greatly reduce the size of a SAS data set. A LENGTH or ATTRIB statement can be used to assign a length to a numeric or a character variable. For character variables, the statement must appear in the DATA step before the first reference to the variables it names. For integers and for character variables with short values, this can dramatically decrease the size of the data set. For character variables one byte corresponds to one character, so to minimize storage space, set the length of each character variable to the number of characters in the longest value of the variable. The minimum length allowed for a numeric variable depends on the operating environment (for example, 3 bytes under Windows and UNIX, 2 bytes under z/OS). Two examples are given below.

    Data X;
       Length a b c 3
              d e 5
              f g $4;
       set Y;
    run;

    Data X;
       Attrib a b c length=3;
       Attrib d e length=5;
       Attrib f g length=$4;
       set Y;
    run;

The ATTRIB statement can also be used to change a variable's FORMAT, INFORMAT, and LABEL.
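To make the byte arithmetic behind numeric lengths concrete, here is a small Python sketch (an emulation, not SAS code) of what happens when a numeric value is stored with fewer than 8 bytes. It assumes the IEEE 754 double layout SAS uses under Windows and UNIX, where a shorter length keeps the sign, the exponent, and the high-order mantissa bits and drops the rest; z/OS uses a different floating-point format, so its limits differ. The helper names `sas_trunc` and `largest_exact_integer` are hypothetical.

```python
import struct


def sas_trunc(x: float, nbytes: int) -> float:
    """Emulate storing x as a SAS numeric variable of length nbytes.

    Assumes the IEEE 754 layout used under Windows and UNIX: the value is
    written as an 8-byte double, only the first (most significant) nbytes
    are kept, and the dropped bytes come back as zeros when it is read.
    """
    if not 3 <= nbytes <= 8:
        raise ValueError("this sketch covers lengths 3 through 8")
    raw = struct.pack(">d", x)               # big-endian: sign and exponent first
    kept = raw[:nbytes] + bytes(8 - nbytes)  # zero-fill the dropped bytes
    return struct.unpack(">d", kept)[0]


def largest_exact_integer(nbytes: int) -> int:
    """Largest N such that every integer from 0 to N survives sas_trunc.

    1 sign bit + 11 exponent bits leave 8*nbytes - 12 stored mantissa bits;
    with the implicit leading 1 that covers integers up to 2**(8*nbytes - 11).
    """
    return 2 ** (8 * nbytes - 11)


for n in range(3, 9):
    print(f"length {n}: exact integers up to {largest_exact_integer(n):,}")

print(sas_trunc(8192, 3), sas_trunc(8193, 3))  # 8192.0 8192.0
print(sas_trunc(0.1, 4) == 0.1)                # False: precision is lost
```

Under these assumptions every integer up to 8,192 fits in 3 bytes and every integer up to 2,097,152 fits in 4 bytes, while a non-integer such as 0.1 changes as soon as it is shortened. That is why shortening is safe for integer-valued variables but not for non-integer ones.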

One needs to be careful when assigning the length of a numeric variable with the LENGTH statement. If the length assigned to a numeric variable is not adequate, some of the values of that variable will be truncated in the output data set, and the statement will not generate an error. It is not advisable to shorten non-integer variables, because some non-integer values can lose precision. When a LENGTH statement gives a character variable a shorter length than it has in the input data set, SAS writes a warning (not an error) to the log, and values longer than the new length are truncated.

KEEP AND DROP

When a SAS data set is created, only the needed variables should be kept; this can save a large amount of storage space. Use the KEEP= and/or DROP= options to remove the unnecessary variables. To save processing time, do this as early as logically possible, as in the following example.

    Data A (keep= a b q r);
       Set B (drop= h k);
       a = l + p;
       b = r + q;
    Run;

SAS COMPRESS

COMPRESS= is both a SAS system option and a data set option, and it can greatly reduce the disk space required to store a SAS data set. You can set the option to either YES or BINARY; in newer versions, CHAR can be used instead of YES. If there are more character variables than numeric variables, COMPRESS=YES generally works better; if there are more numeric variables than character variables, COMPRESS=BINARY generally works better. Both options should be tried to find out which one works better for a given data set. The options are used as in the following examples.

    Data A (COMPRESS= YES);
       SET SASHELP.EISMSG;
    RUN;

    NOTE: There were 1470 observations read from the data set SASHELP.EISMSG.
    NOTE: The data set WORK.A has 1470 observations and 6 variables.
    NOTE: Compressing data set WORK.A decreased size by percent.
          Compressed is 15 pages; un-compressed would require 35 pages.
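The percentage in a compression NOTE can be derived from the two page counts it reports. The Python sketch below assumes that SAS reports the change in page count relative to the uncompressed size, rounded to two decimals; the function name `compression_note` is hypothetical. The same page counts also give the compression ratio (compressed size divided by uncompressed size).

```python
def compression_note(compressed_pages: int, uncompressed_pages: int) -> str:
    """Describe a compression result in the style of the SAS NOTE.

    Assumption: the reported percentage is the change in page count
    relative to the uncompressed size, rounded to two decimals.
    """
    ratio = compressed_pages / uncompressed_pages  # compression ratio
    change = abs(1 - ratio) * 100                  # percent saved or added
    direction = "decreased" if ratio < 1 else "increased"
    return (f"{direction} size by {change:.2f} percent "
            f"(compression ratio {ratio:.2%})")


print(compression_note(15, 35))  # COMPRESS=YES on SASHELP.EISMSG
print(compression_note(16, 35))  # COMPRESS=BINARY on SASHELP.EISMSG
print(compression_note(2, 1))    # COMPRESS=YES on SASHELP.ACCPEO
```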
    Data A (COMPRESS= BINARY);
       SET SASHELP.EISMSG;
    RUN;

    NOTE: There were 1470 observations read from the data set SASHELP.EISMSG.
    NOTE: The data set WORK.A has 1470 observations and 6 variables.
    NOTE: Compressing data set WORK.A decreased size by percent.
          Compressed is 16 pages; un-compressed would require 35 pages.

When COMPRESS= is used as a system option at the beginning of a program, all the SAS data sets created by the program are compressed. The REUSE= option can be used together with COMPRESS=: specifying REUSE=YES allows SAS to reuse space within a compressed SAS data set that has been freed by deleted observations. SAS uncompresses every compressed data set before using it in a DATA or PROC step, so although compression saves disk space, it costs additional CPU time to compress and uncompress.

For a small data set, compression can produce a file larger than the uncompressed one. Beginning with Version 8, SAS checks the variable lengths before compressing and does not compress a data set when compression cannot reduce its size. Because this check is based on the observation length rather than on the data values, a compressed file can still turn out larger, as in the following example.

    Data A (COMPRESS= YES);
       SET SASHELP.ACCPEO;
    RUN;

    NOTE: There were 20 observations read from the data set SASHELP.ACCPEO.
    NOTE: The data set WORK.A has 20 observations and 3 variables.
    NOTE: Compressing data set WORK.A increased size by percent.
          Compressed is 2 pages; un-compressed would require 1 pages.

Here are some benchmark results for a large data set used at the Census Bureau. The compression ratio is the size of the compressed data set divided by the size of the uncompressed data set.

    COMPRESS OPTION   SIZE (BYTES)   COMPRESSION RATIO
    None              80,159,
    Binary            21,517,        %
    Char              37,008,        %

SAS VIEW

As an alternative to a SAS data set, one can use a SAS view, which provides all the functionality of a SAS data set. A SAS view contains only the instructions required to retrieve data values from other SAS data sets or files, so it occupies only a small fraction of the space required by the SAS data set. A SAS view can be created with a DATA step or with PROC SQL. The following is an example of a SAS view created with a DATA step.

    Data B / view = B;
       set sashelp.eismsg;
    run;

    NOTE: DATA STEP view saved on file WORK.B.

A PROC SQL view can read data from DATA step views, SAS data sets, other PROC SQL views, and ORACLE or other DBMS data.

    Proc sql;
       Create view AB as
          select var1, var2, var3
          from A
          order by var3, var4;
    NOTE: SQL view WORK.AB has been defined.
    quit;

In the above example, A can be a SAS view, a SAS data set, an ORACLE table, or any other DBMS table. Starting with Version 8, a DATA step view retains its source statements, which can be retrieved as in the following example.

    data view=b;
       describe;
    run;

    NOTE: DATA step view WORK.B is defined as:
    data B/view=B;
    set sashelp.eismsg;
    run;

To retrieve the source statements for an SQL view, use PROC SQL as in the following example.

    proc sql;
       describe view AB;
    NOTE: SQL view WORK.AB is defined as:

            select var1, var2, var3
              from A
             order by var3 asc, var4 asc;

    quit;

SQUEEZING A SAS DATA SET

When a large data set is created, it is often difficult to find the minimum length required by each variable. The following macros find the minimum lengths required by the numeric and character variables of a SAS data set and use these lengths to reduce the size of the data set, which can greatly reduce the storage space it requires.

%SQUEEZE

The %SQUEEZE macro written by Ross Bettinger (see Reference 1) squeezes a data set by reducing the space required by its numeric and character variables. If there are variables you do not want to squeeze, you can exclude them with a parameter of the %SQUEEZE macro. Leaving out the highlighted part of the code in Appendix A gives the code for the %SQUEEZE macro.

%SQUEEZE_1

The %SQUEEZE macro is modified slightly here to create the macro %SQUEEZE_1. %SQUEEZE finds the minimum lengths required by the numeric variables by applying the TRUNC function repeatedly to each and every value of these variables. There is no need to compute the minimum length of a numeric variable whose length is already three, and the TRUNC function need not be applied to every value: an integer whose absolute value is no larger than the largest integer that three bytes store exactly (computed at the start of the macro) already fits in three bytes. %SQUEEZE_1 incorporates these improvements, so it runs at least as fast as %SQUEEZE and generally faster. This squeezing technique may be used on integer-valued numeric variables but should not be used on non-integer-valued numeric variables, because those variables might lose accuracy. The code for this macro is given in Appendix A.
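The two shortcuts in %SQUEEZE_1 can be sketched in Python as an illustration of the rule, not as a replacement for the macro. The sketch assumes IEEE 754 storage as under Windows and UNIX, uses `None` for a missing value, and hard-codes 8,192 as the value the macro's first DATA step computes into `&max_3` under that assumption; the names `min_numeric_length` and `min_char_length` are hypothetical. As in the macro, the range shortcut is only valid for integer-valued variables.

```python
import struct

# Value the macro's first DATA step stores in &max_3 (IEEE layout assumed).
MAX_3 = 8192


def sas_trunc(x: float, nbytes: int) -> float:
    """Keep the first nbytes of the big-endian IEEE 754 double, zero the rest."""
    raw = struct.pack(">d", x)
    return struct.unpack(">d", raw[:nbytes] + bytes(8 - nbytes))[0]


def min_numeric_length(values, current_length=8):
    """Smallest length (3 to 8) that stores every non-missing value exactly.

    Mirrors %SQUEEZE_1: a variable already at length 3 is skipped, and a
    value within +/-MAX_3 is taken to fit in 3 bytes without calling TRUNC.
    That shortcut holds only for integer-valued variables.
    """
    if current_length == 3:                   # shortcut 1: nothing to squeeze
        return 3
    needed = 3
    for v in values:
        if v is None or -MAX_3 <= v <= MAX_3:  # missing, or shortcut 2
            continue
        # Same outcome as the macro's TRUNC cascade: shortest exact round trip.
        n = next(k for k in range(3, 9) if sas_trunc(v, k) == v)
        needed = max(needed, n)
    return needed


def min_char_length(values):
    """Longest value ignoring trailing blanks, but at least 1, like
    max(LENGTH(var)) in the macro (LENGTH returns 1 for a blank value)."""
    return max([len(v.rstrip()) for v in values if v is not None] + [1])


print(min_numeric_length([12, None, -8000, 8193]))  # 4
print(min_char_length(["ab", "abcd  ", None]))      # 4
```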
COMPARING %SQUEEZE AND %SQUEEZE_1

    %SQUEEZE_1
    SQUEEZED   SIZE (BYTES)   RATIO
    No         24,969,
    Yes        21,515,        %

%SQUEEZE and %SQUEEZE_1 were used to squeeze a large SAS data set, with the following results.

    MACRO        TIME (MINUTES)
    %SQUEEZE     126
    %SQUEEZE_1   108

DROPPING VARIABLES WITH ONLY MISSING VALUES

When all the values of some numeric or character variables are missing, deleting these variables can save a large amount of disk space. Sometimes, however, you want to keep some variables even though all their values are missing. The macro %DROPMISS (see Appendix B) developed here automatically drops the variables in a data set whose values are all missing; a parameter of %DROPMISS lets you name variables that should not be dropped. This macro is more efficient than the program in Reference 2. The table below gives the results of both programs for a SAS data set.

    PROGRAM             SIZE (BYTES)   TIME (MINUTES)
    Program in Sample   ,
    %DROPMISS           567,

COMBINING %SQUEEZE_1 AND %DROPMISS

Combining %SQUEEZE_1 and %DROPMISS gives another macro, %SQ_DROPMISS (see Appendix C). To save processing time, it is better to use %SQ_DROPMISS than to run %SQUEEZE_1 and %DROPMISS separately on a SAS data set. The methods in %SQUEEZE reduce the length of a character variable whose values are all missing to 1 and the length of a numeric variable whose values are all missing to 3. So, after these methods have been applied to a SAS data set, only the character variables of length 1 and the numeric variables of length 3 need to be checked for all-missing values. This saves a great amount of processing time.

CONCLUSIONS

There are many ways to reduce the size of a data set that you want to store. The SAS options and statements, SAS views, and macros discussed in this paper can all be used to reduce the space required by a SAS data set. Instead of running %SQUEEZE_1 and %DROPMISS separately on a data set, it is better to use the macro %SQ_DROPMISS to save processing time.

REFERENCES

1. Ross Bettinger, Sample 267: %SQUEEZE-ing before Compressing Data, Redux. 7 Jul
2. Sample 53: Delete variables that have only missing data. 7 Jul

ACKNOWLEDGMENTS

We would like to thank David Chapman for offering valuable suggestions and comments.

SAS is a registered trademark of SAS Institute Inc., Cary, North Carolina.

DISCLAIMER

This paper reports the results of research and analysis undertaken by Census Bureau staff. It has undergone a more limited review by the Census Bureau than its official publications. This report is released to inform interested parties and to encourage discussion.

CONTACT INFORMATION

Your comments and questions are valued and encouraged.
Contact the author at:

Selvaratnam Sridharma
Economic Planning and Coordination Division
U.S. Bureau of the Census
Washington, DC
selvaratnam.sridharma@census.gov

APPENDIX A: %SQUEEZE_1

    %macro SQUEEZE_1( DSNIN       /* name of input SAS dataset */
                    , DSNOUT      /* name of output SAS dataset */
                    , NOCOMPRESS= /* [optional] variables to be omitted from the minimum-length computation process */
                    ) ;
    /* PURPOSE: create LENGTH statement for vars that minimizes the variable length
     * to:
     *    numeric vars:   the fewest # of bytes needed to exactly represent the values
     *                    contained in the variable
     *    character vars: the fewest # of bytes needed to contain the longest
     *                    character string
     *
     * macro variable SQZLENTH is created which is then invoked in a subsequent
     * data step
     *
     * NOTE: if no char vars in dataset, produce no char var processing code
     * NOTE: length of format for char vars is changed to match computed length
     *       of char var
     *       e.g., if length( CHAR_VAR ) = 10 after %SQUEEZE-ing, then
     *       FORMAT CHAR_VAR $10. ; is generated
     * NOTE: variables in &DSNOUT are maintained in same order as in &DSNIN
     * NOTE: variables named in &NOCOMPRESS are not included in the minimum-
     *       length computation process and keep their original lengths as
     *       specified in &DSNIN
     *
     * EXAMPLE OF USE:
     *    %SQUEEZE_1( DSNIN, DSNOUT )
     *    %SQUEEZE_1( DSNIN, DSNOUT, NOCOMPRESS=A B C D--H X1-X100 )
     *    %SQUEEZE_1( DSNIN, DSNOUT, NOCOMPRESS=_numeric_ )
     *    %SQUEEZE_1( DSNIN, DSNOUT, NOCOMPRESS=_character_ )
     */
    %global SQUEEZE ;
    %local I ;

    %if "&DSNIN" = "&DSNOUT" %then %do ;
       %put / \ ;
       %put ERROR from SQUEEZE: ;
       %put Input Dataset has same name as Output Dataset. ;
       %put Execution terminating forthwith. ;
       %put \ / ;
       %goto L9999 ;
    %end ;

    /*###############################################################################*/
    /* begin executable code
    /*###############################################################################*/

    /* Find the first positive integer n such that n+1 needs more than 3 bytes.
    /* The negative of this number is the first negative integer n such that n-1
    /* needs more than 3 bytes.
    */
    data x ;
       do i = 1 to 10000 ;
          a = trunc( i, 3 ) ;
          if a ^= i then do ;
             call symput( 'max_3', a ) ;
             output ;
             stop ;
          end ;
       end ;
    run ;

    /* create dataset of variable names whose lengths are to be minimized
    /* exclude from the computation all names in &NOCOMPRESS
    */
    proc contents data=&dsnin( drop=&nocompress ) memtype=data noprint out=_cntnts_( keep= name type length ) ;
    run ;

    %let N_CHAR = 0 ;
    %let N_NUM = 0 ;

    data _null_ ;
       set _cntnts_ end=lastobs nobs=nobs ;
       where ( type = 1 and length ^= 3 ) or ( type = 2 and length ^= 1 ) ;
       if nobs = 0 then stop ;
       n_char + ( type = 2 ) ;
       n_num + ( type = 1 ) ;
       /* create macro vars containing final # of char, numeric variables */
       if lastobs then do ;
          call symput( 'N_CHAR', left( put( n_char, 5. ))) ;
          call symput( 'N_NUM', left( put( n_num, 5. ))) ;
       end ;
    run ;

    /* if there are NO numeric or character vars in dataset, stop further
    /* processing
    */
    %if %eval( &N_NUM + &N_CHAR ) = 0 %then %do ;
       %put / \ ;
       %put ERROR from SQUEEZE: ;
       %put No variables in dataset. ;
       %put Execution terminating forthwith. ;
       %put \ / ;
       %goto L9999 ;
    %end ;

    /* put global macro names into global symbol table for later retrieval
    */
    %do I = 1 %to &N_NUM ;
       %global NUM&I NUMLEN&I ;
    %end ;
    %do I = 1 %to &N_CHAR ;
       %global CHAR&I CHARLEN&I ;
    %end ;

    /* create macro vars containing variable names
    /* efficiency note: could compute n_char, n_num here, but must declare macro
    /* names to be global b4 stuffing them
    /* note: if no char vars in data, do not create macro vars
    */
    proc sql noprint ;
       %if &N_CHAR > 0 %then %str( select name into :CHAR1 - :CHAR&N_CHAR from _cntnts_ where type = 2 and length ne 1 ; ) ;
       %if &N_NUM > 0 %then %str( select name into :NUM1 - :NUM&N_NUM from _cntnts_ where type = 1 and length ne 3 ; ) ;
    quit ;

    /* compute min # bytes (3 = min length, for portability over platforms) for
    /* numeric vars
    /* compute min # bytes to keep rightmost character for char vars
    */
    data _null_ ;
       set &DSNIN end=lastobs ;
       %if &N_NUM > 0 %then %str( array _num_len_ ( &N_NUM ) 3 _temporary_ ; ) ;
       %if &N_CHAR > 0 %then %str( array _char_len_ ( &N_CHAR ) _temporary_ ; ) ;

       if _n_ = 1 then do ;
          %if &N_CHAR > 0 %then %str( do i = 1 to &N_CHAR ; _char_len_( i ) = 0 ; end ; ) ;
          %if &N_NUM > 0 %then %str( do i = 1 to &N_NUM ; _num_len_( i ) = 3 ; end ; ) ;
       end ;

       %if &N_CHAR > 0 %then %do I = 1 %to &N_CHAR ;
          _char_len_( &I ) = max( _char_len_( &I ), length( &&CHAR&I )) ;
       %end ;

       %if &N_NUM > 0 %then %do I = 1 %to &N_NUM ;
          if &&NUM&I ne . then do ;
             /* values within +/- &max_3 always fit in 3 bytes: skip TRUNC */
             if ( &&NUM&I > &max_3 or &&NUM&I < -&max_3 ) then do ;
                if &&NUM&I ne trunc( &&NUM&I, 7 ) then _num_len_( &I ) = max( _num_len_( &I ), 8 ) ;
                else if &&NUM&I ne trunc( &&NUM&I, 6 ) then _num_len_( &I ) = max( _num_len_( &I ), 7 ) ;
                else if &&NUM&I ne trunc( &&NUM&I, 5 ) then _num_len_( &I ) = max( _num_len_( &I ), 6 ) ;
                else if &&NUM&I ne trunc( &&NUM&I, 4 ) then _num_len_( &I ) = max( _num_len_( &I ), 5 ) ;
                else if &&NUM&I ne trunc( &&NUM&I, 3 ) then _num_len_( &I ) = max( _num_len_( &I ), 4 ) ;
             end ;
          end ;
       %end ;

       if lastobs then do ;
          %if &N_CHAR > 0 %then %do I = 1 %to &N_CHAR ;
             call symput( "CHARLEN&I", put( _char_len_( &I ), 5. )) ;
          %end ;
          %if &N_NUM > 0 %then %do I = 1 %to &N_NUM ;
             call symput( "NUMLEN&I", put( _num_len_( &I ), 1. )) ;
          %end ;
       end ;
    run ;

    proc datasets nolist ;
       delete _cntnts_ ;
    run ;

    /* initialize SQZ_NUM, SQZ_CHAR global macro vars
    */
    %let SQZ_NUM = LENGTH ;
    %let SQZ_CHAR = LENGTH ;
    %let SQZ_CHAR_FMT = FORMAT ;

    %if &N_CHAR > 0 %then %do I = 1 %to &N_CHAR ;
       %let SQZ_CHAR = &SQZ_CHAR %qtrim( &&CHAR&I ) $%left( &&CHARLEN&I ) ;
       %let SQZ_CHAR_FMT = &SQZ_CHAR_FMT %qtrim( &&CHAR&I ) $%left( &&CHARLEN&I ). ;
    %end ;

    %if &N_NUM > 0 %then %do I = 1 %to &N_NUM ;
       %let SQZ_NUM = &SQZ_NUM %qtrim( &&NUM&I ) &&NUMLEN&I ;
    %end ;

    /* build macro var containing order of all variables
    */
    data _null_ ;
       length retain $32767 ;
       retain retain 'retain ' ;
       dsid = open( "&DSNIN", 'I' ) ; /* open dataset for read access only */
       do _i_ = 1 to attrn( dsid, 'nvars' ) ;
          retain = trim( retain ) || ' ' || varname( dsid, _i_ ) ;
       end ;
       rc = close( dsid ) ;           /* release the dataset */
       call symput( 'RETAIN', retain ) ;
    run ;

    /* apply SQZ_* to incoming data, create output dataset
    */
    data &DSNOUT ;
       &RETAIN ;
       %if &N_CHAR > 0 %then %str( &SQZ_CHAR ; ) ;     /* optimize char var lengths */
       %if &N_NUM > 0 %then %str( &SQZ_NUM ; ) ;       /* optimize numeric var lengths */
       %if &N_CHAR > 0 %then %str( &SQZ_CHAR_FMT ; ) ; /* adjust char var format lengths */
       set &DSNIN ;
    run ;

    %L9999:
    %mend SQUEEZE_1 ;
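The last step of %SQUEEZE_1 only assembles SAS statements from macro variables. The Python sketch below, built around a hypothetical function `build_squeeze_step`, produces the same shape of DATA step from a list of variable names and the computed lengths. It shows why the RETAIN statement comes first: naming every variable before LENGTH and SET fixes their order in the output data set.

```python
def build_squeeze_step(dsnin, dsnout, var_order, num_lengths, char_lengths):
    """Assemble the kind of DATA step that %SQUEEZE_1 generates.

    var_order    -- every variable of dsnin in its original order (the RETAIN list)
    num_lengths  -- {name: length} for squeezed numeric variables
    char_lengths -- {name: length} for squeezed character variables
    """
    lines = [f"data {dsnout};", "   retain " + " ".join(var_order) + ";"]
    if char_lengths:  # counterpart of &SQZ_CHAR
        lines.append("   length " + " ".join(f"{v} ${n}" for v, n in char_lengths.items()) + ";")
    if num_lengths:   # counterpart of &SQZ_NUM
        lines.append("   length " + " ".join(f"{v} {n}" for v, n in num_lengths.items()) + ";")
    if char_lengths:  # counterpart of &SQZ_CHAR_FMT: format width follows new length
        lines.append("   format " + " ".join(f"{v} ${n}." for v, n in char_lengths.items()) + ";")
    lines += [f"   set {dsnin};", "run;"]
    return "\n".join(lines)


print(build_squeeze_step("work.big", "work.small",
                         ["id", "name", "amount"],
                         {"amount": 4}, {"name": 12}))
```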

APPENDIX B: %DROPMISS

    %macro DROPMISS( DSNIN   /* name of input SAS dataset */
                   , DSNOUT  /* name of output SAS dataset */
                   , NODROP= /* [optional] variables to be omitted from dropping even if they have only missing values */
                   ) ;
    /* PURPOSE: To find both the character and the numeric variables that have
     * only missing values and drop them if they are not in &NODROP
     *
     * NOTE: if no char vars in dataset, produce no char var processing code
     *
     * EXAMPLE OF USE:
     *    %DROPMISS( DSNIN, DSNOUT )
     *    %DROPMISS( DSNIN, DSNOUT, NODROP=A B C D--H X1-X100 )
     *    %DROPMISS( DSNIN, DSNOUT, NODROP=_numeric_ )
     *    %DROPMISS( DSNIN, DSNOUT, NODROP=_character_ )
     */
    %global DROP1 ;
    %local I ;

    %if "&DSNIN" = "&DSNOUT" %then %do ;
       %put / \ ;
       %put ERROR from DROPMISS: ;
       %put Input Dataset has same name as Output Dataset. ;
       %put Execution terminating forthwith. ;
       %put \ / ;
       %goto L9999 ;
    %end ;

    /*###############################################################################*/
    /* begin executable code
    /*###############################################################################*/

    /* create dataset of variable names that have only missing values
    /* exclude from the computation all names in &NODROP
    */
    proc contents data=&dsnin( drop=&nodrop ) memtype=data noprint out=_cntnts_( keep= name type ) ;
    run ;

    %let N_CHAR = 0 ;
    %let N_NUM = 0 ;

    data _null_ ;
       set _cntnts_ end=lastobs nobs=nobs ;
       if nobs = 0 then stop ;
       n_char + ( type = 2 ) ;
       n_num + ( type = 1 ) ;
       /* create macro vars containing final # of char, numeric variables */
       if lastobs then do ;
          call symput( 'N_CHAR', left( put( n_char, 5. ))) ;
          call symput( 'N_NUM', left( put( n_num, 5. ))) ;
       end ;
    run ;

    /* if there are NO numeric or character vars in dataset, stop further processing */
    %if %eval( &N_NUM + &N_CHAR ) = 0 %then %do ;
       %put / \ ;
       %put ERROR from DROPMISS: ;
       %put No variables in dataset. ;
       %put Execution terminating forthwith. ;
       %put \ / ;
       %goto L9999 ;
    %end ;

    /* put global macro names into global symbol table for later retrieval */
    %do I = 1 %to &N_NUM ;
       %global NUM&I ;
    %end ;
    %do I = 1 %to &N_CHAR ;
       %global CHAR&I ;
    %end ;

    /* create macro vars containing variable names
    /* efficiency note: could compute n_char, n_num here, but must declare macro
    /* names to be global b4 stuffing them
    /* note: if no char vars in data, do not create macro vars
    */
    proc sql noprint ;
       %if &N_CHAR > 0 %then %str( select name into :CHAR1 - :CHAR&N_CHAR from _cntnts_ where type = 2 ; ) ;
       %if &N_NUM > 0 %then %str( select name into :NUM1 - :NUM&N_NUM from _cntnts_ where type = 1 ; ) ;
    quit ;

    /* put MAXIMUM values of the variables into macro variables
    */
    %IF &N_CHAR > 1 %THEN %let N_CHAR_1 = %EVAL( &N_CHAR - 1 ) ;
    %IF &N_NUM > 1 %THEN %let N_NUM_1 = %EVAL( &N_NUM - 1 ) ;

    proc sql noprint ;
       select
       %IF &N_NUM > 1 %THEN %DO ;
          %do I = 1 %to &N_NUM_1 ; max( &&NUM&I ), %END ;
       %END ;
       %IF &N_NUM > 0 %THEN %DO ; MAX( &&NUM&N_NUM ) %END ;
       %IF &N_CHAR > 0 AND &N_NUM > 0 %THEN %DO ; , %END ;
       %IF &N_CHAR > 1 %THEN %DO ;
          %do I = 1 %to &N_CHAR_1 ; max( &&CHAR&I ), %END ;
       %END ;
       %IF &N_CHAR > 0 %THEN %DO ; MAX( &&CHAR&N_CHAR ) %END ;
       into
       %IF &N_NUM > 1 %THEN %DO ;
          %do I = 1 %to &N_NUM_1 ; :NUMMAX&I, %END ;
       %END ;
       %IF &N_NUM > 0 %THEN %DO ; :NUMMAX&N_NUM %END ;
       %IF &N_CHAR > 0 AND &N_NUM > 0 %THEN %DO ; , %END ;
       %IF &N_CHAR > 1 %THEN %DO ;
          %do I = 1 %to &N_CHAR_1 ; :CHARMAX&I, %END ;
       %END ;
       %IF &N_CHAR > 0 %THEN %DO ; :CHARMAX&N_CHAR %END ;
       from &DSNIN ;
    quit ;

    /* initialize DROP_NUM, DROP_CHAR global macro vars
    */
    %let DROP_NUM = ;
    %let DROP_CHAR = ;

    %if &N_CHAR > 0 %THEN %DO ;
       %do I = 1 %to &N_CHAR ;
          %IF &&CHARMAX&I = %THEN %DO ;
             %let DROP_CHAR = &DROP_CHAR %qtrim( &&CHAR&I ) ;
          %END ;
       %END ;
    %END ;

    %if &N_NUM > 0 %THEN %DO ;
       %do I = 1 %to &N_NUM ;
          %IF &&NUMMAX&I = . %THEN %DO ;
             %let DROP_NUM = &DROP_NUM %qtrim( &&NUM&I ) ;
          %END ;
       %END ;
    %End ;

    /* apply DROP_* to incoming data, create output dataset */
    data &DSNOUT ;
       %if &DROP_CHAR ^= %then %str( DROP &DROP_CHAR ; ) ; /* drop char variables that have only missing values */
       %if &DROP_NUM ^= %then %str( DROP &DROP_NUM ; ) ;   /* drop num variables that have only missing values */
       set &DSNIN ;
    run ;

    %L9999:
    %mend DROPMISS ;
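%DROPMISS decides whether a variable is all-missing by asking PROC SQL for its MAX: the maximum is missing (or blank) only when every value is missing. The Python sketch below mirrors that rule on a list of dictionaries, with `None` or a blank string standing in for a SAS missing value; the function names `is_missing`, `column_max`, and `dropmiss` are hypothetical.

```python
def is_missing(value):
    """None stands in for a numeric missing value, a blank string for a character one."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def column_max(rows, column):
    """Like PROC SQL MAX(): ignores missing values, returns None if all are missing."""
    present = [row.get(column) for row in rows if not is_missing(row.get(column))]
    return max(present) if present else None


def dropmiss(rows, columns, nodrop=()):
    """Drop every column whose MAX is missing, except those listed in nodrop."""
    drop = {c for c in columns if c not in nodrop and column_max(rows, c) is None}
    kept = [c for c in columns if c not in drop]
    return [{c: row.get(c) for c in kept} for row in rows], kept


rows = [{"id": 1, "x": None, "s": " ", "t": "a"},
        {"id": 2, "x": None, "s": "", "t": None}]
print(dropmiss(rows, ["id", "x", "s", "t"])[1])                 # ['id', 't']
print(dropmiss(rows, ["id", "x", "s", "t"], nodrop=("x",))[1])  # ['id', 'x', 't']
```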

APPENDIX C: %SQ_DROPMISS

    OPTIONS MPRINT MLOGIC SYMBOLGEN ;

    %macro SQ_DROPMISS( DSNIN       /* name of input SAS dataset */
                      , DSNOUT      /* name of output SAS dataset */
                      , NOCOMPRESS= /* [optional] variables to be omitted from the minimum-length computation process */
                      , NODROP=     /* [optional] variables to be omitted from dropping even if they have only missing values */
                      ) ;
    /* PURPOSE: Squeeze a data set to have minimum lengths required for the
     * variables excluding the variables in &NOCOMPRESS applying %SQUEEZE_1 and
     * then DROP the variables that have always missing values in a more
     * efficient way.
     *
     * EXAMPLE OF USE:
     *    %SQ_DROPMISS( DSNIN, DSNOUT, NOCOMPRESS= )
     *    %SQ_DROPMISS( DSNIN, DSNOUT, NOCOMPRESS=A B C D--H X1-X100 )
     *    %SQ_DROPMISS( DSNIN, DSNOUT, NOCOMPRESS=_numeric_ )
     *    %SQ_DROPMISS( DSNIN, DSNOUT, NOCOMPRESS=_character_ )
     *    %SQ_DROPMISS( DSNIN, DSNOUT, NOCOMPRESS=_character_, NODROP=A C D )
     */
    /*###############################################################################*/
    /* begin executable code
    /*###############################################################################*/

    /* Squeezing part
    /* Include the code for the macro %SQUEEZE_1 here
    */
    %SQUEEZE_1( &DSNIN, DSNSQUEEZED, NOCOMPRESS=&NOCOMPRESS ) ;

    /* Dropping part
    */
    %global DROP1 ;
    %local I ;

    %if "&DSNIN" = "&DSNOUT" %then %do ;
       %put / \ ;
       %put ERROR from SQ_DROPMISS: ;
       %put Input Dataset has same name as Output Dataset. ;
       %put Execution terminating forthwith. ;
       %put \ / ;
       %goto L9999 ;
    %end ;

    /* create dataset of variable names that have only missing values
    /* exclude from the computation all names in &NODROP
    */
    proc contents data=dsnsqueezed( drop=&nodrop ) memtype=data noprint out=_cntnts_( keep= name type length ) ;
    run ;

    %let N_CHAR = 0 ;
    %let N_NUM = 0 ;

    data _null_ ;
       set _cntnts_ end=lastobs nobs=nobs ;
       /* only squeezed-to-minimum variables can have all values missing */
       where ( type = 1 and length = 3 ) or ( type = 2 and length = 1 ) ;
       if nobs = 0 then stop ;
       n_char + ( type = 2 ) ;
       n_num + ( type = 1 ) ;
       /* create macro vars containing final # of char, numeric variables */
       if lastobs then do ;
          call symput( 'N_CHAR', left( put( n_char, 5. ))) ;
          call symput( 'N_NUM', left( put( n_num, 5. ))) ;
       end ;
    run ;

    /* if there are NO numeric or character vars in dataset, stop further processing */
    %if %eval( &N_NUM + &N_CHAR ) = 0 %then %do ;
       %put / \ ;
       %put ERROR from SQ_DROPMISS: ;
       %put No variables in dataset to drop. ;
       %put Execution terminating forthwith. ;
       %put \ / ;
       %goto L9999 ;
    %end ;

    /* put global macro names into global symbol table for later retrieval */
    %do I = 1 %to &N_NUM ;
       %global NUM&I ;
    %end ;
    %do I = 1 %to &N_CHAR ;
       %global CHAR&I ;
    %end ;

    /* create macro vars containing the names of the candidate variables
    /* (same length filter as the WHERE statement above)
    /* efficiency note: could compute n_char, n_num here, but must declare macro
    /* names to be global b4 stuffing them
    /* note: if no char vars in data, do not create macro vars
    */
    proc sql noprint ;
       %if &N_CHAR > 0 %then %str( select name into :CHAR1 - :CHAR&N_CHAR from _cntnts_ where type = 2 and length = 1 ; ) ;
       %if &N_NUM > 0 %then %str( select name into :NUM1 - :NUM&N_NUM from _cntnts_ where type = 1 and length = 3 ; ) ;
    quit ;

    /* put MAXIMUM values of the variables into macro variables
    */
    %IF &N_CHAR > 1 %THEN %let N_CHAR_1 = %EVAL( &N_CHAR - 1 ) ;
    %IF &N_NUM > 1 %THEN %let N_NUM_1 = %EVAL( &N_NUM - 1 ) ;

    proc sql noprint ;
       select
       %IF &N_NUM > 1 %THEN %DO ;
          %do I = 1 %to &N_NUM_1 ; max( &&NUM&I ), %END ;
       %END ;
       %IF &N_NUM > 0 %THEN %DO ; MAX( &&NUM&N_NUM ) %END ;
       %IF &N_CHAR > 0 AND &N_NUM > 0 %THEN %DO ; , %END ;
       %IF &N_CHAR > 1 %THEN %DO ;
          %do I = 1 %to &N_CHAR_1 ; max( &&CHAR&I ), %END ;
       %END ;
       %IF &N_CHAR > 0 %THEN %DO ; MAX( &&CHAR&N_CHAR ) %END ;
       into
       %IF &N_NUM > 1 %THEN %DO ;
          %do I = 1 %to &N_NUM_1 ; :NUMMAX&I, %END ;
       %END ;
       %IF &N_NUM > 0 %THEN %DO ; :NUMMAX&N_NUM %END ;
       %IF &N_CHAR > 0 AND &N_NUM > 0 %THEN %DO ; , %END ;
       %IF &N_CHAR > 1 %THEN %DO ;
          %do I = 1 %to &N_CHAR_1 ; :CHARMAX&I, %END ;
       %END ;
       %IF &N_CHAR > 0 %THEN %DO ; :CHARMAX&N_CHAR %END ;
       from &DSNIN ;
    quit ;

    /* initialize DROP_NUM, DROP_CHAR global macro vars
    */
    %let DROP_NUM = ;
    %let DROP_CHAR = ;

    %if &N_CHAR > 0 %THEN %DO ;
       %do I = 1 %to &N_CHAR ;
          %IF &&CHARMAX&I = %THEN %DO ;
             %let DROP_CHAR = &DROP_CHAR %qtrim( &&CHAR&I ) ;
          %END ;
       %END ;
    %END ;

    %if &N_NUM > 0 %THEN %DO ;
       %do I = 1 %to &N_NUM ;
          %IF &&NUMMAX&I = . %THEN %DO ;
             %let DROP_NUM = &DROP_NUM %qtrim( &&NUM&I ) ;
          %END ;
       %END ;
    %End ;

    /* apply DROP_* to the squeezed data, create output dataset */
    data &DSNOUT ;
       %if &DROP_CHAR ^= %then %str( DROP &DROP_CHAR ; ) ; /* drop char variables that have only missing values */
       %if &DROP_NUM ^= %then %str( DROP &DROP_NUM ; ) ;   /* drop num variables that have only missing values */
       set DSNSQUEEZED ;
    run ;

    %L9999:
    %mend SQ_DROPMISS ;
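The shortcut in %SQ_DROPMISS narrows the all-missing check to the variables that squeezing left at the minimum length (numeric length 3, character length 1); variables excluded through NOCOMPRESS= keep their original lengths and are not examined. A Python sketch of that selection, with `None` or a blank string standing in for a SAS missing value, hypothetical function names, and illustrative lengths:

```python
def is_missing(value):
    """None for a numeric missing value, a blank string for a character one."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def drop_candidates(lengths, types):
    """Variables that squeezing left at the minimum length: numeric length 3
    or character length 1. An all-missing squeezed variable always lands here."""
    return [v for v, n in lengths.items()
            if (types[v] == "num" and n == 3) or (types[v] == "char" and n == 1)]


def sq_dropmiss_targets(rows, lengths, types, nodrop=()):
    """Run the all-missing test only on the candidates, skipping nodrop."""
    return [v for v in drop_candidates(lengths, types)
            if v not in nodrop and all(is_missing(row.get(v)) for row in rows)]


rows = [{"id": 5000, "flag": "", "amount": 10**6, "note": "x"},
        {"id": 7, "flag": " ", "amount": None, "note": None}]
lengths = {"id": 3, "flag": 1, "amount": 4, "note": 1}  # illustrative squeezed lengths
types = {"id": "num", "flag": "char", "amount": "num", "note": "char"}

print(drop_candidates(lengths, types))            # ['id', 'flag', 'note']
print(sq_dropmiss_targets(rows, lengths, types))  # ['flag']
```

Here `amount` is never scanned, because a variable that still needs 4 bytes must hold at least one non-missing value.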


Beginning Tutorials. bt009 A TUTORIAL ON THE SAS MACRO LANGUAGE John J. Cohen AstraZeneca LP bt009 A TUTORIAL ON THE SAS MACRO LANGUAGE John J. Cohen AstraZeneca LP Abstract The SAS Macro language is another language that rests on top of regular SAS code. If used properly, it can make programming

More information

Nine Steps to Get Started using SAS Macros

Nine Steps to Get Started using SAS Macros Paper 56-28 Nine Steps to Get Started using SAS Macros Jane Stroupe, SAS Institute, Chicago, IL ABSTRACT Have you ever heard your coworkers rave about macros? If so, you've probably wondered what all the

More information

Paper 70-27 An Introduction to SAS PROC SQL Timothy J Harrington, Venturi Partners Consulting, Waukegan, Illinois

Paper 70-27 An Introduction to SAS PROC SQL Timothy J Harrington, Venturi Partners Consulting, Waukegan, Illinois Paper 70-27 An Introduction to SAS PROC SQL Timothy J Harrington, Venturi Partners Consulting, Waukegan, Illinois Abstract This paper introduces SAS users with at least a basic understanding of SAS data

More information

Tales from the Help Desk 3: More Solutions for Simple SAS Mistakes Bruce Gilsen, Federal Reserve Board

Tales from the Help Desk 3: More Solutions for Simple SAS Mistakes Bruce Gilsen, Federal Reserve Board Tales from the Help Desk 3: More Solutions for Simple SAS Mistakes Bruce Gilsen, Federal Reserve Board INTRODUCTION In 20 years as a SAS consultant at the Federal Reserve Board, I have seen SAS users make

More information

Using Macros to Automate SAS Processing Kari Richardson, SAS Institute, Cary, NC Eric Rossland, SAS Institute, Dallas, TX

Using Macros to Automate SAS Processing Kari Richardson, SAS Institute, Cary, NC Eric Rossland, SAS Institute, Dallas, TX Paper 126-29 Using Macros to Automate SAS Processing Kari Richardson, SAS Institute, Cary, NC Eric Rossland, SAS Institute, Dallas, TX ABSTRACT This hands-on workshop shows how to use the SAS Macro Facility

More information

Advanced Tutorials. Numeric Data In SAS : Guidelines for Storage and Display Paul Gorrell, Social & Scientific Systems, Inc., Silver Spring, MD

Advanced Tutorials. Numeric Data In SAS : Guidelines for Storage and Display Paul Gorrell, Social & Scientific Systems, Inc., Silver Spring, MD Numeric Data In SAS : Guidelines for Storage and Display Paul Gorrell, Social & Scientific Systems, Inc., Silver Spring, MD ABSTRACT Understanding how SAS stores and displays numeric data is essential

More information

A Faster Index for sorted SAS Datasets

A Faster Index for sorted SAS Datasets A Faster Index for sorted SAS Datasets Mark Keintz Wharton Research Data Services, Philadelphia PA ABSTRACT In a NESUG 2007 paper with Shuguang Zhang, I demonstrated a condensed index which provided significant

More information

Managing Tables in Microsoft SQL Server using SAS

Managing Tables in Microsoft SQL Server using SAS Managing Tables in Microsoft SQL Server using SAS Jason Chen, Kaiser Permanente, San Diego, CA Jon Javines, Kaiser Permanente, San Diego, CA Alan L Schepps, M.S., Kaiser Permanente, San Diego, CA Yuexin

More information

The Power of CALL SYMPUT DATA Step Interface by Examples Yunchao (Susan) Tian, Social & Scientific Systems, Inc., Silver Spring, MD

The Power of CALL SYMPUT DATA Step Interface by Examples Yunchao (Susan) Tian, Social & Scientific Systems, Inc., Silver Spring, MD Paper 052-29 The Power of CALL SYMPUT DATA Step Interface by Examples Yunchao (Susan) Tian, Social & Scientific Systems, Inc., Silver Spring, MD ABSTRACT AND INTRODUCTION CALL SYMPUT is a SAS language

More information

Using SAS With a SQL Server Database. M. Rita Thissen, Yan Chen Tang, Elizabeth Heath RTI International, RTP, NC

Using SAS With a SQL Server Database. M. Rita Thissen, Yan Chen Tang, Elizabeth Heath RTI International, RTP, NC Using SAS With a SQL Server Database M. Rita Thissen, Yan Chen Tang, Elizabeth Heath RTI International, RTP, NC ABSTRACT Many operations now store data in relational databases. You may want to use SAS

More information

AN INTRODUCTION TO MACRO VARIABLES AND MACRO PROGRAMS Mike S. Zdeb, New York State Department of Health

AN INTRODUCTION TO MACRO VARIABLES AND MACRO PROGRAMS Mike S. Zdeb, New York State Department of Health AN INTRODUCTION TO MACRO VARIABLES AND MACRO PROGRAMS Mike S. Zdeb, New York State Department of Health INTRODUCTION There are a number of SAS tools that you may never have to use. Why? The main reason

More information

Paper 109-25 Merges and Joins Timothy J Harrington, Trilogy Consulting Corporation

Paper 109-25 Merges and Joins Timothy J Harrington, Trilogy Consulting Corporation Paper 109-25 Merges and Joins Timothy J Harrington, Trilogy Consulting Corporation Abstract This paper discusses methods of joining SAS data sets. The different methods and the reasons for choosing a particular

More information

The SAS Data step/macro Interface

The SAS Data step/macro Interface Paper TS09 The SAS Data step/macro Interface Lawrence Heaton-Wright, Quintiles, Bracknell, Berkshire, UK ABSTRACT The SAS macro facility is an extremely useful part of the SAS System. However, macro variables

More information

The entire SAS code for the %CHK_MISSING macro is in the Appendix. The full macro specification is listed as follows: %chk_missing(indsn=, outdsn= );

The entire SAS code for the %CHK_MISSING macro is in the Appendix. The full macro specification is listed as follows: %chk_missing(indsn=, outdsn= ); Macro Tabulating Missing Values, Leveraging SAS PROC CONTENTS Adam Chow, Health Economics Resource Center (HERC) VA Palo Alto Health Care System Department of Veterans Affairs (Menlo Park, CA) Abstract

More information

That Mysterious Colon (:) Haiping Luo, Dept. of Veterans Affairs, Washington, DC

That Mysterious Colon (:) Haiping Luo, Dept. of Veterans Affairs, Washington, DC Paper 73-26 That Mysterious Colon (:) Haiping Luo, Dept. of Veterans Affairs, Washington, DC ABSTRACT The colon (:) plays certain roles in SAS coding. Its usage, however, is not well documented nor is

More information

A single register, called the accumulator, stores the. operand before the operation, and stores the result. Add y # add y from memory to the acc

A single register, called the accumulator, stores the. operand before the operation, and stores the result. Add y # add y from memory to the acc Other architectures Example. Accumulator-based machines A single register, called the accumulator, stores the operand before the operation, and stores the result after the operation. Load x # into acc

More information

Subsetting Observations from Large SAS Data Sets

Subsetting Observations from Large SAS Data Sets Subsetting Observations from Large SAS Data Sets Christopher J. Bost, MDRC, New York, NY ABSTRACT This paper reviews four techniques to subset observations from large SAS data sets: MERGE, PROC SQL, user-defined

More information

Search and Replace in SAS Data Sets thru GUI

Search and Replace in SAS Data Sets thru GUI Search and Replace in SAS Data Sets thru GUI Edmond Cheng, Bureau of Labor Statistics, Washington, DC ABSTRACT In managing data with SAS /BASE software, performing a search and replace is not a straight

More information

Writing cleaner and more powerful SAS code using macros. Patrick Breheny

Writing cleaner and more powerful SAS code using macros. Patrick Breheny Writing cleaner and more powerful SAS code using macros Patrick Breheny Why Use Macros? Macros automatically generate SAS code Macros allow you to make more dynamic, complex, and generalizable SAS programs

More information

A Technique for Storing and Manipulating Incomplete Dates in a Single SAS Date Value

A Technique for Storing and Manipulating Incomplete Dates in a Single SAS Date Value A Technique for Storing and Manipulating Incomplete Dates in a Single SAS Date Value John Ingersoll Introduction: This paper presents a technique for storing incomplete date values in a single variable

More information

Preparing your data for analysis using SAS. Landon Sego 24 April 2003 Department of Statistics UW-Madison

Preparing your data for analysis using SAS. Landon Sego 24 April 2003 Department of Statistics UW-Madison Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison Assumptions That you have used SAS at least a few times. It doesn t matter whether you run SAS in

More information

Project Request and Tracking Using SAS/IntrNet Software Steven Beakley, LabOne, Inc., Lenexa, Kansas

Project Request and Tracking Using SAS/IntrNet Software Steven Beakley, LabOne, Inc., Lenexa, Kansas Paper 197 Project Request and Tracking Using SAS/IntrNet Software Steven Beakley, LabOne, Inc., Lenexa, Kansas ABSTRACT The following paper describes a project request and tracking system that has been

More information

Eliminating Tedium by Building Applications that Use SQL Generated SAS Code Segments

Eliminating Tedium by Building Applications that Use SQL Generated SAS Code Segments Eliminating Tedium by Building Applications that Use SQL Generated SAS Code Segments David A. Mabey, Reader s Digest Association Inc., Pleasantville, NY ABSTRACT When SAS applications are driven by data-generated

More information

A Method for Cleaning Clinical Trial Analysis Data Sets

A Method for Cleaning Clinical Trial Analysis Data Sets A Method for Cleaning Clinical Trial Analysis Data Sets Carol R. Vaughn, Bridgewater Crossings, NJ ABSTRACT This paper presents a method for using SAS software to search SAS programs in selected directories

More information

Automating SAS Macros: Run SAS Code when the Data is Available and a Target Date Reached.

Automating SAS Macros: Run SAS Code when the Data is Available and a Target Date Reached. Automating SAS Macros: Run SAS Code when the Data is Available and a Target Date Reached. Nitin Gupta, Tailwind Associates, Schenectady, NY ABSTRACT This paper describes a method to run discreet macro(s)

More information

TECHNICAL UNIVERSITY OF CRETE DATA STRUCTURES FILE STRUCTURES

TECHNICAL UNIVERSITY OF CRETE DATA STRUCTURES FILE STRUCTURES TECHNICAL UNIVERSITY OF CRETE DEPT OF ELECTRONIC AND COMPUTER ENGINEERING DATA STRUCTURES AND FILE STRUCTURES Euripides G.M. Petrakis http://www.intelligence.tuc.gr/~petrakis Chania, 2007 E.G.M. Petrakis

More information

Applications Development

Applications Development Paper 45-25 Building an Audit and Tracking System Using SAS/AF and SCL Hung X Phan, U.S. Census Bureau ABSTRACT This paper describes how to build an audit and tracking system using SAS/AF and SCL. There

More information

Managing very large EXCEL files using the XLS engine John H. Adams, Boehringer Ingelheim Pharmaceutical, Inc., Ridgefield, CT

Managing very large EXCEL files using the XLS engine John H. Adams, Boehringer Ingelheim Pharmaceutical, Inc., Ridgefield, CT Paper AD01 Managing very large EXCEL files using the XLS engine John H. Adams, Boehringer Ingelheim Pharmaceutical, Inc., Ridgefield, CT ABSTRACT The use of EXCEL spreadsheets is very common in SAS applications,

More information

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C Embedded Systems A Review of ANSI C and Considerations for Embedded C Programming Dr. Jeff Jackson Lecture 2-1 Review of ANSI C Topics Basic features of C C fundamentals Basic data types Expressions Selection

More information

Oracle Database: SQL and PL/SQL Fundamentals NEW

Oracle Database: SQL and PL/SQL Fundamentals NEW Oracle University Contact Us: + 38516306373 Oracle Database: SQL and PL/SQL Fundamentals NEW Duration: 5 Days What you will learn This Oracle Database: SQL and PL/SQL Fundamentals training delivers the

More information

Oracle Database: SQL and PL/SQL Fundamentals

Oracle Database: SQL and PL/SQL Fundamentals Oracle University Contact Us: +966 12 739 894 Oracle Database: SQL and PL/SQL Fundamentals Duration: 5 Days What you will learn This Oracle Database: SQL and PL/SQL Fundamentals training is designed to

More information

Integrating Data and Business Rules with a Control Data Set in SAS

Integrating Data and Business Rules with a Control Data Set in SAS Paper 3461-2015 Integrating Data and Business Rules with a Data Set in SAS Edmond Cheng, CACI International Inc. ABSTRACT In SAS software development, data specifications and process requirements can be

More information

From The Little SAS Book, Fifth Edition. Full book available for purchase here.

From The Little SAS Book, Fifth Edition. Full book available for purchase here. From The Little SAS Book, Fifth Edition. Full book available for purchase here. Acknowledgments ix Introducing SAS Software About This Book xi What s New xiv x Chapter 1 Getting Started Using SAS Software

More information

Oracle Database: SQL and PL/SQL Fundamentals

Oracle Database: SQL and PL/SQL Fundamentals Oracle University Contact Us: 1.800.529.0165 Oracle Database: SQL and PL/SQL Fundamentals Duration: 5 Days What you will learn This course is designed to deliver the fundamentals of SQL and PL/SQL along

More information

MS SQL Performance (Tuning) Best Practices:

MS SQL Performance (Tuning) Best Practices: MS SQL Performance (Tuning) Best Practices: 1. Don t share the SQL server hardware with other services If other workloads are running on the same server where SQL Server is running, memory and other hardware

More information

Keywords are identifiers having predefined meanings in C programming language. The list of keywords used in standard C are : unsigned void

Keywords are identifiers having predefined meanings in C programming language. The list of keywords used in standard C are : unsigned void 1. Explain C tokens Tokens are basic building blocks of a C program. A token is the smallest element of a C program that is meaningful to the compiler. The C compiler recognizes the following kinds of

More information

1 Files to download. 3 A macro to list out-of-range data values. 2 Reading in the example data file. 22S:172 Lab session 9 Macros for data cleaning

1 Files to download. 3 A macro to list out-of-range data values. 2 Reading in the example data file. 22S:172 Lab session 9 Macros for data cleaning 1 2 22S:172 Lab session 9 Macros for data cleaning July 20, 2005 GENDER VISIT HR SBP DBP DX AE = "Gender" = "Visit Date" = "Heart Rate" = "Systolic Blood Pressure" = "Diastolic Blood Pressure" = "Diagnosis

More information

Bachelors of Computer Application Programming Principle & Algorithm (BCA-S102T)

Bachelors of Computer Application Programming Principle & Algorithm (BCA-S102T) Unit- I Introduction to c Language: C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating

More information

Informatica e Sistemi in Tempo Reale

Informatica e Sistemi in Tempo Reale Informatica e Sistemi in Tempo Reale Introduction to C programming Giuseppe Lipari http://retis.sssup.it/~lipari Scuola Superiore Sant Anna Pisa October 25, 2010 G. Lipari (Scuola Superiore Sant Anna)

More information

Financial Data Access with SQL, Excel & VBA

Financial Data Access with SQL, Excel & VBA Computational Finance and Risk Management Financial Data Access with SQL, Excel & VBA Guy Yollin Instructor, Applied Mathematics University of Washington Guy Yollin (Copyright 2012) Data Access with SQL,

More information

The Power of PROC DATASETS Lisa M. Davis, Blue Cross Blue Shield of Florida, Jacksonville, Florida

The Power of PROC DATASETS Lisa M. Davis, Blue Cross Blue Shield of Florida, Jacksonville, Florida Paper #TU20 The Power of PROC DATASETS Lisa M. Davis, Blue Cross Blue Shield of Florida, Jacksonville, Florida ABSTRACT The DATASETS procedure can be used to do many functions that are normally done within

More information

The programming language C. sws1 1

The programming language C. sws1 1 The programming language C sws1 1 The programming language C invented by Dennis Ritchie in early 1970s who used it to write the first Hello World program C was used to write UNIX Standardised as K&C (Kernighan

More information

Tips, Tricks, and Techniques from the Experts

Tips, Tricks, and Techniques from the Experts Tips, Tricks, and Techniques from the Experts Presented by Katie Ronk 2997 Yarmouth Greenway Drive, Madison, WI 53711 Phone: (608) 278-9964 Web: www.sys-seminar.com Systems Seminar Consultants, Inc www.sys-seminar.com

More information

9 Control Statements. 9.1 Introduction. 9.2 Objectives. 9.3 Statements

9 Control Statements. 9.1 Introduction. 9.2 Objectives. 9.3 Statements 9 Control Statements 9.1 Introduction The normal flow of execution in a high level language is sequential, i.e., each statement is executed in the order of its appearance in the program. However, depending

More information

Applications Development ABSTRACT PROGRAM DESIGN INTRODUCTION SAS FEATURES USED

Applications Development ABSTRACT PROGRAM DESIGN INTRODUCTION SAS FEATURES USED Checking and Tracking SAS Programs Using SAS Software Keith M. Gregg, Ph.D., SCIREX Corporation, Chicago, IL Yefim Gershteyn, Ph.D., SCIREX Corporation, Chicago, IL ABSTRACT Various checks on consistency

More information

MS ACCESS DATABASE DATA TYPES

MS ACCESS DATABASE DATA TYPES MS ACCESS DATABASE DATA TYPES Data Type Use For Size Text Memo Number Text or combinations of text and numbers, such as addresses. Also numbers that do not require calculations, such as phone numbers,

More information

PL/SQL MOCK TEST PL/SQL MOCK TEST I

PL/SQL MOCK TEST PL/SQL MOCK TEST I http://www.tutorialspoint.com PL/SQL MOCK TEST Copyright tutorialspoint.com This section presents you various set of Mock Tests related to PL/SQL. You can download these sample mock tests at your local

More information

C++ INTERVIEW QUESTIONS

C++ INTERVIEW QUESTIONS C++ INTERVIEW QUESTIONS http://www.tutorialspoint.com/cplusplus/cpp_interview_questions.htm Copyright tutorialspoint.com Dear readers, these C++ Interview Questions have been designed specially to get

More information

Introduction to Market Basket Analysis Bill Qualls, First Analytics, Raleigh, NC

Introduction to Market Basket Analysis Bill Qualls, First Analytics, Raleigh, NC Paper AA07-2013 Introduction to Market Basket Analysis Bill Qualls, First Analytics, Raleigh, NC ABSTRACT Market Basket Analysis (MBA) is a data mining technique which is widely used in the consumer package

More information

SAS Data Views: A Virtual View of Data John C. Boling, SAS Institute Inc., Cary, NC

SAS Data Views: A Virtual View of Data John C. Boling, SAS Institute Inc., Cary, NC SAS Data Views: A Virtual View of Data John C. Boling, SAS Institute Inc., Cary, NC ABSTRACT The concept of a SAS data set has been extended or broadened in Version 6 of the SAS System. Two SAS file structures

More information

Alternative Methods for Sorting Large Files without leaving a Big Disk Space Footprint

Alternative Methods for Sorting Large Files without leaving a Big Disk Space Footprint Alternative Methods for Sorting Large Files without leaving a Big Disk Space Footprint Rita Volya, Harvard Medical School, Boston, MA ABSTRACT Working with very large data is not only a question of efficiency

More information

Techniques for Managing Large Data Sets: Compression, Indexing and Summarization Lisa A. Horwitz, SAS Institute Inc., New York

Techniques for Managing Large Data Sets: Compression, Indexing and Summarization Lisa A. Horwitz, SAS Institute Inc., New York Techniques for Managing Large Data Sets: Compression, Indexing and Summarization Lisa A. Horwitz, SAS Institute Inc., New York Abstract Storage space and accessing time are always serious considerations

More information

Stacks. Linear data structures

Stacks. Linear data structures Stacks Linear data structures Collection of components that can be arranged as a straight line Data structure grows or shrinks as we add or remove objects ADTs provide an abstract layer for various operations

More information

Oracle SQL. Course Summary. Duration. Objectives

Oracle SQL. Course Summary. Duration. Objectives Oracle SQL Course Summary Identify the major structural components of the Oracle Database 11g Create reports of aggregated data Write SELECT statements that include queries Retrieve row and column data

More information

Using SQL Server Management Studio

Using SQL Server Management Studio Using SQL Server Management Studio Microsoft SQL Server Management Studio 2005 is a graphical tool for database designer or programmer. With SQL Server Management Studio 2005 you can: Create databases

More information

Using SAS Views and SQL Views Lynn Palmer, State of California, Richmond, CA

Using SAS Views and SQL Views Lynn Palmer, State of California, Richmond, CA Using SAS Views and SQL Views Lynn Palmer, State of Califnia, Richmond, CA ABSTRACT Views are a way of simplifying access to your ganization s database while maintaining security. With new and easier ways

More information

Encoding the Password

Encoding the Password SESUG 2012 Paper CT-28 Encoding the Password A low maintenance way to secure your data access Leanne Tang, National Agriculture Statistics Services USDA, Washington DC ABSTRACT When users access data in

More information

Transferring vs. Transporting Between SAS Operating Environments Mimi Lou, Medical College of Georgia, Augusta, GA

Transferring vs. Transporting Between SAS Operating Environments Mimi Lou, Medical College of Georgia, Augusta, GA CC13 Transferring vs. Transporting Between SAS Operating Environments Mimi Lou, Medical College of Georgia, Augusta, GA ABSTRACT Prior to SAS version 8, permanent SAS data sets cannot be moved directly

More information

Using Pharmacovigilance Reporting System to Generate Ad-hoc Reports

Using Pharmacovigilance Reporting System to Generate Ad-hoc Reports Using Pharmacovigilance Reporting System to Generate Ad-hoc Reports Jeff Cai, Amylin Pharmaceuticals, Inc., San Diego, CA Jay Zhou, Amylin Pharmaceuticals, Inc., San Diego, CA ABSTRACT To supplement Oracle

More information

Overview. NT Event Log. CHAPTER 8 Enhancements for SAS Users under Windows NT

Overview. NT Event Log. CHAPTER 8 Enhancements for SAS Users under Windows NT 177 CHAPTER 8 Enhancements for SAS Users under Windows NT Overview 177 NT Event Log 177 Sending Messages to the NT Event Log Using a User-Written Function 178 Examples of Using the User-Written Function

More information

Symbol Tables. Introduction

Symbol Tables. Introduction Symbol Tables Introduction A compiler needs to collect and use information about the names appearing in the source program. This information is entered into a data structure called a symbol table. The

More information

Answers to Review Questions Chapter 7

Answers to Review Questions Chapter 7 Answers to Review Questions Chapter 7 1. The size declarator is used in a definition of an array to indicate the number of elements the array will have. A subscript is used to access a specific element

More information

Object-Oriented Design Lecture 4 CSU 370 Fall 2007 (Pucella) Tuesday, Sep 18, 2007

Object-Oriented Design Lecture 4 CSU 370 Fall 2007 (Pucella) Tuesday, Sep 18, 2007 Object-Oriented Design Lecture 4 CSU 370 Fall 2007 (Pucella) Tuesday, Sep 18, 2007 The Java Type System By now, you have seen a fair amount of Java. Time to study in more depth the foundations of the language,

More information

Top Ten SAS Performance Tuning Techniques

Top Ten SAS Performance Tuning Techniques Paper AD39 Top Ten SAS Performance Tuning Techniques Kirk Paul Lafler, Software Intelligence Corporation, Spring Valley, California Abstract The Base-SAS software provides users with many choices for accessing,

More information

Arithmetic Coding: Introduction

Arithmetic Coding: Introduction Data Compression Arithmetic coding Arithmetic Coding: Introduction Allows using fractional parts of bits!! Used in PPM, JPEG/MPEG (as option), Bzip More time costly than Huffman, but integer implementation

More information

NT Event Log. CHAPTER 8 Enhancements for SAS Users under Windows NT

NT Event Log. CHAPTER 8 Enhancements for SAS Users under Windows NT 157 CHAPTER 8 Enhancements for SAS Users under Windows NT 157 NT Event Log 157 Sending Messages to the NT Event Log using SAS Code 158 NT Performance Monitor 159 Examples of Monitoring SAS Performance

More information

Using Casio Graphics Calculators

Using Casio Graphics Calculators Using Casio Graphics Calculators (Some of this document is based on papers prepared by Donald Stover in January 2004.) This document summarizes calculation and programming operations with many contemporary

More information

1 Description of The Simpletron

1 Description of The Simpletron Simulating The Simpletron Computer 50 points 1 Description of The Simpletron In this assignment you will write a program to simulate a fictional computer that we will call the Simpletron. As its name implies

More information

Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL

Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL Paper SA01-2012 Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL ABSTRACT Analysts typically consider combinations

More information

Paper 2917. Creating Variables: Traps and Pitfalls Olena Galligan, Clinops LLC, San Francisco, CA

Paper 2917. Creating Variables: Traps and Pitfalls Olena Galligan, Clinops LLC, San Francisco, CA Paper 2917 Creating Variables: Traps and Pitfalls Olena Galligan, Clinops LLC, San Francisco, CA ABSTRACT Creation of variables is one of the most common SAS programming tasks. However, sometimes it produces

More information

Number Representation

Number Representation Number Representation CS10001: Programming & Data Structures Pallab Dasgupta Professor, Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur Topics to be Discussed How are numeric data

More information

CDW DATA QUALITY INITIATIVE

CDW DATA QUALITY INITIATIVE Loading Metadata to the IRS Compliance Data Warehouse (CDW) Website: From Spreadsheet to Database Using SAS Macros and PROC SQL Robin Rappaport, IRS Office of Research, Washington, DC Jeff Butler, IRS

More information

COMP 250 Fall 2012 lecture 2 binary representations Sept. 11, 2012

COMP 250 Fall 2012 lecture 2 binary representations Sept. 11, 2012 Binary numbers The reason humans represent numbers using decimal (the ten digits from 0,1,... 9) is that we have ten fingers. There is no other reason than that. There is nothing special otherwise about

More information

Tips for Constructing a Data Warehouse Part 2 Curtis A. Smith, Defense Contract Audit Agency, La Mirada, CA

Tips for Constructing a Data Warehouse Part 2 Curtis A. Smith, Defense Contract Audit Agency, La Mirada, CA Tips for Constructing a Data Warehouse Part 2 Curtis A. Smith, Defense Contract Audit Agency, La Mirada, CA ABSTRACT Ah, yes, data warehousing. The subject of much discussion and excitement. Within the

More information

Tips to Use Character String Functions in Record Lookup

Tips to Use Character String Functions in Record Lookup BSTRCT Tips to Use Character String Functions in Record Lookup njan Matlapudi Pharmacy Informatics, PerformRx, The Next Generation PBM, 200 Stevens Drive, Philadelphia, P 19113 This paper gives you a better

More information

Arrays. Atul Prakash Readings: Chapter 10, Downey Sun s Java tutorial on Arrays: http://java.sun.com/docs/books/tutorial/java/nutsandbolts/arrays.

Arrays. Atul Prakash Readings: Chapter 10, Downey Sun s Java tutorial on Arrays: http://java.sun.com/docs/books/tutorial/java/nutsandbolts/arrays. Arrays Atul Prakash Readings: Chapter 10, Downey Sun s Java tutorial on Arrays: http://java.sun.com/docs/books/tutorial/java/nutsandbolts/arrays.html 1 Grid in Assignment 2 How do you represent the state

More information

Innovative Techniques and Tools to Detect Data Quality Problems

Innovative Techniques and Tools to Detect Data Quality Problems Paper DM05 Innovative Techniques and Tools to Detect Data Quality Problems Hong Qi and Allan Glaser Merck & Co., Inc., Upper Gwynnedd, PA ABSTRACT High quality data are essential for accurate and meaningful

More information

EXTRACTING DATA FROM PDF FILES

EXTRACTING DATA FROM PDF FILES Paper SER10_05 EXTRACTING DATA FROM PDF FILES Nat Wooding, Dominion Virginia Power, Richmond, Virginia ABSTRACT The Adobe Portable Document File (PDF) format has become a popular means of producing documents

More information

grep, awk and sed three VERY useful command-line utilities Matt Probert, Uni of York grep = global regular expression print

grep, awk and sed three VERY useful command-line utilities Matt Probert, Uni of York grep = global regular expression print grep, awk and sed three VERY useful command-line utilities Matt Probert, Uni of York grep = global regular expression print In the simplest terms, grep (global regular expression print) will search input

More information

Paper D10 2009. Ranking Predictors in Logistic Regression. Doug Thompson, Assurant Health, Milwaukee, WI

Paper D10 2009. Ranking Predictors in Logistic Regression. Doug Thompson, Assurant Health, Milwaukee, WI Paper D10 2009 Ranking Predictors in Logistic Regression Doug Thompson, Assurant Health, Milwaukee, WI ABSTRACT There is little consensus on how best to rank predictors in logistic regression. This paper

More information

THE POWER OF PROC FORMAT

THE POWER OF PROC FORMAT THE POWER OF PROC FORMAT Jonas V. Bilenas, Chase Manhattan Bank, New York, NY ABSTRACT The FORMAT procedure in SAS is a very powerful and productive tool. Yet many beginning programmers rarely make use

More information

Class Notes CS 3137. 1 Creating and Using a Huffman Code. Ref: Weiss, page 433

Class Notes CS 3137. 1 Creating and Using a Huffman Code. Ref: Weiss, page 433 Class Notes CS 3137 1 Creating and Using a Huffman Code. Ref: Weiss, page 433 1. FIXED LENGTH CODES: Codes are used to transmit characters over data links. You are probably aware of the ASCII code, a fixed-length

More information