Dropping variables from a large SAS data set when all their values are missing Gwen Babcock, New York State Department of Health, Albany, NY

Size: px
Start display at page:

Download "Dropping variables from a large SAS data set when all their values are missing Gwen Babcock, New York State Department of Health, Albany, NY"

Transcription

1 Dropping variables from a large SAS data set when all their values are missing Gwen Babcock, New York State Department of Health, Albany, NY ABSTRACT Several methods have been proposed to drop variables whose values are all missing from a SAS data set. These methods typically use PROC SQL, macro variables, and/or arrays, all of which may be slow or fail when applied to large data sets. One method uses hash tables and CALL EXECUTE, but it works only on character variables. Therefore, I have developed an alternative method which can be applied to data sets of any size. The strategy is to use PROC MEANS and PROC FREQ to identify variables whose values are all missing. These variables are then dropped using DATA steps generated by CALL EXECUTE. This method, which uses intermediate level coding techniques, will complement previously proposed methods because it works well on large data sets, and on both character and numeric variables. An example is given using US Census American Community Survey data with SAS 9.3 under Windows XP. INTRODUCTION With relatively small data sets, you can often run PROC FREQ or some other procedure which provides descriptive statistics, visually inspect the results, and use the DROP data set option to remove variables whose values are all missing (Zdeb, 2). However, for large data sets, this method is very tedious and time consuming. Therefore, two macros have been proposed to remove variables from a SAS data set when all its values are missing. The first, the %DROPMISS macro (Sridharma, 26 and 2) fails with large data sets. The second, the %drop_blank_vars macro proposed by Tanimoto (2), works only on character variables. Both use advanced coding techniques: Sridharma uses macros, and Tanimoto uses macros and hash tables. The method proposed here uses intermediate coding techniques, is fast, and works on the largest data sets. In addition, with minor modifications, this method can be used to remove coded missing values, such as or, etc. PREPARATION To prepare to drop all variables with missing values, I first specified the libraries and data sets to use. I wished to preserve the label and compression of the original data set, so I obtained that information using PROC CONTENTS and placed it in macro variables using CALL SYMPUTX in a DATA step. (For a discussion of alternative ways to access table metadata, see Carpenter (23)). I also make a copy of the original data set to use for manipulation, so that the original is preserved in case of need. /*specify any libraries used*/ libname sas 'E:\census\ACS\27_2\SAS\sasblock'; /*specify the input and output data sets*/ %let dsin=sas.acs2_5yr_blkgrp; %let dsout=sas.acs2_5yr_blkgrp2; /*specify an alternative missing value for numeric variables. this value should be a number*/ %let altnumericmissvalue=; proc contents data=&dsin noprint out=mycontents (keep=memlabel compress); set mycontents(obs=); call symputx("mylabel",memlabel); call symputx("mycompress",compress); data &dsout (compress &mycompress.); set &dsin; FINDING NUMERIC VARIABLES WITH ONLY MISSING VALUES Once you have finished the preparation, You can find the number of non-missing values for all numeric variables by using the N statistic in PROC MEANS.

2 proc means data=&dsout noprint; output out=checknumeric(drop=_:) n=;/*get the number of NON-MISSING for all numeric variables*/ /*one observation, many variables*/ The data set output from PROC MEANS has only one observation. It has one variable for each numeric variable in the input data set. I transpose the data and select only those variables which have no values. proc transpose data=checknumeric out=checknumeric2 (drop=_label_ rename=(col=numhere _name_=vname)); data dropnumeric; set checknumeric2 (where=(numhere=)); Obs vname numhere BAe 2 BAe2 3 BAe3 4 BAe4 5 BAe5 6 BAe6 7 BAe7 8 BAe8 9 BAe9 BAe PROC MEANS is very fast. However, for datasets containing hundreds of variables and millions of observations, finding variables with all missing values can be done more quickly by first testing a subset of the data. For example, the OBS= data set option can be used to test the first, observations. Only the variables whose values are all missing in the subset would be tested in the entire dataset. Appendix A shows the code for this approach. FINDING NUMERIC VARIABLES WITH ONLY MISSING CODED VALUES If the missing values are coded, a slight variation on the theme is needed. The assumption behind this code is that if the maximum value and minimum value are both equal to the coded missing value, then all of the values for the variable are equal to the coded missing value. First, I use PROC MEANS to get the minimum and maximum values of all the numeric variables, transpose them as before, and then merge them. Then I select those whose minimum and maximum values are both equal to the coded missing value. proc means data=&dsout noprint; output out=biggest(drop=_:) max=; proc means data=&dsout noprint; output out=smallest(drop=_:) min=; 2

3 proc transpose data=biggest out=biggest2 (drop=_label_ rename=(col=greatest _name_=vname)); proc transpose data=smallest out=smallest2 (drop=_label_ rename=(col=least _name_=vname)); proc sort data=biggest2; by vname; proc sort data=smallest2; by vname; data dropnumericalt; merge biggest2 smallest2; by vname; if least=&altnumericmissvalue. and greatest=&altnumericmissvalue. then output; Obs vname greatest least Bm 2 B2m 3 B99m 4 B99m2 5 B99m3 6 B992m 7 B992m2 8 B992m3 9 B992m B992m2 FINDING CHARACTER VARIABLES WITH ONLY MISSING VALUES Since PROC MEANS can only be used with numeric variables, I use PROC FREQ to find character variables whose values are all missing. The NLEVELS option produces a list of variables with the number of distinct missing and non -missing values. If the number of distinct non-missing values is zero, then the variable should be dropped. I used the ODS OUTPUT statement to get the list of variables in a data set. I used _CHAR_ as a shortcut to list all of the character variables. ods output NLevels=checkchar; ods html close; /*suppress default output, because I only want a data set*/ proc freq data=&dsout nlevels; tables _char_ /noprint; ods html; ods output close; 3

4 The resulting data set looks like this: Obs TableVar TableVarLabel NLevels NMissLevels NNonMissLevels FILEID File Identification 2 STUSAB State Postal Abbreviation 3 SUMLEVEL Summary Level COMPONENT geographic Component 5 LOGRECNO Logical Record Number 6 US US 7 REGION Region 8 DIVISION Division 9 STATECE State (Census Code) State (FIPS Code) STATE From this data set, I select only the variables to be dropped: data dropchar (rename=(tablevar=vname)); set checkchar (where=(nnonmisslevels=)); I could have used PROC FREQ with TABLES _ALL_ to get the missing values for all variables, including numeric ones, but it is slower than PROC MEANS. If your data set is small and you are not concerned about coded missing values, or you have few numeric variables, then using PROC FREQ alone may be more convenient. However, if your dataset has millions of observations, it would be faster to first use both PROC MEANS (for numeric variables) and PROC FREQ (for character variables) on a subset of the observations. Only those variables whose values are all missing in that subset would be used in a PROC MEANS/PROC FREQ of the entire dataset. That would require more complicated code (see Appendix A). With any of these approaches, I could also choose to drop character variables with only one distinct value; the information they contain may be more appropriately placed in the metadata rather than the data set itself. COMBINING AND PARSING THE LIST OF VARIABLES TO DROP A simple SET statement in a DATA step is used to combine the three lists of variables to drop. The new list contains one observation per variable to drop. However, for maximum speed, I want to drop as many variables as possible in one DATA step. Therefore, the list of variables must be converted to the minimum number observations that can hold the list of variables. Since the maximum length of a character variable is 32,767, and the maximum length of a variable name is 32, a cha racter variable should be able to hold a list of at least 992 SAS variable names. It may be able to hold more if the variable names are shorter. First, I combine the three lists of variables. The result is one data set containing one observation for each variable whose values are all missing or all the specified coded value. data allvarstodrop; format vname $32.; /*maximum length of SAS variable name. Use this statement to make sure the names are not truncated*/ set dropchar(keep=vname) dropnumeric (drop=numhere) dropnumericalt (drop=greatest least) ; varsize=length(vname)+; Next, I joined the variable names into lists using the CATX function. When the list reaches the maximum size of a SAS character variable, or the last variable name is read, an observation is output. 4

5 data varstodroplist (keep=varlist varsizesum); format varlist $32767.; /*this is the maximum size of a character variable*/ retain varsizesum varlist; set allvarstodrop end=eof; varsizesum=sum(of varsize varsizesum); if varsizesum>32767 then do; output; putlog vname varsizesum; varsizesum=varsize; varlist=vname; /*reset*/ else varlist=catx(' ',varlist,vname); if eof= then output; DROPPING THE VARIABLES CALL EXECUTE (Whitlock, 997) is used to generate the code that drops the variables with missing values. For each obse rvation in the varstodroplist data set, a DATA step is generated which drops a set of variables. The label and compression of the original data set are used for the new data set. set varstodroplist; call execute("data &dsout(label='&mylabel' compress=&mycompress.);"); call execute("set &dsout(drop="); call execute(varlist); call execute(");"); call execute(""); Here is an example of the code generated by one iteration of this DATA step, with the list of variables to drop shortened for clarity. The full code as written into the SAS log is shown in Appendix B. data sas.acs2_5yr_blkgrp2(label='27-2 US Census American Survey estimates and margins of error for NYS block groups, processed by gdb2' compress=binary); set sas.acs2_5yr_blkgrp2(drop= B2529m3 B2529m32 B2529m33 B2529m34 B2529m35 B2529m36 ); CONCLUSIONS This method for dropping variables whose values are all missing is fast and works on data sets with large numbers of variables. It complements other methods. Since it does not use macros or hash tables, the code is less complex, and easier to use and debug. It can also be used to remove numeric variables whose values have all codes for missing, such as or REFERENCES Carpenter, Arthur L. (23) How Do I? There is more than one way to solve that problem; Why continuing to learn is so important. Proceedings of the SAS Global Forum p.2. 5

6 Sridharma, S (2) Dropping Automatically Variable with Only Missing Values Proceedings of the SAS Global Forum Sridharma, S. (26) How to Reduce the Disk Space Required by a SAS Data Set. NESUG 26 Proceedings. SAS Instutute, Inc. Sample 24622: Drop variables from a SAS data set when all its values are missing. Accessed December 3, 2. Tanimoto, P. (2) Meta-Programming in SAS with DATA Step. MWSUG Proceedings Whitlock, HI (997) CALL EXECUTE: How and Why. Proceedings of the Twenty-Second Annual SAS Users Group International Conference. Xu, W. (27) Identify and Remove Any Variables, Character or Numeric, That Have Only Missing Values. Proceedings Zdeb (2) An Easy Route to a Missing Data Report with ODS+PROC FREQ+A Data Step. NESUG 27 NESUG 2 Proceedings. ACKNOWLEDGMENTS Thanks to Steve Forand, Thomas Talbot, and the New York State Department of Health, Center for Environmental Health for supporting this work. Thanks to Mark H. Keintz for helping make the code faster. CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Gwen D LaSelva New York State Department of Health Bureau of Environmental and Occupational Epidemiology Empire State Plaza-Corning Tower, Room 23 Albany, NY 2237 Work Phone: Fax: gdb2@health.state.ny.us SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Inst itute, Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. APPENDIX A: FASTER VERSION OF CODE USING PRE-TESTING This code is more complex, but faster than that discussed previously, especially for datasets containing hundreds of variables and millions of observations. The general approach is to pre-test a subset of observations using PROC MEANS or PROC FREQ to determine which variables contain only missing values in the subset. These variables are then tested in the entire dataset, and dropped if they contain only missing values. /************************************************************** *the purpose of this program is to drop variables with *missing values from a SAS dataset. *This code is designed to have a similar function to *The DROPMISS macro of Selvaratname Sridharma *see "Dropping Automatically Variables with Only Missing Values" * SAS Global Forum 2 *But the DROPMISS macro fails sometimes, so this is an alternative *Note: this program is more complex than the 'drop missing variables' program * but will be faster, especially on datasets with large numbers of observations * it was tested on NYS outpatient data with about 653 variables and over 9 million observations *programmed by Gwen LaSelva *inputs: library name name of input SAS dataset name of output SAS dataset 6

7 alternative missing value for numeric variables, for example or sample size for preliminary determination of which variables to drop *outputs: SAS dataset* *******************************************************************/ /*specify any libraries used*/ libname sas 'E:\census\ACS\27_2\SASblockout'; libname op "P:\Sections\EPHTdata\SPARCSdeidentified\Outpatient"; /*specify the input and output datasets*/ %let dsin=op.outpatient_s_2_prime; %let dsout=sas.want; /*specify an alternative missing value for numeric variables. this value should be a number*/ %let altnumericmissvalue=; /*specify the number of observations to sample. The size should be a fraction of the total number of observations. It should be your best guess of the minimum number of observations needed to find non-missing values in all variables whose values are not all missing. Changing this will affect the speed of the program*/ %let samplesize=; /*put the time in the log*/ temptime=put(time(),hhmm5.); putlog "start time=" temptime; /*shouldn't need to modify below this line*/ /*STEP : PREPARATION*/ /*get the label and compression of the input dataset and put them into a macro variables, use them at the end so the new dataset has the same label and compression as the original*/ proc contents data=&dsin noprint out=mycontents (keep=memlabel compress); set mycontents(obs=); call symputx("mylabel",memlabel); call symputx("mycompress",compress); /*first, make output dataset to work with, so we don't accidentally mess up the original dataset if something goes wrong*/ data &dsout (compress= &mycompress.); set &dsin; /*STEP 2: FIND ALL NUMERIC VARIABLES WITH ALL MISSING VALUES*/ /*proc means with the N statistic should count number of non-missing values*/ /*start with a subset of observations*/ proc means data=&dsout (obs=&samplesize.) noprint; output out=sampledropnumeric(drop=_:) n=;/*get the number of NON-MISSING for all numeric variables*/ proc transpose data=sampledropnumeric out=sampledropnumeric2 (rename=(col=numhere _name_=vname)); 7

8 data maybedropnumeric (keep=vname varsize); set sampledropnumeric2 (where=(numhere=)); varsize=length(vname)+; /*test the variables who are all missing in the first subset of observations in the entire dataset*/ data numvarstotestlist (keep=varlist varsizesum); format varlist $32767.; /*this is the maximum size of a character variable*/ retain varsizesum varlist; set maybedropnumeric end=eof; varsizesum=sum(of varsize varsizesum); if varsizesum>32767 then do; output; putlog vname varsizesum; varsizesum=varsize; varlist=vname; /*reset*/ else varlist=catx(' ',varlist,vname); if eof= then output; set numvarstotestlist ; call execute("proc means data=&dsout noprint;"); call execute("output out=checknumeric(drop=_:) n=;"); call execute("var"); call execute(varlist); call execute(";"); if _n_= then call execute("data checknumeric; set checknumeric; "); else call execute("data checknumeric; merge checknumeric checknumeric; "); proc transpose data=checknumeric out=checknumeric2 (drop=_label_ rename=(col=numhere _name_=vname)); data dropnumeric; set checknumeric2 (where=(numhere=)); /*Use PROC FREQ to get character variables whose values are all missing it is slower than PROC MEANS*/ /*start with a subset of observations*/ ods output NLevels=maybedropchar; ods html close; /*suppress default output, because we only want a dataset*/ proc freq data=&dsout (obs=&samplesize.) nlevels; tables _char_ /noprint; ods html; ods output close; data maybedropchar2 (rename=(tablevar=vname)); set maybedropchar (where=(nnonmisslevels=)); varsize=length(tablevar)+; 8

9 /*test the variables who are all missing in the first subset of observations in the entire dataset*/ data charvarstotestlist (keep=varlist varsizesum); format varlist $32767.; /*this is the maximum size of a character variable*/ retain varsizesum varlist; set maybedropchar2 end=eof; varsizesum=sum(of varsize varsizesum); if varsizesum>32767 then do; output; putlog vname varsizesum; varsizesum=varsize; varlist=vname; /*reset*/ else varlist=catx(' ',varlist,vname); if eof= then output; set charvarstotestlist ; call execute("ods output NLevels=checkchar;ods html close;"); call execute("proc freq data=&dsout nlevels; tables"); call execute(varlist); call execute("/noprint; ods html; ods output close;"); if _n_= then call execute("data dropchar; set checkchar (rename=(tablevar=vname) where=(nnonmisslevels=)); "); else call execute("data cropchar; set dropchar checkchar (rename=(tablevar=vname) where=(nnonmisslevels=)); "); /*You could use proc freq for all of the variables, but it would be slow*/ /*now, find out if any of the variables contain only the alternative missing value*/ /*let us assume that if the minimum and maximum values are both equal to the alternative missing value, the variable should be dropped*/ /*start with a subset of observations*/ proc means data=&dsout (obs=&samplesize.) noprint; output out=maybebiggest(drop=_:) max=; proc means data=&dsout (obs=&samplesize.) noprint; output out=maybesmallest(drop=_:) min=; proc transpose data=maybebiggest out=maybebiggest2 (drop= rename=(col=greatest _name_=vname)); proc transpose data=maybesmallest out=maybesmallest2 (drop= rename=(col=least _name_=vname)); proc sort data=maybebiggest2; by vname; proc sort data=maybesmallest2; by vname; data testnumericalt; merge maybebiggest2 maybesmallest2; by vname; varsize=length(vname)+; if least=&altnumericmissvalue. and greatest=&altnumericmissvalue. then output; 9

10 /*test the variables who are all the alternative missing value in the first subset of observations in the entire dataset*/ data numvarstotestlist (keep=varlist varsizesum); format varlist $32767.; /*this is the maximum size of a character variable*/ retain varsizesum varlist; set testnumericalt end=eof; varsizesum=sum(of varsize varsizesum); if varsizesum>32767 then do; output; putlog vname varsizesum; varsizesum=varsize; varlist=vname; /*reset*/ else varlist=catx(' ',varlist,vname); if eof= then output; set numvarstotestlist ; call execute("proc means data=&dsout noprint;"); call execute("output out=biggest(drop=_:) max=; var"); call execute(varlist); call execute(";"); call execute("proc means data=&dsout noprint;"); call execute("output out=smallest(drop=_:) min=; var"); call execute(varlist); call execute(";"); if _n_= then do; call execute("data biggest; set biggest; "); call execute("data smallest; set smallest; "); else do; call execute("data smallest; merge smallest smallest; "); call execute("data biggest; merge biggest biggest; "); proc transpose data=biggest out=biggest2 (drop= rename=(col=greatest _name_=vname)); proc transpose data=smallest out=smallest2 (drop= rename=(col=least _name_=vname)); proc sort data=biggest2; by vname; proc sort data=smallest2; by vname; data dropnumericalt; merge biggest2 smallest2; by vname; if least=&altnumericmissvalue. and greatest=&altnumericmissvalue. then output;

11 /*STEP 3: COMBINE LISTS OF CHARACTER AND NUMERIC VARIABLES TO DROP*/ /*for convience, combine the lists of character and numeric variables to drop*/ data allvarstodrop; format vname $32.; /*maximum length of SAS variable name. Use this statement to make sure the names are not truncated*/ set dropchar(keep=vname) dropnumeric (drop=numhere) dropnumericalt (drop=greatest least) ; varsize=length(vname)+; *proc print data=allvarstodrop (obs=); /*STEP 4: DROP THE VARIABLES*/ /* create a character variable containing the list of variables to be dropped. A character variable can be up to characters, and can therefore hold a list of at least 992 SAS variables, maybe more*/ /*if the sum of varsize is more than need to create additional observations in the dataset to hold the additional variables*/ data varstodroplist (keep=varlist varsizesum); format varlist $32767.; /*this is the maximum size of a character variable*/ retain varsizesum varlist; set allvarstodrop end=eof; varsizesum=sum(of varsize varsizesum); if varsizesum>32767 then do; output; putlog vname varsizesum; varsizesum=varsize; varlist=vname; /*reset*/ else varlist=catx(' ',varlist,vname); if eof= then output; /*use call execute to repeat the dropping for each set of variable names*/ set varstodroplist; call execute("data &dsout(label='&mylabel' compress=&mycompress.);"); call execute("set &dsout(drop="); call execute(varlist); call execute(");"); call execute(""); /*put the end time in the log*/ temptime=put(time(),hhmm5.); putlog "end time=" temptime; APPENDIX B: CODE GENERATED BY THE FINAL DATA STEP CALL EXECUTE STATEMENTS This code was created by one iteration of the final DATA step, using CALL EXECUTE statements. It is this code which drops the variables that were previously determined to have only missing values. The code is shown as it appears in the SAS log data sas.acs2_5yr_blkgrp2(label='27-2 US Census American Survey estimates and margins of error for NYS block groups, processed by gdb2' compress=binary); 22 + set sas.acs2_5yr_blkgrp2(drop= 22 + B2529m3 B2529m32 B2529m33 B2529m34 B2529m35 B2529m36 B2529m37 B2529m38 B2529m39

12 B2529m4 B2529m4 B2529m42 B2529m43 B2529m44 B2529m45 B2529m46 B2529m47 B2529m48 B2529m49 B2529m5 B2529m5 B2529m52 B2529m53 B2529m54 B2529m B2529m56 B2529m57 B2529m58 B2529m59 B2529m6 B2529m6 B2529m62 B2529m63 B2529m64 B2529m65 B2529m66 B2529m67 B2529m68 B2529m69 B2529m7 B2529m7 B2529m72 B2529m73 B2529m74 B2529m75 B2529m76 B2529m77 B2529m78 B2529m79 B2529m B2529m8 B2529m82 B2529m83 B2529m84 B2529m85 B2529m86 B2529m87 B26e B26m B98e B98e2 B982e B982e2 B98e B982e B982e2 B982e3 B983e B983e2 B983e3 B983e4 B983e5 B983e6 B983e7 B984e B982e B982e B982e3 B982e4 B982e5 B982e6 B982e7 B982e8 B982e9 B9822e B9822e2 B9822e3 B9822e4 B9822e5 B9822e6 B9822e7 B9822e8 B9822e9 B9822e B983e B9832e B98m B98m2 B982m B982m2 B98m B982m B982m2 B982m3 B983m B983m2 B983m3 B983m4 B983m5 B983m6 B983m7 B984m B982m B982m2 B982m3 B982m4 B982m5 B982m6 B982m7 B982m8 B982m9 B9822m B9822m2 B9822m3 B9822m4 B9822m5 B9822m6 B9822m7 B9822m8 B9822m9 B9822m B983m B9832m B9952PRe B9952PRe2 B9952PRe3 B9952PRe4 B9952PRe5 B9952PRe6 B9952PRe7 B9985e B9985e2 B9985e3 B9952PRm B9952PRm2 B9952PRm3 B9952PRm4 B9952PRm5 B9952PRm6 B9952PRm7 B9985m B9985m2 B9985m3 B9986e B9986e2 B9986e3 B9987e B9987e B9987e3 B9987e4 B9987e5 B9988e B9988e2 B9988e3 B9988e4 B9988e5 B9989e B9989e2 B9989e3 B9986m B9986m2 B9986m3 B9987m B9987m2 B9987m3 B9987m4 B9987m5 B9988m B9988m2 B9988m3 B9988m4 B9988m5 B9989m B9989m2 B9989m3 B993e B993e2 B993e3 B9932e B9932e2 B9932e3 B9922e B9922e2 B9922e3 B99244e B99244e2 B99244e3 B99245e B99245e2 B99245e3 B99246e B99246e2 B99246e3 B992523e B992523e2 B992523e3 B993m B993m2 B993m3 B9932m B9932m2 B9932m3 B9922m B9922m B9922m3 B99244m B99244m2 B99244m3 B99245m B99245m2 B99245m3 B99246m B99246m2 B99246m3 B992523m B992523m2 B992523m3 Bm B2m B99m B99m2 B99m3 B992m B992m2 B992m3 B992m B992m2 B992m3 B993m B993m2 B993m3 B995m 23 + B995m2 B995m3 B995m4 B995m5 B995m6 B995m7 B9952m B9952m2 B9952m3 B9952m4 B9952m5 B9952m6 B9952m7 B996m B996m2 B996m3 B997m B997m2 B997m3 B9972m B9972m2 B9972m3 B9972m4 B9972m5 B9972m6 B9972m7 B998m B998m B998m3 B998m B998m2 B998m3 B998m4 B998m5 B9982m B9982m2 B9982m3 B9982m4 B9982m5 B9983m B9983m2 B9983m3 B9983m4 B9983m5 B9984m B9984m2 B9984m3 B9984m4 B9984m5 B9992m B9992m2 B9992m3 B992m B992m2 B992m3 B993m B993m2 B993m3 B994m B994m2 B994m3 B994m4 B994m5 B994m6 B994m7 B992m B992m2 B992m3 B994m B994m2 B994m3 B9942m B9942m2 B9942m3 B995m B995m2 B995m3 B996m B996m2 B996m3 B9962m B9962m2 B9962m3 B9962m B9962m5 B9962m6 B9962m7 B9963m B9963m2 B9963m3 B9963m4 B9963m5 B997m B997m B997m B997m2 B997m3 B997m4 B997m5 B997m2 B997m3 B997m4 B997m5 B997m6 B997m7 B997m8 B997m9 B9972m B9972m B9972m B9972m B9972m3 B9972m4 B9972m5 B9972m2 B9972m3 B9972m4 B9972m5 B9972m6 B9972m7 B9972m8 B9972m9 B999m B999m2 B999m3 B999m4 B999m5 B999m6 B999m7 B999m8 B9992m B9992m2 B9992m3 B9992m4 B9992m5 B9992m6 B9992m7 B9992m8 B9993m 2

13 235 + B9993m2 B9993m3 B9993m4 B9993m5 B9993m6 B9993m7 B9993m8 B9994m B9994m2 B9994m3 B9994m4 B9994m5 B9994m6 B9994m7 B9994m8 B992m B992m2 B992m3 B992m4 B992m5 B992m6 B992m7 B992m8 B992m B992m2 B992m3 B9922m B9922m B9922m3 B9923m B9923m2 B9923m3 B99232m B99232m2 B99232m3 B99233m B99233m2 B99233m3 B99233m4 B99233m5 B99234m B99234m2 B99234m3 B99234m4 B99234m5 B9924m B9924m2 B9924m3 B99242m B99242m2 B99242m3 B99243m B99243m2 B99243m3 B9925m B9925m B9925m3 B9925m B9925m2 B9925m3 B99252m B99252m2 B99252m3 B99253m B99253m2 B99253m3 B99254m B99254m2 B99254m3 B99255m B99255m2 B99255m3 B99256m B99256m2 B99256m3 B99258m B99258m2 B99258m3 B99259m B99259m2 B99259m B99252m B99252m2 B99252m3 B99252m B99252m2 B99252m3 B992522m B992522m2 B992522m3 B992522m4 B992522m5 B992522m6 B992522m7 B99252m B99252m2 B99252m3 B99253m B99253m2 B99253m3 B99254m B99254m2 B99254m3 B99255m B99255m2 B99255m3 B99256m B99256m2 B99256m3 B99257m B99257m2 B99257m3 B99258m B99258m2 B99258m3 B99259m B99259m ); 24 + NOTE: There were 5463 observations read from the data set SAS.ACS2_5YR_BLKGRP2. NOTE: The data set SAS.ACS2_5YR_BLKGRP2 has 5463 observations and 6462 variables. NOTE: Compressing data set SAS.ACS2_5YR_BLKGRP2 decreased size by percent. Compressed is 494 pages; un-compressed would require 5495 pages. NOTE: DATA statement used (Total process time): real time 7.93 seconds cpu time 7.28 seconds 3

From The Little SAS Book, Fifth Edition. Full book available for purchase here.

From The Little SAS Book, Fifth Edition. Full book available for purchase here. From The Little SAS Book, Fifth Edition. Full book available for purchase here. Acknowledgments ix Introducing SAS Software About This Book xi What s New xiv x Chapter 1 Getting Started Using SAS Software

More information

How to Reduce the Disk Space Required by a SAS Data Set

How to Reduce the Disk Space Required by a SAS Data Set How to Reduce the Disk Space Required by a SAS Data Set Selvaratnam Sridharma, U.S. Census Bureau, Washington, DC ABSTRACT SAS datasets can be large and disk space can often be at a premium. In this paper,

More information

Labels, Labels, and More Labels Stephanie R. Thompson, Rochester Institute of Technology, Rochester, NY

Labels, Labels, and More Labels Stephanie R. Thompson, Rochester Institute of Technology, Rochester, NY Paper FF-007 Labels, Labels, and More Labels Stephanie R. Thompson, Rochester Institute of Technology, Rochester, NY ABSTRACT SAS datasets include labels as optional variable attributes in the descriptor

More information

Reshaping & Combining Tables Unit of analysis Combining. Assignment 4. Assignment 4 continued PHPM 672/677 2/21/2016. Kum 1

Reshaping & Combining Tables Unit of analysis Combining. Assignment 4. Assignment 4 continued PHPM 672/677 2/21/2016. Kum 1 Reshaping & Combining Tables Unit of analysis Combining Reshaping set: concatenate tables (stack rows) merge: link tables (attach columns) proc summary: consolidate rows proc transpose: reshape table Hye-Chung

More information

Subsetting Observations from Large SAS Data Sets

Subsetting Observations from Large SAS Data Sets Subsetting Observations from Large SAS Data Sets Christopher J. Bost, MDRC, New York, NY ABSTRACT This paper reviews four techniques to subset observations from large SAS data sets: MERGE, PROC SQL, user-defined

More information

THE POWER OF PROC FORMAT

THE POWER OF PROC FORMAT THE POWER OF PROC FORMAT Jonas V. Bilenas, Chase Manhattan Bank, New York, NY ABSTRACT The FORMAT procedure in SAS is a very powerful and productive tool. Yet many beginning programmers rarely make use

More information

The SET Statement and Beyond: Uses and Abuses of the SET Statement. S. David Riba, JADE Tech, Inc., Clearwater, FL

The SET Statement and Beyond: Uses and Abuses of the SET Statement. S. David Riba, JADE Tech, Inc., Clearwater, FL The SET Statement and Beyond: Uses and Abuses of the SET Statement S. David Riba, JADE Tech, Inc., Clearwater, FL ABSTRACT The SET statement is one of the most frequently used statements in the SAS System.

More information

Storing and Using a List of Values in a Macro Variable

Storing and Using a List of Values in a Macro Variable Storing and Using a List of Values in a Macro Variable Arthur L. Carpenter California Occidental Consultants, Oceanside, California ABSTRACT When using the macro language it is not at all unusual to need

More information

Programming Tricks For Reducing Storage And Work Space Curtis A. Smith, Defense Contract Audit Agency, La Mirada, CA.

Programming Tricks For Reducing Storage And Work Space Curtis A. Smith, Defense Contract Audit Agency, La Mirada, CA. Paper 23-27 Programming Tricks For Reducing Storage And Work Space Curtis A. Smith, Defense Contract Audit Agency, La Mirada, CA. ABSTRACT Have you ever had trouble getting a SAS job to complete, although

More information

B) Mean Function: This function returns the arithmetic mean (average) and ignores the missing value. E.G: Var=MEAN (var1, var2, var3 varn);

B) Mean Function: This function returns the arithmetic mean (average) and ignores the missing value. E.G: Var=MEAN (var1, var2, var3 varn); SAS-INTERVIEW QUESTIONS 1. What SAS statements would you code to read an external raw data file to a DATA step? Ans: Infile and Input statements are used to read external raw data file to a Data Step.

More information

Paper 2917. Creating Variables: Traps and Pitfalls Olena Galligan, Clinops LLC, San Francisco, CA

Paper 2917. Creating Variables: Traps and Pitfalls Olena Galligan, Clinops LLC, San Francisco, CA Paper 2917 Creating Variables: Traps and Pitfalls Olena Galligan, Clinops LLC, San Francisco, CA ABSTRACT Creation of variables is one of the most common SAS programming tasks. However, sometimes it produces

More information

USING PROCEDURES TO CREATE SAS DATA SETS... ILLUSTRATED WITH AGE ADJUSTING OF DEATH RATES 1

USING PROCEDURES TO CREATE SAS DATA SETS... ILLUSTRATED WITH AGE ADJUSTING OF DEATH RATES 1 USING PROCEDURES TO CREATE SAS DATA SETS... ILLUSTRATED WITH AGE ADJUSTING OF DEATH RATES 1 There may be times when you run a SAS procedure when you would like to direct the procedure output to a SAS data

More information

Tales from the Help Desk 3: More Solutions for Simple SAS Mistakes Bruce Gilsen, Federal Reserve Board

Tales from the Help Desk 3: More Solutions for Simple SAS Mistakes Bruce Gilsen, Federal Reserve Board Tales from the Help Desk 3: More Solutions for Simple SAS Mistakes Bruce Gilsen, Federal Reserve Board INTRODUCTION In 20 years as a SAS consultant at the Federal Reserve Board, I have seen SAS users make

More information

You have got SASMAIL!

You have got SASMAIL! You have got SASMAIL! Rajbir Chadha, Cognizant Technology Solutions, Wilmington, DE ABSTRACT As SAS software programs become complex, processing times increase. Sitting in front of the computer, waiting

More information

FINDING ORACLE TABLE METADATA WITH PROC CONTENTS

FINDING ORACLE TABLE METADATA WITH PROC CONTENTS Finding Oracle Table Metadata: When PROC CONTENTS Is Not Enough Jeremy W. Poling, B&W Y-12 L.L.C., Oak Ridge, TN ABSTRACT Complex Oracle databases often contain hundreds of linked tables. For SAS/ACCESS

More information

Foundations & Fundamentals. A PROC SQL Primer. Matt Taylor, Carolina Analytical Consulting, LLC, Charlotte, NC

Foundations & Fundamentals. A PROC SQL Primer. Matt Taylor, Carolina Analytical Consulting, LLC, Charlotte, NC A PROC SQL Primer Matt Taylor, Carolina Analytical Consulting, LLC, Charlotte, NC ABSTRACT Most SAS programmers utilize the power of the DATA step to manipulate their datasets. However, unless they pull

More information

The Program Data Vector As an Aid to DATA step Reasoning Marianne Whitlock, Kennett Square, PA

The Program Data Vector As an Aid to DATA step Reasoning Marianne Whitlock, Kennett Square, PA PAPER IN09_05 The Program Data Vector As an Aid to DATA step Reasoning Marianne Whitlock, Kennett Square, PA ABSTRACT The SAS DATA step is easy enough for beginners to produce results quickly. You can

More information

EXST SAS Lab Lab #4: Data input and dataset modifications

EXST SAS Lab Lab #4: Data input and dataset modifications EXST SAS Lab Lab #4: Data input and dataset modifications Objectives 1. Import an EXCEL dataset. 2. Infile an external dataset (CSV file) 3. Concatenate two datasets into one 4. The PLOT statement will

More information

Generic Automated Data Dictionary for Any SAS DATA Set or Format Library Dale Harrington, Kaiser Permanente, Oakland, CA

Generic Automated Data Dictionary for Any SAS DATA Set or Format Library Dale Harrington, Kaiser Permanente, Oakland, CA Generic Automated Data Dictionary for Any SAS DATA Set or Format Library Dale Harrington, Kaiser Permanente, Oakland, CA ABSTRACT This paper explains how to create an automated data dictionary which will

More information

A Method for Cleaning Clinical Trial Analysis Data Sets

A Method for Cleaning Clinical Trial Analysis Data Sets A Method for Cleaning Clinical Trial Analysis Data Sets Carol R. Vaughn, Bridgewater Crossings, NJ ABSTRACT This paper presents a method for using SAS software to search SAS programs in selected directories

More information

Innovative Techniques and Tools to Detect Data Quality Problems

Innovative Techniques and Tools to Detect Data Quality Problems Paper DM05 Innovative Techniques and Tools to Detect Data Quality Problems Hong Qi and Allan Glaser Merck & Co., Inc., Upper Gwynnedd, PA ABSTRACT High quality data are essential for accurate and meaningful

More information

Table Lookups: From IF-THEN to Key-Indexing

Table Lookups: From IF-THEN to Key-Indexing Paper 158-26 Table Lookups: From IF-THEN to Key-Indexing Arthur L. Carpenter, California Occidental Consultants ABSTRACT One of the more commonly needed operations within SAS programming is to determine

More information

Counting the Ways to Count in SAS. Imelda C. Go, South Carolina Department of Education, Columbia, SC

Counting the Ways to Count in SAS. Imelda C. Go, South Carolina Department of Education, Columbia, SC Paper CC 14 Counting the Ways to Count in SAS Imelda C. Go, South Carolina Department of Education, Columbia, SC ABSTRACT This paper first takes the reader through a progression of ways to count in SAS.

More information

Preparing your data for analysis using SAS. Landon Sego 24 April 2003 Department of Statistics UW-Madison

Preparing your data for analysis using SAS. Landon Sego 24 April 2003 Department of Statistics UW-Madison Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison Assumptions That you have used SAS at least a few times. It doesn t matter whether you run SAS in

More information

Paper 70-27 An Introduction to SAS PROC SQL Timothy J Harrington, Venturi Partners Consulting, Waukegan, Illinois

Paper 70-27 An Introduction to SAS PROC SQL Timothy J Harrington, Venturi Partners Consulting, Waukegan, Illinois Paper 70-27 An Introduction to SAS PROC SQL Timothy J Harrington, Venturi Partners Consulting, Waukegan, Illinois Abstract This paper introduces SAS users with at least a basic understanding of SAS data

More information

PharmaSUG 2015 - Paper QT26

PharmaSUG 2015 - Paper QT26 PharmaSUG 2015 - Paper QT26 Keyboard Macros - The most magical tool you may have never heard of - You will never program the same again (It's that amazing!) Steven Black, Agility-Clinical Inc., Carlsbad,

More information

Paper 109-25 Merges and Joins Timothy J Harrington, Trilogy Consulting Corporation

Paper 109-25 Merges and Joins Timothy J Harrington, Trilogy Consulting Corporation Paper 109-25 Merges and Joins Timothy J Harrington, Trilogy Consulting Corporation Abstract This paper discusses methods of joining SAS data sets. The different methods and the reasons for choosing a particular

More information

Using SAS Views and SQL Views Lynn Palmer, State of California, Richmond, CA

Using SAS Views and SQL Views Lynn Palmer, State of California, Richmond, CA Using SAS Views and SQL Views Lynn Palmer, State of Califnia, Richmond, CA ABSTRACT Views are a way of simplifying access to your ganization s database while maintaining security. With new and easier ways

More information

Quick Start to Data Analysis with SAS Table of Contents. Chapter 1 Introduction 1. Chapter 2 SAS Programming Concepts 7

Quick Start to Data Analysis with SAS Table of Contents. Chapter 1 Introduction 1. Chapter 2 SAS Programming Concepts 7 Chapter 1 Introduction 1 SAS: The Complete Research Tool 1 Objectives 2 A Note About Syntax and Examples 2 Syntax 2 Examples 3 Organization 4 Chapter by Chapter 4 What This Book Is Not 5 Chapter 2 SAS

More information

PharmaSUG 2013 - Paper MS05

PharmaSUG 2013 - Paper MS05 PharmaSUG 2013 - Paper MS05 Be a Dead Cert for a SAS Cert How to prepare for the most important SAS Certifications in the Pharmaceutical Industry Hannes Engberg Raeder, inventiv Health Clinical, Germany

More information

Integrating Data and Business Rules with a Control Data Set in SAS

Integrating Data and Business Rules with a Control Data Set in SAS Paper 3461-2015 Integrating Data and Business Rules with a Data Set in SAS Edmond Cheng, CACI International Inc. ABSTRACT In SAS software development, data specifications and process requirements can be

More information

Paper AD11 Exceptional Exception Reports

Paper AD11 Exceptional Exception Reports Paper AD11 Exceptional Exception Reports Gary McQuown Data and Analytic Solutions Inc. http://www.dasconsultants.com Introduction This paper presents an overview of exception reports for data quality control

More information

KEYWORDS ARRAY statement, DO loop, temporary arrays, MERGE statement, Hash Objects, Big Data, Brute force Techniques, PROC PHREG

KEYWORDS ARRAY statement, DO loop, temporary arrays, MERGE statement, Hash Objects, Big Data, Brute force Techniques, PROC PHREG Paper BB-07-2014 Using Arrays to Quickly Perform Fuzzy Merge Look-ups: Case Studies in Efficiency Arthur L. Carpenter California Occidental Consultants, Anchorage, AK ABSTRACT Merging two data sets when

More information

Big Data, Fast Processing Speeds Kevin McGowan SAS Solutions on Demand, Cary NC

Big Data, Fast Processing Speeds Kevin McGowan SAS Solutions on Demand, Cary NC Big Data, Fast Processing Speeds Kevin McGowan SAS Solutions on Demand, Cary NC ABSTRACT As data sets continue to grow, it is important for programs to be written very efficiently to make sure no time

More information

C H A P T E R 1 Introducing Data Relationships, Techniques for Data Manipulation, and Access Methods

C H A P T E R 1 Introducing Data Relationships, Techniques for Data Manipulation, and Access Methods C H A P T E R 1 Introducing Data Relationships, Techniques for Data Manipulation, and Access Methods Overview 1 Determining Data Relationships 1 Understanding the Methods for Combining SAS Data Sets 3

More information

The entire SAS code for the %CHK_MISSING macro is in the Appendix. The full macro specification is listed as follows: %chk_missing(indsn=, outdsn= );

The entire SAS code for the %CHK_MISSING macro is in the Appendix. The full macro specification is listed as follows: %chk_missing(indsn=, outdsn= ); Macro Tabulating Missing Values, Leveraging SAS PROC CONTENTS Adam Chow, Health Economics Resource Center (HERC) VA Palo Alto Health Care System Department of Veterans Affairs (Menlo Park, CA) Abstract

More information

Applying Customer Attitudinal Segmentation to Improve Marketing Campaigns Wenhong Wang, Deluxe Corporation Mark Antiel, Deluxe Corporation

Applying Customer Attitudinal Segmentation to Improve Marketing Campaigns Wenhong Wang, Deluxe Corporation Mark Antiel, Deluxe Corporation Applying Customer Attitudinal Segmentation to Improve Marketing Campaigns Wenhong Wang, Deluxe Corporation Mark Antiel, Deluxe Corporation ABSTRACT Customer segmentation is fundamental for successful marketing

More information

Let the CAT Out of the Bag: String Concatenation in SAS 9 Joshua Horstman, Nested Loop Consulting, Indianapolis, IN

Let the CAT Out of the Bag: String Concatenation in SAS 9 Joshua Horstman, Nested Loop Consulting, Indianapolis, IN Paper S1-08-2013 Let the CAT Out of the Bag: String Concatenation in SAS 9 Joshua Horstman, Nested Loop Consulting, Indianapolis, IN ABSTRACT Are you still using TRIM, LEFT, and vertical bar operators

More information

Eliminating Tedium by Building Applications that Use SQL Generated SAS Code Segments

Eliminating Tedium by Building Applications that Use SQL Generated SAS Code Segments Eliminating Tedium by Building Applications that Use SQL Generated SAS Code Segments David A. Mabey, Reader s Digest Association Inc., Pleasantville, NY ABSTRACT When SAS applications are driven by data-generated

More information

Creating Dynamic Reports Using Data Exchange to Excel

Creating Dynamic Reports Using Data Exchange to Excel Creating Dynamic Reports Using Data Exchange to Excel Liping Huang Visiting Nurse Service of New York ABSTRACT The ability to generate flexible reports in Excel is in great demand. This paper illustrates

More information

Post Processing Macro in Clinical Data Reporting Niraj J. Pandya

Post Processing Macro in Clinical Data Reporting Niraj J. Pandya Post Processing Macro in Clinical Data Reporting Niraj J. Pandya ABSTRACT Post Processing is the last step of generating listings and analysis reports of clinical data reporting in pharmaceutical industry

More information

Using the Magical Keyword "INTO:" in PROC SQL

Using the Magical Keyword INTO: in PROC SQL Using the Magical Keyword "INTO:" in PROC SQL Thiru Satchi Blue Cross and Blue Shield of Massachusetts, Boston, Massachusetts Abstract INTO: host-variable in PROC SQL is a powerful tool. It simplifies

More information

More Tales from the Help Desk: Solutions for Simple SAS Mistakes Bruce Gilsen, Federal Reserve Board

More Tales from the Help Desk: Solutions for Simple SAS Mistakes Bruce Gilsen, Federal Reserve Board More Tales from the Help Desk: Solutions for Simple SAS Mistakes Bruce Gilsen, Federal Reserve Board INTRODUCTION In 20 years as a SAS consultant at the Federal Reserve Board, I have seen SAS users make

More information

Survey Analysis: Options for Missing Data

Survey Analysis: Options for Missing Data Survey Analysis: Options for Missing Data Paul Gorrell, Social & Scientific Systems, Inc., Silver Spring, MD Abstract A common situation researchers working with survey data face is the analysis of missing

More information

Efficient Techniques and Tips in Handling Large Datasets Shilong Kuang, Kelley Blue Book Inc., Irvine, CA

Efficient Techniques and Tips in Handling Large Datasets Shilong Kuang, Kelley Blue Book Inc., Irvine, CA Efficient Techniques and Tips in Handling Large Datasets Shilong Kuang, Kelley Blue Book Inc., Irvine, CA ABSTRACT When we work on millions of records, with hundreds of variables, it is crucial how we

More information

The Essentials of Finding the Distinct, Unique, and Duplicate Values in Your Data

The Essentials of Finding the Distinct, Unique, and Duplicate Values in Your Data The Essentials of Finding the Distinct, Unique, and Duplicate Values in Your Data Carter Sevick MS, DoD Center for Deployment Health Research, San Diego, CA ABSTRACT Whether by design or by error there

More information

Changing the Shape of Your Data: PROC TRANSPOSE vs. Arrays

Changing the Shape of Your Data: PROC TRANSPOSE vs. Arrays Changing the Shape of Your Data: PROC TRANSPOSE vs. Arrays Bob Virgile Robert Virgile Associates, Inc. Overview To transpose your data (turning variables into observations or turning observations into

More information

Using Macros to Automate SAS Processing Kari Richardson, SAS Institute, Cary, NC Eric Rossland, SAS Institute, Dallas, TX

Using Macros to Automate SAS Processing Kari Richardson, SAS Institute, Cary, NC Eric Rossland, SAS Institute, Dallas, TX Paper 126-29 Using Macros to Automate SAS Processing Kari Richardson, SAS Institute, Cary, NC Eric Rossland, SAS Institute, Dallas, TX ABSTRACT This hands-on workshop shows how to use the SAS Macro Facility

More information

Programming Idioms Using the SET Statement

Programming Idioms Using the SET Statement Programming Idioms Using the SET Statement Jack E. Fuller, Trilogy Consulting Corporation, Kalamazoo, MI ABSTRACT While virtually every programmer of base SAS uses the SET statement, surprisingly few programmers

More information

Introduction to Proc SQL Steven First, Systems Seminar Consultants, Madison, WI

Introduction to Proc SQL Steven First, Systems Seminar Consultants, Madison, WI Paper #HW02 Introduction to Proc SQL Steven First, Systems Seminar Consultants, Madison, WI ABSTRACT PROC SQL is a powerful Base SAS Procedure that combines the functionality of DATA and PROC steps into

More information

SAS ODS HTML + PROC Report = Fantastic Output Girish K. Narayandas, OptumInsight, Eden Prairie, MN

SAS ODS HTML + PROC Report = Fantastic Output Girish K. Narayandas, OptumInsight, Eden Prairie, MN SA118-2014 SAS ODS HTML + PROC Report = Fantastic Output Girish K. Narayandas, OptumInsight, Eden Prairie, MN ABSTRACT ODS (Output Delivery System) is a wonderful feature in SAS to create consistent, presentable

More information

Paper D10 2009. Ranking Predictors in Logistic Regression. Doug Thompson, Assurant Health, Milwaukee, WI

Paper D10 2009. Ranking Predictors in Logistic Regression. Doug Thompson, Assurant Health, Milwaukee, WI Paper D10 2009 Ranking Predictors in Logistic Regression Doug Thompson, Assurant Health, Milwaukee, WI ABSTRACT There is little consensus on how best to rank predictors in logistic regression. This paper

More information

Using Pharmacovigilance Reporting System to Generate Ad-hoc Reports

Using Pharmacovigilance Reporting System to Generate Ad-hoc Reports Using Pharmacovigilance Reporting System to Generate Ad-hoc Reports Jeff Cai, Amylin Pharmaceuticals, Inc., San Diego, CA Jay Zhou, Amylin Pharmaceuticals, Inc., San Diego, CA ABSTRACT To supplement Oracle

More information

Clinical Trial Data Integration: The Strategy, Benefits, and Logistics of Integrating Across a Compound

Clinical Trial Data Integration: The Strategy, Benefits, and Logistics of Integrating Across a Compound PharmaSUG 2014 - Paper AD21 Clinical Trial Data Integration: The Strategy, Benefits, and Logistics of Integrating Across a Compound ABSTRACT Natalie Reynolds, Eli Lilly and Company, Indianapolis, IN Keith

More information

Using SAS/IntrNet as a Web-Enabled Platform for Clinical Reporting

Using SAS/IntrNet as a Web-Enabled Platform for Clinical Reporting Paper AD09 Using SAS/IntrNet as a Web-Enabled Platform for Clinical Reporting ABSTRACT Paul Gilbert, DataCeutics, Inc., Pottstown, PA Steve Light, DataCeutics, Inc., Pottstown, PA Gregory Weber, DataCeutics,

More information

MWSUG 2011 - Paper S111

MWSUG 2011 - Paper S111 MWSUG 2011 - Paper S111 Dealing with Duplicates in Your Data Joshua M. Horstman, First Phase Consulting, Inc., Indianapolis IN Roger D. Muller, First Phase Consulting, Inc., Carmel IN Abstract As SAS programmers,

More information

Transferring vs. Transporting Between SAS Operating Environments Mimi Lou, Medical College of Georgia, Augusta, GA

Transferring vs. Transporting Between SAS Operating Environments Mimi Lou, Medical College of Georgia, Augusta, GA CC13 Transferring vs. Transporting Between SAS Operating Environments Mimi Lou, Medical College of Georgia, Augusta, GA ABSTRACT Prior to SAS version 8, permanent SAS data sets cannot be moved directly

More information

A Macro to Create Data Definition Documents

A Macro to Create Data Definition Documents A Macro to Create Data Definition Documents Aileen L. Yam, sanofi-aventis Inc., Bridgewater, NJ ABSTRACT Data Definition documents are one of the requirements for NDA submissions. This paper contains a

More information

Instant Interactive SAS Log Window Analyzer

Instant Interactive SAS Log Window Analyzer ABSTRACT Paper 10240-2016 Instant Interactive SAS Log Window Analyzer Palanisamy Mohan, ICON Clinical Research India Pvt Ltd Amarnath Vijayarangan, Emmes Services Pvt Ltd, India An interactive SAS environment

More information

DBF Chapter. Note to UNIX and OS/390 Users. Import/Export Facility CHAPTER 7

DBF Chapter. Note to UNIX and OS/390 Users. Import/Export Facility CHAPTER 7 97 CHAPTER 7 DBF Chapter Note to UNIX and OS/390 Users 97 Import/Export Facility 97 Understanding DBF Essentials 98 DBF Files 98 DBF File Naming Conventions 99 DBF File Data Types 99 ACCESS Procedure Data

More information

Report Customization Using PROC REPORT Procedure Shruthi Amruthnath, EPITEC, INC., Southfield, MI

Report Customization Using PROC REPORT Procedure Shruthi Amruthnath, EPITEC, INC., Southfield, MI Paper SA12-2014 Report Customization Using PROC REPORT Procedure Shruthi Amruthnath, EPITEC, INC., Southfield, MI ABSTRACT SAS offers powerful report writing tools to generate customized reports. PROC

More information

Methodologies for Converting Microsoft Excel Spreadsheets to SAS datasets

Methodologies for Converting Microsoft Excel Spreadsheets to SAS datasets Methodologies for Converting Microsoft Excel Spreadsheets to SAS datasets Karin LaPann ViroPharma Incorporated ABSTRACT Much functionality has been added to the SAS to Excel procedures in SAS version 9.

More information

SAS Views The Best of Both Worlds

SAS Views The Best of Both Worlds Paper 026-2010 SAS Views The Best of Both Worlds As seasoned SAS programmers, we have written and reviewed many SAS programs in our careers. What I have noticed is that more often than not, people who

More information

AN INTRODUCTION TO MACRO VARIABLES AND MACRO PROGRAMS Mike S. Zdeb, New York State Department of Health

AN INTRODUCTION TO MACRO VARIABLES AND MACRO PROGRAMS Mike S. Zdeb, New York State Department of Health AN INTRODUCTION TO MACRO VARIABLES AND MACRO PROGRAMS Mike S. Zdeb, New York State Department of Health INTRODUCTION There are a number of SAS tools that you may never have to use. Why? The main reason

More information

Introduction to SAS Business Intelligence/Enterprise Guide Alex Dmitrienko, Ph.D., Eli Lilly and Company, Indianapolis, IN

Introduction to SAS Business Intelligence/Enterprise Guide Alex Dmitrienko, Ph.D., Eli Lilly and Company, Indianapolis, IN Paper TS600 Introduction to SAS Business Intelligence/Enterprise Guide Alex Dmitrienko, Ph.D., Eli Lilly and Company, Indianapolis, IN ABSTRACT This paper provides an overview of new SAS Business Intelligence

More information

Preserving Line Breaks When Exporting to Excel Nelson Lee, Genentech, South San Francisco, CA

Preserving Line Breaks When Exporting to Excel Nelson Lee, Genentech, South San Francisco, CA PharmaSUG 2014 Paper CC07 Preserving Line Breaks When Exporting to Excel Nelson Lee, Genentech, South San Francisco, CA ABSTRACT Do you have imported data with line breaks and want to export the data to

More information

Using SAS to Create Graphs with Pop-up Functions Shiqun (Stan) Li, Minimax Information Services, NJ Wei Zhou, Lilly USA LLC, IN

Using SAS to Create Graphs with Pop-up Functions Shiqun (Stan) Li, Minimax Information Services, NJ Wei Zhou, Lilly USA LLC, IN Paper CC12 Using SAS to Create Graphs with Pop-up Functions Shiqun (Stan) Li, Minimax Information Services, NJ Wei Zhou, Lilly USA LLC, IN ABSTRACT In addition to the static graph features, SAS provides

More information

Katie Minten Ronk, Steve First, David Beam Systems Seminar Consultants, Inc., Madison, WI

Katie Minten Ronk, Steve First, David Beam Systems Seminar Consultants, Inc., Madison, WI Paper 191-27 AN INTRODUCTION TO PROC SQL Katie Minten Ronk, Steve First, David Beam Systems Seminar Consultants, Inc., Madison, WI ABSTRACT PROC SQL is a powerful Base SAS 7 Procedure that combines the

More information

Parallel Data Preparation with the DS2 Programming Language

Parallel Data Preparation with the DS2 Programming Language ABSTRACT Paper SAS329-2014 Parallel Data Preparation with the DS2 Programming Language Jason Secosky and Robert Ray, SAS Institute Inc., Cary, NC and Greg Otto, Teradata Corporation, Dayton, OH A time-consuming

More information

Christianna S. Williams, University of North Carolina at Chapel Hill, Chapel Hill, NC

Christianna S. Williams, University of North Carolina at Chapel Hill, Chapel Hill, NC Christianna S. Williams, University of North Carolina at Chapel Hill, Chapel Hill, NC ABSTRACT Have you used PROC MEANS or PROC SUMMARY and wished there was something intermediate between the NWAY option

More information

Emailing Automated Notification of Errors in a Batch SAS Program Julie Kilburn, City of Hope, Duarte, CA Rebecca Ottesen, City of Hope, Duarte, CA

Emailing Automated Notification of Errors in a Batch SAS Program Julie Kilburn, City of Hope, Duarte, CA Rebecca Ottesen, City of Hope, Duarte, CA Emailing Automated Notification of Errors in a Batch SAS Program Julie Kilburn, City of Hope, Duarte, CA Rebecca Ottesen, City of Hope, Duarte, CA ABSTRACT With multiple programmers contributing to a batch

More information

What You re Missing About Missing Values

What You re Missing About Missing Values Paper 1440-2014 What You re Missing About Missing Values Christopher J. Bost, MDRC, New York, NY ABSTRACT Do you know everything you need to know about missing values? Do you know how to assign a missing

More information

PO-18 Array, Hurray, Array; Consolidate or Expand Your Input Data Stream Using Arrays

PO-18 Array, Hurray, Array; Consolidate or Expand Your Input Data Stream Using Arrays Array, Hurray, Array; Consolidate or Expand Your Input Data Stream Using Arrays, continued SESUG 2012 PO-18 Array, Hurray, Array; Consolidate or Expand Your Input Data Stream Using Arrays William E Benjamin

More information

Importing Excel File using Microsoft Access in SAS Ajay Gupta, PPD Inc, Morrisville, NC

Importing Excel File using Microsoft Access in SAS Ajay Gupta, PPD Inc, Morrisville, NC ABSTRACT PharmaSUG 2012 - Paper CC07 Importing Excel File using Microsoft Access in SAS Ajay Gupta, PPD Inc, Morrisville, NC In Pharmaceuticals/CRO industries, Excel files are widely use for data storage.

More information

Data Presentation. Paper 126-27. Using SAS Macros to Create Automated Excel Reports Containing Tables, Charts and Graphs

Data Presentation. Paper 126-27. Using SAS Macros to Create Automated Excel Reports Containing Tables, Charts and Graphs Paper 126-27 Using SAS Macros to Create Automated Excel Reports Containing Tables, Charts and Graphs Tugluke Abdurazak Abt Associates Inc. 1110 Vermont Avenue N.W. Suite 610 Washington D.C. 20005-3522

More information

Handling Missing Values in the SQL Procedure

Handling Missing Values in the SQL Procedure Handling Missing Values in the SQL Procedure Danbo Yi, Abt Associates Inc., Cambridge, MA Lei Zhang, Domain Solutions Corp., Cambridge, MA ABSTRACT PROC SQL as a powerful database management tool provides

More information

Managing Tables in Microsoft SQL Server using SAS

Managing Tables in Microsoft SQL Server using SAS Managing Tables in Microsoft SQL Server using SAS Jason Chen, Kaiser Permanente, San Diego, CA Jon Javines, Kaiser Permanente, San Diego, CA Alan L Schepps, M.S., Kaiser Permanente, San Diego, CA Yuexin

More information

ABSTRACT INTRODUCTION %CODE MACRO DEFINITION

ABSTRACT INTRODUCTION %CODE MACRO DEFINITION Generating Web Application Code for Existing HTML Forms Don Boudreaux, PhD, SAS Institute Inc., Austin, TX Keith Cranford, Office of the Attorney General, Austin, TX ABSTRACT SAS Web Applications typically

More information

Preparing Real World Data in Excel Sheets for Statistical Analysis

Preparing Real World Data in Excel Sheets for Statistical Analysis Paper DM03 Preparing Real World Data in Excel Sheets for Statistical Analysis Volker Harm, Bayer Schering Pharma AG, Berlin, Germany ABSTRACT This paper collects a set of techniques of importing Excel

More information

ABSTRACT THE ISSUE AT HAND THE RECIPE FOR BUILDING THE SYSTEM THE TEAM REQUIREMENTS. Paper DM09-2012

ABSTRACT THE ISSUE AT HAND THE RECIPE FOR BUILDING THE SYSTEM THE TEAM REQUIREMENTS. Paper DM09-2012 Paper DM09-2012 A Basic Recipe for Building a Campaign Management System from Scratch: How Base SAS, SQL Server and Access can Blend Together Tera Olson, Aimia Proprietary Loyalty U.S. Inc., Minneapolis,

More information

E-Mail OS/390 SAS/MXG Computer Performance Reports in HTML Format

E-Mail OS/390 SAS/MXG Computer Performance Reports in HTML Format SAS Users Group International (SUGI29) May 9-12,2004 Montreal, Canada E-Mail OS/390 SAS/MXG Computer Performance Reports in HTML Format ABSTRACT Neal Musitano Jr Department of Veterans Affairs Information

More information

SAS Rule-Based Codebook Generation for Exploratory Data Analysis Ross Bettinger, Senior Analytical Consultant, Seattle, WA

SAS Rule-Based Codebook Generation for Exploratory Data Analysis Ross Bettinger, Senior Analytical Consultant, Seattle, WA SAS Rule-Based Codebook Generation for Exploratory Data Analysis Ross Bettinger, Senior Analytical Consultant, Seattle, WA ABSTRACT A codebook is a summary of a collection of data that reports significant

More information

SAS PROGRAM EFFICIENCY FOR BEGINNERS. Bruce Gilsen, Federal Reserve Board

SAS PROGRAM EFFICIENCY FOR BEGINNERS. Bruce Gilsen, Federal Reserve Board SAS PROGRAM EFFICIENCY FOR BEGINNERS Bruce Gilsen, Federal Reserve Board INTRODUCTION This paper presents simple efficiency techniques that can benefit inexperienced SAS software users on all platforms.

More information

Nine Steps to Get Started using SAS Macros

Nine Steps to Get Started using SAS Macros Paper 56-28 Nine Steps to Get Started using SAS Macros Jane Stroupe, SAS Institute, Chicago, IL ABSTRACT Have you ever heard your coworkers rave about macros? If so, you've probably wondered what all the

More information

Macros from Beginning to Mend A Simple and Practical Approach to the SAS Macro Facility

Macros from Beginning to Mend A Simple and Practical Approach to the SAS Macro Facility Macros from Beginning to Mend A Simple and Practical Approach to the SAS Macro Facility Michael G. Sadof, MGS Associates, Inc., Bethesda, MD. ABSTRACT The macro facility is an important feature of the

More information

Top Ten SAS Performance Tuning Techniques

Top Ten SAS Performance Tuning Techniques Paper AD39 Top Ten SAS Performance Tuning Techniques Kirk Paul Lafler, Software Intelligence Corporation, Spring Valley, California Abstract The Base-SAS software provides users with many choices for accessing,

More information

Managing very large EXCEL files using the XLS engine John H. Adams, Boehringer Ingelheim Pharmaceutical, Inc., Ridgefield, CT

Managing very large EXCEL files using the XLS engine John H. Adams, Boehringer Ingelheim Pharmaceutical, Inc., Ridgefield, CT Paper AD01 Managing very large EXCEL files using the XLS engine John H. Adams, Boehringer Ingelheim Pharmaceutical, Inc., Ridgefield, CT ABSTRACT The use of EXCEL spreadsheets is very common in SAS applications,

More information

Coding for Posterity

Coding for Posterity Coding for Posterity Rick Aster Some programs will still be in use years after they are originally written; others will be discarded. It is not how well a program achieves its original purpose that makes

More information

Fun with PROC SQL Darryl Putnam, CACI Inc., Stevensville MD

Fun with PROC SQL Darryl Putnam, CACI Inc., Stevensville MD NESUG 2012 Fun with PROC SQL Darryl Putnam, CACI Inc., Stevensville MD ABSTRACT PROC SQL is a powerful yet still overlooked tool within our SAS arsenal. PROC SQL can create tables, sort and summarize data,

More information

Let SAS Modify Your Excel File Nelson Lee, Genentech, South San Francisco, CA

Let SAS Modify Your Excel File Nelson Lee, Genentech, South San Francisco, CA ABSTRACT PharmaSUG 2015 - Paper QT12 Let SAS Modify Your Excel File Nelson Lee, Genentech, South San Francisco, CA It is common to export SAS data to Excel by creating a new Excel file. However, there

More information

Automation of Large SAS Processes with Email and Text Message Notification Seva Kumar, JPMorgan Chase, Seattle, WA

Automation of Large SAS Processes with Email and Text Message Notification Seva Kumar, JPMorgan Chase, Seattle, WA Automation of Large SAS Processes with Email and Text Message Notification Seva Kumar, JPMorgan Chase, Seattle, WA ABSTRACT SAS includes powerful features in the Linux SAS server environment. While creating

More information

Taming the PROC TRANSPOSE

Taming the PROC TRANSPOSE Taming the PROC TRANSPOSE Matt Taylor, Carolina Analytical Consulting, LLC ABSTRACT The PROC TRANSPOSE is often misunderstood and seldom used. SAS users are unsure of the results it will give and curious

More information

Technical Paper. Migrating a SAS Deployment to Microsoft Windows x64

Technical Paper. Migrating a SAS Deployment to Microsoft Windows x64 Technical Paper Migrating a SAS Deployment to Microsoft Windows x64 Table of Contents Abstract... 1 Introduction... 1 Why Upgrade to 64-Bit SAS?... 1 Standard Upgrade and Migration Tasks... 2 Special

More information

Intro to Longitudinal Data: A Grad Student How-To Paper Elisa L. Priest 1,2, Ashley W. Collinsworth 1,3 1

Intro to Longitudinal Data: A Grad Student How-To Paper Elisa L. Priest 1,2, Ashley W. Collinsworth 1,3 1 Intro to Longitudinal Data: A Grad Student How-To Paper Elisa L. Priest 1,2, Ashley W. Collinsworth 1,3 1 Institute for Health Care Research and Improvement, Baylor Health Care System 2 University of North

More information

Writing cleaner and more powerful SAS code using macros. Patrick Breheny

Writing cleaner and more powerful SAS code using macros. Patrick Breheny Writing cleaner and more powerful SAS code using macros Patrick Breheny Why Use Macros? Macros automatically generate SAS code Macros allow you to make more dynamic, complex, and generalizable SAS programs

More information

Combining SAS LIBNAME and VBA Macro to Import Excel file in an Intriguing, Efficient way Ajay Gupta, PPD Inc, Morrisville, NC

Combining SAS LIBNAME and VBA Macro to Import Excel file in an Intriguing, Efficient way Ajay Gupta, PPD Inc, Morrisville, NC ABSTRACT PharmaSUG 2013 - Paper CC11 Combining SAS LIBNAME and VBA Macro to Import Excel file in an Intriguing, Efficient way Ajay Gupta, PPD Inc, Morrisville, NC There are different methods such PROC

More information

Using SAS With a SQL Server Database. M. Rita Thissen, Yan Chen Tang, Elizabeth Heath RTI International, RTP, NC

Using SAS With a SQL Server Database. M. Rita Thissen, Yan Chen Tang, Elizabeth Heath RTI International, RTP, NC Using SAS With a SQL Server Database M. Rita Thissen, Yan Chen Tang, Elizabeth Heath RTI International, RTP, NC ABSTRACT Many operations now store data in relational databases. You may want to use SAS

More information

HELP! - My MERGE Statement Has More Than One Data Set With Repeats of BY Values!

HELP! - My MERGE Statement Has More Than One Data Set With Repeats of BY Values! HELP! - My MERGE Statement Has More Than One Data Set With Repeats of BY Values! Andrew T. Kuligowski, The Nielsen Company ABSTRACT Most users of the SAS system have encountered the following message:

More information

Search and Replace in SAS Data Sets thru GUI

Search and Replace in SAS Data Sets thru GUI Search and Replace in SAS Data Sets thru GUI Edmond Cheng, Bureau of Labor Statistics, Washington, DC ABSTRACT In managing data with SAS /BASE software, performing a search and replace is not a straight

More information

The Art of Designing HOLAP Databases Mark Moorman, SAS Institute Inc., Cary NC

The Art of Designing HOLAP Databases Mark Moorman, SAS Institute Inc., Cary NC Paper 139 The Art of Designing HOLAP Databases Mark Moorman, SAS Institute Inc., Cary NC ABSTRACT While OLAP applications offer users fast access to information across business dimensions, it can also

More information