Coders' Corner. Paper 81-26
|
|
|
- Preston McLaughlin
- 10 years ago
- Views:
Transcription
1 Paper Automating the Process of Listing the Most Frequent Values of Thousands of Variables in Large Datasets Haiping Luo, Dept. of Veterans Affairs, Washington, DC Philip Friend, Dept. of Agriculture, Washington, DC ABSTRACT Listing variable values with high frequency for large datasets which have thousands of variables can be a time-consuming process. Using common procedures like 'proc freq' and 'proc tabulate' involves individually identifying possible value range, data discretion and/or class group before running the procedure. This macro automates the process of listing the most frequent values of all or selected variables in a dataset. The value list can be tailored to reflect only those values with desirable frequencies and percentages, in the selected value ranges, and by certain groups. The resulting value list can serve as a useful reference to dataset users. A CASE CALLING FOR THE NEED In a process of recoding datasets from the Agriculture Resources Management Survey, we needed to know the range and the most frequent values of more than one thousand variables. There are some SAS procedures that can accomplish this task, but all of them require efforts to look into each target variable's features before running the procedure. 'Proc freq' can list the frequency count and percentage for each value of the target variable. But if the values are not very discrete, the 'freq' output table can be huge. Also this procedure can only output one variable s frequency count to a file. 'Proc tabulate' can list the value and its frequency count across variables. But it is suitable only to discrete variables with only a few values. Proc univariate, proc summary both work only for numeric variables. While these procedures list values, frequencies, and percentages variable by variable, extra coding efforts are needed to pull the output of each variable together into an integrated list for easy reference for a large dataset. When the datasets in hand are large with thousands of variables, the coding efforts can be tedious. This case triggered the development of macro, aimed at automating the process of listing target variables' most frequent values for datasets of any size. DESIGN The core process of has three steps: 1. Determine target variables and user-defined conditions; 2. Process the numeric variables - list their names, most frequent values, frequency counts, percentages, means, maximums, and minimums; 3. Process the character variables - list their names, most frequent values, frequency counts, and percentages. One twist to this seemingly simple process is to allow the user to run batch by setting parameters in the calling code or to run interactively by inputting parameter values in an input window. Another twist is to allow the flexibility of selecting all or part of the variables, selecting value ranges, determining by groups, and determining frequency and percentage filtering criteria. A third twist is to append each variable's output value list to a comprehensive list which includes all target variables and their value statistics. DETAIL OF THE CODE 1. COLLECT USER INPUT Macro needs these user inputs: input and output libnames and their physical paths; input and output datasets' names; and input variable filtering and sorting criteria. Two input methods are incorporated into the code: one is input by submitting parameter values in the ( ) string; the other is input by typing in parameter values in the pop up windows. The string method allows batch processing, while the pop-up-window method allows interactive processing. To allow the user to choose the run method, a macro variable screen is defined to represent the batch or interactive mode. An %if structure detects the value of &screen. When &screen is 1, a display macro will display pop-up windows to collect user s input, otherwise the parameter values in the () call will be used to run the macro: %macro getfreq(screen=1, ) *** starts do window conditioning on &screen; %if &screen=1 %then %do; ** define window; %WINDOW first color=cyan ; ** define a display macro; %MACRO INTRFACE; %DISPLAY first.fileinfo; %DISPLAY first.varinfo blank; %MEND INTRFACE; ** run the display macro %intrface; %INTRFACE; *** ends do window; After the input selection portion, the main code of the is wrapped in a nested macro called %freqmain. The contents of %freqmain are described in the following sections 2 through DETERMINE VARIABLE NUMBERS BY TYPE To determine how many numeric and character variables are in the user defined dataset, several data steps and procedures are used, together with %if structures to insert code conditionally: *** check variable type and collect variable numbers; data _inall; ** read in dataset with its conditions; set %IF &PATHIN NE %THEN %STR("&PATHIN.&indat"); %ELSE %STR(&LIBIN..&INDAT); (%if &varlist ne %then %do; keep=&varlist &byvar %if (&filtlgic) ne %then %do; where=(&filtlgic) ) end=lastobs; if lastobs then do; n=_n_; ** collect total observations; call symput('dataobs',n); drop n; ** sort if by group is defined; %LET byproc = %str(bygroup="&byvar"); %LET bylab = bygroup="frequency by This Group (bygroup)"; proc sort data=_inall; by &byvar;
2 proc format; value tp 1 = "Number" 2 = "Char"; proc format; value tpvar 1 = "No. of Numeric Variables" 2 = "No. of Character Variables"; ** keep only the type for each variable; proc contents noprint data=_inall out=_intype(keep=type); ** list the count of variables by type; proc freq data=_intype; tables type / list out=_intpsum; format type tpvar.; data _null_; set _intpsum; ** check if there are numeric and/or charactor variables; if type=1 then do; ** collect numeric var number; call symput('nvnumber',count); else if type=2 then do; ** collect charactor var number; call symput('cvnumber',count); 3. PROCESS NUMERIC VARIABLES If there are numeric variables in the defined dataset, a %do loop is used to process each numeric variable to get its value list and each value's frequency and percentage. The variable s maximum, minimum and mean values by the defined by group are also listed: %if %eval(&nvnumber)>0 %then %do; *** get numeric variables; title "Numeric Variables"; data _in2; set _inall; keep _numeric_ &byvar; ** get the name, of the numeric variabls; data _in4; set _in2(keep=_numeric_); proc contents noprint data=_in4 out=_in3(keep=name type varnum nobs); ** print a list of all numeric variables to run frequency for; title2 "Dataset &indat - variable information: number of selected numeric variables=&nvnumber"; proc print data=_in3; format type tp.; title2 "First 3 observations of the selected numeric variables in dataset &indat"; proc print data=_in4(obs=3) ; footnote; ** go through each variable to get frequency; %do i=1 %to &nvnumber; * starts do i; ** determine the variable name; data _null_; set _in3; if varnum = &i then do; call symput('nname',name); data _2n; set _in2(keep=&nname &byvar); length value 8 varname $ 32; ** assign the value of the variable to a common variable; value = &nname; ** assign the variable name as text to a common variable; varname = "&nname"; value = "Value of Frequecy Variable (value)" varname = "Frequency Variable (varname)"; The critical steps in the above code are to put the value of the processed variable into a common variable value, and the name of the processed variable into another common variable varname. These enable the frequency calculation and the output appendance to focus on these two common variables instead of individual input variables. ** get frequency for each value of the frequency variable; ** keep only those with frequency count > &dropfrq in the &byvar group; proc freq data=_2n order=freq noprint; tables varname*value / list out=_2out(where=(count gt &dropfrq)); by &byvar; ** get group mean, max, min, total obs number for the frequency variable; proc means data=_2n (keep=&nname &byvar) noprint; var &nname; by &byvar; output out=_2mean mean=mean max=max min=min n=totalobs nmiss=missobs; data _2mean; set _2mean(drop=_type freq_); length varname $ 32; varname = "&nname"; mean = "Group Mean of This Variable (mean)" max = "Group Maximum (max)" min = "Group Minimum (min)" totalobs = "Non-missing Obs in This By Group (totalobs)" missobs = "Missing Obs in This By Group (missobs)" varname = "Frequency Variable (varname)"; ** merge group means, etc. back to freq list; data _2out; merge _2out(in=a) _2mean; by varname &byvar ; if a; 2
3 &byproc; &bylab; ** create or append frequency list; %if &i=1 %then %do; data OUD.&outdat.n; set _2out; %else %do; proc datasets ; append base=oud.&outdat.n data=_2out (where=(percent gt &droppct or percent =.)); delete _2n _2out; quit; *end do i; ** print frequency list; title1 "&nvnumber Numeric variables (Dataset observations = &dataobs)"; title2 "Highest (>&droppct%) frequent (count > &dropfrq) values (by &byvar)"; title3 "of numeric variables in dataset &&indat (this output was saved as &pathout.&outdat.n)"; proc sort data=oud.&outdat.n; by varname %if &byvar ne %then %str(&byvar;); %else %str(;); proc print data=oud.&outdat.n ; by varname; var bygroup &byvar value count percent mean max min totalobs missobs; sum count; sumby varname; footnote; title; *** ends numeric variable process; 4. PROCESS CHARACTER VARIABLES The procedure for getting character variables frequent value list is similar to that for the numeric variables, except that there is no calculation of group means, maximums, and minimums. The following code presents in both the numeric and the character portions of the program, but it is more critical to the character portion. By default, SAS numeric variables have a length of eight bytes, while a character variable s length is either defined explicitly by the user or implicitly by the length of the first value ever read in. To prevent the common output variable value being truncated to the length of the first character variable in the input sequence, the following code obtains the maximal length of all character variables; the maximal length is then used to set the length of the common output variable value. ** get the name,, length, and number of observations of the character variables; data _ic4; set _ic2(keep=_character_); proc contents noprint data=_ic4 out=_ic3(keep=name type varnum nobs length); ** get the longest charactor value to set the length of the value variable; proc means data=_ic3(keep=length) noprint; output out=_ic32 max=max; data _null_; set _ic32; call symput('maxlen',max);... ** go through each variable to get frequency; %do i=1 %to &cvnumber; * starts do i; ** determine the variable name; data _null_; set _ic3; if varnum = &i then do; call symput('cname',name); data _2c; set _ic2(keep=&cname &byvar); ** set the value to the maximal length; length value $ &maxlen varname $ 32; ** assign the value of this variable; value = &cname; ** assign the variable name as text to varname; varname = "&cname"; value = "Value of Frequecy Variable (value)" varname = "Frequency Variable (varname)" ;... *end do i; After processing the character variables and performing some housekeeping chores, the definition of the %freqmain sub macro ends. The parent macro then calls %freqmain, and itself is closed by the %mend getfreq; statement. USAGE %Getfreq has been tested under SAS for Windows versions 6.12, 7, 8, and for IBM MVS/TSO version The following code can call on the Windows platform: /* put this option at the beginning of your code*/ options sasautos="path-to-this-macrodirectory"; /*assume popup windows, works for V8*/ /*assume popup windows, works for V6.12, V7*/ () /*no windows, take all default values defined in the macro code*/ (screen=0) /*no windows, run the test dataset work.a */ (screen=0, pathin=, libin=work, indat=a, pathout=d:\temp\, outdat=getfreq, varlist=%str(x y datet ), dropfrq=1, droppct=0, filtlgic=%str(x>0 or y ne. ), byvar=%str(studname) ) 3
4 There is a detail explanation of the parameters in the code file. To control the frequency output, the most important parameters are &droppct and &dropfrq. Setting dropfrq=1 will drop all the values which only occur once. This will prevent the continuous value being included in the output file. For a dataset with millions of observations, only drop frequency count equals to one may not be enough. The &droppct parameter allows the values with low frequency percentage being dropped from the output. For example, droppct=5 will keep only those values which s frequencies are higher than 5 percent of the total observations in the calculation group. %Getfreq generates one dataset for numeric variables and one for character variables. The dataset(s) contains variables name, frequent values, the frequency count and the percentage of each value, and some group statistics if it is a numeric variable. The datasets are saved in the &pathout directory as getfreqn.sas7bdat for numeric variables and getfreqc.sas7bdat for character variables. The file extension assumes that the code was run under SAS version 7 or 8. %Getfreq also generates the following lists at the output window for numeric and/or character variables: Number of variables; Variable names and s; The first 3 observations; The most frequent values for each variable. The following is a sample output generated to the output window: A 3 Numeric variables (Dataset observations = 24) Highest (>0%) frequent (count > 1) values (by studname) of numeric variables in dataset a (this output was saved as d:\temp\getfreqn.*) Frequency Variable (varname)=x B Value of Frequency by Frequecy Percent of This Group Student name Variable Frequency Total (bygroup) (studname) (value) Count Frequency studname Alex Dickins studname John Graves studname Larry Lin studname Linda Brooks studname Lynne Miller studname Mitch Tramp studname Susan Williams VARNAME 15 (print out continues) Group Mean Non-missing Missing Obs of This Group Group Obs in This in This By Variable Maximum Minimum By Group Group (mean) (max) (min) (totalobs) (missobs) VARNAME C Input and output PATHS: or work; d:\temp\ Input dataset: a(where=( ) keep=( )) Output datasets: numeric freq - d:\temp\getfreqn, char. freq - d:\temp\getfreqc, sorted by studname, where=((count > 1) and (percent > 0)) Part A in the above list is a title. The title indicates that: the dataset name is a; there are 24 observations in it; there are 3 numeric variables in dataset a; this portion of the list is for variable x (frequency variable=x); this list displays the value of x which have frequency count more than 1 (count>1); the frequency is run by a variable studname ; the output file is saved as d:\temp\getfreqn.*. Part B is the contents of the list. There are 10 columns in the table. We can read the table this way: under the group variable(studname) s first value, Alex Dickins, the frequency variable x has a most frequent value, 0, which occurred twice, and accounted for percent of all values of x under this student name Alex Dickins ; the mean value of x under Alex Dickins is 1, while the maximum is 3 and the minimum is 0. The last two columns show the total number of non-missing and missing observations for the group calculation. There is a summation by varname at the end of the table under column 4 Frequency Count. The value 15 indicates that 15 out of 24 total observations from the dataset are included in this most frequent value list for this variable x. Part C is a footnote for the list. It reports the input and output file paths, libnames, dataset names, variable filtering criteria, etc. The mainframe version of %Getfreq does not contain the input window and the output library libname OUD... statement. Assuming that a JCL statement defines the output files like this: //OUD DD DSN=USER1.SAS.DATA.GETFREQ,... The calling statement for the test dataset a is: (pathin=, libin=work, indat=a, pathout=oud, outdat=getfreq, varlist=, filtlgic=, dropfrq=1,droppct=0, byvar=studname ); The mainframe code also limited the line length to 80 characters, moved / from the first column to avoid confusion with JCL, and removed quotation marks from comment area to prevent error in reading. The output of the mainframe version of are the same as the Windows version. CONCLUSION Obtaining most frequent values of many variables in large datasets can be a tedious task. This macro automates the process and saves the frequent values to files. This macro allows a user to tailor the results to include all or part of variables, to report only the values with desirable frequencies and percentages, and to run and group results by certain variables. The resulting frequent value list can serve as a useful reference for dataset users. To obtain a complete copy of the most current version of, please go to these links: Further development of will focus on improving efficiency, allowing more options, testing for other platforms, and making the execution more robust. 4
5 CONTACT INFORMATION Your comments and questions are valued and encouraged. Please contact the authors at: Author Names Haiping Luo Philip Friend Code Web URL SAS Discussion Board SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other Countries. indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies. 5
Data Presentation. Paper 126-27. Using SAS Macros to Create Automated Excel Reports Containing Tables, Charts and Graphs
Paper 126-27 Using SAS Macros to Create Automated Excel Reports Containing Tables, Charts and Graphs Tugluke Abdurazak Abt Associates Inc. 1110 Vermont Avenue N.W. Suite 610 Washington D.C. 20005-3522
How to Reduce the Disk Space Required by a SAS Data Set
How to Reduce the Disk Space Required by a SAS Data Set Selvaratnam Sridharma, U.S. Census Bureau, Washington, DC ABSTRACT SAS datasets can be large and disk space can often be at a premium. In this paper,
Search and Replace in SAS Data Sets thru GUI
Search and Replace in SAS Data Sets thru GUI Edmond Cheng, Bureau of Labor Statistics, Washington, DC ABSTRACT In managing data with SAS /BASE software, performing a search and replace is not a straight
Emailing Automated Notification of Errors in a Batch SAS Program Julie Kilburn, City of Hope, Duarte, CA Rebecca Ottesen, City of Hope, Duarte, CA
Emailing Automated Notification of Errors in a Batch SAS Program Julie Kilburn, City of Hope, Duarte, CA Rebecca Ottesen, City of Hope, Duarte, CA ABSTRACT With multiple programmers contributing to a batch
More Tales from the Help Desk: Solutions for Simple SAS Mistakes Bruce Gilsen, Federal Reserve Board
More Tales from the Help Desk: Solutions for Simple SAS Mistakes Bruce Gilsen, Federal Reserve Board INTRODUCTION In 20 years as a SAS consultant at the Federal Reserve Board, I have seen SAS users make
A Method for Cleaning Clinical Trial Analysis Data Sets
A Method for Cleaning Clinical Trial Analysis Data Sets Carol R. Vaughn, Bridgewater Crossings, NJ ABSTRACT This paper presents a method for using SAS software to search SAS programs in selected directories
Creating Dynamic Reports Using Data Exchange to Excel
Creating Dynamic Reports Using Data Exchange to Excel Liping Huang Visiting Nurse Service of New York ABSTRACT The ability to generate flexible reports in Excel is in great demand. This paper illustrates
A Macro to Create Data Definition Documents
A Macro to Create Data Definition Documents Aileen L. Yam, sanofi-aventis Inc., Bridgewater, NJ ABSTRACT Data Definition documents are one of the requirements for NDA submissions. This paper contains a
Post Processing Macro in Clinical Data Reporting Niraj J. Pandya
Post Processing Macro in Clinical Data Reporting Niraj J. Pandya ABSTRACT Post Processing is the last step of generating listings and analysis reports of clinical data reporting in pharmaceutical industry
Managing Tables in Microsoft SQL Server using SAS
Managing Tables in Microsoft SQL Server using SAS Jason Chen, Kaiser Permanente, San Diego, CA Jon Javines, Kaiser Permanente, San Diego, CA Alan L Schepps, M.S., Kaiser Permanente, San Diego, CA Yuexin
Paper 70-27 An Introduction to SAS PROC SQL Timothy J Harrington, Venturi Partners Consulting, Waukegan, Illinois
Paper 70-27 An Introduction to SAS PROC SQL Timothy J Harrington, Venturi Partners Consulting, Waukegan, Illinois Abstract This paper introduces SAS users with at least a basic understanding of SAS data
Salary. Cumulative Frequency
HW01 Answering the Right Question with the Right PROC Carrie Mariner, Afton-Royal Training & Consulting, Richmond, VA ABSTRACT When your boss comes to you and says "I need this report by tomorrow!" do
Managing very large EXCEL files using the XLS engine John H. Adams, Boehringer Ingelheim Pharmaceutical, Inc., Ridgefield, CT
Paper AD01 Managing very large EXCEL files using the XLS engine John H. Adams, Boehringer Ingelheim Pharmaceutical, Inc., Ridgefield, CT ABSTRACT The use of EXCEL spreadsheets is very common in SAS applications,
THE POWER OF PROC FORMAT
THE POWER OF PROC FORMAT Jonas V. Bilenas, Chase Manhattan Bank, New York, NY ABSTRACT The FORMAT procedure in SAS is a very powerful and productive tool. Yet many beginning programmers rarely make use
Using SAS With a SQL Server Database. M. Rita Thissen, Yan Chen Tang, Elizabeth Heath RTI International, RTP, NC
Using SAS With a SQL Server Database M. Rita Thissen, Yan Chen Tang, Elizabeth Heath RTI International, RTP, NC ABSTRACT Many operations now store data in relational databases. You may want to use SAS
Preparing your data for analysis using SAS. Landon Sego 24 April 2003 Department of Statistics UW-Madison
Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison Assumptions That you have used SAS at least a few times. It doesn t matter whether you run SAS in
Top Ten Reasons to Use PROC SQL
Paper 042-29 Top Ten Reasons to Use PROC SQL Weiming Hu, Center for Health Research Kaiser Permanente, Portland, Oregon, USA ABSTRACT Among SAS users, it seems there are two groups of people, those who
Using Macros to Automate SAS Processing Kari Richardson, SAS Institute, Cary, NC Eric Rossland, SAS Institute, Dallas, TX
Paper 126-29 Using Macros to Automate SAS Processing Kari Richardson, SAS Institute, Cary, NC Eric Rossland, SAS Institute, Dallas, TX ABSTRACT This hands-on workshop shows how to use the SAS Macro Facility
Tales from the Help Desk 3: More Solutions for Simple SAS Mistakes Bruce Gilsen, Federal Reserve Board
Tales from the Help Desk 3: More Solutions for Simple SAS Mistakes Bruce Gilsen, Federal Reserve Board INTRODUCTION In 20 years as a SAS consultant at the Federal Reserve Board, I have seen SAS users make
Innovative Techniques and Tools to Detect Data Quality Problems
Paper DM05 Innovative Techniques and Tools to Detect Data Quality Problems Hong Qi and Allan Glaser Merck & Co., Inc., Upper Gwynnedd, PA ABSTRACT High quality data are essential for accurate and meaningful
SAS Programming Tips, Tricks, and Techniques
SAS Programming Tips, Tricks, and Techniques A presentation by Kirk Paul Lafler Copyright 2001-2012 by Kirk Paul Lafler, Software Intelligence Corporation All rights reserved. SAS is the registered trademark
CHAPTER 1 Overview of SAS/ACCESS Interface to Relational Databases
3 CHAPTER 1 Overview of SAS/ACCESS Interface to Relational Databases About This Document 3 Methods for Accessing Relational Database Data 4 Selecting a SAS/ACCESS Method 4 Methods for Accessing DBMS Tables
SAS PROGRAM EFFICIENCY FOR BEGINNERS. Bruce Gilsen, Federal Reserve Board
SAS PROGRAM EFFICIENCY FOR BEGINNERS Bruce Gilsen, Federal Reserve Board INTRODUCTION This paper presents simple efficiency techniques that can benefit inexperienced SAS software users on all platforms.
Labels, Labels, and More Labels Stephanie R. Thompson, Rochester Institute of Technology, Rochester, NY
Paper FF-007 Labels, Labels, and More Labels Stephanie R. Thompson, Rochester Institute of Technology, Rochester, NY ABSTRACT SAS datasets include labels as optional variable attributes in the descriptor
REx: An Automated System for Extracting Clinical Trial Data from Oracle to SAS
REx: An Automated System for Extracting Clinical Trial Data from Oracle to SAS Edward McCaney, Centocor Inc., Malvern, PA Gail Stoner, Centocor Inc., Malvern, PA Anthony Malinowski, Centocor Inc., Malvern,
From Database to your Desktop: How to almost completely automate reports in SAS, with the power of Proc SQL
From Database to your Desktop: How to almost completely automate reports in SAS, with the power of Proc SQL Kirtiraj Mohanty, Department of Mathematics and Statistics, San Diego State University, San Diego,
An Introduction to SAS/SHARE, By Example
Paper 020-29 An Introduction to SAS/SHARE, By Example Larry Altmayer, U.S. Census Bureau, Washington, DC ABSTRACT SAS/SHARE software is a useful tool for allowing several users to simultaneously access
EXST SAS Lab Lab #4: Data input and dataset modifications
EXST SAS Lab Lab #4: Data input and dataset modifications Objectives 1. Import an EXCEL dataset. 2. Infile an external dataset (CSV file) 3. Concatenate two datasets into one 4. The PLOT statement will
Programming Tricks For Reducing Storage And Work Space Curtis A. Smith, Defense Contract Audit Agency, La Mirada, CA.
Paper 23-27 Programming Tricks For Reducing Storage And Work Space Curtis A. Smith, Defense Contract Audit Agency, La Mirada, CA. ABSTRACT Have you ever had trouble getting a SAS job to complete, although
Effective Use of SQL in SAS Programming
INTRODUCTION Effective Use of SQL in SAS Programming Yi Zhao Merck & Co. Inc., Upper Gwynedd, Pennsylvania Structured Query Language (SQL) is a data manipulation tool of which many SAS programmers are
An email macro: Exploring metadata EG and user credentials in Linux to automate email notifications Jason Baucom, Ateb Inc.
SESUG 2012 Paper CT-02 An email macro: Exploring metadata EG and user credentials in Linux to automate email notifications Jason Baucom, Ateb Inc., Raleigh, NC ABSTRACT Enterprise Guide (EG) provides useful
Report Customization Using PROC REPORT Procedure Shruthi Amruthnath, EPITEC, INC., Southfield, MI
Paper SA12-2014 Report Customization Using PROC REPORT Procedure Shruthi Amruthnath, EPITEC, INC., Southfield, MI ABSTRACT SAS offers powerful report writing tools to generate customized reports. PROC
Importing Excel File using Microsoft Access in SAS Ajay Gupta, PPD Inc, Morrisville, NC
ABSTRACT PharmaSUG 2012 - Paper CC07 Importing Excel File using Microsoft Access in SAS Ajay Gupta, PPD Inc, Morrisville, NC In Pharmaceuticals/CRO industries, Excel files are widely use for data storage.
You have got SASMAIL!
You have got SASMAIL! Rajbir Chadha, Cognizant Technology Solutions, Wilmington, DE ABSTRACT As SAS software programs become complex, processing times increase. Sitting in front of the computer, waiting
Data Cleaning 101. Ronald Cody, Ed.D., Robert Wood Johnson Medical School, Piscataway, NJ. Variable Name. Valid Values. Type
Data Cleaning 101 Ronald Cody, Ed.D., Robert Wood Johnson Medical School, Piscataway, NJ INTRODUCTION One of the first and most important steps in any data processing task is to verify that your data values
SUGI 29 Coders' Corner
Paper 074-29 Tales from the Help Desk: Solutions for Simple SAS Mistakes Bruce Gilsen, Federal Reserve Board INTRODUCTION In 19 years as a SAS consultant at the Federal Reserve Board, I have seen SAS users
B) Mean Function: This function returns the arithmetic mean (average) and ignores the missing value. E.G: Var=MEAN (var1, var2, var3 varn);
SAS-INTERVIEW QUESTIONS 1. What SAS statements would you code to read an external raw data file to a DATA step? Ans: Infile and Input statements are used to read external raw data file to a Data Step.
Tips and Tricks for Creating Multi-Sheet Microsoft Excel Workbooks the Easy Way with SAS. Vincent DelGobbo, SAS Institute Inc.
Paper HOW-071 Tips and Tricks for Creating Multi-Sheet Microsoft Excel Workbooks the Easy Way with SAS Vincent DelGobbo, SAS Institute Inc., Cary, NC ABSTRACT Transferring SAS data and analytical results
Using Pharmacovigilance Reporting System to Generate Ad-hoc Reports
Using Pharmacovigilance Reporting System to Generate Ad-hoc Reports Jeff Cai, Amylin Pharmaceuticals, Inc., San Diego, CA Jay Zhou, Amylin Pharmaceuticals, Inc., San Diego, CA ABSTRACT To supplement Oracle
Instant Interactive SAS Log Window Analyzer
ABSTRACT Paper 10240-2016 Instant Interactive SAS Log Window Analyzer Palanisamy Mohan, ICON Clinical Research India Pvt Ltd Amarnath Vijayarangan, Emmes Services Pvt Ltd, India An interactive SAS environment
Internet/Intranet, the Web & SAS. II006 Building a Web Based EIS for Data Analysis Ed Confer, KGC Programming Solutions, Potomac Falls, VA
II006 Building a Web Based EIS for Data Analysis Ed Confer, KGC Programming Solutions, Potomac Falls, VA Abstract Web based reporting has enhanced the ability of management to interface with data in a
E-Mail OS/390 SAS/MXG Computer Performance Reports in HTML Format
SAS Users Group International (SUGI29) May 9-12,2004 Montreal, Canada E-Mail OS/390 SAS/MXG Computer Performance Reports in HTML Format ABSTRACT Neal Musitano Jr Department of Veterans Affairs Information
Foundations & Fundamentals. A PROC SQL Primer. Matt Taylor, Carolina Analytical Consulting, LLC, Charlotte, NC
A PROC SQL Primer Matt Taylor, Carolina Analytical Consulting, LLC, Charlotte, NC ABSTRACT Most SAS programmers utilize the power of the DATA step to manipulate their datasets. However, unless they pull
Ten Things You Should Know About PROC FORMAT Jack Shoemaker, Accordant Health Services, Greensboro, NC
Ten Things You Should Know About PROC FORMAT Jack Shoemaker, Accordant Health Services, Greensboro, NC ABSTRACT The SAS system shares many features with other programming languages and reporting packages.
The SET Statement and Beyond: Uses and Abuses of the SET Statement. S. David Riba, JADE Tech, Inc., Clearwater, FL
The SET Statement and Beyond: Uses and Abuses of the SET Statement S. David Riba, JADE Tech, Inc., Clearwater, FL ABSTRACT The SET statement is one of the most frequently used statements in the SAS System.
Using SAS to Control and Automate a Multi SAS Program Process. Patrick Halpin November 2008
Using SAS to Control and Automate a Multi SAS Program Process Patrick Halpin November 2008 What are we covering today A little background on me Some quick questions How to use Done files Use a simple example
From The Little SAS Book, Fifth Edition. Full book available for purchase here.
From The Little SAS Book, Fifth Edition. Full book available for purchase here. Acknowledgments ix Introducing SAS Software About This Book xi What s New xiv x Chapter 1 Getting Started Using SAS Software
SAS ODS HTML + PROC Report = Fantastic Output Girish K. Narayandas, OptumInsight, Eden Prairie, MN
SA118-2014 SAS ODS HTML + PROC Report = Fantastic Output Girish K. Narayandas, OptumInsight, Eden Prairie, MN ABSTRACT ODS (Output Delivery System) is a wonderful feature in SAS to create consistent, presentable
AN ANIMATED GUIDE: SENDING SAS FILE TO EXCEL
Paper CC01 AN ANIMATED GUIDE: SENDING SAS FILE TO EXCEL Russ Lavery, Contractor for K&L Consulting Services, King of Prussia, U.S.A. ABSTRACT The primary purpose of this paper is to provide a generic DDE
What You re Missing About Missing Values
Paper 1440-2014 What You re Missing About Missing Values Christopher J. Bost, MDRC, New York, NY ABSTRACT Do you know everything you need to know about missing values? Do you know how to assign a missing
SUGI 29 Posters. Mazen Abdellatif, M.S., Hines VA CSPCC, Hines IL, 60141, USA
A SAS Macro for Generating Randomization Lists in Clinical Trials Using Permuted Blocks Randomization Mazen Abdellatif, M.S., Hines VA CSPCC, Hines IL, 60141, USA ABSTRACT We developed a SAS [1] macro
PharmaSUG2011 - Paper AD11
PharmaSUG2011 - Paper AD11 Let the system do the work! Automate your SAS code execution on UNIX and Windows platforms Niraj J. Pandya, Element Technologies Inc., NJ Vinodh Paida, Impressive Systems Inc.,
Integrity Constraints and Audit Trails Working Together Gary Franklin, SAS Institute Inc., Austin, TX Art Jensen, SAS Institute Inc.
Paper 8-25 Integrity Constraints and Audit Trails Working Together Gary Franklin, SAS Institute Inc., Austin, TX Art Jensen, SAS Institute Inc., Englewood, CO ABSTRACT New features in Version 7 and Version
Overview. NT Event Log. CHAPTER 8 Enhancements for SAS Users under Windows NT
177 CHAPTER 8 Enhancements for SAS Users under Windows NT Overview 177 NT Event Log 177 Sending Messages to the NT Event Log Using a User-Written Function 178 Examples of Using the User-Written Function
4 Other useful features on the course web page. 5 Accessing SAS
1 Using SAS outside of ITCs Statistical Methods and Computing, 22S:30/105 Instructor: Cowles Lab 1 Jan 31, 2014 You can access SAS from off campus by using the ITC Virtual Desktop Go to https://virtualdesktopuiowaedu
Applications Development ABSTRACT PROGRAM DESIGN INTRODUCTION SAS FEATURES USED
Checking and Tracking SAS Programs Using SAS Software Keith M. Gregg, Ph.D., SCIREX Corporation, Chicago, IL Yefim Gershteyn, Ph.D., SCIREX Corporation, Chicago, IL ABSTRACT Various checks on consistency
AN INTRODUCTION TO THE SQL PROCEDURE Chris Yindra, C. Y. Associates
AN INTRODUCTION TO THE SQL PROCEDURE Chris Yindra, C Y Associates Abstract This tutorial will introduce the SQL (Structured Query Language) procedure through a series of simple examples We will initially
The entire SAS code for the %CHK_MISSING macro is in the Appendix. The full macro specification is listed as follows: %chk_missing(indsn=, outdsn= );
Macro Tabulating Missing Values, Leveraging SAS PROC CONTENTS Adam Chow, Health Economics Resource Center (HERC) VA Palo Alto Health Care System Department of Veterans Affairs (Menlo Park, CA) Abstract
Automating SAS Macros: Run SAS Code when the Data is Available and a Target Date Reached.
Automating SAS Macros: Run SAS Code when the Data is Available and a Target Date Reached. Nitin Gupta, Tailwind Associates, Schenectady, NY ABSTRACT This paper describes a method to run discreet macro(s)
ABSTRACT INTRODUCTION
Automating Concatenation of PDF/RTF Reports Using ODS DOCUMENT Shirish Nalavade, eclinical Solutions, Mansfield, MA Shubha Manjunath, Independent Consultant, New London, CT ABSTRACT As part of clinical
Oracle Database: SQL and PL/SQL Fundamentals
Oracle University Contact Us: 1.800.529.0165 Oracle Database: SQL and PL/SQL Fundamentals Duration: 5 Days What you will learn This course is designed to deliver the fundamentals of SQL and PL/SQL along
CDW DATA QUALITY INITIATIVE
Loading Metadata to the IRS Compliance Data Warehouse (CDW) Website: From Spreadsheet to Database Using SAS Macros and PROC SQL Robin Rappaport, IRS Office of Research, Washington, DC Jeff Butler, IRS
FIRST STEP - ADDITIONS TO THE CONFIGURATIONS FILE (CONFIG FILE) SECOND STEP Creating the email to send
Using SAS to E-Mail Reports and Results to Users Stuart Summers, Alliant, Brewster, NY ABSTRACT SAS applications that are repeatedly and periodically run have reports and files that need to go to various
We begin by defining a few user-supplied parameters, to make the code transferable between various projects.
PharmaSUG 2013 Paper CC31 A Quick Patient Profile: Combining External Data with EDC-generated Subject CRF Titania Dumas-Roberson, Grifols Therapeutics, Inc., Durham, NC Yang Han, Grifols Therapeutics,
Oracle Database: SQL and PL/SQL Fundamentals
Oracle University Contact Us: +966 12 739 894 Oracle Database: SQL and PL/SQL Fundamentals Duration: 5 Days What you will learn This Oracle Database: SQL and PL/SQL Fundamentals training is designed to
Methodologies for Converting Microsoft Excel Spreadsheets to SAS datasets
Methodologies for Converting Microsoft Excel Spreadsheets to SAS datasets Karin LaPann ViroPharma Incorporated ABSTRACT Much functionality has been added to the SAS to Excel procedures in SAS version 9.
Everything you wanted to know about MERGE but were afraid to ask
TS- 644 Janice Bloom Everything you wanted to know about MERGE but were afraid to ask So you have read the documentation in the SAS Language Reference for MERGE and it still does not make sense? Rest assured
SAS Comments How Straightforward Are They? Jacksen Lou, Merck & Co.,, Blue Bell, PA 19422
SAS Comments How Straightforward Are They? Jacksen Lou, Merck & Co.,, Blue Bell, PA 19422 ABSTRACT SAS comment statements typically use conventional symbols, *, %*, /* */. Most programmers regard SAS commenting
How To Create An Audit Trail In Sas
Audit Trails for SAS Data Sets Minh Duong Texas Institute for Measurement, Evaluation, and Statistics University of Houston, Houston, TX ABSTRACT SAS data sets are now more accessible than ever. They are
SAS Credit Scoring for Banking 4.3
SAS Credit Scoring for Banking 4.3 Hot Fix 1 SAS Banking Intelligence Solutions ii SAS Credit Scoring for Banking 4.3: Hot Fix 1 The correct bibliographic citation for this manual is as follows: SAS Institute
Big Data, Fast Processing Speeds Kevin McGowan SAS Solutions on Demand, Cary NC
Big Data, Fast Processing Speeds Kevin McGowan SAS Solutions on Demand, Cary NC ABSTRACT As data sets continue to grow, it is important for programs to be written very efficiently to make sure no time
AN INTRODUCTION TO MACRO VARIABLES AND MACRO PROGRAMS Mike S. Zdeb, New York State Department of Health
AN INTRODUCTION TO MACRO VARIABLES AND MACRO PROGRAMS Mike S. Zdeb, New York State Department of Health INTRODUCTION There are a number of SAS tools that you may never have to use. Why? The main reason
ABSTRACT THE ISSUE AT HAND THE RECIPE FOR BUILDING THE SYSTEM THE TEAM REQUIREMENTS. Paper DM09-2012
Paper DM09-2012 A Basic Recipe for Building a Campaign Management System from Scratch: How Base SAS, SQL Server and Access can Blend Together Tera Olson, Aimia Proprietary Loyalty U.S. Inc., Minneapolis,
Ad Hoc Advanced Table of Contents
Ad Hoc Advanced Table of Contents Functions... 1 Adding a Function to the Adhoc Query:... 1 Constant... 2 Coalesce... 4 Concatenate... 6 Add/Subtract... 7 Logical Expressions... 8 Creating a Logical Expression:...
Efficient Techniques and Tips in Handling Large Datasets Shilong Kuang, Kelley Blue Book Inc., Irvine, CA
Efficient Techniques and Tips in Handling Large Datasets Shilong Kuang, Kelley Blue Book Inc., Irvine, CA ABSTRACT When we work on millions of records, with hundreds of variables, it is crucial how we
Taming the PROC TRANSPOSE
Taming the PROC TRANSPOSE Matt Taylor, Carolina Analytical Consulting, LLC ABSTRACT The PROC TRANSPOSE is often misunderstood and seldom used. SAS users are unsure of the results it will give and curious
Oracle SQL. Course Summary. Duration. Objectives
Oracle SQL Course Summary Identify the major structural components of the Oracle Database 11g Create reports of aggregated data Write SELECT statements that include queries Retrieve row and column data
Data-driven Validation Rules: Custom Data Validation Without Custom Programming Don Hopkins, Ursa Logic Corporation, Durham, NC
Data-driven Validation Rules: Custom Data Validation Without Custom Programming Don Hopkins, Ursa Logic Corporation, Durham, NC ABSTRACT One of the most expensive and time-consuming aspects of data management
SAS Rule-Based Codebook Generation for Exploratory Data Analysis Ross Bettinger, Senior Analytical Consultant, Seattle, WA
SAS Rule-Based Codebook Generation for Exploratory Data Analysis Ross Bettinger, Senior Analytical Consultant, Seattle, WA ABSTRACT A codebook is a summary of a collection of data that reports significant
Using the Magical Keyword "INTO:" in PROC SQL
Using the Magical Keyword "INTO:" in PROC SQL Thiru Satchi Blue Cross and Blue Shield of Massachusetts, Boston, Massachusetts Abstract INTO: host-variable in PROC SQL is a powerful tool. It simplifies
Oracle Database: SQL and PL/SQL Fundamentals NEW
Oracle University Contact Us: + 38516306373 Oracle Database: SQL and PL/SQL Fundamentals NEW Duration: 5 Days What you will learn This Oracle Database: SQL and PL/SQL Fundamentals training delivers the
Using Edit-Distance Functions to Identify Similar E-Mail Addresses Howard Schreier, U.S. Dept. of Commerce, Washington DC
Paper 073-29 Using Edit-Distance Functions to Identify Similar E-Mail Addresses Howard Schreier, U.S. Dept. of Commerce, Washington DC ABSTRACT Version 9 of SAS software has added functions which can efficiently
Health Services Research Utilizing Electronic Health Record Data: A Grad Student How-To Paper
Paper 3485-2015 Health Services Research Utilizing Electronic Health Record Data: A Grad Student How-To Paper Ashley W. Collinsworth, ScD, MPH, Baylor Scott & White Health and Tulane University School
PS004 SAS Email, using Data Sets and Macros: A HUGE timesaver! Donna E. Levy, Dana-Farber Cancer Institute
PS004 SAS Email, using Data Sets and Macros: A HUGE timesaver! Donna E. Levy, Dana-Farber Cancer Institute Abstract: SAS V8+ has the capability to send email by using the DATA step, procedures or SCL.
Choosing the Best Method to Create an Excel Report Romain Miralles, Clinovo, Sunnyvale, CA
Choosing the Best Method to Create an Excel Report Romain Miralles, Clinovo, Sunnyvale, CA ABSTRACT PROC EXPORT, LIBNAME, DDE or excelxp tagset? Many techniques exist to create an excel file using SAS.
Handling Missing Values in the SQL Procedure
Handling Missing Values in the SQL Procedure Danbo Yi, Abt Associates Inc., Cambridge, MA Lei Zhang, Domain Solutions Corp., Cambridge, MA ABSTRACT PROC SQL as a powerful database management tool provides
ABSTRACT INTRODUCTION %CODE MACRO DEFINITION
Generating Web Application Code for Existing HTML Forms Don Boudreaux, PhD, SAS Institute Inc., Austin, TX Keith Cranford, Office of the Attorney General, Austin, TX ABSTRACT SAS Web Applications typically
Do It Yourself (DIY) Data: Creating a Searchable Data Set of Available Classrooms using SAS Enterprise BI Server
Paper 8801-2016 Do It Yourself (DIY) Data: Creating a Searchable Data Set of Available Classrooms using SAS Enterprise BI Server Nicole E. Jagusztyn, Hillsborough Community College ABSTRACT At a community
IBM SPSS Direct Marketing 23
IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release
