Table Lookups: From IF-THEN to Key-Indexing
|
|
|
- Baldwin Merritt
- 10 years ago
- Views:
Transcription
1 Paper Table Lookups: From IF-THEN to Key-Indexing Arthur L. Carpenter, California Occidental Consultants ABSTRACT One of the more commonly needed operations within SAS programming is to determine the value of one variable based on the value of another. A series of techniques and tools have evolved over the years to make the matching of these values go faster, smoother, and easier. A majority of these techniques require operations such as sorting, searching, and comparing. As it turns out, these types of techniques are some of the more computationally intensive, and consequently an understanding of the operations involved and a careful selection of the specific technique can often save the user a substantial amount of computing resources. This workshop will demonstrate, compare, and contrast the following techniques: IF-THEN, IF-THEN-ELSE, SELECT, user defined formats (PUT function), MERGE, SQL joins, merging with two SET statements, using the KEY= option, use of ARRAYS, and Key-Indexing. KEYWORDS Indexing, Key-indexing, Lookup techniques, Match, Merge, Select, KEY= INTRODUCTION A table lookup is performed when you use the value of a variable (e.g. a part code) to determine the value of another variable (e.g. part name). Often this second piece of information must be looked up in some other secondary table or location. The process of finding the appropriate piece of information is generally fast, however as the number of items and/or observations increases, the efficiency of the process becomes increasingly important. Fortunately there are a number of techniques for performing these table lookups. The simplest form of a table lookup makes use of the IF-THEN statement. Although easy to code, this is one of the slowest table lookup methods. Substantial improvement can be gained by the inclusion of the ELSE statement. The SELECT statement has a similar efficiency to the IF-THEN-ELSE, however there are efficiency differences here as well. The use of FORMATS allows us to step away from the logical processing of assignment statements, and allows us to take advantage of the search techniques that are an inherent part of the use of FORMATS. For many users, especially those with smaller data sets and lookup tables, the efficiency gains realized here may be sufficient for most if not all tasks. Merges and joins are also used to match data values. The MERGE statement (when used with the BY statement - as it usually is) requires sorted data sets, while the SQL step does not. There are advantages to both processes. Depending on what you need to determine, it is usually also possible to build a merge process yourself through the use of arrays and without using the SQL join or the MERGE statement. Substantial performance improvements are possible by using these large temporary arrays as they can eliminate the need to sort the data. Users with very large data sets are often limited by constraints that are put upon them by memory or processor speed. Often, for instance, it is not possible/practical to sort a very large data set. Unsorted data sets cannot be merged by using BY statements. Joins in SQL may be possible by using the BUFFERSIZE option, but this still may not be a useful solution. Fortunately there are a number of techniques for handling these situations as well. The primary discussion in this workshop will be to compare and contrast the more common methods and then to look at some of the simpler of the specialty techniques. THE DATA Two data sets will used throughout this workshop. The first contains the lookup information which is a list of automobile part codes and the name of the associated parts (LOOKUP.CODES). The second is a list of number of orders for the respective parts. The order information contains the part code but not the part name (LOOKUP.PARTORDR). Neither data set is sorted, nor is there any guarantee that all ordered parts will be in the CODES data set. The PARTORDR data set may also have duplicate observations. Part Codes and Associated Names LOOKUP.CODES Obs partname partcode 1 Hubcap 1 2 Wheel Cover 2 3 Tire Type 3 1
2 4 Valve stem 5 5 Tail Light, Left 47 6 Rim 4 7 Generator 41 8 Wiring Harness 42 9 Fuse Assembly 43 Part Ordering Information LOOKUP.PARTORDR Obs partcode ordernum The assumption throughout this paper will be that the list of parts orders (PARTORDR) can not be fully processed with out the appropriate part name. This means that we somehow need to add the PARTNAME variable and determine its value. LOGICAL PROCESSING One of the easiest solutions to the lookup problem is to ignore it and just create the new variable with its associated value through the use of IF-THEN processing. Effectively we are hard coding the part name within the program. This technique is demonstrated in the following DATA step. * Use IF-THEN processing to assign partnames; if partcode= 1 then partname='hubcap if partcode= 2 then partname='wheel Cover if partcode= 3 then partname='tire Type if partcode= 5 then partname='valve stem if partcode= 4 then partname='rim if partcode= 41 then partname='generator if partcode= 42 then partname='wiring Harness if partcode= 43 then partname='fuse Assembly The problem with this approach is that it is not practical if there are more than a very few codes to lookup. Besides this is VERY inefficient. SAS must execute each IF statement even if an earlier IF statement was found to be true. To make matters worse, IF statements require a fair bit of processing time. A substantially faster method is to use the IF-THEN- ELSE statement combination. The following DATA step executes more quickly than the previous one because as soon as one IF statement is found to be true, its ELSE is not executed consequently none of the following IF-THEN-ELSE statements are executed. This technique is fastest if the more likely outcomes are placed earlier in the list. * Use IF-THEN-ELSE processing to assign partnames; if partcode= 1 then partname='hubcap else if partcode= 2 then partname='wheel Cover else if partcode= 3 then partname='tire Type else if partcode= 5 then partname='valve stem else if partcode= 4 then partname='rim else if partcode=41 then partname='generator else if partcode=42 then partname='wiring Harness else if partcode=43 then partname='fuse Assembly The SELECT statement is on par with the IF-THEN- ELSE combination when performing table lookups (actually it might be a bit faster - Virgle, 1998). Again processing time is minimized when the most likely match is located early in the list. * Use SELECT() processing to assign partnames; select(partcode); when( 1) partname='hubcap when( 2) partname='wheel Cover when( 3) partname='tire Type when( 5) partname='valve stem when( 4) partname='rim when(41) partname='generator when(42) partname='wiring Harness when(43) partname='fuse Assembly otherwise; Interestingly Virgle (1998) has found that the efficiency of the SELECT statement can be enhanced by placing the entire expression on the WHEN statement. * Use SELECT processing to assign partnames; select; when(partcode= 1) partname='hubcap when(partcode= 2) partname='wheel Cover when(partcode= 3) partname='tire Type when(partcode= 5) partname='valve stem when(partcode= 4) partname='rim when(partcode=41) partname='generator when(partcode=42) partname='wiring Harness when(partcode=43) partname='fuse Assembly otherwise; 2
3 The primary problem with each of these techniques is that the search is sequential. When the list is long the average number of comparisons goes up quickly, even when you carefully order the list. A number of other search techniques are available that do not require sequential searches. Binary searches operate by iteratively splitting a list in half until the target is found. On average these searches tend to be faster than sequential searches. SAS formats use binary search techniques. USING FORMATS Formats can be built and added to a library (permanent or temporary) through the use of PROC FORMAT. The process of creating a format is both fast and straight forward. The following format (PARTNAME.) contains an association between the part code and its name. * Building a Format for partnames; proc format; value partname 1='Hubcap ' 2='Wheel Cover ' 3='Tire Type ' 5='Valve stem ' 4='Rim ' 41='Generator ' 42='Wiring Harness' 43='Fuse Assembly Of course typing in a few values is not a big deal, however as the number of entries increases the process tends to become tedious and error prone. Fortunately it is possible to build a format directly from a SAS data set. The CNTLIN= option identifies a data set that contains specific variables. These variables store the information needed to build the format, and as a minimum must include the name of the format (FMTNAME), the incoming value (START), and the value which the incoming value will be translated too (LABEL). The following data step builds the data set CONTROL which is used by PROC FORMAT. Notice the use of the RENAME= option and the RETAIN statement. One advantage of this technique is that the control data set does not need to be sorted. * Building a Format from a data set; data control(keep=fmtname start label); set lookup.codes(rename=(partcode=start partname=label)); retain fmtname 'partname proc format cntlin=control; Once the format has been defined the PUT function can be used to assign a value to the variable PARTNAME by using the PARTNAME. format. Remember the PUT function always returns a character string; when a numeric value is required, the INPUT function can be used. * Using the PUT function.; partname = put(partcode,partname.); The previous data step will be substantially faster than the IF-THEN-ELSE or SELECT processing steps shown above. The difference becomes even more dramatic as the number of items in the lookup list increases. Notice that there is only one executable statement rather than one for each comparison. The lookup itself will use the format PARTNAME., and hence will employ a binary search. MERGES AND JOINS Another common way of getting the information contained in one table into another is to perform either a merge or a join. Both techniques can be useful and of course each has both pros and cons. The MERGE statement is used to identify two or more data sets. For the purpose of this discussion, one of these data sets will contain the information that is to be looked up. The BY statement is used to make sure that the observations are correctly aligned. The BY statement should include sufficient variables to form a unique key in all but at most one of the data sets. For our example the CODES data has exactly one observation for each value of PARTCODE. Because the BY statement is used, the data must be sorted. When the data are not already sorted, this extra stepcanbetimeconsumingorevenonoccasion impossible for very large data sets or data sets on tape. In the following steps PROC SORT is used to reorder the data into temporary (WORK) data sets. These are then merged using the MERGE statement. * Using the MERGE Statement; proc sort data=lookup.codes out=codes; proc sort data=lookup.partordr out=partordr; merge partordr codes; 3
4 proc print data=withnames; title 'Part Orders with Part Names After the Merge The following listing of the merged data shows that two part codes (41 and 47) are in the codes list but did not have any orders (absent from the data set PARTORDR). Also part code 49 was ordered, but because it is not on the parts list we do not know what part was actually to be ordered. Part Orders with Part Names After the Merge Obs partcode ordernum partname Hubcap Wheel Cover Tire Type Rim Valve stem Generator Wiring Harness Wiring Harness Fuse Assembly Tail Light, Left It is also possible to use SQL to join these two data tables. When using SQL, the merging process is called a join. There are a number of different types of joins within SQL, and one that closely matches the previous step is shown below. * Using the SQL Join; proc sql; create table withnames as select * from lookup.partordr a, lookup.codes b where a.partcode=b.partcode; quit; In this example we have added the requirement that the code be in both data tables before the match is made. Notice that SQL does not require the two data sets to be sorted prior to the join. This can avoid the use of SORT, but it can also cause memory and resource problems as the data table sizes increases. Part Orders with Part Names Using SQL Join Obs partcode ordernum partname Hubcap Wheel Cover Tire Type Valve stem Rim Wiring Harness Wiring Harness Fuse Assembly As in the above SQL step, you can also select only those observations (PARTCODE) that meet certain criteria when using the MERGE. The IN= option creates a temporary numeric variable that is either true or false (1 or 0) depending on whether the current observation (for the current value of the BY variables) is in the specified data set. This allows you to use IF-THEN- ELSE logic to eliminate or include observations that meet specific criteria. In the example below, only observations that contain a part code that is in both data sets will contribute to the new data set. * Selecting only existing values; merge partordr(in=inordr) codes(in=incodes); if inordr & incodes; proc print data=withnames; title 'Part Orders with Part Codes After the Merge Notice that this data set is very close to the one generated by the SQL step, and only really differs by the order of the observations. Part Orders with Part Codes After the Merge Obs partcode ordernum partname Hubcap Wheel Cover Tire Type Rim Valve stem Wiring Harness Wiring Harness Fuse Assembly TheMERGEorSQLjoinwilldothetrickformost instances, however it is possible to substantially speed up the lookup process. The following DATA step uses two SET statements to perform the merge. First an observation is read from the PARTORDR data set to establish the part code that is to be looked up (notice that a rename option is used). The lookup list is then read sequentially until the codes are equal and the observation is written out. As in the previous MERGE examples, logic can be included to handle observations that are in one data set and not the other. * Merging without a BY or MERGE; set partordr(rename=(partcode=code)); 4
5 * The following expression is true only when the * current CODE is a duplicate.; if code=partcode then output; do while(code>partcode); * lookup the name using the code * from the primary data set; set codes(keep=partcode partname); if code=partcode then output; proc print data=withnames; title 'Part Orders and Codes Double SET Merge The merged data set shows the following: Part Orders and Codes Double SET Merge Obs code ordernum partcode partname Hubcap Wheel Cover Tire Type Rim Valve stem Wiring Harness Wiring Harness Fuse Assembly As in the MERGE shown earlier, the data still have to be sorted before the above DATA step can be used. Although the sorting restrictions are the same as when you use the MERGE statement, the advantage of the double SET can be a substantial reduction in processing time. USING INDEXES Indexes are a way to logically sort your data without physically sorting it. While not strictly a lookup technique, if you find that you are sorting and then resorting data to accomplish your various merges, you may find that indexes will be helpful. Indexes must be created, stored, and maintained. They are most usually created through either PROC DATASETS (shown below) or through PROC SQL. The index stores the order of the data had it been physically sorted. Once an index exists, SAS will be able to access it, and you will be able to use the data set with the appropriate BY statement, even though the data have never been sorted. Obviously there are some of the same limitations to indexes that you encounter when sorting large data sets. Resources are required to create the index, and these can be similar to the SORT itself. The index(es) is stored in a separate file, and the size of this file can be substantial, especially as the number of indexes, observations, and variables used to form the indexes increases. Indexes can substantially speed up processes. They can also SLOW things down (Virgle, 1998). Be sure to read and experiment carefully before investing a lot in the use of indexes. The following example shows the creation of indexes for the two data sets of interest in this workshop. A merge is then performed on the unsorted (but indexed) data sets using a BY statement. proc datasets library=lookup; modify codes; index create partcode / unique; modify partordr; index create partcode; * Use the BY without sorting; data indexmerge; merge lookup.partordr(in=inordr) lookup.codes(in=incodes); if inordr & incodes; USING THE KEY= OPTION You can also lookup a value when an index exists on only the data set that contains the values to be looked up. The KEY= SET statement option identifies an index that is to be used. In the following example we want to lookup the PARTNAME in the CODES data set, which is indexed on PARTCODE. data keylookup; *Master data; set lookup.codes key=partcode/unique; if _iorc_>0 then partname=' A PROC PRINT of KEYLOOKUP shows: Merge using the KEY= Option Obs partcode ordernum partname Hubcap Wheel Cover Tire Type Valve stem Rim Wiring Harness Wiring Harness Fuse Assembly An observation is read from the primary data set (PARTORDR) which contains a value of PARTCODE. 5
6 This value is then used to seek out an observation in the CODES data. The automatic variable _IORC_ is 0 when the search is successful. PARTCODE 49 is not found in the CODES data, so for this value _IORC_ is greater than 0 and a missing value is assigned to PARTNAME.. If the _IORC_ is not checked PARTCODE 49 will be given a PARTNAME from the last successful read of CODES and this is almost always not what you want. The autocall macro %SYSRC can be used to decode the values contained in _IORC_. The following example is the same as the previous one, but it takes advantage of two of over two dozen values accepted by %SYSRC. data rckeylookup; set lookup.codes key=partcode; select (_iorc_); when (%sysrc(_sok)) do; * lookup was successful; output; when (%sysrc(_dsenom)) do; * No matching partcode found; partname='unknown output; otherwise do; put 'Problem with lookup ' partcode=; stop; USING ARRAYS FOR KEY-INDEXING Sometimes when sorting is not an option or when you just want to speed up a search, the use of arrays can be just what you need. Under current versions of SAS you can build arrays that can contain millions of values (Dorfman, 2000a, 2000b). Consider the problem of selecting unique values from a data set. In terms of our data sets we would like to make sure that the data set LOOKUP.PARTORDR has at most one observation for each value of PARTCODE. One solution would be to use PROC SORT as is done below. proc sort data=lookup.partordr out=partordr nodupkey; This solution of course requires sorting. SQL with the UNIQUE function could also be used, but, as was mentioned above, this technique also may have problems. To avoid sorting we somehow have to remember what part codes we have already seen. The way to do this is to use the ARRAY statement. The beauty of this technique is that the search is very quick because it only has to check one item. We accomplish this by using the part code itself as the index to the array. * Selecting Unique Observations; data unique; array check {10000} _temporary_; if check{partcode}=. then do; output; check{partcode}=1; proc print data=unique; title 'Unique Part Orders As an observation is read from the incoming data set, the numeric part code is used as the index for the ARRAY CHECK. If the array value is missing, this is the first (unique) occurrence of this part code. It is then marked as found (the value is set to 1). Notice that this step will allow a range of part codes from 1 to 10,000. Larger ranges, into the 10s of millions, are easily accommodated. The listing for the data set UNIQUE shows: Unique Part Orders Obs partcode ordernum This process of looking up a value is exactly what we do when we merge two data sets. In the following DATA step the list of codes are read sequentially, ONCE, into an array that stores the part name using the part code as the array subscript. The second DO UNTIL then reads the data set of interest. In this loop the part name is recovered from the array and assigned to the variable PARTNAME. array chkname {10000} $15 _temporary_; do until(allnames); set lookup.codes end=allnames; chkname{partcode}=partname; do until(allordr); set lookup.partordr end=allordr; 6
7 partname = chkname{partcode}; output; stop; proc print data=withnames; title 'Part Orders - Double SET with Arrays The listing below shows that each observation from the PARTORDR data set has been included in the resulting data set and that sorting was not necessary for EITHER data set. Part Orders - Double SET with Arrays Obs partname partcode ordernum 1 Hubcap Wheel Cover Tire Type Valve stem Rim Wiring Harness Wiring Harness Fuse Assembly 43 4 This technique is known as Key-indexing because the index of the array is the value of the variable that we want to use as the lookup value. This technique will not work in all situations. As the number of array elements increases the amount of memory used also increases. Paul Dorfman, 2000a, discusses memory limitations. Certainly most modern machines should accommodate arrays with the number of elements in the millions. For situations where this technique requires unreasonable amounts of memory, other techniques such as bitmapping and hashing are available. Again Paul Dorfman is the acknowledged expert in this area and his cited papers should be consulted for more details. SUMMARY There are a number of techniques that can be applied whenever you need to lookup a value in another table. Each of these techniques has both pros and cons and as programmers we must strive to understand the differences between these techniques. Some of the commonly applied techniques (IF-THEN-ELSE and the MERGE) have alternate methods that can be used to improve performance and may even be required under some conditions. REFERENCES Carpenter, Arthur L., 1999, Getting More For Less: A Few SAS Programming Efficiency Issues, published in the conference proceedings for: Northeast SAS Users Group, NESUG, October, 1999; Pacific Northwest SAS Users Group, PNWSUG, June, 2000; Western Users of SAS Software Inc., WUSS, September, Paul Dorfman, 1999, Alternative Approach to Sorting Arrays and Strings: Tuned Data Step Implementations of Quicksort and Distribution Counting, published in the conference proceedings for: SAS Users Group International, SUGI, April, Paul Dorfman, 2000a, Private Detectives In a Data Warehouse: Key-Indexing, Bitmapping, And Hashing, published in the conference proceedings for: SAS Users Group International, SUGI, April, Paul Dorfman, 2000b, Table Lookup via Direct Addressing: Key-Indexing, Bitmapping, Hashing, published in the conference proceedings for: Pacific Northwest SAS Users Group, PNWSUG, June, Virgle, Robert, 1998, Efficiency: Improving the Performance of Your SAS Applications, Cary, NC: SAS Institute Inc., 256pp. ACKNOWLEDGMENTS Paul Dorfman and Bob Virgle are internationally acknowledged experts in the fields of SAS programming efficiencies and large data set techniques. Both provided me with valuable insights into the techniques used here, and it has been a pleasure to learn from them. ABOUT THE AUTHOR Art Carpenter s publications list includes two chapters in Reporting from the Field, the three books Quick Results with SAS/GRAPH Software, Annotate: Simply the Basics, and Carpenter's Complete Guide to the SAS Macro Language, and over four dozen papers and posters presented at SUGI, PharmaSUG, NESUG, and WUSS. Art has been using SAS since 1976 and has served in various leadership positions in local, regional, and national user groups. Art is a SAS Quality Partner TM and SAS Certified Professional TM. Through California Occidental Consultants he teaches SAS courses and provides contract SAS programming support nationwide. AUTHOR CONTACT Arthur L. Carpenter California Occidental Consultants 7
8 P.O. Box 6199 Oceanside, CA (760) TRADEMARK INFORMATION SAS, SAS Certified Professional, and SAS Quality Partner are registered trademarks of SAS Institute, Inc. in the USA and other countries. indicates USA registration. 8
KEYWORDS ARRAY statement, DO loop, temporary arrays, MERGE statement, Hash Objects, Big Data, Brute force Techniques, PROC PHREG
Paper BB-07-2014 Using Arrays to Quickly Perform Fuzzy Merge Look-ups: Case Studies in Efficiency Arthur L. Carpenter California Occidental Consultants, Anchorage, AK ABSTRACT Merging two data sets when
Subsetting Observations from Large SAS Data Sets
Subsetting Observations from Large SAS Data Sets Christopher J. Bost, MDRC, New York, NY ABSTRACT This paper reviews four techniques to subset observations from large SAS data sets: MERGE, PROC SQL, user-defined
Alternative Methods for Sorting Large Files without leaving a Big Disk Space Footprint
Alternative Methods for Sorting Large Files without leaving a Big Disk Space Footprint Rita Volya, Harvard Medical School, Boston, MA ABSTRACT Working with very large data is not only a question of efficiency
Storing and Using a List of Values in a Macro Variable
Storing and Using a List of Values in a Macro Variable Arthur L. Carpenter California Occidental Consultants, Oceanside, California ABSTRACT When using the macro language it is not at all unusual to need
The SET Statement and Beyond: Uses and Abuses of the SET Statement. S. David Riba, JADE Tech, Inc., Clearwater, FL
The SET Statement and Beyond: Uses and Abuses of the SET Statement S. David Riba, JADE Tech, Inc., Clearwater, FL ABSTRACT The SET statement is one of the most frequently used statements in the SAS System.
Paper 109-25 Merges and Joins Timothy J Harrington, Trilogy Consulting Corporation
Paper 109-25 Merges and Joins Timothy J Harrington, Trilogy Consulting Corporation Abstract This paper discusses methods of joining SAS data sets. The different methods and the reasons for choosing a particular
Simple Rules to Remember When Working with Indexes Kirk Paul Lafler, Software Intelligence Corporation, Spring Valley, California
Simple Rules to Remember When Working with Indexes Kirk Paul Lafler, Software Intelligence Corporation, Spring Valley, California Abstract SAS users are always interested in learning techniques related
C H A P T E R 1 Introducing Data Relationships, Techniques for Data Manipulation, and Access Methods
C H A P T E R 1 Introducing Data Relationships, Techniques for Data Manipulation, and Access Methods Overview 1 Determining Data Relationships 1 Understanding the Methods for Combining SAS Data Sets 3
Efficient Techniques and Tips in Handling Large Datasets Shilong Kuang, Kelley Blue Book Inc., Irvine, CA
Efficient Techniques and Tips in Handling Large Datasets Shilong Kuang, Kelley Blue Book Inc., Irvine, CA ABSTRACT When we work on millions of records, with hundreds of variables, it is crucial how we
THE POWER OF PROC FORMAT
THE POWER OF PROC FORMAT Jonas V. Bilenas, Chase Manhattan Bank, New York, NY ABSTRACT The FORMAT procedure in SAS is a very powerful and productive tool. Yet many beginning programmers rarely make use
New Tricks for an Old Tool: Using Custom Formats for Data Validation and Program Efficiency
New Tricks for an Old Tool: Using Custom Formats for Data Validation and Program Efficiency S. David Riba, JADE Tech, Inc., Clearwater, FL ABSTRACT PROC FORMAT is one of the old standards among SAS Procedures,
Programming Tricks For Reducing Storage And Work Space Curtis A. Smith, Defense Contract Audit Agency, La Mirada, CA.
Paper 23-27 Programming Tricks For Reducing Storage And Work Space Curtis A. Smith, Defense Contract Audit Agency, La Mirada, CA. ABSTRACT Have you ever had trouble getting a SAS job to complete, although
The Essentials of Finding the Distinct, Unique, and Duplicate Values in Your Data
The Essentials of Finding the Distinct, Unique, and Duplicate Values in Your Data Carter Sevick MS, DoD Center for Deployment Health Research, San Diego, CA ABSTRACT Whether by design or by error there
Building and Customizing a CDISC Compliance and Data Quality Application Wayne Zhong, Accretion Softworks, Chester Springs, PA
WUSS2015 Paper 84 Building and Customizing a CDISC Compliance and Data Quality Application Wayne Zhong, Accretion Softworks, Chester Springs, PA ABSTRACT Creating your own SAS application to perform CDISC
PharmaSUG2011 - Paper AD11
PharmaSUG2011 - Paper AD11 Let the system do the work! Automate your SAS code execution on UNIX and Windows platforms Niraj J. Pandya, Element Technologies Inc., NJ Vinodh Paida, Impressive Systems Inc.,
Paper 70-27 An Introduction to SAS PROC SQL Timothy J Harrington, Venturi Partners Consulting, Waukegan, Illinois
Paper 70-27 An Introduction to SAS PROC SQL Timothy J Harrington, Venturi Partners Consulting, Waukegan, Illinois Abstract This paper introduces SAS users with at least a basic understanding of SAS data
Foundations & Fundamentals. A PROC SQL Primer. Matt Taylor, Carolina Analytical Consulting, LLC, Charlotte, NC
A PROC SQL Primer Matt Taylor, Carolina Analytical Consulting, LLC, Charlotte, NC ABSTRACT Most SAS programmers utilize the power of the DATA step to manipulate their datasets. However, unless they pull
PROC SQL for SQL Die-hards Jessica Bennett, Advance America, Spartanburg, SC Barbara Ross, Flexshopper LLC, Boca Raton, FL
PharmaSUG 2015 - Paper QT06 PROC SQL for SQL Die-hards Jessica Bennett, Advance America, Spartanburg, SC Barbara Ross, Flexshopper LLC, Boca Raton, FL ABSTRACT Inspired by Christianna William s paper on
Statistics and Analysis. Quality Control: How to Analyze and Verify Financial Data
Abstract Quality Control: How to Analyze and Verify Financial Data Michelle Duan, Wharton Research Data Services, Philadelphia, PA As SAS programmers dealing with massive financial data from a variety
How to Use SDTM Definition and ADaM Specifications Documents. to Facilitate SAS Programming
How to Use SDTM Definition and ADaM Specifications Documents to Facilitate SAS Programming Yan Liu Sanofi Pasteur ABSTRCT SDTM and ADaM implementation guides set strict requirements for SDTM and ADaM variable
You have got SASMAIL!
You have got SASMAIL! Rajbir Chadha, Cognizant Technology Solutions, Wilmington, DE ABSTRACT As SAS software programs become complex, processing times increase. Sitting in front of the computer, waiting
Big Data, Fast Processing Speeds Kevin McGowan SAS Solutions on Demand, Cary NC
Big Data, Fast Processing Speeds Kevin McGowan SAS Solutions on Demand, Cary NC ABSTRACT As data sets continue to grow, it is important for programs to be written very efficiently to make sure no time
SAS Logic Coding Made Easy Revisit User-defined Function Songtao Jiang, Boston Scientific Corporation, Marlborough, MA
ABSTRACT PharmaSUG 2013 - Paper CC04 SAS Logic Coding Made Easy Revisit User-defined Function Songtao Jiang, Boston Scientific Corporation, Marlborough, MA SAS programmers deal with programming logics
One problem > Multiple solutions; various ways of removing duplicates from dataset using SAS Jaya Dhillon, Louisiana State University
One problem > Multiple solutions; various ways of removing duplicates from dataset using SAS Jaya Dhillon, Louisiana State University ABSTRACT In real world, analysts seldom come across data which is in
Overview. NT Event Log. CHAPTER 8 Enhancements for SAS Users under Windows NT
177 CHAPTER 8 Enhancements for SAS Users under Windows NT Overview 177 NT Event Log 177 Sending Messages to the NT Event Log Using a User-Written Function 178 Examples of Using the User-Written Function
Top Ten SAS Performance Tuning Techniques
Paper AD39 Top Ten SAS Performance Tuning Techniques Kirk Paul Lafler, Software Intelligence Corporation, Spring Valley, California Abstract The Base-SAS software provides users with many choices for accessing,
A Method for Cleaning Clinical Trial Analysis Data Sets
A Method for Cleaning Clinical Trial Analysis Data Sets Carol R. Vaughn, Bridgewater Crossings, NJ ABSTRACT This paper presents a method for using SAS software to search SAS programs in selected directories
Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole
Paper BB-01 Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole ABSTRACT Stephen Overton, Overton Technologies, LLC, Raleigh, NC Business information can be consumed many
Demystifying PROC SQL Join Algorithms Kirk Paul Lafler, Software Intelligence Corporation
Paper TU01 Demystifying PROC SQL Join Algorithms Kirk Paul Lafler, Software Intelligence Corporation ABSTRACT When it comes to performing PROC SQL joins, users supply the names of the tables for joining
SAS Client-Server Development: Through Thick and Thin and Version 8
SAS Client-Server Development: Through Thick and Thin and Version 8 Eric Brinsfield, Meridian Software, Inc. ABSTRACT SAS Institute has been a leader in client-server technology since the release of SAS/CONNECT
In This Lecture. Physical Design. RAID Arrays. RAID Level 0. RAID Level 1. Physical DB Issues, Indexes, Query Optimisation. Physical DB Issues
In This Lecture Physical DB Issues, Indexes, Query Optimisation Database Systems Lecture 13 Natasha Alechina Physical DB Issues RAID arrays for recovery and speed Indexes and query efficiency Query optimisation
Using Pharmacovigilance Reporting System to Generate Ad-hoc Reports
Using Pharmacovigilance Reporting System to Generate Ad-hoc Reports Jeff Cai, Amylin Pharmaceuticals, Inc., San Diego, CA Jay Zhou, Amylin Pharmaceuticals, Inc., San Diego, CA ABSTRACT To supplement Oracle
Performance Tuning for the Teradata Database
Performance Tuning for the Teradata Database Matthew W Froemsdorf Teradata Partner Engineering and Technical Consulting - i - Document Changes Rev. Date Section Comment 1.0 2010-10-26 All Initial document
Normalizing SAS Datasets Using User Define Formats
Normalizing SAS Datasets Using User Define Formats David D. Chapman, US Census Bureau, Washington, DC ABSTRACT Normalization is a database concept used to eliminate redundant data, increase computational
AN INTRODUCTION TO MACRO VARIABLES AND MACRO PROGRAMS Mike S. Zdeb, New York State Department of Health
AN INTRODUCTION TO MACRO VARIABLES AND MACRO PROGRAMS Mike S. Zdeb, New York State Department of Health INTRODUCTION There are a number of SAS tools that you may never have to use. Why? The main reason
PharmaSUG 2015 - Paper QT26
PharmaSUG 2015 - Paper QT26 Keyboard Macros - The most magical tool you may have never heard of - You will never program the same again (It's that amazing!) Steven Black, Agility-Clinical Inc., Carlsbad,
Need for Speed in Large Datasets The Trio of SAS INDICES, PROC SQL and WHERE CLAUSE is the Answer, continued
PharmaSUG 2014 - Paper CC16 Need for Speed in Large Datasets The Trio of SAS INDICES, PROC SQL and WHERE CLAUSE is the Answer ABSTRACT Kunal Agnihotri, PPD LLC, Morrisville, NC Programming on/with large
Parallel Data Preparation with the DS2 Programming Language
ABSTRACT Paper SAS329-2014 Parallel Data Preparation with the DS2 Programming Language Jason Secosky and Robert Ray, SAS Institute Inc., Cary, NC and Greg Otto, Teradata Corporation, Dayton, OH A time-consuming
USE CDISC SDTM AS A DATA MIDDLE-TIER TO STREAMLINE YOUR SAS INFRASTRUCTURE
USE CDISC SDTM AS A DATA MIDDLE-TIER TO STREAMLINE YOUR SAS INFRASTRUCTURE Kalyani Chilukuri, Clinovo, Sunnyvale CA WUSS 2011 Annual Conference October 2011 TABLE OF CONTENTS 1. ABSTRACT... 3 2. INTRODUCTION...
Identifying Invalid Social Security Numbers
ABSTRACT Identifying Invalid Social Security Numbers Paulette Staum, Paul Waldron Consulting, West Nyack, NY Sally Dai, MDRC, New York, NY Do you need to check whether Social Security numbers (SSNs) are
Parsing Technology and its role in Legacy Modernization. A Metaware White Paper
Parsing Technology and its role in Legacy Modernization A Metaware White Paper 1 INTRODUCTION In the two last decades there has been an explosion of interest in software tools that can automate key tasks
Computer Science Department CS 470 Fall I
Computer Science Department CS 470 Fall I RAD: Rapid Application Development By Sheldon Liang CS 470 Handouts Rapid Application Development Pg 1 / 5 0. INTRODUCTION RAD: Rapid Application Development By
Introduction. What is RAID? The Array and RAID Controller Concept. Click here to print this article. Re-Printed From SLCentral
Click here to print this article. Re-Printed From SLCentral RAID: An In-Depth Guide To RAID Technology Author: Tom Solinap Date Posted: January 24th, 2001 URL: http://www.slcentral.com/articles/01/1/raid
Let SAS Modify Your Excel File Nelson Lee, Genentech, South San Francisco, CA
ABSTRACT PharmaSUG 2015 - Paper QT12 Let SAS Modify Your Excel File Nelson Lee, Genentech, South San Francisco, CA It is common to export SAS data to Excel by creating a new Excel file. However, there
SAS ODS HTML + PROC Report = Fantastic Output Girish K. Narayandas, OptumInsight, Eden Prairie, MN
SA118-2014 SAS ODS HTML + PROC Report = Fantastic Output Girish K. Narayandas, OptumInsight, Eden Prairie, MN ABSTRACT ODS (Output Delivery System) is a wonderful feature in SAS to create consistent, presentable
Managing Clinical Trials Data using SAS Software
Paper DM08 Managing Clinical Trials Data using SAS Software Martin J. Rosenberg, Ph.D., MAJARO InfoSystems, Inc. ABSTRACT For over five years, one of the largest clinical trials ever conducted (over 670,000
Automate Data Integration Processes for Pharmaceutical Data Warehouse
Paper AD01 Automate Data Integration Processes for Pharmaceutical Data Warehouse Sandy Lei, Johnson & Johnson Pharmaceutical Research and Development, L.L.C, Titusville, NJ Kwang-Shi Shu, Johnson & Johnson
1/20/2016 INTRODUCTION
INTRODUCTION 1 Programming languages have common concepts that are seen in all languages This course will discuss and illustrate these common concepts: Syntax Names Types Semantics Memory Management We
Fun with PROC SQL Darryl Putnam, CACI Inc., Stevensville MD
NESUG 2012 Fun with PROC SQL Darryl Putnam, CACI Inc., Stevensville MD ABSTRACT PROC SQL is a powerful yet still overlooked tool within our SAS arsenal. PROC SQL can create tables, sort and summarize data,
CHAPTER 1 Overview of SAS/ACCESS Interface to Relational Databases
3 CHAPTER 1 Overview of SAS/ACCESS Interface to Relational Databases About This Document 3 Methods for Accessing Relational Database Data 4 Selecting a SAS/ACCESS Method 4 Methods for Accessing DBMS Tables
Imelda C. Go, South Carolina Department of Education, Columbia, SC
PO 003 Matching SAS Data Sets with PROC SQL: If at First You Don t Succeed, Match, Match Again ABSTRACT Imelda C. Go, South Carolina Department of Education, Columbia, SC Two data sets are often matched
Choosing the Best Method to Create an Excel Report Romain Miralles, Clinovo, Sunnyvale, CA
Choosing the Best Method to Create an Excel Report Romain Miralles, Clinovo, Sunnyvale, CA ABSTRACT PROC EXPORT, LIBNAME, DDE or excelxp tagset? Many techniques exist to create an excel file using SAS.
Salary. Cumulative Frequency
HW01 Answering the Right Question with the Right PROC Carrie Mariner, Afton-Royal Training & Consulting, Richmond, VA ABSTRACT When your boss comes to you and says "I need this report by tomorrow!" do
SAS Programming Tips, Tricks, and Techniques
SAS Programming Tips, Tricks, and Techniques A presentation by Kirk Paul Lafler Copyright 2001-2012 by Kirk Paul Lafler, Software Intelligence Corporation All rights reserved. SAS is the registered trademark
Macros from Beginning to Mend A Simple and Practical Approach to the SAS Macro Facility
Macros from Beginning to Mend A Simple and Practical Approach to the SAS Macro Facility Michael G. Sadof, MGS Associates, Inc., Bethesda, MD. ABSTRACT The macro facility is an important feature of the
INTEGRATING MICROSOFT DYNAMICS CRM WITH SIMEGO DS3
INTEGRATING MICROSOFT DYNAMICS CRM WITH SIMEGO DS3 Often the most compelling way to introduce yourself to a software product is to try deliver value as soon as possible. Simego DS3 is designed to get you
SQL SUBQUERIES: Usage in Clinical Programming. Pavan Vemuri, PPD, Morrisville, NC
PharmaSUG 2013 Poster # P015 SQL SUBQUERIES: Usage in Clinical Programming Pavan Vemuri, PPD, Morrisville, NC ABSTRACT A feature of PROC SQL which provides flexibility to SAS users is that of a SUBQUERY.
SQL Query Evaluation. Winter 2006-2007 Lecture 23
SQL Query Evaluation Winter 2006-2007 Lecture 23 SQL Query Processing Databases go through three steps: Parse SQL into an execution plan Optimize the execution plan Evaluate the optimized plan Execution
Demonstrating a DATA Step with and without a RETAIN Statement
1 The RETAIN Statement Introduction 1 Demonstrating a DATA Step with and without a RETAIN Statement 1 Generating Sequential SUBJECT Numbers Using a Retained Variable 7 Using a SUM Statement to Create SUBJECT
PharmaSUG 2014 Paper CC23. Need to Review or Deliver Outputs on a Rolling Basis? Just Apply the Filter! Tom Santopoli, Accenture, Berwyn, PA
PharmaSUG 2014 Paper CC23 Need to Review or Deliver Outputs on a Rolling Basis? Just Apply the Filter! Tom Santopoli, Accenture, Berwyn, PA ABSTRACT Wouldn t it be nice if all of the outputs in a deliverable
How To Restore From A Backup In Sharepoint
Recovering Microsoft Office SharePoint Server Data Granular search and recovery means time and cost savings. An Altegrity Company 2 Introduction 3 Recovering from Data Loss 5 Introducing Ontrack PowerControls
Symbol Tables. Introduction
Symbol Tables Introduction A compiler needs to collect and use information about the names appearing in the source program. This information is entered into a data structure called a symbol table. The
Do It Yourself (DIY) Data: Creating a Searchable Data Set of Available Classrooms using SAS Enterprise BI Server
Paper 8801-2016 Do It Yourself (DIY) Data: Creating a Searchable Data Set of Available Classrooms using SAS Enterprise BI Server Nicole E. Jagusztyn, Hillsborough Community College ABSTRACT At a community
Tales from the Help Desk 3: More Solutions for Simple SAS Mistakes Bruce Gilsen, Federal Reserve Board
Tales from the Help Desk 3: More Solutions for Simple SAS Mistakes Bruce Gilsen, Federal Reserve Board INTRODUCTION In 20 years as a SAS consultant at the Federal Reserve Board, I have seen SAS users make
Data Deduplication: An Essential Component of your Data Protection Strategy
WHITE PAPER: THE EVOLUTION OF DATA DEDUPLICATION Data Deduplication: An Essential Component of your Data Protection Strategy JULY 2010 Andy Brewerton CA TECHNOLOGIES RECOVERY MANAGEMENT AND DATA MODELLING
High-Performance Business Analytics: SAS and IBM Netezza Data Warehouse Appliances
High-Performance Business Analytics: SAS and IBM Netezza Data Warehouse Appliances Highlights IBM Netezza and SAS together provide appliances and analytic software solutions that help organizations improve
Search and Replace in SAS Data Sets thru GUI
Search and Replace in SAS Data Sets thru GUI Edmond Cheng, Bureau of Labor Statistics, Washington, DC ABSTRACT In managing data with SAS /BASE software, performing a search and replace is not a straight
Introduction to Criteria-based Deduplication of Records, continued SESUG 2012
SESUG 2012 Paper CT-11 An Introduction to Criteria-based Deduplication of Records Elizabeth Heath RTI International, RTP, NC Priya Suresh RTI International, RTP, NC ABSTRACT When survey respondents are
INTRODUCING AZURE SEARCH
David Chappell INTRODUCING AZURE SEARCH Sponsored by Microsoft Corporation Copyright 2015 Chappell & Associates Contents Understanding Azure Search... 3 What Azure Search Provides...3 What s Required to
Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices
Proc. of Int. Conf. on Advances in Computer Science, AETACS Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices Ms.Archana G.Narawade a, Mrs.Vaishali Kolhe b a PG student, D.Y.Patil
Internet/Intranet, the Web & SAS. II006 Building a Web Based EIS for Data Analysis Ed Confer, KGC Programming Solutions, Potomac Falls, VA
II006 Building a Web Based EIS for Data Analysis Ed Confer, KGC Programming Solutions, Potomac Falls, VA Abstract Web based reporting has enhanced the ability of management to interface with data in a
Computer Systems Structure Main Memory Organization
Computer Systems Structure Main Memory Organization Peripherals Computer Central Processing Unit Main Memory Computer Systems Interconnection Communication lines Input Output Ward 1 Ward 2 Storage/Memory
WESTMORELAND COUNTY PUBLIC SCHOOLS 2011 2012 Integrated Instructional Pacing Guide and Checklist Computer Math
Textbook Correlation WESTMORELAND COUNTY PUBLIC SCHOOLS 2011 2012 Integrated Instructional Pacing Guide and Checklist Computer Math Following Directions Unit FIRST QUARTER AND SECOND QUARTER Logic Unit
Unit 4.3 - Storage Structures 1. Storage Structures. Unit 4.3
Storage Structures Unit 4.3 Unit 4.3 - Storage Structures 1 The Physical Store Storage Capacity Medium Transfer Rate Seek Time Main Memory 800 MB/s 500 MB Instant Hard Drive 10 MB/s 120 GB 10 ms CD-ROM
Interfacing SAS Software, Excel, and the Intranet without SAS/Intrnet TM Software or SAS Software for the Personal Computer
Interfacing SAS Software, Excel, and the Intranet without SAS/Intrnet TM Software or SAS Software for the Personal Computer Peter N. Prause, The Hartford, Hartford CT Charles Patridge, The Hartford, Hartford
While You Were Sleeping - Scheduling SAS Jobs to Run Automatically Faron Kincheloe, Baylor University, Waco, TX
CC04 While You Were Sleeping - Scheduling SAS Jobs to Run Automatically Faron Kincheloe, Baylor University, Waco, TX ABSTRACT If you are tired of running the same jobs over and over again, this paper is
Data Presentation. Paper 126-27. Using SAS Macros to Create Automated Excel Reports Containing Tables, Charts and Graphs
Paper 126-27 Using SAS Macros to Create Automated Excel Reports Containing Tables, Charts and Graphs Tugluke Abdurazak Abt Associates Inc. 1110 Vermont Avenue N.W. Suite 610 Washington D.C. 20005-3522
Configuring Backup Settings. Copyright 2009, Oracle. All rights reserved.
Configuring Backup Settings Objectives After completing this lesson, you should be able to: Use Enterprise Manager to configure backup settings Enable control file autobackup Configure backup destinations
Managing Tables in Microsoft SQL Server using SAS
Managing Tables in Microsoft SQL Server using SAS Jason Chen, Kaiser Permanente, San Diego, CA Jon Javines, Kaiser Permanente, San Diego, CA Alan L Schepps, M.S., Kaiser Permanente, San Diego, CA Yuexin
PO-18 Array, Hurray, Array; Consolidate or Expand Your Input Data Stream Using Arrays
Array, Hurray, Array; Consolidate or Expand Your Input Data Stream Using Arrays, continued SESUG 2012 PO-18 Array, Hurray, Array; Consolidate or Expand Your Input Data Stream Using Arrays William E Benjamin
Chapter 13. Chapter Outline. Disk Storage, Basic File Structures, and Hashing
Chapter 13 Disk Storage, Basic File Structures, and Hashing Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Chapter Outline Disk Storage Devices Files of Records Operations on Files Unordered Files
Intro to Longitudinal Data: A Grad Student How-To Paper Elisa L. Priest 1,2, Ashley W. Collinsworth 1,3 1
Intro to Longitudinal Data: A Grad Student How-To Paper Elisa L. Priest 1,2, Ashley W. Collinsworth 1,3 1 Institute for Health Care Research and Improvement, Baylor Health Care System 2 University of North
OBJECT_EXIST: A Macro to Check if a Specified Object Exists Jim Johnson, Independent Consultant, North Wales, PA
PharmaSUG2010 - Paper TU01 OBJECT_EXIST: A Macro to Check if a Specified Object Exists Jim Johnson, Independent Consultant, North Wales, PA ABSTRACT This paper describes a macro designed to quickly tell
Using DATA Step MERGE and PROC SQL JOIN to Combine SAS Datasets Dalia C. Kahane, Westat, Rockville, MD
Using DATA Step MERGE and PROC SQL JOIN to Combine SAS Datasets Dalia C. Kahane, Westat, Rockville, MD ABSTRACT This paper demonstrates important features of combining datasets in SAS. The facility to
