Leads and Lags: Static and Dynamic Queues in the SAS DATA STEP
Paper 7-05

Mark Keintz, Wharton Research Data Services

ABSTRACT

From stock price histories to hospital stay records, analysis of time series data often requires use of lagged (and occasionally lead) values of one or more analysis variables. For the SAS user, the central operational task is typically getting lagged (lead) values for each time point in the data set. While SAS has long provided a LAG function, it has no analogous "lead" function, an especially significant problem in the case of large data series. This paper will (1) review the lag function, in particular the powerful but non-intuitive implications of its queue-oriented basis, (2) demonstrate efficient ways to generate leads with the same flexibility as the lag function, but without the common and expensive recourse to data re-sorting, and (3) show how to dynamically generate leads and lags through use of the hash object.

INTRODUCTION

Have you ever needed to report data from a series of monthly records comparing a given month's results to the same month in the prior year?
Or perhaps you have a set of hospital admission and discharge records sorted by patient id within date and have been asked to find length of stay for each patient visit (or even time to next admission for selected DRGs). These are just two examples of the frequent need SAS programmers have to find lag or lead values. All too often programmers find themselves in a multi-step process of sorting, joining, and resorting data just to retrieve data items from some prior or subsequent record in a data set. But it is possible to address all these tasks in single-step programs, as will be shown below.

SAS has a LAG function (and a related DIF function) intended to provide data values (or differences) from preceding records in a data set. This presentation will show the benefit of the "queue-management" character of the LAG function in addressing both regular and irregular time series, whether grouped or ungrouped. Since there is no analogous "lead" function, later sections show how to use multiple SET statements to find leads.

The final section will address cases in which you don't know in advance how many lag series or lead series will be needed. This problem of dynamic lags and leads can't be effectively handled through clever use of the lag function or multiple SET statements. Instead a straightforward application of a hash object provides the best solution.

THE LAG FUNCTION: RETRIEVING HISTORY

The term "lag function" suggests retrieval of data via "looking back" by some user-specified number of periods or observations, and that is what most simple applications of the LAG function seem to do. For instance, consider the task of producing 1-month and 3-month price returns for this monthly stock price file (created from the sashelp.stocks data; see the appendix for creation of data set SAMPLE):

Table 1. Five Rows From SAMPLE (re-sorted from sashelp.stocks)
   Columns: Obs, DATE, STOCK, CLS
   Rows: observations 1 through 5 (IBM, 01AUG86 through 01DEC86)

The program below uses the LAG and LAG3 (3-record lag) functions to compare the current closing price (CLS) to its immediate and its third prior predecessors, as needed to calculate one-month and three-month returns (RETN1 and RETN3):

Example 1: Simple Creation of Lagged Values

   data simple_lags;
     set sample;
     CLS1=lag(CLS);    /* closing price one record back    */
     CLS3=lag3(CLS);   /* closing price three records back */
     if CLS1 ^=. then RETN1 = CLS/CLS1 - 1;
     if CLS3 ^=. then RETN3 = CLS/CLS3 - 1;
   run;

Example 1 yields the following data in the first 5 rows:

Table 2. Example 1 Results from using LAG and LAG3
   Columns: Obs, DATE, STOCK, CLS, CLS1, CLS3, RETN1, RETN3
   Rows: observations 1 through 5 (IBM, 01AUG86 through 01DEC86); CLS1 and RETN1 are missing in observation 1, CLS3 and RETN3 in observations 1 through 3

LAGS ARE QUEUES, NOT LOOK BACKS

At this point LAG functions have all the appearance of simply looking back by one (or 3) observations, with the additional feature of imputing missing values when "looking back" beyond the beginning of the data set. But actually the lag function instructs SAS to construct a FIFO (first-in/first-out) queue with (1) as many entries as the specified length of the lag, and (2) the queue elements initialized to missing values. Every time the lag function is executed, the oldest entry is retrieved (and removed) from the queue and a new entry (i.e. the current value) is added. Why is this queue management vs. lookback distinction significant? That becomes evident when the LAG function is executed conditionally, as in the treatment of BY groups below.
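The queue behavior described above can be seen in isolation with a small sketch. The data set QUEUE_DEMO and the variables X, ALWAYS and CONDITIONAL are hypothetical names chosen for illustration; they are not part of the paper's examples.

```sas
/* Hypothetical six-record series: 10, 20, ..., 60 */
data queue_demo;
  input x @@;

  /* This LAG executes on every observation, so its queue is     */
  /* updated every time and it behaves like a one-record lookback */
  always = lag(x);

  /* This LAG executes only on even-numbered observations. Its   */
  /* queue is updated only then, so it returns the value of X    */
  /* from the previous time THIS statement ran, not from the     */
  /* previous observation                                        */
  if mod(_n_, 2) = 0 then conditional = lag(x);
datalines;
10 20 30 40 50 60
;
run;

proc print data=queue_demo;
run;
```

ALWAYS comes out as ., 10, 20, 30, 40, 50. CONDITIONAL is missing on observations 1, 2, 3 and 5, holds 20 on observation 4 and 40 on observation 6: each value comes from two records back, because that is when the second queue was last updated.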
As an illustration, consider observations 232 through 237 generated by the Example 1 program, and presented in Table 3. This shows the last two cases for IBM and the first four for Intel. For the first Intel observation (obs 234), the lagged value of the closing stock price (CLS1=82.20) is taken from the IBM series. Of course, it should be a missing value, as should every lagged value that reaches back into the IBM series. Lag values are not completely correct until the fourth Intel record (obs 237).

Table 3. The Problem of BY Groups for Lags
   Columns: Obs, DATE, STOCK, CLS, CLS1, CLS3, RETN1, RETN3
   Rows: obs 232-233 (IBM, 01NOV05 and 01DEC05) and obs 234-237 (Intel, 01AUG86 through 01NOV86)
   Cells that should be missing but are not: CLS1 and RETN1 in obs 234; CLS3 and RETN3 in obs 234 through 236

The natural way to address this problem is to use a BY statement in SAS and (for the case of the single month return) avoid executing lags when the observation in hand is the first for a given stock. Example 2 is such a program (dealing with CLS1 only, for illustration purposes), and its results are in Table 4.

Example 2: A naïve implementation of lags for BY groups

   data naive_lags;
     set sample;
     by stock;
     if not(first.stock) then CLS1=lag(CLS);   /* conditional execution: the flaw */
     else CLS1=.;
     if CLS1 ^=. then RETN1= CLS/CLS1 - 1;
     format ret: 6.3;   /* number of decimal places is an assumption */
   run;

Table 4. Results of "if not(first.stock) then CLS1=lag(CLS)"
   Columns: Obs, STOCK, DATE, CLS, CLS1, RETN1
   Rows: obs 232-233 (IBM, 01NOV05 and 01DEC05) and obs 234-237 (Intel, 01AUG86 through 01NOV86)

This fixes the first Intel record, setting both CLS1 and RETN1 to missing values. But look at the second Intel record (obs 235). CLS1 has a value of 82.20, taken not from the beginning of the Intel series, but rather from the end of the IBM series, generating an erroneous value for RETN1 as well. In other words, CLS1 did not come from the prior record (lookback), but rather it came from the queue, whose contents were most recently updated in the last IBM record.

LAGS FOR BY GROUPS (IFN AND IFC ALWAYS CALCULATE BOTH OUTCOMES)

The fix is easy: unconditionally execute a lag, and then reset the result when necessary. This is usually done in two statements, but Example 3 shows a more compact solution (described by Howard Schreier; see References). It uses the IFN function instead of an IF statement, because IFN executes the lag function embedded in its second argument regardless of the status of first.stock (the condition being tested). Of course, IFN returns the lagged value only when the tested condition is true. Accommodating BY groups for lags longer than one period simply requires comparing lagged values of the BY-variable to the current values ("lag3(stock)=stock"), as in the "CLS3=" statement below.

Example 3: A robust implementation of lags for BY groups

   data bygroup_lags;
     set sample;
     by stock;
     CLS1 = ifn(not(first.stock),lag(cls),.);      /* LAG runs on every record      */
     CLS3 = ifn(lag3(stock)=stock,lag3(cls),.);    /* same stock three records back? */
     if CLS1 ^=. then RETN1 = CLS/CLS1 - 1;
     if CLS3 ^=. then RETN3 = CLS/CLS3 - 1;
     format ret: 6.3;   /* number of decimal places is an assumption */
   run;

The data set BYGROUP_LAGS now has missing values for the appropriate records at the start of the Intel monthly records.

Table 5. Result of Robust Lag Implementation for BY Groups
   Columns: Obs, STOCK, DATE, CLS, CLS1, CLS3, RETN1, RETN3
   Rows: obs 232-233 (IBM, 01NOV05 and 01DEC05) and obs 234-237 (Intel, 01AUG86 through 01NOV86)

MULTIPLE LAGS MEANS MULTIPLE QUEUES: A WAY TO MANAGE IRREGULAR SERIES

While BY-groups benefit from avoiding conditional execution of lags, there are times when conditional lags are the best solution. In the data set SALES below (see Appendix for its generation) are monthly sales by product. But a given product-month combination is only present when sales are reported. As a result each month has a varying number of records, depending on which product sales were reported. Table 6 shows 5 records for month 1, but only 4 records for month 2. Unlike the data in SAMPLE, this time series is not regular, so comparing (say) sales of product B in January (observation 2) to February (observation 7) would imply a LAG5, while comparing March (observation 10) to February would need a LAG3.

Table 6. Data Set SALES (first 13 obs)
   Columns: OBS, MONTH, PROD, SALES
   Month 1 (obs 1-5): products A, B, C, D, X
   Month 2 (obs 6-9): products A, B, D, X
   Month 3 (obs 10-13): products B, C, D, X

The solution to this task is to use conditional lags, with one queue for each product. The program to do so is surprisingly simple:

Example 4: LAGS for Irregular Time Series

   data irregular_lags;
     set sales;
     select (product);
       when ('A') change_rate=(sales-lag(sales))/(month-lag(month));
       when ('B') change_rate=(sales-lag(sales))/(month-lag(month));
       when ('C') change_rate=(sales-lag(sales))/(month-lag(month));
       when ('D') change_rate=(sales-lag(sales))/(month-lag(month));
       otherwise;
     end;
   run;

Example 4 generates four pairs of lag queues, one pair for each of the products A through D. Each invocation of a lag maintains a separate queue, resulting in eight distinct queues in Example 4. Because a given queue is updated only when the specified product is in hand (the "when" clauses), the output of the lag function must come from an earlier observation having the same PRODUCT as the current observation, no matter how far back it may be in the data set. (A more compact expression would have used DIF instead of LAG, as in change_rate=dif(sales)/dif(month).)

Note that, for most functions, clean program design would collapse the four "when" conditions into one, such as

   when('A','B','C','D') change_rate= function(sales)/function(month);

Succumbing to this temptation would misuse the queue-management features of LAG (and DIF): all four products would then share a single pair of queues, and each lag would again come from the previous record of any product.

LEADS: USING EXTRA SET STATEMENTS TO LOOK AHEAD

SAS does not offer a lead function. As a result many SAS programmers sort a data set in descending order and then apply lag functions to create lead values. Often the data are sorted a second time, back to original order, before any analysis is done. For large data sets, this is a costly three-step process.

There is a much simpler way, through the use of extra SET statements in combination with the FIRSTOBS parameter and other elements of the SET statement. The following program generates both a one-month and three-month lead of the data from SAMPLE:

Example 5: Simple generation of one-record and three-record leads

   data simple_leads;
     set sample;
     if eof1=0 then set sample (firstobs=2 keep=cls rename=(cls=lead1)) end=eof1;
     else lead1=.;
     if eof3=0 then set sample (firstobs=4 keep=cls rename=(cls=lead3)) end=eof3;
     else lead3=.;
   run;

The use of multiple SET statements to generate leads makes use of two features of the SET statement and two data set name parameters:

SET Feature 1: Multiple SET statements reading the same data set do not read from a single series of records (unlike a collection of INPUT statements). Instead they produce separate streams of data (much like each LAG function generating its own independent queue). It is as if the three SET statements above were reading from three different data sets. In fact the log from Example 5 displays these notes reporting three incoming streams of data:

   NOTE: There were 699 observations read from the data set WORK.SAMPLE.
   NOTE: There were 698 observations read from the data set WORK.SAMPLE.
   NOTE: There were 696 observations read from the data set WORK.SAMPLE.

Data Set Name Parameter 1 (FIRSTOBS): Using the FIRSTOBS= parameter provides a way to "look ahead" in a data set. For instance the second SET has FIRSTOBS=2 (the third has FIRSTOBS=4), so that it starts reading from record 2 (4) while the first SET statement is reading from record 1. This provides a way to synchronize leads with any given "current" record.

Data Set Name Parameter 2 (RENAME, and KEEP): All three SET statements read in the same variable (CLS), yet a single variable can't have more than one value at a time. In this case the original value would be overwritten by the subsequent SETs. To avoid this problem CLS is renamed (to LEAD1 and LEAD3) in the additional SET statements, resulting in three variables: CLS (from the "current" record), LEAD1 (one period lead) and LEAD3 (three period lead). To avoid overwriting any other variables, only variables to be renamed should be in the KEEP= parameter.

SET Feature 2 (END=): Using "if EOFx=0 then set" in tandem with the END=EOFx parameter avoids a premature end of the data step. Ordinarily a data step with three SET statements stops when any one of the SETs attempts to go beyond the end of input. In this case, the third SET (FIRSTOBS=4) would stop the DATA step even though the first would have three unread records remaining. The answer is to prevent unwanted attempts at reading beyond the end of data. The END= parameter generates a dummy variable indicating whether the record in hand is the last incoming record. The program can test its value and stop reading each data input stream when it is exhausted. That's why the log notes above report differing numbers of observations read.

The last four records in the resulting data set are below, with lead values as expected:

Table 7. Simple Leads
   Columns: Obs, STOCK, DATE, CLS, LEAD1, LEAD3
   Rows: the last four Microsoft records (01SEP05 through 01DEC05); LEAD1 is missing in the final record and LEAD3 in the final three

LEADS FOR BY GROUPS

Just as in the case of lags, generating leads in the presence of BY groups requires a little extra code. A single-period lead is relatively easy: if the current record is the last in a BY group, reset the lead to missing. That test is shown after the second SET statement below. But in the case of leads beyond one period, a little extra is needed, namely a test comparing the current value of the BY-variable (STOCK) to its value in the lead period. That's done in the code below for the three-period lead by reading in (and renaming) the STOCK variable in the third SET statement, comparing it to the current STOCK value, and resetting LEAD3 to missing when needed.

Example 6: Generating Leads for By Groups

   data bygroup_leads;
     set sample;
     by stock;
     if eof1=0 then set sample (firstobs=2 keep=cls rename=(cls=lead1)) end=eof1;
     if last.stock then lead1=.;
     if eof3=0 then set sample (firstobs=4 keep=stock cls rename=(stock=stock3 cls=lead3)) end=eof3;
     /* once the look-ahead stream is exhausted, clear its retained values */
     else call missing(stock3, lead3);
     if stock3 ^= stock then lead3=.;
     drop stock3;
   run;
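As a compact illustration of the look-ahead test, the following sketch builds a one-period lead on a tiny hypothetical data set (TOY, with variables GRP, T and VAL; none of these names come from SAMPLE). It is a variation on Example 6 rather than a copy of it: the BY variable is read in the look-ahead stream and compared with the current value, so no BY statement is needed.

```sas
/* Hypothetical data: group A has three records, group B has two */
data toy;
  input grp $ t val;
datalines;
A 1 10
A 2 11
A 3 12
B 1 20
B 2 21
;
run;

data toy_leads (drop=_next_grp);
  set toy;

  /* Second, independent stream that starts one record ahead */
  if eof2 = 0 then
    set toy (firstobs=2 keep=grp val
             rename=(grp=_next_grp val=lead1)) end=eof2;
  /* Stream exhausted: clear the values retained from the last read */
  else call missing(_next_grp, lead1);

  /* The lead is valid only if the look-ahead record is in the same group */
  if _next_grp ne grp then lead1 = .;
run;

proc print data=toy_leads;
run;
```

LEAD1 comes out as 11, 12, ., 21, . for the five records: the last record of each group has no lead. The same comparison extends to longer leads by raising FIRSTOBS= in the look-ahead stream.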
The results for the last four IBM observations and the first Intel observations are below, with LEAD1 set to missing for the final IBM observation, and LEAD3 for the last three.

Table 8. Leads With By Groups
   Columns: Obs, STOCK, DATE, CLS, LEAD1, LEAD3
   Rows: the end of the IBM series (obs 230-233, 01SEP05 through 01DEC05) followed by the start of the Intel series (from obs 234, 01AUG86)

GENERATING IRREGULAR LEAD QUEUES: USING PRECEDENCE OF WHERE OVER FIRSTOBS

Generating lags for the SALES data above required the utilization of multiple queues (a LAG function for each product), resolving the problem of varying distances between successive records for a given product. Developing leads for such irregular time series requires the same approach: one queue for each product. Similar to the use of several LAG functions to manage separate queues, generating leads for irregular time series requires a collection of SET statements, each filtered by an appropriate WHERE condition. The relatively simple program below demonstrates:

Example 7: Generating Leads for Irregular Time Series

   data irregular_leads;
     set sales;
     select (product);
       when ('A') do;
         if eofa=0 then set sales (where=(product='A') firstobs=2 keep=product sales rename=(sales=lead1)) end=eofa;
         else lead1=.;   /* no later product A record exists */
       end;
       when ('B') do;
         if eofb=0 then set sales (where=(product='B') firstobs=2 keep=product sales rename=(sales=lead1)) end=eofb;
         else lead1=.;
       end;
       when ('C') do;
         if eofc=0 then set sales (where=(product='C') firstobs=2 keep=product sales rename=(sales=lead1)) end=eofc;
         else lead1=.;
       end;
       when ('D') do;
         if eofd=0 then set sales (where=(product='D') firstobs=2 keep=product sales rename=(sales=lead1)) end=eofd;
         else lead1=.;
       end;
       otherwise lead1=.;
     end;
   run;
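Example 7 depends on WHERE= being applied before FIRSTOBS=. One way to check that behavior in isolation is the sketch below, which uses a hypothetical five-record data set TOY_SALES (not the paper's SALES data) and a single SET statement.

```sas
/* Hypothetical data: product A appears in months 1 and 3 only */
data toy_sales;
  input month product $ sales;
datalines;
1 A 10
1 B 20
2 B 25
3 A 12
3 B 30
;
run;

/* WHERE= is applied first, then FIRSTOBS= counts within the   */
/* filtered records: FIRSTOBS=2 selects the second product A   */
/* record (month 3), not the second record of the data set     */
data where_then_firstobs;
  set toy_sales (where=(product='A') firstobs=2);
run;

proc print data=where_then_firstobs;
run;
```

The output data set holds one observation (month 3, product A, sales 12). Had FIRSTOBS= been applied before the filter, the month 1 record would have been skipped only by coincidence of position, and the approach in Example 7 would not line up each product with its own next record.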
The logic of Example 7 is straightforward. If the current product is A [WHEN('A')] and the stream of PRODUCT A records is not exhausted (if eofa=0), then read the next product A record, renaming its SALES variable to LEAD1. The technique for reading only product A records is to use the WHERE= data set name parameter. Most important to this technique is the fact that the WHERE parameter is honored prior to the FIRSTOBS parameter. So FIRSTOBS=2 means the second product A record, not the second record overall.

The first 8 product A and B records are as follows, with the LEAD1 value always the same as the SALES value for the next identical product.

Table 9. Lead produced by Independent Queues
   Columns: Obs, MONTH, PRODUCT, SALES, LEAD1
   Rows: the first eight product A and B records, in the order A, B, A, B, B, A, B, A

Of course this approach could become burdensome if the intent is to produce not only a 1 period lead for each product, but also a 2 period and 3 period lead. Although this might at first suggest issuing 12 SET statements (4 products and 3 differing lead levels), one can still work with one SET per product. A complete program to address that issue is in Appendix 2. An abridged version follows.

Example 8: Shorter Leads = Lag of Longer Leads

   data irregular_leads (drop=i);
     set sales;
     select (product);
       when ('A') do i=1 to ifn(lag(product)=' ',3,1);
         if eofa=0 then set sales (where=(product='A') firstobs=2 keep=product sales rename=(sales=lead3)) end=eofa;
         else lead3=.;
         lead2=lag(lead3);
         lead1=lag(lead2);
       end;
       when ('B') /* same DO loop, with product='B' and end=eofb */ ;
       when ('C') /* same DO loop, with product='C' and end=eofc */ ;
       when ('D') /* same DO loop, with product='D' and end=eofd */ ;
       otherwise lead3=.;
     end;
   run;
The approach above gets the 3-period lead (LEAD3), and then simply applies lag functions to it to produce shorter leads. To do this of course, the program must read, for each PRODUCT, observations 2, 3, and 4 initially, and subsequently one observation for each reference record. This is done with the "do i=1 to ifn(lag(product)=' ',3,1);" statement, which merely utilizes the fact that the lag(product) queue is initialized to a missing value. Consequently the ifn condition is true (and 3 observations are read) only for the first instance of each SET statement.

DYNAMICALLY GENERATING LAGS AND LEADS

In earlier sections on generating lags (or leads) for irregular series, a LAG function (or SET statement) was issued for each product. But what if you don't know in advance how many products are in the data set? If you don't, then a static approach is no longer sufficient. A technique is needed for dynamically establishing a time series for each product as it is discovered in the data set. Hash objects fit this need. Their content, unlike ordinary variables in the data set, persists from record to record, and can be expanded and revised dynamically as well.

DYNAMIC LAGS: DROP THE LAG FUNCTION

In the case of lags, this problem is most readily addressed by use of two hash objects, one (CURRSEQ) for maintaining a running count of records for each product, and one (LAGS) to hold data for later retrieval as lagged values. The program below, after declaring these two objects, has three elements:

Example 9: Dynamically Generating Lags

   data dynamic_lags (drop=_:);
     set sales;
     _seq=0;
     if _n_=1 then do;
       /* Hash object CURRSEQ: track current count by product */
       declare hash currseq();
       currseq.definekey('product');
       currseq.definedata('_seq');
       currseq.definedone();

       _sl=.; _ml=.;   /* Lagged Sales and Month */
       /* Hash object LAGS: _SL and _ML keyed by PRODUCT and _SEQ */
       declare hash lags();
       lags.definekey('product','_seq');
       lags.definedata('_sl','_ml');
       lags.definedone();
     end;

     /* Part 1: Update _SEQ for the current PRODUCT */
     _rc=currseq.find();      /* Retrieve _SEQ for current PRODUCT */
     _seq=_seq+1;
     currseq.replace();       /* Replace old value with new */

     /* Part 2: Add SALES and MONTH (as data elements _SL and _ML) */
     /*         to the LAGS object, keyed on PRODUCT and _SEQ      */
     lags.add(key:product,key:_seq,data:sales,data:month);

     /* Part 3: Retrieve historical value and make the monthly rate */
     _rc = lags.find(key:product,key:_seq-1);
     if _rc=0 then change_rate=(sales-_sl)/(month-_ml);
   run;

In the program above, the CURRSEQ hash object maintains only one item (i.e. one "row") per product, containing the variable _SEQ, the current number of records read for that product. In part 1, the currseq.find() method retrieves _SEQ for the current product (only if the product is already in CURRSEQ, otherwise _SEQ is unmodified). _SEQ is then incremented and replaced into CURRSEQ with every incoming record for the product.

The LAGS object (keyed on PRODUCT and _SEQ) gets one item for each incoming record, and contains a running history of the variables _SL and _ML (so named to allow later retrieval without overwriting the current value of SALES and MONTH). In part 2, the new data is added to LAGS ("lags.add(...)"). The use of the two "data:" arguments assigns the values of SALES and MONTH to _SL and _ML as they are stored in LAGS.

The third major element of this program simply checks whether the product in hand is beyond its first occurrence, and if so, retrieves lagged values from LAGS and generates the needed variable. Note that the "key:_seq-1" argument tells lags.find() to retrieve the item (with variables _SL and _ML) corresponding to a single period lag. An n-period lag would require nothing more than the argument "key:_seq-n".

DYNAMIC LEADS: ORDERED HASH OBJECT PLUS HASH ITERATOR

In the case of leads, if you don't know how many PRODUCTs are expected, building a history record-by-record as in the case of lags won't work. Instead the hash object needs to have a complete history before the first lead is requested. That problem is solved in Example 10 below, a data step with two parts:

Example 10: Dynamically Generating Leads

   data dynamic_leads (drop=_:);
     /* Part 1: Populate LEADS hash object, and assign an iterator */
     if _n_=1 then do;
       if 0 then set sales (keep=product sales month
                            rename=(product=_pl sales=_sl month=_ml));
       declare hash leads (dataset:'sales (keep=product month sales
                            rename=(product=_pl month=_ml sales=_sl))',
                           ordered:'A');
       leads.definekey('_pl','_ml');
       leads.definedata('_pl','_ml','_sl');
       leads.definedone();
       declare hiter li ('leads');
     end;

     /* Part 2: Read the data set and retrieve leads */
     set sales;
     /* Point hash iterator at hash item for current product/month */
     li.setcur(key:product,key:month);
     _rc=li.next();   /* Advance one item in the hash object */
     if _rc=0 and _pl=product then changerate=(_sl-sales)/(_ml-month);
   run;
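The iterator logic of Example 10 extends to longer leads by advancing the iterator more than once. The sketch below is a hedged variation for a two-period lead, run against a hypothetical five-record data set TOY_SALES (not the paper's SALES data). The output variable LEAD2_SALES is likewise a name invented for illustration.

```sas
/* Hypothetical data: product A has two records, product B has three */
data toy_sales;
  input month product $ sales;
datalines;
1 A 10
1 B 20
2 B 25
3 A 12
3 B 30
;
run;

data two_period_leads (drop=_:);
  if _n_ = 1 then do;
    /* Define _PL, _ML, _SL in the program data vector */
    if 0 then set toy_sales (keep=product month sales
                             rename=(product=_pl month=_ml sales=_sl));
    /* Full history, stored in ascending PRODUCT/MONTH order */
    declare hash leads (dataset:'toy_sales (keep=product month sales
                         rename=(product=_pl month=_ml sales=_sl))',
                        ordered:'A');
    leads.definekey('_pl', '_ml');
    leads.definedata('_pl', '_ml', '_sl');
    leads.definedone();
    declare hiter li ('leads');
  end;

  set toy_sales;

  /* Position the iterator on the current record, then step twice */
  _rc = li.setcur(key:product, key:month);
  if _rc = 0 then _rc = li.next();
  if _rc = 0 then _rc = li.next();

  /* Keep the value only if two steps ahead is still the same product */
  if _rc = 0 and _pl = product then lead2_sales = _sl;
run;

proc print data=two_period_leads;
run;
```

Only the first product B record (month 1) has a record two periods ahead, so LEAD2_SALES is 30 there and missing elsewhere. Because the hash is ordered by product and then month, checking the product after the final step is enough: if the second item ahead belongs to the same product, so does the first.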
Before reading the data set record-by-record, the LEADS object is fully populated with all the PRODUCT, SALES and MONTH values (renamed to _PL, _SL and _ML), stored in ascending order by _PL and _ML. Because the object is ordered, moving from one item to the next within the hash is equivalent to traversing the series for a single PRODUCT. The hash iterator LI supports this process. In part 2, a record is read, and the li.setcur method points the iterator at the data item in the object corresponding to the record. Moving the iterator forward by one item ("_rc=li.next()") retrieves the _PL, _ML, and _SL values for a one period lead.

CONCLUSION

While at first glance the queue management character of the LAG function may seem counterintuitive, this property offers robust techniques to deal with a variety of situations, including BY groups and irregularly spaced time series. The technique for accommodating those structures is relatively simple. In addition, the use of multiple SET statements produces the equivalent capability in generating leads, all without the need for extra sorting of the data set.

The more complex problem is how to address irregular time series dynamically, i.e. how to handle situations in which the user does not know how many queues will be needed. For this situation, the construction of hashes used as tools to look up lagged (or lead) data for each queue provides a graceful solution.

REFERENCES

Matlapudi, Anjan, and J. Daniel Knapp (2010). "Please Don't Lag Behind LAG". In the Proceedings of the North East SAS Users Group.

Schreier, Howard (undated). "Conditional Lags Don't Have to be Treacherous".

ACKNOWLEDGMENTS

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.

CONTACT INFORMATION

This is a work in progress. Your comments and questions are valued and encouraged. Please contact the author at:

   Author Name: Mark Keintz
   mkeintz@wharton.upenn.edu
APPENDIX 1: CREATION OF SAMPLE DATA SETS FROM THE SASHELP LIBRARY

   /* Sample 1: Monthly Stock Data for IBM, Intel, Microsoft for Aug 1986 - Dec 2005 */
   proc sort data=sashelp.stocks (keep=stock date close)
             out=sample (rename=(close=cls));
     by stock date;
   run;

   /* Sample 2: Irregular Series: Monthly Sales by Product.               */
   /* The final month, the sales multiplier and the second seed did not   */
   /* survive transcription; supply values in place of each "..." below.  */
   data SALES;
     do MONTH=1 to ... ;
       do PRODUCT='A','B','C','D','X';
         if ranuni(098098) < 0.9 then do;
           SALES=ceil(...*ranuni(...));
           output;
         end;
       end;
     end;
   run;
APPENDIX 2: EXAMPLE PROGRAM OF MULTI-PERIOD LEADS FOR IRREGULAR TIME SERIES

   /* To get leads for each product of interest ('A','B','C','D') for three
      different periods (1, 2, and 3), you might issue 12 SET statements, as
      per the section on leads for irregular series. However, only 4 SET
      statements are needed when combined with lag functions, as below. */

   data irregular_leads (drop=_: label='1, 2, & 3 period leads for products A, B, C, and D');
     set sales;
     select (product);
       when ('A') do _I=1 to ifn(lag(product)=' ',3,1);
         if eofa=0 then set sales (where=(product='A') firstobs=2 keep=product sales rename=(sales=lead3)) end=eofa;
         else lead3=.;
         lead2=lag(lead3);
         lead1=lag(lead2);
       end;
       when ('B') do _I=1 to ifn(lag(product)=' ',3,1);
         if eofb=0 then set sales (where=(product='B') firstobs=2 keep=product sales rename=(sales=lead3)) end=eofb;
         else lead3=.;
         lead2=lag(lead3);
         lead1=lag(lead2);
       end;
       when ('C') do _I=1 to ifn(lag(product)=' ',3,1);
         if eofc=0 then set sales (where=(product='C') firstobs=2 keep=product sales rename=(sales=lead3)) end=eofc;
         else lead3=.;
         lead2=lag(lead3);
         lead1=lag(lead2);
       end;
       when ('D') do _I=1 to ifn(lag(product)=' ',3,1);
         if eofd=0 then set sales (where=(product='D') firstobs=2 keep=product sales rename=(sales=lead3)) end=eofd;
         else lead3=.;
         lead2=lag(lead3);
         lead1=lag(lead2);
       end;
       otherwise lead3=.;
     end;
   run;
Switching from PC SAS to SAS Enterprise Guide Zhengxin (Cindy) Yang, inventiv Health Clinical, Princeton, NJ
PharmaSUG 2014 PO10 Switching from PC SAS to SAS Enterprise Guide Zhengxin (Cindy) Yang, inventiv Health Clinical, Princeton, NJ ABSTRACT As more and more organizations adapt to the SAS Enterprise Guide,
SAS Views The Best of Both Worlds
Paper 026-2010 SAS Views The Best of Both Worlds As seasoned SAS programmers, we have written and reviewed many SAS programs in our careers. What I have noticed is that more often than not, people who
Teamstudio USER GUIDE
Teamstudio Software Engineering Tools for IBM Lotus Notes and Domino USER GUIDE Edition 30 Copyright Notice This User Guide documents the entire Teamstudio product suite, including: Teamstudio Analyzer
SAS Task Manager 2.2. User s Guide. SAS Documentation
SAS Task Manager 2.2 User s Guide SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2015. SAS Task Manager 2.2: User's Guide. Cary, NC: SAS Institute
Programming Tricks For Reducing Storage And Work Space Curtis A. Smith, Defense Contract Audit Agency, La Mirada, CA.
Paper 23-27 Programming Tricks For Reducing Storage And Work Space Curtis A. Smith, Defense Contract Audit Agency, La Mirada, CA. ABSTRACT Have you ever had trouble getting a SAS job to complete, although
User Guide for TASKE Desktop
User Guide for TASKE Desktop For Avaya Aura Communication Manager with Aura Application Enablement Services Version: 8.9 Date: 2013-03 This document is provided to you for informational purposes only.
Using Edit-Distance Functions to Identify Similar E-Mail Addresses Howard Schreier, U.S. Dept. of Commerce, Washington DC
Paper 073-29 Using Edit-Distance Functions to Identify Similar E-Mail Addresses Howard Schreier, U.S. Dept. of Commerce, Washington DC ABSTRACT Version 9 of SAS software has added functions which can efficiently
SonicWALL GMS Custom Reports
SonicWALL GMS Custom Reports Document Scope This document describes how to configure and use the SonicWALL GMS 6.0 Custom Reports feature. This document contains the following sections: Feature Overview
One problem > Multiple solutions; various ways of removing duplicates from dataset using SAS Jaya Dhillon, Louisiana State University
One problem > Multiple solutions; various ways of removing duplicates from dataset using SAS Jaya Dhillon, Louisiana State University ABSTRACT In real world, analysts seldom come across data which is in
What You re Missing About Missing Values
Paper 1440-2014 What You re Missing About Missing Values Christopher J. Bost, MDRC, New York, NY ABSTRACT Do you know everything you need to know about missing values? Do you know how to assign a missing
AN ANIMATED GUIDE: SENDING SAS FILE TO EXCEL
Paper CC01 AN ANIMATED GUIDE: SENDING SAS FILE TO EXCEL Russ Lavery, Contractor for K&L Consulting Services, King of Prussia, U.S.A. ABSTRACT The primary purpose of this paper is to provide a generic DDE
Data-driven Validation Rules: Custom Data Validation Without Custom Programming Don Hopkins, Ursa Logic Corporation, Durham, NC
Data-driven Validation Rules: Custom Data Validation Without Custom Programming Don Hopkins, Ursa Logic Corporation, Durham, NC ABSTRACT One of the most expensive and time-consuming aspects of data management
Experiences in Using Academic Data for BI Dashboard Development
Paper RIV09 Experiences in Using Academic Data for BI Dashboard Development Evangeline Collado, University of Central Florida; Michelle Parente, University of Central Florida ABSTRACT Business Intelligence
Creating tables in Microsoft Access 2007
Platform: Windows PC Ref no: USER 164 Date: 25 th October 2007 Version: 1 Authors: D.R.Sheward, C.L.Napier Creating tables in Microsoft Access 2007 The aim of this guide is to provide information on using
ProDoc Tech Tip Creating and Using Supplemental Forms
ProDoc Tech Tip You can add your own forms and letters into ProDoc. If you want, you can even automate them to merge data into pre-selected fields. Follow the steps below to create a simple, representative
Binary Heap Algorithms
CS Data Structures and Algorithms Lecture Slides Wednesday, April 5, 2009 Glenn G. Chappell Department of Computer Science University of Alaska Fairbanks [email protected] 2005 2009 Glenn G. Chappell
Chapter 1 Overview of the SQL Procedure
Chapter 1 Overview of the SQL Procedure 1.1 Features of PROC SQL...1-3 1.2 Selecting Columns and Rows...1-6 1.3 Presenting and Summarizing Data...1-17 1.4 Joining Tables...1-27 1-2 Chapter 1 Overview of
Excel for Data Cleaning and Management
Excel for Data Cleaning and Management Background Information This workshop is designed to teach skills in Excel that will help you manage data from large imports and save them for further use in SPSS
Managing User Accounts and User Groups
Managing User Accounts and User Groups Contents Managing User Accounts and User Groups...2 About User Accounts and User Groups... 2 Managing User Groups...3 Adding User Groups... 3 Managing Group Membership...
How To Write A Clinical Trial In Sas
PharmaSUG2013 Paper AD11 Let SAS Set Up and Track Your Project Tom Santopoli, Octagon, now part of Accenture Wayne Zhong, Octagon, now part of Accenture ABSTRACT When managing the programming activities
Tales from the Help Desk 3: More Solutions for Simple SAS Mistakes Bruce Gilsen, Federal Reserve Board
Tales from the Help Desk 3: More Solutions for Simple SAS Mistakes Bruce Gilsen, Federal Reserve Board INTRODUCTION In 20 years as a SAS consultant at the Federal Reserve Board, I have seen SAS users make
Procurement Planning
MatriX Engineering Project Support Pinnacle Business Solutions Ltd www.pinsol.com www.matrix-eps.com Proc_Plan.doc Page 2 of 21 V_09_03 Contents 1.0 INTRODUCTION... 5 1.1 KPIS AND MEASURABLE TARGETS...
The Query Builder: The Swiss Army Knife of SAS Enterprise Guide
Paper 1557-2014 The Query Builder: The Swiss Army Knife of SAS Enterprise Guide ABSTRACT Jennifer First-Kluge and Steven First, Systems Seminar Consultants, Inc. The SAS Enterprise Guide Query Builder
SPSS: Getting Started. For Windows
For Windows Updated: August 2012 Table of Contents Section 1: Overview... 3 1.1 Introduction to SPSS Tutorials... 3 1.2 Introduction to SPSS... 3 1.3 Overview of SPSS for Windows... 3 Section 2: Entering
Post Processing Macro in Clinical Data Reporting Niraj J. Pandya
Post Processing Macro in Clinical Data Reporting Niraj J. Pandya ABSTRACT Post Processing is the last step of generating listings and analysis reports of clinical data reporting in pharmaceutical industry
Chapter 23 File Management (FM)
Chapter 23 File Management (FM) Most Windows tasks involve working with and managing Files and Folders.Windows uses folders to provide a storage system for the files on your computer, just as you use manila
C H A P T E R 1 Introducing Data Relationships, Techniques for Data Manipulation, and Access Methods
C H A P T E R 1 Introducing Data Relationships, Techniques for Data Manipulation, and Access Methods Overview 1 Determining Data Relationships 1 Understanding the Methods for Combining SAS Data Sets 3
Excel Database Management Microsoft Excel 2003
Excel Database Management Microsoft Reference Guide University Technology Services Computer Training Copyright Notice Copyright 2003 EBook Publishing. All rights reserved. No part of this publication may
Unleash the Power of e-learning
Unleash the Power of e-learning Version 1.5 November 2011 Edition 2002-2011 Page2 Table of Contents ADMINISTRATOR MENU... 3 USER ACCOUNTS... 4 CREATING USER ACCOUNTS... 4 MODIFYING USER ACCOUNTS... 7 DELETING
Designing and Implementing Forms 34
C H A P T E R 34 Designing and Implementing Forms 34 You can add forms to your site to collect information from site visitors; for example, to survey potential customers, conduct credit-card transactions,
Emailing Automated Notification of Errors in a Batch SAS Program Julie Kilburn, City of Hope, Duarte, CA Rebecca Ottesen, City of Hope, Duarte, CA
Emailing Automated Notification of Errors in a Batch SAS Program Julie Kilburn, City of Hope, Duarte, CA Rebecca Ottesen, City of Hope, Duarte, CA ABSTRACT With multiple programmers contributing to a batch
THE CHALLENGE OF ADMINISTERING WEBSITES OR APPLICATIONS THAT REQUIRE 24/7 ACCESSIBILITY
THE CHALLENGE OF ADMINISTERING WEBSITES OR APPLICATIONS THAT REQUIRE 24/7 ACCESSIBILITY As the constantly growing demands of businesses and organizations operating in a global economy cause an increased
DBF Chapter. Note to UNIX and OS/390 Users. Import/Export Facility CHAPTER 7
97 CHAPTER 7 DBF Chapter Note to UNIX and OS/390 Users 97 Import/Export Facility 97 Understanding DBF Essentials 98 DBF Files 98 DBF File Naming Conventions 99 DBF File Data Types 99 ACCESS Procedure Data
Same Data Different Attributes: Cloning Issues with Data Sets Brian Varney, Experis Business Analytics, Portage, MI
Paper BtB-16 Same Data Different Attributes: Cloning Issues with Data Sets Brian Varney, Experis Business Analytics, Portage, MI SESUG 2013 ABSTRACT When dealing with data from multiple or unstructured
SAS Add-In 2.1 for Microsoft Office: Getting Started with Data Analysis
SAS Add-In 2.1 for Microsoft Office: Getting Started with Data Analysis The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2007. SAS Add-In 2.1 for Microsoft Office: Getting
Editor Manual for SharePoint Version 1. 21 December 2005
Editor Manual for SharePoint Version 1 21 December 2005 ii Table of Contents PREFACE... 1 WORKFLOW... 2 USER ROLES... 3 MANAGING DOCUMENT... 4 UPLOADING DOCUMENTS... 4 NEW DOCUMENT... 6 EDIT IN DATASHEET...
A Method for Cleaning Clinical Trial Analysis Data Sets
A Method for Cleaning Clinical Trial Analysis Data Sets Carol R. Vaughn, Bridgewater Crossings, NJ ABSTRACT This paper presents a method for using SAS software to search SAS programs in selected directories
Paper 23-28. Hot Links: Creating Embedded URLs using ODS Jonathan Squire, C 2 RA (Cambridge Clinical Research Associates), Andover, MA
Paper 23-28 Hot Links: Creating Embedded URLs using ODS Jonathan Squire, C 2 RA (Cambridge Clinical Research Associates), Andover, MA ABSTRACT With SAS/BASE version 8, one can create embedded HTML links
SAS 9.4 Logging. Configuration and Programming Reference Second Edition. SAS Documentation
SAS 9.4 Logging Configuration and Programming Reference Second Edition SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2014. SAS 9.4 Logging: Configuration
Identifying Invalid Social Security Numbers
ABSTRACT Identifying Invalid Social Security Numbers Paulette Staum, Paul Waldron Consulting, West Nyack, NY Sally Dai, MDRC, New York, NY Do you need to check whether Social Security numbers (SSNs) are
Alternatives to Merging SAS Data Sets But Be Careful
lternatives to Merging SS Data Sets ut e Careful Michael J. Wieczkowski, IMS HELTH, Plymouth Meeting, P bstract The MERGE statement in the SS programming language is a very useful tool in combining or
Release 2.1 of SAS Add-In for Microsoft Office Bringing Microsoft PowerPoint into the Mix ABSTRACT INTRODUCTION Data Access
Release 2.1 of SAS Add-In for Microsoft Office Bringing Microsoft PowerPoint into the Mix Jennifer Clegg, SAS Institute Inc., Cary, NC Eric Hill, SAS Institute Inc., Cary, NC ABSTRACT Release 2.1 of SAS
Smart Web. User Guide. Amcom Software, Inc.
Smart Web User Guide Amcom Software, Inc. Copyright Version 4.0 Copyright 2003-2005 Amcom Software, Inc. All Rights Reserved. Information in this document is subject to change without notice. The software
Web Development using PHP (WD_PHP) Duration 1.5 months
Duration 1.5 months Our program is a practical knowledge oriented program aimed at learning the techniques of web development using PHP, HTML, CSS & JavaScript. It has some unique features which are as
DMSplus for Microsoft SharePoint 2010
DMSplus for Microsoft SharePoint 2010 A professional add-on for DMS (Document Management System) solutions based on Microsoft SharePoint 2010 DMSplus Product Information Version 1.0 DMSplus Overview DMSplus
NETWORK PRINT MONITOR User Guide
NETWORK PRINT MONITOR User Guide Legal Notes Unauthorized reproduction of all or part of this guide is prohibited. The information in this guide is subject to change without notice. We cannot be held liable
Scatter Chart. Segmented Bar Chart. Overlay Chart
Data Visualization Using Java and VRML Lingxiao Li, Art Barnes, SAS Institute Inc., Cary, NC ABSTRACT Java and VRML (Virtual Reality Modeling Language) are tools with tremendous potential for creating
PROJECTS SCHEDULING AND COST CONTROLS
Professional Development Day September 27th, 2014 PROJECTS SCHEDULING AND COST CONTROLS Why do we need to Control Time and Cost? Plans are nothing; Planning is everything. Dwight D. Eisenhower Back to
Using SQL Queries in Crystal Reports
PPENDIX Using SQL Queries in Crystal Reports In this appendix Review of SQL Commands PDF 924 n Introduction to SQL PDF 924 PDF 924 ppendix Using SQL Queries in Crystal Reports The SQL Commands feature
Christianna S. Williams, University of North Carolina at Chapel Hill, Chapel Hill, NC
Christianna S. Williams, University of North Carolina at Chapel Hill, Chapel Hill, NC ABSTRACT Have you used PROC MEANS or PROC SUMMARY and wished there was something intermediate between the NWAY option
OBJECTIVE ASSESSMENT OF FORECASTING ASSIGNMENTS USING SOME FUNCTION OF PREDICTION ERRORS
OBJECTIVE ASSESSMENT OF FORECASTING ASSIGNMENTS USING SOME FUNCTION OF PREDICTION ERRORS CLARKE, Stephen R. Swinburne University of Technology Australia One way of examining forecasting methods via assignments
CareAware Capacity Management - Patient Flow Patient List Gadget
CareAware Capacity Management - Patient Flow Patient List Gadget When using the Patient List gadget in CareAware Patient Flow, the following tasks can be completed: Launch PowerChart Modify Patient Attributes
Using simulation to calculate the NPV of a project
Using simulation to calculate the NPV of a project Marius Holtan Onward Inc. 5/31/2002 Monte Carlo simulation is fast becoming the technology of choice for evaluating and analyzing assets, be it pure financial
Designing Adhoc Reports
Designing Adhoc Reports Intellicus Enterprise Reporting and BI Platform Intellicus Technologies [email protected] www.intellicus.com Designing Adhoc Reports i Copyright 2010 Intellicus Technologies This
Grain Stocks Estimates: Can Anything Explain the Market Surprises of Recent Years? Scott H. Irwin
Grain Stocks Estimates: Can Anything Explain the Market Surprises of Recent Years? Scott H. Irwin http://nationalhogfarmer.com/weekly-preview/1004-corn-controversies-hog-market http://online.wsj.com/news/articles/sb10001424052970203752604576641561657796544
Welcome to MaxMobile. Introduction. System Requirements. MaxMobile 10.5 for Windows Mobile Pocket PC
MaxMobile 10.5 for Windows Mobile Pocket PC Welcome to MaxMobile Introduction MaxMobile 10.5 for Windows Mobile Pocket PC provides you with a way to take your customer information on the road. You can
Using SAS With a SQL Server Database. M. Rita Thissen, Yan Chen Tang, Elizabeth Heath RTI International, RTP, NC
Using SAS With a SQL Server Database M. Rita Thissen, Yan Chen Tang, Elizabeth Heath RTI International, RTP, NC ABSTRACT Many operations now store data in relational databases. You may want to use SAS
Oracle Primavera. P6 Resource Leveling Demo Script
Oracle Primavera P6 Resource Leveling Demo Script Script Team Information Role Name Email Primary Author L. Camille Frost [email protected] Contributor Reviewer Geoff Roberts [email protected]
Accounts Receivable Invoice Upload
Directions for the AR Invoice Upload Spreadsheet The AR Invoice Upload Spreadsheet is used to enter Accounts Receivable invoice information (History only, no GL Entry) to the OGsql system. This spreadsheet
SAS. Cloud. Account Administrator s Guide. SAS Documentation
SAS Cloud Account Administrator s Guide SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2014. SAS Cloud: Account Administrator's Guide. Cary, NC:
