Portfolio Backtesting: Using SAS to Generate Randomly Populated Portfolios for Investment Strategy Testing

Xuan Liu, Mark Keintz
Wharton Research Data Services

Abstract

One of the most regularly used SAS programs at our business school assesses the investment returns from a randomly populated set of portfolios covering a student-specified historic period, rebalancing frequency, portfolio count, size, and type. The SAS program demonstrates how to deal with dynamically changing conditions, including periodic rebalancing, replacement of delisted stocks, and shifting of stocks from one type of portfolio to another. The application is a good example of the effective use of hash tables, especially for tracking holdings and investment returns.

1. What is Backtesting and how does it work?

Backtesting is the process of applying an investment strategy to historical financial information to assess the results (i.e. change in value). That is, it answers the question "what if I had applied investment strategy X during the period Y?". The backtest application developed at the Wharton School, used for instructional rather than research purposes, is currently applied only to publicly traded stocks.

Later, in the Creating a Backtest section, we go over the more important considerations in creating a backtest program. As an example, a user might request a backtest of 4 portfolios, each with 20 stocks, for the period 2000 through 2008. The four portfolios might come from a cross-classification of (a) the top 20% and bottom 20% of market capitalization with (b) the top 20% and bottom 20% of book-to-market ratios. The user might rebalance the stocks every 3 months (i.e. redivide the investment equally among the stocks) and refill (i.e. replace stocks that are no longer eligible) every 6 months.

2. Source file for Backtesting

The source file used for Backtesting is prepared by merging monthly stock data (for monthly prices and returns), event data (to track when stocks stopped or restarted trading), and annual accounting data filed with the SEC (for book equity data), as shown in Figure 2.1.
[Figure: the Monthly Stocks & Event Files and the Annual accounting data file are merged into the Source file for Backtesting]
Fig. 2.1 Data file used for Backtesting

This yields a monthly file with changes in price, monthly cumulative returns, and yearly changes in book value. Because the data is sorted by stock identifier (STOCK_ID) and DATE, a monthly cumulative return (CUMRET0) can be calculated for each stock from the single-month returns (RETURN), as below. CUMRET0 will be used later to determine the actual performance of each portfolio.

    /* Calculation of monthly cumulative returns */
    data monthly_cumreturns;
      set monthly_file;
      by stock_id date;
      retain cumret0;
      if first.stock_id then do;
        /* start each stock at (1+return), or at 1 if the return is missing */
        if missing(return)=0 then cumret0=(1+return);
        else cumret0=1;
      end;
      else do;
        /* a missing return leaves the retained cumulative return unchanged */
        if missing(return)=0 then cumret0=cumret0*(1+return);
      end;
    run;

As mentioned earlier, users may restrict portfolios to specific percentile ranges of variables like market capitalization (MARKETCAP). These percentiles (deciles in this example) are generated via PROC RANK for each refill date, as below:

    /* Portfolio deciles using the market cap criteria */
    proc sort data=monthly_cumreturns out=source;
      by date stock_id;
    run;

    /* groups=10 assigns decile ranks 0 (lowest) through 9 (highest) within each date */
    proc rank data=source out=temp groups=10;
      by date;
      var marketcap;
      ranks rmarketcap;
    run;

The resulting dataset looks like this:
Example of the data file used for backtesting

    stock_id   date       return     cumret0   marketcap   rmarketcap
    50091      19711031   0.010000   0.67726   33270.00    5
    50104      19711031   0.019802   2.34271   49723.25    6

Table 2.1 Data file used for Backtesting (marketcap is the market capitalization; rmarketcap is the rank for marketcap)

3. Creating a Backtest

Once the primary file has been created, the backtest can be defined through these parameters:

Structural Parameters:
- Date range of the investment.
- Number of portfolios and amount invested in each portfolio.
- Number of stocks in each portfolio.
- Rebalancing Frequency: For portfolios designated as equally weighted, the stocks in each portfolio are periodically reallocated so they have equal value. (Portfolios that are value weighted are not rebalanced.)
- Refilling Frequency: The frequency of determining whether a stock still qualifies for a portfolio (see Portfolio Criteria below) and replacing the stock if it does not.

Portfolio Criteria (these are specified as date-specific percentiles, not absolute values):
- Market capitalization: the total value of all publicly traded shares for a firm.
- Book-to-Market Ratio: the accounting value of a firm vs. its market capitalization.
- Lagged Returns: the return for the previous fiscal period.
- P/E Ratio: the ratio of the price of each share to the company's earnings per share.
- Price: the price of a share.

The process of taking these parameters and generating a backtest is displayed in the following figure:

[Figure: flowchart with four stages. Source file setup (initial cash; start and end date; stocks per portfolio; type of portfolio weighting, market cap or equal weighted; rebalance and refill period). Screening (optional): screen by deciles; screen metrics are market cap, book-to-market ratio, earnings-to-price ratio, lagged returns, price, etc. Partition (optional): keep everything in one portfolio, or use different metrics to divide securities into multiple portfolios. Analysis.]
Fig. 3.1 Creating a Backtest
This introduces a number of programming tasks. The primary tasks are:

1. For the start date and each refill date, generate percentiles for the portfolio criteria.
2. At the start date, randomly draw stocks for each portfolio from the qualifying stocks.
3. Track the monthly cumulative return (i.e. cumulative increase or decrease) in the value of each stock in each portfolio. Each stock is tracked so that rebalancing can be done, if needed.
4. If a stock stops trading at any point, reallocate its residual value to the rest of the portfolio.
5. At every refill point, keep all stocks in the portfolio that are still eligible (buy and hold) and randomly select replacements for all stocks that are no longer eligible.

By default, all available securities are considered for inclusion in the backtest. The universe can be filtered by adding one or more screens based on the portfolio criteria (expressed in deciles in this paper). Multiple portfolios can be created by dividing securities into distinct partitions based on the value of one or two metrics. For example, using two metrics, book-to-market and price, with 2 partitions for book-to-market and 3 partitions for price results in 6 portfolios. Once the portfolios are constructed, the performance of each portfolio is analyzed.

4. Portfolios are populated by randomly selected securities

During the creation of a backtest, securities within a portfolio are randomly selected. This is made possible by generating a random number for each stock_id:

    /* Randomization of the stocks */
    proc sort data=inds out=outds;
      by stock_id date;
    run;

    %let seed = 10;

    data randomized_stocks / view=randomized_stocks;
      set outds;
      by stock_id;
      retain ru;
      /* draw once per stock so the value stays constant across its dates */
      if first.stock_id then ru=ranuni(&seed);
      output;
    run;

inds is the input dataset with one record per stock_id and date. outds is the same data sorted by stock_id and date, and the view randomized_stocks adds the random variable to it. ru is the random variable generated from the seed. A constant random value is generated for each stock_id.
Each run with a different seed generates a new set of random numbers for the stock_ids (see Table 4.1).

    stock_id   date       ru (seed=10)   ru (seed=30)
    10042      20050831   0.70089        0.10266
    10042      20060831   0.70089        0.10266
    10042      20070831   0.70089        0.10266
    10078      20050831   0.99824        0.99473
    10078      20060831   0.99824        0.99473
    10078      20070831   0.99824        0.99473

Table 4.1 Sample outputs with different seeds
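As a minimal sketch of how the random value can drive the initial draw, the step below sorts a toy pool of eligible stocks by ru and keeps the first n. The paper does not show this step, so everything here is an assumption made for illustration: the dataset names eligible, ranked, and portfolio, the macro variable n_stocks, and the stock_ids 90001 and 90002 with their ru values are all hypothetical. Only the two ru values for 10042 and 10078 come from Table 4.1 (seed=10).

```sas
/* Hypothetical pool of eligible stocks on the start date, one row per stock. */
/* 10042 and 10078 use the seed=10 values from Table 4.1; 90001 and 90002     */
/* are made-up entries added so the draw has something to choose from.        */
data eligible;
  input stock_id ru;
  datalines;
10042 0.70089
10078 0.99824
90001 0.25310
90002 0.48872
;
run;

/* Hypothetical parameter: number of stocks requested per portfolio */
%let n_stocks = 2;

/* Order the pool by the random value; the order is unrelated to any stock characteristic */
proc sort data=eligible out=ranked;
  by ru;
run;

/* Keep the first n stocks in random order as the initial portfolio */
data portfolio;
  set ranked(obs=&n_stocks);
run;

proc print data=portfolio noobs;
run;
```

With these toy values the two stocks with the smallest ru (90001 and 90002) are kept. Because ru is constant per stock for a given seed, rerunning with the same seed reproduces the same draw, while a different seed produces a different one.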
5. The refill process

People buy and hold securities for a certain period of time. During the holding period, some stocks may disappear due to delisting or become disqualified under the initial portfolio setup criteria. In either case, the size of the portfolio shrinks. To bring the portfolio back to its original size, a refill process is performed on each user-specified date.

One problem that can distort the refill process is that a stock can cease trading (become "delisted") and later reappear on the market. If the stock retained the same randomly assigned priority used in the initial sampling, it would be included in the refill event after its re-entry on the market. To avoid this problem we use the following approach: generate a random number that is tied to the date variable, and assign a stage variable to indicate the stock's on-off appearance, if any. Whenever the stock reappears, generate a new random number for that stock. Sort the stock pool by date and random number. When it is time to refill, the first n stocks (n is the number of stocks requested by the user) are selected to form the desired portfolio.

    /* Randomization procedure used for portfolio Buy & Hold and Refill process */
    data stocks_held(drop=lagdate);
      set stocks;
      by stock_id;
      retain ru stage;
      lagdate = lag(date);
      if first.stock_id then do;
        stage = 1;
        ru = date + ranuni(&seed);
      end;
      /* a gap of more than one month means the stock left and re-entered the market */
      else if intck('month', lagdate, date) > 1 then do;
        stage = stage + 1;
        ru = date + ranuni(&seed);
      end;
    run;

    proc sort data=stocks_held;
      by date ru;
    run;

6. Rebalance

Rebalancing brings a portfolio back to its original asset allocation mix. This is necessary because over time some investments drift out of alignment. Table 6.1 illustrates a simple example for an equal-weighted portfolio with two stocks.
On Jan 31, 1990, initial cash: $120. Bought 12 shares of stock 1 and 15 shares of stock 2.

               ----------- stock_id=1 -----------    ----------- stock_id=2 -----------    total amount
    date       price   return   cumret0   invested   price   return   cumret0   invested   in portfolio
    19900131   $5      .        1         $60        $4      .        1         $60        $120
    19900228   $6      0.2000   1.2000    $72        $5      0.2500   1.2500    $75        $147
    19900331   $10     0.6667   2.0000    $120       $6      0.2000   1.5000    $90        $210

On April 1, 1990, the portfolio is rebalanced. Initial cash: $210. Sold 1.5 shares of stock 1 and purchased 2.5 shares of stock 2.

    19900430   $12     0.2000   2.4000    $126       $10     0.6667   2.5000    $175       $301
    19900531   $15     0.2500   3.0000    $157.5     $10     0.0000   2.5000    $175       $332.5
    19900630   $18     0.2000   3.6000    $189       $12     0.2000   3.0000    $210       $399

On July 1, 1990, the portfolio is rebalanced. Initial cash: $399. Bought 0.583 shares of stock 1 and sold 0.875 shares of stock 2.

Note: $399 = $210 * [(1+0.2000)*(1+0.2500)*(1+0.2000) + (1+0.6667)*(1+0.0000)*(1+0.2000)] / 2
           = $210 * (3.6000/2.0000 + 3.0000/1.5000) / 2

Table 6.1 Equal-weighted portfolio with two stocks ("invested" is the money invested in that stock)

The task is to divide each stock's cumret0 by its cumret0 at the beginning of the rebalance period; this ratio is denoted eq_rebal_wgt in the following SAS code. The code uses a SAS hash object to quickly retrieve cumret_rebal_start (cumret0 at the beginning of the rebalance period). The hash object is well suited to this step in the process. Not only does it provide a quick lookup of the starting value for each stock, it also easily accommodates the changing composition of a portfolio and the updating of those values in place. The result is listed in Table 6.2.

    /* Equal-weighted portfolio rebalance weight calculation for a single stock */
    data bal_source;
      if _n_=1 then do;
        declare hash ht();
        ht.definekey("stock_id");
        ht.definedata("cumret_rebal_start");
        ht.definedone();
      end;
      set source_sample end=done;
      if rebal_flag=1 then do;
        /* back out this month's return to get cumret0 at the start of the period; */
        /* the result is missing when return is missing (a stock's first record)   */
        cumret_rebal_start = cumret0 / (1+return);
        rc = ht.replace();
      end;
      else do;
        /* retrieve the stored start value for this stock */
        rc = ht.find();
      end;
      drop rc;
      eq_rebal_wgt = cumret0 / cumret_rebal_start;
    run;

    Stock_id=1
    date       return   cumret0   rebal_flag   cumret_rebal_start   eq_rebal_wgt
    19900131   .        1         1
    19900228   0.2000   1.2000    0
    19900331   0.6667   2.0000    0
    19900430   0.2000   2.4000    1            2.0000
    19900531   0.2500   3.0000    0            2.0000
    19900630   0.2000   3.6000    0            2.0000               1.8

Table 6.2 Calculation of eq_rebal_wgt
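The hash lookup can be tried on both stocks from Table 6.1 with the self-contained sketch below. It is an illustration under stated assumptions, not the production program: the output dataset name bal_sketch is hypothetical, rebal_flag is set by hand, and the guard for a missing return on a stock's first record is an addition that the paper's code does not contain (it makes the first period start at that record's own cumret0 instead of leaving the start value missing).

```sas
/* Toy input: return and cumret0 for both stocks, copied from Table 6.1. */
/* rebal_flag=1 is set by hand on the first record of each period.       */
data source_sample;
  input stock_id date :yymmdd8. return cumret0 rebal_flag;
  format date yymmddn8.;
  datalines;
1 19900131 .      1.0000 1
1 19900228 0.2000 1.2000 0
1 19900331 0.6667 2.0000 0
1 19900430 0.2000 2.4000 1
1 19900531 0.2500 3.0000 0
1 19900630 0.2000 3.6000 0
2 19900131 .      1.0000 1
2 19900228 0.2500 1.2500 0
2 19900331 0.2000 1.5000 0
2 19900430 0.6667 2.5000 1
2 19900531 0.0000 2.5000 0
2 19900630 0.2000 3.0000 0
;
run;

data bal_sketch;
  if _n_ = 1 then do;
    declare hash ht();
    ht.definekey("stock_id");
    ht.definedata("cumret_rebal_start");
    ht.definedone();
  end;
  set source_sample;
  if rebal_flag = 1 then do;
    /* Assumption (not in the paper's code): when the return is missing, */
    /* start the period at this record's own cumret0.                    */
    if missing(return) then cumret_rebal_start = cumret0;
    else cumret_rebal_start = cumret0 / (1 + return);
    rc = ht.replace();   /* store or overwrite the start value for this stock */
  end;
  else rc = ht.find();   /* fetch the stored start value for this stock */
  eq_rebal_wgt = cumret0 / cumret_rebal_start;
  drop rc;
run;

proc print data=bal_sketch noobs;
run;
```

On the 19900630 records, eq_rebal_wgt should come out at about 1.8 for stock 1 and about 2.0 for stock 2 (the rounded 0.6667 return leaves a small rounding difference), which are the two ratios 3.6000/2.0000 and 3.0000/1.5000 in the note under Table 6.1.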
Once eq_rebal_wgt is calculated for all the stocks, the rebalance weight for the portfolio (p_rebal_wt) can be calculated by applying proc means to eq_rebal_wgt, as follows:

    /* Equal-weighted portfolio rebalance weight calculation */
    proc means data=bal_source;
      class portfolio_id date;   /* date here corresponds to the rebalance date */
      var eq_rebal_wgt;
      output out=outds mean(eq_rebal_wgt)=p_rebal_wt;
    run;

Conclusion

This paper focuses on the randomization procedure used for portfolio construction for backtesting, as well as how the portfolio is refilled and rebalanced during its evolution. The randomization procedure is designed to accommodate the buy-and-hold strategy of portfolio management. We also illustrate how a SAS hash object is used for fast and simple retrieval of stock cumulative returns, which reduces the calculation of multi-stock portfolio returns to a simple use of proc means.

CONTACT INFORMATION

Author: Xuan Liu
Address: Wharton Research Data Services, 216 Vance Hall, 3733 Spruce St, Philadelphia, PA 19104-6301
Email: xuanliu@wharton.upenn.edu

Author: Mark Keintz
Address: Wharton Research Data Services, 216 Vance Hall, 3733 Spruce St, Philadelphia, PA 19104-6301
Email: mkeintz@wharton.upenn.edu

TRADEMARKS

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.