Report of Summer Project
Institute for Development and Research in Banking Technology
13 May to 13 July, 2013

CASH DEMAND FORECASTING FOR ATMs

Guided By
Dr. Mahil Carr
Associate Professor
IDRBT, Hyderabad

Submitted By
Shanu Agrawal
Industrial and Management Engineering
M.Tech. 1st Year
IIT Kanpur
INSTITUTE FOR DEVELOPMENT AND RESEARCH IN BANKING TECHNOLOGY (IDRBT)
Road No. 1, Castle Hills, Masab Tank, Hyderabad-500057

CERTIFICATE OF COMPLETION

This is to certify that Miss Shanu Agrawal, pursuing the M.Tech degree in the Department of Industrial and Management Engineering at the Indian Institute of Technology, Kanpur, undertook a project as an intern at the Institute for Development and Research in Banking Technology (IDRBT), Hyderabad from 13th May, 2013 to 13th July, 2013. She was assigned the project CASH DEMAND FORECASTING FOR ATMs, which she completed successfully under the guidance of Dr. Mahil Carr, IDRBT. We wish her all the best in all her endeavors.

Dr. Mahil Carr (Project Guide)
Associate Professor
IDRBT, Hyderabad
ACKNOWLEDGEMENT

I owe thanks to many people who helped and guided me in the completion of this project. Most of all, I thank my supervisor Dr. Mahil Carr for providing me such an excellent opportunity to work on an interesting project idea and for motivating me throughout. At every stage of the project, he was patient in hearing and clearing all my doubts, and I always got help from him despite his busy schedule and my irregular one. Without his support, this project would not have been possible, and I am thankful for all of it. I am thankful to the Industrial and Management Engineering Department, IIT Kanpur for giving me this golden opportunity to work in a high-end research institute like IDRBT, and to IDRBT for providing such an amazing platform to work on real, application-oriented research. Finally, I thank one and all who contributed to the success of this project, directly or indirectly.

Shanu Agrawal
Industrial and Management Engineering
M.Tech. 1st Year
IIT Kanpur
ABSTRACT

Cash management is a crucial activity for any bank, and banks need to predict customer demand for cash reasonably accurately. Demand changes with the period of time, the position of the ATM and the socio-economic features of its users. In addition, the quantity of cash drawn may differ before holidays and may follow weekly, monthly and annual cycles; this was verified using graphs and a one-sample t-test. After analyzing the time series data, a forecasting model was designed that addresses the multiple seasonalities and calendar-day effects prevalent in the demand for cash. Time series models are used to determine the demand for cash for the region and per ATM. The models used to analyze the data are the Simple Exponential Smoothing model, the Double Exponential Smoothing model, the Holt-Winters model and ARIMA. Forecasting accuracy was then measured using the Mean Absolute Percentage Error (MAPE), the autocorrelation plot, the Ljung-Box test and a normal distribution check of the residuals.
Table of Contents
Acknowledgement ... 3
Abstract ... 4
1 Introduction ... 6
1.1 Objective ... 6
1.2 Motivation ... 6
1.3 Data ... 6
2 Project Description ... 7
2.1 Analysis of Time Series Data ... 7
2.2 Forecasting Using Time Series Methods ... 9
2.2.1 Exponential Smoothing ... 9
2.2.2 Double Exponential Smoothing ... 10
2.2.3 Holt-Winters Method ... 10
2.2.4 ARIMA Model ... 11
2.3 Validity of Forecast ... 13
2.3.1 Mean Absolute Percentage Error (MAPE) ... 13
2.3.2 Autocorrelation Function ... 13
2.3.3 Ljung-Box Test ... 13
2.3.4 Check Normal Distribution ... 14
3 Modeling Procedure in Excel ... 14
4 Modeling Procedure in R ... 14
5 Interpretation of Results ... 15
6 Conclusions ... 18
7 References ... 19
8 Appendix ... 20
1 Introduction

1.1 Objective
To forecast the cash demand for ATM refilling using time series models, implemented in Excel and in R. The work is divided into three parts:
1. Analysis of the time series data.
2. Forecasting using time series methods.
3. Checking the validity of the forecast.

1.2 Motivation
An ATM is refilled with a fixed amount of cash, and when the balance falls below a threshold limit it is refilled again. The threshold limit depends on where the ATM is placed and the turnaround time needed to replenish the cash. Increasing the frequency of replenishing cash raises costs for the vendors refilling cash, and in turn for the banks: instead of five trips, a vendor may have to make six trips in a week to the ATM to refill cash. "Though the frequency of refilling cash in ATMs depends on the area where it is located, agencies usually refill anywhere from twice a week to daily. In some areas such as MG Road in Bangalore we need to refill twice a day," says an official with a cash replenishing agency. Banks need to observe the withdrawal behaviour before changing the amount deposited in each ATM. However, refilling ATMs with an unnecessarily high amount is not a viable option either: banks do not want to keep more money in ATMs, as it is idle cash and increases their cost of funds. "Depending on the dispensing pattern, the frequency of refilling cash will have to be increased. Banks will have to pay more charges, which could be around Rs 300-500 per day (per additional visit for each ATM)," says the official of the replenishing agency [2].

1.3 Data
Banks store data for each and every transaction made, so the withdrawal data consists of historical time series data ranging over a period of several years. The data used in this project is the daily ATM data of Union Bank of India for the period Apr 2007 to Sep 2011.
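The threshold refill policy described above can be sketched in a few lines. This is an illustrative simulation only: the `refill_trips` helper, the capacity, the threshold and the daily withdrawal figures are all made up for the example, not taken from the report's data or any bank's actual parameters.

```python
# Hypothetical sketch of a threshold-based refill policy: the ATM starts
# full and is refilled back to capacity whenever the balance drops below
# the threshold. A real policy would refill before the cash runs out.

def refill_trips(withdrawals, capacity, threshold):
    """Count refill trips needed over a sequence of daily withdrawals."""
    balance = capacity
    trips = 0
    for w in withdrawals:
        balance -= w
        if balance < threshold:
            balance = capacity  # vendor refills the ATM to capacity
            trips += 1
    return trips

# One illustrative week of withdrawals (in Rupees)
week = [600_000, 550_000, 700_000, 650_000, 500_000, 900_000, 400_000]
print(refill_trips(week, capacity=2_000_000, threshold=500_000))  # prints 2
```

Under this toy policy, a higher threshold or lower capacity raises the trip count, and each extra trip carries the Rs 300-500 charge quoted above, which is the cost trade-off the forecast is meant to manage.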
2. Project Description

2.1 Analysis of Time Series Data
Here we examine three questions:
1. Is there a level in the time series data?
2. Is there a trend in the time series data?
3. Is there seasonality in the time series data?

Before analyzing the time series data we made the following assumptions:
1. There will be a seasonality pattern (12 seasons) across the months of a year, because of the difference in the number of days in each month and the festivals in each month (mostly these do not differ from month to month).
2. There will be a seasonality pattern (52 seasons) across the weeks of a year (considering 52 weeks in a year), because of differences in withdrawal in each week.
3. There will be a seasonality pattern (4 seasons) across the weeks of successive months, because of differences in withdrawal in each week (the first two weeks are generally busier).
4. There will be a seasonality pattern across the weeks of a particular month, due to differences in withdrawal in each week (the first two weeks are generally busier because pay days occur in the initial 10 days), e.g. seasonality (5 seasons) across the weeks of September.
5. There will be a seasonality pattern (7 seasons) across the days of the week, due to differences in withdrawal on each day. ATMs are used most on Friday and Saturday and least on Sunday.

Graphs for these assumptions are shown in Figures 1-4. From the graphs we can see that the difference in withdrawal is small in the monthly, weekly and daily patterns. The next question is whether this difference is statistically significant. This was checked with a one-sample t-test, which found no significant difference in withdrawal across the data points of the monthly, weekly and daily patterns. So there is no seasonality in the monthly, weekly or daily pattern. The remaining question is whether there is a trend from one period to the next; we can clearly see from the graph that there is an increasing trend from the last quarter of 2010 into 2011.
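The one-sample t-test used above can be sketched as follows. The `one_sample_t` helper and the withdrawal figures are illustrative assumptions (the report runs `t.test()` in R on its own data); the example tests whether one group of data points differs significantly from an assumed overall mean.

```python
# Illustrative one-sample t-test: t = (mean(x) - mu0) / (s / sqrt(n)).
# The Sunday withdrawal figures (in lakh Rupees) and the overall mean
# are hypothetical numbers, not the report's data.
import math
import statistics

def one_sample_t(xs, mu0):
    """Return the t statistic for H0: the population mean of xs equals mu0."""
    n = len(xs)
    se = statistics.stdev(xs) / math.sqrt(n)  # sample sd / sqrt(n)
    return (statistics.mean(xs) - mu0) / se

sundays = [4.1, 3.8, 4.4, 3.9, 4.2]   # hypothetical Sunday withdrawals
overall_mean = 4.3                     # hypothetical overall daily mean
t = one_sample_t(sundays, overall_mean)
print(round(t, 2))  # -2.06; |t| < 2.776 (5% critical value, 4 df), so not significant
```

Here |t| stays below the critical value, so, as in the report's analysis, the apparent day-of-week difference would not be judged statistically significant.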
[Figure 1. Monthly withdrawal pattern in a year: withdrawal (in Rupees) by month, Jan-Dec, for 2008-2011.]

[Figure 2. Weekly pattern in a year: withdrawal (in Rupees) by week number, 1-52, for 2008-2011.]
[Figure 3. Weekly pattern of successive months in a year: withdrawal (in Rupees) by week (1-4), for Apr-Sep 2011.]

[Figure 4. Daily pattern in 2011: withdrawal (in Rupees) by day of the week, for the weeks from 20 Aug to 30 Sep 2011.]

2.2 Forecasting Using Time Series Methods

2.2.1 Exponential Smoothing
Exponential smoothing is a technique that can be applied to time series data, either to produce smoothed data for presentation or to make forecasts. Exponential smoothing assigns exponentially decreasing
weights over time. It is commonly applied to financial market and economic data, but it can be used with any discrete set of repeated measurements. The raw data sequence is often represented by {x_t}, and the output of the exponential smoothing algorithm is commonly written as {s_t}, which may be regarded as a best estimate of what the next value of x will be. When the sequence of observations begins at time t = 0, the simplest form of exponential smoothing is given by the formulae:

s_0 = x_0
s_t = α x_t + (1 − α) s_{t−1},  t > 0

where α is the smoothing factor, 0 < α < 1. Values of α close to one have less of a smoothing effect and give greater weight to recent changes in the data, while values of α closer to zero have a greater smoothing effect and are less responsive to recent changes.

2.2.2 Double Exponential Smoothing
Simple exponential smoothing does not do well when there is a trend in the data. In such situations, "double exponential smoothing" or "second-order exponential smoothing" is useful. Again, the raw data sequence of observations is represented by {x_t}, beginning at time t = 0. We use {s_t} to represent the smoothed value for time t, and {b_t} is our best estimate of the trend at time t. The output of the algorithm is now written as F_{t+m}, an estimate of the value of x at time t+m, m > 0, based on the raw data up to time t. Double exponential smoothing is given by the formulas

s_1 = x_1
b_1 = x_1 − x_0

and for t > 1 by

s_t = α x_t + (1 − α)(s_{t−1} + b_{t−1})
b_t = β (s_t − s_{t−1}) + (1 − β) b_{t−1}

where α is the data smoothing factor, 0 < α < 1, and β is the trend smoothing factor, 0 < β < 1. To forecast beyond x_t:

F_{t+m} = s_t + m b_t

2.2.3 Holt-Winters Method
The Holt-Winters seasonal method comprises the forecast equation and three smoothing equations: one for the level ℓ_t, one for the trend b_t, and one for the seasonal component S_t, with smoothing parameters α, β and γ. We use L to denote the number of periods in a cycle (12 for months of a year, 7 for days of a week).
The point forecast made in time period t for period t + h is calculated as:

ŷ_{t+h} = ℓ_t + h b_t + S_{t+h−L}   (h = 1, 2, ...)   (1)

The smoothing equations (additive form):

ℓ_t = α (y_t − S_{t−L}) + (1 − α)(ℓ_{t−1} + b_{t−1})   (2)
b_t = β (ℓ_t − ℓ_{t−1}) + (1 − β) b_{t−1}   (3)
S_t = γ (y_t − ℓ_t) + (1 − γ) S_{t−L}   (4)

2.2.4 ARIMA Model
An autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model. These models are fitted to time series data either to better understand the data or to predict future points in the series (forecasting).

The AR part of a time series Y_t means that the observed value y_t depends on a linear combination of previously observed values up to a defined lag (denoted p), plus a random error term ε_t. The AR(p) model can be expressed algebraically as:

y_t = φ_1 y_{t−1} + φ_2 y_{t−2} + ... + φ_p y_{t−p} + ε_t

The MA part of a time series Y_t means that the observed value y_t is a random error term plus a linear combination of previous random error terms up to a defined lag (denoted q). The MA(q) model can be expressed algebraically as:

y_t = ε_t + θ_1 ε_{t−1} + θ_2 ε_{t−2} + ... + θ_q ε_{t−q}

If we combine differencing with autoregression and a moving average model, we obtain a non-seasonal ARIMA model. The equation can be written as:

(1 − φ_1 B − φ_2 B² − ... − φ_p B^p) (1 − B)^d y_t = c + (1 + θ_1 B + θ_2 B² + ... + θ_q B^q) ε_t
        AR(p)              d differences                          MA(q)

where B is the backward shift operator (shifting the data back one period).

A seasonal ARIMA model is formed by including additional seasonal terms in the ARIMA models we have seen so far. It is written as:

ARIMA(p,d,q)(P,D,Q)_L

where (p,d,q) is the non-seasonal part of the model and (P,D,Q) the seasonal part. As before, L denotes the number of periods in a cycle (12 for months of a year, 7 for days of a week).
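The smoothing recursions of sections 2.2.1 and 2.2.2 can be traced by hand on a short toy series. The series and the choices α = β = 0.5 below are illustrative only; the report itself tunes the smoothing constants with Excel Solver and R's HoltWinters().

```python
# Minimal sketch of simple and double exponential smoothing on a
# made-up trending series (not the report's ATM data).

def simple_es(xs, alpha):
    """Simple exponential smoothing: s_t = alpha*x_t + (1-alpha)*s_{t-1}.
    Returns s_t at the last observation; the forecast is flat at this value."""
    s = xs[0]  # s_0 = x_0
    for x in xs[1:]:
        s = alpha * x + (1 - alpha) * s
    return s

def double_es(xs, alpha, beta, m=1):
    """Holt's double exponential smoothing; forecast F_{t+m} = s_t + m*b_t."""
    s, b = xs[1], xs[1] - xs[0]  # s_1 = x_1, b_1 = x_1 - x_0
    for x in xs[2:]:
        s_prev = s
        s = alpha * x + (1 - alpha) * (s + b)
        b = beta * (s - s_prev) + (1 - beta) * b
    return s + m * b

data = [100.0, 110.0, 120.0, 130.0, 140.0]  # hypothetical trending series
print(simple_es(data, alpha=0.5))              # 130.625: lags behind the trend
print(double_es(data, alpha=0.5, beta=0.5))    # 150.0: the trend term tracks the +10 steps
```

The comparison shows why section 2.2.2 introduces the trend term: on a trending series the simple method systematically under-forecasts, while Holt's method extrapolates the trend.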
[Figure: Flow chart for the ARIMA model.]
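The AR part of the ARIMA model above can be illustrated with a hand-rolled sketch: simulate an AR(1) process and recover its coefficient by least squares. This is a toy demo under assumed parameters, not the auto.arima() fitting procedure used in the report.

```python
# Simulate y_t = phi*y_{t-1} + eps_t and estimate phi by least squares:
# phi_hat = sum(y_t * y_{t-1}) / sum(y_{t-1}^2).
import random

random.seed(42)
phi_true = 0.7          # illustrative AR(1) coefficient
y = [0.0]
for _ in range(2000):
    y.append(phi_true * y[-1] + random.gauss(0, 1))

num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
phi_hat = num / den
print(phi_hat)            # close to 0.7
print(phi_hat * y[-1])    # one-step-ahead forecast of the next value
```

Real ARIMA fitting also chooses the differencing order d and the MA terms, which is what the flow chart's model identification and estimation steps automate.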
2.3 Validity of Forecast
The forecast intervals for these models are based on the assumptions that the residuals are uncorrelated and normally distributed. If either of these assumptions does not hold, the forecast intervals may be incorrect. For this reason, always plot the ACF and a histogram of the residuals to check the assumptions before producing forecast intervals. The validity of a forecast is checked using the following methods:

2.3.1 Mean Absolute Percentage Error (MAPE)
MAPE is a measure of the accuracy of a method for constructing fitted time series values in statistics, specifically in trend estimation. It expresses accuracy as a percentage and is defined by the formula:

MAPE = (100 / n) Σ_{t=1}^{n} |A_t − F_t| / |A_t|

where A_t is the actual value and F_t is the forecast value. The absolute percentage error is summed for every fitted or forecast point in time and divided by the number of points n; multiplying by 100 makes it a percentage error. In general a MAPE of 10% is considered very good, but it commonly ranges from 20-30%.

2.3.2 Autocorrelation Function
The residuals should be random and independent; hence they should not be autocorrelated.

2.3.3 Ljung-Box Test
This is a statistical test of whether any of a group of autocorrelations of a time series are different from zero. The Ljung-Box test can be defined as follows:
H0: The data are independently distributed (i.e. the correlations in the population from which the sample is taken are 0, so that any observed correlations in the data result from randomness of the sampling process).
Ha: The data are not independently distributed.
The test statistic is:

Q = n (n + 2) Σ_{k=1}^{h} ρ̂_k² / (n − k)

where n is the sample size, ρ̂_k is the sample autocorrelation at lag k, and h is the number of lags being tested. For significance level α, the critical region for rejection of the hypothesis of randomness is

Q > χ²_{1−α, h}

where χ²_{1−α, h} is the (1 − α)-quantile of the chi-squared distribution with h degrees of freedom.

2.3.4 Check Normal Distribution
For white noise, the residuals should be normally distributed. After plotting a histogram of the residuals, a normal curve is fitted over it for comparison.

3 Modeling Procedure in Excel
In Excel, forecasting was done using the Simple Exponential, Double Exponential Smoothing and Holt-Winters models. The modeling procedure is approximately the same for all of these models:
1. Find initial values for the level, the trend rate and the seasonal factors according to the selected model.
2. Calculate the new estimates of level, trend and seasonality using the equations of the selected model.
3. Calculate the point forecast value using these estimates.
4. With both actual and forecast values of the historical data available, calculate the absolute error, i.e. the absolute difference between actual and forecast values.
5. Determine the smoothing constants α, β and γ that minimize the mean absolute error (MAE) using Excel Solver.
6. Check the validity of the forecast result using the autocorrelation function, the Ljung-Box test and a normal distribution plot of the residuals in R.
7. After that, make point forecasts for future periods.

4 Modeling Procedure in R
In R, forecasting was done using the Simple Exponential, Double Exponential Smoothing, Holt-Winters and ARIMA models. The modeling procedure is approximately the same for all of these models. Once you have installed R on a Windows computer, proceed as follows:
1. After starting R, install the required packages (XLConnect and forecast) by choosing "Install package(s)" from the Packages menu at the top of the R console.
2. Load the libraries XLConnect and forecast.
3. Read the time series from the workbook using the loadWorkbook() and readWorksheet() functions from the XLConnect package.
4. Apply a t-test to the data to check whether there is any significant difference in withdrawal across the data points.
5. Convert the data into a time series object, first assuming no seasonality, i.e. frequency = 1.
6. Make a forecast using the simple exponential model via the HoltWinters() function (from R's built-in stats package), setting beta and gamma to FALSE, because simple exponential smoothing has no trend or seasonality.
7. Check the validity of the forecast using acf(), Box.test() and a normal distribution plot of the residuals.
8. Make forecasts for further periods using the forecast.HoltWinters() function, for the required number of periods.
9. Apply the Double Exponential Smoothing and Holt-Winters models similarly, by adjusting the beta and gamma values.
10. For forecasting with the ARIMA model, use the auto.arima() function from the forecast package, then repeat steps 7 and 8.
The R code for monthly forecasting is given in the appendix; for weekly and daily forecasting the code is the same, with a few changes.

5 Interpretation of Results
1. On a monthly basis: taking Jan 2008 to Jun 2011 as historical data and forecasting the next 3 months (Jul-Sep 2011). We can see from the graph shown in Figure 1 that there is an increasing trend from 2008 to 2011.

Forecast results using Excel:
Month  Actual       Fore_HW      Fore_DES  Fore_ES
JUL    18614300     18448082.16  18877299  17963806.7
AUG    20450533.33  18447689.65  19111582  17963806.7
SEP    18022300     20820396.24  19345865  17963806.7
MAPE                7.01         7.39      7.18
ACF                 passed       passed    passed
Ljung-Box test      passed       passed    passed
Normal dist.        followed     followed  followed

Forecast results using R:
Month  Actual       Fore_ES      Fore_DES     Fore_HW      Fore_ARIMA
Jul    18614300     18469116.53  19002892.62  19042659.42  18809953
Aug    20450533.33  18469116.53  19255709.7   20513614.89  19019292
Sep    18022300     18469116.53  19508526.77  21814274.84  19228630
MAPE                4.316008748  5.540136923  7.339768224  4.511494
ACF                 passed       passed       failed       passed
Ljung-Box test      passed       passed       passed       passed
Normal dist.        followed     followed     followed     followed
2. On a weekly basis: forecasting can be done on a weekly basis using two methods:
a) By considering 52 seasons in a year.
b) By considering 4 seasons in each successive month.

a) For forecasting on a weekly basis, taking week 1 of 2008 to week 36 of 2011 as historical data and forecasting the next 3 weeks (weeks 37-39 of 2011). We can see from the graph shown in Figure 2 that there is an increasing trend from 2008 to 2011, and because of the transaction differences in each week we take 52 seasons in a year.

Forecast results using Excel:
Week (2011)  Actual   Fore_ES   Fore_DES  Fore_HW
37           4795300  4347122   4389653   4272329
38           3068500  4347122   4390689   4526822
39           4272500  4347122   4391725   4506860
MAPE                  17.58735  18.11297  21.30559
ACF                   failed    failed    passed
Ljung-Box test        failed    failed    passed
Normal dist.          followed  followed  followed

Forecast results using R:
Week (2011)  Actual   Fore_ES   Fore_DES  Fore_HW   Fore_ARIMA
37           4795300  4360478   4399495   4710448   4766318.483
38           3068500  4360478   4402583   4686353   4644281.332
39           4272500  4360478   4405672   4772860   4109684.12
MAPE                  17.7438   18.28257  22.06841  18.58954464
ACF                   failed    failed    passed    passed
Ljung-Box test        failed    failed    passed    passed
Normal dist.          followed  followed  followed  followed

b) For forecasting on a weekly basis, taking week 1 of January 2008 to week 1 of Sep 2011 as historical data and forecasting the next 3 weeks (weeks 2-4 of Sep 2011). The pattern is shown in Figure 3.

Forecast results using Excel:
Week (Sep 2011)  Actual   Fore_ES   Fore_DES  Fore_HW
2                5333700  4344150   4140268   4946206
3                3177700  4344150   4075195   4481201
4                4317000  4344150   4010122   4171006
MAPE                      18.62969  19.24248  17.22237
ACF                       failed    failed    passed
Ljung-Box test            passed    failed    passed
Normal dist.              followed  followed  followed
Forecast results using R:
Week (Sep 2011)  Actual   Fore_ES   Fore_DES  Fore_HW   Fore_ARIMA
2                5333700  4348899   4175056   5481334   4740220
3                3177700  4348899   4124223   4984934   4748460
4                4317000  4348899   4073390   3993368   4571047
MAPE                      18.68649  19.05085  22.37902  22.1475
ACF                       failed    failed    failed    passed
Ljung-Box test            passed    passed    passed    passed
Normal dist.              followed  followed  followed  followed

3. On a daily basis: taking 1/1/2008 to 27/9/2011 as historical data and forecasting the next 3 days (28-30 Sep 2011). The pattern for the daily basis is shown in Figure 4.

Forecast results using Excel:
Day (2011)  Actual  Fore_ES   Fore_DES  Fore_HW
28/09/2011  617100  605149.3  589531.8  569801.9
29/09/2011  571400  605149.3  589861.9  599025.2
30/09/2011  470700  605149.3  590192    668046.8
MAPE                12.13557  11.02813  18.14182
ACF                 failed    failed    passed
Ljung-Box test      failed    passed    failed
Normal dist.        followed  followed  followed

Forecast results using R:
Day (2011)  Actual  Fore_ES   Fore_DES  Fore_HW   Fore_ARIMA
28/09/2011  617100  601719.2  585397.4  562390.5  631656.9
29/09/2011  571400  601719.2  585320.1  596371.8  619255.5
30/09/2011  470700  601719.2  585242.8  603665.4  617706.5
MAPE                11.87784  10.63602  13.8281   13.9885
ACF                 failed    failed    failed    failed
Ljung-Box test      failed    passed    failed    passed
Normal dist.        followed  followed  followed  followed
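The MAPE figures in the tables above follow directly from the definition in section 2.3.1. The sketch below recomputes the simple exponential MAPE for the monthly R results, and includes a plain re-implementation of the Ljung-Box Q statistic for illustration (not the internals of R's Box.test()).

```python
# Validity measures from section 2.3, applied to the Jul-Sep 2011
# monthly actuals and the R simple exponential (Fore_ES) forecasts.

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    n = len(actual)
    return 100.0 / n * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))

def ljung_box(residuals, h):
    """Ljung-Box Q = n(n+2) * sum_{k=1..h} rho_k^2 / (n-k).
    Compare Q against the (1-alpha)-quantile of chi-squared with h df."""
    n = len(residuals)
    mean = sum(residuals) / n
    c0 = sum((r - mean) ** 2 for r in residuals)
    q = 0.0
    for k in range(1, h + 1):
        ck = sum((residuals[t] - mean) * (residuals[t - k] - mean)
                 for t in range(k, n))
        q += (ck / c0) ** 2 / (n - k)
    return n * (n + 2) * q

actual = [18614300, 20450533.33, 18022300]
fore_es = [18469116.53, 18469116.53, 18469116.53]
print(round(mape(actual, fore_es), 2))  # 4.32, matching the Fore_ES MAPE in the R table
```

The same `mape` call on any other actual/forecast column of the tables reproduces the corresponding MAPE row, which is a quick way to verify the spreadsheet and R outputs against each other.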
6 Conclusions
The following conclusions are drawn on the basis of data obtained from one particular ATM; results will vary from ATM to ATM depending on the position of the ATM and the socio-economic features of its users.
For the UBI (Muland) ATM there is no significant difference across data points, so the Simple Exponential Smoothing, Double Exponential Smoothing and ARIMA models forecast well.
Forecasting in R also gives upper and lower bounds, so based on the economic scenario we can choose a bound value of cash for refilling the ATM. For example, if we know in advance that next week there will be inflation, deflation or normal conditions, we can accordingly choose the lower bound, upper bound or point forecast value of cash for refilling the ATM.
7 References
1. Avril Coghlan. A Little Book of R for Time Series. Available from http://a-little-book-of-r-for-time-series.readthedocs.org/en/latest/.
2. http://www.dnaindia.com/money/1391225/report-atm-refilling-costs-up-as-withdrawal-limit-is-hiked
3. Joe Choong. Forecasting With MS Excel. McGraw-Hill; 1 edition (January 11, 1993).
4. Mirai Solutions GmbH. XLConnect: Excel Connector for R. Retrieved from http://cran.r-project.org/web/packages/XLConnect/index.html and http://mirai-solutions.com.
5. Rob J Hyndman. forecast: Forecasting functions for time series and linear models. Retrieved from http://cran.r-project.org/web/packages/forecast/index.html.
6. Rob J Hyndman and George Athanasopoulos. Forecasting: Principles and Practice. Available from http://otexts.com/fpp/.
7. Ruey S. Tsay. An Introduction to Analysis of Financial Data with R. John Wiley & Sons Inc., Hoboken, New Jersey.
8 Appendix
R code:

# Load libraries
library(XLConnect)
library(forecast)

# Read data from the Excel file
wb_m <- loadWorkbook("E:/IDRBT/R/worksheet.xlsx", create = TRUE)
data <- readWorksheet(wb_m, sheet = "Month", startRow = 3, endRow = 47,
                      startCol = 2, endCol = 3, header = TRUE)
data

# Check: is there a significant difference in withdrawal in each month?
t.test(data$amount, mu = mean(data$amount))

# Convert data into a time series object, assuming no seasonality
datats <- ts(data$amount, frequency = 1)
datats
plot.ts(datats)

# Forecast using the Simple Exponential method
monthf_ES <- HoltWinters(datats, beta = FALSE, gamma = FALSE)
monthf_ES
plot(monthf_ES)
series_ES <- forecast.HoltWinters(monthf_ES, h = 2)
series_ES

# Check validity of the Exponential Smoothing forecast
acf(series_ES$residuals, lag.max = 20)
Box.test(series_ES$residuals, lag = 20, type = "Ljung-Box")

# Write into the Excel file
writeWorksheetToFile("E:/IDRBT/R/worksheet.xlsx", data = series_ES,
                     sheet = "month_outn", startRow = 3, startCol = 1)

# Forecast using the Double Exponential method
monthf_DES <- HoltWinters(datats, gamma = FALSE)
monthf_DES
head(monthf_DES)
plot(monthf_DES)
series_DES <- forecast.HoltWinters(monthf_DES, h = 2)
series_DES

# Check validity of the Double Exponential Smoothing forecast
acf(series_DES$residuals, lag.max = 20)
Box.test(series_DES$residuals, lag = 20, type = "Ljung-Box")

# Write into the Excel file
writeWorksheetToFile("E:/IDRBT/R/worksheet.xlsx", data = series_DES,
                     sheet = "month_outn", startRow = 3, startCol = 8)

# Convert data into a time series object assuming monthly seasonality
datats_s <- ts(data$amount, frequency = 12)
datats_s
plot.ts(datats_s)

# Forecast using the Holt-Winters method
monthf_HW <- HoltWinters(datats_s)
monthf_HW
plot(monthf_HW)
series_HW <- forecast.HoltWinters(monthf_HW, h = 2)

# Check validity of the Holt-Winters forecast
acf(series_HW$residuals, lag.max = 20)
Box.test(series_HW$residuals, lag = 20, type = "Ljung-Box")

# Write into the Excel file
writeWorksheetToFile("E:/IDRBT/R/worksheet.xlsx", data = series_HW,
                     sheet = "month_outn", startRow = 3, startCol = 15)

# Forecast using the ARIMA model
fit <- auto.arima(datats)
fit
head(fit)
x <- forecast(fit, h = 2)
x
plot(x)

# Check validity of the ARIMA forecast
acf(residuals(fit), lag.max = 20)
Box.test(residuals(fit), lag = 20, type = "Ljung-Box")
writeWorksheetToFile("E:/IDRBT/R/worksheet.xlsx", data = x,
                     sheet = "month_outn", startRow = 3, startCol = 21)

plotForecastErrors <- function(forecasterrors) {
  # Make a histogram of the forecast errors
  mybinsize <- IQR(forecasterrors) / 4
  mysd <- sd(forecasterrors)
  mymin <- min(forecasterrors) - mysd * 5
  mymax <- max(forecasterrors) + mysd * 3
  # Generate normally distributed data with mean 0 and standard deviation mysd
  mynorm <- rnorm(10000, mean = 0, sd = mysd)
  mymin2 <- min(mynorm)
  mymax2 <- max(mynorm)
  if (mymin2 < mymin) { mymin <- mymin2 }
  if (mymax2 > mymax) { mymax <- mymax2 }
  # Make a red histogram of the forecast errors
  # (freq = FALSE ensures the area under the histogram equals 1)
  mybins <- seq(mymin, mymax, mybinsize)
  hist(forecasterrors, col = "red", freq = FALSE, breaks = mybins)
  # Overlay the normal curve as a blue line on top of the histogram
  myhist <- hist(mynorm, plot = FALSE, breaks = mybins)
  points(myhist$mids, myhist$density, type = "l", col = "blue", lwd = 2)
}

plotForecastErrors(series_ES$residuals)
plotForecastErrors(series_DES$residuals)
plotForecastErrors(series_HW$residuals)
plotForecastErrors(residuals(fit))