Automobile Demand Forecasting: An Integrated Model of PLS Regression and ANFIS

1 SUN Bao-feng, 2 LI Bo-lin, 3 LI Gen-dao, 4 ZHANG Ka-mng
1. College of Transportation, sunbf@jlu.edu.cn
2. College of Transportation, 823127386@qq.com
3. School of Management, lgendao@jlu.edu.cn
4. College of Transportation, kamng.zhang@faw-vw.com

Abstract
The accuracy of demand forecasting is of great importance for automobile manufacturing, transportation and selling. Considering that automobile demand is highly sensitive to price fluctuations and that customers' purchasing decisions rely heavily on Internet search information, this paper offers an integrated model of PLS regression and ANFIS for automobile demand forecasting. To simplify the problem, we first use partial least squares (PLS) regression to select the most influential factors from candidates including sales price, advertising cost, searching index and consumer satisfaction index. These factors and the sales data are then taken as the input variables of an adaptive-network-based fuzzy inference system (ANFIS) to produce the forecast. To test and validate the proposed method, we compare our model with the Autoregressive Integrated Moving Average (ARIMA) model and the Time Series Decomposition Multiplicative (TSDM) model using sales data collected from Changchun. The results show that the proposed model outperforms the other two models, especially when the market environment fluctuates dramatically.

Keywords: Automobile Demand Forecasting, Partial Least Square, ANFIS

1. Introduction
With the rapid development of the economy in China, the demand for automobiles has increased significantly. In 2010, automobile sales in China were about 18 million, a 32.37% increase over the sales of 2009. Meanwhile, most global car makers want to reap profit from the huge market in China, which results in serious competition. Effective management determines the profitability of a car manufacturer. Accurate demand forecasting plays an important role in management, since it is the basis of all other activities.
An accurate demand forecast lets the manufacturer keep a lower inventory level while achieving higher customer satisfaction. For mature car types, companies usually forecast demand with time series models, for example the autoregressive integrated moving average (ARIMA) model and the time series decomposition (TSD) model. Although these methods perform well in a stable market, the sharp change in the macroeconomic environment since the financial crisis has made them unreliable. Moreover, more and more factors, including regulations, oil and gas prices and tax policy, significantly impact demand and should be taken into account in car demand forecasting.

Furthermore, the development of the Internet makes it easy for consumers to access information about the products they intend to buy. One of the most useful ways to find this information is a search engine. Studies reveal a significant positive relationship between sales and the number of searches (Choi and Varian (2009)[1]). Some search engine companies, such as Google and Baidu, have compiled searching indexes for popular products. Therefore, this intention index must also be considered in car demand forecasting. In addition, consumers can easily obtain the feedback of former buyers on the web. Some portal websites, such as Sohu and Tencent, periodically conduct surveys for certain products and publish consumer satisfaction reports. These reports have a significant impact on consumers' purchasing decisions, so this factor should not be ignored.

In this study, we propose an integrated method that combines PLS regression and ANFIS for automobile demand forecasting. The remainder of the paper is organized as follows. The related literature is reviewed in Section 2. In Section 3, we identify possible factors that impact car demand. The proposed forecasting model is presented in Section 4. To validate the model, we compare the proposed model with other commonly used models in Section 5. Conclusions are drawn in Section 6.

Advances in Information Sciences and Service Sciences (AISS), Volume 5, Number 8, April 2013, doi:10.4156/aiss.vol5.issue8.52
2. Literature review
Studies of automobile demand forecasting can be classified into two categories. The first is the set of studies forecasting the sales of the automobile industry, and the second introduces forecast models for predicting automobile demand for manufacturers (Wang et al. (2011)[2]). Both categories are reviewed below.

There are extensive studies developing techniques to forecast the sales of the automobile industry (Wang et al. (2011)[2]), which play a central role in the planning and decision making of numerous public agencies and private organizations. The forecast models have evolved from univariate modeling to multiple-input modeling to improve forecast accuracy. Multiple regression models are used with many variables, including income, traffic policies, maintenance cost, car sales price, fuel price, average earnings of employees in industry and services, the indexes of producers' inventory, etc. (Carlson and Umble (1980)[3], Chin and Smith (1997)[4], Romilly et al. (1998)[5], Wang et al. (2011)[2], Han-Chen Huang et al. (2012)[6]). Besides linear and statistical analysis models, several non-linear methods have been used to forecast sales, such as artificial neural networks (ANNs) and ANFIS. Geng Li-Yan et al. (2012)[7] proposed a novel forecasting model based on a systematically integrated approach to improve the forecasting accuracy and modeling speed of regional logistics demand, comprising kernel principal component analysis (KPCA), least squares support vector machines (LSSVMs) and the PSO with time varying acceleration coefficients (PSOTVAC) algorithm.

Different from forecasting industry sales, demand forecasting for a specific automobile type is commonly applied in companies that operate in consumer markets, where it helps the manufacturer make effective management decisions in an uncertain environment. As stated by Wang et al. (2011)[2], researchers have proposed various models including the Delphi technique, Naïve, Gompertz, Logit, exponential smoothing, regression analysis, ARIMA, TSD, ANNs, etc.
Among these models, ARIMA and TSD are commonly used in practice. However, as indicated by Levenbach and Cleary (2006)[8], different forecast methods are suitable for different stages of the product life cycle, which includes four stages: introduction, growth, maturity and decline. For example, qualitative methods such as market research are suitable for the introduction stage, while time series methods such as ARIMA can be applied in the maturity stage.

It should be noted that the methods mentioned above usually rely on historical sales data. Even in multiple-input models such as multivariate regression and ANN, only a few factors, including price, advertising investment and promotions, are considered. These methods can produce accurate predictions in a relatively stable environment. Poor performance, however, is often reported in a volatile market environment. Since the financial crisis of 2008, the automobile market environment has changed dramatically. More factors, including macroeconomic and policy factors, have a significant impact on automobile sales. In addition, with the development of the Internet, people who aim to buy a particular car first search for related information online. Search engines convert the search logs into a searching index, which can be used as a measure of buyers' purchasing intention. Studies have shown that purchasing intention contributes greatly to forecast accuracy (Armstrong et al. (2000)[9]). Therefore, the new market environment calls for new forecast models.

In this paper, we propose an integrated model that combines ANFIS (first proposed by Jang (1993)[10]) and PLS regression to forecast the demand for a certain automobile type. ANFIS combines the benefits of ANNs and fuzzy inference systems. Many applications, in fields such as hydrology and stock prices, have demonstrated the good performance of this method (Wang et al. (2011)[2]). This kind of integrated prediction model exploits the advantages of two or more models, stage by stage, in the process of prediction and decision-making. Similarly, Han-Chen Huang et al. (2012)[6] used the particle swarm optimization algorithm combined with a back-propagation neural network (PSOBPN) to establish a demand estimation model, and then used gray relational analysis to select factors highly correlated with travel demand as training and prediction inputs. However, the computational burden is heavy when the number of input variables is large. Therefore, PLS regression is used to reduce the dimension of the input data. The reason for choosing PLS is that it can deal with a large number of highly collinear input variables, which is the situation in our problem.
3. Variable selection
For automobile manufacturers, the forecast is usually conducted quarterly. Therefore, quarterly data are used to establish the forecast. Through literature surveys and interviews with sales managers, 13 important indicators (see Table 1) are selected as candidate explanatory variables. Their historical data are collected either from the Chinese Statistics of Changchun or from public databases on the Internet.

Table 1. Summary of selected explanatory variables

| Classification | Variable | Time unit | Data type | Data source | Notation |
|---|---|---|---|---|---|
| Macroeconomic factors ($x_1$) | Gross domestic production (GDP) | Y | A | District Statistical Yearbook | $x_{11}$ |
| | Average disposable income per person | Y | A | District Statistical Yearbook | $x_{12}$ |
| | Consumer purchasing index (CPI) | Y | D | District Statistical Yearbook | $x_{13}$ |
| | Unemployment rate (UR) | Y | A | District Statistical Yearbook | $x_{14}$ |
| | Population | Y | A | District Statistical Yearbook | $x_{15}$ |
| | Personal consumption (PC) | Y | A | District Statistical Yearbook | $x_{16}$ |
| Price factors ($x_2$) | Price of automobile | Q;M | A | Website of the automobile manufacturer | $x_{21}$ |
| | Interest rate | M | A | Website of The People's Bank of China | $x_{22}$ |
| | Oil price | Q;M | A | Website of National Bureau of Statistics of China | $x_{23}$ |
| Consumer factors ($x_3$) | Searching index | Q;M | E | Data center of Baidu | $x_{31}$ |
| | Consumer satisfaction index | Q | A | Survey report of automobile satisfaction of Tencent | $x_{32}$ |
| Other factors ($x_4$) | Sales of competitive automobile types | Q;M | E | Website of other automobile manufacturers | $x_{41}$ |
| | Advertising investment | Q;M | D | Website of the automobile manufacturer | $x_{42}$ |

Note: In the time unit column, Y denotes year, Q denotes quarter and M denotes month. In the data type column, A denotes accurate data, D denotes relative value and E denotes data estimated by experts.

Generally, the factors influencing regional automobile sales include macroeconomic factors, price factors, consumer factors and other factors. Gross domestic production, average disposable income per person, consumer purchasing index, unemployment, population and personal consumption are commonly used as macroeconomic indicators. However, multicollinearity exists among these factors.
Therefore, we do not use these factors directly, but aggregate them using principal component analysis into a single macroeconomic factor.

The sales price of the automobile and the oil price are the main factors impacting automobile sales. In addition, the interest rate also impacts sales, since many car dealers use credit sales instead of cash sales. Competitive automobile sales and advertising investment have a heavy impact on the sales of a certain automobile type. The actual sales data of all competitive automobile types are not easy to obtain, so we use the market share instead. The advertising investment is obtained from the financial report of the manufacturer.

It should be noted that two further factors, the searching index and the consumer satisfaction index, are adopted in our paper. With the development of the Internet, consumers usually search for information about their intended car before going to the car dealers. So the searching index, established by search engine providers such as Google and Baidu, is a leading indicator of automobile sales. This kind of index can be seen as an indication of consumers' purchase intentions, which has been shown to contribute greatly to forecast accuracy. Furthermore, the consumer satisfaction index, compiled from surveys by portals such as Yahoo and Tencent, has a significant impact on the decisions of intended buyers. Thus, we include both indicators. Other factors include sales of competitive automobile types and advertising investment.

4. Forecasting model
In this section, an automobile demand forecast model is proposed, which integrates PLS regression and ANFIS. The forecasting procedure consists of the following two stages.

Stage 1. Selecting the most influential factors by PLS regression
To determine the useful economic factors, we use PLS regression to select the most influential factors as our input variables. PLS regression is a recent technique that generalizes and combines features from principal component analysis and multiple regression. The procedure is as follows.

Step 1. Mean-centering and scaling of variables
Before the model is developed, it is convenient to transform the data in the observation set in order to make the calculations easier. Let $X$ ($n \times p$) and $Y$ ($n \times q$) denote the observations of the independent variables and the dependent variables, respectively. Let $E_0$ and $F_0$ be the standardized matrices for the independent variables and the dependent variables, respectively, where

$$E_0 = (x^*_{ij})_{n \times p}, \qquad F_0 = (y^*_{ik})_{n \times q} \qquad (1.1)$$

with

$$x^*_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad y^*_{ik} = \frac{y_{ik} - \bar{y}_k}{s_k}, \qquad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, p; \; k = 1, 2, \ldots, q.$$

Here $\bar{x}_j$ and $\bar{y}_k$ are the column means, $s_j$ is the standard deviation of the $j$th column of $X$ and $s_k$ is the standard deviation of the $k$th column of $Y$. It should be noted that there are also other ways of scaling the variables.

Step 2. Latent factor extraction
Latent factors are extracted using an iterative process (Abdi (2003)[11]). The latent factors should account for as much of the manifest factor variation as possible while modeling the responses well. For the detailed procedure, the reader is referred to Abdi (2003).

Step 3. Determination of latent factor number
An obvious question is how many latent factors are needed to obtain the best generalization for the prediction of new observations. This is generally achieved by cross-validation techniques such as bootstrapping.

In our proposed forecasting model, PLS regression is used to extract the latent factors and so reduce the dimension of the observations. The extracted factors are, in turn, used as the input variables of the ANFIS. PLS regression can be easily implemented in statistical software such as Minitab.

Stage 2. Forecasting by ANFIS
Stage 1 identifies two or three factors that have a significant impact on automobile sales. The historical data of those factors are input into the ANFIS. An ANFIS finds the mapping between the input and output data through hybrid learning, which determines the optimal distribution of the membership functions. The inference system uses five layers, as shown in Figure 1, where two inputs $x$, $y$ and one output $F$ are pictured.
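As an illustration of Stage 1, the sketch below standardizes the data as in Eq. (1.1) and then extracts latent factors with a NIPALS-style iteration for a single response (often called PLS1). It is a minimal sketch under stated assumptions, not the Minitab procedure used in this paper: the function names `standardize` and `pls1_nipals` and the toy data are hypothetical, the sample standard deviation (divisor n - 1) is assumed, and only one dependent variable (sales, q = 1) is handled.

```python
import math


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(u * v for u, v in zip(a, b))


def standardize(columns):
    """Mean-center each column and scale it by its sample standard deviation."""
    result = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        s = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))  # assumed divisor n - 1
        result.append([(v - mean) / s for v in col])
    return result


def pls1_nipals(x_columns, y, n_factors):
    """Extract up to n_factors latent factors for a single response y.

    x_columns is a list of p columns, each of length n. Returns the score
    vectors, the weight vectors, the y-loadings and the final y residual.
    """
    E = standardize(x_columns)   # E_0 of Eq. (1.1), stored column-wise
    f = standardize([y])[0]      # F_0 of Eq. (1.1) with q = 1
    n = len(f)
    scores, weights, y_loadings = [], [], []
    for _ in range(n_factors):
        w = [dot(col, f) for col in E]       # direction of maximal covariance with y
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:                     # nothing left in y to explain
            break
        w = [v / norm for v in w]
        # Score vector t = E w
        t = [sum(w[j] * E[j][i] for j in range(len(E))) for i in range(n)]
        tt = dot(t, t)
        p_load = [dot(col, t) / tt for col in E]   # X-loadings
        q_load = dot(f, t) / tt                    # y-loading
        # Deflate: remove the part of E and f explained by this factor
        E = [[col[i] - t[i] * p_load[j] for i in range(n)] for j, col in enumerate(E)]
        f = [f[i] - q_load * t[i] for i in range(n)]
        scores.append(t)
        weights.append(w)
        y_loadings.append(q_load)
    return scores, weights, y_loadings, f


# Toy data (hypothetical, not the Changchun data set): two predictors, one response
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 8.0]
sales = [2.0 * a - b for a, b in zip(x1, x2)]
scores, weights, y_loadings, residual = pls1_nipals([x1, x2], sales, n_factors=2)
```

In this sketch the score vectors are mutually orthogonal by construction, and the size of the remaining `residual` after each factor indicates how much of the response is still unexplained. In practice the number of factors would be chosen by cross-validation, as described in Step 3.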
Figure 1. The framework of ANFIS (Layers 1 to 5; inputs $x$ and $y$, output $F$)

Layer 1: In this layer, every node $i$ is a square node with node function

$$O^1_i = \mu_{A_i}(x),$$

where $x$ is the input to node $i$, and $A_i$ is the linguistic label (small, large, etc.) associated with this node function. In other words, $O^1_i$ is the membership function of $A_i$ and it specifies the degree to which the given $x$ satisfies the quantifier $A_i$. In this paper, we choose the Gaussian membership function.

Layer 2: Every node in this layer is a circle node labeled $\Pi$, which multiplies the incoming signals and sends the product out. For example,

$$w_i = \mu_{A_i}(x) \times \mu_{B_i}(y), \qquad i = 1, 2.$$

Each node output represents the firing strength of a rule.

Layer 3: Every node in this layer is a circle node labeled N. The $i$th node in this layer calculates the ratio of the $i$th rule's firing strength to the sum of all rules' firing strengths:

$$\bar{w}_i = \frac{w_i}{w_1 + w_2}, \qquad i = 1, 2.$$

The outputs of this layer are called normalized firing strengths.

Layer 4: In this layer, each node $i$ is a square node with node function

$$O^4_i = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i),$$

where $\bar{w}_i$ is the output of layer 3 and $\{p_i, q_i, r_i\}$ is the parameter set. Parameters in this layer are referred to as consequent parameters.

Layer 5: The single node in this layer is a circle node labeled $\Sigma$ that computes the overall output as the summation of all incoming signals, i.e.,

$$O^5_1 = \text{overall output} = \sum_i \bar{w}_i f_i.$$

The forecasting procedure of ANFIS can be illustrated as follows:
Step 1: Generate the training data;
Step 2: Determine the membership function types and numbers for the input variables;
Step 3: Generate the initial FIS structure;
Step 4: Set the training parameters of the ANFIS;
Step 5: Train the ANFIS;
Step 6: Test the effect of the FIS.

All of the ANFIS functions were carried out in Matlab's fuzzy inference toolbox. The ANFIS training function in the toolbox can be used to train the model on the input data. After training, an ANFIS model with forecasting capability is obtained for output forecasting.

5. Empirical study
In this section, we employ the annual sales data of the Jetta in Changchun from 1999 to 2010 (omitted) as our research data. The data are divided into two data sets: the training data set (from 1999 to 2006) and the testing data set (from 2007 to 2010). The data set of independent variables is collected either from the Chinese Statistics of Changchun or from public databases on the Internet, and is presented in Table A1 of the Appendix (omitted).

To illustrate the application of the proposed methodology, we first use PLS regression to obtain the principal factors impacting sales. Second, we compare the ANFIS model with the ARIMA model and the TSDM model. The mean absolute percentage error (MAPE) is taken as the performance measure of forecasting accuracy.

5.1 Analysis of PLS regression
PLS regression, implemented in Minitab, is used to extract the principal factors. According to the cross-validation, we get three main factors. The output and the loadings of the variables are presented in Figure 2 and Figure 3, respectively. Figure 2 shows explicitly that 3 latent variables should be chosen, with a cross-validated $R^2$ of 79.8%. Figure 3 presents the impacts of the variables on the dependent variable. It can be seen that the lines for customer satisfaction ($x_{32}$) and advertising ($x_{42}$) are short, meaning that they do not have a significant impact on sales, while the lines for price $x_{21}$, oil price $x_{23}$ and searching index $x_{31}$ are long, meaning that they have a big impact on sales. Therefore, these three variables are chosen as the input of the ANFIS.

Figure 2. The output of Minitab
Figure 3. Loadings of variables (PLS loading plot; horizontal axis: Component 1, vertical axis: Component 2)

It should be noted that although the loadings of $x_{22}$ and $x_{41}$ are relatively large, including them brings little improvement in the significance of the model. Therefore, the 3-variable model is adopted in this paper.
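With the inputs chosen, the forward computation through the five ANFIS layers of Section 4 can be sketched for the two-input, two-rule case pictured in Figure 1. This is a minimal illustration and not the Matlab toolbox implementation used in this paper: the names `gaussian` and `anfis_forward`, the Gaussian parameterization by center `c` and width `sigma`, and all parameter values are assumptions chosen for the example, and the hybrid learning that would fit these parameters is not shown.

```python
import math


def gaussian(value, c, sigma):
    """Gaussian membership function with center c and width sigma (assumed form)."""
    return math.exp(-((value - c) ** 2) / (2.0 * sigma ** 2))


def anfis_forward(x, y, premise_a, premise_b, consequent):
    """Forward pass of a two-input first-order Sugeno ANFIS.

    premise_a, premise_b: lists of (c, sigma) pairs for labels A_i and B_i.
    consequent: list of (p_i, q_i, r_i) triples, one per rule.
    Returns the overall output F and the normalized firing strengths.
    """
    # Layer 1: membership degrees of x in A_i and of y in B_i
    mu_a = [gaussian(x, c, s) for c, s in premise_a]
    mu_b = [gaussian(y, c, s) for c, s in premise_b]
    # Layer 2: firing strength of each rule, w_i = mu_Ai(x) * mu_Bi(y)
    w = [a * b for a, b in zip(mu_a, mu_b)]
    # Layer 3: normalized firing strengths
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: weighted linear consequents, w_bar_i * (p_i x + q_i y + r_i)
    rule_out = [wb * (p * x + q * y + r) for wb, (p, q, r) in zip(w_bar, consequent)]
    # Layer 5: overall output as the sum of all incoming signals
    return sum(rule_out), w_bar


# Illustrative parameters only (not fitted to any data)
premise_a = [(0.0, 1.0), (2.0, 1.0)]
premise_b = [(0.0, 1.0), (2.0, 1.0)]
consequent = [(1.0, 1.0, 0.0), (0.0, 0.0, 10.0)]
forecast, strengths = anfis_forward(1.0, 1.0, premise_a, premise_b, consequent)
```

At the input (1, 1) both rules in this toy setup fire equally, so the output is the plain average of the two rule consequents. A model with three inputs, as adopted here, would extend Layers 1 and 2 with a third membership term per rule.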
A 5-variable model that also includes $x_{22}$ and $x_{41}$ could, of course, be used instead.

5.2 Forecasting results
In this sub-section, we compare the ANFIS model with the other two forecasting models (ARIMA and TSDM) commonly used in automobile demand forecasting. In the ANFIS model, the Gaussian membership function is used. The mean absolute percentage error (MAPE) is adopted as the accuracy measure. The results are listed in Table 2. Table 2 shows that, for the same forecasting periods, the ANFIS model has a smaller MAPE value than the ARIMA and TSDM models. In other words, the forecasting performance of the ANFIS model is better than that of the others. Table 2 also shows that the ANFIS model has smaller errors in MAD, MSE, MAPE and MPE than the TSD and ARIMA models.

Table 2. The error comparison of three forecasting models

| Forecasting model | MAD | MSE | MAPE | MPE |
|---|---|---|---|---|
| TSD | 220.7019 | 76896.24 | 8.01% | 5.53% |
| ARIMA | 269.4875 | 99499.42 | 9.60% | -1.84% |
| ANFIS | 70.9406 | 7455.57 | 2.61% | -0.91% |

Figure 4. Forecasted sales by three models (from 2007 to 2010)

Figure 4 plots the sales forecasted by the ARIMA model, the TSDM model and the ANFIS model against the actual sales data from 2007 to 2010. All three models perform very well before the first quarter of 2008 and after the first quarter of 2010. Between these two points, however, the ARIMA and TSDM models perform much worse than the ANFIS model, exactly when actual sales fluctuate between a high of 3452 and a low of 2492. The reason is that from the third quarter of 2008 to the end of 2009, economy cars were preferred due to the high oil price, the subsidy policy for small-displacement automobiles and the financial crisis, which resulted in a jump in sales. As the subsidy policy tightened and customers turned to automobile types competing with the Jetta, actual sales fell at the end of 2009. The underlying reason for the better performance is that the three input variables of the ANFIS model, price $x_{21}$, oil price $x_{23}$ and searching index $x_{31}$, capture these influences well: the model not only fits the sales characteristics of the Jetta but also makes visible that Jetta sales are highly sensitive to the environment.

6. Conclusions
Accurate sales forecasting for automobiles helps car manufacturers plan their purchasing, assembling, transportation, etc. This paper proposes a hybrid model of PLS regression and ANFIS to forecast regional automobile sales. Using the sales data of the Jetta in Changchun, we test our proposed model.
The results demonstrate that the hybrid model yields more accurate regional automobile sales forecasts than the other two models commonly used in practice, meaning that the hybrid model is a valid and promising model for forecasting regional automobile sales.

7. Acknowledgement
This research was partially supported by the Natural Science Foundation of China (Grant No. 71201070) and the Fundamental Research Funds for the Central Universities in Jilin University (Grant No. 2009JC046, 2010ZZ022).
8. References
[1] Hyunyoung Choi, Hal Varian, "Predicting the present with Google Trends", The Economic Record, Vol. 88, Special Issue, pp. 2~9, 2012
[2] Fu-Kwun Wang, Ku-Kuang Chang, Chih-Wei Tzeng, "Using adaptive network-based fuzzy inference system to forecast automobile sales", Expert Systems with Applications, Vol. 38, No. 8, pp. 10587~10593, 2011
[3] Rodney L. Carlson, M. Michael Umble, "Statistical demand functions for automobiles and their use for forecasting in an energy crisis", The Journal of Business, No. 53, pp. 193~204, 1980
[4] Anthony Chin, Peter Smith, "Automobile ownership and government policy: The economics of Singapore's vehicle quota scheme", Transportation Research: Part A, Vol. 31, No. 2, pp. 129~140, 1997
[5] Romilly, P, Song, H, Liu, X, "Modeling and forecasting car ownership in Britain", Transport Economics and Policy, Vol. 32, No. 2, pp. 165~185, 1998
[6] Han-Chen Huang, Chih-Chung Ho, "Back-Propagation Neural Network Combined With a Particle Swarm Optimization Algorithm for Travel Package Demand Forecasting", JDCTA: Journal of Digital Content Technology and its Applications, Vol. 6, No. 17, pp. 194~203, 2012
[7] Geng Li-Yan, Dong Qiao-Ting, "Forecast of Regional Logistics Demand Using KPCA-Based LSSVMs Optimized by PSOTVAC", AISS: Advances in Information Sciences and Service Sciences, Vol. 4, No. 19, pp. 313~319, 2012
[8] Levenbach, H, Cleary, J P, Forecasting: Practice and process for demand management, Thomson, Duxbury, Belmont, 2006
[9] J. Scott Armstrong, Vicki G. Morwitz, V. Kumar, "Sales forecasts for existing consumer products and services: Do purchase intentions contribute to accuracy?", International Journal of Forecasting, Vol. 16, No. 3, pp. 383~397, 2000
[10] Jang, J-S R, "ANFIS: Adaptive-network-based fuzzy inference system", IEEE Transactions on Systems, Man, and Cybernetics, Vol. 23, No. 3, pp. 665~685, 1993
[11] Abdi, H. Partial least squares (PLS) regression. In: Lewis-Beck, M., Bryman, A., Futing, T. (eds.), Encyclopedia of social sciences research methods. Sage: Thousand Oaks (CA), 2003