THALES Project No. xxxx

The Analysis of Outliers in Statistical Data

Research Team
Chrysseis Caroni, Associate Professor (P.I.)
Vasiliki Karioti, Doctoral candidate
Polychronis Economou, Christina Perrakou, Postgraduate students
School of Mathematical & Physical Sciences, National Technical University of Athens, Greece

Introduction

Statistical outliers are unusual points in a set of data that differ substantially from the rest. An outlier could be different from other points with respect to the value of one variable (e.g. the breaking strain for a beam that broke at exceptionally low load) or, in multivariate data, it could be unusual in respect of the combination of values of several variables. One particular reason for the importance of detecting the presence of outliers is that potentially they have strong influence on the estimates of the parameters of a model that is being fitted to the data. This could lead to mistaken conclusions and inaccurate predictions. Figures 1 and 2 below give two examples of apparent outliers, one in a time series and the other in a set of bivariate data.

[Figure 1: plot of Y_t against Time]
Figure 1. A possible outlier (at time 43) in a time series.
[Figure 2: scatter plot of Y against X]
Figure 2. A possible outlier in a sample of bivariate data. The presence of this point has a strong influence on the value of the correlation between X and Y, reducing it from 0.97 to [...].

There is a very extensive bibliography on the topic of outliers. For example, Barnett & Lewis [1] give nearly [...]00 references. However, relatively little work has been done on outliers in time series. Outliers are quite likely to arise in time series (for example, in an economic time series affected at some point by an external event such as war or major strikes) and may have severe effects on model fitting and estimation. Classical methods of time series analysis apply to a single series of long duration. However, in many situations, sets of relatively short time series arise. Our research focuses chiefly on the identification of outliers in data of this kind.

Detection of an outlying series

The first objective is to develop a method of detecting an outlying series, rather than outlying points, in a set of time series.

[Figure 3: plot of the series C1, C2, C3, C4, C5 against Time]
Figure 3. A possible outlying series (C[...]) among a set of time series.
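The objective of flagging a whole series, rather than single points, can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not the project's own procedure: it assumes first-order autoregressive series of unequal short lengths with a common, known coefficient and innovation variance, shifts the level of one series, and applies a simple maximum studentized deviation to the series means. All function names, the seed, the series lengths and the parameter values are hypothetical, and the critical value is approximated by simulating sets with no outlying series.

```python
import random
import statistics


def simulate_ar1(n, mu, alpha, sigma, rng):
    """Simulate y_t - mu = alpha * (y_{t-1} - mu) + u_t, with u_t ~ N(0, sigma^2)."""
    # Draw the starting deviation from the stationary distribution,
    # so that short series do not need a burn-in period.
    dev = rng.gauss(0.0, sigma / (1.0 - alpha ** 2) ** 0.5)
    series = []
    for _ in range(n):
        dev = alpha * dev + rng.gauss(0.0, sigma)
        series.append(mu + dev)
    return series


def simulate_set(lengths, levels, alpha, sigma, rng):
    """Simulate one set of series, one per (length, level) pair."""
    return [simulate_ar1(n, mu, alpha, sigma, rng) for n, mu in zip(lengths, levels)]


def max_studentized_mean(series_set):
    """Return (index, statistic) for the series mean furthest from the average mean."""
    means = [statistics.fmean(s) for s in series_set]
    centre = statistics.fmean(means)
    spread = statistics.stdev(means)
    index = max(range(len(means)), key=lambda i: abs(means[i] - centre))
    return index, abs(means[index] - centre) / spread


def null_critical_value(lengths, alpha, sigma, rng, reps=2000, level=0.05):
    """Approximate the upper critical value from sets with no outlying series."""
    no_shift = [0.0] * len(lengths)
    stats = sorted(
        max_studentized_mean(simulate_set(lengths, no_shift, alpha, sigma, rng))[1]
        for _ in range(reps)
    )
    return stats[int((1.0 - level) * reps) - 1]


rng = random.Random(2004)                  # hypothetical seed, for reproducibility
lengths = [8, 10, 12, 15, 9, 20, 11, 14]   # unequal, short series (assumed values)
levels = [0.0] * len(lengths)
levels[2] = 10.0                           # series 2 has its level shifted by 10

data = simulate_set(lengths, levels, alpha=0.5, sigma=1.0, rng=rng)
flagged, statistic = max_studentized_mean(data)
critical = null_critical_value(lengths, alpha=0.5, sigma=1.0, rng=rng)
print(flagged, round(statistic, 3), round(critical, 3))
```

The statistic ignores the unequal variances of the means that result from unequal series lengths, and in practice the autoregressive coefficient and variance would have to be estimated rather than assumed known.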
We assume the following model for a set of m AR(1) series:

$$y_{i,t} - \mu_i = \alpha\,(y_{i,t-1} - \mu_i) + u_{i,t}, \qquad i = 1, \dots, m, \quad t = 1, \dots, n_i, \qquad u_{i,t} \overset{\text{iid}}{\sim} N(0, \sigma^2).$$

Two different models for the series levels $\mu_i$ are investigated:

I) $\mu_i = \mu$: all series have the same level
II) $\mu_i \sim N(\mu_0, \sigma_0^2)$: a random effects model

In the presence of an outlier, the two cases are modified as follows:

I′) $\mu_i = \mu$, $i \neq j$; $\mu_j = \mu + \delta$ for some $j$
II′) $\mu_i \sim N(\mu_0, \sigma_0^2)$, $i \neq j$; $\mu_j \sim N(\mu_0 + \delta, \sigma_0^2)$ for some $j$

Within this framework:

a) we used the two-stage maximum likelihood method to construct test statistics for testing between the hypotheses I and I′ and between II and II′, and investigated the properties of the tests;
b) we examined the possibility of applying simple tests for an outlier in a single sample of univariate data to the means of the series (or to a simple function of the means). Although unequal length of the series implies that their means have unequal variances, we found that this very simple approach works well.

Some of these results have been published in Karioti & Caroni [3].

Simultaneous outlier in every series

We suppose that an external factor affects every one of a set of time series, causing the appearance of an outlier at the same time in each series. We examine two cases, supposing the outlier to be an innovative outlier (IO) or an additive outlier (AO). The theory is developed for a set of AR(p) series.

Random IO:

$$u_{i,t} \sim N(0, \sigma^2), \quad i = 1, \dots, m; \; t = 1, \dots, n; \; t \neq q; \qquad u_{i,q} \sim N(\delta, \sigma^2 + \sigma_\delta^2).$$

In this case, we used two-stage maximum likelihood to construct a test statistic for the presence of the outliers and obtained critical values by simulation.

Equal IO: When the outlier has the same size in each series, the model can be written in the form of a time-series regression

$$y_i = X_i \beta + \varepsilon_i, \qquad i = 1, \dots, m,$$
with $V(\varepsilon) = \Sigma \otimes I$. The presence of the outliers is equivalent to the addition of an extra column to the matrices $X_i$. We examine three models, differing in the form assumed for the covariances:

$\Sigma = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_m^2)$ (heteroscedasticity between series)
$\Sigma$ unrestricted
$\Sigma = \sigma^2 \{(1 - \rho) I + \rho J\}$ (equicorrelation)

Applying the method of generalized least squares (GLS) gives the estimator

$$\hat{\beta} = (X' V^{-1} X)^{-1} X' V^{-1} y$$

for the regression coefficients, and a formula for the estimation of $V$ which takes a different form for each of the three models. The two equations are solved iteratively. In this way, we are again able to obtain a two-stage maximum likelihood test statistic. Asymptotic critical values are obtained from the $\chi^2$ distribution and their accuracy was verified by simulation. This material has been published in Caroni & Karioti [2].

Random AO: The case of a random AO was developed along the same lines as the analysis of the random IO.

Lifetime data

The methods of lifetime data analysis are used in studying survival and reliability (for example, the time until a patient dies, the time until a machine breaks down or the load under which a beam breaks). Outliers in lifetime data are unusually small or unusually large values. They may have a strong influence on the choice of model and on the estimates of the model's parameters. Some initial investigations of lifetime data models were undertaken in the course of this study (Economou & Caroni [4]).

Conclusions

Our two publications (Caroni & Karioti [2]; Karioti & Caroni [3]) are the first to present methods for detecting outliers in sets of time series. They represent a significant contribution to statistical methodology since data of this form are common in various areas of application of statistics. Further papers arising from this research
project have appeared in the proceedings of various conferences (Economou & Caroni [4]; Karioti & Caroni [5], [6], [7]).

References

1. Barnett, V. and Lewis, T.: Outliers in Statistical Data, 3rd ed., Wiley, 1994.
2. Caroni, C. and Karioti, V.: Detecting an innovative outlier in a set of time series, Computational Statistics and Data Analysis 46, 561-570, 2004.
3. Karioti, V. and Caroni, C.: Simple detection of outlying short time series, Statistical Papers 45, 267-278, 2004.

Conference Papers

4. Economou, P. and Caroni, C.: Investigation and development of models for survival analysis, 16th Panhellenic Statistics Conference, Hellenic Statistical Institute, Kavala, 2003.
5. Karioti, V. and Caroni, C.: Detecting an outlier in a set of time series, Proceedings of the 17th International Workshop in Statistical Modelling, Chania, Greece, 2002.
6. Karioti, V. and Caroni, C.: Fixed and random innovative outliers in sets of time series, International Workshop on Computational Management Science, Economics, Finance and Engineering, Limassol, Cyprus.
7. Karioti, V. and Caroni, C.: Detection of an additive or innovative outlier in a set of time series, 16th Panhellenic Statistics Conference, Hellenic Statistical Institute, Kavala, 2003.
