SHAPIRO-WILK TEST FOR NORMALITY WITH KNOWN MEAN



Wojciech Zieliński
Department of Econometrics and Statistics, Warsaw University of Life Sciences
Nowoursynowska 166, 02-787 Warszawa
e-mail: wojtek.zielinski@statystyka.info

Zofia Hanusz, Joanna Tarasińska
Department of Applied Mathematics and Computer Science, University of Life Sciences in Lublin
Akademicka 13, 20-950 Lublin
e-mails: zofia.hanusz@up.lublin.pl, joanna.tarasinska@up.lublin.pl

Summary

This paper considers an adaptation of the Shapiro-Wilk W test to the case of normality with a known mean. It gives a table of critical values for different sample sizes and several significance levels. The power of this test is investigated and compared with that of the Kolmogorov test and of the two-step procedure consisting of the Shapiro-Wilk W test followed by the t-test. Additionally, normalizing coefficients for the test statistic are given. An example illustrates the advantage of this test over the classic Shapiro-Wilk W test.

Keywords and phrases: Shapiro-Wilk W test, normality.

Classification AMS: 62G

1. Introduction

Testing normality on the basis of a random sample $X_1, X_2, \ldots, X_n$ plays an important role in classical statistical analysis. The literature contains many different tests of the null hypothesis that the distribution of a random variable X is normal with an unknown expectation $\mu$ and variance $\sigma^2$. Among them, the Shapiro-Wilk W statistic (Shapiro, Wilk, 1965) is regarded as an omnibus test.

In practice, we are frequently interested in testing the null hypothesis that the distribution of X is normal with a known expectation $\mu_0$. In this paper we focus on testing this particular null hypothesis, and we propose a modification $W_0$ of the Shapiro-Wilk W statistic. Section 2 defines the $W_0$ statistic and describes its properties. Section 3 presents simulation results on the power of the test. Section 4 applies the test to a regression problem. Section 5 contains some concluding remarks.

2. Derivation of the $W_0$ statistic and its properties

Suppose that we observe a random variable X with distribution F. We are interested in testing the following hypothesis on the basis of a sample $X_1, X_2, \ldots, X_n$:

$$H\colon F \text{ is } N(\mu, \sigma^2).$$

Shapiro and Wilk (1965) proposed the W test based on the statistic

$$W = \frac{\left(\sum_{i=1}^{n} a_i X_{(i)}\right)^2}{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2}, \qquad (1)$$

where $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ are the ordered values of the sample and $a_i$ are tabulated coefficients. Now assume that the expected value is known, say $\mu_0$. We are then interested in testing the null hypothesis

$$H_0\colon F \text{ is } N(\mu_0, \sigma^2). \qquad (2)$$

Applying Shapiro and Wilk's technique to the problem of testing (2) gives the statistic

$$W_0 = \frac{\left(\sum_{i=1}^{n} a_i X_{(i)}\right)^2}{\sum_{i=1}^{n} \left(X_i - \mu_0\right)^2}.$$

The null hypothesis (2) is rejected when $W_0 < W_0(\alpha; n)$, where $W_0(\alpha; n)$ is the critical value at significance level $\alpha$. The statistic $W_0$ has properties similar to those of W: it is scale invariant and its maximum value is one. The minimum value of W is $\varepsilon = n a_1^2/(n-1)$ (Shapiro and Wilk, 1965).
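The two statistics can be sketched in Python as follows. This is a minimal sketch, not the authors' code. It assumes the coefficients $a_i$ come from Royston's (1992) polynomial approximation, on which R's shapiro.test is based, rather than from the original Shapiro-Wilk tables. The polynomial constants `C1` and `C2` are those published with Royston's approximation; treat them as an assumption to verify against Royston (1992). The function names `sw_coefficients`, `w_statistic` and `w0_statistic` are hypothetical. Both statistics share the same numerator, and $\sum (X_i - \mu_0)^2 \ge \sum (X_i - \bar{X})^2$, so $W_0$ never exceeds W.

```python
import math
from statistics import NormalDist

# Polynomial constants of Royston's approximation for the two largest
# coefficients (assumed values, as published with Royston's algorithm).
C1 = (0.0, 0.221157, -0.147981, -2.071190, 4.434685, -2.706056)
C2 = (0.0, 0.042981, -0.293762, -1.752461, 5.682633, -3.582633)


def _poly(coefs, x):
    """Evaluate coefs[0] + coefs[1] * x + coefs[2] * x**2 + ..."""
    return sum(c * x ** k for k, c in enumerate(coefs))


def sw_coefficients(n):
    """Approximate Shapiro-Wilk coefficients a_1, ..., a_n (ascending) for n >= 3."""
    if n < 3:
        raise ValueError("n must be at least 3")
    if n == 3:
        r = math.sqrt(0.5)  # exact for n = 3
        return [-r, 0.0, r]
    nd = NormalDist()
    # approximate expected values of standard normal order statistics
    m = [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    summ2 = sum(v * v for v in m)
    u = 1.0 / math.sqrt(n)
    a = [0.0] * n
    a[-1] = m[-1] / math.sqrt(summ2) + _poly(C1, u)
    if n > 5:
        a[-2] = m[-2] / math.sqrt(summ2) + _poly(C2, u)
        fac = math.sqrt((summ2 - 2 * m[-1] ** 2 - 2 * m[-2] ** 2)
                        / (1 - 2 * a[-1] ** 2 - 2 * a[-2] ** 2))
        inner = range(2, n - 2)
        a[1] = -a[-2]  # antisymmetry: a_i = -a_{n+1-i}
    else:
        fac = math.sqrt((summ2 - 2 * m[-1] ** 2) / (1 - 2 * a[-1] ** 2))
        inner = range(1, n - 1)
    a[0] = -a[-1]
    for i in inner:
        a[i] = m[i] / fac  # remaining coefficients are rescaled m_i
    return a


def w_statistic(x):
    """Classical Shapiro-Wilk W, formula (1)."""
    xs = sorted(x)
    a = sw_coefficients(len(xs))
    num = sum(ai * xi for ai, xi in zip(a, xs)) ** 2
    mean = sum(xs) / len(xs)
    return num / sum((xi - mean) ** 2 for xi in xs)


def w0_statistic(x, mu0):
    """Known-mean variant W_0: same numerator, squares taken about mu0."""
    xs = sorted(x)
    a = sw_coefficients(len(xs))
    num = sum(ai * xi for ai, xi in zip(a, xs)) ** 2
    return num / sum((xi - mu0) ** 2 for xi in xs)


sample = [4.1, 5.3, 4.8, 6.0, 5.5, 4.9, 5.1, 5.8, 4.4, 5.2]
w = w_statistic(sample)
w0 = w0_statistic(sample, mu0=5.0)       # mu0 close to the sample mean
w0_far = w0_statistic(sample, mu0=0.0)   # mu0 far from the data
```

For n = 10 the approximate coefficients agree with the Shapiro-Wilk (1965) table (0.5739, 0.3291, 0.2141, ...) to within about 0.002. When $\mu_0$ is far from the data, the denominator of $W_0$ grows while the numerator does not, so `w0_far` is close to zero.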

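Lemma 1. The minimum value of $W_0$ is zero.

Proof. Since $W_0$ is scale invariant, it suffices to consider the maximization of $\sum_{i=1}^{n} (x_i - \mu_0)^2$ subject to $\sum_{i=1}^{n} a_i x_{(i)} = 1$. The coefficients satisfy $a_i = -a_{n+1-i}$, so $\sum_i a_i = 0$. Adding the same constant to every $x_i$ therefore leaves $\sum_i a_i x_{(i)}$ unchanged. The lemma follows because, under this constraint, $\sum (x_i - \mu_0)^2$ may be arbitrarily large.

Shapiro and Wilk (1965) gave the analytic form of the probability density function of the W statistic for sample size n = 3:

$$g(w) = \frac{3}{\pi}(1-w)^{-1/2} w^{-1/2}, \qquad \tfrac{3}{4} \le w < 1.$$

They also stated that W is independent of the random variables $\bar{X}$ and $\sum (X_i - \bar{X})^2$. This makes it easy to obtain the probability density function of $W_0$ for samples of size n = 3. Note that $W_0 = W \cdot C$, where

$$C = \frac{\sum (X_i - \bar{X})^2}{\sum (X_i - \mu_0)^2} = \frac{\sum (X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2 + n(\bar{X} - \mu_0)^2}.$$

Under $H_0$, $\sum (X_i - \bar{X})^2/\sigma^2 \sim \chi^2_{n-1}$ and $n(\bar{X} - \mu_0)^2/\sigma^2 \sim \chi^2_1$ are independent. Hence C is distributed as $\mathrm{Beta}((n-1)/2, 1/2)$. Being a function of $\bar{X}$ and $\sum (X_i - \bar{X})^2$, C is also independent of W. For n = 3 the probability density function of C is therefore

$$f(c) = \frac{\Gamma(1.5)}{\sqrt{\pi}}(1-c)^{-1/2}, \qquad 0 < c < 1.$$

We substitute $w_0 = wc$ in the joint probability density function $g(w) f(c)$ and integrate over c. This gives the probability density function of $W_0$:

$$\varphi(w_0) = \begin{cases} \dfrac{3\,\Gamma(1.5)}{\pi\sqrt{\pi}}\, w_0^{-1/2} \displaystyle\int_{w_0}^{4w_0/3} (1-c)^{-1/2}(c - w_0)^{-1/2}\,dc, & 0 < w_0 < \tfrac{3}{4},\\[2ex] \dfrac{3\,\Gamma(1.5)}{\pi\sqrt{\pi}}\, w_0^{-1/2} \displaystyle\int_{w_0}^{1} (1-c)^{-1/2}(c - w_0)^{-1/2}\,dc, & \tfrac{3}{4} \le w_0 < 1. \end{cases}$$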
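The derivation rests on two facts: Lemma 1, and the factorization $W_0 = W \cdot C$ with $C \sim \mathrm{Beta}((n-1)/2, 1/2)$. Both can be checked numerically. The Python sketch below uses n = 3, so the exact coefficients $(-1/\sqrt{2}, 0, 1/\sqrt{2})$ apply; the helper names are hypothetical. It performs three checks:

- Shifting a sample leaves the numerator of $W_0$ unchanged while the statistic tends to zero.
- The identity $W_0 = W \cdot C$ holds.
- By simulation, the mean of C is close to $(n-1)/n$, the mean of the $\mathrm{Beta}((n-1)/2, 1/2)$ distribution.

```python
import math
import random

A3 = (-math.sqrt(0.5), 0.0, math.sqrt(0.5))  # exact Shapiro-Wilk coefficients, n = 3


def numerator(x):
    """(sum a_i x_(i))^2 for a sample of size 3."""
    xs = sorted(x)
    return sum(a * v for a, v in zip(A3, xs)) ** 2


def w_n3(x):
    """Classical W for n = 3."""
    m = sum(x) / len(x)
    return numerator(x) / sum((v - m) ** 2 for v in x)


def w0_n3(x, mu0):
    """Known-mean W_0 for n = 3."""
    return numerator(x) / sum((v - mu0) ** 2 for v in x)


def c_ratio(x, mu0):
    """C = sum (x_i - xbar)^2 / sum (x_i - mu0)^2, so that W_0 = W * C."""
    n = len(x)
    m = sum(x) / n
    ss = sum((v - m) ** 2 for v in x)
    return ss / (ss + n * (m - mu0) ** 2)


def mean_of_c(n, reps=40_000, seed=1):
    """Monte Carlo mean of C under H0 (mu0 = 0, sigma = 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += c_ratio([rng.gauss(0.0, 1.0) for _ in range(n)], 0.0)
    return total / reps


base = [0.3, -1.2, 0.8]
# Lemma 1: a common shift does not change sum a_i x_(i) (the a_i sum to zero),
# but it inflates sum (x_i - mu0)^2, so W_0 decreases towards zero.
shrinking = [w0_n3([v + s for v in base], 0.0) for s in (0, 10, 100, 1000)]
```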

Finally, after integrating, we get

$$\varphi(w_0) = \begin{cases} \dfrac{3\,\Gamma(1.5)}{\pi\sqrt{\pi}}\, w_0^{-1/2}\left[\arcsin\dfrac{5w_0 - 3}{3(1 - w_0)} + \dfrac{\pi}{2}\right], & 0 < w_0 < \tfrac{3}{4},\\[2ex] \dfrac{3\,\Gamma(1.5)}{\sqrt{\pi}}\, w_0^{-1/2}, & \tfrac{3}{4} \le w_0 < 1. \end{cases}$$

Since $\Gamma(1.5)/\sqrt{\pi} = 1/2$, the constants simplify to $3/(2\pi)$ and $3/2$, respectively. Both branches equal $\sqrt{3}$ at $w_0 = 3/4$. The plot of $\varphi(w_0)$ is shown in Figure 1.

For sample sizes n > 3 the analytical form of the null distribution of $W_0$ is not available. A Monte Carlo experiment was therefore performed to obtain information about the distribution. For each n = 3, 4, ..., 50, N samples were drawn from the distribution N(0, 1), i.e. with $\mu_0 = 0$. The value of the $W_0$ statistic was calculated for each sample, giving the values $w_1, \ldots, w_N$. The critical value $W_0(\alpha; n)$ was taken as the $\alpha$-th quantile of $w_1, \ldots, w_N$. All calculations were done in R using the procedure shapiro.test, which uses Royston's procedure (Royston, 1992). The results are given in Table 1.

Shapiro and Wilk (1968) approximated the distribution of the W statistic by a Johnson curve. For each n they computed the least squares regression of the empirical sampling values of

$$u(p) = \ln\frac{W(p) - \varepsilon}{1 - W(p)}$$

on $z_p$. Here $\varepsilon$ is the minimum value of the W statistic, $W(p)$ is the p-th empirical sampling quantile of W, and $z_p$ is the p-th quantile of the standard normal distribution. They used a set of values of p (up to p = 0.99) and tabulated $\varepsilon$, $\gamma$, $\delta$ such that

$$Z = \gamma + \delta \ln\frac{W - \varepsilon}{1 - W}$$

has approximately the standard normal distribution. In this paper a similar approach was applied to the $W_0$ statistic for sample sizes n = 3, 4, ..., 50. Since the minimum value of $W_0$ is zero (Lemma 1), $\varepsilon = 0$.
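The closed-form density for n = 3 gives a convenient check of the Monte Carlo procedure used for Table 1. The Python sketch below evaluates $\varphi$ and integrates it numerically. It then compares the exact 5% quantile with the empirical one, obtained as in the text by taking the $\alpha$-th quantile of simulated values of $W_0$. The helper names, the 100,000 replications and the seed are arbitrary choices for this sketch, not the paper's settings. Integrating the second branch gives the exact upper tail $P(W_0 \ge 3/4) = 3(1 - \sqrt{3}/2)$, which serves as a second check.

```python
import math
import random

A3 = (-math.sqrt(0.5), 0.0, math.sqrt(0.5))  # exact Shapiro-Wilk coefficients for n = 3


def phi(w):
    """Closed-form null density of W_0 for n = 3 (using Gamma(1.5)/sqrt(pi) = 1/2)."""
    if 0.0 < w < 0.75:
        arg = (5.0 * w - 3.0) / (3.0 * (1.0 - w))
        arg = max(-1.0, min(1.0, arg))  # guard against rounding just outside [-1, 1]
        return 3.0 / (2.0 * math.pi) / math.sqrt(w) * (math.asin(arg) + math.pi / 2.0)
    if 0.75 <= w < 1.0:
        return 1.5 / math.sqrt(w)
    return 0.0


def mass_below(w, steps=4000):
    """P(W_0 <= w) by the midpoint rule in t = sqrt(w).

    With w = t**2 the integrand becomes 2 t phi(t**2), which is bounded,
    so the w**-0.5 singularity at zero causes no trouble.
    """
    w = min(max(w, 0.0), 1.0)
    h = math.sqrt(w) / steps
    return h * sum(2.0 * t * phi(t * t) for t in ((k + 0.5) * h for k in range(steps)))


def exact_quantile(alpha, iters=60):
    """Solve mass_below(q) = alpha by bisection (the CDF is increasing)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mass_below(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate_w0(reps, seed=2024):
    """Sorted null sample of W_0 for n = 3, drawing from N(0, 1) with mu0 = 0."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        xs = sorted(rng.gauss(0.0, 1.0) for _ in range(3))
        num = sum(a * v for a, v in zip(A3, xs)) ** 2
        out.append(num / sum(v * v for v in xs))
    return sorted(out)


def empirical_quantile(sorted_values, alpha):
    """Simple order-statistic estimate of the alpha-th quantile."""
    k = math.ceil(alpha * len(sorted_values)) - 1
    return sorted_values[min(max(k, 0), len(sorted_values) - 1)]


sims = simulate_w0(100_000)
crit_mc = empirical_quantile(sims, 0.05)     # Monte Carlo critical value W_0(0.05; 3)
crit_exact = exact_quantile(0.05)            # the same quantity from the density
tail_mc = sum(v >= 0.75 for v in sims) / len(sims)
tail_exact = 3.0 * (1.0 - math.sqrt(0.75))   # integral of 1.5 w**-0.5 over [3/4, 1)
```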

The least squares regression of $\ln\frac{W_0(p)}{1 - W_0(p)}$ on $z_p$ was based on N pseudorandom samples from N(0, 1). Table 2 gives the values of $\gamma$ and $\delta$ such that

$$Z = \gamma + \delta \ln\frac{W_0}{1 - W_0}$$

has approximately the standard normal distribution. Small values of Z (its lower tail) indicate non-normality.

To check the goodness of the approximation, another N pseudorandom samples from N(0, 1) were generated. For each of them, $W_0$ and $Z = \gamma + \delta \ln(W_0/(1 - W_0))$ were calculated. Table 3 gives the ratios

$$\frac{\#\{Z_i : Z_i < z_p\}}{N}$$

for selected probabilities p.

3. Power comparisons

Suppose that the hypothesis $H_0\colon F$ is $N(\mu_0, \sigma^2)$ is verified with the aid of the $W_0$ test. It is of interest to know the power of the $W_0$ test. We consider three kinds of alternatives:
(a) F is $N(\mu, \sigma^2)$ with $\mu \ne \mu_0$;
(b) F is not normal with $\mu = \mu_0$;
(c) F is not normal with $\mu \ne \mu_0$.

The Shapiro-Wilk W test has been investigated against many different non-normal alternatives. Very exhaustive studies were done by Shapiro et al. (1968) and Chen (1971). These studies showed that the W test is very powerful in comparison with other normality tests, such as the Kolmogorov, chi-square, $\sqrt{\beta_1}$ and $\beta_2$ tests. This holds against very different distributions such as Student's t, Gamma, Beta and Uniform. Because the construction of $W_0$ is similar to that of W, the $W_0$ test may also be expected to be powerful against alternatives of kinds (b) and (c). Hence in our study we confine ourselves to alternative (a), i.e. to the case where the true distribution is normal with a mean other than $\mu_0$.

The $W_0$ test was compared with two other procedures. The first is the standard Kolmogorov test. The test statistic of the Kolmogorov test is given by

$$D = \max_{1 \le i \le n} \max\left\{\frac{i}{n} - F\left(X_{(i)}\right),\; F\left(X_{(i)}\right) - \frac{i-1}{n}\right\},$$

where $F(x) = \Phi\left(\frac{x - \mu_0}{s}\right)$, $s^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \mu_0)^2$, and $\Phi$ is the cdf of the standard normal distribution.

The second procedure has two steps. In the first step, normality is verified by the classical W test. If normality is not rejected, the t test is used to verify the hypothesis that the mean equals the given number $\mu_0$. All tests were carried out at significance level $\alpha$. The two-step procedure requires two significance levels, $\alpha_W$ and $\alpha_t$. They are chosen so that the overall significance level is $\alpha$, i.e.

$$P_{H_0}\{W \text{ accepts normality and } t \text{ accepts mean } \mu_0\} \approx 1 - (\alpha_W + \alpha_t) = 1 - \alpha.$$

Under $H_0$, W is independent of $\bar{X}$ and $\sum (X_i - \bar{X})^2$, hence of the t statistic. The exact probability is therefore $(1 - \alpha_W)(1 - \alpha_t)$, and the approximation neglects the small term $\alpha_W \alpha_t$. Because there is no reason to prefer either the W test or the t test, we took $\alpha_W = \alpha_t = \alpha/2$.

The power of the three tests was compared by the Monte Carlo method. A sample of size n was generated from the normal distribution with mean $\mu_0$ and used in all tests. The sample was then shifted to different values of $\mu$, and each of the tests was applied to the shifted samples. This procedure was repeated many times, and the number of rejections of hypothesis (2) was counted. In the simulations, the hypothesis $H_0\colon F$ is $N(0, \sigma^2)$ was verified for several sample sizes n and significance levels $\alpha$ = 0.01, 0.05, 0.10. The variance $\sigma^2 = 1$ was used in all cases. The simulated powers are given in Table 4.

Figure 2 shows the relative powers of $W_0$ with respect to the Kolmogorov and W+t tests. The x axis shows values of $\mu$. The y axis shows two ratios: (power of $W_0$ test)/(power of Kolmogorov test) as a solid line, and (power of $W_0$ test)/(power of W+t test) as a dotted line. In general the lines lie above one, which shows that $W_0$ is more powerful than the other two tests.
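Below is a minimal Python sketch of the Kolmogorov statistic as written above, using divisor n in $s^2$ as in the formula, and of the exact size of the two-step procedure. The names `kolmogorov_d` and `two_step_level`, the sample size 20 and the seed are hypothetical choices for illustration. The size formula relies on the independence of W and the t statistic under $H_0$, noted above. With $\alpha_W = \alpha_t = \alpha/2$ the exact size is $\alpha - \alpha^2/4$, slightly below the nominal $\alpha$.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution, supplies Phi


def kolmogorov_d(x, mu0):
    """Kolmogorov statistic with F(x) = Phi((x - mu0)/s), s^2 = sum (x_i - mu0)^2 / n."""
    n = len(x)
    s = math.sqrt(sum((v - mu0) ** 2 for v in x) / n)
    d = 0.0
    for i, v in enumerate(sorted(x), start=1):
        f = _STD.cdf((v - mu0) / s)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def two_step_level(alpha_w, alpha_t):
    """Exact size of the W-then-t procedure, assuming W and t are independent under H0."""
    return 1.0 - (1.0 - alpha_w) * (1.0 - alpha_t)


alpha = 0.05
exact = two_step_level(alpha / 2, alpha / 2)   # alpha - alpha**2 / 4, about 0.049375

rng = random.Random(11)
null_sample = [rng.gauss(0.0, 1.0) for _ in range(20)]
d_null = kolmogorov_d(null_sample, 0.0)                         # data agree with mu0 = 0
d_shift = kolmogorov_d([v + 3.0 for v in null_sample], 0.0)     # true mean is 3, not 0
```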

4. Example

Consider the problem of fitting a regression function. In the analysis of the model $Y = f(x) + \varepsilon$ one has to check whether $\varepsilon$ is distributed as $N(0, \sigma^2)$ for each x. In the experiment, the random variable Y was generated according to this model with a quadratic function f(x), ten times at each of several points x. Two regression functions,

$$f_1(x) = \beta_0 + \beta_1 x \quad\text{and}\quad f_2(x) = \beta_0 + \beta_1 x + \beta_2 x^2,$$

were fitted. Note that the second model is the true one. Classical analysis of variance with the F test showed that both models are acceptable, i.e. both $f_1(x)$ and $f_2(x)$ may be considered appropriate regression functions. Table 5 presents the results.

The next step of the analysis is to check whether the residuals are normally distributed with zero mean. That is, for each x and each regression function, one should verify the hypothesis that the residuals are distributed as $N(0, \sigma^2)$. Table 6 shows the results, rounded to four decimal places. The $W_0$ column gives the values of the $W_0$ statistic, and the critical value for n = 10 and $\alpha = 0.05$ can be found in Table 1. The last column of Table 6 gives the p-values of the Shapiro-Wilk W test.

For the linear function, the hypothesis of normality with zero mean was rejected at four x points. For the quadratic function it was never rejected. Hence $f_1(x)$ is not acceptable as a regression function, whereas $f_2(x)$ is acceptable. Note that the Shapiro-Wilk W test never rejected normality of the residuals, in either the linear or the quadratic case.

5. Concluding remarks

Many statistical models assume that errors are normally distributed with zero mean. In such cases the $W_0$ test is more adequate and should be used instead of the classical Shapiro-Wilk W test. The simulation study in this paper shows that the $W_0$ test is generally more powerful than both the Kolmogorov test and the combination of the W and Student t tests.
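The mechanism behind this example can be reproduced with a small simulation. The Python sketch below makes several hypothetical choices for illustration; none of them are the settings used in the paper:

- true function $f(x) = 0.5x^2 + 7x$;
- error standard deviation $\sigma = 1$;
- design points x = 0, 2, ..., 12;
- helper names `w_and_w0`, `polyfit` and `residual_check`.

As in the text, ten observations are taken at each x, and the Shapiro-Wilk coefficients for n = 10 are taken from the 1965 table. A straight-line fit leaves residuals whose mean at some design points is far from zero. W is location invariant and cannot see this, whereas $W_0$ with $\mu_0 = 0$ collapses at those points.

```python
import random

# Shapiro-Wilk (1965) tabulated coefficients for n = 10 (a_10, ..., a_6)
UPPER = (0.5739, 0.3291, 0.2141, 0.1224, 0.0399)
A10 = [-v for v in UPPER] + list(reversed(UPPER))  # a_1, ..., a_10 in ascending order


def w_and_w0(r, mu0=0.0):
    """Return (W, W_0) for a sample of size 10."""
    xs = sorted(r)
    num = sum(a * v for a, v in zip(A10, xs)) ** 2
    mean = sum(xs) / len(xs)
    w = num / sum((v - mean) ** 2 for v in xs)
    w0 = num / sum((v - mu0) ** 2 for v in xs)
    return w, w0


def polyfit(xs, ys, degree):
    """Least squares polynomial coefficients (b_0, b_1, ...) via the normal equations."""
    k = degree + 1
    m = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    for col in range(k):                      # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda row: abs(m[row][col]))
        m[col], m[piv] = m[piv], m[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for row in range(col + 1, k):
            factor = m[row][col] / m[col][col]
            for c in range(col, k):
                m[row][c] -= factor * m[col][c]
            rhs[row] -= factor * rhs[col]
    beta = [0.0] * k
    for i in reversed(range(k)):              # back substitution
        beta[i] = (rhs[i] - sum(m[i][j] * beta[j] for j in range(i + 1, k))) / m[i][i]
    return beta


def residual_check(seed=7):
    """(W, W_0) of the residuals at each design point, for a linear and a quadratic fit."""
    rng = random.Random(seed)
    design = [0, 2, 4, 6, 8, 10, 12]          # hypothetical design points
    xs, ys = [], []
    for x in design:
        for _ in range(10):                   # ten observations per point, as in the text
            xs.append(x)
            ys.append(0.5 * x * x + 7 * x + rng.gauss(0.0, 1.0))  # hypothetical f, sigma = 1
    result = {}
    for degree in (1, 2):
        beta = polyfit(xs, ys, degree)
        resid = [y - sum(b * x ** i for i, b in enumerate(beta)) for x, y in zip(xs, ys)]
        result[degree] = {
            x: w_and_w0([r for xi, r in zip(xs, resid) if xi == x]) for x in design
        }
    return result


results = residual_check()
for degree, label in ((1, "linear"), (2, "quadratic")):
    for x, (w, w0) in results[degree].items():
        print(f"{label:9s} x={x:2d}  W={w:.4f}  W0={w0:.4f}")
```

Over these design points the straight line approximates $0.5x^2 + 7x$ by roughly $13x - 10$. The residual means are therefore near +10 at x = 0 and 12, near -6 at x = 4 and 8, near -8 at x = 6, and near 0 at x = 2 and 10. $W_0$ is very small exactly where the bias is large, while W looks unremarkable everywhere.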

References

Chen, E. H. (1971). The power of the Shapiro-Wilk W test for normality in samples from contaminated normal distributions. Journal of the American Statistical Association 66, 760-762.

Royston, P. (1992). Approximating the Shapiro-Wilk W-test for non-normality. Statistics and Computing 2, 117-119.

R Development Core Team (2008). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

Shapiro, S. S., Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika 52, 3/4, 591-611.

Shapiro, S. S., Wilk, M. B. (1968). Approximations for the null distribution of the W statistic. Technometrics 10, 861-866.

Shapiro, S. S., Wilk, M. B., Chen, H. J. (1968). A comparative study of various tests for normality. Journal of the American Statistical Association 63, 1343-1372.

Table 1. Critical values $W_0(\alpha; n)$ of the $W_0$ statistic for sample sizes n and significance levels $\alpha$
α α α 5 5 8 88 7 7 779 8 86 7 7 7 8 76 887 865 5 9 86 9 9 759 8 8688 6 9 867 95 76 89 87 7 7 55 55 7677 87 8765 8 99 55 5998 776 88 88 9 785 59 67 78 85 88 585 668 787 8565 886 66 665 695 5 797 86 889 9 6 75 6 7969 86 89 56 666 76 7 88 867 897 59 686 75 8 86 87 897 5 579 78 765 9 89 87 8996 6 595 796 7778 85 876 98 7 66 77 789 89 8787 9 8 69 776 7998 87 886 96 9 678 759 888 87 889 98 666 7696 876 8 886 9 676 779 85 5 8 8887 9 6876 7875 89 6 87 89 98 78 7965 89 7 8 89 95 7 8 86 8 8 895 969 5 75 8 85 9 87 897 987 6 796 87 855 5 89 8989 9 9

Table 2. The normalizing constants $\gamma$ and $\delta$ for $W_0$ for sample sizes n
n γ δ n γ δ n γ δ
-,7,555 9 -,56,698 5-59 5 -,679,78 -,58 87 6-588 57 5 -,9586,85 -,5 98 7-56 56 6 -,99,98 -,565 95 8-65 58 7 -,778,9-6 6 9-679 567 8 -,695,67-767 9-786 595 9 -,896,57 5-7869 -777 557 -,79,57 6-86 5-895 5597 -,7,99 7-96 66-87 5659 -,9,8 8-77 7-97 569 -,55,57 9-77 78 5-976 5769 -,68,755-58 89 6-58 5797 5 -,8,979-8 95 7-55 586 6 -,9,8-78 5 8-598 5858 7 -,,5-5 586 9-57 595 8 -,55,5-7 57 5-5795 595

Table 3. The simulated probabilities $P\{\gamma + \delta \ln(W_0/(1 - W_0)) < z_p\}$ for sample sizes n
Probability 5 5 9 95 98 99 5 7 6 58 99 957 979 987 9 9 5 9 957 98 989 5 5 9 5 98 955 98 99 6 5 95 5 96 956 98 99 7 5 96 56 95 956 98 99 8 5 97 57 9 955 98 99 9 5 97 57 9 955 98 99 5 98 57 9 955 98 99 5 99 56 9 95 98 5 99 58 9 95 98 99 99 5 59 9 95 98 99 5 99 58 899 95 98 99 5 5 99 56 898 95 98 99 6 5 7 5 99 58 57 899 95 98 898 95 98 99 99 8 5 99 57 897 95 98 99 9 5 57 897 95 98 99 5 58 897 95 98 99 5 57 897 95 98 99 5 57 897 95 98 99 55 58 897 95 98 99 5 59 897 95 98 99 5 5 58 897 95 98 99 6 5 59 898 95 98 99 7 55 58 897 95 985 99 8 5 57 897 95 98 99 9 5 58 898 95 98 99 55 58 897 95 98 99 5 58 897 95 985 99 55 59 9 95 98 99 5 57 897 95 98 99 55 59 897 95 98 99 5 5 58 896 95 98 99 6 5 58 897 95 985 99 7 5 57 896 95 98 99 8 5 58 897 95 985 99 9 5 56 896 95 98 99 55 57 897 95 985 99 5 56 896 95 98 99 55 58 897 95 985 99 5 57 896 95 985 99 5 99 56 896 95 98 99 5 55 57 896 95 98 99 6 55 58 897 95 98 99 7 55 58 897 95 985 99 8 5 57 897 95 985 99 9 5 55 58 5 5 57 896 896 95 95 985 98 99 99

Table 4. Power of the $W_0$, Kolmogorov (K) and W + t tests
α = 0.01 | α = 0.05 | α = 0.10
µ W0 K W+t W0 K W+t W0 K W+t
5 89 96 58 8 55 955 9 5 8 98 76 7 7 8 6 59 7 68 6 76 9 5 5 99 9 99 8 9 75 6856 689 85 85 768 6855 78 69 986 97 885 967 967 877 5 896 95 898 989 988 95 9969 9969 95 8 978 9865 968 9988 9987 956 9998 9998 95 997 999 988 955 95 9997 9999 99 955 95 5 99 9 9 95 57 5 5 56 96 87 6 99 99 766 5 7 875 9 756 56 56 7 6 57 8 6 78 657 68 89 89 78 8 767 675 75 97 868 8766 958 958 869 986 887 999 9855 9695 96 997 997 896 9877 9757 978 998 997 97 9999 9999 8977 9986 9975 988 9998 98 8978 6 9998 989 98 8978 8 989 98 8978 5 5 68 7 5 5 5 9 9 69 7 6 9 9 85 5 56 5 89 6 65 65 5 9 9 885 6 55 66 7559 7559 696 6 6868 577 688 876 8 85 9 9 86 75 8978 899 89 976 97 9 996 996 898 9 989 99 97 9965 99 998 9987 9987 899 5 997 997 9869 9999 9989 956 9999 9999 8998 9999 999 9896 9999 958 8998 5 9998 9897 958 8998 5 9898 958 8998 98 9 68 8 75 5 987 987 97 5 9 7 8 7 7 67 75 75 5 5 7 9 77 8 5686 5686 5 5 569 6 556 7775 699 75 859 859 79 6 85 759 86 95 966 96 9779 9779 889 75 978 99 9686 995 98 95 998 998 97 9 9977 989 998 999 956 95 5 999 99 957 95 99 957 95 95 9 68 55 5 8 96 5 585 6 67 79 7 7 77 77 68 9 8 5 8 5 65 65 677 5 5 676 557 6859 867 78 87 969 969 88 6 9 856 959 98 959 9 99 99 898 75 99 976 9878 999 996 956 9995 9995 9 9 9996 998 999 9995 957 96 5 9998 99 957 96

Table 5. Estimated coefficients of the regression functions and p-values
Function  $\hat{\beta}_0$  $\hat{\beta}_1$  $\hat{\beta}_2$  p-value of F test
$f_1(x)$  55-77 9
$f_2(x)$  9799 57-8 6