On The Comparison of Several Goodness of Fit Tests: With Application to Wind Speed Data

Proceedings of the 3rd WSEAS Int. Conf. on RENEWABLE ENERGY SOURCES
ISSN: 1790-5095, ISBN: 978-960-474-093-2, pp. 394-398

FAZNA ASHAHABUDDIN, KAMARULZAMAN IBRAHIM, AND ABDUL AZIZ JEMAIN
School of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, MALAYSIA
aza@ukm.my, kamarulz@ukm.my & azizj@ukm.my

Abstract: - In this paper a study is conducted to investigate the power of several goodness of fit tests, namely the Kolmogorov-Smirnov (KS), Anderson-Darling (AD) and Cramer-von-Mises (CVM) tests, together with a proposed modification of the Kolmogorov-Smirnov goodness of fit test which incorporates a variance stabilizing transformation (F). The performance of these tests is studied by simulation and illustrated using wind speed data. The study shows that the proposed test (F) performs better than the other GOF tests in most of the cases investigated.

Key-Words: - Empirical distribution function, goodness-of-fit, order statistics, wind speed, GEV

1 Introduction

Assessing the goodness of fit of a proposed probability model is a fundamental concern in the application of statistical methods. Goodness of fit tests have many applications in applied statistics, and many works have been carried out to compare the efficiency of several goodness of fit procedures. Goodness of fit (GOF) tests measure the degree of agreement between the distribution of an observed sample and a theoretical statistical distribution. The problem involves a comparison of the empirical distribution function (EDF) for a set of ordered observations of size $n$, say $F_n(y_{(i:n)})$, with a particular theoretical distribution with known parameters, denoted $F_0(y_{(i:n)})$. It can be formulated as a test of the hypothesis

$$H_0: F(y) = F_0(y_{(i:n)})$$

where $F_0$ is the hypothesized continuous cumulative distribution function (cdf) with known parameters, against

$$H_1: F(y) \neq F_0(y_{(i:n)}).$$

Numerous GOF tests based on the empirical distribution function have been developed over the years; see, for example, Green and Hegazy [1], Stephens and D'Agostino [2], Gunes et al. [3], Swanepoel et al. [4], Zhang [5] and Zhang & Wu [6]. Green and Hegazy [1] proposed modifications of the original EDF tests using various empirical distributions $F_n(y_{(i:n)})$ and found that their modifications perform better than the original GOF tests. Gunes et al. [3] studied the performance of their proposed modifications for the Inverse Gaussian distribution and, quite recently, Zhang [5] proposed modifications based on likelihood ratio tests. Zhang [5] claimed that his tests are the best overall under several popular alternative distributions.

In this paper, the performance of the Kolmogorov-Smirnov (KS), Anderson-Darling (AD) and Cramer-von-Mises (CVM) tests and of Zhang's [5] modified version of the KS test ($Z_K$) is investigated. In addition, a modified goodness of fit test which incorporates a variance stabilizing transformation (F) is proposed. The performance of these tests is illustrated using wind speed data.

2 Goodness of Fit Test

In order to test whether or not a random sample of size $n$, denoted $y_1, y_2, \ldots, y_n$, comes from a particular distribution, for example the standard normal $N(0,1)$, the null hypothesis $H_0: F(y) = N(0,1)$ is tested against the alternative hypothesis $H_1: F(y) \neq N(0,1)$. To study the degree of discrepancy between the EDF and the theoretical distribution, various GOF statistics have been proposed in the literature. The GOF tests of particular interest in this study are the Kolmogorov-Smirnov (KS), Anderson-Darling (AD) and Cramer-von-Mises (CVM) tests as given by [2], Zhang's test ($Z_K$) [5], and a proposed modification of KS which incorporates the variance stabilizing transformation (F).

The popular KS test is defined as

$$\mathrm{KS} = \max(D^{+}, D^{-}) \qquad (1)$$

where

$$D^{+} = \max_{i}\left\{\frac{i}{n} - F_0(y_{(i:n)})\right\}, \qquad D^{-} = \max_{i}\left\{F_0(y_{(i:n)}) - \frac{i-1}{n}\right\}$$

and $F_0(y_{(i:n)}) = P(Y \le y_{(i:n)})$, $i = 1, 2, \ldots, n$, is the cumulative probability of the $i$-th order statistic.

The Anderson-Darling and Cramer-von-Mises tests are defined respectively as

$$\mathrm{AD} = -n - \frac{2}{n}\sum_{i=1}^{n}\left[(i - 0.5)\log F_0(y_{(i:n)}) + (n - i + 0.5)\log\{1 - F_0(y_{(i:n)})\}\right] \qquad (2)$$

and

$$\mathrm{CVM} = \sum_{i=1}^{n}\left[F_0(y_{(i:n)}) - \frac{i - 0.5}{n}\right]^2 + \frac{1}{12n}. \qquad (3)$$

The test proposed by [5] is given by

$$Z_K = \max_{1 \le i \le n}\left[\left(i - \tfrac{1}{2}\right)\log\frac{i - \tfrac{1}{2}}{n F_0(y_{(i:n)})} + \left(n - i + \tfrac{1}{2}\right)\log\frac{n - i + \tfrac{1}{2}}{n\{1 - F_0(y_{(i:n)})\}}\right]. \qquad (4)$$

An alternative GOF test is offered beside the statistics in equations (1) to (4). The proposed statistic, called F, applies a variance stabilizing (arcsine) transformation to the modified KS statistic of Green and Hegazy [1] and is defined as

$$F = \max(D_a, D_b) \qquad (5)$$

where

$$D_a = \max_{i}\left\{\sin^{-1}\sqrt{\frac{i}{n+1}} - \sin^{-1}\sqrt{F_0(y_{(i:n)})}\right\}, \qquad D_b = \max_{i}\left\{\sin^{-1}\sqrt{F_0(y_{(i:n)})} - \sin^{-1}\sqrt{\frac{i-1}{n+1}}\right\}.$$

The performance of the above tests is investigated using simulation. To test the hypothesis the following simulation steps are taken:

(i) Generate a random sample of size $n$ from the selected distribution under the null hypothesis (the parameters are assumed known).
(ii) Sort the sample in ascending order to obtain the order statistics $y_{(1:n)}, y_{(2:n)}, \ldots, y_{(n:n)}$.
(iii) Assuming that the null hypothesis is true, calculate $F_0(y_{(i:n)})$.
(iv) To study the degree of discrepancy between the EDF $F_n(y_{(i:n)})$ and $F_0(y_{(i:n)})$, calculate the GOF statistics of equations (1) to (5).
(v) Repeat steps (i) to (iv) to generate independent test statistics for each GOF test.
(vi) Obtain the 90th, 95th and 99th percentiles of the ordered test statistics under the null hypothesis. These are the critical values against which the test statistics obtained under the assumed alternative hypotheses are compared.
(vii) Evaluate the performance of each test by its power. The power is the proportion of times the test statistic, computed from samples drawn under the alternative hypothesis, falls in the rejection region. To calculate it, the simulation is repeated under the alternative and the proportion of times the null hypothesis is rejected is recorded.

3 Example using Wind speed data

To illustrate the above procedures we use wind speed data taken from [8]. The data are shown in Table 1 below.

Table 1: Annual maximum wind speed data in miles per hour, Brownsville, Texas, 1947-1977
32, 33, 34, 34, 35, 36, 37, 37, 38, 38, 39, 39, 40, 40, 41, 41, 42, 42, 43, 43, 43, 44, 44, 46, 46, 48, 48, 49, 51, 53, 53, 53, 56, 63, 66

The quantile-quantile plot in Fig. 1 is close to a straight line, which indicates that the Brownsville wind speed data are fitted closely by the GEV distribution.
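The statistics in equations (1) to (5) of Section 2 can be computed directly from the values $u_i = F_0(y_{(i:n)})$. The following is a minimal sketch in plain Python, not the authors' implementation. The function name `gof_statistics` and the dictionary keys are hypothetical, and the square root inside the arcsine of equation (5) is an assumption (it is the usual variance stabilizing form for probabilities; the source formula is not fully legible).

```python
import math


def gof_statistics(u):
    """Return KS, AD, CVM, Z_K and F for u = F0(y) evaluated on a sample.

    Assumes every value lies strictly between 0 and 1; values of exactly
    0 or 1 would make the logarithms in AD and Z_K undefined.
    """
    u = sorted(u)  # F0 is monotone, so sorting u orders the sample
    n = len(u)
    pairs = list(zip(range(1, n + 1), u))

    # Equation (1): Kolmogorov-Smirnov
    d_plus = max(i / n - p for i, p in pairs)
    d_minus = max(p - (i - 1) / n for i, p in pairs)
    ks = max(d_plus, d_minus)

    # Equation (2): Anderson-Darling
    ad = -n - (2.0 / n) * sum(
        (i - 0.5) * math.log(p) + (n - i + 0.5) * math.log(1.0 - p)
        for i, p in pairs
    )

    # Equation (3): Cramer-von-Mises
    cvm = sum((p - (i - 0.5) / n) ** 2 for i, p in pairs) + 1.0 / (12 * n)

    # Equation (4): Zhang's Z_K
    zk = max(
        (i - 0.5) * math.log((i - 0.5) / (n * p))
        + (n - i + 0.5) * math.log((n - i + 0.5) / (n * (1.0 - p)))
        for i, p in pairs
    )

    # Equation (5): proposed F (arcsine square-root form is an assumption)
    def stab(p):
        return math.asin(math.sqrt(p))

    d_a = max(stab(i / (n + 1)) - stab(p) for i, p in pairs)
    d_b = max(stab(p) - stab((i - 1) / (n + 1)) for i, p in pairs)
    f_stat = max(d_a, d_b)

    return {"KS": ks, "AD": ad, "CVM": cvm, "ZK": zk, "F": f_stat}
```

For the standard normal example of Section 2, the input could be built with the standard library, for instance `gof_statistics([statistics.NormalDist().cdf(y) for y in sample])` for some list `sample`.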

Fig. 1 Quantile-Quantile Plot for Brownsville Wind Speed Data (axes: q(x) and wind speed in miles/hr)

The Generalized Extreme Value (GEV) distribution is given by

$$f(y) = \alpha^{-1} e^{-(1-k)x - e^{-x}} \qquad (6)$$

where

$$x = \begin{cases} -k^{-1}\log\{1 - k(y - \xi)/\alpha\}, & k \neq 0 \\ (y - \xi)/\alpha, & k = 0 \end{cases}$$

and the range of $y$ is $-\infty < y \le \xi + \alpha/k$ if $k > 0$; $\xi + \alpha/k \le y < \infty$ if $k < 0$; $-\infty < y < \infty$ if $k = 0$.

The location ($\xi$), scale ($\alpha$) and shape ($k$) parameters for the wind speed data are estimated using the L-moment method introduced by Hosking [7]. The estimated parameters are $\xi = 39.78$, $\alpha = 6.26$ and $k = -0.037$. The data have mean 43.63, second L-moment $\lambda_2 = 4.49$ (the quantity the source labels variance), L-skewness $\tau_3 = 0.1937$ and L-kurtosis $\tau_4 = 0.1509$. The fitted distribution is shown in Fig. 2(a).

A simulation study was carried out to test the hypothesis $H_0: F(y) = GEV(\xi_0, \alpha_0, k_0)$ against $H_1: F(y) = GEV(\xi, \alpha, k)$, where $\xi$, $\alpha$ and $k$ are the location, scale and shape parameters respectively. The following alternative hypotheses are considered against the null hypothesis to allow for differences in location and scale in the alternative distributions:

(i) $GEV(\xi = 39.78, \alpha = 5.5, k = -0.037)$
(ii) $GEV(\xi = 39.78, \alpha = 4, k = -0.037)$
(iii) $GEV(\xi = 42, \alpha = 6.26, k = -0.037)$
(iv) $GEV(\xi = 45, \alpha = 6.26, k = -0.037)$
(v) $GEV(\xi = 42, \alpha = 5.5, k = -0.037)$
(vi) $GEV(\xi = 45, \alpha = 6, k = -0.037)$

Figs. 2(a) to 2(c) show the shapes of the distributions under the alternative hypotheses in relation to the hypothesized distribution as the location and scale change. Fig. 2(a) shows the change in shape when the scale parameter varies. Fig. 2(b) shows the distributions under the alternative hypotheses shifted to the right when the location varies. Fig. 2(c) shows the shapes when both the location and scale parameters vary.

Fig. 2(a) Shape of the distribution under alternative hypotheses (i) and (ii) (x-axis: wind speed (miles/hour), Brownsville; curves: Ho: GEV(39.78, 6.26, -0.037), (i) Ha: GEV(39.78, 5.5, -0.037), (ii) Ha: GEV(39.78, 4, -0.037))

Fig. 2(b) Shape of the distribution under alternative hypotheses (iii) and (iv) (x-axis: wind speed (miles/hour), Brownsville; curves: Ho: GEV(39.78, 6.26, -0.037), (iii) Ha: GEV(42, 6.26, -0.037), (iv) Ha: GEV(45, 6.26, -0.037))

Fig. 2(c) Shape of the distribution under alternative hypotheses (v) and (vi) (x-axis: wind speed (miles/hour), Brownsville; curves: Ho: GEV(39.78, 6.26, -0.037), (v) Ha: GEV(42, 5.5, -0.037), (vi) Ha: GEV(45, 6, -0.037))
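The L-moment estimates quoted in Section 3 can be checked against the wind speed data of Table 1. The sketch below is one possible reading of the L-moment method of [7], not the authors' code: it assumes unbiased probability weighted moments for the sample L-moments and Hosking's polynomial approximation for the shape parameter $k$. The function names are hypothetical.

```python
import math

# Table 1: annual maximum wind speeds (miles per hour), Brownsville, Texas
WIND = [32, 33, 34, 34, 35, 36, 37, 37, 38, 38, 39, 39, 40, 40, 41, 41, 42, 42,
        43, 43, 43, 44, 44, 46, 46, 48, 48, 49, 51, 53, 53, 53, 56, 63, 66]


def sample_l_moments(data):
    """Return (l1, l2, tau3, tau4) from unbiased probability weighted moments."""
    x = sorted(data)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum((j - 1) * v for j, v in enumerate(x, 1)) / (n * (n - 1))
    b2 = sum((j - 1) * (j - 2) * v for j, v in enumerate(x, 1)) / (
        n * (n - 1) * (n - 2))
    b3 = sum((j - 1) * (j - 2) * (j - 3) * v for j, v in enumerate(x, 1)) / (
        n * (n - 1) * (n - 2) * (n - 3))
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2


def fit_gev_lmom(l1, l2, tau3):
    """GEV (xi, alpha, k) from L-moments, using the approximation for k."""
    c = 2.0 / (3.0 + tau3) - math.log(2) / math.log(3)
    k = 7.8590 * c + 2.9554 * c * c
    g = math.gamma(1.0 + k)
    alpha = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    xi = l1 + alpha * (g - 1.0) / k
    return xi, alpha, k


l1, l2, tau3, tau4 = sample_l_moments(WIND)
xi, alpha, k = fit_gev_lmom(l1, l2, tau3)
print(f"mean={l1:.2f} l2={l2:.2f} tau3={tau3:.4f} tau4={tau4:.4f}")
print(f"xi={xi:.2f} alpha={alpha:.2f} k={k:.3f}")
```

Under these assumptions the script should give values close to the estimates reported in the text (location near 39.78, scale near 6.26, shape near -0.037).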

4 Simulation Results

The simulation results are shown in Fig. 3(i) to Fig. 3(vi). Fig. 3(i) and Fig. 3(ii) show the results when the scale parameter is changed but the location and shape are the same as in the hypothesized distribution. In both cases F is found to be the most powerful test and performs better than all the other tests. In the cases of Fig. 3(iii) and Fig. 3(iv), where the alternative differs from the hypothesized distribution only in location, F performs better than the other tests when the sample size is small (see Fig. 3(iii)); for larger samples, two of the other tests outperform F. In Fig. 3(iv), F performs better initially for smaller sample sizes, but as the sample size increases two of the other tests perform as well as F, followed by the remaining two. However, when both the location and the scale differ, the proposed test F still outperforms all the other tests, as shown in Fig. 3(v) and Fig. 3(vi).

5 Conclusion

In this paper a new GOF test which incorporates a variance stabilizing transformation is introduced. The proposed modified GOF test, F, is found to perform better than the other GOF tests in most of the cases investigated. Further analysis needs to be carried out to study the properties of the proposed test under various distributions.

References:
[1] J.R. Green and Y.A.S. Hegazy, 1976. Powerful Modified Goodness of Fit Test. Journal of the American Statistical Association, Vol. 71, No. 353, pp. 204-209.
[2] R.B. D'Agostino and M.A. Stephens, 1986. Goodness of Fit Techniques. Marcel Dekker, New York.
[3] H. Gunes, D.C. Dietz, P.F. Auclair and A.H. Moore, 1997. Modified Goodness of Fit Tests for the Inverse Gaussian Distribution. Computational Statistics & Data Analysis, Vol. 24, pp. 63-67.
[4] J.W.H. Swanepoel and C.V. Graan, 2002. Goodness of Fit Tests Based on Estimated Expectations of Probability Integral Transformed Order Statistics. Ann. Inst. Statist. Math., Vol. 54, No. 3, pp. 531-542.
[5] J. Zhang, 2002. Powerful Goodness of Fit Tests Based on the Likelihood Ratio. Journal of the Royal Statistical Society, Series B, No. 64, Part 2, pp. 281-294.
[6] J. Zhang and Y. Wu, 2002. Beta Approximation to the Distribution of Kolmogorov-Smirnov Statistics. Ann. Inst. Statist. Math., Vol. 54, No. 3, pp. 577-584.
[7] J.R.M. Hosking, 1990. Analysis and Estimation of Distributions using Linear Combinations of Order Statistics. Journal of the Royal Statistical Society, Series B, Vol. 52, No. 1, pp. 105-124.
[8] E. Simiu, M.J. Changery and J.J. Filliben, 1979. Extreme Wind Speeds at 129 Stations in the Contiguous United States. Building Science Series 118, National Bureau of Standards, Washington, DC.

Fig. 3 Power comparison of GOF tests against various alternative hypotheses. Panels:
(i) Ho: GEV(39.78, 6.26, -0.037) vs Ha: GEV(39.78, 5.5, -0.037)
(ii) Ho: GEV(39.78, 6.26, -0.037) vs Ha: GEV(39.78, 4, -0.037)
(iii) Ho: GEV(39.78, 6.26, -0.037) vs Ha: GEV(42, 6.26, -0.037)
(iv) Ho: GEV(39.78, 6.26, -0.037) vs Ha: GEV(45, 6.26, -0.037)
(v) Ho: GEV(39.78, 6.26, -0.037) vs Ha: GEV(42, 5.5, -0.037)
(vi) Ho: GEV(39.78, 6.26, -0.037) vs Ha: GEV(45, 6, -0.037)
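The power curves in Fig. 3 come from the simulation procedure of Section 2: critical values are taken from the null distribution of each statistic, and power is the rejection rate under an alternative. The sketch below illustrates that procedure for the KS statistic only, with the GEV cdf obtained by integrating equation (6). It is a rough illustration under stated assumptions rather than the authors' code: the sample size `n = 50`, the replication count, the 5% level and the seeds are all hypothetical choices, and the parameter values are those read from the text. It also assumes $k \neq 0$.

```python
import math
import random

# Null-hypothesis GEV parameters as read from the text
XI0, ALPHA0, K0 = 39.78, 6.26, -0.037


def gev_quantile(p, xi, alpha, k):
    """Inverse cdf of the GEV for 0 < p < 1 and k != 0."""
    return xi + alpha * (1.0 - (-math.log(p)) ** k) / k


def gev_cdf(y, xi, alpha, k):
    """GEV cdf F(y) = exp(-exp(-x)) with x as in equation (6), k != 0."""
    t = 1.0 - k * (y - xi) / alpha
    if t <= 0.0:  # y lies outside the support
        return 0.0 if k < 0 else 1.0
    return math.exp(-t ** (1.0 / k))


def ks_statistic(u):
    """Equation (1) evaluated on u = F0(y)."""
    u = sorted(u)
    n = len(u)
    return max(max(i / n - p, p - (i - 1) / n) for i, p in enumerate(u, 1))


def simulate(alt, n=50, reps=2000, level=0.05, seed=1):
    """Return (critical value, estimated power) of KS against `alt`."""
    rng = random.Random(seed)

    def draw_p():
        p = rng.random()
        while p == 0.0:  # avoid log(0) in the quantile function
            p = rng.random()
        return p

    def one_statistic(params):
        sample = [gev_quantile(draw_p(), *params) for _ in range(n)]
        return ks_statistic([gev_cdf(y, XI0, ALPHA0, K0) for y in sample])

    # Steps (i) to (vi): null distribution and its upper percentile
    null_stats = sorted(one_statistic((XI0, ALPHA0, K0)) for _ in range(reps))
    crit = null_stats[math.ceil((1 - level) * reps) - 1]

    # Step (vii): proportion of rejections under the alternative
    power = sum(one_statistic(alt) > crit for _ in range(reps)) / reps
    return crit, power


# Alternative (ii): same location and shape, scale reduced to 4
crit, power = simulate((39.78, 4.0, -0.037))
print(f"critical value = {crit:.3f}, estimated power = {power:.3f}")
```

Repeating the call over a grid of sample sizes, and swapping in the other statistics, would give curves of the kind plotted in Fig. 3.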