Package 'nortest': Tests for Normality (version 1.0-4, 2015-07-29; manual of July 30, 2015)




Package nortest                                               July 30, 2015

Title: Tests for Normality
Version: 1.0-4
Date: 2015-07-29
Description: Five omnibus tests for testing the composite hypothesis of
    normality.
License: GPL (>= 2)
Imports: stats
Author: Juergen Gross [aut], Uwe Ligges [aut, cre]
Maintainer: Uwe Ligges <ligges@statistik.tu-dortmund.de>
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2015-07-30 00:14:57

R topics documented:

    ad.test
    cvm.test
    lillie.test
    pearson.test
    sf.test

ad.test                  Anderson-Darling test for normality

Description

    Performs the Anderson-Darling test for the composite hypothesis of
    normality, see e.g. Thode (2002, Sec. 5.1.4).

Usage

    ad.test(x)

Arguments

    x    a numeric vector of data values, the number of which must be
         greater than 7. Missing values are allowed.

Details

    The Anderson-Darling test is an EDF omnibus test for the composite
    hypothesis of normality. The test statistic is

        A = -n - (1/n) * sum[i=1..n] (2i - 1) * (ln(p(i)) + ln(1 - p(n-i+1))),

    where p(i) = Phi((x(i) - xbar)/s). Here, Phi is the cumulative
    distribution function of the standard normal distribution, and xbar
    and s are the mean and standard deviation of the data values. The
    p-value is computed from the modified statistic

        Z = A * (1.0 + 0.75/n + 2.25/n^2)

    according to Table 4.9 in Stephens (1986).

Value

    A list with class "htest" containing the following components:

    statistic    the value of the Anderson-Darling statistic.
    p.value      the p-value for the test.
    method       the character string "Anderson-Darling normality test".
    data.name    a character string giving the name(s) of the data.

Note

    The Anderson-Darling test is the EDF test recommended by Stephens
    (1986). Compared to the Cramer-von Mises test (his second choice) it
    gives more weight to the tails of the distribution.

Author(s)

    Juergen Gross

References

    Stephens, M.A. (1986): Tests based on EDF statistics. In: D'Agostino,
    R.B. and Stephens, M.A., eds.: Goodness-of-Fit Techniques. Marcel
    Dekker, New York.

    Thode Jr., H.C. (2002): Testing for Normality. Marcel Dekker, New
    York.

See Also

    shapiro.test for performing the Shapiro-Wilk test for normality.
    cvm.test, lillie.test, pearson.test, sf.test for performing further
    tests for normality. qqnorm for producing a normal quantile-quantile
    plot.
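The A statistic and the modified Z in the Details section can be sketched outside R as well. The following Python translation is an illustrative sketch only, not the package's implementation; the function name anderson_darling is my own, and it assumes the standard-library statistics.NormalDist (Python 3.8+) for Phi:

```python
from math import log
from statistics import NormalDist, mean, stdev

def anderson_darling(x):
    """Sketch of A = -n - (1/n) * sum (2i-1)(ln p(i) + ln(1 - p(n-i+1)))
    with p(i) = Phi((x(i) - xbar)/s), and the modified Z statistic."""
    n = len(x)
    xbar, s = mean(x), stdev(x)                 # sample mean and sd
    p = sorted(NormalDist().cdf((xi - xbar) / s) for xi in x)
    a = -n - sum((2 * i - 1) * (log(p[i - 1]) + log(1 - p[n - i]))
                 for i in range(1, n + 1)) / n
    z = a * (1.0 + 0.75 / n + 2.25 / n ** 2)    # Stephens (1986), Table 4.9
    return a, z
```

ad.test additionally converts Z to a p-value via the case distinctions in Stephens' table, which are omitted here.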

Examples

    ad.test(rnorm(100, mean = 5, sd = 3))
    ad.test(runif(100, min = 2, max = 4))

cvm.test                 Cramer-von Mises test for normality

Description

    Performs the Cramer-von Mises test for the composite hypothesis of
    normality, see e.g. Thode (2002, Sec. 5.1.3).

Usage

    cvm.test(x)

Arguments

    x    a numeric vector of data values, the number of which must be
         greater than 7. Missing values are allowed.

Details

    The Cramer-von Mises test is an EDF omnibus test for the composite
    hypothesis of normality. The test statistic is

        W = 1/(12n) + sum[i=1..n] (p(i) - (2i - 1)/(2n))^2,

    where p(i) = Phi((x(i) - xbar)/s). Here, Phi is the cumulative
    distribution function of the standard normal distribution, and xbar
    and s are the mean and standard deviation of the data values. The
    p-value is computed from the modified statistic

        Z = W * (1.0 + 0.5/n)

    according to Table 4.9 in Stephens (1986).

Value

    A list with class "htest" containing the following components:

    statistic    the value of the Cramer-von Mises statistic.
    p.value      the p-value for the test.
    method       the character string "Cramer-von Mises normality test".
    data.name    a character string giving the name(s) of the data.

Author(s)

    Juergen Gross
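The W statistic in the Details section translates directly into code. This Python sketch is illustrative only (the name cramer_von_mises is my own, and statistics.NormalDist is assumed for Phi); it is not the package's implementation:

```python
from statistics import NormalDist, mean, stdev

def cramer_von_mises(x):
    """Sketch of W = 1/(12n) + sum (p(i) - (2i-1)/(2n))^2
    with p(i) = Phi((x(i) - xbar)/s), and the modified Z = W(1 + 0.5/n)."""
    n = len(x)
    xbar, s = mean(x), stdev(x)                 # sample mean and sd
    p = sorted(NormalDist().cdf((xi - xbar) / s) for xi in x)
    w = 1 / (12 * n) + sum((p[i - 1] - (2 * i - 1) / (2 * n)) ** 2
                           for i in range(1, n + 1))
    z = w * (1.0 + 0.5 / n)                     # Stephens (1986), Table 4.9
    return w, z
```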

References

    Stephens, M.A. (1986): Tests based on EDF statistics. In: D'Agostino,
    R.B. and Stephens, M.A., eds.: Goodness-of-Fit Techniques. Marcel
    Dekker, New York.

    Thode Jr., H.C. (2002): Testing for Normality. Marcel Dekker, New
    York.

See Also

    shapiro.test for performing the Shapiro-Wilk test for normality.
    ad.test, lillie.test, pearson.test, sf.test for performing further
    tests for normality. qqnorm for producing a normal quantile-quantile
    plot.

Examples

    cvm.test(rnorm(100, mean = 5, sd = 3))
    cvm.test(runif(100, min = 2, max = 4))

lillie.test         Lilliefors (Kolmogorov-Smirnov) test for normality

Description

    Performs the Lilliefors (Kolmogorov-Smirnov) test for the composite
    hypothesis of normality, see e.g. Thode (2002, Sec. 5.1.1).

Usage

    lillie.test(x)

Arguments

    x    a numeric vector of data values, the number of which must be
         greater than 4. Missing values are allowed.

Details

    The Lilliefors (Kolmogorov-Smirnov) test is an EDF omnibus test for
    the composite hypothesis of normality. The test statistic is the
    maximal absolute difference between the empirical and the hypothetical
    cumulative distribution function. It may be computed as
    D = max{D+, D-} with

        D+ = max[i=1..n] {i/n - p(i)},
        D- = max[i=1..n] {p(i) - (i - 1)/n},

    where p(i) = Phi((x(i) - xbar)/s). Here, Phi is the cumulative
    distribution function of the standard normal distribution, and xbar
    and s are the mean and standard deviation of the data values. The
    p-value is computed from the Dallal-Wilkinson (1986) formula, which is
    claimed to be reliable only when the p-value is smaller than 0.1. If
    the Dallal-Wilkinson p-value turns out to be greater than 0.1, the
    p-value is computed from the distribution of the modified statistic

        Z = D * (sqrt(n) - 0.01 + 0.85/sqrt(n)),

    see Stephens (1974); the actual p-value formula was obtained by a
    simulation and approximation process.

Value

    A list with class "htest" containing the following components:

    statistic    the value of the Lilliefors (Kolmogorov-Smirnov)
                 statistic.
    p.value      the p-value for the test.
    method       the character string "Lilliefors (Kolmogorov-Smirnov)
                 normality test".
    data.name    a character string giving the name(s) of the data.

Note

    The Lilliefors (Kolmogorov-Smirnov) test is the most famous EDF
    omnibus test for normality. Compared to the Anderson-Darling test and
    the Cramer-von Mises test, it is known to perform worse. Although the
    test statistic obtained from lillie.test(x) is the same as that
    obtained from ks.test(x, "pnorm", mean(x), sd(x)), it is not correct
    to use the p-value from the latter for the composite hypothesis of
    normality (mean and variance unknown), since the distribution of the
    test statistic is different when the parameters are estimated.

    The function call lillie.test(x) essentially produces the same result
    as the S-PLUS function call ks.gof(x), with the distinction that the
    p-value is not set to 0.5 when the Dallal-Wilkinson approximation
    yields a p-value greater than 0.1. (Actually, the alternative p-value
    approximation is provided for the complete range of test statistic
    values, but is only used when the Dallal-Wilkinson approximation
    fails.)

Author(s)

    Juergen Gross

References

    Dallal, G.E. and Wilkinson, L. (1986): An analytic approximation to
    the distribution of Lilliefors' test for normality. The American
    Statistician, 40, 294-296.

    Stephens, M.A. (1974): EDF statistics for goodness of fit and some
    comparisons. Journal of the American Statistical Association, 69,
    730-737.

    Thode Jr., H.C. (2002): Testing for Normality. Marcel Dekker, New
    York.

See Also

    shapiro.test for performing the Shapiro-Wilk test for normality.
    ad.test, cvm.test, pearson.test, sf.test for performing further tests
    for normality. qqnorm for producing a normal quantile-quantile plot.

Examples

    lillie.test(rnorm(100, mean = 5, sd = 3))
    lillie.test(runif(100, min = 2, max = 4))
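The D statistic from the lillie.test Details section can be sketched as follows. This is an illustrative Python translation under the same notation, not the package's code; the name lilliefors is my own and statistics.NormalDist is assumed for Phi:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def lilliefors(x):
    """Sketch of D = max{D+, D-} with
    D+ = max{i/n - p(i)}, D- = max{p(i) - (i-1)/n},
    and the modified Z = D(sqrt(n) - 0.01 + 0.85/sqrt(n))."""
    n = len(x)
    xbar, s = mean(x), stdev(x)                 # estimated parameters
    p = sorted(NormalDist().cdf((xi - xbar) / s) for xi in x)
    d_plus = max(i / n - p[i - 1] for i in range(1, n + 1))
    d_minus = max(p[i - 1] - (i - 1) / n for i in range(1, n + 1))
    d = max(d_plus, d_minus)
    z = d * (sqrt(n) - 0.01 + 0.85 / sqrt(n))   # Stephens (1974)
    return d, z
```

Note that because xbar and s are estimated from the same data, the null distribution of D differs from that of the plain Kolmogorov-Smirnov statistic, which is exactly the point made in the Note above.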

pearson.test             Pearson chi-square test for normality

Description

    Performs the Pearson chi-square test for the composite hypothesis of
    normality, see e.g. Thode (2002, Sec. 5.2).

Usage

    pearson.test(x, n.classes = ceiling(2 * (n^(2/5))), adjust = TRUE)

Arguments

    x            a numeric vector of data values. Missing values are
                 allowed.
    n.classes    the number of classes. The default is due to Moore
                 (1986).
    adjust       logical; if TRUE (default), the p-value is computed from
                 a chi-square distribution with n.classes - 3 degrees of
                 freedom, otherwise from a chi-square distribution with
                 n.classes - 1 degrees of freedom.

Details

    The Pearson test statistic is

        P = sum[i] (C(i) - E(i))^2 / E(i),

    where C(i) is the number of observations counted and E(i) is the
    number of observations expected (under the hypothesis) in class i.
    The classes are built in such a way that they are equiprobable under
    the hypothesis of normality. The p-value is computed from a chi-square
    distribution with n.classes - 3 degrees of freedom if adjust is TRUE,
    and from a chi-square distribution with n.classes - 1 degrees of
    freedom otherwise. In both cases this is not (!) the correct p-value,
    which lies somewhere between the two; see also Moore (1986).

Value

    A list with class "htest" containing the following components:

    statistic    the value of the Pearson chi-square statistic.
    p.value      the p-value for the test.
    method       the character string "Pearson chi-square normality test".
    data.name    a character string giving the name(s) of the data.
    n.classes    the number of classes used for the test.
    df           the degrees of freedom of the chi-square distribution
                 used to compute the p-value.
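The construction of equiprobable classes and the P statistic can be sketched as follows. This Python sketch is illustrative only (the name pearson_statistic is my own, and statistics.NormalDist is assumed); pearson.test itself also computes the p-value, which is omitted here:

```python
from math import ceil
from statistics import NormalDist, mean, stdev

def pearson_statistic(x, n_classes=None):
    """Sketch of P = sum (C(i) - E(i))^2 / E(i) over classes that are
    equiprobable under the normal distribution fitted to the data."""
    n = len(x)
    if n_classes is None:
        n_classes = ceil(2 * n ** 0.4)          # default due to Moore (1986)
    nd = NormalDist(mean(x), stdev(x))          # fitted normal distribution
    # inner class boundaries: the k/n_classes quantiles of the fitted normal
    bounds = [nd.inv_cdf(k / n_classes) for k in range(1, n_classes)]
    counts = [0] * n_classes
    for xi in x:
        counts[sum(xi > b for b in bounds)] += 1  # class index of xi
    expected = n / n_classes                    # equiprobable classes
    return sum((c - expected) ** 2 / expected for c in counts)
```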

Note

    The Pearson chi-square test is usually not recommended for testing the
    composite hypothesis of normality due to its inferior power properties
    compared to other tests. It is common practice to compute the p-value
    from the chi-square distribution with n.classes - 3 degrees of
    freedom, in order to adjust for the additional estimation of two
    parameters. (For the simple hypothesis of normality (mean and variance
    known) the test statistic is asymptotically chi-square distributed
    with n.classes - 1 degrees of freedom.) This is, however, not correct
    as long as the parameters are estimated by mean(x) and var(x) (or
    sd(x)), as is usually done; see Moore (1986) for details. Since the
    true p-value is somewhere between the two, it is suggested to run
    pearson.test twice, with adjust = TRUE (default) and with
    adjust = FALSE. It is also suggested to slightly change the default
    number of classes, in order to see the effect on the p-value. Finally,
    it is suggested not to rely upon the result of the test.

    The function call pearson.test(x) essentially produces the same result
    as the S-PLUS function call
    chisq.gof((x - mean(x))/sqrt(var(x)), n.param.est = 2).

Author(s)

    Juergen Gross

References

    Moore, D.S. (1986): Tests of the chi-squared type. In: D'Agostino,
    R.B. and Stephens, M.A., eds.: Goodness-of-Fit Techniques. Marcel
    Dekker, New York.

    Thode Jr., H.C. (2002): Testing for Normality. Marcel Dekker, New
    York.

See Also

    shapiro.test for performing the Shapiro-Wilk test for normality.
    ad.test, cvm.test, lillie.test, sf.test for performing further tests
    for normality. qqnorm for producing a normal quantile-quantile plot.

Examples

    pearson.test(rnorm(100, mean = 5, sd = 3))
    pearson.test(runif(100, min = 2, max = 4))

sf.test                  Shapiro-Francia test for normality

Description

    Performs the Shapiro-Francia test for the composite hypothesis of
    normality, see e.g. Thode (2002, Sec. 2.3.2).

Usage

    sf.test(x)

Arguments

    x    a numeric vector of data values, the number of which must be
         between 5 and 5000. Missing values are allowed.

Details

    The test statistic of the Shapiro-Francia test is simply the squared
    correlation between the ordered sample values and the (approximated)
    expected ordered quantiles from the standard normal distribution. The
    p-value is computed from the formula given by Royston (1993).

Value

    A list with class "htest" containing the following components:

    statistic    the value of the Shapiro-Francia statistic.
    p.value      the p-value for the test.
    method       the character string "Shapiro-Francia normality test".
    data.name    a character string giving the name(s) of the data.

Note

    The Shapiro-Francia test is known to perform well; see also the
    comments by Royston (1993). The expected ordered quantiles from the
    standard normal distribution are approximated by
    qnorm(ppoints(x, a = 3/8)), which is slightly different from the
    approximation qnorm(ppoints(x, a = 1/2)) used for the normal
    quantile-quantile plot by qqnorm for sample sizes greater than 10.

Author(s)

    Juergen Gross

References

    Royston, P. (1993): A pocket-calculator algorithm for the
    Shapiro-Francia test for non-normality: an application to medicine.
    Statistics in Medicine, 12, 181-184.

    Thode Jr., H.C. (2002): Testing for Normality. Marcel Dekker, New
    York.

See Also

    shapiro.test for performing the Shapiro-Wilk test for normality.
    ad.test, cvm.test, lillie.test, pearson.test for performing further
    tests for normality. qqnorm for producing a normal quantile-quantile
    plot.

Examples

    sf.test(rnorm(100, mean = 5, sd = 3))
    sf.test(runif(100, min = 2, max = 4))
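The squared-correlation statistic described in the sf.test Details section can be sketched as follows. This is an illustrative Python translation, not the package's code; the name shapiro_francia is my own, and the plotting positions (i - 3/8)/(n + 1/4) correspond to ppoints(x, a = 3/8) as stated in the Note:

```python
from statistics import NormalDist, mean

def shapiro_francia(x):
    """Sketch of the Shapiro-Francia statistic: the squared correlation
    between the ordered sample and the approximated expected normal
    order statistics m(i) = qnorm((i - 3/8)/(n + 1/4))."""
    n = len(x)
    xs = sorted(x)
    m = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25))
         for i in range(1, n + 1)]
    mx, mm = mean(xs), mean(m)
    cov = sum((a - mx) * (b - mm) for a, b in zip(xs, m))
    var_x = sum((a - mx) ** 2 for a in xs)
    var_m = sum((b - mm) ** 2 for b in m)
    return cov ** 2 / (var_x * var_m)           # squared correlation
```

By construction the statistic lies in (0, 1], reaching 1 only when the ordered sample is a perfect linear function of the approximated normal quantiles.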

Index

    Topic htest:
        ad.test, cvm.test, lillie.test, pearson.test, sf.test

    ad.test
    cvm.test
    lillie.test
    pearson.test
    qqnorm
    sf.test
    shapiro.test