On Mardia's Tests of Multinormality


Kankainen, A., Taskinen, S., Oja, H.

Abstract. Classical multivariate analysis is based on the assumption that the data come from a multivariate normal distribution. Tests of multinormality have therefore received a great deal of attention. Several tests for assessing multinormality, among them Mardia's popular multivariate skewness and kurtosis statistics, are based on standardized third and fourth moments. In Mardia's construction of the affine invariant test statistics, the data vectors are first standardized using the sample mean vector and the sample covariance matrix. In this paper we investigate whether, in the test construction, it is advantageous to replace the regular sample mean vector and sample covariance matrix by their affine equivariant robust competitors. Limiting distributions of the standardized third and fourth moments and the resulting test statistics are derived under the null hypothesis and are shown to be strongly dependent on the choice of the location vector and scatter matrix estimate. Finally, the effects of the modification on the limiting and finite-sample efficiencies are illustrated by simple examples in the case of testing for bivariate normality. In the cases studied, the modifications seem to increase the power of the tests.

Received by the editors May 2. Mathematics Subject Classification. Primary 62H15; Secondary 62F02. Key words and phrases. Kurtosis, Multinormality, Robustness, Skewness.

1. Introduction

Classical methods of multivariate analysis are for the most part based on the assumption that the data come from a multivariate normal population. In the one-sample multinormal case, the regular sample mean vector and sample covariance matrix are complete and sufficient, but on the other hand highly sensitive to outlying observations. It is also well known that the tests and estimates based on the sample mean vector and sample covariance matrix have poor efficiency properties in the case of heavy-tailed noise distributions. Testing for departures from multinormality helps to make practical choices between competing methods of analysis.

In the univariate case, standardized third and fourth moments $b_1$ and $b_2$ are often used to indicate the skewness and kurtosis. For a random sample $x_1, \dots, x_n$ from a $p$-variate distribution with sample mean vector $\bar{x}$ and sample covariance matrix $S$, Mardia (1970, 1974, 1980) defined the $p$-variate skewness and kurtosis statistics as

$$b_{1,p} = \mathrm{ave}_{i,j}\{[(x_i - \bar{x})^T S^{-1} (x_j - \bar{x})]^3\} \quad \text{and} \quad b_{2,p} = \mathrm{ave}_{i}\{[(x_i - \bar{x})^T S^{-1} (x_i - \bar{x})]^2\},$$

respectively. The statistics $b_{1,p}$ and $b_{2,p}$ are functions of standardized third and fourth moments, respectively. From the definition one easily sees that $b_{1,p}$ and $b_{2,p}$ are invariant under affine transformations. In the univariate case these reduce to the usual univariate skewness and kurtosis statistics $b_1$ and $b_2$. Mardia advocated using the skewness and kurtosis statistics to test for multinormality as they are distribution-free under multinormality. See also Bera and John (1983) and Koziol (1993) for the use of the standardized third and fourth moments in the test construction.

Intuitively, as in the univariate case, the skewness and kurtosis statistics $b_{1,p}$ and $b_{2,p}$ compare the variation measured by third and fourth moments to that measured by second moments. The second, third and fourth moments are all highly nonrobust statistics, however, and more efficient procedures may be obtained through comparisons of robust and nonrobust measures of variation. It is therefore an appealing idea to study whether it is useful or reasonable to replace the regular sample mean vector $\bar{x}$ and sample covariance matrix $S$ by some affine equivariant robust competitors, a location vector estimate $\hat{\mu}$ and a scatter matrix estimate $\hat{\Sigma}$.

The paper is organised as follows. In Section 2 the limiting distributions of Mardia's test statistics under the null model of multinormality are discussed. The test statistics are based on the standardized third and fourth moments; the regular mean vector and regular covariance matrix are used in the standardization. Section 3 lists the tools and assumptions for alternative $\sqrt{n}$-consistent affine equivariant location and scatter estimates. The location and scatter estimates are then used in the regular manner to standardize the data vectors. The limiting distributions of the standardized third and fourth moments in this general case are found in Section 4.
As an illustration of the theory, the effect of the way of standardization on the limiting and small-sample efficiencies in the bivariate case is studied in Sections 5 and 6. The auxiliary lemmas and proofs have been postponed to the Appendix.

2. Limiting null distributions of Mardia's statistics

In this section we recall the well-known results concerning the limiting null distributions of Mardia's skewness and kurtosis statistics. As, due to affine invariance, the distributions of the test statistics do not depend on the unknown $\mu$ and $\Sigma$, it is not a restriction to assume in the following derivations that $x_1, \dots, x_n$ is a random sample from the $N_p(0, I_p)$ distribution. Write

$$z_i = S^{-1/2}(x_i - \bar{x}), \quad i = 1, \dots, n,$$

for the data vectors standardized in the classical way. Mardia's skewness statistic can be decomposed as

$$b_{1,p} = 6 \sum_{j<k<l} [\mathrm{ave}_i\{z_{ij} z_{ik} z_{il}\}]^2 + 3 \sum_{j \neq k} [\mathrm{ave}_i\{z_{ij}^2 z_{ik}\}]^2 + \sum_{j} [\mathrm{ave}_i\{z_{ij}^3\}]^2$$

and Mardia's kurtosis statistic is similarly

$$b_{2,p} = \sum_{j \neq k} \mathrm{ave}_i\{z_{ij}^2 z_{ik}^2\} + \sum_{j} \mathrm{ave}_i\{z_{ij}^4\}.$$

See e.g. Koziol (1993). The statistics $b_{1,p}$ and $b_{2,p}$ are affine invariant and, under multinormality, all the standardized third and fourth moments are asymptotically normal and asymptotically independent. Consequently the limiting distributions of

$$\frac{n\, b_{1,p}}{6} \quad \text{and} \quad \frac{\sqrt{n}\,(b_{2,p} - p(p+2))}{\sqrt{8p(p+2)}}$$

are a chi-square distribution with $p(p+1)(p+2)/6$ degrees of freedom and a $N(0, 1)$ distribution, respectively.

3. Location and scatter estimates

In Mardia's test construction, the data vectors are standardized using the regular sample mean vector $\bar{x}$ and sample covariance matrix $S$. In this paper we consider what happens if these are replaced by other $\sqrt{n}$-consistent affine equivariant location and scatter estimates of $\mu$ and $\Sigma$, say, $\hat{\mu}$ and $\hat{\Sigma}$. Next we list assumptions and useful notation for the location vector estimate $\hat{\mu}$ and the scatter matrix estimate $\hat{\Sigma}$. First we assume that, for the $N_p(0, I_p)$ distribution, the influence functions of these location and scatter functionals at $z$ are

$$\gamma(r)\, u \quad \text{and} \quad \alpha(r)\, u u^T - \beta(r)\, I_p,$$

respectively, where $r = \|z\|$ and $u = \|z\|^{-1} z$. We will later see that the limiting distributions of the modified test statistics depend on the location and scatter estimates used through these functions. For the influence functions of location and scatter functionals, see also Hampel et al. (1986), Croux and Haesbroeck (2000) and Ollila et al. (2003a,b). We further assume that, again under the $N_p(0, I_p)$ distribution,

$$t = \sqrt{n}\, \mathrm{ave}\{\gamma(r_i) u_i\} = \sqrt{n}\, \hat{\mu} + o_P(1)$$

and

$$C = \sqrt{n}\, \mathrm{ave}\{\alpha(r_i) u_i u_i^T - \beta(r_i) I_p\} = \sqrt{n}\, (\hat{\Sigma} - I_p) + o_P(1).$$

Finally, assume that, for the $N_p(0, I_p)$ distribution, the expected values $E(\gamma^2(r))$, $E(\alpha^2(r))$ and $E(\beta^2(r))$ are finite. These assumptions imply that the limiting distributions of $\sqrt{n}\,\hat{\mu}$ and $\sqrt{n}\, \mathrm{vec}(\hat{\Sigma} - I_p)$ are $p$-variate and $p^2$-variate normal distributions with zero mean vectors and covariance matrices

$$\frac{E[\gamma^2(r)]}{p}\, I_p$$

and

$$\frac{E[\alpha^2(r)]}{p(p+2)} \left[ I_{p^2} + I_{p,p} + \mathrm{vec}(I_p)\mathrm{vec}(I_p)^T \right] + \left( E[\beta^2(r)] - \frac{2}{p} E[\alpha(r)\beta(r)] \right) \mathrm{vec}(I_p)\mathrm{vec}(I_p)^T,$$

respectively. (Here $I_{p,p}$ is the so-called commutation matrix. See e.g. Ollila et al. (2003b).)

4. Limiting distribution of the standardized third and fourth moments

Consider now any estimators $\hat{\mu}$ and $\hat{\Sigma}$ and write

$$z_i = \hat{\Sigma}^{-1/2}(x_i - \hat{\mu}), \quad i = 1, \dots, n,$$

for the standardized observations. Denote the standardized third and fourth moments by

$$U_{jkl} = \sqrt{n}\, \mathrm{ave}_i\{z_{ij} z_{ik} z_{il}\}, \quad V_{jk} = \sqrt{n}\, \mathrm{ave}_i\{z_{ij}^2 z_{ik}^2 - 1\} \quad \text{and} \quad V_{jj} = \sqrt{n}\, \mathrm{ave}_i\{z_{ij}^4 - 3\}.$$

The limiting distributions of the standardized third and fourth moments are given by the following theorems. The results easily follow from Lemmas 7.1 and 7.2, stated and proven in the Appendix.

Theorem 4.1. Under multinormality, the limiting distribution of the vector of all possible third moments $U_{jkl}$ ($j < k < l$), $U_{jjk}$ ($j \neq k$) and $U_{jjj}$ is a multivariate normal distribution with zero mean vector and limiting variances and covariances

$$\mathrm{Var}(U_{jkl}) = 1, \quad \mathrm{Var}(U_{jjk}) = 2 + \kappa, \quad \mathrm{Var}(U_{jjj}) = 6 + 9\kappa,$$
$$\mathrm{Cov}(U_{jjk}, U_{llk}) = \kappa, \quad \mathrm{Cov}(U_{jjk}, U_{kkk}) = 3\kappa,$$

where

$$\kappa = 1 + \frac{E[\gamma^2(r)]}{p} - 2\, \frac{E[r^3 \gamma(r)]}{p(p+2)}.$$

All the other covariances are zero.
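The constant $\kappa$ of Theorem 4.1 depends on the location estimate only through two expectations under the chi distribution of $r = \|z\|$. The following is a minimal numerical sketch, not part of the paper: it assumes the formula $\kappa = 1 + E[\gamma^2(r)]/p - 2E[r^3\gamma(r)]/(p(p+2))$ as read from the theorem, and the helper names `chi_expectation` and `kappa` are hypothetical. It checks that the sample mean, $\gamma(r) = r$, gives $\kappa = 0$, and evaluates a constant $\gamma$ of the kind that arises for a median-type estimate.

```python
import math


def chi_expectation(f, p, upper=12.0, steps=20000):
    """Approximate E[f(r)] for r = ||z||, z ~ N_p(0, I_p), by Simpson's rule.

    r follows a chi distribution with p degrees of freedom, with density
    r^(p-1) * exp(-r^2 / 2) / (2^(p/2 - 1) * Gamma(p/2)).
    `steps` must be even; the tail beyond `upper` is negligible.
    """
    norm = 2.0 ** (p / 2.0 - 1.0) * math.gamma(p / 2.0)

    def integrand(r):
        return f(r) * r ** (p - 1) * math.exp(-0.5 * r * r) / norm

    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


def kappa(gamma, p):
    """kappa = 1 + E[gamma^2]/p - 2 E[r^3 gamma]/(p(p+2)) (assumed form)."""
    e_g2 = chi_expectation(lambda r: gamma(r) ** 2, p)
    e_r3g = chi_expectation(lambda r: r ** 3 * gamma(r), p)
    return 1.0 + e_g2 / p - 2.0 * e_r3g / (p * (p + 2))


# Sample mean: gamma(r) = r, so kappa should vanish for every p.
kappa_mean = kappa(lambda r: r, 2)

# A constant influence function gamma(r) = 2*sqrt(2/pi) in the bivariate case.
c = 2.0 * math.sqrt(2.0 / math.pi)
kappa_const = kappa(lambda r: c, 2)

print(round(kappa_mean, 8), round(kappa_const, 8))
```

Any other radial function can be passed as `gamma`, as long as its second moment under the chi distribution is finite.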
Theorem 4.2. Under multinormality, the limiting distribution of the vector of fourth moments $V_{jk}$ ($j \neq k$) and $V_{jj}$ is a multivariate normal distribution with zero mean vector and limiting variances and covariances

$$\mathrm{Var}(V_{jk}) = 4 + \tau_1 + 2\tau_2, \quad \mathrm{Var}(V_{jj}) = 24 + 9\tau_1 + 36\tau_2,$$
$$\mathrm{Cov}(V_{jk}, V_{jl}) = \tau_1 + \tau_2, \quad \mathrm{Cov}(V_{jk}, V_{ml}) = \tau_1, \quad \mathrm{Cov}(V_{jj}, V_{kk}) = 9\tau_1,$$
$$\mathrm{Cov}(V_{jk}, V_{jj}) = 3\tau_1 + 6\tau_2, \quad \mathrm{Cov}(V_{jk}, V_{ll}) = 3\tau_1,$$

where

$$\tau_1 = 4 \left[ \frac{E[\alpha^2(r)]}{p(p+2)} - \frac{2E[\alpha(r)\beta(r)]}{p} + E[\beta^2(r)] - \frac{E[\alpha(r) r^4]}{p(p+2)(p+4)} + \frac{E[\beta(r) r^4]}{p(p+2)} \right]$$

and

$$\tau_2 = 2 + 2\, \frac{E[\alpha^2(r)]}{p(p+2)} - 4\, \frac{E[\alpha(r) r^4]}{p(p+2)(p+4)}.$$

We close this section with some discussion on the implications of the above results. First note that, if the regular mean vector and sample covariance matrix are used in the standardization, then simply

$$\gamma(r) = r, \quad \alpha(r) = r^2 \quad \text{and} \quad \beta(r) = 1,$$

which gives

$$\kappa = \tau_1 = \tau_2 = 0,$$

and all the third moments and all the fourth moments are asymptotically mutually independent, respectively. For other location and scatter statistics this is not necessarily true, and the limiting variances may also vary. This means that the limiting distributions and efficiencies of $b_{1,p}$ and $b_{2,p}$ may change drastically. The limiting distribution of $n b_{1,p}/6$ is a weighted sum of chi-square variables with one degree of freedom, which makes the calculation of the approximate p-value less practical. The limiting distribution of $\sqrt{n}(b_{2,p} - p(p+2))$ is still normal with mean value zero, but its limiting variance depends on the chosen scatter matrix. In the last two sections we illustrate the impact of the choice of the location vector and scatter matrix estimate on the limiting distribution and efficiency in the bivariate case. The extension to the general $p$-variate case is straightforward.

5. Tests for bivariate normality

Assume now that $x_1, \dots, x_n$ is a random sample from an unknown bivariate distribution and we wish to test the null hypothesis $H_0$ of bivariate normality. For limiting power considerations we use the contaminated bivariate normal distribution, the so-called Tukey distribution: we say that the distribution of $x$ is a bivariate Tukey distribution $T(\epsilon, \mu, \sigma)$ if the pdf of $x$ is

$$f(x) = (1 - \epsilon)\, \phi(x) + \epsilon\, \sigma^{-2} \phi((x - \mu)/\sigma),$$

where $\phi(x)$ is the pdf of the $N_2(0, I_2)$ distribution and $\epsilon \in [0, 1]$. Alternative sequences

$$H_1^n : T(\epsilon, \mu, 1) \quad \text{and} \quad H_2^n : T(\epsilon, 0, \sigma)$$

with $\epsilon = \delta/\sqrt{n}$, $r = \|\mu\| > 0$ and $\sigma > 1$ are then used for the comparisons of the skewness and kurtosis test statistics, respectively. Write now

$$T_1 = (U_{112}, U_{221}, U_{111}, U_{222})^T \quad \text{and} \quad T_2 = (V_{11}, V_{22}, V_{12})^T$$

for the vectors of the standardized third and fourth moments. Using the results in Section 4, one directly gets the following results.

Corollary 5.1. Under $H_0$, the limiting distribution of $\sqrt{n}\, T_1$ is a 4-variate normal distribution with zero mean vector and covariance matrix

$$\Omega_1 = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 6 & 0 \\ 0 & 0 & 0 & 6 \end{pmatrix} + \kappa \begin{pmatrix} 1 & 0 & 0 & 3 \\ 0 & 1 & 3 & 0 \\ 0 & 3 & 9 & 0 \\ 3 & 0 & 0 & 9 \end{pmatrix}.$$

Corollary 5.2. Under $H_0$, the limiting distribution of $\sqrt{n}\, T_2$ is a 3-variate normal distribution with zero mean vector and covariance matrix

$$\Omega_2 = \begin{pmatrix} 24 & 0 & 0 \\ 0 & 24 & 0 \\ 0 & 0 & 4 \end{pmatrix} + \tau_1 \begin{pmatrix} 9 & 9 & 3 \\ 9 & 9 & 3 \\ 3 & 3 & 1 \end{pmatrix} + \tau_2 \begin{pmatrix} 36 & 0 & 6 \\ 0 & 36 & 6 \\ 6 & 6 & 2 \end{pmatrix}.$$

It is now possible to construct the limiting distributions of the affine invariant test statistics $b_{1,2}$ and $b_{2,2}$. The comparison of the effect of different standardizations is possible but not easy, however, as the limiting distributions may be of different types. This is avoided if one uses the following versions of the test statistics:

Definition 5.3. Modified Mardia's skewness and kurtosis statistics in the bivariate case are

$$b^*_{1,2} = T_1^T \Omega_1^{-1} T_1 \quad \text{and} \quad b^*_{2,2} = (1, 1, 2)\, T_2.$$

Note that, if the sample mean and covariance matrix are used, the definition gives $b^*_{1,2} = b_{1,2}/6$ and $b^*_{2,2} = b_{2,2} - 8$, respectively. In the former case, with general $\hat{\mu}$ and $\hat{\Sigma}$, the affine equivariance property is lost.

The alternative to the regular sample mean vector and sample covariance matrix used in this example is the affine equivariant Oja median (Oja, 1983) and the related scatter matrix estimate based on the Oja sign covariance matrix (SCM).
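The constants $\tau_1$ and $\tau_2$ of Theorem 4.2 involve only moments of $r$ under the chi distribution, so for influence functions that are polynomials in $r$ they can be evaluated in closed form. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes $\tau_1 = 4[E\alpha^2/(p(p+2)) - 2E\alpha\beta/p + E\beta^2 - E\alpha r^4/(p(p+2)(p+4)) + E\beta r^4/(p(p+2))]$ and $\tau_2 = 2 + 2E\alpha^2/(p(p+2)) - 4E\alpha r^4/(p(p+2)(p+4))$, represents $\alpha$ and $\beta$ as hypothetical `{power: coefficient}` dictionaries, and takes the bivariate functions $\alpha(r) = 2[2\sqrt{2/\pi}\, r - 1]$, $\beta(r) = 1$ used for the ZCM-type scatter estimate in this section as its non-regular example.

```python
import math


def chi_moment(k, p):
    """E[r^k] for r chi-distributed with p degrees of freedom."""
    return 2.0 ** (k / 2.0) * math.gamma((p + k) / 2.0) / math.gamma(p / 2.0)


def expect(poly, p):
    """E[poly(r)] for a polynomial given as {power: coefficient}."""
    return sum(coef * chi_moment(power, p) for power, coef in poly.items())


def poly_mul(a, b):
    """Product of two polynomials in the {power: coefficient} format."""
    out = {}
    for i, ca in a.items():
        for j, cb in b.items():
            out[i + j] = out.get(i + j, 0.0) + ca * cb
    return out


def tau_constants(alpha, beta, p):
    """Return (tau1, tau2) under the formulas assumed in the lead-in."""
    r4 = {4: 1.0}
    e_a2 = expect(poly_mul(alpha, alpha), p)
    e_ab = expect(poly_mul(alpha, beta), p)
    e_b2 = expect(poly_mul(beta, beta), p)
    e_ar4 = expect(poly_mul(alpha, r4), p)
    e_br4 = expect(poly_mul(beta, r4), p)
    d2 = p * (p + 2)
    d3 = d2 * (p + 4)
    tau1 = 4.0 * (e_a2 / d2 - 2.0 * e_ab / p + e_b2 - e_ar4 / d3 + e_br4 / d2)
    tau2 = 2.0 + 2.0 * e_a2 / d2 - 4.0 * e_ar4 / d3
    return tau1, tau2


# Regular case: alpha(r) = r^2, beta(r) = 1 (sample covariance matrix).
tau_regular = tau_constants({2: 1.0}, {0: 1.0}, 2)

# Bivariate ZCM-type case: alpha(r) = 2*(c*r - 1) with c = 2*sqrt(2/pi).
c = 2.0 * math.sqrt(2.0 / math.pi)
tau_zcm = tau_constants({1: 2.0 * c, 0: -2.0}, {0: 1.0}, 2)

print(tau_regular, tau_zcm)
```

The regular case should return values that are zero up to rounding error, which is a quick sanity check on any transcription of the two formulas.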
See Visuri et al. (2000) and Ollila et al. (2003b) for the SCM. Note also that the scatter matrix estimate based on the Oja SCM is asymptotically equivalent to the zonoid covariance matrix (ZCM) and can be seen as an affine equivariant extension of the mean deviation. For the definition and use of the ZCM, see Koshevoy et al. (2003). We chose these statistics as their influence functions are relatively simple: in the $N_2(0, I_2)$ case, the influence functions are given by

$$\gamma(r) = \frac{2\sqrt{2}}{\sqrt{\pi}}, \quad \alpha(r) = 2\left[ \frac{2\sqrt{2}}{\sqrt{\pi}}\, r - 1 \right] \quad \text{and} \quad \beta(r) = 1.$$

The constants needed for the asymptotic variances and covariances are then

$$\kappa = \frac{4}{\pi} - \frac{1}{2}, \quad \tau_1 = \frac{32}{\pi} - \frac{29}{3} \quad \text{and} \quad \tau_2 = \frac{16}{\pi} - \frac{14}{3}.$$

Consider first the alternative contiguous sequence $H_1^n$ with $\mu = (r, 0, \dots, 0)^T$. Using the classical LeCam's third lemma one then easily sees that, under the alternative sequence $H_1^n$, the limiting distribution of $n b^*_{1,2}$ is a noncentral chi-square distribution with 4 degrees of freedom and noncentrality parameter $\delta_1^T \Omega_1^{-1} \delta_1$, where $\delta_1$ depends on the location estimate used in the standardization. In the regular case (sample mean vector)

$$\delta_1 = (0, 0, r^3, 0)^T$$

and the Oja median gives

$$\delta_1 = \left(0,\; r - q(r)\, 2\sqrt{2/\pi},\; r^3 + 3r - q(r)\, 6\sqrt{2/\pi},\; 0\right)^T.$$

The function $q(r) = \|R(\mu)\|$ is easily computable and is defined through the so-called spatial rank function

$$R(\mu) = E\left( \|\mu - x\|^{-1} (\mu - x) \right)$$

with $x \sim N_2(0, I_2)$. See Möttönen and Oja (1995).

Next we consider the sequence $H_2^n$. Then $\mathrm{Var}(b^*_{2,2}) = (1, 1, 2)\, \Omega_2\, (1, 1, 2)^T$ and under $H_2^n$ the limiting distribution of $n (b^*_{2,2})^2 / \mathrm{Var}(b^*_{2,2})$ is a noncentral chi-square distribution with one degree of freedom and noncentrality parameter $[(1, 1, 2)\, \delta_2]^2 / \mathrm{Var}(b^*_{2,2})$, where $\delta_2$ in this case depends on the scatter estimate used in the standardization. In the regular case (sample covariance matrix) we get

$$\delta_2 = \left(3(\sigma^2 - 1)^2,\; 3(\sigma^2 - 1)^2,\; (\sigma^2 - 1)^2\right)^T$$

and the ZCM gives

$$\delta_2 = \left(3(\sigma^4 - 1) - 12(\sigma - 1),\; 3(\sigma^4 - 1) - 12(\sigma - 1),\; (\sigma^4 - 1) - 4(\sigma - 1)\right)^T.$$

In Figure 1 the asymptotic relative efficiencies (AREs) of the new tests (using the Oja median and the ZCM estimate) relative to the regular tests (using the sample mean and sample covariance matrix) are illustrated for different values of $r$ and for different values of $\sigma$. The AREs are then simply the ratios of the noncentrality parameters. The new tests seem to perform very well: the test statistics using robust standardization seem to perform better than the tests with regular location and scatter estimates.

[Figure 1: two panels, ARE plotted against $r$ (left) and against $\sigma$ (right).]

Figure 1. Limiting efficiencies of the tests based on the Oja median and the ZCM relative to the regular Mardia's tests for different values of $r$ and $\sigma$. The left (right) panel illustrates the efficiency of $b^*_{1,2}$ ($b^*_{2,2}$).

6. A real data example

In our real data example we consider two bivariate data sets of $n = 50$ observations and the combined data set ($n = 100$). The data sets are shown and explained in Figure 2. We wish to test the null hypothesis of bivariate normality. Again the skewness and kurtosis statistics to be compared are those based on the sample mean vector and sample covariance matrix (regular Mardia's $b_{1,2}$ and $b_{2,2}$) and those based on the Oja median and the ZCM (denoted by $b^*_{1,2}$ and $b^*_{2,2}$). The values of the test statistics and the approximate p-values were calculated for the three data sets ("Boys", "Girls", "Combined"). See Table 1 for the results. As expected, the observed p-values were much smaller in the combined data case.

As discussed in the introduction, the statistics $b_{1,2}$ and $b_{2,2}$ compare the third and fourth central moments to the second ones (given by the sample covariance matrix). The statistics $b^*_{1,2}$ and $b^*_{2,2}$ use robust location vector and scatter matrix estimates in this comparison. One can then expect that, for data sets with outliers or for data sets coming from heavy-tailed distributions, the comparison to robust estimates tends to produce higher values of the test statistics and smaller p-values.

[Figure 2: scatter plot of weight at birth (kg) against height of the mother (cm).]

Figure 2. The height of the mother and the birth weight of the child measured on 50 boys who have siblings (o) and 50 girls (x) who do not have siblings.

Table 1. The observed values of the test statistics with corresponding p-values for the three data sets. Columns: test statistic and p-value for each of Boys, Girls and Combined. Rows: $b_{1,2}$, $b^*_{1,2}$, $b_{2,2}$, $b^*_{2,2}$. The numerical entries are not preserved in this transcription.

In our examples for the analyses of bivariate data we used the Oja median and the ZCM to standardize the data vectors. In the general $p$-variate case (with large $p$ and large $n$) these estimates are, however, computationally highly intensive, and should be replaced by more practical robust location and scatter estimates.
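For the classical standardization the computations behind an analysis of this kind are light. The following is a minimal sketch, not the authors' code, of the regular bivariate statistics $b_{1,2}$ and $b_{2,2}$ with the approximate p-values from Section 2 ($n b_{1,2}/6$ against a chi-square distribution with 4 degrees of freedom, and $\sqrt{n}(b_{2,2} - 8)/8$ against $N(0, 1)$, two-sided). The function names are hypothetical, the covariance matrix uses divisor $n$ as an assumption, and the sample is simulated because the birth-weight data are not reproduced here. The robust versions based on the Oja median and the ZCM are not implemented.

```python
import math
import random


def mardia_bivariate(data):
    """Return (b1, b2) for a list of (x, y) pairs.

    Standardization uses the sample mean and the sample covariance
    matrix with divisor n (an assumption of this sketch).
    """
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    centered = [(x - mx, y - my) for x, y in data]
    sxx = sum(a * a for a, _ in centered) / n
    syy = sum(b * b for _, b in centered) / n
    sxy = sum(a * b for a, b in centered) / n
    det = sxx * syy - sxy * sxy
    # Entries of the inverse of the 2x2 covariance matrix.
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det

    def quad(u, v):
        """u^T S^{-1} v for centered observations u and v."""
        return u[0] * (ixx * v[0] + ixy * v[1]) + u[1] * (ixy * v[0] + iyy * v[1])

    b1 = sum(quad(u, v) ** 3 for u in centered for v in centered) / n ** 2
    b2 = sum(quad(u, u) ** 2 for u in centered) / n
    return b1, b2


def mardia_pvalues(b1, b2, n):
    """Asymptotic p-values under bivariate normality (p = 2)."""
    skew_stat = n * b1 / 6.0
    # Survival function of the chi-square distribution with 4 df.
    p_skew = math.exp(-skew_stat / 2.0) * (1.0 + skew_stat / 2.0)
    z = math.sqrt(n) * (b2 - 8.0) / 8.0  # 8 = sqrt(8 p (p + 2)) for p = 2
    p_kurt = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return skew_stat, p_skew, z, p_kurt


# Hypothetical data: 50 simulated bivariate normal observations.
random.seed(1)
sample = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(50)]
b1, b2 = mardia_bivariate(sample)
print(b1, b2, mardia_pvalues(b1, b2, len(sample)))
```

Because $b_{1,2}$ is a double average over all pairs of observations, the cost grows quadratically in $n$, which is negligible for samples of the size considered here.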
7. Appendix: Auxiliary lemmas

The limiting distributions of the standardized third and fourth moments are given by the following linear approximations.

Lemma 7.1. Let $x_1, \dots, x_n$ be a random sample from the $N_p(0, I_p)$ distribution and let $z_i = \hat{\Sigma}^{-1/2}(x_i - \hat{\mu})$, $i = 1, \dots, n$, be the standardized observations. Write also $r_i = \|x_i\|$ and $u_i = \|x_i\|^{-1} x_i$. Then

$$\sqrt{n}\, \mathrm{ave}_i\{z_{ij} z_{ik} z_{il}\} = \sqrt{n}\, \mathrm{ave}_i\{x_{ij} x_{ik} x_{il}\} + o_P(1) = \sqrt{n}\, \mathrm{ave}_i\{r_i^3 u_{ij} u_{ik} u_{il}\} + o_P(1)$$

for $j < k < l$,

$$\sqrt{n}\, \mathrm{ave}_i\{z_{ij}^2 z_{ik}\} = \sqrt{n}\, \mathrm{ave}_i\{x_{ij}^2 x_{ik}\} - t_k + o_P(1) = \sqrt{n}\, \mathrm{ave}_i\{r_i^3 u_{ij}^2 u_{ik} - \gamma(r_i) u_{ik}\} + o_P(1)$$

for $j \neq k$ and

$$\sqrt{n}\, \mathrm{ave}_i\{z_{ij}^3\} = \sqrt{n}\, \mathrm{ave}_i\{x_{ij}^3\} - 3t_j + o_P(1) = \sqrt{n}\, \mathrm{ave}_i\{r_i^3 u_{ij}^3 - 3\gamma(r_i) u_{ij}\} + o_P(1).$$

Lemma 7.2. Let $x_1, \dots, x_n$ be a random sample from the $N_p(0, I_p)$ distribution and let $z_i = \hat{\Sigma}^{-1/2}(x_i - \hat{\mu})$, $i = 1, \dots, n$, be the standardized observations. Again, $r_i = \|x_i\|$ and $u_i = \|x_i\|^{-1} x_i$. Then for $j \neq k$,

$$\sqrt{n}\, \mathrm{ave}_i\{z_{ij}^2 z_{ik}^2 - 1\} = \sqrt{n}\, \mathrm{ave}_i\{x_{ij}^2 x_{ik}^2 - 1\} - C_{jj} - C_{kk} + o_P(1) = \sqrt{n}\, \mathrm{ave}_i\{r_i^4 u_{ij}^2 u_{ik}^2 - 1 - \alpha(r_i)[u_{ij}^2 + u_{ik}^2] + 2\beta(r_i)\} + o_P(1)$$

and

$$\sqrt{n}\, \mathrm{ave}_i\{z_{ij}^4 - 3\} = \sqrt{n}\, \mathrm{ave}_i\{x_{ij}^4 - 3\} - 6C_{jj} + o_P(1) = \sqrt{n}\, \mathrm{ave}_i\{r_i^4 u_{ij}^4 - 3 - 6\alpha(r_i) u_{ij}^2 + 6\beta(r_i)\} + o_P(1).$$

Proofs of the lemmas: Note first that

$$z_i = \hat{\Sigma}^{-1/2}(x_i - \hat{\mu}) = x_i - \frac{1}{\sqrt{n}}\, t - \frac{1}{2\sqrt{n}}\, C x_i + o(1/\sqrt{n}).$$

Thus

$$z_{ij} = x_{ij} - \frac{1}{\sqrt{n}}\, t_j - \frac{1}{2\sqrt{n}} \sum_{r=1}^{p} C_{jr} x_{ir} + o(1/\sqrt{n}),$$

$$z_{ij}^2 = x_{ij}^2 - \frac{2}{\sqrt{n}}\, t_j x_{ij} - \frac{1}{\sqrt{n}} \sum_{r=1}^{p} C_{jr} x_{ir} x_{ij} + o(1/\sqrt{n}),$$

$$z_{ij}^3 = x_{ij}^3 - \frac{3}{\sqrt{n}}\, t_j x_{ij}^2 - \frac{3}{2\sqrt{n}} \sum_{r=1}^{p} C_{jr} x_{ir} x_{ij}^2 + o(1/\sqrt{n})$$

and

$$z_{ij}^4 = x_{ij}^4 - \frac{4}{\sqrt{n}}\, t_j x_{ij}^3 - \frac{2}{\sqrt{n}} \sum_{r=1}^{p} C_{jr} x_{ir} x_{ij}^3 + o(1/\sqrt{n}).$$

As the observations come from the $N_p(0, I_p)$ distribution (all the moments exist), the standard law of large numbers and central limit theorem together with Slutsky's lemma yield the result.

Acknowledgements

We thank the referees for the critical and constructive comments. Finally, we wish to mention that, after writing the paper, we learned about the paper Croux et al. (2003) with a very similar but more general approach. Croux and his coauthors study the behaviour of the information matrix (IM) test when the maximum likelihood estimates are replaced by robust estimates. One of their examples is the modified Jarque-Bera test for univariate skewness and kurtosis. Their conclusion is also that the use of the robust estimates increases the power of the IM test.

References

[1] A. Bera and S. John, Tests for multivariate normality with Pearson alternatives. Commun. Statist. Theor. Meth. 12(1) (1983).
[2] C. Croux, G. Dhaene and D. Hoorelbeke, Testing the information matrix equality with robust estimators. Submitted.
[3] C. Croux and G. Haesbroeck, Principal component analysis based on robust estimators of the covariance or correlation matrix: Influence functions and efficiencies. Biometrika 87 (2000).
[4] F.R. Hampel, E.M. Ronchetti, P.J. Rousseeuw and W.A. Stahel, Robust Statistics: The Approach Based on Influence Functions. Wiley, 1986.
[5] G. Koshevoy, J. Möttönen and H. Oja, A scatter matrix estimate based on the zonotope. Ann. Statist. 31 (2003).
[6] J. A. Koziol, Probability plots for assessing multivariate normality. Statistician 42 (1993).
[7] K. V. Mardia, Measures of multivariate skewness and kurtosis with applications. Biometrika 57 (1970).
[8] K. V. Mardia, Applications of some measures of multivariate skewness and kurtosis in testing normality and robustness studies. Sankhyā, Ser B 36(2) (1974).
[9] K. V. Mardia, Tests of univariate and multivariate normality. In: P.R. Krishnaiah (ed.), Handbook of Statistics, vol 1, North Holland, 1980.
[10] J. Möttönen and H. Oja, Multivariate spatial sign and rank methods. J. Nonparam. Statist. 5 (1995).
[11] H. Oja, Descriptive statistics for multivariate distributions. Statist. Probab. Lett. 1 (1983).
[12] E. Ollila, T.P. Hettmansperger and H. Oja, Affine equivariant multivariate sign methods. (2003a) Under revision.
[13] E. Ollila, H. Oja and C. Croux, The affine equivariant sign covariance matrix: Asymptotic behavior and efficiencies. J. Multiv. Analysis (2003b), in print.
[14] S. Visuri, V. Koivunen and H. Oja, Sign and rank covariance matrices. J. Statist. Plann. Inference 91 (2000).

University of Jyväskylä, Department of Mathematics and Statistics, P.O. Box 35 (MaD), FIN University of Jyväskylä, Finland
Multivariate Normal Distribution
Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #47/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues
More informationNotes for STA 437/1005 Methods for Multivariate Data
Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.
More informationSections 2.11 and 5.8
Sections 211 and 58 Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1/25 Gesell data Let X be the age in in months a child speaks his/her first word and
More informationDepartment of Economics
Department of Economics On Testing for Diagonality of Large Dimensional Covariance Matrices George Kapetanios Working Paper No. 526 October 2004 ISSN 14730278 On Testing for Diagonality of Large Dimensional
More informationThe Delta Method and Applications
Chapter 5 The Delta Method and Applications 5.1 Linear approximations of functions In the simplest form of the central limit theorem, Theorem 4.18, we consider a sequence X 1, X,... of independent and
More informationMultivariate normal distribution and testing for means (see MKB Ch 3)
Multivariate normal distribution and testing for means (see MKB Ch 3) Where are we going? 2 Onesample ttest (univariate).................................................. 3 Twosample ttest (univariate).................................................
More informationMultivariate Statistical Inference and Applications
Multivariate Statistical Inference and Applications ALVIN C. RENCHER Department of Statistics Brigham Young University A WileyInterscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim
More informationInstitute of Actuaries of India Subject CT3 Probability and Mathematical Statistics
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in
More informationPROPERTIES OF THE SAMPLE CORRELATION OF THE BIVARIATE LOGNORMAL DISTRIBUTION
PROPERTIES OF THE SAMPLE CORRELATION OF THE BIVARIATE LOGNORMAL DISTRIBUTION ChinDiew Lai, Department of Statistics, Massey University, New Zealand John C W Rayner, School of Mathematics and Applied Statistics,
More informationII. DISTRIBUTIONS distribution normal distribution. standard scores
Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,
More informationLinear combinations of parameters
Linear combinations of parameters Suppose we want to test the hypothesis that two regression coefficients are equal, e.g. β 1 = β 2. This is equivalent to testing the following linear constraint (null
More informationNOTES ON LINEAR TRANSFORMATIONS
NOTES ON LINEAR TRANSFORMATIONS Definition 1. Let V and W be vector spaces. A function T : V W is a linear transformation from V to W if the following two properties hold. i T v + v = T v + T v for all
More informationLeast Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN13: 9780470860809 ISBN10: 0470860804 Editors Brian S Everitt & David
More informationModule 3: Correlation and Covariance
Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models  part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK2800 Kgs. Lyngby
More informationConvex Hull Probability Depth: first results
Conve Hull Probability Depth: first results Giovanni C. Porzio and Giancarlo Ragozini Abstract In this work, we present a new depth function, the conve hull probability depth, that is based on the conve
More informationAMS7: WEEK 8. CLASS 1. Correlation Monday May 18th, 2015
AMS7: WEEK 8. CLASS 1 Correlation Monday May 18th, 2015 Type of Data and objectives of the analysis Paired sample data (Bivariate data) Determine whether there is an association between two variables This
More informationA Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September
More informationIRREDUCIBLE OPERATOR SEMIGROUPS SUCH THAT AB AND BA ARE PROPORTIONAL. 1. Introduction
IRREDUCIBLE OPERATOR SEMIGROUPS SUCH THAT AB AND BA ARE PROPORTIONAL R. DRNOVŠEK, T. KOŠIR Dedicated to Prof. Heydar Radjavi on the occasion of his seventieth birthday. Abstract. Let S be an irreducible
More informationMATHEMATICAL METHODS OF STATISTICS
MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS
More informationEC327: Advanced Econometrics, Spring 2007
EC327: Advanced Econometrics, Spring 2007 Wooldridge, Introductory Econometrics (3rd ed, 2006) Appendix D: Summary of matrix algebra Basic definitions A matrix is a rectangular array of numbers, with m
More informationA MULTIVARIATE OUTLIER DETECTION METHOD
A MULTIVARIATE OUTLIER DETECTION METHOD P. Filzmoser Department of Statistics and Probability Theory Vienna, AUSTRIA email: P.Filzmoser@tuwien.ac.at Abstract A method for the detection of multivariate
More informationLife Table Analysis using Weighted Survey Data
Life Table Analysis using Weighted Survey Data James G. Booth and Thomas A. Hirschl June 2005 Abstract Formulas for constructing valid pointwise confidence bands for survival distributions, estimated using
More informationAn extension of the factoring likelihood approach for nonmonotone missing data
An extension of the factoring likelihood approach for nonmonotone missing data Jae Kwang Kim Dong Wan Shin January 14, 2010 ABSTRACT We address the problem of parameter estimation in multivariate distributions
More informationOverview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
More informationSummary of Formulas and Concepts. Descriptive Statistics (Ch. 14)
Summary of Formulas and Concepts Descriptive Statistics (Ch. 14) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume
More informationRobust Estimation of the Correlation Coefficient: An Attempt of Survey
AUSTRIAN JOURNAL OF STATISTICS Volume 40 (2011), Number 1 & 2, 147 156 Robust Estimation of the Correlation Coefficient: An Attempt of Survey Georgy Shevlyakov and Pavel Smirnov St. Petersburg State Polytechnic
More information, then the form of the model is given by: which comprises a deterministic component involving the three regression coefficients (
Multiple regression Introduction Multiple regression is a logical extension of the principles of simple linear regression to situations in which there are several predictor variables. For instance if we
More informationOPTIMAl PREMIUM CONTROl IN A NONliFE INSURANCE BUSINESS
ONDERZOEKSRAPPORT NR 8904 OPTIMAl PREMIUM CONTROl IN A NONliFE INSURANCE BUSINESS BY M. VANDEBROEK & J. DHAENE D/1989/2376/5 1 IN A OPTIMAl PREMIUM CONTROl NONliFE INSURANCE BUSINESS By Martina Vandebroek
More informationProbability and Statistics
CHAPTER 2: RANDOM VARIABLES AND ASSOCIATED FUNCTIONS 2b  0 Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute  Systems and Modeling GIGA  Bioinformatics ULg kristel.vansteen@ulg.ac.be
More informationMath 576: Quantitative Risk Management
Math 576: Quantitative Risk Management Haijun Li lih@math.wsu.edu Department of Mathematics Washington State University Week 4 Haijun Li Math 576: Quantitative Risk Management Week 4 1 / 22 Outline 1 Basics
More information3. The Multivariate Normal Distribution
3. The Multivariate Normal Distribution 3.1 Introduction A generalization of the familiar bell shaped normal density to several dimensions plays a fundamental role in multivariate analysis While real data
More information15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop