On Mardia s Tests of Multinormality

Size: px
Start display at page:

Download "On Mardia s Tests of Multinormality"

Transcription

1 On Mardia s Tests of Multinormality Kankainen, A., Taskinen, S., Oja, H. Abstract. Classical multivariate analysis is based on the assumption that the data come from a multivariate normal distribution. The tests of multinormality have therefore received very much attention. Several tests for assessing multinormality, among them Mardia s popular multivariate skewness kurtosis statistics, are based on stardized third fourth moments. In Mardia s construction of the affine invariant test statistics, the data vectors are first stardized using the sample mean vector the sample covariance matrix. In this paper we investigate whether, in the test construction, it is advantagoeus to replace the regular sample mean vector sample covariance matrix by their affine equivariant robust competitors. Limiting distributions of the stardized third fourth moments the resulting test statistics are derived under the null hypothesis are shown to be strongly dependent on the choice of the location vector scatter matrix estimate. Finally, the effects of the modification on the limiting finite-sample efficiencies are illustrated by simple examples in the case of testing for the bivariate normality. In the cases studied, the modifications seem to increase the power of the tests. 1. Introduction Classical methods of multivariate analysis are for the most part based on the assumption that the data are coming from a multivariate normal population. In the one-sample multinormal case, the regular sample mean vector sample covariance matrix are complete sufficient, but on the other h highly sensitive to outlying observations. It is also well known that the tests estimates based on the sample mean vector sample covariance matrix have poor efficiency properties in the case of heavy tailed noise distributions. Testing for departures from multinormality helps to make practical choices between competing methods of analysis. In the univariate case, stardized third fourth moments b 1 b 2 are often used to indicate the skewness kurtosis. For a rom sample x 1,..., x n from a p-variate distribution with sample mean vector x sample covariance Received by the editors May 2, Mathematics Subject Classification. Primary 62H15; Secondary 62F02. Key words phrases. Kurtosis, Multinormality, Robustness, Skewness.

2 2 Kankainen, A., Taskinen, S., Oja, H. matrix S, Mardia (1970,1974,1980) defined the p-variate skewness kurtosis statistics as b 1,p = ave i,j [(x i x) T S 1 (x j x)] 3 b 2,p = ave i [(x i x) T S 1 (x i x)] 2, respectively. The statistics b 1,p b 2,p are functions of stardized third fourth moments, respectively. From the definition one easily sees that b 1,p b 2,p are invariant under affine transformations. In the univariate case these reduce to the usual univariate skewness kurtosis statistics b 1 b 2. Mardia advocated using the skewness kurtosis statistics to test for multinormality as they are distribution-free under multinormality. See also Bera John (1983) Koziol (1993) for the use of the stardized third fourth moments in the test construction. Intuitively, as in the univariate case, the skewness kurtosis statistics b 1,p b 2,p compare the variation measured by third fourth moments to that measured by second moments. The second, third fourth moments are all highly non-robust statistics, however, more efficient procedures may be obtained through comparisons of robust nonrobust measures of variations. It is therefore an appealing idea to study whether it is useful or reasonable to replace the regular sample mean vector x sample covariance matrix S by some affine equivariant robust competitors, location vector estimate ˆµ scatter matrix estimate ˆΣ. The paper is organised as follows. In Section 2 the limiting distributions of Mardia s test statistics under the null model of multinormality are discussed. The test statistics are based on the stardized third fourth moments; the regular mean vector regular covariance matrix are used in the stardization. Section 3 lists the tools assumptions for alternative n-consistent affine equivariant location scatter estimates. The location scatter estimates are then used in the regular manner to stardize the data vectors. The limiting distributions of the stardized third fourth moments in this general case are found in Section 4. As an illustration of the theory, the effect of the way of stardization on the limiting small sample efficiencies in the bivariate case are studied in Sections 5 6. The auxiliary lemmas proofs have been postponed to the Appendix. 2. Limiting null distributions of Mardia s statistics In this section we recall the wellknown results concerning the limiting null distributions of the Mardia s skewness kurtosis statistics. As, due to affine invariance, the distributions of the test statistics do not depend on the unknown µ Σ, it is not a restriction to assume in the following derivations that x 1,..., x n is a rom sample from the N p (0, I p ) distribution. Write z i = S 1/2 (x i x), i = 1,..., n,

3 On Mardia s Tests of Multinormality 3 for the data vectors stardized in the classical way. Mardia s skewness statistic can be decomposed as b 1,p = 6 j<k<l[ave i {z ij z ik z il }] i {zijz j k[ave 2 ik }] 2 + [ave i {zij}] 3 2 j the Mardia s kurtosis statistic is similarly b 2,p = j k ave i {z 2 ijz 2 ik} + j ave i {z 4 ij}. See e.g. Koziol (1993). Under multinormality b 1,p b 2,p are affine invariant, all the stardized third fourth moments are asymptotically normal asymptotically independent consequently the limiting distributions of n b 1,p 6 n b 2,p p(p + 2) 8p(p + 2) are a chi-square distribution with p(p+1)(p+2)/6 degrees of freedom a N(0, 1) distribution, respectively. 3. Location scatter estimates In Mardia s test construction, the data vectors are stardized using the regular sample mean vector x sample covariance matrix S. In this paper we thus consider what happens if these are replaced by other n-consistent affine equivariant location scatter estimates of µ Σ, say, ˆµ ˆΣ. Next we list assumptions useful notations for location vector estimate ˆµ scatter matrix estimate ˆΣ. First we assume that, for the N p (0, I p ) distribution, the influence functions of these location scatter functionals at z are γ(r) u α(r) uu T β(r) I p, respectively, where r = z u = z 1 z. We will later see that the limiting distributions of the modified test statistics depend on the used location scatter estimates through these functions. For the inluence functions of location scatter functionals, see also Hampel et al. (1986), Croux Haesbroek (2000) Ollila et al. (2003a,b). We further assume that, again under the N p (0, I p ) distribution, t = n ave{γ(r i )u i } = n ˆµ + o P (1) C = n ave{α(r i )u i u T i β(r i )I p } = n (ˆΣ I p ) + o P (1). Finally, assume that, for the N p (0, I p ) distribution, the expected values E(γ 2 (r)), E(α 2 (r)) E(β 2 (r)) are finite. These assumptions imply that the limiting distributions of nˆµ n vec(ˆσ I p ) are p- p 2 -variate normal distributions

4 4 Kankainen, A., Taskinen, S., Oja, H. with zero mean vectors covariance matrices E[γ 2 (r)] I p p E[α 2 (r)] p(p + 2) [I p 2+I p,p+vec(i p )vec(i p ) T ]+ (E[β 2 (r)] + 2p ) E[α(r)β(r)] vec(i p )vec(i p ) T, respectively. (Here I p,p is the so called commutation matrix. See e.g. Ollila et al. (2003b).) 4. Limiting distribution of the stardized third fourth moments Consider now any estimators ˆµ ˆΣ write z i = ˆΣ 1/2 (x i ˆµ), i = 1,..., n, for the stardized observations. Denote the stardized third fourth moments by U jkl = n ave i {z ij z ik z il }, V jk = n ave i {z 2 ijz 2 ik 1} V jj = n ave i {z 4 ij 3}. The limiting distributions of the stardized third fourth moments are given by the following theorem. The results easily follow from Lemmas stated proven in the Appendix. Theorem 4.1. Under multinormality, the limiting distribution of the vector of all possible third moments U jkl (j < k < l), U jjk (j k) U jjj is a multivariate normal distribution with zero mean vector limiting variances covariances where V ar(u jkl ) = 1 V ar(u jjk ) = 2 + κ V ar(u jjj ) = 6 + 9κ Cov(U jjk, U llk ) = κ Cov(U jjk, U kkk ) = 3κ κ = 1 + E[γ2 (r)] p All the other covariances are zero. 2 E[r3 γ(r)] p(p + 2).

5 On Mardia s Tests of Multinormality 5 Theorem 4.2. Under multinormality, the limiting distribution of the vector of fourth moments V jk (j k) V jj is a multivariate normal distribution with zero mean vector limiting variances covariances V ar(v jk ) = 4 + τ 1 + 2τ 2 V ar(v jj ) = τ τ 2 Cov(V jk, V jl ) = τ 1 + τ 2 Cov(V jk, V ml ) = τ 1 Cov(V jj, V kk ) = 9τ 1 Cov(V jk, V jj ) = 3τ 1 + 6τ 2 Cov(V jk, V ll ) = 3τ 1 where τ 1 = 4 [ E[α 2 ] (r)] p(p + 2) 2E[α(r)β(r)] + E[β 2 (r)] E[α(r)r4 ] p p(p + 2)(p + 4) + E[β(r)r4 ] p(p + 2) τ 2 = E[α2 (r)] p(p + 2) 4 E[α(r)r 4 ] p(p + 2)(p + 4). We close this section with some discussion on the implications of the above results. First note that, if the regular mean vector sample covariance matrix are used in the stardization, then simply which gives γ(r) = r, α(r) = r 2 β(r) = 1 κ = τ 1 = τ 2 = 0 all the third moments fourth moments are asymptotically mutually independent, respectively. For other location scatter statistics this is not necessarily true also the limiting variances may vary. This means that the limiting distributions efficiencies of b 1,p b 2,p may drastically change. The limiting distribution of nb 1,p /6 is a weighted sum of chi-square variables with one degree of freedom which makes the calculation of the approximate p-value less practical. The limiting distribution of n(b 2,p p(p + 2)) is still normal with mean value zero, but its limiting variance depends on the chosen scatter matrix. In the last two section we illustrate the impact of the choice of the location vector scatter matrix estimate on the limiting distribution efficiency in the bivariate case. The extension to the general p-variate case is straightforward. 5. Tests for bivariate normality Assume now that x 1,..., x n is a rom sample from an unknown bivariate distribution we wish to test the null hypothesis H 0 of bivariate normality. For

6 6 Kankainen, A., Taskinen, S., Oja, H. limiting power considerations we use the contaminated bivariate normal distribution, the so called Tukey distribution: We say that the distribution of x is a bivariate Tukey distribution T (, µ, σ) if the pdf of x is f(x) = (1 )φ(x) + σ 2 φ((x µ)/σ), where φ(x) is the pdf of N 2 (0, I 2 ) distribution [0, 1]. Alternative sequences H n 1 : T (, µ, 1) H n 2 : T (, 0, σ) with = δ/ n, r = µ > 0 σ > 1 are then used for the comparisons of the skewness kurtosis test statistics, respectively. Write now T 1 = (U 112, U 221, U 111, U 222 ) T T 2 = (V 11, V 22, V 12 ) T for the vectors of the stardized third fourth moments. Using the results in Section 4, one directly gets the following results. Corollary 5.1. Under H 0, the limiting distribution of nt 1 is a 4-variate normal distribution with zero mean vector covariance matrix Ω 1 = κ Corollary 5.2. Under H 0, the limiting distribution of nt 2 is a 3-variate normal distribution with zero mean vector covariance matrix Ω 2 = τ τ It is now possible to construct the limiting distributions of the affine invariant test statistics b 1,2 b 2,2. The comparison of the effect of different stardizations is possible but not easy, however, as the limiting distributions may be of different type. This is avoided if one uses the following test statistic versions: Definition 5.3. Modified Mardia s skewness kurtosis statistics in the bivariate case are b 1,2 = T 1 T Ω 1 1 T 1 b 2,2 = (1, 1, 2)T 2. Note that, if the sample mean covariance matrix are used, the definition gives b 1,2 = b 1,2 /6 b 2,2 = b 2,2 8, respectively. In the former case, with general ˆµ ˆΣ, the affine equivariance property is lost. The alternative to the regular sample mean vector sample covariance matrix used in this example is the affine equivariant Oja median (Oja, 1983) the related scatter matrix estimate based on the Oja sign covariance matrix (SCM)..

7 On Mardia s Tests of Multinormality 7 See Visuri et al. (2000) Ollila et al. (2003b) for the SCM. Note also that the scatter matrix estimate based on the Oja SCM is asymptotically equivalent with the zonoid covariance matrix (ZCM) can be seen as an affine equivariant extension on mean deviation. For the definition use of the ZCM, see Koshevoy et al (2003). We chose these statistics as their influence functions are relatively simple at the N 2 (0, I 2 ) case, the influence functions are given by [ ] 2 2 γ(r) = 2 π, α(r) = 2 2 π r 1 β(r) = 1. The constants needed for the asymptotic variances covariances are then κ = 4 π 1 2, τ 1 = 32 π 29 τ 2 = 16 3 π Consider first the alternative contiguous sequence H1 n with µ = (r, 0,..., 0) T. Using classical LeCam s third lemma one then easily sees that, under the alternative sequence H1 n, the limiting distribution of nb 1,2 is a noncentral chi-square distribution with 4 degrees of freedom noncentrality parameter δ T 1 Ω 1 1 δ 1, where δ 1 depends on the location estimate used in the stardization. In the regular case (sample mean vector), the Oja median gives δ 1 = (0, 0, r 3, 0) T δ 1 = (0, r q(r)2 2/π, r 3 + 3r q(r)6 2/π, 0) T. The function q(r) = R(µ) is easily computable defined through the so called spatial rank function R(µ) = E( µ x 1 (µ x)) with x N 2 (0, I 2 ). See Möttönen Oja (1995). Next we consider the sequence H2 n.then V ar(b 2,2) = (1, 1, 2) Ω under H2 n the limiting distribution of n(b 2,2) 2 /V ar(b 2,2) is a noncentral chisquare distribution with one degree of freedom noncentrality parameter [(1, 1, 2)δ 2 ] 2 /V ar(b 2,2), where δ 2 in this case depends on the scatter estimate used in the stardization. In the regular case (sample covariance matrix) we get the ZCM gives δ 2 = (3(σ 2 1) 2, 3(σ 2 1) 2, (σ 2 1) 2 ) T δ 2 = (3(σ 4 1) 12(σ 1), 3(σ 4 1) 12(σ 1), (σ 4 1) 4(σ 1)) T In Figure 1 the asymptotical relative efficiencies of the new tests (using the Oja median the ZCM estimate) relative to the regular tests (using the sample mean sample covariance matrix) are illustrated for different values of r

8 8 Kankainen, A., Taskinen, S., Oja, H. for different values of σ. The AREs are then simply the ratios of the noncentrality parameters. The new tests seem to perform very well: The test statistics using robust stardization seem to perform better than the tests with regular location scatter estimates. ARE ARE r σ Figure 1. Limiting efficiencies of the tests based on the Oja median the ZCM relative to the regular Mardia s tests for different values of r σ. The left (right) panel illustrates the efficiency of b 1,2 (b 2,2). 6. A real data example In our real data example we consider two bivariate data sets of n = 50 observations the combined data set (n = 100). The data sets are shown explained in Figure 2. We wish to test the null hypothesis of bivariate normality. Again the the skewness kurtosis statistics to be compared are those based on the sample mean vector sample covariance matrix (regular Mardia s b 1,2 b 2,2 ) those based on the Oja median the ZCM (denoted by b 1,2 b 2,2). The values of the test statistics the approximate p-values were calculated for the three data sets ( Boys, Girls, Combined ). See Table 1 for the results. As expected, the observed p-values were much smaller in the combined data case. As discussed in the introduction, the statistics b 1,2 b 2,2 compare the third fourth central moments to the second ones (given by the sample covariance matrix). The statistics b 1,2 b 2,2 use robust location vector scatter matrix estimates in this comparison. One can then expect that, for data sets with outliers or for data sets coming from heavy tailed distributions, the comparison to robust estimates tends to produce higher values of the test statistics smaller p-values.

9 On Mardia s Tests of Multinormality 9 Weights at birth (kg) Heights of the mothers (cm) Figure 2. The height of the mother the birth weight of the child measured on 50 boys who have siblings (o) 50 girls (x) who do not have siblings. Table 1. The observed values of the test statistics with corresponding p-values for the three data sets. Boys Girls Combined Test statistic p-value Test statistic p-value Test statistic p-value b 1, b 1, b 2, b 2, In our examples for the analyses of bivariate data we used the Oja median the ZCM to stardize the data vectors. In the general p-variate case (with large p large n) these estimates are, however, computationally highly intensive, should be replaced by more practical robust location scatter estimates.

10 10 Kankainen, A., Taskinen, S., Oja, H. 7. Appendix: Auxiliary lemmas The limiting distributions of the stardized third fourth moments are given by the following linear approximations. Lemma 7.1. Let x 1,..., x n be a rom sample from N(0, I p )-distribution let z i = ˆΣ 1/2 (x i ˆµ), i = 1,..., n, be the stardized observations. Write also r i = x i u i = x i 1 x i. Then n avei {z ij z ik z il } = n ave i {x ij x ik x il } + o P (1) = n ave i {r 3 i u ij u ik u il } + o P (1) for j < k < l, n avei {z 2 ijz ik } = n ave i {x 2 ijx ik } t k + o P (1) = n ave i {r 3 i u 2 iju ik γ(r i )u ik } + o P (1) for j k n avei {z 3 ij} = n ave i {x 3 ij} 3t j + o P (1) = n ave i {r 3 i u 3 ij 3γ(r i )u ij } + o P (1). Lemma 7.2. Let x 1,..., x n be a rom sample from N(0, I p )-distribution let z i = ˆΣ 1/2 (x i ˆµ), i = 1,..., n, be the stardized observations. Again, r i = x i u i = x i 1 x i. Then for j k, n avei {z 2 ijz 2 ik 1} = n ave i {x 2 ijx 2 ik 1} C jj C kk + o P (1) = n ave i {r 4 i u 2 iju 2 ik 1 α(r i )[u 2 ij + u 2 ik] + 2β(r i )} + o P (1) n avei {z 4 ij 3} = n ave i {x 4 ij 3} 6C jj + o P (1) = n ave i {r 4 i u 4 ij 3 6α(r i )u 2 ij + 6β(r i )} + o P (1). Proofs of the lemmas: Note first that Thus z i = ˆΣ 1/2 (x i ˆµ) = x i 1 n t 1 2 n Cx i + o(1/ n). z ij = x ij 1 n t j 1 2 n z 2 ij = x 2 ij 2 n t j x ij 1 n z 3 ij = x 3 ij 3 n t j x 2 ij 3 2 n p C jr x ir + o(1/ n), r=1 p C jr x ir x ij + o(1/ n), r=1 p C jr x ir x 2 ij + o(1/ n) r=1

11 On Mardia s Tests of Multinormality 11 z 4 ij = x 4 ij 4 n t j x 3 ij 2 n p C jr x ir x 3 ij + o(1/ n). As the observations come from the N p (0, I p ) distribution (all the moments exist), the stard law of large numbers central limit theorem together with Slutsky s lemma yield the result. r=1 Acknowledgements We thank the referees for the critical constructive comments. Finally, we wish to mention that, after writing the paper, we learned about the paper Croux et al. (2003) with a very similar but more general approach. Croux his cowriters study the behaviour of the infromation matrix (IM) test when the maximum likelihood estimates are replaced by robust estimates. One of their examples is the modified Jarque-Bera test for univariate skewness kurtosis. Their conclusion also is that the use of the robust estimates icreases the power of the IM test. References [1] A. Bera S. John, Tests for multivariate normality with Pearson alternatives. Commun. Statist. Theor. Meth. 12(1) (1983), [2] C. Croux, G. Dhaene D. Hoorelbeke, Testing the information matrix equality with robust estimators. Submitted. (Website address: [3] C. Croux G. Haesbroeck, Principal component analysis based on robust estimators of the covariance or correlation matrix: Influence functions efficiencies. Biometrika 87 (2000), [4] F.R. Hampel, E.M. Ronchetti, P.J. Rousseeuw, P. J. W.A. Stahel, Robust Statistics: The Approach Based on Influence Functions. Wiley, [5] G. Koshevoy, J. Mttnen H. Oja, A scatter matrix estimate based on the zonotope. Ann. Statist. 31 (2003), [6] J. A. Koziol, Probability plots for assessing multivariate normality. Statistician 42 (1993), [7] K. V. Mardia, Measures of multivariate skewness kurtosis with applications. Biometrika 57 (1970), [8] K. V. Mardia, Applications of some measures of multivariate skewness kurtosis in testing normality robustness studies. Sankhyã, Ser B 36(2) (1974), [9] K. V. Mardia, Tests of univariate multivariate normality. In: P.R. Krishnaiah (ed.), Hbook of Statistics, vol 1, , North Holl, [10] J. Möttönen H. Oja, Multivariate spatial sign rank methods. J. Nonparam. Statist. 5 (1995), [11] H. Oja, Descriptive statistics for multivariate distributions. Statist. Probab. Lett. 1 (1983),

12 12 Kankainen, A., Taskinen, S., Oja, H. [12] E. Ollila, T.P. Hettmansperger H. Oja, Affine equivariant multivariate sign methods. (2003a) Under revision. [13] E. Ollila, H. Oja C. Croux, The affine equivariant sign covariance matrix: Asymptotic behavior efficiencies. J. Multiv.Analysis (2003b), in print. [14] S. Visuri, V. Koivunen, H. Oja, Sign rank covariance matrices. J. Statist. Plann. Inference 91 (2000), University of Jyväskylä, Department of Mathematics Statistics, P.O. Box 35 (MaD), FIN University of Jyväskylä, Finl address: University of Jyväskylä, Department of Mathematics Statistics, P.O. Box 35 (MaD), FIN University of Jyväskylä, Finl address: University of Jyväskylä, Department of Mathematics Statistics, P.O. Box 35 (MaD), FIN University of Jyväskylä, Finl address:

Multivariate Normal Distribution

Multivariate Normal Distribution Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

More information

Notes for STA 437/1005 Methods for Multivariate Data

Notes for STA 437/1005 Methods for Multivariate Data Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.

More information

Sections 2.11 and 5.8

Sections 2.11 and 5.8 Sections 211 and 58 Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1/25 Gesell data Let X be the age in in months a child speaks his/her first word and

More information

Multivariate normal distribution and testing for means (see MKB Ch 3)

Multivariate normal distribution and testing for means (see MKB Ch 3) Multivariate normal distribution and testing for means (see MKB Ch 3) Where are we going? 2 One-sample t-test (univariate).................................................. 3 Two-sample t-test (univariate).................................................

More information

Multivariate Statistical Inference and Applications

Multivariate Statistical Inference and Applications Multivariate Statistical Inference and Applications ALVIN C. RENCHER Department of Statistics Brigham Young University A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim

More information

Department of Economics

Department of Economics Department of Economics On Testing for Diagonality of Large Dimensional Covariance Matrices George Kapetanios Working Paper No. 526 October 2004 ISSN 1473-0278 On Testing for Diagonality of Large Dimensional

More information

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

More information

PROPERTIES OF THE SAMPLE CORRELATION OF THE BIVARIATE LOGNORMAL DISTRIBUTION

PROPERTIES OF THE SAMPLE CORRELATION OF THE BIVARIATE LOGNORMAL DISTRIBUTION PROPERTIES OF THE SAMPLE CORRELATION OF THE BIVARIATE LOGNORMAL DISTRIBUTION Chin-Diew Lai, Department of Statistics, Massey University, New Zealand John C W Rayner, School of Mathematics and Applied Statistics,

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

NOTES ON LINEAR TRANSFORMATIONS

NOTES ON LINEAR TRANSFORMATIONS NOTES ON LINEAR TRANSFORMATIONS Definition 1. Let V and W be vector spaces. A function T : V W is a linear transformation from V to W if the following two properties hold. i T v + v = T v + T v for all

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

Module 3: Correlation and Covariance

Module 3: Correlation and Covariance Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

More information

Convex Hull Probability Depth: first results

Convex Hull Probability Depth: first results Conve Hull Probability Depth: first results Giovanni C. Porzio and Giancarlo Ragozini Abstract In this work, we present a new depth function, the conve hull probability depth, that is based on the conve

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

MATHEMATICAL METHODS OF STATISTICS

MATHEMATICAL METHODS OF STATISTICS MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS

More information

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

More information

A MULTIVARIATE OUTLIER DETECTION METHOD

A MULTIVARIATE OUTLIER DETECTION METHOD A MULTIVARIATE OUTLIER DETECTION METHOD P. Filzmoser Department of Statistics and Probability Theory Vienna, AUSTRIA e-mail: P.Filzmoser@tuwien.ac.at Abstract A method for the detection of multivariate

More information

IRREDUCIBLE OPERATOR SEMIGROUPS SUCH THAT AB AND BA ARE PROPORTIONAL. 1. Introduction

IRREDUCIBLE OPERATOR SEMIGROUPS SUCH THAT AB AND BA ARE PROPORTIONAL. 1. Introduction IRREDUCIBLE OPERATOR SEMIGROUPS SUCH THAT AB AND BA ARE PROPORTIONAL R. DRNOVŠEK, T. KOŠIR Dedicated to Prof. Heydar Radjavi on the occasion of his seventieth birthday. Abstract. Let S be an irreducible

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

An extension of the factoring likelihood approach for non-monotone missing data

An extension of the factoring likelihood approach for non-monotone missing data An extension of the factoring likelihood approach for non-monotone missing data Jae Kwang Kim Dong Wan Shin January 14, 2010 ABSTRACT We address the problem of parameter estimation in multivariate distributions

More information

Life Table Analysis using Weighted Survey Data

Life Table Analysis using Weighted Survey Data Life Table Analysis using Weighted Survey Data James G. Booth and Thomas A. Hirschl June 2005 Abstract Formulas for constructing valid pointwise confidence bands for survival distributions, estimated using

More information

OPTIMAl PREMIUM CONTROl IN A NON-liFE INSURANCE BUSINESS

OPTIMAl PREMIUM CONTROl IN A NON-liFE INSURANCE BUSINESS ONDERZOEKSRAPPORT NR 8904 OPTIMAl PREMIUM CONTROl IN A NON-liFE INSURANCE BUSINESS BY M. VANDEBROEK & J. DHAENE D/1989/2376/5 1 IN A OPTIMAl PREMIUM CONTROl NON-liFE INSURANCE BUSINESS By Martina Vandebroek

More information

3. The Multivariate Normal Distribution

3. The Multivariate Normal Distribution 3. The Multivariate Normal Distribution 3.1 Introduction A generalization of the familiar bell shaped normal density to several dimensions plays a fundamental role in multivariate analysis While real data

More information

The Hadamard Product

The Hadamard Product The Hadamard Product Elizabeth Million April 12, 2007 1 Introduction and Basic Results As inexperienced mathematicians we may have once thought that the natural definition for matrix multiplication would

More information

Robust Estimation of the Correlation Coefficient: An Attempt of Survey

Robust Estimation of the Correlation Coefficient: An Attempt of Survey AUSTRIAN JOURNAL OF STATISTICS Volume 40 (2011), Number 1 & 2, 147 156 Robust Estimation of the Correlation Coefficient: An Attempt of Survey Georgy Shevlyakov and Pavel Smirnov St. Petersburg State Polytechnic

More information

Probability and Statistics

Probability and Statistics CHAPTER 2: RANDOM VARIABLES AND ASSOCIATED FUNCTIONS 2b - 0 Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be

More information

Financial Time Series Analysis (FTSA) Lecture 1: Introduction

Financial Time Series Analysis (FTSA) Lecture 1: Introduction Financial Time Series Analysis (FTSA) Lecture 1: Introduction Brief History of Time Series Analysis Statistical analysis of time series data (Yule, 1927) v/s forecasting (even longer). Forecasting is often

More information

MEASURES OF LOCATION AND SPREAD

MEASURES OF LOCATION AND SPREAD Paper TU04 An Overview of Non-parametric Tests in SAS : When, Why, and How Paul A. Pappas and Venita DePuy Durham, North Carolina, USA ABSTRACT Most commonly used statistical procedures are based on the

More information

Uncertainty quantification for the family-wise error rate in multivariate copula models

Uncertainty quantification for the family-wise error rate in multivariate copula models Uncertainty quantification for the family-wise error rate in multivariate copula models Thorsten Dickhaus (joint work with Taras Bodnar, Jakob Gierl and Jens Stange) University of Bremen Institute for

More information

A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS

A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS Eusebio GÓMEZ, Miguel A. GÓMEZ-VILLEGAS and J. Miguel MARÍN Abstract In this paper it is taken up a revision and characterization of the class of

More information

Mathematics within the Psychology Curriculum

Mathematics within the Psychology Curriculum Mathematics within the Psychology Curriculum Statistical Theory and Data Handling Statistical theory and data handling as studied on the GCSE Mathematics syllabus You may have learnt about statistics and

More information

Factor analysis. Angela Montanari

Factor analysis. Angela Montanari Factor analysis Angela Montanari 1 Introduction Factor analysis is a statistical model that allows to explain the correlations between a large number of observed correlated variables through a small number

More information

15.062 Data Mining: Algorithms and Applications Matrix Math Review

15.062 Data Mining: Algorithms and Applications Matrix Math Review .6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

More information

Non Parametric Inference

Non Parametric Inference Maura Department of Economics and Finance Università Tor Vergata Outline 1 2 3 Inverse distribution function Theorem: Let U be a uniform random variable on (0, 1). Let X be a continuous random variable

More information

On UMVU Estimator of the Generalized Variance for Natural Exponential Families

On UMVU Estimator of the Generalized Variance for Natural Exponential Families Monografías del Seminario Matemático García de Galdeano. 27: 353 360, (2003). On UMVU Estimator of the Generalized Variance for Natural Exponential Families Célestin C. Kokonendji Université de Pau et

More information

Recall this chart that showed how most of our course would be organized:

Recall this chart that showed how most of our course would be organized: Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

More information

Statistics Graduate Courses

Statistics Graduate Courses Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

More information

A THEORETICAL COMPARISON OF DATA MASKING TECHNIQUES FOR NUMERICAL MICRODATA

A THEORETICAL COMPARISON OF DATA MASKING TECHNIQUES FOR NUMERICAL MICRODATA A THEORETICAL COMPARISON OF DATA MASKING TECHNIQUES FOR NUMERICAL MICRODATA Krish Muralidhar University of Kentucky Rathindra Sarathy Oklahoma State University Agency Internal User Unmasked Result Subjects

More information

An empirical goodness-of-fit test for multivariate distributions

An empirical goodness-of-fit test for multivariate distributions An empirical goodness-of-fit test for multivariate distributions Michael P. McAssey Department of Mathematics, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands Abstract An empirical test is presented

More information

The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.

The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Cointegration The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Economic theory, however, often implies equilibrium

More information

DEPARTMENT OF ACTUARIAL STUDIES RESEARCH PAPER SERIES

DEPARTMENT OF ACTUARIAL STUDIES RESEARCH PAPER SERIES DEPARTMENT OF ACTUARIAL STUDIES RESEARCH PAPER SERIES Forecasting General Insurance Liabilities by Piet de Jong piet.dejong@mq.edu.au Research Paper No. 2004/03 February 2004 Division of Economic and Financial

More information

MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

More information

Mathematics Course 111: Algebra I Part IV: Vector Spaces

Mathematics Course 111: Algebra I Part IV: Vector Spaces Mathematics Course 111: Algebra I Part IV: Vector Spaces D. R. Wilkins Academic Year 1996-7 9 Vector Spaces A vector space over some field K is an algebraic structure consisting of a set V on which are

More information

ON THE CALCULATION OF A ROBUST S-ESTIMATOR OF A COVARIANCE MATRIX

ON THE CALCULATION OF A ROBUST S-ESTIMATOR OF A COVARIANCE MATRIX STATISTICS IN MEDICINE Statist. Med. 17, 2685 2695 (1998) ON THE CALCULATION OF A ROBUST S-ESTIMATOR OF A COVARIANCE MATRIX N. A. CAMPBELL *, H. P. LOPUHAA AND P. J. ROUSSEEUW CSIRO Mathematical and Information

More information

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission.

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. Robust Tests for the Equality of Variances Author(s): Morton B. Brown and Alan B. Forsythe Source: Journal of the American Statistical Association, Vol. 69, No. 346 (Jun., 1974), pp. 364-367 Published

More information

9-3.4 Likelihood ratio test. Neyman-Pearson lemma

9-3.4 Likelihood ratio test. Neyman-Pearson lemma 9-3.4 Likelihood ratio test Neyman-Pearson lemma 9-1 Hypothesis Testing 9-1.1 Statistical Hypotheses Statistical hypothesis testing and confidence interval estimation of parameters are the fundamental

More information

Estimating an ARMA Process

Estimating an ARMA Process Statistics 910, #12 1 Overview Estimating an ARMA Process 1. Main ideas 2. Fitting autoregressions 3. Fitting with moving average components 4. Standard errors 5. Examples 6. Appendix: Simple estimators

More information

Exploratory Data Analysis as an Efficient Tool for Statistical Analysis: A Case Study From Analysis of Experiments

Exploratory Data Analysis as an Efficient Tool for Statistical Analysis: A Case Study From Analysis of Experiments Developments in Applied Statistics Anuška Ferligoj and Andrej Mrvar (Editors) Metodološki zvezki, 9, Ljubljana: FDV, 003 Exploratory Data Analysis as an Efficient Tool for Statistical Analysis: A Case

More information

Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

More information

Mälardalen University

Mälardalen University Mälardalen University http://www.mdh.se 1/38 Value at Risk and its estimation Anatoliy A. Malyarenko Department of Mathematics & Physics Mälardalen University SE-72 123 Västerås,Sweden email: amo@mdh.se

More information

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

More information

SYSTEMS OF REGRESSION EQUATIONS

SYSTEMS OF REGRESSION EQUATIONS SYSTEMS OF REGRESSION EQUATIONS 1. MULTIPLE EQUATIONS y nt = x nt n + u nt, n = 1,...,N, t = 1,...,T, x nt is 1 k, and n is k 1. This is a version of the standard regression model where the observations

More information

1 Another method of estimation: least squares

1 Another method of estimation: least squares 1 Another method of estimation: least squares erm: -estim.tex, Dec8, 009: 6 p.m. (draft - typos/writos likely exist) Corrections, comments, suggestions welcome. 1.1 Least squares in general Assume Y i

More information

Statistical Foundations: Measures of Location and Central Tendency and Summation and Expectation

Statistical Foundations: Measures of Location and Central Tendency and Summation and Expectation Statistical Foundations: and Central Tendency and and Lecture 4 September 5, 2006 Psychology 790 Lecture #4-9/05/2006 Slide 1 of 26 Today s Lecture Today s Lecture Where this Fits central tendency/location

More information

THE SELECTION OF RETURNS FOR AUDIT BY THE IRS. John P. Hiniker, Internal Revenue Service

THE SELECTION OF RETURNS FOR AUDIT BY THE IRS. John P. Hiniker, Internal Revenue Service THE SELECTION OF RETURNS FOR AUDIT BY THE IRS John P. Hiniker, Internal Revenue Service BACKGROUND The Internal Revenue Service, hereafter referred to as the IRS, is responsible for administering the Internal

More information

Communication on the Grassmann Manifold: A Geometric Approach to the Noncoherent Multiple-Antenna Channel

Communication on the Grassmann Manifold: A Geometric Approach to the Noncoherent Multiple-Antenna Channel IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 48, NO. 2, FEBRUARY 2002 359 Communication on the Grassmann Manifold: A Geometric Approach to the Noncoherent Multiple-Antenna Channel Lizhong Zheng, Student

More information

Section 6.1 - Inner Products and Norms

Section 6.1 - Inner Products and Norms Section 6.1 - Inner Products and Norms Definition. Let V be a vector space over F {R, C}. An inner product on V is a function that assigns, to every ordered pair of vectors x and y in V, a scalar in F,

More information

Module 5: Multiple Regression Analysis

Module 5: Multiple Regression Analysis Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

More information

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data

More information

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

Quantitative Methods for Finance

Quantitative Methods for Finance Quantitative Methods for Finance Module 1: The Time Value of Money 1 Learning how to interpret interest rates as required rates of return, discount rates, or opportunity costs. 2 Learning how to explain

More information

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model

More information

PÓLYA URN MODELS UNDER GENERAL REPLACEMENT SCHEMES

PÓLYA URN MODELS UNDER GENERAL REPLACEMENT SCHEMES J. Japan Statist. Soc. Vol. 31 No. 2 2001 193 205 PÓLYA URN MODELS UNDER GENERAL REPLACEMENT SCHEMES Kiyoshi Inoue* and Sigeo Aki* In this paper, we consider a Pólya urn model containing balls of m different

More information

Nonparametric Two-Sample Tests. Nonparametric Tests. Sign Test

Nonparametric Two-Sample Tests. Nonparametric Tests. Sign Test Nonparametric Two-Sample Tests Sign test Mann-Whitney U-test (a.k.a. Wilcoxon two-sample test) Kolmogorov-Smirnov Test Wilcoxon Signed-Rank Test Tukey-Duckworth Test 1 Nonparametric Tests Recall, nonparametric

More information

A repeated measures concordance correlation coefficient

A repeated measures concordance correlation coefficient A repeated measures concordance correlation coefficient Presented by Yan Ma July 20,2007 1 The CCC measures agreement between two methods or time points by measuring the variation of their linear relationship

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

RECENT advances in digital technology have led to a

RECENT advances in digital technology have led to a Robust, scalable and fast bootstrap method for analyzing large scale data Shahab Basiri, Esa Ollila, Member, IEEE, and Visa Koivunen, Fellow, IEEE arxiv:542382v2 [statme] 2 Apr 25 Abstract In this paper

More information

Multiple group discriminant analysis: Robustness and error rate

Multiple group discriminant analysis: Robustness and error rate Institut f. Statistik u. Wahrscheinlichkeitstheorie Multiple group discriminant analysis: Robustness and error rate P. Filzmoser, K. Joossens, and C. Croux Forschungsbericht CS-006- Jänner 006 040 Wien,

More information

Nonparametric adaptive age replacement with a one-cycle criterion

Nonparametric adaptive age replacement with a one-cycle criterion Nonparametric adaptive age replacement with a one-cycle criterion P. Coolen-Schrijner, F.P.A. Coolen Department of Mathematical Sciences University of Durham, Durham, DH1 3LE, UK e-mail: Pauline.Schrijner@durham.ac.uk

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

Profile analysis is the multivariate equivalent of repeated measures or mixed ANOVA. Profile analysis is most commonly used in two cases:

Profile analysis is the multivariate equivalent of repeated measures or mixed ANOVA. Profile analysis is most commonly used in two cases: Profile Analysis Introduction Profile analysis is the multivariate equivalent of repeated measures or mixed ANOVA. Profile analysis is most commonly used in two cases: ) Comparing the same dependent variables

More information

This can dilute the significance of a departure from the null hypothesis. We can focus the test on departures of a particular form.

This can dilute the significance of a departure from the null hypothesis. We can focus the test on departures of a particular form. One-Degree-of-Freedom Tests Test for group occasion interactions has (number of groups 1) number of occasions 1) degrees of freedom. This can dilute the significance of a departure from the null hypothesis.

More information

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence

More information

11 Linear and Quadratic Discriminant Analysis, Logistic Regression, and Partial Least Squares Regression

11 Linear and Quadratic Discriminant Analysis, Logistic Regression, and Partial Least Squares Regression Frank C Porter and Ilya Narsky: Statistical Analysis Techniques in Particle Physics Chap. c11 2013/9/9 page 221 le-tex 221 11 Linear and Quadratic Discriminant Analysis, Logistic Regression, and Partial

More information

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information

Statistical Models in R

Statistical Models in R Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova

More information

Goodness of fit assessment of item response theory models

Goodness of fit assessment of item response theory models Goodness of fit assessment of item response theory models Alberto Maydeu Olivares University of Barcelona Madrid November 1, 014 Outline Introduction Overall goodness of fit testing Two examples Assessing

More information

Introduction to Statistics and Quantitative Research Methods

Introduction to Statistics and Quantitative Research Methods Introduction to Statistics and Quantitative Research Methods Purpose of Presentation To aid in the understanding of basic statistics, including terminology, common terms, and common statistical methods.

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

Introduction to Multivariate Analysis

Introduction to Multivariate Analysis Introduction to Multivariate Analysis Lecture 1 August 24, 2005 Multivariate Analysis Lecture #1-8/24/2005 Slide 1 of 30 Today s Lecture Today s Lecture Syllabus and course overview Chapter 1 (a brief

More information

Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics

Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics INTERNATIONAL BLACK SEA UNIVERSITY COMPUTER TECHNOLOGIES AND ENGINEERING FACULTY ELABORATION OF AN ALGORITHM OF DETECTING TESTS DIMENSIONALITY Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree

More information

Derivative Approximation by Finite Differences

Derivative Approximation by Finite Differences Derivative Approximation by Finite Differences David Eberly Geometric Tools, LLC http://wwwgeometrictoolscom/ Copyright c 998-26 All Rights Reserved Created: May 3, 2 Last Modified: April 25, 25 Contents

More information

Vector Time Series Model Representations and Analysis with XploRe

Vector Time Series Model Representations and Analysis with XploRe 0-1 Vector Time Series Model Representations and Analysis with plore Julius Mungo CASE - Center for Applied Statistics and Economics Humboldt-Universität zu Berlin mungo@wiwi.hu-berlin.de plore MulTi Motivation

More information

Identification of noisy variables for nonmetric and symbolic data in cluster analysis

Identification of noisy variables for nonmetric and symbolic data in cluster analysis Identification of noisy variables for nonmetric and symbolic data in cluster analysis Marek Walesiak and Andrzej Dudek Wroclaw University of Economics, Department of Econometrics and Computer Science,

More information

APPLIED MISSING DATA ANALYSIS

APPLIED MISSING DATA ANALYSIS APPLIED MISSING DATA ANALYSIS Craig K. Enders Series Editor's Note by Todd D. little THE GUILFORD PRESS New York London Contents 1 An Introduction to Missing Data 1 1.1 Introduction 1 1.2 Chapter Overview

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

The degrees of freedom of the Lasso in underdetermined linear regression models

The degrees of freedom of the Lasso in underdetermined linear regression models The degrees of freedom of the Lasso in underdetermined linear regression models C. Dossal (1), M. Kachour (2), J. Fadili (2), G. Peyré (3), C. Chesneau (4) (1) IMB, Université Bordeaux 1 (2) GREYC, ENSICAEN

More information

Multivariate Analysis of Variance (MANOVA): I. Theory

Multivariate Analysis of Variance (MANOVA): I. Theory Gregory Carey, 1998 MANOVA: I - 1 Multivariate Analysis of Variance (MANOVA): I. Theory Introduction The purpose of a t test is to assess the likelihood that the means for two groups are sampled from the

More information

Continued Fractions and the Euclidean Algorithm

Continued Fractions and the Euclidean Algorithm Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction

More information

COMMUTATIVITY DEGREES OF WREATH PRODUCTS OF FINITE ABELIAN GROUPS

COMMUTATIVITY DEGREES OF WREATH PRODUCTS OF FINITE ABELIAN GROUPS Bull Austral Math Soc 77 (2008), 31 36 doi: 101017/S0004972708000038 COMMUTATIVITY DEGREES OF WREATH PRODUCTS OF FINITE ABELIAN GROUPS IGOR V EROVENKO and B SURY (Received 12 April 2007) Abstract We compute

More information

Description. Textbook. Grading. Objective

Description. Textbook. Grading. Objective EC151.02 Statistics for Business and Economics (MWF 8:00-8:50) Instructor: Chiu Yu Ko Office: 462D, 21 Campenalla Way Phone: 2-6093 Email: kocb@bc.edu Office Hours: by appointment Description This course

More information

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

More information

THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok

THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE Alexer Barvinok Papers are available at http://www.math.lsa.umich.edu/ barvinok/papers.html This is a joint work with J.A. Hartigan

More information

Applications to Data Smoothing and Image Processing I

Applications to Data Smoothing and Image Processing I Applications to Data Smoothing and Image Processing I MA 348 Kurt Bryan Signals and Images Let t denote time and consider a signal a(t) on some time interval, say t. We ll assume that the signal a(t) is

More information

Week 1. Exploratory Data Analysis

Week 1. Exploratory Data Analysis Week 1 Exploratory Data Analysis Practicalities This course ST903 has students from both the MSc in Financial Mathematics and the MSc in Statistics. Two lectures and one seminar/tutorial per week. Exam

More information

State Space Time Series Analysis

State Space Time Series Analysis State Space Time Series Analysis p. 1 State Space Time Series Analysis Siem Jan Koopman http://staff.feweb.vu.nl/koopman Department of Econometrics VU University Amsterdam Tinbergen Institute 2011 State

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1. MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column

More information