MAT3378 (Winter 2016)

MAT3378 (Winter 2016) Assignment 3 - SOLUTIONS

The following questions will be marked: 2a), b), c), 3, 5.
Total number of points for Assignment 3: 13

Q1. (Normality)
(1) Simulate data from a normal distribution. Plot a boxplot, histogram and qq-plot. Apply the Jarque-Bera test by hand as in R-3.html. Confirm your calculations by using the function jarqueberaTest. Comment on your findings.
(2) Simulate data from a non-normal distribution (pick your favourite distribution). Plot a boxplot, histogram and qq-plot. Apply the Jarque-Bera test by hand as in R-3.html. Confirm your calculations by using the function jarqueberaTest. Comment on your findings.

Solution to Q1:
(a)
    Z1=rnorm(1000);
    library(moments); # you need this package to calculate kurtosis and skewness
    sk=skewness(Z1); kurt=kurtosis(Z1); n=length(Z1);
    test.stat=sk^2*n/6+(kurt-3)^2*n/24;
    p.value=1-pchisq(test.stat,2); # chi-square distribution with 2 degrees of freedom
    test.stat; p.value;
    # We should accept normality here (recall that the data come from the normal population)
The p-value is
The test does not reject normality (of course, you will have a different p-value!).
Confirmation:
    library(fBasics);
    jarqueberaTest(Z1);
Output:
    X-squared:
    Asymptotic p Value:
(b)
    Z1=rexp(1000);
    library(fBasics);
    jarqueberaTest(Z1);
Output:
    X-squared:
    Asymptotic p Value: < 2.2e-16
Normality is rejected, as it should be.

Marking scheme for Q1: This question will not be marked.

Q2. (Equality of the variances. 3 points)
(a) Simulate realizations of the H distribution with (4, 20) degrees of freedom. Store them in a vector called H. Obtain the 95% quantile. Compare it with the appropriate entry in Table B.10.
(b) Simulate 4 normal vectors of size 21 with mean zero and the same variances. Apply the Hartley test. Calculate the p-value using the vector H obtained in (a).
(c) Simulate 4 normal vectors of size 21 with mean zero and different variances. Apply the Hartley test. Calculate the p-value using the vector H obtained in (a).
(d) This part is not compulsory. In this part we study the performance of the Hartley test when the sample sizes are not equal. I had this question in class.
(d-1) Simulate 4 normal vectors of size 21 with mean zero and the same variances. Calculate the value of the test statistic. Repeat this 1000 times and store the values of the test statistic in the vector HartleyTest. Use
    length(HartleyTest[HartleyTest>3.29])/1000
to get the rejection rate. The rejection rate should be around
(d-2) Simulate 4 normal vectors of sizes 21, 25, 29, 33 with mean zero and the same variances. Proceed as in (d-1).
(d-3) Simulate 4 normal vectors of sizes 21, 17, 13, 9 with mean zero and the same variances. Proceed as in (d-1).
(d-4) Simulate 4 normal vectors of sizes 21, 21, 17, 25 with mean zero and the same variances. Proceed as in (d-1).

Solution to Q2:
(a)
    H=NULL; no.of.rep=100000;
    n1=21;n2=21;n3=21;n4=21; # you need 4 samples of size 21
    sigma1=1;sigma2=1;sigma3=1;sigma4=1;
    for(i in 1:no.of.rep)
    {
      pop1=rnorm(n1,0,sigma1); pop2=rnorm(n2,0,sigma2);
      pop3=rnorm(n3,0,sigma3); pop4=rnorm(n4,0,sigma4);
      var1=var(pop1);var2=var(pop2);var3=var(pop3);var4=var(pop4);
      test.stat=max(var1,var2,var3,var4)/min(var1,var2,var3,var4);
      H=c(H,test.stat)
    }
As a result we obtain the vector H of size no.of.rep with realizations of the appropriate H distribution.
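The loop above grows H by one element per iteration, which becomes slow for a large no.of.rep. A minimal sketch of the same simulation written with replicate is given here. The names sim_hartley and H_sim, the seed, and the smaller number of replications are illustrative choices and not part of the original solution.

```r
# Sketch: simulate the null distribution of Hartley's statistic
# H = max(s_i^2) / min(s_i^2) for r normal samples of equal size n.
# sim_hartley and H_sim are hypothetical names used only for illustration.
sim_hartley <- function(n_rep, r = 4, n = 21, sigma = 1) {
  replicate(n_rep, {
    # n x r matrix: each column is one sample of size n
    samples <- matrix(rnorm(n * r, mean = 0, sd = sigma), nrow = n, ncol = r)
    v <- apply(samples, 2, var)   # the r sample variances
    max(v) / min(v)               # one realization of H
  })
}

set.seed(3378)                    # arbitrary seed, only for reproducibility
H_sim <- sim_hartley(10000)       # fewer replications than in the loop, for speed
quantile(H_sim, 0.95)             # simulated 95% quantile of H with (4, 20) df
```

Because replicate allocates the result once, this version avoids the repeated copying caused by H=c(H,test.stat). The simulated quantile will vary with the seed and the number of replications.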
We have
    quantile(H,0.95)
    95%
From Table B.10 we have the true value of the quantile:
Note: you will not get exactly the same quantile, but you should be close to it.
(b)
    n1=21;n2=21;n3=21;n4=21;
    sigma1=1;sigma2=1;sigma3=1;sigma4=1;
    pop1=rnorm(n1,0,sigma1); pop2=rnorm(n2,0,sigma2);
    pop3=rnorm(n3,0,sigma3); pop4=rnorm(n4,0,sigma4);
    var1=var(pop1);var2=var(pop2);var3=var(pop3);var4=var(pop4);

    test.stat=max(var1,var2,var3,var4)/min(var1,var2,var3,var4);
The test statistic and the p-value are:
    test.stat;
    [1]
    length(H[H>test.stat])/no.of.rep
    [1]
The test does not reject equality of the variances. Note: you will get a different test statistic and p-value.
(c)
    n1=21;n2=21;n3=21;n4=21;
    sigma1=1;sigma2=2;sigma3=3;sigma4=3;
    pop1=rnorm(n1,0,sigma1); pop2=rnorm(n2,0,sigma2);
    pop3=rnorm(n3,0,sigma3); pop4=rnorm(n4,0,sigma4);
    var1=var(pop1);var2=var(pop2);var3=var(pop3);var4=var(pop4);
    test.stat=max(var1,var2,var3,var4)/min(var1,var2,var3,var4);
The test statistic and the p-value are:
    test.stat;
    [1]
    length(H[H>test.stat])/no.of.rep
    [1] 1e-05
The test rejects equality of the variances. Note: you will get a different test statistic and p-value.

Marking scheme for Q2: 1 point for each of the parts a), b), c). Total: 3 points.

Q3. (3 points) Simulate data from a t-distribution with 4 degrees of freedom. Test for normality using the Jarque-Bera test (you can use the R command directly). If the test rejects normality, apply a transformation to get normality. Confirm normality by applying the Jarque-Bera test again.

Solution to Q3:
    Z1=rt(1000,4);
    library(fBasics);
    jarqueberaTest(Z1);
Output:
    X-squared:
    Asymptotic p Value: < 2.2e-16
Normality is rejected. We apply the sqrt(|Y|) transform.
    Z1.1=sqrt(abs(Z1))
    par(mfrow=c(1,2)); qqnorm(Z1); qqnorm(Z1.1);
    jarqueberaTest(Z1.1);
Normality is still rejected. Another transformation:
    Z1.2=(abs(Z1))^(1/3)

    par(mfrow=c(1,2)); qqnorm(Z1); qqnorm(Z1.2);
    jarqueberaTest(Z1.2);
The p-value is
For α = 0.01 normality is not rejected. Let's try another transformation.
    Z1.3=(abs(Z1))^(1/3.5)
    par(mfrow=c(1,2)); qqnorm(Z1); qqnorm(Z1.3);
    jarqueberaTest(Z1.3);
The p-value is
Normality is rejected.

Marking scheme for Q3: Total 3 points.
    Maximal number of points if the test is performed for the original data and normality is accepted.
    Maximal number of points if the test is performed for the original data, normality is rejected, and a transformation is applied to get normality.
    Subtract 2 points if normality is rejected for the original data and no transformation is applied.
    Subtract 1 point if normality is rejected for the original data and a transformation is applied but not tested.

Q4. Simulate data from two normal populations with mean 0 and variances 1 and 9. Apply the Brown-Forsythe test. If the test rejects equality of the variances, apply remedial measures. Test again for equality of the variances.

Solution to Q4:
    library(lawstat); # provides levene.test
    # Note: the second argument of rnorm is the mean, so the samples below have
    # means 1 and 3 and standard deviation 1. For mean 0 and variances 1 and 9,
    # use rnorm(100,0,1) and rnorm(100,0,3).
    pop1=rnorm(100,1); n1=length(pop1);
    pop2=rnorm(100,3); n2=length(pop2);
    data<-data.frame(values=c(pop1,pop2), Treatment=c(c(rep(1,n1)),c(rep(2,n2))) )
    y<-data$values; x<-factor(data$Treatment);
    levene.test(y,x,location="median"); # Brown-Forsythe test
Output:
    data: y
    Test Statistic = , p-value =
No transformation is needed.

Marking scheme for Q4: This question will not be marked.

Q5. (7 points) Consider the SENIC data from Appendix C.1. The variables are explained on page
We would like to know if the mean length of stay (variable 2) is the same in the four geographic regions (variable 9).
(a) Write an appropriate ANOVA model for this study.
(b) Produce side-by-side boxplots. What are your observations? Compare the central tendencies of the length of stay between geographic regions. Do the within-region variabilities appear to be the same?

(c) Examine by means of the Brown-Forsythe test whether or not the error variances are equal. What are your findings at α = 0.05?
(d) Based on our simple guide to transformations, which transformation of the response would be the best?
(e) Apply the transformation from (d) to the length of stay. Examine by means of the Brown-Forsythe test whether or not the error variances are equal. What are your findings at α = 0.05?
(f) Verify that the random error for the transformed response is normally distributed. What are your findings?
(g) Assume that the ANOVA model is appropriate for the transformed response. Test whether or not the mean length of stay in the transformed units is the same in the four geographic regions. Give the p-value and your conclusion.

Solution to Q5:
I stored my data under the name senic.
    library(lawstat); # provides levene.test
    y<-senic$v2; x<-factor(senic$v9)
(a) Y_{ij} = µ_i + ε_{ij}, i = 1, ..., 4, j = 1, ..., n_i, where n_1 + n_2 + n_3 + n_4 = 113.
(b)
    par(mfrow=c(1,1))
    boxplot(y~x); # See the bottom of the file for the graph
There seem to be some differences between regions, especially between the first and the last region.
(c)
    y<-senic$v2; x<-factor(senic$v9)
    levene.test(y,x,location="median"); # Brown-Forsythe test
Output:
    data: y
    Test Statistic = , p-value =
The equality of variances is rejected.
(d)
    means=tapply(y,x,mean); stdev=tapply(y,x,sd);
    print(stdev^2/means); print(stdev/means); print(stdev/means^2);
It seems that log(Y) or 1/Y should be suitable (second and third case).
(e) For log(Y):
    logy=log(y);
    levene.test(logy,x,location="median"); # Brown-Forsythe test
Output:
    data: logy
    Test Statistic = , p-value =
For 1/Y:
    yinv=1/y;
    levene.test(yinv,x,location="median"); # Brown-Forsythe test
Output:
    data: yinv
    Test Statistic = , p-value = 0.41
Both transformations lead to equality of the variances.
(f) For log(Y):
    jarqueberaTest(logy)

Output:
    X-squared:
    Asymptotic p Value: 1.078e-09
For 1/Y:
    jarqueberaTest(yinv)
Output:
    X-squared:
    Asymptotic p Value:
Conclusion: 1/Y is the appropriate transform.
(g)
    summary(aov(yinv~x))
Output:
                Df Sum Sq Mean Sq F value Pr(>F)
    x                                       e-08 ***
    Residuals
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Conclusion: the means for the transformed variable 1/Y are different.

Marking scheme for Q5: a) 1 point; b) 1 point; c) 1 point; d) 1 point; e) 1 point; f) 1 point; g) 1 point. Total: 7 points.
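The Brown-Forsythe test used in Q4 and Q5 is a one-way ANOVA F test applied to the absolute deviations of the observations from their group medians. The following is a minimal base-R sketch of that computation, assuming this standard definition. The function name brown_forsythe and the simulated data are illustrative and do not come from the original solution or from the SENIC data.

```r
# Sketch: Brown-Forsythe test as a one-way ANOVA on |y_ij - median_i|.
# brown_forsythe is a hypothetical helper name used only for illustration.
brown_forsythe <- function(y, g) {
  g <- factor(g)
  d <- abs(y - ave(y, g, FUN = median))  # absolute deviations from group medians
  tab <- anova(lm(d ~ g))                # one-way ANOVA table for the deviations
  c(statistic = tab[["F value"]][1], p.value = tab[["Pr(>F)"]][1])
}

# Illustration on simulated data: four groups of size 25
set.seed(2016)                           # arbitrary seed, only for reproducibility
g <- rep(1:4, each = 25)
y_equal   <- rnorm(100, mean = 0, sd = 1)                 # equal variances
y_unequal <- rnorm(100, mean = 0, sd = c(1, 2, 3, 3)[g])  # sd depends on the group
brown_forsythe(y_equal, g)
brown_forsythe(y_unequal, g)
```

Under these assumptions the statistic should agree with the one reported by levene.test(y, x, location="median"), and the same helper can be applied to a transformed response such as log(y) or 1/y.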
