Statistical tests in MLwiN

During this course we use two tests: a t-test and a chi-square test.

T-test

A t-test is based on a t-distribution with a certain number of degrees of freedom (notation: df). The test statistic is the estimate divided by its standard error; this ratio is known as the t-value. It indicates how many standard errors the estimate lies from the value stated in the null hypothesis (notation: H0). In most cases the value in H0 is 0. Suppose the estimate is -0.037 and the standard error is 0.009. The t-value is then -0.037 / 0.009 = -4.11. The probability of finding this t-value (or a more extreme one) while H0 (estimate = 0) is true is extremely small. With df = 200, for instance, this probability, the p-value, is 0.00003. At a level of significance (notation: α) of 0.05, H0 is clearly rejected. Note that the p-value is calculated from one tail of the t-distribution, which is correct as long as we have directional hypotheses. For non-directional hypotheses the p-value must be multiplied by 2. As you may recall, SPSS very often reports p * 2 (two-tailed test). Figure 1 summarizes all relevant information.

Figure 1 Estimate (b = -0.037), sampling distribution under H0: b0 = 0 (t-distributed, df = 200), t-value (t = -4.11 = -0.037 / 0.009), one-tailed p-value (0.00003, black area) and α

In MLwiN, a t-test can be done with the tprob command. To illustrate, we use a multilevel analysis with the number of weekly visits to best friends as the dependent variable, and income, age, and gender as independent variables. The effect of income is negative (-0.037): as people earn more money, they visit their best friends less often. The standard error of this estimate is 0.009, so the t-value is -4.11 (-0.037 / 0.009), cf. Figure 2.

Figure 2 Outcomes (b-parameters) from a 2-level MLwiN analysis
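The arithmetic of the t-test above can be checked outside MLwiN. The sketch below is only an illustration, not how MLwiN computes its result: it integrates the t-density numerically with Simpson's rule, and the function names `t_pdf` and `one_tailed_p` are our own. It uses the income estimate (-0.037), its standard error (0.009) and df = 200 from the text.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def one_tailed_p(t_value, df, steps=20000):
    """Area in one tail beyond |t_value|, using Simpson's rule on [0, |t_value|]."""
    upper = abs(t_value)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # P(0 < T < |t_value|)
    return 0.5 - central_half     # what is left in the tail

estimate, std_error = -0.037, 0.009
t_value = estimate / std_error
p_one = one_tailed_p(t_value, df=200)  # directional hypothesis
p_two = 2 * p_one                      # non-directional hypothesis

print(f"t = {t_value:.2f}, one-tailed p = {p_one:.5f}, two-tailed p = {p_two:.5f}")
```

The one-tailed value should agree with the 0.00003 quoted above when rounded to five decimals.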

For age we find a b-estimate of -0.009, meaning that as people grow older, they visit their best friends less often. Note that the sample consists of 4380 respondents who are between 65 and 85 years old. The t-value is -2.25 (-0.009 / 0.004). The variable gender is coded 0 = female and 1 = male, hence we labeled this variable male. As the estimate is positive (0.173), males visit their best friends more often than females (t-value = 0.173 / 0.048 = 3.6).

Next, we have to open the command interface (click on Data manipulation in the MLwiN menu and then on Command Interface). In the box we type tprob, followed by the t-value and df. For income it reads: tprob -4.11 4377 (cf. Figure 3). Note that df = n_level1 - p, where n_level1 = number of observations at the first level (= 4380, cf. Figure 2) and p = number of x-variables (age, income, and male, so p = 3). Hence df = 4380 - 3 = 4377 (cf. Figure 4).

Figure 3 tprob command in command interface

After pressing [Enter] the p-value is calculated. You will find the outcome in the MLwiN output screen (see Figure 4). If it does not show automatically, it can be activated by clicking the output button (see Figure 3).

Figure 4 Output screen with p-value from tprob
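As a rough cross-check of the tprob step, the sketch below computes df = n_level1 - p and the three t-values from the text. It does not reproduce tprob itself: because df = 4377 is very large, it assumes the standard normal curve is a good enough stand-in for the t-distribution, so the p-values are approximations. The dictionary layout and the name `normal_tail` are our own.

```python
import math

n_level1 = 4380  # level 1 observations (cf. Figure 2)
predictors = {   # name: (estimate, standard error) as reported in the text
    "income": (-0.037, 0.009),
    "age": (-0.009, 0.004),
    "male": (0.173, 0.048),  # 0.173 is the value used in the t-value calculation
}
df = n_level1 - len(predictors)  # df = n_level1 - p

def normal_tail(t_value):
    """One-tailed area beyond |t_value| under the standard normal curve
    (an approximation to the t-distribution, reasonable only for large df)."""
    return 0.5 * math.erfc(abs(t_value) / math.sqrt(2))

results = {}
for name, (b, se) in predictors.items():
    t_value = round(b / se, 2)  # two decimals, as typed after tprob
    results[name] = (t_value, normal_tail(t_value))
    print(f"{name}: tprob {t_value} {df} -> approx. one-tailed p = {results[name][1]:.7f}")
```

For income the approximation lands close to, but slightly below, the tprob outcome, which is what one would expect given the slightly heavier tails of the t-distribution.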

Note that the p-value in Figure 4 is 0.000020123 (2.0123e-005). To determine whether this is a significant outcome, one must compare it to α. Suppose α is set to 0.01. Next, it is important to know whether the alternative hypothesis is directional or non-directional. Suppose we want to test whether a higher income leads to fewer visits. Because the p-value comes from one tail of the t-distribution, we can compare it directly to α. Because p < α (0.00002 < 0.01) and because the effect is indeed negative, we reject H0. If the hypothesis is non-directional we use p * 2, which is smaller than 0.01 as well. This testing can be summarized in symbols:

p(tprob) < α → reject H0 in favor of Ha (directional)
p(tprob) * 2 < α → reject H0 in favor of Ha (non-directional)

Note that a t-test can be used for level 2 variables as well. The only difference is the degrees of freedom. This time df = n_level2 - p - 1, where n_level2 = number of observations at level 2, p = number of level 2 variables, and 1 stands for the constant, which is a level 2 variable. Cross-level interactions can also be tested with a t-test, again with df = n_level2 - p - 1, as will be exemplified in the next section.

Chi-square test

A chi-square (notation: χ²) test is based on a chi-square distribution with a certain number of degrees of freedom. In multilevel analysis the chi-square value is derived from the comparison of two nested models. Each model has a certain -2 loglikelihood (a measure of general fit). The difference between the two -2 loglikelihood figures is χ² distributed when the number of observations is sufficiently large (as a rule of thumb one may use n > 15). The null hypothesis (H0) states that both models fit the data equally well, i.e., the difference in -2 loglikelihood is 0. Suppose that we run a multilevel model with a -2 loglikelihood of 1000. Next we add an extra parameter to the model and as a consequence the -2 loglikelihood drops to 990.
The χ² value then amounts to 10 (1000 - 990). The degrees of freedom equal 1 (i.e., the one extra parameter). The p-value is then 0.00156. What to do next again depends on the alternative hypothesis. If the alternative hypothesis (notation: Ha) states that the extra parameter has an effect, irrespective of the sign of that effect, this p-value is compared directly to α. If a directional hypothesis is being tested, however, the p-value has to be divided by 2. The reason is that the p-value is the result of both negative and positive estimates of the extra parameter: in both cases the model fit improves, hence χ² > 0. As the negative estimates balance the positive estimates, the p-value (0.00156) must be divided by 2 if one is interested in positive (or negative) outcomes of the extra parameter only. Figure 5 summarizes all relevant information.

Figure 5 χ² distribution (df = 1), χ² value (test statistic = 10) and p-value (0.00156, grey area)

This way of testing can also be summarized in symbols:

p(chi-square) < α → reject H0 in favor of Ha (non-directional)
p(chi-square) / 2 < α → reject H0 in favor of Ha (directional)

Chi-square tests in MLwiN can be used in at least two cases: the level 2 variance test and the random slope test. In both cases you may use the cprob command, which is also typed in the command interface box. It has the same structure as tprob: cprob followed by the chi-square value and df. As an example we take chi-square = 10 and df = 1 (cf. Figures 6 and 7).

Figure 6 cprob command in command interface

Figure 7 Output screen with p-value from cprob
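The chi-square example (a -2 loglikelihood drop from 1000 to 990, df = 1) can be recomputed by hand as a check on cprob. The sketch below is a stand-in, not the MLwiN implementation: it relies on the closed-form upper tail of the chi-square distribution that exists for df = 1 and df = 2 and deliberately covers only those two cases. The name `chi2_upper_tail` is our own.

```python
import math

def chi2_upper_tail(chi2_value, df):
    """P(X >= chi2_value) for a chi-square variable (closed forms, df 1 or 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(chi2_value / 2))
    if df == 2:
        return math.exp(-chi2_value / 2)
    raise ValueError("this sketch only covers df = 1 or df = 2")

deviance_without = 1000.0  # -2 loglikelihood before adding the parameter
deviance_with = 990.0      # -2 loglikelihood after adding the parameter
chi2_value = deviance_without - deviance_with

p_nondirectional = chi2_upper_tail(chi2_value, df=1)  # compare directly to alpha
p_directional = p_nondirectional / 2                  # directional hypothesis

print(f"chi-square = {chi2_value:.0f}, p = {p_nondirectional:.5f}, p / 2 = {p_directional:.5f}")
```

For other degrees of freedom a statistics library or cprob itself is the better tool; the closed forms were chosen here only to keep the check dependency-free.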

Testing level 2 variance

Figure 8 Level 2 variance set to zero

Figure 9 Level 1 and level 2 variance estimations

The -2 loglikelihood difference is 16830.9 - 16341.13 = 489.77 (p-value is 0.00000, df = 1). Strictly speaking, we have to divide this p-value by two because negative variances are not possible. As the p-value is already very low, we leave that step out here and conclude that we have a significant amount of level 2 variance. Hence, the intraclass correlation is > 0.
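To see how small the p-value behind the displayed 0.00000 really is, the level 2 variance test can be redone with a few lines of arithmetic. This is a sketch under two assumptions: that 16830.9 belongs to the model with the level 2 variance set to zero (Figure 8) and 16341.13 to the model in which it is estimated (Figure 9), and that the df = 1 chi-square tail may be computed with the complementary error function.

```python
import math

deviance_variance_zero = 16830.90  # assumed: level 2 variance fixed at zero
deviance_variance_free = 16341.13  # assumed: level 2 variance estimated

chi2_value = round(deviance_variance_zero - deviance_variance_free, 2)
p_cprob = math.erfc(math.sqrt(chi2_value / 2))  # chi-square upper tail, df = 1
p_halved = p_cprob / 2                          # variances cannot be negative

print(f"chi-square = {chi2_value}, p = {p_cprob:.3e}, p / 2 = {p_halved:.3e}")
```

Scientific notation makes clear why halving changes nothing about the conclusion here.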

Testing the variance of a random slope

Figure 10 A random slope model

The model in Figure 10 is an extension of the model in Figure 2. Note that two extra parameters have been added: the variance of the effect of male (0.022) and the covariance of this effect with the constant (0.084). Also note that the -2 loglikelihood dropped from 16312.35 to 16300.11, a difference of 12.24. The df equals 2 (two extra parameters). With cprob 12.24 2 we find a p-value of 0.0021985. For similar reasons as before, we have to divide this outcome by 2, which gives 0.00109925. This is a significant outcome for any standard α, which means that we can safely assume that the effect of male varies across level 2 units (i.e., countries). The effect of the variable male is different from the effect in Figure 2. In Figure 10 it has become an estimated average effect within the population of countries; therefore the degrees of freedom are no longer based on level 1 units but on level 2 units.

A t-test on cross-level interaction

Next, we may introduce a level 2 variable, for instance social security rates, and a cross-level interaction between male and social security rates. To test whether this interaction reaches significance, a t-test can be used with df = n_level2 - p - 1 (where n_level2 = number of observations at level 2, p = number of level 2 variables, and 1 stands for the constant, which is a level 2 variable). In this case we have 13 countries, so df = 13 - 3 - 1 = 9: 3 for social security rates, male, and social security rates * male, and 1 for the constant. Note that the effect of male has now turned into a very specific effect, namely the estimated average male effect in all countries in which social security is absent (= 0). Apart from the fact that we do not have any countries with social security = 0 in our sample, it should be treated as an effect at the country level.
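The two numeric steps in this part, the random slope p-value and the degrees of freedom for the cross-level t-test, can be verified with simple arithmetic. The sketch below uses the fact that for df = 2 the chi-square upper tail equals exp(-χ²/2); it is a hand check of the reported figures, not the cprob implementation, and the variable names are our own.

```python
import math

# Random slope test: two extra parameters, so df = 2
deviance_fixed_slope = 16312.35   # model in Figure 2
deviance_random_slope = 16300.11  # model in Figure 10
chi2_value = round(deviance_fixed_slope - deviance_random_slope, 2)

p_cprob = math.exp(-chi2_value / 2)  # chi-square upper tail for df = 2
p_halved = p_cprob / 2               # halved, as in the text

# Cross-level interaction: df = n_level2 - p - 1
n_level2 = 13
level2_terms = ["socsec", "male", "socsec * male"]
df_cross_level = n_level2 - len(level2_terms) - 1  # the final 1 is the constant

print(f"cprob {chi2_value} 2 -> p = {p_cprob:.7f}, p / 2 = {p_halved:.8f}")
print(f"df for the cross-level t-test = {df_cross_level}")
```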
To get a more realistic figure for the effect of male, one may subtract a meaningful value, such as 10.85 (the lowest score on social security rates), from the original social security rates and estimate the model again. The effect of male is then the estimated average effect for countries with social security at 10.85% of GDP. In Figure 11 we added social security rates (socsec) and the cross-level interaction between male and socsec.
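What subtracting 10.85 does to the main effect of male can be previewed without re-estimating the model. The sketch below assumes that the fixed part is linear in socsec, so that the male effect at any socsec level is the main effect plus the interaction times that level. It borrows the two estimates discussed with Figure 11 (0.725 and -0.025); the function name `male_effect_at` is hypothetical.

```python
b_male = 0.725          # effect of male where socsec = 0 (Figure 11)
b_interaction = -0.025  # change in the male effect per unit of socsec (1% of GDP)
lowest_socsec = 10.85   # lowest social security rate in the sample

def male_effect_at(socsec):
    """Estimated male effect at a given social security rate (linear fixed part)."""
    return b_male + b_interaction * socsec

effect_uncentered = male_effect_at(0.0)            # outside the observed range
effect_at_lowest = male_effect_at(lowest_socsec)   # expected main effect after centering

print(f"male effect at socsec = 0:     {effect_uncentered:.3f}")
print(f"male effect at socsec = 10.85: {effect_at_lowest:.3f}")
```

Under this assumption the re-estimated model should show roughly the second value as the main effect of male, while the interaction estimate itself stays the same.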

Figure 11 Cross-level interaction between male and social security rates

As the cross-level interaction is -0.025, the difference between females and males decreases as social security rates increase. The main effect of male is 0.725 (where social security = 0). With every unit increase in social security (i.e., 1% of GDP) this effect decreases by 0.025. To highlight the other side of the interaction: the effect of socsec (-0.062 for females) is 0.025 more negative for males (-0.062 + -0.025 = -0.087). We prefer this point of view in this case: for females the number of visits to best friends depends less on social welfare than for males. We discuss this interesting finding in assignment 3 of this course.

To test whether the interaction is significant, we use tprob -2.5 9. The p-value is 0.016, which is directly comparable to α = 0.05 if the hypothesis states that the effect of social security is stronger for males than for females.

For your information: part of the information on statistical testing in this summary is taken from Ben Pelzer's handouts; other parts are taken from Statistiek als hulpmiddel / Statistical Tools. In case you are curious, they are at
http://www.vangorcum.nl/en_toonboek.asp?publid=4503
http://www.vangorcum.nl/nl_toonboek.asp?publid=4445

These are excellent books! Did I write excellent? Cheap I mean, (dirt) cheap!

Regards,
Manfred te Grotenhuis / Ben Pelzer, 2010