14 The Chi-squared distribution

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 1

If a normal variable $X$, having mean $\mu$ and variance $\sigma^2$, is standardised, the new variable $Z = (X - \mu)/\sigma$ has mean 0 and variance 1. When this standardised variable is squared, it is said to follow a chi-squared distribution with one degree of freedom, denoted by $\chi^2(1)$. In general, the sum of the squares of $n$ independent standardised normal variables follows a chi-squared distribution with $n$ degrees of freedom.

The shape of the chi-squared distribution is a function of its number of degrees of freedom, denoted by the Greek letter $\nu$ (nu). The following diagram gives the shapes for $\nu = 1$, $\nu = 2$ and $\nu \geq 3$.

[Fig. 14.1: density $f(x)$ of the chi-squared distribution for $\nu = 1$, $\nu = 2$ and $\nu \geq 3$ (the general case)]

14.1 The chi-squared test for independence

The chi-squared distribution is very often used in hypothesis testing, especially when establishing whether two factors are independent or associated. This is done by building a contingency table involving the attributes of these factors. This type of testing differs from our usual way of testing for a population parameter (mean or proportion) in that the statements of the null and alternative hypotheses are very straightforward:

$H_0$: the factors are independent
$H_1$: the factors are not independent (or associated)
Note: Independence is always assumed in the null hypothesis and, obviously, the factors have to be stated in the context of the problem.

The problem is normally presented as a table of observed frequencies of the factors in terms of their attributes (subfactors). We are required to test whether there is a significant difference between these observed frequencies and their theoretical (expected) counterparts. Since we always assume that the null hypothesis is true in any testing procedure, we calculate the expected frequencies on the assumption that the factors are independent. These expected frequencies are thus computed according to probability theory, whereby, if two variables (factors) $A$ and $B$ are independent,

$$P(A \cap B) = P(A)\,P(B).$$

It is then a matter of computing a test statistic to measure the significance of the difference. This is known as the chi-squared test statistic and is defined as

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i},$$

where $n$ is the number of frequency cells in the contingency table, and $O_i$ and $E_i$ are the $i$th observed and expected frequencies respectively. The test-statistic value is compared with a critical chi-squared value (from a table of chi-squared values) before deciding whether the null hypothesis is eventually accepted or rejected; the significance level of the test is usually given.

Note: In this course, we assume that the shape of the chi-squared distribution always resembles that for the case $\nu \geq 3$ (even though we may have $\nu = 1$ or $\nu = 2$ in the exams).

The explanation above is hard to follow without an example, so let us illustrate it.

Example 14.2

The members of a sports team are interested in whether the weather has an effect on their results. They play 50 matches, with the following results.

| Result | Good weather | Bad weather | Total |
|---|---|---|---|
| Win | 12 | 4 | 16 |
| Draw | 5 | 8 | 13 |
| Lose | 7 | 14 | 21 |
| Total | 24 | 26 | 50 |
Formulate suitable null and alternative hypotheses, and use a $\chi^2$ test to test the claim, at the 1% significance level, that the weather has no effect on the team's results. State your conclusion clearly.

Solution

We start by formulating our null and alternative hypotheses:

$H_0$: results are independent of the weather
$H_1$: results are affected by the weather

The above contingency table is of order $3 \times 2$ (pronounced "3 by 2"), as shown by the row categories (win, draw, lose) and the column categories (good, bad).

Note: The general formula for the number of degrees of freedom for an $m \times n$ contingency table is $\nu = (m - 1)(n - 1)$.

The expected frequencies are computed, as mentioned above, by using the multiplicative rule for independent events. Let us compute the expected frequency for the cell "good and win", which has an observed frequency of 12. If a result is selected at random, the probability that it is a win is $\frac{16}{50}$ and the probability that the weather was good on that day is $\frac{24}{50}$. Assuming that the null hypothesis is true, that is, that result and weather are independent, the probability that a win was obtained when the weather was good is the product of these two probabilities, $\frac{16}{50} \times \frac{24}{50}$. The expected number of such results out of 50 matches is therefore

$$50 \times \frac{16}{50} \times \frac{24}{50} = \frac{16 \times 24}{50} = 7.68,$$

which, as can be seen, is the product of the marginal totals (16 and 24) divided by the grand total of 50. We can thus safely use the formula

$$\text{Expected frequency} = \frac{\text{Product of marginal totals}}{\text{Grand total}}.$$

However, two conditions must be satisfied:
(1) All expected frequencies must be at least 5; if not, one or more cells should be merged.
(2) The totals of the observed and expected frequencies must be equal.
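The expected-frequency rule and the two conditions above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the course method: the function name `expected_frequencies` is hypothetical, and the table is the sports-team data from the example, with rows win/draw/lose and columns good/bad.

```python
import math

# Observed frequencies from the sports-team example.
# Rows: result (win, draw, lose); columns: weather (good, bad).
observed = [
    [12, 4],   # win
    [5, 8],    # draw
    [7, 14],   # lose
]

def expected_frequencies(table):
    """Hypothetical helper: expected cell counts under H0 (independence),
    i.e. (row total * column total) / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 2) for e in row])

# Condition (1): every expected frequency is at least 5.
all_at_least_5 = all(e >= 5 for row in expected for e in row)
# Condition (2): observed and expected totals agree (up to floating-point error).
totals_match = math.isclose(sum(map(sum, observed)), sum(map(sum, expected)))
print(all_at_least_5, totals_match)
```

The printed rows should match the expected frequencies used in the worked solution (7.68 and 8.32 for wins, and so on), and both conditions hold for this table.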
| Result | Good weather | Bad weather | Total |
|---|---|---|---|
| Win | 12 (7.68) | 4 (8.32) | 16 |
| Draw | 5 (6.24) | 8 (6.76) | 13 |
| Lose | 7 (10.08) | 14 (10.92) | 21 |
| Total | 24 | 26 | 50 |

Table 14.2: observed frequencies, with expected frequencies in parentheses

We can calculate all the expected frequencies (in parentheses above) using the above formula, but there is no need to. Careful observation shows that, once we have computed the expected frequencies of the cells with observed frequencies 12 and 5, all the remaining expected frequencies may be readily obtained from the marginal totals, since these are fixed. In fact, we need only two of them (not in the same row) to complete the table. That is why we use two degrees of freedom for the critical chi-squared value. In simple English, two of these frequencies are free, but the remaining ones depend on the first two and on the marginal totals. This is confirmed by the formula for the number of degrees of freedom:

$$\nu = (3 - 1)(2 - 1) = 2 \times 1 = 2.$$

We now calculate the statistic value using $\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$, that is,

$$\chi^2 = \frac{(12 - 7.68)^2}{7.68} + \frac{(4 - 8.32)^2}{8.32} + \frac{(5 - 6.24)^2}{6.24} + \frac{(8 - 6.76)^2}{6.76} + \frac{(7 - 10.08)^2}{10.08} + \frac{(14 - 10.92)^2}{10.92}$$

$$= 2.43 + 2.243 + 0.246 + 0.227 + 0.941 + 0.869 = 6.956.$$

Now, at the 1% level of significance with $\nu = 2$, the critical chi-squared value is 9.21034 (see table). The diagram below indicates the critical region (level of significance) and the critical chi-squared value.
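The whole calculation above, from expected frequencies through the statistic and $\nu = (m - 1)(n - 1)$, can be sketched as a single Python function. This is an illustrative sketch only; the name `chi_squared_test_statistic` is hypothetical, and the data are again the example's observed counts.

```python
def chi_squared_test_statistic(table):
    """Hypothetical helper: return (chi-squared statistic, degrees of freedom)
    for a contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count under independence: product of marginal totals / grand total
            expected_count = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed_count - expected_count) ** 2 / expected_count

    # nu = (m - 1)(n - 1) for an m x n table
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof

observed = [[12, 4], [5, 8], [7, 14]]   # win / draw / lose by good / bad weather
statistic, dof = chi_squared_test_statistic(observed)
print(f"chi-squared = {statistic:.3f}, nu = {dof}")
```

Summing the unrounded terms gives about 6.957; the 6.956 in the worked solution comes from adding terms already rounded to three decimal places, so the two agree up to rounding.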
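[Fig. 14.3: chi-squared density $f(x)$; the critical value 9.21034 cuts off a right-hand rejection region of area 0.01, and the test-statistic value 6.956 lies to its left, in the acceptance region for $H_0$]

Since 6.956 < 9.21034, we cannot reject $H_0$; we conclude that, at the 1% significance level, there is no significant evidence that the weather affects the team's results (results and weather can be treated as independent).

It is worth noting that, for this test, the critical region is always on the right-hand side of the curve (there is no such thing as a two-tailed chi-squared test for independence!).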
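The right-tailed decision rule can also be checked numerically. For $\nu = 2$ only, the chi-squared density is $f(x) = \tfrac{1}{2}e^{-x/2}$, so the upper-tail area beyond $c$ is $e^{-c/2}$ and the critical value for significance level $\alpha$ is $c = -2\ln\alpha$. This shortcut does not apply to other values of $\nu$, where a table (or, for example, SciPy's `scipy.stats.chi2.ppf`) is needed. The function names `critical_value_df2` and `p_value_df2` are hypothetical.

```python
import math

def critical_value_df2(alpha):
    """Hypothetical helper: upper-tail critical value of chi-squared with nu = 2.

    Valid ONLY for nu = 2, where f(x) = 0.5 * exp(-x / 2), so
    P(X > c) = exp(-c / 2); setting this equal to alpha gives c = -2 ln(alpha).
    """
    return -2.0 * math.log(alpha)

def p_value_df2(statistic):
    """Hypothetical helper: upper-tail area beyond the statistic, valid only for nu = 2."""
    return math.exp(-statistic / 2.0)

test_statistic = 6.956   # value computed in the example
alpha = 0.01             # 1% significance level

critical = critical_value_df2(alpha)
reject_h0 = test_statistic > critical   # right-tailed test: reject only large values

print(f"critical value = {critical:.5f}")
print(f"p-value        = {p_value_df2(test_statistic):.4f}")
print("Reject H0" if reject_h0 else "Cannot reject H0")
```

With $\alpha = 0.01$ this reproduces the tabulated 9.21034, and the comparison is one-sided, matching the remark that the critical region is always on the right. The same formula gives a p-value of $e^{-6.956/2} \approx 0.031$, which is above 0.01, so the conclusion is unchanged.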