Chapter 14 Nonparametric Statistics

A.K.A. "distribution-free" statistics! Does not depend on the population fitting any particular type of distribution (e.g., normal). Since these methods make fewer assumptions, they apply more broadly... at the expense of a less powerful test (needing more observations to draw a conclusion with the same certainty).

Let's think about the median $\mu$. Given a sample $x_1, \ldots, x_n$ drawn randomly from an unknown continuous distribution, say we want to test:

$H_0: \mu = \mu_0$
$H_1: \mu > \mu_0$

For example, test whether the median household income exceeds 5K.

Sign Test

Step 1: Count the number of $x_i$'s that exceed $\mu_0$. Call this $s_+$. Let $s_- = n - s_+$.

Step 2: Reject $H_0$ if $s_+$ is too large (or if $s_-$ is too small).

Why does this make sense? What if the true median $\mu$ is 1000 and $\mu_0$ is 1?

How large should $s_+$ be in order to reject? To find out, we need to know the distribution of the r.v. for $s_+$. Call that r.v. $S_+$. Let $p = P(X_i > \mu_0)$ and $1 - p = P(X_i < \mu_0)$.

Here's a helpful picture. Note that the distribution of the population isn't normal!
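If you think of:

$$Y_i = \begin{cases} 1 & \text{if } X_i > \mu_0 \\ 0 & \text{otherwise} \end{cases}$$

as a Bernoulli r.v. with parameter $p$, then $S_+$ is a sum of the $Y_i$'s. So $S_+$ is a sum of Bernoullis. So it's binomial!

$$S_+ \sim \mathrm{Bin}(n, p) \quad \text{and} \quad S_- \sim \mathrm{Bin}(n, 1 - p). \qquad (1)$$

Now, if $H_0$ is true, $\mu_0$ is the true median and $p = 1/2$, so:

$$S_+ \sim \mathrm{Bin}(n, 1/2) \quad \text{and} \quad S_- \sim \mathrm{Bin}(n, 1/2). \qquad (2)$$

So reject when $s_+ \ge b_{n,\alpha}$, where $b_{n,\alpha}$ is the upper $\alpha$ critical point for $\mathrm{Bin}(n, 1/2)$. (Or reject when $s_- \le b_{n,1-\alpha}$.) That is,

$$\alpha = \sum_{i=b_{n,\alpha}}^{n} \binom{n}{i} \left(\frac{1}{2}\right)^n.$$

Let's calculate the pvalue using the binomial distribution:

$$\text{pvalue} = P(S_+ \ge s_+) = \sum_{i=s_+}^{n} \binom{n}{i} \left(\frac{1}{2}\right)^n \overset{(*)}{=} P(S_- \le s_-) = \sum_{i=0}^{s_-} \binom{n}{i} \left(\frac{1}{2}\right)^n.$$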
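As a rough sketch of the pvalue calculation above, the two binomial tails can be summed directly. The function name `sign_test_pvalue` and the sample values are made up for illustration, and the code assumes no observation equals $\mu_0$ exactly (which the continuity assumption guarantees with probability 1).

```python
from math import comb


def sign_test_pvalue(xs, mu0):
    """Upper-tailed sign test of H0: median = mu0 vs. H1: median > mu0.

    Illustrative helper (not from the notes); assumes no x equals mu0.
    """
    n = len(xs)
    s_plus = sum(1 for x in xs if x > mu0)   # Step 1: count x_i above mu0
    s_minus = n - s_plus
    # P(S+ >= s+) when S+ ~ Bin(n, 1/2)
    upper_tail = sum(comb(n, i) for i in range(s_plus, n + 1)) / 2**n
    # P(S- <= s-) when S- ~ Bin(n, 1/2); equal to upper_tail by symmetry
    lower_tail = sum(comb(n, i) for i in range(0, s_minus + 1)) / 2**n
    return s_plus, s_minus, upper_tail, lower_tail


# Made-up sample, purely for illustration.
sample = [12.1, 9.4, 15.3, 11.8, 13.6, 8.9, 14.2, 16.0, 12.7, 13.1]
s_plus, s_minus, p_upper, p_lower = sign_test_pvalue(sample, mu0=10.0)
print(s_plus, s_minus, p_upper, p_lower)
```

Both tail sums are returned so that the symmetry step can be checked numerically; either one serves as the pvalue.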
The step marked (*), $P(S_+ \ge s_+) = P(S_- \le s_-)$, is from symmetry of $\mathrm{Bin}(n, 1/2)$. As usual, reject if pvalue $< \alpha$.

(Also if $n$ is large, the binomial distribution can be replaced with the normal distribution and we could use a z-test.)

Example

Can you see now why we needed the assumption of a continuous r.v.? (Think about $p$ under the null hypothesis.)

Also we could rewrite the hypotheses:

$H_0: p = 1/2$
$H_1: p > 1/2$.
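To illustrate the large-sample remark, here is a hedged sketch comparing the exact binomial tail with a normal approximation. Under $H_0$, $S_+$ has mean $n/2$ and variance $n/4$. The continuity correction of 1/2 is an assumption added here (the notes do not say whether to use one), and the function names and the values $n = 100$, $s_+ = 60$ are invented for the example.

```python
from math import comb, erfc, sqrt


def exact_upper_tail(n, s_plus):
    """P(S+ >= s_plus) for S+ ~ Bin(n, 1/2)."""
    return sum(comb(n, i) for i in range(s_plus, n + 1)) / 2**n


def normal_upper_tail(n, s_plus, continuity=True):
    """Normal approximation to P(S+ >= s_plus) under H0.

    The continuity correction is an assumption of this sketch,
    not something stated in the notes.
    """
    shift = 0.5 if continuity else 0.0
    z = (s_plus - shift - n / 2) / sqrt(n / 4)
    return 0.5 * erfc(z / sqrt(2))  # P(Z >= z) for standard normal Z


n, s_plus = 100, 60  # hypothetical counts
p_exact = exact_upper_tail(n, s_plus)
p_approx = normal_upper_tail(n, s_plus)
print(p_exact, p_approx)
```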
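Summary of Sign Test:

Data & Assumptions: $X_1, \ldots, X_n \sim$ unknown continuous distribution, no other assumptions!

Test Statistic: $S_+$ = number of observations $X_i$ that exceed $\mu_0$ (or $s_- = n - s_+$).

| Hypotheses | Reject when | pvalue |
| $H_0: \mu \le \mu_0$ vs. $H_1: \mu > \mu_0$ | $s_+ \ge b_{n,\alpha}$ | $P(S_+ \ge s_+) = \sum_{i=s_+}^{n} \binom{n}{i} \left(\frac{1}{2}\right)^n$ |
| $H_0: \mu \ge \mu_0$ vs. $H_1: \mu < \mu_0$ | $s_- \ge b_{n,\alpha}$ | $P(S_- \ge s_-) = \sum_{i=s_-}^{n} \binom{n}{i} \left(\frac{1}{2}\right)^n$ |
| $H_0: \mu = \mu_0$ vs. $H_1: \mu \ne \mu_0$ | $s_{\max} \ge b_{n,\alpha/2}$, where $s_{\max} := \max(s_+, s_-)$ | $2 \sum_{i=s_{\max}}^{n} \binom{n}{i} \left(\frac{1}{2}\right)^n$ |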
Wilcoxon Signed Rank Test

Let us add an assumption in order to gain more power from the test. Namely, the assumption that the distribution is symmetric. Symmetric means that reflection around the median yields the same thing. (The sign test did not require this... remember, generally more assumptions means more conclusions.)

The Wilcoxon Signed Rank Test looks at the magnitudes of the differences $d_i = x_i - \mu_0$. Also assume no ties: $d_i \ne 0$ for any $i$, and no absolute ties, $|d_i| \ne |d_j|$ for any $i \ne j$.

$H_0: \mu = \mu_0$
$H_1: \mu > \mu_0$.

Step 1: Rank the $|d_i|$'s. Let $r_i$ be the rank of $|d_i|$. Here, $r_i = 1$ for the smallest $|d_i|$.

Step 2: Let
$w_+$ = sum of ranks of the positive $d_i$'s
$w_-$ = sum of ranks of the negative $d_i$'s.

So,

$$w_+ + w_- = r_1 + r_2 + \cdots + r_n = 1 + 2 + \cdots + n = \frac{n(n+1)}{2}.$$
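Steps 1 and 2 can be sketched in a few lines. This is only an illustration: the helper name `signed_rank_sums` and the sample are made up, and the code relies on the no-ties assumption stated above (no zero differences, no equal magnitudes).

```python
def signed_rank_sums(xs, mu0):
    """Return (w_plus, w_minus) for the Wilcoxon signed rank test.

    Illustrative helper; assumes no d_i is zero and no two |d_i| are equal.
    """
    d = [x - mu0 for x in xs]
    # Step 1: rank the magnitudes |d_i|, rank 1 for the smallest.
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0] * len(d)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    # Step 2: sum the ranks separately by the sign of d_i.
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    w_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    return w_plus, w_minus


# Made-up sample, purely for illustration.
sample = [12.1, 9.4, 15.3, 11.8, 13.6, 8.9]
n = len(sample)
w_plus, w_minus = signed_rank_sums(sample, mu0=10.0)
print(w_plus, w_minus, n * (n + 1) // 2)
```

The printed total $n(n+1)/2$ gives a quick check that the two rank sums account for every rank exactly once.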
Step 3: Reject $H_0$ if $w_+$ is too large (or if $w_-$ is too small).

Example

How large to reject? Our r.v. is $W_+$, which is a sum of ranks. We've never seen $W_+$'s distribution before, but tail probabilities for it are in Appendix A10 on page 683.

As an aside: To make the distribution of $W_+$, take all possible assignments of signs to the ranks of the $d_i$'s:

[Table: possible assignments of + / − signs to the ranks $i = 1, 2, 3, 4$]

(Each rank gets a + or −, so there are 2 possibilities of signs for each rank.) For each assignment, calculate $w_+$. Since assignments are equally likely, we get a distribution over $w_+$ values.

It can be shown that $W_+$ and $W_-$ have the same distribution. So let $W$ denote a r.v. with this common distribution, $W = W_+ = W_-$. Then we can use the table to get the pvalues:

$$\text{pvalue} = P(W \ge w_+) = P(W \le w_-).$$

Reject $H_0$ if pvalue $\le \alpha$ or if $w_+ \ge w_{n,\alpha}$.

(For large $n$, can approximate the null distribution of $W$ by a normal distribution.)
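The enumeration described in the aside can be sketched directly for small $n$: list every assignment of signs to the ranks $1, \ldots, n$, compute $w_+$ for each, and weight all $2^n$ assignments equally. The function names are invented for this example, and brute force is only practical for small $n$.

```python
from collections import Counter
from itertools import product


def signed_rank_null_distribution(n):
    """Exact null distribution of W+ by enumerating all 2**n sign assignments."""
    counts = Counter()
    for signs in product((0, 1), repeat=n):  # 1 means the rank gets a "+"
        w_plus = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        counts[w_plus] += 1
    total = 2**n
    return {w: c / total for w, c in sorted(counts.items())}


def upper_tail(dist, w):
    """P(W >= w) under the enumerated null distribution."""
    return sum(p for value, p in dist.items() if value >= w)


dist = signed_rank_null_distribution(4)
for w, p in dist.items():
    print(w, p)
print("P(W >= 9) =", upper_tail(dist, 9))
```

The resulting distribution is symmetric about $n(n+1)/4$, which is why the upper tail at $w_+$ matches the lower tail at $w_-$.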
Summary of Wilcoxon Signed Rank Test:

Data & Assumptions: $X_1, \ldots, X_n \sim$ unknown symmetric distribution

Test Statistic: $w_+$ = sum of ranks of positive $d_i$'s, where $d_i = x_i - \mu_0$.

| Hypotheses | Reject when | pvalue |
| $H_0: \mu \le \mu_0$ vs. $H_1: \mu > \mu_0$ | $w_+ \ge w_{n,\alpha}$ | $P(W \ge w_+)$ |
| $H_0: \mu \ge \mu_0$ vs. $H_1: \mu < \mu_0$ | $w_- \ge w_{n,\alpha}$ | $P(W \ge w_-)$ |
| $H_0: \mu = \mu_0$ vs. $H_1: \mu \ne \mu_0$ | $w_{\max} = \max(w_+, w_-) \ge w_{n,\alpha/2}$ | $2P(W \ge w_{\max})$ |

Example continued

Why do we need the assumption of a symmetric distribution?

Important: There are many cases in which $H_0$ is rejected by the Wilcoxon Signed Rank Test but not the Sign Test.
Inferences for Two Independent Samples (Rank Sum Test and Mann-Whitney U Test)

We want to know whether observations from one population (given sample $x_1, \ldots, x_{n_1}$) tend to be larger than those from another population (given $y_1, \ldots, y_{n_2}$).

Mouse Data Example

Let's make precise "X larger than Y". Given r.v.'s $X$ and $Y$ with cdf's $F_1$ and $F_2$, $X$ is stochastically larger than $Y$ (denoted $X > Y$) if for all real numbers $u$,

$$F_1(u) \le F_2(u),$$

in other words $P(X \le u) \le P(Y \le u)$, with strict inequality for at least one $u$. Denote $F_1 < F_2$ to mean $X > Y$.

Let us test:

$H_0: F_1 = F_2$
$H_1: F_1 < F_2$
Wilcoxon-Mann-Whitney U Test and Wilcoxon Rank Sum Test (2 equivalent tests)

Wilcoxon Rank Sum

Step 1: Rank all $N = n_1 + n_2$ observations in ascending order (assume no ties).

Step 2: Sum the ranks of the $x$'s and $y$'s separately. Denote the sums by $w_1$ and $w_2$.

Step 3: Reject $H_0$ if $w_1$ is large (or equivalently if $w_2$ is small).

Example

To do testing, we need the distribution of $W_1$ (the random variable for $w_1$) or $W_2$ under $H_0$ (soon).

Mann-Whitney U

Step 1: Compare each $x_i$ with each $y_j$.

Step 2: Let $u_1$ be the number of pairs in which $x_i > y_j$. Let $u_2$ be the number of pairs in which $x_i < y_j$.

Step 3: Reject $H_0$ if $u_1$ is large (or equivalently if $u_2$ is small).

It is true that

$$u_1 = w_1 - \frac{n_1(n_1 + 1)}{2} \quad \text{and} \quad u_2 = w_2 - \frac{n_2(n_2 + 1)}{2}.$$

Demo of this fact

Since $u_1$ and $w_1$ are just a constant apart, the distributions of $u_1$ (r.v. $U_1$) and $w_1$ (r.v. $W_1$) have the same shape: the distribution of $W_1$ is the distribution of $U_1$ shifted by $n_1(n_1 + 1)/2$.
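The two procedures and the identity linking them can be checked numerically with a small sketch. The helper names and both samples are invented for illustration, and the code assumes no ties among the pooled observations.

```python
def rank_sums(xs, ys):
    """Wilcoxon rank sums (w1, w2); assumes no ties in the pooled sample."""
    pooled = sorted([(v, "x") for v in xs] + [(v, "y") for v in ys])
    w1 = sum(rank for rank, (_, label) in enumerate(pooled, start=1) if label == "x")
    w2 = sum(rank for rank, (_, label) in enumerate(pooled, start=1) if label == "y")
    return w1, w2


def mann_whitney_counts(xs, ys):
    """Mann-Whitney counts (u1, u2) from all n1 * n2 pairwise comparisons."""
    u1 = sum(1 for x in xs for y in ys if x > y)
    u2 = sum(1 for x in xs for y in ys if x < y)
    return u1, u2


# Made-up samples, purely for illustration.
xs = [5.1, 7.3, 8.8, 6.4]
ys = [4.2, 6.0, 3.9, 5.5, 4.8]
n1, n2 = len(xs), len(ys)

w1, w2 = rank_sums(xs, ys)
u1, u2 = mann_whitney_counts(xs, ys)
print(w1, w2, u1, u2)
print(u1 == w1 - n1 * (n1 + 1) // 2, u2 == w2 - n2 * (n2 + 1) // 2)
```

With no ties, every pair contributes to exactly one of the two counts, so $u_1 + u_2 = n_1 n_2$.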
The distribution of $U_1$ turns out to be symmetric about $(n_1 n_2)/2$, and in fact $U_2$ has the same distribution as $U_1$. Tail probabilities for this distribution are in Table A.11. So we define $U := U_1 = U_2$.

So given $x_1, \ldots, x_{n_1}$ and $y_1, \ldots, y_{n_2}$, to test:

$H_0: F_1 = F_2$
$H_1: F_1 < F_2$

Steps 1 and 2: Compute $u_1$ = number of pairs in which $x_i > y_j$, or

$$u_1 = w_1 - \frac{n_1(n_1 + 1)}{2},$$

where remember that $w_1$ is the sum of ranks of the $x_i$'s.

Step 3: Reject $H_0$ when $u_1 \ge u_{n_1, n_2, \alpha}$ (using the table), or compute:

$$\text{pvalue} = P(U \ge u_1) = P(U \le u_2),$$

and reject if it's less than $\alpha$.

(If $n_1$ and $n_2$ are large, we can approximate the distribution of $U$ under $H_0$ by a normal distribution.)

Example
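For small samples, the tail probability can be computed without a table. The sketch below assumes that under $H_0$ every choice of $n_1$ ranks (out of $N = n_1 + n_2$) for the $x$'s is equally likely, then enumerates them all. The function name and the values $n_1 = 4$, $n_2 = 5$, $u_1 = 18$, $u_2 = 2$ are hypothetical, and the enumeration is only feasible when $\binom{N}{n_1}$ is small.

```python
from collections import Counter
from itertools import combinations
from math import comb


def u_null_counts(n1, n2):
    """Count how many of the C(N, n1) equally likely rank sets give each U1."""
    N = n1 + n2
    counts = Counter()
    for x_ranks in combinations(range(1, N + 1), n1):
        # u1 = w1 - n1(n1 + 1)/2, with w1 the sum of the x ranks
        counts[sum(x_ranks) - n1 * (n1 + 1) // 2] += 1
    return counts


n1, n2 = 4, 5   # hypothetical sample sizes
u1, u2 = 18, 2  # hypothetical observed counts, with u1 + u2 = n1 * n2
counts = u_null_counts(n1, n2)
total = comb(n1 + n2, n1)

p_upper = sum(c for u, c in counts.items() if u >= u1) / total  # P(U >= u1)
p_lower = sum(c for u, c in counts.items() if u <= u2) / total  # P(U <= u2)
print(p_upper, p_lower)
```

The two printed values agree, reflecting the symmetry of the null distribution about $n_1 n_2 / 2$.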
MIT OpenCourseWare
http://ocw.mit.edu

15.075J / ESD.07J Statistical Thinking and Data Analysis
Fall 2011

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.