Chapter 14 Nonparametric Statistics




Nonparametric statistics are also known as "distribution-free" statistics: they do not depend on the population fitting any particular type of distribution (e.g., normal). Since these methods make fewer assumptions, they apply more broadly... at the expense of a less powerful test (needing more observations to draw a conclusion with the same certainty).

Let's think about the median $\mu$. Given a sample $x_1, \ldots, x_n$ drawn randomly from an unknown continuous distribution, say we want to test:

$$H_0: \mu = \mu_0 \qquad H_1: \mu > \mu_0$$

For example, test whether the median household income exceeds 5K.

Sign Test

Step 1. Count the number of $x_i$'s that exceed $\mu_0$. Call this $s_+$. Let $s_- = n - s_+$.

Step 2. Reject $H_0$ if $s_+$ is too large (or if $s_-$ is too small).

Why does this make sense? What if the true median $\mu$ is 1000 and $\mu_0$ is 1?

How large should $s_+$ be in order to reject? To find out, we need to know the distribution of the r.v. for $s_+$. Call that r.v. $S_+$. Let $p = P(X_i > \mu_0)$ and $1 - p = P(X_i < \mu_0)$. Here's a helpful picture. [Figure not reproduced.] Note that the distribution of the population isn't normal!

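If you think of

$$Y_i = \begin{cases} 1 & \text{if } X_i > \mu_0 \\ 0 & \text{otherwise} \end{cases}$$

as a Bernoulli r.v. with parameter $p$, then $S_+$ is a sum of the $Y_i$'s. So $S_+$ is a sum of Bernoullis. So it's binomial!

$$S_+ \sim \mathrm{Bin}(n, p) \quad \text{and} \quad S_- \sim \mathrm{Bin}(n, 1 - p). \tag{1}$$

Now, if $H_0$ is true, $\mu_0$ is the true median and $p = 1/2$, so:

$$S_+ \sim \mathrm{Bin}(n, 1/2) \quad \text{and} \quad S_- \sim \mathrm{Bin}(n, 1/2). \tag{2}$$

So reject when $s_+ \ge b_{n,\alpha}$, where $b_{n,\alpha}$ is the upper $\alpha$ critical point for $\mathrm{Bin}(n, 1/2)$. (Or reject when $s_- \le b_{n,1-\alpha}$.) That is,

$$\alpha = \sum_{i=b_{n,\alpha}}^{n} \binom{n}{i} \left(\frac{1}{2}\right)^{n}.$$

Let's calculate the pvalue using the binomial distribution:

$$\text{pvalue} = P(S_+ \ge s_+) = \sum_{i=s_+}^{n} \binom{n}{i} \left(\frac{1}{2}\right)^{n} \overset{(*)}{=} P(S_- \le s_-) = \sum_{i=0}^{s_-} \binom{n}{i} \left(\frac{1}{2}\right)^{n}.$$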
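The pvalue formula above can be checked numerically. The following is a minimal sketch, not part of the original notes: the function name `sign_test_pvalue` and the data values are made up for illustration, and it assumes no observation equals $\mu_0$ exactly (the continuity assumption).

```python
from math import comb

def sign_test_pvalue(xs, mu0):
    """Upper one-sided sign test of H0: median = mu0 vs H1: median > mu0.

    Assumes no observation equals mu0 (continuous distribution).
    """
    n = len(xs)
    s_plus = sum(1 for x in xs if x > mu0)   # observations above mu0
    s_minus = n - s_plus                     # observations below mu0
    # P(S+ >= s+) when S+ ~ Bin(n, 1/2)
    upper = sum(comb(n, i) for i in range(s_plus, n + 1)) / 2**n
    # P(S- <= s-) when S- ~ Bin(n, 1/2); equal to `upper` by symmetry
    lower = sum(comb(n, i) for i in range(0, s_minus + 1)) / 2**n
    return s_plus, s_minus, upper, lower

# Hypothetical sample, used only to exercise the function.
data = [3.1, 4.7, 5.2, 6.0, 6.8, 7.3, 8.9, 9.4]
s_plus, s_minus, upper, lower = sign_test_pvalue(data, 5.0)
print(s_plus, s_minus, upper, lower)
```

Both tail sums are computed separately so that their equality can be seen directly rather than assumed.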

The step with the (*) is from symmetry of $\mathrm{Bin}(n, 1/2)$. As usual, reject if pvalue $< \alpha$. (Also, if $n$ is large, the binomial distribution can be replaced with the normal distribution and we could use a z-test.)

Example

Can you see now why we needed the assumption of a continuous r.v.? (Think about $p$ under the null hypothesis.) Also, we could rewrite the hypotheses:

$$H_0: p = 1/2 \qquad H_1: p > 1/2.$$
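For the large-$n$ remark, here is a rough sketch comparing the exact binomial pvalue with a normal approximation. Under $H_0$, $S_+$ has mean $n/2$ and standard deviation $\sqrt{n}/2$. The continuity correction of 0.5 is an assumption of this sketch, not something the notes specify, and the function names and the values $n = 100$, $s_+ = 60$ are illustrative.

```python
from math import comb, erfc, sqrt

def sign_test_exact(n, s_plus):
    """Exact P(S+ >= s_plus) for S+ ~ Bin(n, 1/2)."""
    return sum(comb(n, i) for i in range(s_plus, n + 1)) / 2**n

def sign_test_normal(n, s_plus, continuity=True):
    """Normal approximation to P(S+ >= s_plus) under H0.

    The continuity correction (subtracting 0.5) is optional and is an
    assumption of this sketch.
    """
    mean = n / 2
    sd = sqrt(n) / 2
    cc = 0.5 if continuity else 0.0
    z = (s_plus - cc - mean) / sd
    return 0.5 * erfc(z / sqrt(2))   # upper tail P(Z >= z) for standard normal Z

exact = sign_test_exact(100, 60)
approx = sign_test_normal(100, 60)
print(exact, approx)
```

With the correction turned off, the approximation tends to understate the upper tail of a discrete distribution, which is why the corrected version is used as the default here.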

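Summary of Sign Test:

Data & Assumptions: $X_1, \ldots, X_n \sim$ unknown continuous distribution, no other assumptions!

Test Statistic: $S_+$ = number of observations $X_i$ that exceed $\mu_0$ (or $s_- = n - s_+$).

| Hypotheses | Reject when | pvalue |
| $H_0: \mu \le \mu_0$, $H_1: \mu > \mu_0$ | $s_+ \ge b_{n,\alpha}$ | $P(S_+ \ge s_+) = \sum_{i=s_+}^{n} \binom{n}{i} \left(\frac{1}{2}\right)^{n}$ |
| $H_0: \mu \ge \mu_0$, $H_1: \mu < \mu_0$ | $s_- \ge b_{n,\alpha}$ | $P(S_- \ge s_-) = \sum_{i=s_-}^{n} \binom{n}{i} \left(\frac{1}{2}\right)^{n}$ |
| $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$ | $s_{\max} \ge b_{n,\alpha/2}$, where $s_{\max} := \max(s_+, s_-)$ | $2 \sum_{i=s_{\max}}^{n} \binom{n}{i} \left(\frac{1}{2}\right)^{n}$ |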

Wilcoxon Signed Rank Test

Let us add an assumption in order to gain more power from the test. Namely, the assumption that the distribution is symmetric. Symmetric means that reflection around the median yields the same thing. (The sign test did not require this... remember, generally more assumptions means more conclusions.)

The Wilcoxon Signed Rank Test looks at the magnitudes of the differences $d_i = x_i - \mu_0$. Also assume no ties: $d_i \ne 0$ for any $i$, and no absolute ties: $|d_i| \ne |d_j|$ for any $i \ne j$.

$$H_0: \mu = \mu_0 \qquad H_1: \mu > \mu_0.$$

Step 1. Rank the $|d_i|$'s. Let $r_i$ be the rank of $|d_i|$. Here, $r_i = 1$ for the smallest $|d_i|$.

Step 2. Let

$w_+$ = sum of ranks of the positive $d_i$'s,
$w_-$ = sum of ranks of the negative $d_i$'s.

So,

$$w_+ + w_- = r_1 + r_2 + \cdots + r_n = 1 + 2 + \cdots + n = \frac{n(n+1)}{2}.$$

Step 3. Reject $H_0$ if $w_+$ is too large (or if $w_-$ is too small).

Example

How large to reject? Our r.v. is $W_+$, which is a sum of ranks. We've never seen $W_+$'s distribution before, but tail probabilities for it are in Appendix A10 on page 683.

As an aside: to make the distribution of $W_+$, take all possible assignments of signs to the ranks of the $d_i$'s. For ranks $i = 1, 2, 3, 4, \ldots, n$, the number of possible assignments is $2 \times 2 \times \cdots \times 2 = 2^n$. (Each rank gets a $+$ or a $-$, so there are 2 possibilities of sign for each rank.) For each assignment, calculate $w_+$. Since under $H_0$ the assignments are equally likely, we get a distribution over $w_+$ values.

It can be shown that $W_+$ and $W_-$ have the same distribution. So call $W := W_+ = W_-$ (equal in distribution). Then we can use the table to get the pvalues:

$$\text{pvalue} = P(W \ge w_+) = P(W \le w_-).$$

Reject $H_0$ if pvalue $\le \alpha$ or if $w_+ \ge w_{n,\alpha}$. (For large $n$, we can approximate the null distribution of $W$ by a normal distribution.)
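The aside on building the distribution of $W_+$ can be carried out by brute force for small $n$. This is only a sketch under the no-ties assumption; the function names and the four data values are hypothetical, and enumeration over all $2^n$ assignments is practical only for small samples.

```python
from itertools import product

def signed_rank_null(n):
    """Null distribution of W+ from all 2^n equally likely sign assignments."""
    counts = {}
    for signs in product([0, 1], repeat=n):      # 1 marks a positive d_i
        w_plus = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        counts[w_plus] = counts.get(w_plus, 0) + 1
    total = 2**n
    return {w: c / total for w, c in sorted(counts.items())}

def signed_rank_stat(xs, mu0):
    """Return (w+, w-) assuming no zero differences and no tied |d_i|."""
    d = [x - mu0 for x in xs]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))  # smallest |d_i| first
    w_plus = sum(r for r, i in enumerate(order, start=1) if d[i] > 0)
    w_minus = sum(r for r, i in enumerate(order, start=1) if d[i] < 0)
    return w_plus, w_minus

# Hypothetical data, used only to exercise the functions.
xs = [5.5, 3.8, 7.2, 6.1]
w_plus, w_minus = signed_rank_stat(xs, 5.0)
null = signed_rank_null(len(xs))
pvalue = sum(p for w, p in null.items() if w >= w_plus)   # P(W >= w+)
print(w_plus, w_minus, pvalue)
```

The resulting dictionary is symmetric about $n(n+1)/4$, which is consistent with $W_+$ and $W_-$ having the same distribution.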

Summary of Wilcoxon Signed Rank Test:

Data & Assumptions: $X_1, \ldots, X_n \sim$ unknown symmetric distribution.

Test Statistic: $w_+$ = sum of ranks of the positive $d_i$'s, where $d_i = x_i - \mu_0$ and the ranks are those of the $|d_i|$'s.

| Hypotheses | Reject when | pvalue |
| $H_0: \mu \le \mu_0$, $H_1: \mu > \mu_0$ | $w_+ \ge w_{n,\alpha}$ | $P(W \ge w_+)$ |
| $H_0: \mu \ge \mu_0$, $H_1: \mu < \mu_0$ | $w_- \ge w_{n,\alpha}$ | $P(W \ge w_-)$ |
| $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$ | $w_{\max} \ge w_{n,\alpha/2}$, where $w_{\max} = \max(w_+, w_-)$ | $2 P(W \ge w_{\max})$ |

Example continued

Why do we need the assumption of a symmetric distribution?

Important: There are many cases in which $H_0$ is rejected by the Wilcoxon Signed Rank Test but not by the Sign Test.

Inferences for Two Independent Samples (Rank Sum Test and Mann-Whitney U Test)

We want to know whether observations from one population (given sample $x_1, \ldots, x_{n_1}$) tend to be larger than those from another population (given $y_1, \ldots, y_{n_2}$).

Mouse Data Example

Let's make precise "$X$ larger than $Y$." Given r.v.'s $X$ and $Y$ with cdf's $F_1$ and $F_2$, $X$ is stochastically larger than $Y$ (denoted $X > Y$) if for all real numbers $u$,

$$F_1(u) \le F_2(u), \quad \text{in other words} \quad P(X \le u) \le P(Y \le u),$$

with strict inequality for at least one $u$. Denote $F_1 < F_2$ to mean $X > Y$. Let us test:

$$H_0: F_1 = F_2 \qquad H_1: F_1 < F_2$$

Wilcoxon-Mann-Whitney U Test and Wilcoxon Rank Sum Test (equivalent tests)

Wilcoxon Rank Sum

Step 1. Rank all $N = n_1 + n_2$ observations in ascending order (assume no ties).

Step 2. Sum the ranks of the $x$'s and $y$'s separately. Denote the sums by $w_1$ and $w_2$.

Step 3. Reject $H_0$ if $w_1$ is large (or equivalently if $w_2$ is small).

Example

To do testing, we need the distribution of $W_1$ (the random variable for $w_1$) or $W_2$ under $H_0$ (soon).

Mann-Whitney U

Step 1. Compare each $x_i$ with each $y_j$.

Step 2. Let $u_1$ be the number of pairs in which $x_i > y_j$. Let $u_2$ be the number of pairs in which $x_i < y_j$.

Step 3. Reject $H_0$ if $u_1$ is large (or equivalently if $u_2$ is small).

It is true that

$$u_1 = w_1 - \frac{n_1(n_1 + 1)}{2} \quad \text{and} \quad u_2 = w_2 - \frac{n_2(n_2 + 1)}{2}.$$

(Demo of this fact.)

Since $u_1$ and $w_1$ are just a constant apart, the distributions of $u_1$ (r.v. $U_1$) and $w_1$ (r.v. $W_1$) have the same shape (one is a shifted copy of the other). [Figure not reproduced.]
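The relation between the rank sums and the pair counts can be checked on a small example. One way to see it: the rank of each $x_i$ in the pooled sample equals the number of $x$'s at or below it plus the number of $y$'s below it; summing over the $x$'s gives $n_1(n_1+1)/2 + u_1$. The sketch below assumes no ties; the function name and the data are made up for illustration.

```python
def rank_sum_and_u(xs, ys):
    """Return (w1, w2, u1, u2) for two samples, assuming no ties."""
    # Pool the observations, remembering which sample each came from.
    pooled = sorted([(v, "x") for v in xs] + [(v, "y") for v in ys])
    w1 = sum(r for r, (_, lab) in enumerate(pooled, start=1) if lab == "x")
    w2 = sum(r for r, (_, lab) in enumerate(pooled, start=1) if lab == "y")
    # Mann-Whitney counts: compare every x_i with every y_j.
    u1 = sum(1 for x in xs for y in ys if x > y)
    u2 = sum(1 for x in xs for y in ys if x < y)
    return w1, w2, u1, u2

# Hypothetical samples, used only to exercise the function.
xs = [12.1, 15.3, 9.8, 17.0]
ys = [8.4, 11.2, 10.5]
w1, w2, u1, u2 = rank_sum_and_u(xs, ys)
n1, n2 = len(xs), len(ys)
print(w1, w2, u1, u2)
print(u1 == w1 - n1 * (n1 + 1) // 2, u2 == w2 - n2 * (n2 + 1) // 2)
```

Note also that $u_1 + u_2 = n_1 n_2$ when there are no ties, since every pair is counted exactly once.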

The distribution of $U_1$ turns out to be symmetric about $n_1 n_2 / 2$, and in fact $U_2$ has the same distribution as $U_1$. Tail probabilities for this distribution are in Table A.11. So we define $U := U_1 = U_2$ (equal in distribution).

So given $x_1, \ldots, x_{n_1}$, $y_1, \ldots, y_{n_2}$, to test:

$$H_0: F_1 = F_2 \qquad H_1: F_1 < F_2$$

Steps 1 and 2. Compute $u_1$ = number of pairs in which $x_i > y_j$, or

$$u_1 = w_1 - \frac{n_1(n_1 + 1)}{2},$$

where, remember, $w_1$ is the sum of the ranks of the $x_i$'s.

Step 3. Reject $H_0$ when $u_1 \ge u_{n_1, n_2, \alpha}$ (using the table), or compute:

$$\text{pvalue} = P(U \ge u_1) = P(U \le u_2),$$

and reject if it's less than $\alpha$. (If $n_1$ and $n_2$ are large, we can approximate the distribution of $U$ under $H_0$ by a normal distribution.)

Example
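For small samples, the tail probabilities can be computed directly instead of read from a table. The sketch below relies on the standard fact (not derived in these notes) that under $H_0$ every choice of $n_1$ ranks out of $N = n_1 + n_2$ for the $x$ sample is equally likely. The function names and the values $n_1 = 4$, $n_2 = 3$, $u_1 = 10$ are illustrative, and there are assumed to be no ties.

```python
from itertools import combinations

def u_null_distribution(n1, n2):
    """Null distribution of U1, enumerating all rank sets for the x sample."""
    N = n1 + n2
    counts = {}
    for x_ranks in combinations(range(1, N + 1), n1):
        u1 = sum(x_ranks) - n1 * (n1 + 1) // 2    # u1 = w1 - n1(n1+1)/2
        counts[u1] = counts.get(u1, 0) + 1
    total = sum(counts.values())                  # equals C(N, n1)
    return {u: c / total for u, c in sorted(counts.items())}

def u_pvalue(u1, n1, n2):
    """Upper-tail pvalue P(U >= u1) under H0."""
    null = u_null_distribution(n1, n2)
    return sum(p for u, p in null.items() if u >= u1)

null = u_null_distribution(4, 3)
pvalue = u_pvalue(10, 4, 3)
print(pvalue)
```

The enumeration grows as $\binom{N}{n_1}$, so this is suitable only for small samples; for larger ones the normal approximation mentioned above is the practical route.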

MIT OpenCourseWare
http://ocw.mit.edu

15.075J / ESD.07J Statistical Thinking and Data Analysis
Fall 2011

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.