Nonparametric Inference


Maura, Department of Economics and Finance, Università Tor Vergata


Inverse distribution function. Theorem: Let U be a uniform random variable on (0, 1), and let X be a continuous random variable with cumulative distribution function (CDF) F(x). Define Y = F^{-1}(U). Then Y has CDF equal to F.
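This theorem is the basis of inverse transform sampling: pass Uniform(0, 1) draws through F^{-1} to obtain draws from F. A minimal sketch in Python, using the Exponential(1) distribution as an illustrative target (the function names are my own, not from the slides):

```python
import math
import random

def inverse_transform_sample(inv_cdf, n, rng):
    """Draw n samples from the distribution whose inverse CDF is inv_cdf,
    by applying it to Uniform(0, 1) draws (Y = F^{-1}(U) has CDF F)."""
    return [inv_cdf(rng.random()) for _ in range(n)]

# Exponential(1): F(x) = 1 - exp(-x), hence F^{-1}(u) = -log(1 - u)
def exp_inv_cdf(u):
    return -math.log(1.0 - u)

rng = random.Random(0)
samples = inverse_transform_sample(exp_inv_cdf, 100_000, rng)
mean = sum(samples) / len(samples)  # should be close to the true mean, 1
```

With 100,000 draws the sample mean lands very close to the theoretical mean of the Exponential(1) distribution, which is 1.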


Why nonparametric statistics? While in many situations parametric assumptions are reasonable (e.g. the assumption of a Normal distribution for background noise), we often have no prior knowledge of the underlying distributions. In such situations, the use of parametric statistics can give misleading or even wrong results. We need statistical procedures that are insensitive to the model assumptions, in the sense that they retain their properties in a neighborhood of those assumptions.

What is nonparametric inference? The basic idea of nonparametric inference is to use data to infer an unknown quantity while making as few assumptions as possible. Usually, this means using statistical models that are infinite-dimensional; indeed, a better name for nonparametric inference might be infinite-dimensional inference. It is difficult, however, to give a precise definition. For the purposes of this course, we will use the phrase nonparametric inference to refer to a set of modern statistical methods that aim to keep the underlying assumptions as weak as possible.

What is the advantage of nonparametric statistics? The rapid and continuous development of nonparametric statistical procedures over the past six decades is due to the following advantages enjoyed by nonparametric techniques:
- They require few assumptions about the underlying populations from which the data are obtained.
- They enable the user to obtain exact p-values for tests, exact coverage probabilities for confidence regions, and exact experimentwise error rates for multiple comparison procedures.
- They are (often) easy to understand.
- They are usually only slightly less efficient than their normal-theory competitors when the underlying populations are normal, and they can be mildly or wildly more efficient than those competitors when the underlying populations are not normal.
- They are insensitive to outliers.

What is the advantage of nonparametric statistics? Because many nonparametric procedures require only the ranks of the observations, rather than their actual magnitudes, they are applicable in many situations where normal-theory procedures cannot be used.

The empirical distribution function. We begin with the problem of estimating a CDF (cumulative distribution function). Suppose X ~ F, where F(x) = P(X <= x) is a distribution function. The empirical distribution function F̂ is the CDF that puts mass 1/n at each data point x_i:

F̂(x) = (1/n) * sum_{i=1}^{n} I(x_i <= x),

where I is the indicator function.
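The definition translates directly into code. A sketch in Python (`ecdf` is a hypothetical helper name, not from the slides):

```python
import bisect

def ecdf(data):
    """Return the empirical CDF F_hat of the sample as a callable:
    F_hat(x) = (1/n) * #{i : x_i <= x}."""
    xs = sorted(data)
    n = len(xs)
    def F_hat(x):
        # bisect_right counts the observations <= x in the sorted sample
        return bisect.bisect_right(xs, x) / n
    return F_hat

F = ecdf([3.0, 1.0, 4.0, 1.0, 5.0])
F(0.0)   # 0.0  (no observations <= 0)
F(1.0)   # 0.4  (two of the five observations are <= 1)
F(10.0)  # 1.0  (all observations are <= 10)
```

Sorting once and using binary search makes each evaluation O(log n), which matters when the ECDF is evaluated at many points.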

Properties of F̂. At any fixed value of x,

E(F̂(x)) = F(x) and Var(F̂(x)) = (1/n) F(x)(1 − F(x)).

These two facts imply that F̂(x) → F(x) in probability. An even stronger convergence result is given by the Glivenko–Cantelli theorem:

sup_x |F̂(x) − F(x)| → 0 almost surely.
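The Glivenko–Cantelli convergence is easy to see in simulation. The sketch below (my own illustration, not from the slides) uses Uniform(0, 1) data, for which F(x) = x and the supremum is attained at the order statistics:

```python
import random

def ks_sup_uniform(n, rng):
    """D_n = sup_x |F_hat(x) - x| for a Uniform(0,1) sample of size n.
    F_hat jumps only at the order statistics u_(1) <= ... <= u_(n), so
    D_n = max_i max( (i+1)/n - u_(i), u_(i) - i/n )  (0-based i)."""
    u = sorted(rng.random() for _ in range(n))
    return max(max((i + 1) / n - u[i], u[i] - i / n) for i in range(n))

rng = random.Random(42)
d_small = ks_sup_uniform(100, rng)
d_large = ks_sup_uniform(100_000, rng)
# d_large should be far smaller than d_small: sup_x |F_hat - F| -> 0 a.s.
```

Typical magnitudes are on the order of 1/sqrt(n), so multiplying n by 1000 shrinks the supremum by roughly a factor of 30.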

Nonparametric test. In order to employ the test proposed below, we make the supplementary (but mild) assumption that F is continuous. The hypothesis to be tested is H_0: F(x) = F_0(x), a given continuous d.f., against the alternative H_1: F(x) ≠ F_0(x) (in the sense that F(x) ≠ F_0(x) for at least one x). Define the random variable D_n as

D_n = sup_x |F̂(x) − F_0(x)|.

Kolmogorov test. Idea: if the difference between the sample and the hypothesized distribution functions is large, the null hypothesis H_0 is rejected. Statistic: the probability distribution of D_n is not one of the well-known models; its probabilities are given in a special table for small n, while an asymptotic result is applied for large n. Rule: critical region of the form D_n ≥ k.
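Computing D_n itself is straightforward once the sample is sorted: F̂ jumps only at the observations, so the supremum must occur at one of the observed values x_(i) or just to its left. A sketch (the helper name is my own):

```python
def ks_one_sample_stat(data, F0):
    """D_n = sup_x |F_hat(x) - F0(x)|.  Comparing F0 against the ECDF
    values i/n and (i+1)/n at each order statistic covers the supremum."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        d = max(d, abs((i + 1) / n - F0(x)), abs(F0(x) - i / n))
    return d

# Testing H0: F = Uniform(0, 1), i.e. F0(x) = x on [0, 1]
ks_one_sample_stat([0.1, 0.2, 0.3, 0.4, 0.5], lambda x: x)  # 0.5
```

For this toy sample the largest gap is at the top observation, where F̂ reaches 1.0 while F_0(0.5) = 0.5.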

Kolmogorov one-sample test. For this determination to be possible, we need to know the distribution of D_n under H_0, or of some known multiple of it. It has been shown in the literature that

P(√n · D_n ≤ x | H_0) → sum_{j=−∞}^{∞} (−1)^j e^{−2 j² x²}, x > 0, as n → ∞.

Thus for large n, the right-hand side of this equation may be used to determine the critical region. The test described above is known as the Kolmogorov one-sample test.
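The asymptotic series is easy to evaluate numerically: for x away from 0 a few terms suffice. A sketch (the function name is my own), which recovers the familiar large-sample 5% critical value of about 1.358:

```python
import math

def kolmogorov_cdf(x, terms=100):
    """Asymptotic null CDF of sqrt(n) * D_n:
    P(sqrt(n) D_n <= x) -> sum_{j=-inf..inf} (-1)^j exp(-2 j^2 x^2)
                         = 1 - 2 * sum_{j>=1} (-1)^(j-1) exp(-2 j^2 x^2)."""
    if x <= 0:
        return 0.0
    s = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * x * x)
            for j in range(1, terms + 1))
    return 1.0 - 2.0 * s

kolmogorov_cdf(1.358)  # approximately 0.95, hence the 5% critical value
```

In practice one rejects H_0 at level α when √n · D_n exceeds the (1 − α)-quantile of this limiting distribution.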

Kolmogorov–Smirnov two-sample test. The testing problem just described is of limited practical importance. What arises naturally in practice are problems of the following type: let X_i, i = 1, ..., m be i.i.d. random variables with continuous but unknown d.f. F, and let Y_j, j = 1, ..., n be i.i.d. random variables with continuous but unknown d.f. G. The two random samples are assumed to be independent, and the hypothesis of interest is H_0: F = G. One possible alternative is H_1: F ≠ G (in the sense that F(x) ≠ G(x) for at least one x ∈ R).
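The two-sample statistic compares the two empirical distribution functions, D_{m,n} = sup_x |F̂_m(x) − Ĝ_n(x)|; since both step functions jump only at observed points, the supremum is attained at one of the pooled observations. A sketch (the function name is my own):

```python
import bisect

def ks_two_sample_stat(xs, ys):
    """D_{m,n} = sup_x |F_hat_m(x) - G_hat_n(x)|, evaluated at every
    pooled observation, where the sup of the two step functions occurs."""
    xs, ys = sorted(xs), sorted(ys)
    m, n = len(xs), len(ys)
    d = 0.0
    for t in xs + ys:
        f = bisect.bisect_right(xs, t) / m   # F_hat_m(t)
        g = bisect.bisect_right(ys, t) / n   # G_hat_n(t)
        d = max(d, abs(f - g))
    return d

ks_two_sample_stat([1, 2, 3], [1, 2, 3])  # 0.0 (identical samples)
ks_two_sample_stat([1, 2, 3], [10, 11])   # 1.0 (completely separated)
```

Partially overlapping samples give intermediate values; for example, [1, 2, 3, 4] against [3, 4, 5, 6] gives D = 0.5.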


Robustness. Any statistical procedure should possess the following desirable features:
- It has reasonably high relative efficiency under the assumed model.
- It is robust, in the sense that small deviations from the assumed model assumptions impair its performance only slightly.
- Somewhat larger deviations from the model should not cause a catastrophe.

Robustness. In addition to the classical concept of efficiency, new concepts are introduced to describe the local stability of a statistical procedure (the influence function and derived quantities) and its global reliability or safety (the breakdown point).

Sample median. Let x_(1), x_(2), ..., x_(n) denote the sample in ascending order. Definition: the (sample or empirical) median, denoted by Me, is given by

Me = x_((n+1)/2) if n is odd, and Me = (x_(n/2) + x_(n/2+1)) / 2 if n is even.
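In code, the even-n case averages the two middle order statistics (a sketch; the function name is my own):

```python
def sample_median(data):
    """Sample median: the middle order statistic when n is odd,
    the average of the two middle order statistics when n is even."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("empty sample has no median")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                    # x_((n+1)/2) in 1-based notation
    return (xs[mid - 1] + xs[mid]) / 2    # (x_(n/2) + x_(n/2+1)) / 2

sample_median([3, 1, 2])     # 2
sample_median([4, 1, 3, 2])  # 2.5
```

Unlike the sample mean, the median is insensitive to outliers: replacing the largest observation by an arbitrarily large value leaves it unchanged.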