Lecture 2: Statistical Estimation and Testing

Transcription

1 Bioinformatics: In-depth PROBABILITY AND STATISTICS, Spring Semester 2012. Lecture 2: Statistical Estimation and Testing. Stefanie Muff

2 Problems in statistics

3 The three main questions in statistics are:
- Estimation: estimate the unknown value of θ, given observations of X. Question: what is the most likely value for θ?
- Testing: test a hypothesis about the unknown value of θ. Base acceptance/rejection upon observation of X. Question: is my hypothesis compatible with the observed data?
- Confidence intervals: give an interval of parameter values that explain the data reasonably well. Question: which parameters would be compatible with my data?
We will concentrate on the first two questions.

5 Given: a probability model X ~ P_θ. For example: X ~ Bin(100, p), but the probability p is unknown. How to obtain a guess of p? => Estimation! The collection x1, x2, ..., xn is called the (observed) sample of X1, X2, ..., Xn.

6 Estimator, estimate

7 Examples of Estimators

8 Desirable properties of estimators

10 Likelihood function for discrete RVs

11 Likelihood function for continuous RVs
The likelihood function for continuous random variables can be set equal to the density function:
$$L(x_1, x_2, \dots, x_n; \hat\theta) = f_X(x_1, x_2, \dots, x_n; \hat\theta),$$
where $f_X$ is the joint density of $(X_1, X_2, \dots, X_n)$. If $X_1, X_2, \dots, X_n$ are independent, then
$$L(x_1, x_2, \dots, x_n; \hat\theta) = f_{X_1}(x_1; \hat\theta)\, f_{X_2}(x_2; \hat\theta) \cdots f_{X_n}(x_n; \hat\theta).$$

12 Maximum likelihood estimator

13 Maximum likelihood estimate

14 Properties of MLEs

15 Example: ML for the binomial distribution

17 Compare this to the estimators on Slide 7: the ML estimator is ...!

18 The Log Likelihood

19 Likelihoods are not just for independent observations!

20 Example: Log likelihood for the binomial distribution
Instead of optimizing the likelihood itself, the log likelihood (written here up to an additive constant that does not depend on θ)
$$x_1 \log(\theta) + (100 - x_1)\log(1-\theta) + \dots + x_n \log(\theta) + (100 - x_n)\log(1-\theta) = \log(\theta) \sum_i x_i + \log(1-\theta) \sum_i (100 - x_i)$$
has to be optimized to obtain the ML estimator. The result is exactly the same as in the non-log case (check as an exercise).
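The claim that maximizing the log likelihood gives the same estimate can be checked numerically. The following is a minimal sketch, assuming a made-up sample `xs` of five counts (each out of 100 trials) and a crude grid search in place of a proper optimizer. The function name `binomial_log_likelihood` is illustrative and not from the lecture.

```python
import math

def binomial_log_likelihood(theta, xs, size=100):
    """Log likelihood of theta for independent Bin(size, theta) counts xs.

    The binomial coefficients are omitted because they do not depend on theta.
    """
    successes = sum(xs)
    failures = sum(size - x for x in xs)
    return math.log(theta) * successes + math.log(1.0 - theta) * failures

# Hypothetical sample: n = 5 observed counts, each out of 100 trials.
xs = [27, 31, 24, 30, 28]

# Crude grid search over theta in (0, 1) with step 0.001.
grid = [k / 1000 for k in range(1, 1000)]
theta_grid = max(grid, key=lambda t: binomial_log_likelihood(t, xs))

# Closed-form maximizer: total successes divided by total trials.
theta_closed = sum(xs) / (100 * len(xs))

print(theta_grid, theta_closed)
```

The grid maximizer and the closed-form value (total successes over total trials) should agree up to the grid resolution.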

21 Example: MLE for a normal distribution
Remember:
$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Given a set of n independent observations x1, x2, ..., xn, the log likelihood is
$$\log f(x_1, \dots, x_n; \mu, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$$
This expression has to be differentiated with respect to σ² and µ separately, and each derivative set to 0. => Obtain two equations to estimate two parameters. See example in the exercises.
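For reference, the two equations can be sketched as follows. This is the standard textbook derivation, not the worked solution from the exercises. It writes ℓ(µ, σ²) for the log likelihood and treats σ² as a single variable.

```latex
\begin{align*}
\frac{\partial \ell}{\partial \mu}
  &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0
  &&\Longrightarrow\quad \hat\mu = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}, \\
\frac{\partial \ell}{\partial \sigma^2}
  &= -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2 = 0
  &&\Longrightarrow\quad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 .
\end{align*}
```

Note that the resulting variance estimator divides by n, not by n − 1.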

22 MLE in practice
Analytical formulas for the ML estimator can be found only in relatively simple models. In other cases, approximate ML estimators can be found by
- iterative numerical optimization (Expectation-Maximization algorithm, Newton-Raphson algorithm)
- second-order Taylor approximations.
These calculations are left to the computer (R).
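To illustrate what such an iterative optimization does, here is a minimal Newton-Raphson sketch for a single binomial observation. This case does have a closed-form answer, so the result can be checked. The function name, starting value and tolerance are assumptions for illustration; in practice one would rely on an optimizer provided by the statistics software.

```python
def newton_raphson_binomial_mle(successes, trials, theta=0.5, tol=1e-12, max_iter=50):
    """Maximize l(theta) = s*log(theta) + (t - s)*log(1 - theta) by Newton-Raphson.

    Each step uses the first and second derivative of the log likelihood,
    i.e. it maximizes a second-order Taylor approximation around theta.
    """
    failures = trials - successes
    for _ in range(max_iter):
        score = successes / theta - failures / (1.0 - theta)              # l'(theta)
        curvature = -successes / theta**2 - failures / (1.0 - theta)**2   # l''(theta)
        step = score / curvature
        theta -= step
        if abs(step) < tol:
            break
    return theta

# Illustrative data: 11 successes in 26 trials.
theta_hat = newton_raphson_binomial_mle(successes=11, trials=26)
print(theta_hat, 11 / 26)
```

The iteration should converge to the relative frequency 11/26, which is the analytical ML estimate for this model.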

23 Statistical Testing

25 Introductory example revisited
g a g g a t t a c g g t a c t a g a t t c a t a a a
c a c t g a c a c a t c a c t g c a c t c g c t a a
Two DNA sequences of length 26. Matches at 11 of 26 positions. Is this sufficient to conclude that the two sequences are evolutionarily related? In order to answer this question, we have to find out how unlikely it would be to see 11 out of 26 matches by chance. We need to know the probability distribution of the random variable describing this experiment. We can then calculate the probability of the event. This is the essence of statistical testing.
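The match count can be reproduced with a few lines of code. This sketch assumes that the 52 letters on the slide are the two sequences written one after the other (the first 26 letters and the last 26 letters). The variable names are illustrative.

```python
# The two sequences from the slide, written without spaces.
seq1 = "gaggattacggtactagattcataaa"
seq2 = "cactgacacatcactgcactcgctaa"

# Count positions at which both sequences carry the same letter.
matches = sum(a == b for a, b in zip(seq1, seq2))

print(len(seq1), len(seq2), matches)  # 26 26 11
```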

26 Steps in a statistical test
1. Formulate null and alternative hypotheses H0 and H1.
2. Determine a test statistic T.
3. Determine the distribution of T under H0.
4. Choose the significance level α.
5. Calculate the critical value C.
6. Obtain the data and decide.
For illustration, we now go through steps 1-6 for the binomial test.

27 1. Formulate the hypotheses
A hypothesis typically specifies the value of a parameter in a distribution. Here: X ~ Bin(26, p), but p is not known.
The null hypothesis H0 is the default hypothesis: H0: X ~ Bin(26, p), p = 0.25
The alternative hypothesis H1 is the controversial hypothesis. Strong evidence is needed to accept it instead of H0: H1: X ~ Bin(26, p), p > 0.25
Aim of a test: to find evidence against H0 in order to reject it.

28 2. Determine a test statistic
A test statistic T is a numerical value that can be determined from the outcome of a chance experiment. Note that, by definition, T is a random variable as well! Here, T = number of matches between the two sequences (= X). (There is only one realization.) Usually there is more than one realization in a random sample, and the test statistic depends on all realizations. Other examples:
$$T(X_1, \dots, X_n) = \frac{X_1 + \dots + X_n}{n} = \bar{X} \quad \text{(mean)}$$
$$T(X_1, \dots, X_n) = \frac{\bar{X} - \mu_0}{\hat\sigma / \sqrt{n}} \quad \text{(T-statistic)}$$
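As a small illustration of test statistics that depend on all realizations, the mean and the T-statistic can be computed as follows. The sample `xs` and the value `mu0` are made up. The sketch also assumes that σ̂ is the usual sample standard deviation with denominator n − 1, which the slide does not specify.

```python
import math
import statistics

def sample_mean(xs):
    """Mean of the realizations: (x_1 + ... + x_n) / n."""
    return sum(xs) / len(xs)

def t_statistic(xs, mu0):
    """(mean - mu0) / (sigma_hat / sqrt(n)), with sigma_hat the sample
    standard deviation (n - 1 in the denominator; an assumption here)."""
    n = len(xs)
    sigma_hat = statistics.stdev(xs)
    return (sample_mean(xs) - mu0) / (sigma_hat / math.sqrt(n))

# Hypothetical measurements and hypothesized mean.
xs = [5.1, 4.9, 5.6, 5.8, 5.2]
print(sample_mean(xs), t_statistic(xs, mu0=5.0))
```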

29 3. Distribution of T under H0
In case of H0 (pure chance alignment), the distribution of T is T ~ Bin(26, 0.25). (Note that in reality Bin(26, p) is not the right distribution for this problem; we only use it to illustrate the idea of statistical testing.)

30 4. Choose the significance level α
In our example we reject H0 if the number of matches is too high, so that it is unlikely to happen by chance. α determines what "unlikely" means. Let us choose α = 0.05. The significance level α fixes the probability with which H0 is rejected although it is true. Interpretation: In 5% of the cases in which H0 holds (1 out of 20) we will find a value of T so high that we do not believe it has happened by chance, although it did! α = probability of rejecting a true null hypothesis = probability of making a type I error.

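31 5. Calculate the critical value
We now calculate a value C for the test statistic T, at or above which we consider it unlikely that H0 is true. C is the smallest value with
$$P(T \ge C \mid H_0) \le \alpha$$
(for a discrete T, equality with α can usually not be attained exactly). In our example with H0: T = X ~ Bin(26, 0.25), the tail probabilities P(X ≥ k | H0) are evaluated for k = 7, 8, 9, 10, 11, and k = 11 is the first value for which the probability does not exceed α = 0.05 => C = 11!
The tail probabilities are calculated as
$$P(X \ge k \mid H_0) = \sum_{i=k}^{26} \binom{26}{i}\, 0.25^{i}\, 0.75^{26-i}$$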
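The tail probabilities and the critical value for the alignment example can be checked with a short script. This is a sketch using only the standard library. The helper names `binom_tail` and `critical_value` are illustrative and do not refer to functions from R or any other package.

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Bin(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def critical_value(n, p0, alpha):
    """Smallest c with P(X >= c | H0) <= alpha (c = n + 1 always qualifies)."""
    for c in range(n + 2):
        if binom_tail(c, n, p0) <= alpha:
            return c

# Tail probabilities under H0: X ~ Bin(26, 0.25).
for k in range(7, 12):
    print(k, round(binom_tail(k, 26, 0.25), 4))

print("C =", critical_value(26, 0.25, 0.05))
```

Because T is discrete, the attained level P(X ≥ C | H0) is somewhat below the nominal α = 0.05.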

32 6. Decide
Only now is it finally allowed to calculate the value of T. Here, we already know that T = 11, since X = T. From step 5 we have the following rule: Reject H0 if T ≥ 11 and do not reject H0 if T < 11. Decision: we reject H0. Thus we do not believe that 11 out of 26 matches happened by chance. We say: There is statistical evidence that the two sequences are related due to evolution.

33 Statistical significance
Note: The decision to reject H0 on the previous slide depends on the significance level α. We would not have rejected H0 if α < 0.04! Whether the outcome of an experiment is statistically significant or not depends crucially on α! For α=1 any result is significant... (but meaningless). Scientific results that claim statistical significance without giving α should at least be doubted...

34 p-values
The p-value is the probability, under H0, of seeing something at least as extreme as what was just observed. It depends on the data. In our example: P(X ≥ 11 | H0) ≈ 0.04. Thus the p-value of our experiment is p ≈ 0.04. Many statistics programmes (R, SPSS, ...) compute this directly. Your results are then significant if p < α. Interpretation: The p-value tells you for which α your data would be significant.
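A minimal sketch of the p-value calculation for the alignment example, together with the decision at a few significance levels, could look as follows. The function name `upper_p_value` and the list of α values are assumptions for illustration. Statistics packages offer ready-made routines for this.

```python
from math import comb

def upper_p_value(observed, n, p0):
    """One-sided p-value P(X >= observed) for X ~ Bin(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i)
               for i in range(observed, n + 1))

p_value = upper_p_value(observed=11, n=26, p0=0.25)
print("p-value:", round(p_value, 4))

# The same data are significant for some alpha and not for others.
decisions = {alpha: p_value < alpha for alpha in (0.10, 0.05, 0.04, 0.01)}
print(decisions)
```

This makes the interpretation above concrete: the data are significant exactly for those α that exceed the p-value.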

35 Type I and type II errors
A type I error is the rejection of the null hypothesis although it is true. Its probability is fixed by the significance level: P(reject H0 | H0 is true) = α.
A type II error is the other kind of false decision: the null hypothesis is not rejected although it is wrong. Its probability is β = P(do not reject H0 | H1 is true).

36 The power of a test
The power is typically more complicated to compute, especially if H1 is unknown.

37 Example

38 BUT if we had chosen α = 0.01, the power (1 − β) would be lower! E.g. 1 − β = P(X ≥ 11 | p = 0.26) and β = P(X < 11 | p = 0.3).

39 Fact: A decrease of the type I error comes at the expense of an increased type II error, and vice versa. There is a trade-off between a low significance level α and high power 1 − β.
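The trade-off can be made concrete for the alignment example. The sketch below assumes a specific alternative p = 0.3 (chosen only for illustration) and compares α = 0.05 with α = 0.01. The helper names are hypothetical.

```python
from math import comb

def upper_tail(k, n, p):
    """P(X >= k) for X ~ Bin(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def critical_value(n, p0, alpha):
    """Smallest c with P(X >= c | p0) <= alpha."""
    return next(c for c in range(n + 2) if upper_tail(c, n, p0) <= alpha)

def power(n, p0, p1, alpha):
    """Probability of rejecting H0: p = p0 when the true value is p1."""
    return upper_tail(critical_value(n, p0, alpha), n, p1)

n, p0, p1 = 26, 0.25, 0.3  # p1 is an assumed alternative
for alpha in (0.05, 0.01):
    c = critical_value(n, p0, alpha)
    pw = power(n, p0, p1, alpha)
    print(f"alpha={alpha}: C={c}, power={pw:.3f}, beta={1 - pw:.3f}")
```

Lowering α raises the critical value, so fewer outcomes lead to rejection and the type II error probability β grows.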

40 [Figure: Bin(20, 0.25) and Bin(20, 0.3) distributions, f(x) plotted against x. Power if H0: p = 0.25, H1: p = 0.3.]

41 [Figure: Bin(20, 0.25) and Bin(20, 0.6) distributions, f(x) plotted against x. Power if H0: p = 0.25, H1: p = 0.6.]
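The quantities behind these two figures can be recomputed. The significance level used on the slides is not preserved, so the sketch below assumes α = 0.05. It determines the rejection region under H0: p = 0.25 and then sums the probability mass that each alternative places on that region.

```python
from math import comb

def pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p0 = 20, 0.25
alpha = 0.05  # assumption: the value on the slide is not preserved

# Smallest c with P(X >= c | H0) <= alpha; the rejection region is {c, ..., n}.
c = next(k for k in range(n + 2)
         if sum(pmf(i, n, p0) for i in range(k, n + 1)) <= alpha)

def power(p1):
    """Mass that Bin(n, p1) places on the rejection region."""
    return sum(pmf(i, n, p1) for i in range(c, n + 1))

for p1 in (0.3, 0.6):
    print(f"H1: p={p1}: reject if X >= {c}, power={power(p1):.3f}")
```

The further the alternative lies from H0, the more of its mass falls into the rejection region, and the higher the power.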

42 Bayesian Hypothesis Testing
Remember Bayes' theorem:
$$P(A_j \mid B) = \frac{P(B \mid A_j)\, P(A_j)}{\sum_{i=1}^{n} P(B \mid A_i)\, P(A_i)}$$
Example (from Ewens/Grant): A bag contains 10 coins, of which only 3 are fair. The other 7 show heads with probability ph = 0.6. Take one coin at random and flip it five times. All five flips give heads (event D). Then:
P(H) = 0.3 (prior probability that the coin is fair)
P(H^c) = 0.7 (prior probability that the coin is unfair)
P(D | H) = 0.5^5
P(D | H^c) = 0.6^5

43 Now, the posterior probability that the coin was fair, given the outcome, can be calculated:
$$P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D \mid H)\, P(H) + P(D \mid H^c)\, P(H^c)} = 0.147$$
This is lower than the prior probability of H, so the data are evidence against H. Moreover: P(H^c | D) = 1 − 0.147 = 0.853. So there is a much higher posterior probability (given the outcome and the prior) that the coin I picked was unfair. The same setup works with multiple hypotheses H1, H2, ..., Hn. Identical calculations as above lead to posterior probabilities, and the hypothesis with the highest posterior is chosen.
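The posterior calculation carries over directly to any number of hypotheses. The following sketch uses a hypothetical helper `posterior` and applies it to the coin example. The likelihood 0.6^5 for the unfair coin follows from ph = 0.6 and five heads in a row.

```python
def posterior(priors, likelihoods):
    """Posterior probabilities P(H_i | D) from priors P(H_i) and
    likelihoods P(D | H_i), via Bayes' theorem."""
    joint = [prior * lik for prior, lik in zip(priors, likelihoods)]
    evidence = sum(joint)  # P(D), the denominator in Bayes' theorem
    return [j / evidence for j in joint]

priors = [0.3, 0.7]              # fair coin, unfair coin
likelihoods = [0.5**5, 0.6**5]   # probability of five heads under each hypothesis

post_fair, post_unfair = posterior(priors, likelihoods)
print(round(post_fair, 3), round(post_unfair, 3))  # 0.147 0.853
```

With more than two hypotheses, the same function returns one posterior per hypothesis, and the one with the largest value would be chosen.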

44 Other statistical tests
There is a large variety of statistical tests. The choice of the correct test depends on the type and quality of the data, the assumptions and the question to be answered. Examples:
- z-test
- t-test
- sign-test
- Wilcoxon-test
- Mann-Whitney / U-test
- χ² goodness-of-fit test / χ² test for independence
- ...

45 The z-test
The simplest version of a z-test:
One-sample problem
Situation: Given n independent measurements Xi, 1 ≤ i ≤ n. Question: Is the expected value E[X] = µ equal to, larger than or lower than some theoretical value µ0?
Paired two-sample problem
Situation: Given n independent pairs of measurements Yi and Zi, 1 ≤ i ≤ n, of the same feature in two different states. E.g., the blood pressure of each person is measured before and after the intake of a special drug. Question: Is there a significant difference between the two states? I.e., is the difference Xi = Yi - Zi ≠ 0 (or < 0, > 0) or, equivalently: is E[X] ≠ 0?

46 Assumptions
In the z-test it is assumed that Xi ~ N(µ_X, σ²). Thus the measurements should follow a normal distribution. Moreover, the variance σ² of Xi is known.

47 1. Hypotheses
H0: Xi ~ N(µ0, σ0²), 1 ≤ i ≤ n, independent, with known variance σ0²
H1: Xi ~ N(µ1, σ0²), 1 ≤ i ≤ n, independent, with known variance σ0²
with either µ1 > µ0, µ1 < µ0 or µ0 ≠ µ1
2. Test statistic
$$Z = \frac{\bar{X} - \mu_0}{\sigma_0 / \sqrt{n}}$$
3. Distribution of Z under H0
Z ~ N(0, 1)

48 4. Choose the significance level α
E.g., α = 5% (or a lower level, if stronger significance is needed).
5. Calculate the critical value
The values can be looked up in a table. The most important ones (for the α = 5% level) are given here:
µ1 > µ0: c = 1.64 => H0 is rejected if Z > 1.64; with R: > qnorm(0.95)
µ1 < µ0: c = -1.64 => H0 is rejected if Z < -1.64; with R: > qnorm(0.05)
µ0 ≠ µ1: c = 1.96 => H0 is rejected if |Z| > 1.96; with R: > qnorm(0.975)
Where do these values come from...?
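The critical values are quantiles of the standard normal distribution, which is what the R calls to qnorm return. As a sketch, the same numbers can be obtained with `statistics.NormalDist` from the Python standard library. The measurements `xs`, the hypothesized mean `mu0` and the known standard deviation `sigma0` below are made up for illustration.

```python
import math
from statistics import NormalDist

def z_statistic(xs, mu0, sigma0):
    """Z = (mean - mu0) / (sigma0 / sqrt(n)) with known standard deviation sigma0."""
    n = len(xs)
    return (sum(xs) / n - mu0) / (sigma0 / math.sqrt(n))

std_normal = NormalDist()            # N(0, 1)
c_upper = std_normal.inv_cdf(0.95)   # one-sided, mu1 > mu0
c_lower = std_normal.inv_cdf(0.05)   # one-sided, mu1 < mu0
c_two = std_normal.inv_cdf(0.975)    # two-sided, reject if |Z| > c_two
print(round(c_upper, 2), round(c_lower, 2), round(c_two, 2))

# Hypothetical data with assumed known sigma0.
xs = [5.1, 4.9, 5.6, 5.8, 5.2]
z = z_statistic(xs, mu0=5.0, sigma0=0.4)
print("Z =", round(z, 3),
      "| reject (mu1 > mu0):", z > c_upper,
      "| reject (two-sided):", abs(z) > c_two)
```

For the same data, the one-sided and the two-sided test can lead to different decisions, since the two-sided test splits α over both tails.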

49 One-sided test
[Figures: N(0,1) density f(x) plotted against x, once for µ1 > µ0 and once for µ1 < µ0, each with a rejection range of 5% in one tail.]

50 Two-sided test
[Figure: N(0,1) density f(x) plotted against x for µ0 ≠ µ1, with a rejection range of 2.5% in each tail.]