University of Ljubljana
Doctoral Programme in Statistics
Methodology of Statistical Research
Written examination, February 14th, 2014

Name and surname: _______________________     ID number: _______________________

Instructions: Read the wording of each problem carefully before you start. There are four problems altogether. You may use an A4 sheet of paper and a mathematical handbook. Please write all the answers on the sheets provided. You have two hours.

Grading table (for the examiner):

Problem   a.   b.   c.   d.
1.
2.
3.
4.
Total

1. (20) Suppose a population of size N is divided into M subpopulations of size K, so that N = MK. A sample is selected in two steps: first, m subpopulations are selected from among the M by simple random sampling; in the second step, k units are selected by simple random sampling within each selected subpopulation. The final sample is of size n = mk.

a. (5) Is the sample mean an unbiased estimate of the population mean? Explain.

Solution: Every unit in the population is selected with the same probability, namely (m/M)(k/K) = n/N. This means that the sample average is an unbiased estimate of the population mean.

b. (5) For j = 1, 2, ..., M denote by $\mu_j$ the j-th subpopulation mean and by $\sigma_j^2$ the population variance in the j-th subpopulation, let

$$I_j = \begin{cases} 1 & \text{if the $j$-th subpopulation is selected,} \\ 0 & \text{otherwise,} \end{cases}$$

and let $\bar X_1, \bar X_2, \ldots, \bar X_M$ be the sample means of the samples selected in the subpopulations. Assume that $\bar X_1, \ldots, \bar X_M$ are independent and independent of $I_1, \ldots, I_M$. Argue that the sample mean can be written as

$$\bar X = \frac{1}{m}\left(\bar X_1 I_1 + \bar X_2 I_2 + \cdots + \bar X_M I_M\right).$$

Show that

$$\operatorname{var}(\bar X_j I_j) = \frac{m}{M}\operatorname{var}(\bar X_j) + \frac{m}{M}\left(1 - \frac{m}{M}\right)\mu_j^2$$

and, for $j \ne l$,

$$\operatorname{cov}(\bar X_j I_j, \bar X_l I_l) = \frac{m}{M}\,\mu_j \mu_l \left(\frac{m-1}{M-1} - \frac{m}{M}\right).$$

Solution: We know that

$$\operatorname{var}(\bar X_j) = \frac{\sigma_j^2}{k}\cdot\frac{K-k}{K-1}$$

and

$$\operatorname{cov}(I_j, I_l) = E(I_j I_l) - E(I_j)E(I_l) = \frac{m(m-1)}{M(M-1)} - \frac{m^2}{M^2} = \frac{m}{M}\left(\frac{m-1}{M-1} - \frac{m}{M}\right).$$

We compute

$$\operatorname{var}(\bar X_j I_j) = E(\bar X_j^2 I_j^2) - \left[E(\bar X_j I_j)\right]^2 = E(\bar X_j^2)E(I_j) - \left[E(\bar X_j)\right]^2\left[E(I_j)\right]^2 = \left(\operatorname{var}(\bar X_j) + \mu_j^2\right)\frac{m}{M} - \frac{m^2}{M^2}\,\mu_j^2 = \frac{m}{M}\operatorname{var}(\bar X_j) + \frac{m}{M}\left(1 - \frac{m}{M}\right)\mu_j^2$$

and

$$\operatorname{cov}(\bar X_j I_j, \bar X_l I_l) = E(\bar X_j I_j \bar X_l I_l) - E(\bar X_j I_j)E(\bar X_l I_l) = E(\bar X_j)E(\bar X_l)E(I_j I_l) - E(\bar X_j)E(I_j)E(\bar X_l)E(I_l) = \mu_j\mu_l\operatorname{cov}(I_j, I_l) = \mu_j\mu_l\left(\frac{m(m-1)}{M(M-1)} - \frac{m^2}{M^2}\right) = \frac{m}{M}\,\mu_j\mu_l\left(\frac{m-1}{M-1} - \frac{m}{M}\right).$$

c. (10) Show that

$$\operatorname{var}(\bar X) = \frac{1}{mM}\sum_{j=1}^{M}\operatorname{var}(\bar X_j) + \frac{M-m}{mM(M-1)}\sum_{j=1}^{M}(\mu_j - \mu)^2,$$

where $\mu$ is the population mean. Assume as known that

$$\sum_{j=1}^{M}\mu_j^2 - \frac{2}{M-1}\sum_{j<l}\mu_j\mu_l = \frac{M}{M-1}\sum_{j=1}^{M}(\mu_j - \mu)^2.$$

Solution: We have

$$\operatorname{var}(\bar X) = \frac{1}{m^2}\operatorname{var}\left(\bar X_1 I_1 + \bar X_2 I_2 + \cdots + \bar X_M I_M\right) = \frac{1}{m^2}\left[\sum_{j=1}^{M}\operatorname{var}(\bar X_j I_j) + 2\sum_{j<l}\operatorname{cov}(\bar X_j I_j, \bar X_l I_l)\right]$$

$$= \frac{1}{m^2}\left[\frac{m}{M}\sum_{j=1}^{M}\operatorname{var}(\bar X_j) + \frac{m}{M}\left(1 - \frac{m}{M}\right)\sum_{j=1}^{M}\mu_j^2 - \frac{m}{M}\cdot\frac{M-m}{M(M-1)}\cdot 2\sum_{j<l}\mu_j\mu_l\right]$$

$$= \frac{1}{mM}\sum_{j=1}^{M}\operatorname{var}(\bar X_j) + \frac{M-m}{mM^2}\left[\sum_{j=1}^{M}\mu_j^2 - \frac{2}{M-1}\sum_{j<l}\mu_j\mu_l\right]$$

$$= \frac{1}{mM}\sum_{j=1}^{M}\operatorname{var}(\bar X_j) + \frac{M-m}{mM(M-1)}\sum_{j=1}^{M}(\mu_j - \mu)^2.$$

d. (5) How would you estimate the standard error from the data? Just give the idea, with no calculations.

Solution: For the quantities $\operatorname{var}(\bar X_j)$ we only have estimates for the m selected subpopulations. Multiplying their sum by M/m gives an estimate of $\sum_{j=1}^{M}\operatorname{var}(\bar X_j)$, and hence of the first term in the variance formula. The sum $\sum_{j=1}^{M}(\mu_j - \mu)^2$ could be estimated by

$$c\sum_{j\,\text{selected}}(\bar X_j - \bar X)^2$$

for some appropriate constant c. The square root of the resulting variance estimate is the estimated standard error.
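
As an informal check (not part of the original exam), the variance formula in part c can be compared with a simulation. The sketch below assumes numpy and uses an arbitrary synthetic population with arbitrary sizes M, K, m, k; it draws repeated two-stage samples and compares the empirical variance of the sample mean with the formula.

    # Monte Carlo check of the variance formula in part c (illustrative only).
    # The population below, the sizes M, K, m, k and the number of replications
    # are arbitrary assumptions, not taken from the exam.
    import numpy as np

    rng = np.random.default_rng(0)

    M, K = 50, 40          # M subpopulations of size K, so N = M * K
    m, k = 10, 8           # m subpopulations selected, k units within each

    # A fixed synthetic population with unequal subpopulation means.
    population = np.array([rng.normal(loc=0.3 * j, scale=2.0, size=K)
                           for j in range(M)])            # shape (M, K)

    mu_j = population.mean(axis=1)        # subpopulation means
    mu = mu_j.mean()                      # population mean (equal subpopulation sizes)
    sigma2_j = population.var(axis=1)     # subpopulation variances (divisor K)

    # var(Xbar_j) under simple random sampling without replacement within a subpopulation
    var_xbar_j = sigma2_j / k * (K - k) / (K - 1)

    var_formula = (var_xbar_j.sum() / (m * M)
                   + (M - m) / (m * M * (M - 1)) * ((mu_j - mu) ** 2).sum())

    # Empirical behaviour of the two-stage sample mean
    reps = 20_000
    means = np.empty(reps)
    for r in range(reps):
        chosen = rng.choice(M, size=m, replace=False)
        sample = np.concatenate([rng.choice(population[j], size=k, replace=False)
                                 for j in chosen])
        means[r] = sample.mean()

    print("population mean:", mu, " average of sample means:", means.mean())       # part a
    print("formula variance:", var_formula, " simulated variance:", means.var())   # part c

Up to Monte Carlo error the two printed variances should agree, and the average of the simulated sample means should be close to the population mean, in line with part a.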

2. (25) Let $x_1, x_2, \ldots, x_n$ be an i.i.d. sample from the distribution with density

$$f(x) = \frac{\lambda^2}{12}\,x\,e^{-\sqrt{\lambda x}}$$

for x > 0, where λ > 0.

a. (15) Find the Fisher information. Assume as known that

$$\int_0^\infty x^{3/2} e^{-\sqrt{\lambda x}}\,dx = \frac{48}{\lambda^{5/2}}.$$

Solution: The log-likelihood function is

$$l(\lambda \mid x) = 2\log\lambda - \log 12 + \log x - \sqrt{\lambda x}.$$

Taking the second derivative we get

$$l''(\lambda \mid x) = -\frac{2}{\lambda^2} + \frac{\sqrt{x}}{4\lambda^{3/2}}.$$

It follows that

$$I(\lambda) = -E\,l''(\lambda \mid X) = \frac{2}{\lambda^2} - \frac{1}{4\lambda^{3/2}}\,E\sqrt{X} = \frac{2}{\lambda^2} - \frac{1}{4\lambda^{3/2}}\cdot\frac{\lambda^2}{12}\int_0^\infty x^{3/2} e^{-\sqrt{\lambda x}}\,dx = \frac{2}{\lambda^2} - \frac{1}{4\lambda^{3/2}}\cdot\frac{\lambda^2}{12}\cdot\frac{48}{\lambda^{5/2}} = \frac{1}{\lambda^2}.$$

b. (10) Write explicitly the 99% confidence interval for λ on the basis of the data $x_1, x_2, \ldots, x_n$.

Solution: The log-likelihood function is

$$l(\lambda \mid x_1, \ldots, x_n) = 2n\log\lambda - n\log 12 + \sum_{k=1}^{n}\log x_k - \sqrt{\lambda}\sum_{k=1}^{n}\sqrt{x_k}.$$

Taking the derivative we get the equation

$$\frac{2n}{\lambda} - \frac{1}{2\sqrt{\lambda}}\sum_{k=1}^{n}\sqrt{x_k} = 0$$

with the solution

$$\hat\lambda = \left(\frac{4n}{\sum_{k=1}^{n}\sqrt{x_k}}\right)^2.$$

Since $I(\lambda) = 1/\lambda^2$, the standard error of $\hat\lambda$ is approximately $1/\sqrt{n\,I(\hat\lambda)} = \hat\lambda/\sqrt{n}$, so the 99% confidence interval is

$$\hat\lambda \pm 2.576\,\frac{\hat\lambda}{\sqrt{n}}.$$
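
As a numerical illustration (not part of the original exam), the sketch below simulates data from this density and computes the estimate and interval above. It relies on the observation that under this density $\sqrt{X}$ follows a Gamma distribution with shape 4 and rate $\sqrt{\lambda}$ (obtained by a change of variables in f); the true λ, the sample size and the use of numpy are arbitrary assumptions.

    # Illustration of the MLE and the 99% interval above (not part of the exam).
    # Under f(x) = lambda^2/12 * x * exp(-sqrt(lambda*x)), the variable sqrt(X)
    # has a Gamma(shape=4, rate=sqrt(lambda)) distribution, which is used to simulate.
    # The true lambda and the sample size n are arbitrary choices.
    import numpy as np

    rng = np.random.default_rng(1)
    lam_true, n = 2.0, 500

    y = rng.gamma(shape=4.0, scale=1.0 / np.sqrt(lam_true), size=n)  # y = sqrt(x)
    x = y ** 2

    lam_hat = (4 * n / np.sqrt(x).sum()) ** 2    # MLE from the score equation above
    se = lam_hat / np.sqrt(n)                    # since I(lambda) = 1 / lambda^2
    print("MLE:", lam_hat)
    print("99% CI:", (lam_hat - 2.576 * se, lam_hat + 2.576 * se))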

3. (20) The χ² statistic can be used to test whether a roulette wheel is unbiased. If $O_i$ is the number of observed occurrences of the number i and $E_i$ is the number of expected occurrences, we define

$$\chi^2 = \sum_{i=0}^{36}\frac{(O_i - E_i)^2}{E_i}.$$

Large values of the χ² statistic indicate that the roulette wheel is biased. We assume that individual spins are independent and that the probabilities are constant throughout the observation period. Suppose the gambling house tests all the wheels at the end of every month on the basis of the data collected in that month. The rule is that a wheel is examined more closely if the p-value is below 0.01.

a. (5) Suppose that for a roulette wheel we got the p-value p = 0.005. Can this happen with an unbiased wheel? With what probability?

Solution: Yes, it can happen: for an unbiased wheel a p-value of 0.005 or smaller occurs with probability 0.005.

b. (5) Suppose that for a roulette wheel the p-value was p = 0.23. Is this conclusive evidence that the wheel is unbiased? Explain.

Solution: No, it is not conclusive evidence. Failing to reject the null hypothesis does not prove it; the test may simply lack the power to detect a small bias with the amount of data collected.

c. (5) Suppose a gambling house has 100 roulette wheels which are tested every month on the basis of the data collected. Suppose all the wheels are unbiased. How many wheels per month would be examined on average over a long period of time? Explain.

Solution: The probability of examining any given unbiased wheel is 0.01, so on average 100 × 0.01 = 1 wheel per month would be examined.

d. (5) Suppose one of the wheels is biased. Is the probability that it will be examined more or less than 0.01? Explain.

Solution: More than 0.01: any sensible test has power exceeding its size, so a biased wheel produces a p-value below 0.01 with higher probability than an unbiased one.
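
A small simulation (not part of the original exam) illustrates part c: with 100 unbiased wheels tested monthly at the 0.01 level, roughly one wheel per month is flagged on average. The number of spins per wheel per month is an arbitrary assumption, and numpy together with scipy is assumed to be available.

    # Simulation for part c (illustrative only): 100 unbiased wheels tested monthly
    # at the 0.01 level. The number of spins per wheel per month is an arbitrary choice.
    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(2)
    wheels, spins, months = 100, 10_000, 120

    flagged_per_month = []
    for _ in range(months):
        flagged = 0
        for _ in range(wheels):
            counts = rng.multinomial(spins, [1 / 37] * 37)     # pockets 0, 1, ..., 36
            expected = spins / 37
            stat = ((counts - expected) ** 2 / expected).sum()
            p = chi2.sf(stat, df=36)                           # 37 categories - 1
            flagged += p < 0.01
        flagged_per_month.append(flagged)

    print("average number of wheels flagged per month:", np.mean(flagged_per_month))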

4. (20) Assume the usual regression model

$$Y = X\beta + \epsilon.$$

Denote by $Y_{(i)}$ the vector Y with the i-th component deleted, and similarly $X_{(i)}$ and $\epsilon_{(i)}$. Let $\hat\beta_{(i)}$ be the least squares estimate of β with the i-th observation deleted, i.e.

$$\hat\beta_{(i)} = \left(X_{(i)}^\top X_{(i)}\right)^{-1} X_{(i)}^\top Y_{(i)}.$$

a. (5) Show that $\hat\beta_{(i)}$ is an unbiased estimate of β.

Solution: If the i-th observation is deleted, all the assumptions of the linear regression model remain valid, so the estimate is unbiased:

$$E\hat\beta_{(i)} = \left(X_{(i)}^\top X_{(i)}\right)^{-1} X_{(i)}^\top E Y_{(i)} = \left(X_{(i)}^\top X_{(i)}\right)^{-1} X_{(i)}^\top X_{(i)}\beta = \beta.$$

b. (10) Find an expression for $\operatorname{cov}(\hat\beta, \hat\beta_{(i)})$.

Solution: We compute

$$\operatorname{cov}(\hat\beta, \hat\beta_{(i)}) = \operatorname{cov}\left((X^\top X)^{-1}X^\top Y,\ (X_{(i)}^\top X_{(i)})^{-1}X_{(i)}^\top Y_{(i)}\right) = (X^\top X)^{-1}X^\top \operatorname{cov}(Y, Y_{(i)})\, X_{(i)}(X_{(i)}^\top X_{(i)})^{-1}$$

$$= \sigma^2 (X^\top X)^{-1}X^\top I_{(i)} X_{(i)}(X_{(i)}^\top X_{(i)})^{-1} = \sigma^2 (X^\top X)^{-1}X_{(i)}^\top X_{(i)}(X_{(i)}^\top X_{(i)})^{-1} = \sigma^2 (X^\top X)^{-1}.$$

Here $I_{(i)}$ stands for the identity matrix with the i-th column deleted.

c. (10) Show that

$$E\left[(\hat\beta_{(i)} - \hat\beta)^\top X^\top X\,(\hat\beta_{(i)} - \hat\beta)\right] = \sigma^2\left(\operatorname{tr}\!\left[X^\top X\,(X_{(i)}^\top X_{(i)})^{-1}\right] - \operatorname{tr}\!\left[X^\top X\,(X^\top X)^{-1}\right]\right).$$

Hint: remember that for a random vector Z and a matrix A,

$$E\left(Z^\top A Z\right) = \operatorname{tr}\left(A\,E(ZZ^\top)\right).$$

Solution: Using the hint, the expression to compute is equal to

$$\operatorname{tr}\left[X^\top X\, E\left((\hat\beta_{(i)} - \hat\beta)(\hat\beta_{(i)} - \hat\beta)^\top\right)\right].$$

Because both estimates of β are unbiased, the expectation is the covariance matrix of $\hat\beta_{(i)} - \hat\beta$, which by part b is

$$\operatorname{var}(\hat\beta_{(i)}) + \operatorname{var}(\hat\beta) - 2\operatorname{cov}(\hat\beta_{(i)}, \hat\beta) = \sigma^2\left[(X_{(i)}^\top X_{(i)})^{-1} - (X^\top X)^{-1}\right].$$

Hence the result is

$$\sigma^2\left(\operatorname{tr}\!\left[X^\top X\,(X_{(i)}^\top X_{(i)})^{-1}\right] - \operatorname{tr}\!\left[X^\top X\,(X^\top X)^{-1}\right]\right).$$
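
As a Monte Carlo sketch of part b (not part of the original exam), the empirical cross-covariance of $\hat\beta$ and $\hat\beta_{(i)}$ over repeated responses from a fixed design should be close to $\sigma^2(X^\top X)^{-1}$. The design matrix, β, σ, the deleted index i and the use of numpy are arbitrary assumptions.

    # Monte Carlo check of part b (illustrative only): the cross-covariance of
    # beta_hat and beta_hat_(i) should be close to sigma^2 (X'X)^{-1}.
    # The design matrix, beta, sigma and the deleted index i are arbitrary choices.
    import numpy as np

    rng = np.random.default_rng(3)
    n, p, sigma, i = 30, 3, 1.5, 7

    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # fixed design
    beta = np.array([1.0, -2.0, 0.5])
    Xi = np.delete(X, i, axis=0)                                    # X with row i deleted

    H = np.linalg.inv(X.T @ X) @ X.T        # maps Y to beta_hat
    Hi = np.linalg.inv(Xi.T @ Xi) @ Xi.T    # maps Y_(i) to beta_hat_(i)

    reps = 100_000
    Y = X @ beta + sigma * rng.normal(size=(reps, n))   # rows are independent response vectors
    B_full = Y @ H.T                                    # rows: beta_hat
    B_del = np.delete(Y, i, axis=1) @ Hi.T              # rows: beta_hat_(i)

    emp = (B_full - B_full.mean(0)).T @ (B_del - B_del.mean(0)) / (reps - 1)
    theory = sigma ** 2 * np.linalg.inv(X.T @ X)
    print("empirical cov(beta_hat, beta_hat_(i)):\n", np.round(emp, 4))
    print("sigma^2 (X'X)^{-1}:\n", np.round(theory, 4))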