University of Ljubljana
Doctoral Programme in Statistics
Methodology of Statistical Research
Written examination, February 14th, 2014

Name and surname: _______________________     ID number: _______________________

Instructions: Read the wording of each problem carefully before you start. There are four problems altogether. You may use an A4 sheet of paper and a mathematical handbook. Please write all the answers on the sheets provided. You have two hours.

Grading table (for the examiner):

Problem   a.   b.   c.   d.
1.
2.
3.
4.
Total

1. (20) Suppose a population of size N is divided into M subpopulations of size K, so that N = MK. A sample is selected in two steps: first, m subpopulations are selected from among the M by simple random sampling; in the second step, k units are selected by simple random sampling within each selected subpopulation. The final sample is of size n = mk.

a. (5) Is the sample mean an unbiased estimate of the population mean? Explain.

Solution: Every unit in the population is selected with the same probability, namely (m/M)(k/K) = n/N. This means that the sample average is an unbiased estimate of the population mean.

b. (5) For j = 1, 2, ..., M denote by $\mu_j$ the j-th subpopulation mean and by $\sigma_j^2$ the population variance in the j-th subpopulation, let

$$I_j = \begin{cases} 1 & \text{if the $j$-th subpopulation is selected,} \\ 0 & \text{otherwise,} \end{cases}$$

and let $\bar X_1, \bar X_2, \ldots, \bar X_M$ be the sample means of the samples selected in the subpopulations. Assume that $\bar X_1, \ldots, \bar X_M$ are independent and independent of $I_1, \ldots, I_M$. Argue that the sample mean can be written as

$$\bar X = \frac{1}{m}\left(\bar X_1 I_1 + \bar X_2 I_2 + \cdots + \bar X_M I_M\right).$$

Show that

$$\operatorname{var}(\bar X_j I_j) = \frac{m}{M}\operatorname{var}(\bar X_j) + \frac{m}{M}\left(1 - \frac{m}{M}\right)\mu_j^2$$

and, for $j \ne l$,

$$\operatorname{cov}(\bar X_j I_j, \bar X_l I_l) = \frac{m}{M}\,\mu_j \mu_l \left(\frac{m-1}{M-1} - \frac{m}{M}\right).$$

Solution: We know that

$$\operatorname{var}(\bar X_j) = \frac{\sigma_j^2}{k}\cdot\frac{K-k}{K-1}$$

and

$$\operatorname{cov}(I_j, I_l) = E(I_j I_l) - E(I_j)E(I_l) = \frac{m(m-1)}{M(M-1)} - \frac{m^2}{M^2} = \frac{m}{M}\left(\frac{m-1}{M-1} - \frac{m}{M}\right).$$

We compute

$$\operatorname{var}(\bar X_j I_j) = E(\bar X_j^2 I_j^2) - \left[E(\bar X_j I_j)\right]^2 = E(\bar X_j^2)E(I_j) - \left[E(\bar X_j)\right]^2\left[E(I_j)\right]^2 = \left(\operatorname{var}(\bar X_j) + \mu_j^2\right)\frac{m}{M} - \frac{m^2}{M^2}\,\mu_j^2 = \frac{m}{M}\operatorname{var}(\bar X_j) + \frac{m}{M}\left(1 - \frac{m}{M}\right)\mu_j^2$$

and

$$\operatorname{cov}(\bar X_j I_j, \bar X_l I_l) = E(\bar X_j I_j \bar X_l I_l) - E(\bar X_j I_j)E(\bar X_l I_l) = E(\bar X_j)E(\bar X_l)E(I_j I_l) - E(\bar X_j)E(I_j)E(\bar X_l)E(I_l) = \mu_j\mu_l\operatorname{cov}(I_j, I_l) = \mu_j\mu_l\left(\frac{m(m-1)}{M(M-1)} - \frac{m^2}{M^2}\right) = \frac{m}{M}\,\mu_j\mu_l\left(\frac{m-1}{M-1} - \frac{m}{M}\right).$$

c. (10) Show that

$$\operatorname{var}(\bar X) = \frac{1}{mM}\sum_{j=1}^{M}\operatorname{var}(\bar X_j) + \frac{M-m}{mM(M-1)}\sum_{j=1}^{M}(\mu_j - \mu)^2,$$

where $\mu$ is the population mean. Assume as known that

$$\sum_{j=1}^{M}\mu_j^2 - \frac{2}{M-1}\sum_{j<l}\mu_j\mu_l = \frac{M}{M-1}\sum_{j=1}^{M}(\mu_j - \mu)^2.$$

Solution: We have

$$\operatorname{var}(\bar X) = \frac{1}{m^2}\operatorname{var}\left(\bar X_1 I_1 + \bar X_2 I_2 + \cdots + \bar X_M I_M\right) = \frac{1}{m^2}\left[\sum_{j=1}^{M}\operatorname{var}(\bar X_j I_j) + 2\sum_{j<l}\operatorname{cov}(\bar X_j I_j, \bar X_l I_l)\right]$$

$$= \frac{1}{m^2}\left[\frac{m}{M}\sum_{j=1}^{M}\operatorname{var}(\bar X_j) + \frac{m}{M}\left(1 - \frac{m}{M}\right)\sum_{j=1}^{M}\mu_j^2 - \frac{m}{M}\cdot\frac{M-m}{M(M-1)}\cdot 2\sum_{j<l}\mu_j\mu_l\right]$$

$$= \frac{1}{mM}\sum_{j=1}^{M}\operatorname{var}(\bar X_j) + \frac{M-m}{mM^2}\left[\sum_{j=1}^{M}\mu_j^2 - \frac{2}{M-1}\sum_{j<l}\mu_j\mu_l\right]$$

$$= \frac{1}{mM}\sum_{j=1}^{M}\operatorname{var}(\bar X_j) + \frac{M-m}{mM(M-1)}\sum_{j=1}^{M}(\mu_j - \mu)^2.$$

d. (5) How would you estimate the standard error from the data? Just give the idea, with no calculations.

Solution: For the quantities $\operatorname{var}(\bar X_j)$ we only have estimates for the m selected subpopulations. Multiplying their sum by M/m gives an estimate of $\sum_{j=1}^{M}\operatorname{var}(\bar X_j)$, and hence of the first term in the variance formula. The sum $\sum_{j=1}^{M}(\mu_j - \mu)^2$ could be estimated by

$$c\sum_{j\,\text{selected}}(\bar X_j - \bar X)^2$$

for some appropriate constant c. The square root of the resulting variance estimate is the estimated standard error.
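
As an informal check (not part of the original exam), the variance formula in part c can be compared with a simulation. The sketch below assumes numpy and uses an arbitrary synthetic population with arbitrary sizes M, K, m, k; it draws repeated two-stage samples and compares the empirical variance of the sample mean with the formula.

    # Monte Carlo check of the variance formula in part c (illustrative only).
    # The population below, the sizes M, K, m, k and the number of replications
    # are arbitrary assumptions, not taken from the exam.
    import numpy as np

    rng = np.random.default_rng(0)

    M, K = 50, 40          # M subpopulations of size K, so N = M * K
    m, k = 10, 8           # m subpopulations selected, k units within each

    # A fixed synthetic population with unequal subpopulation means.
    population = np.array([rng.normal(loc=0.3 * j, scale=2.0, size=K)
                           for j in range(M)])            # shape (M, K)

    mu_j = population.mean(axis=1)        # subpopulation means
    mu = mu_j.mean()                      # population mean (equal subpopulation sizes)
    sigma2_j = population.var(axis=1)     # subpopulation variances (divisor K)

    # var(Xbar_j) under simple random sampling without replacement within a subpopulation
    var_xbar_j = sigma2_j / k * (K - k) / (K - 1)

    var_formula = (var_xbar_j.sum() / (m * M)
                   + (M - m) / (m * M * (M - 1)) * ((mu_j - mu) ** 2).sum())

    # Empirical behaviour of the two-stage sample mean
    reps = 20_000
    means = np.empty(reps)
    for r in range(reps):
        chosen = rng.choice(M, size=m, replace=False)
        sample = np.concatenate([rng.choice(population[j], size=k, replace=False)
                                 for j in chosen])
        means[r] = sample.mean()

    print("population mean:", mu, " average of sample means:", means.mean())       # part a
    print("formula variance:", var_formula, " simulated variance:", means.var())   # part c

Up to Monte Carlo error the two printed variances should agree, and the average of the simulated sample means should be close to the population mean, in line with part a.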

2. (25) Let $x_1, x_2, \ldots, x_n$ be an i.i.d. sample from the distribution with density

$$f(x) = \frac{\lambda^2}{12}\,x\,e^{-\sqrt{\lambda x}}$$

for x > 0, where λ > 0.

a. (15) Find the Fisher information. Assume as known that

$$\int_0^\infty x^{3/2} e^{-\sqrt{\lambda x}}\,dx = \frac{48}{\lambda^{5/2}}.$$

Solution: The log-likelihood function is

$$l(\lambda \mid x) = 2\log\lambda - \log 12 + \log x - \sqrt{\lambda x}.$$

Taking the second derivative we get

$$l''(\lambda \mid x) = -\frac{2}{\lambda^2} + \frac{\sqrt{x}}{4\lambda^{3/2}}.$$

It follows that

$$I(\lambda) = -E\,l''(\lambda \mid X) = \frac{2}{\lambda^2} - \frac{1}{4\lambda^{3/2}}\,E\sqrt{X} = \frac{2}{\lambda^2} - \frac{1}{4\lambda^{3/2}}\cdot\frac{\lambda^2}{12}\int_0^\infty x^{3/2} e^{-\sqrt{\lambda x}}\,dx = \frac{2}{\lambda^2} - \frac{1}{4\lambda^{3/2}}\cdot\frac{\lambda^2}{12}\cdot\frac{48}{\lambda^{5/2}} = \frac{1}{\lambda^2}.$$

b. (10) Write explicitly the 99% confidence interval for λ on the basis of the data $x_1, x_2, \ldots, x_n$.

Solution: The log-likelihood function is

$$l(\lambda \mid x_1, \ldots, x_n) = 2n\log\lambda - n\log 12 + \sum_{k=1}^{n}\log x_k - \sqrt{\lambda}\sum_{k=1}^{n}\sqrt{x_k}.$$

Taking the derivative we get the equation

$$\frac{2n}{\lambda} - \frac{1}{2\sqrt{\lambda}}\sum_{k=1}^{n}\sqrt{x_k} = 0$$

with the solution

$$\hat\lambda = \left(\frac{4n}{\sum_{k=1}^{n}\sqrt{x_k}}\right)^2.$$

Since $I(\lambda) = 1/\lambda^2$, the standard error of $\hat\lambda$ is approximately $1/\sqrt{n\,I(\hat\lambda)} = \hat\lambda/\sqrt{n}$, so the 99% confidence interval is

$$\hat\lambda \pm 2.576\,\frac{\hat\lambda}{\sqrt{n}}.$$
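
As a numerical illustration (not part of the original exam), the sketch below simulates data from this density and computes the estimate and interval above. It relies on the observation that under this density $\sqrt{X}$ follows a Gamma distribution with shape 4 and rate $\sqrt{\lambda}$ (obtained by a change of variables in f); the true λ, the sample size and the use of numpy are arbitrary assumptions.

    # Illustration of the MLE and the 99% interval above (not part of the exam).
    # Under f(x) = lambda^2/12 * x * exp(-sqrt(lambda*x)), the variable sqrt(X)
    # has a Gamma(shape=4, rate=sqrt(lambda)) distribution, which is used to simulate.
    # The true lambda and the sample size n are arbitrary choices.
    import numpy as np

    rng = np.random.default_rng(1)
    lam_true, n = 2.0, 500

    y = rng.gamma(shape=4.0, scale=1.0 / np.sqrt(lam_true), size=n)  # y = sqrt(x)
    x = y ** 2

    lam_hat = (4 * n / np.sqrt(x).sum()) ** 2    # MLE from the score equation above
    se = lam_hat / np.sqrt(n)                    # since I(lambda) = 1 / lambda^2
    print("MLE:", lam_hat)
    print("99% CI:", (lam_hat - 2.576 * se, lam_hat + 2.576 * se))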

3. (20) The χ² statistic can be used to test whether a roulette wheel is unbiased. If $O_i$ is the number of observed occurrences of the number i and $E_i$ is the number of expected occurrences, we define

$$\chi^2 = \sum_{i=0}^{36}\frac{(O_i - E_i)^2}{E_i}.$$

Large values of the χ² statistic indicate that the roulette wheel is biased. We assume that individual spins are independent and that the probabilities are constant throughout the observation period. Suppose the gambling house tests all the wheels at the end of every month on the basis of the data collected in that month. The rule is that a wheel is examined more closely if the p-value is below 0.01.

a. (5) Suppose that for a roulette wheel we got the p-value p = 0.005. Can this happen with an unbiased wheel? With what probability?

Solution: Yes, it can happen: for an unbiased wheel a p-value of 0.005 or smaller occurs with probability 0.005.

b. (5) Suppose that for a roulette wheel the p-value was p = 0.23. Is this conclusive evidence that the wheel is unbiased? Explain.

Solution: No, it is not conclusive evidence. Failing to reject the null hypothesis does not prove it; the test may simply lack the power to detect a small bias with the amount of data collected.

c. (5) Suppose a gambling house has 100 roulette wheels which are tested every month on the basis of the data collected. Suppose all the wheels are unbiased. How many wheels per month would be examined on average over a long period of time? Explain.

Solution: The probability of examining any given unbiased wheel is 0.01, so on average 100 × 0.01 = 1 wheel per month would be examined.

d. (5) Suppose one of the wheels is biased. Is the probability that it will be examined more or less than 0.01? Explain.

Solution: More than 0.01: any sensible test has power exceeding its size, so a biased wheel produces a p-value below 0.01 with higher probability than an unbiased one.
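
A small simulation (not part of the original exam) illustrates part c: with 100 unbiased wheels tested monthly at the 0.01 level, roughly one wheel per month is flagged on average. The number of spins per wheel per month is an arbitrary assumption, and numpy together with scipy is assumed to be available.

    # Simulation for part c (illustrative only): 100 unbiased wheels tested monthly
    # at the 0.01 level. The number of spins per wheel per month is an arbitrary choice.
    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(2)
    wheels, spins, months = 100, 10_000, 120

    flagged_per_month = []
    for _ in range(months):
        flagged = 0
        for _ in range(wheels):
            counts = rng.multinomial(spins, [1 / 37] * 37)     # pockets 0, 1, ..., 36
            expected = spins / 37
            stat = ((counts - expected) ** 2 / expected).sum()
            p = chi2.sf(stat, df=36)                           # 37 categories - 1
            flagged += p < 0.01
        flagged_per_month.append(flagged)

    print("average number of wheels flagged per month:", np.mean(flagged_per_month))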

4. (20) Assume the usual regression model

$$Y = X\beta + \epsilon.$$

Denote by $Y_{(i)}$ the vector Y with the i-th component deleted, and similarly $X_{(i)}$ and $\epsilon_{(i)}$. Let $\hat\beta_{(i)}$ be the least squares estimate of β with the i-th observation deleted, i.e.

$$\hat\beta_{(i)} = \left(X_{(i)}^\top X_{(i)}\right)^{-1} X_{(i)}^\top Y_{(i)}.$$

a. (5) Show that $\hat\beta_{(i)}$ is an unbiased estimate of β.

Solution: If the i-th observation is deleted, all the assumptions of the linear regression model remain valid, so the estimate is unbiased:

$$E\hat\beta_{(i)} = \left(X_{(i)}^\top X_{(i)}\right)^{-1} X_{(i)}^\top E Y_{(i)} = \left(X_{(i)}^\top X_{(i)}\right)^{-1} X_{(i)}^\top X_{(i)}\beta = \beta.$$

b. (10) Find an expression for $\operatorname{cov}(\hat\beta, \hat\beta_{(i)})$.

Solution: We compute

$$\operatorname{cov}(\hat\beta, \hat\beta_{(i)}) = \operatorname{cov}\left((X^\top X)^{-1}X^\top Y,\ (X_{(i)}^\top X_{(i)})^{-1}X_{(i)}^\top Y_{(i)}\right) = (X^\top X)^{-1}X^\top \operatorname{cov}(Y, Y_{(i)})\, X_{(i)}(X_{(i)}^\top X_{(i)})^{-1}$$

$$= \sigma^2 (X^\top X)^{-1}X^\top I_{(i)} X_{(i)}(X_{(i)}^\top X_{(i)})^{-1} = \sigma^2 (X^\top X)^{-1}X_{(i)}^\top X_{(i)}(X_{(i)}^\top X_{(i)})^{-1} = \sigma^2 (X^\top X)^{-1}.$$

Here $I_{(i)}$ stands for the identity matrix with the i-th column deleted.

c. (10) Show that

$$E\left[(\hat\beta_{(i)} - \hat\beta)^\top X^\top X\,(\hat\beta_{(i)} - \hat\beta)\right] = \sigma^2\left(\operatorname{tr}\!\left[X^\top X\,(X_{(i)}^\top X_{(i)})^{-1}\right] - \operatorname{tr}\!\left[X^\top X\,(X^\top X)^{-1}\right]\right).$$

Hint: remember that for a random vector Z and a matrix A,

$$E\left(Z^\top A Z\right) = \operatorname{tr}\left(A\,E(ZZ^\top)\right).$$

Solution: Using the hint, the expression to compute is equal to

$$\operatorname{tr}\left[X^\top X\, E\left((\hat\beta_{(i)} - \hat\beta)(\hat\beta_{(i)} - \hat\beta)^\top\right)\right].$$

Because both estimates of β are unbiased, the expectation is the covariance matrix of $\hat\beta_{(i)} - \hat\beta$, which by part b is

$$\operatorname{var}(\hat\beta_{(i)}) + \operatorname{var}(\hat\beta) - 2\operatorname{cov}(\hat\beta_{(i)}, \hat\beta) = \sigma^2\left[(X_{(i)}^\top X_{(i)})^{-1} - (X^\top X)^{-1}\right].$$

Hence the result is

$$\sigma^2\left(\operatorname{tr}\!\left[X^\top X\,(X_{(i)}^\top X_{(i)})^{-1}\right] - \operatorname{tr}\!\left[X^\top X\,(X^\top X)^{-1}\right]\right).$$
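
As a Monte Carlo sketch of part b (not part of the original exam), the empirical cross-covariance of $\hat\beta$ and $\hat\beta_{(i)}$ over repeated responses from a fixed design should be close to $\sigma^2(X^\top X)^{-1}$. The design matrix, β, σ, the deleted index i and the use of numpy are arbitrary assumptions.

    # Monte Carlo check of part b (illustrative only): the cross-covariance of
    # beta_hat and beta_hat_(i) should be close to sigma^2 (X'X)^{-1}.
    # The design matrix, beta, sigma and the deleted index i are arbitrary choices.
    import numpy as np

    rng = np.random.default_rng(3)
    n, p, sigma, i = 30, 3, 1.5, 7

    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # fixed design
    beta = np.array([1.0, -2.0, 0.5])
    Xi = np.delete(X, i, axis=0)                                    # X with row i deleted

    H = np.linalg.inv(X.T @ X) @ X.T        # maps Y to beta_hat
    Hi = np.linalg.inv(Xi.T @ Xi) @ Xi.T    # maps Y_(i) to beta_hat_(i)

    reps = 100_000
    Y = X @ beta + sigma * rng.normal(size=(reps, n))   # rows are independent response vectors
    B_full = Y @ H.T                                    # rows: beta_hat
    B_del = np.delete(Y, i, axis=1) @ Hi.T              # rows: beta_hat_(i)

    emp = (B_full - B_full.mean(0)).T @ (B_del - B_del.mean(0)) / (reps - 1)
    theory = sigma ** 2 * np.linalg.inv(X.T @ X)
    print("empirical cov(beta_hat, beta_hat_(i)):\n", np.round(emp, 4))
    print("sigma^2 (X'X)^{-1}:\n", np.round(theory, 4))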