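Problems with solutions to the written Master's Examination, Option I: Probability and Statistics, Spring 7

[1] Math 3 - MS Exam, Spring 7

If $A$ is a real symmetric $n \times n$ matrix, show that $A$ is idempotent if and only if $A = PP^T$, where $P$ is an $n \times r$ matrix such that $P^T P = I_r$ and $r$ is the rank of $A$.

Solution - Math 3 - MS Exam, Spring 7 - Majumdar

If part:
\[ A^2 = PP^T PP^T = P (P^T P) P^T = PP^T = A. \]

Only if part: By the spectral decomposition, $A = Q \Lambda Q^T$, where $QQ^T = Q^T Q = I_n$ and $\Lambda$ is diagonal. Since the eigenvalues of $A$ are 0 or 1, we can write
\[ \Lambda = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}, \]
where $r = \operatorname{rank}(A)$. Writing $Q = \begin{bmatrix} P_1 & P_2 \end{bmatrix}$, where $P_1$ is $n \times r$, we get
\[ A = \begin{bmatrix} P_1 & P_2 \end{bmatrix} \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} P_1^T \\ P_2^T \end{bmatrix} = P_1 P_1^T. \]
Moreover,
\[ \begin{bmatrix} I_r & 0 \\ 0 & I_{n-r} \end{bmatrix} = Q^T Q = \begin{bmatrix} P_1^T \\ P_2^T \end{bmatrix} \begin{bmatrix} P_1 & P_2 \end{bmatrix} = \begin{bmatrix} P_1^T P_1 & P_1^T P_2 \\ P_2^T P_1 & P_2^T P_2 \end{bmatrix} \implies P_1^T P_1 = I_r. \]
So $P = P_1$ has the required properties.

[2] Math 33 - MS Exam, Spring 7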

Let f and g be continuous on [ , ], and suppose that f( ) < g( ) and f( ) > g( ). Show that there exists x in ( , ) such that f(x) = g(x). Deduce that the equation

x + 3 = sin πx

has a solution in ( , ).

Solution to Math 33 Problem - MS Exam, Spring 7 - Miescke

The function f - g is continuous, with (f - g)( ) < 0 and (f - g)( ) > 0. Hence, by the intermediate value theorem, there exists c in ( , ) such that (f - g)(c) = 0, that is, f(c) = g(c). For the given equation, use f(x) = sin πx and g(x) = x + 3, where x is in [ , ].
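The intermediate value argument can also be carried out numerically: bisection on h = f - g keeps an interval on which h changes sign and halves it until a crossing point is pinned down. The sketch below is a minimal illustration only. The function name `crossing_point` and the example pair f(x) = x, g(x) = cos x on [0, 1] are assumptions chosen here for demonstration; they are not the equation from the exam.

```python
import math


def crossing_point(f, g, a, b, tol=1e-12):
    """Bisection on h = f - g, assuming h is continuous with h(a) < 0 < h(b)."""
    def h(x):
        return f(x) - g(x)

    if not (h(a) < 0 < h(b)):
        raise ValueError("need f(a) < g(a) and f(b) > g(b)")
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(mid) < 0:
            lo = mid  # h still negative at mid: crossing lies in [mid, hi]
        else:
            hi = mid  # h non-negative at mid: crossing lies in [lo, mid]
    return (lo + hi) / 2


# Illustrative pair (an assumption, not the exam's equation):
# f(x) = x and g(x) = cos(x) satisfy f(0) < g(0) and f(1) > g(1).
x_star = crossing_point(lambda x: x, math.cos, 0.0, 1.0)
print(x_star, x_star - math.cos(x_star))
```

The sign check at the top mirrors the hypotheses of the problem; without it, bisection has no guarantee of converging to a solution.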

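[3] Stat 4 - MS Exam, Spring 7

The joint p.d.f. of random variables $X$ and $Y$ is given below. Compute the correlation coefficient of $X$ and $Y$. Are they independent?
\[ f(x, y) = \begin{cases} 2, & \text{if } 0 < x < 1,\ 0 < y < 1,\ x + y < 1, \\ 0, & \text{otherwise.} \end{cases} \]

Solution to Stat 4 problem - MS Exam, Spring 7 - Majumdar

\[ E(X) = \int_0^1 \int_0^{1-x} 2x \, dy \, dx = 2 \int_0^1 x(1 - x) \, dx = \frac{1}{3}. \]
Similarly, $E(Y) = \frac{1}{3}$.
\[ E(X^2) = \int_0^1 \int_0^{1-x} 2x^2 \, dy \, dx = 2 \int_0^1 x^2 (1 - x) \, dx = \frac{1}{6} = E(Y^2). \]
\[ E(XY) = \int_0^1 \int_0^{1-x} 2xy \, dy \, dx = \int_0^1 x(1 - x)^2 \, dx = \frac{1}{12}. \]
\[ V(X) = E(X^2) - [E(X)]^2 = \frac{1}{6} - \frac{1}{9} = \frac{1}{18} = V(Y). \]
\[ \operatorname{cov}(X, Y) = \frac{1}{12} - \frac{1}{9} = -\frac{1}{36}. \]
\[ \rho = \operatorname{Corr}(X, Y) = \frac{-1/36}{\sqrt{\frac{1}{18} \cdot \frac{1}{18}}} = -\frac{1}{2}. \]
Since $\rho \neq 0$, $X$ and $Y$ are not independent.

[4] Stat 4 - (Chapter 5,6,7) - MS Exam, Spring 7

Suppose $X_1, \ldots, X_n$ are iid with the pdf
\[ f(x; \theta) = \frac{2x}{\theta^2}, \quad 0 < x \le \theta. \]
(a) Find the mle $\hat\theta$ for $\theta$.
(b) Find the mle for the median of the distribution.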
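Returning to Problem [3], the moments in its solution can be double-checked with exact rational arithmetic. The sketch below relies on the standard Dirichlet integral, which gives $E[X^a Y^b] = 2\, a!\, b! / (a + b + 2)!$ for the constant density 2 on the triangle $x > 0$, $y > 0$, $x + y < 1$. That closed form is a known fact brought in here, not a step from the exam solution, and the helper name `moment` is hypothetical.

```python
from fractions import Fraction
from math import factorial


def moment(a, b):
    """E[X^a Y^b] for the density 2 on the triangle x, y > 0, x + y < 1."""
    return Fraction(2 * factorial(a) * factorial(b), factorial(a + b + 2))


ex, ey = moment(1, 0), moment(0, 1)
var_x = moment(2, 0) - ex ** 2
var_y = moment(0, 2) - ey ** 2
cov = moment(1, 1) - ex * ey

# Squared correlation, kept exact; the sign of rho is the sign of cov.
rho_sq = cov ** 2 / (var_x * var_y)

print(ex, ey, var_x, var_y, cov, rho_sq)
```

Because the covariance is negative and the squared correlation is 1/4, this agrees with a correlation of -1/2.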

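Solution to Stat 4 Problem from (Chapter 5,6,7) - MS Exam, Spring 7 - Yang

(a) The likelihood function is
\[ L(\theta) = \prod_{i=1}^n \frac{2 x_i}{\theta^2} = 2^n \theta^{-2n} \prod_{i=1}^n x_i, \quad 0 < x_1, \ldots, x_n \le \theta. \]
The log likelihood function is
\[ l(\theta) = \log L(\theta) = -2n \log \theta + n \log 2 + \log \left( \prod_{i=1}^n x_i \right). \]
The first partial derivative of $l$ is
\[ \frac{\partial l(\theta)}{\partial \theta} = -\frac{2n}{\theta} < 0, \quad \text{for } \theta \ge \max\{x_1, \ldots, x_n\} > 0, \]
so $l$ is decreasing over the admissible values of $\theta$ and is maximized at the smallest one. This implies that
\[ \hat\theta = \max\{X_1, \ldots, X_n\} \]
is the mle for $\theta$.

(b) Because the distribution is continuous, the median $m$ is the constant satisfying
\[ \int_0^m \frac{2x}{\theta^2} \, dx = \frac{1}{2}. \]
Note that $0 < m < \theta$ and
\[ \int_0^m \frac{2x}{\theta^2} \, dx = \left. \frac{x^2}{\theta^2} \right|_0^m = \frac{m^2}{\theta^2}. \]
Then the median is
\[ m = \frac{\theta}{\sqrt{2}} = \frac{\sqrt{2}}{2} \theta. \]
By part (a), the mle for $\theta$ is $\hat\theta = \max\{X_1, \ldots, X_n\}$. So, by the invariance of the mle, the mle for the median is
\[ \frac{\sqrt{2}}{2} \max\{X_1, \ldots, X_n\}. \]

[5] Stat 4 - (Chapter 8,9) - MS Exam, Spring 7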

Let $X = (X_1, \ldots, X_n)$ denote a random sample from the distribution $N(\theta, 1)$ that has the pdf
\[ f(x; \theta) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{(x - \theta)^2}{2} \right), \quad -\infty < x < \infty. \]
It is desired to test the hypothesis $H_0: \theta = 0$ against the alternative hypothesis $H_1: \theta = 1$.
(a) Show that the likelihood ratio $L(\theta = 0; X) / L(\theta = 1; X)$ is based upon the statistic $Y = \sum_{i=1}^n X_i$.
(b) If $n = 16$, find the best critical region of size $\alpha = 0.05$ for the hypothesis test. (Hint: Use the normal table attached.)
(c) If $n = 16$, find the power of the test in part (b).

Solution to Stat 4 Problem - (Chapter 8,9) - MS Exam, Spring 7 - Yang

(a) Proof: The likelihood function of $\theta$ given $x = (x_1, \ldots, x_n)$ is
\[ L(\theta; x) = \prod_{i=1}^n f(x_i; \theta) = \left( \frac{1}{\sqrt{2\pi}} \right)^n \exp\left\{ -\frac{1}{2} \left[ \sum_{i=1}^n x_i^2 - 2\theta \sum_{i=1}^n x_i + n\theta^2 \right] \right\}. \]
Therefore the likelihood ratio is
\[ \frac{L(\theta = 0; X)}{L(\theta = 1; X)} = \exp\left\{ -\sum_{i=1}^n X_i + \frac{n}{2} \right\}. \]
So it is based upon the statistic $Y = \sum_{i=1}^n X_i$.

(b) For any positive constant $k$,
\[ \frac{L(\theta = 0; x)}{L(\theta = 1; x)} \le k \iff \sum_{i=1}^n x_i \ge \frac{n}{2} - \log k = c. \]
By the Neyman-Pearson Theorem, the best critical region $C$ takes the form $\{(x_1, \ldots, x_n) : \sum_{i=1}^n x_i \ge c\}$, where the constant $c$ is determined by
\[ P_{H_0}(X \in C) = \alpha. \]
If $n = 16$, then $Y \sim N(0, 16)$ under the null hypothesis. So $Y/4 \sim N(0, 1)$ and
\[ 0.05 = P_{H_0}(Y \ge c) = P(Y/4 \ge c/4). \]
By the normal table attached, $c/4 = (1.64 + 1.65)/2 = 1.645$. So the best critical region is
\[ C = \left\{ (x_1, \ldots, x_n) : \sum_{i=1}^n x_i \ge 6.58 \right\}. \]

(c) If $n = 16$, under the alternative hypothesis $H_1: \theta = 1$, $Y \sim N(16, 16)$. Then $Z = (Y - 16)/4 = Y/4 - 4 \sim N(0, 1)$. The power is
\[ P_{H_1}(Y \ge 6.58) = P(Z \ge -2.355) = P(Z \le 2.355) = 0.5\,(0.9906 + 0.9909) \approx 0.9907. \]

[6] Stat 46 - MS Exam, Spring 7

The following are the numbers of minutes it took 10 workers in a factory to complete a certain task, once before and once after each worker had received special training for completing this task.

Worker            1     2     3     4     5     6     7     8     9     10
After Training    4.6   5.8   6.9   7.2   8.    6.5   6.9   7.2   7.3   8.7
Before Training   8.1   7.3   7.2   7.7   8.    7.2   6.3   6.8   8.5   8.8

(a) Explain the model under which both the Sign Test and the Wilcoxon Signed-Rank Test can be used.
(b) Specify the appropriate null hypothesis and alternative.
(c) Compute the test statistic of the Sign Test for the given data.
(d) Compute the test statistic of the Wilcoxon Signed-Rank Test for the given data.
(e) Under what changes of the model could you still use the Sign Test, but not the Wilcoxon Signed-Rank Test?

Solution to Stat 46 Problem - MS Exam, Spring 7 - Miescke

(a) The differences $Y_i - X_i$, $i = 1, \ldots, n$, are independent. Each $Y_i - X_i$ has a c.d.f. $F_i$ with a density $f_i$ that is symmetric about $\theta$, $i = 1, \ldots, n$.
(b) $H_0: \theta = 0$ versus $H_A: \theta > 0$, where the $Y_i$ are from Before Training.
(c) $B = \#\{i : Y_i > X_i,\ i = 1, \ldots, n\} = 7$, and $n$ is reduced to 9 (Worker 5 has a zero difference).

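(d)

Worker                 1     2     3     4     5     6     7      8      9     10
Y_i - X_i              3.5   1.5   0.3   0.5   N/A   0.7   -0.6   -0.4   1.2   0.1
Rank of |Y_i - X_i|    9     8     2     4     N/A   6     5      3      7     1
Signed rank            9     8     2     4     N/A   6     -5     -3     7     1

The sum of the positive signed ranks is $T^+ = 37$, and $n$ is reduced to 9 (Worker 5).
(e) If the densities $f_i$ are not all symmetric, then only the Sign Test can be used.

[7] Stat 43 - MS Exam, Spring 7

A random sample of size $n_1$ has been drawn from a finite population of $N$ units by the SRSWOR method, and the sample so drawn is denoted by $s_1$. Subsequently, a sub-sample $s_2$ of size $n_2$ [$< n_1$] is drawn from the chosen sample $s_1$, again by the SRSWOR method. Denote by $\bar y_{s_1}$ and $\bar y_{s_2}$ the respective sample means for a study variable $Y$ based on the two samples. Assume:
\[ E[\bar y_{s_1}] = \bar Y, \qquad \operatorname{Var}[\bar y_{s_1}] = S^2 \left[ \frac{1}{n_1} - \frac{1}{N} \right], \]
where $S^2$ denotes the population variance based on the $Y$-values.
(a) Show that $\bar y_{s_2}$ also serves as an unbiased estimate of the population mean $\bar Y$.
(b) Compute $\operatorname{Var}(\bar y_{s_2})$ and $\operatorname{Cov}(\bar y_{s_1}, \bar y_{s_2})$.
(c) Find the best linear combination of $\bar y_{s_1}$ and $\bar y_{s_2}$ to serve as a pooled estimate of $\bar Y$.
You may use conditional expectation arguments. Note that a subsample ($s_2$) behaves like an induced sample with reference to the subpopulation captured by the original sample ($s_1$).

Solution to Stat 43 Problem - MS Exam, Spring 7 - Hedayat

Essential steps. Below, $E_1, V_1$ refer to the first-phase sampling and $E_2, V_2$ to the sub-sampling given $s_1$.
(a) $E[\bar y_{s_2} \mid s_1] = E_2[\bar y_{s_2}] = \bar y_{s_1}$, and hence
\[ E[\bar y_{s_2}] = E_1 E_2[\bar y_{s_2}] = E[\bar y_{s_1}] = \bar Y. \]
(b)
\[ V[\bar y_{s_2}] = V_1 E_2[\bar y_{s_2}] + E_1 V_2[\bar y_{s_2}] = S^2 \left[ \frac{1}{n_1} - \frac{1}{N} \right] + S^2 \left[ \frac{1}{n_2} - \frac{1}{n_1} \right] = S^2 \left[ \frac{1}{n_2} - \frac{1}{N} \right]. \]
(c)
\[ E[\bar y_{s_1} \bar y_{s_2}] = E_1 E_2[\bar y_{s_1} \bar y_{s_2}] = E[\bar y_{s_1}^2] = \bar Y^2 + \operatorname{Var}[\bar y_{s_1}]. \]
Therefore,
\[ \operatorname{Cov}[\bar y_{s_1}, \bar y_{s_2}] = \operatorname{Var}[\bar y_{s_1}] = S^2 \left[ \frac{1}{n_1} - \frac{1}{N} \right]. \]
The rest follows from a standard result on pooling two dependent estimates.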
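The variance and covariance formulas for two-phase SRSWOR can be checked by brute force on a small population, since every (sample, sub-sample) pair is equally likely and can be enumerated. The sketch below is an illustration under stated assumptions: the population values and the sizes `n1` (first-phase sample) and `n2` (sub-sample) are hypothetical, and the names `s1`, `s2` are labels chosen here for the sample and the sub-sample.

```python
from fractions import Fraction
from itertools import combinations


def mean(values):
    return Fraction(sum(values), len(values))


# Hypothetical toy population and sample sizes (illustrative only).
population = [1, 2, 4, 7, 11]
N, n1, n2 = len(population), 3, 2

Y_bar = mean(population)
S2 = sum((y - Y_bar) ** 2 for y in population) / (N - 1)

# Under SRSWOR followed by SRSWOR, every (s1, s2) pair has equal probability.
pairs = [(mean(s1), mean(s2))
         for s1 in combinations(population, n1)
         for s2 in combinations(s1, n2)]


def expect(f):
    """Exact expectation of f(mean of s1, mean of s2) over all pairs."""
    return sum(f(m1, m2) for m1, m2 in pairs) / len(pairs)


e1 = expect(lambda m1, m2: m1)
e2 = expect(lambda m1, m2: m2)
var2 = expect(lambda m1, m2: m2 ** 2) - e2 ** 2
cov12 = expect(lambda m1, m2: m1 * m2) - e1 * e2

print(e2 == Y_bar)
print(var2 == S2 * (Fraction(1, n2) - Fraction(1, N)))
print(cov12 == S2 * (Fraction(1, n1) - Fraction(1, N)))
```

Exact fractions are used so that the three comparisons test equality rather than closeness; changing the population or the sizes (with n2 < n1 < N) should leave all three checks true.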

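[8] Stat 46 - MS Exam, Spring 7

A Markov chain $X_0, X_1, \ldots$ on the states 0, 1, 2 has the transition probability matrix
\[ P = \begin{bmatrix} 0.3 & 0.2 & 0.5 \\ 0.5 & 0.1 & 0.4 \\ 0 & 0 & 1 \end{bmatrix} \]
and is known to start in state $X_0 = 0$. Let $T = \min\{n \ge 0 : X_n = 2\}$. Find the probability that $T$ is an odd number.

Solution to Stat 46 Problem - MS Exam, Spring 7 - El-Neweihi

Let $u_0 = P(T \text{ is odd} \mid X_0 = 0)$ and $u_1 = P(T \text{ is odd} \mid X_0 = 1)$. After one step into state 0 or 1, the remaining time to reach state 2 must be even, so by first step analysis
\[ u_0 = (0.3)(1 - u_0) + (0.2)(1 - u_1) + 0.5, \]
\[ u_1 = (0.5)(1 - u_0) + (0.1)(1 - u_1) + 0.4. \]
These simplify to $1.3 u_0 + 0.2 u_1 = 1$ and $0.5 u_0 + 1.1 u_1 = 1$. Solving for $u_0$ we get $u_0 = \frac{90}{133}$.

[9] Stat 48 - MS Exam, Spring 7

Consider a normal population $X \sim N(\mu, \sigma^2 = 20)$. To test $H_0: \mu = 5$ against $H_1: \mu < 5$, determine the sample size $n$ and the rejection region $\{\bar X < c\}$ to satisfy the significance level 0.025 and the power level 0.975 at $\mu = 3$. [Given: $\Phi(1.96) = 0.975$.]

Solution to Problem [9]: Stat 48 - MS Exam, Spring 7 - Wang

The significance level and power level provide the following equations:
\[ \begin{cases} \alpha = P\{\text{Reject } H_0 \mid H_0 \text{ is true}\} \\ \beta = P\{\text{Reject } H_0 \mid H_0 \text{ is false}\} \end{cases} \implies \begin{cases} P(\bar X < c \mid \mu = 5) = 0.025 \\ P(\bar X < c \mid \mu = 3) = 0.975 \end{cases} \]
\[ \implies \begin{cases} \Phi\left( \dfrac{c - 5}{\sqrt{20/n}} \right) = 0.025 \\[2ex] \Phi\left( \dfrac{c - 3}{\sqrt{20/n}} \right) = 0.975 \end{cases} \implies \begin{cases} \dfrac{c - 5}{\sqrt{20/n}} = -1.96 \\[2ex] \dfrac{c - 3}{\sqrt{20/n}} = 1.96 \end{cases} \]
Adding the last two equations gives $c = 4$; subtracting them gives $\sqrt{n/20} = 1.96$, that is, $n = 76.8$, which is rounded up. Hence, we have sample size $n = 77$, $c = 4$ and the rejection region $\{\bar X < 4\}$.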
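The sample size calculation in Problem [9] can be reproduced with the standard library's normal distribution. The sketch below is a minimal illustration, assuming the values $\sigma^2 = 20$, significance level 0.025 and power 0.975 at $\mu = 3$; the function name `design_lower_tail_test` is hypothetical. It uses exact normal quantiles instead of the rounded table value 1.96, so the cutoff comes out close to, but not exactly, 4 once $n$ is rounded up.

```python
from math import ceil, sqrt
from statistics import NormalDist


def design_lower_tail_test(mu0, mu1, sigma2, alpha, power):
    """Sample size and cutoff for rejecting H0: mu = mu0 when the mean is below c.

    Assumes a normal population with known variance sigma2 and mu1 < mu0.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)  # upper alpha quantile (about 1.96 for 0.025)
    z_power = z.inv_cdf(power)      # quantile matching the required power
    n_exact = sigma2 * ((z_alpha + z_power) / (mu0 - mu1)) ** 2
    n = ceil(n_exact)               # round up so the power requirement is met
    c = mu0 - z_alpha * sqrt(sigma2 / n)  # cutoff holding the level at alpha
    return n, c


n, c = design_lower_tail_test(mu0=5, mu1=3, sigma2=20, alpha=0.025, power=0.975)
achieved_power = NormalDist(mu=3, sigma=sqrt(20 / n)).cdf(c)
print(n, c, achieved_power)
```

Fixing the cutoff from the significance level after rounding $n$ up keeps the level at 0.025 and lets the achieved power slightly exceed 0.975.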

[10] Stat 48 Problem - MS Exam, Spring 7

A multiple linear regression model
\[ Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i \]
was used to fit $n = 20$ data points. It is calculated that $SSTO = 200$, $SSR = 66$.
(1) Construct the ANOVA table.
(2) Calculate and interpret the coefficient of determination $R^2$.
(3) Test $H_0: \beta_1 = \beta_2 = 0$ at the $\alpha = 0.05$ significance level.
(4) Explain the meaning of the coefficient $\beta_1$. Find the statistic and its sampling distribution for an individual test $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$.
[Given: $F(0.05, 2, 17) = 3.59$]

Solution to Problem [10]: Stat 48 - MS Exam, Spring 7 - Wang

(1) ANOVA Table

Source        DF    SS     MS      F
Regression    2     66     33      4.19
Error         17    134    7.88
Total         19    200

(2) The coefficient of determination is
\[ R^2 = \frac{SSR}{SSTO} = \frac{66}{200} = 33\%. \]
It means that 33% of the variation in the response is explained by the linear regression model, or equivalently that the variation in the data is reduced by 33% after introducing the two predictors.

(3) The $F$-statistic in the ANOVA table is $F = 4.19 > F(0.05, 2, 17) = 3.59$. Therefore, we reject the null hypothesis at level 0.05, i.e. at least one predictor contributes significantly to explaining the variation in the response.

(4) The regression coefficient $\beta_1$ is the change in the mean response for a unit increase in the predictor variable $X_1$, while holding the other predictor $X_2$ constant. Use a $t$-test for the individual test $H_0: \beta_1 = 0$; the statistic is
\[ t_{\hat\beta_1} = \frac{\hat\beta_1}{s(\hat\beta_1)}, \]
where $\hat\beta_1$ and $s(\hat\beta_1)$ are the least squares estimate of the coefficient $\beta_1$ and its standard error. Under $H_0$, the test statistic satisfies $t_{\hat\beta_1} \sim t(n - p)$, i.e. $t_{\hat\beta_1} \sim t(17)$.
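The ANOVA arithmetic above can be reproduced in a few lines. The sketch below is illustrative: the function name `anova_table` and the dictionary layout are choices made here, with $p = 3$ counting the intercept and the two slopes. The critical value is taken from the problem statement and not computed.

```python
def anova_table(ssto, ssr, n, p):
    """ANOVA quantities for a linear model with p coefficients (intercept included)."""
    sse = ssto - ssr
    df_reg, df_err = p - 1, n - p
    msr, mse = ssr / df_reg, sse / df_err
    return {
        "df": (df_reg, df_err, n - 1),  # regression, error, total
        "ss": (ssr, sse, ssto),
        "ms": (msr, mse),
        "F": msr / mse,
        "R2": ssr / ssto,
    }


table = anova_table(ssto=200, ssr=66, n=20, p=3)
f_crit = 3.59  # F(0.05; 2, 17), the value given in the problem

print(table["df"], table["ss"])
print(round(table["ms"][1], 2), round(table["F"], 2), table["R2"])
print("reject H0" if table["F"] > f_crit else "do not reject H0")
```

The error degrees of freedom $n - p = 17$ are the same ones that appear in the $t(17)$ reference distribution for the individual coefficient test.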