The Written Master's Examination


The Written Master's Examination, Option Statistics and Probability, Fall

Full points may be obtained for correct answers to 8 questions. Each numbered question (which may have several parts) is worth the same number of points. All answers will be graded, but the score for the examination will be the sum of the scores of your best 8 solutions.

Use separate answer sheets for each question. DO NOT PUT YOUR NAME ON YOUR ANSWER SHEETS. When you have finished, insert all your answer sheets into the envelope provided, then seal it and print your name on it.

Any student whose answers need clarification may be required to submit to an oral examination.

MS Exam, Option Probability and Statistics, FALL

1. (STAT 4) Let $X \sim N(\mu, \sigma^2)$, let $g$ be a differentiable function, and suppose $E|g'(X)| < \infty$.
(i) Show that $\mathrm{cov}(X, g(X)) = E(g(X)(X - \mu)) = \sigma^2 E(g'(X))$.
(ii) Calculate $E(X^3)$.

2. (STAT 4) Let $X$ be one observation from a population with pdf
$f(x \mid \theta) = \dfrac{e^{x-\theta}}{(1 + e^{x-\theta})^2}$, $-\infty < x < \infty$, $-\infty < \theta < \infty$.
(i) Construct a most powerful size $\alpha$ test of $H_0: \theta = 0$ versus $H_a: \theta = 1$.
(ii) Construct a UMP size $\alpha$ test of $H_0: \theta \le 0$ versus $H_a: \theta > 0$.

3. (STAT 4) Suppose $X_1, \ldots, X_n$ is a random sample from the exponential distribution with parameter $\theta > 0$:
$f(x; \theta) = \theta e^{-\theta x}$ for $x > 0$, and $f(x; \theta) = 0$ otherwise.
(i) Show that $Y = \sum_{i=1}^{n} X_i$ follows a Gamma distribution.
(ii) Show that $Y$ is complete sufficient for $\theta$.
(iii) Derive the minimum variance unbiased estimator for $\theta$. Justify your answer.

4. (STAT 46) A group of researchers want to determine whether there is a direct relationship between math and computer anxiety. The test scores for 8 students are shown in the table, with larger scores indicating a greater amount of the trait.

Student: A B C D E F G H
Math Anxiety:
Computer Anxiety:

(STAT 46 cont.)
(a) Suppose both scores are symmetrically distributed. State your hypotheses, compute the p-value, and determine whether the students have the same median computer anxiety as math anxiety.
(b) Calculate a measure of association between computer anxiety and math anxiety, and test whether there is any relationship between the two sets of scores.

5. (STAT 43) The goal is to obtain an unbiased estimate of the proportion of left-handers among the students of a large co-ed community school. It is known that 3500 students are enrolled in the school in total and that 60% of these students are enrolled in the science stream. Assume stratified simple random sampling with proportional allocation for both the boys-girls classification and the science-arts stream classification. Based on a stratified simple random sample of 350 students in total, the following table has been prepared:

                                      Boys   Girls
Sample size                            200     150
Sample left-handers in science stream    8      12
Sample left-handers in arts stream       5       8

(a) For both groups, estimate the total number of left-handers in each stream (science and arts).
(b) For the science stream as a whole, find an estimate of the total number of left-handers and compute its estimated s.e.
(c) For the arts stream as a whole, find an estimate of the proportion of left-handers and its 95% confidence interval.

6. (STAT 46) Customers arrive at a service facility according to a Poisson process with rate $\lambda$ (customers/hour). Let $X(t)$ be the number of customers that have arrived up to time $t$. Let $W_1, W_2, \ldots$ be the successive arrival times of the customers. Determine:
(a) E(W | X(t) = ),
(b) E(W + W | X(t) = ),
(c) E(W3 | X(t) = ).

7. (STAT 47) Consider
max x + 3x 7x3 such that x x = 3x + x x x3 9 x, x, x 3 unrestricted.
(i) Write down the dual of the above linear programming problem.
(ii) Write down the dual of the dual program you have obtained.

8. (STAT 47) At the last minute, some doctors in New York are frantically trying to get to a conference in Los Angeles and are willing to take connecting flights. A travel agent finds the following information:

From        To            Number of seats available
New York    Chicago       5
New York    Houston       7
Houston     Atlanta       8
Chicago     Atlanta       6
Chicago     Denver
Denver      Los Angeles   5
Atlanta     Los Angeles   4

Use the Ford-Fulkerson algorithm to find the maximum number of doctors who could reach Los Angeles via these connecting flights.
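The augmenting-path idea behind Problem 8 can be sketched as follows. This is a minimal illustration of the Ford-Fulkerson method using breadth-first search for augmenting paths (the Edmonds-Karp variant). The function name `max_flow` and the seat counts in `seats` are hypothetical: they reuse the route structure of the table but are not the exam's capacities.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Ford-Fulkerson with BFS augmenting paths (Edmonds-Karp).

    capacity maps directed edges (u, v) to non-negative integer capacities.
    """
    # residual[u][v] is the remaining capacity on the arc u -> v
    residual = {}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + c
        residual[v].setdefault(u, 0)  # reverse arc starts empty

    flow = 0
    while True:
        # Breadth-first search for a path with spare capacity
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left, so the flow is maximal

        # Bottleneck capacity along the path found
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push the bottleneck along the path and update reverse arcs
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

# Hypothetical seat counts (NOT the exam's numbers), same route structure.
seats = {
    ("New York", "Chicago"): 10,
    ("New York", "Houston"): 8,
    ("Houston", "Atlanta"): 6,
    ("Chicago", "Atlanta"): 4,
    ("Chicago", "Denver"): 7,
    ("Denver", "Los Angeles"): 5,
    ("Atlanta", "Los Angeles"): 9,
}
doctors = max_flow(seats, "New York", "Los Angeles")
print(doctors)
```

With these illustrative capacities the two arcs into Los Angeles form a cut of size 5 + 9, so the computed flow can be checked by hand against the max-flow min-cut theorem.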

9. (STAT 473) Seven-year-old kids Ann, Beth, Cindy, Debbie, and Emma went on a field trip to a factory making kitchen utensils. At the end of the field trip the kids were allowed to pick either a cup or a saucer as a memento. Ann opted for a saucer and the other four opted for a cup. While coming out of the factory they noticed a guy paying $5 for a cup and a saucer. There is an ice cream shop charging $ per cone. All the children are keen to sell their mementos and buy ice cream, recovering part of the expense from the sale. Ann is approached by all the other kids, each offering a cup to be sold with her saucer. What would the Shapley value give as the fair value of the saucer that Ann owns?

10. (STAT 48) Consider an example of the $2^2$ factorial design. The effects of temperature (factor 1, with two levels, low and high, or − and +) and reaction time (factor 2, with two levels, short and long, or − and +) on the percent yield of a certain chemical reaction (response Y) are studied. The experiment was replicated (n = 2) and the order of the eight runs was randomized. The design table and observations are listed below:

Run | I | x1 | x2 | x1x2 | Average Yield | Individual Observations
, , , , 68.7

(a) Estimate the overall mean, main effects, and the interaction. Specify the model you used here, as well as your model assumptions.
(b) Test whether your estimated effects are significantly different from 0 at the significance level 0.05. For your reference, some critical values of the standard normal distribution are Prob(X > 1.96) = 0.025 and Prob(X > 1.645) = 0.05.

11. (STAT 48) Consider a regression model that relates gas mileage and weight of automobiles. Thirty-eight cars were selected, and their weights x (in units of 1,000 pounds) and fuel efficiencies MPG (miles per gallon, the response y) were measured.
(a) Given the summary statistics: x = 8. 79, y = 94. 9, x = , i i y i i i i y = , and x = , find the least-squares estimates of the regression coefficients in the simple regression model $y = \beta_0 + \beta_1 x + \varepsilon$.
(b) Given SSE $= \sum (y_i - \hat{y}_i)^2$ = , which is the sum of squares due to error, along with the summary statistics in (a), construct a 95% confidence interval for $\beta_1$. For your reference, some critical values of t distributions are t(.025; df=38) = 2.024 and t(.025; df=36) = 2.028.
(c) The figure below is the residual plot (residuals versus fitted values) of the simple linear regression model. Based on that, discuss whether this model is appropriate. What other model(s) would you suggest?


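Statistics 4&4, MS Exam, Fall Semester

1. (STAT4) Let $X \sim N(\mu, \sigma^2)$, let $g$ be a differentiable function, and suppose $E|g'(X)| < \infty$.
(i) Show that $\mathrm{cov}(X, g(X)) = E(g(X)(X - \mu)) = \sigma^2 E(g'(X))$.
(ii) Calculate $E(X^3)$.

Solution:
(i) The first equality follows immediately from the definition of $\mathrm{cov}(X, g(X))$, because $E(X - \mu) = 0$. To show the second equality, we have
$$E(g(X)(X - \mu)) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} g(x)(x - \mu) \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) dx.$$
Using integration by parts with $u = g(x)$ and $dv = (x - \mu) \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) dx$ yields
$$E(g(X)(X - \mu)) = \frac{1}{\sqrt{2\pi}\,\sigma} \left[ -\sigma^2 g(x) \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) \Big|_{-\infty}^{\infty} + \sigma^2 \int_{-\infty}^{\infty} g'(x) \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) dx \right]$$
$$= \sigma^2 \cdot \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} g'(x) \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) dx = \sigma^2 E(g'(X)).$$
(ii) Note that $E(X^3) = E(X^2 (X - \mu + \mu)) = E(X^2 (X - \mu)) + \mu E(X^2)$. Let $g(X) = X^2$, so that $g'(X) = 2X$. Then
$$E(X^3) = E(g(X)(X - \mu)) + \mu \left( (E(X))^2 + \mathrm{var}(X) \right) = 2\sigma^2 E(X) + \mu(\mu^2 + \sigma^2) = \mu^3 + 3\mu\sigma^2.$$
Remark: the equality in (i) is known as Stein's Lemma.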
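As a sanity check on Stein's Lemma and on the closed form $\mu^3 + 3\mu\sigma^2$, both sides can be compared numerically. The sketch below uses a composite Simpson rule over $\mu \pm 12\sigma$. The helper `normal_expectation` and the values of `mu` and `sigma` are illustrative choices, not part of the exam.

```python
import math

def normal_expectation(f, mu, sigma, half_width=12.0, steps=20000):
    """Approximate E[f(X)] for X ~ N(mu, sigma^2) by the composite Simpson rule.

    The integral is truncated to mu +/- half_width * sigma; steps must be even.
    """
    a = mu - half_width * sigma
    b = mu + half_width * sigma
    h = (b - a) / steps

    def integrand(x):
        z = (x - mu) / sigma
        return f(x) * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

    total = integrand(a) + integrand(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(a + i * h)
    return total * h / 3.0

mu, sigma = 1.5, 2.0  # illustrative values only

# Stein's Lemma with g(x) = x^2, g'(x) = 2x
lhs = normal_expectation(lambda x: x ** 2 * (x - mu), mu, sigma)  # E[g(X)(X - mu)]
rhs = sigma ** 2 * normal_expectation(lambda x: 2.0 * x, mu, sigma)  # sigma^2 E[g'(X)]

# Third moment against the closed form from part (ii)
third_moment = normal_expectation(lambda x: x ** 3, mu, sigma)
closed_form = mu ** 3 + 3.0 * mu * sigma ** 2

print(lhs, rhs)
print(third_moment, closed_form)
```

Any other differentiable `g` with integrable derivative can be substituted in the first comparison, as long as its tails are negligible outside the truncated range.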

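2. (STAT4) Let $X$ be one observation from a population with pdf
$f(x \mid \theta) = \dfrac{e^{x-\theta}}{(1 + e^{x-\theta})^2}$, $-\infty < x < \infty$, $-\infty < \theta < \infty$.
(i) Construct a most powerful size $\alpha$ test of $H_0: \theta = 0$ versus $H_a: \theta = 1$.
(ii) Construct a UMP size $\alpha$ test of $H_0: \theta \le 0$ versus $H_a: \theta > 0$.

Solution:
(i) According to the Neyman-Pearson Lemma, the most powerful test rejects $H_0$ if
$$\frac{f(x \mid \theta = 1)}{f(x \mid \theta = 0)} = \frac{e^{x-1} (1 + e^x)^2}{e^x (1 + e^{x-1})^2} = e \left( \frac{1 + e^x}{e + e^x} \right)^2 \ge k.$$
Note that $\dfrac{1 + e^x}{e + e^x}$ is an increasing function of $x$ (its derivative $\dfrac{(e - 1) e^x}{(e + e^x)^2}$ is positive), so the most powerful test rejects $H_0$ if $x \ge k'$. To determine $k'$,
$$\alpha = P(X \ge k' \mid \theta = 0) = \int_{k'}^{\infty} \frac{e^x}{(1 + e^x)^2}\, dx = \frac{1}{1 + e^{k'}}$$
implies that $k' = \log((1 - \alpha)/\alpha)$.
(ii) First, for any $\theta_2 > \theta_1$, the likelihood ratio is
$$\frac{f(x \mid \theta = \theta_2)}{f(x \mid \theta = \theta_1)} = e^{\theta_1 - \theta_2} \left( \frac{1 + e^{x - \theta_1}}{1 + e^{x - \theta_2}} \right)^2.$$
The derivative of the likelihood ratio is
$$\frac{d}{dx} \frac{f(x \mid \theta = \theta_2)}{f(x \mid \theta = \theta_1)} = 2 e^{\theta_1 - \theta_2}\, \frac{(1 + e^{x - \theta_1}) (e^{x - \theta_1} - e^{x - \theta_2})}{(1 + e^{x - \theta_2})^3} > 0.$$
Therefore $\dfrac{f(x \mid \theta = \theta_2)}{f(x \mid \theta = \theta_1)}$ is an increasing function of $x$ (MLR), and the UMP test rejects $H_0$ if $x \ge k$. This is the same test as in (i), so $k = \log((1 - \alpha)/\alpha)$.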
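The cut-off $\log((1 - \alpha)/\alpha)$ can be checked directly from the logistic survival function $P(X \ge x \mid \theta) = 1/(1 + e^{x - \theta})$. The sketch below assumes the null value $\theta = 0$ that this cut-off implies, and evaluates the power at $\theta = 1$ as an illustration. The function names and the choice $\alpha = 0.05$ are illustrative.

```python
import math

def logistic_sf(x, theta):
    """P(X >= x) when X has pdf e^(x - theta) / (1 + e^(x - theta))^2."""
    return 1.0 / (1.0 + math.exp(x - theta))

def critical_value(alpha):
    """Cut-off k' = log((1 - alpha) / alpha) of the test 'reject if x >= k''."""
    return math.log((1.0 - alpha) / alpha)

alpha = 0.05  # illustrative size
k = critical_value(alpha)

size = logistic_sf(k, 0.0)        # rejection probability under theta = 0
power_at_1 = logistic_sf(k, 1.0)  # rejection probability under theta = 1

print(k, size, power_at_1)
```

Because the survival function is increasing in $\theta$ for fixed $x$, the rejection probability is at most $\alpha$ for every $\theta \le 0$, which is what the UMP test in part (ii) requires.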

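Suppose $X_1, \ldots, X_n$ is a random sample from the exponential distribution with parameter $\theta > 0$:
$f(x; \theta) = \theta e^{-\theta x}$ if $x > 0$, and $f(x; \theta) = 0$ otherwise.
(i) Show that $Y = \sum_{i=1}^{n} X_i$ follows a Gamma distribution.
(ii) Show that $Y$ is complete sufficient for $\theta$.
(iii) Derive the minimum variance unbiased estimator for $\theta$. Justify your answer.

Solution:
(i) The mgf of $X$ is
$$E e^{tX} = \int_0^{\infty} e^{tx}\, \theta e^{-\theta x}\, dx = \frac{\theta}{\theta - t}, \quad t < \theta.$$
So the mgf of $Y$ is
$$E e^{tY} = \left( \frac{\theta}{\theta - t} \right)^n, \quad t < \theta.$$
This is the mgf of a Gamma distribution with $\alpha = n$, $\beta = 1/\theta$.
(ii) The joint density of the sample, $\theta^n e^{-\theta \sum x_i}$, belongs to the exponential family of distributions with natural statistic $Y$, so $Y$ is complete sufficient for $\theta$.
(iii) For $n \ge 2$,
$$E\left(\frac{1}{Y}\right) = \int_0^{\infty} \frac{1}{y} \cdot \frac{\theta^n}{\Gamma(n)}\, y^{n-1} e^{-\theta y}\, dy = \frac{\theta^n}{\Gamma(n)} \cdot \frac{\Gamma(n-1)}{\theta^{n-1}} = \frac{\theta}{n - 1}.$$
Hence $\dfrac{n-1}{Y}$ is unbiased for $\theta$. Since it is a function of the complete sufficient statistic $Y$, it follows from the Rao-Blackwell Theorem together with completeness (the Lehmann-Scheffé theorem) that $\dfrac{n-1}{Y}$ is the minimum variance unbiased estimator for $\theta$.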
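A small simulation makes the unbiasedness claim concrete. The sketch compares the average of $(n-1)/Y$ with the average of the maximum likelihood estimator $n/Y$, whose mean is $n\theta/(n-1)$. The values of `theta`, `n`, the number of replications and the seed are illustrative assumptions.

```python
import random

def mvue_theta(sample):
    """(n - 1) / sum(x_i), the estimator derived in part (iii)."""
    return (len(sample) - 1) / sum(sample)

def simulate(theta, n, reps, seed=0):
    """Monte Carlo means of the MVUE and of the MLE n / sum(x_i)."""
    rng = random.Random(seed)
    mvue_total = 0.0
    mle_total = 0.0
    for _ in range(reps):
        # expovariate takes the rate, so each draw has density theta * exp(-theta * x)
        sample = [rng.expovariate(theta) for _ in range(n)]
        mvue_total += mvue_theta(sample)
        mle_total += n / sum(sample)
    return mvue_total / reps, mle_total / reps

theta, n = 2.0, 5  # illustrative values only
mvue_mean, mle_mean = simulate(theta, n, reps=100_000)
print(mvue_mean, mle_mean)
```

The first average should sit close to `theta`, while the second should sit close to `n * theta / (n - 1)`, showing the upward bias that the factor $(n-1)/n$ removes.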

STAT 46 Problem in Fall

A group of researchers want to determine whether there is a direct relationship between math and computer anxiety. The test scores for 8 students are shown in the table, with larger scores indicating a greater amount of the trait.

Student: A B C D E F G H
Math Anxiety:
Computer Anxiety:

(a) Suppose both scores are symmetrically distributed. State your hypotheses, compute the p-value, and determine whether the students have the same median computer anxiety as math anxiety.
(b) Calculate a measure of association between computer anxiety and math anxiety, and test whether there is any relationship between the two sets of scores.

Solution:
(a) Let $D = Y - X$. The hypotheses are $H_0: M_D = 0$ vs. $H_1: M_D \ne 0$.

Student: A B C D E F G H
$X_i$:
$Y_i$:
$D_i = Y_i - X_i$:
$r(|D_i|)$:

Signed-rank test:
$$T^+ = \sum_{i=1}^{N} r(|D_i|)\, I\{D_i > 0\} = 27,$$
$$p\text{-value} = 2 P\{T^+ \ge 27\} = 2(0.125) = 0.25.$$
There is no significant difference between the medians of the two anxiety scores.

(b) Use Spearman's rho as the measure of association. First rank the two sets of scores separately.

Spearman's rho test
Student: A B C D E F G H
$S_i = \mathrm{rank}(X_i)$:
$R_i = \mathrm{rank}(Y_i)$:
$D_i = S_i - R_i$:

$$R = 1 - \frac{6 \sum_{i=1}^{n} D_i^2}{n(n^2 - 1)} = 0.94.$$
Its p-value is $P(R \ge 0.94) < 0.05$, so the two sets of scores are significantly associated.
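Both statistics in this solution are easy to compute by brute force for eight pairs. The sketch below gives an exact two-sided signed-rank p-value by enumerating all sign patterns, and Spearman's rho as the correlation of the ranks. The scores in `x` and `y` are hypothetical stand-ins, since the exam's table entries are not available here, and the function names are illustrative.

```python
from itertools import product

def midranks(values):
    """Ranks 1..n, averaging the ranks within groups of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def signed_rank_test(x, y):
    """Return (T+, exact two-sided p-value) for D = Y - X, dropping zero differences."""
    d = [b - a for a, b in zip(x, y) if b != a]
    r = midranks([abs(v) for v in d])
    t_plus = sum(rank for rank, v in zip(r, d) if v > 0)
    center = sum(r) / 2.0
    deviation = abs(t_plus - center)
    hits = 0
    # Under H0 every sign pattern of the ranks is equally likely
    for signs in product((0, 1), repeat=len(r)):
        t = sum(rank for rank, s in zip(r, signs) if s)
        if abs(t - center) >= deviation - 1e-12:
            hits += 1
    return t_plus, hits / 2 ** len(r)

def spearman_rho(x, y):
    """Pearson correlation of the midranks; equals 1 - 6*sum(D^2)/(n(n^2-1)) without ties."""
    rx, ry = midranks(x), midranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical scores for eight students (NOT the exam's data).
x = [10, 14, 19, 23, 28, 31, 36, 40]
y = [12, 13, 22, 27, 23, 37, 43, 49]

t_plus, p_value = signed_rank_test(x, y)
rho = spearman_rho(x, y)
print(t_plus, p_value, rho)
```

Full enumeration costs $2^n$ evaluations, which is trivial for $n = 8$; for larger samples a normal approximation or a dynamic-programming count of rank sums would be the usual substitute.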

Solution to Sampling Problem, Fall

Background
On the whole there are N = 3500 students, and in a random sample of 350 students with proportional allocation there are 200 boys and 150 girls. Therefore, in the population, there are 2000 boys and the rest [1500] are girls. Further, 60% of the students are enrolled in the science stream. We assume that this 60% applies to each of the two groups, boys and girls. Therefore, in the population, we have a 2 x 2 table of frequency counts as follows:

                 Boys   Girls
Science Stream   1200     900
Arts Stream       800     600
TOTAL            2000    1500

We are given the sample frequency counts of the left-handers for each of the above 2 x 2 categories. Under proportional sampling, we thus have the following table, with the number of left-handers in parentheses.

                 Boys                        Girls
                 Popl. Size   Sample Size    Popl. Size   Sample Size
Science Stream   1200         120 (8)        900          90 (12)
Arts Stream       800          80 (5)        600          60 (8)

(a) (i) For boys in the science stream, estimated total number of left-handers = 1200 x 8 / 120 = 80.
(ii) For boys in the arts stream, estimated total number of left-handers = 800 x 5 / 80 = 50.
(iii) For girls in the science stream, estimated total number of left-handers = 900 x 12 / 90 = 120.
(iv) For girls in the arts stream, estimated total number of left-handers = 600 x 8 / 60 = 80.

(b) For the science stream as a whole, estimated total number of left-handers = 80 + 120 = 200, with
estimated s.e. = sqrt[1200^2 x (8/120)(112/120)/(119) + 900^2 x (12/90)(78/90)/(89)].

(c) For the arts stream as a whole, estimated proportion of left-handers = (50 + 80)/(800 + 600). To compute a 95% confidence interval, we need the estimated s.e. of this estimate, which is given by
s.e. = sqrt[(800^2)(5/80)(75/80)/(79) + (600^2)(8/60)(52/60)/(59)] / 1400.
Finally, the 95% confidence interval is computed as estimate +/- 1.96 times estimated s.e.
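The formulas in (b) and (c) can be evaluated with a few lines of code. The sketch below assumes stratum sizes 1200, 900, 800, 600, sample sizes 120, 90, 80, 60 and left-hander counts 8, 12, 5, 8, which is the reading of the tables consistent with the standard-error expressions in this solution. Like those expressions, it omits the finite population correction. The function name is illustrative.

```python
import math

# (population size N_h, sample size n_h, sample left-handers a_h) per stratum;
# values are an assumed reading of the solution's tables.
science = [(1200, 120, 8), (900, 90, 12)]  # boys, girls
arts = [(800, 80, 5), (600, 60, 8)]        # boys, girls

def total_and_se(strata):
    """Estimated total of left-handers and its estimated standard error."""
    total = 0.0
    variance = 0.0
    for N, n, a in strata:
        p = a / n
        total += N * p
        # N_h^2 * p_h * (1 - p_h) / (n_h - 1), with no finite population correction
        variance += N ** 2 * p * (1.0 - p) / (n - 1)
    return total, math.sqrt(variance)

# (b) science stream: total and its s.e.
science_total, science_se = total_and_se(science)

# (c) arts stream: proportion, s.e. and 95% confidence interval
arts_total, arts_total_se = total_and_se(arts)
arts_size = sum(N for N, _, _ in arts)
arts_prop = arts_total / arts_size
arts_prop_se = arts_total_se / arts_size
ci = (arts_prop - 1.96 * arts_prop_se, arts_prop + 1.96 * arts_prop_se)

print(science_total, science_se)
print(arts_prop, arts_prop_se, ci)
```

Multiplying each variance term by $(1 - n_h/N_h)$ would add the finite population correction; with a 10% sampling fraction this shrinks the standard errors by about 5%.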


Stat 48 (Experimental Design) Problem:

Consider an example of the $2^2$ factorial design. The effects of temperature (factor 1, with two levels, low and high, or − and +) and reaction time (factor 2, with two levels, short and long, or − and +) on the percent yield of a certain chemical reaction (response Y) are studied. The experiment was replicated (n = 2) and the order of the eight runs was randomized. The design table and observations are listed below:

Run | I | x1 | x2 | x1x2 | Average Yield | Individual Observations
, , , , 68.7

(a) Estimate the overall mean, main effects, and the interaction. Specify the model you used here, as well as your model assumptions.
(b) Test whether your estimated effects are significantly different from 0 at the significance level 0.05. For your reference, some critical values of the standard normal distribution are Prob(X > 1.96) = 0.025 and Prob(X > 1.645) = 0.05.

Solution for Stat 48 (Experimental Design) Problem:
(a) The estimate of the overall mean: $\hat{\mu}$ = ( )/4 = 6.; main effect 1: $\hat{\mu}_1$ = ( )/4 = .4; main effect 2: $\hat{\mu}_2$ = ( )/4 = 4.; interaction: $\hat{\mu}_{12}$ = ( )/4 = -.4.
The model we used here is
$$Y_{ij} = \mu + x_{i1}\mu_1 + x_{i2}\mu_2 + x_{i1}x_{i2}\mu_{12} + \varepsilon_{ij}, \quad i = 1, 2, 3, 4, \; j = 1, 2.$$
Model assumption: the $\varepsilon_{ij}$ are i.i.d. $N(0, \sigma^2)$.
(b) The overall variance estimate is
$$s^2 = \frac{1}{4} \sum_{i=1}^{4} s_i^2 = \frac{1}{4(2 - 1)} \sum_{i=1}^{4} \sum_{j=1}^{2} (Y_{ij} - \bar{Y}_i)^2 = $$
The estimate of the variance of an effect is Var(effect) = = . An estimated effect is significantly different from 0 at the significance level 0.05 if its absolute value goes beyond =. 4. Therefore, the overall mean and the two main effects are significantly different from 0 at level 0.05, while the interaction is not significant.
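The estimates in (a) and the pooled variance in (b) follow mechanically from the signed design columns. The sketch below divides each contrast of cell means by 4, the convention used in this solution, and pools the within-cell variances. The yields in `yields` are hypothetical, since the exam's observations are not available here, and the function name is illustrative.

```python
def factorial_2x2(observations):
    """Estimates for a replicated 2^2 design.

    observations maps (x1, x2), each coded -1 or +1, to the list of replicate yields.
    Returns (estimates, pooled variance s^2).
    """
    means = {cell: sum(ys) / len(ys) for cell, ys in observations.items()}
    cells = list(means)
    k = len(cells)  # 4 design points
    estimates = {
        "mean": sum(means[c] for c in cells) / k,
        "x1": sum(c[0] * means[c] for c in cells) / k,
        "x2": sum(c[1] * means[c] for c in cells) / k,
        "x1x2": sum(c[0] * c[1] * means[c] for c in cells) / k,
    }
    # Pooled variance: average of the within-cell sample variances s_i^2
    s2 = sum(
        sum((y - means[c]) ** 2 for y in observations[c]) / (len(observations[c]) - 1)
        for c in cells
    ) / k
    return estimates, s2

# Hypothetical percent yields (NOT the exam's observations).
yields = {
    (-1, -1): [49.0, 51.0],
    (+1, -1): [59.0, 61.0],
    (-1, +1): [69.0, 71.0],
    (+1, +1): [71.0, 73.0],
}
estimates, s2 = factorial_2x2(yields)
print(estimates, s2)
```

Under this convention each estimate is a regression coefficient on the ±1 coded columns; doubling a coefficient gives the change in mean response between the low and high level of the factor.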

Stat 48 (Linear Regression) Problem:

Consider a regression model that relates gas mileage and weight of automobiles. Thirty-eight cars were selected, and their weights x (in units of 1,000 pounds) and fuel efficiencies MPG (miles per gallon, the response y) were measured.
(a) Given the summary statistics: x = 8. 79, y = 94. 9, x = , i y = , and x 56 i y = 539., find the least-squares estimates of the regression coefficients in the simple regression model $y = \beta_0 + \beta_1 x + \varepsilon$.
(b) Given SSE $= \sum (y_i - \hat{y}_i)^2$ = , which is the sum of squares due to error, along with the summary statistics in (a), construct a 95% confidence interval for $\beta_1$. For your reference, some critical values of t distributions are t(.025; df=38) = 2.024 and t(.025; df=36) = 2.028.
(c) The figure below is the residual plot (residuals versus fitted values) of the simple linear regression model. Based on that, discuss whether this model is appropriate. What other model(s) would you suggest?

Solution for Stat 48 (Linear Regression) Problem:
(a) For the least-squares estimates:
$$\hat{\beta}_1 = \frac{\sum x_i y_i - (\sum x_i)(\sum y_i)/38}{\sum x_i^2 - (\sum x_i)^2/38} = -8.365,$$
$$\hat{\beta}_0 = \sum y_i / 38 - \hat{\beta}_1 \sum x_i / 38 = $$

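(b) An estimate of the standard deviation of $\hat{\beta}_1$ is
$$s(\hat{\beta}_1) = \sqrt{\frac{MSE}{\sum (x_i - \bar{x})^2}} = \sqrt{\frac{SSE/(38 - 2)}{\sum x_i^2 - (\sum x_i)^2/38}} = 0.663.$$
Since $\dfrac{\hat{\beta}_1 - \beta_1}{s(\hat{\beta}_1)} \sim t(38 - 2)$, a 95% confidence interval for $\beta_1$ is
$$\hat{\beta}_1 \pm 2.028\, s(\hat{\beta}_1) = -8.365 \pm 2.028 \times 0.663 = [-9.71, -7.02].$$
(c) The figure shows a quadratic pattern, which indicates that the simple regression model is not appropriate. One may want to try the model $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$. Another possibility is to look for a transformation of y that simplifies the structure of the model, say $g(y) = \beta_0 + \beta_1 x + \varepsilon$.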
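The arithmetic of parts (a) and (b) can be packaged as small helpers that work from summary statistics. The sketch below takes the slope estimate −8.365, its standard error 0.663 and the critical value 2.028 as the values implied by the interval's endpoints. The function names are illustrative, and the three-point data set used to exercise the least-squares helper is made up.

```python
import math

def least_squares(n, sum_x, sum_y, sum_xx, sum_xy):
    """Intercept and slope of the simple linear regression from summary statistics."""
    sxy = sum_xy - sum_x * sum_y / n
    sxx = sum_xx - sum_x ** 2 / n
    slope = sxy / sxx
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope

def slope_standard_error(n, sse, sum_x, sum_xx):
    """sqrt(MSE / Sxx) with MSE = SSE / (n - 2)."""
    mse = sse / (n - 2)
    sxx = sum_xx - sum_x ** 2 / n
    return math.sqrt(mse / sxx)

def slope_confidence_interval(slope, se, t_crit):
    """Two-sided interval slope +/- t_crit * se."""
    margin = t_crit * se
    return slope - margin, slope + margin

# Interval from the values used in part (b)
low, high = slope_confidence_interval(-8.365, 0.663, 2.028)
print(round(low, 2), round(high, 2))

# Made-up check data: x = 1, 2, 3 and y = 2, 4, 6 lie exactly on y = 2x
b0, b1 = least_squares(3, 6.0, 12.0, 14.0, 28.0)
print(b0, b1)
```

The interval lies entirely below zero, which matches the expected direction of the relationship: heavier cars have lower fuel efficiency.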