Solution to HW - 1




Problem 1. [Points = 3] In September, Chapel Hill's daily high temperature has a mean of 81 degrees F and a standard deviation of 10 degrees F. What are the mean, standard deviation, and variance in terms of Celsius?

Answer 1: Note that $C = (5/9)(F - 32) = a + bF$, where $a = -160/9$ and $b = 5/9$. Therefore, the mean temperature is $a + b \cdot 81 \approx 27.2$ degrees Celsius. The standard deviation is $b \cdot 10 \approx 5.56$ degrees Celsius, and the variance is $b^2 \cdot 10^2 \approx 30.86$ (degrees Celsius squared).

Problem 2. [Points = 3] In a given population of two-earner male/female couples, male earnings have a mean of 40,000 USD per year and a standard deviation of 12,000 USD. Female earnings have a mean of 45,000 USD per year and a standard deviation of 18,000 USD. The correlation between male and female earnings for a couple is 0.80. Let C denote the combined earnings for a randomly selected couple.
(a) What is the mean of C?
(b) What is the covariance between male and female earnings?
(c) What is the standard deviation of C?

Answer 2: Let us denote male earnings by m and female earnings by f.
(a) $C = m + f$, and hence $\mu_C = \mu_m + \mu_f = 85{,}000$ USD.
(b) $\mathrm{Cov}(m, f) = \mathrm{Corr}(m, f)\,\mathrm{sd}(m)\,\mathrm{sd}(f) = 0.8 \times 12{,}000 \times 18{,}000 = 172{,}800{,}000$.
(c) $\mathrm{sd}(C) = \sqrt{\mathrm{sd}(m)^2 + \mathrm{sd}(f)^2 + 2\,\mathrm{Cov}(m, f)} \approx 28{,}523.67$ USD.

Problem 3. [Points = 3] X and Y are discrete random variables with the following joint distribution (that is, $P[X = 1, Y = 14] = 0.02$, and so forth):

Joint probability distribution

                    Value of Y
Value of X     14     22     30     40     65
    1         0.02   0.05   0.10   0.03   0.01
    5         0.17   0.15   0.05   0.02   0.01
    8         0.02   0.03   0.15   0.10   0.09

(a) Calculate the probability distribution, mean, and variance of Y.
(b) Calculate the probability distribution, mean, and variance of Y given X = 8.
(c) Calculate the covariance and correlation between X and Y.

Answer 3: (a) The probability distribution of Y is given by

$P[Y = 14] = P[X = 1, Y = 14] + P[X = 5, Y = 14] + P[X = 8, Y = 14] = 0.21$
$P[Y = 22] = P[X = 1, Y = 22] + P[X = 5, Y = 22] + P[X = 8, Y = 22] = 0.23$
$P[Y = 30] = P[X = 1, Y = 30] + P[X = 5, Y = 30] + P[X = 8, Y = 30] = 0.30$
$P[Y = 40] = P[X = 1, Y = 40] + P[X = 5, Y = 40] + P[X = 8, Y = 40] = 0.15$
$P[Y = 65] = P[X = 1, Y = 65] + P[X = 5, Y = 65] + P[X = 8, Y = 65] = 0.11$

Now note that

$E[Y] = 14\,P[Y = 14] + 22\,P[Y = 22] + 30\,P[Y = 30] + 40\,P[Y = 40] + 65\,P[Y = 65] = 30.15$

and that

$E[Y^2] = 14^2\,P[Y = 14] + 22^2\,P[Y = 22] + 30^2\,P[Y = 30] + 40^2\,P[Y = 40] + 65^2\,P[Y = 65] = 1127.23$.

Therefore, $V(Y) = E[Y^2] - (E[Y])^2 \approx 218.21$.

(b) The probability distribution of Y conditional on X = 8 is given by

$P[Y = 14 \mid X = 8] = P[X = 8, Y = 14]/P[X = 8] \approx 0.05$
$P[Y = 22 \mid X = 8] = P[X = 8, Y = 22]/P[X = 8] \approx 0.08$
$P[Y = 30 \mid X = 8] = P[X = 8, Y = 30]/P[X = 8] \approx 0.38$
$P[Y = 40 \mid X = 8] = P[X = 8, Y = 40]/P[X = 8] \approx 0.26$
$P[Y = 65 \mid X = 8] = P[X = 8, Y = 65]/P[X = 8] \approx 0.23$,

where I use $P[X = 8]$ from the following computations of the marginal distribution of X:

$P[X = 1] = P[X = 1, Y = 14] + P[X = 1, Y = 22] + P[X = 1, Y = 30] + P[X = 1, Y = 40] + P[X = 1, Y = 65] = 0.21$
$P[X = 5] = P[X = 5, Y = 14] + P[X = 5, Y = 22] + P[X = 5, Y = 30] + P[X = 5, Y = 40] + P[X = 5, Y = 65] = 0.40$
$P[X = 8] = P[X = 8, Y = 14] + P[X = 8, Y = 22] + P[X = 8, Y = 30] + P[X = 8, Y = 40] + P[X = 8, Y = 65] = 0.39$

Now note that

$E[Y \mid X = 8] = 14\,P[Y = 14 \mid X = 8] + 22\,P[Y = 22 \mid X = 8] + 30\,P[Y = 30 \mid X = 8] + 40\,P[Y = 40 \mid X = 8] + 65\,P[Y = 65 \mid X = 8] \approx 39.21$

and that

$E[Y^2 \mid X = 8] = 14^2\,P[Y = 14 \mid X = 8] + 22^2\,P[Y = 22 \mid X = 8] + 30^2\,P[Y = 30 \mid X = 8] + 40^2\,P[Y = 40 \mid X = 8] + 65^2\,P[Y = 65 \mid X = 8] \approx 1778.69$.

Therefore, $V(Y \mid X = 8) = E[Y^2 \mid X = 8] - (E[Y \mid X = 8])^2 \approx 241.65$.

(c) $\mathrm{Cov}(X, Y) = E[XY] - E[X]E[Y]$ and $\mathrm{Corr}(X, Y) = \mathrm{Cov}(X, Y)/\sqrt{V(X)V(Y)}$. So we further need to compute $E[X]$, $V(X)$ and $E[XY]$. Note that

$E[X] = 1 \cdot P[X = 1] + 5 \cdot P[X = 5] + 8 \cdot P[X = 8] = 5.33$,
$E[X^2] = 1^2 \cdot P[X = 1] + 5^2 \cdot P[X = 5] + 8^2 \cdot P[X = 8] = 35.17$,

and hence $V(X) = E[X^2] - (E[X])^2 \approx 6.76$. Also, summing $x\,y\,P[X = x, Y = y]$ over all cells of the table gives $E[XY] = 171.7$. Therefore, $\mathrm{Cov}(X, Y) = E[XY] - E[X]E[Y] \approx 11.00$ and $\mathrm{Corr}(X, Y) \approx 0.29$.

Problem 4. [Points = 4] Let X and Z be independently distributed standard normal random variables, and let $Y = X^2 + Z$.

(a) Show that $E[Y \mid X] = X^2$.
(b) Show that $E[Y] = 1$.
(c) Show that $E[XY] = 0$. (Hint: Use the fact that the odd moments of a standard normal random variable are all zero.)
(d) Show that $\mathrm{Cov}(X, Y) = 0$.

Answer 4: $X \sim N(0, 1)$ implies $E[X] = 0$ and $V(X) = E[X^2] - (E[X])^2 = 1$, so $E[X^2] = 1$, and the same holds for Z. Also, X and Z independent implies $E[Z \mid X] = E[Z] = 0$ and $\mathrm{Cov}(X, Z) = E[XZ] - E[X]E[Z] = 0 - 0 = 0$.
(a) $E[Y \mid X] = E[X^2 + Z \mid X] = E[X^2 \mid X] + E[Z \mid X] = X^2 + 0 = X^2$.
(b) By the law of iterated expectations, we have $E[Y] = E_X[E[Y \mid X]] = E_X[X^2] = 1$.
(c) $E[XY] = E[X^3 + XZ] = E[X^3] + E[XZ] = 0 + 0 = 0$, because the odd moments of $N(0, 1)$ are zero and $E[XZ] = E[X]E[Z] = 0$ by independence.
(d) $\mathrm{Cov}(X, Y) = \mathrm{Cov}(X, X^2 + Z) = \mathrm{Cov}(X, X^2) + \mathrm{Cov}(X, Z) = \mathrm{Cov}(X, X^2) = E[X^3] - E[X]E[X^2] = E[X^3] = 0$ (because the odd moments of $N(0, 1)$ are zero).

Problem 5. [Points = 7] Grades on a standardized test are known to have a mean of 1000 for students in the United States. The test is administered to 453 randomly selected students in Florida. In this sample, the mean is 1013 and the standard deviation is 108.
(a) Construct a 95% confidence interval for the average test score for Florida students.
(b) Is there statistically significant evidence that Florida students perform differently than other students in the United States?
(c) Another 503 students are selected at random from Florida. They are given a three-hour preparation course before the test is administered. Their average test score is 1019 with a standard deviation of 95.

(i) Construct a 95% confidence interval for the change in average test score associated with the prep course.
(ii) Is there statistically significant evidence that the prep course helped?
(d) The original 453 students are given the prep course and then asked to take the test a second time. The average change in their test scores is 9 points and the standard deviation of the change is 60 points.
(i) Construct a 95% confidence interval for the change in average test scores.
(ii) Is there statistically significant evidence that students will perform better on their second attempt after taking the prep course?
(iii) Students may have performed better in their second attempt because of the prep course or because they gained test-taking experience in their first attempt. Describe an experiment that would quantify these two effects.

Answer 5: (a) A 95% confidence interval is $[(\text{Florida mean}) \pm 1.96\,(\text{Florida sd})/\sqrt{n}] = [1003.05, 1022.95]$.

(b) Note that the national average of 1000 is not included in the 95% confidence interval for Florida. So we reject at the 5 percent level the hypothesis that Florida students perform the same as students from all over the nation.

(c) (i) Note that the students with the prep course are different from the original Florida students, and it is hence reasonable to assume that there is no correlation between the two groups. Let $E[\text{old: score w/o prep}] = \mu_{wo}$ and $E[\text{new: score with prep}] = \mu_w$. We want to compute a 95% confidence interval for $\Delta = \mu_w - \mu_{wo}$, i.e. the change in the average test score. Note that $\hat{\Delta} = \bar{X}_{new:w} - \bar{X}_{old:wo} = 1019 - 1013 = 6$, and since we can assume the scores of the two groups to be uncorrelated,

$\mathrm{sd}(\hat{\Delta}) = \sqrt{\mathrm{sd}^2_{new:w}/n_{new:w} + \mathrm{sd}^2_{old:wo}/n_{old:wo}} = \sqrt{95^2/503 + 108^2/453} \approx 6.61$.

Therefore, a 95% confidence interval for the change is $[\hat{\Delta} \pm 1.96\,\mathrm{sd}(\hat{\Delta})] = [-6.96, 18.96]$.

(ii) Note that 0 is inside this confidence interval (and so are some negative values). However, the confidence interval also contains positive values, i.e. an increase in average score. Hence we cannot reject at the 5% level the null hypothesis that the prep course had no effect. But we also cannot reject at the 5% level the hypothesis that the increase in average score equals any given positive value between 0 and 18.96.

(d) (i) A 95% confidence interval is $[(\text{change mean}) \pm 1.96\,(\text{change sd})/\sqrt{n}] = [3.47, 14.53]$.

(ii) Yes. Neither zero nor any negative value is included in the confidence interval, so the improvement is statistically significant at the 5% level.

(iii) It will be more effective to divide the 453 students randomly into two groups (of roughly equal size) and ask one group to take the test a second time without the prep course and the other group to take the test after a prep course. The first group's change measures the effect of test-taking experience alone, and the difference between the two groups' changes measures the effect of the prep course. We will discuss this further in class.
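As a numerical check on the three intervals in Answer 5, the following Python sketch recomputes them from the summary statistics given in the problem. The helper names `mean_ci` and `diff_ci` are illustrative and not part of the original solution; the critical value 1.96 is the one used in the text, and the two-sample interval assumes the groups are uncorrelated, as in part (c)(i).

```python
import math

Z_95 = 1.96  # two-sided 95% normal critical value, as used in Answer 5


def mean_ci(mean, sd, n, z=Z_95):
    """Large-sample confidence interval for a single mean (hypothetical helper)."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


def diff_ci(mean_a, sd_a, n_a, mean_b, sd_b, n_b, z=Z_95):
    """Large-sample interval for mean_a - mean_b, assuming uncorrelated samples."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    diff = mean_a - mean_b
    return diff - z * se, diff + z * se


# (a) Florida students without the prep course
florida = mean_ci(1013, 108, 453)
# (c)(i) new group with prep course minus original group without it
prep_effect = diff_ci(1019, 95, 503, 1013, 108, 453)
# (d)(i) paired change for the original 453 students
paired_change = mean_ci(9, 60, 453)

for label, (low, high) in [("(a)", florida),
                           ("(c)(i)", prep_effect),
                           ("(d)(i)", paired_change)]:
    print(f"{label}: [{low:.2f}, {high:.2f}]")
```

The paired design in (d) reuses `mean_ci` because the per-student changes form a single sample, which is why its interval is so much narrower than the two-sample interval in (c).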
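The moments in Answer 3 can also be recomputed directly from the joint probability table of Problem 3. The sketch below is one way to do so using only the Python standard library; the variable names and the `moments` helper are assumptions made for illustration, and the probabilities are copied from the table in the problem statement.

```python
import math

x_values = [1, 5, 8]
y_values = [14, 22, 30, 40, 65]
# joint[i][j] = P[X = x_values[i], Y = y_values[j]], copied from Problem 3
joint = [
    [0.02, 0.05, 0.10, 0.03, 0.01],
    [0.17, 0.15, 0.05, 0.02, 0.01],
    [0.02, 0.03, 0.15, 0.10, 0.09],
]


def moments(values, probs):
    """Return (mean, variance) of a discrete distribution (hypothetical helper)."""
    mean = sum(v * p for v, p in zip(values, probs))
    second_moment = sum(v * v * p for v, p in zip(values, probs))
    return mean, second_moment - mean ** 2


# Marginals: sum each row for X, each column for Y
p_x = [sum(row) for row in joint]
p_y = [sum(col) for col in zip(*joint)]
mean_x, var_x = moments(x_values, p_x)
mean_y, var_y = moments(y_values, p_y)

# Conditional distribution of Y given X = 8: rescale that row by P[X = 8]
i8 = x_values.index(8)
p_y_given_8 = [p / p_x[i8] for p in joint[i8]]
cond_mean, cond_var = moments(y_values, p_y_given_8)

# E[XY] sums x * y * P[X = x, Y = y] over every cell of the table
e_xy = sum(x * y * joint[i][j]
           for i, x in enumerate(x_values)
           for j, y in enumerate(y_values))
cov_xy = e_xy - mean_x * mean_y
corr_xy = cov_xy / math.sqrt(var_x * var_y)

print(f"E[Y] = {mean_y:.2f}, V(Y) = {var_y:.2f}")
print(f"E[Y|X=8] = {cond_mean:.2f}, V(Y|X=8) = {cond_var:.2f}")
print(f"E[X] = {mean_x:.2f}, V(X) = {var_x:.2f}")
print(f"Cov = {cov_xy:.2f}, Corr = {corr_xy:.2f}")
```

Working from unrounded probabilities avoids the small discrepancies that appear when the rounded conditional probabilities (0.05, 0.08, and so on) are plugged back into the mean and variance formulas.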