Chapter 8: Interval Estimates and Hypothesis Testing


 Spencer Sharp
 2 years ago
Chapter 8 Outline

Clint's Assignment: Taking Stock
Estimate Reliability: Interval Estimate Question
  o Normal Distribution versus the Student t-distribution: One Last Complication
  o Assessing the Reliability of a Coefficient Estimate: Applying the Student t-distribution
Theory Assessment: Hypothesis Testing
  o Motivating Hypothesis Testing: The Cynic
  o Formalizing Hypothesis Testing: The Steps
Summary: The Ordinary Least Squares (OLS) Estimation Procedure
  o Regression Model and the Role of the Error Term
  o Standard Ordinary Least Squares (OLS) Premises
  o Ordinary Least Squares (OLS) Estimation Procedure: Three Important Estimation Procedures
  o Properties of the Ordinary Least Squares (OLS) Estimation Procedure and the Standard Ordinary Least Squares (OLS) Premises
      Each estimation procedure is unbiased.
      The estimation procedure for the coefficient value is the best linear unbiased estimation procedure (BLUE).
Causation versus Correlation

Chapter 8 Prep Questions

1. Run the following simulation and answer the questions posed. Summarize your answers by filling in the following blanks:
   [Link to MITLab 8P.1 goes here.]

   Actual Values        From     To       Repetitions Between
   β_x      Var[e]      Value    Value    From and To Values
                                          ____%
                                          ____%
                                          ____%

2. In the simulation you just ran (Question 1):
   a. Using the appropriate equation, compute the variance of the coefficient estimate's probability distribution.
   b. What is the standard deviation of the coefficient estimate's probability distribution?
   c. Using the normal distribution's rules of thumb, what is the probability that the coefficient estimate in one repetition would lie between:
      1) 1.5 and 2.5?
      2) 1.0 and 3.0?
      3) .5 and 3.5?
   d. Are your answers to part c consistent with your simulation results?

3. Recall the normal distribution.
   a. What is the definition of the normal distribution's z?
   b. Consider the regression results from Professor Lord's first quiz:

      Ordinary Least Squares (OLS)
      Dependent Variable: y
      Explanatory Variable(s):   Estimate   SE       t-statistic   Prob
      x                          1.2        .5196    2.309         .2601
      Const                      63
      Number of Observations     3
      Sum Squared Residuals      54
      SE of Regression

4. Recall the regression results from Professor Lord's first quiz:

      Ordinary Least Squares (OLS)
      Dependent Variable: y
      Explanatory Variable(s):   Estimate   SE       t-statistic   Prob
      x                          1.2        .5196    2.309         .2601
      Const                      63
      Number of Observations     3
      Sum Squared Residuals      54
      SE of Regression

   a. Does the positive coefficient estimate suggest that studying more will improve a student's quiz score? Explain.

   Consider the views of a cynic:
   Cynic's view: Studying has no impact on a student's quiz score; the positive coefficient estimate obtained from the first quiz was just the luck of the draw. In fact, studying does not affect quiz scores.

   b. If the cynic were correct and studying has no impact on quiz scores, what would the actual coefficient, β_x, equal?
   c. Is it possible that the cynic is correct? To help you answer this question, run the following simulation:
      [Link to MITLab 8P.4 goes here.]

Clint's Assignment: Taking Stock

We shall begin by taking stock of where Clint stands. Recall the theory he must assess.

Theory: Additional studying increases quiz scores.

Clint's assignment is to assess the effect of studying on quiz scores:

Project: Use data from Professor Lord's first quiz to assess the effect of studying on quiz scores.

Clint uses a simple regression model to assess the theory. Quiz score is the dependent variable and number of minutes studied is the explanatory variable:

$$ y_t = \beta_{Const} + \beta_x x_t + e_t $$

where
  y_t = Quiz score of student t
  x_t = Minutes studied by student t
  e_t = Error term for student t

β_Const and β_x are the model's parameters. They incorporate the view that Professor Lord awards each student some points just for showing up; subsequently, the number of additional points each student earns depends on how much he/she studied:
  β_Const represents the number of points Professor Lord gives a student just for showing up.
  β_x represents the number of additional points earned for each additional minute of study.

Since the values of β_Const and β_x are not observable, Clint adopted the econometrician's philosophy:

Econometrician's Philosophy: If you lack the information to determine the value directly, estimate the value to the best of your ability using the information you do have.

Clint used the results of the first quiz (First Quiz Data: Student, x, y) to estimate the values of β_Const and β_x by applying the ordinary least squares (OLS) estimation procedure to find the best fitting line:

$$ b_x = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2} = \frac{6}{5} = 1.2 $$

$$ b_{Const} = \bar{y} - b_x \bar{x} = 81 - \frac{6}{5} \times 15 = 81 - 18 = 63 $$
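The two OLS equations can be sketched in a few lines of Python. The data values below are an assumption: the quiz data table is not reproduced here, so three (x, y) pairs were chosen only because they give the means (15 and 81) and the estimates (1.2 and 63) quoted in the text. The function name `ols_simple` is likewise hypothetical.

```python
# Hypothetical first-quiz data (an assumption): chosen so that the means are
# x_bar = 15 and y_bar = 81, matching the figures quoted in the text.
x = [5, 15, 25]    # minutes studied
y = [66, 87, 90]   # quiz scores


def ols_simple(x, y):
    """Return (b_const, b_x) for the best fitting line y = b_const + b_x * x."""
    T = len(x)
    x_bar = sum(x) / T
    y_bar = sum(y) / T
    # Numerator: sum of cross deviations; denominator: sum of squared x deviations.
    sum_xy_dev = sum((yt - y_bar) * (xt - x_bar) for xt, yt in zip(x, y))
    sum_xx_dev = sum((xt - x_bar) ** 2 for xt in x)
    b_x = sum_xy_dev / sum_xx_dev
    b_const = y_bar - b_x * x_bar
    return b_const, b_x


b_const, b_x = ols_simple(x, y)
print("b_Const =", round(b_const, 4), " b_x =", round(b_x, 4))
```

With these assumed data the sketch reproduces the slope of 1.2 and the constant of 63; any other data set would simply be passed to the same function.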
Clint's estimates suggest that Professor Lord gives each student 63 points for showing up; subsequently, each student earns 1.2 additional points for each additional minute studied.

Clint realizes that he cannot expect the coefficient estimate to equal the actual value; in fact, he is all but certain that it will not. So now, Clint must address two related issues:

Estimate Reliability: How reliable is the coefficient estimate, 1.2, calculated from the first quiz? That is, how confident should Clint be that the coefficient estimate, 1.2, will be close to the actual value?

Theory Assessment: How confident should Clint be that the theory is correct, that studying improves quiz scores?

We shall address both of these issues in this chapter. First, we consider estimate reliability.

Estimate Reliability: Interval Estimate Question

The interval estimate question quantifies the notion of reliability:

Interval Estimate Question: What is the probability that the estimate, 1.2, lies within ____ of the actual value? ____

The general properties of the ordinary least squares (OLS) estimation procedure allow us to address this question. It is important to distinguish between the general properties and one specific application. Recall that the general properties refer to what we know about the estimation procedure before the quiz is given; the specific application refers to the numerical values of the estimates calculated from the results of the first quiz:
General Properties versus One Specific Application

OLS Estimation Procedure: Estimate β_Const and β_x by finding the b_Const and b_x that minimize the sum of squared residuals.

Model: y_t = β_Const + β_x x_t + e_t

Before experiment (general properties). Random Variable: Probability Distribution

$$ \text{Mean}[b_x] = \beta_x \qquad \text{Var}[b_x] = \frac{\text{Var}[e]}{\sum_{t=1}^{T} (x_t - \bar{x})^2} $$

Mean and variance describe the center and spread of the estimate's probability distribution.

After experiment (one specific application). Apply the estimation procedure once to the first quiz's data. Estimate: Numerical Value

OLS equations:

$$ b_x = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2} \qquad b_{Const} = \bar{y} - b_x \bar{x} $$

$$ b_x = \frac{6}{5} = 1.2 \qquad b_{Const} = 81 - \frac{6}{5} \times 15 = 63 $$

The estimates are random variables and a quiz can be viewed as an experiment. We cannot determine the numerical value of an estimate with certainty before the experiment (quiz) is conducted. What then do we know beforehand? We can describe the probability distribution of the estimate. We know that the mean of the coefficient estimate's probability distribution equals the actual value of the coefficient and its variance equals the variance of the error term's probability distribution divided by the sum of squared x deviations:

  Mean of Estimate's Probability Distribution Equals Actual Value
    Estimation Procedure Is Unbiased

  Variance of Estimate's Probability Distribution
    Determines the Reliability of the Estimate
    As Variance Decreases, Reliability Increases
Both the mean and variance of the coefficient estimate's probability distribution play a crucial role:

  Since the mean of the coefficient estimate's probability distribution, Mean[b_x], equals the actual value of the coefficient, β_x, the estimation procedure is unbiased; the estimation procedure does not systematically underestimate or overestimate the actual coefficient value.

  When the estimation procedure for the coefficient value is unbiased, the variance of the estimate's probability distribution, Var[b_x], determines the reliability of the estimate; as the variance decreases, the probability distribution becomes more tightly cropped around the actual value; consequently, it becomes more likely for the coefficient estimate to be close to the actual coefficient value.

To assess his estimate's reliability, Clint must consider the variance of the coefficient estimate's probability distribution. But we learned that Clint can never determine the actual variance of the error term's probability distribution, Var[e]. Instead, Clint adopts a two-step strategy for estimating the variance of the coefficient estimate's probability distribution:

Step 1: Estimate the variance of the error term's probability distribution from the available information, the data from the first quiz:

$$ \text{EstVar}[e] = \text{AdjVar}[Res's] = \frac{SSR}{\text{Degrees of Freedom}} = \frac{54}{1} = 54 $$

Step 2: Apply the relationship between the variances of the coefficient estimate's and error term's probability distributions:

$$ \text{Var}[b_x] = \frac{\text{Var}[e]}{\sum_{t=1}^{T} (x_t - \bar{x})^2} $$

$$ \text{EstVar}[b_x] = \frac{\text{EstVar}[e]}{\sum_{t=1}^{T} (x_t - \bar{x})^2} = \frac{54}{200} = .27 $$

$$ \text{EstSD}[b_x] = \sqrt{\text{EstVar}[b_x]} = \sqrt{.27} = .5196 $$

Unfortunately, there is one last complication before we can address the interval estimate question.
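The two-step strategy above can be traced numerically. The following is a minimal sketch, not the textbook's software: the three data points are an assumption (picked to be consistent with the reported SSR of 54 and estimates of 63 and 1.2), and the function name `estimate_coefficient_se` is hypothetical.

```python
import math

# Hypothetical first-quiz data (an assumption consistent with the reported results).
x = [5, 15, 25]
y = [66, 87, 90]
b_const, b_x = 63.0, 1.2   # estimates quoted in the text


def estimate_coefficient_se(x, y, b_const, b_x, n_parameters=2):
    """Return (EstVar[e], EstVar[b_x], EstSD[b_x]) following Steps 1 and 2."""
    T = len(x)
    residuals = [yt - (b_const + b_x * xt) for xt, yt in zip(x, y)]
    ssr = sum(r ** 2 for r in residuals)           # sum of squared residuals
    degrees_of_freedom = T - n_parameters          # 3 - 2 = 1 here
    est_var_e = ssr / degrees_of_freedom           # Step 1
    x_bar = sum(x) / T
    sum_xx_dev = sum((xt - x_bar) ** 2 for xt in x)
    est_var_bx = est_var_e / sum_xx_dev            # Step 2
    return est_var_e, est_var_bx, math.sqrt(est_var_bx)


est_var_e, est_var_bx, est_sd_bx = estimate_coefficient_se(x, y, b_const, b_x)
print(round(est_var_e, 4), round(est_var_bx, 4), round(est_sd_bx, 4))
```

Under these assumptions the sketch gives an estimated error term variance of 54, an estimated coefficient variance of .27, and a standard error of about .5196.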
Normal Distribution versus the Student t-distribution: One Last Complication

We begin by reviewing the normal distribution. Recall that the variable z played a critical role in using the normal distribution:

$$ z = \frac{\text{Value of Random Variable} - \text{Distribution Mean}}{\text{Distribution Standard Deviation}} = \text{Number of Standard Deviations from the Mean} $$

In words, z equals the number of standard deviations the value lies from the mean. But Clint does not know what the variance and standard deviation of the coefficient estimate's probability distribution equal. That is why he must estimate them. Consequently, he cannot use the normal distribution to calculate probabilities. When the standard deviation is not known and must be estimated, the Student t-distribution must be used. The variable t is similar to the variable z; instead of equaling the number of standard deviations the value lies from the mean, t equals the number of estimated standard deviations the value lies from the mean:

$$ t = \frac{\text{Value of Random Variable} - \text{Distribution Mean}}{\text{Estimated Distribution Standard Deviation}} = \text{Number of Estimated Standard Deviations from the Mean} $$

Recall that the estimated standard deviation is called the standard error; hence,

$$ t = \frac{\text{Value of Random Variable} - \text{Distribution Mean}}{\text{Standard Error}} = \text{Number of Standard Errors from the Distribution Mean} $$

[Figure 8.1: Normal and Student t-distributions. Probability distribution of the random variable plotted against the value of the random variable; both curves are centered on the distribution mean.]
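The contrast drawn in Figure 8.1 can be checked numerically. The sketch below compares right tail probabilities of the standard normal distribution with those of the Student t-distribution with one degree of freedom; that particular case is chosen only because it has a simple closed form (it coincides with the Cauchy distribution), so no statistics library is needed. The function names are hypothetical.

```python
import math


def normal_right_tail(z):
    """Probability that a standard normal variable exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def t1_right_tail(t):
    """Probability that a Student t variable with 1 degree of freedom exceeds t.

    With 1 degree of freedom the t-distribution is the Cauchy distribution,
    whose CDF is 1/2 + arctan(t)/pi.
    """
    return 0.5 - math.atan(t) / math.pi


# The same number of (estimated) standard deviations leaves more probability
# in the tail of the t-distribution than in the tail of the normal distribution.
for value in (1.0, 2.0, 3.0):
    print(value, round(normal_right_tail(value), 4), round(t1_right_tail(value), 4))
```

For other degrees of freedom the t-distribution has no such simple formula, and a table or statistical software would be used instead.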
Like the normal distribution, the t-distribution is symmetric about its mean. Since estimating the standard deviation introduces an additional element of uncertainty, the Student t-distribution is more spread out than the normal distribution, as illustrated in Figure 8.1. The Student t-distribution's spread depends on the degrees of freedom. As the number of degrees of freedom increases, we have more information; consequently, the t-distribution's spread decreases, moving it closer and closer to the normal distribution. Since the spread of the Student t-distribution depends on the degrees of freedom, the table describing the Student t-distribution is more cumbersome than the normal distribution table. Fortunately, our Econometrics Lab allows us to avoid the cumbersome Student t-distribution table.

Assessing the Reliability of a Coefficient Estimate

How reliable is the coefficient estimate, 1.2, calculated from the first quiz? That is, how confident should Clint be that the coefficient estimate, 1.2, will be close to the actual value? We use the interval estimate question to address this:

Interval Estimate Question: What is the probability that the coefficient estimate, 1.2, lies within ____ of the actual coefficient value? ____

We begin by filling in the first blank, choosing our "close to" value. The value we choose depends on how demanding we are; that is, our "close to" value depends on the range that we consider to be close to the actual value. For purposes of illustration, we shall choose 1.5; so we write 1.5 in the first blank.

Interval Estimate Question: What is the probability that the coefficient estimate, 1.2, lies within 1.5 of the actual coefficient value? ____

Figure 8.2 illustrates the probability distribution of the coefficient estimate and the probability that we wish to calculate.
The estimation procedure we used to calculate the coefficient estimate, the ordinary least squares (OLS) estimation procedure, is unbiased: Mean[b_x] = β_x. Consequently, we place the actual coefficient value, β_x, at the center of the probability distribution.
[Figure 8.2: Probability Distribution of Coefficient Estimate, "Close To" Value Equals 1.5. Student t-distribution of b_x with Mean[b_x] = β_x; the region between β_x - 1.5 and β_x + 1.5 marks the probability that the estimate is within 1.5 of the actual value.]

As discussed above, we must use the Student t-distribution rather than the normal distribution since we must estimate the standard deviation of the probability distribution. The regression results from Professor Lord's first quiz provide the estimate:

  Ordinary Least Squares (OLS)
  Dependent Variable: y
  Explanatory Variable(s):   Estimate   SE       t-statistic   Prob
  x                          1.2        .5196    2.309         .2601
  Const                      63
  Number of Observations     3
  Sum Squared Residuals      54
  SE of Regression

  Estimated Equation: Esty = 63 + 1.2x
  Interpretation of Estimates:
    b_Const = 63: Students receive 63 points for showing up.
    b_x = 1.2: Students receive 1.2 additional points for each additional minute studied.
  Critical Result: The coefficient estimate equals 1.2. The positive sign of the coefficient estimate suggests that additional studying increases quiz scores. This evidence lends support to our theory.

Table 8.1: Quiz Scores Regression Results
The standard error equals the estimated standard deviation. t equals the number of standard errors (estimated standard deviations) that the value lies from the distribution mean:

$$ t = \frac{\text{Value of Random Variable} - \text{Distribution Mean}}{\text{Standard Error}} = \text{Number of Standard Errors from the Distribution Mean} $$

Since the distribution mean equals the actual value, we can translate 1.5 below and above the actual value into t's. Since the standard error equals .5196, 1.5 below and above the actual value translates into 2.89 standard errors below and above the actual value:

$$ \frac{1.5}{.5196} = 2.89 $$

  1.5 below actual value = 2.89 SE's below actual value
  1.5 above actual value = 2.89 SE's above actual value

To summarize:

  The probability that the estimate lies within 1.5 of the actual value
  = The probability that the estimate lies within 2.89 SE's of the actual value,
    that is, between t's of -2.89 and 2.89.

Figure 8.3 adds this information to the probability distribution graph.

[Figure 8.3: Probability Distribution of Coefficient Estimate, "Close To" Value Equals 1.5. Student t-distribution of b_x with Mean[b_x] = β_x and SE[b_x] = .5196; β_x - 1.5 corresponds to t = -2.89 and β_x + 1.5 corresponds to t = 2.89, each 2.89 SE's from the actual value.]
Econometrics Lab 8.1: Calculate Prob[Results IF H_0 True].

We can now use the Econometrics Lab to calculate the probability that the estimate is within 1.5 of the actual value by computing the left and right tail probabilities.

Left Tail: [Link to MITLab 8.1a goes here.]
  o The following information has been entered:
      Degrees of freedom: 1
      t: -2.89
  o Click Calculate. The left tail probability is approximately .11.

Right Tail: [Link to MITLab 8.1b goes here.]
  o The following information has been entered:
      Degrees of freedom: 1
      t: 2.89
  o Click Calculate. The right tail probability is approximately .11.

Since the Student t-distribution is symmetric, both the left and right tail probabilities equal .11. Hence, the probability that the estimate is within 1.5 of the actual value equals .78:

$$ 1.00 - (.11 + .11) = .78 $$

[Figure 8.4: Probability Distribution of Coefficient Estimate, Applying Student t-distribution. Student t-distribution of b_x with Mean[b_x] = β_x and SE[b_x] = .5196; tails of .11 lie below t = -2.89 (β_x - 1.5) and above t = 2.89 (β_x + 1.5), leaving .78 within 1.5 of the actual value.]
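The lab calculation can be approximated without the Econometrics Lab. This is a sketch under one assumption worth stating: it relies on the fact that the Student t-distribution with 1 degree of freedom is the Cauchy distribution, whose CDF has a closed form, so it applies only to the 1 degree of freedom case used here. The function names are hypothetical.

```python
import math


def t1_cdf(t):
    """CDF of the Student t-distribution with 1 degree of freedom (Cauchy)."""
    return 0.5 + math.atan(t) / math.pi


def prob_within(close_to_value, standard_error):
    """Probability that the estimate lies within close_to_value of the actual value."""
    t = close_to_value / standard_error      # number of standard errors
    left_tail = t1_cdf(-t)                   # probability below -t
    right_tail = 1.0 - t1_cdf(t)             # probability above +t
    return left_tail, right_tail, 1.0 - (left_tail + right_tail)


left_tail, right_tail, within = prob_within(1.5, 0.5196)
print(round(left_tail, 3), round(right_tail, 3), round(within, 3))
```

Each tail comes out at roughly .11, so the probability of lying within 1.5 of the actual value is a little under .79 before rounding, in line with the .78 obtained from the rounded tail probabilities.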
We can now fill in the second blank in the interval estimate question:

Interval Estimate Question: What is the probability that the coefficient estimate, 1.2, lies within 1.5 of the actual coefficient value? .78

We shall now turn our attention to assessing the theory.

Theory Assessment: Hypothesis Testing

Hypothesis testing allows Clint to assess how much confidence he should have in the theory. We begin by motivating hypothesis testing using the same approach as we took with Clint's opinion poll. We shall play the role of the cynic. Then, we shall formalize the process.

Motivating Hypothesis Testing: The Cynic

Recall that the theory suggests that a student's score on the quiz depends on the number of minutes he/she studies:

Theory: Additional studying increases scores.

Review the regression model:

$$ y_t = \beta_{Const} + \beta_x x_t + e_t $$

The theory suggests that β_x is positive. Review the regression results for the first quiz:

  Ordinary Least Squares (OLS)
  Dependent Variable: y
  Explanatory Variable(s):   Estimate   SE       t-statistic   Prob
  x                          1.2        .5196    2.309         .2601
  Const                      63
  Number of Observations     3
  Sum Squared Residuals      54
  SE of Regression

  Estimated Equation: Esty = 63 + 1.2x
  Interpretation of Estimates:
    b_Const = 63: Students receive 63 points for showing up.
    b_x = 1.2: Students receive 1.2 additional points for each additional minute studied.
  Critical Result: The coefficient estimate equals 1.2. The positive sign of the coefficient estimate suggests that additional studying increases quiz scores. This evidence lends support to our theory.

Table 8.2: Quiz Scores Regression Results
The estimate for β_x, 1.2, is positive. We estimate that an additional minute of studying increases a student's quiz score by 1.2 points. This lends support to Clint's theory. But how much confidence should Clint have in the theory? Does this provide definitive evidence that Clint's theory is correct, or should we be skeptical?

[Figure 8.5: Probability Distribution of Coefficient Estimate, Could the Cynic Be Correct? Distribution of b_x centered at 0; if β_x = 0, Prob[b_x > 0] ≈ .50.]

To answer this question, recall our earlier hypothesis testing discussion and play the cynic. What would a cynic's view of our theory and the regression results be?

Cynic's view: Studying has no impact on a student's quiz score; the positive coefficient estimate obtained from the first quiz was just the luck of the draw. In fact, studying has no effect on quiz scores; the actual coefficient, β_x, equals 0.

Is it possible that our cynic is correct?
Econometrics Lab 8.1: Assessing the Cynic's View

We shall use our Could the Cynic Be Correct? simulation to show that it is. A positive coefficient estimate can arise in one repetition of the experiment even when the actual coefficient is 0.

[Link to MITLab 8.2 goes here.]

In the simulation, the default actual coefficient value is 0. Check the From-To checkbox. Also, 0 is specified in the From list. In the To list, no value is specified; consequently, there is no upper From-To bound. The From-To Percent box will report the percent of repetitions in which the coefficient estimate equals 0 or more. Be certain that the Pause checkbox is cleared. Click Start and then, after many, many repetitions, click Stop. In about half the repetitions, the coefficient estimate is positive; that is, when the actual coefficient, β_x, equals 0, the estimate is positive about half the time. The histogram illustrates this.

Now, we can apply the relative frequency interpretation of probability. If the actual coefficient were 0, the probability of obtaining a positive coefficient from one quiz would be about one-half. Consequently, we cannot dismiss the cynic's view as absurd.

To assess the cynic's view, we pose the following question:

Question for the Cynic: What is the probability that the result would be like the one obtained (or even stronger), if studying actually has no impact on quiz scores? That is, what is the probability that the coefficient estimate from the first quiz would be 1.2 or more, if studying had no impact on quiz scores (if the actual coefficient, β_x, equals 0)?

Answer: Prob[Results IF Cynic Correct].
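For readers without access to the lab, the simulation can be imitated with a short script. This is a rough sketch rather than the lab itself, and several ingredients are assumptions: the x values, the constant of 50, the normally distributed error term with a standard deviation of 20, and the function name `fraction_nonnegative` are all hypothetical choices made for illustration.

```python
import random


def fraction_nonnegative(repetitions=10000, beta_const=50.0, beta_x=0.0,
                         sd_e=20.0, seed=0):
    """Fraction of repetitions in which the OLS coefficient estimate is 0 or more."""
    rng = random.Random(seed)
    x = [5, 15, 25]                               # assumed minutes studied
    x_bar = sum(x) / len(x)
    sum_xx_dev = sum((xt - x_bar) ** 2 for xt in x)
    count = 0
    for _ in range(repetitions):
        # One repetition: generate quiz scores with the actual coefficient beta_x.
        y = [beta_const + beta_x * xt + rng.gauss(0.0, sd_e) for xt in x]
        y_bar = sum(y) / len(y)
        b_x = sum((yt - y_bar) * (xt - x_bar) for xt, yt in zip(x, y)) / sum_xx_dev
        if b_x >= 0:
            count += 1
    return count / repetitions


share = fraction_nonnegative()
print("Share of repetitions with b_x >= 0:", round(share, 3))
```

With the actual coefficient set to 0, the share should settle near one-half as the number of repetitions grows, which is the pattern the lab's From-To Percent box reports.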
The magnitude of the probability determines the likelihood that the cynic is correct, the likelihood that studying has no impact on quiz scores:

  Prob[Results IF Cynic Correct] small      Prob[Results IF Cynic Correct] large
  Unlikely that the cynic is correct        Likely that the cynic is correct
  Unlikely that studying has no impact      Likely that studying has no impact

To compute this probability, let us review what we know about the probability distribution of the coefficient estimate:

  OLS estimation procedure unbiased, and H_0 true:         Mean[b_x] = β_x = 0
  Standard error:                                          SE[b_x] = .5196
  Number of observations minus number of parameters:       DF = 3 - 2 = 1

Question for the Cynic: What is the probability that the coefficient estimate from the first quiz would be 1.2 or more, if studying had no impact on quiz scores (if the actual coefficient, β_x, equaled 0)?

[Figure 8.6: Probability Distribution of Coefficient Estimate, Prob[Results IF Cynic Correct]. Student t-distribution of b_x with Mean = 0, SE = .5196, DF = 1; the right tail begins at 1.2.]

How can we answer this question? We turn to the Econometrics Lab.
Econometrics Lab 8.2: Using the Econometrics Lab to Calculate Prob[Results IF Cynic Correct]

[Link to MITLab 8.3 goes here.]

The appropriate information has been entered:
  Mean: 0
  Value: 1.2
  Standard Error: .5196
  Degrees of Freedom: 1

Click Calculate. The probability that the estimate lies in the right tail equals .13. The answer to the question for the cynic is .13:

Answer: Prob[Results IF Cynic Correct] = .13

In fact, there is an even easier way to compute the probability. We do not even need to use the Econometrics Lab because the statistical software calculates this probability automatically. To illustrate this, we shall first calculate the t-statistic based on the premise that the cynic is correct, that is, on the premise that the actual value of the coefficient equals 0:

$$ t = \frac{\text{Value of Random Variable} - \text{Distribution Mean}}{\text{Standard Error}} = \frac{1.2 - 0}{.5196} = 2.309 = \text{Number of Standard Errors from the Distribution Mean} $$

1.2 lies 2.309 standard errors from 0. Next, return to the regression results and focus attention on the row corresponding to the coefficient and on the t-statistic and Prob columns.

  Ordinary Least Squares (OLS)
  Dependent Variable: y
  Explanatory Variable(s):   Estimate   SE       t-statistic   Prob
  x                          1.2        .5196    2.309         .2601
  Const                      63
  Number of Observations     3
  Sum Squared Residuals      54
  SE of Regression

  Estimated Equation: Esty = 63 + 1.2x
  Interpretation of Estimates:
    b_Const = 63: Students receive 63 points for showing up.
    b_x = 1.2: Students receive 1.2 additional points for each additional minute studied.
  Critical Result: The coefficient estimate equals 1.2. The positive sign of the coefficient estimate suggests that additional studying increases quiz scores. This evidence lends support to our theory.

Table 8.3: Quiz Scores Regression Results
Two interesting observations emerge:

First, the t-statistic column equals 2.309, the value of the t-statistic we just calculated, the t-statistic based on the premise that the cynic is correct and the actual coefficient equals 0. The t-statistic column reports the number of standard errors the coefficient estimate lies from 0, based on the premise that the actual coefficient equals 0.

Second, the Prob column equals .2601. This is just twice the probability we just calculated using the Econometrics Lab:

$$ 2 \times \text{Prob}[\text{Results IF Cynic Correct}] = \text{Prob Column} \qquad 2 \times .13 = .26 $$

[Figure 8.7: Probability Distribution of Coefficient Estimate, Tails Probability. Student t-distribution of b_x with Mean = 0, SE = .5196, DF = 1; each tail, beginning 1.2 from 0, has probability .2601/2.]

The Prob column is based on the premise that the actual coefficient equals 0 and then focuses on the two tails of the probability distribution, where each tail begins 1.2 (the numerical value of the coefficient estimate) from 0. As Figure 8.7 illustrates, the value in the Prob column equals the probability of lying in the tails, the probability that the estimate resulting from one week's quiz lies at least 1.2 from 0, assuming that the actual coefficient, β_x, equals 0. That is, the Prob column reports the tails probability:

Tails Probability: The probability that the coefficient estimate, b_x, resulting from one regression would lie at least 1.2 from 0, based on the premise that the actual coefficient, β_x, equals 0.

Consequently, we do not need to use the Econometrics Lab to answer the question that we pose for the cynic:

Question for the Cynic: What is the probability that the coefficient estimate from the first quiz is 1.2 or more, if studying had no impact on quiz scores (if the actual coefficient, β_x, equals 0)?

Answer: Prob[Results IF Cynic Correct]
[Figure 8.8: Probability Distribution of Coefficient Estimate, Prob[Results IF Cynic Correct]. Student t-distribution of b_x with Mean = 0, SE = .5196, DF = 1; the right tail beginning at 1.2 has probability .2601/2.]

We can use the regression results to answer this question. From the Prob column we know that the tails probability equals .2601. We are only interested in the right tail, however: the probability that the coefficient estimate will equal 1.2 or more, if the actual coefficient equals 0. Since the Student t-distribution is symmetric, the probability of lying in one of the tails is .2601/2. The answer to the question we posed to assess the cynic's view is .13:

$$ \text{Prob}[\text{Results IF Cynic Correct}] = \frac{\text{Tails Probability}}{2} = \frac{.2601}{2} \approx .13 $$
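The link between the t-statistic, the Prob column, and the one-tail probability can be traced with a short calculation. As before, this sketch leans on the closed form available for 1 degree of freedom (the Cauchy distribution) and is not how the statistical software necessarily computes the column; the function name is hypothetical.

```python
import math


def t1_right_tail(t):
    """Right tail probability of the Student t-distribution with 1 degree of freedom."""
    return 0.5 - math.atan(t) / math.pi


coefficient_estimate = 1.2
standard_error = 0.5196
hypothesized_value = 0.0       # the cynic's premise: actual coefficient equals 0

# Number of standard errors the estimate lies from the hypothesized value.
t_statistic = (coefficient_estimate - hypothesized_value) / standard_error

# Two tails, each beginning |t| standard errors from 0 (the Prob column).
tails_probability = 2.0 * t1_right_tail(abs(t_statistic))

# Only the right tail answers the question posed for the cynic.
prob_results_if_cynic_correct = tails_probability / 2.0

print(round(t_statistic, 3), round(tails_probability, 4),
      round(prob_results_if_cynic_correct, 2))
```

The three printed values should sit close to the 2.309, .2601, and .13 discussed above.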
Formalizing Hypothesis Testing: The Steps

We formalized hypothesis testing in Chapter 4 when we considered Clint's public opinion poll. We shall follow the same steps here, with one exception. We add a Step 0 to construct an appropriate model to assess the theory.

Theory: Additional studying increases quiz scores.

Step 0: Formulate a model reflecting the theory to be tested.

We have already constructed this model:

$$ y_t = \beta_{Const} + \beta_x x_t + e_t $$

  y_t = Quiz score          β_Const reflects points for showing up
  x_t = Minutes studied     β_x reflects points for each minute studied
  e_t = Error term

The theory suggests that β_x is positive.

Step 1: Collect data, run the regression, and interpret the estimates.

First Quiz Data: Student, x, y

  b_Const = Estimated points for showing up = 63
  b_x = Estimated points for each minute studied = 1.2

  Ordinary Least Squares (OLS)
  Dependent Variable: y
  Explanatory Variable(s):   Estimate   SE       t-statistic   Prob
  x                          1.2        .5196    2.309         .2601
  Const                      63
  Number of Observations     3
  Sum Squared Residuals      54
  SE of Regression

  Estimated Equation: Esty = 63 + 1.2x
  Interpretation of Estimates:
    b_Const = 63: Students receive 63 points for showing up.
    b_x = 1.2: Students receive 1.2 additional points for each additional minute studied.
  Critical Result: The coefficient estimate equals 1.2. The positive sign of the coefficient estimate suggests that additional studying increases quiz scores. This evidence lends support to our theory.

Table 8.4: Quiz Scores Regression Results
Step 2: Play the cynic and challenge the results; construct the null and alternative hypotheses.

Cynic's view: Despite the results, studying has no impact on quiz scores. The results were just the luck of the draw.

Now, we construct the null and alternative hypotheses. Like the cynic, the null hypothesis challenges the evidence; the alternative hypothesis is consistent with the evidence:

  H_0: β_x = 0   Cynic is correct: Studying has no impact on a student's quiz score.
  H_1: β_x > 0   Cynic is incorrect: Additional studying increases quiz scores.

Step 3: Formulate the question to assess the cynic's view and the null hypothesis.

Question for the Cynic:
  Generic Question: What is the probability that the results would be like those we actually obtained (or even stronger), if the cynic is correct and studying actually has no impact?
  Specific Question: The regression's coefficient estimate was 1.2. What is the probability that the coefficient estimate in one regression would be 1.2 or more if H_0 were actually true (if the actual coefficient, β_x, equals 0)?

Answer: Prob[Results IF Cynic Correct] or Prob[Results IF H_0 True]

The magnitude of this probability determines whether we reject the null hypothesis:

  Prob[Results IF H_0 True] small      Prob[Results IF H_0 True] large
  Unlikely that H_0 is true            Likely that H_0 is true
  Reject H_0                           Do not reject H_0
Step 4: Use the general properties of the estimation procedure, the probability distribution of the estimate, to calculate Prob[Results IF H_0 True].

  OLS estimation procedure unbiased, and H_0 true:         Mean[b_x] = β_x = 0
  Standard error:                                          SE[b_x] = .5196
  Number of observations minus number of parameters:       DF = 3 - 2 = 1

We have already calculated this probability. First, we did so using the Econometrics Lab. Then, we noted that the statistical software had done so automatically. We need only divide the tails probability, as reported in the Prob column of the regression results, by 2:

$$ \text{Prob}[\text{Results IF } H_0 \text{ True}] = \frac{.2601}{2} \approx .13 $$

The probability that the coefficient estimate in one regression would be 1.2 or more if H_0 were actually true (if the actual coefficient, β_x, equals 0) is .13.

Step 5: Decide on the standard of proof, a significance level.

The significance level is the dividing line between the probability being small and the probability being large.

  Prob[Results IF H_0 True]            Prob[Results IF H_0 True]
  less than significance level         greater than significance level
  Prob[Results IF H_0 True] small      Prob[Results IF H_0 True] large
  Unlikely that H_0 is true            Likely that H_0 is true
  Reject H_0                           Do not reject H_0

Recall that the traditional significance levels used in academia are 1, 5, and 10 percent. Obviously, .13 is greater than .10. Consequently, Clint would not reject the null hypothesis that studying has no impact on quiz scores even with a 10 percent significance level.
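Steps 4 and 5 amount to a simple comparison, sketched below. The function name `decide` is hypothetical; the tails probability is the Prob column value reported in the regression results, and the three significance levels are the traditional ones named in the text.

```python
def decide(prob_results_if_h0_true, significance_level):
    """Apply the Step 5 rule: reject H0 only if the probability is below the level."""
    if prob_results_if_h0_true < significance_level:
        return "reject H0"
    return "do not reject H0"


tails_probability = 0.2601                   # Prob column of the regression results
prob_results_if_h0_true = tails_probability / 2.0   # one-tailed test: right tail only

decisions = {level: decide(prob_results_if_h0_true, level)
             for level in (0.01, 0.05, 0.10)}

for level, decision in decisions.items():
    print(f"{level:.0%} significance level: {decision}")
```

Because .13 exceeds all three levels, the rule returns "do not reject H0" each time, matching Clint's conclusion.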
Summary: The Ordinary Least Squares (OLS) Estimation Procedure

Regression Model and the Role of the Error Term

Now, let us sum up what we have learned about the ordinary least squares (OLS) estimation procedure:

$$ y_t = \beta_{Const} + \beta_x x_t + e_t $$

  y_t = Dependent variable
  x_t = Explanatory variable
  e_t = Error term
  t = 1, 2, ..., T
  T = Sample size

The error term is a random variable; it represents random influences. The mean of each error term's probability distribution equals 0:

  Mean[e_t] = 0 for each t = 1, 2, ..., T

Standard Ordinary Least Squares (OLS) Premises

Error Term Equal Variance Premise: The variance of the error term's probability distribution for each observation is the same; all the variances equal Var[e]:

  Var[e_1] = Var[e_2] = ... = Var[e_T] = Var[e]

Error Term/Error Term Independence Premise: The error terms are independent: Cov[e_i, e_j] = 0 for i ≠ j. Knowing the value of the error term from one observation does not help us predict the value of the error term for any other observation.

Explanatory Variable/Error Term Independence Premise: The explanatory variables, the x_t's, and the error terms, the e_t's, are not correlated. Knowing the value of an observation's explanatory variable does not help us predict the value of that observation's error term.
Ordinary Least Squares (OLS) Estimation Procedure: Three Important Estimation Procedures

There are three important estimation procedures embedded within the ordinary least squares (OLS) estimation procedure. A procedure to estimate the:

  Values of the regression parameters, β_x and β_Const:

$$ b_x = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2} \quad \text{and} \quad b_{Const} = \bar{y} - b_x \bar{x} $$

  Variance of the error term's probability distribution, Var[e]:

$$ \text{EstVar}[e] = \frac{SSR}{\text{Degrees of Freedom}} $$

  Variance of the coefficient estimate's probability distribution, Var[b_x]:

$$ \text{EstVar}[b_x] = \frac{\text{EstVar}[e]}{\sum_{t=1}^{T} (x_t - \bar{x})^2} $$

Properties of the Ordinary Least Squares (OLS) Estimation Procedure and the Standard Ordinary Least Squares (OLS) Premises

When the standard ordinary least squares (OLS) premises are met:

  Each estimation procedure is unbiased; each estimation procedure does not systematically underestimate or overestimate the actual value.

  The ordinary least squares (OLS) estimation procedure for the coefficient value is the best linear unbiased estimation procedure (BLUE).
Causation versus Correlation

Our theory and Step 0 illustrate the important distinction between causation and correlation:

Theory: Additional studying increases quiz scores.

Step 0: Formulate a model reflecting the theory to be tested.

$y_t = \beta_{Const} + \beta_x x_t + e_t$

$y_t$ = Quiz score
$x_t$ = Minutes studied
$e_t$ = Error term
$\beta_{Const}$ reflects points for showing up
$\beta_x$ reflects points for each minute studied

The theory suggests that $\beta_x$ is positive. Our model is a causal model. An increase in studying causes a student's quiz score to increase:

Increase in studying ($x_t$) causes quiz score to increase ($y_t$)

Correlation results whenever a causal relationship describes reality accurately. That is, when additional studying indeed increases quiz scores, studying and quiz scores will be (positively) correlated:

Knowing the number of minutes a student studies helps us predict his/her quiz score.
Knowing a student's quiz score helps us predict the number of minutes he/she has studied.

More generally, a causal model that describes reality accurately implies correlation:

Causation Implies Correlation

Beware, however, that correlation need not imply causation. For example, consider precipitation in the Twin Cities: precipitation in Minneapolis and precipitation in St. Paul. Since the cities are near each other, precipitation in the two cities is highly correlated. When it rains in Minneapolis, it almost always rains in St. Paul, and vice versa. But there is no causation involved here. Rain in Minneapolis does not cause rain in St. Paul, nor does rain in St. Paul cause rain in Minneapolis. The rain is caused by the weather system moving over the cities. In general, the correlation of two variables need not imply that a causal relationship exists between the variables:

Correlation Need Not Imply Causation
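The Twin Cities example can be illustrated with a small simulation. The sketch below is hypothetical throughout: the numbers, the variable names, and the use of normally distributed "weather" are assumptions made only for illustration. A common weather system drives precipitation in both cities, and neither city's precipitation enters the other's equation, yet the two series turn out to be highly correlated.

```python
import random


def correlation(a, b):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is repeatable
days = 1000

# The common cause: the weather system passing over both cities each day.
weather_system = [rng.gauss(0, 1) for _ in range(days)]

# Each city's precipitation depends on the weather system plus small local
# noise; Minneapolis does not appear in St. Paul's equation, or vice versa.
minneapolis = [w + rng.gauss(0, 0.1) for w in weather_system]
st_paul = [w + rng.gauss(0, 0.1) for w in weather_system]

r = correlation(minneapolis, st_paul)
print(f"Correlation between the two cities: {r:.3f}")
```

The correlation is high even though, by construction, neither series causes the other; both simply respond to the same underlying weather system.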
Appendix 8.1: Student t-distribution Table, Right Tail Critical Values

[Figure 8.9: Student t-distribution Right Tail Probabilities; α denotes the right tail probability]

[Table 8.5: Right Tail Critical Values for the Student t-distribution; rows give the degrees of freedom, and columns give the right tail probability α, including α = 0.10, α = 0.05, and α = 0.01]
Appendix 8.2: Assessing the Reliability of a Coefficient Estimate Using the Student t-distribution Table

We begin by describing the Student t-distribution table; a portion of it appears in Table 8.6:

[Table 8.6: Right Tail Critical Values for the Student t-distribution; rows give the degrees of freedom, and columns give the right tail probability α, including α = 0.10, α = 0.05, and α = 0.01]

The first column represents the degrees of freedom. The numbers in the body of the table are called the critical values. A critical value equals the number of standard errors a value lies from the mean. The top row specifies the value of α, the right tail probability. Since the t-distribution is symmetric, the left tail probability also equals α. The probability of lying between the tails, in the center of the distribution, is 1 − 2α. This no doubt sounds confusing, but everything should become clear after we show how Clint can use this table to answer the interval estimate question.

[Figure 8.10: Student t-distribution Illustrating the Probabilities. The horizontal axis is labeled Estimate and is centered on the Distribution Mean; each tail begins Critical Value × SE from the mean and has probability α, and the center has probability 1 − 2α.]

Interval Estimate Question: What is the probability that the estimate, 1.2, lies within _____ of the actual value? _____

Let us review the regression results from Professor Lord's first quiz:

Coefficient Estimate = $b_x$ = 1.2
Standard Error of Coefficient Estimate = $SE[b_x]$ = .5196
Next, we shall modify Figure 8.10 to reflect our specific example. Focus on Figure 8.11. We are interested in the coefficient estimate; consequently, we relabel the horizontal axis, substituting $b_x$ for Estimate. Also, we know that the estimation procedure Clint uses, the ordinary least squares (OLS) estimation procedure, is unbiased; hence, the distribution mean equals the actual value. We can replace the Distribution Mean with the actual coefficient value, $\beta_x$.

[Figure 8.11: Student t-distribution Illustrating the Probabilities for Coefficient Estimate. The horizontal axis is labeled $b_x$ and is centered on $\beta_x$; each tail begins Critical Value × SE from $\beta_x$ and has probability α, and the center has probability 1 − 2α.]

Now, let us help Clint fill in the blanks. When using the table, we begin by filling in the second blank rather than the first.

Second Blank: Choose α to specify the tail probability.

Clint must choose a value for α. As we shall see, the value he chooses depends on how demanding he is. For example, suppose that Clint believes that a .80 probability of the estimate lying in the center of the distribution, close to the mean, is good enough. He would then choose an α equal to .10. To understand why, note that when α equals .10, the probability of the estimate lying in the right tail would be .10. Since the t-distribution is symmetric, the probability of the estimate lying in the left tail would be .10 also. Therefore, the probability that the estimate lies in the center of the distribution would be 1 − 2(.10) = .80; accordingly, we write .80 in the second blank.

What is the probability that the estimate, 1.2, lies within _____ of the actual value? .80

First Blank: Calculate tail boundaries.
The first blank quantifies what "close to" means. The standard error and the Student t-distribution table allow us to fill in the first blank. To do so, we begin by calculating the degrees of freedom. Recall that the degrees of freedom equal 1:

Degrees of Freedom = Sample Size − Number of Estimated Parameters = 3 − 2 = 1

[Table 8.7: Right Tail Critical Values for the Student t-distribution, α Equals 0.10 and Degrees of Freedom Equals 1]

Clint chose a value of α equal to .10. The table indicates that the critical value for α = .10 with 1 degree of freedom is 3.078. The probability that the estimate falls within 3.078 standard errors of the mean is .80. Next, the regression results report that the standard error equals .5196:

$SE[b_x]$ = .5196

After multiplying the critical value given in the table, 3.078, by the standard error, .5196, we can fill in the first blank:

Critical Value × SE = 3.078 × .5196 ≈ 1.6

[Figure 8.12: Student t-distribution Calculations for an α Equal to .10. On the $b_x$ axis, the center of the distribution runs from $\beta_x$ − 1.6 to $\beta_x$ + 1.6.]
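The arithmetic for the two blanks can be checked with a short Python sketch. It relies on one fact not stated in the text: with 1 degree of freedom, the Student t-distribution coincides with the standard Cauchy distribution, so the right tail critical value has the closed form tan(π(0.5 − α)). That shortcut applies only to 1 degree of freedom; for other degrees of freedom, the table (or statistical software) is needed. The function name `t_critical_df1` is invented for this illustration.

```python
import math


def t_critical_df1(alpha):
    """Right tail critical value of the Student t-distribution with 1 degree
    of freedom, using its equivalence to the standard Cauchy distribution."""
    return math.tan(math.pi * (0.5 - alpha))


alpha = 0.10      # right tail probability Clint chose
se_bx = 0.5196    # standard error of the coefficient estimate

# Second blank: probability of lying in the center of the distribution.
center_probability = 1 - 2 * alpha

# First blank: critical value times the standard error.
critical_value = t_critical_df1(alpha)
half_width = critical_value * se_bx

print(f"Critical value: {critical_value:.3f}")
print(f"Critical value x SE: {half_width:.2f}")
print(f"Center probability: {center_probability:.2f}")
```

The computed critical value agrees with the table entry of 3.078 to three decimal places, and the product rounds to the 1.6 used in the text.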
What is the probability that the estimate, 1.2, lies within 1.6 of the actual value? .80

1. Appendix 8.2 shows how we can use the Student t-distribution table to address the interval estimate question. Since the table is cumbersome, we shall use the Econometrics Lab to do so.