Notation and Equations for Regression Lecture 11/4
 Jeffry Powers
Notation:

$m$: The number of predictor variables in a regression
$X_i$: One of multiple predictor variables. The subscript $i$ represents any number from 1 through $m$, so the predictor variables are $X_1, X_2, \ldots, X_m$.
$Y$: The outcome variable that is being predicted or explained in a regression
$\hat{Y}$: The estimated outcome value as predicted by the regression equation
$b_i$: The regression coefficient for predictor $X_i$
$b_0$: The intercept in the regression equation
$\sigma_{b_i}$: The standard error of a regression coefficient
$SS_Y$: The total sum of squares for $Y$, which represents the uncertainty in the outcome
$SS_{\text{residual}}$: The residual sum of squares, representing the uncertainty left over after we use regression to generate the best possible predictions for the outcome
$SS_{\text{regression}}$: The sum of squares for the regression, representing the amount of uncertainty or variability that the predictors can explain
$R^2$: The proportion of variability explained by the regression
$MS_{\text{residual}}$: The residual mean square, which is the mean squared error of the regression prediction, $\hat{Y}$; also used as an estimate of the population variance, $\sigma_Y^2$
$MS_{\text{regression}}$: The mean square for the regression
$F$: F statistic, which is the test statistic for deciding whether a regression explains meaningful variability in the outcome

Regression equation. A regression equation is a formula that uses a set of predictor variables ($X_i$) to make a prediction for some outcome variable ($Y$). The prediction is called $\hat{Y}$. The purpose of a regression equation is to find the best possible way to do this, that is, to combine the predictors in a way that gives estimates ($\hat{Y}$) that are as close as possible to the right answer ($Y$).

When there's only one predictor ($m = 1$), regression is essentially the same as correlation, and the regression equation is the same as the regression line we would draw on a scatterplot.
The equation for a line always has the form $\hat{Y} = b_0 + b_1 X$, where $b_0$ and $b_1$ are numbers representing the intercept and the slope. Finding the line that's closest to the data is the same as choosing $b_0$ and $b_1$ so that the predicted $\hat{Y}$ values are as close as possible to the true $Y$ values. When there is more than one predictor, we take the basic equation for a line and extend it in a natural way.
$$
\begin{aligned}
\hat{Y} &= b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_m X_m \\
        &= b_0 + \sum_{i=1}^{m} b_i X_i
\end{aligned}
\tag{1}
$$

The two lines of Equation 1 mean the same thing. The first line writes everything out, and the second line combines things into a sum (you can choose to remember either one). In the sum, the index $i$ takes on all values from 1 to $m$. When $i = 1$, the summand is $b_1 X_1$; when $i = 2$, the summand is $b_2 X_2$; and so on until $b_m X_m$.

Notice that the contribution of each predictor ($b_i X_i$) is the same as in the equation for a simple line with one predictor. This is why we say that each predictor has a linear effect on the predicted outcome. If we held all of the predictors except one fixed, then the relationship between the remaining predictor and the prediction would be a line.

The goal of regression is to find the values of $b_0$ through $b_m$ that lead to the best predictions. Therefore, the $b$s are a main focus of regression. Each $b_i$ is called the regression coefficient for its corresponding predictor (e.g., $b_1$ is the regression coefficient for $X_1$). The regression coefficient tells what kind of influence each predictor has on the predicted outcome. When a regression coefficient is positive, that means the predictor has a positive effect, just like with a positive correlation. When a regression coefficient is negative, the predictor has a negative effect, just like with a negative correlation.

The magnitude or absolute value of $b_i$ tells how strong the effect is. If $b_i$ is near zero, then $X_i$ has a weak effect on the outcome, just like with a correlation near zero. If $b_i$ is large (either positive or negative), then $X_i$ has a strong effect on the outcome, just like with a correlation near $\pm 1$. The difference between regression coefficients and correlations is that regression coefficients aren't standardized, meaning they're not restricted to lie between $-1$ and $1$. Therefore, the strength of a regression coefficient needs to be interpreted in terms of the units of the predictor and outcome variables.
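As a minimal sketch of Equation 1, the Python function below computes $\hat{Y}$ from an intercept and a list of coefficients. The function name `predict` and all numeric values are hypothetical, chosen only for illustration; they do not come from a fitted regression.

```python
def predict(b0, coefficients, predictors):
    """Equation 1: Y-hat = b0 + sum over i of b_i * X_i."""
    if len(coefficients) != len(predictors):
        raise ValueError("need exactly one coefficient per predictor")
    return b0 + sum(b * x for b, x in zip(coefficients, predictors))


# Hypothetical intercept and coefficients for m = 2 predictors.
b0 = 300.0
b = [7.0, -2.0]

base = predict(b0, b, [70.0, 10.0])     # X1 = 70, X2 = 10
shifted = predict(b0, b, [71.0, 10.0])  # X1 raised by one unit, X2 held fixed

# With X2 held fixed, a one-unit increase in X1 changes Y-hat by exactly b1.
print(base, shifted, shifted - base)  # prints 770.0 777.0 7.0
```

Holding every predictor but one fixed and stepping the remaining one by a single unit always changes the prediction by that predictor's coefficient, which is the linear effect described above.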
In general, $b_i$ tells how many units $\hat{Y}$ increases by for every unit increase in $X_i$. For example, if $X_i$ is height (in inches) and $Y$ is how long it takes a person to run a mile (in seconds), $b_i = 7$ would mean that for every extra inch of height, a person tends to take 7 seconds longer to run a mile. A negative coefficient means a decrease; e.g., $b_i = -5$ would mean that for every extra inch of height, a person tends to take 5 seconds less to run a mile.

The $b_0$ variable is a special regression coefficient called the intercept. Just like the intercept of a simple line, $b_0$ tells what the value of $\hat{Y}$ is when all the $X_i$s are zero (i.e., where the line intersects the $Y$ axis). If zero isn't a sensible value of any of the predictors (e.g., no one is 0 inches tall), then the intercept taken by itself won't be a very sensible value for the outcome (e.g., it could be $-50$ seconds). Therefore, we generally don't think too hard about what the value of the intercept indicates; we just know that it needs to be in the regression equation so that the overall pattern of predictions can be shifted up or down as needed to match the true values of the outcome.

Partitioning sums of squares. One important question with any regression equation is how well it does predicting the outcome. We answer this question in terms of how much variability in $Y$ is explained by the regression, meaning how much uncertainty goes away when we use the regression equation to predict the outcome. Variability or uncertainty in this case is measured in sums of squares ($SS$), which are very similar to variance and mean squared error, except that we don't divide by degrees of freedom (yet). The total variability in $Y$ is called $SS_Y$ (sum of squares for $Y$) and is defined as

$$SS_Y = \sum (Y - M_Y)^2 \tag{2}$$

As usual, $M_Y$ represents the mean of the sample $Y$. Notice that if we divided $SS_Y$ by $n - 1$, we would have the sample variance of $Y$. So, the sum of squares for $Y$ is just like the variance for $Y$ except that we don't divide by $n - 1$ (i.e., it's a sum instead of an average). As with variance, we can think of the sum of squares as a measure of uncertainty, or how much error we would expect to make if we had to guess the value of $Y$ blindly. If we have no information about a subject, our best guess of their $Y$ score is the mean, $M_Y$. Therefore $(Y - M_Y)^2$ is our squared error, and $SS_Y$ is the sum of the squared error over all subjects.

If we do know something about a subject, then we can make a better prediction of their $Y$ score than by blindly guessing the mean. This is what the regression equation does for us: it uses all the predictors, $X_i$, to come up with the best possible prediction, $\hat{Y}$. Once we have $\hat{Y}$, we can ask how well it does as an estimate of $Y$, again by adding up the squared error over all subjects. The result is called the residual sum of squares, because it represents the uncertainty in $Y$ that's left over after we do the regression. Notice that the residual sum of squares is the same as mean squared error except that once again it's a sum instead of a mean, because we don't divide by degrees of freedom.

$$SS_{\text{residual}} = \sum (Y - \hat{Y})^2 \tag{3}$$

As stated above, the goal of regression is to find the values of the regression coefficients ($b_i$) that lead to the best predictions. By best predictions, we mean minimizing the squared difference between $Y$ and $\hat{Y}$. In other words, the goal is to minimize $SS_{\text{residual}}$. This is how we determine the regression coefficients (or, usually, how a computer determines them for us).
The question now is how well the regression did. If the predictors do a really good job of explaining the outcome, $SS_{\text{residual}}$ will be close to zero. If the predictors tell us little or nothing about the outcome, then $SS_{\text{residual}}$ will be close to the original, total sum of squares, $SS_Y$. $SS_{\text{residual}}$ is always less than or equal to $SS_Y$ (because $SS_Y$ is the error of the naive prediction, $M_Y$, and $SS_{\text{residual}}$ is the error of the best possible prediction, $\hat{Y}$), but the question is how much less. The reduction in uncertainty from $SS_Y$ to $SS_{\text{residual}}$ is called $SS_{\text{regression}}$, because it's the amount of variability in the outcome ($Y$) that the regression can explain.

$$
\begin{aligned}
SS_{\text{regression}} &= SS_Y - SS_{\text{residual}} \\
SS_Y &= SS_{\text{regression}} + SS_{\text{residual}}
\end{aligned}
\tag{4}
$$

The two versions of Equation 4 say the same thing. The first version shows how we get $SS_{\text{regression}}$ by subtracting the residual sum of squares from the total sum of squares. The second version shows how the total sum of squares (i.e., the original variability in the data) can be broken into two parts: the portion that we can explain using the predictors ($SS_{\text{regression}}$), and the portion that we cannot explain ($SS_{\text{residual}}$).
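The partition in Equations 2 through 4 can be sketched in a few lines of Python. This is an illustration under stated assumptions: the five data points are made up, and `fit_line` uses the standard closed-form least-squares solution for a single predictor ($m = 1$), which these notes do not spell out.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor (m = 1)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


# Made-up sample of n = 5 subjects.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = fit_line(xs, ys)
y_hat = [b0 + b1 * x for x in xs]
m_y = sum(ys) / len(ys)

ss_y = sum((y - m_y) ** 2 for y in ys)                        # Equation 2
ss_residual = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # Equation 3
ss_regression = ss_y - ss_residual                            # Equation 4

print(round(ss_y, 6), round(ss_residual, 6), round(ss_regression, 6))
# prints 6.0 2.4 3.6
```

For this sample, 3.6 of the 6.0 total units of squared error are accounted for by the predictor, and 2.4 remain unexplained.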
Explained variability. Once we've worked out the total variability ($SS_Y$) and the portion explained by the predictors ($SS_{\text{regression}}$), we can calculate their ratio. The ratio is the fraction of the total variability explained by the predictors, and it's our final measure of how useful the regression was. The fraction of explained variability is called $R^2$, because it extends the idea of the squared correlation, $r^2$. Recall that when we're doing a simple correlation (i.e., there's only one predictor), $r^2$ is the fraction of the variance in $Y$ that can be explained by $X$. In other words, when there's only one predictor, $R^2$ and $r^2$ are equal. $R^2$ is just a more general concept that works when there are multiple predictors. If $R^2$ is close to 1, then the regression explains most of the variability in $Y$, meaning that if we know the values of the predictors for some subject then we can confidently predict the outcome for that subject. If $R^2$ is close to 0, then the predictors don't give us much information about the outcome. ($R^2$ is always between 0 and 1.)

$$R^2 = \frac{SS_{\text{regression}}}{SS_Y} \tag{5}$$

Hypothesis testing: The effect of one predictor. Once we've run a regression to find the regression coefficients for a set of predictors, we can ask how reliable the regression coefficients are. The regression coefficients are statistics, meaning they're computed from a sample (for each subject in our sample, we have measured all the $X_i$s and $Y$). We can use the regression coefficients as estimates of the true population values, but as with all estimators, they are imperfect. If we gathered a new sample and ran the regression on the new data, we'd obtain somewhat different values for the regression coefficients. Therefore, each $b_i$ has a sampling distribution, which represents the probabilities of all possible values we could get for $b_i$ if we replicated the experiment. Each $b_i$ also has a standard error, which as usual is the standard deviation of the sampling distribution.
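The idea of a sampling distribution for a regression coefficient can be illustrated by simulation. The sketch below is hypothetical throughout: it invents a population with a true slope of 2, draws many samples from it, fits a one-predictor regression to each, and takes the standard deviation of the resulting slopes as an approximation of the standard error. The function names, sample sizes, and noise level are assumptions made for the example.

```python
import random
import statistics


def slope(xs, ys):
    """Least-squares slope b1 for a single predictor."""
    mx = statistics.mean(xs)
    my = statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate_slopes(n_samples=2000, n=30, b0=1.0, b1=2.0, noise_sd=3.0, seed=0):
    """Replicate the 'experiment' n_samples times and collect each sample's b1."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_samples):
        xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
        ys = [b0 + b1 * x + rng.gauss(0.0, noise_sd) for x in xs]
        slopes.append(slope(xs, ys))
    return slopes


slopes = simulate_slopes()
mean_b1 = statistics.mean(slopes)  # center of the sampling distribution
se_b1 = statistics.stdev(slopes)   # its standard deviation: the standard error
print(round(mean_b1, 2), round(se_b1, 2))
```

The sample slopes scatter around the true value of 2, and the spread of that scatter is what the standard error summarizes.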
The standard error tells us how reliable our estimate is, meaning how far we can expect it to be from the true population value. We won't go into detail about how the standard errors of the regression coefficients are calculated, because it's much easier to use a computer for this. However, it's important to understand what the standard errors can tell us.

First, the standard error can be used to create a confidence interval for each $b_i$, in the same way we create confidence intervals for means. I won't describe the math, but the idea is the same as before: The confidence interval is centered on the actual value of $b_i$ obtained from the sample, and the width of the confidence interval is determined by the size of the standard error.

The second thing we can use standard errors for is hypothesis testing. In regression, the most common hypotheses to test regard whether the predictors have reliable influences on the population. This corresponds to asking whether the true values of the regression coefficients are different from zero. For each predictor, $X_i$, the null hypothesis $b_i = 0$ states that $X_i$ has no reliable effect on $Y$ (the alternative hypothesis is $b_i \neq 0$). Notice that there's a separate null hypothesis for each predictor, and we can test each one individually. To test whether $b_i = 0$, we calculate a t statistic equal to $b_i$ (the actual regression coefficient we obtained from our sample) divided by its standard error.

$$t = \frac{b_i}{\sigma_{b_i}} \tag{6}$$
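For concreteness, here is a hedged sketch of Equation 6 in the special case of one predictor. The notes leave the standard-error calculation to the computer; the formula used below, $\sqrt{MS_{\text{residual}} / \sum (X - M_X)^2}$, is the standard textbook result for simple regression and is an addition for illustration, as are the function name and the made-up data.

```python
import math


def slope_t_statistic(xs, ys):
    """Return (b1, standard error of b1, t, df) for a one-predictor regression."""
    n, m = len(xs), 1
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b0 = my - b1 * mx
    ss_residual = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    df = n - m - 1                        # degrees of freedom of the residual
    ms_residual = ss_residual / df
    se_b1 = math.sqrt(ms_residual / sxx)  # textbook formula, valid for m = 1 only
    return b1, se_b1, b1 / se_b1, df      # t = b1 / se_b1 is Equation 6


# Made-up sample of n = 5 subjects.
b1, se_b1, t, df = slope_t_statistic([1.0, 2.0, 3.0, 4.0, 5.0],
                                     [2.0, 4.0, 5.0, 4.0, 5.0])
print(round(b1, 3), round(se_b1, 3), round(t, 3), df)
# prints 0.6 0.283 2.121 3
```

With more than one predictor the standard errors depend on how the predictors relate to each other, so in practice they are read from regression software output.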
Just as with the t statistic for a t test, $t$ for a regression coefficient tells us how large that coefficient is relative to how large it would be expected to be by chance (i.e., according to the null hypothesis that its true value is zero). If $t$ is large (either positive or negative), then $b_i$ is larger than would be expected by chance, so chance is not a good explanation for the result that we got. In this case, we reject the null hypothesis and adopt the alternative hypothesis that $b_i \neq 0$ (i.e., that $X_i$ has a real effect on $Y$). If $t$ is close to zero, then $b_i$ fits with what we'd expect by chance, so we retain the null hypothesis that $b_i = 0$ (i.e., $X_i$ has no real effect on $Y$).

The t statistic from a regression is used in the same way as in a t test. If $|t|$ is greater than $t_{\text{crit}}$, then we reject the null hypothesis. The alternative (and equivalent) approach is to compute a p value, which is the probability of a result as or more extreme than $t$: $p(|t_{df}| > |t|)$. This is the formula for a two-tailed test, but we can also compute a one-tailed p value if the direction of the effect (i.e., the sign of $b_i$) was predicted in advance. In either case, we reject the null hypothesis if $p < \alpha$.

The only remaining information needed to find $t_{\text{crit}}$ or $p$ is the degrees of freedom. As usual, the degrees of freedom for $t$ equals the degrees of freedom for the standard error used to compute $t$. The standard error for a regression coefficient, $\sigma_{b_i}$, comes from $SS_{\text{residual}}$, which, as is explained below, has $n - m - 1$ degrees of freedom. Therefore, a t test for a regression coefficient uses $df = n - m - 1$ (you don't need to memorize this).

Hypothesis testing of multiple predictors. Another way to test whether predictors have reliable effects on the outcome is to test whether they explain more variability than would be expected by chance.
This can be done with multiple predictors, by comparing how much variability the regression explains with those predictors to how much it explains with those predictors left out. We'll focus on the most common situation, where we want to test whether the full set of predictors, $X_1$ through $X_m$, collectively tell us anything meaningful about the outcome, $Y$.

As explained above, $SS_{\text{regression}}$ represents the amount of variability in $Y$ explained by all of the predictors. We want to test whether $SS_{\text{regression}}$ is larger than would be expected by chance. Our null hypothesis is that none of the predictors has any effect on the outcome, meaning that the true values of the regression coefficients (except perhaps the intercept, $b_0$) are all zero. Notice that this is the same null hypothesis used above for testing predictors one at a time, except that now we're testing them all at once, to see whether any predictor gives reliable information about the outcome.

Even if all the $b_i$s are zero in the population, the regression coefficients we get from our sample will deviate from zero because of sampling variability. This leads the variability explained by the regression, $SS_{\text{regression}}$, to be greater than zero, even though this explained variability is meaningless random error. Therefore, $SS_{\text{regression}}$ has a sampling distribution that, as usual, tells us how large it can be expected to be just by chance. Comparing the actual value of $SS_{\text{regression}}$ obtained from the sample to the amount expected by chance allows us to test whether chance is a good explanation for the results ($H_0$) or the predictors are explaining something real about the outcome ($H_1$).
To test whether $SS_{\text{regression}}$ is larger than would be expected by chance, we first divide it by its degrees of freedom to get the mean square, $MS_{\text{regression}}$.

$$MS_{\text{regression}} = \frac{SS_{\text{regression}}}{df_{\text{regression}}} \tag{7}$$

According to the null hypothesis that the regression doesn't explain anything real, $MS_{\text{regression}}$ has a (modified) chi-square distribution, multiplied by $\sigma_Y^2$, the variance of $Y$ in the population. If we knew $\sigma_Y^2$ then we would know the likelihood function for $MS_{\text{regression}}$ exactly. This is the same situation that came up with t tests, where we knew the likelihood function for $M$ except for not knowing $\sigma^2$. Once again, we divide by an estimate of $\sigma_Y^2$ to get our final test statistic, and once again, we estimate $\sigma_Y^2$ using the residual mean square. In this case, the residual mean square is the residual sum of squares divided by its degrees of freedom (see Eq. 3).

$$MS_{\text{residual}} = \frac{SS_{\text{residual}}}{df_{\text{residual}}} \tag{8}$$

When we divide $MS_{\text{regression}}$ by $MS_{\text{residual}}$, $\sigma_Y^2$ cancels out, and we end up with a test statistic that doesn't depend on any population parameters. That is, we have a test statistic with a likelihood function that we know exactly, which is what's required for hypothesis testing. The test statistic is called $F$.

$$F = \frac{MS_{\text{regression}}}{MS_{\text{residual}}} \tag{9}$$

According to the null hypothesis, the F statistic has what's called an F distribution. F distributions arise any time you divide one chi-square variable by another, such as $MS_{\text{regression}}$ and $MS_{\text{residual}}$. Because the distribution of a chi-square variable depends on its degrees of freedom, an F distribution depends on the degrees of freedom of both chi-square variables that it's based on. That is, an F distribution is defined by two degrees of freedom.

So, the last things we need to know are the degrees of freedom for $MS_{\text{regression}}$ and $MS_{\text{residual}}$. The total degrees of freedom in $SS_Y$ equals $n - 1$, and these are divided up between $SS_{\text{regression}}$ and $SS_{\text{residual}}$.
$SS_{\text{residual}}$ loses $m$ degrees of freedom for the $m$ regression coefficients ($b_1$ through $b_m$) that go into defining $\hat{Y}$, and these degrees of freedom end up in $SS_{\text{regression}}$ (essentially because $SS_{\text{regression}} = SS_Y - SS_{\text{residual}}$). The degrees of freedom for the sums of squares carry over to the mean squares, so $MS_{\text{regression}}$ has $m$ degrees of freedom and $MS_{\text{residual}}$ has $n - m - 1$ degrees of freedom. (You don't need to memorize the degrees of freedom, but you should understand the basic idea of where they come from.)

Once we have $F$ and both $df$s, we can compare $F$ to its sampling distribution to get a p value. The goal here is to decide whether $F$ is bigger than would be expected by chance. If so, then $SS_{\text{regression}}$ is bigger than would be expected by chance, which means the regression is explaining something meaningful. Therefore, we want to know the probability of an F value as big as or bigger than the one we got from the data. We always compute a one-tailed p value (using the upper tail), because an $F$ that is unusually small doesn't tell us anything interesting. The p value isn't something that can be calculated by hand; instead we use a computer (e.g., the pf() function in R). As always, if the p value is less than our alpha level (e.g., .05) then we reject the null hypothesis and conclude that the predictors are explaining something meaningful about the outcome.

$$p = p\left(F_{df_{\text{regression}},\, df_{\text{residual}}} \geq F\right) \tag{10}$$

There may seem to be a lot to this hypothesis test, but conceptually there are four simple steps. These same steps are used in other statistical tests, such as ANOVA, which we will learn next. Remembering and understanding these steps will not only help you understand hypothesis testing with regression, but it will make understanding ANOVA a lot easier as well.

1. Break the total sum of squares into an explainable sum of squares and an unexplainable sum of squares.
2. Divide each sum of squares by its degrees of freedom to get a mean square.
3. Divide the explainable mean square by the unexplainable mean square to get the test statistic, $F$.
4. Compare $F$ to an F distribution to get the p value, or find the critical value and compare $F$ directly to $F_{\text{crit}}$.
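The first three steps can be sketched in plain Python for a regression with $m = 2$ predictors. Everything here is illustrative: the data are made up, the helper names `fit_least_squares` and `f_test` are hypothetical, and the coefficients are found by solving the normal equations with Gauss-Jordan elimination, which is one common way to minimize $SS_{\text{residual}}$ but is not described in these notes. Step 4 is left to statistical software, since the Python standard library has no F distribution.

```python
def fit_least_squares(rows, ys):
    """Return [b0, b1, ..., bm] that minimize the residual sum of squares."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 carries b0
    k = len(design[0])
    # Normal equations (A^T A) b = A^T y as an augmented k x (k + 1) matrix.
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(design, ys))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def f_test(rows, ys):
    """Steps 1-3: partition the sums of squares, form mean squares, compute F."""
    n, m = len(ys), len(rows[0])
    b = fit_least_squares(rows, ys)
    y_hat = [b[0] + sum(bi * x for bi, x in zip(b[1:], row)) for row in rows]
    m_y = sum(ys) / n

    # Step 1: total, unexplained, and explained sums of squares (Eqs. 2-4).
    ss_y = sum((y - m_y) ** 2 for y in ys)
    ss_residual = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
    ss_regression = ss_y - ss_residual

    # Step 2: divide each sum of squares by its degrees of freedom (Eqs. 7-8).
    df_regression, df_residual = m, n - m - 1
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual

    # Step 3: the test statistic (Eq. 9).
    return {
        "ss_y": ss_y, "ss_residual": ss_residual, "ss_regression": ss_regression,
        "df_regression": df_regression, "df_residual": df_residual,
        "F": ms_regression / ms_residual, "y_hat": y_hat,
    }


# Made-up data: n = 8 subjects, m = 2 predictors.
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4), (7, 9), (8, 6)]
ys = [5.1, 5.8, 14.2, 11.9, 22.3, 16.8, 27.1, 22.9]

result = f_test(rows, ys)
print(result["F"], result["df_regression"], result["df_residual"])
```

The printed $F$ and the two degrees of freedom are the inputs for step 4, for example an upper-tail call to pf() in R or a lookup of $F_{\text{crit}}$ in a table.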
More informationElements of statistics (MATH04871)
Elements of statistics (MATH04871) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis 
More informationOneWay Analysis of Variance: A Guide to Testing Differences Between Multiple Groups
OneWay Analysis of Variance: A Guide to Testing Differences Between Multiple Groups In analysis of variance, the main research question is whether the sample means are from different populations. The
More informationRegression stepbystep using Microsoft Excel
Step 1: Regression stepbystep using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression
More informationSection 1: Simple Linear Regression
Section 1: Simple Linear Regression Carlos M. Carvalho The University of Texas McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction
More informationCalculating PValues. Parkland College. Isela Guerra Parkland College. Recommended Citation
Parkland College A with Honors Projects Honors Program 2014 Calculating PValues Isela Guerra Parkland College Recommended Citation Guerra, Isela, "Calculating PValues" (2014). A with Honors Projects.
More informationSTATISTICS 8, FINAL EXAM. Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4
STATISTICS 8, FINAL EXAM NAME: KEY Seat Number: Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4 Make sure you have 8 pages. You will be provided with a table as well, as a separate
More informationTesting Group Differences using Ttests, ANOVA, and Nonparametric Measures
Testing Group Differences using Ttests, ANOVA, and Nonparametric Measures Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 354870348 Phone:
More informationINTERPRETING THE ONEWAY ANALYSIS OF VARIANCE (ANOVA)
INTERPRETING THE ONEWAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the oneway ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of
More informationIntroduction to Regression and Data Analysis
Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it
More informationHow To Run Statistical Tests in Excel
How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting
More information2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
More informationSTA201TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance
Principles of Statistics STA201TE This TECEP is an introduction to descriptive and inferential statistics. Topics include: measures of central tendency, variability, correlation, regression, hypothesis
More informationIntroducing the Linear Model
Introducing the Linear Model What is Correlational Research? Correlational designs are when many variables are measured simultaneously but unlike in an experiment none of them are manipulated. When we
More informationSimple Linear Regression, Scatterplots, and Bivariate Correlation
1 Simple Linear Regression, Scatterplots, and Bivariate Correlation This section covers procedures for testing the association between two continuous variables using the SPSS Regression and Correlate analyses.
More informationStatistics and Data Analysis
Statistics and Data Analysis In this guide I will make use of Microsoft Excel in the examples and explanations. This should not be taken as an endorsement of Microsoft or its products. In fact, there are
More informationChapter 10. Key Ideas Correlation, Correlation Coefficient (r),
Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables
More informationIntroduction to Quantitative Methods
Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................
More informationSTT 200 LECTURE 1, SECTION 2,4 RECITATION 7 (10/16/2012)
STT 200 LECTURE 1, SECTION 2,4 RECITATION 7 (10/16/2012) TA: Zhen (Alan) Zhang zhangz19@stt.msu.edu Office hour: (C500 WH) 1:45 2:45PM Tuesday (office tel.: 4323342) Helproom: (A102 WH) 11:20AM12:30PM,
More informationViolent crime total. Problem Set 1
Problem Set 1 Note: this problem set is primarily intended to get you used to manipulating and presenting data using a spreadsheet program. While subsequent problem sets will be useful indicators of the
More informationIntroduction to Linear Regression
14. Regression A. Introduction to Simple Linear Regression B. Partitioning Sums of Squares C. Standard Error of the Estimate D. Inferential Statistics for b and r E. Influential Observations F. Regression
More informationStats Review Chapters 910
Stats Review Chapters 910 Created by Teri Johnson Math Coordinator, Mary Stangler Center for Academic Success Examples are taken from Statistics 4 E by Michael Sullivan, III And the corresponding Test
More informationGLM I An Introduction to Generalized Linear Models
GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial
More informationTwo Related Samples t Test
Two Related Samples t Test In this example 1 students saw five pictures of attractive people and five pictures of unattractive people. For each picture, the students rated the friendliness of the person
More informationIntroduction to Analysis of Variance (ANOVA) Limitations of the ttest
Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One Way ANOVA Limitations of the ttest Although the ttest is commonly used, it has limitations Can only
More informationUNDERSTANDING THE DEPENDENTSAMPLES t TEST
UNDERSTANDING THE DEPENDENTSAMPLES t TEST A dependentsamples t test (a.k.a. matched or pairedsamples, matchedpairs, samples, or subjects, simple repeatedmeasures or withingroups, or correlated groups)
More informationMULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS
MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance
More informationChapter 9. TwoSample Tests. Effect Sizes and Power Paired t Test Calculation
Chapter 9 TwoSample Tests Paired t Test (Correlated Groups t Test) Effect Sizes and Power Paired t Test Calculation Summary Independent t Test Chapter 9 Homework Power and TwoSample Tests: Paired Versus
More informationThe F distribution and the basic principle behind ANOVAs. Situating ANOVAs in the world of statistical tests
Tutorial The F distribution and the basic principle behind ANOVAs Bodo Winter 1 Updates: September 21, 2011; January 23, 2014; April 24, 2014; March 2, 2015 This tutorial focuses on understanding rather
More informationCategorical Data Analysis
Richard L. Scheaffer University of Florida The reference material and many examples for this section are based on Chapter 8, Analyzing Association Between Categorical Variables, from Statistical Methods
More informationThe Dummy s Guide to Data Analysis Using SPSS
The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests
More informationOverview Classes. 123 Logistic regression (5) 193 Building and applying logistic regression (6) 263 Generalizations of logistic regression (7)
Overview Classes 123 Logistic regression (5) 193 Building and applying logistic regression (6) 263 Generalizations of logistic regression (7) 24 Loglinear models (8) 54 1517 hrs; 5B02 Building and
More informationSession 7 Bivariate Data and Analysis
Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table covariation least squares
More informationCurve Fitting. Before You Begin
Curve Fitting Chapter 16: Curve Fitting Before You Begin Selecting the Active Data Plot When performing linear or nonlinear fitting when the graph window is active, you must make the desired data plot
More informationHypothesis Testing for Beginners
Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easytoread notes
More informationAdditional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
More informationLinear Models in STATA and ANOVA
Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 42 A Note on NonLinear Relationships 44 Multiple Linear Regression 45 Removal of Variables 48 Independent Samples
More information1. How different is the t distribution from the normal?
Statistics 101 106 Lecture 7 (20 October 98) c David Pollard Page 1 Read M&M 7.1 and 7.2, ignoring starred parts. Reread M&M 3.2. The effects of estimated variances on normal approximations. tdistributions.
More informationCHAPTER 13. Experimental Design and Analysis of Variance
CHAPTER 13 Experimental Design and Analysis of Variance CONTENTS STATISTICS IN PRACTICE: BURKE MARKETING SERVICES, INC. 13.1 AN INTRODUCTION TO EXPERIMENTAL DESIGN AND ANALYSIS OF VARIANCE Data Collection
More informationCOMP6053 lecture: Relationship between two variables: correlation, covariance and rsquared. jn2@ecs.soton.ac.uk
COMP6053 lecture: Relationship between two variables: correlation, covariance and rsquared jn2@ecs.soton.ac.uk Relationships between variables So far we have looked at ways of characterizing the distribution
More informationSOLUTIONS TO BIOSTATISTICS PRACTICE PROBLEMS
SOLUTIONS TO BIOSTATISTICS PRACTICE PROBLEMS BIOSTATISTICS DESCRIBING DATA, THE NORMAL DISTRIBUTION SOLUTIONS 1. a. To calculate the mean, we just add up all 7 values, and divide by 7. In Xi i= 1 fancy
More informationIllustration (and the use of HLM)
Illustration (and the use of HLM) Chapter 4 1 Measurement Incorporated HLM Workshop The Illustration Data Now we cover the example. In doing so we does the use of the software HLM. In addition, we will
More information0.8 Rational Expressions and Equations
96 Prerequisites 0.8 Rational Expressions and Equations We now turn our attention to rational expressions  that is, algebraic fractions  and equations which contain them. The reader is encouraged to
More informationPOLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model.
Polynomial Regression POLYNOMIAL AND MULTIPLE REGRESSION Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model. It is a form of linear regression
More informationCOMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.
277 CHAPTER VI COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. This chapter contains a full discussion of customer loyalty comparisons between private and public insurance companies
More information12: Analysis of Variance. Introduction
1: Analysis of Variance Introduction EDA Hypothesis Test Introduction In Chapter 8 and again in Chapter 11 we compared means from two independent groups. In this chapter we extend the procedure to consider
More informationOrdinal Regression. Chapter
Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe
More informationDEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,
More informationGood luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:
Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours
More information