" Y. Notation and Equations for Regression Lecture 11/4. Notation:


 Jeffry Powers
 1 years ago
 Views:
Transcription
1 Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through m. So, the predictor variables are X1, X2,, Xm. Y: The outcome variable that is being predicted or explained in a regression ˆ Y : The estimated outcome value as predicted by the regression equation bi: The regression coefficient for predictor Xi b0: The intercept in the regression equation bi : The standard error of a regression coefficient SSY: The total sum of squares for Y, which represents the uncertainty in the outcome SSresidual: The residual sum of squares, representing the uncertainty left over after we use regression to generate the best possible predictions for the outcome SSregression: The sum of squares for the regression, representing the amount of uncertainty or variability that the predictors can explain R 2 : The proportion of variability explained by the regression MSresidual: The residual mean square, which is the mean squared error of the regression prediction, Y ˆ ; also used as an estimate of the population variance, MSregression: The mean square for the regression F: F statistic, which is the test statistic for deciding whether a regression explains meaningful variability in the outcome " Y 2 Regression equation. A regression equation is a formula that uses a set of predictor variables (Xi) to make a prediction for some outcome variable (Y). The prediction is called Y ˆ. The purpose of a regression equation is to find the best possible way to do this, that is, to combine the predictors in a way that gives estimates ( Y ˆ ) that are as close as possible to the right answer (Y). When there s only one predictor (m = 1), regression is essentially the same as correlation, and the regression equation is the same as the regression line we would draw on a scatterplot. The equation for a line always has the form Y ˆ = b0 + b1 X, where b0 and b1 are numbers representing the intercept and the slope. Finding the line that s closest to the data is the same as choosing b0 and b1 so that the predicted Y ˆ values are as close as possible to the true Y values. When there are more than one variable, we take the basic equation for a line and extend it in a natural way.
2 ˆ Y = b 0 + b 1 X 1 + b 2 X b m X m " = b 0 + b i X i i=1 to m (1) The two lines of Equation 1 mean the same thing. The first line writes everything out, and the second line combines things into a sum (you can choose to remember either one). In the sum, the index i takes on all values from 1 to m. When i = 1, the summand is b1x1; when i is 2, the summand is b2x2; and so on until bmxm. Notice that the contribution of each predictor (bixi) is the same as in the equation for a simple line with one predictor. This is why we say that each predictor has a linear effect on the predicted outcome. If we held all of the predictors except one fixed, then the relationship between the remaining predictor and the prediction would be a line. The goal of regression is to find the values of b0 through bm that lead to the best predictions. Therefore, the bs are a main focus of regression. Each bi is called the regression coefficient for its corresponding predictor (e.g., b1 is the regression coefficient for X1). The regression coefficient tells what kind of influence each predictor has on the predicted outcome. When a regression coefficient is positive, that means the predictor has a positive effect, just like with a positive correlation. When a regression coefficient is negative, the predictor has a negative effect, just like with a negative correlation. The magnitude or absolute value of bi tells how strong the effect is. If bi is near zero, then Xi has a weak effect on the outcome, just like with a correlation near zero. If bi is large (either positive or negative), then Xi has a strong effect on the outcome, just like with a correlation near ±1. The difference between regression coefficients and correlations is that regression coefficients aren t standardized, meaning they re not restricted to lie between  1 and 1. Therefore, the strength of a regression coefficient needs to be interpreted in terms of the units of the predictor and outcome variables. In general, bi tells how many units Y ˆ increases by for every unit increase in Xi. For example, if Xi is height (in inches) and Y is how long it takes a person to run a mile (in seconds), bi = 7 would mean that for every extra inch of height, a person tends to take 7 seconds longer to run a mile. A negative coefficient means a decrease; e.g., bi =  5 would mean that for every extra inch of height, a person tends to take 5 seconds less to run a mile. The b0 variable is a special regression coefficient called the intercept. Just like the intercept of a simple line, b0 tells what the value of Y ˆ is when all the Xis are zero (i.e., where the line intersects the Y axis). If zero isn t a sensible value of any of the predictors (e.g., no one is 0 inches tall), then the intercept taken by itself won t be a very sensible value for the outcome (e.g., it could be  50 seconds). Therefore, we generally don t think too hard about what the value of the intercept indicates; we just know that it needs to be in the regression equation so that the overall pattern of predictions can be shifted up or down as needed to match the true values of the outcome. Partitioning sums of squares. One important question with any regression equation is how well it does predicting the outcome. We answer this question in terms of how much variability in Y is explained by the regression, meaning how much uncertainty goes away when we use the regression equation to predict the outcome. Variability or uncertainty in
3 this case is measured in sums of squares (SS), which are very similar to variance and mean squared error, except that we don t divide by degrees of freedom (yet). The total variability in Y is called SSY (sum of squares for Y) and is defined as SSY = (Y MY) 2 (2) As usual, MY represents the mean of the sample Y. Notice that if we divided SSY by n 1, we would have the sample variance of Y. So, the sum of squares for Y is just like the variance for Y except that we don t divide by n 1 (i.e., it s a sum instead of an average). As with variance, we can think of the sum of squares as a measure of uncertainty, or how much error we would expect to make if we had to guess the value of Y blindly. If we have no information about a subject, our best guess of their Y score is the mean, MY. Therefore (Y MY) 2 is our squared error, and SSY is the sum of the squared error over all subjects. If we do know something about a subject, then we can make a better prediction of their Y score than by blindly guessing the mean. This is what the regression equation does for us it uses all the predictors, Xi, to come up with the best possible prediction, Y ˆ. Once we have Y ˆ, we can ask how well it does as an estimate of Y, again by adding up the squared error over all subjects. The result is called the residual sum of squares, because it represents the uncertainty in Y that s left over after we do the regression. Notice that the residual sum of squares is the same as mean squared error except that once again it s a sum instead of a mean, because we don t divide by degrees of freedom. SSresidual = (Y ˆ Y ) 2 (3) As stated above, the goal of regression is to find the values of the regression coefficients (bi) that lead to the best predictions. By best predictions, we mean minimizing the squared difference between Y and Y ˆ. In other words, the goal is to minimize SSresidual. This is how we determine the regression coefficients (or, usually, how a computer determines them for us). The question now is how well the regression did. If the predictors do a really good job of explaining the outcome, SSresidual will be close to zero. If the predictors tell us little or nothing about the outcome, then SSresidual will be close to the original, total sum of squares, SSY. SSresidual is always less than or equal to SSY (because SSY is the error of the naive prediction, MY, and SSresidual is the error of the best possible prediction, Y ˆ ), but the question is how much less. The reduction in uncertainty from SSY to SSresidual is called SSregression, because it s the amount of variability in the outcome (Y) that the regression can explain. SSregression = SSY SSresidual SSY = SSregression + SSresidual (4) The two versions of Equation 4 say the same thing. The first version shows how we get SSregression by subtracting the residual sum of squares from the total sum of squares. The second version shows how the total sum of squares (i.e., the original variability in the data) can be broken into two parts: the portion that we can explain using the predictors (SSregression), and the portion that we cannot explain (SSresidual).
4 Explained variability. Once we ve worked out the total variability (SSY) and the portion explained by the predictors (SSregression), we can calculate their ratio. The ratio is the fraction of the total variability explained by the predictors, and it s our final measure of how useful the regression was. The fraction of explained variability is called R 2, because it extends the idea of the squared correlation, r 2. Recall that when we re doing a simple correlation (i.e., there s only one predictor), r 2 is the fraction of the variance in Y that can be explained by X. In other words, when there s only one predictor, R 2 and r 2 are equal. R 2 is just a more general concept that works when there are multiple predictors. If R 2 is close to 1, then the regression explains most of the variability in Y, meaning that if we know the values of the predictors for some subject then we can confidently predict the outcome for that subject. If R 2 is close to 0, then the predictors don t give us much information about the outcome. (R 2 is always between 0 and 1.) R 2 = SS regression SS Y (5) Hypothesis testing: The effect of one predictor. Once we ve run a regression to find the regression coefficients for a set of predictors, we can ask how reliable the regression coefficients are. The regression coefficients are statistics, meaning they re computed from a sample (for each subject in our sample, we have measured all the Xis and Y). We can use the regression coefficients as estimates of the population, but as with all estimators, they are imperfect. If we gathered a new sample and ran the regression on the new data, we d obtain somewhat different values for the regression coefficients. Therefore, each bi has a sampling distribution, which represents the probabilities of all possible values we could get for bi if we replicated the experiment. Each bi also has a standard error, which as usual is the standard deviation of the sampling distribution. The standard error tells us how reliable our estimate is, meaning how far we can expect it to be from the true population value. We won t go into detail about how the standard errors of the regression coefficients are calculated, because it s much easier to use a computer for this. However, it s important to understand what the standard errors can tell us. First, the standard error can be used to create a confidence interval for each bi, in the same way we create confidence intervals for means. I won t describe the math, but the idea is the same as before: The confidence interval is centered on the actual value of bi obtained from the sample, and the width of the confidence interval is determined by the size of the standard error. The second thing we can use standard errors for is hypothesis testing. In regression, the most common hypotheses to test regard whether the predictors have reliable influences on the population. This corresponds to asking whether the true values of the regression coefficients are different from zero. For each predictor, Xi, the null hypothesis bi = 0 states that Xi has no reliable effect on Y (the alternative hypothesis is bi 0). Notice that there s a separate null hypothesis for each predictor, and we can test each one individually. To test whether bi = 0, we calculate a t statistic equal to bi (the actual regression coefficient we obtained from our sample) divided by its standard error. t = b i " bi (6)
5 Just as with the t statistic for a t test, t for a regression coefficient tells us how large that coefficient is relative to how large it would be expected to be by chance (i.e., according to the null hypothesis that its true value is zero). If t is large (either positive or negative), then bi is larger than would be expected by chance, so chance is not a good explanation for the result that we got. In this case, we reject the null hypothesis and adopt the alternative hypothesis that bi 0 (i.e., that Xi has a real effect on Y). If t is close to zero, then bi fits with what we d expect by chance, so we retain the null hypothesis that bi = 0 (i.e., Xi has no real effect on Y). The t statistic from a regression is used in the same way as in a t test. If t is greater than tcrit, then we reject the null hypothesis. The alternative (and equivalent) approach is to compute a p value, which is the probability of a result as or more extreme than t: p( tdf > t ). This is the formula for a two tailed test, but we can also compute a one tailed p value if the direction of the effect (i.e., the sign of bi) was predicted in advance. In either case, we reject the null hypothesis if p < α. The only remaining information needed to find tcrit or p is the degrees of freedom. As usual, the degrees of freedom for t equals the degrees of freedom for the standard error used to compute t. The standard error for a regression coefficient, " bi, comes from SSresidual, which, as is explained below, has n m 1 degrees of freedom. Therefore, a t test for a regression coefficient uses df = n m 1 (you don t need to memorize this). Hypothesis testing of multiple predictors. Another way to test whether predictors have reliable effects on the outcome is to test whether they explain more variability than would be expected by chance. This can be done with multiple predictors, by comparing how much variability the regression explains with those predictors to how much it explains with those predictors left out. We ll focus on the most common situation, where we want to test whether the full set of predictors, X1 through Xm, collectively tell us anything meaningful about the outcome, Y. As explained above, SSregression represents the amount of variability in Y explained by all of the predictors. We want to test whether SSregression is larger than would be expected by chance. Our null hypothesis is that none of the predictors has any effect on the outcome, meaning that the true values of the regression coefficients (except perhaps the intercept, b0) are all zero. Notice that this is the same null hypothesis used above for testing predictors one at a time, except that now we re testing them all at once, to see whether any predictor gives reliable information about the outcome. Even if all the bis are zero in the population, the regression coefficients we get from our sample will deviate from zero because of sampling variability. This leads the variability explained by the regression, SSregression, to be greater than zero, even though this explained variability is meaningless random error. Therefore, SSregression has a sampling distribution that, as usual, tells us how large it can be expected to be just by chance. Comparing the actual value of SSregression obtained from the sample to the amount expected by chance allows us to test whether chance is a good explanation for the results (H0) or the predictors are explaining something real about the outcome (H1).
6 To test whether SSregression is larger that would be expected by chance, we first divide it by its degrees of freedom to get the mean square, MSregression. MS regression = SS regression df regression (7) According to the null hypothesis that the regression doesn t explain anything real, MSregression has a (modified) chi square distribution, multiplied by " 2 Y, the variance of Y in the population. If we knew " 2 Y then we would know the likelihood function for MSregression exactly. This is the same situation that came up with t tests, where we knew the likelihood function for M except for not knowing σ 2. Once again, we divide by an estimate of " 2 Y to get our final test statistic, and once again, we estimate " 2 Y using the residual mean square. In this case, the residual mean square is the residual sum of squares divided by its degrees of freedom (see Eq. 3). MS residual = SS residual (8) df residual When we divide MSregression by MSresidual, " 2 Y cancels out, and we end up with a test statistic that doesn t depend on any population parameters. That is, we have a test statistic with a likelihood function that we know exactly, which is what s required for hypothesis testing. The test statistic is called F. F = MS regression MS residual (9) According to the null hypothesis, the F statistic has what s called an F distribution. F distributions arise any time you divide one chi square variable by another, such as MSregression and MSresidual. Because the distribution of a chi square variable depends on its degrees of freedom, an F distribution depends on the degrees of freedom of both chi square variables that it s based on. That is, an F distribution is defined by two degrees of freedom. So, the last things we need to know are the degrees of freedom for MSregression and MSresidual. The total degrees of freedom in SSY equals n 1, and these are divided up between SSregression and SSresidual. SSresidual loses m degrees of freedom for the m regression coefficients (b1 through bm) that go into defining Y ˆ, and these degrees of freedom end up in SSregression (essentially because SSregression = SSY SSresidual). The degrees of freedom for the sums of squares carry over to the mean squares, so MSregression has m degrees of freedom and MSresidual has n m 1 degrees of freedom. (You don t need to memorize the degrees of freedom, but you should understand the basic idea of where they come from.) Once we have F and both dfs, we can compare F to its sampling distribution to get a p value. The goal here is to decide whether F is bigger than would be expected by chance. If so, then SSregression is bigger than would be expected by chance, which means the regression is explaining something meaningful. Therefore, we want to know the probability of an F value as big as or bigger than the one we got from the data. We always compute a one tailed p value (using the upper tail), because an F that is unusually small doesn t tell us
7 anything interesting. The p value isn t something that can be calculated by hand; instead we use a computer (e.g., the pf() function in R). As always, if the p value is less than our alpha level (e.g.,.05) then we reject the null hypothesis and conclude that the predictors are explaining something meaningful about the outcome. p = p( F dfregression,df residual " F) (10) There may seem like a lot to this hypothesis test, but conceptually there are four simple steps. These same steps are used in other statistical tests, such as ANOVA, which we will learn next. Remembering and understanding these steps will not only help you understand hypothesis testing with regression, but it will make understanding ANOVA a lot easier as well. 1. Break the total sum of squares into an explainable sum of squares and an unexplainable sum of squares. 2. Divide each sum of squares by its degrees of freedom to get a mean square. 3. Divide the explainable mean squares by the unexplainable mean squares to get the test statistic, F. 4. Compare F to an F distribution to get the p value, or find the critical value and compare F directly to Fcrit.
Chapter 7: Simple linear regression Learning Objectives
Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) 
More informationwhere b is the slope of the line and a is the intercept i.e. where the line cuts the y axis.
Least Squares Introduction We have mentioned that one should not always conclude that because two variables are correlated that one variable is causing the other to behave a certain way. However, sometimes
More informationIntroduction to Regression. Dr. Tom Pierce Radford University
Introduction to Regression Dr. Tom Pierce Radford University In the chapter on correlational techniques we focused on the Pearson R as a tool for learning about the relationship between two variables.
More informationRegression Analysis: A Complete Example
Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty
More informationStatistics 112 Regression Cheatsheet Section 1B  Ryan Rosario
Statistics 112 Regression Cheatsheet Section 1B  Ryan Rosario I have found that the best way to practice regression is by brute force That is, given nothing but a dataset and your mind, compute everything
More informationSydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1. 1. Introduction p. 2. 2. Statistical Methods Used p. 5. 3. 10 and under Males p.
Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1 Table of Contents 1. Introduction p. 2 2. Statistical Methods Used p. 5 3. 10 and under Males p. 8 4. 11 and up Males p. 10 5. 10 and under
More information2. Simple Linear Regression
Research methods  II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according
More informationPart 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
More informationFinal Exam Practice Problem Answers
Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal
More informationStudy Guide for the Final Exam
Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make
More information5. Linear Regression
5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4
More informationSimple Regression Theory II 2010 Samuel L. Baker
SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the
More informationBivariate Regression Analysis. The beginning of many types of regression
Bivariate Regression Analysis The beginning of many types of regression TOPICS Beyond Correlation Forecasting Two points to estimate the slope Meeting the BLUE criterion The OLS method Purpose of Regression
More informationExercise 1.12 (Pg. 2223)
Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.
More informationSimple Regression and Correlation
Simple Regression and Correlation Today, we are going to discuss a powerful statistical technique for examining whether or not two variables are related. Specifically, we are going to talk about the ideas
More information1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96
1 Final Review 2 Review 2.1 CI 1propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
More informationSimple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
More informationInference for Regression
Simple Linear Regression Inference for Regression The simple linear regression model Estimating regression parameters; Confidence intervals and significance tests for regression parameters Inference about
More informationTesting for Lack of Fit
Chapter 6 Testing for Lack of Fit How can we tell if a model fits the data? If the model is correct then ˆσ 2 should be an unbiased estimate of σ 2. If we have a model which is not complex enough to fit
More informationThe aspect of the data that we want to describe/measure is the degree of linear relationship between and The statistic r describes/measures the degree
PS 511: Advanced Statistics for Psychological and Behavioral Research 1 Both examine linear (straight line) relationships Correlation works with a pair of scores One score on each of two variables ( and
More informationPremaster Statistics Tutorial 4 Full solutions
Premaster Statistics Tutorial 4 Full solutions Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for
More informationUnit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
More informationOutline. Topic 4  Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares
Topic 4  Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test  Fall 2013 R 2 and the coefficient of correlation
More informationChicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011
Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this
More informationRegression Analysis: Basic Concepts
The simple linear model Regression Analysis: Basic Concepts Allin Cottrell Represents the dependent variable, y i, as a linear function of one independent variable, x i, subject to a random disturbance
More informationLecture 18 Linear Regression
Lecture 18 Statistics Unit Andrew Nunekpeku / Charles Jackson Fall 2011 Outline 1 1 Situation  used to model quantitative dependent variable using linear function of quantitative predictor(s). Situation
More informationRegression Analysis Prof. Soumen Maity Department of Mathematics Indian Institute of Technology, Kharagpur
Regression Analysis Prof. Soumen Maity Department of Mathematics Indian Institute of Technology, Kharagpur Lecture  7 Multiple Linear Regression (Contd.) This is my second lecture on Multiple Linear Regression
More informationCHAPTER 2 AND 10: Least Squares Regression
CHAPTER 2 AND 0: Least Squares Regression In chapter 2 and 0 we will be looking at the relationship between two quantitative variables measured on the same individual. General Procedure:. Make a scatterplot
More informationSPSS Guide: Regression Analysis
SPSS Guide: Regression Analysis I put this together to give you a stepbystep guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar
More information17. SIMPLE LINEAR REGRESSION II
17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.
More information17.0 Linear Regression
17.0 Linear Regression 1 Answer Questions Lines Correlation Regression 17.1 Lines The algebraic equation for a line is Y = β 0 + β 1 X 2 The use of coordinate axes to show functional relationships was
More informatione = random error, assumed to be normally distributed with mean 0 and standard deviation σ
1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.
More informationRegression in SPSS. Workshop offered by the Mississippi Center for Supercomputing Research and the UM Office of Information Technology
Regression in SPSS Workshop offered by the Mississippi Center for Supercomputing Research and the UM Office of Information Technology John P. Bentley Department of Pharmacy Administration University of
More informationInferential Statistics
Inferential Statistics Sampling and the normal distribution Zscores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are
More informationStatistics Review PSY379
Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses
More informationThe scatterplot indicates a positive linear relationship between waist size and body fat percentage:
STAT E150 Statistical Methods Multiple Regression Three percent of a man's body is essential fat, which is necessary for a healthy body. However, too much body fat can be dangerous. For men between the
More informationCHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression
Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the
More information1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ
STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material
More informationLesson Lesson Outline Outline
Lesson 15 Linear Regression Lesson 15 Outline Review correlation analysis Dependent and Independent variables Least Squares Regression line Calculating l the slope Calculating the Intercept Residuals and
More informationToday: Dummy variables. Dummy variables in a multiple regression, regression wrap up.
Today: Dummy variables. Dummy variables in a multiple regression, regression wrap up. Looking back in regression, we ve looked at how an interval data response y changes as an interval data explanatory
More informationKSTAT MINIMANUAL. Decision Sciences 434 Kellogg Graduate School of Management
KSTAT MINIMANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To
More informationRecall this chart that showed how most of our course would be organized:
Chapter 4 OneWay ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical
More informationSection 13, Part 1 ANOVA. Analysis Of Variance
Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability
More informationMULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
Open book and note Calculator OK Multiple Choice 1 point each MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Find the mean for the given sample data.
More informationExperimental Designs (revisited)
Introduction to ANOVA Copyright 2000, 2011, J. Toby Mordkoff Probably, the best way to start thinking about ANOVA is in terms of factors with levels. (I say this because this is how they are described
More informationTwosample hypothesis testing, II 9.07 3/16/2004
Twosample hypothesis testing, II 9.07 3/16/004 Small sample tests for the difference between two independent means For twosample tests of the difference in mean, things get a little confusing, here,
More informationCorrelational Research
Correlational Research Chapter Fifteen Correlational Research Chapter Fifteen Bring folder of readings The Nature of Correlational Research Correlational Research is also known as Associational Research.
More informationUnivariate Regression
Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is
More informationDEPARTMENT OF ECONOMICS. Unit ECON 12122 Introduction to Econometrics. Notes 4 2. R and F tests
DEPARTMENT OF ECONOMICS Unit ECON 11 Introduction to Econometrics Notes 4 R and F tests These notes provide a summary of the lectures. They are not a complete account of the unit material. You should also
More informationChapter 11: Two Variable Regression Analysis
Department of Mathematics Izmir University of Economics Week 1415 20142015 In this chapter, we will focus on linear models and extend our analysis to relationships between variables, the definitions
More informationData and Regression Analysis. Lecturer: Prof. Duane S. Boning. Rev 10
Data and Regression Analysis Lecturer: Prof. Duane S. Boning Rev 10 1 Agenda 1. Comparison of Treatments (One Variable) Analysis of Variance (ANOVA) 2. Multivariate Analysis of Variance Model forms 3.
More informationDATA INTERPRETATION AND STATISTICS
PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE
More informationE205 Final: Version B
Name: Class: Date: E205 Final: Version B Multiple Choice Identify the choice that best completes the statement or answers the question. 1. The owner of a local nightclub has recently surveyed a random
More informationOutline. Correlation & Regression, III. Review. Relationship between r and regression
Outline Correlation & Regression, III 9.07 4/6/004 Relationship between correlation and regression, along with notes on the correlation coefficient Effect size, and the meaning of r Other kinds of correlation
More information12/31/2016. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2
PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 Understand linear regression with a single predictor Understand how we assess the fit of a regression model Total Sum of Squares
More informationSIMPLE LINEAR CORRELATION. r can range from 1 to 1, and is independent of units of measurement. Correlation can be done on two dependent variables.
SIMPLE LINEAR CORRELATION Simple linear correlation is a measure of the degree to which two variables vary together, or a measure of the intensity of the association between two variables. Correlation
More informationThe importance of graphing the data: Anscombe s regression examples
The importance of graphing the data: Anscombe s regression examples Bruce Weaver Northern Health Research Conference Nipissing University, North Bay May 3031, 2008 B. Weaver, NHRC 2008 1 The Objective
More informationRegression. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.
Class: Date: Regression Multiple Choice Identify the choice that best completes the statement or answers the question. 1. Given the least squares regression line y8 = 5 2x: a. the relationship between
More information11. Analysis of Casecontrol Studies Logistic Regression
Research methods II 113 11. Analysis of Casecontrol Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More informationA. Karpinski
Chapter 3 Multiple Linear Regression Page 1. Overview of multiple regression 32 2. Considering relationships among variables 33 3. Extending the simple regression model to multiple predictors 34 4.
More informationSimple Linear Regression Chapter 11
Simple Linear Regression Chapter 11 Rationale Frequently decisionmaking situations require modeling of relationships among business variables. For instance, the amount of sale of a product may be related
More information7 Hypothesis testing  one sample tests
7 Hypothesis testing  one sample tests 7.1 Introduction Definition 7.1 A hypothesis is a statement about a population parameter. Example A hypothesis might be that the mean age of students taking MAS113X
More informationWeek TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480
1) The S & P/TSX Composite Index is based on common stock prices of a group of Canadian stocks. The weekly close level of the TSX for 6 weeks are shown: Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500
More informationSection 14 Simple Linear Regression: Introduction to Least Squares Regression
Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship
More informationHYPOTHESIS TESTING: CONFIDENCE INTERVALS, TTESTS, ANOVAS, AND REGRESSION
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, TTESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate
More informationOneWay Analysis of Variance
OneWay Analysis of Variance Note: Much of the math here is tedious but straightforward. We ll skim over it in class but you should be sure to ask questions if you don t understand it. I. Overview A. We
More information6.4 Normal Distribution
Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under
More informationChapter 13 Introduction to Linear Regression and Correlation Analysis
Chapter 3 Student Lecture Notes 3 Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing
More informationSimple Linear Regression
Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression Statistical model for linear regression Estimating
More informationElementary Statistics Sample Exam #3
Elementary Statistics Sample Exam #3 Instructions. No books or telephones. Only the supplied calculators are allowed. The exam is worth 100 points. 1. A chi square goodness of fit test is considered to
More informationSIMPLE REGRESSION ANALYSIS
SIMPLE REGRESSION ANALYSIS Introduction. Regression analysis is used when two or more variables are thought to be systematically connected by a linear relationship. In simple regression, we have only two
More informationAMS7: WEEK 8. CLASS 1. Correlation Monday May 18th, 2015
AMS7: WEEK 8. CLASS 1 Correlation Monday May 18th, 2015 Type of Data and objectives of the analysis Paired sample data (Bivariate data) Determine whether there is an association between two variables This
More information, has mean A) 0.3. B) the smaller of 0.8 and 0.5. C) 0.15. D) which cannot be determined without knowing the sample results.
BA 275 Review Problems  Week 9 (11/20/0611/24/06) CD Lessons: 69, 70, 1620 Textbook: pp. 520528, 111124, 133141 An SRS of size 100 is taken from a population having proportion 0.8 of successes. An
More informationAP Statistics 2002 Scoring Guidelines
AP Statistics 2002 Scoring Guidelines The materials included in these files are intended for use by AP teachers for course and exam preparation in the classroom; permission for any other use must be sought
More informationStatistics for Management IISTAT 362Final Review
Statistics for Management IISTAT 362Final Review Multiple Choice Identify the letter of the choice that best completes the statement or answers the question. 1. The ability of an interval estimate to
More informationHypothesis testing  Steps
Hypothesis testing  Steps Steps to do a twotailed test of the hypothesis that β 1 0: 1. Set up the hypotheses: H 0 : β 1 = 0 H a : β 1 0. 2. Compute the test statistic: t = b 1 0 Std. error of b 1 =
More information0.1 Multiple Regression Models
0.1 Multiple Regression Models We will introduce the multiple Regression model as a mean of relating one numerical response variable y to two or more independent (or predictor variables. We will see different
More informationThe general form of the PROC GLM statement is
Linear Regression Analysis using PROC GLM Regression analysis is a statistical method of obtaining an equation that represents a linear relationship between two variables (simple linear regression), or
More informationRegression in ANOVA. James H. Steiger. Department of Psychology and Human Development Vanderbilt University
Regression in ANOVA James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 30 Regression in ANOVA 1 Introduction 2 Basic Linear
More informationSimple Linear Regression
STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTIONS 9.3 Confidence and prediction intervals (9.3) Conditions for inference (9.1) Want More Stats??? If you have enjoyed learning how to analyze
More informationConfidence Intervals on Effect Size David C. Howell University of Vermont
Confidence Intervals on Effect Size David C. Howell University of Vermont Recent years have seen a large increase in the use of confidence intervals and effect size measures such as Cohen s d in reporting
More informationMath 62 Statistics Sample Exam Questions
Math 62 Statistics Sample Exam Questions 1. (10) Explain the difference between the distribution of a population and the sampling distribution of a statistic, such as the mean, of a sample randomly selected
More information2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
More informationECO220Y1Y Duration  3 hours Examination Aids: Calculator
Page 1 of 16 UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2011 EXAMINATIONS ECO220Y1Y Duration  3 hours Examination Aids: Calculator Last Name: First Name: Student #: ENTER YOUR NAME AND STUDENT
More informationDescriptive Statistics
Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize
More informationRegression stepbystep using Microsoft Excel
Step 1: Regression stepbystep using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression
More informationresearch/scientific includes the following: statistical hypotheses: you have a null and alternative you accept one and reject the other
1 Hypothesis Testing Richard S. Balkin, Ph.D., LPCS, NCC 2 Overview When we have questions about the effect of a treatment or intervention or wish to compare groups, we use hypothesis testing Parametric
More informationRegression, least squares
Regression, least squares Joe Felsenstein Department of Genome Sciences and Department of Biology Regression, least squares p.1/24 Fitting a straight line X Two distinct cases: The X values are chosen
More informationCalculating PValues. Parkland College. Isela Guerra Parkland College. Recommended Citation
Parkland College A with Honors Projects Honors Program 2014 Calculating PValues Isela Guerra Parkland College Recommended Citation Guerra, Isela, "Calculating PValues" (2014). A with Honors Projects.
More informationOneWay Analysis of Variance: A Guide to Testing Differences Between Multiple Groups
OneWay Analysis of Variance: A Guide to Testing Differences Between Multiple Groups In analysis of variance, the main research question is whether the sample means are from different populations. The
More informationSolutions to Worksheet on Hypothesis Tests
s to Worksheet on Hypothesis Tests. A production line produces rulers that are supposed to be inches long. A sample of 49 of the rulers had a mean of. and a standard deviation of.5 inches. The quality
More information12/31/2016. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2
PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 Understand when to use multiple Understand the multiple equation and what the coefficients represent Understand different methods
More informationGeneral Regression Formulae ) (N2) (1  r 2 YX
General Regression Formulae Single Predictor Standardized Parameter Model: Z Yi = β Z Xi + ε i Single Predictor Standardized Statistical Model: Z Yi = β Z Xi Estimate of Beta (Betahat: β = r YX (1 Standard
More informationTesting Group Differences using Ttests, ANOVA, and Nonparametric Measures
Testing Group Differences using Ttests, ANOVA, and Nonparametric Measures Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 354870348 Phone:
More informationClass 19: Two Way Tables, Conditional Distributions, ChiSquare (Text: Sections 2.5; 9.1)
Spring 204 Class 9: Two Way Tables, Conditional Distributions, ChiSquare (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the
More informationEPS 625 ANALYSIS OF COVARIANCE (ANCOVA) EXAMPLE USING THE GENERAL LINEAR MODEL PROGRAM
EPS 6 ANALYSIS OF COVARIANCE (ANCOVA) EXAMPLE USING THE GENERAL LINEAR MODEL PROGRAM ANCOVA One Continuous Dependent Variable (DVD Rating) Interest Rating in DVD One Categorical/Discrete Independent Variable
More informationLesson 1: Comparison of Population Means Part c: Comparison of Two Means
Lesson : Comparison of Population Means Part c: Comparison of Two Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis
More informationStatistics and Data Analysis
Statistics and Data Analysis In this guide I will make use of Microsoft Excel in the examples and explanations. This should not be taken as an endorsement of Microsoft or its products. In fact, there are
More informationHomework #3 is due Friday by 5pm. Homework #4 will be posted to the class website later this week. It will be due Friday, March 7 th, at 5pm.
Homework #3 is due Friday by 5pm. Homework #4 will be posted to the class website later this week. It will be due Friday, March 7 th, at 5pm. Political Science 15 Lecture 12: Hypothesis Testing Sampling
More informationHow To Run Statistical Tests in Excel
How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting
More information