SIMPLE REGRESSION ANALYSIS


Introduction. Regression analysis is used when two or more variables are thought to be systematically connected by a linear relationship. In simple regression we have only two variables, which we will designate x and y, and we suppose that they are related by an expression of the form y = β0 + β1x + ε. We'll leave aside for a moment the nature of the variable ε and focus on the x-y relationship. y = β0 + β1x is the equation of a straight line; β0 is the intercept (or constant) and β1 is the x coefficient, which represents the slope of the line the equation describes.

To be concrete, suppose we are talking about the relation between air temperature and the drying time of paint. We know from experience that as x (temperature) increases, y (drying time) decreases, and we might suppose that the relationship is linear. But suppose that we need to know the exact nature of the relationship, so that we can predict drying time at various temperatures. How could we discover the actual values of β0 and β1? Well, to start with, we cannot discover the actual values. Note that β0 and β1 are Greek letters, indicating that these are parameters; they are somewhat in the nature of population parameters, which can never be known exactly. What we can do is get estimates of these parameters, call them b0 and b1, using Latin characters to indicate that these are statistics and only approximations of the real thing.

How could we find b0 and b1? Let's do an experiment: let's paint some boards at different temperatures and observe how long it takes them to dry. Such an experiment might generate a table of data:

    Temperature    Drying Time (hours)

If we plotted these points on an axis system, we could see the relationship. Each little box represents a temperature/drying-time observation, and we can see that, in a general way, as temperature goes up, drying time goes down.
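The role of ε can be seen by simulating the model directly: every observation is the value on the line plus a random disturbance, which is why the plotted points never sit exactly on the line. The parameter values below are invented for illustration, not the lecture's actual numbers.

```python
# Simulate y = beta0 + beta1*x + eps for a few temperatures.
# beta0, beta1, and sigma are assumed illustrative values.
import random

random.seed(1)
beta0, beta1, sigma = 20.0, -0.2, 0.8          # assumed "true" parameters
temps = [55, 60, 65, 70, 75, 80]
times = [beta0 + beta1 * t + random.gauss(0, sigma) for t in temps]
for t, y in zip(temps, times):
    # each observed y deviates from the line 20 - 0.2*t by a random eps
    print(t, round(y, 2))
```

Rerunning with a different seed scatters the points differently around the same line, which is exactly the situation the regression line has to summarize.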
One way to find the exact relationship would be to take a straight edge and try to draw the line that comes closest to hitting all the dots. By measuring with a ruler, we could then determine the slope and intercept of this line. This is somewhat laborious, however, and since everyone is likely to see the scatter diagram a little differently, this technique would not give us a single, agreed-upon line. Another way to find the exact relationship would be to use regression analysis. By entering the data into the appropriate formulas (discussed below) we could find the line shown next:

Stats II, Lecture Notes, page 2

This straight line has an equation of the form DT = b0 + b1 × Temp, with intercept b0 = 19.6 and a negative slope b1, and we may take this equation to describe the relation between our two variables. By entering a temperature, we can evaluate the equation to discover the drying time at that temperature.

We are left with a question, however: why aren't all the points on the straight line? Because other things besides temperature affect drying time, and those other things had different values for each of our sample points. This is likely always to be the case: for any y, there may be literally hundreds of things that affect its value, and we have concentrated on only one. To take explicit account of this, when we stated the regression equation above we included a factor labeled ε. Since we cannot control or account for all the myriad things that affect drying time, we summarize them in ε and take ε to be a random variable whose value on any particular occasion is determined simply by chance; it is the operation of this ε that causes our actual observations to diverge from the straight line.

The existence of these random deviations is what makes regression analysis interesting and useful. If all relations were exact and free from chance, we could simply establish two points and draw a line between them to represent the relation; as it is, we must deal with randomness, and the regression line thus represents the average relationship between x and y.

The Regression Model

Assumptions of the Regression Model
- the relation between x and y is given by y = β0 + β1x + ε
- ε is a random variable, which may have both positive and negative values; ε is normally distributed
- E(ε) = 0
- the standard deviation of ε, σyx, is constant over the whole range of variation of x; this property is called homoscedasticity
- since E(ε) = 0, we are supposing that E(y) = β0 + β1x + E(ε) = β0 + β1x

Finding the regression line: the method of ordinary least squares, or OLS
- begin with assumed values for b0 and b1 and suppose that the relation between x and y is given by y = b0 + b1x; some b0's and b1's will give us better fits than others
- let ŷi = b0 + b1xi be the value of y estimated by the regression equation when x has the value xi; then if yi is the actual value, yi − ŷi is called the residual or the error
- substituting, let ei = yi − ŷi = yi − b0 − b1xi
- different b0's and b1's will cause each ei to have a different value: the residuals along the line marked A are larger than those along the line marked B
- but the sum of the residuals is always zero, so square each residual and define the sum of squared errors as Σ(yi − b0 − b1xi)²
- x and y are data: the variables are b0 and b1, and choosing different values of these will change the size of the sum of squares
- minimizing the sum of squares with respect to b0 and b1, using minimization methods from differential calculus, gives unique values for the b's
- the resulting formulas are rarely used explicitly anymore, but they are:

    b1 = (Σxiyi − n·x̄·ȳ) / (Σxi² − n·x̄²)

    b0 = ȳ − b1·x̄
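The closed-form least-squares formulas above can be sketched in a few lines of code. The temperature and drying-time numbers below are invented illustration data, not the values from the lecture's experiment.

```python
# Least-squares slope and intercept from the closed-form OLS formulas.

def ols_fit(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum(xi * yi for xi, yi in zip(x, y))   # sum of x_i * y_i
    sxx = sum(xi * xi for xi in x)               # sum of x_i squared
    b1 = (sxy - n * xbar * ybar) / (sxx - n * xbar * xbar)
    b0 = ybar - b1 * xbar
    return b0, b1

temps = [50, 60, 70, 80]        # hypothetical temperatures (degrees)
times = [9.0, 7.0, 4.0, 2.0]    # hypothetical drying times (hours)
b0, b1 = ols_fit(temps, times)
print(b0, b1)                   # slope is negative: hotter dries faster
```

For these made-up data the formulas give b1 = −0.24 and b0 = 21.1; any statistics package or Excel's Regression tool would return the same two numbers.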

Continuing the paint-drying example (recall that temp = x and drying time = y): from the data table we compute x̄ = 68.5, ȳ = 4.5, and Σxy = 1067; substituting these (together with Σx²) into the formulas above gives the slope b1, and then b0 = 4.5 − b1(68.5).

Using a Spreadsheet to Explore Linear Relations

Looking at relations often begins with a scatter plot of the data, as in the following examples. Each point in the scatterplot represents an individual faculty member; the arrow points at a faculty member whose salary is just over 1.4 times the department's minimum at the time.

Chairman's Claim: salary differences are determined by differences in success in publication. But there is no discernible pattern in this plot:
- individuals with the same number of publications have very different salaries
- individuals with about the same salary have very different numbers of publications
- almost any line would fit the data equally well

For the same time period, the next diagram plots the salary index against total research projects, self-declared items put on the Annual Report. Except for the three outliers in the upper left corner, all points are more or less on a straight line, and we might sensibly conclude that more research projects systematically mean a larger salary.

Important points:
- if the three outliers are left in the data set, the regression line is nearly horizontal and the correlation is low
- leaving the outliers out of the data set greatly improves the fit
- the outliers would be difficult to spot, even in this small data set, without doing the scatter plot
- we are led to investigate the nature of these outliers

Creating a Scatter Diagram with Excel
- begin by entering the data you want to plot in columns
- click on the Chart Wizard button in the toolbar, or click on Insert and then Chart; this brings up the Chart Wizard, a series of dialog boxes
- choose XY (Scatter) and the variant with no lines
- if the X and Y variables are in adjacent columns with the X to the left, you can mark the whole data range, and Excel will assume that the left-most column is X
- if the X and Y variables are separated or reversed, click on the Series tab, then on Add; there are separate entry lines for the X and Y variables; be careful to get them straight, and note that in Microsoft's peculiar terminology a Series is an X variable and a Y variable together. In most of the scatter diagrams we'll be doing in Stats II, there is only one series, consisting of two columns of variables.
- variable names at the head of a column will be picked up and preserved as titles on your graph
- many things can be edited after the chart is prepared; others cannot; experience is the most reliable teacher
- to print, click on the chart to highlight it, then be sure that the print dialog box specifies printing the selected chart
- charts may also be copied to the clipboard, then inserted into Word documents

SIMPLE REGRESSION II: Spreadsheet Calculations & Predictions

Using Excel to Calculate the Regression Line
- be sure to enter the data in separate columns
- click on TOOLS; from TOOLS choose DATA ANALYSIS, then choose REGRESSION; you'll get a dialog box
- if you marked the data before clicking on TOOLS, all you have to do is click on OK; otherwise, go ahead and use the dialog box to enter your data ranges
- choose an output range: if you don't specify one, Excel creates a new sheet and puts your output on it

Output example (for the paint-drying data, with 4 observations):

    SUMMARY OUTPUT
    Regression Statistics: Multiple R, R Square, Adjusted R Square, Standard Error; Observations: 4
    ANOVA: df, SS, MS, F, and Significance F for the Regression, Residual, and Total rows (Total SS = 3.75)
    Coefficients: estimates, Standard Error, t Stat, P-value, and Lower/Upper 95% limits for the Intercept and X Variable 1

- Intercept Coefficient: the b0 which estimates β0
- X Variable 1 Coefficient: the b1 that estimates β1, or the slope of the regression line

Using the Regression Equation

The equation may be used to
- interpolate: predict values within the limits of observation
- extrapolate (or forecast): predict values outside the range of observation or experience

To make a point prediction, simply plug an x value into the equation DT = b0 + b1 × Temp. What will drying time be at 75 degrees? Evaluating the equation at 75 gives DT = 2.76 hr. Extrapolation: at 45 degrees the equation gives DT = 9.36. Extrapolation can lead to absurdity: at 95 degrees the equation predicts a negative drying time. Practitioners generally prefer interval estimates: we're 95% confident that drying time will be between 9 and 9.7 hours, for example.
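A point prediction is just an evaluation of the fitted line, but it helps to flag whether the x value lies inside the observed range (interpolation) or outside it (extrapolation). A minimal sketch: b0 = 19.6 comes from the notes, while the slope and the observed temperature range are assumed stand-in values for illustration.

```python
# Point prediction from a fitted line, flagging interpolation vs extrapolation.

def predict(b0, b1, x, x_min, x_max):
    """Evaluate the line and say whether x is inside the observed range."""
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return b0 + b1 * x, kind

b0, b1 = 19.6, -0.24          # slope is an assumed illustrative value
dt, kind = predict(b0, b1, 75, x_min=50, x_max=80)
print(dt, kind)               # a prediction inside the observed range

dt95, kind95 = predict(b0, b1, 95, x_min=50, x_max=80)
print(dt95, kind95)           # negative drying time: extrapolation gone absurd
```

The second call is exactly the "absurdity" warned about above: nothing in the arithmetic stops the line from predicting an impossible negative drying time once we leave the range of the data.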

Using a Spreadsheet to Make Predictions

We must use the regression output to create the regression equation somewhere in the spreadsheet. We also need to select a set of adjacent cells as entry cells, so that we can conveniently type in the x value for which we wish to make a prediction and, for interval predictions, the confidence level.

For point predictions:
- we must enter the x value at which to make predictions
- the spreadsheet output gives you the constant term (or intercept) and the x coefficient
- thus you can create the regression equation as (cell address for constant) + (cell address for x coefficient)*(cell address used to enter x value)

Homework: 13.1, 13.5, 13.7

SIMPLE REGRESSION III: Evaluating the Equation

Issue One: Have the assumptions of OLS regression been met?
- linear relation
- constant distribution of errors
- absence of relation between succeeding values of the y variable over time; if y(t+1) is correlated with y(t), the condition is called autocorrelation or serial correlation
- analysis of residuals: if there's a pattern, there's a problem

Issue Two: Is there a significant relationship between x and y?

Purposes and intuition of the procedures:
- the least-squares technique will always give us coefficients, but there may be no relation, as when a scatter plot is just a random set of dots
- also, the data points are a sample, subject to sampling error
- this suggests a need for techniques to evaluate whether there is a relation and, if so, how good x is at explaining y

Ultimately, the simple techniques all come down to two questions:
- how close the data points are to the regression line: the coefficient of determination, or r²
- how different the slope of the regression line is from zero: the t test on β1; the quantity b1/s_b1 is distributed as a t distribution, so it tests the hypothesis H0: β1 = 0

Examples:

r², the coefficient of determination

This number serves as an evaluation of the overall closeness of the relationship:
- it represents the proportion of variation in y explained by variation in x, that is, how closely the data points cluster around the regression line
- necessarily, 0 ≤ r² ≤ 1
- values near 0 indicate a weak relation; not much of the variation in y is associated with variation in x

Cautions:
- r² doesn't necessarily mean anything about the relation between x and y as such; some third force, z, may be causing both x and y (the problem of omitted variables)
- a low r² doesn't mean there's no relation, only that x explains little of the variation in y

Definition:

    r² = 1 − Σ(yi − ŷi)² / Σ(yi − ȳ)²

In this definition the numerator is the sum of squared residuals, or sum of squared errors; it represents the deviation of actual from predicted values, that is, variation which is not explained by the regression. The denominator is a measure of total variation in y. The ratio is somewhat the same thing as the standard deviation of the residuals divided by the standard deviation of y: the smaller the residuals, that is, the closer the observations lie to the regression line, the closer this fraction is to zero. Since the ratio (sum of squared residuals)/(sum of squared deviations) represents the portion of variation not explained, 1 minus that ratio is the portion of total variation that is explained.

Evaluating the coefficients: confidence intervals
- remember that b1 estimates the unknown parameter β1
- confidence interval for β1: b1 ± t(n−2 d.f.) × s_b1

Example: we have regressed quantity sold on the price of cat food; our data were generated by setting 15 different prices in 15 different test markets. The estimated regression equation has slope b1 = −2000, with price measured in dollars, and s_b1 = 500. Calculate a 90% confidence interval for β1.

Solution: there are 15 − 2 = 13 degrees of freedom, which gives a t value of 1.771 for a 90% confidence interval. Thus our confidence interval is −2000 ± 1.771 × 500, or −2000 ± 885.5. We could also say that we're 90% confident that a $0.10 price increase will reduce sales by at least about 111 units but by no more than about 289 units. Confidence intervals are generated automatically by Excel.

Hypothesis tests for the coefficients
- b1 is based on a sample; a different set of data (a different sample) will give a different b1
- the mean of all these possible b1's is β1, the actual slope of the x-y line
- if β1 = 0 we could, by sheer chance, once in a while get a b1 ≠ 0, but the further b1 is from 0, the less likely that's due to chance
- the quantity (b1 − β1)/s_b1 is distributed as a t distribution
- most often we wish to test

H0: β1 = 0
H1: β1 ≠ 0

That is, if β1 = 0, there is no systematic relation between x and y, and the equation reduces to y = β0 + ε; rejecting H0 means concluding that there is a significant relationship between x and y. Note that since β1 = 0 by hypothesis, the t statistic reduces to t = b1/s_b1.

From above: with 13 degrees of freedom, at the 1% significance level the critical values are t = ±3.01. From our data, t = −2000/500 = −4, and |−4| > 3.01, so we reject H0 and conclude that there is a statistically significant relationship between price and quantity.

An F test for the regression, using ANOVA
- recall that F = (variation between groups)/(variation within groups), where the between-group variation represents systematic variation caused by a difference of treatments, while within-group variation is random, or error, variation
- here, variation between groups is equivalent to variation explained by the regression, so that F = Regression Mean Square / Error Mean Square = RMS/EMS
- Regression Sum of Squares: SSR = Σ(ŷ − ȳ)²; since the ŷ values are predicted by the regression, this sum represents variation from the mean value of y which is due to variation in x; then RMS = SSR/m, where m is the number of explanatory variables and represents the degrees of freedom in SSR
- Error Sum of Squares: SSE = Σ(y − ŷ)²; each term represents the difference between the actual value and the predicted value, or the portion of the variation in y that isn't explained by variation in x; then EMS = SSE/(n − 2), since n − 2 is the degrees of freedom in the residuals
- Total Sum of Squares: SST = SSR + SSE = Σ(y − ȳ)², which measures the total variation in y from all sources
- we may simply give the F ratio, although it is customary to present an ANOVA table
- Excel reports the ANOVA table as part of the output, giving both the F value and the p value ("Significance F") for the ANOVA test
- in simple regression, the ANOVA test merely repeats the result of the t test on the regression coefficient; in Excel output, note that the F test and the t test on the X Variable 1 coefficient have the same p value
- the real significance of the F test is in multiple regression

More on Residuals

Assumptions of OLS regression:
- the relation is linear, that is, y = β0 + β1x + ε
- ε is normally distributed with E(ε) = 0
- σyx is constant

The residuals may be taken as a sample of the ε term and used to check whether these assumptions are met. Some possibilities:
- the relation is not linear: positive residuals, then negative, then positive (or the reverse)
- σyx is not constant (heteroscedasticity): the residuals will fan out, or get larger, as x increases
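Both slope evaluations above can be sketched in a few lines. Part (1) reproduces the cat-food numbers (b1 = −2000, s_b1 = 500, 13 degrees of freedom); part (2) uses an invented data set to verify numerically that in simple regression the ANOVA F statistic equals the square of the slope's t statistic.

```python
import math

# (1) Cat-food example: t statistic and 90% confidence interval for the slope.
#     1.771 is t(0.05) with 13 d.f., e.g. =TINV(0.10, 13) in Excel.
b1, s_b1 = -2000.0, 500.0
t_stat = b1 / s_b1                     # -4.0: beyond the 1% cutoffs of +/-3.01
t_crit_90 = 1.771
lo, hi = b1 - t_crit_90 * s_b1, b1 + t_crit_90 * s_b1
print(t_stat, round(lo, 1), round(hi, 1))

# (2) On made-up data, show F = t^2 for simple regression.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]          # roughly y = 2x, illustration only
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
slope = ((sum(a * b for a, b in zip(x, y)) - n * xbar * ybar)
         / (sum(a * a for a in x) - n * xbar * xbar))
intercept = ybar - slope * xbar
yhat = [intercept + slope * a for a in x]

ssr = sum((yh - ybar) ** 2 for yh in yhat)             # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))   # residual variation
F = (ssr / 1) / (sse / (n - 2))        # RMS / EMS, with m = 1 explanatory var

s_yx = math.sqrt(sse / (n - 2))
s_slope = s_yx / math.sqrt(sum((a - xbar) ** 2 for a in x))
t_slope = slope / s_slope
print(round(F, 3), round(t_slope ** 2, 3))   # the two values agree
```

The agreement in part (2) is exact, not approximate: with one explanatory variable, SSR = b1²·Σ(x − x̄)², so F = b1²·Σ(x − x̄)²/EMS, which is precisely (b1/s_b1)².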

Autocorrelation (serial correlation): a problem with time-series data; the residuals themselves trace a pattern, typically rising, then falling, then rising.

Example: a classic picture of autocorrelation is the residual plot from the handguns regression (residuals plotted against handguns per 1000 population), in which the residuals trace a smooth wave rather than scattering randomly.

Effects of autocorrelation:
- biases r² upward
- biases s_b downward
- the calculated t statistic is biased upward and p-values downward, so β1 appears to be significantly different from 0 when it isn't
- the amount of variation explained appears to be greater than it actually is

Testing for autocorrelation: the Durbin-Watson statistic.

Making Predictions with a Regression Equation

The equation may be used to interpolate (predict values within the limits of observation) or to extrapolate, that is, forecast (predict values outside the range of observation or experience). Point predictions were discussed earlier; statisticians generally prefer interval estimates, and there are two kinds:
- an interval for the average (or mean) value of y, given a value of x: if we paint twenty boards at 70 degrees, find an interval for the average drying time of the twenty boards
- an interval for a specific value of y, given a value of x: if we paint one board, find a confidence interval for the time it'll take to dry

The first of these is somewhat narrower, since it's an average of repeated cases; like all averages, positive deviations tend to be offset by negative ones.

Interval for the average y value, using a t statistic:

    ŷ ± tc · s_yx · sqrt( 1/n + (x − x̄)² / (s_yx/s_b)² )
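The Durbin-Watson statistic mentioned above is computed directly from the residuals as DW = Σ(eᵢ − eᵢ₋₁)² / Σeᵢ². Values near 2 suggest no autocorrelation, while values well below 2 suggest positive autocorrelation. A rough sketch, using an invented residual sequence with the rising-then-falling pattern described:

```python
# Durbin-Watson statistic from a sequence of residuals.

def durbin_watson(e):
    """DW = sum of squared successive differences over sum of squares."""
    num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    return num / sum(ei * ei for ei in e)

patterned = [-2.0, -1.0, 0.5, 1.5, 2.0, 1.0, -0.5, -1.5]   # smooth drift
dw = durbin_watson(patterned)
print(round(dw, 3))    # well below 2: the residuals trace a pattern
```

Because each residual here is close to its predecessor, the successive differences are small relative to the residuals themselves, and DW comes out far below 2, the signature of positive autocorrelation.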

Here t has n − 2 degrees of freedom, and

    s_yx = sqrt( Σ(yi − ŷi)² / (n − 2) )

s_yx appears as "Standard Error" in Excel's regression output, and

    s_b = s_yx / sqrt( Σx² − n·x̄² )

This term appears in Excel as the Standard Error for X Variable 1 (or whatever name is used).

NOTE: the formula for the confidence interval seems different from that in your text because it is adapted to use spreadsheet output; interested students may prove to themselves, by appropriate substitutions, that the two formulae are in fact equivalent.

Example: from the summary output above, we read off s_yx (the Standard Error) and s_b (the Standard Error of X Variable 1); there are two degrees of freedom, so the appropriate t value for a 90% confidence interval is 2.92. Let us calculate a confidence interval for the average value of drying time at a temperature of 50 degrees. Plugging x = 50 into the regression equation gives ŷ = 8.30, and plugging the numbers into the formula, the confidence interval is

    ŷ ± tc · s_yx · sqrt( 1/n + (x − x̄)² / (s_yx/s_b)² ) = 8.30 ± 2.46

Note: although I've rounded off at intermediate steps in this calculation, you should NEVER do so in practice; carry your calculations out to as many decimal places as possible, rounding off only with the final answer.

Notice that the quantity (x − x̄)² gets larger the further x is from the mean; hence the confidence interval widens as we move away from x̄.
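The spreadsheet-adapted interval formula can be packaged as a small function built only from quantities the regression output reports: s_yx, s_b, n, and x̄. The s_yx and s_b values below are illustrative stand-ins, not numbers read from the actual Excel output, though n = 4, x̄ = 68.5, ŷ = 8.30, and t = 2.92 (i.e. =TINV(0.10, 2)) follow the example.

```python
# Confidence interval for the AVERAGE y at a given x, using spreadsheet
# outputs: s_yx ("Standard Error") and s_b (the X variable's Standard Error).
import math

def mean_response_interval(yhat, x, xbar, n, s_yx, s_b, t_c):
    half = t_c * s_yx * math.sqrt(1 / n + (x - xbar) ** 2 / (s_yx / s_b) ** 2)
    return yhat - half, yhat + half

lo, hi = mean_response_interval(yhat=8.30, x=50, xbar=68.5, n=4,
                                s_yx=0.83, s_b=0.04, t_c=2.92)   # stand-ins
print(round(lo, 2), round(hi, 2))
```

Trying x values further from x̄ = 68.5 makes the (x − x̄)² term grow and visibly widens the interval, exactly as noted above.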

Interval for a specific value of y, given a value of x, using t:

    ŷ ± tc · s_yx · sqrt( 1 + 1/n + (x − x̄)² / (s_yx/s_b)² )

where all terms are as previously defined. This expression is very similar to that for an average value: it differs only in the inclusion of a 1 under the radical. Example: from above, the confidence interval is

    ŷ ± tc · s_yx · sqrt( 1 + 1/n + (x − x̄)² / (s_yx/s_b)² ) = 8.30 ± 3.46

Using a Spreadsheet to Make Predictions

Use the regression output to create the regression equation somewhere in the spreadsheet. We also need to select a set of adjacent cells as entry cells, so that we can conveniently type in the x value for which we wish to make a prediction and the confidence level for interval predictions. Use the TINV function to find the t value for calculating the confidence interval: =TINV(0.05, df) gives the t value for a 95% confidence interval; df can be computed from the cell holding the number of observations.

Correlation

Simple correlation: the coefficient of correlation (or Pearson's coefficient of correlation) is a measure of the closeness of the relation between x and y.

Definition:

    r = Σ(x − x̄)(y − ȳ) / sqrt( Σ(x − x̄)² · Σ(y − ȳ)² )

Here the x's and y's are paired observations.

Intuition of the correlation coefficient:
- the denominator serves to scale the expression; note that |numerator| ≤ denominator, so that −1 ≤ r ≤ +1
- look at the terms in the numerator: when x is greater than its mean and y is greater than its mean, both factors are positive and the product is positive
- if x < x̄ while y < ȳ, the product is again positive
- but if x < x̄ while y > ȳ, or vice versa, the product is negative
- so, if x and y deviate from their respective means in the same direction, the sum of the products is positive and r > 0; if the deviations are opposite, the sum of the products is negative and r < 0

Magnitude:
- if x and y are not associated, then a large (x − x̄) is as likely as not to be paired with a small (y − ȳ); the size and sign of the products are simply random, and the sum will be close to 0
- but if x and y are associated, then large x deviations will be paired with large y deviations, producing large products and a large sum
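The definition of r translates directly into code. The price and unit-sales figures below are invented illustration data, chosen so that the paired deviations have opposite signs and produce a strongly negative r.

```python
# Pearson correlation coefficient computed from the definition.
import math

def pearson_r(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    num = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    den = math.sqrt(sum((xi - xbar) ** 2 for xi in x) *
                    sum((yi - ybar) ** 2 for yi in y))
    return num / den

price = [6.0, 7.0, 8.0, 9.0, 10.0]   # made-up prices
units = [9.0, 8.0, 6.0, 4.0, 3.0]    # made-up unit sales
r = pearson_r(price, units)
print(round(r, 3))    # strongly negative: higher price, fewer units sold
```

Every above-average price here is paired with below-average sales, so every product in the numerator is negative or zero and r comes out close to −1, just as the sign argument above predicts.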

Example: prices and unit sales of a particular model of rat trap were recorded on five occasions, with prices in dollars. Calculate r, the correlation coefficient.

Solution: first, note that x̄ = 8 and ȳ = 6. Then arrange the data in a table with columns for x, (x − x̄), (x − x̄)², y, (y − ȳ), (y − ȳ)², and (x − x̄)(y − ȳ); summing the appropriate columns and substituting into the definition gives r = −0.97.

In these calculations, look at the pattern of positive and negative deviations:
- if all the products (x − x̄)(y − ȳ) are positive, or all negative, then r will be positive or negative respectively
- if some are positive and some are negative, they will tend to offset one another; the more this occurs, the closer r will be to zero

Now, actually, r is a statistic; that is, it's based on a sample. It is an estimate of the unknown parameter ρ (the Greek letter rho) and is subject to sampling error. ρ is what we would get if we computed the above expression for every possible pair of x and y; it represents the true degree of correlation between x and y. We can test whether ρ actually is different from zero: for samples drawn independently from the same population, the following quantity is distributed as a t distribution with n − 2 degrees of freedom:

    t = r·sqrt(n − 2) / sqrt(1 − r²)

Thus we can test H0: ρ = 0 vs. H1: ρ ≠ 0. From above,

    t = (−0.97)·sqrt(3) / sqrt(1 − 0.9409) ≈ −6.91

For a two-tailed test with 3 degrees of freedom, at α = 0.01, tc = ±5.841, so we can reject H0 and conclude ρ ≠ 0; that is, we reject the premise that the observed correlation is due to sampling error and conclude that there is a linear relation between price and sales.

Using Excel to find r
- r appears as "Multiple R" in the regression output; attach the sign from the regression coefficient (DO NOT DO THIS if there is more than one X variable)
- or use the Tools > Data Analysis > Correlation procedure
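The t test on the correlation coefficient can be sketched as follows, reproducing the rat-trap numbers (r = −0.97, n = 5). The critical value 5.841 is the two-tailed t point for α = 0.01 with 3 degrees of freedom, e.g. =TINV(0.01, 3) in Excel.

```python
# t test for whether a sample correlation differs significantly from zero:
# t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom.
import math

def corr_t_stat(r, n):
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

t = corr_t_stat(r=-0.97, n=5)
t_crit = 5.841                 # two-tailed t(0.01) critical value, 3 d.f.
reject = abs(t) > t_crit
print(round(t, 2), reject)     # |t| exceeds the critical value: reject H0
```

Even with only five observations, a correlation as extreme as −0.97 produces a t statistic beyond the 1% critical value, so chance alone is an implausible explanation for the observed association.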

Data must be entered into adjacent columns; designate the whole block as the Input Range. The output is a small table with a row and a column for each variable; the diagonal entries are 1, and the off-diagonal entry (the row for one variable against the column for the other) is the correlation between X and Y.

You can also use the CORREL spreadsheet formula: CORREL(X range, Y range) returns the correlation coefficient.

CORRELATION DOES NOT PROVE CAUSATION!


Section 3: Simple Linear Regression

Section 3: Simple Linear Regression Carlos M. Carvalho The University of Texas at Austin McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction

Using Excel for inferential statistics

FACT SHEET Using Excel for inferential statistics Introduction When you collect data, you expect a certain amount of variation, just caused by chance. A wide variety of statistical tests can be applied

How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

0.1 Multiple Regression Models

0.1 Multiple Regression Models We will introduce the multiple Regression model as a mean of relating one numerical response variable y to two or more independent (or predictor variables. We will see different

Chapter 2. Looking at Data: Relationships. Introduction to the Practice of STATISTICS SEVENTH. Moore / McCabe / Craig. Lecture Presentation Slides

Chapter 2 Looking at Data: Relationships Introduction to the Practice of STATISTICS SEVENTH EDITION Moore / McCabe / Craig Lecture Presentation Slides Chapter 2 Looking at Data: Relationships 2.1 Scatterplots

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable

Introduction to Regression. Dr. Tom Pierce Radford University

Introduction to Regression Dr. Tom Pierce Radford University In the chapter on correlational techniques we focused on the Pearson R as a tool for learning about the relationship between two variables.

Simple Linear Regression Chapter 11

Simple Linear Regression Chapter 11 Rationale Frequently decision-making situations require modeling of relationships among business variables. For instance, the amount of sale of a product may be related

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

Minitab Guide. This packet contains: A Friendly Guide to Minitab. Minitab Step-By-Step

Minitab Guide This packet contains: A Friendly Guide to Minitab An introduction to Minitab; including basic Minitab functions, how to create sets of data, and how to create and edit graphs of different

Data and Regression Analysis. Lecturer: Prof. Duane S. Boning. Rev 10

Data and Regression Analysis Lecturer: Prof. Duane S. Boning Rev 10 1 Agenda 1. Comparison of Treatments (One Variable) Analysis of Variance (ANOVA) 2. Multivariate Analysis of Variance Model forms 3.

Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

PASS Sample Size Software. Linear Regression

Chapter 855 Introduction Linear regression is a commonly used procedure in statistical analysis. One of the main objectives in linear regression analysis is to test hypotheses about the slope (sometimes

A correlation exists between two variables when one of them is related to the other in some way.

Lecture #10 Chapter 10 Correlation and Regression The main focus of this chapter is to form inferences based on sample data that come in pairs. Given such paired sample data, we want to determine whether

In Chapter 2, we used linear regression to describe linear relationships. The setting for this is a

Math 143 Inference on Regression 1 Review of Linear Regression In Chapter 2, we used linear regression to describe linear relationships. The setting for this is a bivariate data set (i.e., a list of cases/subjects

e = random error, assumed to be normally distributed with mean 0 and standard deviation σ

1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.

Correlation key concepts:

CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)

REGRESSION LINES IN STATA

REGRESSION LINES IN STATA THOMAS ELLIOTT 1. Introduction to Regression Regression analysis is about eploring linear relationships between a dependent variable and one or more independent variables. Regression

The importance of graphing the data: Anscombe s regression examples

The importance of graphing the data: Anscombe s regression examples Bruce Weaver Northern Health Research Conference Nipissing University, North Bay May 30-31, 2008 B. Weaver, NHRC 2008 1 The Objective

Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

Advanced High School Statistics. Preliminary Edition

Chapter 2 Summarizing Data After collecting data, the next stage in the investigative process is to summarize the data. Graphical displays allow us to visualize and better understand the important features

For example, enter the following data in three COLUMNS in a new View window.

Statistics with Statview - 18 Paired t-test A paired t-test compares two groups of measurements when the data in the two groups are in some way paired between the groups (e.g., before and after on the

CHARTS AND GRAPHS INTRODUCTION USING SPSS TO DRAW GRAPHS SPSS GRAPH OPTIONS CAG08

CHARTS AND GRAPHS INTRODUCTION SPSS and Excel each contain a number of options for producing what are sometimes known as business graphics - i.e. statistical charts and diagrams. This handout explores

Lecture 5: Correlation and Linear Regression

Lecture 5: Correlation and Linear Regression 3.5. (Pearson) correlation coefficient The correlation coefficient measures the strength of the linear relationship between two variables. The correlation is

2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

Simple Linear Regression

1 Excel Manual Simple Linear Regression Chapter 13 This chapter discusses statistics involving the linear regression. Excel has numerous features that work well for comparing quantitative variables both

Technology Step-by-Step Using StatCrunch

Technology Step-by-Step Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate

Simple Regression Theory I 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY I 1 Simple Regression Theory I 2010 Samuel L. Baker Regression analysis lets you use data to explain and predict. A simple regression line drawn through data points In Assignment

Regression and Correlation

Regression and Correlation Topics Covered: Dependent and independent variables. Scatter diagram. Correlation coefficient. Linear Regression line. by Dr.I.Namestnikova 1 Introduction Regression analysis

7. Tests of association and Linear Regression

7. Tests of association and Linear Regression In this chapter we consider 1. Tests of Association for 2 qualitative variables. 2. Measures of the strength of linear association between 2 quantitative variables.

Module 3: Correlation and Covariance

Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

Simple Regression and Correlation

Simple Regression and Correlation Today, we are going to discuss a powerful statistical technique for examining whether or not two variables are related. Specifically, we are going to talk about the ideas

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

Lecture - 32 Regression Modelling Using SPSS

Applied Multivariate Statistical Modelling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur Lecture - 32 Regression Modelling Using SPSS (Refer

, then the form of the model is given by: which comprises a deterministic component involving the three regression coefficients (

Multiple regression Introduction Multiple regression is a logical extension of the principles of simple linear regression to situations in which there are several predictor variables. For instance if we

Hypothesis Testing hypothesis testing approach formulation of the test statistic

Hypothesis Testing For the next few lectures, we re going to look at various test statistics that are formulated to allow us to test hypotheses in a variety of contexts: In all cases, the hypothesis testing

ID X Y

Dale Berger SPSS Step-by-Step Regression Introduction: MRC01 This step-by-step example shows how to enter data into SPSS and conduct a simple regression analysis to develop an equation to predict from.

UNDERSTANDING MULTIPLE REGRESSION

UNDERSTANDING Multiple regression analysis (MRA) is any of several related statistical methods for evaluating the effects of more than one independent (or predictor) variable on a dependent (or outcome)

The Classical Linear Regression Model

The Classical Linear Regression Model 1 September 2004 A. A brief review of some basic concepts associated with vector random variables Let y denote an n x l vector of random variables, i.e., y = (y 1,

Chapter 10. The relationship between TWO variables. Response and Explanatory Variables. Scatterplots. Example 1: Highway Signs 2/26/2009

Chapter 10 Section 10-2: Correlation Section 10-3: Regression Section 10-4: Variation and Prediction Intervals The relationship between TWO variables So far we have dealt with data obtained from one variable

Statistics for Management II-STAT 362-Final Review

Statistics for Management II-STAT 362-Final Review Multiple Choice Identify the letter of the choice that best completes the statement or answers the question. 1. The ability of an interval estimate to

Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

UCLA STAT 13 Statistical Methods - Final Exam Review Solutions Chapter 7 Sampling Distributions of Estimates

UCLA STAT 13 Statistical Methods - Final Exam Review Solutions Chapter 7 Sampling Distributions of Estimates 1. (a) (i) µ µ (ii) σ σ n is exactly Normally distributed. (c) (i) is approximately Normally

Simple Linear Regression Excel 2010 Tutorial

Simple Linear Regression Excel 2010 Tutorial This tutorial combines information on how to obtain regression output for Simple Linear Regression from Excel and some aspects of understanding what the output

Notes 5: More on regression and residuals ECO 231W - Undergraduate Econometrics

Notes 5: More on regression and residuals ECO 231W - Undergraduate Econometrics Prof. Carolina Caetano 1 Regression Method Let s review the method to calculate the regression line: 1. Find the point of

Directions: Answer the following questions on another sheet of paper

Module 3 Review Directions: Answer the following questions on another sheet of paper Questions 1-16 refer to the following situation: Is there a relationship between crime rate and the number of unemployment

04 Paired Data and Scatter Diagrams

Paired Data and Scatter Diagrams Best Fit Lines: Linear Regressions A runner runs from the College of Micronesia- FSM National campus to PICS via the powerplant/nahnpohnmal back road The runner tracks

, has mean A) 0.3. B) the smaller of 0.8 and 0.5. C) 0.15. D) which cannot be determined without knowing the sample results.

BA 275 Review Problems - Week 9 (11/20/06-11/24/06) CD Lessons: 69, 70, 16-20 Textbook: pp. 520-528, 111-124, 133-141 An SRS of size 100 is taken from a population having proportion 0.8 of successes. An

Analyzing Linear Relationships, Two or More Variables

PART V ANALYZING RELATIONSHIPS CHAPTER 14 Analyzing Linear Relationships, Two or More Variables INTRODUCTION In the previous chapter, we introduced Kate Cameron, the owner of Woodbon, a company that produces

AP * Statistics Review. Linear Regression

AP * Statistics Review Linear Regression Teacher Packet Advanced Placement and AP are registered trademark of the College Entrance Examination Board. The College Board was not involved in the production

B Linear Regressions. Is your data linear? The Method of Least Squares

B Linear Regressions Is your data linear? So you want to fit a straight line to a set of measurements... First, make sure you really want to do this. That is, see if you can convince yourself that a plot

Linear Regression with One Regressor

Linear Regression with One Regressor Michael Ash Lecture 8 Linear Regression with One Regressor a.k.a. Bivariate Regression An important distinction 1. The Model Effect of a one-unit change in X on the

Relationship of two variables

Relationship of two variables A correlation exists between two variables when the values of one are somehow associated with the values of the other in some way. Scatter Plot (or Scatter Diagram) A plot

EXPERIMENT 6: HERITABILITY AND REGRESSION

BIO 184 Laboratory Manual Page 74 EXPERIMENT 6: HERITABILITY AND REGRESSION DAY ONE: INTRODUCTION TO HERITABILITY AND REGRESSION OBJECTIVES: Today you will be learning about some of the basic ideas and

Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

Keynesian Theory of Consumption. Theoretical and Practical Aspects. Kirill Breido, Ilona V. Tregub

Keynesian Theory of Consumption. Theoretical and Practical Aspects Kirill Breido, Ilona V. Tregub The Finance University Under The Government Of The Russian Federation International Finance Faculty, Moscow,

Chapter 23. Inferences for Regression

Chapter 23. Inferences for Regression Topics covered in this chapter: Simple Linear Regression Simple Linear Regression Example 23.1: Crying and IQ The Problem: Infants who cry easily may be more easily

Lecture 18 Linear Regression

Lecture 18 Statistics Unit Andrew Nunekpeku / Charles Jackson Fall 2011 Outline 1 1 Situation - used to model quantitative dependent variable using linear function of quantitative predictor(s). Situation

Directions for using SPSS

Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...

Homework 11. Part 1. Name: Score: / null

Name: Score: / Homework 11 Part 1 null 1 For which of the following correlations would the data points be clustered most closely around a straight line? A. r = 0.50 B. r = -0.80 C. r = 0.10 D. There is