Inference for the slope of the LSRL Student Saturday Session


Student Notes Prep Session Topic: Inference for Regression

The AP Statistics exam is likely to have several items that test your ability to compute confidence intervals and perform significance tests for the slope of a least squares regression line (LSRL). Past questions on this topic have provided computer output with the standard error of the slope. Note that the topic outline does not include inference for the intercept of a least squares line, nor does it include inference for predictions.

Formula Provided: The following formula is provided in the section on descriptive statistics:

$$s_{b_1} = \frac{\sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n-2}}}{\sqrt{\sum (x_i - \bar{x})^2}}$$

This formula is not very intuitive, but it measures the spread in the sampling distribution of the slope of the LSRL. That is, if you were to take every possible sample of size $n$ from a population, calculate each slope, and create a distribution of all possible slopes from these samples, then $s_{b_1}$ estimates the standard deviation of this sampling distribution.

Communication, skills, and understanding

Inference about the population (true) regression line $\mu_y = \beta_0 + \beta_1 x$ is based on the sample least squares regression line $\hat{y} = b_0 + b_1 x$. Be sure you understand that the regression coefficients $b_0$ and $b_1$ vary with different samples. Inference about $\beta_1$ is based on knowledge of the sampling distribution of the sample slope $b_1$. Theory tells us that if certain conditions (see below) are satisfied, sample slopes will be normally distributed with mean equal to the true value of the slope for the population, $\beta_1$, and standard deviation

$$\sigma_{b_1} = \frac{\sigma}{\sqrt{\sum (x_i - \bar{x})^2}},$$

where $\sigma$ is the standard deviation of the y values about the true regression line. When we estimate $\sigma_{b_1}$ with $s_{b_1}$ (see formula above), the quantity

$$t = \frac{b_1 - \beta_1}{s_{b_1}}$$

has a t distribution with $n - 2$ degrees of freedom.

Conditions for inference:
1. Linear model is appropriate. (True relationship is linear.) Check the scatterplot for linearity and the residual plot for no pattern.
2. Independent observations. This is a design issue that should be addressed in information about the data. Random sampling or random assignment will suffice.
3. Normality: the y values vary normally about the true regression line. Check that residuals are approximately normally distributed using a histogram, dotplot, normal probability plot, or stem-and-leaf plot.
4. Standard deviation of y values is the same for every value of x. Check the residual plot to be sure that the spread of the residuals about the horizontal axis is approximately uniform (no trumpet appearance).

Confidence intervals (as always) require that you
1. Identify the confidence interval procedure that you will use.
2. State the conditions and verify that they are satisfied.
3. Carry out the computations for the confidence interval. Be sure to state degrees of freedom. A confidence interval for slope has the familiar form (estimate) ± (critical value)(standard deviation of the statistic). In the context of the slope of the LSRL, that is $b_1 \pm t^*_{n-2}\, s_{b_1}$.
4. Interpret the confidence interval in the context of the problem.
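The formula for $s_{b_1}$ and the interval $b_1 \pm t^*_{n-2}\, s_{b_1}$ can be traced by hand on a tiny data set. The sketch below is only an illustration: the five (hours, score) pairs are made up, the function names are hypothetical, and the critical value 3.182 is the usual t-table entry for 95% confidence with 3 degrees of freedom. On the exam the standard error normally comes from computer output instead.

```python
from math import sqrt


def slope_and_standard_error(xs, ys):
    """Return (b0, b1, s_b1) for the least squares line fitted to xs, ys."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx                 # sample slope
    b0 = y_bar - b1 * x_bar        # sample intercept
    # Sum of squared residuals, then s = sqrt(SSE / (n - 2))
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(sse / (n - 2))
    s_b1 = s / sqrt(sxx)           # standard error of the slope
    return b0, b1, s_b1


def slope_confidence_interval(b1, s_b1, t_star):
    """Return (lower, upper) for b1 +/- t* times s_b1."""
    margin = t_star * s_b1
    return b1 - margin, b1 + margin


# Made-up data, for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

b0, b1, s_b1 = slope_and_standard_error(hours, scores)
t_star = 3.182  # t table: 95% confidence, df = n - 2 = 3
lower, upper = slope_confidence_interval(b1, s_b1, t_star)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, s_b1 = {s_b1:.4f}")
print(f"95% CI for slope: ({lower:.3f}, {upper:.3f})")
```

The critical value is passed in rather than computed so that the sketch needs only the standard library; in practice it would be read from a t table with $n - 2$ degrees of freedom.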

Significance tests (as always) require that you
1. State the hypotheses in terms of parameters, using standard symbols and words that communicate the context of the problem.
2. Identify the test you will perform. Then state the conditions and verify that they are satisfied.
3. Calculate the test statistic and P-value. Also state the degrees of freedom.
4. State your conclusion in context and connect it to the P-value.

The null hypothesis assumed by computers and calculators is $H_0: \beta_1 = 0$. However, the null hypothesis may specify any value for $\beta_1$, for example, $H_0: \beta_1 = 1$. When the null hypothesis is $H_0: \beta_1 = 0$, the test is often referred to as the model utility test, since the null hypothesis can be thought of as a statement that there is no useful linear relationship between the variables. Rejecting the null hypothesis leads to the conclusion that the linear regression model with x as the explanatory variable is useful for predicting y. It would also be appropriate to state that there is evidence to suggest a linear relationship between the variables.

Typically, the information will be presented as output from a computer software package such as Minitab. The output may look similar to the output in this example from the 2012 exam. It is important that you can pick out the relevant information in the output.

Example: As part of a class project at a large university, Amber selected a random sample of 12 students in her major field of study. All students in the sample were asked to report their number of hours spent studying for the final exam and their score on the final exam. A regression analysis on the data produced the following partial computer output. Assume all conditions for inference have been met.

Predictor      Coef     SE Coef    T       P
Constant       ___      ___        ___     ___
Study Hours    2.697    0.745      3.62    0.005
S = ___    R-sq = 56.7%

In this output, the Coef column gives the y-intercept (Constant row) and the slope (Study Hours row). SE Coef in the Study Hours row is the standard error of the slope, and P in that row is the P-value associated with the 2-tailed test.

From the information given above, answer the following questions:
A. Find the equation of the least squares regression line. Be sure to define the variables in the equation.
B. Interpret the slope and y-intercept in the equation in part A.
C. What percent of the variation in students' grades can be attributed to a linear relationship with the number of hours spent studying?
D. Find a 95% confidence interval for the slope of the least squares regression line for all students in Amber's field of study. Interpret the interval.
E. Is there evidence at the 0.05 level to suggest that students' scores increase with the number of hours spent studying?

Answers:
A. $\hat{y} = b_0 + 2.697x$, where $b_0$ is the Coef value in the Constant row of the output and $\hat{y}$ is the estimated score a student in Amber's major would earn on the exam if he/she spent $x$ hours studying for the exam.
C. 56.7%. This is simply $r^2$, which is given in the table as R-sq.
D. A 95% confidence interval is $b_1 \pm t^*_{n-2}\, s_{b_1} = 2.697 \pm (2.228)(0.745) = (1.037, 4.357)$. I am 95% confident that for each additional hour of studying, a student's score will increase, on average, by between 1.037 and 4.357 points.
E.
1. $H_0: \beta_1 = 0$, $H_A: \beta_1 > 0$, where $\beta_1$ is the slope of the LSRL for all students in Amber's major.
2. I will perform a linear regression t test for the slope of the least squares regression line. The problem stated that the conditions were satisfied. These conditions are: a linear model is appropriate (the scatterplot of hours vs. scores looks linear, not curved), the sample was a simple random sample, the distribution of the residuals is approximately normal, and the spread of the residuals is fairly constant for all values of x.
3. $t = 3.62$ and $P(t_{10} > 3.62) \approx (0.5)(0.005) = 0.0025$ (half of the two-sided P-value given in the table).
4. Since the P-value is less than 0.05, I reject the null hypothesis in favor of the alternative. That is, I have sufficient evidence to suggest that the scores for students in Amber's major tend to increase as they spend more time studying.

Multiple Choice Questions

1. In a study of the performance of a computer printer, the size (in kilobytes) and the printing time (in seconds) for each of 22 small text files were recorded. A regression line was a satisfactory description of the relationship between size and printing time. The results of the regression analysis are shown below.

[Regression output not reproduced in this transcription.]

Which of the following should be used to compute a 95 percent confidence interval for the slope of the regression line?
A. ___ ± 2.086(___)
B. ___ ± 1.96(___)
C. ___ ± 1.725(___)
D. ___ ± 2.086(___)
E. ___ ± 1.725(___)
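Returning to the answers for Amber's study, the interval and the test statistic can be reproduced directly from the slope row of the output. This is a minimal sketch using only the numbers quoted in those answers (slope 2.697, standard error 0.745, critical value 2.228, two-sided P-value 0.005); the variable names are illustrative.

```python
# Values quoted in the worked answers for Amber's study.
n = 12               # students in the sample
b1 = 2.697           # sample slope (Coef, Study Hours row)
s_b1 = 0.745         # standard error of the slope (SE Coef)
t_star = 2.228       # t table: 95% confidence, df = n - 2 = 10
two_sided_p = 0.005  # P column of the output

df = n - 2

# Confidence interval: b1 +/- t* times s_b1
margin = t_star * s_b1
ci = (b1 - margin, b1 + margin)

# Test statistic for H0: beta1 = 0
t_stat = (b1 - 0) / s_b1

# The alternative is beta1 > 0 and t is positive, so halve the two-sided P.
one_sided_p = two_sided_p / 2

print(f"df = {df}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"t = {t_stat:.2f}, one-sided P = {one_sided_p}")
```

Dividing the coefficient by its standard error is how the T column of the output is obtained, so this also gives a quick check that the right row has been read.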

For problems 2-3: Boiling and melting points (in degrees Celsius) are recorded for 21 selected substances, and regression analysis is used to describe the relationship between them. The results of the analysis are shown below.

Dependent variable is: Boiling point

Predictor        Coef    SE Coef    T      P
Constant         ___     ___        ___    ___
Melting point    ___     ___        ___    ___
S = ___    R-sq = 73.4%

Assume that all of the conditions for regression have been met.

2. Which of the following gives a 95% confidence interval for the slope of the regression line?
A. ___ ± 1.729(0.2104)
B. ___ ± 1.96(0.2104)
C. ___ ± 2.093(0.2104)
D. ___ ± 1.729(146.7)
E. ___ ± 2.093(626.4)

3. Suppose that a significance test was conducted to determine whether there was a useful positive linear relationship between the melting points and the boiling points of substances. Does this analysis provide sufficient evidence, at the 5% level, to suggest that there is a positive linear relationship between melting points and boiling points of substances?
A. Yes, because the slope of the line for these 21 substances is ___ (which is positive).
B. Yes, the P-value for the 1-sided test is ___ and ___ < 0.05.
C. Yes, the P-value for the 1-sided test is (0.5)(0.0001) = 0.00005 and 0.00005 < 0.05.
D. No, the P-value for this 1-sided test is ___ and ___ > 0.05.
E. No, the P-value for this 1-sided test is (2)(0.0481) = 0.0962 and 0.0962 > 0.05.
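Computer output reports a two-sided P-value, while several questions in this handout ask about a one-sided alternative. The sketch below shows one common way to convert between them; the function name and the "greater"/"less" labels are hypothetical, and the example numbers are the ones from Amber's study earlier in these notes.

```python
def one_sided_p_value(two_sided_p, t_stat, alternative):
    """Convert a two-sided P-value for H0: beta1 = 0 to a one-sided one.

    alternative is "greater" for Ha: beta1 > 0 or "less" for Ha: beta1 < 0.
    If the sign of t agrees with the alternative, the one-sided P-value is
    half the two-sided value; otherwise it is one minus that half.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    half = two_sided_p / 2
    if alternative == "greater":
        agrees = t_stat > 0
    else:
        agrees = t_stat < 0
    return half if agrees else 1 - half


# Amber's study: t = 3.62, two-sided P = 0.005, Ha: beta1 > 0
print(one_sided_p_value(0.005, 3.62, "greater"))
```

The check on the sign of t matters: halving the two-sided value is only appropriate when the sample slope points in the direction stated by the alternative hypothesis.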

MC Answers: 1-A, 2-C, 3-C

2005B #5

5. John believes that as he increases his walking speed, his pulse rate will increase. He wants to model this relationship. John records his pulse rate, in beats per minute (bpm), while walking at each of seven different speeds, in miles per hour (mph). A scatterplot and regression output are shown below.

A. Using the regression output, write the equation of the fitted regression line.

B. Do your estimates of the slope and intercept parameters have meaningful interpretations in the context of this question? If so, provide interpretations in this context. If not, explain why not.

C. John wants to provide a 98 percent confidence interval for the slope parameter in his final report. Compute the margin of error that John should use. Assume that conditions for inference are satisfied.

AP STATISTICS 2005 SCORING GUIDELINES (Form B)
Question 5

Solution

Part (a): Predicted Pulse = ___ + ___ (Speed)

Part (b): The intercept (___ bpm) provides an estimate for John's mean resting pulse (walking at a speed of zero mph). The slope (___ bpm/mph) provides an estimate for the mean increase in John's heart rate as his speed is increased by one mile per hour.

Part (c): The margin of error for the confidence interval for the slope parameter is $t^*_{n-2} \times s_{b_1}$ = ___ × ___ = ___ bpm.

Scoring

Part (a) is scored as essentially correct (E) or incorrect (I). Parts (b) and (c) are scored as essentially correct (E), partially correct (P), or incorrect (I).

Note: If the student uses x and y, then both variables must be identified.

Part (b): There are four steps to constructing correct interpretations:
Step 1: A correct mathematical interpretation of the reported slope (___) as a rate of increase in heart rate as walking speed increases.
Step 2: A correct mathematical interpretation of the reported intercept as a pulse rate when walking speed is zero.
Step 3: Correct use of units of measurement, e.g., John's heart rate increases ___ bpm as his speed is increased by one mile per hour.
Step 4: Interpretation of the reported values as estimates of the corresponding mean quantities.

Part (b) is essentially correct (E) if all four steps are correct.
Part (b) is partially correct (P) if two or three steps are correctly addressed. Step 2 is scored as incorrect, for example, if the student suggests that the intercept does not have a meaningful interpretation.
Part (b) is incorrect (I) if at most one step is correct.

Note: The student is only penalized once for switching the variables.

Part (c) is essentially correct (E) if the standard error of the slope is identified and the correct critical value is used to calculate the margin of error.

Part (c) is partially correct (P) if the student:
- Computes the 98% confidence interval but does not identify the margin of error; OR
- Recognizes that the margin of error consists of the standard error of the coefficient and the critical value but uses an incorrect value for one of the two components or uses a t-value with 6 degrees of freedom and an incorrect standard error.

Part (c) is incorrect (I) if the student uses:
- The standard error of the coefficient as the margin of error; OR
- A critical value as the margin of error.

4 Complete Response (3E): All three parts essentially correct.
3 Substantial Response (2E 1P): Two parts essentially correct and one part partially correct.
2 Developing Response (2E 0P or 1E 2P): Two parts essentially correct and zero parts partially correct OR one part essentially correct and two parts partially correct.
1 Minimal Response (1E 1P or 1E 0P or 0E 2P): One part essentially correct and either zero parts or one part partially correct OR zero parts essentially correct and two parts partially correct.


Sample: 5A Score: 4

In part (a) the correct formula for the estimated regression line is reported, and the variables are clearly defined. The student clearly realizes the estimated regression line provides estimates of pulse rate for various walking speeds. In part (b) the student clearly indicates that John's pulse rate would be expected to be close to the estimated intercept (___ bpm) when his walking speed is zero. This conveys the notion of the estimated intercept as an estimate of John's mean pulse rate when he is not walking. Both the estimated intercept and the estimated slope are interpreted in the context of the problem using appropriate units of measurement. The margin of error is correctly evaluated in the response to part (c). The student clearly shows that the t-value is based on 5 degrees of freedom.

Sample: 5B Score: 3

The response to part (a) does not report the estimated regression line in the context of the problem, nor does it define the X and Y variables used in the formula. The response to part (b) provides interpretations of both the estimated intercept and the estimated slope in the context of the problem. Appropriate units of measurement are used in the interpretation of the slope, but bpm is omitted from the interpretation of the intercept. The interpretation of the slope uses "increases on average by" to indicate that the slope is an average rate of increase, and "heart rate is around" to indicate that the intercept is a prediction of John's resting heart rate. The communication of these concepts could have been better. The margin of error is correctly evaluated in the response to part (c), and the supporting work is shown.

2010B #6

Although this next problem is classified as an investigative task, it has many of the elements of linear regression, including inference for the slope of the least squares regression line.

STATISTICS SECTION II Part B
Question 6
Spend about 25 minutes on this part of the exam.
Percent of Section II score: 25

Directions: Show all your work. Indicate clearly the methods you use, because you will be scored on the correctness of your methods as well as on the accuracy and completeness of your results and explanations.

6. A real estate agent is interested in developing a model to estimate the prices of houses in a particular part of a large city. She takes a random sample of 25 recent sales and, for each house, records the price (in thousands of dollars), the size of the house (in square feet), and whether or not the house has a swimming pool. This information, along with regression output for a linear model using size to predict price, is shown below and on the next page.

A. Interpret the slope of the least squares regression line in the context of the study.

B. The second house in the table has a residual of 49. Interpret this residual value in the context of the study.

The real estate agent is interested in investigating the effect of having a swimming pool on the price of a house.

C. Use the residuals from all 25 houses to estimate how much greater the price for a house with a swimming pool would be, on average, than the price for a house of the same size without a swimming pool.

To further investigate the effect of having a swimming pool on the price of a house, the real estate agent creates two regression models, one for houses with a swimming pool and one for houses without a swimming pool. Regression output for these two models is shown below.

D. The conditions for inference have been checked and verified, and a 95 percent confidence interval for the true difference in the two slopes is (−0.099, 0.110). Based on this interval, is there a significant difference in the two slopes? Explain your answer.

E. Use the regression model for houses with a swimming pool and the regression model for houses without a swimming pool to estimate how much greater the price for a house with a swimming pool would be than the price for a house of the same size without a swimming pool. How does this estimate compare with your result from part C?

AP STATISTICS 2010 SCORING GUIDELINES (Form B)
Question 6

Intent of Question

The primary goals of this investigative task were to assess students' ability to understand, apply, and draw conclusions from a regression analysis beyond what they have previously studied. More specific goals were to assess students' ability to (1) interpret a slope coefficient and residual value; (2) interpret a confidence interval; (3) compare two regression models and draw appropriate conclusions.

Solution

Part (a): The slope coefficient is 0.165. This means that for each additional square foot of size, the predicted price of the house increases by 0.165 thousand dollars, which is $165. In other words, this model predicts that the average price of a house increases by $165 for each additional square foot of a house's size.

Part (b): The residual value of 49 for this house indicates that its actual price is 49 thousand dollars higher than the model would predict for a house of its size.

Part (c): The average residual value for the eight houses with a swimming pool is:

$$\frac{\_\_\_ + (-18) + \cdots + 42}{8} = \frac{149}{8} \approx 18.6 \text{ thousand dollars.}$$

The average residual value for the 17 houses with no swimming pool is:

$$\frac{\_\_\_ + (-45) + \cdots + 33}{17} = \frac{-150}{17} \approx -8.8 \text{ thousand dollars.}$$

The residual averages suggest that the regression line tends to underestimate the price of homes with a swimming pool by about 18.6 thousand dollars and to overestimate the price of homes with no pool by about 8.8 thousand dollars. The difference between these two residual averages is 18.6 − (−8.8) = 27.4 thousand dollars. This suggests that, for two houses of the same size, the house with a swimming pool would be estimated to cost $27,400 more than the house with no swimming pool.

Part (d): No, this confidence interval does not indicate a significant difference (at the 95 percent confidence level, equivalent to the 5 percent significance level) between the two slope coefficients because the interval includes the value zero.

Part (e): If the two population regression lines do in fact have the same slope, the impact of a swimming pool is the (constant) vertical distance between the two lines. However, because the two fitted lines do not have the same slope, the distance between the two fitted lines depends on the size of the house. Using the available information, there are two acceptable approaches to estimating the impact of having a swimming pool.

Approach 1: Use the two fitted lines to predict the price of a house with and without a pool for a particular house size. For example, using the value of size = 2,250 square feet (which is near the middle of the distribution of house sizes), we find:
Predicted price for a 2,250 square-foot house with a swimming pool = ___ + ___ (2,250) = ___ thousand dollars.
Predicted price for a 2,250 square-foot house with no swimming pool = ___ + ___ (2,250) = ___ thousand dollars.
The difference in these predicted prices is ___ − ___ = ___ thousand dollars, which is an estimate of the impact of a swimming pool on the predicted price of a 2,250 square-foot house. This is quite similar to the estimate based on residuals in part (c).

Approach 2: Because the slopes of the two sample regression lines were judged not to be significantly different, another acceptable approach would be to use the difference in the intercepts of the two fitted lines as an estimate of the vertical distance between the two population regression lines. The difference in the intercepts of the two fitted lines is ___ − (___) = ___ thousand dollars, which is an estimate of the impact of a swimming pool on the predicted price of a house, assuming this difference does not change with the size of the house. This is quite different from the estimate based on residuals in part (c).
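The residual-based estimate in part (c) is a difference of two group means. The sketch below shows the calculation on a small made-up list of residuals (the 25 residuals themselves are not reproduced in these notes) and then checks the arithmetic using only the two totals reported in the solution, 149 over 8 houses and −150 over 17 houses. The function name is hypothetical.

```python
def pool_effect_from_residuals(residuals, has_pool):
    """Mean residual for houses with a pool minus mean residual without."""
    with_pool = [r for r, p in zip(residuals, has_pool) if p]
    without_pool = [r for r, p in zip(residuals, has_pool) if not p]
    if not with_pool or not without_pool:
        raise ValueError("need at least one house in each group")
    mean_with = sum(with_pool) / len(with_pool)
    mean_without = sum(without_pool) / len(without_pool)
    return mean_with - mean_without


# Made-up residuals (thousands of dollars), for illustration only.
residuals = [10, -4, 6, -8, 2, -6]
has_pool = [True, False, True, False, True, False]
print(pool_effect_from_residuals(residuals, has_pool))

# Arithmetic check using the totals reported in the solution to part (c).
mean_with_pool = 149 / 8      # about 18.6
mean_without_pool = -150 / 17  # about -8.8
reported_effect = mean_with_pool - mean_without_pool
print(round(reported_effect, 1))  # thousands of dollars
```

Because residuals from a least squares fit sum to zero, a positive average for one group forces a negative average for the other, which is why the two averages in the solution have opposite signs.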

Scoring

This question is scored in four sections. Section 1 consists of part (a); section 2 consists of part (b); section 3 consists of part (c); section 4 consists of parts (d) and (e). Each of the four sections is scored as essentially correct (E), partially correct (P), or incorrect (I).

Section 1 is scored as follows:

Essentially correct (E) if the response identifies the correct value for the slope coefficient and provides a correct interpretation in context.

Partially correct (P) if the response identifies the correct value for the slope coefficient and provides a correct interpretation but not in context OR the response provides an incorrect value for the slope but provides a correct interpretation of this value in context OR the response identifies the correct value for the slope but the interpretation is incomplete because of one or more of the following errors:
- The interpretation does not mention "predicted" or "on average" or any other indication of a probabilistic rather than a deterministic relationship.
- The interpretation does not include the notion of "each additional square foot of size" by saying something like "for every square foot."
- The interpretation does not use units for the price variable, or it uses incorrect units for the price variable (e.g., dollars instead of thousands of dollars).

Incorrect (I) if there is no interpretation or if the interpretation does not warrant a score of P.

Note: It is possible to earn an E for section 1 without stating the actual numerical value of the slope, if a correct and well-communicated interpretation of the slope is given in context.

Section 2 is scored as follows:

Essentially correct (E) if the response provides a correct interpretation of the residual value, in context, including both direction and a comparison with the model's predicted or average value (e.g., actual price is higher than predicted).

Partially correct (P) if the response provides an interpretation of the residual value that fails to mention direction or that gives the incorrect direction OR if the response provides a correct interpretation of the residual value that includes direction, but that is not in context.

Incorrect (I) if there is no interpretation of the residual value OR the interpretation does not include direction and is not in context.

Section 3 is scored as follows:

Essentially correct (E) if the response correctly calculates averages of residual values both for houses with pools and houses without pools AND correctly reports the difference between those averages as the estimate of the impact of a swimming pool.

Partially correct (P) if the response:
- correctly calculates averages of residual values both for houses with pools and houses without pools but does not correctly report the difference between those averages as the estimate of the impact of a swimming pool; OR
- incorrectly calculates one or both averages of residual values but does report the difference between those averages as the estimate of the impact of a swimming pool; OR
- does not use all of the residual values but does use a reasonable set of residual values (such as houses of similar size), correctly calculates both averages, and correctly reports the difference between those averages as the estimate of the impact of a swimming pool.

Incorrect (I) if the response does not meet the criteria for an E or P.

Notes:
- If the student calculates some other measure of center for the two sets of residuals (e.g., medians) and reports the difference as the estimate of the impact of a swimming pool, this part can be scored, at best, partially correct (P).
- If the student estimates the values of the residuals from the residual plot rather than using the residuals provided in the table, the response can be scored as essentially correct (E), provided it is clear that this is what was done.

Section 4 is scored as follows:

Essentially correct (E) if the response includes all three of the following components:
1. Correctly notes that the confidence interval in part (d) includes zero and so the difference in the slopes is not statistically significant.
2. Calculates a reasonable estimate in part (e): For approach 1, this includes choosing a house size within the range of the data and correctly computing the difference in predicted prices. For approach 2, this includes appealing to the fact that the slopes were judged as not significantly different and computing the difference in intercepts.
3. Includes a comparison of the estimate in part (e) to the estimate in part (c).

Partially correct (P) if the response includes only one of (1) and (2) above.

Incorrect (I) if the response includes neither (1) nor (2) above.

Notes:
- If the response uses approach 1, the difference between the two predicted values can range from ___ to 33.44, depending on the house size used.
- If the response uses approach 2, the constant vertical distance can be estimated from the graph showing the two regression lines rather than from the difference in intercepts, provided that the response makes it clear that this is what is being done.
- In the comparison with the estimate in part (c), an assessment of the size of the difference in estimates is not required. Statements that merely use phrases like "greater than," "about the same," etc. are acceptable for the comparison component of parts (d) and (e).
- If this section receives a score of partially correct only because the student neglects to compare the estimate in part (e) to the estimate in part (c), the response should be scored up if a decision on whether to score up or down is required.
- If the response subtracts the two fitted equations to obtain a general expression for the vertical distance between the two fitted lines as a function of house size, this should be considered an essentially correct approach for component 2 of section 4. The resulting expression is linear in size: ___ + ___ (size).
- If the student uses a house size outside the range of the data to compute the difference in predicted price, this can only be considered correct if the student appeals to the fact that the slopes of the sample regression lines are not significantly different.

Each essentially correct (E) section counts as 1 point. Each partially correct (P) section counts as 1/2 point.

4 Complete Response
3 Substantial Response
2 Developing Response
1 Minimal Response

If a response is between two scores (for example, 2 1/2 points), use a holistic approach to determine whether to score up or down, depending on the overall strength of the response and communication. In deciding whether to score up or down, pay particular attention to the response to the investigative part of the question (section 4).


Sample: 6A Score: 4

Part (a) of this response includes a correct interpretation of the slope, in context, so section 1, consisting of part (a), was scored as essentially correct. Section 2, consisting of part (b), was also scored as essentially correct because the residual of 49 is correctly interpreted in context. In part (c) residual averages are computed separately for houses with pools and for houses without pools, and the difference in the residual averages is correctly calculated; thus section 3, consisting of part (c), was scored as essentially correct. In part (d) the response correctly states that there is no significant difference in the slopes and provides appropriate justification based on the given confidence interval. In part (e) a house size of 2,000 square feet, which is within the range of house sizes in the sample, is chosen, and the difference in price for a house of this size with a pool and a house of this size without a pool is computed. This estimate is then compared with the estimate in part (c). Section 4, consisting of parts (d) and (e), therefore includes all three components needed to receive a score of essentially correct. The entire answer, based on all four sections, was judged a complete response and earned a score of 4.


More information

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE MAT 119 STATISTICS AND ELEMENTARY ALGEBRA 5 Lecture Hours, 2 Lab Hours, 3 Credits Pre-

More information

Statistics 151 Practice Midterm 1 Mike Kowalski

Statistics 151 Practice Midterm 1 Mike Kowalski Statistics 151 Practice Midterm 1 Mike Kowalski Statistics 151 Practice Midterm 1 Multiple Choice (50 minutes) Instructions: 1. This is a closed book exam. 2. You may use the STAT 151 formula sheets and

More information

STAT 350 Practice Final Exam Solution (Spring 2015)

STAT 350 Practice Final Exam Solution (Spring 2015) PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects

More information

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Section 14 Simple Linear Regression: Introduction to Least Squares Regression Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

More information

Factors affecting online sales

Factors affecting online sales Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

More information

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

 Y. Notation and Equations for Regression Lecture 11/4. Notation: Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name: Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

More information

Section 1: Simple Linear Regression

Section 1: Simple Linear Regression Section 1: Simple Linear Regression Carlos M. Carvalho The University of Texas McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction

More information

AP Physics 1 and 2 Lab Investigations

AP Physics 1 and 2 Lab Investigations AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks

More information

Module 5: Multiple Regression Analysis

Module 5: Multiple Regression Analysis Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

More information

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance 2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information

17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II 17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

More information

DATA INTERPRETATION AND STATISTICS

DATA INTERPRETATION AND STATISTICS PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

More information

Answer Key for California State Standards: Algebra I

Answer Key for California State Standards: Algebra I Algebra I: Symbolic reasoning and calculations with symbols are central in algebra. Through the study of algebra, a student develops an understanding of the symbolic language of mathematics and the sciences.

More information

AP STATISTICS REVIEW (YMS Chapters 1-8)

AP STATISTICS REVIEW (YMS Chapters 1-8) AP STATISTICS REVIEW (YMS Chapters 1-8) Exploring Data (Chapter 1) Categorical Data nominal scale, names e.g. male/female or eye color or breeds of dogs Quantitative Data rational scale (can +,,, with

More information

Module 3: Correlation and Covariance

Module 3: Correlation and Covariance Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1) CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

More information

MTH 140 Statistics Videos

MTH 140 Statistics Videos MTH 140 Statistics Videos Chapter 1 Picturing Distributions with Graphs Individuals and Variables Categorical Variables: Pie Charts and Bar Graphs Categorical Variables: Pie Charts and Bar Graphs Quantitative

More information

Correlation key concepts:

Correlation key concepts: CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

More information

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics. Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

More information

Final Exam Practice Problem Answers

Final Exam Practice Problem Answers Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

More information

Results from the 2014 AP Statistics Exam. Jessica Utts, University of California, Irvine Chief Reader, AP Statistics jutts@uci.edu

Results from the 2014 AP Statistics Exam. Jessica Utts, University of California, Irvine Chief Reader, AP Statistics jutts@uci.edu Results from the 2014 AP Statistics Exam Jessica Utts, University of California, Irvine Chief Reader, AP Statistics jutts@uci.edu The six free-response questions Question #1: Extracurricular activities

More information

Elementary Statistics Sample Exam #3

Elementary Statistics Sample Exam #3 Elementary Statistics Sample Exam #3 Instructions. No books or telephones. Only the supplied calculators are allowed. The exam is worth 100 points. 1. A chi square goodness of fit test is considered to

More information

Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

More information

Name: Date: Use the following to answer questions 2-3:

Name: Date: Use the following to answer questions 2-3: Name: Date: 1. A study is conducted on students taking a statistics class. Several variables are recorded in the survey. Identify each variable as categorical or quantitative. A) Type of car the student

More information

Lecture Notes Module 1

Lecture Notes Module 1 Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific

More information

Week 4: Standard Error and Confidence Intervals

Week 4: Standard Error and Confidence Intervals Health Sciences M.Sc. Programme Applied Biostatistics Week 4: Standard Error and Confidence Intervals Sampling Most research data come from subjects we think of as samples drawn from a larger population.

More information

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

More information

PLOTTING DATA AND INTERPRETING GRAPHS

PLOTTING DATA AND INTERPRETING GRAPHS PLOTTING DATA AND INTERPRETING GRAPHS Fundamentals of Graphing One of the most important sets of skills in science and mathematics is the ability to construct graphs and to interpret the information they

More information

Correlation and Regression

Correlation and Regression Correlation and Regression Scatterplots Correlation Explanatory and response variables Simple linear regression General Principles of Data Analysis First plot the data, then add numerical summaries Look

More information

How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions. Algebra I Overview View unit yearlong overview here Many of the concepts presented in Algebra I are progressions of concepts that were introduced in grades 6 through 8. The content presented in this course

More information

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

More information

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

More information

Using R for Linear Regression

Using R for Linear Regression Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional

More information

Unit 26 Estimation with Confidence Intervals

Unit 26 Estimation with Confidence Intervals Unit 26 Estimation with Confidence Intervals Objectives: To see how confidence intervals are used to estimate a population proportion, a population mean, a difference in population proportions, or a difference

More information

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS. SPSS Regressions Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. SSRL@American.edu Course Objective This course is designed

More information

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material

More information

International Statistical Institute, 56th Session, 2007: Phil Everson

International Statistical Institute, 56th Session, 2007: Phil Everson Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: peverso1@swarthmore.edu 1. Introduction

More information

Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation

Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation Display and Summarize Correlation for Direction and Strength Properties of Correlation Regression Line Cengage

More information

MATH 140 HYBRID INTRODUCTORY STATISTICS COURSE SYLLABUS

MATH 140 HYBRID INTRODUCTORY STATISTICS COURSE SYLLABUS MATH 140 HYBRID INTRODUCTORY STATISTICS COURSE SYLLABUS Instructor: Mark Schilling Email: mark.schilling@csun.edu (Note: If your CSUN email address is not one you use regularly, be sure to set up automatic

More information

AP Statistics: Syllabus 1

AP Statistics: Syllabus 1 AP Statistics: Syllabus 1 Scoring Components SC1 The course provides instruction in exploring data. 4 SC2 The course provides instruction in sampling. 5 SC3 The course provides instruction in experimentation.

More information

(More Practice With Trend Forecasts)

(More Practice With Trend Forecasts) Stats for Strategy HOMEWORK 11 (Topic 11 Part 2) (revised Jan. 2016) DIRECTIONS/SUGGESTIONS You may conveniently write answers to Problems A and B within these directions. Some exercises include special

More information

Violent crime total. Problem Set 1

Violent crime total. Problem Set 1 Problem Set 1 Note: this problem set is primarily intended to get you used to manipulating and presenting data using a spreadsheet program. While subsequent problem sets will be useful indicators of the

More information

Hypothesis Testing for Beginners

Hypothesis Testing for Beginners Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes

More information

Chapter 23 Inferences About Means

Chapter 23 Inferences About Means Chapter 23 Inferences About Means Chapter 23 - Inferences About Means 391 Chapter 23 Solutions to Class Examples 1. See Class Example 1. 2. We want to know if the mean battery lifespan exceeds the 300-minute

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a

More information

Formula for linear models. Prediction, extrapolation, significance test against zero slope.

Formula for linear models. Prediction, extrapolation, significance test against zero slope. Formula for linear models. Prediction, extrapolation, significance test against zero slope. Last time, we looked the linear regression formula. It s the line that fits the data best. The Pearson correlation

More information

AP * Statistics Review. Descriptive Statistics

AP * Statistics Review. Descriptive Statistics AP * Statistics Review Descriptive Statistics Teacher Packet Advanced Placement and AP are registered trademark of the College Entrance Examination Board. The College Board was not involved in the production

More information

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Final Exam Review MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) A researcher for an airline interviews all of the passengers on five randomly

More information

Independent samples t-test. Dr. Tom Pierce Radford University

Independent samples t-test. Dr. Tom Pierce Radford University Independent samples t-test Dr. Tom Pierce Radford University The logic behind drawing causal conclusions from experiments The sampling distribution of the difference between means The standard error of

More information

Problem Solving and Data Analysis

Problem Solving and Data Analysis Chapter 20 Problem Solving and Data Analysis The Problem Solving and Data Analysis section of the SAT Math Test assesses your ability to use your math understanding and skills to solve problems set in

More information

Simple Linear Regression

Simple Linear Regression STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTIONS 9.3 Confidence and prediction intervals (9.3) Conditions for inference (9.1) Want More Stats??? If you have enjoyed learning how to analyze

More information

CHI-SQUARE: TESTING FOR GOODNESS OF FIT

CHI-SQUARE: TESTING FOR GOODNESS OF FIT CHI-SQUARE: TESTING FOR GOODNESS OF FIT In the previous chapter we discussed procedures for fitting a hypothesized function to a set of experimental data points. Such procedures involve minimizing a quantity

More information

SIMPLE LINEAR CORRELATION. r can range from -1 to 1, and is independent of units of measurement. Correlation can be done on two dependent variables.

SIMPLE LINEAR CORRELATION. r can range from -1 to 1, and is independent of units of measurement. Correlation can be done on two dependent variables. SIMPLE LINEAR CORRELATION Simple linear correlation is a measure of the degree to which two variables vary together, or a measure of the intensity of the association between two variables. Correlation

More information

Algebra 1 Course Information

Algebra 1 Course Information Course Information Course Description: Students will study patterns, relations, and functions, and focus on the use of mathematical models to understand and analyze quantitative relationships. Through

More information

Hypothesis testing - Steps

Hypothesis testing - Steps Hypothesis testing - Steps Steps to do a two-tailed test of the hypothesis that β 1 0: 1. Set up the hypotheses: H 0 : β 1 = 0 H a : β 1 0. 2. Compute the test statistic: t = b 1 0 Std. error of b 1 =

More information

MULTIPLE REGRESSION EXAMPLE

MULTIPLE REGRESSION EXAMPLE MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if

More information

Predictor Coef StDev T P Constant 970667056 616256122 1.58 0.154 X 0.00293 0.06163 0.05 0.963. S = 0.5597 R-Sq = 0.0% R-Sq(adj) = 0.

Predictor Coef StDev T P Constant 970667056 616256122 1.58 0.154 X 0.00293 0.06163 0.05 0.963. S = 0.5597 R-Sq = 0.0% R-Sq(adj) = 0. Statistical analysis using Microsoft Excel Microsoft Excel spreadsheets have become somewhat of a standard for data storage, at least for smaller data sets. This, along with the program often being packaged

More information

Scatter Plot, Correlation, and Regression on the TI-83/84

Scatter Plot, Correlation, and Regression on the TI-83/84 Scatter Plot, Correlation, and Regression on the TI-83/84 Summary: When you have a set of (x,y) data points and want to find the best equation to describe them, you are performing a regression. This page

More information

SPSS Guide: Regression Analysis

SPSS Guide: Regression Analysis SPSS Guide: Regression Analysis I put this together to give you a step-by-step guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar

More information

Name: Date: Use the following to answer questions 3-4:

Name: Date: Use the following to answer questions 3-4: Name: Date: 1. Determine whether each of the following statements is true or false. A) The margin of error for a 95% confidence interval for the mean increases as the sample size increases. B) The margin

More information

Chapter 5 Estimating Demand Functions

Chapter 5 Estimating Demand Functions Chapter 5 Estimating Demand Functions 1 Why do you need statistics and regression analysis? Ability to read market research papers Analyze your own data in a simple way Assist you in pricing and marketing

More information

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. 277 CHAPTER VI COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. This chapter contains a full discussion of customer loyalty comparisons between private and public insurance companies

More information

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools Occam s razor.......................................................... 2 A look at data I.........................................................

More information

Part 1 Expressions, Equations, and Inequalities: Simplifying and Solving

Part 1 Expressions, Equations, and Inequalities: Simplifying and Solving Section 7 Algebraic Manipulations and Solving Part 1 Expressions, Equations, and Inequalities: Simplifying and Solving Before launching into the mathematics, let s take a moment to talk about the words

More information

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1) Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

More information

The Point-Slope Form

The Point-Slope Form 7. The Point-Slope Form 7. OBJECTIVES 1. Given a point and a slope, find the graph of a line. Given a point and the slope, find the equation of a line. Given two points, find the equation of a line y Slope

More information

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information