

DISPLAYING THE RELATIONSHIP
DEFINITIONS: Studies are often conducted to attempt to show that some explanatory variable causes the values of some response variable to occur. The response or dependent variable is the response of interest, the variable we want to predict, and is usually denoted by y. The explanatory or independent variable attempts to explain the response and is usually denoted by x. A scatterplot shows the relationship between two quantitative variables x and y. The values of the x variable are marked on the horizontal axis, and the values of the y variable are marked on the vertical axis. Each pair of observations (x, y) is represented as a point in the plot. Two variables are said to be positively associated if, as x increases, the values of y tend to increase. Two variables are said to be negatively associated if, as x increases, the values of y tend to decrease. When a scatterplot does not show a particular direction, neither positive nor negative, we say that there is no linear association.

[Figure: Scatterplot of Final vs Midterm Scores (horizontal axis: Midterm score, X; vertical axis: Final score, Y), with the point for the 10th student labeled, shown alongside a table of each student's midterm and final scores.]

Let's Do It! 1 The data below were obtained in a study of age and systolic blood pressure of six randomly selected subjects. Make a scatter plot to examine the relationship between x = age and y = pressure. Comment on the relationship with respect to form, direction, strength, and any departures or unusual values.

Subject | Age (x) | Pressure (y)
A | 43 | 128
B | 48 | 120
C | 56 | 135
D | 61 | 143
E | 67 | 141
F | 70 | 152

Notes of Caution
1. An observed relationship between two variables does not imply that there is a causal link between them. For example, consider the following scatter-plot of IQ score versus shoe size:
[Figure: scatterplot of IQ (vertical axis) versus Shoe Size (horizontal axis)]
As a person ages, their shoe size increases, and so does their IQ score. Although there is a positive association, there is no causal link between shoe size and IQ. Most studies attempt to show that some explanatory variable "causes" the values of the response to occur. While we can never positively determine whether or not there is a distinct cause-and-effect relationship, we can assess whether there appears to be such a relationship.
2. A relationship between two variables can be influenced by confounding variables. Consider the following scatter-plot of the number of sports magazines read in a month versus the height of the person:
[Figure: scatterplot of Number of magazines read (vertical axis) versus Height (horizontal axis), with separate symbols for women and men]
Overall, there appears to be a positive association between height and number of magazines. However, within each gender, there does not appear to be an association. Gender is a confounding variable, and aggregating the data across gender can result in misleading conclusions. Any study, especially an observational study, has the potential to be wrongly interpreted because of confounding variables.

3. Unusual data points (outliers) can distort the apparent association, especially if the data set is small. Consider the following scatter-plot of the percentage of people who speak English versus population size:
[Figure: scatterplot of Percent who speak English (vertical axis) versus Population Size (horizontal axis), with one point labeled "Outlier"]
The eight points in the scatter-plot represent eight countries from Central and South America selected at random. The outlier is Mexico City.
4. Sometimes a scatter plot shows a curvilinear relationship between the variables.
[Figure: a scatterplot with a curved pattern]
In this situation a straight line is not an appropriate model. Methods for curvilinear relationships are beyond the scope of this course.

Simple Linear Regression
[Figure: Scatterplot of Final vs Midterm Scores with two candidate lines drawn, Line #1 and Line #2]
So the question remains: how do we find the best-fitting line?
Equation of a Line: $y = a + bx$, where b = slope, the amount y changes when x is increased by 1 unit, and a = y-intercept, the value of y when x is set equal to zero.

DEFINITION: The least squares regression line, given by $\hat{y} = a + bx$, is the line that makes the sum of the squared vertical deviations of the data points from the line as small as possible. Performing the regression is often stated as "regress y on x." The least squares regression line for regressing final exam scores, y, on midterm exam scores, x, is given by $\hat{y} = 7.5 + 1.75x$. The estimated slope of b = 1.75 tells us that for a 1-point increase on the midterm we would expect, on average, an increase of 1.75 points on the final exam. The estimated y-intercept of a = 7.5 tells us that if someone were to score 0 points on the midterm, we would predict they would get 7.5 points on the final exam. Suppose a new student scores 40 points on the midterm. Based on our model, what would be their predicted final exam score? Plug the value x = 40 into our estimated equation. The predicted final exam score is $\hat{y} = 7.5 + 1.75(40) = 77.5$ points.
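The least squares criterion and the prediction step can be written out directly. In this sketch, `predict` and `sse` are hypothetical helper names. The five data pairs are the Test 1 and Test 2 scores from the by-hand example in these notes, whose least squares line is $\hat{y} = 0.8 + 1.1x$. Nudging either coefficient away from those values gives a larger sum of squared vertical deviations.

```python
def predict(a: float, b: float, x: float) -> float:
    """Value of the line y-hat = a + b*x at x."""
    return a + b * x


def sse(a: float, b: float, xs: list[float], ys: list[float]) -> float:
    """Sum of squared vertical deviations of the points from the line a + b*x."""
    return sum((y - predict(a, b, x)) ** 2 for x, y in zip(xs, ys))


# Midterm/final example: a = 7.5, b = 1.75, new student with a midterm score of 40.
print(predict(7.5, 1.75, 40))  # 77.5

# Test 1 (x) and Test 2 (y) scores from the by-hand example.
xs = [8, 10, 12, 14, 16]
ys = [9, 13, 14, 15, 19]
for a, b in [(0.8, 1.1), (1.0, 1.1), (0.0, 1.2)]:
    print(f"a = {a}, b = {b}: sum of squared deviations = {sse(a, b, xs, ys):.2f}")
```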

Let's Do It! 13.2 Childhood Growth: The growth of children from early childhood through adolescence generally follows a linear pattern. Data on the heights of female Americans during childhood, from four to nine years old, were compiled, and the least squares regression line was obtained as $\hat{y} = 80 + 6x$, where y is height in centimeters and x is age in years. Note that 1 inch is equal to 2.54 centimeters.
(a) Interpret the value of the estimated slope b = 6.
(b) Would interpretation of the value of the estimated y-intercept, a = 80, make sense here? If yes, interpret it. If no, explain why not.
(c) What would you predict the height to be for a female American at 8 years old? Give your answer first in centimeters, then in inches.
(d) What would you predict the height to be for a female American at 25 years old? Give your answer first in centimeters, then in inches.
(e) Why do you think your answer to part (d) was so inaccurate?
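Here is a short sketch of the arithmetic for parts (c) and (d), assuming the fitted line $\hat{y} = 80 + 6x$ and 2.54 cm per inch. The helper names are hypothetical. The range check, which uses ages 4 to 9 (the span of the data), is an illustrative addition that flags predictions made outside the data.

```python
CM_PER_INCH = 2.54   # 1 inch = 2.54 cm
AGE_RANGE = (4, 9)   # ages (years) covered by the data behind the fitted line


def predicted_height_cm(age_years: float) -> float:
    """Height in cm predicted by the fitted line y-hat = 80 + 6x."""
    return 80 + 6 * age_years


def is_extrapolation(age_years: float) -> bool:
    """True if the age lies outside the range of the data used for the fit."""
    low, high = AGE_RANGE
    return not (low <= age_years <= high)


for age in (8, 25):
    cm = predicted_height_cm(age)
    inches = cm / CM_PER_INCH
    print(f"age {age}: {cm:.0f} cm = {inches:.1f} in (extrapolation: {is_extrapolation(age)})")
```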

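Calculating the Least Squares Regression Line
The least squares regression line is given by $\hat{y} = a + bx$, where
$$b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2} \quad \text{(slope)}, \qquad a = \bar{y} - b\bar{x} \quad \text{(y-intercept)}.$$
Example Test 1 versus Test 2: Obtaining the Regression Line By Hand
a) Look at the relationship graphically with a scatter-plot to confirm initially that a linear model seems appropriate.
b) Calculate the estimated regression line by completing a calculation table with columns $x$, $y$, $x^2$, $xy$, and $y^2$. With $n = 5$, $\sum x = 60$, $\sum y = 70$, $\sum x^2 = 760$, and $\sum xy = 884$:
$$b = \frac{5(884) - (60)(70)}{5(760) - (60)^2} = \frac{220}{200} = 1.1, \qquad a = \bar{y} - b\bar{x} = \frac{70}{5} - 1.1\left(\frac{60}{5}\right) = 14 - 13.2 = 0.8$$
Least squares equation: $\hat{y} = 0.8 + 1.1x$
c) The slope of the line is b = 1.1. This means that Test 2 scores are expected to go up by 1.1 points on average for each additional point scored on Test 1.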

d) A student who scored 15 points on Test 1 is predicted to score $\hat{y} = 0.8 + 1.1(15) = 17.3$ points on Test 2.
Example Test 1 versus Test 2: Obtaining the Regression Line Using the TI Calculator
To obtain the least squares regression line using the TI graphing calculator, we first need to enter the data:

L1 | L2
8 | 9
10 | 13
12 | 14
14 | 15
16 | 19

Enter the values of the quantitative variable x = Test 1 into L1, and enter the corresponding values of the quantitative variable y = Test 2 into L2. To get the least squares regression equation, we use the following sequence of buttons: STAT, then CALC, then LinReg(a+bx). Your output screen should provide the least squares regression equation as $y = a + bx$ with the y-intercept a = 0.8 and the slope b = 1.1.
Caution: There are two linear regression options, namely LinReg(ax+b) and LinReg(a+bx). We request the latter option, which uses b to represent the slope.

Let's Do It! 13.3 Oil-Change Data: The table below presents data on x = the number of oil changes per year and y = the cost of repairs for a random sample of 10 cars of a certain make and model, from a given region.
(a) Make a scatter-plot of the points as a check for linearity and outliers. Comment on your plot.
(b) Find the least squares regression line for regressing cost on number of oil changes. Describe what the estimated y-intercept and estimated slope represent.
(c) Use your least squares regression line to predict the cost of car repairs for a car that had four oil changes.

CORRELATION: HOW STRONG IS THE LINEAR RELATIONSHIP?
DEFINITION: The sample correlation coefficient r measures the strength of the linear relationship between two quantitative variables. It describes the direction of the linear association and indicates how closely the points in a scatter-plot lie to the least squares regression line.
Features of the correlation coefficient:
1. Range: $-1 \le r \le 1$.
2. Sign: The sign of the correlation coefficient indicates the direction of association: negative, $r \in [-1, 0)$, or positive, $r \in (0, 1]$.
3. Magnitude: The magnitude of the correlation coefficient indicates the strength of the linear association. If the data follow a straight line exactly, then r = 1 (if the slope is positive) or r = -1 (if the slope is negative), indicating a perfect linear association. If r = 0, there is no linear association.
4. Measures Strength: The correlation only measures the strength of the linear association.
5. Unit-less: The correlation is computed using standard scores of the two variables. It has no unit of measure, and the absolute value of r will not change if the units of measurement for x or y are changed. The correlation between x and y is the same as the correlation between y and x.
Some Pictures...

[Figure: three example scatterplots: a positive, moderate to strong linear association (r ≈ 0.8); a negative, weak linear association (r negative and close to 0); and a strong association that is just not a linear one (r close to 0).]
Let's Do It! 13.8 Matching Graphs: Scatter-plot #1 to the right yields a regression line of $\hat{y} = -.6 + 1.1x$ and a correlation of r = 0.84. Using this information as a base, match each of the four scatter-plots below to the correct description of its regression line and correlation coefficient. The scales on the axes of the scatter-plots are the same.


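How to Calculate the Correlation Coefficient r
The formula:
$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
Example Test 1 versus Test 2: Obtaining the Correlation Coefficient By Hand
We have already computed the summation quantities needed for finding r, shown in the calculation table.
Completed Calculation Table:

x | y | x² | xy | y²
8 | 9 | 64 | 72 | 81
10 | 13 | 100 | 130 | 169
12 | 14 | 144 | 168 | 196
14 | 15 | 196 | 210 | 225
16 | 19 | 256 | 304 | 361
Total: 60 | 70 | 760 | 884 | 1032

$$r = \frac{5(884) - (60)(70)}{\sqrt{\left[5(760) - (60)^2\right]\left[5(1032) - (70)^2\right]}} = \frac{220}{\sqrt{(200)(260)}} \approx 0.965$$
The large positive correlation coefficient and the scatter-plot indicate a strong, positive, linear association between Test 1 and Test 2 scores.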
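The same calculation from the column totals, as a minimal sketch. The function name `correlation_from_sums` is hypothetical, and the inputs are the totals from the completed table.

```python
from math import sqrt


def correlation_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Sample correlation r computed from the column totals of the calculation table."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = correlation_from_sums(n=5, sum_x=60, sum_y=70, sum_x2=760, sum_y2=1032, sum_xy=884)
print(round(r, 3))  # 0.965
```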

Obtaining the Correlation Using the TI
To get the regression line and the correlation coefficient using the TI, we first need to turn on the diagnostic option (DiagnosticOn, found in the CATALOG). With the x data in L1 and the y data in L2, running LinReg(a+bx) then displays r along with a and b.
Let's Do It! Birth Rates: We gathered data from 1970 for twelve nations on the percentage of women aged 14 or older who were economically active and the crude birth rate. (We define the crude birth rate as the number of births in a year per 1,000 population.) We are interested in the relationship between the crude birth rate (y) and the percentage of women who were economically active (x).
a. Create the scatter-plot. Determine whether there is a positive, negative, or no association between x and y.
Nation x y Algeria 48 Argentina 19 1 Denmark 34 14 E. Germany 40 11 Guatemala 8 41 India 1 37 Ireland 0 Jamaica 0 31 Japan 37 19 Philippines 19 4 USA 30 15 Soviet Union 46 18
b. Find the equation of the regression line. Interpret the slope.
c. Find the correlation coefficient r.
Homework will be posted on MyMathLab.