Multiple Regression. SPSS output.


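Multiple Regression: relating a response (dependent, output) variable y to a set of explanatory (independent, input, predictor) variables $x_1, x_2, \ldots, x_k$. It is a technique for modeling the relationship between one response variable and several predictor variables.

Multiple Regression Model:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
where $\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$ is the deterministic component and $\varepsilon$ is the random component.

The parameters $\beta_0, \beta_1, \ldots, \beta_k$ in the model can all be estimated by the least squares estimators $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$, which minimize
$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left[ y_i - \left(\hat\beta_0 + \hat\beta_1 x_{1i} + \cdots + \hat\beta_k x_{ki}\right) \right]^2.$$
The Least-Squares Regression Equation:
$$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \cdots + \hat\beta_k x_k$$

Study weight (y) using age ($x_1$) and height ($x_2$).
Data: age (months), height (inches), and weight (pounds) were recorded for a group of school children.
[Scatter plots: weight vs. age and weight vs. height]
The scatter plots show that both age and height are linearly related to weight.

Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$ with weight y, age $x_1$, and height $x_2$.

SPSS output
[Model Summary table: R, R Square, Adjusted R Square, Std. Error of the Estimate. a. Predictors: (Constant), Age, Height]
Coefficient of determination ($R^2$): the percentage of variability in the response variable (weight) that can be explained by the predictor variables (age, height) through the model.

[ANOVA table: Sum of Squares, df, Mean Square, F, and Sig. for Regression, Residual, and Total; Sig. = .000. a. Predictors: (Constant), Age, Height. b. Dependent Variable: Weight]
Test for significance of the model:
H0: the model is insignificant (the βs are all zero: $\beta_1 = \beta_2 = 0$).
Ha: the model is significant (some βs are not zero).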
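The least squares criterion above can be illustrated with a short sketch. It assumes NumPy is available; `fit_ols` is a hypothetical helper name, and the ages, heights, and weights are made-up numbers, not the data behind the SPSS output. It prepends an intercept column, minimizes the sum of squared residuals with `numpy.linalg.lstsq`, and reports $R^2$ and adjusted $R^2$.

```python
import numpy as np


def fit_ols(X, y):
    """Least squares fit of y = b0 + b1*x1 + ... + bk*xk + e.

    X is an (n, k) array of predictor values; a column of ones is
    prepended so that b0 (the intercept) is estimated as well.
    Returns the coefficients [b0, b1, ..., bk], R^2 and adjusted R^2.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    # lstsq returns the coefficients that minimize sum(e_i^2)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    sse = float(residuals @ residuals)          # residual sum of squares
    sst = float(np.sum((y - y.mean()) ** 2))    # total sum of squares
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return beta, r2, adj_r2


# Hypothetical illustration only: made-up ages (months), heights (inches)
# and weights (pounds); these are NOT the data behind the SPSS output.
age = [130, 140, 150, 160, 170, 180]
height = [54, 56, 57, 60, 62, 65]
weight = [70, 78, 82, 95, 102, 115]
beta, r2, adj_r2 = fit_ols(np.column_stack([age, height]), weight)
print("coefficients:", np.round(beta, 3))
print("R^2 = %.3f, adjusted R^2 = %.3f" % (r2, adj_r2))
```

On exact, noise-free data the function recovers the generating coefficients; with noise, adjusted $R^2$ is always at most $R^2$ because it penalizes each added predictor.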

Model estimation: SPSS output
[Coefficients table: B, Std. Error, Beta, t, Sig., Tolerance, and VIF for (Constant), Age, and Height; both predictors have Tolerance = .579 and VIF = 1.727]
Tests for Regression Coefficients:
H0: $\beta_0 = 0$ vs. Ha: $\beta_0 \neq 0$
H0: $\beta_1 = 0$ vs. Ha: $\beta_1 \neq 0$
H0: $\beta_2 = 0$ vs. Ha: $\beta_2 \neq 0$
Collinearity statistics: if the VIF (Variance Inflation Factor) is greater than 10, there is a multicollinearity problem. (Some suggest the VIF should be less than 4.)

Least squares regression equation:
$$\hat y = -127.82 + 0.24 x_1 + 3.09 x_2$$
The average weight of children who are 144 months old and 55 inches tall would be:
$-127.82 + 0.24(144) + 3.09(55) = 76.69$ lbs (estimated by the model).

How to interpret $\beta_0$, $\beta_1$, and $\beta_2$?
Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$, where y: weight, $x_1$: age, $x_2$: height.
$\beta_0$ is the constant, or y-intercept, of the model. It is the average response when both predictor variables are 0.
$\beta_1$ is the rate of change of the expected (average) weight per unit change of age, adjusted for the height variable.
$\beta_2$ is the rate of change of the expected (average) weight per unit change of height, adjusted for the age variable.

Other possible models (y: weight, $x_1$: age, $x_2$: height):
$y = \beta_0 + \beta_1 x_1 + \varepsilon$
$y = \beta_0 + \beta_2 x_2 + \varepsilon$
With an interaction term (non-additive):
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$
$y = \beta_0 + \beta_1 x_1 + \beta_3 x_1 x_2 + \varepsilon$
$y = \beta_0 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$

Coefficient Estimation with Interaction Between Age and Height (INTAG_HT = Age × Height)
Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$ with weight y, age $x_1$, and height $x_2$.
[Coefficients table for (Constant), Age, Height, and INTAG_HT; the VIF values are far above 10]
High VIF implies very serious collinearity. The interaction term should not be used in the model.

For boys:
[Coefficients table for boys: B, Std. Error, Beta, t, Sig., Tolerance, and VIF for (Constant), Age, and Height]
Is there serious collinearity? Write the weight prediction equation using age and height as predictor variables. Find the average weight for boys who are 144 months old and 55 inches tall.
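The VIF rule of thumb can be made concrete with a small sketch, assuming NumPy. `vif` is a hypothetical helper (not SPSS's implementation): for each predictor $x_j$ it regresses $x_j$ on the other predictors with an intercept, takes that $R_j^2$, and returns $\text{VIF}_j = 1/(1 - R_j^2)$, so $\text{Tolerance}_j = 1 - R_j^2 = 1/\text{VIF}_j$. The data are made up.

```python
import numpy as np


def vif(X):
    """Variance inflation factors for the columns of X (n rows, k >= 2 columns).

    For predictor j, regress x_j on the other predictors (with an intercept)
    to get R_j^2; then Tolerance_j = 1 - R_j^2 and VIF_j = 1 / Tolerance_j.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2_j = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2_j)
    return vifs


# Hypothetical data: x2 is built to be strongly related to x1, x3 is not.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + 0.3 * rng.normal(size=100)
x3 = rng.normal(size=100)
print(np.round(vif(np.column_stack([x1, x2, x3])), 2))

# Converting a tolerance to a VIF, e.g. tolerance .579:
print(round(1 / 0.579, 3))  # 1.727
```

With only two predictors, both VIFs are equal to $1/(1 - r^2)$, where r is their Pearson correlation, which is why the two predictors in a two-variable model always share the same tolerance and VIF.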

For girls:
[Coefficients table for girls: B, Std. Error, Beta, t, Sig., Tolerance, and VIF for (Constant), Age, and Height]
Is there serious collinearity? Write the weight prediction equation using age and height as predictor variables. Find the average weight for girls who are 144 months old and 55 inches tall.

Indicator Variables: binary variables that take only two possible values, 0 and 1. They can be used to include categorical variables in the model.
Gender: Male = 1, Female = 0.
[Figure: weight by gender (Male, Female)]
[Group Statistics table: N, Mean, Std. Deviation, and Std. Error Mean of weight for males and females; the female mean is 98.878]

One Binary Independent Variable
Model: $y = \beta_0 + \beta_1 x + \varepsilon$ (a model for the two-independent-samples situation under the equal-variances condition), where y: weight, x: gender (x = 0 for female, x = 1 for male).
When x = 0: $y = \beta_0 + \varepsilon$
When x = 1: $y = \beta_0 + \beta_1 + \varepsilon$
The difference of the means of the two categories is $\beta_1$. Hence the two-independent-samples t-test can be modeled with a simple linear regression model.

SPSS output for the two-independent-samples t-test comparing the mean weight of males and females:
[Independent Samples Test table: Levene's Test for Equality of Variances (F, Sig.) and t-test for Equality of Means (t, df, Sig. (2-tailed), Mean Difference, Std. Error Difference), for equal variances assumed and not assumed]
SPSS output for linear regression with gender as the predictor:
[Coefficients table: the constant is 98.878, which equals the female mean weight]

Model: Age and Gender as Predictor Variables
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$, with y: weight, $x_1$: age, and $x_2$: gender (0 = female, 1 = male).
[Coefficients table, and figure of weight vs. age by gender (Male, Female)]
Age and gender are both significant variables for predicting weight. There is a significant difference in average weight between genders when adjusted for the age variable.
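The claim that a two-independent-samples t-test (equal variances) is a regression on a 0/1 indicator can be checked numerically. This sketch assumes NumPy; `indicator_regression` is a hypothetical helper and the weights are invented. It fits $y = \beta_0 + \beta_1 x + \varepsilon$ with x = 0 for female and x = 1 for male, then compares the t statistic for $\hat\beta_1$ with the pooled two-sample t statistic.

```python
import numpy as np


def indicator_regression(y_female, y_male):
    """Regress y on a 0/1 gender indicator (0 = female, 1 = male).

    Returns b0, b1, the t statistic for b1 from the regression, and the
    pooled (equal-variance) two-sample t statistic for comparison.
    """
    yf = np.asarray(y_female, dtype=float)
    ym = np.asarray(y_male, dtype=float)
    y = np.concatenate([yf, ym])
    x = np.concatenate([np.zeros(yf.size), np.ones(ym.size)])
    design = np.column_stack([np.ones(y.size), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    mse = (resid @ resid) / (y.size - 2)               # error mean square
    cov = mse * np.linalg.inv(design.T @ design)       # covariance of (b0, b1)
    t_reg = beta[1] / np.sqrt(cov[1, 1])

    # Pooled two-independent-samples t statistic
    nf, nm = yf.size, ym.size
    sp2 = ((nf - 1) * yf.var(ddof=1) + (nm - 1) * ym.var(ddof=1)) / (nf + nm - 2)
    t_pooled = (ym.mean() - yf.mean()) / np.sqrt(sp2 * (1 / nf + 1 / nm))
    return beta[0], beta[1], t_reg, t_pooled


# Hypothetical weights (pounds), for illustration only
female = [92, 101, 95, 99, 104, 97]
male = [103, 99, 110, 106, 101, 108, 104]
b0, b1, t_reg, t_pooled = indicator_regression(female, male)
print(b0, np.mean(female))                   # intercept = female mean
print(b1, np.mean(male) - np.mean(female))   # slope = difference of means
print(t_reg, t_pooled)                       # the same t statistic
```

The match is exact because the fitted values are the two group means, so the regression's error mean square is the pooled variance.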

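Age, Height, and Gender as Predictors
Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$, with y: weight, $x_1$: age, $x_2$: height, and $x_3$: gender (0 = female, 1 = male).
[Figure: weight by gender (Male, Female)]
[Coefficients table for (Constant), Age, Height, and Gender]
The gender variable becomes insignificant once the age and height variables are in the model. When the difference in average weight between genders is adjusted for age and height, it is statistically insignificant.

How to include a categorical variable in the model?
The proper way to include a categorical variable is to use indicator variables. For a categorical variable with k categories, one should set up k − 1 indicator variables.
Race variable: White = 1, Black = 2, Hispanic = 3. Then 3 − 1 = 2 indicator variables are needed.

Common mistake: using the internally coded values of a categorical explanatory variable directly in a linear regression model.
Race: White = 1, Black = 2, Hispanic = 3.
Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$, with y: body fat percentage, $x_1$: number of hours of exercise per week, and $x_2$: race code. This forces the change in body fat from White to Black to equal the change from Black to Hispanic, which the codes do not justify.
[Figure: body fat percentage vs. number of hours of exercise per week]

Use of indicator variables $x_2$ and $x_3$ for the race variable:
$x_2 = 1$ represents White, otherwise $x_2 = 0$;
$x_3 = 1$ represents Black, otherwise $x_3 = 0$;
$x_2 = 0$ and $x_3 = 0$ represent Hispanic.
Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$, with y: body fat percentage and $x_1$: number of hours of exercise per week.
Interpretation of the model:
Race White ($x_2 = 1$, $x_3 = 0$): $y = (\beta_0 + \beta_2) + \beta_1 x_1 + \varepsilon$
Race Black ($x_2 = 0$, $x_3 = 1$): $y = (\beta_0 + \beta_3) + \beta_1 x_1 + \varepsilon$
Race Hispanic ($x_2 = 0$, $x_3 = 0$): $y = \beta_0 + \beta_1 x_1 + \varepsilon$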
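The k − 1 indicator coding described above can be sketched in plain Python. `indicator_columns` is a hypothetical helper name; the baseline level (here Hispanic, matching the coding $x_2$ = White, $x_3$ = Black) receives 0 in every column.

```python
def indicator_columns(values, levels, baseline):
    """Code a categorical variable with k levels as k - 1 indicator (0/1) columns.

    The baseline level gets 0 in every column; each other level gets its own column.
    """
    unknown = set(values) - set(levels)
    if unknown:
        raise ValueError(f"values not in levels: {sorted(unknown)}")
    dummy_levels = [lvl for lvl in levels if lvl != baseline]
    rows = [[1 if v == lvl else 0 for lvl in dummy_levels] for v in values]
    return dummy_levels, rows


race = ["White", "Black", "Hispanic", "White"]
names, rows = indicator_columns(race, ["White", "Black", "Hispanic"], baseline="Hispanic")
print(names)  # ['White', 'Black']
print(rows)   # [[1, 0], [0, 1], [0, 0], [1, 0]]
```

Dropping one level is what keeps the indicators from being perfectly collinear with the intercept column; with all k indicators, their sum would always equal 1.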

Suppose that the least squares regression equation for the model above is $\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \hat\beta_3 x_3$ [numeric coefficients illegible in the source]. For a person who exercises $x_1$ hours per week:
Estimated average body fat for a White person: $\hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2$
Estimated average body fat for a Black person: $\hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_3$
Estimated average body fat for a Hispanic person: $\hat\beta_0 + \hat\beta_1 x_1$

Study female life expectancy using the percentage of urbanization and the birth rate.
[Scatter plots: Female life expectancy 1992 vs. Births per 1000 population, 1992, and vs. Percent urban, 1992]
Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$, with y: female life expectancy, $x_1$: birth rate, and $x_2$: percent urban.

Model Summary: R = .904, R Square = .817 (also reports Adjusted R Square and Std. Error of the Estimate). a. Predictors: (Constant), Births per 1000 population, 1992, Percent urban, 1992
Coefficient of determination: the percentage of variability in the response variable (female life expectancy) that can be explained by the predictor variables (birth rate, percentage of urbanization) through the model.
[ANOVA table: Sum of Squares, df, Mean Square, F, and Sig. for Regression, Residual, and Total; Sig. = .000. b. Dependent Variable: Female life expectancy 1992]
Test for significance of the model:
H0: the model is insignificant (the βs are all zero: $\beta_1 = \beta_2 = 0$).
Ha: the model is significant (some βs are not zero).

Model estimation (SPSS output):
[Coefficients table for (Constant), Births per 1000 population, 1992, and Percent urban, 1992; the birth-rate coefficient is B = −.555. a. Dependent Variable: Female life expectancy 1992]
Tests for Regression Coefficients:
H0: $\beta_0 = 0$ vs. Ha: $\beta_0 \neq 0$
H0: $\beta_1 = 0$ vs. Ha: $\beta_1 \neq 0$
H0: $\beta_2 = 0$ vs. Ha: $\beta_2 \neq 0$
Collinearity statistics: if the VIF (Variance Inflation Factor) is greater than 10, there is a multicollinearity problem. (Some suggest the VIF should be less than 4.)

Least squares regression equation for estimating the average response value:
$$\hat y = \hat\beta_0 - 0.555 x_1 + \hat\beta_2 x_2$$
[Worked example: the estimated average female life expectancy for a specified birth rate per 1000 and percentage of urbanization is 65.76; the inputs and the other coefficients are illegible in the source]
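The overall test of model significance uses the F statistic from the ANOVA table. As a minimal sketch (the function names are hypothetical, and the numbers are made up rather than taken from the SPSS output), the same F can be computed from the sums of squares or directly from $R^2$, with k predictors and n observations:

$$F = \frac{SSR/k}{SSE/(n-k-1)} = \frac{R^2/k}{(1-R^2)/(n-k-1)}$$

```python
def overall_f_from_r2(r2, n, k):
    """Overall F statistic for H0: beta_1 = ... = beta_k = 0.

    n = number of observations, k = number of predictors (excluding the intercept).
    F has (k, n - k - 1) degrees of freedom.
    """
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def overall_f_from_ss(ss_regression, ss_residual, n, k):
    """Same statistic from the ANOVA table: F = MS(Regression) / MS(Residual)."""
    ms_reg = ss_regression / k
    ms_res = ss_residual / (n - k - 1)
    return ms_reg / ms_res


# Hypothetical values: R^2 = 0.5 with n = 23 observations and k = 2 predictors
print(overall_f_from_r2(0.5, 23, 2))          # about 10
print(overall_f_from_ss(50.0, 50.0, 23, 2))   # the same statistic
```

The p-value (SPSS's Sig.) is the upper-tail probability of an F distribution with (k, n − k − 1) degrees of freedom; computing it needs a statistics library and is left out of this sketch.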

Female Life Expectancy
Response variable: female life expectancy.
Explanatory variables: birth rate, urbanization, phones, doctors, and GDP.
[Multiple scatter plot before transformation: Female life expectancy 1992, Births per 1000 population 1992, Percent urban 1992, Phones per 100 people, Doctors per 10,000 people, GDP per capita]
Which variables are significant factors for female life expectancy in the model?
[Multiple scatter plot after ln() transformation of phones, doctors, and GDP: Female life expectancy 1992, Births per 1000 population 1992, Percent urban 1992, Natural log of phones per 100 people, Natural log of doctors per 10000, Natural log of GDP]

Model Summary: R = .934, Adjusted R Square = .867 (also reports R Square, Std. Error of the Estimate, and Durbin-Watson). a. Predictors: (Constant), Natural log of GDP, Percent urban, 1992, Births per 1000 population, 1992, Natural log of doctors per 10000, Natural log of phones per 100 people. b. Dependent Variable: Female life expectancy 1992
[ANOVA table: Regression df = 5, Residual df = 106; Sig. = .000]

Multicollinearity
[Coefficients table for the full model: B, Std. Error, Beta, t, Sig., Tolerance, and VIF for (Constant), Births per 1000 population, 1992, Percent urban, 1992, Natural log of phones per 100 people, Natural log of doctors per 10000, and Natural log of GDP; for example, Natural log of GDP has Sig. = .079]
Tolerance ($1 - R_j^2$, where $R_j^2$ comes from regressing predictor j on the other independent variables) measures how much of a predictor's variability is not linearly explained by the other independent variables. It is better for it to be higher than 0.1. VIF is the reciprocal of tolerance.

Stepwise Selection
[ANOVA table for stepwise models 1 to 3; each model has Sig. = .000]
a. Predictors: (Constant), Natural log of phones per 100 people
b. Predictors: (Constant), Natural log of phones per 100 people, Births per 1000 population, 1992
c. Predictors: (Constant), Natural log of phones per 100 people, Births per 1000 population, 1992, Natural log of doctors per 10000
d. Dependent Variable: Female life expectancy 1992
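The stepwise idea can be sketched as greedy forward selection, a simplified variant: SPSS's stepwise method enters and removes variables using probability-of-F criteria, whereas this hypothetical `forward_selection` helper only adds variables and uses a fixed partial F-to-enter threshold (the default of 4.0 is an assumed, commonly used approximation, not SPSS's setting). It assumes NumPy, and the data are simulated.

```python
import numpy as np


def _sse(design, y):
    """Residual sum of squares of the least squares fit of y on design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def forward_selection(X, y, names, f_to_enter=4.0):
    """Greedy forward selection using a partial F-to-enter threshold.

    At each step the candidate with the largest partial F statistic
    F = (SSE_old - SSE_new) / (SSE_new / df_residual_new)
    enters if F >= f_to_enter. Variables are never removed.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    chosen = []
    design = np.ones((n, 1))              # start from the intercept-only model
    sse_old = _sse(design, y)
    while len(chosen) < p:
        best_j, best_f, best_design, best_sse = None, -np.inf, None, None
        for j in range(p):
            if j in chosen:
                continue
            cand = np.column_stack([design, X[:, j]])
            sse_new = _sse(cand, y)
            df_res = n - cand.shape[1]
            f = (sse_old - sse_new) / (sse_new / df_res)
            if f > best_f:
                best_j, best_f, best_design, best_sse = j, f, cand, sse_new
        if best_f < f_to_enter:
            break
        chosen.append(best_j)
        design, sse_old = best_design, best_sse
    return [names[j] for j in chosen]


# Simulated data: y depends on x1 and x2 but not on x3
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3 + 2 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)
print(forward_selection(X, y, ["x1", "x2", "x3"]))
```

A pure-noise predictor can still occasionally pass an F-to-enter threshold by chance, which is one reason stepwise results should be reviewed rather than taken as final.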

What are the variables that are significantly related to female life expectancy?
[Stepwise Coefficients table (B, Std. Error, Beta, t, Sig., Tolerance, VIF). Model 1: Natural log of phones per 100 people; Model 2 adds Births per 1000 population, 1992; Model 3 adds Natural log of doctors per 10000. a. Dependent Variable: Female life expectancy 1992]

Use of regression analysis
Description (model, system, relation): relation between life expectancy and birth rate, GDP, ...; relation between salary and rank, years of service, ...
Control: died too young, underpaid, overpaid, ...
Prediction: life expectancy, salary for newcomers, future salary, ...
Variable screening (important factors): significant factors for life expectancy, significant factors for salary, ...

Construction of regression models
1. Hypothesize the form of the model for y in terms of $x_1, x_2, \ldots, x_k$:
   selecting predictor variables;
   deciding the functional form of the regression equation;
   defining the scope of the model (design range).
2. Collect the sample data (observations, experiments).
3. Use the sample to estimate the unknown parameters in the model.
4. Understand the distribution of the random error.
5. Model diagnostics, residual analysis.
6. Apply the model in decision making.
7. Review the model with new data.

What is a linear model?
Examples of linear models:
$y = \beta_0 + \beta_1 x + \varepsilon$
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2 + \varepsilon$
$y = \beta_0 + \beta_1 \ln(x) + \varepsilon$
$y = \beta_0 + \beta_1 e^{x} + \varepsilon$
A model is linear if it is linear in terms of its parameters (the βs), even when it is nonlinear in the x variables.
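Because "linear" refers to the parameters, a model such as $y = \beta_0 + \beta_1 \ln(x) + \varepsilon$ is fitted by ordinary least squares on a transformed column. A minimal sketch, assuming NumPy; `fit_log_model` is a hypothetical helper and the data are generated noise-free from $y = 40 + 6\ln(x)$.

```python
import numpy as np


def fit_log_model(x, y):
    """Least squares fit of y = b0 + b1*ln(x) + e (requires x > 0).

    The model is nonlinear in x but linear in the parameters b0 and b1,
    so ordinary least squares on the transformed column ln(x) applies.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), np.log(x)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Hypothetical, noise-free data generated from y = 40 + 6 ln(x)
x = np.array([1.0, 2.0, 5.0, 10.0, 50.0, 100.0])
y = 40 + 6 * np.log(x)
print(np.round(fit_log_model(x, y), 6))   # recovers b0 = 40, b1 = 6
```

The same approach covers the $e^{x}$, interaction, and squared-term examples: build each transformed column, then solve one least squares problem.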