Chapter 11 Regression Analysis

Definition: When the values of two variables are measured for each member of a population or sample, the resulting data is called bivariate. When both variables are quantitative, we may represent the data set as a set of ordered pairs of numbers, $(x, y)$. The variable $x$ is called the input (or independent) variable; the variable $y$ is called the response (or dependent) variable. We may examine the relationship between the two variables graphically using a scatter diagram, or scatterplot.

The simplest type of model relating two quantitative variables is called a simple linear regression model, in which there is an assumed linear relationship between two variables. One variable is called the independent variable, or predictor variable. The other variable is called the dependent variable, or the response variable.

Simple Linear Regression Model

The response variable is assumed to be related to the predictor variable according to the following equation:

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \ldots, n,$$

where
$Y_i$ is the value of the response variable for the $i$th member of the sample,
$\beta_0$ is a parameter, called the intercept of the line of best fit, or the regression line,
$\beta_1$ is a parameter, called the slope of the line of best fit, or the regression line,
$x_i$ is the value of the predictor variable for the $i$th member of the sample,
$\varepsilon_i$ is a random error variable associated with the $i$th member of the sample; it is assumed that the random errors are independent and identically distributed, with $\varepsilon_i \sim \text{Normal}(0, \sigma^2)$.

A picture of the model is shown on p. 39. Since it is assumed that a linear trend relationship exists between the predictor variable and the response variable, before we proceed to use the model, we must do a scatterplot to see whether the assumption of linearity is reasonable.

We need to use sample data to estimate the three parameters, $\beta_0$, $\beta_1$, and $\sigma^2$. The estimation will be done using the method of least squares. Given a sample of size $n$, the data consists of ordered pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. We will find the best estimators of the slope and intercept by minimizing the residual sum of squares (also called the error sum of squares):

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2,$$

with respect to the two parameters. In doing this, we are minimizing the sum of the squared vertical distances of the data points from the line of best fit to the data. A concrete example is useful here.

Example: p. 3

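Imagine constructing this scatterplot concretely as follows:
1) Draw the coordinate axes on a sheet of plywood.
2) Hammer nails into the board at each data point.
3) Obtain a thin wooden dowel and six rubber bands.
4) Place each rubber band around the dowel and one of the nails.
5) Wait until the dowel comes to rest.
The rest position of the dowel will be the minimum energy configuration of the system, the configuration for which there will be the least total stretching of the rubber bands. This position will also be the least squares regression line relating thermal conductivity and density.

We differentiate SSE with respect to each parameter, and set each derivative equal to 0, obtaining

$$\frac{\partial SSE}{\partial \hat{\beta}_0} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0, \quad \text{and} \quad \frac{\partial SSE}{\partial \hat{\beta}_1} = -2 \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0.$$

This gives us two equations in two unknowns, called the normal equations:

$$\sum_{i=1}^{n} y_i = n \hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i, \quad \text{and} \quad \sum_{i=1}^{n} x_i y_i = \hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2.$$

The solution is

$$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$

Then the estimated regression line, or line of best fit to the data, is given by

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x.$$

The estimate of the error variance is found from the error sum of squares to be

$$\hat{\sigma}^2 = MSE = \frac{SSE}{n - 2}.$$

There are only $n - 2$ degrees of freedom associated with the error sum of squares because two parameters, the slope and the intercept, have already been estimated.

To do inference, we need to know the distributional properties of the estimators $\hat{\beta}_0$ and $\hat{\beta}_1$. One of the basic assumptions of the model is that the random error terms $\varepsilon_i$ are i.i.d. normal with mean 0 and common variance $\sigma^2$. Then $Y_i \sim \text{Normal}(\beta_0 + \beta_1 x_i, \sigma^2)$. Furthermore, the $Y_i$'s are independent of each other. From the solution of the normal equations, it is clear that $\hat{\beta}_1$ is a linear function of the $Y_i$'s, and that $\hat{\beta}_0$ is also a linear function of the $Y_i$'s. We know that a statistic that is a linear function of independent normal random variables also has a normal distribution.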

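Specifically, it can be shown that both estimators are unbiased, and that

$$\hat{\beta}_0 \sim \text{Normal}\left(\beta_0, \; \sigma^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{SS_{xx}}\right)\right), \quad \text{and that} \quad \hat{\beta}_1 \sim \text{Normal}\left(\beta_1, \; \frac{\sigma^2}{SS_{xx}}\right).$$

We may use these facts to do hypothesis testing and interval estimation about the slope and intercept. The standard error of $\hat{\beta}_0$ is given by

$$SE(\hat{\beta}_0) = \sqrt{MSE \left(\frac{1}{n} + \frac{\bar{x}^2}{SS_{xx}}\right)}.$$

The standard error of $\hat{\beta}_1$ is given by

$$SE(\hat{\beta}_1) = \sqrt{\frac{MSE}{SS_{xx}}}.$$

Therefore, we find that

$$\frac{\hat{\beta}_0 - \beta_0}{SE(\hat{\beta}_0)} \sim t(n - 2), \quad \text{and that} \quad \frac{\hat{\beta}_1 - \beta_1}{\sqrt{MSE / SS_{xx}}} \sim t(n - 2).$$

We want to test whether there is a linear trend relationship between the predictor and the response variable. Our hypotheses are $H_0: \beta_1 = 0$ v. $H_a: \beta_1 \neq 0$. We may use the distributional properties of the estimated slope to find a test statistic: under $H_0$, the statistic $t = \hat{\beta}_1 / SE(\hat{\beta}_1)$ has a t-distribution with $n - 2$ degrees of freedom, so the hypothesis test may be done using the t-distribution.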
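The estimators, standard errors, and slope test statistic described above can be computed directly from the sums of squares. The following is a minimal sketch in Python rather than Excel; the function name `fit_simple_linear_regression` and the toy data are hypothetical and are not part of the notes.

```python
import math


def fit_simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x, with standard errors.

    Hypothetical helper for illustration; returns a dict of estimates.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")

    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = ss_xy / ss_xx        # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate

    # Error sum of squares and its mean square on n - 2 degrees of freedom
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)

    se_b1 = math.sqrt(mse / ss_xx)
    se_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / ss_xx))

    return {
        "b0": b0, "b1": b1, "sse": sse, "mse": mse,
        "se_b0": se_b0, "se_b1": se_b1,
        "t_slope": b1 / se_b1,   # test statistic for H0: beta1 = 0
        "df": n - 2,
    }


# Made-up toy data, not taken from the notes
toy_x = [1, 2, 3, 4, 5]
toy_y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = fit_simple_linear_regression(toy_x, toy_y)
print(fit["b0"], fit["b1"], fit["t_slope"])
```

The value of `t_slope` would then be compared with a critical value of the t-distribution on `df` degrees of freedom, taken from a table or from statistical software.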

Example: The paper "A study of stainless steel stress-corrosion cracking by potential measurements" (Corrosion, 1962, pp. 425-432) reported on the relationship between applied stress (the predictor variable, $x$, kg/mm²) and time to fracture (the response variable, $y$, hours) for 18-8 stainless steel under uniaxial tensile stress in a 40% CaCl₂ solution at 100°C. Ten different settings of applied stress were used, and the resulting data values (as read from a graph which appeared in the paper) are given in the table below:

x (kg/mm²)   2.5    5     10    15    17.5   20    25    30    35    40
y (hours)    63     58    55    61    62     37    38    45    46    19

We want to 1) determine whether there is a linear trend relationship between applied tensile stress and time to fracture, and 2) estimate the relationship. We first do a scatterplot, using Excel:

[Figure: Scatterplot of Time to Fracture v. Tensile Stress. Horizontal axis: Tensile Stress (kg/square mm). Vertical axis: Time to Fracture (Hours).]

It appears that there is a moderately strong negative linear trend relationship between time to fracture and tensile stress.
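One way to put a number on the "moderately strong negative" impression from the scatterplot is the sample correlation coefficient $r = SS_{xy} / \sqrt{SS_{xx} SS_{yy}}$. The sketch below computes it in Python for the ten (stress, time) pairs of the example; the variable names are illustrative, and the notes themselves use Excel rather than code.

```python
import math

# Data from the example: applied stress (kg/mm^2) and time to fracture (hours)
stress = [2.5, 5, 10, 15, 17.5, 20, 25, 30, 35, 40]
hours = [63, 58, 55, 61, 62, 37, 38, 45, 46, 19]

n = len(stress)
x_bar = sum(stress) / n
y_bar = sum(hours) / n

# Corrected sums of squares and cross products
ss_xx = sum((x - x_bar) ** 2 for x in stress)
ss_yy = sum((y - y_bar) ** 2 for y in hours)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(stress, hours))

# Sample correlation coefficient
r = ss_xy / math.sqrt(ss_xx * ss_yy)
print(round(r, 4))  # -0.7953
```

The magnitude of $r$ matches the "Multiple R" entry that Excel reports for a regression on these data; the negative sign reflects the downward trend.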

Next we want to test whether this relationship generalizes to the entire population of 18-8 stainless steel samples.

Step 1: $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$.
Step 2: $\alpha = 0.05$.
Step 3: The test statistic that will be used is $F = MSR / MSE$, which under the null hypothesis has an $F(1, 8)$ distribution, since $n - 2 = 8$.
Step 4: We will reject the null hypothesis if the value of the test statistic is greater than $F_{1, 8, 0.05} = 5.32$.
Step 5: We enter the data in Excel. We choose Tools, Data Analysis, and Regression. Excel produces the following ANOVA table.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.7953
R Square             0.6325
Adjusted R Square    0.5866
Standard Error       9.1243
Observations         10

ANOVA
              df    SS           MS           F          Significance F
Regression    1     1146.3761    1146.3761    13.7698    0.00595
Residual      8     666.0239     83.2530
Total         9     1812.4000

              Coefficients    Standard Error    t Stat     P-value
Intercept     66.4177         5.6481            11.7592    2.5E-06
X Variable    -0.9009         0.2428            -3.7108    0.00595
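The ANOVA quantities in the Excel output can be checked by hand from the sums of squares, using $SSR = SS_{xy}^2 / SS_{xx}$ and $SSE = SST - SSR$. The sketch below does this in Python for the example data. The critical value is hard-coded as the tabled upper 5% point of the F distribution with 1 and 8 degrees of freedom (8 being the residual degrees of freedom for ten observations); it is an assumed table value, not something the code computes.

```python
# Data from the example: applied stress (kg/mm^2) and time to fracture (hours)
stress = [2.5, 5, 10, 15, 17.5, 20, 25, 30, 35, 40]
hours = [63, 58, 55, 61, 62, 37, 38, 45, 46, 19]

n = len(stress)
x_bar = sum(stress) / n
y_bar = sum(hours) / n
ss_xx = sum((x - x_bar) ** 2 for x in stress)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(stress, hours))

sst = sum((y - y_bar) ** 2 for y in hours)  # total sum of squares, n - 1 df
ssr = ss_xy ** 2 / ss_xx                    # regression sum of squares, 1 df
sse = sst - ssr                             # residual sum of squares, n - 2 df

msr = ssr / 1
mse = sse / (n - 2)
f_stat = msr / mse

# Upper 5% point of F(1, 8), read from a standard F table (assumed value)
F_CRIT_05_1_8 = 5.32
reject_h0 = f_stat > F_CRIT_05_1_8

print(f"SSR={ssr:.4f} SSE={sse:.4f} SST={sst:.4f}")
print(f"F={f_stat:.4f} reject H0: {reject_h0}")
```

In simple linear regression this F statistic equals the square of the slope's t statistic, so the F test and the two-sided t test of $H_0: \beta_1 = 0$ lead to the same decision.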

Step 6: Since the observed value $F = 13.77$ exceeds the critical value, we reject the null hypothesis at the 0.05 level of significance. We have sufficient evidence to conclude that $\beta_1 \neq 0$; i.e., there is a linear trend relationship between tensile stress and time to fracture.

Def: The coefficient of determination is defined by

$$R^2 = 1 - \frac{SSE}{SST} = \frac{SSR}{SST}.$$

This quantity is the proportion of the variation of the response variable that is explained by the linear relationship between the predictor variable and the response variable. In our example, $R^2 = 0.6325$. Hence 63.25% of the variation in time to fracture is explained by the linear relationship between tensile stress and time to fracture. A large value for $R^2$ (near 1) indicates that the model has good explanatory power. A value for $R^2$ near 0 indicates that the model does not have good explanatory power.

The estimated regression equation (line of best fit) may also be read from the last table in the Excel output. We have

$$\hat{Y} = 66.4177 - 0.9009x.$$

This says that for every 1 kg/mm² increase in tensile stress, the time to fracture decreases by 0.9009 hours, on average. If the applied tensile stress is 12 kg/mm², then the predicted time to fracture is $\hat{Y} = 66.4177 - (0.9009)(12) = 55.6069$ hours.
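The coefficient of determination and the prediction at 12 kg/mm² can be reproduced from the raw data. The following Python sketch is illustrative only; the helper name `predict_hours` is hypothetical, and the notes obtain these values from Excel.

```python
# Data from the example: applied stress (kg/mm^2) and time to fracture (hours)
stress = [2.5, 5, 10, 15, 17.5, 20, 25, 30, 35, 40]
hours = [63, 58, 55, 61, 62, 37, 38, 45, 46, 19]

n = len(stress)
x_bar = sum(stress) / n
y_bar = sum(hours) / n
ss_xx = sum((x - x_bar) ** 2 for x in stress)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(stress, hours))

b1 = ss_xy / ss_xx       # estimated slope
b0 = y_bar - b1 * x_bar  # estimated intercept


def predict_hours(x_new):
    """Predicted time to fracture (hours) at applied stress x_new (kg/mm^2)."""
    return b0 + b1 * x_new


# R^2 = 1 - SSE / SST, with SSE computed from the residuals
sse = sum((y - predict_hours(x)) ** 2 for x, y in zip(stress, hours))
sst = sum((y - y_bar) ** 2 for y in hours)
r_squared = 1 - sse / sst

print(f"b0={b0:.4f} b1={b1:.4f} R^2={r_squared:.4f}")
print(f"prediction at 12 kg/mm^2: {predict_hours(12):.4f} hours")
```

With unrounded coefficients the prediction comes out near 55.607 hours, which agrees with the hand calculation up to rounding of the coefficients. Note that 12 kg/mm² lies inside the range of stresses in the data (2.5 to 40); the fitted line gives no support for predictions far outside that range.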