APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING
Sulaimon Mutiu O.
Department of Statistics & Mathematics, Moshood Abiola Polytechnic, Abeokuta, Ogun State, Nigeria.

Abstract - Modeling the relationship between two variables with simple linear regression involves a situation in which there is one independent variable. The problem addressed in this research work is modeling the expected arrival rate of bank customers (whose observed arrival rate is assumed to follow a Poisson distribution) using a simple linear regression model. For this purpose, primary data on customers' arrival rate at First Bank Nigeria Plc, located at Panseke Area of Onikolobo, Abeokuta, Ogun State, were collected between 9:00 a.m. and 2:00 p.m. at intervals of 5 minutes. The data were analyzed electronically using MS Excel 2007 and SPSS. Results from the analysis reveal that approximately 31 customers are expected to arrive at the banking hall of First Bank Nigeria Plc, Panseke Branch of Abeokuta, Ogun State, every 5 minutes. The observed arrival rate has a positive impact on the expected arrival rate, and this impact is significant.

Keywords: Arrival Rate, Bank, Probability Distribution, Variables

1. INTRODUCTION
Modeling refers to the development of mathematical expressions that describe, in some sense, the behavior of a random variable of interest. This variable may be the price of wheat in the world market, the number of deaths from lung cancer, the rate of growth of a particular type of tumor, or the tensile strength of metal wire (John Neter, 2004). In all cases, this variable is called the dependent variable and is denoted by Y. A subscript on Y identifies the particular unit from which the observation was taken: the time at which the price was recorded, the county in which the deaths were recorded, the experimental unit on which the tumor growth was recorded, and so forth.
Most commonly the modeling is aimed at describing how the mean of the dependent variable, E(Y), changes with changing conditions; the variance of the dependent variable is assumed to be unaffected by the changing conditions (John Neter, 2004). Other variables which are thought to provide information on the behavior of the dependent variable are incorporated into the model as predictor or explanatory variables. These variables are called the independent variables and are denoted by X, with subscripts as needed to identify different independent variables. Additional subscripts denote the observational unit from which the data were taken. The Xs are assumed to be known constants. In addition to the Xs, all models involve unknown constants, called parameters, which control the behavior of the model. These parameters are denoted by Greek letters and are to be estimated from the data (Christopher J. Nachtsheim, 2004).

The mathematical complexity of the model, and the degree to which it is realistic, depend on how much is known about the process being studied and on the purpose of the modeling exercise. In preliminary studies of a process, or in cases where prediction is the primary objective, the models usually fall into the class of models that are linear in the parameters. That is, the parameters enter the model as simple coefficients on the independent variables or on functions of the independent variables. Such models are referred to loosely as linear models (Christopher J. Nachtsheim, 2004).

Regression is a statistical tool used in addressing a variety of research hypotheses. It has the potential to predict a particular outcome. It provides information about the set of data and the contribution of each variable in the analysis. It is usually used as a control tool when exploring the prediction of a model (Tabachnick and Fidell, 1996).
Regression modeling is the process of constructing forecasting models, based on the relationship between a dependent variable and independent variables, in order to forecast future values. Regression modeling is a kind of multifactor forecasting. The basis of regression modeling is the construction of regression models (Tetyana Kuzhda, 2012).

Regression models are used to predict one variable from one or more other variables. They provide the scientist with a powerful tool, allowing predictions about future events to be made with information about past or present events. In order to construct a regression model, both the information which is going to be used to make the prediction and the information which is to be predicted must be obtained from a sample of objects or individuals. The relationship between the two pieces of information is then modeled with a linear transformation. In the future, only the first piece of information is necessary, and the regression model is used to transform this information into the predicted value. In other words, it is necessary to have information on both variables before the model can be constructed. Regression models are one of the most famous examples
of economic and statistical models used in the forecasting of socio-economic processes (Tetyana Kuzhda, 2012).

Linear regression is an approach to modeling the relationship between one or more independent variables (X) and a single dependent variable (Y). The case of one explanatory variable is called a simple regression model. The case of more than one explanatory variable is called a multiple regression model (Robert Mills, 2011).

Linear regressions are designed to measure one specific type of relationship between variables: those that take a linear form. The theoretical assumption is that for every one-unit change in the independent variable, there will be a consistent and uniform change in the dependent variable. Perhaps one reason why linear regression is so popular is that this is a fairly easy way to conceive of social behavior: if more of one thing is added, the other thing will increase or decrease proportionately. Many relationships do operate this way (Michael H. Kutner, 2004).

2. STATEMENT OF THE PROBLEM
Modeling the relationship between two variables with simple linear regression involves a situation in which there is one independent variable. The problem addressed in this research work is modeling the expected arrival rate of bank customers (whose observed arrival rate is assumed to follow a Poisson distribution) using a simple linear regression model.

3. AIM AND OBJECTIVES OF THE STUDY
The aim of this study is to model the expected arrival rate of bank customers using a simple linear regression model. The objectives are:
(i) To determine the strength of the relationship between the observed and expected arrival rates of customers.
(ii) To determine the proportion of variation in the expected arrival rate that is explained by the observed arrival rate.
(iii) To determine the impact of the observed arrival rate on the expected arrival rate.
(iv) To determine whether the observed arrival rate exerts a significant influence on the expected arrival rate.

4. SCOPE OF THE STUDY
This study covers data on customers' arrival rate at First Bank Nigeria Plc, located at Panseke area of Onikolobo, Abeokuta, Ogun State, between 9:00 a.m. and 2:00 p.m. at intervals of 5 minutes. The data collected are primary in nature.

5. LITERATURE REVIEW
Econometrics is concerned with model building. An intriguing point at which to begin the inquiry is to consider the question, "What is the model?" The statement of a model typically begins with an observation or a proposition that one variable is caused by another, or varies with another, or with some qualitative statement about a relationship between a variable and one or more covariates that are expected to be related to the variable of interest. The model might make a broad statement about behavior, such as the suggestion that individuals' usage of the health care system depends on, for example, perceived health status, demographics such as income, age, and education, and the amount and type of insurance they have. It might come in the form of a verbal proposition, or even a picture such as a flowchart or path diagram that suggests directions of influence.

The econometric model rarely springs forth in full bloom as a set of equations. Rather, it begins with an idea of some kind of relationship. The natural next step for the econometrician is to translate that idea into a set of equations, with a notion that some feature of that set of equations will answer interesting questions about the variable of interest. To continue our example, a more definite statement of the relationship between insurance and health care demanded might be able to answer the question: how does health care system utilization depend on insurance coverage? Specifically, is the relationship positive (all else equal, is an insured consumer more likely to demand more health care?) or is it negative? And, ultimately, one might be interested in a more precise statement: how much more (or less)?
From a purely statistical point of view, the researcher might have in mind a variable, y, broadly "demand for health care, H", and another variable, x, "income, I" (Greene, 2010).

The bivariate regression model
The bivariate regression model is also known as the simple regression model. It is a statistical tool that estimates the relationship between a dependent variable (y) and a single independent variable (x). The dependent variable is the variable which we want to forecast (Burç Ülengin, 2011).

A simple linear regression model is a mathematical relationship between two quantitative variables, one of which, y, is the variable we want to predict, using information on the second variable, x, which is assumed to be non-random. Simple linear regression is the most commonly used technique for determining how one variable of interest (the response variable) is affected by changes in another variable (the explanatory variable). The terms "response" and "explanatory" mean the same thing as "dependent" and "independent", but the former terminology is preferred because the "independent" variable may actually be interdependent with many
other variables as well. Simple linear regression is used for three main purposes:
1. To describe the linear dependence of one variable on another.
2. To predict values of one variable from values of another, for which more data are available.
3. To correct for the linear dependence of one variable on another, in order to clarify other features of its variability.

Any line fitted through a cloud of data will deviate from each data point to a greater or lesser degree. The vertical distance between a data point and the fitted line is termed a "residual". This distance is a measure of prediction error, in the sense that it is the discrepancy between the actual value of the response variable and the value predicted by the line. Linear regression determines the best-fit line through a scatter plot of data, such that the sum of squared residuals is minimized; equivalently, it minimizes the error variance. The fit is "best" in precisely that sense: the sum of squared errors is as small as possible. That is why it is also termed "Ordinary Least Squares (OLS)" regression (Mosteller and Tukey, 1977).

The simple linear regression model, in which there is only one explanatory variable on the right-hand side of the regression equation, is written as:
General form: $y = \beta_0 + \beta_1 x + \varepsilon$
where $\varepsilon$ is the random disturbance, which means that for a given x, y can take different values. The objective is to estimate $\beta_0$ and $\beta_1$ in such a way that the fitted values are as close as possible to the observed values (Burç Ülengin, 2011).

The classical assumptions
Burç Ülengin (2011) highlighted seven classical assumptions of the simple linear regression model.
1. The regression model is linear in the coefficients, correctly specified, and has an additive error term.
2. $E(\varepsilon) = 0$.
3. All explanatory variables are uncorrelated with the error term.
4. Errors corresponding to different observations are uncorrelated with each other.
5. The error term has a constant variance.
6. No explanatory variable is an exact linear function of any other explanatory variable(s).
7. The error term is normally distributed such that: $\varepsilon_i \sim N(0, \sigma^2)$.

Specific form: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
where $y_i$ is the dependent variable, $x_i$ is the independent variable, and $\varepsilon_i$ is the random disturbance.
The regression model is indeed a line equation. The unknown parameters $\beta_0$ and $\beta_1$ are the intercept and slope of the regression function. We refer to them as population parameters (Joshua Sherman, 2003). y is called the response variable or the dependent variable (because its value depends to some extent on the value of x). x is called the predictor (because it is used to predict y) or explanatory variable (because it explains the variation or changes in y). It is also called the independent variable (because its value does not depend on y). The parameters of the true regression line are the constants $\beta_0$ and $\beta_1$:
$\beta_0$ = intercept, which tells us the value of y when x = 0.
$\beta_1$ = slope coefficient, which tells us the rate of change in y per unit change in x.

Best fit estimates
In practice, the econometrician will possess a sample of y values corresponding to some fixed x values rather than data from the entire population of values. Therefore the econometrician will never truly know the values of $\beta_0$ and $\beta_1$ (Joshua Sherman, 2003). However, we may estimate these parameters. We will denote these estimators as $\hat{\beta}_0$ and $\hat{\beta}_1$. So how shall we find $\hat{\beta}_0$ and $\hat{\beta}_1$? We need a method or rule for estimating the population parameters using sample data. The most widely used rule is the method of least squares, or ordinary least squares (OLS). According to this principle, a line is fitted to the data that renders the sum of the squares of the vertical distances from each data point to the line as small as possible (Joshua Sherman, 2003). Therefore the fitted line may be written as:
$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ (1)
The vertical distances from the fitted line to each point are the least squares residuals, $e_i$. They are given by:
$e_i = y_i - \hat{y}_i$ (2)
where $\hat{y}_i$ is the predicted value of $y_i$. Mathematically, we want to find $\hat{\beta}_0$ and $\hat{\beta}_1$ such that the sum of the squared vertical distances from the data points to the line is minimized.
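The least-squares objective just stated can be illustrated numerically. The following is a minimal sketch, not part of the original study: the five data points are hypothetical, the helper name `sum_squared_residuals` is introduced here for illustration, and NumPy's `polyfit` is used as a stand-in for the least-squares fit. It shows that moving the intercept or the slope away from the fitted values increases the sum of squared residuals.

```python
import numpy as np

def sum_squared_residuals(b0, b1, x, y):
    """Sum of squared vertical distances between the points and the line b0 + b1*x."""
    residuals = y - (b0 + b1 * x)
    return float(np.sum(residuals ** 2))

# Hypothetical illustrative data (not the field data of this study).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

# np.polyfit with degree 1 returns the least-squares slope first, then the intercept.
b1_hat, b0_hat = np.polyfit(x, y, 1)

s_min = sum_squared_residuals(b0_hat, b1_hat, x, y)
s_shifted = sum_squared_residuals(b0_hat + 0.1, b1_hat, x, y)  # perturbed intercept
s_tilted = sum_squared_residuals(b0_hat, b1_hat + 0.1, x, y)   # perturbed slope

print(f"fitted line: y = {b0_hat:.3f} + {b1_hat:.3f} x")
print(f"S at the fit: {s_min:.4f}; shifted: {s_shifted:.4f}; tilted: {s_tilted:.4f}")
```

Any other perturbation of the two coefficients would likewise give a larger sum of squares, which is the sense in which the fitted line is "best".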
Squaring the residuals gives
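$e_i^2 = (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$ (3)
The sum of the squares of the residuals gives
$\sum e_i^2 = \sum (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$ (4)
Let $S$ represent the sum of squares of the residuals, so
$S = \sum (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$ (5)
We are to minimize $S$ with respect to $\hat{\beta}_0$ and $\hat{\beta}_1$. This is achieved by differentiating $S$ and equating the derivative to zero. Thus
$\frac{\partial S}{\partial \hat{\beta}_0} = -2 \sum (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$ (6)
Or
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ (7)
Also
$\frac{\partial S}{\partial \hat{\beta}_1} = -2 \sum x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$ (8)
Or
$\sum x_i y_i - \hat{\beta}_0 \sum x_i - \hat{\beta}_1 \sum x_i^2 = 0$ (9)
On substituting (7) into (9) we have
$\hat{\beta}_1 = \frac{\sum x_i y_i - n \bar{x} \bar{y}}{\sum x_i^2 - n \bar{x}^2}$ (10)
where $\bar{y}$ and $\bar{x}$ are the sample means of the observations on y and x, and n is the number of observations.

OLS and the true parameter values
The OLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ are related to $\beta_0$ and $\beta_1$ thus (Joshua Sherman, 2003):
1. If assumptions 1 and 2 from earlier hold, then $E(\hat{\beta}_0) = \beta_0$ and $E(\hat{\beta}_1) = \beta_1$. That is, if we were able to take repeated samples, the expected value of the estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ would equal the true parameter values $\beta_0$ and $\beta_1$.
2. When the expected value of any estimator of a parameter equals the true parameter value, then that estimator is unbiased.
So the idea behind OLS is that if we are dealing with an instance in which certain assumptions hold, the expected value of the estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ will equal the true parameter values $\beta_0$ and $\beta_1$ (Joshua Sherman, 2003).

Poisson process
A Poisson process is a specific counting process. Let N(t) be a counting process. That is, N(t) is the number of occurrences (or arrivals, or events) of some process over the time interval [0, t]. N(t) looks like a step function.
Examples: N(t) could be any of the following.
(a) Cars entering a shopping center (time).
(b) Defects on a wire (length).
(c) Raisins in cookie dough (volume).
Let λ > 0 be the average number of occurrences per unit time (or length or volume). In the above examples, we might have:
(a) λ = 10/min.
(b) λ = 0.5/ft.
(c) λ = 4/in³.
First, some notation: o(h) is a generic function that goes to zero faster than h goes to zero.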
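The o(h) notation can be made concrete with a small numerical check. The sketch below is illustrative only and rests on one assumption not stated in the text so far: that the number of events in an interval of length h is Poisson with mean λh, so that the probability of two or more events is 1 − e^{−λh}(1 + λh). The function name `prob_two_or_more` and the value λ = 10 are hypothetical choices. If this probability is o(h), its ratio to h should shrink as h shrinks.

```python
import math

def prob_two_or_more(lam, h):
    """P(at least 2 events in an interval of length h), assuming a Poisson count with mean lam*h."""
    m = lam * h
    return 1.0 - math.exp(-m) * (1.0 + m)

lam = 10.0                       # hypothetical rate, e.g. 10 arrivals per minute
hs = [1e-1, 1e-2, 1e-3, 1e-4]    # progressively shorter intervals

# For an o(h) function, g(h)/h tends to zero as h tends to zero.
ratios = [prob_two_or_more(lam, h) / h for h in hs]

for h, r in zip(hs, ratios):
    print(f"h = {h:g}: P(2 or more events) / h = {r:.6f}")
```

The ratio falls by roughly a factor of ten each time h does, which is the behaviour the o(h) term is meant to capture.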
A Poisson process is one that satisfies the following assumptions:
1. There is a short enough interval of time, say of length h, such that, for all t,
Pr(N(t + h) − N(t) = 0) = 1 − λh + o(h)
Pr(N(t + h) − N(t) = 1) = λh + o(h)
Pr(N(t + h) − N(t) ≥ 2) = o(h)
That is, arrivals basically occur one at a time, and at rate λ per unit time. (We must make sure that λ doesn't change over time.)
2. If $t_1 < t_2 < t_3 < t_4$, then $N(t_2) - N(t_1)$ and $N(t_4) - N(t_3)$ are independent random variables. That is, the numbers of arrivals in two disjoint time intervals are independent.

The Poisson distribution
Many experimental situations occur in which we observe the counts of events within a set unit of time, area, volume, length, etc. For example:
- The number of cases of a disease in different towns
- The number of mutations in set-sized regions of a chromosome
- The number of dolphin pod sightings along a flight path through a region
- The number of particles emitted by a radioactive source in a given time
- The number of births per hour during a given day
In such situations we are often interested in whether the events occur randomly in time or space. The Poisson distribution plays a key role in modelling such problems. The Poisson distribution is a discrete probability distribution for the counts of events that occur randomly in a given interval of time or space (Jonathan Marchini, 2008).

Suppose we are given an interval (this could be time, length, area or volume) and we are interested in the number of successes in that interval. Assume that the interval can be divided into very small subintervals such that:
1. the probability of more than one success in any subinterval is zero;
2. the probability of one success in a subinterval is constant for all subintervals and is proportional to its length;
3. subintervals are independent of each other.
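The three subinterval conditions above can be simulated directly. The following is a rough sketch under assumed values (a mean of 3 events per interval, 500 subintervals, 2000 simulated intervals; none of these come from the study): each subinterval holds at most one success, with the same small probability, independently of the others. For a Poisson count the mean and variance are equal, so both sample statistics should come out close to the assumed mean.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is reproducible

lam = 3.0      # hypothetical mean number of events per interval
n_sub = 500    # number of small subintervals per interval
trials = 2000  # number of simulated intervals

# Conditions 1-3: at most one success per subinterval, constant success
# probability lam / n_sub, and independence between subintervals.
p = lam / n_sub
events = rng.random((trials, n_sub)) < p
counts = events.sum(axis=1)      # number of successes in each simulated interval

mean_count = counts.mean()
var_count = counts.var()

print(f"sample mean = {mean_count:.3f}, sample variance = {var_count:.3f}, lambda = {lam}")
```

Increasing `n_sub` makes the subintervals smaller and brings the simulated counts closer to the limiting Poisson behaviour.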
If we let X = the number of events in a given interval, then, if the mean number of events per interval is λ, the probability of observing x events in a given interval is given by
$P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots$ (11)
If the probabilities of X are distributed in this way, we write
$X \sim \mathrm{Po}(\lambda)$ (12)
λ is the parameter of the distribution. We say X follows a Poisson distribution with parameter λ.

Using the Poisson to approximate the Binomial
The Binomial and Poisson distributions are both discrete probability distributions. In some circumstances the distributions are very similar (Jonathan Marchini, 2008). In general, if n is large (say > 50) and p is small (say < 0.1), then a Bin(n, p) can be approximated with a Po(λ), where λ = np. The idea of using one distribution to approximate another is widespread throughout statistics. In many situations it is extremely difficult to use the exact distribution, and so approximations are very useful.

6. MATERIALS & METHODS
Research design
This research was designed to model the expected arrival rate of bank customers (whose observed arrival rate is assumed to follow a Poisson distribution) using a simple linear regression model. Primary data on customers' arrival rate at First Bank Nigeria Plc, located at Panseke area of Onikolobo, Abeokuta, Ogun State, were collected between 9:00 a.m. and 2:00 p.m. at intervals of 5 minutes. The data were analyzed electronically using MS Excel 2007 and SPSS.

Techniques of data analysis
The data analysis techniques employed are Simple Linear Regression, Correlation (R) and the Coefficient of Determination (R²).

Method of data analysis
In analyzing the data with simple linear regression, customers' observed arrival rate represents the independent variable while customers' expected arrival rate represents the dependent variable. Data for the independent variable were collected directly (primary data), while data for the dependent variable were estimated as:
$E_i = N \cdot P(x_i)$ (13)
where
$N$ is the number of periods (5-minute intervals), or the total observed frequency.
$x_i$ is the number of arrivals per period (i.e., during the 1st, 2nd, ..., Nth intervals).
$P(x_i)$ is the probability of $x_i$ arrivals during the 1st, 2nd, ..., Nth intervals. $P(x_i)$ is assumed to follow a Poisson distribution and is defined as:
$P(x_i) = \frac{e^{-\lambda} \lambda^{x_i}}{x_i!}$
$\lambda$ is the average arrival per period and is defined as:
$\lambda = \frac{\sum f_i x_i}{N}$ (14)
where $f_i$ is the observed frequency and $\sum f_i x_i$ is the total number of arrivals for the N periods.
The expected arrival rate ($E_i$) is then regressed on the observed arrival rate ($f_i$) as:
$E_i = a + b f_i$ (15)
$a$ is the intercept, which implies the expected arrivals when $f_i = 0$.
$b$ is the slope, which implies that for every unit increase in observed arrival rate the expected arrival rate will increase by $b$.

7. RESULT
TABLE 1: Estimation of Expected Arrival Rate
[table values not recoverable]

TABLE 2: Model Summary (b)
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
a. Predictors: (Constant), Observed Arrival Rate
b. Dependent Variable: Expected Arrival Rate
TABLE 3: ANOVA (a)
Model 1 | Sum of Squares | df | Mean Square | F | Sig.
Rows: Regression (b), Residual, Total
a. Dependent Variable: Expected Arrival Rate
b. Predictors: (Constant), Observed Arrival Rate

TABLE 4: Coefficients (a)
Model 1 | Unstandardized Coefficients (B, Std. Error) | Standardized Coefficients (Beta) | t | Sig.
Rows: (Constant), Observed Arrival Rate
a. Dependent Variable: Expected Arrival Rate

The regression model is thus of the form $\hat{E}_i = a + b f_i$, with the estimates of a and b given in Table 4.

8. DISCUSSION OF RESULTS
The value of λ implies that we expect approximately 31 customers to arrive at the banking hall every 5 minutes. The minimum and maximum observed frequencies are respectively 0 and 7, while the minimum and maximum expected frequencies are respectively 0 and 4. The correlation (R) value indicates that there is a strong positive but imperfect correlation between the observed and expected arrival rates of customers. The coefficient of determination (R²) value indicates that approximately 51.3% of the variation in the expected arrival rate is explained by the observed arrival rate. The regression coefficient a indicates that when the observed arrival rate is zero, the expected arrival rate is 0.43, which is approximately zero, and b implies that for every unit increase in observed arrival rate, the expected arrival rate will increase by approximately 1. The observed arrival rate has a positive impact on the expected arrival rate, with a positive t-value. The sig-value indicates that the observed arrival rate exerts a significant influence on the expected arrival rate.

9. CONCLUSIONS
We expect approximately 31 customers to arrive at the banking hall of First Bank Nigeria Plc, Panseke Branch of Abeokuta, Ogun State, every 5 minutes. The relationship between the observed and expected customers' arrival rates is positive and strong. An increase in the observed arrival rate will lead to a corresponding, approximately equal increase in the expected arrival rate.
The observed arrival rate accounts for approximately half of the variation in the expected arrival rate of the customers.

10. RECOMMENDATIONS
1. An average arrival rate of 31 customers per 5 minutes is high. The bank should therefore target a corresponding average service rate of 31 or more customers every 5 minutes in order to avoid congestion from the arrival rate.
2. In case of anticipated congestion, the bank management should try as much as possible to decentralize their services to increase their service rate.

REFERENCES
[1] Burç Ülengin (2011). Forecasting With Regression Models. ITU Management Engineering Faculty.
[2] Greene (2010). The Linear Regression Model.
[3] Jonathan Marchini (2008). The Poisson Distribution.
[4] Joshua Sherman (2003). The Linear Regression Model.
[5] Kuzhda, T. (2012). Retail Sales Forecasting with Application. The multiple regression. Sotsial'no ekonomichni problemy i derzhava-socio-economic Problems and the State [online]. 6 (1), p [Accessed May 2012]. Available from: < tibrm.pdf>.
[6] Michael H. Kutner, John Neter, Christopher J. Nachtsheim, William Wasserman (2004). Applied Linear Statistical Models.
[7] Mosteller, F. and J. W. Tukey (1977). Data Analysis
and Regression. 588 pp., Addison-Wesley.
[8] Robert M. (2011). Regression Analysis Application in Litigation. Microeconomics, Inc.
[9] Tabachnick and Fidell (1996). Multiple Linear Regression.

APPENDIX
TABLE 1: Data on the number of arrivals ($x_i$) per period and observed frequency ($f_i$)
[table values not recoverable]
Source: Field Survey (2015)
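The procedure described under "Method of data analysis" (estimate the average arrival per period, compute Poisson expected frequencies, then regress expected on observed) can be sketched in Python as follows. This is an illustrative sketch only: the frequency table is hypothetical and is not the field survey data, the symbols `x`, `f`, `N`, `lam`, `a` and `b` follow the notation assumed for equations (13) to (15), and NumPy stands in for the Excel and SPSS steps used in the study.

```python
import math
import numpy as np

# Hypothetical frequency table (NOT the field survey data):
# x = number of arrivals per 5-minute period, f = number of periods with x arrivals.
x = np.array([0, 1, 2, 3, 4, 5, 6, 7])
f = np.array([2, 6, 10, 9, 6, 4, 2, 1])

N = int(f.sum())                  # total observed frequency (number of periods)
lam = float((f * x).sum()) / N    # average arrivals per period, as in (14)

# Poisson probability of each arrival count, then expected frequency as in (13).
p = np.array([math.exp(-lam) * lam ** int(k) / math.factorial(int(k)) for k in x])
expected = N * p

# Regress expected on observed frequency, as in (15): expected = a + b * f.
b, a = np.polyfit(f, expected, 1)

# Correlation (R) and coefficient of determination (R squared).
r = float(np.corrcoef(f, expected)[0, 1])
r2 = r ** 2

print(f"N = {N}, lambda = {lam:.3f}")
print(f"a = {a:.3f}, b = {b:.3f}, R = {r:.3f}, R^2 = {r2:.3f}")
```

With the study's own frequency table substituted for the hypothetical one, the printed values would play the role of the quantities reported in Tables 2 and 4.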
Regression Analysis (Spring, 2000)
Regression Analysis (Spring, 2000) By Wonjae Purposes: a. Explaining the relationship between Y and X variables with a model (Explain a variable Y in terms of Xs) b. Estimating and testing the intensity
A Primer on Forecasting Business Performance
A Primer on Forecasting Business Performance There are two common approaches to forecasting: qualitative and quantitative. Qualitative forecasting methods are important when historical data is not available.
Premaster Statistics Tutorial 4 Full solutions
Premaster Statistics Tutorial 4 Full solutions Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for
Statistics 104: Section 6!
Page 1 Statistics 104: Section 6! TF: Deirdre (say: Dear-dra) Bloome Email: [email protected] Section Times Thursday 2pm-3pm in SC 109, Thursday 5pm-6pm in SC 705 Office Hours: Thursday 6pm-7pm SC
IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results
IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the
The Basic Two-Level Regression Model
2 The Basic Two-Level Regression Model The multilevel regression model has become known in the research literature under a variety of names, such as random coefficient model (de Leeuw & Kreft, 1986; Longford,
Part 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
12.5: CHI-SQUARE GOODNESS OF FIT TESTS
125: Chi-Square Goodness of Fit Tests CD12-1 125: CHI-SQUARE GOODNESS OF FIT TESTS In this section, the χ 2 distribution is used for testing the goodness of fit of a set of data to a specific probability
STATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
Example: Boats and Manatees
Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant
Time series Forecasting using Holt-Winters Exponential Smoothing
Time series Forecasting using Holt-Winters Exponential Smoothing Prajakta S. Kalekar(04329008) Kanwal Rekhi School of Information Technology Under the guidance of Prof. Bernard December 6, 2004 Abstract
LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE
LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE MAT 119 STATISTICS AND ELEMENTARY ALGEBRA 5 Lecture Hours, 2 Lab Hours, 3 Credits Pre-
MULTIPLE REGRESSION WITH CATEGORICAL DATA
DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting
Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.
Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x
Multiple Regression. Page 24
Multiple Regression Multiple regression is an extension of simple (bi-variate) regression. The goal of multiple regression is to enable a researcher to assess the relationship between a dependent (predicted)
4. Simple regression. QBUS6840 Predictive Analytics. https://www.otexts.org/fpp/4
4. Simple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/4 Outline The simple linear model Least squares estimation Forecasting with regression Non-linear functional forms Regression
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
An analysis appropriate for a quantitative outcome and a single quantitative explanatory. 9.1 The model behind linear regression
Chapter 9 Simple Linear Regression An analysis appropriate for a quantitative outcome and a single quantitative explanatory variable. 9.1 The model behind linear regression When we are examining the relationship
Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression
Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction
Multiple Linear Regression in Data Mining
Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple
Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.
SPSS Regressions Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. [email protected] Course Objective This course is designed
MULTIPLE REGRESSION ANALYSIS OF MAIN ECONOMIC INDICATORS IN TOURISM. R, analysis of variance, Student test, multivariate analysis
Journal of tourism [No. 8] MULTIPLE REGRESSION ANALYSIS OF MAIN ECONOMIC INDICATORS IN TOURISM Assistant Ph.D. Erika KULCSÁR Babeş Bolyai University of Cluj Napoca, Romania Abstract This paper analysis
2. Simple Linear Regression
Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma [email protected] The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study
A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study But I will offer a review, with a focus on issues which arise in finance 1 TYPES OF FINANCIAL
Least Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
Fairfield Public Schools
Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity
Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS
Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple
business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar
business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel
16 : Demand Forecasting
16 : Demand Forecasting 1 Session Outline Demand Forecasting Subjective methods can be used only when past data is not available. When past data is available, it is advisable that firms should use statistical
Exercise 1.12 (Pg. 22-23)
Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.
PELLISSIPPI STATE COMMUNITY COLLEGE MASTER SYLLABUS INTRODUCTION TO STATISTICS MATH 2050
PELLISSIPPI STATE COMMUNITY COLLEGE MASTER SYLLABUS INTRODUCTION TO STATISTICS MATH 2050 Class Hours: 2.0 Credit Hours: 3.0 Laboratory Hours: 2.0 Date Revised: Fall 2013 Catalog Course Description: Descriptive
The importance of graphing the data: Anscombe s regression examples
The importance of graphing the data: Anscombe s regression examples Bruce Weaver Northern Health Research Conference Nipissing University, North Bay May 30-31, 2008 B. Weaver, NHRC 2008 1 The Objective
Calculating the Probability of Returning a Loan with Binary Probability Models
Calculating the Probability of Returning a Loan with Binary Probability Models Associate Professor PhD Julian VASILEV (e-mail: [email protected]) Varna University of Economics, Bulgaria ABSTRACT The
Practice Problems #4
Practice Problems #4 PRACTICE PROBLEMS FOR HOMEWORK 4 (1) Read section 2.5 of the text. (2) Solve the practice problems below. (3) Open Homework Assignment #4, solve the problems, and submit multiple-choice
Doing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices:
Doing Multiple Regression with SPSS Multiple Regression for Data Already in Data Editor Next we want to specify a multiple regression analysis for these data. The menu bar for SPSS offers several options:
Chapter 23. Inferences for Regression
Chapter 23. Inferences for Regression Topics covered in this chapter: Simple Linear Regression Simple Linear Regression Example 23.1: Crying and IQ The Problem: Infants who cry easily may be more easily
X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)
CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.
Chapter 4 Lecture Notes
Chapter 4 Lecture Notes Random Variables October 27, 2015 1 Section 4.1 Random Variables A random variable is typically a real-valued function defined on the sample space of some experiment. For instance,
CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression
Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the
STAT 350 Practice Final Exam Solution (Spring 2015)
PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects
GLM I An Introduction to Generalized Linear Models
GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial
Marketing Mix Modelling and Big Data P. M Cain
1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored
Multiple Regression Analysis A Case Study
Multiple Regression Analysis A Case Study Case Study Method 1 The first step in a case study analysis involves research into the subject property and a determination of the key factors that impact that
ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? 1. INTRODUCTION
ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? SAMUEL H. COX AND YIJIA LIN ABSTRACT. We devise an approach, using tobit models for modeling annuity lapse rates. The approach is based on data provided
5. Multiple regression
5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful
Multiple Linear Regression
Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is
Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010
Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different
Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011
Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this
Moderator and Mediator Analysis
Moderator and Mediator Analysis Seminar General Statistics Marijtje van Duijn October 8, Overview What is moderation and mediation? What is their relation to statistical concepts? Example(s) October 8,
Final Exam Practice Problem Answers
Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal
MAT 155. Key Concept. September 27, 2010. 155S5.5_3 Poisson Probability Distributions. Chapter 5 Probability Distributions
MAT 155 Dr. Claude Moore Cape Fear Community College Chapter 5 Probability Distributions 5 1 Review and Preview 5 2 Random Variables 5 3 Binomial Probability Distributions 5 4 Mean, Variance and Standard
List of Examples. Examples 319
Examples 319 List of Examples DiMaggio and Mantle. 6 Weed seeds. 6, 23, 37, 38 Vole reproduction. 7, 24, 37 Wooly bear caterpillar cocoons. 7 Homophone confusion and Alzheimer s disease. 8 Gear tooth strength.
Description. Textbook. Grading. Objective
EC151.02 Statistics for Business and Economics (MWF 8:00-8:50) Instructor: Chiu Yu Ko Office: 462D, 21 Campenalla Way Phone: 2-6093 Email: [email protected] Office Hours: by appointment Description This course
Week 5: Multiple Linear Regression
BUS41100 Applied Regression Analysis Week 5: Multiple Linear Regression Parameter estimation and inference, forecasting, diagnostics, dummy variables Robert B. Gramacy The University of Chicago Booth School
Introduction to time series analysis
Introduction to time series analysis Margherita Gerolimetto November 3, 2010 1 What is a time series? A time series is a collection of observations ordered following a parameter that for us is time. Examples
Correlation and Regression
Correlation and Regression Scatterplots Correlation Explanatory and response variables Simple linear regression General Principles of Data Analysis First plot the data, then add numerical summaries Look
Example G Cost of construction of nuclear power plants
1 Example G Cost of construction of nuclear power plants Description of data Table G.1 gives data, reproduced by permission of the Rand Corporation, from a report (Mooz, 1978) on 32 light water reactor
Logs Transformation in a Regression Equation
Fall, 2001 1 Logs as the Predictor Logs Transformation in a Regression Equation The interpretation of the slope and intercept in a regression change when the predictor (X) is put on a log scale. In this
Simple Predictive Analytics Curtis Seare
Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use
