Economics 140A
Qualitative Dependent Variables

With each of the classic assumptions covered, we turn our attention to extensions of the classic linear regression model. Our first extension is to data in which the dependent variable is limited. Dependent variables that are limited in some way are common in economics, but not all require special treatment. We have seen examples with wages, income, or consumption as the dependent variable - all of these must be positive. As these strictly positive variables take numerous values, we found the log transform to be sufficient. Yet not all restrictions on the dependent variable can be handled so easily. If we model individual choice, the optimal behavior of individuals often places a sizable fraction of the population at a corner solution. For example, a sizable fraction of working-age adults do not work outside the home, so the distribution of hours worked has a sizable pile-up at zero. If we fit a linear conditional mean, we will likely predict negative hours worked for some individuals. The log transform used for wages will not work, as the log of zero is undefined. Another issue arises with sample selection: it may well be the case that E(Y|X) is linear, but nonrandom sampling requires more detailed inference. Finally, a host of other data issues may arise: linear conditional mean functions that switch over regimes, data recorded as counts, or analysis of durations between events. As we will see, even if only a finite number of values are possible, a linear model for E(Y|X) may still be appropriate. While all these issues may arise, we focus on perhaps the most common restriction, in which the dependent variable is qualitative in nature and so takes discrete values. For this reason, such models are also termed discrete dependent variable models or (less frequently) dummy dependent variable models.
As we recall from our discussion of qualitative regressors, qualitative variables capture the presence or absence of some non-numeric quantity. For example, in studying home ownership the dependent variable is often

Y_t = 1 if household t owns their home, and Y_t = 0 otherwise.

Many qualitative variables take more than two values. For example, in studies of employment dynamics the dependent variable can take three values:

Y_t = 1 if individual t is employed,
Y_t = 0 if individual t is unemployed but seeking employment,
Y_t = -1 if individual t is not in the labor force (i.e. not seeking employment).

We focus attention on qualitative dependent variables that take only two values and, for ease, set these values to 0 and 1. In binary response models, interest is primarily in the response probability

p(X) ≡ P(Y = 1|X) = P(Y = 1|X_1, ..., X_K),

for various values of X. For a continuous regressor X_j, the partial effect of X_j on the response probability is ∂P(Y = 1|X)/∂X_j. When multiplied by ΔX_j (for small ΔX_j), the partial effect yields the approximate change in P(Y = 1|X) when X_j increases by ΔX_j, holding all other regressors constant. For a discrete regressor X_K, the partial effect is

P(Y = 1|x_1, ..., x_{K-1}, X_K = 1) − P(Y = 1|x_1, ..., x_{K-1}, X_K = 0).

Perhaps the most natural extension of the classic linear regression model is to leave the structure of the population model unchanged, so that

Y_t = X_t β + U_t.

How does the presence of a qualitative dependent variable affect our analysis? In quite substantial ways. Consider the familiar relation E(Y_t|X) = X_t β. Because Y_t takes only the values 0 and 1,

E(Y_t|X) = P(Y_t = 1|X) ≡ p(X) = X_t β,

and so the conditional mean is a probability and the model is termed the linear probability model. The coefficient β_1 is interpreted as the effect of a one-unit change in X_t on the probability that Y_t = 1. Similarly, if X_t is a binary regressor, then β_1 captures the effect of moving from X_t = 0 to X_t = 1. As there is no reason to believe that the conditional mean X_t β will remain between 0 and 1 as X_t varies, the equality between the conditional mean and P(Y_t = 1) is a substantial drawback of the linear probability model. As one can deduce, it is hard to fit points that are clustered at 0 and 1 (on the y-axis) with a straight line, so the R-squared measure is not reliable. (Draw a graph with the points clustered at 0 and 1 on the y-axis and a straight line attempting to fit them.)
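As a rough sketch of the mechanics (simulated data, not a course dataset), the linear probability model can be fit by OLS with White's heteroskedasticity-robust standard errors:

```python
import numpy as np

# Minimal sketch: fit a linear probability model by OLS and compute
# heteroskedasticity-robust (White) standard errors. Data are simulated.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
p = np.clip(0.5 + 0.2 * x, 0.05, 0.95)        # true response probability
y = (rng.uniform(size=n) < p).astype(float)   # binary outcome

X = np.column_stack([np.ones(n), x])          # intercept + regressor
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y                      # OLS estimate of the LPM
u = y - X @ beta                              # residuals

# White robust covariance: (X'X)^{-1} X' diag(u^2) X (X'X)^{-1}
meat = X.T @ (X * (u ** 2)[:, None])
V_robust = XtX_inv @ meat @ XtX_inv
robust_se = np.sqrt(np.diag(V_robust))
```

The slope estimate approximates the partial effect of x on P(Y = 1), and the robust standard errors account for the built-in heteroskedasticity of the LPM error.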

Several other features of the linear probability model are easily obtained. Because Y_t is a Bernoulli random variable, U_t takes only two values:

U_t = 1 − X_t β with probability X_t β,
U_t = −X_t β with probability 1 − X_t β.

From the definition of U_t it is clear that the error has mean 0 and variance

E U_t² = (X_t β)² [1 − X_t β] + [1 − X_t β]² (X_t β) = (X_t β)[1 − X_t β].

Thus the error term is heteroskedastic and two-valued, violating two of the classic assumptions. The OLSE is unbiased and consistent for the linear probability model, although robust standard errors are needed to account for the heteroskedasticity. As an aside, the test statistic for H_0: β_1 = ... = β_K = 0 can be accurately constructed from the OLSE, as under H_0 the error is homoskedastic, with E U_t² = β_0(1 − β_0). To improve on the efficiency of the OLSE, construct the weighted least squares estimator. Let Y_t^P denote the predicted value of the dependent variable constructed from the OLSE. If 0 < Y_t^P < 1 for all t, then form the estimate of the error standard deviation

S_t = [Y_t^P (1 − Y_t^P)]^{1/2}.

The weighted least squares estimator is obtained from the model

Y_t/S_t = β_0 (1/S_t) + β_1 (X_t/S_t) + U_t/S_t.

Again, the reported standard errors are valid, as follows from our earlier treatment of weighted least squares. If Y_t^P ∉ (0, 1) for some t, then WLS is infeasible without an ad hoc adjustment and should not be done. The linear probability model is a convenient approximation and generally gives good estimates of the partial effects on the response probability near the center of the regressor distribution. If one wishes to know the partial effect, averaged over the values of X, then the linear probability model may work well even if it gives poor estimates of the partial effects for extreme values of X.

Example (Married Women's Labor Force Participation). In a survey of 753 women, 428 report working more than zero hours. Also, 606 have no young children while 118 have exactly one young child. The variables in play are
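The two-step WLS procedure above can be sketched as follows (simulated data; the feasibility check mirrors the 0 < Y_t^P < 1 condition):

```python
import numpy as np

# Sketch of the two-step WLS estimator for the linear probability model.
# Step 1: OLS. Step 2: reweight by s_t = sqrt(p_t (1 - p_t)), where p_t is
# the OLS fitted value, provided every fitted value lies strictly in (0, 1).
rng = np.random.default_rng(1)
n = 400
x = rng.uniform(-1, 1, size=n)
p = 0.5 + 0.3 * x
y = (rng.uniform(size=n) < p).astype(float)
X = np.column_stack([np.ones(n), x])

b_ols = np.linalg.solve(X.T @ X, X.T @ y)    # step 1: OLS
p_hat = X @ b_ols                             # fitted probabilities

if ((p_hat > 0) & (p_hat < 1)).all():         # WLS is feasible only then
    s = np.sqrt(p_hat * (1 - p_hat))          # estimated error std. dev.
    Xw, yw = X / s[:, None], y / s            # divide each observation by s_t
    b_wls = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)
else:
    b_wls = b_ols                             # infeasible: keep OLS, no ad hoc fix
```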

inlf     binary; value 1 indicates non-zero working hours
nonwinc  non-wife income, in thousands of dollars
ed       education
ex       experience
age      age in years
k6       number of children less than 6 years old
k+       number of children between 6 and 18, inclusive

The estimated regression is

inlf^P = .586 − .0034 nonwinc + .038 ed + .039 ex − .0006 ex² − .016 age − .262 k6 + .013 k+
ols se:     (.154)  (.0014)  (.007)  (.006)  (.00018)  (.002)  (.034)  (.013)
robust se:  [.151]  [.0015]  [.007]  [.006]  [.00019]  [.002]  [.032]  [.013]
R² = .264

Except for k+, all regressor coefficients have sensible signs and are statistically significant. The regressor k+ is neither statistically significant nor practically important. Also, the OLS and robust standard errors are almost identical!

Interpretation: an increase in non-wife income of $10,000 reduces the probability of participation in the labor force by only .034 (3.4 percentage points). As the sample mean of non-wife income is only $20,129 with a standard deviation of $11,635, a $10,000 increase is quite substantial. Having one more small child appears to be a first-order effect, reducing the probability of being in the labor force by .262 (26.2 percentage points). Finally, of the 753 fitted values, 33 lie outside the unit interval (hence we do not construct WLS estimators).

The case for linear probability models grows stronger if most regressors are discrete and take only a few values, so that there are no extreme values. To understand how to construct a discrete regressor from a continuous regressor, return to the preceding example. Partition the variable k6 into three indicator variables: I_0 = 1 (0 young children), I_1 = 1 (1 young child), and I_2 = 1 (2 or more young children). We replace k6 with (I_1, I_2) to allow the impact of the first young child to differ and obtain the estimated coefficients −.263 for I_1 and −.274 for I_2. It appears that the key impact is from having one young child; additional young children do not change labor force participation much. The use of discrete regressors is familiar to us from our discussion of regressor specification.
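A sketch of the indicator-partition idea on simulated data (hypothetical coefficients, not the survey estimates): because the specification is saturated in the young-children categories, the fitted values are cell averages and the coefficients are differences in cell means.

```python
import numpy as np

# Sketch (simulated data, hypothetical effect sizes): partition a count
# regressor k6 into indicators I1 (exactly one young child) and I2 (two or
# more), so the first child's effect can differ from later children's.
rng = np.random.default_rng(2)
n = 800
k6 = rng.choice([0, 1, 2, 3], size=n, p=[0.8, 0.12, 0.06, 0.02])
p = 0.6 - 0.25 * (k6 >= 1) - 0.02 * np.maximum(k6 - 1, 0)  # first child matters most
y = (rng.uniform(size=n) < p).astype(float)

I1 = (k6 == 1).astype(float)
I2 = (k6 >= 2).astype(float)
X = np.column_stack([np.ones(n), I1, I2])   # I0 is the omitted category
b = np.linalg.solve(X.T @ X, X.T @ y)       # b[1], b[2]: effects vs. no young child
fitted = X @ b                              # equal to cell averages of y
```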
For the model with union-gender interactions, the predicted values correspond to cell averages. When a model has enough indicator variables and interactions to fit cell averages,

the model is saturated. A new fact about saturated models emerges here: if a model is saturated, then the predicted probabilities must lie between 0 and 1 (because they mimic cell averages). For estimates of the partial effects at extreme values of the regressors, we must develop a new framework. To do so, observe that the linear probability model falls within a broader class of models, termed single index models. The term single index arises because the various regressors affect Y_t only through the scalar X_t β, which is the single index. The class of single index models is

Y_t = F(X_t β) + U_t,

where the linear probability model is given by F(X_t β) = X_t β. To overcome the problem that the predictions for Y_t can lie outside the unit interval, we constrain F so that 0 < F(z) < 1 for all z. Given this constraint, a natural choice for F is a cumulative distribution function (although it is not necessary to use a CDF). Index models in which F is a CDF are derived from a latent model. The latent model concerns a variable that underlies the decision and cannot be observed. If Y_t measures whether or not household t owns a home, then the latent variable Y*_t captures the desire of household t to own a home. If the desire is high enough, then household t owns a home. Or, put another way, Y*_t captures the difference in utility between the two options, namely owning and renting, in which case if Y*_t is positive, then the utility from owning exceeds that from renting and household t purchases a home. The (latent) population model that explains the latent variable is

Y*_t = X_t β + V_t,

where {V_t}_{t=1}^n is a sequence of i.i.d. random variables that are symmetric about their mean of 0, with variance σ² and distribution F. (Note: V_t need not be Gaussian.) The latent variable and the observed variable are linked as

Y_t = 1 if Y*_t > 0, and Y_t = 0 otherwise.

From the measurement rule we can see that if we multiply Y*_t by any positive constant, then Y_t is unchanged.
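A quick numeric illustration of this invariance, on simulated data:

```python
import numpy as np

# Sketch: multiplying the latent variable by a positive constant leaves the
# observed binary outcome unchanged, so the scale of the latent model cannot
# be recovered from the data.
rng = np.random.default_rng(6)
ystar = rng.normal(size=1000)               # latent variable Y*_t
y = (ystar > 0).astype(int)                 # observed Y_t
y_scaled = (7.3 * ystar > 0).astype(int)    # arbitrary positive rescaling
```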
As a result, we can only estimate β_0 and β_1 up to a positive multiple (that is, relative to scale). To identify the coefficients, we

set σ = 1. We also see that if the threshold is some c ≠ 0, then we return to a zero threshold simply by subtracting c from β_0. To identify the intercept, we set c = 0. To construct an estimator of the coefficients, note

p(X) = P(Y_t = 1|X) = P(Y*_t > 0|X) = P(X_t β + V_t > 0) = P(V_t > −X_t β) = P(V_t ≤ X_t β) = F(X_t β),

where the final equality in probabilities follows from the symmetry of V_t about 0. The main criticism of the linear probability model has been addressed: because the distribution function is contained in [0, 1], so too is the probability that Y_t equals 1. The presence of the latent model can give one the impression that we are interested in the effect of X on Y*, which is given by β_1. Yet Y* rarely has sensible units of measurement (desire to own a home, or differences in utility), so the magnitude of β_1 is not generally important. Rather, our goal is to explain the effect of X on the response probability p(X). To understand the link, note that if X is a continuous regressor, then

∂p(X)/∂X = f(X β) β_1, where f(z) = dF(z)/dz.

If F is a strictly increasing function (as is true for the Gaussian and logistic CDFs), then f(z) > 0 for all z and the sign of β_1 determines the direction of the effect on the response probability. Observe that the magnitude of the effect depends on the value of the regressor, through f(X β). If the underlying density is unimodal and symmetric about 0, the maximum value of f(z) occurs at z = 0. For the leading cases:

Probit: f(z) = (1/√(2π)) e^{−z²/2}, so f(0) = 1/√(2π) ≈ .3989.
Logit: f(z) = e^{−z}/(1 + e^{−z})², so f(0) = .25.

For the multiple regressor model, relative effects do not vary with X:

(∂p(X)/∂X_i) / (∂p(X)/∂X_j) = β_i / β_j.
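The quoted peak densities are easy to verify numerically:

```python
import math

# Check of the peak-density values quoted above: the standard normal density
# at 0 is 1/sqrt(2*pi) ~ .3989, and the logistic density at 0 is exactly .25.
def probit_density(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def logit_density(z):
    return math.exp(-z) / (1 + math.exp(-z)) ** 2

# Since the partial effect is f(X*beta)*beta_1, it is largest where the
# index X*beta sits at the mode of f, which is zero for both densities.
```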

If X is a discrete regressor, then the impact of an increase from c to c + 1 in X on the response probability is

F(β_0 + β_1 (c + 1)) − F(β_0 + β_1 c).

If X is an indicator regressor, then c = 0.

Example (Effect of Job Training). Let p(X) be the probability of employment and let X_K be an indicator of participation in job training. The direction of the job training effect is the sign of β_K, while the magnitude of the effect differs depending on age, education, and experience (the other included regressors).

Finally, consider the model

X_t β = β_0 + β_1 X_1t + β_2 X²_1t + β_3 ln X_2t.

The partial effect of X_1 on the response probability is

f(X_t β)(β_1 + 2 β_2 X_1t),

so the direction of the partial effect potentially changes at X_1 = −β_1/(2β_2). The partial effect of ln X_2 on the response probability is f(X_t β) β_3. Because d ln X_2 = dX_2/X_2, a 1 percent change in X_2 is approximately a .01 change in ln X_2. Therefore the partial effect of a 1 percent change in X_2 on the response probability is f(X_t β)(.01 β_3).

Given the distributional assumptions on V_t, the maximum likelihood estimator arises naturally. The distribution of Y_1 is used to form the likelihood as

L(β_0, β_1 | Y_1 = y_1, X_1 = x_1) = [F(β_0 + β_1 x_1)]^{y_1} [1 − F(β_0 + β_1 x_1)]^{1−y_1}.

The likelihood for the sample is

L[β_0, β_1 | (Y_1, X_1) = (y_1, x_1), ..., (Y_n, X_n) = (y_n, x_n)] = ∏_{t=1}^n [F(β_0 + β_1 x_t)]^{y_t} [1 − F(β_0 + β_1 x_t)]^{1−y_t}.
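A numeric sketch of these partial-effect formulas with the logistic CDF (illustrative coefficients, not estimates):

```python
import math

# Numeric sketch of the discrete-regressor effect F(b0 + b1*(c+1)) - F(b0 + b1*c)
# and of the 1-percent log-regressor effect, using the logistic CDF.
# The coefficients b0, b1 are illustrative, not estimated values.
def F(z):
    return 1 / (1 + math.exp(-z))

b0, b1 = -1.0, 0.8
effect_0_to_1 = F(b0 + b1 * 1) - F(b0 + b1 * 0)   # indicator switching 0 -> 1

# For a regressor entering in logs, a 1% change moves the log by about .01,
# so the effect on the response probability is f(index) * coef * .01.
f0 = F(b0) * (1 - F(b0))                           # logistic density at the index
pe_one_percent = f0 * b1 * 0.01
```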

The log-likelihood is

ln L(β_0, β_1 | ·) = Σ_{t=1}^n ( y_t ln F(β_0 + β_1 x_t) + (1 − y_t) ln[1 − F(β_0 + β_1 x_t)] ).

(We are able to construct ln L because of the strict inequality 0 < F(z) < 1.) The first-order condition for the estimator of the coefficients is the partial derivative of the log-likelihood with respect to each of the coefficients and is termed the score. The ML estimators B_0 and B_1 are the values that set the scores equal to zero:

∂ ln L(β_0, β_1 | ·)/∂β_i |_{β=B} = Σ_{t=1}^n [ (y_t − F(B_0 + B_1 x_t)) / ( F(B_0 + B_1 x_t)[1 − F(B_0 + B_1 x_t)] ) ] f(B_0 + B_1 x_t) x_{i,t} = 0,

for i = 0, 1 (with x_{0,t} = 1 and x_{1,t} = x_t). The MLE is consistent and asymptotically Gaussian. To determine the covariance matrix of the estimators, we construct the expected value of the Hessian conditional on X. Terms involving U_t = Y_t − F(X_t β) drop out because E(U_t|X) = 0, leaving

−E[ ∂² ln L(β_0, β_1)/∂β ∂β′ | X ] = Σ_{t=1}^n [ f(X_t β)² / ( F(X_t β)[1 − F(X_t β)] ) ] x_t x′_t,

which is a positive semi-definite matrix. The estimator of the asymptotic variance of B is

V = { Σ_{t=1}^n [ f̂_t² / ( F̂_t(1 − F̂_t) ) ] x_t x′_t }^{−1}.

If the inverse exists, the matrix is positive definite. If the inverse does not exist, the problem is likely multicollinear regressors. It does not make sense to compute robust standard errors. The reason: in the latent model we specify all conditional

moments of Y|X. Therefore, if we believe the variance is misspecified, then the conditional mean must be misspecified as well. If we follow the classic regression model and assume that V_t is Gaussian, then F(·) is the distribution function of a standard Gaussian random variable. The resulting ML estimators are termed probit estimators (because Φ is termed the probit function in statistics) and are obtained through nonlinear optimization (as the score function is not a linear function of the coefficient estimators). Because the Gaussian distribution function cannot be expressed in closed form (that is, an integral must be used), many researchers assume that V_t has a logistic distribution. While the logistic density function is similar to the Gaussian and differs mainly in the tails, the logistic distribution function can be expressed in closed form as

F(β_0 + β_1 x_t) = exp(β_0 + β_1 x_t) / [1 + exp(β_0 + β_1 x_t)].

The closed-form expression for the logistic distribution delivers a simplified likelihood as well:

L(β_0, β_1 | ·) = ∏_{t=1}^n [ exp(β_0 + β_1 x_t) / (1 + exp(β_0 + β_1 x_t)) ]^{y_t} [ 1 / (1 + exp(β_0 + β_1 x_t)) ]^{1−y_t}
               = exp( β_0 Σ_t y_t + β_1 Σ_t x_t y_t ) / ∏_{t=1}^n [1 + exp(β_0 + β_1 x_t)].

Thus

ln L(β_0, β_1 | ·) = β_0 Σ_t y_t + β_1 Σ_t x_t y_t − Σ_t ln[1 + exp(β_0 + β_1 x_t)].

The ML estimators, which are termed logit estimators, are the values B_0 and B_1 that satisfy

∂ ln L(β_0, β_1 | ·)/∂β_i |_{β=B} = Σ_t [ y_t − exp(B_0 + B_1 x_t) / (1 + exp(B_0 + B_1 x_t)) ] x_{i,t} = 0,

for i = 0, 1, with x_{0,t} = 1 and x_{1,t} = x_t. (As for the probit estimators, a nonlinear solution technique must be used.) An immediate consequence of the score equation for β_0 is that

Σ_t y_t = Σ_t P̂(y_t = 1),

that is, the observed frequency of y_t = 1 equals the predicted frequency (here P̂(y_t = 1) = exp(B_0 + B_1 x_t)/[1 + exp(B_0 + B_1 x_t)]). (Note, the same feature holds for the linear probability model, because the OLS coefficient estimators satisfy the relation that the sum of observed values of the dependent variable, which yields the observed frequency, equals the sum of predicted values of the dependent variable, which yields the predicted frequency.) One additional advantage of the logistic assumption is that

ln[ F(β_0 + β_1 x_t) / (1 − F(β_0 + β_1 x_t)) ] = β_0 + β_1 x_t.

The slope coefficient is interpreted as the effect of a one-unit change in the regressor on the logarithm of the odds ratio, where the odds ratio is the probability of a success (Y = 1) divided by the probability of a failure (Y = 0). For the special case in which we have a number of observations for each value of the regressor, a simpler estimator can be constructed. Suppose that the regressor takes K distinct values and that there are n_k observations on value k. For each of the distinct regressor values calculate the observed frequency of success, that is, construct p̂_k = (1/n_k) Σ_{t=1}^{n_k} y_t. The estimator of the coefficients is then obtained as the OLS coefficient estimator for the regression model

ln[ p̂_k / (1 − p̂_k) ] = β_0 + β_1 x_k + u_k,

for k = 1, ..., K. The method is sensible if n_k is large for each k. If n_k is not constant across k, then the error is heteroskedastic and weighted least squares should be used. To perform hypothesis tests, any of the three test statistics can be used. As the tests are asymptotically equivalent (and the finite sample comparisons are specific to the model), simply choose the statistic that is easiest to compute. We begin with tests of exclusion restrictions, of which the leading example is the need to include additional regressors Z (perhaps indicators for region or industry). Set

Y_t = F(X_t β + Z_t γ) + U_t.

(If Z_t consists only of functions of X_t, we have a pure functional form test.)
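The logit score equations can be solved by Newton-Raphson; the sketch below (simulated data) checks that the score is zero at the solution and that the observed frequency of y = 1 matches the predicted frequency:

```python
import numpy as np

# Sketch of the logit MLE via Newton-Raphson on simulated data. The score is
# sum_t (y_t - Lambda(x_t'b)) x_t and the Hessian is -sum_t Lambda(1-Lambda) x_t x_t',
# where Lambda is the logistic CDF.
rng = np.random.default_rng(3)
n = 1000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_b = np.array([0.5, 1.0])
p = 1 / (1 + np.exp(-(X @ true_b)))
y = (rng.uniform(size=n) < p).astype(float)

b = np.zeros(2)
for _ in range(25):                        # Newton iterations
    lam = 1 / (1 + np.exp(-(X @ b)))       # Lambda(x_t'b)
    score = X.T @ (y - lam)
    hess = -(X * (lam * (1 - lam))[:, None]).T @ X
    b = b - np.linalg.solve(hess, score)

lam_hat = 1 / (1 + np.exp(-(X @ b)))       # predicted probabilities
```

Because the score for the intercept is Σ(y_t − Λ̂_t) = 0, the fitted probabilities sum to the number of observed successes, as claimed above.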
The Wald test is computed directly in Stata. To construct an LR test, first estimate the full model via probit and obtain the estimated value of the log-likelihood, L̂_U. Next construct the probit estimate for the restricted model, in which the

conditional mean is assumed to be F(X_t β), and obtain the estimated value of the log-likelihood, L̂_R. The LR test statistic is

2(L̂_U − L̂_R) →_d χ²(Q), where Q = dim(γ).

If Q is large, then the probit estimate of the full model can be difficult to construct. For large Q, the LM statistic is preferred, as only the restricted model is estimated. First, construct the probit estimate for the restricted model, B, and form F̂ = F(XB), f̂ = f(XB), and Û = Y − F̂. We then regress the weighted residuals on both the included and excluded regressors, where we weight for efficiency:

Û_t/w_t = δ_1 (f̂_t X_t/w_t) + δ_2 (f̂_t Z_t/w_t) + V_t, with w_t = [F̂_t(1 − F̂_t)]^{1/2}.

Because the residuals sum to zero, there is no need for an intercept. The explained sum of squares from the regression is identical to the LM statistic. Alternatively, nR² can be used, as it is an asymptotically equivalent, although numerically distinct, statistic. Both are distributed as χ²(Q) random variables. Although less common, there are more general restrictions of interest to test. To test for heteroskedastic errors, the latent model becomes

Y*_t = X_t β + U_t, with U_t|X ~ N(0, e^{2 Z_t γ}).

We analyze the leading case, in which Z_t consists of all varying regressors (all but the intercept). With heteroskedastic errors, our calculations become

P(Y_t = 1|X) = P(U_t > −X_t β|X) = P(e^{−Z_t γ} U_t > −e^{−Z_t γ} X_t β|X) = Φ(e^{−Z_t γ} X_t β).

As noted before, if the error in the latent model is heteroskedastic, the specification of the conditional mean is altered (hence we do not construct robust standard errors for the original specification). In particular, the conditional mean is no longer a single index model, as the regressors affect the response probability in two ways. To indicate the absence of a single index model, the response probability is often written as

p(X) = m(X; X β, γ),

where the last two arguments emphasize the fact that the regressors affect the response probability through more than the single index X β. The natural null hypothesis is H_0: γ = 0, under which the latent model is a standard probit model. As the restricted model is clearly the easiest to estimate, we again use the LM statistic. Again, construct the probit estimate for the restricted model, B, and form F̂ = F(XB), f̂ = f(XB), and Û = Y − F̂. We then regress the weighted residuals on both the included regressors and the score for γ (recall, f̂_t multiplied by the excluded regressors forms the score for γ in the test of exclusion restrictions), where we weight for efficiency:

Û_t/w_t = δ_1 (f̂_t X_t/w_t) + δ_2 ( ∇_γ m(X_t; X_t β, γ)|_{γ=0} / w_t ) + V_t, with w_t = [F̂_t(1 − F̂_t)]^{1/2}.

For the heteroskedasticity example (in which γ_0 = 0),

∇_γ m(X_t; X_t β, γ)|_{γ=0} = φ(e^{−Z_t γ} X_t β) e^{−Z_t γ} X_t β (−Z_t)|_{γ=0} = −φ(X_t β)(X_t β) Z_t.

The explained sum of squares from the regression is identical to the LM statistic. Alternatively, nR² can be used, as it is an asymptotically equivalent, although numerically distinct, statistic. Again, both statistics are χ²(Q) random variables.
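The gradient used in the heteroskedasticity LM test can be checked against a finite difference (illustrative values for the index and regressor):

```python
import math

# Numeric check (hypothetical values) of the gradient used in the LM test:
# with m(gamma) = Phi(exp(-z*gamma) * xb), the derivative at gamma = 0 is
# -phi(xb) * xb * z, where Phi and phi are the standard normal CDF and pdf.
def Phi(v):
    return 0.5 * (1 + math.erf(v / math.sqrt(2)))

def phi(v):
    return math.exp(-v * v / 2) / math.sqrt(2 * math.pi)

xb, z = 0.7, 1.3            # illustrative index value and regressor value
def m(gamma):
    return Phi(math.exp(-z * gamma) * xb)

h = 1e-6
numeric = (m(h) - m(-h)) / (2 * h)   # central finite difference
analytic = -phi(xb) * xb * z         # the gradient derived above
```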


More information

6.4 Normal Distribution

6.4 Normal Distribution Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics

ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Quantile Treatment Effects 2. Control Functions

More information

ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? 1. INTRODUCTION

ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? 1. INTRODUCTION ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? SAMUEL H. COX AND YIJIA LIN ABSTRACT. We devise an approach, using tobit models for modeling annuity lapse rates. The approach is based on data provided

More information

SYSTEMS OF REGRESSION EQUATIONS

SYSTEMS OF REGRESSION EQUATIONS SYSTEMS OF REGRESSION EQUATIONS 1. MULTIPLE EQUATIONS y nt = x nt n + u nt, n = 1,...,N, t = 1,...,T, x nt is 1 k, and n is k 1. This is a version of the standard regression model where the observations

More information

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

More information

Chapter 2. Dynamic panel data models

Chapter 2. Dynamic panel data models Chapter 2. Dynamic panel data models Master of Science in Economics - University of Geneva Christophe Hurlin, Université d Orléans Université d Orléans April 2010 Introduction De nition We now consider

More information

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

More information

Master s Theory Exam Spring 2006

Master s Theory Exam Spring 2006 Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem

More information

The Cobb-Douglas Production Function

The Cobb-Douglas Production Function 171 10 The Cobb-Douglas Production Function This chapter describes in detail the most famous of all production functions used to represent production processes both in and out of agriculture. First used

More information

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple

More information

CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

More information

Bias in the Estimation of Mean Reversion in Continuous-Time Lévy Processes

Bias in the Estimation of Mean Reversion in Continuous-Time Lévy Processes Bias in the Estimation of Mean Reversion in Continuous-Time Lévy Processes Yong Bao a, Aman Ullah b, Yun Wang c, and Jun Yu d a Purdue University, IN, USA b University of California, Riverside, CA, USA

More information

Online Appendix to Are Risk Preferences Stable Across Contexts? Evidence from Insurance Data

Online Appendix to Are Risk Preferences Stable Across Contexts? Evidence from Insurance Data Online Appendix to Are Risk Preferences Stable Across Contexts? Evidence from Insurance Data By LEVON BARSEGHYAN, JEFFREY PRINCE, AND JOSHUA C. TEITELBAUM I. Empty Test Intervals Here we discuss the conditions

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information

An Introduction to Basic Statistics and Probability

An Introduction to Basic Statistics and Probability An Introduction to Basic Statistics and Probability Shenek Heyward NCSU An Introduction to Basic Statistics and Probability p. 1/4 Outline Basic probability concepts Conditional probability Discrete Random

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software

Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used

More information

Review of Fundamental Mathematics

Review of Fundamental Mathematics Review of Fundamental Mathematics As explained in the Preface and in Chapter 1 of your textbook, managerial economics applies microeconomic theory to business decision making. The decision-making tools

More information

99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, 99.42 cm

99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, 99.42 cm Error Analysis and the Gaussian Distribution In experimental science theory lives or dies based on the results of experimental evidence and thus the analysis of this evidence is a critical part of the

More information

The correlation coefficient

The correlation coefficient The correlation coefficient Clinical Biostatistics The correlation coefficient Martin Bland Correlation coefficients are used to measure the of the relationship or association between two quantitative

More information

Redwood Building, Room T204, Stanford University School of Medicine, Stanford, CA 94305-5405.

Redwood Building, Room T204, Stanford University School of Medicine, Stanford, CA 94305-5405. W hittemoretxt050806.tex A Bayesian False Discovery Rate for Multiple Testing Alice S. Whittemore Department of Health Research and Policy Stanford University School of Medicine Correspondence Address:

More information

CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS

CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Examples: Regression And Path Analysis CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Regression analysis with univariate or multivariate dependent variables is a standard procedure for modeling relationships

More information

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance. Chapter 6: Behavioural models Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

More information

Correlation key concepts:

Correlation key concepts: CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)

More information

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

More information

E3: PROBABILITY AND STATISTICS lecture notes

E3: PROBABILITY AND STATISTICS lecture notes E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................

More information

The Probit Link Function in Generalized Linear Models for Data Mining Applications

The Probit Link Function in Generalized Linear Models for Data Mining Applications Journal of Modern Applied Statistical Methods Copyright 2013 JMASM, Inc. May 2013, Vol. 12, No. 1, 164-169 1538 9472/13/$95.00 The Probit Link Function in Generalized Linear Models for Data Mining Applications

More information

Chapter 5. Random variables

Chapter 5. Random variables Random variables random variable numerical variable whose value is the outcome of some probabilistic experiment; we use uppercase letters, like X, to denote such a variable and lowercase letters, like

More information

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions. Algebra I Overview View unit yearlong overview here Many of the concepts presented in Algebra I are progressions of concepts that were introduced in grades 6 through 8. The content presented in this course

More information

Common sense, and the model that we have used, suggest that an increase in p means a decrease in demand, but this is not the only possibility.

Common sense, and the model that we have used, suggest that an increase in p means a decrease in demand, but this is not the only possibility. Lecture 6: Income and Substitution E ects c 2009 Je rey A. Miron Outline 1. Introduction 2. The Substitution E ect 3. The Income E ect 4. The Sign of the Substitution E ect 5. The Total Change in Demand

More information

CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES

CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Examples: Monte Carlo Simulation Studies CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Monte Carlo simulation studies are often used for methodological investigations of the performance of statistical

More information

Risk Aversion. Expected value as a criterion for making decisions makes sense provided that C H A P T E R 2. 2.1 Risk Attitude

Risk Aversion. Expected value as a criterion for making decisions makes sense provided that C H A P T E R 2. 2.1 Risk Attitude C H A P T E R 2 Risk Aversion Expected value as a criterion for making decisions makes sense provided that the stakes at risk in the decision are small enough to \play the long run averages." The range

More information

Panel Data Econometrics

Panel Data Econometrics Panel Data Econometrics Master of Science in Economics - University of Geneva Christophe Hurlin, Université d Orléans University of Orléans January 2010 De nition A longitudinal, or panel, data set is

More information

LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as

LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values

More information

The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.

The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Cointegration The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Economic theory, however, often implies equilibrium

More information

Poisson Models for Count Data

Poisson Models for Count Data Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

More information

Algebra 2 Chapter 1 Vocabulary. identity - A statement that equates two equivalent expressions.

Algebra 2 Chapter 1 Vocabulary. identity - A statement that equates two equivalent expressions. Chapter 1 Vocabulary identity - A statement that equates two equivalent expressions. verbal model- A word equation that represents a real-life problem. algebraic expression - An expression with variables.

More information

Models for Longitudinal and Clustered Data

Models for Longitudinal and Clustered Data Models for Longitudinal and Clustered Data Germán Rodríguez December 9, 2008, revised December 6, 2012 1 Introduction The most important assumption we have made in this course is that the observations

More information

Linear Programming. March 14, 2014

Linear Programming. March 14, 2014 Linear Programming March 1, 01 Parts of this introduction to linear programming were adapted from Chapter 9 of Introduction to Algorithms, Second Edition, by Cormen, Leiserson, Rivest and Stein [1]. 1

More information

Comparing Features of Convenient Estimators for Binary Choice Models With Endogenous Regressors

Comparing Features of Convenient Estimators for Binary Choice Models With Endogenous Regressors Comparing Features of Convenient Estimators for Binary Choice Models With Endogenous Regressors Arthur Lewbel, Yingying Dong, and Thomas Tao Yang Boston College, University of California Irvine, and Boston

More information

Wooldridge, Introductory Econometrics, 3d ed. Chapter 12: Serial correlation and heteroskedasticity in time series regressions

Wooldridge, Introductory Econometrics, 3d ed. Chapter 12: Serial correlation and heteroskedasticity in time series regressions Wooldridge, Introductory Econometrics, 3d ed. Chapter 12: Serial correlation and heteroskedasticity in time series regressions What will happen if we violate the assumption that the errors are not serially

More information

Ch5: Discrete Probability Distributions Section 5-1: Probability Distribution

Ch5: Discrete Probability Distributions Section 5-1: Probability Distribution Recall: Ch5: Discrete Probability Distributions Section 5-1: Probability Distribution A variable is a characteristic or attribute that can assume different values. o Various letters of the alphabet (e.g.

More information

The Dynamics of UK and US In ation Expectations

The Dynamics of UK and US In ation Expectations The Dynamics of UK and US In ation Expectations Deborah Gefang Department of Economics University of Lancaster email: d.gefang@lancaster.ac.uk Simon M. Potter Gary Koop Department of Economics University

More information

A Basic Introduction to Missing Data

A Basic Introduction to Missing Data John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

More information

Logistic Regression (1/24/13)

Logistic Regression (1/24/13) STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used

More information

Probability density function : An arbitrary continuous random variable X is similarly described by its probability density function f x = f X

Probability density function : An arbitrary continuous random variable X is similarly described by its probability density function f x = f X Week 6 notes : Continuous random variables and their probability densities WEEK 6 page 1 uniform, normal, gamma, exponential,chi-squared distributions, normal approx'n to the binomial Uniform [,1] random

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS

NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS TEST DESIGN AND FRAMEWORK September 2014 Authorized for Distribution by the New York State Education Department This test design and framework document

More information

CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York

CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not

More information

LOGNORMAL MODEL FOR STOCK PRICES

LOGNORMAL MODEL FOR STOCK PRICES LOGNORMAL MODEL FOR STOCK PRICES MICHAEL J. SHARPE MATHEMATICS DEPARTMENT, UCSD 1. INTRODUCTION What follows is a simple but important model that will be the basis for a later study of stock prices as

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2 University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages

More information

Lecture 19: Conditional Logistic Regression

Lecture 19: Conditional Logistic Regression Lecture 19: Conditional Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina

More information

GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial

More information

Module 4 - Multiple Logistic Regression

Module 4 - Multiple Logistic Regression Module 4 - Multiple Logistic Regression Objectives Understand the principles and theory underlying logistic regression Understand proportions, probabilities, odds, odds ratios, logits and exponents Be

More information

4. Simple regression. QBUS6840 Predictive Analytics. https://www.otexts.org/fpp/4

4. Simple regression. QBUS6840 Predictive Analytics. https://www.otexts.org/fpp/4 4. Simple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/4 Outline The simple linear model Least squares estimation Forecasting with regression Non-linear functional forms Regression

More information

ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS

ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS DATABASE MARKETING Fall 2015, max 24 credits Dead line 15.10. ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS PART A Gains chart with excel Prepare a gains chart from the data in \\work\courses\e\27\e20100\ass4b.xls.

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

More information

Basic Probability Concepts

Basic Probability Concepts page 1 Chapter 1 Basic Probability Concepts 1.1 Sample and Event Spaces 1.1.1 Sample Space A probabilistic (or statistical) experiment has the following characteristics: (a) the set of all possible outcomes

More information

Non Parametric Inference

Non Parametric Inference Maura Department of Economics and Finance Università Tor Vergata Outline 1 2 3 Inverse distribution function Theorem: Let U be a uniform random variable on (0, 1). Let X be a continuous random variable

More information