# Sampling Theory for Discrete Data


* Economic survey data are often obtained from sampling protocols that involve stratification, censoring, or selection. Econometric estimators designed for random samples may be inconsistent or inefficient when applied to these samples.
* When the econometrician can influence sample design, the use of stratified sampling protocols combined with appropriate estimators can be a powerful tool for maximizing the useful information on structural parameters obtainable within a data collection budget.
* Sampling of discrete choice alternatives may simplify data collection and analysis for MNL models.

## Basics of Sampling Theory

Let $z$ denote a vector of exogenous variables, and $y$ an endogenous variable, or a vector of endogenous variables, such as choice indicators. The joint distribution of $(z,y)$ in the population is $P(y\mid z,\theta_0)p(z) = Q(z\mid y,\theta_0)q(y,\theta_0)$; see Figure 1. $P(y\mid z,\theta_0)$, the conditional probability of $y$, given $z$, in a parametric family with true parameter vector $\theta_0$, is the structural model of interest. "Structural" means this conditional probability law is invariant across populations or policy environments in which the marginal distribution of $z$ changes. If $z$ causes $y$, then $P(y\mid z,\theta_0)$ is a structural relationship. Other notation:

* $p(z)$ = marginal distribution of the exogenous variables
* $Q(z\mid y)$ = conditional distribution of $z$ given $y$
* $q(y)$ = marginal distribution of $y$

Figure 1. Population Probability Model

| | $y_1$ | $y_2$ | $\cdots$ | $y_J$ | Total |
|---|---|---|---|---|---|
| $z_1$ | $P(y_1\mid z_1,\theta_0)p(z_1)$ | $P(y_2\mid z_1,\theta_0)p(z_1)$ | $\cdots$ | $P(y_J\mid z_1,\theta_0)p(z_1)$ | $p(z_1)$ |
| $z_2$ | $P(y_1\mid z_2,\theta_0)p(z_2)$ | $P(y_2\mid z_2,\theta_0)p(z_2)$ | $\cdots$ | $P(y_J\mid z_2,\theta_0)p(z_2)$ | $p(z_2)$ |
| $\vdots$ | | | | | |
| $z_K$ | $P(y_1\mid z_K,\theta_0)p(z_K)$ | $P(y_2\mid z_K,\theta_0)p(z_K)$ | $\cdots$ | $P(y_J\mid z_K,\theta_0)p(z_K)$ | $p(z_K)$ |
| Total | $q(y_1,\theta_0)$ | $q(y_2,\theta_0)$ | $\cdots$ | $q(y_J,\theta_0)$ | 1 |

A simple random sample draws independent observations from the population, each with probability law $P(y\mid z,\theta_0)p(z)$. The kernel of the log likelihood of this sample depends only on the conditional probability $P(y\mid z,\theta)$, not on the marginal density $p(z)$; thus, maximum likelihood estimation of the structural parameters $\theta_0$ does not require that the marginal distribution $p(z)$ be parameterized or estimated. $\theta_0$ influences only how the data are distributed within rows of the table above; how the data are distributed across rows provides no additional information on $\theta_0$.
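The point can be illustrated with a small simulation sketch (not from the notes; the binary-logit specification, parameter values, and the exponential marginal for $z$ are all made up): the conditional MLE recovers $\theta_0$ without ever modeling $p(z)$.

```python
# Sketch: maximum likelihood on the conditional kernel sum_i log P(y_i|z_i, theta)
# for an illustrative binary logit; p(z) is arbitrary and never specified.
import numpy as np

rng = np.random.default_rng(0)
n = 20000
z = rng.exponential(2.0, size=n)          # p(z): arbitrary marginal, never modeled
theta_true = np.array([-1.0, 0.8])        # illustrative intercept and slope
u = theta_true[0] + theta_true[1] * z
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-u))).astype(float)

# Newton-Raphson on the conditional log likelihood only.
X = np.column_stack([np.ones(n), z])
theta = np.zeros(2)
for _ in range(25):
    mu = 1.0 / (1.0 + np.exp(-X @ theta))     # P(y=1|z, theta)
    grad = X.T @ (y - mu)                     # score of the kernel
    W = mu * (1.0 - mu)
    hess = -(X * W[:, None]).T @ X
    theta = theta - np.linalg.solve(hess, grad)
print(theta)  # close to theta_true, whatever p(z) happens to be
```

Redrawing $z$ from a different marginal leaves the estimator's target unchanged, which is exactly the separation of the kernel from $p(z)$ described above.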

## Stratified Samples

An (exogenously) stratified random sample samples among rows with probability weights different from $p(z)$, but within rows the sample is distributed with the probability law for the population. Just as for a simple random sample, the sampling probabilities across rows do not enter the kernel of the likelihood function for $\theta$, so an exogenously stratified random sample can be analyzed in exactly the same way as a simple random sample. The idea of a stratified random sample can be extended to what are called endogenous or choice-based sampling protocols. Suppose the data are collected from one or more strata, indexed $s = 1,\dots,S$. Each stratum has a sampling protocol that determines the segment of the population that qualifies for interviewing.

Let $R(z,y,s)$ denote the qualification probability that a population member with characteristics $(z,y)$ will qualify for the subpopulation from which the stratum $s$ subsample will be drawn. For example, a stratum might correspond to the northwest 2×2 subtable in Figure 1, where $y$ is one of the values $y_1$ or $y_2$ and $z$ is one of the values $z_1$ or $z_2$; in this case, the population probability of qualifying equals the sum of the four cell probabilities. Qualification may be related to the sampling frame, which selects locations (e.g., census tracts, telephone prefixes), to screening (e.g., terminate interview if respondent is not a home-owner), or to attrition (e.g., refusals). Examples:

1. Simple random subsample, with $R(z,y,s) \equiv 1$.
2. Exogenously stratified random subsample, with $R(z,y,s) = 1$ if $z \in A_s$ for a subset $A_s$ of the universe $Z$ of exogenous vectors, and $R(z,y,s) = 0$ otherwise. For example, the set $A_s$ might define a geographical area. This corresponds to sampling randomly from one or more rows of the table.

Exogenous stratified sampling can be generalized to variable sampling rates by permitting $R(z,y,s)$ to be any function of $(z,s)$ into the unit interval; such a protocol might be, for example, a screening interview that qualifies a proportion of respondents that is a function of respondent age.

3. Endogenously stratified subsample: $R(z,y,s) = 1$ if $y \in B_s$, with $B_s$ a subset of the universe $Y$ of endogenous vectors, and $R(z,y,s) = 0$ otherwise. The set $B_s$ might identify a single alternative or set of alternatives among discrete responses, with the sampling frame intercepting subjects based on their presence in $B_s$; e.g., buyers who register their purchase, or recreators at national parks. Alternately, $B_s$ might identify a range of a continuous response, such as an income category. Endogenous sampling corresponds to sampling randomly from one or more columns of the table. A choice-based sample for discrete response is the case where each response is a different stratum; then $R(z,y,s) = \mathbf{1}(y = s)$.

Endogenous stratified sampling can be generalized to qualification involving both exogenous and endogenous variables, with $B_s$ defined in general as a subset of $Z \times Y$. For example, in a study of mode choice, a stratum might qualify bus riders (endogenous) over age 18 (exogenous). It can also be generalized to differential sampling rates, with a proportion $R(z,y,s)$ between zero and one qualifying in a screening interview.

4. Sample selection/attrition, with $R(z,y,s)$ giving the proportion of the population with variables $(z,y)$ whose availability qualifies them for stratum $s$. For example, $R(z,y,s)$ may give the proportion of subjects with variables $(z,y)$ that can be contacted and will agree to be interviewed, or the proportion of subjects meeting an endogenous selection condition, say employment, that qualifies them for observation of wage (in $z$) and hours worked (in $y$).

## The Sample Probability Law

The population probability law for $(z,y)$ is $P(y\mid z,\theta_0)p(z)$. The qualification probability $R(z,y,s)$ characterizes the sampling protocol for stratum $s$. Then $R(z,y,s)P(y\mid z,\theta_0)p(z)$ is the joint probability that a member of the population will have variables $(z,y)$ and will qualify for stratum $s$, and

(2) $\quad r(s) = \sum_z \sum_y R(z,y,s)\,P(y\mid z,\theta_0)\,p(z)$

is the proportion of the population qualifying into the stratum, or qualification factor. The reciprocal of $r(s)$ is called the raising factor.

(3) $\quad G(z,y\mid s) = R(z,y,s)\,P(y\mid z,\theta_0)\,p(z)/r(s)$,

the conditional distribution of $(z,y)$ given qualification, is the sample probability law for stratum $s$. The probability law $G(z,y\mid s)$ depends on the unknown parameter vector $\theta_0$, on $p(z)$, and on the qualification probability $R(z,y,s)$. In simple cases of stratification, $R(z,y,s)$ is fully specified by the sampling protocol. The qualification factor $r(s)$ may be known (e.g., stratification based on census tracts with known sizes), estimated from the survey (e.g., when qualification is determined by a screening interview), or estimated from an auxiliary sample. In the case of attrition or selection, $R(z,y,s)$ may be an unknown function, or may contain unknown parameters.

The precision of the maximum likelihood estimator will depend on the qualification factor. Suppose a random sample of size $n_s$ is drawn from stratum $s$, and let $N = \sum_s n_s$ denote total sample size. Let $n(z,y\mid s)$ denote the number of observations in the stratum $s$ subsample that fall in cell $(z,y)$. Then the log likelihood for the stratified sample is

(4) $\quad L = \sum_s \sum_z \sum_y n(z,y\mid s)\,\log G(z,y\mid s)$.

## EXOGENOUS STRATIFIED SAMPLING

If $R(z,y,s)$ is independent of $y$, the qualification factor $r(s) = \sum_z R(z,s)\,p(z)$ is independent of $\theta_0$, and the log likelihood function separates into the sum of a kernel

(5) $\quad L_1 = \sum_s \sum_z \sum_y n(z,y\mid s)\,\log P(y\mid z,\theta)$

and terms independent of $\theta$. Hence, the kernel is independent of the structure of exogenous stratification.

## ENDOGENOUS STRATIFICATION

Suppose the qualification probability $R(z,y,s)$ depends on $y$. Then the qualification factor (2) depends on $\theta_0$, and the log likelihood function (4) has a kernel depending in general not only on $\theta$, but also on the unknown marginal distribution $p(z)$. Any unknowns in the qualification probability also enter the kernel. There are four possible strategies for estimation under these conditions:

1. Brute force: assume $p(z)$ and, if necessary, $R(z,y,s)$ are in parametric families, and estimate their parameters jointly with $\theta$. For example, in multivariate discrete data analysis, an analysis of variance representation absorbs the effects of stratification.

2. Weighted Exogenous Sample Maximum Likelihood (WESML): a quasi-maximum likelihood approach that starts from the likelihood function appropriate to a random sample and reweights the data (if possible) to achieve consistency. A familiar form of this approach is the classical survey research technique of reweighting a sample so that it appears to be a simple random sample.

3. Conditional Maximum Likelihood (CML): this approach pools the observations across strata, and then forms the conditional likelihood of $y$ given $z$ in this pool. This has the effect of conditioning out the unknown density $p(z)$.

4. Full Information Concentrated Likelihood (FICLE): this approach formally maximizes the likelihood function in $p(z)$ as a function of the data, the remaining parameters, and a finite vector of auxiliary parameters, and then concentrates the likelihood function by substituting these formal maximum likelihood estimates for $p(z)$.

* Both the WESML and CML estimators are practical for many problems when auxiliary information is available that allows the raising factors to be estimated consistently. The FICLE estimator is computationally difficult and little used.

## WESML

* Recall that the kernel of the log likelihood for exogenous sampling is given by (5). Suppose now endogenous sampling with true log likelihood (4), and consider a quasi-maximum likelihood criterion based on (5),

(7) $\quad W(\theta) = \sum_s \sum_z \sum_y n(z,y\mid s)\,w(z,y,s)\,\log P(y\mid z,\theta)$,

where $w(z,y,s)$ is a weight chosen to achieve consistency.

* Suppose $r(s)$ is consistently estimated by $f(s)$, from government statistics, survey frame data such as the average refusal rate, or an auxiliary sample. Consider the weights

(11) $\quad w(z,y) = 1\Big/\sum_s R(z,y,s)\,\frac{n_s/N}{f(s)}$;

these are well-defined if the denominators are positive and $R(z,y,s)$ contains no unknown parameters. A classical application of WESML estimation is to a sample in which the strata coincide with the possible configurations of $y$, so that $R(z,y,s) = \mathbf{1}(y = s)$. In this case, $w(z,y) = Nf(y)/n_y$, the ratio of the population frequency to the sample frequency. This is the raising factor encountered in classical survey research. Another application is to enriched samples, where a random subsample ($s = 1$) is enriched with an endogenous subsample from one or more configurations of $y$; e.g., $s = y = 2$. Then $w(z,1) = N/n_1$ and $w(z,2) = Nf(2)/[n_1 f(2) + n_2]$.
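As a small numerical sketch of the choice-based case (population shares and stratum sizes below are made up): the weight $w(y) = Nf(y)/n_y$ is exactly the factor that makes the reweighted sample frequencies reproduce the population shares.

```python
# Hypothetical choice-based sample: each response y is its own stratum,
# and rare responses are deliberately oversampled.
f = {0: 0.70, 1: 0.25, 2: 0.05}      # assumed population shares f(y)
n_y = {0: 300, 1: 300, 2: 400}       # stratum sample sizes n_y
N = sum(n_y.values())

w = {y: N * f[y] / n_y[y] for y in f}                 # WESML raising-factor weights
reweighted = {y: w[y] * n_y[y] / N for y in f}        # weighted sample shares
print(w, reweighted)                                  # reweighted shares equal f(y)
```

Plugging these weights into the criterion (7) is the WESML estimator for this design.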

## CML

Pool the observations from the different strata. Then the data generation process for the pooled sample is

(15) $\quad \Pr(z,y) = \sum_s G(z,y\mid s)\,n_s/N$,

and the conditional probability of $y$ given $z$ from this pool is

(17) $\quad \Pr(y\mid z) = \dfrac{\sum_s R(z,y,s)\,(n_s/N)\,P(y\mid z,\theta_0)/r(s)}{\sum_{y'} \sum_s R(z,y',s)\,(n_s/N)\,P(y'\mid z,\theta_0)/r(s)}$,

in which the unknown $p(z)$ cancels. The CML estimator maximizes the conditional likelihood of the pooled sample in $\theta$ and any unknowns in $R(z,y,s)$. When $r(s)$ is known, or one wishes to condition on estimates $f(s)$ of $r(s)$ from auxiliary samples, (17) is used directly. More generally, given auxiliary sample information on the $r(s)$, these can be treated as parameters and estimated from the joint likelihood of (17) and the likelihood of the auxiliary sample.

For discrete response in which qualification does not depend on $z$, formula (17) simplifies to

$\Pr(y\mid z) = \dfrac{\pi_y\,P(y\mid z,\theta_0)}{\sum_{y'} \pi_{y'}\,P(y'\mid z,\theta_0)}$, where $\pi_y = \sum_s R(y,s)\,n_s/(N r(s))$

can be absorbed into an alternative-specific constant. For multinomial logit choice models, $\Pr(y\mid z)$ then reduces to a multinomial logit formula with added alternative-specific constants $\log \pi_y$. It is possible to estimate this model by the CML method using standard random-sample computer programs for this model, obtaining consistent estimates of the slope parameters, and of the sum of $\log \pi_y$ and the alternative-specific parameter in the original model. What is critical for this to work is that the MNL model contain alternative-specific dummy variables corresponding to each choice-based stratum.

* For an enriched sample, $\Pr(1\mid z) = P(1\mid z,\theta_0)\,(n_1/N)/D$ and $\Pr(2\mid z) = P(2\mid z,\theta_0)\,[n_1/N + n_2/(N r(2))]/D$, where $D = n_1/N + P(2\mid z,\theta_0)\,n_2/(N r(2))$.

* Example: Suppose $y$ is a continuous variable, and the sample consists of a single stratum in which high-income families are over-sampled by screening, so that

the qualification probability is $R(z,y,1) = \alpha < 1$ for $y \le y_o$ and $R(z,y,1) = 1$ for $y > y_o$. Then $\Pr(y\mid z) = \alpha P(y\mid z,\theta_0)/D$ for $y \le y_o$ and $\Pr(y\mid z) = P(y\mid z,\theta_0)/D$ for $y > y_o$, where $D = \alpha + (1-\alpha)P(y > y_o\mid z,\theta_0)$.

* Both the WESML and CML estimators are computationally practical in a variety of endogenous sampling situations, and have been widely used.
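The alternative-specific-constant form of the choice-based CML model above can be checked numerically; the utilities, population shares, and stratum sizes below are invented for illustration.

```python
# Check: with pure choice-based sampling, the sample law Pr(y|z) is again MNL,
# with constants log(pi_y) added to the systematic utilities, pi_y = n_y/(N r(y)).
import numpy as np

V = np.array([0.2, -0.5, 1.0])             # systematic utilities V_y(z) (made up)
P_pop = np.exp(V) / np.exp(V).sum()        # population MNL probabilities
r = np.array([0.5, 0.3, 0.2])              # assumed population shares r(y)
n_s = np.array([100, 100, 100]); N = n_s.sum()

pi = n_s / (N * r)                          # stratum factors pi_y
P_sample = pi * P_pop / (pi * P_pop).sum()  # sample conditional law Pr(y|z)

V_shift = V + np.log(pi)                    # same model with shifted constants
P_shift = np.exp(V_shift) / np.exp(V_shift).sum()
print(P_sample, P_shift)                    # identical vectors
```

This is why a standard MNL program with full alternative-specific dummies delivers consistent slopes on choice-based data.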

## Sampling of Alternatives for Estimation

Consider a simple or exogenously stratified random sample and discrete choice data to which one wishes to fit a multinomial logit model. Suppose the choice set is very large, so that the task of collecting attribute data for each alternative is burdensome and estimation of the MNL model is difficult. Then it is possible to reduce the collection and processing burden greatly by working with samples from the full set of alternatives. There is a cost: a loss of statistical efficiency. Suppose $C_n$ is the choice set for subject $n$, and the MNL probability model is written

$P_{in} = \dfrac{e^{V_{in}}}{\sum_{j \in C_n} e^{V_{jn}}}$,

where $i_n$ is the observed choice and $V_{in} = x_{in}\beta$ is the systematic utility of alternative $i$. Suppose the analyst selects a subset of alternatives $A_n$ for this subject which will be used as a basis for estimation; i.e., if $i_n$ is contained in $A_n$, then the subject is treated as if the choice were actually being made from $A_n$, and if $i_n$ is not contained in $A_n$, the observation is discarded. In selecting $A_n$, the analyst may use the

information on which alternative $i_n$ was chosen, and may also use information on the variables that enter the determination of the $V_{in}$. The rule used by the analyst for selecting $A_n$ is summarized by a probability $\pi(A\mid i_n, V\text{'s})$ that subset $A$ of $C_n$ is selected, given the observed choice $i_n$ and the observed $V$'s (or, more precisely, the observed variables behind the $V$'s).

The selection rule is a uniform conditioning rule if, for the selected $A_n$, $\pi(A_n\mid j, V\text{'s})$ is the same for all $j$ in $A_n$. Examples of uniform conditioning rules are (1) select (randomly or purposively) a subset $A_n$ of $C_n$ without taking into account what the observed choice or the $V$'s are, and keep the observation if and only if $i_n$ is contained in the selected $A_n$; and (2) given observed choice $i_n$, select $m-1$ of the remaining alternatives at random from $C_n$, without taking into account the $V$'s.

An implication of uniform conditioning is that in the sample containing the pairs $(i_n, A_n)$ for which $i_n$ is contained in $A_n$, the probability of observed response $i_n$ conditioned on $A_n$ is

$\Pr(i_n\mid A_n) = \dfrac{e^{V_{i_n n}}}{\sum_{j \in A_n} e^{V_{jn}}}$.

This is just an MNL model that treats the choice as if it were being made from $A_n$ rather than $C_n$, so that maximum likelihood estimation of this model for the sample of $(i_n, A_n)$ with $i_n \in A_n$ estimates the same parameters as does maximum likelihood estimation on data from the full choice set. Then this sampling of alternatives cuts down data collection time (for alternatives not in $A_n$) and computation size and time, but still gives consistent estimates of the parameters of the original problem.
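A simulation sketch of uniform conditioning rule (2) above (all data and the single-attribute utility specification are synthetic): keep the chosen alternative plus $m-1$ others drawn at random, estimate the MNL as if the choice were made from $A_n$, and the slope comes back close to its true value.

```python
# Rule (2): given the observed choice, sample m-1 of the remaining alternatives
# at random; estimate by MLE on the subset likelihood exp(bV_i)/sum_{A_n} exp(bV_j).
import numpy as np

rng = np.random.default_rng(1)
n_obs, J, m, beta_true = 4000, 30, 5, 1.0
x = rng.normal(size=(n_obs, J))                      # attribute of each alternative
u = beta_true * x + rng.gumbel(size=(n_obs, J))      # random utilities
chosen = u.argmax(axis=1)

A = np.empty((n_obs, m), dtype=int)                  # sampled sets, chosen alt first
for i in range(n_obs):
    others = rng.choice([j for j in range(J) if j != chosen[i]],
                        size=m - 1, replace=False)
    A[i] = np.concatenate(([chosen[i]], others))
xa = np.take_along_axis(x, A, axis=1)                # attributes of the sampled sets

b = 0.0                                              # Newton-Raphson, scalar slope
for _ in range(20):
    e = np.exp(b * xa)
    p = e / e.sum(axis=1, keepdims=True)
    xbar = (p * xa).sum(axis=1)
    grad = (xa[:, 0] - xbar).sum()                   # score: chosen minus expected x
    hess = -((p * xa**2).sum(axis=1) - xbar**2).sum()
    b -= grad / hess
print(b)  # consistent for beta_true despite discarding most alternatives
```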

## BIVARIATE SELECTION MODEL

(29) $\quad y^* = x\beta + \varepsilon, \qquad w^* = z\gamma + \sigma\nu$,

where $x, z$ are vectors of exogenous variables, not necessarily all distinct; $\beta, \gamma$ are parameter vectors, not necessarily all distinct; and $\sigma$ is a positive parameter. $y^*$ is the latent net desirability of work, $w^*$ the latent log potential wage.

NORMAL MODEL

(30) $\quad \begin{pmatrix}\varepsilon\\ \nu\end{pmatrix} \sim N\!\left(\begin{pmatrix}0\\0\end{pmatrix},\ \begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}\right)$, with correlation $\rho$.

Observation rule: "Observe $y = 1$ and $w = w^*$ if $y^* > 0$; observe $y = -1$ and do not observe $w$ when $y^* \le 0$." The event of working ($y = 1$) or not working ($y = -1$) is observed, but net desirability is not, and the wage is observed only if the individual works ($y^* > 0$). For some purposes, code the discrete response as $s = (y+1)/2$; then $s = 1$ for workers, $s = 0$ for non-workers. The event of working is given by a probit model, where $\Phi$ is the standard univariate cumulative normal. The probability of working is

$P(y=1\mid x) = P(\varepsilon > -x\beta) = \Phi(x\beta)$,

the probability of not working is

$P(y=-1\mid x) = P(\varepsilon \le -x\beta) = \Phi(-x\beta)$.

Compactly, $P(y\mid x) = \Phi(yx\beta)$.

In the bivariate normal, the conditional density of one component given the other is univariate normal:

$\nu \mid \varepsilon \sim N(\rho\varepsilon,\,1-\rho^2)$, with density $\dfrac{1}{\sqrt{1-\rho^2}}\,\phi\!\left(\dfrac{\nu - \rho\varepsilon}{\sqrt{1-\rho^2}}\right)$,

and

$\varepsilon \mid \nu \sim N(\rho\nu,\,1-\rho^2)$, with density $\dfrac{1}{\sqrt{1-\rho^2}}\,\phi\!\left(\dfrac{\varepsilon - \rho\nu}{\sqrt{1-\rho^2}}\right)$.

The joint density is marginal times conditional,

$\phi(\varepsilon,\nu) = \phi(\nu)\,\dfrac{1}{\sqrt{1-\rho^2}}\,\phi\!\left(\dfrac{\varepsilon-\rho\nu}{\sqrt{1-\rho^2}}\right) = \phi(\varepsilon)\,\dfrac{1}{\sqrt{1-\rho^2}}\,\phi\!\left(\dfrac{\nu-\rho\varepsilon}{\sqrt{1-\rho^2}}\right)$.

The density of $(y^*, w^*)$ is

(31) $\quad f(y^*,w^*) = \dfrac{1}{\sigma}\,\phi\!\left(\dfrac{w^*-z\gamma}{\sigma}\right)\dfrac{1}{\sqrt{1-\rho^2}}\,\phi\!\left(\dfrac{y^*-x\beta-\rho(w^*-z\gamma)/\sigma}{\sqrt{1-\rho^2}}\right) = \phi(y^*-x\beta)\,\dfrac{1}{\sigma\sqrt{1-\rho^2}}\,\phi\!\left(\dfrac{(w^*-z\gamma)/\sigma - \rho(y^*-x\beta)}{\sqrt{1-\rho^2}}\right)$.

The log likelihood of an observation is $l(\beta,\gamma,\sigma,\rho)$. In the case of a non-worker ($y = -1$ and $w$ = NA), the density (31) is integrated over $y^* \le 0$ and all $w^*$. Using the second form in (31), this gives probability $\Phi(-x\beta)$.

In the case of a worker, the density (31) is integrated over $y^* \ge 0$. Using the first form in (31),

$e^{l(\beta,\gamma,\sigma,\rho)} = \dfrac{1}{\sigma}\,\phi\!\left(\dfrac{w-z\gamma}{\sigma}\right)\Phi\!\left(\dfrac{x\beta + \rho(w-z\gamma)/\sigma}{\sqrt{1-\rho^2}}\right)$.

The log likelihood can be rewritten as the sum of the marginal log likelihood of the discrete variable $y$ and the conditional log likelihood of $w$ given that it is observed, $l(\beta,\gamma,\sigma,\rho) = l_1(\beta) + l_2(\beta,\gamma,\sigma,\rho)$, with the marginal component

(33) $\quad l_1(\beta) = \log \Phi(yx\beta)$,

and the conditional component (which appears only when $y = 1$)

(34) $\quad l_2(\beta,\gamma,\sigma,\rho) = -\log\sigma + \log\phi\!\left(\dfrac{w-z\gamma}{\sigma}\right) + \log\Phi\!\left(\dfrac{x\beta+\rho(w-z\gamma)/\sigma}{\sqrt{1-\rho^2}}\right) - \log\Phi(x\beta)$.

One could estimate this model by maximizing the sample sum of the full likelihood function $l$, by maximizing the sample sum of either the marginal or the conditional component, or by maximizing these components in sequence. Note that asymptotically efficient estimation requires maximizing the full likelihood, and that not all the parameters are identified in each component; e.g., only $\beta$ is identified from the marginal component.
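A minimal sketch of the two components for a single worker observation, using scalar indices $x\beta$ and $z\gamma$ (the numeric values are arbitrary). It also checks the reduction discussed below: at $\rho = 0$ the conditional component is just the log of a normal regression density.

```python
# Components (33)-(34) for one observation with y = 1 (worker).
import math

def Phi(t):   # standard normal CDF
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def phi(t):   # standard normal density
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def l1(y, xb):                       # marginal probit component (33)
    return math.log(Phi(y * xb))

def l2(w, xb, zg, sigma, rho):       # conditional component (34), y = 1 only
    v = (w - zg) / sigma
    t = (xb + rho * v) / math.sqrt(1.0 - rho**2)
    return -math.log(sigma) + math.log(phi(v)) + math.log(Phi(t)) - math.log(Phi(xb))

# With rho = 0 the conditional component equals the normal regression density:
w, xb, zg, sigma = 1.3, 0.4, 0.9, 2.0
gauss = -math.log(sigma) + math.log(phi((w - zg) / sigma))
print(l2(w, xb, zg, sigma, 0.0), gauss)
```

Summing `l1` over all observations and `l2` over workers gives the full sample log likelihood.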

Nevertheless, there may be computational advantages to working with the marginal or conditional likelihood. Maximization of $l_1$ is a conventional binomial probit problem. Maximization of $l_2$ could be done either jointly in all the parameters $\beta, \gamma, \rho, \sigma$, or alternately in $\gamma, \rho, \sigma$, with the estimate of $\beta$ from a first-step binomial probit substituted in and treated as fixed.

When $\rho = 0$, the case of "exogenous" selection in which there is no correlation between the random variables determining selection into the observed population and the level of the observation, $l_2$ reduces to the log likelihood for a regression with normal disturbances. When $\rho \neq 0$, selection matters, and regressing $w$ on $z$ will not give consistent estimates of $\gamma$ and $\sigma$.

An alternative to maximum likelihood estimation is a GMM procedure based on the moments of $w$. Using the property that the conditional expectation of $\nu$ given $y = 1$ equals the conditional expectation of $\nu$ given $\varepsilon$, integrated over the conditional density of $\varepsilon$ given $y = 1$, plus the property of the normal density that $d\phi(\varepsilon)/d\varepsilon = -\varepsilon\phi(\varepsilon)$, one has

(35) $\quad E\{w\mid z,y=1\} = z\gamma + \sigma E\{\nu\mid y=1\} = z\gamma + \sigma\!\int_{-x\beta}^{\infty}\! E\{\nu\mid\varepsilon\}\,\phi(\varepsilon)\,d\varepsilon\big/\Phi(x\beta) = z\gamma + \sigma\rho\!\int_{-x\beta}^{\infty}\! \varepsilon\,\phi(\varepsilon)\,d\varepsilon\big/\Phi(x\beta) = z\gamma + \sigma\rho\,\phi(x\beta)/\Phi(x\beta) \equiv z\gamma + \lambda M(x\beta)$,

where $\lambda = \sigma\rho$ and $M(c) = \phi(c)/\Phi(c)$ is called the inverse Mill's ratio.
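The truncated-moment formula in (35) can be checked by Monte Carlo; the correlation and truncation point below are arbitrary.

```python
# Monte Carlo check of E{nu | eps > -c} = rho * M(c), M(c) = phi(c)/Phi(c).
import math
import numpy as np

rng = np.random.default_rng(2)
rho, c = 0.6, 0.5                      # assumed correlation and threshold c = x*beta
n = 1_000_000
eps = rng.normal(size=n)
nu = rho * eps + math.sqrt(1 - rho**2) * rng.normal(size=n)

keep = eps > -c                        # the selection event y = 1
mc = nu[keep].mean()                   # simulated E{nu | y = 1}

phi_c = math.exp(-0.5 * c * c) / math.sqrt(2 * math.pi)
Phi_c = 0.5 * (1 + math.erf(c / math.sqrt(2)))
M = phi_c / Phi_c                      # inverse Mill's ratio M(c)
print(mc, rho * M)                     # the two agree to Monte Carlo error
```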

Further, $E(\nu^2\mid\varepsilon) = \mathrm{Var}(\nu\mid\varepsilon) + \{E(\nu\mid\varepsilon)\}^2 = 1 - \rho^2 + \rho^2\varepsilon^2$, and

$\int_{-c}^{\infty} \varepsilon^2\phi(\varepsilon)\,d\varepsilon = -\int_{-c}^{\infty} \varepsilon\,d\phi(\varepsilon) = -c\,\phi(c) + \int_{-c}^{\infty} \phi(\varepsilon)\,d\varepsilon = -c\,\phi(c) + \Phi(c)$.

Then

(36) $\quad E\{(w-z\gamma)^2\mid z,y=1\} = \sigma^2 E\{\nu^2\mid y=1\} = \sigma^2\!\int_{-x\beta}^{\infty}\! E\{\nu^2\mid\varepsilon\}\,\phi(\varepsilon)\,d\varepsilon\big/\Phi(x\beta) = \sigma^2\!\int_{-x\beta}^{\infty}\! \{1-\rho^2+\rho^2\varepsilon^2\}\,\phi(\varepsilon)\,d\varepsilon\big/\Phi(x\beta)$
$= \sigma^2\{1 - \rho^2 + \rho^2 - \rho^2\, x\beta\,\phi(x\beta)/\Phi(x\beta)\} = \sigma^2\{1 - \rho^2\, x\beta\,\phi(x\beta)/\Phi(x\beta)\} = \sigma^2\{1 - \rho^2\, x\beta\, M(x\beta)\}$.

(37) $\quad \mathrm{Var}\{w\mid z,y=1\} = E\{(w-z\gamma)^2\mid z,y=1\} - [E\{w-z\gamma\mid z,y=1\}]^2 = \sigma^2\{1 - \rho^2 x\beta M(x\beta) - \rho^2 M(x\beta)^2\} = \sigma^2\{1 - \rho^2 M(x\beta)[x\beta + M(x\beta)]\}$.

A GMM estimator for this problem can be obtained by applying NLLS, for the observations with $y = 1$, to the equation

(38) $\quad w = z\gamma + \sigma\rho M(x\beta) + \eta$,

where $\eta$ is a disturbance that satisfies $E\{\eta\mid z,y=1\} = 0$. This ignores the heteroskedasticity of $\eta$, but it is nevertheless consistent. This regression estimates only the product $\sigma\rho$.

Consistent estimates of $\sigma$ and $\rho$ could be obtained in a second step. From (37),

(39) $\quad \mathrm{Var}\{w\mid x,z,y=1\} = \sigma^2\{1 - \rho^2 M(x\beta)[x\beta + M(x\beta)]\}$,

so regress the square of the estimated residual, $\eta_e^2$, on a constant and the correction term,

(40) $\quad \eta_e^2 = a + b\,\{M(x\beta_e)[x\beta_e + M(x\beta_e)]\} + \text{error};$

the estimates of $a$ and $-b$ provide consistent estimates of $\sigma^2$ and $\sigma^2\rho^2$, respectively.

There is a two-step estimation procedure, due to Heckman, that requires only standard computer software and is widely used:

[1] Estimate the binomial probit model

(42) $\quad P(y\mid x,\beta) = \Phi(yx\beta)$

by maximum likelihood.

[2] Estimate the linear regression model

(43) $\quad w = z\gamma + \lambda M(x\beta_e) + \eta$,

where $\lambda = \sigma\rho$ and the inverse Mill's ratio $M$ is evaluated at the parameters estimated in the first stage.

To estimate $\sigma$ and $\rho$, and increase efficiency, one can do an additional step:

[3] Estimate $\sigma^2$ using the procedure described in (40), with residuals $\eta_e$ from the second step and $\beta_e$ from the first step.
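The two-step procedure can be sketched on synthetic data; all parameter values, the shared covariate in both equations, and the hand-rolled probit routine are assumptions of this illustration, not part of the notes.

```python
# Steps [1]-[2]: probit first stage, then OLS of w on z and the inverse
# Mills ratio M(x*beta_e) over the selected (working) observations.
import math
import numpy as np

rng = np.random.default_rng(3)
n = 30000
beta = np.array([0.3, 1.0])     # selection (probit) parameters, assumed
gamma = np.array([1.0, 0.5])    # wage equation parameters, assumed
sigma, rho = 1.5, 0.6

x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])            # same covariates in both equations
eps = rng.normal(size=n)
nu = rho * eps + math.sqrt(1 - rho**2) * rng.normal(size=n)
s = X @ beta + eps > 0                           # work indicator
w = X @ gamma + sigma * nu                       # log wage, observed only when s

def Phi(t): return 0.5 * (1 + np.vectorize(math.erf)(t / math.sqrt(2)))
def phi(t): return np.exp(-0.5 * t**2) / math.sqrt(2 * math.pi)

# Step [1]: probit by Fisher scoring.
b = np.zeros(2)
for _ in range(30):
    F = np.clip(Phi(X @ b), 1e-10, 1 - 1e-10)
    f = phi(X @ b)
    resid = f * (s - F) / (F * (1 - F))          # generalized residual
    Wt = f**2 / (F * (1 - F))
    b = b + np.linalg.solve((X * Wt[:, None]).T @ X, X.T @ resid)

# Step [2]: OLS of w on z and the Mills ratio at the first-stage estimates.
M = phi(X[s] @ b) / Phi(X[s] @ b)
Z = np.column_stack([X[s], M])
coef = np.linalg.lstsq(Z, w[s], rcond=None)[0]
print(coef)   # approx (gamma_0, gamma_1, lambda = sigma*rho)
```

The last coefficient estimates $\lambda = \sigma\rho = 0.9$ in this design; a naive OLS of `w[s]` on `X[s]` alone would omit the Mills-ratio term and be inconsistent.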

One limitation of the bivariate normal model is most easily seen by examining the regression (43). Consistent estimation of the parameters in this model requires that the term $M(x\beta)$ be estimated consistently. This in turn requires the assumption of normality, which leads to the first-step probit model, to be exactly right. Were it not for this restriction, estimation of $\gamma$ in (43) would be consistent under the much more relaxed requirements for consistency of OLS estimators. To investigate this issue further, consider the bivariate selection model (29) with the following more general distributional assumptions: (i) $\varepsilon$ has a density $f(\varepsilon)$ and associated CDF $F(\varepsilon)$; and (ii) $\nu$ satisfies $E(\nu\mid\varepsilon) = \rho\varepsilon$, with a conditional variance $1 - \rho^2$ that is independent of $\varepsilon$. Define the truncated moments $J(x\beta) = E(\varepsilon\mid\varepsilon>-x\beta) = \int_{-x\beta}^{\infty} \varepsilon f(\varepsilon)\,d\varepsilon\big/[1 - F(-x\beta)]$ and $K(x\beta) = E(1 - \varepsilon^2\mid\varepsilon>-x\beta)$

$= \int_{-x\beta}^{\infty} [1 - \varepsilon^2] f(\varepsilon)\,d\varepsilon\big/[1 - F(-x\beta)]$. Then, given the assumptions (i) and (ii),

$E(w\mid z,y=1) = z\gamma + \sigma\rho E(\varepsilon\mid\varepsilon>-x\beta) = z\gamma + \sigma\rho J(x\beta)$,

$E\big((w - E(w\mid z,y=1))^2\mid z,y=1\big) = \sigma^2\{1 - \rho^2[K(x\beta) + J(x\beta)^2]\}$.

Thus, even if the disturbances in the latent variable model were not normal, it would nevertheless be possible to write down a regression with an added term to correct for self-selection that could be applied to observations where $y = 1$:

(45) $\quad w = z\gamma + \sigma\rho E\{\varepsilon\mid x\beta+\varepsilon>0\} + \xi = z\gamma + \sigma\rho J(x\beta) + \xi$,

where $\xi$ is a disturbance that has mean zero and the heteroskedastic variance $E(\xi^2\mid z,y=1) = \sigma^2\{1 - \rho^2[K(x\beta) + J(x\beta)^2]\}$. Now suppose one runs the regression (43) with an inverse Mill's ratio term to correct for self-selection, when in fact the disturbances are not normal and (45) is the correct specification. What bias results? The answer is that the closer $M(x\beta)$ is to $J(x\beta)$, the less

the bias. Specifically, when (45) is the correct model, regressing $w$ on $z$ and $M(x\beta)$ amounts to estimating the misspecified model $w = z\gamma + \lambda M(x\beta) + \{\xi + \lambda[J(x\beta) - M(x\beta)]\}$.

The bias in the NLLS estimates is the least squares projection of the omitted term $\lambda[J(x\beta) - M(x\beta)]$ on the included regressors $z$ and $M(x\beta)$; this bias is small if $\lambda = \sigma\rho$ is small or the covariance of $J - M$ with $z$ and $M$ is small.


CHAPTER 2: RANDOM VARIABLES AND ASSOCIATED FUNCTIONS 2b - 0 Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be

### Control Functions and Simultaneous Equations Methods

Control Functions and Simultaneous Equations Methods By RICHARD BLUNDELL, DENNIS KRISTENSEN AND ROSA L. MATZKIN Economic models of agent s optimization problems or of interactions among agents often exhibit

### Why Sample? Why not study everyone? Debate about Census vs. sampling

Sampling Why Sample? Why not study everyone? Debate about Census vs. sampling Problems in Sampling? What problems do you know about? What issues are you aware of? What questions do you have? Key Sampling

### Generalized Linear Models. Today: definition of GLM, maximum likelihood estimation. Involves choice of a link function (systematic component)

Generalized Linear Models Last time: definition of exponential family, derivation of mean and variance (memorize) Today: definition of GLM, maximum likelihood estimation Include predictors x i through

### Goodness of fit assessment of item response theory models

Goodness of fit assessment of item response theory models Alberto Maydeu Olivares University of Barcelona Madrid November 1, 014 Outline Introduction Overall goodness of fit testing Two examples Assessing

### The Exponential Family

The Exponential Family David M. Blei Columbia University November 3, 2015 Definition A probability density in the exponential family has this form where p.x j / D h.x/ expf > t.x/ a./g; (1) is the natural

### WHERE DOES THE 10% CONDITION COME FROM?

1 WHERE DOES THE 10% CONDITION COME FROM? The text has mentioned The 10% Condition (at least) twice so far: p. 407 Bernoulli trials must be independent. If that assumption is violated, it is still okay

### Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x

### ECON Introductory Econometrics. Lecture 15: Binary dependent variables

ECON4150 - Introductory Econometrics Lecture 15: Binary dependent variables Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 11 Lecture Outline 2 The linear probability model Nonlinear probability

### The Simple Linear Regression Model: Specification and Estimation

Chapter 3 The Simple Linear Regression Model: Specification and Estimation 3.1 An Economic Model Suppose that we are interested in studying the relationship between household income and expenditure on

### Some probability and statistics

Appendix A Some probability and statistics A Probabilities, random variables and their distribution We summarize a few of the basic concepts of random variables, usually denoted by capital letters, X,Y,

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### the number of organisms in the squares of a haemocytometer? the number of goals scored by a football team in a match?

Poisson Random Variables (Rees: 6.8 6.14) Examples: What is the distribution of: the number of organisms in the squares of a haemocytometer? the number of hits on a web site in one hour? the number of

### CHAPTER 6 EXAMPLES: GROWTH MODELING AND SURVIVAL ANALYSIS

Examples: Growth Modeling And Survival Analysis CHAPTER 6 EXAMPLES: GROWTH MODELING AND SURVIVAL ANALYSIS Growth models examine the development of individuals on one or more outcome variables over time.

### E3: PROBABILITY AND STATISTICS lecture notes

E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................

### Linear Threshold Units

Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

### SYSTEMS OF REGRESSION EQUATIONS

SYSTEMS OF REGRESSION EQUATIONS 1. MULTIPLE EQUATIONS y nt = x nt n + u nt, n = 1,...,N, t = 1,...,T, x nt is 1 k, and n is k 1. This is a version of the standard regression model where the observations

### LOGIT AND PROBIT ANALYSIS

LOGIT AND PROBIT ANALYSIS A.K. Vasisht I.A.S.R.I., Library Avenue, New Delhi 110 012 amitvasisht@iasri.res.in In dummy regression variable models, it is assumed implicitly that the dependent variable Y

### Double Sampling: What is it?

FOR 474: Forest Inventory Techniques Double Sampling What is it? Why is it Used? Double Sampling: What is it? In many cases in forestry it is too expensive or difficult to measure what you want (e.g. total

### A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

### Chapter 7 Sampling and Sampling Distributions. Learning objectives

Chapter 7 Sampling and Sampling Distributions Slide 1 Learning objectives 1. Understand Simple Random Sampling 2. Understand Point Estimation and be able to compute point estimates 3. Understand Sampling

### ST 371 (VIII): Theory of Joint Distributions

ST 371 (VIII): Theory of Joint Distributions So far we have focused on probability distributions for single random variables. However, we are often interested in probability statements concerning two or

### Probability & Statistics Primer Gregory J. Hakim University of Washington 2 January 2009 v2.0

Probability & Statistics Primer Gregory J. Hakim University of Washington 2 January 2009 v2.0 This primer provides an overview of basic concepts and definitions in probability and statistics. We shall

### 4. Joint Distributions of Two Random Variables

4. Joint Distributions of Two Random Variables 4.1 Joint Distributions of Two Discrete Random Variables Suppose the discrete random variables X and Y have supports S X and S Y, respectively. The joint

### From the help desk: Bootstrapped standard errors

The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

### Fairfield Public Schools

Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

### Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit

### Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

### CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS

Examples: Regression And Path Analysis CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Regression analysis with univariate or multivariate dependent variables is a standard procedure for modeling relationships

### ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages

### Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation

Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2015 CS 551, Fall 2015

### , then the form of the model is given by: which comprises a deterministic component involving the three regression coefficients (

Multiple regression Introduction Multiple regression is a logical extension of the principles of simple linear regression to situations in which there are several predictor variables. For instance if we

### Logit Models for Binary Data

Chapter 3 Logit Models for Binary Data We now turn our attention to regression models for dichotomous data, including logistic regression and probit analysis. These models are appropriate when the response

### i=1 In practice, the natural logarithm of the likelihood function, called the log-likelihood function and denoted by

Statistics 580 Maximum Likelihood Estimation Introduction Let y (y 1, y 2,..., y n be a vector of iid, random variables from one of a family of distributions on R n and indexed by a p-dimensional parameter

### Handling missing data in Stata a whirlwind tour

Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 1/55 Outline The problem of missing data and a principled

### 1 Orthogonal projections and the approximation

Math 1512 Fall 2010 Notes on least squares approximation Given n data points (x 1, y 1 ),..., (x n, y n ), we would like to find the line L, with an equation of the form y = mx + b, which is the best fit

### 1) Overview 2) Sample or Census 3) The Sampling Design Process i. Define the Target Population ii. Determine the Sampling Frame iii.

1) Overview 2) Sample or Census 3) The Sampling Design Process i. Define the Target Population ii. Determine the Sampling Frame iii. Select a Sampling Technique iv. Determine the Sample Size v. Execute

### Chapter 3 Joint Distributions

Chapter 3 Joint Distributions 3.6 Functions of Jointly Distributed Random Variables Discrete Random Variables: Let f(x, y) denote the joint pdf of random variables X and Y with A denoting the two-dimensional

### APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING

APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING Sulaimon Mutiu O. Department of Statistics & Mathematics Moshood Abiola Polytechnic, Abeokuta, Ogun State, Nigeria. Abstract

### AP STATISTICS 2010 SCORING GUIDELINES

2010 SCORING GUIDELINES Question 4 Intent of Question The primary goals of this question were to (1) assess students ability to calculate an expected value and a standard deviation; (2) recognize the applicability

### Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

### Notes 5: More on regression and residuals ECO 231W - Undergraduate Econometrics

Notes 5: More on regression and residuals ECO 231W - Undergraduate Econometrics Prof. Carolina Caetano 1 Regression Method Let s review the method to calculate the regression line: 1. Find the point of

### An-Najah National University Faculty of Engineering Industrial Engineering Department. Course : Quantitative Methods (65211)

An-Najah National University Faculty of Engineering Industrial Engineering Department Course : Quantitative Methods (65211) Instructor: Eng. Tamer Haddad 2 nd Semester 2009/2010 Chapter 5 Example: Joint

### Joint Exam 1/P Sample Exam 1

Joint Exam 1/P Sample Exam 1 Take this practice exam under strict exam conditions: Set a timer for 3 hours; Do not stop the timer for restroom breaks; Do not look at your notes. If you believe a question

### Basics of Statistical Machine Learning

CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

### Chapter 6: Multivariate Cointegration Analysis

Chapter 6: Multivariate Cointegration Analysis 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie VI. Multivariate Cointegration

### HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009

HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Introduction 2. A General Formulation 3. Truncated Normal Hurdle Model 4. Lognormal

### CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES

Examples: Monte Carlo Simulation Studies CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Monte Carlo simulation studies are often used for methodological investigations of the performance of statistical

### Standard errors of marginal effects in the heteroskedastic probit model

Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic

### STAT/MTHE 353: Probability II. STAT/MTHE 353: Multiple Random Variables. Review. Administrative details. Instructor: TamasLinder

STAT/MTHE 353: Probability II STAT/MTHE 353: Multiple Random Variables Administrative details Instructor: TamasLinder Email: linder@mast.queensu.ca T. Linder ueen s University Winter 2012 O ce: Je ery

### Penalized regression: Introduction

Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

### Markov Chain Monte Carlo Simulation Made Simple

Markov Chain Monte Carlo Simulation Made Simple Alastair Smith Department of Politics New York University April2,2003 1 Markov Chain Monte Carlo (MCMC) simualtion is a powerful technique to perform numerical

### Robust Inferences from Random Clustered Samples: Applications Using Data from the Panel Survey of Income Dynamics

Robust Inferences from Random Clustered Samples: Applications Using Data from the Panel Survey of Income Dynamics John Pepper Assistant Professor Department of Economics University of Virginia 114 Rouss

### Joint Probability Distributions and Random Samples. Week 5, 2011 Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage

5 Joint Probability Distributions and Random Samples Week 5, 2011 Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage Two Discrete Random Variables The probability mass function (pmf) of a single

### Limitations of regression analysis

Limitations of regression analysis Ragnar Nymoen Department of Economics, UiO 8 February 2009 Overview What are the limitations to regression? Simultaneous equations bias Measurement errors in explanatory

### PS 271B: Quantitative Methods II. Lecture Notes

PS 271B: Quantitative Methods II Lecture Notes Langche Zeng zeng@ucsd.edu The Empirical Research Process; Fundamental Methodological Issues 2 Theory; Data; Models/model selection; Estimation; Inference.

### Toss a coin twice. Let Y denote the number of heads.

! Let S be a discrete sample space with the set of elementary events denoted by E = {e i, i = 1, 2, 3 }. A random variable is a function Y(e i ) that assigns a real value to each elementary event, e i.

### Solución del Examen Tipo: 1

Solución del Examen Tipo: 1 Universidad Carlos III de Madrid ECONOMETRICS Academic year 2009/10 FINAL EXAM May 17, 2010 DURATION: 2 HOURS 1. Assume that model (III) verifies the assumptions of the classical

### MATHEMATICAL METHODS OF STATISTICS

MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS