Multivariate Ordered Regression

February 28, 2012

Valentino Dardanoni, Antonio Forcina, Paolo Li Donni

ABSTRACT: to be written

JEL codes:

Keywords:

1 Introduction

Many interesting problems in economics involve the study of how an ordered response variable depends on a set of regressors. In many situations we may be concerned with the more general problem of modelling how the joint distribution of several closely related ordered response variables $(Y_1, \dots, Y_K)$ depends on a vector of covariates $z$. Problems of this kind arise, for example, when we consider different choices made by each subject at a given point in time, or repeated choices on the same item made by each subject at different points in time (see footnote 1).

In the microeconometric literature, the current approach to modelling the joint conditional distribution of ordinal response variables relies on assuming the existence of $K$ latent variables which form a regression system

$$Y_k^* = z_k'\tau_k + \epsilon_k, \qquad k = 1, \dots, K,$$

where $z_k$ denotes the subset of the regressors $z$ relevant to the $k$th equation; a set of threshold parameters transforms the continuous latent distribution into the actual discrete one. In addition, the joint distribution of the errors, $H(\epsilon_1, \dots, \epsilon_K)$, may also be specified.

An alternative approach is to consider the conditional distribution of the ordered response variables as a multi-way table of joint probabilities, to arrange these into the vector $\pi(z)$, and to define a suitable multivariate transformation of $\pi(z)$, known as a link function, which makes the dependence on covariates linear:

$$\pi(z) = g[\lambda(z)] = g(\alpha + Z\beta); \tag{1}$$

here $Z$ is a matrix of known constants which depends on $z$, and the vector of parameters $\lambda(z)$ describes relevant aspects of $(Y_1, \dots, Y_K \mid z)$ which, hopefully, have an interpretation in terms of economic theory.
Footnote 1: If the data involve observations taken at different points in time, the model described in this paper can be seen as a static panel model with free correlations across periods. With panel data, more specific approaches have been designed to take into account individual heterogeneity, lagged dependent variables and serial correlation.

Models of this kind have, essentially, two components:

- The regression model $\lambda(z) = \alpha + Z\beta$: this is the parametric component of the model.
- The link function $\pi(z) = g[\lambda(z)]$: this is the potentially nonparametric component of the model. It maps the linear function of covariates $\lambda(z)$ onto a discrete joint distribution.

Equation (1) defines a wide class of multivariate ordered regression models, whose elements are characterized by the specific link function $g$. Well known examples are the log-linear and the probit links. The log-linear link function (see e.g. Amemiya [3], Chapter 9, or Agresti [1], Chapter 4), in spite of its appealing simplicity, does not allow the univariate marginals to be modelled directly, cannot take into account the ordered nature of the response variables, and does not have a latent variable interpretation. On the other hand, the so-called multivariate ordered probit model, which is based on the probit link, does not suffer from any of these limitations. However, because of its simplicity, the probit link imposes some strong restrictions on the association structure of the response variables, namely (i) each bivariate marginal distribution has only one association parameter; and (ii) all log-linear interactions of order higher than two are constrained to zero. In addition, the process of fitting models based on the probit link may be computationally demanding when there are several response variables.

Within the class of models defined by equation (1), we are typically interested in link functions whose elements have, possibly, the following desirable features:

- they describe relevant aspects of $(Y_1, \dots, Y_K) \mid z$ which have a substantial economic interpretation;
- they take into account the ordered nature of the response variables;
- they make it possible to model the univariate marginals and the association structure directly;
- they have an interpretation in terms of the system of latent equations.
By exploiting recent advances in the theory of marginal modelling (Bartolucci, Colombi and Forcina [4]), we propose a link function having most of the desirable properties listed above. The available theory of marginal models can handle a link function where the association structure is unrestricted; however, this would make the exposition heavier with little practical advantage. Though the link function we discuss in this paper implies certain mild constraints on the association structure, it may be further simplified by testable restrictions. In this formulation, the table $\pi(z)$ which arrays the joint probabilities of the response variables $(Y_1, \dots, Y_K) \mid z$ is decomposed into two main sets of parameters of interest, namely the global logits, which model the marginal distributions of the ordered discrete variables, and the global log-odds ratios, which describe their association. As we show in Section 2.4 below, these parameters have a natural interpretation in terms of stochastic dominance, thus allowing a clear economic interpretation of the estimated parameters.

In Section 2 we derive the main properties of the marginal modelling approach and its relation with the latent variable model. Section 3 discusses the computational properties of maximum likelihood estimates, and the asymptotic distribution of the likelihood ratio under suitable equality and inequality constraints. Section 4 clarifies the use of these models with an application to testing for asymmetric information in the Medigap insurance market.

2 Ordered regression

To help the understanding of our approach to the multivariate case, we start by briefly reviewing the well known univariate case with a single ordinal response variable $Y$ taking values in $\{1, \dots, m\}$. Let $q(z)$ be the $(m-1) \times 1$ vector denoting the survival function of $Y$, conditionally on $z$:

$$q(z) = \big( \Pr(Y > 1 \mid z), \dots, \Pr(Y > m-1 \mid z) \big)',$$

and notice that the term $\Pr(Y > 0 \mid z) = 1$ can be omitted. A generalized ordered regression model is an equation that relates $q(z)$ to $z$ through a vector valued function $g : \mathbb{R}^{m-1} \to (0, 1)^{m-1}$, known as a link function (McCullagh and Nelder [12]), which is assumed to be invertible and twice differentiable and such that

$$q(z) = g(\alpha + Z\beta), \tag{2}$$

where $Z$ is a matrix of known constants which depend on $z$. Since the link function is invertible, there is a well defined set of $m-1$ parameters which are linearly related to $Z$:

$$\lambda(z) = g^{-1}(q(z)) = \alpha + Z\beta. \tag{3}$$

It is well known (see, for example, Wooldridge [19], p. 457) that, under a few additional assumptions, the ordered regression model (2) is equivalent to assuming the existence of a continuous latent variable $Y^*$ which follows a linear regression model

$$Y^* = \tau_0 + z'\tau + \epsilon, \tag{4}$$

where the error $\epsilon$ is independent of $z$ and has cumulative distribution function $P(\epsilon \le v) = G(v)$, and there is a vector of $m-1$ unknown parameters $\gamma$ (called thresholds), with $\gamma_1 < \dots < \gamma_{m-1}$, such that $Y \le j \iff Y^* \le \gamma_j$. Notice that, since we can add an arbitrary constant to $\tau_0$ and subtract the same constant from $\gamma$ without affecting $\alpha$, these two parameters cannot be identified simultaneously; in the sequel we assume that $\tau_0 = 0$, so that the regression model in (4) has no intercept. When $\epsilon$ has a standard logistic distribution, the latent regression model (4) is equivalent to an ordered regression model where $\Pr(\epsilon \le v) = G(v) = e^v / (1 + e^v)$.
By inverting the survival function $\Pr(Y > j \mid z) = G(-\gamma_j + z'\beta)$, it follows that $\lambda(z)$ is a vector of so-called global logits which are linearly related to $z$,

$$\lambda_j(z) = \log[\Pr(Y > j \mid z) / \Pr(Y \le j \mid z)] = (-\gamma_j) + z'\beta, \qquad j = 1, \dots, m-1, \tag{5}$$

and can be seen as the natural generalization of the standard binary logits when the variable is ordered. In fact, global logits can be interpreted as binary logits computed after dichotomizing the response categories at each cut point into a low and a high level. The standard ordered logit model can be seen as a special case of the ordered regression model (2) with $G$ being the logit link. Different choices of the link function $G$ give rise to different assumptions on the distribution of the latent regression error $\epsilon$ and to different definitions of the parameter vector $\lambda(z)$. The best known alternative to the ordered logit model is of course the ordered probit model, where $G$ is the standard normal cdf.
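As a small numerical illustration of equation (5), the following sketch builds the category probabilities of a univariate ordered logit model and recovers the global logits from them. The function names, the threshold values and the value of the linear index $z'\beta$ are illustrative assumptions, not quantities taken from the paper.

```python
import numpy as np

def ordered_logit_probs(gamma, zb):
    """Category probabilities P(Y = 1), ..., P(Y = m) for thresholds gamma
    and linear index zb = z'beta, using P(Y > j | z) = G(-gamma_j + z'beta)."""
    gamma = np.asarray(gamma, dtype=float)
    surv = 1.0 / (1.0 + np.exp(-(zb - gamma)))      # P(Y > j | z), j = 1..m-1
    # P(Y = j) = P(Y > j-1) - P(Y > j), with P(Y > 0) = 1 and P(Y > m) = 0
    return np.concatenate(([1.0], surv)) - np.concatenate((surv, [0.0]))

def global_logits(p):
    """Global logits log[P(Y > j) / P(Y <= j)], j = 1..m-1, from category probabilities."""
    surv = 1.0 - np.cumsum(p)[:-1]
    return np.log(surv / (1.0 - surv))

# Hypothetical values: m = 4 categories, increasing thresholds
gamma = np.array([-1.0, 0.5, 2.0])
zb = 0.3
p = ordered_logit_probs(gamma, zb)
lam = global_logits(p)          # should reproduce -gamma_j + z'beta
```

Each global logit is a binary logit for the split "above cut point $j$" versus "at or below cut point $j$", so `lam` agrees with `zb - gamma` up to rounding error.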

2.1 Multivariate ordered regression

The univariate latent regression model (4) may be extended to the multivariate case by assuming, in addition:

- the existence of $K$ latent continuous variables $Y_k^*$, $k = 1, \dots, K$, such that

$$Y_k^* = z_k'\tau_k + \epsilon_k, \qquad k = 1, \dots, K; \tag{6}$$

- the existence of a joint distribution for the errors $\epsilon_1, \dots, \epsilon_K$.

It then becomes a seemingly unrelated regression system formulated in terms of latent variables. Let $H$ denote the joint distribution of the errors, so that $H(a_1, \dots, a_K) = P(\epsilon_1 \le a_1, \dots, \epsilon_K \le a_K)$, and let $H_k$, $k = 1, \dots, K$, denote the corresponding univariate marginal distributions. The simplest modelling strategy would be to assume that the $K$ error components in the latent regression models are independent, so that the regression system (6) implies $K$ separate ordered regression models of the form

$$H_k^{-1}[P(Y_k > j \mid z)] = -\gamma_{j(k)} + z_k'\tau_k, \qquad k = 1, \dots, K, \quad j = 1, \dots, m_k - 1,$$

which may be estimated as in the univariate case, so that no additional theory is required. However, there are several reasons for modelling the association structure of $(\epsilon_1, \dots, \epsilon_K)$ as well. Apart from the fact that, by assuming that $(\epsilon_1, \dots, \epsilon_K)$ are independent, we are likely to misspecify the true model with a loss of efficiency, the main reason for estimating the whole system of latent equations is that the nature and the degree of association between the response variables (conditionally on the covariates) may be of substantive interest, as in the application considered in this paper.

2.2 Multivariate Link Functions as copulas

The probabilities which define the joint density of $Y_1, \dots, Y_K$ conditionally on $z$ can be displayed in a table with $t = \prod_{k=1}^{K} m_k$ cells. It is convenient to arrange these probabilities into the vector $\pi(z)$ in lexicographic order, by letting variables $Y_k$ with a larger index $k$ run faster from 1 to $m_k$; if unrestricted, $\pi(z)$ belongs to the $t$-dimensional simplex $\Pi$ defined by $1_t'\pi(z) = 1$ and $\pi(z) \ge 0$.
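The lexicographic arrangement of the cells can be made concrete with a short sketch. The helper names `cell_labels` and `cell_index` are hypothetical and introduced only for illustration; the ordering rule (the variable with the largest index runs fastest) is the one stated above.

```python
from itertools import product

def cell_labels(m):
    """All response patterns (y_1, ..., y_K) of a table with m = (m_1, ..., m_K)
    categories, in lexicographic order with the last variable running fastest."""
    return list(product(*(range(1, mk + 1) for mk in m)))

def cell_index(y, m):
    """0-based position of the pattern y = (y_1, ..., y_K) inside the vector pi(z)."""
    idx = 0
    for yk, mk in zip(y, m):
        idx = idx * mk + (yk - 1)
    return idx

# Hypothetical example: Y_1 with 2 categories, Y_2 with 3 categories
labels = cell_labels((2, 3))   # (1,1), (1,2), (1,3), (2,1), (2,2), (2,3)
```

The same index can be used to build an indicator vector with a single 1 in the position of an observed response pattern.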
As in the univariate case, we are interested in invertible and differentiable mappings which link individual response probabilities to a common vector of parameters,

$$\pi(z) = g[\lambda(z)] = g(\alpha + Z\beta) = g(X\psi), \tag{7}$$

where the elements of $Z$ are known functions of the vector of covariates $z$, $X = (I \;\; Z)$, and $\psi = (\alpha' \;\; \beta')'$ collects the model parameters. In view of the fact that the marginal logits are directly related to the univariate latent regression models, it may be convenient to partition $\lambda(z)$ into two components. The first one, denoted by $\lambda_u(z)$, contains the $s = \sum_{k=1}^{K} (m_k - 1)$ global logits which determine the univariate marginal distributions, while the second one, denoted by $\lambda_a(z)$, contains a suitable subset of the remaining $t - 1 - s$ parameters which model the association between the $K$ response variables. Thus, it is convenient to rewrite the regression system (7) above as

$$\lambda(z) = \begin{pmatrix} \lambda_u(z) \\ \lambda_a(z) \end{pmatrix} = \begin{pmatrix} \alpha_u + Z_u\beta_u \\ \alpha_a + Z_a\beta_a \end{pmatrix} = X\psi, \tag{8}$$

where $\alpha_u$, $\beta_u$ and $Z_u$ denote respectively the intercepts, the regression coefficients and the covariate matrix for the set of univariate logits, while $\alpha_a$, $\beta_a$ and $Z_a$ refer to those for the association parameters.

By modelling the univariate component directly, the association component of the link function outlined above defines a multivariate copula, a conceptual tool for modelling the association among the errors in the $K$ regression equations in a way which treats the response variables in a symmetric fashion. Sklar's Theorem implies that any continuous distribution $F$ may be determined by its marginal distributions $F_k$ and a copula

$$C_F(u_1, \dots, u_K) = F\big(F_1^{-1}(u_1), \dots, F_K^{-1}(u_K)\big), \qquad u_k \in [0, 1] \text{ for all } k;$$

thus a copula describes the association structure of $F$ irrespective of its univariate marginals. Nelsen [14] provides an excellent introduction to copulas and their properties. This tool is particularly appropriate for describing the association between ordinal variables because copulas are invariant to strictly monotone transformations of the random variables; the other basic property of copulas is that univariate marginals and association structure may be modelled separately and then combined. Therefore, the latent regression model (6) above can be described by the set of regression coefficients $\beta_k$, the threshold parameters $\gamma_k$, the univariate marginal distributions of the $\epsilon_k$ and the copula $C_H$, which in turn may also depend on covariates.

A well known family of parametric copulas is the Gaussian copula:

$$C_\rho(u_1, \dots, u_K) = \Psi_K\big(\Psi_1^{-1}(u_1), \dots, \Psi_1^{-1}(u_K); \rho\big),$$

where $\Psi_Q$ denotes the standard $Q$-variate normal distribution and $\rho$ is the $K(K-1)/2$ vector of all bivariate correlation coefficients.
When the Gaussian copula is combined with $K$ standard normal marginal distributions it gives the multivariate normal distribution, which, when employed in the latent regression model (6), gives rise to the multivariate ordered probit model (sometimes also called the multiresponse ordered probit model). Though this model may look appealing, the set of discrete multivariate distributions which are compatible with this copula is very limited. The reason is that, once the cut points are determined in accordance with the univariate marginals, each discrete bivariate marginal distribution, say $(Y_k, Y_h)$, has $(m_k - 1)(m_h - 1)$ additional free cells which are not constrained by the univariate marginals. Under the Gaussian copula all of these probabilities must conform to a bivariate normal and are determined by a single additional parameter (the correlation coefficient). This implies a rather strong restriction unless, of course, the underlying response variables are binary.

2.3 The Multivariate Ordered Logit Model

The copula that we are going to describe in this section is determined by a suitable set of interaction parameters which, together with the global logits (of which they are the natural extension), constitute a parametrization of a relevant subset of $\Pi$ which is of more direct interest. The approach that can be derived from this formulation has two main advantages with respect to the Gaussian copula:

1. the parameterization is easily invertible without the need for numerical integration;
2. the dependence structure is more flexible because, in the unrestricted model, the number of parameters that determine each bivariate distribution equals the number of free cells in the corresponding frequency table.

However, when the association parameters are constrained to be equal, the complexity of our model is identical to that of the Gaussian copula.

2.3.1 Global interaction parameters

Dale (1986) was among the first to consider this parametrization in the bivariate case; for an extension to multivariate distributions see Molenberghs and Lesaffre [13]. Bartolucci, Colombi and Forcina [4] provided a general framework for parameterizing discrete distributions with different kinds of marginal interaction parameters; some of their results are used here. Loosely speaking, an interaction term is a parameter measuring the association among a set of variables, say $I \subseteq \{1, \dots, K\}$, which may be defined within a marginal distribution, say $M$, such that $I \subseteq M$. In the following we restrict attention to bivariate interactions defined within the corresponding bivariate distribution. The bivariate interactions, which are the key association parameters in our model, are called global log-odds ratios and, for any two response variables $Y_i$ and $Y_j$, dichotomized at the cut points $c_i$ and $c_j$ respectively, may be written as

$$\lambda_{\{i,j\}}(c_i, c_j \mid z) = \log \left( \frac{\Pr(Y_i > c_i, Y_j > c_j \mid z) \, \Pr(Y_i \le c_i, Y_j \le c_j \mid z)}{\Pr(Y_i \le c_i, Y_j > c_j \mid z) \, \Pr(Y_i > c_i, Y_j \le c_j \mid z)} \right).$$

2.3.2 The link function and its inverse

Recall that the multinomial distribution, being a member of the exponential family, may also be parameterized by a vector, say $\theta(z)$, of canonical parameters; these are log-linear parameters defined within the overall joint distribution (rather than the corresponding marginals). Though these parameters do not usually have a direct interpretation, they can be easily converted into probabilities and provide a useful tool for defining the link function and its inverse.
The link function that we propose is such that each univariate distribution $Y_i$ is determined by $m_i - 1$ global logits, each bivariate distribution $(Y_i, Y_j)$ is determined by $(m_i - 1)(m_j - 1)$ global interactions, and all log-linear interactions of order greater than two are constrained to 0. Let $\Pi_D$ denote the subset of the probability simplex satisfying the above restrictions and let $v = \sum_i (m_i - 1) + \sum_{i > j} (m_i - 1)(m_j - 1)$. Bartolucci, Colombi and Forcina [4] provide a simple algorithm for constructing a design matrix $G_D$ such that, when $\theta(z)$ varies in $\mathbb{R}^v$,

$$\pi(z) = \frac{\exp[G_D\theta(z)]}{1'\exp[G_D\theta(z)]}$$

varies in $\Pi_D$; the Appendix of [4] provides an algorithm for constructing a contrast matrix $C$ and a marginalization matrix $M$ such that

$$\lambda(z) = \begin{pmatrix} \lambda_u(z) \\ \lambda_a(z) \end{pmatrix} = C \log[M\pi(z)], \qquad \pi(z) \in \Pi_D. \tag{9}$$

Let $L = \{\lambda(z) : \lambda(z) = C \log[M\pi(z)] \text{ for some } \pi(z) \in \Pi_D\}$ denote the space of compatible marginal interaction parameters. We can now state the main result of this section:

Theorem 1. The mapping defined by (9) from $\Pi_D$ to $L$ is invertible and differentiable and thus defines a proper link function $\pi(z) = g(X\psi)$.
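To make the link function (9) and its invertibility concrete, the sketch below treats the bivariate case only, where no higher-order interactions exist and $\Pi_D$ is the whole interior of the simplex. It builds a contrast matrix `C` and a marginalization matrix `M` directly from the definitions of global logits and global log-odds ratios, and inverts the link by Newton iterations using the Jacobian $C\,\mathrm{diag}(M\pi)^{-1} M\,\mathrm{diag}(\pi)\,G$. Several ingredients are assumptions made for illustration and are not taken from [4]: the function names, the reference-cell design matrix `G` (used in place of $G_D$), the example table, and the step-halving safeguard added to the plain Newton update.

```python
import numpy as np

def contrast_and_margins(r, c):
    """C and M with lambda = C @ log(M @ pi) for an r x c table (pi in row-major order).
    Order of lambda: global logits of Y1, global logits of Y2, global log-odds ratios."""
    i, j = np.divmod(np.arange(r * c), c)        # 0-based categories of each cell
    masks, signs = [], []
    def add(terms):                              # terms: list of (cell mask, sign)
        signs.append([(len(masks) + k, s) for k, (_, s) in enumerate(terms)])
        masks.extend(m for m, _ in terms)
    for a in range(r - 1):
        add([(i > a, 1), (i <= a, -1)])
    for b in range(c - 1):
        add([(j > b, 1), (j <= b, -1)])
    for a in range(r - 1):
        for b in range(c - 1):
            add([((i > a) & (j > b), 1), ((i <= a) & (j <= b), 1),
                 ((i <= a) & (j > b), -1), ((i > a) & (j <= b), -1)])
    M = np.array(masks, dtype=float)
    C = np.zeros((len(signs), len(masks)))
    for row, entries in enumerate(signs):
        for col, s in entries:
            C[row, col] = s
    return C, M

def cell_probs(theta, G):
    eta = G @ theta
    w = np.exp(eta - eta.max())                  # pi = exp(G theta) / 1'exp(G theta)
    return w / w.sum()

def link(pi, C, M):
    return C @ np.log(M @ pi)

def invert_link(lam, C, M, G, tol=1e-10, max_iter=50):
    theta = np.zeros(G.shape[1])                 # start from the uniform table
    for _ in range(max_iter):
        pi = cell_probs(theta, G)
        resid = lam - link(pi, C, M)
        if np.linalg.norm(resid) < tol:
            break
        jac = C @ np.diag(1.0 / (M @ pi)) @ M @ np.diag(pi) @ G   # d lambda / d theta'
        step = np.linalg.solve(jac, resid)
        scale = 1.0                              # step halving: a safeguard added here
        while scale > 1e-4:
            trial = lam - link(cell_probs(theta + scale * step, G), C, M)
            if np.linalg.norm(trial) < np.linalg.norm(resid):
                break
            scale /= 2.0
        theta = theta + scale * step
    return cell_probs(theta, G)

# Hypothetical 3 x 3 joint distribution of (Y1, Y2)
pi = np.array([0.20, 0.10, 0.05, 0.10, 0.15, 0.10, 0.05, 0.10, 0.15])
C, M = contrast_and_margins(3, 3)
G = np.vstack([np.zeros((1, 8)), np.eye(8)])     # first cell as reference category
lam = link(pi, C, M)
pi_back = invert_link(lam, C, M, G)
```

Because every row of `C` sums to zero, the term involving $\pi\pi'$ in the derivative of $\pi$ with respect to $\theta$ drops out of the Jacobian, which is why the simpler product with $\mathrm{diag}(\pi)$ suffices.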

The theorem is a special case of Theorem 1 in Bartolucci, Colombi and Forcina [4], who consider a more general class of hierarchical parametrizations with $I \subseteq M$. Unfortunately (9) has no analytical inverse; however, a numerical inverse may be computed by a Newton algorithm, which is extremely fast and reliable.

Lemma 1. The mapping from $\lambda(z)$ to $\theta(z) \in \mathbb{R}^v$, or equivalently to $\pi(z) \in \Pi_D$, may be computed by the following algorithm:

1. at the initial step, choose a value $\theta^{(0)}$ such that $\lambda^{(0)}$ is sufficiently close to $\lambda(z)$;
2. at the $h$-th step, update the vector of canonical parameters by the first order approximation $\theta^{(h)} = \theta^{(h-1)} + D[\lambda(z) - \lambda^{(h-1)}]$, where $D = \partial\theta / \partial\lambda' = [C \,\mathrm{diag}(M\pi)^{-1} M \,\mathrm{diag}(\pi) \, G_D]^{-1}$;
3. iterate until the norm of $\lambda(z) - \lambda^{(h-1)}$ is close to 0.

The Lemma is a direct application of the Newton algorithm. Since the mapping from $\theta(z)$ to $\lambda(z)$ has continuous second-order derivatives with respect to $\theta(z)$ whose elements are finite, the result may be derived, for example, from Theorem 4.4 in Süli and Mayers [18] (see footnote 2).

Footnote 2: In our experience, by setting $\theta^{(0)} = 0_v$, the algorithm always converges as long as $\lambda(z)$ is not too close to the boundary of the parameter space; this may happen, for example, when one or more elements are much larger than 20 in modulus.

Forcina and Dardanoni [9] study in detail the nature of the copula defined by the multivariate ordered logit link function and show that there exists a continuous multivariate latent distribution which has exactly the same parameters as its discrete analog.

2.4 Interpretation of the parameters in terms of stochastic dominance

The main parameters of interest in our model are the univariate global logits and the bivariate global log-odds ratios. To understand their economic significance, in this section we explore their properties in terms of stochastic dominance.

2.4.1 Global logits

It is interesting to note that the global logits satisfy a stochastic ordering property which seems particularly appropriate when the response variables have an ordinal nature, in the sense that their relevant properties are preserved under arbitrary monotonic transformations:

Lemma 2. Given two discrete ordered random variables $Y_h$ and $Y_k$ in $\{1, \dots, m\}$, the following conditions are equivalent:

1. $E[u(Y_h) \mid z] \ge E[u(Y_k) \mid z]$ for any function $u$ which is nondecreasing;
2. $Y_h$ is stochastically greater than $Y_k$ conditionally on $z$;
3. $\log[P(Y_h > j \mid z) / P(Y_h \le j \mid z)] \ge \log[P(Y_k > j \mid z) / P(Y_k \le j \mid z)]$ for all $j < m$.

Proof. The equivalence between the first two conditions is well known (see for example Hadar and Russell [10], Theorems 1 and 2). The equivalence with the third condition follows by noting that global logits are strictly increasing transformations of the survival function $P(Y > j \mid z)$, and hence strictly decreasing transformations of the cumulative distribution.

If we regress a global logit on a given covariate and the regression coefficient is positive, then the response variable becomes stochastically larger whenever that covariate increases; because of this, regression coefficients in the ordered logit regression have a direct interpretation in terms of stochastic dominance (see footnote 3).

2.4.2 Global log-odds ratios

The log-odds ratios, which determine the association for any pair of responses, are also closely related to the notion of positive quadrant dependence (PQD), an instance of positive dependence between ordinal variables first introduced by Lehmann [11]: two random variables $Y_h$, $Y_k$ taking values in $\{1, \dots, m_h\}$ and $\{1, \dots, m_k\}$ are PQD if

$$\Pr(Y_h \le i, Y_k \le j) \ge \Pr(Y_h \le i) \Pr(Y_k \le j), \qquad i \in \{1, \dots, m_h\}, \; j \in \{1, \dots, m_k\},$$

which intuitively means that, compared to the case of independence, small values of $Y_h$ tend to go with small values of $Y_k$. Negative quadrant dependence is defined by reversing the inequality above. The ordinal nature of $Y_h$ and $Y_k$ seems to motivate the requirement that their relevant properties should be preserved under arbitrary monotonic transformations. The following result in the theory of stochastic orderings links the notion of positive dependence, PQD, to the log-odds ratios:

Lemma 3. Given a pair of discrete ordered random variables $Y_h$ and $Y_k$ taking values in $\{1, \dots, m_h\}$ and $\{1, \dots, m_k\}$, the following conditions are equivalent:

1. $\mathrm{Cov}[u(Y_h), v(Y_k) \mid z] \ge 0$ for any pair of increasing functions $u$, $v$;
2. $Y_h$ and $Y_k$ are PQD conditionally on $z$;
3. the log-odds ratios satisfy $\lambda_{\{h,k\}}(c_i, c_j \mid z) \ge 0$ for all $c_i < m_h$, $c_j < m_k$.
Proof. See Nelsen [14], exercises 5.22 and 5.27, p.

This result may be interpreted as saying that, if the ordered variables are the discrete version of continuous latent variables discretized at arbitrary thresholds, the log-odds ratios are the most appropriate measure of association, in the sense that the sign of the dependence between the underlying variables is preserved, irrespective of how the ordered categories are constructed.

Footnote 3: Notice instead that the standard interpretation of ordered logit coefficients (see for example Crawford, Pollak and Vella, 1998), which refers to the density rather than the cumulative distribution of the response variable, often implies a rather convoluted interpretation.

3 Statistical inference

3.1 Hypotheses of interest

A convenient feature of the multivariate ordered regression model defined by equation (8) is that all the relevant hypotheses of interest can be expressed in the form of linear equality and inequality constraints on the vector of model parameters $\psi = (\alpha' \;\; \beta')'$. A relevant set of testable restrictions consists in assuming that the bivariate association parameters do not depend on the cut points, an assumption which is the multivariate analog of the Plackett distribution. The family of bivariate Plackett distributions, introduced by Plackett [15], has been extended to the multivariate case by Molenberghs and Lesaffre [13]. Forcina and Dardanoni [9] discuss the multivariate ordered regression model under the Plackett distribution; that model is a close analog to the multivariate ordered probit model, with the correlation coefficients replaced by the corresponding bivariate log-odds ratios. The advantage of our modelling strategy is that these assumptions are imposed by means of testable restrictions.

Linear inequality constraints may be used to test a stochastic dominance effect of certain covariates on a set of latent regressions. We could also be interested in testing positive dependence between a pair of responses against conditional independence, by imposing that all the $(m_h - 1)(m_k - 1)$ log-odds ratios are positive against being zero. Generally speaking, any set of hypotheses of interest may be expressed in the form

$$H : \{\psi : E\psi = 0, \; U\psi \ge 0\}$$

by an appropriate choice of the equality and inequality matrices $E$ and $U$. Clearly, the cases with $E$ or $U$ equal to the null matrix correspond to restrictions with only inequalities or only equalities respectively.

3.2 Likelihood inference

Suppose we have independent observations $(Y_{1i}, \dots, Y_{Ki}, z_i)$ for a sample of $n$ units. Let $t(z_i)$ be a vector of size $\prod_k m_k$ made of 0's except for the element corresponding to the observed combination of $(Y_1, \dots, Y_K)$ for the $i$th unit, which is equal to 1. To simplify notation, in the sequel we write $t(i)$ instead of $t(z_i)$; a similar convention will be adopted for any vector which depends on $z_i$.
Under independent sampling, conditionally on $z_i$, $t(i)$ has a multinomial distribution with vector of probabilities $\pi(i)$. In order to manipulate the likelihood function more easily, we write the multinomial as an exponential family with the vector of canonical parameters introduced before Lemma 2; in practice, these are all the log-linear interactions which belong to the hierarchical set of marginals $D$, so that $\lambda(i)$ has the same dimension as $\theta(i)$. The contribution of the $i$th unit to the log-likelihood is $L(i) = t(i)' \log[\pi(i)]$ which, by expressing $\pi(i)$ in terms of the canonical parameters, may be written as

$$L(i) = t(i)' G_D \theta(i) - \log[1' \exp(G_D \theta(i))].$$

If we define

$$D(i) = \frac{\partial \theta(i)}{\partial \lambda(i)'} = \left[ \frac{\partial \lambda(i)}{\partial \theta(i)'} \right]^{-1} = \big[ C \,\mathrm{diag}[M\pi(i)]^{-1} M \,\mathrm{diag}[\pi(i)] \, G_D \big]^{-1},$$

the individual score vector is easily computed by differentiating $L$ with respect to the vector of model parameters $\psi$ by the chain rule:

$$s(i) = \frac{\partial L(i)}{\partial \psi} = \frac{\partial \lambda(i)'}{\partial \psi} \frac{\partial \theta(i)'}{\partial \lambda(i)} \frac{\partial L(i)}{\partial \theta(i)} = X(i)' D(i)' G_D' [t(i) - \pi(i)].$$
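The innermost factor of the score, $\partial L(i) / \partial \theta(i) = G_D'[t(i) - \pi(i)]$, can be checked numerically. The sketch below does this for a hypothetical $2 \times 2$ table; the design matrix `G`, the parameter values and the function names are illustrative assumptions, and the remaining factors $X(i)'D(i)'$ of the full score are not computed here.

```python
import numpy as np

def cell_probs(theta, G):
    """pi = exp(G theta) / 1'exp(G theta)."""
    eta = G @ theta
    w = np.exp(eta - eta.max())
    return w / w.sum()

def loglik(theta, G, t):
    """L(i) = t' G theta - log[1' exp(G theta)] for one unit."""
    eta = G @ theta
    return t @ eta - np.log(np.sum(np.exp(eta)))

def score_theta(theta, G, t):
    """Derivative of L(i) with respect to theta: G'(t - pi)."""
    return G.T @ (t - cell_probs(theta, G))

# Hypothetical 2 x 2 table, cells ordered (1,1), (1,2), (2,1), (2,2);
# columns of G: main effect of Y1, main effect of Y2, interaction
G = np.array([[0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [1.0, 1.0, 1.0]])
theta = np.array([0.2, -0.4, 0.7])
t = np.array([0.0, 0.0, 1.0, 0.0])      # unit observed in cell (2,1)

# Central finite differences as an independent check of the analytic gradient
eps = 1e-6
numeric = np.array([(loglik(theta + eps * e, G, t) - loglik(theta - eps * e, G, t)) / (2 * eps)
                    for e in np.eye(3)])
analytic = score_theta(theta, G, t)
```

The two gradients agree up to finite-difference error, which is the exponential-family identity the chain rule above relies on.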

The individual contribution to the expected information matrix also has a simple form because $E\{G_D'[t(i) - \pi(i)]\} = 0$:

$$F(i) = -E\left[ \frac{\partial^2 L(i)}{\partial \psi \, \partial \psi'} \right] = X(i)' D(i)' G_D' \, \Omega(i) \, G_D D(i) X(i),$$

where $\Omega(i) = \mathrm{diag}[\pi(i)] - \pi(i)\pi(i)'$ is the kernel of the variance matrix of the multinomial distribution. Having assumed that the units are independent, the log-likelihood is $L(\psi) = \sum_i L(i)$; thus the score vector $s(\psi) = \partial L(\psi) / \partial \psi$ and the expected information matrix $F(\psi) = -E(\partial^2 L(\psi) / \partial \psi \, \partial \psi')$ can be obtained by summing over units. Dardanoni, Fiorini and Forcina (2012), for the case of two ordinal responses, discuss an approach to likelihood inference similar to the one proposed here.

3.3 Parameter estimation

Maximum likelihood estimates of $\psi$ under $H$ can be obtained by an algorithm which extends to inequality constraints the seminal algorithm introduced by Aitchison and Silvey [2]. The Aitchison and Silvey algorithm (see for instance Colombi and Forcina [7]) is based on iterated linear approximations of the regression model onto the space of the canonical parameters, which are variation independent; the approximation is updated until convergence. Formally:

- assign a starting value $\psi^{0}$ which produces compatible $\lambda(i)$ for all units;
- at the $h$th step, compute a linear approximation of $\theta(i)$ and a quadratic approximation of the log-likelihood,

$$\theta(i)^{h} = \theta(i)^{h-1} + D(i)^{h-1} [X(i)\psi^{h} - \lambda(i)^{h-1}],$$

$$Q^{h}(\psi) = -(\psi - b^{h})' F^{h-1} (\psi - b^{h}) / 2,$$

where $b^{h} = [F^{h-1}]^{-1} [s^{h-1} + s_1^{h-1}]$ and $s_1^{h-1} = \sum_i X(i)' D(i)^{h-1\prime} G_D' \, \Omega(i)^{h-1} G_D D(i)^{h-1} \lambda(i)^{h-1}$;
- set $\psi^{h}$ equal to the constrained maximum of $Q^{h}(\psi)$ under $H$;
- iterate until convergence, that is, until the estimate of $\lambda$ is sufficiently stable and the linear approximation of $\theta$ is sufficiently accurate.

The starting point must be chosen so that the corresponding $\lambda(i)^{0}$ is compatible for all subjects. This may be achieved by setting to zero the intercepts of the association parameters and all the regression coefficients corresponding to the covariates.
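When only equality constraints are present, the constrained maximum of the quadratic approximation has a closed form. The sketch below covers this special case only, assuming that $F$ is positive definite and $E$ has full row rank; the function name and the numerical values are illustrative assumptions. With inequality constraints $U\psi \ge 0$ a quadratic programming routine would be needed instead.

```python
import numpy as np

def constrained_quadratic_max(F, b, E):
    """Maximise Q(psi) = -(psi - b)' F (psi - b) / 2 subject to E psi = 0.
    Solution of the first-order (Lagrangian) conditions:
    psi = b - F^{-1} E' (E F^{-1} E')^{-1} E b."""
    Finv_Et = np.linalg.solve(F, E.T)
    multipliers = np.linalg.solve(E @ Finv_Et, E @ b)
    return b - Finv_Et @ multipliers

# Hypothetical values: two parameters constrained to be equal (psi_1 - psi_2 = 0)
F = np.array([[2.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, 3.0])
E = np.array([[1.0, -1.0]])
psi = constrained_quadratic_max(F, b, E)
```

In this example the unconstrained maximiser is `b`, and the constraint pulls both components to the common value 1.75, which minimises the $F$-weighted distance from `b` along the line $\psi_1 = \psi_2$.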
Notice that, when inequality constraints are present, so that $U$ is not the null matrix, the maximization of $Q^{h}(\psi)$ at each step requires a quadratic optimization which is itself iterative; there are many algorithms for quadratic optimization under inequality constraints, which are usually very fast and reliable.

Since the likelihood function and the transformation from $\theta(i)$ to $\lambda(i)$ satisfy the conditions discussed by Aitchison and Silvey ([2], p. 817), it follows that, as $n$ increases, the probability that a constrained maximum exists tends to one. If the algorithm converges, it must converge to a local maximum by the argument of Aitchison and Silvey ([2], p. 826). Notice that our parameterization satisfies the two basic assumptions given by Rao ([16], p. 296), namely identifiability and continuity of the transformation from $\psi$ to $\pi$. It follows that, provided that $\psi_0$, the true value of $\psi$ under $H$, is an interior point of the parameter space, the m.l.e. of $\psi$ under $H$ exists, is consistent and has an asymptotic normal distribution.

4 An application to the Positive Correlation Test of asymmetric information in insurance markets

There is a constantly growing body of empirical literature studying the existence of asymmetric information in insurance markets (for a review see Cohen and Spiegelman [6] and Einav et al. [8]). Standard economic theory predicts that risk occurrence and insurance coverage are positively correlated, since individuals who know they are riskier tend to buy more coverage (adverse selection) or to consume more for a given structure of the contract (moral hazard). The theoretically predicted positive correlation has inspired the seminal Positive Correlation (PC) test by Chiappori and Salanié [5]. The PC test rejects the null of absence of asymmetric information in a given insurance market when, conditional on the consumers' characteristics used by insurance companies to price contracts, individuals with more coverage experience more of the insured risk. In their seminal paper, Chiappori and Salanié [5] provide simple empirical strategies to test the positive correlation hypothesis when both insurance coverage and risk occurrence are binary variables.

The PC test has been applied to hundreds of various insurance markets, including acute health, long-term care, automobile, annuities, life, reverse mortgages and crop. In most applications, its implementation relies on a simple bivariate probit model, where the null of the absence of private information is tested by the absence of correlation between the residual errors. In this application, we explain how our approach can be used to implement the PC test empirically when insurance coverage and risk are ordered categorical variables. We focus on the Medigap health insurance market in the US.
Medigap is a private health insurance designed to cover some of the gaps in coverage left by Medicare, a public health insurance program which provides coverage for all individuals aged 65 and over in the US. In general the structure of Medicare is such that it leaves beneficiaries at risk of large out-of-pocket expenses. As a result, the elderly may purchase voluntary supplemental private policies, such as Medigap, to fill Medicare's gaps in non-covered health care services and to limit cost sharing. The Medigap insurance market is highly regulated by Federal law, which designed a particular mechanism favoring the insured. In particular, insurance companies must offer a basic plan if they offer any other more generous plan; in addition, there is an enrolment period during which insurance companies cannot refuse any applicant, even in the presence of pre-existing conditions. Finally, federal regulation allows insurance companies to set premiums by the individual's age and gender.

To study the Medigap health insurance market we use data from the Health and Retirement Study conducted during the year [...]. Since we focus specifically on Medigap, we exclude individuals who are younger than 65, who received additional coverage through a former employer or spouse, or who are covered by some other government agency. We then consider only those who deliberately bought additional coverage. The final sample size is N = 3290 observations. Since the Medigap plans differ in how generous the coverage is, we define the Medigap insurance coverage indicator (Plan) to equal 0 if the individual has no coverage, 1 if she is covered by Medigap plan A or B, and 2 if she is covered by any other more generous Medigap plan. Risk occurrence is measured by the number of doctor visits and hospital admissions in the previous two years. Since only a very small number of elders had no doctor visits, we constructed the variable Doc, which takes the value 0 if the individual had fewer than five doctor visits, 1 if she had between five and ten, and 2 if she visited a doctor more than ten times. Since a very small number of elders had more than two hospital admissions, we defined the variable Hosp to equal 0 if the respondent had no hospital admission, 1 if she had one hospital admission and 2 if she had at least two hospital inpatient stays. Finally, given that Federal law allows insurance companies to set premiums according to the insured's age and gender, we use as controls only whether the individual is female (Fem) and 26 age dummy variables ranging from 65 to 90 years old or more.

4 We do not use the last few available waves because there is no specific information on the Medigap plan's letter.

4.1 Empirical strategy and results

Let i = 1, ..., N denote individuals, let $z_i$ denote the vector collecting the age and gender of individual i, and let $Plan_i$, $Doc_i$ and $Hosp_i$ denote the insurance coverage, doctor use and hospital use of individual i. We can rewrite model (6) as

$$
\begin{aligned}
Plan &= \alpha_p + z'\beta_p + \epsilon_p \\
Doc  &= \alpha_d + z'\beta_d + \epsilon_d \\
Hosp &= \alpha_h + z'\beta_h + \epsilon_h
\end{aligned}
\tag{10}
$$

where α and β are vectors of unknown parameters, and the errors $\epsilon_k$, k = p, d, h, have standard logistic marginal distributions but an unspecified association structure (copula). Within this structure, the null of absence of asymmetric information amounts to testing independence between $\epsilon_p$ and $\epsilon_d$ and between $\epsilon_p$ and $\epsilon_h$.

If we assume that $\epsilon_p$, $\epsilon_d$, $\epsilon_h$ in model (10) are jointly distributed as multivariate normal with standard marginal distributions, we obtain the multivariate ordered probit, which can be estimated by simulated maximum likelihood using, for example, the Stata CMP module (Roodman [17]). Estimated coefficients are reported in Table 2, and correlation terms are reported in Panel A of Table 1.
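The category definitions of Plan, Doc and Hosp given earlier in this section amount to a small recoding step, which might be sketched as below. The function names and the input conventions (a plan letter or `None`, and raw counts over the previous two years) are assumptions made for illustration; they are not the variable names of the Health and Retirement Study files.

```python
def code_plan(plan_letter):
    """0 = no Medigap coverage, 1 = plan A or B, 2 = any more generous plan.

    plan_letter is None for no coverage, otherwise the plan's letter
    (a hypothetical input convention).
    """
    if plan_letter is None:
        return 0
    return 1 if plan_letter.strip().upper() in ("A", "B") else 2


def code_doc(visits):
    """0 = fewer than five doctor visits, 1 = five to ten, 2 = more than ten."""
    if visits < 0:
        raise ValueError("number of visits cannot be negative")
    if visits < 5:
        return 0
    return 1 if visits <= 10 else 2


def code_hosp(admissions):
    """0 = no hospital admission, 1 = one admission, 2 = at least two."""
    if admissions < 0:
        raise ValueError("number of admissions cannot be negative")
    return min(admissions, 2)


# Hypothetical respondent: plan F, 7 doctor visits, 3 hospital admissions.
print(code_plan("F"), code_doc(7), code_hosp(3))
```

Each response then takes the values 0, 1 and 2, which is why two cut points per equation appear in the tables of estimates.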
The correlation between Plan and Doc strongly rejects the null of no asymmetric information; on the other hand, the null of no asymmetric information between Plan and Hosp has a p-value of [...]. As a comparison, Table 3 and Panel B of Table 1 report the estimated coefficients of our multivariate logit model with the Plackett restriction, which has the same complexity (the same number of parameters) as the multivariate probit, since it restricts each bivariate association to a single parameter. A glance at the tables reveals that the two models (probit and Plackett) give a very similar qualitative picture. Allowing for the different scale, the covariate parameters follow very similar patterns and, more importantly, the association coefficients (correlation coefficients for the probit and log-odds ratios for the logit) have very similar z-ratios and significance. Thus, the main advantage of our model is that it does not require simulation methods for estimation, and tends to be more accurate and faster. 5

5 On a 2.4 GHz P8600 processor, estimating the logit model took about 178 seconds, which is about half the time (358 seconds) taken by the CMP module with the default number of draws set by the program (115).

Both the multivariate probit and the multivariate logit with Plackett restrictions, however, suffer from the limitation of imposing a restrictive structure on the bivariate associations of interest. To relax the Plackett assumption, we allow $\lambda_{\{p,d\}}$, $\lambda_{\{p,h\}}$ and $\lambda_{\{d,h\}}$ to vary across the categories of Plan, Doc and Hosp. Since these variables have three categories each, this implies estimating 3 × 4 = 12 association parameters rather than three with the Plackett restriction. Estimated parameters for this model are reported in Table 4 and in Panel C of Table 1. A glance at Panel C of Table 1 reveals that, while the four association coefficients for the two health care variables Doc and Hosp are similar across categories, they differ in the case of the coverage/risk associations Plan-Doc and Plan-Hosp. A formal test of the Plackett assumption

$$\lambda_{\{p,d\}}(1,1) = \lambda_{\{p,d\}}(1,2) = \lambda_{\{p,d\}}(2,1) = \lambda_{\{p,d\}}(2,2)$$

and

$$\lambda_{\{p,h\}}(1,1) = \lambda_{\{p,h\}}(1,2) = \lambda_{\{p,h\}}(2,1) = \lambda_{\{p,h\}}(2,2)$$

has LR test statistic equal to [...] and is asymptotically distributed as a $\chi^2$ with 12 − 3 = 9 dof. Therefore the null is overwhelmingly rejected, with a p-value equal to [...].

6 For robustness we have also computed the LR test relaxing the equality assumptions separately. The LR test statistics for the null of equal λs for Plan-Doc and for Plan-Hosp are equal to [...] and 9.47, and both nulls are rejected, with p-values of [...] and [...] respectively. By contrast, the LR test statistic for the null of equal λs between Doc and Hosp is equal to [...], and this null is not rejected (p-value [...]).

Table 1: Estimated correlation terms

         Plan-Doc           Plan-Hosp          Doc-Hosp
         Coef.  S.E.        Coef.  S.E.        Coef.  S.E.
Panel A
ρ               (0.0271)           (0.0307)
Panel B
λ               (0.0783)           (0.0911)           (0.0725)
Panel C
λ(1, 1)         (0.0903)           (0.0936)           (0.0872)
λ(1, 2)         (0.0951)           (0.139)            (0.137)
λ(2, 1)         (0.106)            (0.107)            (0.0850)
λ(2, 2)         (0.110)            (0.164)            (0.116)

Notes: For all equations control variables are age and gender dummies. Omitted age category is 65 years old.

Estimated parameters reveal that the coverage/risk correlation is not homogeneous across coverage and risk categories. In particular, the association between coverage and risk, for both doctor visits and hospital stays, is significantly positive only for moderate levels of health care use. In other words, conditional on age and gender, the null of no asymmetric information cannot be rejected if actual risk is defined as heavy use of health resources. Our results show that allowing the association to vary across categories may provide a clearer picture of the effects of underlying individual heterogeneity. Recall, however, that finding residual coverage/risk correlation does not necessarily help to understand whether this is due to the structure of the contract (moral hazard) or rather to the existence of individual risk that is not priced by the insurer (adverse selection).

References

[1] Agresti, A. (2002). Categorical data analysis. Wiley-Interscience.
[2] Aitchison, J. and Silvey, S. D. (1958). Maximum-likelihood estimation of parameters subject to restraints. The Annals of Mathematical Statistics 29.
[3] Amemiya, T. (1985). Advanced econometrics. Harvard University Press.
[4] Bartolucci, F., Colombi, R. and Forcina, A. (2007). An extended class of marginal link functions for modelling contingency tables by equality and inequality constraints. Statistica Sinica 17: 691.
[5] Chiappori, P.-A. and Salanié, B. (2000). Testing for asymmetric information in insurance markets. The Journal of Political Economy 108.
[6] Cohen, A. and Spiegelman, P. (2010). Testing for adverse selection in insurance markets. Journal of Risk and Insurance 77.
[7] Colombi, R. and Forcina, A. (2001). Marginal regression models for the analysis of positive association of ordinal response variables. Biometrika 88.
[8] Einav, L., Finkelstein, A. and Levin, J. (2010). Beyond testing: Empirical models of insurance markets. Annual Review of Economics 2.
[9] Forcina, A. and Dardanoni, V. (2008). Regression models for multivariate ordered responses via the Plackett distribution. Journal of Multivariate Analysis 99.
[10] Hadar, J. and Russell, W. R. (1969). Rules for ordering uncertain prospects. The American Economic Review 59.
[11] Lehmann, E. L. (1966). Some concepts of dependence. The Annals of Mathematical Statistics 37.
[12] McCullagh, P. and Nelder, J. (1989). Generalized linear models (Monographs on statistics and applied probability 37). Chapman and Hall, London.
[13] Molenberghs, G. and Lesaffre, E. (1994). Marginal modeling of correlated ordinal data using a multivariate Plackett distribution. Journal of the American Statistical Association 89.
[14] Nelsen, R. (2006). An introduction to copulas. Springer Verlag.
[15] Plackett, R. L. (1965). A class of bivariate distributions. Journal of the American Statistical Association 60.
[16] Rao, C. (1973). Linear statistical inference and its applications. Wiley (New York).
[17] Roodman, D. (2011). Fitting fully observed recursive mixed-process models with cmp. Stata Journal 11.
[18] Süli, E. and Mayers, D. (2003). An introduction to numerical analysis. Cambridge University Press.
[19] Wooldridge, J. (2002). Econometric analysis of cross section and panel data. The MIT Press.

5 Appendix

Table 2: Estimated α and β parameters for the multivariate probit model

         Plan               Doc                Hosp
         Coef.  S.E.        Coef.  S.E.        Coef.  S.E.
/cut            (0.0972)           (0.0832)           (0.0958)
/cut            (0.0978)           (0.0843)           (0.0977)
aged            (0.141)            (0.115)            (0.134)
aged            (0.129)            (0.111)            (0.131)
aged            (0.130)            (0.112)            (0.131)
aged            (0.138)            (0.115)            (0.131)
aged            (0.134)            (0.113)            (0.134)
aged            (0.135)            (0.115)            (0.131)
aged            (0.137)            (0.117)            (0.137)
aged            (0.145)            (0.123)            (0.137)
aged            (0.142)            (0.120)            (0.141)
aged            (0.160)            (0.128)            (0.144)
aged            (0.159)            (0.131)            (0.147)
aged            (0.155)            (0.129)            (0.148)
aged            (0.149)            (0.127)            (0.143)
aged            (0.165)            (0.138)            (0.157)
aged            (0.172)            (0.132)            (0.146)
aged            (0.184)            (0.137)            (0.156)
aged            (0.178)            (0.151)            (0.165)
aged            (0.202)            (0.148)            (0.163)
aged            (0.196)            (0.154)            (0.166)
aged            (0.183)            (0.158)            (0.169)
aged            (0.197)            (0.166)            (0.186)
aged            (0.214)            (0.183)            (0.188)
aged            (0.226)            (0.190)            (0.213)
aged            (0.382)            (0.241)            (0.285)
aged            (0.156)            (0.123)            (0.134)
fem             (0.0492)           (0.0404)           (0.0455)

Table 3: Estimated α and β parameters for the Plackett model

         Plan               Doc                Hosp
         Coef.  S.E.        Coef.  S.E.        Coef.  S.E.
/cut            (0.0434)           (0.0354)           (0.0393)
/cut            (0.0505)           (0.0396)           (0.0571)
aged            (0.251)            (0.188)            (0.238)
aged            (0.220)            (0.181)            (0.230)
aged            (0.223)            (0.184)            (0.228)
aged            (0.241)            (0.188)            (0.227)
aged            (0.233)            (0.186)            (0.238)
aged            (0.230)            (0.189)            (0.227)
aged            (0.234)            (0.192)            (0.238)
aged            (0.250)            (0.200)            (0.233)
aged            (0.246)            (0.198)            (0.245)
aged            (0.284)            (0.212)            (0.247)
aged            (0.278)            (0.215)            (0.251)
aged            (0.272)            (0.212)            (0.257)
aged            (0.254)            (0.209)            (0.247)
aged            (0.287)            (0.226)            (0.272)
aged            (0.315)            (0.215)            (0.249)
aged            (0.343)            (0.226)            (0.268)
aged            (0.306)            (0.245)            (0.276)
aged            (0.381)            (0.240)            (0.273)
aged            (0.356)            (0.253)            (0.279)
aged            (0.313)            (0.259)            (0.284)
aged            (0.348)            (0.274)            (0.318)
aged            (0.370)            (0.297)            (0.313)
aged            (0.397)            (0.312)            (0.364)
aged            (0.773)            (0.385)            (0.526)
aged            (0.281)            (0.203)            (0.228)
fem             (0.0861)           (0.0659)           (0.0779)

Table 4: Estimated α and β parameters

         Plan               Doc                Hosp
         Coef.  S.E.        Coef.  S.E.        Coef.  S.E.
/cut            (0.0434)           (0.0355)           (0.0394)
/cut            (0.0505)           (0.0397)           (0.0572)
aged            (0.251)            (0.188)            (0.238)
aged            (0.219)            (0.181)            (0.231)
aged            (0.223)            (0.183)            (0.229)
aged            (0.240)            (0.188)            (0.228)
aged            (0.233)            (0.186)            (0.238)
aged            (0.229)            (0.189)            (0.228)
aged            (0.234)            (0.192)            (0.239)
aged            (0.248)            (0.200)            (0.233)
aged            (0.246)            (0.197)            (0.245)
aged            (0.286)            (0.211)            (0.247)
aged            (0.279)            (0.215)            (0.251)
aged            (0.272)            (0.212)            (0.257)
aged            (0.254)            (0.209)            (0.247)
aged            (0.288)            (0.226)            (0.271)
aged            (0.316)            (0.215)            (0.249)
aged            (0.345)            (0.226)            (0.268)
aged            (0.304)            (0.245)            (0.276)
aged            (0.378)            (0.239)            (0.273)
aged            (0.354)            (0.253)            (0.279)
aged            (0.311)            (0.258)            (0.284)
aged            (0.351)            (0.273)            (0.316)
aged            (0.371)            (0.296)            (0.312)
aged            (0.395)            (0.312)            (0.363)
aged            (0.750)            (0.384)            (0.522)
aged            (0.283)            (0.202)            (0.228)
fem             (0.0858)           (0.0659)           (0.0779)


Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information

Longitudinal Meta-analysis

Longitudinal Meta-analysis Quality & Quantity 38: 381 389, 2004. 2004 Kluwer Academic Publishers. Printed in the Netherlands. 381 Longitudinal Meta-analysis CORA J. M. MAAS, JOOP J. HOX and GERTY J. L. M. LENSVELT-MULDERS Department

More information

Chapter 1 Introduction. 1.1 Introduction

Chapter 1 Introduction. 1.1 Introduction Chapter 1 Introduction 1.1 Introduction 1 1.2 What Is a Monte Carlo Study? 2 1.2.1 Simulating the Rolling of Two Dice 2 1.3 Why Is Monte Carlo Simulation Often Necessary? 4 1.4 What Are Some Typical Situations

More information

A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS

A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS Eusebio GÓMEZ, Miguel A. GÓMEZ-VILLEGAS and J. Miguel MARÍN Abstract In this paper it is taken up a revision and characterization of the class of

More information

LOGIT AND PROBIT ANALYSIS

LOGIT AND PROBIT ANALYSIS LOGIT AND PROBIT ANALYSIS A.K. Vasisht I.A.S.R.I., Library Avenue, New Delhi 110 012 amitvasisht@iasri.res.in In dummy regression variable models, it is assumed implicitly that the dependent variable Y

More information

Note on growth and growth accounting

Note on growth and growth accounting CHAPTER 0 Note on growth and growth accounting 1. Growth and the growth rate In this section aspects of the mathematical concept of the rate of growth used in growth models and in the empirical analysis

More information

Introduction to Matrix Algebra

Introduction to Matrix Algebra Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra - 1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary

More information

Chapter 4: Statistical Hypothesis Testing

Chapter 4: Statistical Hypothesis Testing Chapter 4: Statistical Hypothesis Testing Christophe Hurlin November 20, 2015 Christophe Hurlin () Advanced Econometrics - Master ESA November 20, 2015 1 / 225 Section 1 Introduction Christophe Hurlin

More information

Introduction to Quantitative Methods

Introduction to Quantitative Methods Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

More information

Fitting Subject-specific Curves to Grouped Longitudinal Data

Fitting Subject-specific Curves to Grouped Longitudinal Data Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,

More information

1 Teaching notes on GMM 1.

1 Teaching notes on GMM 1. Bent E. Sørensen January 23, 2007 1 Teaching notes on GMM 1. Generalized Method of Moment (GMM) estimation is one of two developments in econometrics in the 80ies that revolutionized empirical work in

More information

Chapter 6: Multivariate Cointegration Analysis

Chapter 6: Multivariate Cointegration Analysis Chapter 6: Multivariate Cointegration Analysis 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie VI. Multivariate Cointegration

More information

Basics of Statistical Machine Learning

Basics of Statistical Machine Learning CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

More information

Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes

Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, Discrete Changes JunXuJ.ScottLong Indiana University August 22, 2005 The paper provides technical details on

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

Sensitivity Analysis 3.1 AN EXAMPLE FOR ANALYSIS

Sensitivity Analysis 3.1 AN EXAMPLE FOR ANALYSIS Sensitivity Analysis 3 We have already been introduced to sensitivity analysis in Chapter via the geometry of a simple example. We saw that the values of the decision variables and those of the slack and

More information

General Sampling Methods

General Sampling Methods General Sampling Methods Reference: Glasserman, 2.2 and 2.3 Claudio Pacati academic year 2016 17 1 Inverse Transform Method Assume U U(0, 1) and let F be the cumulative distribution function of a distribution

More information

CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS

CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Examples: Regression And Path Analysis CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Regression analysis with univariate or multivariate dependent variables is a standard procedure for modeling relationships

More information

Markov random fields and Gibbs measures

Markov random fields and Gibbs measures Chapter Markov random fields and Gibbs measures 1. Conditional independence Suppose X i is a random element of (X i, B i ), for i = 1, 2, 3, with all X i defined on the same probability space (.F, P).

More information

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

More information

How To Test Granger Causality Between Time Series

How To Test Granger Causality Between Time Series A general statistical framework for assessing Granger causality The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published

More information

State Space Time Series Analysis

State Space Time Series Analysis State Space Time Series Analysis p. 1 State Space Time Series Analysis Siem Jan Koopman http://staff.feweb.vu.nl/koopman Department of Econometrics VU University Amsterdam Tinbergen Institute 2011 State

More information

CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA

CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA Examples: Mixture Modeling With Longitudinal Data CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA Mixture modeling refers to modeling with categorical latent variables that represent subpopulations

More information

CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

More information

Ordinal Regression. Chapter

Ordinal Regression. Chapter Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

More information

NAG C Library Chapter Introduction. g08 Nonparametric Statistics

NAG C Library Chapter Introduction. g08 Nonparametric Statistics g08 Nonparametric Statistics Introduction g08 NAG C Library Chapter Introduction g08 Nonparametric Statistics Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Parametric and Nonparametric

More information

Linear Models for Continuous Data

Linear Models for Continuous Data Chapter 2 Linear Models for Continuous Data The starting point in our exploration of statistical models in social research will be the classical linear model. Stops along the way include multiple linear

More information

MATH 590: Meshfree Methods

MATH 590: Meshfree Methods MATH 590: Meshfree Methods Chapter 7: Conditionally Positive Definite Functions Greg Fasshauer Department of Applied Mathematics Illinois Institute of Technology Fall 2010 fasshauer@iit.edu MATH 590 Chapter

More information

Permutation Tests for Comparing Two Populations

Permutation Tests for Comparing Two Populations Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of

More information

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

More information

BayesX - Software for Bayesian Inference in Structured Additive Regression

BayesX - Software for Bayesian Inference in Structured Additive Regression BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich

More information