Multivariate Ordered Regression

February 28, 2012

Valentino Dardanoni, Antonio Forcina, Paolo Li Donni

Abstract: to be written.

JEL codes:

Keywords:

1 Introduction

Many interesting problems in economics involve the study of how an ordered response variable depends on a set of regressors. In many situations we may be concerned with the more general problem of modelling how the joint distribution of several closely related ordered response variables $(Y_1, \dots, Y_K)$ depends on a vector of covariates $z$. Problems of this kind arise, for example, when we consider different choices made by each subject at a given point in time, or repeated choices on the same item made by each subject at different points in time.¹

In the microeconometric literature, the current approach to modelling the joint conditional distribution of ordinal response variables relies on assuming the existence of $K$ latent variables which form a regression system

$$Y_k^* = z_k'\tau_k + \epsilon_k, \qquad k = 1, \dots, K,$$

where $z_k$ denotes the subset of regressors $z$ relevant to the $k$th equation; a set of threshold parameters transforms the continuous latent distribution into the actual discrete one. In addition, the joint distribution of the errors $H(\epsilon_1, \dots, \epsilon_K)$ may also be specified.

An alternative approach is to consider the conditional distribution of the ordered response variables as a multi-way table of joint probabilities, to arrange these into the vector $\pi(z)$ and to define a suitable multivariate transformation of $\pi(z)$, known as link function, which makes the dependence on covariates linear:

$$\pi(z) = g[\lambda(z)] = g(\alpha + Z\beta); \qquad (1)$$

here $Z$ is a matrix of known constants which depends on $z$, and the vector of parameters $\lambda(z)$ describes relevant aspects of $(Y_1, \dots, Y_K \mid z)$ which have, hopefully, an interpretation in terms of economic theory.
¹ If data involve observations taken at different points in time, the model described in this paper can be seen as a static panel model with free correlations across periods. With panel data, more specific approaches have been designed to take into account individual heterogeneity, lagged dependent variables and serial correlation.

Models of this kind have, essentially, two components:

- The regression model $\lambda(z) = \alpha + Z\beta$: this is the parametric component of the model.
- The link function $\pi(z) = g[\lambda(z)]$: this is the potentially non-parametric component of the model. It maps the linear function of covariates $\lambda(z)$ onto a discrete joint distribution.

Equation (1) defines a wide class of multivariate ordered regression models, whose elements are characterized by the specific link function $g$. Well known examples are the log-linear and the probit links. The log-linear link function (see e.g. Amemiya [3], Chapter 9, or Agresti [1], Chapter 4), in spite of its appealing simplicity, does not allow the univariate marginals to be modelled directly, cannot take into account the ordered nature of the response variables, and does not have a latent variable interpretation. On the other hand, the so-called multivariate ordered probit model, which is based on the probit link, does not suffer from any of these limitations. However, because of its simplicity, the probit link imposes some strong restrictions on the association structure of the response variables, namely (i) each bivariate marginal distribution has only one association parameter; and (ii) all log-linear interactions of order higher than two are constrained to zero. In addition, the process of fitting models based on the probit link may be computationally demanding when there are several response variables.

Within the class of models defined by equation (1), we are typically interested in link functions whose elements have, possibly, the following desirable features:

- they describe relevant aspects of $(Y_1, \dots, Y_K) \mid z$ which have a substantial economic interpretation;
- they take into account the ordered nature of the response variables;
- they make it possible to model the univariate marginals and the association structure directly;
- they have an interpretation in terms of the system of latent equations.
By exploiting recent advances in the theory of marginal modelling (Bartolucci, Colombi and Forcina [4]), we propose a link function having most of the desirable properties listed above. The available theory of marginal models can handle a link function where the association structure is unrestricted; however, this would make the exposition heavier with little practical advantage. Though the link function we discuss in this paper implies certain mild constraints on the association structure, it may be further simplified by testable restrictions. In this formulation, the table $\pi(z)$ which arrays the joint probabilities of the response variables $(Y_1, \dots, Y_K) \mid z$ is decomposed into two main sets of parameters of interest, namely the global logits, which model the marginal distributions of the ordered discrete variables, and the global log-odds ratios, which describe their association. As we show in Section 2.4 below, these parameters have a natural interpretation in terms of stochastic dominance, thus allowing a clear economic interpretation of the estimated parameters.

In Section 2 we derive the main properties of the marginal modelling approach and its relation with the latent variable model. Section 3 discusses the computational properties of maximum likelihood estimates, and the asymptotic distribution of the likelihood ratio under suitable equality and inequality constraints. Section 4 clarifies the use of these models with an application to testing for asymmetric information in the Medigap insurance market.

2 Ordered regression

To aid the understanding of our approach to the multivariate case, we start by briefly reviewing the well known univariate case with a single ordinal response variable $Y$, taking values in $\{1, \dots, m\}$. Let $q(z)$ be the $(m-1) \times 1$ vector denoting the survival function of $Y$, conditionally on $z$:

$$q(z) = \big( \Pr(Y > 1 \mid z), \dots, \Pr(Y > m-1 \mid z) \big)'$$

and notice that the term $\Pr(Y > 0 \mid z) = 1$ can be omitted. A generalized ordered regression model is an equation that relates $q(z)$ to $z$ through a vector-valued function $g : \mathbb{R}^{m-1} \to (0, 1)^{m-1}$, known as a link function (McCullagh and Nelder [12]), which is assumed to be invertible and twice differentiable and such that

$$q(z) = g(\alpha + Z\beta) \qquad (2)$$

where $Z$ is a matrix of known constants which depend on $z$. Since the link function is invertible, there is a well defined set of $m-1$ parameters which are linearly related to $Z$:

$$\lambda(z) = g^{-1}(q(z)) = \alpha + Z\beta. \qquad (3)$$

It is well known (see, for example, Wooldridge [19], p. 457) that, under a few additional assumptions, the ordered regression model (2) is equivalent to assuming the existence of a continuous latent variable $Y^*$ which follows a linear regression model

$$Y^* = \tau_0 + z'\tau + \epsilon \qquad (4)$$

where the error $\epsilon$ is independent of $z$ and has cumulative distribution function $P(\epsilon \le v) = G(v)$, and there is a vector of $m-1$ unknown parameters $\gamma$ (called thresholds), with $\gamma_1 < \dots < \gamma_{m-1}$, such that $Y \le j \Leftrightarrow Y^* \le \gamma_j$. Notice that, since we can add an arbitrary constant to $\tau_0$ and subtract the same constant from $\gamma$ without affecting $\alpha$, these two parameters cannot be identified simultaneously; in the sequel we assume that $\tau_0 = 0$ so that the regression model in (4) has no intercept.

When $\epsilon$ has a standard logistic distribution, the latent regression model (4) is equivalent to an ordered regression model where $\Pr(\epsilon \le v) = G(v) = e^v / (1 + e^v)$.
By inverting the survival function $\Pr(Y > j \mid z) = G(-\gamma_j + z'\beta)$, it follows that $\lambda(z)$ is a vector of so-called global logits which are linearly related to $z$:

$$\lambda_j(z) = \log\left[ \frac{\Pr(Y > j \mid z)}{\Pr(Y \le j \mid z)} \right] = (-\gamma_j) + z'\beta, \qquad j = 1, \dots, m-1, \qquad (5)$$

so that, in terms of the latent model (4) with $\tau_0 = 0$, the intercepts are $\alpha_j = -\gamma_j$ and $\beta = \tau$. Global logits can be seen as the natural generalization of the standard binary logits when the variable is ordered. In fact, global logits can be interpreted as binary logits computed after dichotomizing the response categories at each cut point into a low and a high level. The standard ordered logit model can be seen as a special case of the ordered regression model (2) with $G$ being the logit link. Different choices of the link function $G$ give rise to different assumptions on the distribution of the latent regression error $\epsilon$ and to a different definition of the parameter vector $\lambda(z)$. The best known alternative to the ordered logit model is of course the ordered probit model, where $G$ is the standard normal cdf.
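To make equation (5) concrete, the following is a minimal sketch in Python with NumPy. The function names `ordered_logit_probs` and `global_logits` and the numerical values are illustrative assumptions, not part of the paper. It computes the category probabilities implied by given thresholds and regression coefficients, and then recovers the global logits from those probabilities.

```python
import numpy as np

def ordered_logit_probs(gamma, beta, z):
    """Pr(Y = j | z), j = 1..m, under the ordered logit model (hypothetical helper)."""
    gamma = np.asarray(gamma, dtype=float)   # thresholds, assumed strictly increasing
    eta = float(np.dot(z, beta))             # linear index z'beta
    # Survival function Pr(Y > j | z) = G(-gamma_j + z'beta), G the logistic cdf
    surv = 1.0 / (1.0 + np.exp(gamma - eta))
    # Pad with Pr(Y > 0) = 1 and Pr(Y > m) = 0, then take successive differences
    full = np.concatenate(([1.0], surv, [0.0]))
    return full[:-1] - full[1:]

def global_logits(probs):
    """Global logits log[Pr(Y > j) / Pr(Y <= j)], j = 1..m-1."""
    cdf = np.cumsum(probs)[:-1]
    return np.log((1.0 - cdf) / cdf)

# Illustrative values only: m = 4 categories, two covariates
gamma = np.array([-1.0, 0.5, 2.0])
beta = np.array([0.8, -0.3])
z = np.array([1.0, 2.0])
p = ordered_logit_probs(gamma, beta, z)
lam = global_logits(p)                       # equals -gamma + z'beta, as in (5)
```

Every global logit shifts by the same amount $z'\beta$ when $z$ changes, which is the proportional-odds structure implied by (5).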

2.1 Multivariate ordered regression

The univariate latent regression model (4) may be extended to the multivariate case by assuming, in addition:

- the existence of $K$ latent continuous variables $Y_k^*$, $k = 1, \dots, K$, such that
  $$Y_k^* = z_k'\tau_k + \epsilon_k, \qquad k = 1, \dots, K, \qquad (6)$$
- the existence of a joint distribution for the errors $\epsilon_1, \dots, \epsilon_K$.

It then becomes a seemingly unrelated regression system formulated in terms of latent variables. Let $H$ denote the joint distribution of the errors, so that $H(a_1, \dots, a_K) = P(\epsilon_1 \le a_1, \dots, \epsilon_K \le a_K)$, and let $H_k$, $k = 1, \dots, K$, denote the corresponding univariate marginal distributions. The simplest modelling strategy would be to assume that the $K$ error components in the latent regression models are independent, so that the regression system (6) implies $K$ separate ordered regression models of the form

$$H_k^{-1}\big[ P(Y_k > j(k) \mid z) \big] = -\gamma_{j(k)} + z_k'\tau_k, \qquad k = 1, \dots, K, \quad j(k) = 1, \dots, m_k - 1,$$

which may be estimated as in the univariate case, and no additional theory is required. However, there are several reasons for modelling also the association structure of $(\epsilon_1, \dots, \epsilon_K)$. Apart from the fact that, by assuming that $(\epsilon_1, \dots, \epsilon_K)$ are independent, we are likely to misspecify the true model with a loss of efficiency, the main reason for estimating the whole system of latent equations is that the nature and the degree of association between the response variables (conditionally on the covariates) may be of substantive interest, as in the application considered in this paper.

2.2 Multivariate link functions as copulas

The probabilities which define the joint distribution of $Y_1, \dots, Y_K$ conditionally on $z$ can be displayed in a table with $t = \prod_{k=1}^{K} m_k$ cells. It is convenient to arrange these probabilities into the vector $\pi(z)$ in lexicographic order by letting variables $Y_k$ with a larger index $k$ run faster from 1 to $m_k$; if unrestricted, $\pi(z)$ belongs to the $t$-dimensional simplex $\Pi$ defined by $1_t'\pi(z) = 1$ and $\pi(z) \ge 0$.
As in the univariate case, we are interested in invertible and differentiable mappings which link individual response probabilities to a common vector of parameters

$$\pi(z) = g[\lambda(z)] = g(\alpha + Z\beta) = g(X\psi), \qquad (7)$$

where the elements of $Z$ are known functions of the vector of covariates $z$, $X = ( I \;\; Z )$, and $\psi = ( \alpha' \;\; \beta' )'$ collects the model parameters. In view of the fact that the marginal logits are directly related to the univariate latent regression models, it may be convenient to partition $\lambda(z)$ into two components. The first one, denoted by $\lambda^u(z)$, contains the $s = \sum_{k=1}^{K} (m_k - 1)$ global logits which determine the univariate marginal distributions, while the second one, denoted by $\lambda^a(z)$, contains a suitable subset of the remaining $t - 1 - s$ parameters which model the association between the $K$ response variables. Thus, it is convenient to rewrite the regression system (7) above as

$$\lambda(z) = \begin{pmatrix} \lambda^u(z) \\ \lambda^a(z) \end{pmatrix} = \begin{pmatrix} \alpha^u + Z^u\beta^u \\ \alpha^a + Z^a\beta^a \end{pmatrix} = X\psi, \qquad (8)$$

where $\alpha^u$, $\beta^u$ and $Z^u$ denote respectively the intercepts, the regression coefficients and the covariate matrix for the set of univariate logits, while $\alpha^a$, $\beta^a$ and $Z^a$ refer to those for the association parameters.

By modelling the univariate component directly, the association component of the link function outlined above defines a multivariate copula, a conceptual tool for modelling the association among the errors in the $K$ regression equations in a way which treats the response variables in a symmetric fashion. Sklar's Theorem implies that any continuous distribution $F$ may be determined by its marginal distributions $F_k$ and a copula

$$C_F(u_1, \dots, u_K) = F\big( F_1^{-1}(u_1), \dots, F_K^{-1}(u_K) \big), \qquad u_k \in [0, 1] \text{ for all } k;$$

thus a copula describes the association structure of $F$ irrespective of its univariate marginals. Nelsen [14] provides an excellent introduction to copulas and their properties. This tool is particularly appropriate for describing the association between ordinal variables because copulas are invariant to strictly monotone transformations of the random variables; the other basic property of copulas is that univariate marginals and association structure may be modelled separately and then combined. Therefore, the latent regression model (6) above can be described by the set of regression coefficients $\beta_k$, the threshold parameters $\gamma_k$, the univariate marginal distributions of the $\epsilon_k$ and the copula $C_H$, which in turn may also depend on covariates.

A well known family of parametric copulas is the Gaussian copula:

$$C_\rho(u_1, \dots, u_K) = \Psi_K\big( \Psi_1^{-1}(u_1), \dots, \Psi_1^{-1}(u_K); \rho \big)$$

where $\Psi_Q$ denotes the standard $Q$-variate normal distribution and $\rho$ is the $K(K-1)/2$ vector of all bivariate correlation coefficients.
When the Gaussian copula is combined with $K$ standard normal marginal distributions it gives the multivariate normal distribution which, when employed in the latent regression model (6), gives rise to the multivariate ordered probit model (sometimes also called the multiresponse ordered probit model). Though this model may look appealing, the set of discrete multivariate distributions which are compatible with this copula is very limited. The reason is that, once the cut points are determined in accordance with the univariate marginals, each discrete bivariate marginal distribution, say $(Y_k, Y_h)$, has $(m_k - 1)(m_h - 1)$ additional free cells which are not constrained by the univariate marginals. Under the Gaussian copula all of these probabilities must conform to a bivariate normal and are determined by a single additional parameter (the correlation coefficient). This implies a rather strong restriction unless, of course, the underlying response variables are binary.

2.3 The Multivariate Ordered Logit Model

The copula that we are going to describe in this section is determined by a suitable set of interaction parameters which, together with the global logits (of which they are the natural extension), constitute a parametrization of a relevant subset of $\Pi$ which is of more direct interest. The approach that can be derived from this formulation has two main advantages with respect to the Gaussian copula:

1. the parameterization is easily invertible without the need for numerical integration;
2. the dependence structure is more flexible because, in the unrestricted model, the number of parameters that determine each bivariate distribution equals the number of free cells in the corresponding frequency table.

However, when the association parameters are constrained to be equal, the complexity of our model is identical to that of the Gaussian copula.

Global interaction parameters

Dale (1986) was among the first to consider this parametrization in the bivariate case; for an extension to multivariate distributions see Molenberghs and Lesaffre [13]. Bartolucci, Colombi and Forcina [4] provided a general framework for parameterizing discrete distributions with different kinds of marginal interaction parameters; some of their results are used here. Loosely speaking, an interaction term is a parameter measuring the association among a set of variables, say $I \subseteq \{1, \dots, K\}$, which may be defined within a marginal distribution, say $M$, such that $I \subseteq M$. In the following we restrict attention to bivariate interactions defined within the corresponding bivariate distribution. The bivariate interactions, which are the key association parameters in our model, are called global log-odds ratios and, for any two response variables $Y_i$ and $Y_j$, dichotomized at the cut points $c_i$ and $c_j$ respectively, may be written as

$$\lambda_{\{i,j\}}(c_i, c_j \mid z) = \log\left( \frac{\Pr(Y_i > c_i, Y_j > c_j \mid z)\,\Pr(Y_i \le c_i, Y_j \le c_j \mid z)}{\Pr(Y_i \le c_i, Y_j > c_j \mid z)\,\Pr(Y_i > c_i, Y_j \le c_j \mid z)} \right).$$

The link function and its inverse

Recall that the multinomial distribution, being a member of the exponential family, may also be parameterized by a vector, say $\theta(z)$, of canonical parameters; these are log-linear parameters defined within the overall joint distribution (rather than the corresponding marginals). Though these parameters do not usually have a direct interpretation, they can be easily converted into probabilities and provide a useful tool for defining the link function and its inverse.
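As an illustration of the global log-odds ratios defined earlier in this subsection, the sketch below computes them for every pair of cut points of a bivariate table. It is a minimal example in Python with NumPy; the function name `global_log_odds_ratios` and the example tables are assumptions introduced here for illustration only.

```python
import numpy as np

def global_log_odds_ratios(P):
    """Global log-odds ratios of a joint table P[a, b] = Pr(Y_i = a+1, Y_j = b+1).

    Returns an (m_i - 1) x (m_j - 1) array, one entry per pair of cut points.
    Hypothetical helper; assumes all cell probabilities are strictly positive.
    """
    P = np.asarray(P, dtype=float)
    F = P.cumsum(axis=0).cumsum(axis=1)   # F[a, b] = Pr(Y_i <= a+1, Y_j <= b+1)
    Fi = F[:-1, -1][:, None]              # Pr(Y_i <= c_i), one per row cut point
    Fj = F[-1, :-1][None, :]              # Pr(Y_j <= c_j), one per column cut point
    low_low = F[:-1, :-1]                 # Pr(Y_i <= c_i, Y_j <= c_j)
    low_high = Fi - low_low               # Pr(Y_i <= c_i, Y_j >  c_j)
    high_low = Fj - low_low               # Pr(Y_i >  c_i, Y_j <= c_j)
    high_high = 1.0 - Fi - Fj + low_low   # Pr(Y_i >  c_i, Y_j >  c_j)
    return np.log(high_high * low_low / (low_high * high_low))

# Under independence every global log-odds ratio is zero
indep = np.outer([0.2, 0.3, 0.5], [0.4, 0.6])
lor_indep = global_log_odds_ratios(indep)

# A 2 x 2 table with positive association: the single ratio is log(16)
lor_2x2 = global_log_odds_ratios([[0.4, 0.1], [0.1, 0.4]])
```

Each entry is the ordinary log-odds ratio of the 2 x 2 table obtained by collapsing the categories at the corresponding pair of cut points.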
The link function that we propose is such that each univariate distribution $Y_i$ is determined by $m_i - 1$ global logits, each bivariate distribution $(Y_i, Y_j)$ is determined by $(m_i - 1)(m_j - 1)$ global interactions, and all log-linear interactions of order greater than two are constrained to 0. Let $\Pi_D$ denote the subset of the probability simplex satisfying the above restrictions and let $v = \sum_i (m_i - 1) + \sum_{i > j} (m_i - 1)(m_j - 1)$. Bartolucci, Colombi and Forcina [4] provide a simple algorithm for constructing a design matrix $G_D$ such that, when $\theta(z)$ varies in $\mathbb{R}^v$,

$$\pi(z) = \frac{\exp[G_D\theta(z)]}{1'\exp[G_D\theta(z)]}$$

varies in $\Pi_D$; they also provide ([4], Appendix) an algorithm for constructing a contrast matrix $C$ and a marginalization matrix $M$ such that

$$\lambda(z) = \begin{pmatrix} \lambda^u(z) \\ \lambda^a(z) \end{pmatrix} = C \log[M\pi(z)], \qquad \pi(z) \in \Pi_D. \qquad (9)$$

Let $\mathcal{L} = \{\lambda(z) : \lambda(z) = C \log[M\pi(z)] \text{ for some } \pi(z) \in \Pi_D\}$ denote the space of compatible marginal interaction parameters. We can now state the main result of this section:

Theorem 1 The mapping defined by (9) from $\Pi_D$ to $\mathcal{L}$ is invertible and differentiable and thus defines a proper link function $\pi(z) = g(X\psi)$.
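The invertibility asserted by Theorem 1 can be checked numerically in a small case. The sketch below, in Python with NumPy, handles only two response variables, where no higher-order interactions exist and $\Pi_D$ is the whole simplex. Several choices are assumptions made for illustration and are not taken from [4]: the helper names (`build_C_M`, `probs`, `link`, `invert_link`), the baseline-category design matrix `G`, and the step-halving safeguard added to the Newton iteration.

```python
import numpy as np

def build_C_M(r, c):
    """C and M with lambda = C log(M pi) for an r x c table (bivariate case only)."""
    i_idx, j_idx = np.indices((r, c))      # zero-based category index of each cell
    m_rows, contrasts = [], []

    def add(events, signs):
        pos = []
        for ev in events:                  # one row of M per event probability
            pos.append(len(m_rows))
            m_rows.append(ev.ravel().astype(float))
        contrasts.append((pos, signs))

    for a in range(r - 1):                 # global logits of the first variable
        add([i_idx > a, i_idx <= a], [1.0, -1.0])
    for b in range(c - 1):                 # global logits of the second variable
        add([j_idx > b, j_idx <= b], [1.0, -1.0])
    for a in range(r - 1):                 # global log-odds ratios
        for b in range(c - 1):
            hi1, hi2 = i_idx > a, j_idx > b
            add([hi1 & hi2, ~hi1 & ~hi2, ~hi1 & hi2, hi1 & ~hi2],
                [1.0, 1.0, -1.0, -1.0])
    M = np.array(m_rows)
    C = np.zeros((len(contrasts), M.shape[0]))
    for k, (pos, signs) in enumerate(contrasts):
        C[k, pos] = signs
    return C, M

def probs(theta, G):
    """pi = exp(G theta) / 1'exp(G theta), computed stably."""
    eta = G @ theta
    w = np.exp(eta - eta.max())
    return w / w.sum()

def link(pi, C, M):
    return C @ np.log(M @ pi)

def invert_link(lam, C, M, G, tol=1e-10, max_iter=100):
    """Newton iteration from theta = 0; the step halving is an added safeguard."""
    theta = np.zeros(G.shape[1])
    for _ in range(max_iter):
        pi = probs(theta, G)
        resid = lam - link(pi, C, M)
        norm = np.linalg.norm(resid)
        if norm < tol:
            break
        # Jacobian d lambda / d theta' = C diag(M pi)^{-1} M diag(pi) G
        J = C @ np.diag(1.0 / (M @ pi)) @ M @ np.diag(pi) @ G
        step = np.linalg.solve(J, resid)
        scale = 1.0
        while scale > 1e-8:                # halve the step until the residual shrinks
            trial = theta + scale * step
            if np.linalg.norm(lam - link(probs(trial, G), C, M)) < norm:
                break
            scale /= 2.0
        theta = theta + scale * step
    return probs(theta, G)

r, c = 3, 3
C, M = build_C_M(r, c)
# Assumed canonical design: theta_k = log(pi_{k+1} / pi_1), first cell as baseline
G = np.vstack([np.zeros((1, r * c - 1)), np.eye(r * c - 1)])
pi_true = np.array([0.20, 0.10, 0.05, 0.10, 0.15, 0.10, 0.05, 0.10, 0.15])
pi_hat = invert_link(link(pi_true, C, M), C, M, G)   # should reproduce pi_true
```

The cells are stored in lexicographic order with the second variable running fastest, matching the arrangement of $\pi(z)$ described in Section 2.2. With more than two responses the design matrix must also exclude interactions of order higher than two, which this sketch does not attempt.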

The theorem is a special case of Theorem 1 in Bartolucci, Colombi and Forcina [4], who consider a more general class of hierarchical parametrizations with $I \subseteq M$. Unfortunately (9) has no analytical inverse; however, a numerical inverse may be computed by a Newton algorithm which is extremely fast and reliable.

Lemma 1 The mapping from $\lambda(z)$ to $\theta(z) \in \mathbb{R}^v$, or equivalently $\pi(z) \in \Pi_D$, may be computed by the following algorithm:

1. at the initial step choose a value $\theta^{(0)}$ such that $\lambda^{(0)}$ is sufficiently close to $\lambda(z)$;
2. at the $h$-th step update the vector of canonical parameters by the first order approximation
   $$\theta^{(h)} = \theta^{(h-1)} + D[\lambda(z) - \lambda^{(h-1)}], \qquad \text{where } D = \frac{\partial\theta}{\partial\lambda'} = \big[ C\,\mathrm{diag}(M\pi)^{-1} M\,\mathrm{diag}(\pi)\,G_D \big]^{-1};$$
3. iterate until the norm of $\lambda(z) - \lambda^{(h-1)}$ is close to 0.

The Lemma is a direct application of the Newton algorithm. Since the mapping from $\theta(z)$ to $\lambda(z)$ has continuous second-order derivatives with respect to $\theta(z)$ whose elements are finite, the result may be derived, for example, from Theorem 4.4 in Süli and Mayers [18].²

² In our experience, by setting $\theta^{(0)} = 0_v$, the algorithm always converges as long as $\lambda(z)$ is not too close to the boundary of the parameter space; this may happen, for example, when one or more elements are much larger than 20 in modulus.

Forcina and Dardanoni [9] study in detail the nature of the copula defined by the multivariate ordered logit link function and show that there exists a continuous multivariate latent distribution which has exactly the same parameters as its discrete analog.

2.4 Interpretation of the parameters in terms of stochastic dominance

The main parameters of interest in our model are the univariate global logits and the bivariate global log-odds ratios. To understand their economic significance, in this section we explore their properties in terms of stochastic dominance.

Global logits

It is interesting to note that the global logits satisfy a stochastic ordering property which seems particularly appropriate when the response variables have an ordinal nature, in the sense that their relevant properties are preserved under arbitrary monotonic transformations:

Lemma 2 Given two discrete ordered random variables $Y_h$ and $Y_k$ in $\{1, \dots, m\}$, the following conditions are equivalent:

1. $E[u(Y_h) \mid z] \ge E[u(Y_k) \mid z]$ for any function $u$ which is non-decreasing;
2. $Y_h$ is stochastically greater than $Y_k$ conditionally on $z$;
3. $\log[P(Y_h > j \mid z) / P(Y_h \le j \mid z)] \ge \log[P(Y_k > j \mid z) / P(Y_k \le j \mid z)]$ for all $j < m$.

Proof The equivalence between the first two conditions is well known (see for example Hadar and Russell [10], Theorems 1 and 2). The equivalence with the third condition follows by noting that global logits are strictly increasing transformations of the survival function.

If we regress a global logit on a given covariate and the regression coefficient is positive, then the response variable becomes stochastically larger whenever that covariate increases; because of this, regression coefficients in the ordered logit regression have a direct interpretation in terms of stochastic dominance.³

Global log-odds ratios

The log-odds ratios, which determine the association for any pair of responses, are also closely related to the notion of positive quadrant dependence (PQD), an instance of positive dependence between ordinal variables first introduced by Lehmann [11]: two random variables $Y_h$, $Y_k$ taking values in $\{1, \dots, m_h\}$ and $\{1, \dots, m_k\}$ are PQD if

$$\Pr(Y_h \le i, Y_k \le j) \ge \Pr(Y_h \le i)\,\Pr(Y_k \le j), \qquad i \in \{1, \dots, m_h\}, \; j \in \{1, \dots, m_k\},$$

which intuitively means that, compared to the case of independence, small values of $Y_h$ tend to go with small values of $Y_k$. Negative quadrant dependence is defined by reversing the inequality above. The ordinal nature of $Y_h$ and $Y_k$ seems to motivate the requirement that their relevant properties should be preserved under arbitrary monotonic transformations. The following result in the theory of stochastic orderings links the notion of positive dependence, PQD, to the log-odds ratios:

Lemma 3 Given a pair of discrete ordered random variables $Y_h$ and $Y_k$ taking values in $\{1, \dots, m_h\}$ and $\{1, \dots, m_k\}$, and any pair of increasing functions $u$, $v$, the following conditions are equivalent:

1. $\mathrm{Cov}[u(Y_h), v(Y_k) \mid z] \ge 0$;
2. $Y_h$ and $Y_k$ are PQD conditionally on $z$;
3. the set of log-odds ratios $\lambda_{\{h,k\}}(c_i, c_j \mid z) \ge 0$ for all $c_i < m_h$, $c_j < m_k$.
Proof See Nelsen [14], exercises 5.22 and 5.27, p. [...].

This result may be interpreted as saying that, if the ordered variables are the discrete version of continuous latent variables discretized at arbitrary thresholds, the log-odds ratios are the most appropriate measure of association, in the sense that the sign of the dependence between the underlying variables is preserved, irrespective of how the ordered categories are constructed.

³ Notice instead that the standard interpretation of ordered logit coefficients (see for example Crawford, Pollak and Vella, 1998), which refers to the density rather than the cumulative distribution of the response variable, often implies a rather convoluted interpretation.

3 Statistical inference

3.1 Hypotheses of interest

A convenient feature of the multivariate ordered regression model defined by equation (8) is that all the relevant hypotheses of interest can be expressed in the form of linear equality and inequality constraints on the vector of model parameters $\psi = ( \alpha' \;\; \beta' )'$. A relevant set of testable restrictions consists in assuming that the bivariate association parameters do not depend on the cut points, an assumption which is the multivariate analog of the Plackett distribution. The family of bivariate Plackett distributions, introduced by Plackett [15], has been extended to the multivariate case by Molenberghs and Lesaffre [13]. Forcina and Dardanoni [9] discuss the multivariate ordered regression model under the Plackett distribution; that model is a close analog to the multivariate ordered probit model, with the correlation coefficients replaced by the corresponding bivariate log-odds ratios. The advantage of our modelling strategy is that these assumptions are imposed by means of testable restrictions.

Linear inequality constraints may be used to test a stochastic dominance effect of certain covariates on a set of latent regressions. We could also be interested in testing positive dependence between a pair of responses against conditional independence, by imposing that all the $(m_h - 1)(m_k - 1)$ log-odds ratios are positive against being zero. Generally speaking, any set of hypotheses of interest may be expressed in the form

$$H : \{\psi : E\psi = 0, \; U\psi \ge 0\}$$

by an appropriate choice of the equality and inequality matrices $E$ and $U$. Clearly, the cases with $E$ or $U$ equal to the null matrix correspond to restrictions with only inequalities or only equalities respectively.

3.2 Likelihood inference

Suppose we have independent observations $(Y_{1i}, \dots, Y_{Ki}, z_i)$ for a sample of $n$ units. Let $t(z_i)$ be a vector of size $\prod_k m_k$ made of 0's except for the element corresponding to the observed combination of $(Y_1, \dots, Y_K)$ for the $i$th unit, which is equal to 1. To simplify notation, in the sequel we write $t(i)$ instead of $t(z_i)$; a similar convention will be adopted for any vector which depends on $z_i$.
Under independent sampling, conditionally on $z_i$, $t(i)$ has a multinomial distribution with vector of probabilities $\pi(i)$. In order to manipulate the likelihood function more easily, we write the multinomial as an exponential family with the vector of canonical parameters introduced in Section 2.3; in practice, these are all the log-linear interactions which belong to the hierarchical set of marginals $D$, so that $\lambda(i)$ has the same dimension as $\theta(i)$. The contribution of the $i$th unit to the log-likelihood is $L(i) = t(i)' \log[\pi(i)]$ which, by expressing $\pi(i)$ in terms of the canonical parameters, may be written as

$$L(i) = t(i)' G_D \theta(i) - \log[1' \exp(G_D \theta(i))].$$

If we define

$$D(i) = \frac{\partial\theta(i)}{\partial\lambda(i)'} = \left[ \frac{\partial\lambda(i)}{\partial\theta(i)'} \right]^{-1} = \big[ C\,\mathrm{diag}[M\pi(i)]^{-1} M\,\mathrm{diag}[\pi(i)]\,G_D \big]^{-1},$$

the individual score vector is easily computed by differentiating $L(i)$ with respect to the vector of model parameters $\psi$ by the chain rule:

$$s(i) = \frac{\partial L(i)}{\partial\psi} = \frac{\partial\lambda(i)'}{\partial\psi}\,\frac{\partial\theta(i)'}{\partial\lambda(i)}\,\frac{\partial L(i)}{\partial\theta(i)} = X(i)' D(i)' G_D' [t(i) - \pi(i)].$$
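The last factor of the chain rule, $\partial L(i)/\partial\theta(i) = G_D'[t(i) - \pi(i)]$, can be verified numerically. The sketch below, in Python with NumPy, evaluates the individual log-likelihood and its gradient with respect to the canonical parameters. The function names and the small four-cell design matrix are assumptions chosen for illustration; `score_psi` simply applies the formula for $s(i)$ to user-supplied `X` and `D`.

```python
import numpy as np

def loglik_and_score_theta(t_vec, G, theta):
    """L(i) = t'G theta - log[1'exp(G theta)] and its gradient G'(t - pi)."""
    eta = G @ theta
    shift = eta.max()                                   # stabilizes the log-sum-exp
    log_norm = shift + np.log(np.exp(eta - shift).sum())
    pi = np.exp(eta - log_norm)                         # cell probabilities
    loglik = t_vec @ eta - log_norm
    score_theta = G.T @ (t_vec - pi)
    return loglik, score_theta, pi

def score_psi(X, D, G, t_vec, pi):
    """Individual score s(i) = X' D' G' (t - pi); X and D supplied by the caller."""
    return X.T @ D.T @ G.T @ (t_vec - pi)

# Illustrative setup (assumed): 4 cells, first cell as baseline
G = np.vstack([np.zeros((1, 3)), np.eye(3)])
theta = np.array([0.3, -0.2, 0.5])
t_vec = np.array([0.0, 0.0, 1.0, 0.0])                 # unit observed in cell 3
loglik, score_theta, pi = loglik_and_score_theta(t_vec, G, theta)

# Central finite differences as an independent check of the analytic gradient
eps = 1e-6
fd = np.array([
    (loglik_and_score_theta(t_vec, G, theta + eps * e)[0]
     - loglik_and_score_theta(t_vec, G, theta - eps * e)[0]) / (2 * eps)
    for e in np.eye(3)
])
```

Since $t(i)$ is an indicator vector, the log-likelihood contribution is just the log of the probability of the observed cell.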

The individual contribution to the expected information matrix also has a simple form because $E\{G_D'[t(i) - \pi(i)]\} = 0$:

$$F(i) = -E\left[ \frac{\partial^2 L(i)}{\partial\psi\,\partial\psi'} \right] = X(i)' D(i)' G_D'\,\Omega(i)\,G_D D(i) X(i)$$

where $\Omega(i) = \mathrm{diag}[\pi(i)] - \pi(i)\pi(i)'$ is the kernel of the variance matrix of the multinomial distribution. Having assumed that the units are independent, the log-likelihood is $L(\psi) = \sum_i L(i)$; thus the score vector $s(\psi) = \partial L(\psi)/\partial\psi$ and the expected information matrix $F(\psi) = -E\big( \partial^2 L(\psi) / \partial\psi\,\partial\psi' \big)$ can be obtained by summing over units. Dardanoni, Fiorini and Forcina (2012), for the case of two ordinal responses, discuss an approach to likelihood inference similar to the one proposed here.

3.3 Parameter estimation

Maximum likelihood estimates of $\psi$ under $H$ can be obtained by an algorithm which extends to inequality constraints the seminal algorithm introduced by Aitchison and Silvey [2]. The Aitchison and Silvey algorithm (see for instance Colombi and Forcina [7]) is based on iterated linear approximations of the regression model onto the space of the canonical parameters, which are variation independent; the approximation is updated until convergence. Formally:

- assign a starting value $\psi^{(0)}$ which produces compatible $\lambda(i)$ for all units;
- at the $h$th step, compute a linear approximation of $\theta(i)$ and a quadratic approximation of the log-likelihood
  $$\theta(i)^{h} = \theta(i)^{h-1} + D(i)^{h-1} [X(i)\psi^{h} - \lambda(i)^{h-1}],$$
  $$Q_h(\psi) = -(\psi - b^{h})' F^{h-1} (\psi - b^{h})/2,$$
  where $b^{h} = [F^{h-1}]^{-1} [s^{h-1} + s_1^{h-1}]$ and $s_1^{h-1} = \sum_i X(i)' D(i)^{h-1\prime} G_D'\,\Omega(i)^{h-1}\,G_D D(i)^{h-1} \lambda(i)^{h-1}$;
- set $\psi^{h}$ equal to the constrained maximum of $Q_h(\psi)$ under $H$;
- iterate until convergence, that is, until the estimate of $\lambda$ is sufficiently stable and the linear approximation of $\theta$ is sufficiently accurate.

The starting point must be chosen so that the corresponding $\lambda(i)^{0}$ is compatible for all subjects. This may be achieved by setting to zero the intercepts of the association parameters and all the regression coefficients corresponding to the covariates.
Notice that, when inequality constraints are present, so that $U$ is not the null matrix, the maximization of $Q_h(\psi)$ at each step requires a quadratic optimization which is itself iterative; there are many algorithms for quadratic optimization under inequality constraints, which are usually very fast and reliable.

Since the likelihood function and the transformation from $\theta(i)$ to $\lambda(i)$ satisfy the conditions discussed by Aitchison and Silvey ([2], p. 817), it follows that, as $n$ increases, the probability that a constrained maximum exists tends to one. If the algorithm converges, it must converge to a local maximum by the argument of Aitchison and Silvey ([2], p. 826). Notice that our parameterization satisfies the two basic assumptions given by Rao ([16], p. 296), namely identifiability and continuity of the transformation from $\psi$ to $\pi$. It follows that, provided that $\psi_0$, the true value of $\psi$ under $H$, is an interior point of the parameter space, the m.l.e. of $\psi$ under $H$ exists, is consistent and has an asymptotic normal distribution.

4 An application to the Positive Correlation Test of asymmetric information in insurance markets

There is a constantly growing body of empirical literature studying the existence of asymmetric information in insurance markets (for a review see Cohen and Siegelman [6] and Einav et al. [8]). Standard economic theory predicts that risk occurrence and insurance coverage are positively correlated, since individuals who know they are riskier tend to buy more coverage (adverse selection) or to consume more for a given structure of the contract (moral hazard). The theoretically predicted positive correlation has inspired the seminal Positive Correlation (PC) test by Chiappori and Salanié [5]. The PC test rejects the null of absence of asymmetric information in a given insurance market when, conditional on the consumer characteristics used by insurance companies to price contracts, individuals with more coverage experience more of the insured risk.

In their seminal paper Chiappori and Salanié [5] provide simple empirical strategies to test the positive correlation hypothesis when both insurance coverage and risk occurrence are binary variables. The PC test has been applied to hundreds of various insurance markets, including acute health, long-term care, automobile, annuities, life, reverse mortgages and crop. In most applications, its implementation relies on a simple bivariate probit model, where the null of absence of private information is tested by the absence of correlation between the residual errors. In this application, we explain how our approach can be used to empirically implement the PC test when insurance coverage and risk are ordered categorical variables. We focus on the Medigap health insurance market in the US.
Medigap is private health insurance designed to cover some of the gaps in the coverage left by Medicare, the public health insurance program that covers all individuals aged 65 and over in the US. In general, the structure of Medicare leaves beneficiaries at risk of large out-of-pocket expenses. As a result, the elderly may purchase voluntary supplemental private policies, such as Medigap, to fill Medicare's gaps in non-covered health care services and to limit cost sharing. The Medigap insurance market is highly regulated by Federal law, which has designed a particular mechanism favoring the insured. In particular, insurance companies must offer a basic plan if they offer any other, more generous plan; in addition, there is an enrolment period during which insurance companies cannot refuse any applicant, even one with pre-existing conditions. Finally, federal regulation allows insurance companies to set premiums according to the individual's age and gender.

To study the Medigap health insurance market we use data from the Health and Retirement Study conducted during the year Since we focus specifically on Medigap, we exclude individuals who are younger than 65, who received additional coverage through a former employer or a spouse, or who are covered by some other government agency. We then consider only those who deliberately bought additional coverage. The final sample consists of N = 3290 observations.

4 We do not use the last few available waves because they contain no specific information on the Medigap plan's letter.

Since Medigap plans differ in how generous their coverage is, we define the Medigap insurance coverage indicator Plan, which equals 0 if the individual has no coverage, 1 if she is covered by Medigap plan A or B, and 2 if she is covered by any other, more generous Medigap plan. Risk occurrence is measured by the number of doctor visits and hospital admissions in the previous two years. Since only a very small number of elders had no doctor visits, we constructed the variable Doc, which equals 0 if the individual had fewer than five doctor visits, 1 if she had between five and ten, and 2 if she visited a doctor more than ten times. Since a very small number of elders had more than two hospital admissions, we defined the variable Hosp, which equals 0 if the respondent had no hospital admission, 1 if she had one hospital admission, and 2 if she had at least two hospital inpatient stays. Finally, given that Federal law allows insurance companies to set premiums according to the insured's age and gender, we use as controls only whether the individual is female (Fem) and 26 age dummy variables ranging from 65 to 90 years old or more.

4.1 Empirical strategy and results

Let $i = 1, \dots, N$ index individuals, let $z_i$ denote the vector collecting the age and gender of individual $i$, and let $Plan_i$, $Doc_i$ and $Hosp_i$ denote the insurance coverage, doctor use and hospital use of individual $i$. We can rewrite model (6) as

$$
\begin{aligned}
Plan &= \alpha_p + z\beta_p + \epsilon_p \\
Doc  &= \alpha_d + z\beta_d + \epsilon_d \\
Hosp &= \alpha_h + z\beta_h + \epsilon_h
\end{aligned}
\tag{10}
$$

where $\alpha$ and $\beta$ are vectors of unknown parameters, and the errors $\epsilon_k$, $k = p, d, h$, have standard logistic marginal distributions but an unspecified association structure (copula). Within this structure, the null of absence of asymmetric information amounts to testing the independence of $(\epsilon_p, \epsilon_d)$ and of $(\epsilon_p, \epsilon_h)$.

If we assume that $\epsilon_p, \epsilon_d, \epsilon_h$ in model (10) are jointly distributed as multivariate normal with standard marginal distributions, we obtain the multivariate ordered probit, which can be estimated by simulated Maximum Likelihood using, for example, the Stata CMP module (Roodman [17]). Estimated coefficients are reported in Table 2, and correlation terms are reported in Panel A of Table 1.
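The three-level coding of Plan, Doc and Hosp described earlier in this section can be sketched in a few lines of Python. This is only an illustration of the coding rules stated in the text; the function names and the raw inputs (`plan_letter`, `visits`, `admissions`, `age`) are hypothetical and do not correspond to actual Health and Retirement Study variable names.

```python
from typing import Optional


def code_plan(plan_letter: Optional[str]) -> int:
    """Plan: 0 = no Medigap coverage, 1 = plan A or B, 2 = any other plan.

    `plan_letter` is a hypothetical input: None means no coverage.
    """
    if plan_letter is None:
        return 0
    return 1 if plan_letter.strip().upper() in {"A", "B"} else 2


def code_doc(visits: int) -> int:
    """Doc: 0 = fewer than five visits, 1 = five to ten, 2 = more than ten."""
    if visits < 0:
        raise ValueError("number of visits cannot be negative")
    if visits < 5:
        return 0
    return 1 if visits <= 10 else 2


def code_hosp(admissions: int) -> int:
    """Hosp: 0 = no admission, 1 = one admission, 2 = two or more."""
    if admissions < 0:
        raise ValueError("number of admissions cannot be negative")
    return min(admissions, 2)


def age_category(age: int) -> int:
    """Age category used for the dummies: 65, 66, ..., 89, and 90 for 90+."""
    if age < 65:
        raise ValueError("the sample excludes individuals younger than 65")
    return min(age, 90)


# One hypothetical respondent: plan F, 7 doctor visits, 3 admissions, aged 93.
example = (code_plan("F"), code_doc(7), code_hosp(3), age_category(93))
```

Top-coding age at 90 yields the 26 age categories mentioned in the text, of which one (age 65, according to the notes to Table 1) is omitted as the reference category.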
The correlation between Plan and Doc strongly rejects the null of no asymmetric information; on the other hand, the null of no asymmetric information between Plan and Hosp has a p-value of As a comparison, Table 3 and Panel B of Table 1 report the estimated coefficients of our multivariate logit model with the Plackett restriction, which has the same complexity (the same number of parameters) as the multivariate probit, since it restricts each bivariate association to a single parameter. A glance at the tables reveals that the two models (probit and Plackett) give a very similar qualitative picture. Allowing for the different scale, the covariate parameters follow very similar patterns and, more importantly, the association coefficients (correlation coefficients for the probit and log-odds ratios for the logit) have very similar z-ratios and significance. Thus, the main advantage of our model is that it does not require simulation methods for estimation, and tends to be more accurate and faster. 5

Both the multivariate probit and the multivariate logit with Plackett restrictions, however, suffer from the limitation of imposing a restrictive structure on the bivariate associations of interest. To relax the Plackett assumption, we allow $\lambda_{\{p,d\}}$, $\lambda_{\{p,h\}}$ and $\lambda_{\{d,h\}}$ to vary across the categories of Plan, Doc and Hosp. Since each of these variables has three categories, this implies estimating $3 \times 4 = 12$ association parameters rather than the three of the Plackett restriction. Estimated parameters for this model are reported in Table 4 and in Panel C of Table 1.

5 On a 2.4 GHz P8600 processor, estimating the logit model took about 178 seconds, which is about half the time (358 seconds) taken by the CMP module with the default number of draws set by the program (115).

A glance at Panel C of Table 1 reveals that, while the four association coefficients for the two health care variables Doc and Hosp are similar across categories, they differ in the case of the coverage/risk associations Plan-Doc and Plan-Hosp. A formal test of the Plackett assumption

$$
\lambda_{\{p,d\}}(1,1) = \lambda_{\{p,d\}}(1,2) = \lambda_{\{p,d\}}(2,1) = \lambda_{\{p,d\}}(2,2)
$$

and

$$
\lambda_{\{p,h\}}(1,1) = \lambda_{\{p,h\}}(1,2) = \lambda_{\{p,h\}}(2,1) = \lambda_{\{p,h\}}(2,2)
$$

has LR test statistic equal to and is asymptotically distributed as a $\chi^2$ with $12 - 3 = 9$ dof. Therefore the null is overwhelmingly rejected with a p-value equal to

6 For robustness we have also computed the LR test relaxing the equality assumptions separately. The LR test statistics for the null of equal λs for Plan-Doc and Plan-Hosp are equal to and 9.47, and both nulls are rejected, with p-values of and respectively. By contrast, the LR test statistic for the null of equal λs for Doc-Hosp is equal to , and the null is not rejected, with a p-value

Table 1: Estimated correlation terms

                    Plan-Doc            Plan-Hosp           Doc-Hosp
                    Coef.   S.E.        Coef.   S.E.        Coef.   S.E.
Panel A
  ρ                         (0.0271)            (0.0307)
Panel B
  λ                         (0.0783)            (0.0911)            (0.0725)
Panel C
  λ(1, 1)                   (0.0903)            (0.0936)            (0.0872)
  λ(1, 2)                   (0.0951)            (0.139)             (0.137)
  λ(2, 1)                   (0.106)             (0.107)             (0.0850)
  λ(2, 2)                   (0.110)             (0.164)             (0.116)

Notes: For all equations the control variables are age and gender dummies. The omitted age category is 65 years old.

The estimated parameters reveal that the coverage/risk correlation is not homogeneous across coverage and risk categories. In particular, the association between coverage and risk, for both doctor visits and hospital stays, is significantly positive only for moderate levels of health care use. In other words, conditional on age and gender, the null of no asymmetric information cannot be rejected if actual risk is defined as heavy use of health resources. Our results show that allowing the association to vary across categories may provide a clearer picture of the effects of underlying individual heterogeneity. Recall, however, that finding a residual coverage/risk correlation does not necessarily help to understand whether it is due to the structure of the contract (moral hazard) or rather to the existence of individual risk that is not priced by the insurer (adverse selection).

References

[1] Agresti, A. (2002). Categorical data analysis. Wiley-Interscience.
[2] Aitchison, J. and Silvey, S. D. (1958). Maximum-likelihood estimation of parameters subject to restraints. The Annals of Mathematical Statistics 29.
[3] Amemiya, T. (1985). Advanced econometrics. Harvard University Press.
[4] Bartolucci, F., Colombi, R. and Forcina, A. (2007). An extended class of marginal link functions for modelling contingency tables by equality and inequality constraints. Statistica Sinica 17: 691.
[5] Chiappori, P.-A. and Salanié, B. (2000). Testing for asymmetric information in insurance markets. The Journal of Political Economy 108.
[6] Cohen, A. and Siegelman, P. (2010). Testing for adverse selection in insurance markets. Journal of Risk and Insurance 77.
[7] Colombi, R. and Forcina, A. (2001). Marginal regression models for the analysis of positive association of ordinal response variables. Biometrika 88.
[8] Einav, L., Finkelstein, A. and Levin, J. (2010). Beyond testing: Empirical models of insurance markets. Annual Review of Economics 2.
[9] Forcina, A. and Dardanoni, V. (2008). Regression models for multivariate ordered responses via the Plackett distribution. Journal of Multivariate Analysis 99.
[10] Hadar, J. and Russell, W. R. (1969). Rules for ordering uncertain prospects. The American Economic Review 59.
[11] Lehmann, E. L. (1966). Some concepts of dependence. The Annals of Mathematical Statistics 37.
[12] McCullagh, P. and Nelder, J. (1989). Generalized linear models (Monographs on statistics and applied probability 37). Chapman Hall, London.
[13] Molenberghs, G. and Lesaffre, E. (1994). Marginal modeling of correlated ordinal data using a multivariate Plackett distribution. Journal of the American Statistical Association 89.
[14] Nelsen, R. (2006). An introduction to copulas. Springer Verlag.
[15] Plackett, R. L. (1965). A class of bivariate distributions. Journal of the American Statistical Association 60.
[16] Rao, C. (1973). Linear statistical inference and its applications. Wiley (New York).
[17] Roodman, D. (2011). Fitting fully observed recursive mixed-process models with cmp. Stata Journal 11: (48).
[18] Süli, E. and Mayers, D. (2003). An introduction to numerical analysis. Cambridge University Press.
[19] Wooldridge, J. (2002). Econometric analysis of cross section and panel data. The MIT Press.

5 Appendix
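As a supplementary note to the likelihood-ratio tests of Section 4.1, the following Python sketch shows how the degrees of freedom of the Plackett restriction can be counted and how an asymptotic $\chi^2$ tail probability can be obtained from an LR statistic. It is an illustration, not the code used for the paper: the function names are hypothetical, and the assumption that the test for a single pair of variables has $4 - 1 = 3$ dof is ours, inferred by analogy with the $12 - 3 = 9$ dof of the joint test.

```python
import math


def plackett_lr_dof(n_pairs: int = 3, n_categories: int = 3) -> int:
    """Degrees of freedom for testing the Plackett restriction.

    Unrestricted: (n_categories - 1)**2 association parameters per pair.
    Restricted (Plackett): one association parameter per pair.
    """
    unrestricted = n_pairs * (n_categories - 1) ** 2
    restricted = n_pairs
    return unrestricted - restricted


def chi2_sf(x: float, dof: int) -> float:
    """Upper tail probability P(X > x) for X ~ chi-square with integer dof.

    Uses the recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with h = x / 2,
    started from Q(1/2, h) = erfc(sqrt(h)) or Q(1, h) = exp(-h).
    """
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-h)          # dof = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))  # dof = 1
    while 2 * a < dof:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Joint test of the Plackett restriction on all three pairs: 12 - 3 = 9 dof.
joint_dof = plackett_lr_dof()

# Single-pair test, using the one LR statistic (9.47) that is reported
# in footnote 6; the 3 dof here are an assumption (see the lead-in).
p_single_pair = chi2_sf(9.47, plackett_lr_dof(n_pairs=1))
```

The closed-form recurrence avoids any dependency outside the standard library; with SciPy available, `scipy.stats.chi2.sf` would serve the same purpose.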

Table 2: Estimated α and β parameters for the multivariate probit model

          Plan                Doc                 Hosp
          Coef.   S.E.        Coef.   S.E.        Coef.   S.E.
/cut              (0.0972)            (0.0832)            (0.0958)
/cut              (0.0978)            (0.0843)            (0.0977)
aged              (0.141)             (0.115)             (0.134)
aged              (0.129)             (0.111)             (0.131)
aged              (0.130)             (0.112)             (0.131)
aged              (0.138)             (0.115)             (0.131)
aged              (0.134)             (0.113)             (0.134)
aged              (0.135)             (0.115)             (0.131)
aged              (0.137)             (0.117)             (0.137)
aged              (0.145)             (0.123)             (0.137)
aged              (0.142)             (0.120)             (0.141)
aged              (0.160)             (0.128)             (0.144)
aged              (0.159)             (0.131)             (0.147)
aged              (0.155)             (0.129)             (0.148)
aged              (0.149)             (0.127)             (0.143)
aged              (0.165)             (0.138)             (0.157)
aged              (0.172)             (0.132)             (0.146)
aged              (0.184)             (0.137)             (0.156)
aged              (0.178)             (0.151)             (0.165)
aged              (0.202)             (0.148)             (0.163)
aged              (0.196)             (0.154)             (0.166)
aged              (0.183)             (0.158)             (0.169)
aged              (0.197)             (0.166)             (0.186)
aged              (0.214)             (0.183)             (0.188)
aged              (0.226)             (0.190)             (0.213)
aged              (0.382)             (0.241)             (0.285)
aged              (0.156)             (0.123)             (0.134)
fem               (0.0492)            (0.0404)            (0.0455)

Table 3: Estimated α and β parameters for the Plackett model

          Plan                Doc                 Hosp
          Coef.   S.E.        Coef.   S.E.        Coef.   S.E.
/cut              (0.0434)            (0.0354)            (0.0393)
/cut              (0.0505)            (0.0396)            (0.0571)
aged              (0.251)             (0.188)             (0.238)
aged              (0.220)             (0.181)             (0.230)
aged              (0.223)             (0.184)             (0.228)
aged              (0.241)             (0.188)             (0.227)
aged              (0.233)             (0.186)             (0.238)
aged              (0.230)             (0.189)             (0.227)
aged              (0.234)             (0.192)             (0.238)
aged              (0.250)             (0.200)             (0.233)
aged              (0.246)             (0.198)             (0.245)
aged              (0.284)             (0.212)             (0.247)
aged              (0.278)             (0.215)             (0.251)
aged              (0.272)             (0.212)             (0.257)
aged              (0.254)             (0.209)             (0.247)
aged              (0.287)             (0.226)             (0.272)
aged              (0.315)             (0.215)             (0.249)
aged              (0.343)             (0.226)             (0.268)
aged              (0.306)             (0.245)             (0.276)
aged              (0.381)             (0.240)             (0.273)
aged              (0.356)             (0.253)             (0.279)
aged              (0.313)             (0.259)             (0.284)
aged              (0.348)             (0.274)             (0.318)
aged              (0.370)             (0.297)             (0.313)
aged              (0.397)             (0.312)             (0.364)
aged              (0.773)             (0.385)             (0.526)
aged              (0.281)             (0.203)             (0.228)
fem               (0.0861)            (0.0659)            (0.0779)

Table 4: Estimated α and β parameters

          Plan                Doc                 Hosp
          Coef.   S.E.        Coef.   S.E.        Coef.   S.E.
/cut              (0.0434)            (0.0355)            (0.0394)
/cut              (0.0505)            (0.0397)            (0.0572)
aged              (0.251)             (0.188)             (0.238)
aged              (0.219)             (0.181)             (0.231)
aged              (0.223)             (0.183)             (0.229)
aged              (0.240)             (0.188)             (0.228)
aged              (0.233)             (0.186)             (0.238)
aged              (0.229)             (0.189)             (0.228)
aged              (0.234)             (0.192)             (0.239)
aged              (0.248)             (0.200)             (0.233)
aged              (0.246)             (0.197)             (0.245)
aged              (0.286)             (0.211)             (0.247)
aged              (0.279)             (0.215)             (0.251)
aged              (0.272)             (0.212)             (0.257)
aged              (0.254)             (0.209)             (0.247)
aged              (0.288)             (0.226)             (0.271)
aged              (0.316)             (0.215)             (0.249)
aged              (0.345)             (0.226)             (0.268)
aged              (0.304)             (0.245)             (0.276)
aged              (0.378)             (0.239)             (0.273)
aged              (0.354)             (0.253)             (0.279)
aged              (0.311)             (0.258)             (0.284)
aged              (0.351)             (0.273)             (0.316)
aged              (0.371)             (0.296)             (0.312)
aged              (0.395)             (0.312)             (0.363)
aged              (0.750)             (0.384)             (0.522)
aged              (0.283)             (0.202)             (0.228)
fem               (0.0858)            (0.0659)            (0.0779)
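As an aid to reading the association parameters λ(i, j) in Panel C of Table 1, the sketch below computes the four log-odds ratios of a 3 × 3 table of joint probabilities for a pair of ordered variables. It assumes that λ(i, j) is a global log-odds ratio, obtained by dichotomising the first variable at cut point i and the second at cut point j; this is the type of odds ratio that is constant under a Plackett distribution, but the exact parameterisation used in the paper is defined in sections not reproduced here, so the correspondence is an assumption. The function name and the example tables are hypothetical.

```python
import math
from typing import List


def global_log_odds_ratios(p: List[List[float]]) -> List[List[float]]:
    """Global log-odds ratios of an R x C table of joint probabilities.

    Entry [i - 1][j - 1] of the result corresponds to cut points (i, j):
    the row variable is split into its i lowest categories versus the rest,
    the column variable into its j lowest categories versus the rest, and
    the log-odds ratio of the resulting 2 x 2 table is returned.
    """
    n_rows, n_cols = len(p), len(p[0])
    result = []
    for i in range(1, n_rows):
        row = []
        for j in range(1, n_cols):
            low_low = sum(p[r][c] for r in range(i) for c in range(j))
            low_high = sum(p[r][c] for r in range(i) for c in range(j, n_cols))
            high_low = sum(p[r][c] for r in range(i, n_rows) for c in range(j))
            high_high = sum(
                p[r][c] for r in range(i, n_rows) for c in range(j, n_cols)
            )
            row.append(math.log(low_low * high_high / (low_high * high_low)))
        result.append(row)
    return result


# Hypothetical example 1: independent margins, so every log-odds ratio is 0.
independent = [[a * b for b in (0.2, 0.5, 0.3)] for a in (0.5, 0.3, 0.2)]
lam_independent = global_log_odds_ratios(independent)

# Hypothetical example 2: mass concentrated on the diagonal, so the
# association is positive at every pair of cut points.
positive = [
    [0.20, 0.05, 0.02],
    [0.05, 0.25, 0.08],
    [0.02, 0.08, 0.25],
]
lam_positive = global_log_odds_ratios(positive)
```

Under this reading, the Plackett restriction corresponds to the four entries returned for a pair being equal, while the unrestricted model of Table 4 lets them differ.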