Mixed conditional logistic regression for habitat selection studies


Journal of Animal Ecology 2010, 79, doi: /j x

Thierry Duchesne1*, Daniel Fortin2 and Nicolas Courbin2

1Département de Mathématiques et de Statistique, Université Laval, Sainte-Foy, QC, Canada G1V 0A6; and 2Chaire de Recherche Industrielle CRSNG-Université Laval en Sylviculture et Faune, Département de Biologie, Université Laval, Sainte-Foy, QC, Canada G1V 0A6

*Correspondence author. thierry.duchesne@mat.ulaval.ca

© 2010 The Authors. Journal compilation © 2010 British Ecological Society

Summary

1. Resource selection functions (RSFs) are becoming a dominant tool in habitat selection studies. RSF coefficients can be estimated with unconditional (standard) and conditional logistic regressions. While the advantage of mixed-effects models is recognized for standard logistic regression, mixed conditional logistic regression remains largely overlooked in ecological studies.

2. We demonstrate the significance of mixed conditional logistic regression for habitat selection studies. First, we use spatially explicit models to illustrate how mixed-effects RSFs can be useful in the presence of inter-individual heterogeneity in selection and when the assumption of independence from irrelevant alternatives (IIA) is violated. The IIA hypothesis states that the strength of preference for habitat type A over habitat type B does not depend on the other habitat types also available. Secondly, we demonstrate the value of mixed-effects models for evaluating habitat selection by free-ranging bison Bison bison.

3. When movement rules were homogeneous among individuals and the IIA assumption was respected, fixed-effects RSFs adequately described habitat selection by simulated animals. In situations violating the inter-individual homogeneity and IIA assumptions, however, RSFs were best estimated with mixed-effects regressions, and fixed-effects models could even lead to faulty conclusions.

4. Mixed-effects models indicated that bison did not select farmlands on average, but exhibited strong inter-individual variation in their response to farmlands. Less than half of the bison preferred farmlands over forests. Conversely, the fixed-effects model simply suggested an overall selection for farmlands.

5. Conditional logistic regression is recognized as a powerful approach to evaluate habitat selection when resource availability changes. This regression is increasingly used in ecological studies, but almost exclusively in the context of fixed-effects models. Fitness maximization can imply differences in trade-offs among individuals, which can yield inter-individual differences in selection and lead to departures from IIA. These situations are best modelled with mixed-effects models. Mixed-effects conditional logistic regression should become a valuable tool for ecological research.

Key-words: case control location sampling, farmland, Global Positioning System, likelihood-ratio test, mixed multinomial logit model, Prince Albert National Park, Spatially Explicit Landscape Event Simulator

Introduction

The resource selection function (RSF) is currently one of the dominant tools used to quantify habitat selection (McLoughlin et al. 2010). RSFs link animal distribution to spatial patterns of habitat heterogeneity by contrasting the characteristics of animal locations with those of a set of random locations (Manly et al. 2002). Random locations are often drawn across the home-ranges of individuals (Compton, Rhymer & McCollough 2002), in which case observed locations (response variable coded as ones) and random locations (response variable coded as zeros) are generally contrasted with unconditional logistic regressions. Under such a sampling design, however, estimation methods must consider that some random locations might have been visited, in which case they do not all represent true absences (Keating & Cherry 2004; Johnson et al. 2006). A matched design can then be advantageous. With a matched design, each observed location is associated with a specific set of random locations drawn within a limited spatial domain (Boyce 2006), often corresponding to the distance the animal could have travelled during the relocation interval (Boyce et al. 2003). Because the animal could not also be at the random locations when its actual location was acquired, random locations represent true absences. Furthermore, matched designs are appropriate when evaluating the habitat selection of animals with home-ranges that are either not well defined or large relative to the distance individuals move between relocations (Arthur et al. 1996; Compton et al. 2002).

RSFs based on a matched design are estimated by conditional logistic regression (Compton et al. 2002; Boyce et al. 2003; Boyce 2006; McDonald et al. 2006), an approach increasingly used in habitat selection analysis. Despite difficulties in specifying a variance-covariance structure (Craiu, Duchesne & Fortin 2008; Koper & Manseau 2009), the value of random effects in RSFs has been widely recognized for models developed from non-matched designs (Gillies et al. 2006; Hebblewhite & Merrill 2008). Mixed-effects models should be better suited to analysing unbalanced data sets, or cases where selection for the different landscape attributes varies among individuals (Gillies et al. 2006). Moreover, mixed-effects models can handle the situation where several matched sets of locations come from the same animal and are thus correlated.
The addition of random effects also provides these advantages in studies based on matched sampling designs, but mixed-effects conditional logistic regressions have been largely overlooked in ecological research (but see Bruun & Smith 2003; Fortin et al. 2009). Moreover, unlike mixed-effects conditional logistic regression, fixed-effects models rely on assumptions that might not faithfully represent certain ecological systems. Fixed-effects models assume that the strength of selection is homogeneous among individuals within the population and thus estimate the population-averaged selection. Fixed-effects conditional logistic regression also implies independence from irrelevant alternatives (IIA; Revelt & Train 1998). The IIA hypothesis states that the strength of preference for (i.e. the odds of choosing) habitat type A over habitat type B does not depend on the other habitat types also available. Because behavioural decisions reflect trade-offs among multiple competing demands, changes in available options may alter individual preferences, thereby violating the IIA assumption. For example, prey often make greater use of patches located in relatively safe areas (Hay & Fuller 1981; Morrison et al. 2004; Hochman & Kotler 2007). The foraging effort and selectivity of Nubian Ibex (Capra nubiana F. Cuvier, 1825) vary with distance from the safety of a cliff (Hochman & Kotler 2007). In other words, the strength of preference for a given type of food patch over the baseline patch type depends on the presence or absence of a nearby cliff, a spatial dependency that might violate the IIA hypothesis. In this context, fixed-effects models may yield inappropriate conclusions, potentially leading to unfavourable management actions.
In this study, we illustrate how departures from the assumption of homogeneous selection among individuals, due either to inter-individual variability in movement rules or to violation of the IIA assumption, may influence the estimation of RSF parameters under a matched sampling design. We begin by showing how mixed effects can be incorporated into the conditional logistic regression model. We follow an approach based on random utility theory (Cooper & Millspaugh 1999) because it is easily interpretable in the resource selection context and because the exponential form of the RSF is robust to misspecification (McFadden & Train 2000). We then explain why, unlike the fixed-effects model, its mixed-effects counterpart remains appropriate under some types of violation of IIA. We follow with a simulation-based investigation of the impact of departures from homogeneity in selection probabilities and from the IIA assumption on the estimation of RSFs. Finally, we illustrate the methods with an analysis of habitat selection by the free-ranging bison Bison bison (Linnaeus, 1758) of Prince Albert National Park (Saskatchewan, Canada) during the springs of 2005–2008. In spring, bison occasionally leave the park for adjacent private lands, where they sometimes damage fences and crops, disturb livestock, and get killed by hunters. It can be beneficial for management to evaluate whether bison use of farmlands results from active selection, and to quantify whether cross-boundary movements are made by a few individuals or are a widespread behaviour. We show that conditional mixed-effects RSFs are better suited than marginal fixed-effects RSFs to achieve this goal.

Materials and methods

RANDOM EFFECTS IN CONDITIONAL LOGISTIC REGRESSION

As with ordinary (unconditional) logistic regression, random effects can be included in conditional logistic regression models by replacing fixed regression coefficients with random coefficients.
Because of the conditioning involved, conditional models have no intercept term, and random effects are included as random regression coefficients. In the resulting mixed multinomial logit model (sensu Revelt & Train 1998), each animal assigns a value, termed utility ($U$), to all landscape locations and, among the locations available at a given time, selects the one with the highest utility (Cooper & Millspaugh 1999; McDonald et al. 2006). Let $n = 1,\dots,K$ index the individuals, $t = 1,\dots,t_n$ the time steps for individual $n$, and $j = 1,\dots,J$ the available locations (or a sample of all available locations; McDonald et al. 2006) for animal $n$ at time step $t$. The mixed multinomial logit model treats utilities as random variables, with $U_{njt}$ the utility that animal $n$ assigns to the $j$th location available at time step $t$. Let $x_{njt1},\dots,x_{njtm}$ represent the values of $m$ covariates (e.g. habitat attributes) measured at the $j$th location available to animal $n$ at time step $t$. Now let us assume that the utility assigned to a location depends on its attributes, viz.

$$U_{njt} = \beta_1 x_{njt1} + \beta_2 x_{njt2} + \dots + \beta_m x_{njtm} + b_{n1} z_{njt1} + \dots + b_{nq} z_{njtq} + \varepsilon_{njt} = x_{njt}'\beta + z_{njt}'b_n + \varepsilon_{njt}, \qquad \text{(eqn 1)}$$

where $\beta_1,\dots,\beta_m$ are the fixed regression coefficients, $b_{n1},\dots,b_{nq}$ are animal-level random effects, $z_{njt1},\dots,z_{njtq}$ are fixed values specifying the structure of the random effects (usually equal to the subset of the covariates $x_{njti}$ whose coefficients are random), the $\varepsilon_{njt}$ are independent and identically distributed random error terms, $\beta = (\beta_1,\dots,\beta_m)'$, $x_{njt} = (x_{njt1},\dots,x_{njtm})'$, $b_n = (b_{n1},\dots,b_{nq})'$ and $z_{njt} = (z_{njt1},\dots,z_{njtq})'$. We assume that the random errors follow an extreme value distribution, which yields the usual exponential RSF when there are no random effects (see below); this assumption is mild and the resulting model is very flexible (McFadden & Train 2000). Let the random effects $b_n$ be independent and identically distributed with density $f(b;\theta)$, where $\theta$ is a vector of unknown parameters. The probability that an animal chooses location $j$ within the set of $J$ locations $\{1, 2, \dots, J\}$, i.e. that $U_{njt} > U_{nit}$ for all $i \neq j$, is

$$P(x_{njt}) = \int \frac{\exp(x_{njt}'\beta + z_{njt}'b)}{\sum_{i=1}^{J} \exp(x_{nit}'\beta + z_{nit}'b)}\, f(b;\theta)\, db. \qquad \text{(eqn 2)}$$

Though the distribution of the random effects is typically chosen to be multivariate normal with mean vector 0 and variance-covariance parameters to be estimated (Gillies et al. 2006; Hebblewhite & Merrill 2008), other distributions such as the lognormal, uniform or triangular can be used (Bhat 2001). When all $z_{njti}$ in eqn (1) equal zero, or when the variance of $b$ is null (i.e. $b$ is identically 0), eqn (2) simplifies to

$$P(x_{njt}) = \frac{\exp(x_{njt}'\beta)}{\sum_{i=1}^{J} \exp(x_{nit}'\beta)}, \qquad \text{(eqn 3)}$$

and we get the ordinary (i.e. fixed-effects) conditional logistic regression model (McDonald et al. 2006).

RANDOM EFFECTS, HETEROGENEITY IN SELECTION AND THE DEPENDENCE FROM IRRELEVANT ALTERNATIVES

The addition of individual-level random effects in RSFs relaxes the assumption of homogeneous selection among animals. For example, adding an animal-level random regression coefficient for covariate $x$ allows for inter-individual variation in the response to $x$, which means that each individual may respond differently to changes in $x$. Because the random effects are unobserved random variables that are common to all the locations of a given individual, the mixed-effects model does not assume that the observations of that individual are uncorrelated (Revelt & Train 1998). Note that, though they do not explicitly model animal-level heterogeneity, fixed-effects models estimated by methods such as generalized estimating equations can handle correlated matched sets (Craiu et al. 2008).

The mixed multinomial logit model relaxes the IIA assumption, but only at the population level. It does so by inducing correlation over alternatives in the stochastic portion of utility (Revelt & Train 1998; Skrondal & Rabe-Hesketh 2003). To illustrate this, consider a forager, such as the Nubian Ibex (Hochman & Kotler 2007), responding to spatial patterns of risk. Suppose that each location is of one of three types, which we code using covariates $x_{jP}$ and $x_{jC}$: location $j$ may be a risky food patch ($x_{jP} = 1$, $x_{jC} = 0$), a safe cliff ($x_{jP} = 0$, $x_{jC} = 1$), or a baseline habitat that offers neither food nor protection ($x_{jP} = 0$, $x_{jC} = 0$). We assume that $J > 2$ locations are available, that location $j = 1$ is a food patch ($x_{1P} = 1$, $x_{1C} = 0$) and that location $j = 2$ is the baseline habitat ($x_{2P} = 0$, $x_{2C} = 0$). If we assume a fixed-effects conditional logistic regression model (McDonald et al. 2006) with RSF proportional to $\exp(\beta_P x_{jP} + \beta_C x_{jC})$, then the ratio of the probability that the animal selects location $j = 1$ to the probability that the same animal (or another animal chosen at random) selects location $j = 2$ is

$$\frac{\exp(\beta_P) \big/ \sum_{j=1}^{J} \exp(\beta_P x_{jP} + \beta_C x_{jC})}{\exp(0) \big/ \sum_{j=1}^{J} \exp(\beta_P x_{jP} + \beta_C x_{jC})} = \frac{\exp(\beta_P)}{\exp(0)} = \exp(\beta_P), \qquad \text{(eqn 4)}$$

which does not depend on whether there is a cliff among the other available locations. Now let us assume the same model, but this time with a random slope $\beta_P + b$ for covariate $x_P$ instead of the fixed slope $\beta_P$. Because $b$ remains fixed for all the locations of a given animal, the ratio of the probability that the animal chooses location $j = 1$ to the probability that it selects location $j = 2$ is

$$\frac{\exp(\beta_P + b) \big/ \sum_{j=1}^{J} \exp\{(\beta_P + b) x_{jP} + \beta_C x_{jC}\}}{\exp(0) \big/ \sum_{j=1}^{J} \exp\{(\beta_P + b) x_{jP} + \beta_C x_{jC}\}} = \exp(\beta_P + b),$$

which still does not depend on the attributes of the alternative locations, but depends on $b$, the unobserved animal-specific random effect. Now, if we consider the ratio of the probability that an animal chosen at random selects location $j = 1$ to the probability that another animal, again chosen at random, selects location $j = 2$, we get

$$\frac{\int \exp(\beta_P + b) \big/ \sum_{j=1}^{J} \exp\{(\beta_P + b) x_{jP} + \beta_C x_{jC}\}\, f(b)\, db}{\int \exp(0) \big/ \sum_{j=1}^{J} \exp\{(\beta_P + b) x_{jP} + \beta_C x_{jC}\}\, f(b)\, db} = \exp(\beta_P) \left[ \frac{\int \exp(b) \big/ \sum_{j=1}^{J} \exp\{(\beta_P + b) x_{jP} + \beta_C x_{jC}\}\, f(b)\, db}{\int 1 \big/ \sum_{j=1}^{J} \exp\{(\beta_P + b) x_{jP} + \beta_C x_{jC}\}\, f(b)\, db} \right],$$

where the quantity in square brackets now depends on the characteristics of all available locations (Train 2003): the sum in each denominator varies with $b$, so it no longer cancels once $b$ is integrated out. This model thus relaxes the IIA assumption at the population level. In other words, by adding random coefficients to the conditional logistic regression model, the population-averaged probability of choosing a given habitat type depends on the local alternatives.

ESTIMATION AND INFERENCE

We now consider maximum likelihood estimation of the parameters of the model described by eqns (1) and (2) on the basis of data obtained with a matched sampling design.
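The likelihood that follows is built from the choice probabilities of eqns (2) and (3). Before writing it down, the IIA argument of the previous subsection can be checked numerically. The Python sketch below is a minimal illustration under stated assumptions, not the authors' code: it evaluates eqn (3) for a matched set and the population-averaged ratio of the random-slope model, approximating the integral over $b \sim N(0, \sigma^2)$ by an equally weighted average over normal quantiles (a simple deterministic rule chosen here for convenience). The function names, the coefficient values ($\beta_P = 0.5$, $\beta_C = 2$, $\sigma = 1$) and the two four-location matched sets are hypothetical.

```python
import math
from statistics import NormalDist


def clogit_probs(eta):
    """Conditional-logit choice probabilities of eqn (3): exp(eta_j) / sum_i exp(eta_i).

    The largest linear predictor is subtracted before exponentiating; the shift
    cancels between numerator and denominator, so the result is unchanged.
    """
    top = max(eta)
    weights = [math.exp(e - top) for e in eta]
    total = sum(weights)
    return [w / total for w in weights]


def linear_predictors(locations, beta_p, beta_c, b=0.0):
    """(beta_P + b) * x_jP + beta_C * x_jC for each (x_jP, x_jC) location."""
    return [(beta_p + b) * xp + beta_c * xc for xp, xc in locations]


def population_ratio(locations, beta_p, beta_c, sigma, n_nodes=2000):
    """Population-level ratio P(j = 1) / P(j = 2) under a random slope b ~ N(0, sigma^2).

    Assumption: both integrals over b are approximated by an equally weighted
    average over normal quantiles at (k + 0.5) / n_nodes; the common factor
    1 / n_nodes cancels in the ratio.
    """
    nd = NormalDist(0.0, sigma)
    num = den = 0.0
    for k in range(n_nodes):
        b = nd.inv_cdf((k + 0.5) / n_nodes)
        p = clogit_probs(linear_predictors(locations, beta_p, beta_c, b))
        num += p[0]
        den += p[1]
    return num / den


# Hypothetical matched sets: location 1 is a food patch (1, 0), location 2 is
# baseline habitat (0, 0); the second set adds a cliff (0, 1) among the others.
no_cliff = [(1, 0), (0, 0), (0, 0), (0, 0)]
with_cliff = [(1, 0), (0, 0), (0, 1), (0, 0)]

for label, locs in [("no cliff", no_cliff), ("cliff", with_cliff)]:
    p = clogit_probs(linear_predictors(locs, beta_p=0.5, beta_c=2.0))
    fixed_ratio = p[0] / p[1]
    mixed_ratio = population_ratio(locs, beta_p=0.5, beta_c=2.0, sigma=1.0)
    print(f"{label}: fixed {fixed_ratio:.4f}, mixed {mixed_ratio:.4f}")
```

The fixed-effects ratio equals $\exp(0.5) \approx 1.6487$ for both matched sets, as eqn (4) states, whereas the population-averaged ratio of the random-slope model is larger when a cliff is among the alternatives: this is the population-level departure from IIA described above.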
To simplify the notation, and without loss of generality, we assume that the location chosen by animal $n$ at time step $t$ among the $J$ available locations is assigned label $j = 1$ (and thus the locations not chosen are assigned labels $j = 2, 3, \dots, J$). Maximum-likelihood estimates of the RSF and random-effects distribution parameters are obtained by finding the values of $\beta$ and $\theta$ that maximize

$$L(\beta, \theta) = \prod_{n=1}^{K} \int \prod_{t=1}^{t_n} \frac{\exp(x_{n1t}'\beta + z_{n1t}'b)}{\sum_{j=1}^{J} \exp(x_{njt}'\beta + z_{njt}'b)}\, f(b;\theta)\, db. \qquad \text{(eqn 5)}$$

Because eqn (5) is a valid likelihood function, any likelihood-based inference method for $\beta$ and $\theta$, such as Wald confidence intervals based on inverting the Hessian of the negative log-likelihood, likelihood-ratio tests, or AIC-based model selection, can be applied (McFadden & Train 2000).
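As a rough illustration of how eqn (5) can be evaluated, the Python sketch below computes the log-likelihood for a model with a single normal random slope, $b_n \sim N(0, \sigma^2)$, on one covariate. Each animal's integral is approximated by averaging over quasi-random normal draws built from a Halton sequence, in the spirit of the simulation methods of Bhat (2001) cited in the next paragraph; it is not the mxlmsl implementation. The data layout (a list of animals, each a list of matched sets, each a list of covariate tuples with the chosen location first, $j = 1$), the function names, the number of draws, the choice to skip the first 10 Halton points, and the toy data are all assumptions made for the example.

```python
import math
from statistics import NormalDist


def halton(index, base):
    """Radical-inverse (van der Corput) value of a positive integer in `base`."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result


def halton_normal_draws(n, sigma, base=2, skip=10):
    """n quasi-random N(0, sigma^2) draws via the inverse normal CDF.

    Assumption: the first `skip` Halton points are discarded, a common convention.
    """
    nd = NormalDist(0.0, sigma)
    return [nd.inv_cdf(halton(i, base)) for i in range(skip + 1, skip + n + 1)]


def log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def animal_loglik(strata, beta, k, b):
    """log prod_t P(chosen | b) for one animal; the chosen location is listed first."""
    total = 0.0
    for stratum in strata:
        eta = [sum(bj * xj for bj, xj in zip(beta, x)) + b * x[k] for x in stratum]
        total += eta[0] - log_sum_exp(eta)
    return total


def mixed_clogit_loglik(animals, beta, sigma, k, n_draws=500):
    """Simulated log of eqn (5) with a normal random slope on covariate k.

    For each animal, the integral over b of prod_t P(...) is replaced by the
    average over Halton-based draws, computed in log space for stability.
    With sigma == 0 it reduces to the fixed-effects conditional log-likelihood.
    """
    if sigma == 0:
        return sum(animal_loglik(strata, beta, k, 0.0) for strata in animals)
    draws = halton_normal_draws(n_draws, sigma)
    total = 0.0
    for strata in animals:
        logs = [animal_loglik(strata, beta, k, b) for b in draws]
        total += log_sum_exp(logs) - math.log(n_draws)
    return total


# Hypothetical toy data: 2 animals, covariates (x1, x2), chosen location first.
animals = [
    [[(1, 0), (0, 0), (0, 1)], [(0, 0), (1, 0), (0, 0)]],
    [[(1, 0), (0, 0), (0, 0)]],
]
print(mixed_clogit_loglik(animals, beta=(0.5, -0.2), sigma=0.0, k=0))
print(mixed_clogit_loglik(animals, beta=(0.5, -0.2), sigma=1.0, k=0))
```

To obtain maximum-likelihood estimates, the negative of this function could be handed to a general-purpose optimizer (for instance scipy.optimize.minimize, with $\sigma$ parameterized on the log scale to keep it positive); that step is not shown.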

According to the principle of parsimony, the need for random effects in RSFs should be assessed. If random effects are not needed, fixed-effects conditional logistic regression improves estimation efficiency and model interpretability (Verbeke & Molenberghs 2000). The fixed-effects model can be considered a special case of the mixed-effects model in which the variance and covariance parameters in $f(b;\theta)$ are zero. A likelihood-ratio test for nested models can thus be used to evaluate whether the added complexity of random effects is warranted. The likelihood-ratio statistic that tests whether the fixed-effects model is adequate is $r = 2(\ell_1 - \ell_0)$, where $\ell_0$ and $\ell_1$ are the maximized log-likelihoods of the fixed- and mixed-effects models, respectively. Because zero lies on the boundary of the parameter space for variance parameters, the P-value is not based on the usual chi-squared distribution but on a mixture of chi-squared distributions, with the number of components in the mixture and their respective degrees of freedom depending on the structure of the variance and covariance parameters set to zero (Verbeke & Molenberghs 2000). Consider, for example, a mixed-effects model with a single random effect $b$ with distribution $N(0, \sigma^2)$. Under the null model, the likelihood-ratio statistic testing whether $b$ is needed follows an equal-weight mixture of two chi-squared distributions with zero and one degree of freedom, respectively. The P-value then reduces to $0.5\,\Pr[\chi^2_1 > r]$, with $\chi^2_1$ representing a chi-squared random variable with 1 degree of freedom. Direct numerical maximization of $L(\beta, \theta)$ given by eqn (5) can be difficult, as it involves integrals that cannot be solved analytically. Numerical maximization of the likelihood is also more likely to converge for a fixed-effects RSF than for its mixed-effects counterpart.
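The boundary-corrected P-value described above takes only a few lines. This Python sketch (function names hypothetical) assumes that a single random-effect variance is being tested, so that the null distribution of $r$ is the equal-weight mixture of $\chi^2_0$ (a point mass at zero) and $\chi^2_1$, and it uses the identity $\Pr[\chi^2_1 > x] = \operatorname{erfc}(\sqrt{x/2})$. The example plugs in the maximized log-likelihoods reported for the bison models in Table 2.

```python
import math


def chi2_1_sf(x):
    """P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for x >= 0."""
    return math.erfc(math.sqrt(x / 2.0))


def boundary_lrt(loglik_fixed, loglik_mixed):
    """Likelihood-ratio test of H0: sigma^2 = 0 for a single random effect.

    Returns (r, P) with r = 2 * (l1 - l0). Under H0, r follows a 50:50 mixture
    of a point mass at zero and chi2_1, so P = 0.5 * P(chi2_1 > r) for r > 0.
    A zero (or slightly negative, from imperfect optimization) r gives P = 1.
    """
    r = 2.0 * (loglik_mixed - loglik_fixed)
    if r <= 0:
        return 0.0, 1.0
    return r, 0.5 * chi2_1_sf(r)


# Maximized log-likelihoods of the fixed- and mixed-effects bison RSFs (Table 2).
r, p = boundary_lrt(-5947.846, -5930.033)
print(f"r = {r:.3f}, P = {p:.2e}")
```

For the bison models this gives $r = 35.626$ and a P-value far below 0·0001, in line with the likelihood-ratio test reported in Table 2.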
Bhat (2001) described simulation methods based on Halton quasi-random numbers that can efficiently evaluate the likelihood function. Maximization of the likelihood from eqn (5) can be implemented with this method using the mxlmsl package (Train 2006) for matlab r2008a (MathWorks Inc. 2008). We provide the matlab code used for our bison case study in Appendix S3. There are other, albeit less direct, means of maximizing the likelihood from eqn (5). Chen & Kuo (2001) showed how to build a nonlinear Poisson model with random effects whose likelihood is equivalent to a closely related multinomial formulation of eqn (5). The required Poisson model can be fitted by maximum likelihood, with the integrals evaluated by adaptive Gaussian quadrature, or by penalized quasi-likelihood. Bruun & Smith (2003) used the latter approach to evaluate habitat selection by European starlings (Sturnus vulgaris Linnaeus, 1758). Mixed conditional logistic regression models can also be fitted with Bayesian methods, but the approach then requires specifying prior distributions (informative or not) for $\beta$ and $\theta$. R.V. Craiu, T. Duchesne, D. Fortin & S. Baillargeon (unpublished data) propose a numerically stable and efficient two-step method that gives accurate approximations to the maximum-likelihood estimates for mixed-effects conditional logistic regression. Methods based on the results of the first step of such a two-step approach (i.e. separate models fitted to each animal) could perhaps help determine whether the need for random effects arises from between-animal heterogeneity or from violation of IIA.

Example 1: Simulation of patch selection under predation risk

We use computer simulations to investigate the effect of departures from the assumption of homogeneous habitat selection among individuals. Deviations from the assumption were induced by imposing inter-individual variation in movement rules and by forcing movement decisions that violated the IIA assumption.
Individual-based, spatially explicit modelling was conducted using the Spatially Explicit Landscape Event Simulator (Fall & Fall 2001). We simulated the movements of 200 virtual foragers, each starting (time 0) at a random location within the landscape ( cells) and followed for 50 consecutive moves. Landscapes comprised four types of randomly distributed habitat patches: patch type H1 offered the most food, followed by H2; neither H3 nor H4 offered any food. H1 was risky unless located <15 cells from H3, in which case H1 became safe. H2 was always safe. We tested four scenarios differing in the movement rules of individuals, each with distinct statistical implications. Movements in scenarios 1 and 2 were both consistent with the IIA hypothesis; scenario 1 assumed a homogeneous movement rule, whereas scenario 2 involved inter-individual variation in the rules. Scenarios 3 and 4 both violated the IIA hypothesis at the individual level, because the preference for H1 over H2 depended on whether H3 occurred within 15 cells; a homogeneous movement rule was used for scenario 3, whereas inter-individual variation in movement rules characterized scenario 4. The movement rules and the landscape used for each of the four scenarios are described in detail in Appendix S1. To assess the effect of varying patch availability on inferences, scenario 3 was applied to five additional landscapes in which the proportions of H1 and H2 remained unchanged but those of H3 and H4, respectively, varied as follows: Landscape 1, 0·01% and 69·99%; Landscape 2, 0·02% and 69·98%; Landscape 3, 0·03% and 69·97%; Landscape 4, 0·05% and 69·95%; and Landscape 5, 0·06% and 69·94%. In all scenarios, each observed location was matched to 10 locations randomly drawn within a 30-cell radius, which was large enough to encompass all step distances (Forester, Im & Rathouz 2009). Patch type (H1–H4) was identified at all observed and random locations.
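A minimal Python sketch of this matched design follows. It assumes (this is not stated in the text) that the 30-cell circle is centred on the forager's previous location, as in step-based designs. Candidate cells are drawn uniformly among grid cells within the radius by rejection sampling, landscape edges are ignored, and patch types are coded as dummy covariates with H2 as the baseline, matching the models described next. The function names, the `DUMMIES` table and the toy patch map are hypothetical.

```python
import random

# Dummy coding with H2 as the baseline patch type: (H1, H3, H4) indicators.
DUMMIES = {"H1": (1, 0, 0), "H2": (0, 0, 0), "H3": (0, 1, 0), "H4": (0, 0, 1)}


def random_cells(centre, n_random=10, radius=30, rng=None):
    """Draw n_random grid cells uniformly among cells within `radius` of centre.

    Offsets are drawn uniformly on the bounding square and rejected when they
    fall outside the circle, so every in-circle cell is equally likely.
    Assumption: landscape edges are ignored.
    """
    rng = rng or random.Random()
    cx, cy = centre
    cells = []
    while len(cells) < n_random:
        dx = rng.randint(-radius, radius)
        dy = rng.randint(-radius, radius)
        if dx * dx + dy * dy <= radius * radius:
            cells.append((cx + dx, cy + dy))
    return cells


def matched_set(previous, observed, patch_at, rng, n_random=10, radius=30):
    """Covariate rows for one stratum: the observed cell first (j = 1), then
    the random cells, each mapped to its patch type by the lookup patch_at.

    Assumption: the circle is centred on the previous location.
    """
    cells = [observed] + random_cells(previous, n_random, radius, rng)
    return [DUMMIES[patch_at(cell)] for cell in cells]


def toy_patch_at(cell):
    """Hypothetical patch map: a deterministic pattern standing in for a raster."""
    return ("H1", "H2", "H3", "H4")[(cell[0] * 7 + cell[1] * 3) % 4]


rng = random.Random(42)
stratum = matched_set(previous=(100, 100), observed=(104, 97),
                      patch_at=toy_patch_at, rng=rng)
print(len(stratum), stratum[0])
```

Each stratum produced this way has 11 rows (the observed cell plus 10 random cells), which is the layout expected by a conditional logistic regression with one matched set per observed move.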
Fixed- and mixed-effects conditional logistic regressions were used to build RSFs. Mixed-effects RSFs allowed the coefficient of H1 to vary among individuals according to N(b 1,r 2 ). In all models, H2 was used as the baseline patch type. Models were fitted by maximizing the likelihood given by eqn (5) using a publicly available matlab r2008a (MathWorks Inc. 2008) package (Train 2006). Example 2: Habitat selection by free-ranging bison The field study was conducted in the springs of (9 March 31 May 2005, 1 March 31 May in 2006 and 2007, and 1 March 10 March 2008) in Prince Albert National Park, where the bison population was comprised of 385 individuals. The bison range is mostly composed of forests (85 %), meadows (10%) and water bodies (5%). The range is adjacent to farmlands, where bison are occasionally found. We followed 24 female bison equipped with Global Positioning System collars (GPS collar 4400M from Lotek Engineering, Newmarket, ON, Canada) taking locations at 06:00 and 18:00 hours. Each observed location was paired with 10 random locations sampled within a 1Æ6-km radius circle (>90% of all travelled distances between relocations). Land-cover types at observed and random locations were characterized based on classified Landsat ETM+ satellite images (Fortin et al. 2009). Land-cover types were (i) meadow, including areas near lakes and rivers dominated by grasses, forbs and sedges (MEADOW); (ii) riparian areas largely comprised shrubs and located near streams and rivers (RIPARIAN); (iii) forest consisting of deciduous, conifer and mixed stands (FOREST); (iv) water bodies (WATER); (v) road including the areas located <15 m from a human-made trail or a road (ROAD); and (vi) farmlands (AGRIC). Fixed- and mixed-effects conditional logistic regressions fitted by maximum likelihood were used to

5 552 T. Duchesne et al. build RSFs. Random effects assuming N(0,r 2 ) were investigated for AGRIC, with FOREST as the baseline land-cover type. Results EXAMPLE 1: SIMULATION OF PATCH SELECTION UNDER PREDATION RISK Scenario 1 represented a situation where the IIA hypothesis was valid and where the movement strategy was fixed within the population of simulated foragers. As expected in such cases, the fixed- and mixed-effects RSFs yielded a similar coefficient estimate for H1 of )0Æ91 ± 0Æ03 (±SE) (Table 1), which agrees with the theoretical approximation (Appendix S2). Moreover, the standard deviation (SD) of the random coefficient associated with the mixed-effects model did not differ significantly from 0 (likelihood-ratio test: P=0Æ46), indicating that a random coefficient for H1 was not required (Table 1). The IIA assumption remained valid in scenario 2, but this time each animal had a different probability of choosing H1. In this context, the mixed-effects RSF received greater empirical support (Table 1) as its random coefficient was an important addition to the model fit (likelihood-ratio test: P <0Æ0001). We now consider a situation (scenario 3) where all individuals displayed the same movement strategy, but where the IIA assumption was violated because the odds of choosing H1 depended on whether a refuge patch H3 was at close proximity. Selection coefficients for H1 were then systematically lower when estimated by mixed-effects conditional logistic regression than by their fixed-effects counterpart (Fig. 1). Whether H1 was selected or avoided remained generally consistent with both models, with the exception of when H3 made up 0Æ04% of landscape. In this case, the fixedeffects RSF suggested a significant selection for H1, whereas the better fitting (likelihood-ratio test: P < 0Æ0001) mixedeffects model revealed that the average simulated forager had Table 1. 
Patch selection estimated by fixed- or mixed-effects conditional logistic regressions with normally distributed coefficients, for virtual foragers travelling in landscapes according to four scenarios. The scenarios differed depending on whether movement rules were similar among all individuals of the population and whether the assumption of independence from irrelevant alternatives (IIA) was violated. H2 was the baseline patch type in all resource selection functions Fixed-effects model Mixed-effects model Variable b SE 95% CI b SE 95% CI Scenario 1: no inter-individual variation, IIA assumption respected H1 )0Æ908 0Æ031 )0Æ969, )0Æ847 H4 )1Æ520 0Æ024 )1Æ567, )1Æ473 )1Æ520 0Æ024 )1Æ567, )1Æ473 H1 )0Æ908 0Æ031 )0Æ969, )0Æ847 SD of coefficient 0Æ000 0Æ174 Max. log likelihood )22 030Æ021 )22 030Æ020 Scenario 2: inter-individual variation, IIA assumption respected H1 )0Æ835 0Æ031 )0Æ896, )0Æ774 H4 )1Æ528 0Æ024 )1Æ575, )1Æ481 )1Æ528 0Æ024 )1Æ575, )1Æ481 H1 )0Æ873 0Æ041 )0Æ953, )0Æ793 SD of coefficient 0Æ368 0Æ041 Max. log likelihood )22 023Æ359 )21 999Æ292 Scenario 3: no inter-individual variation, IIA assumption violated H1 0Æ073 0Æ027 0Æ020, 0Æ126 H3 )0Æ736 0Æ528 )1Æ771, 0Æ299 )0Æ712 0Æ530 )1Æ751, 0Æ327 H4 )1Æ458 0Æ026 )1Æ509, )1Æ407 )1Æ462 0Æ027 )1Æ515, )1Æ409 H1 0Æ006 0Æ060 )0Æ112, 0Æ124 SD of coefficient 0Æ752 0Æ046 Max. log likelihood )21 557Æ935 )21 273Æ523 Scenario 4: inter-individual variation, IIA assumption violated H1 0Æ019 0Æ028 )0Æ036, 0Æ074 H3 )1Æ540 0Æ724 )2Æ959, )0Æ121 )1Æ461 0Æ726 )2Æ884, )0Æ038 H4 )1Æ454 0Æ026 )1Æ505, )1Æ403 )1Æ450 0Æ026 )1Æ501, )1Æ399 H1 )0Æ062 0Æ061 )0Æ182, 0Æ058 SD of coefficient 0Æ764 0Æ047 Max. log likelihood )21 655Æ263 )21 362Æ713

6 Mixed-effects models for habitat selection 553 were used. Population-averaged fixed-effects RSF indicated a general selection for farmlands over forest areas, whereas mixed-effects model revealed that bison had no preference for one land-cover type over the other. The mixed-effects model provided a better depiction of bison selection than the fixed-effects RSF (likelihood-ratio test: P < 0Æ0001). The mixed-effects RSF revealed important heterogeneity in the response to farmlands within the population (Table 2), with 41% (N[)0Æ275, 1Æ538]) of female bison having a positive selection coefficient for farmlands. Discussion Fig. 1. Changes in the selection coefficient (±95% confidence intervals) for patch type H1 by simulated foragers as function of the percentage of the landscape comprised refuge patch H3, as assessed by resource selection functions estimated from fixed- or mixed-effects conditional logistic regression. Simulations were made according to scenario 3 where the probability that a forager selects H1 compared to H2 increased when a refuge H3 was in close proximity. We also indicated the expected proportion of the population having positive coefficient for patch type H1, based on the N(^b; ^r 2 ) estimate of the distribution of b + b. Notice that values for fixed and random coefficients were slightly offset from one another to increase clarity. no overall selection for H1 (Fig. 1). In the most complex scenario 4, virtual foragers not only violated the premise of IIA, but the strength of selection for H1 also differed among them. Modelling habitat selection under this scenario required, once again, the use of a random coefficient for H1 (likelihood-ratio test: P < 0Æ0001). EXAMPLE 2: HABITAT SELECTION OF FREE-RANGING BISON Compared to the forest matrix, female bison selected meadows, water bodies and roads, but displayed no preference for riparian areas (Table 2). 
The response of bison to farmlands differed depending on whether fixed- or mixed-effects RSFs were used.

We used spatially explicit simulations to demonstrate how mixed-effects conditional logistic regression can capture inter-individual variation in selection induced by differences in movement rules among simulated foragers and by the presence or absence of refuge patches (which led to the violation of the IIA assumption). When the relative preference of resource patches was the same for all individuals and IIA was true (scenario 1), the fixed- and mixed-effects models estimated almost identical regression coefficients. Fixed-effects RSFs then provided an accurate representation of habitat selection within the population and were more parsimonious than mixed-effects RSFs. In contrast, when habitat selection probabilities varied among individuals but IIA was still a valid assumption (scenario 2), the likelihood-ratio test indicated that selection for H1 varied significantly within the population, thereby rejecting the fixed-effects model. These conclusions for scenario 2 also held under scenarios 3 (no inter-individual variation in selection and violation of the IIA assumption) and 4 (inter-individual variability in selection and violation of the IIA assumption). In these cases, RSFs that included random effects gave a more accurate representation of habitat selection in the population. The simulation study also demonstrated that individual-level heterogeneity can be identified and taken into account in RSFs, even when data are collected under a matched sampling design.

Table 2. Resource selection functions for radio-collared female bison in Prince Albert National Park during the springs of …, as estimated with fixed- or mixed-effects conditional logistic regressions, with normally distributed coefficients

| Variable | Fixed β | Fixed SE | Fixed 95% CI | Mixed β | Mixed SE | Mixed 95% CI |
|---|---|---|---|---|---|---|
| Meadow | 2.024 | 0.046 | 1.934, 2.114 | 2.024 | 0.046 | 1.934, 2.114 |
| Water | 0.399 | 0.094 | 0.215, 0.583 | 0.401 | 0.094 | 0.217, 0.585 |
| Riparian area | −0.315 | 0.163 | −0.635, 0.005 | −0.301 | 0.163 | −0.620, 0.018 |
| Road | 0.942 | 0.143 | 0.663, 1.222 | 0.953 | 0.143 | 0.673, 1.233 |
| Farmlands | 0.348 | 0.118 | 0.117, 0.579 | −0.275 | 0.377 | −1.014, 0.464 |
| H4 | −1.520 | 0.024 | −1.567, −1.473 | −1.520 | 0.024 | −1.567, −1.473 |
| SD of coefficient | | | | 1.243 | 0.344 | |
| Max. log likelihood | −5947.846 | | | −5930.033 | | |

Likelihood-ratio test: P < 0.0001

Furthermore, the simulations (i.e. scenario 3) stress that inter-individual variations in movement rules are only one potential source of heterogeneity that may call for mixed-effects conditional logistic regression to analyse habitat selection data gathered from matched sampling designs. The trade-offs between food intake and predator avoidance can shape movement decisions, potentially leading to the violation of the IIA assumption. Wrongly assuming IIA may introduce enough heterogeneity in the response of animals to their habitat that random effects are needed to model animal distribution adequately in response to spatial heterogeneity. Situations where the observed selection violates the IIA assumption can still be modelled with fixed-effects models when animals are homogeneous in their landscape preference. In this situation, the strength of preference for one habitat type over another depends on the available alternatives, and this dependence must be modelled correctly and explicitly in the RSF using appropriate interaction terms. Such precise knowledge is likely to be missing a priori in many studies, and mixed-effects models offer a robust safeguard in such cases.

Findings from the simulations imply that the heterogeneity in selection for farmlands expressed by the female bison of Prince Albert National Park can be due to several factors, including inter-individual variations in movement decisions and the violation of the IIA assumption. Mixed-effects logistic regression can conveniently handle both sources of heterogeneity and thereby provide a robust framework for ecological inference.

We concurrently modelled the response of bison to multiple habitat attributes before drawing conclusions about their response to farmlands. For example, we found that bison selected roads, as well as meadows where individuals can find large quantities of high-quality food (Fortin, Fryxell & Pilote 2002; Craiu et al. 2008; Fortin et al. 2009).
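To make the IIA argument above concrete, here is a minimal Python sketch. It is our illustration, not the authors' MATLAB code from Appendix S3, and it rests on stated assumptions: a single covariate per candidate location, a hypothetical choice set in which patches A and C are farmland (x = 1) and B is not (x = 0), and a normally distributed coefficient as in Table 2. Under fixed-effects conditional logistic regression the odds of selecting A over B are exp(β) whatever else is available; once β varies among individuals, the population-averaged A:B ratio shifts when C is added. The integral over β is approximated with evenly spaced normal quantiles, a deterministic stand-in for the random or quasi-random draws used in maximum simulated likelihood.

```python
import math
from statistics import NormalDist


def clogit_probs(xs, beta):
    """Fixed-effects conditional logit: selection probabilities within one
    matched stratum, exp(beta * x_j) / sum_k exp(beta * x_k).

    xs holds one covariate value per candidate location (a single
    covariate keeps the sketch short)."""
    etas = [beta * x for x in xs]
    m = max(etas)  # subtract the largest linear predictor to avoid overflow
    weights = [math.exp(e - m) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]


def mixed_clogit_probs(xs, mu, sigma, n_draws=2000):
    """Random-coefficient (mixed) conditional logit with beta ~ N(mu, sigma^2).

    Returns the selection probabilities averaged over the distribution of
    beta, i.e. the population-level view. The integral is approximated by
    evaluating clogit_probs at evenly spaced normal quantiles."""
    dist = NormalDist(mu, sigma)
    avg = [0.0] * len(xs)
    for r in range(n_draws):
        beta_r = dist.inv_cdf((r + 0.5) / n_draws)
        for j, p in enumerate(clogit_probs(xs, beta_r)):
            avg[j] += p / n_draws
    return avg


# Hypothetical choice sets: A and C are farmland patches (x = 1), B is not (x = 0).
set_ab = [1.0, 0.0]
set_abc = [1.0, 0.0, 1.0]

# Fixed effects: the A:B ratio is exp(beta) whatever else is available (IIA).
pa2, pb2 = clogit_probs(set_ab, 0.5)
pa3, pb3, _ = clogit_probs(set_abc, 0.5)
print(pa2 / pb2, pa3 / pb3)  # both approximately exp(0.5), about 1.6487

# Random coefficient: the population-level A:B ratio changes when C is added.
ma2, mb2 = mixed_clogit_probs(set_ab, mu=0.0, sigma=2.0)
ma3, mb3, _ = mixed_clogit_probs(set_abc, mu=0.0, sigma=2.0)
print(ma2 / mb2, ma3 / mb3)  # about 1.0 with {A, B}, below 1 with {A, B, C}
```

The mechanism is that animals with a high farmland coefficient split their choices between A and C, whereas animals that avoid farmland concentrate on B, so adding C lowers A's share relative to B at the population level. This is a single-stratum illustration; in a repeated-choice mixed logit such as that of Revelt & Train (1998), the same coefficient draw would be shared across all strata belonging to one individual.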
Fixed- and mixed-effects RSFs then pointed to distinct responses of bison to farmlands. Fixed-effects models implied that bison generally made selective use of farmlands, whereas the mixed-effects RSFs refuted this assessment by revealing heterogeneous selection for farmlands. A likelihood-ratio test revealed that the mixed-effects RSF was superior to its fixed-effects counterpart. We thus conclude that the problem of cross-boundary movements is linked to a subset, though a fairly large one, of individuals within the population, with c. 40% of female bison making selective use of farmlands. The mixed-effects RSF thus gives park managers a very different picture from the general selection for farmlands implied by the population-averaged fixed-effects RSF. Solving human-wildlife conflicts may depend on whether the problem originates from a restricted number of individuals. In that case, the translocation of problem individuals can be the solution (Sukumar 1991; Jones & Nealson 2003). On the other hand, this management approach might not be as effective when all members of the population adopt an unacceptable behaviour. Management or conservation actions should be tailored to the nature of the problem, and mixed-effects models are often better suited than fixed-effects models to evaluate the situation adequately.

Our study stressed how drawing robust inference from RSFs may require the use of random effects in conditional logistic regression models. We demonstrated how fixed and mixed conditional logistic regression can lead to different conclusions about animal-habitat interactions. Our simulations illustrated that, in some situations, models with random coefficients, which yield individual-specific inferences, can provide a more accurate assessment of resource selection by animals than fixed-effects models, which provide population-averaged inference (Fieberg et al. 2009; Koper & Manseau 2009).
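The two quantitative claims above, that the likelihood-ratio test favours the mixed-effects RSF and that c. 40% of female bison select farmlands, can be checked against Table 2. The Python sketch below is our illustration rather than the authors' code. It assumes that the test compares the two maximized log likelihoods on one degree of freedom (the extra SD parameter), and that the 40% figure is the share of a normal distribution of individual farmland coefficients, with the mean and SD reported in Table 2, lying above zero; the text does not spell out either computation.

```python
import math
from statistics import NormalDist

# Values transcribed from Table 2 (bison, farmlands variable).
LOGLIK_FIXED = -5947.846  # max. log likelihood, fixed-effects model
LOGLIK_MIXED = -5930.033  # max. log likelihood, mixed-effects model
MU_FARM = -0.275          # mean farmlands coefficient, mixed-effects model
SD_FARM = 1.243           # SD of the farmlands coefficient among individuals


def likelihood_ratio_test_1df(loglik_null, loglik_alt):
    """Return the LR statistic and its naive chi-square (1 df) p-value."""
    stat = 2.0 * (loglik_alt - loglik_null)
    # For 1 df, P(chi2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def share_selecting(mu, sd):
    """Share of individuals with a positive coefficient if beta_i ~ N(mu, sd^2)."""
    return 1.0 - NormalDist(mu, sd).cdf(0.0)


stat, p = likelihood_ratio_test_1df(LOGLIK_FIXED, LOGLIK_MIXED)
print(round(stat, 3), p < 1e-4)                     # 35.626 True
print(round(share_selecting(MU_FARM, SD_FARM), 2))  # 0.41
```

Because the null hypothesis (SD = 0) lies on the boundary of the parameter space, the χ²(1) reference distribution is conservative; using a 50:50 mixture of χ²(0) and χ²(1), as discussed by Verbeke & Molenberghs (2000), halves the p-value. Either way the result agrees with the P < 0.0001 reported in Table 2.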
We found that the selection for agricultural lands by the population of free-ranging bison of Prince Albert National Park can differ depending on whether random coefficients are used or not. Such differences could have important management and conservation implications. Indeed, habitat selection is commonly used to identify critical resources (Arthur et al. 1996), suitable habitat (Fortin et al. 2008), responses to anthropogenic disturbances (Hebblewhite & Merrill 2008) and ecological consequences of species reintroduction (Whittaker & Lindzey 2004; Mao et al. 2005). A biased assessment of habitat selection may therefore result in inadequate management or conservation actions. Matched sampling designs and conditional logistic regressions are increasingly used in ecological research (e.g. for RSFs, Boyce 2006; for step selection functions, Fortin et al. 2005), and fixed-effects models may lead to mistaken inferences about selection whenever hypotheses such as IIA or homogeneous strength of selection among animals are violated. We suggest that mixed-effects conditional logistic regression should become a valuable, and sometimes necessary, statistical tool for valid inference in ecological research.

Acknowledgements

Funding for this study was provided by the Parks Canada Species at Risk Recovery Action and Education Fund, a program supported by the National Strategy for the Protection of Species at Risk, the Natural Sciences and Engineering Research Council of Canada, the Canada Foundation for Innovation and l'Université Laval. We are grateful to L. O'Brodovich, D. Frandsen, M.-E. Fortin, K. Dancose and S. Courant for their assistance in the field, to Pierre Racine for his help with SELES, and to James Hodson for his editorial comments on the study.

References

Arthur, S.M., Manly, B.F.J., McDonald, L.L. & Garner, G.W. (1996) Assessing habitat selection when availability changes. Ecology, 77,
Bhat, C.R. (2001) Quasi-random maximum simulated likelihood estimation of the mixed multinomial logit model. Transportation Research Part B-Methodological, 35,
Boyce, M.S. (2006) Scale for resource selection functions. Diversity and Distributions, 12,
Boyce, M.S., Mao, J.S., Merrill, E.H., Fortin, D., Turner, M.G., Fryxell, J. & Turchin, P. (2003) Scale and heterogeneity in habitat selection by elk in Yellowstone National Park. Ecoscience, 10,
Bruun, M. & Smith, H.G. (2003) Landscape composition affects habitat use and foraging flight distances in breeding European starlings. Biological Conservation, 114,
Chen, Z. & Kuo, L. (2001) A note on the estimation of the multinomial logit model with random effects. The American Statistician, 55,
Compton, B.W., Rhymer, J.M. & McCollough, M. (2002) Habitat selection by wood turtles (Clemmys insculpta): an application of paired logistic regression. Ecology, 83,
Cooper, A.B. & Millspaugh, J.J. (1999) The application of discrete choice models to wildlife resource selection studies. Ecology, 80,
Craiu, R.V., Duchesne, T. & Fortin, D. (2008) Inference methods for the conditional logistic regression model with longitudinal data. Biometrical Journal, 50,
Fall, A. & Fall, J. (2001) A domain-specific language for models of landscape dynamics. Ecological Modelling, 141,
Fieberg, J., Rieger, R.H., Zicus, M.C. & Schildcrout, J.S. (2009) Regression modelling of correlated data in ecology: subject-specific and population averaged response patterns. Journal of Applied Ecology, 46,
Forester, J.D., Im, H.K. & Rathouz, P.J. (2009) Accounting for animal movement in estimation of resource selection functions: sampling and data analysis. Ecology, 90,
Fortin, D., Fryxell, J.M. & Pilote, R. (2002) The temporal scale of foraging decisions in bison. Ecology, 83,
Fortin, D., Beyer, H.L., Boyce, M.S., Smith, D.W., Duchesne, T. & Mao, J.S. (2005) Wolves influence elk movements: behavior shapes a trophic cascade in Yellowstone National Park. Ecology, 86,
Fortin, D., Courtois, R., Etcheverry, P., Dussault, C. & Gingras, A. (2008) Winter selection of landscapes by woodland caribou: behavioural response to geographical gradients in habitat attributes. Journal of Applied Ecology, 45,
Fortin, D., Fortin, M.E., Beyer, H.L., Duchesne, T., Courant, S. & Dancose, K. (2009) Group-size-mediated habitat selection and group fusion-fission dynamics of bison under predation risk. Ecology, 90,
Gillies, C.S., Hebblewhite, M., Nielsen, S.E., Krawchuk, M.A., Aldridge, C.L., Frair, J.L., Saher, D.J., Stevens, C.E. & Jerde, C.L. (2006) Application of random effects to the study of resource selection by animals. Journal of Animal Ecology, 75,
Hay, M.E. & Fuller, P.J. (1981) Seed escape from heteromyid rodents: the importance of microhabitat and seed preference. Ecology, 62,
Hebblewhite, M. & Merrill, E. (2008) Modelling wildlife-human relationships for social species with mixed-effects resource selection models. Journal of Applied Ecology, 45,
Hochman, V. & Kotler, B.P. (2007) Patch use, apprehension, and vigilance behavior of Nubian Ibex under perceived risk of predation. Behavioral Ecology, 18,
Johnson, C.J., Nielsen, S.E., Merrill, E.H., McDonald, T.L. & Boyce, M.S. (2006) Resource selection functions based on use-availability data: theoretical motivation and evaluation methods. Journal of Wildlife Management, 70,
Jones, N.D. & Nealson, T. (2003) Management of aggressive Australian magpies by translocation. Wildlife Research, 30,
Keating, K.A. & Cherry, S. (2004) Use and interpretation of logistic regression in habitat selection studies. Journal of Wildlife Management, 68,
Koper, N. & Manseau, M. (2009) Generalized estimating equations and generalized linear mixed-effects models for modelling resource selection. Journal of Applied Ecology, 46,
Manly, B.F.J., McDonald, L.L., Thomas, D.L., McDonald, T.L. & Erickson, W.P. (2002) Resource Selection by Animals: Statistical Design and Analysis for Field Studies, 2nd edn. Kluwer Academic, Dordrecht.
Mao, J.S., Boyce, M.S., Smith, D.W., Singer, F.J., Vales, D.J., Vore, J.M. & Merrill, E.H. (2005) Habitat selection by elk before and after wolf reintroduction in Yellowstone National Park. Journal of Wildlife Management, 69,
MathWorks Inc. (2008) MATLAB Software: The Language of Technical Computing, Version R2008a. MathWorks Inc., Natick, MA, USA.
McDonald, T.L., Manly, B.F.J., Nielson, R.M. & Diller, L.V. (2006) Discrete-choice modelling in wildlife studies exemplified by Northern Spotted Owl nighttime habitat selection. Journal of Wildlife Management, 70,
McFadden, D. & Train, K. (2000) Mixed MNL models for discrete response. Journal of Applied Econometrics, 15,
McLoughlin, P.D., Morris, D.W., Fortin, D., Vander Wal, E. & Contasti, A.L. (2010) Considering ecological dynamics in resource selection functions. Journal of Animal Ecology, 79,
Morrison, S., Barton, L., Caputa, P. & Hik, D.S. (2004) Forage selection by collared pikas, Ochotona collaris, under varying degrees of predation risk. Canadian Journal of Zoology, 82,
Revelt, D. & Train, K. (1998) Mixed logit with repeated choices: households' choices of appliance efficiency level. Review of Economics and Statistics, 80,
Skrondal, A. & Rabe-Hesketh, S. (2003) Multilevel logistic regression for polytomous data and rankings. Psychometrika, 68,
Sukumar, R. (1991) The management of large mammals in relation to male strategies and conflict with people. Biological Conservation, 55,
Train, K.E. (2003) Discrete Choice Methods with Simulation. Cambridge University Press, Edinburgh.
Train, K.E. (2006) Mixed Logit Estimation by Maximum Simulated Likelihood. Matlab package. Available at: train1006mxlmsl.html, accessed 3 February
Verbeke, G. & Molenberghs, G. (2000) Linear Mixed Models for Longitudinal Data. Springer-Verlag, New York.
Whittaker, D.G. & Lindzey, F.G. (2004) Habitat use patterns of sympatric deer species on Rocky Mountain Arsenal, Colorado. Wildlife Society Bulletin, 32,

Received 26 October 2009; accepted 15 January 2010
Handling Editor: Fanie Pelletier

Supporting Information

Additional Supporting Information may be found in the online version of this article.
Appendix S1. Detailed description of the four simulation scenarios used to assess the effect of heterogeneous habitat selection among animals.
Appendix S2. Calculation of the long-run probabilities of being in a given patch type and theoretical value of the RSF coefficient for patch type H1 under simulation scenario 1.
Appendix S3. MATLAB code to estimate mixed-effects resource selection function for the free-ranging bison of Prince Albert National Park, Saskatchewan, Canada.
As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials may be re-organized for online delivery, but are not copy-edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.


More information

Multivariate Normal Distribution

Multivariate Normal Distribution Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

More information

Tutorial 5: Hypothesis Testing

Tutorial 5: Hypothesis Testing Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................

More information

Robust procedures for Canadian Test Day Model final report for the Holstein breed

Robust procedures for Canadian Test Day Model final report for the Holstein breed Robust procedures for Canadian Test Day Model final report for the Holstein breed J. Jamrozik, J. Fatehi and L.R. Schaeffer Centre for Genetic Improvement of Livestock, University of Guelph Introduction

More information

CHI-SQUARE: TESTING FOR GOODNESS OF FIT

CHI-SQUARE: TESTING FOR GOODNESS OF FIT CHI-SQUARE: TESTING FOR GOODNESS OF FIT In the previous chapter we discussed procedures for fitting a hypothesized function to a set of experimental data points. Such procedures involve minimizing a quantity

More information

SUMAN DUVVURU STAT 567 PROJECT REPORT

SUMAN DUVVURU STAT 567 PROJECT REPORT SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.

More information

The Probit Link Function in Generalized Linear Models for Data Mining Applications

The Probit Link Function in Generalized Linear Models for Data Mining Applications Journal of Modern Applied Statistical Methods Copyright 2013 JMASM, Inc. May 2013, Vol. 12, No. 1, 164-169 1538 9472/13/$95.00 The Probit Link Function in Generalized Linear Models for Data Mining Applications

More information

Ordinal Regression. Chapter

Ordinal Regression. Chapter Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

More information

Logistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.

Logistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Logistic Regression Vibhav Gogate The University of Texas at Dallas Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Generative vs. Discriminative Classifiers Want to Learn: h:x Y X features

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

LOGISTIC REGRESSION ANALYSIS

LOGISTIC REGRESSION ANALYSIS LOGISTIC REGRESSION ANALYSIS C. Mitchell Dayton Department of Measurement, Statistics & Evaluation Room 1230D Benjamin Building University of Maryland September 1992 1. Introduction and Model Logistic

More information

START Selected Topics in Assurance

START Selected Topics in Assurance START Selected Topics in Assurance Related Technologies Table of Contents Introduction Some Statistical Background Fitting a Normal Using the Anderson Darling GoF Test Fitting a Weibull Using the Anderson

More information

Technical report. in SPSS AN INTRODUCTION TO THE MIXED PROCEDURE

Technical report. in SPSS AN INTRODUCTION TO THE MIXED PROCEDURE Linear mixedeffects modeling in SPSS AN INTRODUCTION TO THE MIXED PROCEDURE Table of contents Introduction................................................................3 Data preparation for MIXED...................................................3

More information

Introduction to Longitudinal Data Analysis

Introduction to Longitudinal Data Analysis Introduction to Longitudinal Data Analysis Longitudinal Data Analysis Workshop Section 1 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development Section 1: Introduction

More information

Regression Modeling Strategies

Regression Modeling Strategies Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions

More information

Advantages of latent class over continuous mixture of Logit models

Advantages of latent class over continuous mixture of Logit models Advantages of latent class over continuous mixture of Logit models Stephane Hess Moshe Ben-Akiva Dinesh Gopinath Joan Walker May 16, 2011 Abstract This paper adds to a growing body of evidence highlighting

More information

Non-Inferiority Tests for Two Means using Differences

Non-Inferiority Tests for Two Means using Differences Chapter 450 on-inferiority Tests for Two Means using Differences Introduction This procedure computes power and sample size for non-inferiority tests in two-sample designs in which the outcome is a continuous

More information

Machine Learning and Pattern Recognition Logistic Regression

Machine Learning and Pattern Recognition Logistic Regression Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,

More information

Inference from habitat-selection analysis depends on foraging strategies

Inference from habitat-selection analysis depends on foraging strategies Journal of Animal Ecology 2010, 79, 1157 1163 doi: 10.1111/j.1365-2656.2010.01737.x Inference from habitat-selection analysis depends on foraging strategies Guillaume Bastille-Rousseau 1, Daniel Fortin

More information

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. 277 CHAPTER VI COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. This chapter contains a full discussion of customer loyalty comparisons between private and public insurance companies

More information

Aileen Murphy, Department of Economics, UCC, Ireland. WORKING PAPER SERIES 07-10

Aileen Murphy, Department of Economics, UCC, Ireland. WORKING PAPER SERIES 07-10 AN ECONOMETRIC ANALYSIS OF SMOKING BEHAVIOUR IN IRELAND Aileen Murphy, Department of Economics, UCC, Ireland. DEPARTMENT OF ECONOMICS WORKING PAPER SERIES 07-10 1 AN ECONOMETRIC ANALYSIS OF SMOKING BEHAVIOUR

More information

UNIVERSITY OF WAIKATO. Hamilton New Zealand

UNIVERSITY OF WAIKATO. Hamilton New Zealand UNIVERSITY OF WAIKATO Hamilton New Zealand Can We Trust Cluster-Corrected Standard Errors? An Application of Spatial Autocorrelation with Exact Locations Known John Gibson University of Waikato Bonggeun

More information

Chapter 6: Multivariate Cointegration Analysis

Chapter 6: Multivariate Cointegration Analysis Chapter 6: Multivariate Cointegration Analysis 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie VI. Multivariate Cointegration

More information

Advanced Statistical Analysis of Mortality. Rhodes, Thomas E. and Freitas, Stephen A. MIB, Inc. 160 University Avenue. Westwood, MA 02090

Advanced Statistical Analysis of Mortality. Rhodes, Thomas E. and Freitas, Stephen A. MIB, Inc. 160 University Avenue. Westwood, MA 02090 Advanced Statistical Analysis of Mortality Rhodes, Thomas E. and Freitas, Stephen A. MIB, Inc 160 University Avenue Westwood, MA 02090 001-(781)-751-6356 fax 001-(781)-329-3379 trhodes@mib.com Abstract

More information

CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA

CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA Examples: Multilevel Modeling With Complex Survey Data CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA Complex survey data refers to data obtained by stratification, cluster sampling and/or

More information

Chapter 1 Introduction. 1.1 Introduction

Chapter 1 Introduction. 1.1 Introduction Chapter 1 Introduction 1.1 Introduction 1 1.2 What Is a Monte Carlo Study? 2 1.2.1 Simulating the Rolling of Two Dice 2 1.3 Why Is Monte Carlo Simulation Often Necessary? 4 1.4 What Are Some Typical Situations

More information

Penalized regression: Introduction

Penalized regression: Introduction Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

More information

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of

More information

Fixed-Effect Versus Random-Effects Models

Fixed-Effect Versus Random-Effects Models CHAPTER 13 Fixed-Effect Versus Random-Effects Models Introduction Definition of a summary effect Estimating the summary effect Extreme effect size in a large study or a small study Confidence interval

More information

Standard errors of marginal effects in the heteroskedastic probit model

Standard errors of marginal effects in the heteroskedastic probit model Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic

More information

Linear Mixed-Effects Modeling in SPSS: An Introduction to the MIXED Procedure

Linear Mixed-Effects Modeling in SPSS: An Introduction to the MIXED Procedure Technical report Linear Mixed-Effects Modeling in SPSS: An Introduction to the MIXED Procedure Table of contents Introduction................................................................ 1 Data preparation

More information

Logistic Regression (a type of Generalized Linear Model)

Logistic Regression (a type of Generalized Linear Model) Logistic Regression (a type of Generalized Linear Model) 1/36 Today Review of GLMs Logistic Regression 2/36 How do we find patterns in data? We begin with a model of how the world works We use our knowledge

More information

Fitting Subject-specific Curves to Grouped Longitudinal Data

Fitting Subject-specific Curves to Grouped Longitudinal Data Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,

More information

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information

Variables Control Charts

Variables Control Charts MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. Variables

More information