# Models Where the Fate of Every Individual is Known


## Transcription

This class of models is important because it provides a theory for estimating survival probability and other parameters from radio-tagged animals. The focus of known fate models is the estimation of survival probability $S$, the probability of surviving an interval between sampling occasions. These are models where it can be assumed that the sampling probabilities are 1. That is, the status (dead or alive) of every tagged animal is known at each sampling occasion. For this reason, precision is typically quite high, even when sample size is fairly small. The only disadvantages might be the cost of radios and possible effects of the radio on the animal or its behavior. The model is a product of simple binomial likelihoods. Data on egg mortality in nests and studies of sessile organisms, such as mollusks, have also been modeled as known fate data.

### The Kaplan-Meier Method

The Kaplan-Meier (1958) method has been used commonly in the past and we mention it as a starting point. This estimator is based on observed data at a series of occasions, where animals are marked and released only at occasion 1. The K-M estimator of the survival function is

$$\hat{S}_t = \prod_{i=1}^{t} \frac{n_i - d_i}{n_i},$$

where $n_i$ is the number of animals alive and at risk at occasion $i$, $d_i$ is the number known dead at occasion $i$, and the product is over $i$ up to the $t$th occasion. Critical here is that $n_i$ is the number known alive at occasion $i$ minus those individuals known dead or censored during the interval. It is rare that a survival study will observe the occasion of death of every individual in the study. Animals are "lost" (i.e., censored) due to radio failure or other reasons. The treatment of such censored animals is often important, but often somewhat subjective. These K-M estimates produce a survival function (see White and Garrott 1990): the cumulative survival up to time $t$.

This is a step function and is useful in comparing, for example, the survival functions for males vs. females. If no animals are censored, then the survival function (empirical survival function or ESF) is merely

$$\hat{S}_t = \frac{\text{Number surviving longer than } t}{n} \quad \text{for } t \ge 0.$$

This is the same as the intuitive estimator when no censoring is occurring; $\hat{S}_t = n_{t+1}/n_t$ (e.g., $\hat{S}_2 = n_3/n_2$).
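
The product form of the K-M estimator is easy to check numerically. The sketch below is only an illustration: the function name `kaplan_meier` and the small data set are hypothetical and do not come from any study. With no censoring, the cumulative estimate reduces to the fraction of the original animals still alive, as the empirical survival function says it should.

```python
from typing import List, Sequence


def kaplan_meier(at_risk: Sequence[int], deaths: Sequence[int]) -> List[float]:
    """Cumulative K-M survival: S_t = product over i <= t of (n_i - d_i) / n_i."""
    if len(at_risk) != len(deaths):
        raise ValueError("at_risk and deaths must have the same length")
    survival = []
    running = 1.0
    for n_i, d_i in zip(at_risk, deaths):
        if n_i <= 0 or not 0 <= d_i <= n_i:
            raise ValueError("need n_i > 0 and 0 <= d_i <= n_i")
        running *= (n_i - d_i) / n_i  # survival over interval i
        survival.append(running)      # cumulative survival through occasion i
    return survival


# Hypothetical counts (not from any study): 20 animals released, none censored.
example_at_risk = [20, 18, 15]
example_deaths = [2, 3, 0]
example_curve = kaplan_meier(example_at_risk, example_deaths)
print(example_curve)
```

Because nothing is censored in this toy data set, the second value of the curve equals 15/20, the share of the 20 released animals still alive after the second occasion.
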

The K-M method is an estimate of this survival function in the presence of censoring. Expressions for the variance of these estimates can be found in White and Garrott (1990). A simple example of this method can be illustrated using the data from Conroy et al. (1989) on 48 radio-tagged black ducks. The data are

| Week | Number alive at start | Number dying | Number alive at end | Number censored |
|------|-----------------------|--------------|---------------------|-----------------|
| 1 | 48 | 1 | 47 | 0 |
| 2 | 47 | 2 | 45 | 0 |
| 3 | 45 | 2 | 39 | 4 |
| 4 | 39 | 5 | 34 | 0 |
| 5 | 34 | 4 | 28 | 2 |
| 6 | 28 | 3 | 25 | 0 |
| 7 | 25 | 1 | 24 | 0 |
| 8 | 24 | 0 | 24 | 0 |

Here, the animals counted as alive at the start of an interval are those known to be alive at the start of sampling occasion $j$. This is equivalent to being alive at the start of interval $j$. For example, 47 animals are known to be alive at the beginning of occasion 2. A further example is that 34 ducks survived to the start of occasion 5. Thus, the MLEs are

- $\hat{S}_1 = 47/48 = 0.979$
- $\hat{S}_2 = 45/47 = 0.957$
- $\hat{S}_3 = 39/41 = 0.951$ (note, only 41 because 4 were censored)
- $\hat{S}_4 = 34/39 = 0.872$
- $\hat{S}_5 = 28/32 = 0.875$ (note, only 32 because 2 were censored)
- $\hat{S}_6 = 25/28 = 0.893$
- $\hat{S}_7 = 24/25 = 0.960$
- $\hat{S}_8 = 24/24 = 1.000$.

Here one estimates 8 parameters (call this model $S(t)$); one could seek a more parsimonious model in several ways. First, perhaps all the parameters were nearly constant; thus a model with a single survival probability might suffice (i.e., $S(\cdot)$). If something was known about the intervals (similar to the flood years for the European dipper data), one could model these with one parameter and denote the other periods with a second survival parameter. Finally, one might consider fitting some smooth function across the occasions and, thus, have perhaps only one intercept and one slope parameter (instead of 8 parameters). Still other possibilities exist for both parsimonious modeling and probable heterogeneity of survival probability across animals. These extensions are not possible with the K-M method, and K-L-based model selection is not possible.

### Pollock's Staggered Entry Design

The Kaplan-Meier method assumes that all animals are released at occasion 1 and that they are followed during the study until they die or are censored. Often new animals are released at each occasion (say, weekly); we say this entry is "staggered" (Pollock et al. 1989). Assume, as before, that animals are fitted with radios and that these do not affect the animal's survival probability. This staggered entry fits easily into the K-M framework by merely redefining the $n_i$ to include the number of new animals released at occasion $i$. Therefore, conceptually, the addition of new animals into the marked population causes no difficulties in data analysis.

### The Binomial Model

We focus on the so-called binomial model as this allows standard likelihood inference and is therefore similar to other models in program MARK. There are 3 possible scenarios under the known fate model. Each tagged animal either:

1. survives to the end of the study (detected at each sampling occasion after its release, so the fate is known on every occasion).
2. dies sometime during the study (its carcass is found on the first sampling occasion after its death, so the fate is known).
3. survives up to the point at which it is censored.

Note, for purposes of estimating survival probabilities, there is no difference between an animal seen alive and then removed from the population at occasion $k$ vs. an animal alive at occasion $k$ and then censored due to radio failure or whatever.
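
Before continuing, the weekly black duck estimates and the staggered entry bookkeeping can be sketched in a few lines. This is a minimal illustration, not program MARK's procedure. The function name `interval_survival` and the split of $n_i$ into carried-over, newly released, and censored animals are assumptions of this sketch. The weekly death counts are inferred from the ratios quoted in the text (for example, 39/41 with 4 censored implies 2 deaths in week 3).

```python
from typing import List, Sequence


def interval_survival(
    carried: Sequence[int],
    released: Sequence[int],
    censored: Sequence[int],
    deaths: Sequence[int],
) -> List[float]:
    """Per-interval estimate (n_i - d_i) / n_i, where n_i includes animals newly
    released at occasion i and excludes animals censored during the interval."""
    estimates = []
    for carry, new, cens, dead in zip(carried, released, censored, deaths):
        n_i = carry + new - cens  # animals at risk with a known fate
        estimates.append((n_i - dead) / n_i)
    return estimates


# Black duck data by week. All 48 ducks enter at occasion 1 (no staggered entry).
# Deaths are inferred from the ratios in the text; censoring is 4 in week 3
# and 2 in week 5, as noted there.
carried = [0, 47, 45, 39, 34, 28, 25, 24]
released = [48, 0, 0, 0, 0, 0, 0, 0]
censored = [0, 0, 4, 0, 2, 0, 0, 0]
deaths = [1, 2, 2, 5, 4, 3, 1, 0]

weekly = interval_survival(carried, released, censored, deaths)

# Cumulative (K-M style) survival is the running product of the weekly values.
cumulative = []
running = 1.0
for s in weekly:
    running *= s
    cumulative.append(running)

for week, (s, c) in enumerate(zip(weekly, cumulative), start=1):
    print(f"week {week}: interval S = {s:.3f}, cumulative S = {c:.3f}")
```

A staggered entry design would simply put nonzero values in `released` for later weeks; nothing else in the calculation changes.
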
The binomial model assumes the capture histories are mutually exclusive and exhaustive, that animals are independent, and that all animals have the same underlying parameters during interval $j$ (homogeneity across individuals). Known fate data can be modeled by a product of binomials. Let us modify the black duck data slightly, $n_1 = 48$, $n_2 = 44$, and $n_3 = 41$; the first likelihood is

$$\mathcal{L}(S_1 \mid n_1, n_2) = \binom{n_1}{n_2} S_1^{\,n_2} (1 - S_1)^{\,n_1 - n_2}.$$

Clearly, one could find the MLE, $\hat{S}_1$, for this expression (e.g., $\hat{S}_1 = 44/48 = 0.917$). Of course, the other binomial terms are multiplicative, assuming independence. The survival during the second interval is based on $n_2 = 44$ and $n_3 = 41$,

$$\mathcal{L}(S_2 \mid n_2, n_3) = \binom{n_2}{n_3} S_2^{\,n_3} (1 - S_2)^{\,n_2 - n_3}.$$

The likelihood function for the entire set of black duck data (modified to better make some technical points below) is the product of these individual likelihoods. The log-likelihood is the sum of terms such as

$$\log \mathcal{L}(S_i \mid n_i) = \sum_i n_i \log(\text{prob.}_i).$$

This expression is in "standard form" and should now be familiar.

### Encounter Histories

Parameterization of encounter histories is critical. Each entry is paired, where the first position is a 1 if the animal is known to be alive at occasion $j$; that is, at the start of the interval. A 0 in this first position indicates the animal was not yet tagged at the start of interval $j$. The second position in the pair is 0 if the animal survived to the end of the interval. It is a 1 if it died sometime during the interval. As the fate of every animal is assumed known at every occasion, the sampling probabilities ($p$) and reporting probabilities ($r$) are 1. The examples below will help clarify the coding.

| History | Probability | Number observed | Interpretation |
|---------|-------------|-----------------|----------------|
| 10 10 10 10 | $S_1 S_2 S_3 S_4$ | 7 | Tagged at occasion 1 and survived until the end of the study |
| 10 10 11 00 | $S_1 S_2 (1 - S_3)$ | 2 | Tagged at occasion 1 and died during the third interval |
| 10 11 00 00 | $S_1 (1 - S_2)$ | 24 | Tagged at occasion 1 and died during the second interval |
| 11 00 00 00 | $(1 - S_1)$ | 43 | Tagged at occasion 1 and died during the first interval |
| 10 00 00 11 | $S_1 (1 - S_4)$ | 3 | Tagged at occasion 1, censored during intervals 2 and 3, and died during the fourth interval |
| 10 00 00 00 | $S_1$ | 9 | Tagged at occasion 1, known to be alive at the end of the first interval, but not released at occasion 2 and thus censored after the first interval |

Estimation of survival probabilities is based on a release (1) at the start of an interval and survival to the end of the interval (0); mortality probabilities are based on a release (1) and death (1) during the interval. If the animal was then censored, it does not provide information about $S_i$ or $(1 - S_i)$.

Some "rules" for encounter history coding:

A. The two-digit pairs each pertain to an interval (the period of time between occasions).

B. There are only 3 possible entries for each interval:

- 10 = an animal survived the interval, given it was alive at the start of the interval
- 11 = an animal died during the interval, given it was alive at the start of the interval
- 00 = an animal was censored for this interval

C. In order to know the fate of an animal during an interval, one must have encountered it BOTH at the beginning AND the end of the interval.

### Censoring

Censoring appears "innocent" but it is often not. If a substantial proportion of the animals do not have exactly known fates, it might be better to consider models that allow the sampling parameters to be < 1. In practice, one almost inevitably loses track of some animals. Reasons for uncertainty about an animal's fate include radio transmitters that fail (this may or may not be independent of mortality) or animals that leave the study area. In such cases, the encounter histories must be coded correctly to allow these animals to be censored. Censoring often requires some judgment.
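
The coding rules (10, 11, 00) translate directly into the counts that enter each binomial term. The sketch below is an illustration under stated assumptions: histories are written as space-separated pairs for readability, which is a convenience of this example and not program MARK's input format, and the function names are hypothetical. The six histories are written out from the coding rules to match the six examples above, and the frequencies are the ones listed there. A 00 pair contributes nothing to the likelihood for its interval, which is how censoring enters.

```python
import math
from typing import Dict, List, Tuple

VALID_PAIRS = {"10", "11", "00"}


def tally_intervals(histories: Dict[str, int], n_intervals: int) -> List[Tuple[int, int]]:
    """Return (number at risk, number surviving) for each interval."""
    counts = [[0, 0] for _ in range(n_intervals)]
    for history, frequency in histories.items():
        pairs = history.split()
        if len(pairs) != n_intervals or any(p not in VALID_PAIRS for p in pairs):
            raise ValueError(f"bad encounter history: {history!r}")
        for j, pair in enumerate(pairs):
            if pair == "00":
                continue  # censored (or already dead): no information for interval j
            counts[j][0] += frequency      # alive at the start, fate known
            if pair == "10":
                counts[j][1] += frequency  # survived the interval
    return [(n, y) for n, y in counts]


def log_likelihood(survival: List[float], counts: List[Tuple[int, int]]) -> float:
    """Sum of binomial log-likelihood terms, omitting the binomial coefficients."""
    total = 0.0
    for s, (n, y) in zip(survival, counts):
        if y:
            total += y * math.log(s)
        if n - y:
            total += (n - y) * math.log(1.0 - s)
    return total


example_histories = {
    "10 10 10 10": 7,
    "10 10 11 00": 2,
    "10 11 00 00": 24,
    "11 00 00 00": 43,
    "10 00 00 11": 3,
    "10 00 00 00": 9,
}

interval_counts = tally_intervals(example_histories, n_intervals=4)
mle = [y / n for n, y in interval_counts]  # closed-form binomial MLE per interval
print(interval_counts)
print([round(s, 3) for s in mle])
```

Note that the animal with history 10 00 00 11 is counted in intervals 1 and 4 only; its two censored intervals add nothing to either the numerator or the denominator.
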
When an animal is not detected at the end of an interval (i.e., immediately before occasion $j$) or at the beginning of the next interval (i.e., immediately after occasion $j+1$), then its fate is unknown and must be entered as a 00 in the encounter history matrix. Generally, this results in 2 pairs with a 00 history; this is caused by the fact that interval $j$ is a 00 because the ending fate was not known and the fact that the beginning fate for the next interval ($j+1$) was not known. Censored intervals almost always occur in runs of two or more (e.g., 00 00 or 00 00 00). See the example above where the history was 10 00 00 11.

In this example, the animal was censored but re-encountered at the beginning of interval 4 (alive) and it died during that interval. It might seem intuitive to infer that the animal was alive and, thus, fill in the 2 censored intervals with 10 10; this is incorrect and results in bias.

Censoring is assumed to be independent of the fate of the animal; this is an important assumption. If, for example, radio failure is due to mortality, bias will result in estimators of $S$. Of course, censoring reduces sample size, so there is a trade-off here. If many animals must be censored, then the possible dependence of fates and censoring must be a concern.

### Binomial Likelihood Functions Allowing Each Animal To Have Its Own Survival Parameter

Before we move into models for individual covariates, some quick review of the binomial likelihood might be helpful. Consider the usual $n$ flips of a coin where $p$ = probability the coin lands heads and $q = 1 - p$ = probability the coin lands tails. Let $n = 16$ flips (trials). We often write the likelihood in a compact form as

$$\mathcal{L}(p \mid n, y) = p^{y} (1 - p)^{n - y},$$

where $y$ = number of heads. If we observe $y = 5$, then

$$\mathcal{L}(p \mid 16, 5) = p^{5} (1 - p)^{11}.$$

Alternatively, we could write the likelihood for each individual outcome and take the product of these terms as the likelihood function. One alternative is to merely write the likelihood as (using the convention that $q = 1 - p$)

$$\mathcal{L}(p \mid 16, 5) = ppppp\,qqqqqqqqqqq.$$

Alternatively, we could write this as

$$\mathcal{L}(p \mid 16, 5) = \prod_{i=1}^{5} p \prod_{i=6}^{16} (1 - p).$$
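
A quick numerical check shows that the compact form and the outcome-by-outcome product give the same value. The sketch below is only an illustration: the function names are hypothetical, the flips are written as a string of H and T to mirror the $ppppp\,qqqqqqqqqqq$ notation, and the trial value of $p$ is arbitrary.

```python
import math


def compact_likelihood(p: float, n: int, y: int) -> float:
    """Compact form p^y (1 - p)^(n - y), without the binomial coefficient."""
    return p ** y * (1.0 - p) ** (n - y)


def termwise_likelihood(p: float, flips: str) -> float:
    """One probability term per flip: p for each head, q = 1 - p for each tail."""
    q = 1.0 - p
    value = 1.0
    for flip in flips:
        value *= p if flip == "H" else q
    return value


flips = "HHHHH" + "T" * 11  # 5 heads and 11 tails, n = 16
p_trial = 0.3               # arbitrary value at which to evaluate the likelihood

compact = compact_likelihood(p_trial, n=16, y=5)
termwise = termwise_likelihood(p_trial, flips)
print(math.isclose(compact, termwise, rel_tol=1e-12))
```

The term-by-term version is the one that generalizes: once each outcome has its own factor, that factor can be given its own probability, which is the step taken for individual covariates.
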

Finally, we could define an indicator variable to denote head or tail; let $y_i = 1$ if heads, 0 if tails. Then, with one term for the $i$th flip, the likelihood can be written as

$$\mathcal{L}(p \mid n, \{y_1, y_2, \ldots, y_{16}\}) = \prod_{i=1}^{16} p^{\,y_i} (1 - p)^{\,1 - y_i}.$$

Note, the subscript $i$ is for individual coin flips ($i = 1, 2, \ldots, 16$ flips). In these last three forms, each outcome (head or tail) has a probability term in the likelihood. The likelihood is the product of these individual probabilities. These formulations are useful in understanding the modeling of individual covariates.

### Individual Covariates

Many of the data types allow modeling parameters as functions of covariates that are unique to each individual $i$. The band recovery models and the Cormack-Jolly-Seber models allow individual covariates. We introduce this important subject here in the context of the known fate models.

A number of people have suggested modeling of the individual animals, allowing covariates that vary by individual (e.g., White and Garrott 1990, Smith et al. 1994). This approach is very useful in the biological sciences. Here, each animal ($i$) has a unique survival probability in the likelihood. In the black duck example (slightly modified), $n_1 = 48$ and $n_2 = 44$, and the binomial likelihood for the survival probability during the first week (i.e., $S_1$) can be written as

$$\mathcal{L}(S_1 \mid n_1, n_2) = \binom{n_1}{n_2} S_1^{\,n_2} (1 - S_1)^{\,n_1 - n_2}.$$

This can be re-expressed (omitting the binomial coefficient) as

$$\mathcal{L}(S_1 \mid n_1, n_2) = \prod_{i=1}^{n_2} S_i \prod_{i=n_2+1}^{n_1} (1 - S_i),$$

where the subscript $i$ is over all the tagged ducks (48 ducks in the study). Thus, the first term in the likelihood is the product of the survival probabilities over the 44 ($= n_2$) ducks that survived, while the second term is the product of the mortality probabilities $(1 - S_i)$ for the 4 ducks that died during the first week ($4 = n_1 - n_2 = 48 - 44$). So, a final expression of this likelihood is

$$\mathcal{L}(S_1 \mid n_1, n_2) = \prod_{i=1}^{44} S_i \prod_{i=45}^{48} (1 - S_i).$$

Here, the likelihood is expressed as if all the animals that survived had the first 44 tag numbers, whereas the 4 animals that died had the final tag numbers. This is just to allow simple expressions to represent the data. Program MARK handles the fact that animals die or survive independently of the tag numbers and keeps track of which covariate is associated with which individual (from the encounter history matrix). Thus, this is not an issue the investigator must worry about, other than having the correct information in the encounter history matrix.

Now we consider modeling the survival probability of these individuals as a nonlinear function of some covariate that varies for each individual animal $i$. The natural (but arbitrary) choice is the logit relationship

$$S_i = \frac{1}{1 + \exp(-[\beta_0 + \beta_1 X_i])}$$

with link function

$$\log\left(\frac{S_i}{1 - S_i}\right) = \beta_0 + \beta_1 X_i,$$

where $X_i$ is the value of the covariate for the $i$th individual. Of course, other functions could be used (log, log-log, complementary log-log, etc.). More than one covariate can also be measured and used with this general approach.

If we substitute the logit submodel and its individual covariate into the likelihood above, the expression looks messy, but is conceptually familiar,

$$\mathcal{L}(\beta_0, \beta_1 \mid n_1, n_2, X_i) = \prod_{i=1}^{n_2} \frac{1}{1 + \exp(-[\beta_0 + \beta_1 X_i])} \prod_{i=n_2+1}^{n_1} \left(1 - \frac{1}{1 + \exp(-[\beta_0 + \beta_1 X_i])}\right)$$

or

$$\mathcal{L}(\beta_0, \beta_1 \mid n_1, n_2, X_i) = \prod_{i=1}^{44} \frac{1}{1 + \exp(-[\beta_0 + \beta_1 X_i])} \prod_{i=45}^{48} \left(1 - \frac{1}{1 + \exp(-[\beta_0 + \beta_1 X_i])}\right).$$

Thus, the MLEs for $\beta_0$ and $\beta_1$ (the intercept and slope, respectively) are the focus of the estimation. The individual survival parameters in the likelihood have been replaced with the $\beta_j$. Of course, additional binomial terms could be multiplied for the parameters $S_2, S_3, \ldots, S_8$ in the black duck example to obtain an overall expression for the likelihood function.

There are two notions to think clearly about: (1) the survival probabilities are replaced by a logit (sub)model of the individual covariate $X_i$. Conceptually, every animal $i$ has its own survival probability and this may be related to the covariate. (2) During the analysis, the covariate of the animal must correspond to the survival probability of that animal. Program MARK handles this detail. Note, in the last expression of the likelihood (above) we assume it is the 48th duck that died and this corresponds to the 48th covariate, $X_i$ with $i = 48$.

Inference follows usual likelihood methods: K-L information can be estimated using AIC, AIC$_c$, or QAIC$_c$; models can be ranked using the $\Delta_i$; and Akaike weights and evidence ratios can measure strength of evidence for various covariate models. Further inference can be based on the $\hat{\beta}$ and $\text{se}(\hat{\beta})$, etc.

### An Example of the Application

Assume a biologist has found 88 active nests of the red-cockaded woodpecker in which nest initiation occurred on the same day. She selects a single nestling from each of the 88 nests and measures 3 covariates on each of these 88 nestlings. The covariates measured are the number of ectoparasites found, the number of hatchlings in the nest, and the weight at hatching. The "first" occasion is actually on day 3 following hatching. Each bird is tagged uniquely with a colored leg band to allow it to be identified and its fate determined visually by daily inspection of the nest. Birds are followed for 12 days (while they are still in the nest; they typically start to leave the nest after 15 days) and their fate is determined daily. Thus, the data follow the known fate scenario, even though animals are not fitted with radios. Overdispersion should not be a factor as only one bird in each nest is the subject of the study. Sample size is 88 (no staggered entry) and there are 12 occasions (days 3, 4, ..., 15).

If the data were modeled without an occasion effect (i.e., model $\{S(\cdot)\}$ rather than model $\{S_t\}$), one might potentially include the model with all three covariates for each individual woodpecker ($i$), as

$$\log\left(\frac{S_i}{1 - S_i}\right) = \beta_0 + \beta_1 E_i + \beta_2 H_i + \beta_3 W_i,$$

where

- $E_i$ is the number of ectoparasites on day 3 (= occasion 1),
- $H_i$ is the number of nest mates on day 3,
- $W_i$ is the weight of the nestling on day 3.

Of course, other a priori models would be considered in making inferences from these data; this is just an example. One might ask if any of the individual covariates are important. If so, which covariate is more important? Should 1 or 2 or all 3 covariates be included in a good model for these data?

The estimation would focus on the $\beta$ parameters, but the interpretation would be interesting. For example, one might look at the mean of the $\hat{S}_i$ given that all the covariates were held at their average values. Then this mean might be compared with means for low vs. high values of weight and the number of nest mates. One could compute the values of $\hat{S}$ for a range of ectoparasite counts, while holding the other covariates at their mean values. Other possibilities exist and could be explored for the selected model.

The estimates of the survival probabilities (by individual) can be plotted in Excel or any other convenient software package. That is, given the logit (or whatever link function is chosen), the individual covariates, and the $\hat{\beta}_j$, one can plot the derived $\hat{S}_i$. Program MARK does not provide such plots (but you can export the beta estimates to Excel and do the plots yourself).
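
Computing derived survival values from a set of beta estimates takes only a few lines outside of MARK. The sketch below is a hedged illustration: the beta values and covariate means are made up for the example, not estimates from any data, and the function names are hypothetical. It evaluates the logit submodel over a range of ectoparasite counts while holding the other two covariates at their assumed means, and it also shows the per-animal log-likelihood in which each bird contributes $S_i$ if it survived and $1 - S_i$ if it died.

```python
import math
from typing import List, Sequence, Tuple


def inverse_logit(eta: float) -> float:
    """S = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def survival(betas: Sequence[float], covariates: Sequence[float]) -> float:
    """Individual survival from the logit submodel: intercept plus slopes times covariates."""
    eta = betas[0] + sum(b * x for b, x in zip(betas[1:], covariates))
    return inverse_logit(eta)


def log_likelihood(
    betas: Sequence[float],
    animals: Sequence[Tuple[Sequence[float], bool]],
) -> float:
    """Sum over animals of log(S_i) for survivors and log(1 - S_i) for deaths."""
    total = 0.0
    for covariates, survived in animals:
        s_i = survival(betas, covariates)
        total += math.log(s_i if survived else 1.0 - s_i)
    return total


# Hypothetical values, for illustration only.
betas = (1.0, -0.08, -0.20, 0.15)  # intercept, ectoparasites, nest mates, weight
mean_nest_mates = 3.0
mean_weight = 10.0

# Derived survival over a range of ectoparasite counts, other covariates at their means.
curve: List[Tuple[int, float]] = [
    (e, survival(betas, (e, mean_nest_mates, mean_weight))) for e in range(0, 21, 5)
]
for ectoparasites, s in curve:
    print(f"E = {ectoparasites:2d}: S = {s:.3f}")
```

The `(covariate, survival)` pairs in `curve` are what one would plot; maximizing `log_likelihood` over the betas with a numerical optimizer is the estimation step that program MARK carries out.
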
In place of such plots, program MARK reports $\hat{S}_1$ (either the derived survival for the first animal in the encounter history matrix, or for the mean of the individual covariate, or else a value you specify) as a check.

Now, one can see that individual covariates can be used in the band recovery models and the open capture-recapture models. Too few biologists are taking full advantage of the information contained in individual covariates. Note, there are problems if the covariate changes through time in the band recovery and open C-R models. For example, if weight changes throughout the study period, one only has weights for those animals recaptured at various occasions. Thus, when animals are not captured (e.g., the "never recaptured" animals), the value of their covariate at that time is not known! This issue is not a problem with the known fate models if animals are actually handled (as opposed to merely being resighted).

### CHAPTER 16. Known-fate models. 16.1. The Kaplan-Meier Method

CHAPTER 16 Known-fate models In previous chapters, we ve spent a considerable amount of time modeling situations where the probability of encountering an individual is less than 1. However, there is at

### THE MULTINOMIAL DISTRIBUTION. Throwing Dice and the Multinomial Distribution

THE MULTINOMIAL DISTRIBUTION Discrete distribution -- The Outcomes Are Discrete. A generalization of the binomial distribution from only 2 outcomes to k outcomes. Typical Multinomial Outcomes: red A area1

### Program MARK: Survival Estimation from Populations of Marked Animals

Program MARK: Survival Estimation from Populations of Marked Animals GARY C. WHITE 1 and KENNETH P. BURNHAM 2 1 Department of Fishery and Wildlife Biology, Colorado State University, Fort Collins, CO 80523,

### Binomial Sampling and the Binomial Distribution

Binomial Sampling and the Binomial Distribution Characterized by two mutually exclusive events." Examples: GENERAL: {success or failure} {on or off} {head or tail} {zero or one} BIOLOGY: {dead or alive}

### Design and Analysis of Ecological Data Conceptual Foundations: Ec o lo g ic al Data

Design and Analysis of Ecological Data Conceptual Foundations: Ec o lo g ic al Data 1. Purpose of data collection...................................................... 2 2. Samples and populations.......................................................

### Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection

Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### Multivariate Logistic Regression

1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation

### A Method of Population Estimation: Mark & Recapture

Biology 103 A Method of Population Estimation: Mark & Recapture Objectives: 1. Learn one method used by wildlife biologists to estimate population size of wild animals. 2. Learn how sampling size effects

### A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

CHAPTER 5. A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING 5.1 Concepts When a number of animals or plots are exposed to a certain treatment, we usually estimate the effect of the treatment

### A Tutorial on Logistic Regression

A Tutorial on Logistic Regression Ying So, SAS Institute Inc., Cary, NC ABSTRACT Many procedures in SAS/STAT can be used to perform logistic regression analysis: CATMOD, GENMOD,LOGISTIC, and PROBIT. Each

### Basic Probability Theory I

A Probability puzzler!! Basic Probability Theory I Dr. Tom Ilvento FREC 408 Our Strategy with Probability Generally, we want to get to an inference from a sample to a population. In this case the population

### Analysis of Environmental Data Conceptual Foundations: En viro n m e n tal Data

Analysis of Environmental Data Conceptual Foundations: En viro n m e n tal Data 1. Purpose of data collection...................................................... 2 2. Samples and populations.......................................................

### Tips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD

Tips for surviving the analysis of survival data Philip Twumasi-Ankrah, PhD Big picture In medical research and many other areas of research, we often confront continuous, ordinal or dichotomous outcomes

### Lecture 2 ESTIMATING THE SURVIVAL FUNCTION. One-sample nonparametric methods

Lecture 2 ESTIMATING THE SURVIVAL FUNCTION One-sample nonparametric methods There are commonly three methods for estimating a survivorship function S(t) = P (T > t) without resorting to parametric models:

### CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

### 6 Scalar, Stochastic, Discrete Dynamic Systems

47 6 Scalar, Stochastic, Discrete Dynamic Systems Consider modeling a population of sand-hill cranes in year n by the first-order, deterministic recurrence equation y(n + 1) = Ry(n) where R = 1 + r = 1

### Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

### A Basic Introduction to Missing Data

John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

### Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012]

Survival Analysis of Left Truncated Income Protection Insurance Data [March 29, 2012] 1 Qing Liu 2 David Pitt 3 Yan Wang 4 Xueyuan Wu Abstract One of the main characteristics of Income Protection Insurance

### People have thought about, and defined, probability in different ways. important to note the consequences of the definition:

PROBABILITY AND LIKELIHOOD, A BRIEF INTRODUCTION IN SUPPORT OF A COURSE ON MOLECULAR EVOLUTION (BIOL 3046) Probability The subject of PROBABILITY is a branch of mathematics dedicated to building models

### Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln Log-Rank Test for More Than Two Groups Prepared by Harlan Sayles (SRAM) Revised by Julia Soulakova (Statistics)

### Models for Count Data With Overdispersion

Models for Count Data With Overdispersion Germán Rodríguez November 6, 2013 Abstract This addendum to the WWS 509 notes covers extra-poisson variation and the negative binomial model, with brief appearances

### Logistic Regression (1/24/13)

STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used

### Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

### Multinomial and Ordinal Logistic Regression

Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### Ordinal Regression. Chapter

Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

### Hierarchical Insurance Claims Modeling

Hierarchical Insurance Claims Modeling Edward W. (Jed) Frees, University of Wisconsin - Madison Emiliano A. Valdez, University of Connecticut 2009 Joint Statistical Meetings Session 587 - Thu 8/6/09-10:30

### Poisson Models for Count Data

Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

### Statistics in Geophysics: Linear Regression II

Statistics in Geophysics: Linear Regression II Steffen Unkel Department of Statistics Ludwig-Maximilians-University Munich, Germany Winter Term 2013/14 1/28 Model definition Suppose we have the following

### CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

### INTRODUCTORY STATISTICS

INTRODUCTORY STATISTICS FIFTH EDITION Thomas H. Wonnacott University of Western Ontario Ronald J. Wonnacott University of Western Ontario WILEY JOHN WILEY & SONS New York Chichester Brisbane Toronto Singapore

### Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

### 7.4 JOINT ANALYSIS OF LIVE AND DEAD ENCOUNTERS OF MARKED ANIMALS

361 7.4 JOINT ANALYSIS OF LIVE AND DEAD ENCOUNTERS OF MARKED ANIMALS RICHARD J. BARKER Department of Mathematics and Statistics, University of Otago, P.O. Box 56, Dunedin, New Zealand, rbarker@maths.otago.ac.nz

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

College Readiness LINKING STUDY A Study of the Alignment of the RIT Scales of NWEA s MAP Assessments with the College Readiness Benchmarks of EXPLORE, PLAN, and ACT December 2011 (updated January 17, 2012)

### Non-Parametric Estimation in Survival Models

Non-Parametric Estimation in Survival Models Germán Rodríguez grodri@princeton.edu Spring, 2001; revised Spring 2005 We now discuss the analysis of survival data without parametric assumptions about the

### Joint live encounter & dead recovery data

Joint live encounter & dead recovery data CHAPTER 9 The first chapters in this book focussed on typical open population mark-recapture models, where the probability of an individual being seen was defined

### Regression Modeling Strategies

Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions

### Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

### I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of

### SUMAN DUVVURU STAT 567 PROJECT REPORT

SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.

### What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and

### Inference of Probability Distributions for Trust and Security applications

Inference of Probability Distributions for Trust and Security applications Vladimiro Sassone Based on joint work with Mogens Nielsen & Catuscia Palamidessi Outline Motivations

### Survival Analysis Using SPSS. By Hui Bian Office for Faculty Excellence

Survival Analysis Using SPSS By Hui Bian Office for Faculty Excellence Survival analysis What is survival analysis Event history analysis Time series analysis When use survival analysis Research interest

### An analysis method for a quantitative outcome and two categorical explanatory variables.

Chapter 11 Two-Way ANOVA An analysis method for a quantitative outcome and two categorical explanatory variables. If an experiment has a quantitative outcome and two categorical explanatory variables that

### The Set of Candidate Models

1 The Set of Candidate Models Formulation of critical hypotheses and models to adequately reflect these hypotheses is conceptually more difficult than estimating the model parameters and their precision.

### A distribution-based stochastic model of cohort life expectancy, with applications

A distribution-based stochastic model of cohort life expectancy, with applications David McCarthy Demography and Longevity Workshop CEPAR, Sydney, Australia 26 th July 2011 1 Literature review Traditional

### An Application of the G-formula to Asbestos and Lung Cancer. Stephen R. Cole. Epidemiology, UNC Chapel Hill. Slides: www.unc.

An Application of the G-formula to Asbestos and Lung Cancer Stephen R. Cole Epidemiology, UNC Chapel Hill Slides: www.unc.edu/~colesr/ 1 Acknowledgements Collaboration with David B. Richardson, Haitao

### CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction

CA200 Quantitative Analysis for Business Decisions File name: CA200_Section_04A_StatisticsIntroduction Table of Contents 4. Introduction to Statistics... 1 4.1 Overview... 3 4.2 Discrete or continuous

### Probit Analysis By: Kim Vincent

Probit Analysis By: Kim Vincent Quick Overview Probit analysis is a type of regression used to analyze binomial response variables. It transforms the sigmoid dose-response curve to a straight line that

### Modeling the Claim Duration of Income Protection Insurance Policyholders Using Parametric Mixture Models

Modeling the Claim Duration of Income Protection Insurance Policyholders Using Parametric Mixture Models Abstract This paper considers the modeling of claim durations for existing claimants under income

### Logistic regression modeling the probability of success

Logistic regression modeling the probability of success Regression models are usually thought of as only being appropriate for target variables that are continuous Is there any situation where we might

### Basic Data Analysis. Stephen Turnbull Business Administration and Public Policy Lecture 12: June 22, 2012. Abstract. Review session.

June 23, 2012 1 review session Basic Data Analysis Stephen Turnbull Business Administration and Public Policy Lecture 12: June 22, 2012 Review session. Abstract Quantitative methods in business Accounting

### 5. Ordinal regression: cumulative categories proportional odds. 6. Ordinal regression: comparison to single reference generalized logits

Lecture 23 1. Logistic regression with binary response 2. Proc Logistic and its surprises 3. quadratic model 4. Hosmer-Lemeshow test for lack of fit 5. Ordinal regression: cumulative categories proportional

### For example, enter the following data in three COLUMNS in a new View window.

Statistics with Statview - 18 Paired t-test A paired t-test compares two groups of measurements when the data in the two groups are in some way paired between the groups (e.g., before and after on the

### Bayes and Naïve Bayes. cs534-machine Learning

Bayes and Naïve Bayes cs534-machine Learning Bayes Classifier Generative model learns Prediction is made by and where This is often referred to as the Bayes Classifier, because of the use of the Bayes rule

### Math 58. Rumbos Fall 2008 1. Solutions to Review Problems for Exam 2

Math 58. Rumbos Fall 2008 1 Solutions to Review Problems for Exam 2 1. For each of the following scenarios, determine whether the binomial distribution is the appropriate distribution for the random variable

### Logit and Probit. Brad Jones 1. April 21, 2009. University of California, Davis. Bradford S. Jones, UC-Davis, Dept. of Political Science

Logit and Probit Brad Jones Department of Political Science University of California, Davis April 21, 2009 Logit, redux Logit resolves the functional form problem (in terms of the response function in the

### Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

### 17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

### Generalized Linear Models

Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

### Introduction. Survival Analysis. Censoring. Plan of Talk

Survival Analysis Mark Lunt Arthritis Research UK Centre for Excellence in Epidemiology University of Manchester 01/12/2015 Survival Analysis is concerned with the length of time before an event occurs.

### When Does it Make Sense to Perform a Meta-Analysis?

CHAPTER 40 When Does it Make Sense to Perform a Meta-Analysis? Introduction Are the studies similar enough to combine? Can I combine studies with different designs? How many studies are enough to carry

### LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as

LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values

### Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

### Random variables, probability distributions, binomial random variable

Week 4 lecture notes. WEEK 4 page 1 Random variables, probability distributions, binomial random variable Example 1: Consider the experiment of flipping a fair coin three times. The number of tails that

### UNDERSTANDING INFORMATION CRITERIA FOR SELECTION AMONG CAPTURE-RECAPTURE OR RING RECOVERY MODELS

UNDERSTANDING INFORMATION CRITERIA FOR SELECTION AMONG CAPTURE-RECAPTURE OR RING RECOVERY MODELS David R. Anderson and Kenneth P. Burnham Colorado Cooperative Fish and Wildlife Research Unit Room 201 Wagar

### Parametric Models. dh(t) dt > 0 (1)

Parametric Models: The Intuition Parametric Models As we saw early, a central component of duration analysis is the hazard rate. The hazard rate is the probability of experiencing an event at time t i

### Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541

Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541 libname in1 'c:\'; Data first; Set in1.extract; A=1; PROC LOGIST OUTEST=DD MAXITER=100 ORDER=DATA; OUTPUT OUT=CC XBETA=XB P=PROB; MODEL

### CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS

Examples: Regression And Path Analysis CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Regression analysis with univariate or multivariate dependent variables is a standard procedure for modeling relationships

### RNA-seq. Quantification and Differential Expression. Genomics: Lecture #12

(2) Quantification and Differential Expression Institut für Medizinische Genetik und Humangenetik Charité Universitätsmedizin Berlin Genomics: Lecture #12 Today (2) Gene Expression per Sources of bias,

### Tutorial 5: Hypothesis Testing

Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................

### Logistic Regression. http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests

Logistic Regression http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests Overview Binary (or binomial) logistic regression is a form of regression which is used when the dependent is a dichotomy

### Generalized Linear Mixed Modeling and PROC GLIMMIX

Generalized Linear Mixed Modeling and PROC GLIMMIX Richard Charnigo Professor of Statistics and Biostatistics Director of Statistics and Psychometrics Core, CDART RJCharn2@aol.com Objectives First ~80

### ANNEX 2: Assessment of the 7 points agreed by WATCH as meriting attention (cover paper, paragraph 9, bullet points) by Andy Darnton, HSE

ANNEX 2: Assessment of the 7 points agreed by WATCH as meriting attention (cover paper, paragraph 9, bullet points) by Andy Darnton, HSE The 7 issues to be addressed outlined in paragraph 9 of the cover

### GCSE Statistics Revision notes

GCSE Statistics Revision notes Collecting data Sample This is when data is collected from part of the population. There are different methods for sampling Random sampling, Stratified sampling, Systematic

### Modelling spousal mortality dependence: evidence of heterogeneities and implications

1/23 Modelling spousal mortality dependence: evidence of heterogeneities and implications Yang Lu Scor and Aix-Marseille School of Economics Lyon, September 2015 2/23 INTRODUCTION 3/23 Motivation It has

### 1 Short Introduction to Time Series

ECONOMICS 7344, Spring 202 Bent E. Sørensen January 24, 202 Short Introduction to Time Series A time series is a collection of stochastic variables x_1, ..., x_t, ..., x_T indexed by an integer value t. The

### ANOVA Analysis of Variance

ANOVA Analysis of Variance What is ANOVA and why do we use it? Can test hypotheses about mean differences between more than 2 samples. Can also make inferences about the effects of several different IVs,

### Statistical Analysis of Life Insurance Policy Termination and Survivorship

Statistical Analysis of Life Insurance Policy Termination and Survivorship Emiliano A. Valdez, PhD, FSA Michigan State University joint work with J. Vadiveloo and U. Dias Session ES82 (Statistics in Actuarial

### AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

Ms. Foglia Date AP: LAB 8: THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,

### Likelihood: Frequentist vs Bayesian Reasoning

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B University of California, Berkeley Spring 2009 N Hallinan Likelihood: Frequentist vs Bayesian Reasoning Stochastic Models and

### An Application of Weibull Analysis to Determine Failure Rates in Automotive Components

An Application of Weibull Analysis to Determine Failure Rates in Automotive Components Jingshu Wu, PhD, PE, Stephen McHenry, Jeffrey Quandt National Highway Traffic Safety Administration (NHTSA) U.S. Department

### GENERAL STRATEGIES FOR THE COLLECTION AND ANALYSIS OF RINGING DATA

1. Introduction GENERAL STRATEGIES FOR THE COLLECTION AND ANALYSIS OF RINGING DATA David R. Anderson and Kenneth P. Burnham Colorado Cooperative Fish and Wildlife Research Unit Room 201 Wagar Building,

### Important Probability Distributions OPRE 6301

Important Probability Distributions OPRE 6301 Important Distributions... Certain probability distributions occur with such regularity in real-life applications that they have been given their own names.

### A practical guide to the management and analysis of survivorship data from radio-tracking studies

A practical guide to the management and analysis of survivorship data from radio-tracking studies Hugh A. Robertson and Ian M. Westbrooke DEPARTMENT OF CONSERVATION TECHNICAL SERIES 31 Published by Science

### 2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)

2DI36 Statistics 2DI36 Part II (Chapter 7 of MR) What Have we Done so Far? Last time we introduced the concept of a dataset and seen how we can represent it in various ways But, how did this dataset came

### Longitudinal Data Analysis

Longitudinal Data Analysis Acknowledge: Professor Garrett Fitzmaurice INSTRUCTOR: Rino Bellocco Department of Statistics & Quantitative Methods University of Milano-Bicocca Department of Medical Epidemiology

### A Mathematical Study of Purchasing Airline Tickets

A Mathematical Study of Purchasing Airline Tickets THESIS Presented in Partial Fulfillment of the Requirements for the Degree Master of Mathematical Science in the Graduate School of The Ohio State University

### Statistical Models in R

Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova

### Fixed-Effect Versus Random-Effects Models

CHAPTER 13 Fixed-Effect Versus Random-Effects Models Introduction Definition of a summary effect Estimating the summary effect Extreme effect size in a large study or a small study Confidence interval

### C: LEVEL 800 {MASTERS OF ECONOMICS( ECONOMETRICS)}

C: LEVEL 800 {MASTERS OF ECONOMICS( ECONOMETRICS)} 1. EES 800: Econometrics I Simple linear regression and correlation analysis. Specification and estimation of a regression model. Interpretation of regression

### Mgmt 469. Model Specification: Choosing the Right Variables for the Right Hand Side

Mgmt 469 Model Specification: Choosing the Right Variables for the Right Hand Side Even if you have only a handful of predictor variables to choose from, there are infinitely many ways to specify the right

### LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

Period Date LAB : THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,

Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr. arr.). Organized study of selected topics. Subjects and earnable credit may vary from semester to semester.